-
Notifications
You must be signed in to change notification settings - Fork 6
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add exceptions for dealing with image errors on pods #9
base: main
Are you sure you want to change the base?
Add exceptions for dealing with image errors on pods #9
Conversation
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice way of doing this, I forgot we copy the watcher's resources. A couple minor code tweaks I spotted.
Also, I think right now you're missing the case where ResourceWaiter
was told to watch a pod. You could add a check right here:
You'll want to check if resource.image_pull_error
is True right before that if self.watch_owned
You could also move the error check out of the _check_owned_resources
function and just put it here in the for
loop: https://github.com/RedHatInsights/ocviapy/pull/9/files#diff-028ea95c402cb62a8d41c1f73660fbfbe0fc44359b8a960ffacb4d54a0abaf2fR652
So I think _observe
could look like this:
# update our records for this resource
self.observed_resources[key] = resource
if resource.image_pull_error:
raise StatusError(f"image pull error for resource {resource.key}/{resource.name}")
if self.watch_owned:
# use .copy() in case dict changes during iteration
for _, r in self.watcher.resources.copy().items():
if r.image_pull_error:
raise StatusError(f"image pull error for resource {r.key}/{r.name}")
self._check_owned_resources(r)
Those are all great suggestions. Incorporated them all. I'm going to mess about in the debugger and see if I can make the exceptions nicer as you mentioned |
Looks like
So might be good to print the 'message', 'reason', and to indicate the container name as well? Something like: raise StatusError(f"{reason} error on {resource.key}/{resource.name} (container {container_name}): {message}") |
You already fetch all this info when you loop through all this stuff in |
I feel pretty good about that last push. Simplified / DRY things a bit and better messaging:
|
Saw your feedback after I pushed. Even better error message:
|
for owner_ref in resource.data["metadata"].get("ownerReferences", []): | ||
restype_matches = owner_ref["kind"].lower() == self.restype | ||
owner_uid_matches = owner_ref["uid"] == self.resource.uid | ||
if restype_matches and owner_uid_matches: | ||
# this resource is owned by "self" | ||
previously_observed = False |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This code block is unchanged, just moved into the new _check_status_if_owned
func below for better readability
if self.watch_owned: | ||
# use .copy() in case dict changes during iteration |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This code block was moved into the _check_owned_resources
func. The only new addition to this block of code is https://github.com/RedHatInsights/ocviapy/pull/9/files#diff-028ea95c402cb62a8d41c1f73660fbfbe0fc44359b8a960ffacb4d54a0abaf2fR666
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I had a hard time trying to review this code, in part because of lack of familiarity with it . I wish it had tests on it , I was about to request tests along this feature but since the repo doesn't have any ...
The way I tested this was thanks to bonfire
- installing it with this PR's version of ocviapy
seems to show that it is able to detect an Image Pull error ... unfortunately, if there's a pod in the namespace in this state already, it makes a bonfire deploy to crash immediately after creating all the resources (thanks for pointing this out @bsquizz ).
I don't know - I don't feel confident enough to give this a 👍 - so I'd just comment - not explicitly approve, and I'm not requesting changes to add tests simply because the project itself doesn't have tests whatsoever - that doesn't mean though it could use some tests, as it would help for future reviews and feature work.
It is a good point -- if you deploy a bad config and you want to re-deploy over that bad config -- this feature will cause the "wait for ready" steps to immediately raise an exception on that second re-deploy. Need to think through this a bit further ... |
This patch adds exceptions for image pull backoff and image errors. But, at the time of this writing, I haven't adequately confirmed that it works.