How was this change tested?
Unit tested and deployed.
Sample log:
karpenter-74bdb86d9c-kc77c controller 2022-11-10T15:56:42.861Z INFO controller.inflightchecks Inflight check failed for node ip-192-168-85-164.us-west-2.compute.internal, Can't drain node, pod default/my-shell is not owned {"commit": "f691533-dirty", "node": "ip-192-168-85-164.us-west-2.compute.internal"}
Sample event:
51s Warning FailedInflightCheck node/ip-192-168-85-164.us-west-2.compute.internal Can't drain node, pod default/my-shell is not owned
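The sample message comes from a check that flags pods with no owner references (a check removed later in this thread). The Go sketch below shows that idea. `Pod`, `OwnerReference` and `checkPodsOwned` are hypothetical, simplified stand-ins rather than the Kubernetes API types or Karpenter's actual code. Only the message wording is taken from the sample log.

```go
package main

import "fmt"

// OwnerReference is a hypothetical, trimmed-down stand-in for the
// Kubernetes metav1.OwnerReference type; only the fields used here are kept.
type OwnerReference struct {
	Kind string
	Name string
}

// Pod is a hypothetical minimal pod model for this sketch.
type Pod struct {
	Namespace       string
	Name            string
	OwnerReferences []OwnerReference
}

// checkPodsOwned returns one message per pod that has no owner references,
// using the wording seen in the sample log and event.
func checkPodsOwned(pods []Pod) []string {
	var issues []string
	for _, p := range pods {
		if len(p.OwnerReferences) == 0 {
			issues = append(issues, fmt.Sprintf("Can't drain node, pod %s/%s is not owned", p.Namespace, p.Name))
		}
	}
	return issues
}

func main() {
	pods := []Pod{
		{Namespace: "default", Name: "my-shell"}, // bare pod, no owner
		{Namespace: "default", Name: "web-1", OwnerReferences: []OwnerReference{{Kind: "ReplicaSet", Name: "web"}}},
	}
	for _, issue := range checkPodsOwned(pods) {
		fmt.Println(issue)
	}
}
```

Only `default/my-shell` is reported. The pod owned by a ReplicaSet passes the check.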
There is some duplication here, since we also report failures to evict. I haven't decided yet whether that's OK. I'm leaning towards keeping the duplication, because this check is rate limited to once per 10 minutes per node and logs at the info level in addition to creating the event. For example, the same pod currently produces both of these events:
Warning FailedInflightCheck 2m34s (x2 over 12m) karpenter Can't drain node, pod default/my-shell is not owned
Warning FailedDraining 32s (x7 over 12m) karpenter Failed to drain node, pod default/my-shell does not have any owner references
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
The latest from @dewjam is that we shouldn't block on unowned pods. Worth sequencing these PRs? I'm happy either way.
There's no PR for that yet, is there? I have no issue removing the controller-less checks once that code goes in, but I think this check is worthwhile until it does.
Removed the controller-less pod check, since @dewjam's change went in.
Description
Report, as events and logs, the reasons why a node is stuck terminating or failing to initialize.
Fixes aws/karpenter-provider-aws#2829
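The overall shape of the change (run checks against a node, log each failure at info level, and emit a `FailedInflightCheck` warning event) might look roughly like the Go sketch below. `Node`, `CheckFunc`, `Event` and `runInflightChecks` are hypothetical names invented for illustration; a real controller would record events through client-go's event recorder rather than returning a slice. The reason, object and message formats are copied from the sample log and event.

```go
package main

import (
	"fmt"
	"log"
)

// Node is a hypothetical minimal node model for this sketch.
type Node struct {
	Name string
}

// CheckFunc is a hypothetical inflight check returning the reasons (if any)
// why a node is stuck terminating or failing to initialize.
type CheckFunc func(Node) []string

// Event is a hypothetical stand-in for a Kubernetes event on a node.
type Event struct {
	Type, Reason, Object, Message string
}

// runInflightChecks runs every check against node, logs each failure at
// info level, and returns one Warning event per failure.
func runInflightChecks(node Node, checks []CheckFunc) []Event {
	var events []Event
	for _, check := range checks {
		for _, msg := range check(node) {
			log.Printf("INFO Inflight check failed for node %s, %s", node.Name, msg)
			events = append(events, Event{
				Type:    "Warning",
				Reason:  "FailedInflightCheck",
				Object:  "node/" + node.Name,
				Message: msg,
			})
		}
	}
	return events
}

func main() {
	node := Node{Name: "ip-192-168-85-164.us-west-2.compute.internal"}
	checks := []CheckFunc{
		func(Node) []string { return []string{"Can't drain node, pod default/my-shell is not owned"} },
		func(Node) []string { return nil }, // a check that passes
	}
	for _, e := range runInflightChecks(node, checks) {
		fmt.Printf("%s %s %s %s\n", e.Type, e.Reason, e.Object, e.Message)
	}
}
```

Keeping each check as a small function that only returns messages leaves logging, event creation and rate limiting in one place.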