Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add node label selector to Terminator #625

Merged
merged 4 commits into from
Apr 19, 2022
Merged

Conversation

cjerad
Copy link
Contributor

@cjerad cjerad commented Apr 18, 2022

Issue #, if available:

#556 Initial NTH v2 design proposal

Description of changes:

Add matchLabels property to the Terminator spec.

NTH will take no action when an SQS message targets a node that does not have the requisite labels and values.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

@cjerad cjerad requested a review from snay2 April 18, 2022 14:00
@cjerad cjerad marked this pull request as ready for review April 18, 2022 14:04
@cjerad cjerad requested a review from a team as a code owner April 18, 2022 14:04
Expect(drainedNodes).To(BeEmpty())
})

It("does not delete the message from the SQS queue", func() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How long do messages stay on the SQS queue? It seems like there could be a failure mode where NTH spends all its time re-processing messages on the queue (because, e.g., the labels didn't match) and isn't able to respond to new messages in time for k8s to gracefully handle a node termination. That's probably uncommon, but it is basically what mechanical-fish described in #494.

Is that a concern with this way of handling node labels? It seems like the philosophy of the terminator is "I only respond to requests that match my parameters; anything else I will leave for someone else to handle". So it may not be possible to prevent that situation. Curious what your thoughts are.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Queues have a message retention period setting. A dead-letter queue can also be configured.

Leaving the message in the queue gives another Terminator instance a chance to match and handle it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
V2
Awaiting triage
Development

Successfully merging this pull request may close these issues.

None yet

2 participants