Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Configuration via annotations #64

Open
wants to merge 3 commits into
base: master
Choose a base branch
from

Conversation

jdharmon
Copy link
Contributor

Resolves #44.

Rule configuration may be overridden by annotations on individual pods. For single-value rules, the configured rule value will be replaced by the annotation value. For multi-value rules, annotations will be added to the configured rule values. See Implemented Rules for available annotations.

For backwards compatibility and minimum possible side effects, I opted to load configured rules as usual. I added a RULES environment variable to load rules explicitly without a configuration. See RULES. I'm open to other ideas for how to load rules, but this was the quickest way forward. There was some discussion in #44 (comment) on this.

Example:

This configuration will load the Unready rule with 5m duration and the Duration rule with no configuration. Only pods decorated with pod-reaper/max-duration will be reaped by the Duration rule.

MAX_UNREADY=5m
RULES=duration,unready

Comment on lines 30 to +45
func (rule *chaos) load() (bool, string, error) {
value, active := os.LookupEnv(envChaosChance)
if !active {
explicit := explicitLoad(ruleChaos)
value, hasDefault := os.LookupEnv(envChaosChance)
if !explicit && !hasDefault {
return false, "", nil
}
chance, err := strconv.ParseFloat(value, 64)
if err != nil {
if !explicit && err != nil {
return false, "", fmt.Errorf("invalid chaos chance %s", err)
}
rule.chance = chance
return true, fmt.Sprintf("chaos chance %s", value), nil

if rule.chance != 0 {
return true, fmt.Sprintf("chaos chance %s", value), nil
}
return true, fmt.Sprint("chaos (no default)"), nil
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I want to make sure I understand the desired behavior here, as there are 4 possible cases based on explicit load and default value.

  1. NO explicit load and NO default value -- don't load the rule
  2. NO explicit load but default value -- load the rule (previous behavior) -- this case honors the annotation overrides?
  3. explicit load but NO default value -- load the rule, but the rule would also say "don't reap" unless the pod had an annotation override and the condition for that override was met?
  4. explicit load but default value -- loads the rule and and default value, but honors the annotation overrides.

Wondering if we need a way to determine whether or not the reaper should honor the annotations? I guess the concern is making sure that if the reaper isn't intended to be overwritten that it can't be. Maybe I'm missing that piece here (I'm still looking!)

There might also be a case where explicit loading of the rule is set but there's also a value that isn't parsible as a float (i.e. lizard) here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are correct for all 4 cases. If a rule is loaded (explicitly or with a default value), it will always check for an annotation and use the annotation's value if found. I don't think we need an option to explicitly enable annotations. It should be safe to assume that if a user added a pod-reaper/* annotation, they intend to opt-in to annotation override. If a pod is not annotated, then pod-reaper will behave the same as it does now.

The annotation is parsed in ShouldReap(). If a pod is annotated with pod-reaper/chaos-chance: lizard, a warning will be logged, and the behavior will fall back to default value.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The chaos rule might be a weird edge case since all enabled rules must return true. If you were to enable chaos without a default, it would prevent any pod which was not annotated from being reaped. I would prefer that a pod is reaped if any rule returns true, but that would be a potentially breaking change. We could also implement reap/spare logic like you mentioned in #44 (comment), but I wanted to keep things simple if possible.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah, reaping if any rule returns true would be breaking. If it operator by reaping when any rule returned true there wouldn't be a way to get an "or" function between multiple rules. How I've done the "reaping on any rule = true" situation was to make multiple reapers, each with only one rule.

@brianberzins
Copy link
Collaborator

I'm looking at #45 and wondering why I didn't merge that... I think it was from lack of a code review.
That change is definitely less simple, but after it's made... I think there could be a really clean way forward with this. It'd be an alternate approach (basically skipping loading all together).

Every way I look at this I see crazy edge cases, which makes me really apprehensive. But I think I'm going to take a stab at what it might look like off of #45 (I try stuff like this a lot, and most of it I just throw out!)

@jdharmon jdharmon closed this Jan 21, 2021
@jdharmon jdharmon reopened this Jan 21, 2021
@jdharmon
Copy link
Contributor Author

I like where #45 was going. I had hoped to start with it, but I had an immediate need, so I started with master. I'd be happy to refactor this.

It would be nice to merge #50 as well. I think automated end-to-end tests would really increase the confidence when merging large changes.

@brianberzins
Copy link
Collaborator

pushed up a container image: brianberzins/pod-reaper:alpha-annotation-0121-1

This was based off #45 and has an environment variable called HONOR_ANNOTATIONS that, when set to true (or anything https://golang.org/pkg/strconv/#ParseBool accepts as true) will attempt to honor annotations.

I have some higher level functional tests that I still need to run, but I might not be able to get to that tonight.

@jdharmon
Copy link
Contributor Author

The HONOR_ANNOTATIONS setting is unnecessary. Any rule loaded with a default will use the default if the pod is not annotated. In other words, behavior will be exactly the same as it is today.

@brianberzins
Copy link
Collaborator

I think I figured out why I'm having a hard time with this one... Not from a difficulty to implement perspective, but from a product perspective: the complexity of this feature is higher than anything I every thought pod-reaper would do...

Each time I try to implement this, I end up with something that works for most cases, but there always seems to be one that doesn't fit.

  • What does this change mean for someone that is managing the pod-reaper?
  • What does this change mean for someone managing a pod that is within the sight of the reaper?
  • What if those people are the same, different, or completely unaware of each-other?

If pod annotations aren't opt-in, then the manager of the reaper won't know (outside of logs and a lot of manual comparing) what's being overridden. This would be true for the very first use case of pod-reaper. It was originally designed to cleanup CI jobs running on a shared cluster when they were taking way to long or got stuck in an crash loop. In that case, we did not want anyone to every be able to override it.

But I think there's a more fundamental disagreement with allowing annotations to override. It just splits who's responsible for what. In some cases, the configuration is annotations, the other it's in the environment of the reaper. So when we're using annotations, there are the following things that have to considered for at least 1 (but probably more) rules:

  • the configuration of the reaper
  • the configuration of the pod
  • which one should take precedence (because there are use cases for both)
  • whether or not the pod actually meets the criteria.

I guess, in my head, this is the feature that crossed the line of "simple" to "complex" -- which brings me to another... not great point. It very much looks like I'm the only one maintaining this... and I honestly don't use it my day to day anymore. Changes in jobs has me predominantly working with teams that are behind on the kubernetes adoption train. That's leaving me feeling more and more like I don't have "skin in this game" and shouldn't be making the product level decisions anyway :(

@jdharmon
Copy link
Contributor Author

Even though annotations are always checked, I would argue that they are still opt-in.

  1. Pod owner opts in by annotation a pod.
  2. pod-reaper owner opts in by specifying a rule in RULES to load without default.
  3. If neither 1 or 2, pod-reaper behavior is the same as is today.

The case where the pod and pod-reaper owners are different and the pod-reaper owner does not want to allow override is interesting. If you are using pod reaper with multiple rules activated, then an annotation could allow the pod owner to evade getting killed. In my opinion, the pod owner should get the final say, but I guess that's ultimately a decision for the product owner.

We are running our own fork of pod-reaper with the annotations already compiled in. If you want to leave this open until someone else adopts it, you're not holding us back. If the project is no longer actively being maintained, we may continue adding features in our fork without pushing them back upstream.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Allow default configuration override with annotations
2 participants