fix: separate out the grace_period for ALB and NLB#4734
fix: separate out the grace_period for ALB and NLB#4734mergify[bot] merged 11 commits intoaws:mainlinefrom
Conversation
|
🍕 Here are the new binary sizes!
|
Codecov Report
@@ Coverage Diff @@
## mainline #4734 +/- ##
============================================
+ Coverage 69.93% 69.95% +0.02%
============================================
Files 284 284
Lines 40782 40856 +74
Branches 272 272
============================================
+ Hits 28519 28581 +62
- Misses 10885 10894 +9
- Partials 1378 1381 +3
... and 1 file with indirect coverage changes Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here. |
| AdditionalRoutingRules []RoutingRule `yaml:"additional_rules"` | ||
| Main RoutingRule `yaml:",inline"` | ||
| TargetContainerCamelCase *string `yaml:"targetContainer"` // Deprecated. Maintained for backwards compatibility, use [RoutingRule.TargetContainer] instead. | ||
| GracePeriod *time.Duration `yaml:"grace_period"` |
There was a problem hiding this comment.
For ALB, grace_period field was under http.healthcheck but grace_period is a service level field, so it should be under http and not http.healthcheck.
Just in case this wasn't discussed before this PR. I recall there was a discussion around this, when grace_period was first introduced in #2576: grace_period is a health-check related configuration, therefore we put in under http.health_check. Although in ECS it is configured as a service attribute, but we tried to model our manifest "in the way that developers think", instead of "in the way how ECS models it".
There was a problem hiding this comment.
I agree with your point, but it is becoming super weird with our current structure if we keep it inside healthcheck.
If we keep it inside healthcheck then:
- Either it has to be under both the main as well as additional routing rules. But it is not a routing rule parameter, so having this parameter in the main or additional rules sounds wrong to me.
- Or it has to be under the only Main routing rule, which is also something that is less descriptive of the purpose of that parameter.
Do let me know your thoughts on this.
There was a problem hiding this comment.
If multiple containers are handling requests routed from an ELB, I'd assume customers would want to set the grace period to accommodate for the container that is the slowest to start 💭
An alternative that we can consider is to get the max of the grace periods for individual rules.
http:
path: 'api'
healthcheck:
grace_period: 1m
additional_rules:
- path: 'user'
healthcheck:
path: 'user'
- path: 'admin'
healthcheck:
path: 'admin'
grace_period: 10m # The grace period is 10m because 10m > 1m.I see a similar logic in this PR where you get the max of nlb and http's grace_periods.
There was a problem hiding this comment.
Yeah that could be another solution, but then we choose max of the grace periods for individual rules. And at the same time if nlb's grace_period is also specified then shall we take max of nlb and http ?
Edit - that makes me wonder if nlb listeners should also have this grace_period parameter?
There was a problem hiding this comment.
I'd imagine we'd want to take the max, if we were to be consistent in the design logic of the "alternative approach".
The pros of the original approach that you purposed by this PR is that it fits nicer with ECS's model. The cons is that it might be too specifically designed for ECS, and misses the abstraction that we've tried to achieve.
For the alternative approach, the pros & cons are the opposite.
I'd also love to hear from the others on this 👍🏼
There was a problem hiding this comment.
As per the discussion with @iamhopaul123 and @efekarakus, here is the summary of what we are going to do:
- We agree that healthcheckgraceperiodseconds should have been load balancer level setting in ECS. But until they change it, we will keep
grace_periodas is for Main as well as Additional Listener Rules in ALB. That means we will havegrace_periodonhttp.Main.HealthChecklevel. So that in the future, whenever ECS changes it, we will open upgrace_periodfor additional listener rules as well. For now, we will hide this setting from Additional Rules in our Copilot documents. - In NLB,
grace_periodwill be introduced onnlb.Listener.HealthChecklevel.grace_periodfrom additional listeners will be hidden and not shown in our docs. - When ALB and NLB both have
grace_periodset, we will throw a specific error saying they can set either property but not both. Same goes with listener rules and listeners as well. If customers accidentally setgrace_periodon Main as well as Additional Listener Rules, then we error out.
Do let me know if there are any concerns around this.
| if strings.Contains(e.firstField, "additional_rules") { | ||
| return fmt.Sprintf(`cannot define "grace_period" in "http.additional_rules[%d].healthcheck.grace_period"`, e.index) | ||
| } else if strings.Contains(e.firstField, "additional_listeners") { |
There was a problem hiding this comment.
This is very confusing because it has nothing to do with the name of errGracePeriodSpecifiedMoreThanOnce. I feel like we need a different error for this case. can we use errFieldMutualExclusive if they specify grace_period in both places and a new error type if they specify grace_period in additional rules?
Looking forward the key is really about "decoupling" (so that we can make changes in one place; easier to maintain; and less error-prone): we should avoid mixing things together just because they belong to the same "topic" but focusing on what they do (functionalities). For example, in this errors file errContainersExposingSamePort is a probably ok because it is specific but we have to do it this way (the error message is specifically about port). Whereas errSpecifiedBothIngressFields is a bad one because the error msg has nothing to do with ingress fields and we should've reused errFieldMutualExclusive (even if we need new recommendation we could make it based on errFieldMutualExclusive)
There was a problem hiding this comment.
yeah that sounds good to me. I have changed error message when customer define grace_period in both ALB and NLB to errFieldMutualExclusive. And renamed existing errGracePeriodSpecifiedMoreThanOnce to errGracePeriodSpecifiedInAdditionalField when customer define grace_period in additional rules or listeners. Do let me know how does that look to you?
efekarakus
left a comment
There was a problem hiding this comment.
thanks Parag! Looks good, not much else to add besides the existing comments.
#4734 This PR adds `grace_period` field to the NLB main listener and also removes this field from the additional listener rules for the ALB. By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the Apache 2.0 License.
Previously, ALB and NLB's grace period was set from http's
grace_periodmanifest field. This PR separate out the field such that ALB and NLB have their own fields.For ALB,
grace_periodfield was underhttp.healthcheckbut grace_period is a service level field, so it should be underhttpand nothttp.healthcheck. So we have fixed that as well.By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the Apache 2.0 License.