
fix: separate out the grace_period for ALB and NLB #4734

Merged
mergify[bot] merged 11 commits into aws:mainline from paragbhingre:multipleport_gracePeriod
Apr 25, 2023

Conversation

@paragbhingre
Contributor

@paragbhingre paragbhingre commented Apr 5, 2023

Previously, the ALB's and NLB's grace periods were both set from `http`'s `grace_period` manifest field. This PR separates the field out so that ALB and NLB each have their own.
For ALB, the `grace_period` field used to live under `http.healthcheck`, but the grace period is a service-level setting, so it belongs under `http` rather than `http.healthcheck`. We have fixed that as well.

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the Apache 2.0 License.

@paragbhingre paragbhingre requested a review from a team as a code owner April 5, 2023 18:42
@paragbhingre paragbhingre requested review from efekarakus and removed request for a team April 5, 2023 18:42
@github-actions

github-actions Bot commented Apr 5, 2023

🍕 Here are the new binary sizes!

| Name | New size (kiB) | Previous size (kiB) | Delta (%) |
| --- | --- | --- | --- |
| macOS (amd) | 50552 | 50328 | +0.45 |
| macOS (arm) | 50760 | 50516 | +0.48 |
| linux (amd) | 44500 | 44304 | +0.44 |
| linux (arm) | 42820 | 42564 | +0.60 |
| windows (amd) | 41376 | 41204 | +0.42 |

@codecov-commenter

codecov-commenter commented Apr 5, 2023

Codecov Report

Merging #4734 (d490361) into mainline (0ecf04e) will increase coverage by 0.02%.
The diff coverage is 82.05%.

```
@@             Coverage Diff              @@
##           mainline    #4734      +/-   ##
============================================
+ Coverage     69.93%   69.95%   +0.02%
============================================
  Files           284      284
  Lines         40782    40856      +74
  Branches        272      272
============================================
+ Hits          28519    28581      +62
- Misses        10885    10894       +9
- Partials       1378     1381       +3
```
| Impacted Files | Coverage Δ |
| --- | --- |
| internal/pkg/manifest/svc.go | 67.98% <ø> (ø) |
| internal/pkg/template/workload.go | 52.42% <ø> (ø) |
| internal/pkg/manifest/errors.go | 51.57% <33.33%> (-2.64%) ⬇️ |
| ...al/pkg/deploy/cloudformation/stack/transformers.go | 87.73% <64.70%> (-0.41%) ⬇️ |
| ...nal/pkg/deploy/cloudformation/stack/backend_svc.go | 83.98% <100.00%> (ø) |
| ...rnal/pkg/deploy/cloudformation/stack/lb_web_svc.go | 82.49% <100.00%> (ø) |
| internal/pkg/manifest/validate.go | 79.91% <100.00%> (+0.56%) ⬆️ |

... and 1 file with indirect coverage changes


Comment thread internal/pkg/manifest/http.go Outdated
```go
AdditionalRoutingRules   []RoutingRule  `yaml:"additional_rules"`
Main                     RoutingRule    `yaml:",inline"`
TargetContainerCamelCase *string        `yaml:"targetContainer"` // Deprecated. Maintained for backwards compatibility, use [RoutingRule.TargetContainer] instead.
GracePeriod              *time.Duration `yaml:"grace_period"`
```
Contributor

@Lou1415926 Lou1415926 Apr 6, 2023


For ALB, grace_period field was under http.healthcheck but grace_period is a service level field, so it should be under http and not http.healthcheck.

Just in case this wasn't discussed before this PR: I recall there was a discussion around this when grace_period was first introduced in #2576. grace_period is a health-check-related configuration, so we put it under http.health_check. Although in ECS it is configured as a service attribute, we tried to model our manifest "in the way that developers think" instead of "in the way ECS models it".

Contributor Author


I agree with your point, but it becomes quite awkward with our current structure if we keep it inside healthcheck.
If we keep it inside healthcheck, then:

  1. Either it has to be under both the main and the additional routing rules. But it is not a routing-rule parameter, so having it in the main or additional rules sounds wrong to me.
  2. Or it has to be under only the Main routing rule, which is also less descriptive of the purpose of the parameter.

Do let me know your thoughts on this.

Contributor


If multiple containers are handling requests routed from an ELB, I'd assume customers would want to set the grace period to accommodate for the container that is the slowest to start 💭

An alternative that we can consider is to get the max of the grace periods for individual rules.

```yaml
http:
  path: 'api'
  healthcheck:
    grace_period: 1m
  additional_rules:
    - path: 'user'
      healthcheck:
        path: 'user'
    - path: 'admin'
      healthcheck:
        path: 'admin'
        grace_period: 10m # The grace period is 10m because 10m > 1m.
```

I see a similar logic in this PR where you get the max of nlb and http's grace_periods.

Contributor Author

@paragbhingre paragbhingre Apr 6, 2023


Yeah, that could be another solution: we take the max of the grace periods across individual rules. But if nlb's grace_period is also specified, should we then take the max of nlb and http?

Edit - that makes me wonder whether nlb listeners should also have this grace_period parameter.

Contributor


I'd imagine we'd want to take the max, if we were to be consistent with the design logic of the "alternative approach".

The pro of the approach you proposed in this PR is that it fits more closely with ECS's model. The con is that it might be too specifically designed for ECS, and misses the abstraction that we've tried to achieve.

For the alternative approach, the pros & cons are the opposite.

I'd also love to hear from the others on this 👍🏼

Contributor Author

@paragbhingre paragbhingre Apr 6, 2023


As per the discussion with @iamhopaul123 and @efekarakus, here is a summary of what we are going to do:

  1. We agree that healthCheckGracePeriodSeconds should have been a load-balancer-level setting in ECS. But until ECS changes it, we will keep grace_period as is for the Main as well as the Additional Listener Rules in ALB. That means grace_period lives at the http.Main.HealthCheck level, so that in the future, whenever ECS changes it, we can open up grace_period for additional listener rules as well. For now, we will hide this setting from Additional Rules in the Copilot docs.
  2. In NLB, grace_period will be introduced at the nlb.Listener.HealthCheck level. grace_period on additional listeners will be hidden and not shown in our docs.
  3. When both ALB and NLB have grace_period set, we will throw a specific error saying customers can set either property but not both. The same goes for listener rules and listeners: if customers accidentally set grace_period on the Main as well as the Additional Listener Rules, we error out.

Do let me know if there are any concerns around this.
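Under that plan, the two manifest placements would look roughly like the sketch below. Field placement follows the summary above; the paths, ports, and durations are made-up values for illustration:

```yaml
# Option A: ALB grace period, on the main routing rule's health check only.
http:
  path: '/'
  healthcheck:
    grace_period: 60s

# Option B: NLB grace period, on the main listener's health check only.
# Note: per point 3 above, setting grace_period under both http and nlb
# (or on additional rules/listeners) is a validation error.
nlb:
  port: 443/tcp
  healthcheck:
    grace_period: 90s
```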

Comment on lines +108 to +110
```go
if strings.Contains(e.firstField, "additional_rules") {
	return fmt.Sprintf(`cannot define "grace_period" in "http.additional_rules[%d].healthcheck.grace_period"`, e.index)
} else if strings.Contains(e.firstField, "additional_listeners") {
```
Contributor


This is very confusing because it has nothing to do with the name of errGracePeriodSpecifiedMoreThanOnce. I feel like we need a different error for this case: can we use errFieldMutualExclusive if they specify grace_period in both places, and a new error type if they specify grace_period in additional rules?

Looking forward, the key is really about "decoupling" (so that we can make changes in one place; easier to maintain; less error-prone): we should avoid mixing things together just because they belong to the same "topic", and instead focus on what they do (functionality). For example, in this errors file errContainersExposingSamePort is probably ok because it is specific and we have to do it this way (the error message is specifically about ports). Whereas errSpecifiedBothIngressFields is a bad one because the error msg has nothing to do with ingress fields, and we should've reused errFieldMutualExclusive (even if we needed a new recommendation, we could have built it on top of errFieldMutualExclusive).

Contributor Author


Yeah, that sounds good to me. I have changed the error to errFieldMutualExclusive when customers define grace_period in both ALB and NLB, and renamed the existing errGracePeriodSpecifiedMoreThanOnce to errGracePeriodSpecifiedInAdditionalField for when customers define grace_period in additional rules or listeners. Let me know how that looks to you.
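As a rough sketch of the kind of generic, reusable "mutually exclusive fields" error being discussed, something like the following shape could work. All names here are hypothetical illustrations, not Copilot's actual types:

```go
package main

import "fmt"

// errFieldsMutuallyExclusive is a hypothetical generic error for two
// manifest fields that cannot be set at the same time. Because the
// message is derived from the field paths, new field pairs can reuse
// it without introducing a new error type.
type errFieldsMutuallyExclusive struct {
	firstField  string
	secondField string
}

func (e *errFieldsMutuallyExclusive) Error() string {
	return fmt.Sprintf("must specify one of %q and %q, but not both", e.firstField, e.secondField)
}

func main() {
	err := &errFieldsMutuallyExclusive{
		firstField:  "http.healthcheck.grace_period",
		secondField: "nlb.healthcheck.grace_period",
	}
	fmt.Println(err)
}
```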

Contributor

@efekarakus efekarakus left a comment


thanks Parag! Looks good, not much else to add besides the existing comments.

@Lou1415926 Lou1415926 added the do-not-merge Pull requests that mergify shouldn't merge until the requester allows it. label Apr 20, 2023
@paragbhingre paragbhingre removed the do-not-merge Pull requests that mergify shouldn't merge until the requester allows it. label Apr 24, 2023
Contributor

@iamhopaul123 iamhopaul123 left a comment


:shipit:

mergify Bot pushed a commit that referenced this pull request Apr 26, 2023
#4734
This PR adds `grace_period` field to the NLB main listener and also removes this field from the additional listener rules for the ALB.


By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the Apache 2.0 License.