Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We鈥檒l occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feature: Add --term-delay option to CLI #2494

Merged
merged 6 commits into from
May 24, 2022
Merged

Feature: Add --term-delay option to CLI #2494

merged 6 commits into from
May 24, 2022

Conversation

donna-156
Copy link
Contributor

@donna-156 donna-156 commented May 16, 2022

What this PR does / why we need it:
This PR adds a --term-delay option to the CLI for the ingress-controller.

Context:
There are cases where due to network load balancer (NLB) configuration that a delay needs to be introduced to the containers, this delay is configurable for the Kong Proxy container through the helm chart, however it is not possible to configure this delay for the ingress-controller. This can result in stale upstreams during NLB draining and downtime for services. Making this delay configurable seems the easiest approach to addressing this limitation.

With the ingress-controller gone, any changes to the underlying services that the proxy is routing to would be missed. Due to limitations with AWS NLB deregistration delays, we've had to configure a sleep of up to 4 minutes for the proxy container before shutting down. This currently means when we are scaling down, or rolling Kong pods there is a 3-4 minute window where the proxy container is running with no ingress-controller to monitor for updates with our underlying services.

Impact: This can result in stale upstreams, and errors for clients during this window. 馃槥

This PR makes multiple changes:

  1. Adds a new --term-delay argument to the CLI
    This makes it possible for users to configure the SIGTERM delay value by passing a CLI argument, or an environment variable. The default is set to zero so it won't impact existing users.

  2. Adjusts where the SignalHandler is called:
    Originally this was in the entry point to the controller manager, however in order to pass through the CLI argument and make the term-delay configurable, this needed moved down into the rootcmd package.

  3. Supports adding a delay to the SignalHandler
    This takes the original SetupSignalHandler, and modifies the logic to support setting a time delay. It should still support overriding that time-delay by sending a second signal. This involved copying and altering the existing k8s controller-runtime signal package.

Special notes for your reviewer:

Alternative option considered:

  • Add a lifecycle value for the ingress-controller container to kong/charts so that we can pass a preStop command similar to the proxy container. Naming this would be hard because the existing value is kong.lifecycle rather than kong.proxy.lifecycle. It would also require a sleep binary in the proxy container which doesn鈥檛 currently exist from the distroless/static base image - we could mount one from an initContainer but it would be fiddly.

Testing
I've tested this locally, running with the following commands:

go run -tags gcp ./internal/cmd/main.go \
--kubeconfig ~/.kube/config \
--publish-service=kong/kong-proxy \
--apiserver-host=http://localhost:8002 \
--kong-admin-url https://localhost:8444 \
--kong-admin-tls-skip-verify blah \
--term-delay 10s

From this I've been able to exercise trying to CTRL + C to kill the process, and validated that the delay is working as expected. I've also validated that by sending a second signal it does exit with code 1.

PR Readiness Checklist:

Complete these before marking the PR as ready to review:

  • the CHANGELOG.md release notes have been updated to reflect any significant (and particularly user-facing) changes introduced by this PR

This will make it possible to pass in a time duration to be used as a
SIGTERM signal delay, which will make it possible to configure if the
ingress-controller should wait before shutting down.
@donna-156 donna-156 requested a review from a team as a code owner May 16, 2022 13:58
@CLAassistant
Copy link

CLAassistant commented May 16, 2022

CLA assistant check
All committers have signed the CLA.

donna-156 and others added 3 commits May 16, 2022 15:30
Context: there are cases where due to network load balancer
configuration that a delay needs to be introduced to the containers,
this delay is configurable for the Kong Proxy container through the helm
chart, however it is not possible to configure this delay for the
ingress-controller. This can result in stale upstreams during NLB
draining and downtime for services. Making this delay configurable seems
the easiest approach to addressing this limitation.

This makes multiple changes:
1. Adjusts where the SignalHandler is called:
Originally this was in the entry point to the controller manager,
however in order to pass through the CLI argument and make the
term-delay configurable, this needed moved down into the rootcmd
package.

2. Supports adding a delay to the SignalHandler
This takes the original SetupSignalHandler, and modifies the logic to
support setting a time delay. It should still support overriding that
timedelay by sending a second signal.
Copy link
Member

@shaneutt shaneutt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for your contribution, the PR is looking good!

I have a couple of comments for your considerations, let me know what you think.

internal/cmd/rootcmd/signal.go Show resolved Hide resolved
CHANGELOG.md Outdated Show resolved Hide resolved
internal/cmd/rootcmd/signal.go Outdated Show resolved Hide resolved
internal/cmd/rootcmd/signal.go Outdated Show resolved Hide resolved
internal/manager/config.go Outdated Show resolved Hide resolved
internal/manager/config.go Outdated Show resolved Hide resolved
@shaneutt shaneutt self-assigned this May 17, 2022
@shaneutt shaneutt added area/feature New feature or request priority/low labels May 17, 2022
@chazdnato
Copy link
Contributor

Thank you for your contribution, the PR is looking good!

I have a couple of comments for your considerations, let me know what you think.

This is great stuff! Instead of doing it via GH, I'll add all these to my local branch and push a change. I'll also make sure the linter runs successfully locally first (I'm using gofmt, not sure why it's sad). Appreciate the quick and helpful feedback!

Addressed some PR comments on wording/docs
@shaneutt
Copy link
Member

Just checking in?

@donna-156
Copy link
Contributor Author

Hi @shaneutt, thanks for following up! We'll have the remaining updates available shortly, our team has had some vacation time which has resulted in a longer turnaround from us. 馃檪

A second signal can come in during the termination delay, or after the
`cancel()` has run and the rest of the ingress controller's graceful
shutdown process has started. This will log when those take place.

Note the addition of `<-c` for the second case, as we were not correctly
capturing a signal here.
@chazdnato
Copy link
Contributor

Just checking in?

Apologies for the delay! Wanted to ensure I had the necessary helm chart changes tested (a PR in that repo forthcoming). We can rebase / squash these commit as well if you'd like, just let me know your preference.

We believe we have this in the state that we'd like!

chazdnato added a commit to chazdnato/kong-charts that referenced this pull request May 24, 2022
There are cases where due to network load balancer (NLB) configuration
that a delay needs to be introduced to the containers during shutdown.
This delay is configurable for the Kong Proxy container through the helm
chart, however it is not possible to configure this delay for the
ingress controller. This can result in stale upstreams during NLB
draining and downtime for services. Making this delay configurable seems
the easiest approach to addressing this limitation.

We have a PR for the kong-ingress-controller which will [update its
behaviour](Kong/kubernetes-ingress-controller#2494)
so we can configure the termination delay.

This PR adds the ability to set a concomitant value in the helm chart.
Copy link
Member

@shaneutt shaneutt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like it, thank you for the contribution 馃憤

If you haven't done so before, please do feel free to grab yourself a free Kong contributor's t-shirt (each of you), we give these out once per contributor: https://github.com/Kong/kong/blob/master/CONTRIBUTING.md#contributor-t-shirt

Thank you for your contribution, we appreciate it 馃枛

@shaneutt shaneutt enabled auto-merge (rebase) May 24, 2022 14:42
@shaneutt shaneutt merged commit 7d08771 into Kong:main May 24, 2022
@donna-156
Copy link
Contributor Author

Thank you so much!

Two more questions @shaneutt 馃檪

  1. Do we also need to modify the Kong docs repo? i.e. here: https://github.com/Kong/docs.konghq.com
  2. What is the timeframe for this being available in a release? (both ingress-controller and Kong)

@donna-156 donna-156 deleted the feat/add-term-delay-optional-config branch May 24, 2022 15:43
@shaneutt
Copy link
Member

Thank you so much!

Two more questions @shaneutt slightly_smiling_face

1. Do we also need to modify the Kong docs repo? i.e. here: https://github.com/Kong/docs.konghq.com

Yes please.

2. What is the timeframe for this being available in a release? (both ingress-controller and Kong)

Most likely this is going into v2.4.0 which is expected to release in the next week or two.

chazdnato added a commit to chazdnato/docs.konghq.com that referenced this pull request May 25, 2022
This was introduced in
Kong/kubernetes-ingress-controller#2494 and
should be released as part of v2.4.0
shaneutt added a commit to Kong/charts that referenced this pull request Jun 1, 2022
* feat(ingress) support terminationGracePeriodSeconds

There are cases where due to network load balancer (NLB) configuration
that a delay needs to be introduced to the containers during shutdown.
This delay is configurable for the Kong Proxy container through the helm
chart, however it is not possible to configure this delay for the
ingress controller. This can result in stale upstreams during NLB
draining and downtime for services. Making this delay configurable seems
the easiest approach to addressing this limitation.

We have a PR for the kong-ingress-controller which will [update its
behaviour](Kong/kubernetes-ingress-controller#2494)
so we can configure the termination delay.

This PR adds the ability to set a concomitant value in the helm chart.

* Add PR to CHANGELOG.md (unreleased)

* Update charts/kong/CHANGELOG.md

Co-authored-by: Shane Utt <shane@shaneutt.com>

* Update CHANGELOG.md

Co-authored-by: Shane Utt <shane@shaneutt.com>
lena-larionova added a commit to Kong/docs.konghq.com that referenced this pull request Jun 1, 2022
* Copy 2.3.x directory for upcoming 2.4.x release

* Documenting --term-delay parameter

This was introduced in
Kong/kubernetes-ingress-controller#2494 and
should be released as part of v2.4.0

* Make 2.4.x latest version

* remove duplicate directory for 2.4.x;  move 2.3.x to /src/; create 2.3.x and 2.4.x navigation files

* set 2.4.x filter on term-delay

* backticks for cli flags so they render accurately; capitalization/punctuation

* Fix if_version space rendering

Co-authored-by: lena.larionova <yelena.larionova@gmail.com>
Co-authored-by: Michael Heap <m@michaelheap.com>
Co-authored-by: lena-larionova <54370747+lena-larionova@users.noreply.github.com>
andrewgkew pushed a commit to andrewgkew/charts that referenced this pull request Jun 6, 2022
* feat(ingress) support terminationGracePeriodSeconds

There are cases where due to network load balancer (NLB) configuration
that a delay needs to be introduced to the containers during shutdown.
This delay is configurable for the Kong Proxy container through the helm
chart, however it is not possible to configure this delay for the
ingress controller. This can result in stale upstreams during NLB
draining and downtime for services. Making this delay configurable seems
the easiest approach to addressing this limitation.

We have a PR for the kong-ingress-controller which will [update its
behaviour](Kong/kubernetes-ingress-controller#2494)
so we can configure the termination delay.

This PR adds the ability to set a concomitant value in the helm chart.

* Add PR to CHANGELOG.md (unreleased)

* Update charts/kong/CHANGELOG.md

Co-authored-by: Shane Utt <shane@shaneutt.com>

* Update CHANGELOG.md

Co-authored-by: Shane Utt <shane@shaneutt.com>
lena-larionova added a commit to Kong/docs.konghq.com that referenced this pull request Jun 16, 2022
* Copy 2.3.x directory for upcoming 2.4.x release

* Documenting --term-delay parameter

This was introduced in
Kong/kubernetes-ingress-controller#2494 and
should be released as part of v2.4.0

* Make 2.4.x latest version

* remove duplicate directory for 2.4.x;  move 2.3.x to /src/; create 2.3.x and 2.4.x navigation files

* set 2.4.x filter on term-delay

* backticks for cli flags so they render accurately; capitalization/punctuation

* Fix if_version space rendering

Co-authored-by: lena.larionova <yelena.larionova@gmail.com>
Co-authored-by: Michael Heap <m@michaelheap.com>
Co-authored-by: lena-larionova <54370747+lena-larionova@users.noreply.github.com>
rainest pushed a commit to Kong/docs.konghq.com that referenced this pull request Jun 16, 2022
* Copy 2.3.x directory for upcoming 2.4.x release

* Documenting --term-delay parameter

This was introduced in
Kong/kubernetes-ingress-controller#2494 and
should be released as part of v2.4.0

* Make 2.4.x latest version

* remove duplicate directory for 2.4.x;  move 2.3.x to /src/; create 2.3.x and 2.4.x navigation files

* set 2.4.x filter on term-delay

* backticks for cli flags so they render accurately; capitalization/punctuation

* Fix if_version space rendering

Co-authored-by: lena.larionova <yelena.larionova@gmail.com>
Co-authored-by: Michael Heap <m@michaelheap.com>
Co-authored-by: lena-larionova <54370747+lena-larionova@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants