bosh-dns-adapter/sdcclient: delay retries #81

Merged


Contributor

@tlwr tlwr commented Jun 25, 2020

What

Introduce a delay between bosh-dns-adapter retries, to reduce the risk of denial-of-service

Context

See alphagov/paas-cf#2358 for more context

We experienced pathological retry behaviour from bosh-dns-adapter when talking to the service-discovery-controller.

If a request did not return 200, bosh-dns-adapter retried immediately, with no delay or backoff.

This caused a spike in load, which then cascaded to another instance of service-discovery-controller.

This PR adds a 0/500/1000 delay with jitter between retries, to reduce the risk of cascading failure.
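
For readers who want a picture of the retry schedule described above, here is a minimal sketch in Go. It is not the code from this PR. The names `baseDelays`, `delayFor`, `doWithRetry` and `errNotOK` are hypothetical, the 0/500/1000 values are assumed to be milliseconds, and the jitter is assumed to be up to 20% either side of the base delay, since the description does not state either.

```go
package main

import (
	"errors"
	"math/rand"
	"time"
)

// baseDelays is the pause before each attempt.
// Assumption: the 0/500/1000 schedule is in milliseconds.
var baseDelays = []time.Duration{
	0,
	500 * time.Millisecond,
	1000 * time.Millisecond,
}

// jitterFraction is an assumed value: each delay varies by up to +/-20%.
const jitterFraction = 0.2

// errNotOK stands in for "the request did not return 200".
var errNotOK = errors.New("request did not return 200")

// delayFor returns the jittered pause before the given 0-based attempt.
// r is a random fraction in [0, 1); r = 0.5 yields the base delay unchanged.
func delayFor(attempt int, r float64) time.Duration {
	if attempt < 0 {
		attempt = 0
	}
	if attempt >= len(baseDelays) {
		attempt = len(baseDelays) - 1
	}
	base := baseDelays[attempt]
	jitter := time.Duration((r*2 - 1) * jitterFraction * float64(base))
	return base + jitter
}

// doWithRetry calls op once per entry in baseDelays, sleeping for the
// jittered delay before each call, and stops at the first success.
// It returns the last error if every attempt fails.
func doWithRetry(op func() error) error {
	var err error
	for attempt := range baseDelays {
		time.Sleep(delayFor(attempt, rand.Float64()))
		if err = op(); err == nil {
			return nil
		}
	}
	return err
}
```

A caller would wrap the HTTP request in a function that returns `errNotOK` for any non-200 response and pass it to `doWithRetry`. The jitter spreads retries from many clients over time, so that they do not all hit the same service-discovery-controller instance at the same moment.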

Checklist

  • Signed the CLA
  • Added tests


Signed-off-by: toby lorne <toby@toby.codes>
@KauzClay
Contributor

Hey @tlwr,

Thanks for the PR! We ran your branch locally and tests pass, so we'll merge this and run it through our pipelines. We will let you know how that goes, and eventually when it is ready to release.

@KauzClay KauzClay merged commit 0d52098 into cloudfoundry:develop Jun 25, 2020
@tlwr tlwr deleted the bosh-dns-adapter-sdcclient-retry branch June 25, 2020 19:46
tlwr pushed a commit to alphagov/paas-cf that referenced this pull request Jul 7, 2020
silk and cf-networking are released together

cf-networking 2.31 contains:

cloudfoundry/cf-networking-release#80
cloudfoundry/cf-networking-release#81

which were from our recent dns outage

and

cloudfoundry/cf-networking-release#76

which has caused one (1) support ticket in the past

Signed-off-by: Toby Lorne <toby.lornewelch-richards@digital.cabinet-office.gov.uk>
tlwr pushed a commit to alphagov/paas-cf that referenced this pull request Jul 7, 2020
silk and cf-networking are released together

cf-networking 2.31 contains:

cloudfoundry/cf-networking-release#80
cloudfoundry/cf-networking-release#81

which were from our recent dns outage

and

cloudfoundry/cf-networking-release#78

which has caused one (1) support ticket in the past

Signed-off-by: Toby Lorne <toby.lornewelch-richards@digital.cabinet-office.gov.uk>