Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Specs to make maxRetries configurable for endpoints and route-services #298

Merged
merged 3 commits into from
Dec 20, 2022

Conversation

domdom82
Copy link
Contributor

@domdom82 domdom82 commented Nov 11, 2022

  • A short explanation of the proposed change:

Fixes #285

This is the companion PR for making max-retries configurable in gorouter

  • It needs to be merged together with its companion to make use of the new properties.
  • Gorouter submodule has to be bumped as well
  • A default value for max-retries is set to 3, retaining the original setting
  • For backends, a value of 0 may be set, which implies "retry until success or no more endpoints remain
  • For route services, a minimum value of 1 must be maintained, because there is no endpoint list and thus no natural end to retries.
  • An explanation of the use cases your change solves

See companion PR

  • Instructions to functionally test the behavior change using operator interfaces (BOSH manifest, logs, curl, and metrics)

See companion PR

  • Expected result after the change

See companion PR

  • Current result before the change

See companion PR

  • Links to any other associated PRs

cloudfoundry/gorouter#335

  • I have viewed signed and have submitted the Contributor License Agreement

  • I have made this pull request to the develop branch

  • I have run all the unit tests using scripts/run-unit-tests-in-docker

  • (Optional) I have run Routing Acceptance Tests and Routing Smoke Tests on bosh lite

  • (Optional) I have run CF Acceptance Tests on bosh lite

@domdom82 domdom82 force-pushed the max-retries-configurable-spec branch from af679a1 to 1346d6b Compare November 14, 2022 14:19
@domdom82 domdom82 marked this pull request as ready for review November 14, 2022 14:21
@ameowlia ameowlia self-requested a review November 15, 2022 16:00
@ameowlia ameowlia self-assigned this Nov 15, 2022
@ameowlia
Copy link
Member

Hi @domdom82,

Thank you for submitting this! I am deploying it now to test. I appreciate that you split up backend retries and route-service retries. Good catch.

My main critique is the use of the word "retries". You use the word "retries" here, but what I actually think you mean is "tries" or "attempts". To me, zero retries would imply that gorouter tries once, and then never tries again.

I think there are two options here:

  1. change "retries" to "connection attempts" or something similar. I prefer this option.
    • In this case I think having 0 mean unlimited makes sense.
  2. use the word "retries" and change the code to actually mean retries.
    • make the default for backends 2 (3 attempts total)
    • however then you can't use 0 to mean unlimited (or you could, but it would get rid of the option for 1 try only)

@domdom82
Copy link
Contributor Author

@ameowlia thanks for your review. funny enough, we just had the same discussion and came up with "attempts" instead of "retries". so 👍 to that!

…ices (maxAttempts)

Co-authored-by: Soha Alboghdady <soha.alboghdady@sap.com>
Co-authored-by: Nicolas Regez <reg@algosys.ch>
@nicregez
Copy link
Contributor

@ameowlia We renamed maxRetries to maxAttempts (see today's second commit on the branch). The rename is aligned with gorouter PR-335 of course.

For backend calls, maxAttempts value of "0" means infinite. For route-services, a value of "0" is prevented.

We thoroughly tested the behaviour of gorouter in one of our CF deployments and did not encounter any unexpected behaviour.

May we kindly as you to merge both PRs? Thank you!

@ameowlia ameowlia merged commit 1803ebf into cloudfoundry:develop Dec 20, 2022
domdom82 pushed a commit to domdom82/routing-release that referenced this pull request Jul 12, 2023
This provides a solution to issues where stale routes containing out of date/failing route-service
information are never detected, and ended up causing errors for all app
instances, when the stale route was at index 0 of an endpoint pool.

This was previously fixed with cloudfoundry/gorouter#298, but
that caused issues with loadbalancing fairness, and sticky sessions.

Instead of pruning the endpoint with the failing route-service, to force
it to be re-registered with an up-to-date route-service, gorouter now
treats the route-service as an endpoint-pool property, and updates the
route-service whenever a new registration occurs.

Also adds integration tests to ensure we didn't regress on the behavior
resolved in cloudfoundry#298.

Signed-off-by: Amelia Downs <adowns@vmware.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Retry mechanism insufficient in dropped / missing NATS messages scenario
3 participants