Operators should be able to configure gorouters to prefer routing to AZ-local backends #356
Comments
Hi @jrussett, we are considering a very similar change and have the same motivation. One point from our notes that we see as important as well: Load balancing and retry selection can both use the AZ information.
In the near future we also want to follow up on the idea of app-configured load balancing algorithms per route, similar to how AZ information is introduced. This would also encourage easier experimentation with additional load balancing algorithms besides the trusty round-robin and least-connections. |
OK, this will be an interesting discussion. I've had a couple of thoughts on this in the past. The notion of "az-local" vs. "non-az-local" routing should not be a yes/no decision. Why? As @peanball pointed out there may be scenarios where the local AZ has a partial or complete outage, so gathering only endpoints from the local AZ can end up in a list of all broken endpoints that cause dial timeouts for a long time. This will affect latency and availability, because you end up sending a 502 for that particular request, even though there may have been working endpoints in different zones. Also, if only local-AZ endpoints are chosen this may overload those endpoints while others may never receive traffic from that gorouter at all. So, IMHO the setting should be a gradual one between latency and availability:
So, what I mean by "az-local-weight" is a gradient value between 0% and 100% that applies to the endpoints gorouter will use for doing the load balancing. So, if we set this value to e.g. 75% this means out of 100 requests to this route, 75 will go to the local zone. This will have several advantages:
So, I would go with Option 2 "adding a new spec property" but I would not make it a boolean but a percentage or numerical weight factor instead. |
Hey y'all 👋 @domdom82 I think the advantages that you have laid out make sense and could be compatible with the original direction that our team had decided on.
New Spec Interface
So maybe the spec interface would look something like this:
router.balancing_algorithm:
  description: "Algorithm used to distribute requests for a route across backends. Supported values are round-robin and least-connection"
  default: round-robin
router.az_local_weight:
  description: |
    Percentage of requests that should be routed to a backend local to the AZ of the proxying router.
    For example, a `router.az_local_weight` of `75` means that out of 100 requests to a given route, 75 will go to a backend that is in the same availability zone as the router handling the request.
    Configurable as an integer value between `0` and `100`; e.g. `25`, NOT `27.456`. Defaults to `0`.
  default: 0
Retry Logic
@peanball I agree that this makes sense, especially from the fail-over scenario. Unfortunately, we know there is appetite for the opposite behavior. We know there are people who would like to configure the retry logic to attempt all AZ-local backends before trying any of the other backends. We can revisit this as we begin to implement this feature to see if anything needs to be done or if the AZ-local weights are sufficient. |
Hello again 👋
Thinking about concerns
We've been talking about this internally and had some more thoughts. It sounds like there are a few concerns we'd like to make sure we address going forward:
A new possible design
What if we make a new bosh spec property that handles the desires around AZ preferences instead?
router.balancing_algorithm:
  description: "Algorithm used to distribute requests for a route across backends. Supported values are round-robin and least-connection"
  default: round-robin
router.az_preference:
  description: |
    Configuration option used in conjunction with the `router.balancing_algorithm` to decide from which availability zone to pick a suitable backend. Defaults to "None".
    "None" - There is no preference regarding availability zones. The router uses the `router.balancing_algorithm` across all possible backends in all existing AZs.
    "Local" - The router will prefer backends in the same availability zone as the router proxying the request. It will use the `router.balancing_algorithm` across all backends in its local AZ. Only if there are no backends available, or there are no backends left after multiple failed retries, in the local AZ, then will the router proxy to backends in other AZs.
    "Local-First" - On the initial attempt to pick a backend, the router will use `router.balancing_algorithm` across all backends in the same AZ as the router itself. Subsequent retries, in the case of failure or unavailability, will use _all_ available AZs.
    "Cross-AZ-Fallback" - On the initial attempt to pick a backend, there is no preference regarding availability zone. The router uses the `router.balancing_algorithm` across all backends in all AZs. Subsequent retries, in the case of failure or unavailability, will use `router.balancing_algorithm` against all backends in a _different_ AZ than the first attempt (from the rest of the AZs).
  default: "None"
Options like What do y'all think? |
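The four proposed `router.az_preference` values differ only in which endpoints are eligible on a given attempt. A rough Go sketch of that selection follows. Everything here is an assumption for illustration: the `Endpoint` and `AZPreference` types, the `candidates` function, and the convention that the caller passes only the endpoints that have not yet failed for this request. Falling back to all remaining endpoints when a filtered set is empty is also an assumption.

```go
package main

// Endpoint is a hypothetical, simplified backend record.
type Endpoint struct{ Addr, AZ string }

// AZPreference mirrors the proposed spec values (names taken from the proposal).
type AZPreference string

const (
	PrefNone            AZPreference = "None"
	PrefLocal           AZPreference = "Local"
	PrefLocalFirst      AZPreference = "Local-First"
	PrefCrossAZFallback AZPreference = "Cross-AZ-Fallback"
)

func filter(eps []Endpoint, keep func(Endpoint) bool) []Endpoint {
	var out []Endpoint
	for _, e := range eps {
		if keep(e) {
			out = append(out, e)
		}
	}
	return out
}

// orAll returns subset unless it is empty, in which case it returns all.
func orAll(subset, all []Endpoint) []Endpoint {
	if len(subset) == 0 {
		return all
	}
	return subset
}

// candidates returns the endpoints the balancing algorithm may choose from.
// remaining holds endpoints not yet tried and failed for this request,
// attempt is 0 for the initial try, and firstAZ is the AZ used on attempt 0.
func candidates(pref AZPreference, remaining []Endpoint, localAZ string, attempt int, firstAZ string) []Endpoint {
	local := filter(remaining, func(e Endpoint) bool { return e.AZ == localAZ })
	switch pref {
	case PrefLocal:
		// Stay local until every local endpoint is exhausted.
		return orAll(local, remaining)
	case PrefLocalFirst:
		if attempt == 0 {
			return orAll(local, remaining)
		}
		return remaining
	case PrefCrossAZFallback:
		if attempt == 0 {
			return remaining
		}
		other := filter(remaining, func(e Endpoint) bool { return e.AZ != firstAZ })
		return orAll(other, remaining)
	default: // "None"
		return remaining
	}
}
```

Keeping the preference as a pure filter in front of the balancing algorithm means `round-robin` and `least-connection` need no changes of their own.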
I like it in general. A few points:
So maybe we can call "local" -> "local-strict" or "local-only" to emphasize that non-local endpoints will never be used? |
Hi @domdom82,
I agree with both of these things. As for the second point, I agree that we could totally use the weight on the local zone but I think it would be confusing to think about the system when y'all start implementing per-route configuration. At that point, what does the weight become and how do you think about how the weight affects other configurations and inputs to the routing algorithm. I feel like it would become unruly, while it might be easier to more abstractly think about something like: My platform level algorithm is I think you kind of alluded to that in this statement:
I could easily see this platform-level config to be more abstract with the az preference as I've laid out, and then down the road when y'all are working on route-level configs, y'all can add more granular control/config.
Sounds fair 👍
The tricky part is that the non-local endpoints are used, but only after all of the available local endpoints have been exhausted. Maybe it would make sense to change
router.az_preference:
  description: |
    Configuration option used in conjunction with the `router.balancing_algorithm` to decide from which
    availability zone to pick a suitable backend. Defaults to "None".
    "None" - There is no preference regarding availability zones. The router uses the
    `router.balancing_algorithm` across all possible backends in all existing AZs.
    "Only-Local-First" - The router will prefer backends in the same availability zone as the router
    proxying the request. It will use the `router.balancing_algorithm` across all backends in its
    local AZ. Only if there are no backends available, or there are no backends left after multiple
    failed retries, in the local AZ, then will the router proxy to backends in other, non-local AZs.
    "Locally-Optimistic" - On the initial attempt to pick a backend, the router will use
    `router.balancing_algorithm` across all backends in the same AZ as the router itself. Subsequent
    retries, in the case of failure or unavailability, will use _all_ available AZs.
  default: "None"
Names are hard 😅 |
I would vote for simplicity and have 2 options for
With In that case, maybe just have it as a boolean |
@mariash we could get by zonal problems by reducing the dial timeout to a very low value. Usually, when you have a zone issue the packets are dropped and you run into connection timeouts. If your dial timeout is high (routing-release default is 5s) and you have lots of instances in the local zone, you quickly run into 30s+ response times. For us, we have reduced the dial timeout to 2s (which is still way high, considering that gorouter usually sits next door to the diego cell, average dial times are a few ms). So we could accept a "local-only" option, however I would still vote for not making it a boolean switch, because that will shut the door for any future extensions in that area. |
So, in general, it sounds like we all could agree on having a new bosh spec named
First Start
How about we initially start with the options that make the most logical sense:
Example bosh spec:
router.az_preference:
  description: |
    Configuration option used in conjunction with the `router.balancing_algorithm` to decide from which
    availability zone to pick a suitable backend. Defaults to "None".
    "None" - There is no preference regarding availability zones. The router uses the
    `router.balancing_algorithm` across all possible backends in all existing AZs.
    "Locally-Optimistic" - On the initial attempt to pick a backend, the router will use
    `router.balancing_algorithm` across all backends in the same AZ as the router itself. Subsequent
    retries, in the case of failure or unavailability, will use _all_ available AZs.
  default: "None"
Requirements Double Check
At the same time, we will double check the requirements to see if We had some more discussions, were reviewing the retry logic, and figured out that we don't actually know if every single AZ-local endpoint needs to be exhausted before retrying instances in other AZs. My apologies. If Bosh Spec with
|
Thanks for putting this together @jrussett. I like this path forward. That way once you add the |
Hey, @domdom82, would you want to switch any of your environments over to that proposed |
Not dom, but I think I can answer for him: Yes, at the very least we would want to try it to see whether it improves the overall handling of AZ related issues. Whether we end up deploying it to production depends on the outcome of our tests. |
@emalm The one issue we had was with a specific app that had lots of instances in one AZ which failed. So it would have helped to do a quick switch to other AZs after the first one failed. However, we discussed that in general the tcp connect to a host on the same network (gorouter -> diego cell) should be extremely fast (< 50ms order). So we might get by with using "Only-Local-First" in combination with a low dial timeout. This way we would quickly iterate over all the failed local endpoints and try the remote ones without causing too much latency for the end user. tl; dr |
@jrussett I've closed my original issue on the matter in favor of this one. There is one bit there that may be worthwhile your consideration. @ameowlia brought up a potential overload scenario where app instances are unevenly distributed across zones. The discussion starts here My idea for avoiding a local routing "overload" was outlined here:
e.g. This is a cheap mechanism that prevents us from overloading a single app instance. It should be noted that this is a rather theoretical problem. In most cases, both gorouters and apps are evenly distributed across zones. |
- Adds documentation related to the AZ-local routing feature introduced in [routing-release 0.288.0](https://github.com/cloudfoundry/routing-release/releases/tag/v0.288.0) See this Github issue for more information: - cloudfoundry/routing-release#356 [#186117321](https://www.pivotaltracker.com/story/show/186117321)
This is now available with the following release versions: Documentation changes are now live, see: |
Background / Why
It's possible to set up a foundation in a stretch-cluster architecture, where availability zones can span many geographic miles from each other. The current Cloud Foundry `round-robin` and `least-connection` routing algorithms have the `gorouters` forward HTTP requests to ready application backends across all availability zones. In stretched-cluster foundations, having a `gorouter` send a request to a remote AZ when there is a ready application instance in its local AZ can make the round-trip time for the request to the application less than optimal. We should allow operators to configure their `gorouter` routing algorithms to favor AZ-local application backends before proxying to backends in other AZs.
Goals
Anti-goals
Proposed Changes
Configuration Layer
This feature will be configured at the platform level. Once this is toggled on, all HTTP requests to the configured `gorouter` will use the newly configured routing algorithm option.
Configuration will only happen in the `gorouter` bosh spec. I see two potential designs:
Option 1: Extend current `spec` property
Option 2: Add a new `spec` property
Potential Implementation
There are two main portions of work to make this feature happen.
1. Advertise the availability zone of the backend in diego
In order to make routing decisions about AZ-local backends, the routing layer needs information about which availability zone a backend is currently running in. Most likely, the diego `route-emitter` will need to change the format of the RegistryMessages that diego emits to accommodate a new optional field with the AZ.
The new registry messages might look something like this:
[#32] Received on [router.register] :
{
  "host": "10.0.1.12",
  "port": 61012,
  "tls_port": 61014,
  "uris": [ "proxy.thulianpink.cf-app.com" ],
  "app": "6856799f-aebf-4e2b-81a5-28c74dfb6162",
  "private_instance_id": "a0d2b217-fa7d-4ac1-65a2-7b19",
  "private_instance_index": "0",
  "server_cert_domain_san": "a0d2b217-fa7d-4ac1-65a2-7b19",
+ "availability_zone": "my-super-cool-AZ-name", 👈 New optional field!
  "tags": { "component": "route-emitter" }
}
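Because the new field is optional, a consumer has to tolerate messages that omit it. The Go sketch below shows one way that could look with the standard `encoding/json` package. The `RegistryMessage` struct here is a trimmed, hypothetical mirror of the payload above, not gorouter's actual type, and `parseRegistryMessage` is an illustrative helper name.

```go
package main

import "encoding/json"

// RegistryMessage is a hypothetical, trimmed mirror of the router.register
// payload. Only a subset of the fields from the example message is modelled.
type RegistryMessage struct {
	Host string            `json:"host"`
	Port int               `json:"port"`
	URIs []string          `json:"uris"`
	App  string            `json:"app"`
	Tags map[string]string `json:"tags"`

	// AvailabilityZone is optional. Messages from emitters that do not send
	// the field decode to the empty string, which a consumer can treat as
	// "AZ unknown".
	AvailabilityZone string `json:"availability_zone,omitempty"`
}

// parseRegistryMessage decodes one registration payload.
func parseRegistryMessage(payload []byte) (RegistryMessage, error) {
	var msg RegistryMessage
	err := json.Unmarshal(payload, &msg)
	return msg, err
}
```

Treating an empty `AvailabilityZone` as "unknown" would let older emitters and newer routers coexist during a rolling upgrade. How unknown-AZ endpoints should be ranked against local ones is a design choice the text above leaves open.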
2. Consume AZ information when making a routing decision
Now that the `gorouters` have the AZ information for each backend available, the routing algorithms need to route in clever ways. In general, the basic algorithms themselves are not going to change; instead, the set of endpoints that the router operates on will change.
When `gorouter` has been configured to favor AZ-local instances, the flow looks something like:
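As a rough illustration of "same algorithm, different endpoint set", the Go sketch below runs an unchanged round-robin picker over a pool that has been narrowed to the router's own AZ. It is a sketch under assumptions, not gorouter's implementation: `Endpoint`, `roundRobin`, and `localOrAll` are hypothetical names, and falling back to every endpoint when no local one exists is an assumed behaviour.

```go
package main

// Endpoint is a hypothetical, simplified backend record.
type Endpoint struct{ Addr, AZ string }

// roundRobin keeps a cursor and hands out the next endpoint from whatever
// pool it is given. It knows nothing about availability zones.
type roundRobin struct{ next int }

// pick returns the next endpoint in rotation, or false for an empty pool.
func (r *roundRobin) pick(pool []Endpoint) (Endpoint, bool) {
	if len(pool) == 0 {
		return Endpoint{}, false
	}
	e := pool[r.next%len(pool)]
	r.next++
	return e, true
}

// localOrAll narrows the pool to endpoints in localAZ. If there are none,
// it returns the full pool so the route stays reachable (assumed fallback).
func localOrAll(all []Endpoint, localAZ string) []Endpoint {
	var local []Endpoint
	for _, e := range all {
		if e.AZ == localAZ {
			local = append(local, e)
		}
	}
	if len(local) == 0 {
		return all
	}
	return local
}
```

A caller would compute `localOrAll(endpoints, routerAZ)` once per attempt and pass the result to `pick`, which keeps all AZ logic outside the balancing algorithm.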