Operators should be able to configure gorouters to prefer routing to AZ-local backends #356

Closed
jrussett opened this issue Oct 5, 2023 · 15 comments

Comments

@jrussett
Contributor

jrussett commented Oct 5, 2023

Background / Why

It's possible to set up a foundation in a stretch-cluster architecture, where availability zones can be many miles apart. The current Cloud Foundry round-robin and least-connection routing algorithms have the gorouters forward HTTP requests to ready application backends across all availability zones. In stretched-cluster foundations, having a gorouter send a request to a remote AZ when there is a ready application instance in its local AZ can make the round-trip time for the request to the application less than optimal. We should allow operators to configure their gorouter routing algorithms to favor AZ-local application backends before proxying to backends in other AZs.

Goals

  • Platform-level config. One toggle changes the behavior of the group of gorouters and the routing logic to the diego cells behind them.
  • Isolation segment independent config
    • The configuration should only affect associated diego cells and should not override the configuration of other isolated routers and their diego cells

Anti-goals

  • Route-level routing-algorithm configuration

Proposed Changes

Configuration Layer

This feature will be configured at the platform level. Once this is toggled on, all HTTP requests to the configured gorouter will use the newly configured routing algorithm option.

Configuration will only happen in the gorouter bosh spec. I see two potential designs:

Option 1: Extend current spec property
  router.balancing_algorithm:
    description: |
      Algorithm used to distribute requests for a route across backends. Supported values are 'round-robin', 'least-connection', 'az-local-round-robin', and 'az-local-least-connection'.
      The 'az-local' algorithms will attempt to proxy to an application backend local to the availability zone of the gorouter proxying the request, if said backends are available, before proxying the request to backends located in other availability zones.
    default: round-robin
Option 2: Add a new spec property
  router.balancing_algorithm:
    description: "Algorithm used to distribute requests for a route across backends. Supported values are round-robin and least-connection"
    default: round-robin
  router.favor_az_local_instances_when_balancing:
    description: "Toggle to modify the logic of the selected router.balancing_algorithm by attempting to proxy to an application backend local to the availability zone of the gorouter proxying the request, if said backend is available, before proxying the request to backends located in other availability zones. Defaults to `false`"
    default: false

Potential Implementation

There are two main portions of work to make this feature happen.

1. Advertise the availability zone of the backend in diego

In order to make routing decisions about AZ-local backends, the routing layer needs information about which availability zone a backend is currently running in. Most likely, the diego route-emitter will need to change the format of the RegistryMessages that diego emits to accommodate a new optional field with the AZ.

The new registry messages might look something like this:
[#32] Received on [router.register] :
{
  "host": "10.0.1.12",
  "port": 61012,
  "tls_port": 61014,
  "uris": [
    "proxy.thulianpink.cf-app.com" 
  ],
  "app": "6856799f-aebf-4e2b-81a5-28c74dfb6162",
  "private_instance_id": "a0d2b217-fa7d-4ac1-65a2-7b19",
  "private_instance_index": "0",
  "server_cert_domain_san": "a0d2b217-fa7d-4ac1-65a2-7b19",
+ "availability_zone": "my-super-cool-AZ-name", 👈 New optional field!
  "tags": {
    "component": "route-emitter"
  }
}
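As a rough illustration of how a consumer could decode such a message, here is a minimal Go sketch. The struct name, field names, and `ParseRegistryMessage` helper are assumptions made for this example and are not taken from the gorouter or route-emitter code; only the JSON keys come from the message above. Because the new field is optional, messages from older emitters simply decode with an empty availability zone.

```go
package main

import "encoding/json"

// RegistryMessage mirrors the JSON keys of the example router.register
// payload. The Go names are hypothetical; only the JSON keys are from the
// example message.
type RegistryMessage struct {
	Host                 string            `json:"host"`
	Port                 int               `json:"port"`
	TLSPort              int               `json:"tls_port"`
	URIs                 []string          `json:"uris"`
	App                  string            `json:"app"`
	PrivateInstanceID    string            `json:"private_instance_id"`
	PrivateInstanceIndex string            `json:"private_instance_index"`
	ServerCertDomainSAN  string            `json:"server_cert_domain_san"`
	// New optional field: stays "" when the emitter does not send it.
	AvailabilityZone string            `json:"availability_zone,omitempty"`
	Tags             map[string]string `json:"tags"`
}

// ParseRegistryMessage decodes a single registry payload.
func ParseRegistryMessage(payload []byte) (RegistryMessage, error) {
	var msg RegistryMessage
	err := json.Unmarshal(payload, &msg)
	return msg, err
}
```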
2. Consume AZ information when making a routing decision

Now that the gorouters have the AZ information for each backend available, the routing algorithms need to route in clever ways. In general, the basic algorithms themselves are not going to change. Instead, the set of endpoints that the router operates on will change.

When gorouter has been configured to favor AZ-local instances, the flow looks something like:
1. Gather all endpoints that exist in the same AZ as the router instance
2. If there is at least one AZ-local endpoint
  → perform the balancing algorithm logic on that set
    → receive the AZ-local backend
3. If there were no AZ-local endpoints
  → get all of the possible endpoints for that app
  → perform the normal balancing algorithm logic
    → receive a backend from any AZ
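
The flow above could be sketched roughly like this in Go. The `Endpoint` type and `candidateSet` function are hypothetical names for illustration, not gorouter's actual pool API. The idea is only to narrow the set of endpoints; the existing balancing algorithm would then run unchanged on whatever set is returned.

```go
package main

// Endpoint is a hypothetical, simplified backend record.
type Endpoint struct {
	Addr string
	AZ   string
}

// candidateSet returns the endpoints the balancing algorithm should operate
// on: the AZ-local endpoints if there is at least one, otherwise all
// endpoints for the route.
func candidateSet(all []Endpoint, routerAZ string) []Endpoint {
	var local []Endpoint
	for _, e := range all {
		// Endpoints without AZ information are never treated as local.
		if e.AZ != "" && e.AZ == routerAZ {
			local = append(local, e)
		}
	}
	if len(local) > 0 {
		return local
	}
	return all
}
```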
@peanball
Contributor

peanball commented Oct 6, 2023

Hi @jrussett, we are considering a very similar change and have the same motivation.

One point from our notes that we see as important as well:

Load balancing and retry selection can both use the AZ information.

  • For load balancing you'd want to be in the same AZ as proposed.
  • For retries you might want to favor retrying in a different AZ, so that AZ-local issues do not affect the retry.
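
To make the retry idea concrete, here is a small Go sketch. The `Endpoint` type and `retryCandidates` function are made up for this example and do not reflect how gorouter currently picks retry targets. It prefers endpoints outside the AZ of the failed attempt and only falls back to the remaining same-AZ endpoints when no other AZ has any.

```go
package main

// Endpoint is a hypothetical, simplified backend record.
type Endpoint struct {
	Addr string
	AZ   string
}

// retryCandidates returns the endpoints to consider after `failed` did not
// respond. Endpoints in a different AZ are preferred; if there are none, the
// remaining endpoints in the same AZ are returned.
func retryCandidates(all []Endpoint, failed Endpoint) []Endpoint {
	var otherAZ, sameAZ []Endpoint
	for _, e := range all {
		if e.Addr == failed.Addr {
			continue // never retry the endpoint that just failed
		}
		if e.AZ != failed.AZ {
			otherAZ = append(otherAZ, e)
		} else {
			sameAZ = append(sameAZ, e)
		}
	}
	if len(otherAZ) > 0 {
		return otherAZ
	}
	return sameAZ
}
```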

In the near future we also want to follow up on the idea of app-configured load balancing algorithms per route, similar to how AZ information is introduced. This would also encourage easier experimentation with additional load balancing algorithms besides the trusty round-robin and least-connections.

@domdom82
Contributor

domdom82 commented Oct 6, 2023

OK, this will be an interesting discussion. I've had a couple of thoughts on this in the past.

The notion of "az-local" vs. "non-az-local" routing should not be a yes/no decision. Why? As @peanball pointed out, there may be scenarios where the local AZ has a partial or complete outage, so gathering only endpoints from the local AZ can end up in a list of all broken endpoints that cause dial timeouts for a long time. This will affect latency and availability, because you end up sending a 502 for that particular request, even though there may have been working endpoints in different zones.

Also, if only local-AZ endpoints are chosen this may overload those endpoints while others may never receive traffic from that gorouter at all.

So, IMHO the setting should be a gradual one between latency and availability:


 Availability                                  Latency
   ◄───────────────AZ-Local Weight────────────────►
   0%                    50%                    100%

So, what I mean by "az-local-weight" is a gradient value between 0% and 100% that applies to the endpoints gorouter will use for doing the load balancing.

So, if we set this value to e.g. 75%, this means that out of 100 requests to this route, 75 will go to the local zone.
It's basically a weighted algorithm where the weight is determined by the zone of an endpoint.

This will have several advantages:

  • We can keep the existing pool structure, we don't need "virtual pools" that contain only a subset of local endpoints. Makes pruning and retries a lot easier.
  • We still prefer local endpoints, yet keep non-local ones as potential fallback in case the local zone is jeopardized.
  • Only adding weight to endpoints based on some criterion will allow the easier addition of more load balancing algorithms in the future. These algorithms will probably also have configuration options that will change weights and could be integrated with relative ease. In the end, there could be multiple factors influencing backend weight, with the AZ being only one of them.

So, I would go with Option 2 "adding a new spec property" but I would not make it a boolean but a percentage or numerical weight factor instead.
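
One possible reading of the "az-local-weight" idea, sketched in Go. The names `poolForRoll` and `poolForRequest` are hypothetical, and the semantics are an assumption: a request is pinned to local endpoints with probability weight/100, and otherwise balanced across all endpoints (local ones included), so that a weight of 0 reproduces the current behavior. Under this reading the local share is at least the configured percentage, not exactly that percentage.

```go
package main

import "math/rand"

// Endpoint is a hypothetical, simplified backend record.
type Endpoint struct {
	Addr string
	AZ   string
}

// poolForRoll picks the endpoint set for one request. roll is expected to be
// in [0, 100); azLocalWeight is a percentage between 0 and 100.
func poolForRoll(all []Endpoint, routerAZ string, azLocalWeight, roll int) []Endpoint {
	var local []Endpoint
	for _, e := range all {
		if e.AZ == routerAZ {
			local = append(local, e)
		}
	}
	// Only narrow to the local zone if it actually has endpoints.
	if len(local) > 0 && roll < azLocalWeight {
		return local
	}
	return all
}

// poolForRequest draws the roll at random for a real request.
func poolForRequest(all []Endpoint, routerAZ string, azLocalWeight int) []Endpoint {
	return poolForRoll(all, routerAZ, azLocalWeight, rand.Intn(100))
}
```

Separating the roll from the random source keeps the decision easy to test deterministically.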

@jrussett
Contributor Author

Hey y'all 👋

@domdom82 I think the advantages that you have laid out make sense and could be compatible with the original direction that our team had decided on.

New Spec Interface

So maybe the spec interface would look something like this:

  router.balancing_algorithm:
    description: "Algorithm used to distribute requests for a route across backends. Supported values are round-robin and least-connection"
    default: round-robin
  router.az_local_weight:
    description: |
      Percentage of requests that should be routed to a backend local to the AZ of the proxying router. 
      For example, a `router.az_local_weight` of `75` means that out of 100 requests to a given route, 75 will go to a backend that is in the same availability zone as the router handling the request.
      Configurable as an integer value between `0` and `100`; e.g. `25`, NOT `27.456`. Defaults to `0`.
    default: 0
Retry Logic

For a retries you might want to favor retrying in a different AZ to avoid any AZ-local issues to affect the retry.

@peanball I agree that this makes sense, especially from the fail-over scenario. Unfortunately, we know there is appetite for the opposite behavior. We know there are people who would like to configure the retry logic to attempt all AZ-local backends before trying any of the other backends. We can revisit this as we begin to implement this feature to see if anything needs to be done or if the AZ-local weights are sufficient.

@jrussett
Contributor Author

Hello again 👋

Thinking about concerns

We've been talking about this internally and had some more thoughts. It sounds like there are a few concerns we'd like to make sure we address going forward:

  1. AZ choice upon failure
    • In your presented use cases, y'all would like to understandably fail over to other AZs in case the backends for a particular AZ are down or don't exist
    • In at least one of our main use cases, we need to retry and exhaust every single AZ-local backend before failing over to backends in other AZs
  2. Granularity in configuration
    • @domdom82 pushed for being able to set percentage weights for the number of requests that must be routed locally
    • I'm not sure if having a percentage is the most important aspect of that concern, or if the ability to not solely use one particular AZ is most important. That is to say, if the operator could configure the balancing algorithms to try an AZ-local backend first and then, upon a failure, try any other AZ's backend, that flow might be sufficient.
  3. Future expansion
    • Y'all would like to implement per-route based algorithm configurations
    • We need to implement this in the near term
    • These implementations have to play nice together, make sense, and not be convoluted to understand when walking through the code
  4. An explosion of configuration
    • Adding a percentage weight will add more parameter data to stick onto a route when y'all do the per-route algorithms
    • How many other arbitrary numerical or other parameters are we going to add to influence all of this logic?
    • We're just worried this will become real messy, real quick.

A new possible design

What if we instead make a new bosh spec property that handles the desires around AZ preferences instead?

  router.balancing_algorithm:
    description: "Algorithm used to distribute requests for a route across backends. Supported values are round-robin and least-connection"
    default: round-robin

  router.az_preference:
    description: |
      Configuration option used in conjunction with the `router.balancing_algorithm` to decide from which availability zone to pick a suitable backend. Defaults to "None".
      "None" - There is no preference regarding availability zones. The router uses the `router.balancing_algorithm` across all possible backends in all existing AZs.
      "Local" - The router will prefer backends in the same availability zone as the router proxying the request. It will use the `router.balancing_algorithm` across all backends in its local AZ. Only if there are no backends available, or there are no backend left after multiple failed retries, in the local AZ, then will the router proxy to backends in other AZs.
      "Local-First" - On the initial attempt to pick a backend, the router will use `router.balancing_algorithm` across all backends in the same AZ as the router itself. Subsequent retries, in the case of failure or unavailability, will use _all_ available AZs.
      "Cross-AZ-Fallback" - On the initial attempt to pick a backend, there is no preference regarding availability zone. The router uses the `router.balancing_algorithm` across all backends in all AZs. Subsequent retries, in the case of failure or unavailability, will use `router.balancing_algorithm` against all backends in a _different_ AZ than the first attempt (from the rest of the AZs).
    default: "None"

None would essentially be the behavior as it stands now.
For the work that we intend on doing, we would most likely implement the Local and maybe the Local-First options.

Options like Cross-AZ-Fallback would address your desires to fail over to a different AZ in the case that the first attempt fails and don't need to be implemented immediately. We could also think of other permutations that could be helpful.
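
For illustration, the named options could map onto a small typed constant set when the gorouter config is loaded. This is only a Go sketch: the type, constant names, and `ParseAZPreference` are invented for the example, and treating values as case-sensitive with an empty value meaning "None" is an assumption.

```go
package main

import "fmt"

// AZPreference is a hypothetical typed wrapper around router.az_preference.
type AZPreference string

const (
	AZPrefNone            AZPreference = "None"
	AZPrefLocal           AZPreference = "Local"
	AZPrefLocalFirst      AZPreference = "Local-First"
	AZPrefCrossAZFallback AZPreference = "Cross-AZ-Fallback"
)

// ParseAZPreference validates a configured value. An empty string is treated
// as the default, "None" (an assumption for this sketch).
func ParseAZPreference(s string) (AZPreference, error) {
	if s == "" {
		return AZPrefNone, nil
	}
	switch p := AZPreference(s); p {
	case AZPrefNone, AZPrefLocal, AZPrefLocalFirst, AZPrefCrossAZFallback:
		return p, nil
	}
	return "", fmt.Errorf("invalid router.az_preference %q", s)
}
```

Named constants leave room for adding further modes later without changing the type of the property.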

What do y'all think?

@domdom82
Contributor

I like it in general. A few points:

  • It seems like your requirement is the "exhaust local before everything else" while ours is "fall back to different az if local is down"
  • Technically, your case could still be implemented using a 100% weight on the local zone.
  • In my experience, there is rarely a "one size fits all" solution. Apps with only a handful of instances will want to stick to local-AZ as much as possible, while apps with lots of instances (e.g. 10 per zone) will want to fail over to avoid trying 10 bad instances in a row and running into a timeout on each. So IMO the whole setting should be "ready for being configurable by users" later on. I can think of having an "az_preference" attribute on a route, next to the load balancing algorithm as you showed.
  • In my mind, we could merge options 3 and 4. I can't think of a case where you wouldn't want to try local the first time. I would just use "none" in that case.

So maybe we can call "local" -> "local-strict" or "local-only" to emphasize that non-local endpoints will never be used?
This could help remove confusion about "local" vs. "local-first". You would then have "local-only" and "local-first".

@jrussett
Contributor Author

jrussett commented Oct 23, 2023

Hi @domdom82,

• It seems like your requirement is the "exhaust local before everything else" while ours is "fall back to different az if local is down"
• Technically, your case could still be implemented using a 100% weight on the local zone.

I agree with both of these things. As for the second point, I agree that we could totally use the weight on the local zone, but I think it would be confusing to think about the system when y'all start implementing per-route configuration. At that point, what does the weight become, and how do you think about how the weight affects other configurations and inputs to the routing algorithm? I feel like it would become unruly, while it might be easier to think more abstractly about something like:

My platform level algorithm is round_robin and platform level az preference is local, but then the dev has configured their app manifest to specify an app/route level az preference of none, so, really the logic to proxy my request will be a round robin with no preference of AZ locality.

I think you kind of alluded to that in this statement:

In my experience, there is rarely a "one size fits all" solution.... ... ... So IMO the whole setting should be "ready for being configurable by users" later on. I can think of having an "az_preference" attribute on a route, next to the load balancing algorithm as you showed.

I could easily see this platform-level config to be more abstract with the az preference as I've laid out, and then down the road when y'all are working on route-level configs, y'all can add more granular control/config.
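
The precedence described above (a route-level setting, if present, overrides the platform-level one) is simple enough to sketch in Go. Both the function and the notion of an empty string meaning "not set on the route" are assumptions for this example; no route-level property exists in this proposal.

```go
package main

// effectiveAZPreference resolves which AZ preference applies to a request.
// routeLevel is "" when the app/route does not specify a preference
// (hypothetical; route-level config is not part of this proposal).
func effectiveAZPreference(platformLevel, routeLevel string) string {
	if routeLevel != "" {
		return routeLevel
	}
	return platformLevel
}
```

With a platform-level preference of `local` and a route-level preference of `none`, the result is `none`, which matches the walkthrough above.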

• in my mind, the we could merge option 3 and 4. I can't think of a case where you wouldn't want to try local the first time. I would just use "none" in that case

Sounds fair 👍

So maybe we can call "local" -> "local-strict" or "local-only" to emphasize that non-local endpoints will never be used?
This could help remove confusion about "local" vs. "local-first". You would then have "local-only" and "local-first".

The tricky part is that the non-local endpoints are used, but only after all of the available local endpoints have been exhausted.

Maybe it would make sense to change Local -> Only-Local-First and change Local-First -> Locally-Optimistic like so:

  router.az_preference:
    description: |
      Configuration option used in conjunction with the `router.balancing_algorithm` to decide from which
      availability zone to pick a suitable backend. Defaults to "None".
      
      "None" - There is no preference regarding availability zones. The router uses the 
        `router.balancing_algorithm` across all possible backends in all existing AZs.
      "Only-Local-First" - The router will prefer backends in the same availability zone as the router
        proxying the request. It will use the `router.balancing_algorithm` across all backends in its
        local AZ. Only if there are no backends available, or there are no backends left after multiple
        failed retries, in the local AZ, then will the router proxy to backends in other, non-local AZs.
      "Locally-Optimistic" - On the initial attempt to pick a backend, the router will use
        `router.balancing_algorithm` across all backends in the same AZ as the router itself. Subsequent
        retries, in the case of failure or unavailability, will use _all_ available AZs.
    default: "None"

Names are hard 😅

@mariash
Member

mariash commented Oct 25, 2023

I would vote for simplicity and have 2 options for router.az_preference:

  1. None
  2. Local where Local is the same as the Locally-Optimistic described by @jrussett above.

With Local, in most cases requests will be directed to instances in the same AZ. This option will be configured by CF operators who want to provide the best platform experience by prioritizing speed if possible and, if that is not possible, prioritizing availability. As an operator, I don't see why I would pick the Only-Local-First option if it can compromise availability.

In that case, maybe just have it as a boolean router.prefer_local_az (true/false)

@domdom82
Contributor

@mariash we could get around zonal problems by reducing the dial timeout to a very low value. Usually, when you have a zone issue, the packets are dropped and you run into connection timeouts. If your dial timeout is high (the routing-release default is 5s) and you have lots of instances in the local zone, you quickly run into 30s+ response times.

For us, we have reduced the dial timeout to 2s (which is still way high, considering that gorouter usually sits next door to the diego cell, average dial times are a few ms). So we could accept a "local-only" option, however I would still vote for not making it a boolean switch, because that will shut the door for any future extensions in that area.
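
The arithmetic behind the "30s+" figure, as a tiny Go sketch. The function name is hypothetical, and it assumes every failed attempt waits for the full dial timeout before the next endpoint is tried.

```go
package main

import "time"

// worstCaseDialLatency estimates the time spent on dial attempts that all
// run into the timeout, before a working endpoint is reached.
func worstCaseDialLatency(failedAttempts int, dialTimeout time.Duration) time.Duration {
	return time.Duration(failedAttempts) * dialTimeout
}
```

Six timed-out attempts at a 5s dial timeout add up to 30s, while the same six attempts at 2s add up to 12s.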

@jrussett
Contributor Author

jrussett commented Oct 26, 2023

So, in general, it sounds like we all could agree on having a new bosh spec property named router.az_preference that corresponds to named constants (None, etc...), instead of booleans or numbers.

First Start

How about we initially start with the options that make the most logical sense:

  • no preference, or the logic as it stands now
  • try an AZ-local instance first, retry with instances from any AZ

Example bosh spec:

  router.az_preference:
    description: |
      Configuration option used in conjunction with the `router.balancing_algorithm` to decide from which
      availability zone to pick a suitable backend. Defaults to "None".
      
      "None" - There is no preference regarding availability zones. The router uses the 
        `router.balancing_algorithm` across all possible backends in all existing AZs.
      "Locally-Optimistic" - On the initial attempt to pick a backend, the router will use
        `router.balancing_algorithm` across all backends in the same AZ as the router itself. Subsequent
        retries, in the case of failure or unavailability, will use _all_ available AZs.
    default: "None"
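
A rough Go sketch of what Locally-Optimistic could mean for endpoint selection. `Endpoint` and `poolForAttempt` are hypothetical names, and the attempt counter is an assumption about how retries would be tracked. Only the first attempt is narrowed to the local AZ; every retry sees all endpoints.

```go
package main

// Endpoint is a hypothetical, simplified backend record.
type Endpoint struct {
	Addr string
	AZ   string
}

// poolForAttempt returns the endpoints the balancing algorithm should use.
// attempt is 0 for the initial try and greater than 0 for retries.
func poolForAttempt(all []Endpoint, routerAZ string, attempt int) []Endpoint {
	if attempt > 0 {
		return all // retries may use any AZ
	}
	var local []Endpoint
	for _, e := range all {
		if e.AZ == routerAZ {
			local = append(local, e)
		}
	}
	if len(local) > 0 {
		return local
	}
	return all // no local endpoints: behave like "None"
}
```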
Requirements Double Check

At the same time, we will double check the requirements to see if Locally-Optimistic is sufficient for the desired use case.

We had some more discussions, were reviewing the retry logic, and figured out that we don't actually know if every single AZ-local endpoint needs to be exhausted before retrying instances in other AZs. My apologies.

If Locally-Optimistic is not sufficient and we do need to exhaust every single AZ local instance, then we will switch to the Only-Local-First option:

Bosh Spec with Only-Local-First
 router.az_preference:
   description: |
     Configuration option used in conjunction with the `router.balancing_algorithm` to decide from which
     availability zone to pick a suitable backend. Defaults to "None".
     
     "None" - There is no preference regarding availability zones. The router uses the 
       `router.balancing_algorithm` across all possible backends in all existing AZs.
     "Only-Local-First" - The router will prefer backends in the same availability zone as the router
       proxying the request. It will use the `router.balancing_algorithm` across all backends in its
       local AZ. Only if there are no backends available, or there are no backends left after multiple
       failed retries, in the local AZ, then will the router proxy to backends in other, non-local AZs.
   default: "None"
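
For comparison, Only-Local-First could be sketched in Go like this. The names are hypothetical, and tracking already-tried endpoints in a set keyed by address is an assumption for the example. Non-local endpoints only become candidates once every local endpoint has been tried.

```go
package main

// Endpoint is a hypothetical, simplified backend record.
type Endpoint struct {
	Addr string
	AZ   string
}

// nextCandidates returns the untried endpoints the balancing algorithm may
// pick from: untried local endpoints while any remain, otherwise untried
// endpoints from other AZs.
func nextCandidates(all []Endpoint, routerAZ string, tried map[string]bool) []Endpoint {
	var local, remote []Endpoint
	for _, e := range all {
		if tried[e.Addr] {
			continue
		}
		if e.AZ == routerAZ {
			local = append(local, e)
		} else {
			remote = append(remote, e)
		}
	}
	if len(local) > 0 {
		return local
	}
	return remote
}
```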

How does that sound?

@ameowlia
Member

Thanks for putting this together @jrussett. I like this path forward. That way once you add the router.az_preference property, anyone can extend it and add their own custom algorithm.

@emalm
Member

emalm commented Oct 30, 2023

Hey, @domdom82, would you want to switch any of your environments over to that proposed Locally-Optimistic mode if it existed? On our side, we're investigating whether that matches our customers' requests (or if they're insisting on something like the Only-Local-First mode), but if it would be useful independently then we could move forward with that as an option in the meantime.

@maxmoehl
Member

Hey, @domdom82, would you want to switch any of your environments over to that proposed Locally-Optimistic mode if it existed? On our side, we're investigating whether that matches our customers' requests (or if they're insisting on something like the Only-Local-First mode), but if it would be useful independently then we could move forward with that as an option in the meantime.

Not dom, but I think I can answer for him: Yes, at the very least we would want to try it to see whether it improves the overall handling of AZ related issues. Whether we end up deploying it to production depends on the outcome of our tests.

@domdom82
Contributor

@emalm The one issue we had was with a specific app that had lots of instances in one AZ which failed. So it would have helped to do a quick switch to other AZs after the first one failed. However, we discussed that in general the TCP connect to a host on the same network (gorouter -> diego cell) should be extremely fast (on the order of < 50ms). So we might get by with using "Only-Local-First" in combination with a low dial timeout. This way we would quickly iterate over all the failed local endpoints and try the remote ones without causing too much latency for the end user.

tl;dr
We are OK with the "Only-Local-First" setting and low(er) dial timeouts.

@domdom82
Contributor

@jrussett I've closed my original issue on the matter in favor of this one. There is one bit there that may be worth your consideration. @ameowlia brought up a potential overload scenario where app instances are unevenly distributed across zones. The discussion starts here

My idea for avoiding a local routing "overload" was outlined here:

  • Gorouter knows how many zones there are
  • Gorouter knows how many app instances are in each zone
    -> Only do zone-local routing if there are at least ceil(nr_apps / nr_zones) in the local zone.

e.g.
if there are three app instances but only two zones, we only do zone-local routing if we are in the zone that contains at least ceil(3/2) = 2 instances.

This is a cheap mechanism that prevents us from overloading a single app instance.
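
The check is small enough to write down as a Go sketch. The function and parameter names are made up for this example; integer arithmetic is used for the ceiling.

```go
package main

// zoneLocalAllowed reports whether zone-local routing should be used, i.e.
// whether the local zone holds at least ceil(totalInstances / zones)
// instances of the app.
func zoneLocalAllowed(localInstances, totalInstances, zones int) bool {
	if zones <= 0 || totalInstances <= 0 {
		return false
	}
	// Integer ceiling of totalInstances / zones.
	threshold := (totalInstances + zones - 1) / zones
	return localInstances >= threshold
}
```

With three instances and two zones the threshold is 2, so only the zone holding two instances routes locally.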

It should be noted that this is a rather theoretical problem. In most cases, both gorouters and apps are evenly distributed across zones.

jrussett added a commit to cf-routing/docs-cloudfoundry-concepts that referenced this issue Jan 25, 2024
- Adds documentation related to the AZ-local routing feature introduced
  in [routing-release 0.288.0](https://github.com/cloudfoundry/routing-release/releases/tag/v0.288.0)

See this Github issue for more information:
- cloudfoundry/routing-release#356

[#186117321](https://www.pivotaltracker.com/story/show/186117321)
@jrussett
Contributor Author

jrussett commented Jan 25, 2024

This is now available with the following release versions:

Documentation changes are now live, see:

@jrussett jrussett closed this as completed Feb 5, 2024