Weighted load balancing policy #1922

Draft · wants to merge 1 commit into base: 4.x

Conversation

@akhaku (Contributor) commented Mar 25, 2024

This PR open-sources the work Joey Lynch and I did and presented at ApacheCon 2022. It requires a bit of work before it's mergeable, but I'm hoping to get feedback on a couple of areas before I put in much more work. A couple of major things are left to do:

  • Port the tests we added internally. We also have some JMH tests, but I'm not sure whether those will fit in the driver codebase.
  • Pass the local rack through to the driver. Internally we passed it through the context; for OSS, perhaps the right thing is to implement something like the optional/required local DC pattern (see the sketch after this list).
  • The documentation could probably use some more work, but I'd love guidance on where to focus that: the implementation class? Or reference.conf?
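
On the local-rack bullet: a minimal sketch, assuming the rack would be wired the same way the local DC is today. withLocalDatacenter(...) is the existing 4.x builder API; withLocalRack(...) does not exist and is shown only as a hypothetical shape for the setting.

```java
import com.datastax.oss.driver.api.core.CqlSession;

// Minimal sketch: mirror the optional/required local DC pattern for the local rack.
// withLocalDatacenter(...) exists today; withLocalRack(...) is hypothetical.
public class LocalRackWiringSketch {
  public static void main(String[] args) {
    try (CqlSession session =
        CqlSession.builder()
            .withLocalDatacenter("dc1")
            // .withLocalRack("rack1") // hypothetical analogue for the local rack
            .build()) {
      // use the session
    }
  }
}
```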

@akhaku (Contributor, Author) commented Mar 25, 2024

Additionally, this will tackle JAVA-3040

@aratno (Contributor) commented Mar 25, 2024

> The documentation could probably use some more work, but I'd love guidance on where to focus that: the implementation class? Or reference.conf?

I'd put configuration documentation in reference.conf. For more context on why someone would benefit from this LBP over the default, put that in manual/core/load_balancing/README.md.

*
* The default weights are good for the vast majority of use cases, but you can tweak them to get different behavior.
*/
public class RackAwareWeightedLoadBalancingPolicy extends DefaultLoadBalancingPolicy {
Contributor:

It looks like this is going to be most useful in environments like AWS, where Cassandra racks correspond to AWS AZs, there are relatively few racks, and there's a high likelihood that an application has some instances running in the same rack as Cassandra. In other environments, where racks correspond to actual racks in a datacenter, there are many more racks and the likelihood of an application running in the same rack is low, so there's not as much benefit to prioritizing rack alignment as heavily.

Does that sound right based on your experience?

It would be useful to have a metric for the count of requests that go to each category (rack-aligned, replica-aligned, instance-starting, instance-unhealthy), so users can know whether they're setting the local rack correctly, setting weights correctly, etc.

Contributor (Author):

Yes, it's certainly more valuable for cases where the rack-to-rack latency is higher. However, being rack-aware is just one part of it: I think the mechanism of preferring nodes with fewer in-flight requests is an improvement over the round-robin mechanism (assuming non-token-aware queries, or all replicas) in DefaultLoadBalancingPolicy/BasicLoadBalancingPolicy. Additionally, the getAndUpdate in those balancers ends up being a bottleneck in high-throughput situations. It's been a while since our ApacheCon talk and there's no recording, but the slides are available at https://www.apachecon.com/acna2022/slides/04-Khaku-Lynch_Improving-Cassandra-Client.pdf and they go into the problems a bit.

My eventual goal is to make this the default load balancing policy in the driver but for now I'll settle for getting it in and making it an option.

Regarding the metrics: that's perhaps a little tricky, since we're creating a query plan here rather than firing off requests. Additionally, the scoring/ordering is decoupled from the characteristics that produced the score; perhaps some trace logging with those characteristics would help? In practice we saw latencies drop immediately when we deployed this.

Contributor:

Yeah, I think the changes outside of rack-awareness make this worth recommending broadly. I wouldn't support making it the default until it's been available for at least a release, but worth pushing it in that direction.
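
A rough sketch of the trace-logging idea floated above: log the characteristics that fed into a node's score at the moment it is computed, since the sorted plan no longer carries them. The names and the formula below are illustrative stand-ins, not the PR's actual code.

```java
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

// Illustrative only: emit the scoring inputs at TRACE level when a node is scored.
final class ScoreTraceSketch {
  private static final Logger LOG = LoggerFactory.getLogger(ScoreTraceSketch.class);

  static double scoreAndTrace(
      String node, int inFlight, boolean rackAligned, boolean isReplica, double weight) {
    double score = weight * (1 + inFlight); // placeholder formula, not the PR's
    if (LOG.isTraceEnabled()) {
      LOG.trace(
          "node={} inFlight={} rackAligned={} replica={} weight={} score={}",
          node, inFlight, rackAligned, isReplica, weight, score);
    }
    return score;
  }
}
```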


// By default we will only score this many nodes, the rest will get added on without scoring.
// We don't usually need to score every single node if there are more than a few.
static final int DEFAULT_SCORED_PLAN_SIZE = 8;
Contributor:

It seems like it would make sense to increase this value if using a large cluster and lots of token-unaware queries (ALLOW FILTERING, WHERE TOKEN, WHERE pk IN), since replica-local does not apply but rack-local does, and in a larger cluster the likelihood of an 8-node random sample including a rack-local Cassandra host is lower than some might like. Increasing the scored plan size will increase the likelihood that a rack-local host is used out of the sample.

Does that sound right to you? Any other situations to tune this based on your experience?

Contributor (Author):

That's fair; this default was chosen to fit our requirements, but a larger default may make sense for a more general use case.

Contributor:

It's a reasonable default - just thinking through what we should document for tuning.
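
For anyone tuning this, a simplified sketch of what a scored-plan-size cap implies, assuming (per the comment in the diff) that only the first N candidates are scored and sorted while the rest are appended in their existing order; this is not the PR's exact code.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.function.ToDoubleFunction;

// Simplified sketch: score and sort only the first `scoredPlanSize` candidates,
// then append the remaining candidates unscored, in their original order.
final class ScoredPlanSketch {
  static <T> List<T> plan(List<T> candidates, int scoredPlanSize, ToDoubleFunction<T> score) {
    int n = Math.min(scoredPlanSize, candidates.size());
    List<T> head = new ArrayList<>(candidates.subList(0, n));
    head.sort(Comparator.comparingDouble(score)); // lower score = preferred (assumption)
    List<T> result = new ArrayList<>(head);
    result.addAll(candidates.subList(n, candidates.size()));
    return result;
  }
}
```

With a larger scored plan size, more of the plan is scored, which raises the chance that a rack-local host ends up near the front for token-unaware queries, at the cost of scoring more nodes per request.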

private final double weightNonRack;
private final double weightNonReplica;
private final double weightStarting;
private final double weightUnhealthy;
Contributor:

Looks like something that was lost in the 4.x rewrite was the error-aware LBP: https://github.com/apache/cassandra-java-driver/blob/3.x/driver-core/src/main/java/com/datastax/driver/core/policies/ErrorAwarePolicy.java

Not requesting any change here - just realized that this doesn't exist anymore. Could be useful to incorporate those ideas in the future.

}

protected double getWeightedScore(Node node, Session session, long nowMillis, long nowNanos, boolean isReplica) {
int base = Math.min(32768, 1 + getInFlight(node, session));
Contributor:

getInFlight calls InFlightHandler#getInFlight, which isn't thread-safe. StreamIdGenerator#maxAvailableIds is a plain int, so it could be cached on a calling thread. The risk of this value being inaccurate is low; we could update maxAvailableIds to be volatile, but either way it's not a huge concern.

Contributor (Author):

maxAvailableIds is a final instance variable on StreamIdGenerator, so I think we're safe here, unless I misunderstand how that works. And availableIds() returns the value of an AtomicInteger, so I think that means InFlightHandler#getInFlight is thread-safe.

Contributor:

You're right - it's final, missed that.
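
For readers skimming this thread, a simplified model of why the read is safe, assuming the shape described above (a final max-ids field plus an AtomicInteger of available ids); it mirrors the discussion, not StreamIdGenerator's actual code.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Simplified model: maxAvailableIds is final (safely published after construction)
// and the available count is an AtomicInteger, so computing the in-flight count
// from another thread sees consistent values.
final class StreamIdCounterSketch {
  private final int maxAvailableIds;
  private final AtomicInteger available;

  StreamIdCounterSketch(int maxAvailableIds) {
    this.maxAvailableIds = maxAvailableIds;
    this.available = new AtomicInteger(maxAvailableIds);
  }

  int getInFlight() {
    return maxAvailableIds - available.get();
  }
}
```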

}

// Take a copy of nodes and reference to replicas since the node map is concurrent
Set<Node> dcNodeSet = getLiveNodes().dc(getLocalDatacenter());
Contributor:

If a user sets a local DC, the current implementation only plans and scores for Cassandra hosts in the local DC. If a user doesn't set a local DC, there's no weighting to encourage the query plan to select a coordinator in the local DC over one in a remote DC, right? If there's no local DC configured, getLocalDatacenter is empty, and dcNodeSet ends up being all nodes.

If a user uses all non-local consistency levels (QUORUM, etc.) and doesn't want all traffic to go to a coordinator in the local DC but does want to encourage the query planner to pick coordinators in the local DC, how would they configure that?

Contributor (Author):

Currently the RackAwareWeightedLoadBalancingPolicy extends DefaultLoadBalancingPolicy, which uses MandatoryLocalDcHelper and so requires a local DC to be set. I can however see an argument where it instead extends BasicLoadBalancingPolicy, in a world where RAWLBP is a more general solution with sensible defaults (kinda like UCS is for Cassandra compaction?) and you can tweak weights to get what you want.

Regarding non-local consistency level - if we allow scoring non-local-DC nodes, I think we'd still want to weight the local ones higher and rely on the server-side snitch to figure out a quorum partner if needed.

The weighted load balancing policy uses a number of factors as weights along with the number of in-flight requests for each host to score and then sort the list of known live hosts.
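
To make the description above concrete, here is a rough sketch of the scoring idea, using the weight names that appear in the diff (weightNonRack, weightNonReplica, weightStarting, weightUnhealthy) and the capped in-flight base from getWeightedScore. The exact formula, the Node stand-in, and the example weight values are assumptions, not the PR's implementation.

```java
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

// Rough sketch: start from the (capped) in-flight count and multiply by penalty
// weights for non-rack, non-replica, starting, and unhealthy nodes; lower score
// is preferred. Node is a simplified stand-in for the driver's Node.
final class WeightedScoringSketch {

  static final class Node {
    final String id;
    final int inFlight;
    final boolean sameRack, isReplica, isStarting, isUnhealthy;

    Node(String id, int inFlight, boolean sameRack, boolean isReplica,
        boolean isStarting, boolean isUnhealthy) {
      this.id = id;
      this.inFlight = inFlight;
      this.sameRack = sameRack;
      this.isReplica = isReplica;
      this.isStarting = isStarting;
      this.isUnhealthy = isUnhealthy;
    }
  }

  static double score(Node n, double weightNonRack, double weightNonReplica,
      double weightStarting, double weightUnhealthy) {
    double score = Math.min(32768, 1 + n.inFlight); // capped base, as in the diff
    if (!n.sameRack) score *= weightNonRack;
    if (!n.isReplica) score *= weightNonReplica;
    if (n.isStarting) score *= weightStarting;
    if (n.isUnhealthy) score *= weightUnhealthy;
    return score;
  }

  static List<Node> plan(List<Node> liveNodes) {
    return liveNodes.stream()
        .sorted(Comparator.comparingDouble(
            (Node n) -> score(n, 4.0, 8.0, 16.0, 64.0))) // arbitrary example weights
        .collect(Collectors.toList());
  }
}
```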
@akhaku akhaku changed the title Rack-aware weighted load balancing policy Weighted load balancing policy Mar 25, 2024
@akhaku akhaku marked this pull request as draft April 9, 2024 18:21
@tolbertam tolbertam self-requested a review April 12, 2024 18:45