-
Notifications
You must be signed in to change notification settings - Fork 683
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
GEODE-10056: Improve gateway-receiver load balance #7378
Conversation
2fabf45
to
0bdee7d
Compare
0bdee7d
to
272cfef
Compare
d68095e
to
ee8d485
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
comment deleted
Hi reviewers, With this solution, each server will now send CacheServerLoadMessage containing the correct connection load of the gateway-receiver to all locators in the cluster. This action will happen every 5 seconds as configured with the load-poll-interval parameter. Additionally, the coordinator locator will increase the load each time it provides the server location to the remote gateway-sender in ClientConnectionRequest/ClientConnectionResponse. Locator only maintains load temporarily until CacheServerLoadMessage is received. This behavior makes sense as the server tracks connection load more accurately than the locator. Locator only increases connection load based on the received connection requests while server adjusts the connection load each time connection is established and disconnected. ClientConnectionRequest messages are usually sent to the locator in bursts when the gateway-sender is establishing connections due to traffic. This behavior results in the locator's connection load being way ahead of the server connection load because servers did not establish those connections yet. Suppose during these bursts CacheServerLoadMessage message come to locator carrying low load value for one of the gateway-receivers. In that case, that receiver will be picked more frequently (will have the lowest load), resulting in unbalanced gateway-sender connections. In order for this to have a big impact on load-balancing of sender connections the gateway-receivers must be started with some small delay, so that CacheServerLoadMessages are sent with some delay that is enough to cause imbalance. If CacheServerLoadMessages were sent at the similar time then this would not be a problem as all messages would have similar load and would update locator at similar time. I would be really grateful if you could share your opinion on this matter? |
The problem is that servers send incorrect gateway-receiver connection load to locators within CacheServerLoadMessage. Additionally, locators do not refresh gateway-receivers load with the load received in CacheServerLoadMessage. The only time locator increments gateway-receiver load is after it receives ClientConnectionRequest{group=__recv_group...} and returns selected server in ClientConnectionResponse message. This is done only by coordinator, so that means that other locators will have load with initial values, since it is never updated. The solution is to correctly track gateway-receiver acceptor connection count and then based on it correctly calculate the load when sending CacheServerLoadMessage. Additionally each locator will read the load received from CacheServerLoadMessage and update load for gateway-receiver location id in group __recv__group accordingly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm only a codeowner for one of the files changed in this PR, so I can't give much feedback on the overall approach, but there are a few general clean-up changes that would be good to make.
geode-core/src/main/java/org/apache/geode/distributed/internal/LocatorLoadSnapshot.java
Outdated
Show resolved
Hide resolved
geode-core/src/main/java/org/apache/geode/distributed/internal/LocatorLoadSnapshot.java
Show resolved
Hide resolved
geode-core/src/main/java/org/apache/geode/distributed/internal/ServerLocator.java
Outdated
Show resolved
Hide resolved
...e-core/src/test/java/org/apache/geode/distributed/internal/LocatorLoadSnapshotJUnitTest.java
Outdated
Show resolved
Hide resolved
...e-core/src/test/java/org/apache/geode/distributed/internal/LocatorLoadSnapshotJUnitTest.java
Outdated
Show resolved
Hide resolved
...e-core/src/test/java/org/apache/geode/distributed/internal/LocatorLoadSnapshotJUnitTest.java
Outdated
Show resolved
Hide resolved
...e/internal/cache/wan/parallel/ParallelGatewaySenderConnectionLoadBalanceDistributedTest.java
Outdated
Show resolved
Hide resolved
...e/internal/cache/wan/parallel/ParallelGatewaySenderConnectionLoadBalanceDistributedTest.java
Outdated
Show resolved
Hide resolved
...e/internal/cache/wan/parallel/ParallelGatewaySenderConnectionLoadBalanceDistributedTest.java
Outdated
Show resolved
Hide resolved
...e/internal/cache/wan/parallel/ParallelGatewaySenderConnectionLoadBalanceDistributedTest.java
Outdated
Show resolved
Hide resolved
ee8d485
to
cea93c6
Compare
I'm not sure how to resolve the race condition you mention, but I see similar behavior with client/server connections. If a burst of connections is requested and none of those are made before the next load is received from the server, then the locator's load for that server gets reset back to zero. A burst of connections (10 in this case) causes the load to go from 0.0 to 0.012499998:
If none of those connections are made before the next load is sent by that server, its load goes from 0.012499998 to 0.0:
The load for the next request starts is 0.0 again:
One thing to note is that the load is only sent load-poll-interval (default=5 seconds) if it has changed. If it hasn't changed then it only gets sent every update frequency (which is 10 * 5 seconds by default). There is a boolean to control that frequency too:
The load-poll-interva is configurable, but currently only for the cache server not the gateway receiver. It probably wouldn't be too hard to add this support to gateway receiver. Also, there is a gfsh load-balance gateway-sender command that could help alleviate this condition. I'm still reviewing the PR. |
I ran a few tests with some extra logging on these changes. They look good. The receiver exchanges profiles with the locator:
The connectionLoadMap shows 2 groups, namely the null group (default) and the __recv__group group (gateway receiver), each with load=0.0:
Sender connects to the receiver:With the default of 5 dispatcher threads, 5 connections are made to the receiver. The load goes from 0.0 to 0.0062499996:
The connectionLoadMap shows the same 2 groups but now the __recv__group group load is 0.0062499996 for the gateway receiver:
Update the load:Periodically, the server sends an updated load to the locator.
Update the load after ping connection has been made:After another connection is made, the load is updated again.
Connect another sender:Another sender with 5 dispatcher threads connects, and the load is updated again.
Disconnect one sender:When a sender disconnects, the load is updated again.
Start another receiver:When another receiver is started, an entry for it is added to the connectionLoadMap with load=0.0.
Two receivers and two senders:When two receivers are started and two senders are connected, the load is updated (and balanced). In this case, the extra connections are pingers - one from each sender to each receiver.
Load balance senders:This feature does not seem to be working properly. These changes seem to make it work better. I have another bunch of analysis on this that I will either post separately or file a JIRA on. |
Hi @boglesby, I also assumed that the same race condition is possible for the client connections, but I haven't tried to reproduce it. Thanks for pointing this out and lots of other valuable information. Also, thank you for the extensive testing you have done. If we decide to go with this solution, I agree that we should make the load-poll-interval parameter configurable for gateway receivers. Changing it to the lower value would slightly mitigate race condition effects. The load-balance gateways command is working on server this way:
This command will result again in the burst of connection requests that could hit an issue caused by a race condition. Maybe instead of sending load information periodically from the servers, the locator could scrape it (perhaps using CacheServerMXBean) from the servers and apply it simultaneously for all receivers in the locator. The locator could get load when it receives a connection request, and the current connection load is stale (e.g., older than 200 ms), as we don't expect many connections from gateway-senders. This way, the locator would at least have an up-to-date connection load taken at a similar time on all servers. This solution should even catch the change in connection load when the load-balance command destroys all connections. Maybe, an algorithm that could work this way:
Not sure if this makes any sense as I don't know how fast locator can scrape the load. I can create a prototype if you see that this could maybe work? |
Thats a pretty cool idea. I'm not sure whether the CacheServerMXBean has that behavior, but I guess it could be added. In any event, I think this change is good. I'm approving this change, but you need to address the ParallelGatewaySenderConnectionLoadBalanceDistributedTest failure. |
Hi @Bill , @echobravopapa , @kamilla1201 and @pivotal-jbarrett , this PR requires reviews from your side to merge it. Could you please review it? |
|
||
@TestOnly | ||
public synchronized Map<ServerLocationAndMemberId, ServerLoad> getGatewayReceiverLoadMap() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this need to be public for the test or would package private be sufficient.
@@ -631,28 +633,93 @@ public void testFindBestServersCalledWithNegativeCount() { | |||
} | |||
|
|||
@Test |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Upgrade to JUnit 5.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @pivotal-jbarrett , thanks for the review. Upgrading this test to JUnit 5 would be tricky because it uses Rule annotation, which is replaced by ExtendWith annotation. It would be necessary to implement the new interfaces (AfterEachCallback and BeforeEachCallback) to GfshCommandRule and ClusterStartupRule classes. I think that this should be a part of a separate ticket since this is not just a minor adjustment. What do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, just realized that you referenced LocatorLoadSnapshotJUnitTest.java and not a ParallelGatewaySenderConnectionLoadBalanceDistributedTest.java). I will make this change and sorry again for misunderstanding your comment.
@@ -14,6 +14,7 @@ | |||
*/ | |||
package org.apache.geode.distributed.internal; | |||
|
|||
import static org.assertj.core.api.Assertions.assertThat; | |||
import static org.junit.Assert.assertEquals; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Convert to AssertJ.
I think I would like @upthewaterspout to take a look at this too for good measure. |
da3a03d
to
30038f4
Compare
30038f4
to
6477c6b
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please use collections based AssertJ methods.
...e-core/src/test/java/org/apache/geode/distributed/internal/LocatorLoadSnapshotJUnitTest.java
Show resolved
Hide resolved
...e-core/src/test/java/org/apache/geode/distributed/internal/LocatorLoadSnapshotJUnitTest.java
Outdated
Show resolved
Hide resolved
...e-core/src/test/java/org/apache/geode/distributed/internal/LocatorLoadSnapshotJUnitTest.java
Outdated
Show resolved
Hide resolved
...e-core/src/test/java/org/apache/geode/distributed/internal/LocatorLoadSnapshotJUnitTest.java
Outdated
Show resolved
Hide resolved
...e-core/src/test/java/org/apache/geode/distributed/internal/LocatorLoadSnapshotJUnitTest.java
Outdated
Show resolved
Hide resolved
The test case testMultiUser failed because Wan service is available in geode-core distributed tests, and therefore test now throws: org.apache.geode.internal.cache.wan.GatewaySenderConfigurationException : Locators must be configured before starting gateway-sender. instead of: java.lang.IllegalStateException: WAN service is not available.
This PR has been hanging for a long time now, and we should decide whether to close it or merge it. I think this PR adds value to Apache geode if we at least "synchronize" sending of CacheServerLoadMessage on all servers. Current 5 seconds possible difference is just too much. I think this could be done with the following simple not so smart algorithm:
@boglesby what do you think about this? |
This commit synchronizes the getting and sending of gateway-receiver load (CacheServerLoadMessage) on all servers.
Changes are implemented, and there is no reply for several months.
* GEODE-10056: Improve gateway-receiver load balance The problem is that servers send incorrect gateway-receiver connection load to locators within CacheServerLoadMessage. Additionally, locators do not refresh gateway-receivers load with the load received in CacheServerLoadMessage. The only time locator increments gateway-receiver load is after it receives ClientConnectionRequest{group=__recv_group...} and returns selected server in ClientConnectionResponse message. This is done only by coordinator, so that means that other locators will have load with initial values, since it is never updated. The solution is to correctly track gateway-receiver acceptor connection count and then based on it correctly calculate the load when sending CacheServerLoadMessage. Additionally each locator will read the load received from CacheServerLoadMessage and update load for gateway-receiver location id in group __recv__group accordingly. * Updates after the review * Fix for the flaky test cases * Updates after review * Empty commit to trigger test * Updates after review * Fix failed distributed test The test case testMultiUser failed because Wan service is available in geode-core distributed tests, and therefore test now throws: org.apache.geode.internal.cache.wan.GatewaySenderConfigurationException : Locators must be configured before starting gateway-sender. instead of: java.lang.IllegalStateException: WAN service is not available. * Synchronize handling of receiver load This commit synchronizes the getting and sending of gateway-receiver load (CacheServerLoadMessage) on all servers.
The problem is that servers send incorrect gateway-receiver connection
load to locators within CacheServerLoadMessage. Additionally, locators
do not refresh gateway-receivers load with the load received in
CacheServerLoadMessage. The only time locator increments
gateway-receiver load is after it receives
ClientConnectionRequest{group=__recv_group...} and returns selected
server in ClientConnectionResponse message. The client sends
a ClientConnectionRequest to the one locator from list received
in RemoteLocatorJoinResponse (initial list of locators) or
LocatorListRequest (periodically updated list of locators).
The received list is always sorted by the host address and port.
The client will send ClientConnectionRequest following the
sorted list of locators (from first to last) until a successful outcome.
That means that the same locator (first one in the list)
will handle all connection requests in normal conditions, and other
locators will not update their gateway-receivers connection load.
The solution is to track the gateway-receiver acceptor connection count
correctly and, based on it, accurately calculate the load
when sending CacheServerLoadMessage. Additionally, each locator will
read the load received from CacheServerLoadMessage and update the
gateway-receivers connection load in group __recv__group accordingly
For all changes:
Is there a JIRA ticket associated with this PR? Is it referenced in the commit message?
Has your PR been rebased against the latest commit within the target branch (typically
develop
)?Is your initial contribution a single, squashed commit?
Does
gradlew build
run cleanly?Have you written or updated unit tests to verify your changes?
If adding new dependencies to the code, are these dependencies licensed in a way that is compatible for inclusion under ASF 2.0?