-
Notifications
You must be signed in to change notification settings - Fork 470
Optimized logic for getting a random TabletServer connection #4309
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Optimized logic for getting a random TabletServer connection #4309
Conversation
The previous logic in this class would gather all of the Tserver ZNodes in ZooKeeper, then get the data for each ZNode and validate their ServiceLock. Then, after all of that it would randomly pick one of the TabletServers to connect to. It did this through the ZooCache object which on an initial connection would be empty and causes a lot of back and forth to ZooKeeper. The side effect of this is that the ZooCache would be populated with TabletServer information. This change modifies the ClientContext to populate the ZooCache in a single-shot background task and modifies the default logic for getting a connection to a TabletServer. The new logic will make 2 calls to ZooKeeper in the best case scenario, one to get the list of TServer ZNodes in Zookeeper and another to get the ZNode data for one of them. Fixes apache#4303
keith-turner
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
With the changes in this PR this method may only be called by test code now, so may be able to clean up and adjust the test code.
core/src/main/java/org/apache/accumulo/core/rpc/clients/TServerClient.java
Outdated
Show resolved
Hide resolved
core/src/main/java/org/apache/accumulo/core/clientImpl/ClientContext.java
Outdated
Show resolved
Hide resolved
core/src/main/java/org/apache/accumulo/core/rpc/clients/TServerClient.java
Outdated
Show resolved
Hide resolved
| byte[] data = zc.getLockData(zLocPath); | ||
| if (data != null) { | ||
| String strData = new String(data, UTF_8); | ||
| if (!strData.equals("manager")) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a pre-existing problem unrelated to this PR. Is there a constant we could use here instead of the literal "manager"? I took a quick look around and did not find anything. I can open an issue about this if needed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We probably need an issue for this then.
core/src/main/java/org/apache/accumulo/core/rpc/clients/TServerClient.java
Outdated
Show resolved
Hide resolved
I got rid of the method in aecd790 |
ctubbsii
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm still in process of reviewing this, but based on the description, I'm not sure it needs to populate ZK in a single shot background process. It should only populate it as-needed, on access. But, I do think there's value in changing the logic, so it doesn't require accessing ZK so much to pick a random server. It should be able to pick a random potential server first, check to see if it has a lock and other checks (basically by doing the same thing that the pre-populated retrieval was doing), and then pick a different one if that first random one is not acceptable. That will still eventually populate the cache, on access, but not waste time pre-populating it before randomly picking one, since there's a high probability that the first one chosen will be acceptable.
I will have to finish reviewing this PR to see if the changes here implement that behavior, and if not, how it differs from what I described. If it differs, I'd be curious to know in what way, and whether that is better than what I described.
core/src/main/java/org/apache/accumulo/core/clientImpl/ThriftTransportKey.java
Show resolved
Hide resolved
core/src/main/java/org/apache/accumulo/core/rpc/clients/TServerClient.java
Show resolved
Hide resolved
core/src/main/java/org/apache/accumulo/core/rpc/clients/TServerClient.java
Outdated
Show resolved
Hide resolved
core/src/test/java/org/apache/accumulo/core/clientImpl/ThriftTransportKeyTest.java
Show resolved
Hide resolved
core/src/main/java/org/apache/accumulo/core/rpc/clients/TServerClient.java
Outdated
Show resolved
Hide resolved
|
Full IT build based off commit aecd790 passed successfully. |
The previous logic in this class would gather all of the Tserver ZNodes in ZooKeeper, then get the data for each ZNode and validate their ServiceLock. Then, after all of that it would randomly pick one of the TabletServers to connect to. It did this through the ZooCache object which on an initial connection would be empty and causes a lot of back and forth to ZooKeeper. The side effect of this is that the ZooCache would be populated with TabletServer information.
This change modifies the ClientContext to populate the ZooCache in a single-shot background task and modifies the default logic for getting a connection to a TabletServer. The new logic will make 2 calls to ZooKeeper in the best case scenario, one to get the list of TServer ZNodes in Zookeeper and another to get the ZNode data for one of them.
Fixes #4303