New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
HBASE-25307 ThreadLocal pooling leads to NullPointerException #2685
Conversation
* PoolMap does not discard any elements anymore. If an element is put, it always stores it. The reason: it stores expensive resources (rpc connections) which would lead to resource leak if we simple discard it. RpcClients can reference netty ByteBufs which are reference counted. Resource cleanup is done by AbstractRpcClient.cleanupIdleConnections(). * Removed concurrency from PoolMap. It is called only from AbstractRpcClient in synchronized blocks, so it doesn't have to be thread-safe. * Max size is a hint for RoundRobinPool. Until maxSize is not reached, it will force users to create new resources. So it works the same as before, but it does not prevent putting more elements then maxSize. * ThreadLocalPool doesn't uses ThreadLocal class anymore. It stores resources on thread basis, but it doesn't remove values when a thread exits. Again, proper cleanup is done by cleanupIdleConnections().
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
hbase-client/src/main/java/org/apache/hadoop/hbase/util/PoolMap.java
Outdated
Show resolved
Hide resolved
hbase-client/src/main/java/org/apache/hadoop/hbase/util/PoolMap.java
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please make the connections member variable in AbstractRpcConnection private and include a comment above it that all access must be synchronized. (at least on glance it doesn’t look like anything outside the class needs the current protected access modifier.)
I was trying to figure out why the AbstractRpcConnection#close
method isn’t broken
Collection<T> connToClose;
synchronized (connections) {
if (!running) {
return;
}
running = false;
connToClose = connections.values();
connections.clear();
}
cleanupIdleConnectionTask.cancel(true);
for (T conn : connToClose) {
conn.shutdown();
}
closeInternal();
for (T conn : connToClose) {
conn.cleanupConnection();
}
the contact for Map
says that the returned collection from Map.values
should reflect changes in the map and vice versa. So after we store a reference to the connToClose
collection and then call connections.clear
that collection should be empty and the following attempts to shutdown and cleanup will be no-ops.
looking at the implementation of PoolMap#values
we make a copy of the values and any changes to the underlying map are definitely not reflected in the collection. This also explains why we’re not running into a problem with cleanupIdleConnections and cancelConnections calling remove on the PoolMap while iterating over the collection returned by PoolMap#values. (related: the fact that we aren’t using it as a general Map explains why PoolMap#remove(key) always returning null isn’t a problem, or why PoolMap.remove(key, value) and PoolMap.entrySet behave like a multimap)
Does PoolMap
actually need to implement java.util.Map
? PoolMap does not look general purpose and this deviation from the interface contract makes it harder to look for correctness problems. I’d like to see us define the expected behavior for the methods we actually use.
hbase-client/src/main/java/org/apache/hadoop/hbase/ipc/AbstractRpcClient.java
Outdated
Show resolved
Hide resolved
hbase-client/src/main/java/org/apache/hadoop/hbase/util/PoolMap.java
Outdated
Show resolved
Hide resolved
hbase-client/src/main/java/org/apache/hadoop/hbase/util/PoolMap.java
Outdated
Show resolved
Hide resolved
HBASE-25307 ThreadLocal pooling leads to NullPointerException * PoolMap does not discard any elements anymore. If an element is put, it always stores it. The reason: it stores expensive resources (rpc connections) which would lead to resource leak if we simple discard it. RpcClients can reference netty ByteBufs which are reference counted. Resource cleanup is done by AbstractRpcClient.cleanupIdleConnections(). * PoolMap does not implement Map interface anymore, so ensuring thread-safety become easier. Put method is replaced with getOrCreate(). * ThreadLocalPool doesn't uses ThreadLocal class anymore. It stores resources on thread basis, but it doesn't remove values when a thread exits. Again, proper cleanup is done by cleanupIdleConnections().
It is thread-safe again.
I interpreted this comment as we can do a bigger refactor. |
🎊 +1 overall
This message was automatically generated. |
🎊 +1 overall
This message was automatically generated. |
🎊 +1 overall
This message was automatically generated. |
hbase-client/src/main/java/org/apache/hadoop/hbase/util/PoolMap.java
Outdated
Show resolved
Hide resolved
hbase-client/src/main/java/org/apache/hadoop/hbase/util/PoolMap.java
Outdated
Show resolved
Hide resolved
in addition to the above, looks generally like we could prune some methods off of PoolMap? |
hbase-client/src/main/java/org/apache/hadoop/hbase/util/PoolMap.java
Outdated
Show resolved
Hide resolved
🎊 +1 overall
This message was automatically generated. |
🎊 +1 overall
This message was automatically generated. |
🎊 +1 overall
This message was automatically generated. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we add a test for TestThreadLocalPoolMap
that confirms if we make multiple requests for the same key in a single thread that we'll get back a single answer and not new ones?
If the test calls poolMap.getOrCreate
in a loop from the test method directly that should do it, I think?
hbase-client/src/test/java/org/apache/hadoop/hbase/util/TestThreadLocalPoolMap.java
Outdated
Show resolved
Hide resolved
hbase-client/src/test/java/org/apache/hadoop/hbase/util/TestRoundRobinPoolMap.java
Outdated
Show resolved
Hide resolved
hbase-client/src/test/java/org/apache/hadoop/hbase/util/TestThreadLocalPoolMap.java
Outdated
Show resolved
Hide resolved
🎊 +1 overall
This message was automatically generated. |
🎊 +1 overall
This message was automatically generated. |
🎊 +1 overall
This message was automatically generated. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looks great. one nit below, I think you're fine to delete the comment line and merge.
assertEquals(POOL_SIZE, poolMap.values().size()); | ||
|
||
// pool is filled, get() should return elements round robin order | ||
// starting from 1, because the first get was called by runThread() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: this second line is wrong. the first value is now 0 and not 1, and runThread doesn't exist.
// when the pool becomes full, we expect the value we get back to be | ||
// what we put earlier, in round-robin order | ||
runThread(randomKey, randomValue, randomValues.get((i - POOL_SIZE + 1) % POOL_SIZE)); | ||
public void testMultiThreadedRoundRobin() throws ExecutionException, InterruptedException { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is a great test.
🎊 +1 overall
This message was automatically generated. |
🎊 +1 overall
This message was automatically generated. |
🎊 +1 overall
This message was automatically generated. |
…#2685) * PoolMap does not discard any elements anymore. If an element is put, it always stores it. The reason: it stores expensive resources (rpc connections) which would lead to resource leak if we simple discard it. RpcClients can reference netty ByteBufs which are reference counted. Resource cleanup is done by AbstractRpcClient.cleanupIdleConnections(). * PoolMap does not implement Map interface anymore, so ensuring thread-safety has become easier. Put method is replaced with getOrCreate(). * ThreadLocalPool doesn't use ThreadLocal class anymore. It stores resources on thread basis, but it doesn't remove values when a thread exits. Again, proper cleanup is done by cleanupIdleConnections(). Signed-off-by: Sean Busbey <busbey@apache.org> Signed-off-by: Wellington Chevreuil <wellington.chevreuil@gmail.com>
…#2685) * PoolMap does not discard any elements anymore. If an element is put, it always stores it. The reason: it stores expensive resources (rpc connections) which would lead to resource leak if we simple discard it. RpcClients can reference netty ByteBufs which are reference counted. Resource cleanup is done by AbstractRpcClient.cleanupIdleConnections(). * PoolMap does not implement Map interface anymore, so ensuring thread-safety has become easier. Put method is replaced with getOrCreate(). * ThreadLocalPool doesn't use ThreadLocal class anymore. It stores resources on thread basis, but it doesn't remove values when a thread exits. Again, proper cleanup is done by cleanupIdleConnections(). Signed-off-by: Sean Busbey <busbey@apache.org> Signed-off-by: Wellington Chevreuil <wellington.chevreuil@gmail.com>
…#2685) * PoolMap does not discard any elements anymore. If an element is put, it always stores it. The reason: it stores expensive resources (rpc connections) which would lead to resource leak if we simple discard it. RpcClients can reference netty ByteBufs which are reference counted. Resource cleanup is done by AbstractRpcClient.cleanupIdleConnections(). * PoolMap does not implement Map interface anymore, so ensuring thread-safety has become easier. Put method is replaced with getOrCreate(). * ThreadLocalPool doesn't use ThreadLocal class anymore. It stores resources on thread basis, but it doesn't remove values when a thread exits. Again, proper cleanup is done by cleanupIdleConnections(). Signed-off-by: Sean Busbey <busbey@apache.org> Signed-off-by: Wellington Chevreuil <wellington.chevreuil@gmail.com>
it always stores it. The reason: it stores expensive resources (rpc
connections) which would lead to resource leak if we simple discard it.
RpcClients can reference netty ByteBufs which are reference counted.
Resource cleanup is done by AbstractRpcClient.cleanupIdleConnections().
AbstractRpcClient in synchronized blocks, so it doesn't have to be
thread-safe.
it will force users to create new resources. So it works the same as
before, but it does not prevent putting more elements then maxSize.
resources on thread basis, but it doesn't remove values when a thread
exits. Again, proper cleanup is done by cleanupIdleConnections().