-
Notifications
You must be signed in to change notification settings - Fork 3.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
HBASE-27710 ByteBuff ref counting is too expensive for on-heap buffers #5104
Conversation
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
🎊 +1 overall
This message was automatically generated. |
🎊 +1 overall
This message was automatically generated. |
🎊 +1 overall
This message was automatically generated. |
@Apache9 thank you very much for the review. I had an idea this morning, and wonder if you have any opinion. Currently we do this: protected void checkRefCount() {
ObjectUtil.checkPositive(refCnt(), REFERENCE_COUNT_NAME);
} Calling I think what we really care about is "has this buffer been recycled". In which case, what if we added a volatile boolean to our RefCnt class which gets set to true when the Recycler is called? We don't care about synchronization since it always goes from false to true. The above method could become: //
// in RefCnt.java
//
private volatile boolean recycled;
public boolean isRecycled() {
return recycled;
}
@Override
protected final void deallocate() {
this.recycler.free();
this.recycled = true; // of note
if (leak != null) {
this.leak.close(this);
}
}
//
// In ByteBuff.java
//
protected void checkRefCount() {
Preconditions.checkState(!refCnt.isRecycled(), "ByteBuff has been recycled");
} Of course we'd also rename the method. I plugged this into our test case and it performs similarly to this PR. The benefit of this approach is it might also speed up off-heap usages while still providing protection. |
What you proposed is the trick in netty's CompositeByteBuf, where they introduce a freed flag to indicate whether the ByteBuf is still valid. And for AbstractReferenceCountedByteBuf, the code is like this
But seems we do not have access to the updater field so I think we could go with your current approach. The down side is we will add one more boolean for each ByteBuff but should be OK? |
I downloaded So it's effectively like adding a long... I guess most ByteBufferAllocators are configured in the 10s of thousands, so not a huge issue there. RefCnt is also used in BucketCache where imagine this will only matter for very large bucket cache sizes? We give 75gb to bucket cache in some cases, which equals 2-5M blocks. That'd be 40mb of space for us, which might be worth the performance tradeoff. If someone uses TB of file cache (i.e. when using object store like s3 for main storage), then it might be a lot more. This solution is equivalent in performance to my original memory-free solution for on-heap, which is where we noticed the regression. The potential benefit is for off-heap, which I don't have performance numbers on. For my company's case, I'd be fine to add the Boolean. For the more general case, it might make sense to only add the boolean if we can back it up with benchmarks. In that case, it might make sense to do that in a separate jira so we can solve the specific regression here first. Let me know if that changes your opinion at all before I merge this as-is. |
Just commit it as is for now, can open another issue for improvement performance for off heap ByteBuff. |
Agreed, thanks for discussing and review. I had another idea that we could null out the recycler after calling it, so isAccessible would be a null check on that existing reference, rather than a new boolean field check. This would not take any additional memory. But can investigate these options in another issue. |
This reverts commit 9a727d1.
💔 -1 overall
This message was automatically generated. |
💔 -1 overall
This message was automatically generated. |
💔 -1 overall
This message was automatically generated. |
apache#5104) Signed-off-by: Duo Zhang <zhangduo@apache.org>
This felt like the cleanest way to solve this case, but open to other opinions. Whether we need to checkRefCount is directly tied to whether we use NONE recycler.
A couple other options I considered:
boolean onHeap
orboolean shouldCheckRefCount
. This felt more error prone because it's too easy for someone to forget to pass the correct boolean value for the corresponding recycler.I added a basic test to validate that we only call checkRefCount for non-NONE recyclers. Beyond that, I think our existing ample coverage should suffice? Let me know if you'd like to see a particular test.