-
Notifications
You must be signed in to change notification settings - Fork 3.4k
HBASE-22422 Retain an ByteBuff with refCnt=0 when getBlock from LRUCache #242
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
hbase-server/src/main/java/org/apache/hadoop/hbase/io/hfile/HFileReaderImpl.java
Show resolved
Hide resolved
| // Decrease the block's reference count, and if refCount is 0, then it'll auto-deallocate. DO | ||
| // NOT move this up because if do that then the victimHandler may access the buffer with | ||
| // refCnt = 0 which is disallowed. | ||
| previous.getBuffer().release(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So this is the problem. Mind explaining more? Why in victimHandler we will access the previous? And is it possible to add a UT?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why in victimHandler we will access the previous? And is it possible to add a UT?
For InclusiveCombinedBlockCache , we will move the evicted block from LRUCache to an larger L2 cache (such as MemcachedBlockCache for longer caching I think), So if release in line#596, then victimHandler will cache an block which point to an unknown area because its memory has been free.
Yeah, will provide a UT.
|
💔 -1 overall
This message was automatically generated. |
|
The patch still not fix the refCnt=0 retain issue, because I've applied this patch to my local branch , and deploy to my test cluster. After run some days, the QPS still dropped from 300000 Get/second to 200 Get/second. Need to find out why. |
|
So even after this patch if QPS drops - you stlil get the exception as attached in the original JIRA description? |
|
bq. So even after this patch if QPS drops - you stlil get the exception as attached in the original JIRA description? |
|
Update the patch with the final fix and UT, FYI @Apache9 @ramkrish86 @anoopsjohn |
|
Seems no hadoop QA feedback ? It's strange... |
| // Must initialize it with null here, because if don't and once an exception happen in | ||
| // readBlock, then we'll release the previous assigned block twice in the finally block. | ||
| // (See HBASE-22422) | ||
| block = null; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch.
| BlockCacheKey cacheKey, Cacheable newBlock) { | ||
| // NOTICE: The getBlock has retained the existingBlock inside. | ||
| Cacheable existingBlock = blockCache.getBlock(cacheKey, false, false, false); | ||
| if (existingBlock == null) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, this means we may get NPE in the past?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, see the LruBlockCache#cacheBlock:
LruCachedBlock cb = map.get(cacheKey);
if (cb != null && !BlockCacheUtil.shouldReplaceExistingCacheBlock(this, cacheKey, buf)) {
return;
}
The existence pre-check and accessing block in shouldReplaceExistingCacheBlock is not atomic op. so if any eviction happen between them, the NPE will happen. I added a UT testMultiThreadGetAndEvictBlock to address this.
| return absent.get() ? null : re; | ||
| } | ||
|
|
||
| public boolean remove(BlockCacheKey key) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
While removing the atomicity is not needed? Because we do computeIfAbsent and that is already now atomically guarded?
| absent.set(true); | ||
| return entry; | ||
| }); | ||
| return absent.get() ? null : re; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I got it now. You have changed the get() to computeIfPresent(). So if the remove has already removed the entry then the get() cannot do a retain(). So a similar change is also needed for putIfAbsent() also? Rest looks good to me.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, if don't make the put and retain in atomic , then if remove & release happen between the put and retain, finally we 're retain a block with refCnt=0 which is also disallowed. Thanks.
No description provided.