HBASE-25698 Fixing IllegalReferenceCountException when using TinyLfuBlockCache #3215
Conversation
@@ -177,6 +177,7 @@ public Cacheable getBlock(BlockCacheKey cacheKey,
      }
    } else if (updateCacheMetrics) {
      stats.hit(caching, cacheKey.isPrimary(), cacheKey.getBlockType());
      value.retain();
We should follow the same approach as in the LRU cache, no?
LruCachedBlock cb = map.computeIfPresent(cacheKey, (key, val) -> {
// It will be referenced by RPC path, so increase here. NOTICE: Must do the retain inside
// this block. because if retain outside the map#computeIfPresent, the evictBlock may remove
// the block and release, then we're retaining a block with refCnt=0 which is disallowed.
// see HBASE-22422.
val.getBuffer().retain();
return val;
});
And other places
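For illustration, a minimal sketch (not the committed change) of what following that LruBlockCache pattern might look like inside TinyLfuBlockCache#getBlock, assuming the Caffeine cache field is named cache and eliding the rest of the real method (e.g. the victim-cache fallback):
// Hedged sketch only: retain() must happen inside computeIfPresent, because doing it after
// the lookup would race with evictBlock(), which removes the entry and then releases it,
// leaving us retaining a block whose refCnt is already 0 (see HBASE-22422).
Cacheable value = cache.asMap().computeIfPresent(cacheKey, (key, val) -> {
  val.retain();
  return val;
});
if (value == null) {
  if (!repeat && updateCacheMetrics) {
    stats.miss(caching, cacheKey.isPrimary(), cacheKey.getBlockType());
  }
} else if (updateCacheMetrics) {
  stats.hit(caching, cacheKey.isPrimary(), cacheKey.getBlockType());
}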
@@ -205,10 +206,11 @@ public void testReaderWithLRUBlockCache() throws Exception {
    lru.shutdown();
  }

  private BlockCache initCombinedBlockCache() {
  private BlockCache initCombinedBlockCache(final String l2Cache) {
What we pass is the L1 cache's policy name, so this arg name is not appropriate.
Force-pushed from 4dd7337 to 7e61c95
@@ -196,13 +202,17 @@ public void cacheBlock(BlockCacheKey key, Cacheable value) {
        key.getHfileName(), key.getOffset(), value.heapSize(), DEFAULT_MAX_BLOCK_SIZE));
      }
    } else {
      value.retain();
We might have to take a closer look here. Because reads from the FS may now go into shared buffers, we will have to handle closing them here. See the LRU cache. I will also do a detailed read.
We can make use of isSharedMem(), similar to how LRU does it?
private Cacheable asReferencedHeapBlock(Cacheable buf) {
if (buf instanceof HFileBlock) {
HFileBlock blk = ((HFileBlock) buf);
if (blk.isSharedMem()) {
return HFileBlock.deepCloneOnHeap(blk);
}
}
// The block will be referenced by this LRUBlockCache, so should increase its refCnt here.
return buf.retain();
}
And instead of directly retaining value here, we can call this method. That seems like the only thing we are missing?
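To make that suggestion concrete, a rough sketch (assumptions: the helper name mirrors LruBlockCache, the Caffeine cache field is named cache, and the oversize-block guard of the real cacheBlock is elided):
// Hedged sketch: convert the block before handing it to the cache instead of retaining it directly.
private Cacheable asReferencedHeapBlock(Cacheable buf) {
  if (buf instanceof HFileBlock) {
    HFileBlock blk = (HFileBlock) buf;
    if (blk.isSharedMem()) {
      // Shared (pooled) memory: deep clone on heap; the clone carries its own fresh refCnt.
      return HFileBlock.deepCloneOnHeap(blk);
    }
  }
  // Exclusive heap block: the cache becomes an additional referent, so bump the refCnt.
  return buf.retain();
}

@Override
public void cacheBlock(BlockCacheKey key, Cacheable value) {
  cache.put(key, asReferencedHeapBlock(value));
}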
Force-pushed from 33103d0 to 947c03c
if (hfb.getBlockType().isData()) {
  Assert.assertTrue(hfb.isSharedMem());
} else if (!isTinyLfu) {
  Assert.assertFalse(hfb.isSharedMem());
What is this check? Does it mean that in the TinyLFU case there is some difference compared to the other two cache types when it comes to non-data blocks?
Yes, that's what it looks like; I got to know from this test. Shall we keep this check, or do you think the different treatment of non-data blocks is an issue?
@@ -171,8 +177,10 @@ public Cacheable getBlock(BlockCacheKey cacheKey,
    if ((value != null) && caching) {
      if ((value instanceof HFileBlock) && ((HFileBlock) value).isSharedMem()) {
        value = HFileBlock.deepCloneOnHeap((HFileBlock) value);
        cacheBlockUtil(cacheKey, value, true);
I think you can simplify this; no such if/else is needed. You can avoid the deepCloneOnHeap and isSharedMem check here and just call cacheBlock from here. In cacheBlock, the newly added asReferencedHeapBlock() call does the deep clone as needed.
Without this if/else, we would perform deepCloneOnHeap() twice, right? Just above this line we do the deep clone, and if we then just call cacheBlock(), it will perform deepCloneOnHeap again, because the same condition as above holds true, i.e. if ((value instanceof HFileBlock) && ((HFileBlock) value).isSharedMem()).
Please refer to the code in LRUBlockCache. You can do the deep clone in asReferencedHeapBlock() only, based on isSharedMem, right? The retain() call is needed anyway.
You can do the deep clone in asReferencedHeapBlock() only, based on isSharedMem, right? The retain() call is needed anyway.
LRUBlockCache does not perform block.retain() if the block is cloned:
* 1. if cache the cloned heap block, its refCnt is an totally new one, it's easy to handle; <br>
* 2. if cache the original heap block, we're sure that it won't be tracked in ByteBuffAllocator's
* reservoir, if both RPC and LRUBlockCache release the block, then it can be garbage collected by
* JVM, so need a retain here.
private Cacheable asReferencedHeapBlock(Cacheable buf) {
if (buf instanceof HFileBlock) {
HFileBlock blk = ((HFileBlock) buf);
if (blk.isSharedMem()) {
return HFileBlock.deepCloneOnHeap(blk);
}
}
// The block will be referenced by this LRUBlockCache, so should increase its refCnt here.
return buf.retain();
}
@anoopsjohn This is simplified now.
  cacheBlockUtil(key, value, false);
}

private void cacheBlockUtil(BlockCacheKey key, Cacheable value, boolean deepClonedOnHeap) {
These can be in cacheBlock() itself, as per the previous comment.
Done
Left two questions, regarding the deep clone on heap as well as the treatment of non-data blocks by TinyLfu.
  // the block and release, then we're retaining a block with refCnt=0 which is disallowed.
  cacheable.retain();
  return cacheable;
});
Smile. Related. I've been studying this bit of code here... the flip to computeIfPresent. #get in CHM is lockless, but computeIfPresent takes a lock on the bucket (of keys), so it seems to slow us down, especially under high read load with most data in cache; e.g. meta hits.
Hmm, yeah, locks would slow us down. On the other hand, based on the discussion on HBASE-22422, it seems computeIfPresent (locking) is necessary to prevent concurrency issues with #retain and #release.
Based on @openinx's comment there, I'm wondering if the sawtooth graph of QPS is a similar concurrency issue and not resolved yet.
@saintstack Any suggestions? Have you been using the offheap read path with LRU recently?
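To make the race being discussed concrete, a tiny standalone demo of the failure mode, using plain io.netty.util types as a stand-in for HBase's shaded netty and its ref-counted Cacheable (class and variable names here are purely illustrative):
import io.netty.util.AbstractReferenceCounted;
import io.netty.util.IllegalReferenceCountException;
import io.netty.util.ReferenceCounted;

public class RefCntRaceDemo {
  // Stand-in for a cached block backed by a ref-counted buffer.
  static final class Block extends AbstractReferenceCounted {
    @Override protected void deallocate() { System.out.println("buffer returned to pool"); }
    @Override public ReferenceCounted touch(Object hint) { return this; }
  }

  public static void main(String[] args) {
    Block block = new Block();   // refCnt = 1, held by the cache
    block.release();             // eviction releases it -> refCnt = 0, deallocate() runs
    try {
      block.retain();            // the RPC read path retains after the eviction...
    } catch (IllegalReferenceCountException e) {
      // ...and hits exactly the exception this PR is about; a fix must either make the
      // lookup+retain atomic (computeIfPresent) or catch this and treat it as a miss.
      System.out.println("stale block: " + e);
    }
  }
}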
Thanks @virajjasani ... I just read over HBASE-22422. Nice work by @openinx. Agree we need locking, but currently the lock spans more than the buffer... covering the CHM bucket. We might be able to scope the lock to the buffer... but I'm not sure, and a little afraid to touch the code here.
On the sawtooth, I've not looked.
Not using offheap. All onheap, but in async-profiler, CHM#computeIfPresent is the main blocking point by far (trying to drive up throughput with in-memory meta lookups).
On this patch generally, +1. I'd noticed, messing w/ this stuff, that tinylfu was missing this bit of code... I'd also noticed that the locking profile with tinylfu in place looked much better... I'd failed to put the two bits of info together. My sense is that once this goes in, tinylfu will be the main blocker... just like CHM is now.
Oddly, for me, tinylfu seemed to run slower in my compares, which I didn't expect given the nice batching it does and its WAL scheme. It's probably variance in my test rig. Working on it...
I suspect that you might be able to switch to an optimistic read, where you validate that the cacheable.retain() was successful. My naive inspection of the code is that it will throw an IllegalReferenceCountException, which could be caught and returned as a cache miss. Because it is decremented after the entry was removed from the mapping (in evictBlock, though I don't see a release in the removal listener), the stale read should be fine, as a subsequent read/write would not observe that discarded entry.
In YCSB benchmarks, TinyLfuBlockCache is a little bit slower because Caffeine adds a small overhead on reads to get O(1) evictions with a higher hit rate. In comparison, the LruBlockCache does no policy work on read and performs an O(n lg n) operation on eviction. Due to using an SLRU policy, the Zipfian workload has the same optimal hit rate, causing only the small policy cost on reads to be exposed.
In a realistic workload, however, one would expect the TinyLfu policy to have an improved hit rate, which will reduce latencies (more requests served from cache, reducing I/O and CPU load). The O(1) eviction should be helpful for large caches when the system is under high load, as it avoids a spike of work and amortizes the cost.
Thanks for showing up @ben-manes .... Let me look at your suggestion... Its promising.
On tinylfu being 'slower', I'm trying to repro a production workload... What I have is messy at the moment still in need of work. My test compares were coarse. I didn't want to say too much more until I had solidified the test rig.... (as noted above).
Thanks @saintstack. This was from my analysis when contributing the original patch.
Zipfian is wonderful for a perf benchmark by stressing locks, etc. to find bottlenecks, but isn't realistic for actual production performance. I'm not sure if there is a great approach other than network record/replay or canarying.
If you have a workload trace we can try to simulate that, where the hit rates should be better (e.g. a database trace). That wouldn't show actual system behavior, just the cache's expected hit rates in isolation. HBase's LRU is similar-ish to SLRU, so ARC might be a good upper bound of expectations.
Between the Zipfian benchmark and trace simulations, we can get a roughish idea of whether there is a benefit. Otherwise canarying is the best that I've seen so far, which is a bit heavy-handed but simple.
I am sure many of the perf regressions in read requests reported for HBase 2 (compared to HBase 1) on the dev/user mailing lists might be related to using CHM#computeIfPresent for every onheap and offheap cache hit. Yes, refCount makes the code look better, but on the other hand we have perf issues.
I believe we should think about this and see if we really need netty-based refCount, or at least continue using CHM#get and ride over IllegalReferenceCountException by swallowing it and evicting the block (I believe that's what @ben-manes's suggestion is). And the final decision should be applicable to all L1 cache strategies: SLRU, TinyLfu, AdaptiveLRU.
Otherwise BlockCache will have clear perf issues in HBase 2 vs HBase 1.
@saintstack @anoopsjohn @ben-manes How about this one? I am yet to benchmark this and perform chaos testing with this, but before I do it, just wanted to see if you are aligned with this rough patch.
diff --git a/hbase-server/src/main/java/org/apache/hadoop/hbase/io/hfile/LruBlockCache.java b/hbase-server/src/main/java/org/apache/hadoop/hbase/io/hfile/LruBlockCache.java
index 3e5ba1d19c..bb2b394ccd 100644
--- a/hbase-server/src/main/java/org/apache/hadoop/hbase/io/hfile/LruBlockCache.java
+++ b/hbase-server/src/main/java/org/apache/hadoop/hbase/io/hfile/LruBlockCache.java
@@ -39,6 +39,7 @@ import org.apache.hadoop.hbase.io.HeapSize;
import org.apache.hadoop.hbase.io.encoding.DataBlockEncoding;
import org.apache.hadoop.hbase.util.ClassSize;
import org.apache.hadoop.util.StringUtils;
+import org.apache.hbase.thirdparty.io.netty.util.IllegalReferenceCountException;
import org.apache.yetus.audience.InterfaceAudience;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
@@ -510,14 +511,15 @@ public class LruBlockCache implements FirstLevelBlockCache {
@Override
public Cacheable getBlock(BlockCacheKey cacheKey, boolean caching, boolean repeat,
boolean updateCacheMetrics) {
- LruCachedBlock cb = map.computeIfPresent(cacheKey, (key, val) -> {
- // It will be referenced by RPC path, so increase here. NOTICE: Must do the retain inside
- // this block. because if retain outside the map#computeIfPresent, the evictBlock may remove
- // the block and release, then we're retaining a block with refCnt=0 which is disallowed.
- // see HBASE-22422.
- val.getBuffer().retain();
- return val;
- });
+ LruCachedBlock cb = map.get(cacheKey);
+ if (cb != null) {
+ try {
+ cb.getBuffer().retain();
+ } catch (IllegalReferenceCountException e) {
+ // map.remove(cacheKey); ==> not required here
+ cb = null;
+ }
+ }
if (cb == null) {
if (!repeat && updateCacheMetrics) {
stats.miss(caching, cacheKey.isPrimary(), cacheKey.getBlockType());
And this perf improvement is to be followed for all L1 caches, something we can take up as a follow-up task.
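As a hedged sketch of what that follow-up might look like for the Caffeine-backed TinyLfuBlockCache (assumptions: a cache field of type Cache<BlockCacheKey, Cacheable> and the same shaded IllegalReferenceCountException import as in the patch above):
// Hedged sketch only: lockless lookup plus optimistic retain, mirroring the LruBlockCache
// patch above. getIfPresent takes no bucket lock; if a concurrent evictBlock has already
// released the block down to refCnt=0, retain() throws and the entry is treated as a miss.
Cacheable value = cache.getIfPresent(cacheKey);
if (value != null) {
  try {
    value.retain();
  } catch (IllegalReferenceCountException e) {
    // The block was evicted and released between the lookup and the retain.
    value = null;
  }
}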
I would switch to map.remove(cacheKey, cb) so that a race doesn't discard a new mapping. If my naive reading is correct, this map.remove(cacheKey) would already occur after the cb.release(), so this may not be necessary. That could mean that a new block was computed, so this remove discards the wrong block mistakenly. You might not need the map removal here if you can rely on the release being performed after the map operation completed.
hbase/hbase-server/src/main/java/org/apache/hadoop/hbase/io/hfile/TinyLfuBlockCache.java
Lines 246 to 252 in 947c03c
public boolean evictBlock(BlockCacheKey cacheKey) {
  Cacheable value = cache.asMap().remove(cacheKey);
  if (value != null) {
    value.release();
  }
  return (value != null);
}
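For reference only, the caller-side variant being described (had the removal been kept in the LruBlockCache patch above) could look like this; remove(key, value) is a plain ConcurrentMap operation:
// Hedged sketch: remove(key, value) unmaps the entry only if it still points at the exact
// block we failed to retain, so a block freshly re-cached under the same key is never
// discarded by mistake. Per the discussion, this removal isn't actually required, because
// evictBlock releases the block only after the map removal has completed.
LruCachedBlock cb = map.get(cacheKey);
if (cb != null) {
  try {
    cb.getBuffer().retain();
  } catch (IllegalReferenceCountException e) {
    map.remove(cacheKey, cb);
    cb = null;
  }
}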
Sounds good, map.remove(cacheKey, cb) too should not be required in this case. Thanks @ben-manes
Patch LGTM. I defer to the experts though (@anoopsjohn). I think it needs a release note to say that this will slow down tinylfu, as I understand it. Will open an issue to address it once we have an idea of how to address the perf (this is secondary to fixing the refcounting exceptions).
refCount is not used for just the L1 cache, right? refCount is used only if we use the offheap BucketCache? And if this is the case, perhaps we could use CHM#get for the L1 onheap cache and CHM#computeIfPresent (as is) for the L1+L2 cache? WDYT @anoopsjohn @saintstack ?
Ok, looks like refCount was introduced with ByteBuff only; checking HBASE-22005.
Ref count is used in all cases now. It has become a part of the ByteBuff which holds the block data. Yes, this was added so as to unify the code. We needed refCount for the offheap BC. Later we added a way to read into our pooled DBBs when we read from HDFS, so there is ref counting there also. But then we suffer on perf! Let's take a step back and see what we can do.
Force-pushed from d58bd92 to f1ebf9c
Force-pushed from f1ebf9c to a3cbc18
…lockCache (#3215) Signed-off-by: Andrew Purtell <apurtell@apache.org> Signed-off-by: Anoop Sam John <anoopsamjohn@apache.org> Signed-off-by: Michael Stack <stack@apache.org>