HBASE-28469: Integration of time-based priority caching into compaction paths #5866
Conversation
💔 -1 overall
This message was automatically generated.
@@ -555,16 +563,33 @@ private void writeInlineBlocks(boolean closing) throws IOException {
  private void doCacheOnWrite(long offset) {
    cacheConf.getBlockCache().ifPresent(cache -> {
      HFileBlock cacheFormatBlock = blockWriter.getBlockForCaching(cacheConf);
      BlockCacheKey key = buildBlockCacheKey(offset, cacheFormatBlock);
      if (!shouldCacheBlock(cache, key)) {
I see that shouldCacheBlock internally calls isHotData(key), which would go all the way to fetch the file metadata and decide whether the file is hot or not. Would this be called per block? If so, would it impact performance adversely? Can we somehow restrict these traversals to once per file instead of once per block?
> which would go all the way to fetch the file metadata and decide if the file is hot or not.

Yes, it will look up the Configuration corresponding to the store this file belongs to from the map. However, it won't read any file-related metadata, since we are already passing the maxTimestamp here.

> Would this be called per block?

Yes, it is called for every block.

> would it impact the performance adversely?

Yes, I think it will add some performance overhead.

> Can we somehow restrict these traversals to once per file instead of once per block.

I couldn't think of a better solution at the moment.
Yeah, the work we do inside DataTieringManager to figure out the store config is a problem: the way it is implemented here, we will be doing it for every block, even though the config is exactly the same. Worse, we already have that info passed to HFileWriterImpl. StoreFileWriter already has the HStore config, and this is passed along to HFileWriterImpl at initialization time. We should keep this config as a global variable in HFileWriterImpl. Then we should change the added shouldCacheBlock method in the BlockCache API to accept the config as a parameter, or maybe wrap it with the key itself.
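A rough sketch of this suggestion, using simplified stand-in classes (these are not the real HBase signatures; `TieringConfigSketch` loosely plays the role of HFileWriterImpl, and the map plays the role of DataTieringManager's per-store lookup): resolve the store's tiering config once when the writer is constructed, then reuse it for every block, so the lookup cost becomes per file rather than per block.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative only: not the actual HBase classes or signatures.
public class TieringConfigSketch {
  // Stand-in for DataTieringManager's per-store Configuration map.
  static final Map<String, Long> HOT_DATA_AGE_BY_STORE = new HashMap<>();
  static int lookups = 0; // counts the "expensive" per-store lookups

  static long lookupHotDataAge(String store) {
    lookups++;
    return HOT_DATA_AGE_BY_STORE.getOrDefault(store, Long.MAX_VALUE);
  }

  // Once the age is in hand, the hot/cold decision is a cheap comparison.
  static boolean shouldCacheBlock(long maxTimestamp, long hotDataAge, long now) {
    return now - maxTimestamp < hotDataAge;
  }

  public static void main(String[] args) {
    HOT_DATA_AGE_BY_STORE.put("store1", 86_400_000L); // hot-data age: 1 day in ms
    long now = 1_700_000_000_000L;
    long hotDataAge = lookupHotDataAge("store1");     // resolved once per file
    int cached = 0;
    for (long ts : new long[] { now - 1_000L, now - 200_000_000_000L }) {
      if (shouldCacheBlock(ts, hotDataAge, now)) {    // reused for every block
        cached++;
      }
    }
    System.out.println(lookups + " lookup(s), " + cached + " block(s) cached");
  }
}
```

The point of the sketch is only the shape of the data flow: one lookup at construction time, however many blocks follow.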
All we need is the hot-data age now; the max timestamp is already present. If we make the hot-data age available to shouldCacheBlock, the round trips to get the config can be avoided within DataTieringManager.
Let's minimize the number of line changes. We don't need all this refactoring moving TimeRangeTracker update logic from StoreFileWriter to HFileWriterImpl if we just pass the same TimeRangeTracker instance to HFileWriterImpl.
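The shared-instance idea can be sketched like this (SimpleTimeRangeTracker and the two writer classes below are simplified stand-ins, not the real HBase API): both writers hold the same tracker object, so updates made by one are visible to the other and no logic has to move between classes.

```java
// Simplified stand-in for org.apache.hadoop.hbase.regionserver.TimeRangeTracker;
// the real class has a richer API.
final class SimpleTimeRangeTracker {
  private long max = Long.MIN_VALUE;
  void includeTimestamp(long ts) { if (ts > max) max = ts; }
  long getMax() { return max; }
}

// Stand-in for StoreFileWriter: owns the tracker and shares it with the
// inner writer instead of copying update logic into it.
final class StoreFileWriterSketch {
  final SimpleTimeRangeTracker tracker = new SimpleTimeRangeTracker();
  final HFileWriterSketch inner = new HFileWriterSketch(tracker); // same instance
  void append(long cellTimestamp) { tracker.includeTimestamp(cellTimestamp); }
}

// Stand-in for HFileWriterImpl: reads the shared tracker for tiering decisions.
final class HFileWriterSketch {
  final SimpleTimeRangeTracker tracker;
  HFileWriterSketch(SimpleTimeRangeTracker t) { tracker = t; }
  long maxTimestampForTiering() { return tracker.getMax(); }
}

public class SharedTrackerDemo {
  public static void main(String[] args) {
    StoreFileWriterSketch w = new StoreFileWriterSketch();
    w.append(100L);
    w.append(250L);
    System.out.println(w.inner.maxTimestampForTiering()); // prints 250
  }
}
```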
hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/DataTieringManager.java
LGTM, +1.
@jhungund do you have any further concerns?
public Optional<Boolean> shouldCacheBlock(BlockCacheKey key, Configuration conf) {
  try {
    DataTieringManager dataTieringManager = DataTieringManager.getInstance();
    if (dataTieringManager != null && !dataTieringManager.isHotData(key, conf)) {
@vinayakphegde, @wchevreuil,
We are only transiently using the timestamp within BlockCacheKey in this API, yet we keep this value permanently in the BlockCacheKey, increasing its size and storing the timestamp redundantly when it will never be used again. This overhead can be avoided by passing the required data as arguments to shouldCacheBlock.
All we need is a function of this shape:
shouldCacheBlock(long timestamp, long age /* or conf */)
Wouldn't it make sense to not increase the size of BlockCacheKey permanently?
All we need is a call to
hotDataValidator(maxTimestamp, getDataTieringHotDataAge(configuration));
to decide whether to cache the block or not. We already have the maxTimestamp and age handy at the caller of shouldCacheBlock.
I agree, it will increase the size of each BlockCacheKey by 8 bytes. Assuming we have 10 million BlockCacheKeys on average on each region server (I think this is what @wchevreuil said the other day), this would increase the heap size by approximately 76 MB.
However, I believe the proposed solution is not correct. The shouldCacheBlock function should not process information related to data tiering; this responsibility should be delegated to the DataTieringManager. Additionally, we should not assume that the DataTieringType of the block is TIME_RANGE, as it could be different in the future.
The general idea behind storing any data in memory (as BlockCacheKey will reside in memory) is to use that data in the future. Here, we are increasing the size of BlockCacheKey, transiently using the newly added member for making caching decisions, but then keeping the data in the BlockCacheKey without ever using it again. Additionally, this increase in size takes effect even if the feature is not enabled, since we have changed the class layout.
Hence, the general idea of the new proposal is to not store metadata in BlockCacheKey that will never be used later. Instead, store it on the stack (stack variable, function argument) so that the variable goes away after it has been used.
You can design the APIs accordingly.
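To make the stack-versus-object point concrete, here is a hedged sketch (BlockCacheKeySlim and this shouldCacheBlock overload are hypothetical, not the real HBase classes): the tiering inputs arrive as method arguments and disappear with the stack frame, so the long-lived key object never grows.

```java
// Hypothetical illustration: the key carries only identity, while the tiering
// metadata (maxTimestamp, hotDataAge) is passed on the stack and discarded.
final class BlockCacheKeySlim {
  final String hfileName;
  final long offset;
  BlockCacheKeySlim(String hfileName, long offset) {
    this.hfileName = hfileName;
    this.offset = offset;
  }
}

public class StackPassedMetadata {
  // The decision uses the metadata transiently; nothing is retained afterwards.
  static boolean shouldCacheBlock(BlockCacheKeySlim key, long maxTimestamp,
      long hotDataAge, long now) {
    return now - maxTimestamp < hotDataAge;
  }

  public static void main(String[] args) {
    BlockCacheKeySlim key = new BlockCacheKeySlim("f1", 0L);
    boolean hot = shouldCacheBlock(key, 900L, 200L, 1000L); // age 100 < 200
    System.out.println(hot);
  }
}
```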
Yeah, I understand the concern. The only thing I am worried about is how we can achieve it. Let me think about a way to achieve this without compromising on OO principles.
On a file-based bucket cache over ephemeral storage of 1.6TB capacity with a typical block size of 64KB, we may get up to 26M blocks. With the 4 extra bytes this adds per block, we are talking about 100MB of extra heap usage, which accounts for 0.3% of a 32GB heap on a typical RS deployment, so I guess we can relax about this extra variable.
WDYT, @jhungund ?
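The figures quoted in this thread can be sanity-checked with quick arithmetic (the estimates are the reviewers' own; reading 1.6TB as 1.6 TiB and 64KB as 64 KiB is an assumption made here to reproduce the ~26M block count):

```java
public class HeapOverheadEstimate {
  public static void main(String[] args) {
    // ~10M BlockCacheKeys with 8 extra bytes each -> ~76 MiB
    long extraMiB = 10_000_000L * 8 / (1024 * 1024);

    // 1.6 TiB cache / 64 KiB blocks -> ~26.8M blocks
    long blocks = (1L << 40) * 8 / 5 / (64 * 1024);

    // 4 extra bytes per block -> ~107 MB
    long extraMB = blocks * 4 / 1_000_000;

    // share of a 32 GiB heap -> ~0.3%
    double pct = blocks * 4 * 100.0 / (32L << 30);

    System.out.printf("%d MiB, %d blocks, %d MB, %.1f%%%n",
        extraMiB, blocks, extraMB, pct);
  }
}
```

Both back-of-envelope results land in the same ballpark as the numbers discussed: tens of megabytes either way, a fraction of a percent of a typical region-server heap.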
Hi @wchevreuil, in my humble opinion, we usually code with the following order of priorities:
Functional correctness > Performance/Resource utilisation > Coding guidelines/principles
We cannot let go of functional correctness for the sake of performance or coding principles. Similarly, we may not want to relax performance/resource utilisation for the sake of adhering to coding principles.
Here we are repurposing an object that stays in memory forever as an information carrier for our utility function (isHotData). Once the utility function finishes, we will never use that data again.
Can we have a utility defined in DataTieringManager that takes timestamps and configuration objects and makes the decision for us?
Sorry, I do not want to hold up this change by being inflexible; I am just trying to explain that we can achieve the same result by relaxing the OO coding principles instead of relaxing memory utilisation.
@vinayakphegde and I will spend one more day and try to come up with a solution. If we can't, we can go ahead and submit this change.
Thanks,
Janardhan
DataTieringManager dataTieringManager = DataTieringManager.getInstance();
if (dataTieringManager != null && !dataTieringManager.isHotData(timeRangeTracker, conf)) {
  LOG.debug("Data tiering is enabled for file: '{}' and it is not hot data",
    key.getHfileName());
nit: Do we need this "key" parameter only for logging the details? If so, we could move this debug to the caller function and avoid passing this parameter to this API.
Currently, it's only used for logging purposes. The caller function, shouldCacheBlock(), isn't aware of data tiering, so I couldn't add that line there. Hence, I kept it here. Additionally, it makes sense to pass the corresponding BlockCacheKey to shouldCacheBlock() to make the decision.
However, I'm okay with removing that parameter.
OK, got it. I am fine with the key parameter being there, considering the API name is shouldCacheBlock().
@jhungund, would you have any further concerns with this PR? Otherwise I'll merge this into the feature branch.
LGTM
Hi @vinayakphegde, I had rebased master into the feature branch, and now we have some conflicts within this PR. Could you resolve them? I'll then merge the PR into the feature branch.
b661e3f to 18f1011
18f1011 to 9ea998d
…on paths (apache#5866)

Signed-off-by: Wellington Chevreuil <wchevreuil@apache.org>
Reviewed-by: Janardhan Hugund <janardhan.hungund@cloudera.com>
Change-Id: I3ca919aed7f1e56ac96b8ade8033d36290be6927
This PR introduces time-based priority caching for the hbase.rs.cachecompactedblocksonwrite configuration. This feature enhances block caching efficiency by prioritizing blocks based on their timestamps during the writing of compacted files.