HBASE-26316 Per-table or per-CF compression codec setting overrides #3730
Conversation
Confirmed functionality with a single-host cluster:
Loaded one WARC from Common Crawl. Major compaction takes 11 seconds.
Major compaction now takes 42 seconds. Total size on disk reduced by 11.6% vs. level 1.
Major compaction now takes 17 minutes 15 seconds. Total size on disk reduced by 13.3% vs. level 1. (Sure, this level is crazy in practice.)
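For context, a minimal sketch (not part of this PR, and the class name is hypothetical) of the kind of per-column-family override this change enables, assuming the ZSTD codec reads its level from the column family configuration under a key such as hbase.io.compress.zstd.level:

import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.ColumnFamilyDescriptorBuilder;
import org.apache.hadoop.hbase.client.TableDescriptor;
import org.apache.hadoop.hbase.client.TableDescriptorBuilder;
import org.apache.hadoop.hbase.io.compress.Compression;
import org.apache.hadoop.hbase.util.Bytes;

public class PerCfZstdLevelExample {
  // Builds a descriptor whose single column family compresses with ZSTD at a CF-specific level.
  static TableDescriptor build() {
    return TableDescriptorBuilder.newBuilder(TableName.valueOf("warc"))
      .setColumnFamily(ColumnFamilyDescriptorBuilder.newBuilder(Bytes.toBytes("c"))
        .setCompressionType(Compression.Algorithm.ZSTD)
        // Per-CF override; without this change, only the global site configuration applies.
        .setConfiguration("hbase.io.compress.zstd.level", "6")
        .build())
      .build();
  }
}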
3184470 to c7e6996: Rebase and update commit message.
The test failures are not related and appear environmental in nature.
I ran the tests locally with this change applied and they passed. TestQuotaAdmin seems like a resource-intensive unit (it has a running time of 86 seconds on my MacBook) and maybe we should break it up.
Rebase. Fix checkstyle warnings.
More unstable tests, but they do not seem related to this change: one error and three flakes.
Test failures are not related.
@virajjasani @Apache9 This PR is blocking #3748, which has been approved for merge. I refactored the changes of #3748 and this PR to make #3748 smaller and to collect all of the plumbing changes here. There might be another round of checkstyle fixes needed. Will apply them as soon as the precommit report is available.
This will cause a small merge conflict between apache#3730 and apache#3748 because we need CanReinit here too.
Rebase
Left a few nits, +1 overall
  @Override
- public HFileBlockDecodingContext newDataBlockDecodingContext(HFileContext meta) {
-   return new HFileBlockDefaultDecodingContext(meta);
+ public HFileBlockDecodingContext newDataBlockDecodingContext(Configuration conf, HFileContext meta) {
nit: checkstyle line-length might complain
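One possible way to address the nit (a sketch only, not the actual diff; it assumes HFileBlockDefaultDecodingContext also gains a matching (conf, meta) constructor):

@Override
public HFileBlockDecodingContext newDataBlockDecodingContext(Configuration conf,
  HFileContext meta) {
  // Wrapping the parameter list keeps the signature under the checkstyle line-length limit.
  return new HFileBlockDefaultDecodingContext(conf, meta);
}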
- * @param meta
  * @param rawKVs raw KVs
+ * @param meta hfile context
+ * @param conf store configuration
nit: looks like duplicate @param, it's already covered above
decompressor = compression.getDecompressor();
// Some algorithms don't return decompressors and accept null as a valid parameter for
// same when creating decompression streams. We can ignore these cases wrt reinit.
if (decompressor != null && decompressor instanceof CanReinit) {
Simplify to decompressor instanceof CanReinit as it covers the null check also?
I didn't know that, cool.
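For illustration of that point (plain Java semantics, not code from this PR):

Decompressor decompressor = null;
// instanceof evaluates to false for a null reference, so the explicit null check is redundant.
boolean canReinit = decompressor instanceof CanReinit; // false, no NullPointerException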
  try (InputStream is =
      compression.createDecompressionStream(dataInputStream, decompressor, 0)) {
    BlockIOUtils.readFullyWithHeapBuffer(is, blockBufferWithoutHeader,
      uncompressedSizeWithoutHeader);
  }
} finally {
  if (decompressor != null) {
    compression.returnDecompressor(decompressor);
  }
}
nit: shall we remove the Compression#decompress method? After this change it is perhaps not being used anywhere else. Nothing urgent, can be a follow-up too if it creates complexities.
Sure, it is not being used; I think removal is a good idea.
HBASE-26316 Per-table or per-CF compression codec setting overrides (#3730)

We get and retain Compressor instances in HFileBlockDefaultEncodingContext, and could in theory call Compressor#reinit when setting up the context, to update compression parameters like level and buffer size, but we do not plumb through the CompoundConfiguration from the Store into the encoding context. As a consequence we can only update codec parameters globally in system site conf files.

Fine-grained configurability is important for algorithms like ZStandard (ZSTD), which offers more than 20 compression levels, where at level 1 it is almost as fast as LZ4, and where at higher levels it utilizes computationally expensive techniques to rival LZMA at compression ratio but trades off significantly for reduced compression throughput. The ZSTD level that should be set for a given column family or table will vary by use case.

Signed-off-by: Viraj Jasani <vjasani@apache.org>

Conflicts:
hbase-compression/hbase-compression-zstd/src/main/java/org/apache/hadoop/hbase/io/compress/zstd/ZstdDecompressor.java
hbase-server/src/test/java/org/apache/hadoop/hbase/io/compress/HFileTestBase.java
hbase-server/src/test/java/org/apache/hadoop/hbase/io/encoding/TestDataBlockEncoders.java
hbase-server/src/test/java/org/apache/hadoop/hbase/io/encoding/TestSeekToBlockWithEncoders.java
We get and retain Compressor instances in HFileBlockDefaultEncodingContext, and could in theory call Compressor#reinit when setting up the context, to update compression parameters like level and buffer size, but we do not plumb through the CompoundConfiguration from the Store into the encoding context. As a consequence we can only update codec parameters globally in system site conf files.
Fine-grained configurability is important for algorithms like ZStandard (ZSTD), which offers more than 20 compression levels, where at level 1 it is almost as fast as LZ4, and where at higher levels it utilizes computationally expensive techniques to rival LZMA at compression ratio but trades off significantly for reduced compression throughput. The ZSTD level that should be set for a given column family or table will vary by use case.
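As a minimal sketch of the idea described above (assuming the store-scoped Configuration, with table and CF overrides layered over the site config, reaches the encoding context when it is set up):

Compressor compressor = compression.getCompressor();
if (compressor != null) {
  // Compressor#reinit re-reads codec parameters such as level and buffer size from conf,
  // so per-table or per-CF settings take effect instead of only the global site values.
  compressor.reinit(conf);
}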