HBASE-27264 Add options to consider compressed size when delimiting blocks during hfile writes #4675

Merged
merged 11 commits into from
Aug 15, 2022

Conversation

wchevreuil
Contributor

Here we propose two additional properties, "hbase.block.size.limit.compressed" and "hbase.block.size.max.compressed", that allow the compressed size (if compression is in use) to be considered when delimiting blocks during hfile writing. When compression is enabled, certain datasets can compress very efficiently, so the default 64KB block size and 10GB max file size can lead to hfiles with a very large number of blocks.

In this proposal, "hbase.block.size.limit.compressed" is a boolean flag that switches to using the compressed size for delimiting blocks, and "hbase.block.size.max.compressed" is an int with the limit, in bytes, for the compressed block size, in order to avoid very large uncompressed blocks (defaulting to 320KB).
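To make the motivation concrete, here is a rough back-of-the-envelope sketch of how the block count grows with the compression ratio. It assumes the 10GB limit applies to the compressed size on disk and that every block compresses by the same factor; the class name, method name, and the 10:1 ratio are illustrative assumptions, not values from the patch.

```java
final class BlockCountSketch {
  static final long KB = 1024L;
  static final long GB = 1024L * 1024L * 1024L;

  /**
   * Approximate number of blocks in a file of fileSizeOnDisk bytes when each block
   * is cut at uncompressedBlockSize bytes and shrinks by compressionRatio on disk.
   * compressionRatio is uncompressed/compressed (hypothetical, uniform across blocks).
   */
  static long blocksPerFile(long fileSizeOnDisk, long uncompressedBlockSize,
      long compressionRatio) {
    return fileSizeOnDisk * compressionRatio / uncompressedBlockSize;
  }

  public static void main(String[] args) {
    // No compression: 10GB / 64KB.
    System.out.println(blocksPerFile(10 * GB, 64 * KB, 1)); // prints 163840
    // Assumed 10:1 compression: ten times as many blocks fit in the same file.
    System.out.println(blocksPerFile(10 * GB, 64 * KB, 10)); // prints 1638400
  }
}
```

Delimiting blocks by compressed size instead would keep the count closer to the first figure, at the cost of larger uncompressed blocks.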

@Apache-HBase

💔 -1 overall

Vote Subsystem Runtime Comment
+0 🆗 reexec 1m 5s Docker mode activated.
_ Prechecks _
+1 💚 dupname 0m 0s No case conflicting files found.
+1 💚 hbaseanti 0m 0s Patch does not have any anti-patterns.
+1 💚 @author 0m 0s The patch does not contain any @author tags.
_ master Compile Tests _
+1 💚 mvninstall 2m 28s master passed
+1 💚 compile 2m 15s master passed
+1 💚 checkstyle 0m 33s master passed
+1 💚 spotless 0m 44s branch has no errors when running spotless:check.
+1 💚 spotbugs 1m 19s master passed
_ Patch Compile Tests _
+1 💚 mvninstall 2m 9s the patch passed
+1 💚 compile 2m 11s the patch passed
+1 💚 javac 2m 11s the patch passed
-0 ⚠️ checkstyle 0m 31s hbase-server: The patch generated 2 new + 1 unchanged - 0 fixed = 3 total (was 1)
+1 💚 whitespace 0m 0s The patch has no whitespace issues.
+1 💚 hadoopcheck 11m 32s Patch does not cause any errors with Hadoop 3.1.2 3.2.2 3.3.1.
-1 ❌ spotless 0m 36s patch has 65 errors when running spotless:check, run spotless:apply to fix.
+1 💚 spotbugs 1m 24s the patch passed
_ Other Tests _
+1 💚 asflicense 0m 8s The patch does not generate ASF License warnings.
32m 7s
Subsystem Report/Notes
Docker ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-4675/1/artifact/yetus-general-check/output/Dockerfile
GITHUB PR #4675
Optional Tests dupname asflicense javac spotbugs hadoopcheck hbaseanti spotless checkstyle compile
uname Linux 08dc61e24ed0 5.4.0-90-generic #101-Ubuntu SMP Fri Oct 15 20:00:55 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/hbase-personality.sh
git revision master / e8c14ee
Default Java AdoptOpenJDK-1.8.0_282-b08
checkstyle https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-4675/1/artifact/yetus-general-check/output/diff-checkstyle-hbase-server.txt
spotless https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-4675/1/artifact/yetus-general-check/output/patch-spotless.txt
Max. process+thread count 60 (vs. ulimit of 30000)
modules C: hbase-server U: hbase-server
Console output https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-4675/1/console
versions git=2.17.1 maven=3.6.3 spotbugs=4.2.2
Powered by Apache Yetus 0.12.0 https://yetus.apache.org

This message was automatically generated.

@Apache-HBase

💔 -1 overall

Vote Subsystem Runtime Comment
+0 🆗 reexec 3m 36s Docker mode activated.
-0 ⚠️ yetus 0m 2s Unprocessed flag(s): --brief-report-file --spotbugs-strict-precheck --whitespace-eol-ignore-list --whitespace-tabs-ignore-list --quick-hadoopcheck
_ Prechecks _
_ master Compile Tests _
+1 💚 mvninstall 2m 30s master passed
+1 💚 compile 0m 35s master passed
+1 💚 shadedjars 4m 0s branch has no errors when building our shaded downstream artifacts.
+1 💚 javadoc 0m 23s master passed
_ Patch Compile Tests _
+1 💚 mvninstall 2m 14s the patch passed
+1 💚 compile 0m 35s the patch passed
+1 💚 javac 0m 35s the patch passed
+1 💚 shadedjars 3m 59s patch has no errors when building our shaded downstream artifacts.
+1 💚 javadoc 0m 22s the patch passed
_ Other Tests _
-1 ❌ unit 205m 7s hbase-server in the patch failed.
225m 35s
Subsystem Report/Notes
Docker ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-4675/1/artifact/yetus-jdk8-hadoop3-check/output/Dockerfile
GITHUB PR #4675
Optional Tests javac javadoc unit shadedjars compile
uname Linux beda996b50e9 5.4.0-1081-aws #88~18.04.1-Ubuntu SMP Thu Jun 23 16:29:17 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/hbase-personality.sh
git revision master / e8c14ee
Default Java AdoptOpenJDK-1.8.0_282-b08
unit https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-4675/1/artifact/yetus-jdk8-hadoop3-check/output/patch-unit-hbase-server.txt
Test Results https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-4675/1/testReport/
Max. process+thread count 2680 (vs. ulimit of 30000)
modules C: hbase-server U: hbase-server
Console output https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-4675/1/console
versions git=2.17.1 maven=3.6.3
Powered by Apache Yetus 0.12.0 https://yetus.apache.org

This message was automatically generated.

@Apache-HBase

💔 -1 overall

Vote Subsystem Runtime Comment
+0 🆗 reexec 6m 1s Docker mode activated.
-0 ⚠️ yetus 0m 2s Unprocessed flag(s): --brief-report-file --spotbugs-strict-precheck --whitespace-eol-ignore-list --whitespace-tabs-ignore-list --quick-hadoopcheck
_ Prechecks _
_ master Compile Tests _
+1 💚 mvninstall 2m 49s master passed
+1 💚 compile 0m 47s master passed
+1 💚 shadedjars 3m 48s branch has no errors when building our shaded downstream artifacts.
+1 💚 javadoc 0m 28s master passed
_ Patch Compile Tests _
+1 💚 mvninstall 2m 38s the patch passed
+1 💚 compile 0m 46s the patch passed
+1 💚 javac 0m 46s the patch passed
+1 💚 shadedjars 3m 45s patch has no errors when building our shaded downstream artifacts.
+1 💚 javadoc 0m 26s the patch passed
_ Other Tests _
-1 ❌ unit 206m 18s hbase-server in the patch failed.
229m 28s
Subsystem Report/Notes
Docker ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-4675/1/artifact/yetus-jdk11-hadoop3-check/output/Dockerfile
GITHUB PR #4675
Optional Tests javac javadoc unit shadedjars compile
uname Linux 0e15a50216e9 5.4.0-90-generic #101-Ubuntu SMP Fri Oct 15 20:00:55 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/hbase-personality.sh
git revision master / e8c14ee
Default Java AdoptOpenJDK-11.0.10+9
unit https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-4675/1/artifact/yetus-jdk11-hadoop3-check/output/patch-unit-hbase-server.txt
Test Results https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-4675/1/testReport/
Max. process+thread count 2688 (vs. ulimit of 30000)
modules C: hbase-server U: hbase-server
Console output https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-4675/1/console
versions git=2.17.1 maven=3.6.3
Powered by Apache Yetus 0.12.0 https://yetus.apache.org

This message was automatically generated.

@Apache-HBase

🎊 +1 overall

Vote Subsystem Runtime Comment
+0 🆗 reexec 1m 14s Docker mode activated.
_ Prechecks _
+1 💚 dupname 0m 1s No case conflicting files found.
+1 💚 hbaseanti 0m 0s Patch does not have any anti-patterns.
+1 💚 @author 0m 0s The patch does not contain any @author tags.
_ master Compile Tests _
+1 💚 mvninstall 2m 22s master passed
+1 💚 compile 2m 13s master passed
+1 💚 checkstyle 0m 31s master passed
+1 💚 spotless 0m 44s branch has no errors when running spotless:check.
+1 💚 spotbugs 1m 22s master passed
_ Patch Compile Tests _
+1 💚 mvninstall 2m 20s the patch passed
+1 💚 compile 2m 12s the patch passed
+1 💚 javac 2m 12s the patch passed
+1 💚 checkstyle 0m 31s the patch passed
+1 💚 whitespace 0m 0s The patch has no whitespace issues.
+1 💚 hadoopcheck 11m 31s Patch does not cause any errors with Hadoop 3.1.2 3.2.2 3.3.1.
+1 💚 spotless 0m 44s patch has no errors when running spotless:check.
+1 💚 spotbugs 1m 22s the patch passed
_ Other Tests _
+1 💚 asflicense 0m 10s The patch does not generate ASF License warnings.
32m 9s
Subsystem Report/Notes
Docker ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-4675/2/artifact/yetus-general-check/output/Dockerfile
GITHUB PR #4675
Optional Tests dupname asflicense javac spotbugs hadoopcheck hbaseanti spotless checkstyle compile
uname Linux 44bb01a13bd5 5.4.0-122-generic #138-Ubuntu SMP Wed Jun 22 15:00:31 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/hbase-personality.sh
git revision master / d734acc
Default Java AdoptOpenJDK-1.8.0_282-b08
Max. process+thread count 64 (vs. ulimit of 30000)
modules C: hbase-server U: hbase-server
Console output https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-4675/2/console
versions git=2.17.1 maven=3.6.3 spotbugs=4.2.2
Powered by Apache Yetus 0.12.0 https://yetus.apache.org

This message was automatically generated.

@Apache-HBase

🎊 +1 overall

Vote Subsystem Runtime Comment
+0 🆗 reexec 0m 46s Docker mode activated.
-0 ⚠️ yetus 0m 3s Unprocessed flag(s): --brief-report-file --spotbugs-strict-precheck --whitespace-eol-ignore-list --whitespace-tabs-ignore-list --quick-hadoopcheck
_ Prechecks _
_ master Compile Tests _
+1 💚 mvninstall 2m 20s master passed
+1 💚 compile 0m 34s master passed
+1 💚 shadedjars 3m 59s branch has no errors when building our shaded downstream artifacts.
+1 💚 javadoc 0m 22s master passed
_ Patch Compile Tests _
+1 💚 mvninstall 2m 11s the patch passed
+1 💚 compile 0m 35s the patch passed
+1 💚 javac 0m 35s the patch passed
+1 💚 shadedjars 4m 1s patch has no errors when building our shaded downstream artifacts.
+1 💚 javadoc 0m 21s the patch passed
_ Other Tests _
+1 💚 unit 202m 37s hbase-server in the patch passed.
219m 38s
Subsystem Report/Notes
Docker ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-4675/2/artifact/yetus-jdk8-hadoop3-check/output/Dockerfile
GITHUB PR #4675
Optional Tests javac javadoc unit shadedjars compile
uname Linux f0e617a00992 5.4.0-1081-aws #88~18.04.1-Ubuntu SMP Thu Jun 23 16:29:17 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/hbase-personality.sh
git revision master / d734acc
Default Java AdoptOpenJDK-1.8.0_282-b08
Test Results https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-4675/2/testReport/
Max. process+thread count 2644 (vs. ulimit of 30000)
modules C: hbase-server U: hbase-server
Console output https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-4675/2/console
versions git=2.17.1 maven=3.6.3
Powered by Apache Yetus 0.12.0 https://yetus.apache.org

This message was automatically generated.

@Apache-HBase

🎊 +1 overall

Vote Subsystem Runtime Comment
+0 🆗 reexec 1m 3s Docker mode activated.
-0 ⚠️ yetus 0m 2s Unprocessed flag(s): --brief-report-file --spotbugs-strict-precheck --whitespace-eol-ignore-list --whitespace-tabs-ignore-list --quick-hadoopcheck
_ Prechecks _
_ master Compile Tests _
+1 💚 mvninstall 2m 29s master passed
+1 💚 compile 0m 49s master passed
+1 💚 shadedjars 3m 44s branch has no errors when building our shaded downstream artifacts.
+1 💚 javadoc 0m 26s master passed
_ Patch Compile Tests _
+1 💚 mvninstall 2m 38s the patch passed
+1 💚 compile 0m 50s the patch passed
+1 💚 javac 0m 50s the patch passed
+1 💚 shadedjars 3m 44s patch has no errors when building our shaded downstream artifacts.
+1 💚 javadoc 0m 26s the patch passed
_ Other Tests _
+1 💚 unit 204m 28s hbase-server in the patch passed.
221m 41s
Subsystem Report/Notes
Docker ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-4675/2/artifact/yetus-jdk11-hadoop3-check/output/Dockerfile
GITHUB PR #4675
Optional Tests javac javadoc unit shadedjars compile
uname Linux 7a7fd5073ed1 5.4.0-90-generic #101-Ubuntu SMP Fri Oct 15 20:00:55 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/hbase-personality.sh
git revision master / d734acc
Default Java AdoptOpenJDK-11.0.10+9
Test Results https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-4675/2/testReport/
Max. process+thread count 2672 (vs. ulimit of 30000)
modules C: hbase-server U: hbase-server
Console output https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-4675/2/console
versions git=2.17.1 maven=3.6.3
Powered by Apache Yetus 0.12.0 https://yetus.apache.org

This message was automatically generated.

@Apache-HBase

🎊 +1 overall

Vote Subsystem Runtime Comment
+0 🆗 reexec 1m 40s Docker mode activated.
_ Prechecks _
+1 💚 dupname 0m 0s No case conflicting files found.
+1 💚 hbaseanti 0m 0s Patch does not have any anti-patterns.
+1 💚 @author 0m 0s The patch does not contain any @author tags.
_ master Compile Tests _
+1 💚 mvninstall 3m 18s master passed
+1 💚 compile 3m 25s master passed
+1 💚 checkstyle 0m 47s master passed
+1 💚 spotless 1m 15s branch has no errors when running spotless:check.
+1 💚 spotbugs 2m 18s master passed
_ Patch Compile Tests _
+1 💚 mvninstall 3m 22s the patch passed
+1 💚 compile 3m 19s the patch passed
+1 💚 javac 3m 19s the patch passed
+1 💚 checkstyle 0m 43s the patch passed
+1 💚 whitespace 0m 0s The patch has no whitespace issues.
+1 💚 hadoopcheck 15m 39s Patch does not cause any errors with Hadoop 3.1.2 3.2.2 3.3.1.
+1 💚 spotless 0m 56s patch has no errors when running spotless:check.
+1 💚 spotbugs 1m 48s the patch passed
_ Other Tests _
+1 💚 asflicense 0m 8s The patch does not generate ASF License warnings.
45m 47s
Subsystem Report/Notes
Docker ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-4675/3/artifact/yetus-general-check/output/Dockerfile
GITHUB PR #4675
Optional Tests dupname asflicense javac spotbugs hadoopcheck hbaseanti spotless checkstyle compile
uname Linux 81e0073683d3 5.4.0-122-generic #138-Ubuntu SMP Wed Jun 22 15:00:31 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/hbase-personality.sh
git revision master / 54f2106
Default Java AdoptOpenJDK-1.8.0_282-b08
Max. process+thread count 72 (vs. ulimit of 30000)
modules C: hbase-server U: hbase-server
Console output https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-4675/3/console
versions git=2.17.1 maven=3.6.3 spotbugs=4.2.2
Powered by Apache Yetus 0.12.0 https://yetus.apache.org

This message was automatically generated.

@Apache-HBase

🎊 +1 overall

Vote Subsystem Runtime Comment
+0 🆗 reexec 0m 48s Docker mode activated.
-0 ⚠️ yetus 0m 2s Unprocessed flag(s): --brief-report-file --spotbugs-strict-precheck --whitespace-eol-ignore-list --whitespace-tabs-ignore-list --quick-hadoopcheck
_ Prechecks _
_ master Compile Tests _
+1 💚 mvninstall 2m 53s master passed
+1 💚 compile 0m 45s master passed
+1 💚 shadedjars 4m 30s branch has no errors when building our shaded downstream artifacts.
+1 💚 javadoc 0m 31s master passed
_ Patch Compile Tests _
+1 💚 mvninstall 2m 33s the patch passed
+1 💚 compile 0m 43s the patch passed
+1 💚 javac 0m 43s the patch passed
+1 💚 shadedjars 4m 23s patch has no errors when building our shaded downstream artifacts.
+1 💚 javadoc 0m 31s the patch passed
_ Other Tests _
+1 💚 unit 200m 58s hbase-server in the patch passed.
221m 28s
Subsystem Report/Notes
Docker ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-4675/3/artifact/yetus-jdk8-hadoop3-check/output/Dockerfile
GITHUB PR #4675
Optional Tests javac javadoc unit shadedjars compile
uname Linux dd0c49080f0a 5.4.0-1081-aws #88~18.04.1-Ubuntu SMP Thu Jun 23 16:29:17 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/hbase-personality.sh
git revision master / 54f2106
Default Java AdoptOpenJDK-1.8.0_282-b08
Test Results https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-4675/3/testReport/
Max. process+thread count 3904 (vs. ulimit of 30000)
modules C: hbase-server U: hbase-server
Console output https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-4675/3/console
versions git=2.17.1 maven=3.6.3
Powered by Apache Yetus 0.12.0 https://yetus.apache.org

This message was automatically generated.

@Apache-HBase

🎊 +1 overall

Vote Subsystem Runtime Comment
+0 🆗 reexec 1m 0s Docker mode activated.
-0 ⚠️ yetus 0m 3s Unprocessed flag(s): --brief-report-file --spotbugs-strict-precheck --whitespace-eol-ignore-list --whitespace-tabs-ignore-list --quick-hadoopcheck
_ Prechecks _
_ master Compile Tests _
+1 💚 mvninstall 2m 36s master passed
+1 💚 compile 0m 47s master passed
+1 💚 shadedjars 3m 47s branch has no errors when building our shaded downstream artifacts.
+1 💚 javadoc 0m 26s master passed
_ Patch Compile Tests _
+1 💚 mvninstall 2m 38s the patch passed
+1 💚 compile 0m 47s the patch passed
+1 💚 javac 0m 47s the patch passed
+1 💚 shadedjars 3m 46s patch has no errors when building our shaded downstream artifacts.
+1 💚 javadoc 0m 27s the patch passed
_ Other Tests _
+1 💚 unit 203m 55s hbase-server in the patch passed.
221m 50s
Subsystem Report/Notes
Docker ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-4675/3/artifact/yetus-jdk11-hadoop3-check/output/Dockerfile
GITHUB PR #4675
Optional Tests javac javadoc unit shadedjars compile
uname Linux ca492d459a6e 5.4.0-90-generic #101-Ubuntu SMP Fri Oct 15 20:00:55 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/hbase-personality.sh
git revision master / 54f2106
Default Java AdoptOpenJDK-11.0.10+9
Test Results https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-4675/3/testReport/
Max. process+thread count 2500 (vs. ulimit of 30000)
modules C: hbase-server U: hbase-server
Console output https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-4675/3/console
versions git=2.17.1 maven=3.6.3
Powered by Apache Yetus 0.12.0 https://yetus.apache.org

This message was automatically generated.

@@ -886,6 +905,27 @@ void ensureBlockReady() throws IOException {
finishBlock();
}

public boolean shouldFinishBlock() throws IOException {
Contributor

Is there a reason to have this logic here rather than in HFileWriterImpl with the rest of the should-finish logic?

Contributor Author

This method deals with some block specifics, like compression, the block content byte array buffer, and what to do with the compressed size when deciding what the block limit should be. Moving it to HFileWriterImpl would spill some block-specific variables and logic into the file writer logic. It feels more cohesive to me to keep it here.

@@ -319,6 +323,9 @@ protected void checkBlockBoundary() throws IOException {
shouldFinishBlock = blockWriter.encodedBlockSizeWritten() >= hFileContext.getBlocksize()
|| blockWriter.blockSizeWritten() >= hFileContext.getBlocksize();
}
if (blockWriter.isSizeLimitCompressed()) {
Contributor

let's change this to if (blockWriter.isSizeLimitCompressed() && !shouldFinishBlock)? Just noting your comment on the calculation of compression ratio in the other file, we could further avoid that cost if we already know we need to finish the block for other reasons.

Contributor Author

We actually want to enter here if shouldFinishBlock is true. Because that means the raw/encoded uncompressed size is larger than BLOCK_SIZE. But we don't know if the compressed size is smaller than BLOCK_SIZE, so we'll call blockWriter.shouldFinishBlock() to find that out.

And in the case where shouldFinishBlock is false, we'll not actually calculate the compressed size inside blockWriter.shouldFinishBlock(), because we don't go beyond this point.
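The interplay described in this reply can be sketched as follows. This is a minimal illustration under stated assumptions, not the code from the patch: BlockBoundarySketch, writerShouldFinish, and the IntSupplier standing in for the compressor are hypothetical names, and the cap on the uncompressed block size is left out for brevity.

```java
import java.util.function.IntSupplier;

final class BlockBoundarySketch {
  /** Counts how often the (expensive) compressed size was actually requested. */
  static int compressionCalls = 0;

  /** Hypothetical stand-in for blockWriter.shouldFinishBlock(). */
  static boolean writerShouldFinish(int uncompressedSize, int blockSize,
      IntSupplier compressedSize) {
    if (uncompressedSize < blockSize) {
      return false; // still below the limit: the compressed size is never computed
    }
    compressionCalls++;
    return compressedSize.getAsInt() >= blockSize;
  }

  /** Mirrors the shape of the check discussed for checkBlockBoundary(). */
  static boolean checkBlockBoundary(int rawSize, int encodedSize, int blockSize,
      boolean sizeLimitCompressed, IntSupplier compressedSize) {
    boolean shouldFinishBlock = encodedSize >= blockSize || rawSize >= blockSize;
    if (sizeLimitCompressed) {
      // Entered even when shouldFinishBlock is already true: the uncompressed size
      // crossed the limit, but the compressed size may still be below it.
      shouldFinishBlock &= writerShouldFinish(rawSize, blockSize, compressedSize);
    }
    return shouldFinishBlock;
  }

  public static void main(String[] args) {
    // 70000 raw bytes that compress to 7000: the block stays open.
    System.out.println(checkBlockBoundary(70000, 66000, 65536, true, () -> 7000));
    // Same sizes with the feature off: the block is finished.
    System.out.println(checkBlockBoundary(70000, 66000, 65536, false, () -> 7000));
  }
}
```

With the flag on, a block is finished only when both the uncompressed and the compressed size have reached the limit, and compression is attempted only after the uncompressed check has passed.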

Contributor

@taklwu taklwu left a comment

mostly LGTM, just one question about the additional blockWriter.shouldFinishBlock() with original shouldFinishBlock logic.

@@ -319,6 +323,9 @@ protected void checkBlockBoundary() throws IOException {
shouldFinishBlock = blockWriter.encodedBlockSizeWritten() >= hFileContext.getBlocksize()
|| blockWriter.blockSizeWritten() >= hFileContext.getBlocksize();
}
if (blockWriter.isSizeLimitCompressed()) {
shouldFinishBlock &= blockWriter.shouldFinishBlock();
Contributor

related to the above comment by @bbeaudreault

In what situation would shouldFinishBlock = true and blockWriter.shouldFinishBlock() = false? Is that possible?

Contributor Author

Yes, shouldFinishBlock could be true at this point because so far we have only checked the "raw" uncompressed size or the encoded uncompressed size against BLOCK_SIZE. It is possible that these sizes are higher than BLOCK_SIZE while the compressed size is still less than BLOCK_SIZE.

@Apache9
Contributor

Apache9 commented Aug 9, 2022

I think this is a useful feature. When implementing an in-house LSM tree based storage system, I used the compression rates of the already written blocks to predict the compression rate of the next block, in order to determine whether we should finish a block.

I think the approach here is also OK: compressing once when we reach the default block size to predict the compression rate.

Maybe we could introduce something like a plugin to predict the compression rate? The default implementation just always returns 1, and we could introduce different algorithms to predict the compression rate for the current block size.

What do you guys think? For me, the option names are a bit confusing... What does 'size.limit.compressed' mean?

Thanks.
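The plugin idea in the comment above might look roughly like the sketch below. All names here (CompressionRatePredictor, ConstantOnePredictor, WrittenBlocksPredictor, PredictorDemo) are hypothetical and do not come from the patch, and the rate is assumed to mean uncompressed size divided by compressed size, so that a rate of 1 leaves the block size limit unchanged.

```java
/** Hypothetical plugin point: predicts uncompressed/compressed for the current block. */
interface CompressionRatePredictor {
  double predictRate();

  /** Optional feedback hook, called with the sizes of each finished block. */
  default void onBlockFinished(long uncompressedBytes, long compressedBytes) {
  }
}

/** Default behaviour from the comment: always predict 1, i.e. no adjustment. */
final class ConstantOnePredictor implements CompressionRatePredictor {
  @Override
  public double predictRate() {
    return 1.0;
  }
}

/** Predicts from the overall rate of the blocks written so far. */
final class WrittenBlocksPredictor implements CompressionRatePredictor {
  private long totalUncompressed;
  private long totalCompressed;

  @Override
  public double predictRate() {
    return totalCompressed == 0 ? 1.0 : (double) totalUncompressed / totalCompressed;
  }

  @Override
  public void onBlockFinished(long uncompressedBytes, long compressedBytes) {
    totalUncompressed += uncompressedBytes;
    totalCompressed += compressedBytes;
  }
}

final class PredictorDemo {
  /** Uncompressed size at which the current block would be finished. */
  static long blockLimit(long blockSize, CompressionRatePredictor predictor) {
    return Math.round(blockSize * predictor.predictRate());
  }

  public static void main(String[] args) {
    CompressionRatePredictor predictor = new WrittenBlocksPredictor();
    predictor.onBlockFinished(65536, 16384); // one block that compressed 4:1
    System.out.println(blockLimit(65536, predictor)); // prints 262144
  }
}
```

Keeping the feedback hook as a default method lets stateless predictors ignore it, which is one way to make the constant predictor trivial to implement.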

@Apache-HBase

🎊 +1 overall

Vote Subsystem Runtime Comment
+0 🆗 reexec 1m 35s Docker mode activated.
_ Prechecks _
+1 💚 dupname 0m 0s No case conflicting files found.
+1 💚 hbaseanti 0m 0s Patch does not have any anti-patterns.
+1 💚 @author 0m 0s The patch does not contain any @author tags.
_ master Compile Tests _
+1 💚 mvninstall 3m 20s master passed
+1 💚 compile 3m 18s master passed
+1 💚 checkstyle 0m 40s master passed
+1 💚 spotless 1m 1s branch has no errors when running spotless:check.
+1 💚 spotbugs 1m 50s master passed
_ Patch Compile Tests _
+1 💚 mvninstall 3m 14s the patch passed
+1 💚 compile 3m 22s the patch passed
+1 💚 javac 3m 23s the patch passed
+1 💚 checkstyle 0m 50s the patch passed
+1 💚 whitespace 0m 0s The patch has no whitespace issues.
+1 💚 hadoopcheck 16m 29s Patch does not cause any errors with Hadoop 3.1.2 3.2.2 3.3.1.
+1 💚 spotless 0m 57s patch has no errors when running spotless:check.
+1 💚 spotbugs 2m 1s the patch passed
_ Other Tests _
+1 💚 asflicense 0m 9s The patch does not generate ASF License warnings.
45m 39s
Subsystem Report/Notes
Docker ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-4675/4/artifact/yetus-general-check/output/Dockerfile
GITHUB PR #4675
Optional Tests dupname asflicense javac spotbugs hadoopcheck hbaseanti spotless checkstyle compile
uname Linux 9e5b624931ae 5.4.0-122-generic #138-Ubuntu SMP Wed Jun 22 15:00:31 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/hbase-personality.sh
git revision master / 5919b30
Default Java AdoptOpenJDK-1.8.0_282-b08
Max. process+thread count 64 (vs. ulimit of 30000)
modules C: hbase-server U: hbase-server
Console output https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-4675/4/console
versions git=2.17.1 maven=3.6.3 spotbugs=4.2.2
Powered by Apache Yetus 0.12.0 https://yetus.apache.org

This message was automatically generated.

@wchevreuil
Contributor Author

> Maybe we could introduce something like a plugin to predict the compression rate? The default implementation just always returns 1, and we could introduce different algorithms to predict the compression rate for the current block size.

I thought about it originally, but then concluded this feature made more sense as an on/off (off by default) behaviour. Do you see other variations in between?

> What do you guys think? For me, the option names are a bit confusing... What does 'size.limit.compressed' mean?
>
> Thanks.

The property names were indeed a bit confusing. I have renamed them now; let me know if the new names are more intuitive.

Contributor

@taklwu taklwu left a comment

LGTM

@wchevreuil
Contributor Author

> At least I've proposed another way to predict the compression rate, right? But I do not think there is an always-win algorithm here, so a pluggable algorithm would be more appropriate?

Sorry, I think I misunderstood it previously. What if we leave the current PR as is, as an initial approach to using compressed sizes to determine blocks, and then work on the pluggable solution in a separate ticket?

@Apache9
Contributor

Apache9 commented Aug 11, 2022

> > At least I've proposed another way to predict the compression rate, right? But I do not think there is an always-win algorithm here, so a pluggable algorithm would be more appropriate?
>
> Sorry, I think I misunderstood it previously. What if we leave the current PR as is, as an initial approach to using compressed sizes to determine blocks, and then work on the pluggable solution in a separate ticket?

If we expose the configs out then it will be a pain to switch to a pluggable solution because we can not simply remove these configs without a deprecation cycle...

I still prefer we make a pluggable solution first, and then the first approach could be the algorithm in this PR.

@wchevreuil
Contributor Author

> If we expose the configs out then it will be a pain to switch to a pluggable solution because we can not simply remove these configs without a deprecation cycle...
>
> I still prefer we make a pluggable solution first, and then the first approach could be the algorithm in this PR.

OK, I have just pushed a new commit in which the algorithm for determining the compression rate and adjusting the block size limit is pluggable.

@Apache-HBase

💔 -1 overall

Vote Subsystem Runtime Comment
+0 🆗 reexec 1m 31s Docker mode activated.
_ Prechecks _
+1 💚 dupname 0m 0s No case conflicting files found.
+1 💚 hbaseanti 0m 0s Patch does not have any anti-patterns.
+1 💚 @author 0m 0s The patch does not contain any @author tags.
_ master Compile Tests _
+1 💚 mvninstall 3m 9s master passed
+1 💚 compile 2m 50s master passed
+1 💚 checkstyle 0m 32s master passed
+1 💚 spotless 0m 52s branch has no errors when running spotless:check.
+1 💚 spotbugs 1m 38s master passed
-0 ⚠️ patch 1m 44s Used diff version of patch file. Binary files and potentially other changes not applied. Please rebase and squash commits if necessary.
_ Patch Compile Tests _
+1 💚 mvninstall 2m 59s the patch passed
+1 💚 compile 2m 58s the patch passed
-0 ⚠️ javac 2m 58s hbase-server generated 1 new + 192 unchanged - 1 fixed = 193 total (was 193)
-0 ⚠️ checkstyle 0m 38s hbase-server: The patch generated 9 new + 1 unchanged - 0 fixed = 10 total (was 1)
-0 ⚠️ whitespace 0m 0s The patch has 3 line(s) that end in whitespace. Use git apply --whitespace=fix <<patch_file>>. Refer https://git-scm.com/docs/git-apply
+1 💚 hadoopcheck 13m 4s Patch does not cause any errors with Hadoop 3.1.2 3.2.2 3.3.1.
-1 ❌ spotless 0m 41s patch has 69 errors when running spotless:check, run spotless:apply to fix.
+1 💚 spotbugs 1m 36s the patch passed
_ Other Tests _
+1 💚 asflicense 0m 7s The patch does not generate ASF License warnings.
38m 7s
Subsystem Report/Notes
Docker ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-4675/7/artifact/yetus-general-check/output/Dockerfile
GITHUB PR #4675
Optional Tests dupname asflicense javac spotbugs hadoopcheck hbaseanti spotless checkstyle compile
uname Linux 4a5d9dce3542 5.4.0-122-generic #138-Ubuntu SMP Wed Jun 22 15:00:31 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/hbase-personality.sh
git revision master / 39b496e
Default Java AdoptOpenJDK-1.8.0_282-b08
javac https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-4675/7/artifact/yetus-general-check/output/diff-compile-javac-hbase-server.txt
checkstyle https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-4675/7/artifact/yetus-general-check/output/diff-checkstyle-hbase-server.txt
whitespace https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-4675/7/artifact/yetus-general-check/output/whitespace-eol.txt
spotless https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-4675/7/artifact/yetus-general-check/output/patch-spotless.txt
Max. process+thread count 64 (vs. ulimit of 30000)
modules C: hbase-server U: hbase-server
Console output https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-4675/7/console
versions git=2.17.1 maven=3.6.3 spotbugs=4.2.2
Powered by Apache Yetus 0.12.0 https://yetus.apache.org

This message was automatically generated.

@@ -248,6 +254,8 @@ static class Header {
static final byte[] DUMMY_HEADER_NO_CHECKSUM =
new byte[HConstants.HFILEBLOCK_HEADER_SIZE_NO_CHECKSUM];

public static final String MAX_BLOCK_SIZE_UNCOMPRESSED = "hbase.block.max.size.uncompressed";
Contributor

Can we move this and its logic to the predicate itself?

// In order to avoid excessive compression size calculations, we do it only once when
// the uncompressed size has reached BLOCKSIZE. We then use this compression size to
// calculate the compression rate, and adjust the block size limit by this ratio.
if (adjustedBlockSize == 0 || uncompressedBlockSize >= adjustedBlockSize) {
Contributor

We recompute the compressed size every time we are even a little over the previously calculated adjusted block size. That is possible for every block, so won't this result in the computation running all the time?

Contributor Author

Well, that happens whenever we are closing the current block, based on the adjusted block size value from the previous block. It doesn't add much overhead because we are already closing the block regardless. The main problem here is that we don't yet know whether we are a little over or not, because we have only been checking the uncompressed size.

Comment on lines 55 to 59
int compressedSize = EncodedDataBlock.getCompressedSize(context.getCompression(),
context.getCompression().getCompressor(), contents.getBuffer(), 0,
contents.size());
adjustedBlockSize = uncompressedBlockSize / compressedSize;
adjustedBlockSize *= context.getBlocksize();
Contributor

I was expecting the algorithm to be:

  • Let the first block get closed with the regular adjustedBlockSize == block_size.
  • Have a method in this predicate that is called on finishBlock and uses the statistics of the block just closed to determine the compression ratio and store the adjusted size.
  • Use the above compression ratio/adjusted size to determine the boundaries of the next block.

This way, we keep adjusting the limit for the next block based on the previous block, without doing any extra compression.
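The three steps above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the class from the patch: the class name `PreviousBlockRatioPredicate`, the constructor parameters, and the clamping of the adjusted size between the configured block size and an uncompressed cap are all hypothetical choices made for the example.

```java
/**
 * Hypothetical sketch (not the PR's actual class): a block-boundary predicate that
 * derives the size limit for the next block from the block that was just closed.
 */
final class PreviousBlockRatioPredicate {
  private final int configuredBlockSize; // e.g. the table's BLOCKSIZE
  private final int maxUncompressedSize; // assumed hard cap on uncompressed block size
  // Until a block has been closed, the limit is simply the configured block size.
  private int adjustedBlockSize;

  PreviousBlockRatioPredicate(int configuredBlockSize, int maxUncompressedSize) {
    this.configuredBlockSize = configuredBlockSize;
    this.maxUncompressedSize = maxUncompressedSize;
    this.adjustedBlockSize = configuredBlockSize;
  }

  /** Meant to be called when a block is finished, with that block's sizes in bytes. */
  void updateLatestBlockSizes(int uncompressed, int compressed) {
    if (compressed <= 0) {
      return; // nothing sensible to learn from an empty block; keep the old limit
    }
    // Floating-point division, so that a ratio such as 1.6 is not truncated to 1.
    double ratio = (double) uncompressed / compressed;
    long adjusted = (long) (configuredBlockSize * ratio);
    // Assumption: never go below the configured size or above the uncompressed cap.
    adjustedBlockSize =
      (int) Math.max(configuredBlockSize, Math.min(adjusted, maxUncompressedSize));
  }

  /** No compression happens here; only the limit learned from the previous block is used. */
  boolean shouldFinishBlock(int uncompressedBlockSize) {
    return uncompressedBlockSize >= adjustedBlockSize;
  }

  int getAdjustedBlockSize() {
    return adjustedBlockSize;
  }
}
```

With a 64KB block size and a 320KB cap, a first block that compresses 4:1 would move the limit for the next block to 256KB in this sketch; a later 32:1 block would hit the cap instead.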

Contributor Author

I have pushed a new commit with this approach.

Contributor

@ankitsinghal ankitsinghal left a comment

A few comments on the current algorithm.

@Apache-HBase

💔 -1 overall

Vote Subsystem Runtime Comment
+0 🆗 reexec 0m 43s Docker mode activated.
-0 ⚠️ yetus 0m 3s Unprocessed flag(s): --brief-report-file --spotbugs-strict-precheck --whitespace-eol-ignore-list --whitespace-tabs-ignore-list --quick-hadoopcheck
_ Prechecks _
_ master Compile Tests _
+1 💚 mvninstall 2m 17s master passed
+1 💚 compile 0m 35s master passed
+1 💚 shadedjars 4m 0s branch has no errors when building our shaded downstream artifacts.
+1 💚 javadoc 0m 21s master passed
-0 ⚠️ patch 4m 28s Used diff version of patch file. Binary files and potentially other changes not applied. Please rebase and squash commits if necessary.
_ Patch Compile Tests _
+1 💚 mvninstall 2m 10s the patch passed
+1 💚 compile 0m 33s the patch passed
+1 💚 javac 0m 33s the patch passed
+1 💚 shadedjars 3m 59s patch has no errors when building our shaded downstream artifacts.
+1 💚 javadoc 0m 21s the patch passed
_ Other Tests _
-1 ❌ unit 199m 33s hbase-server in the patch failed.
216m 58s
Subsystem Report/Notes
Docker ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-4675/7/artifact/yetus-jdk8-hadoop3-check/output/Dockerfile
GITHUB PR #4675
Optional Tests javac javadoc unit shadedjars compile
uname Linux c53ce3e6bdd2 5.4.0-1081-aws #88~18.04.1-Ubuntu SMP Thu Jun 23 16:29:17 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/hbase-personality.sh
git revision master / 39b496e
Default Java AdoptOpenJDK-1.8.0_282-b08
unit https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-4675/7/artifact/yetus-jdk8-hadoop3-check/output/patch-unit-hbase-server.txt
Test Results https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-4675/7/testReport/
Max. process+thread count 2628 (vs. ulimit of 30000)
modules C: hbase-server U: hbase-server
Console output https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-4675/7/console
versions git=2.17.1 maven=3.6.3
Powered by Apache Yetus 0.12.0 https://yetus.apache.org

This message was automatically generated.

@Apache-HBase

💔 -1 overall

Vote Subsystem Runtime Comment
+0 🆗 reexec 1m 12s Docker mode activated.
-0 ⚠️ yetus 0m 3s Unprocessed flag(s): --brief-report-file --spotbugs-strict-precheck --whitespace-eol-ignore-list --whitespace-tabs-ignore-list --quick-hadoopcheck
_ Prechecks _
_ master Compile Tests _
+1 💚 mvninstall 2m 57s master passed
+1 💚 compile 0m 48s master passed
+1 💚 shadedjars 3m 47s branch has no errors when building our shaded downstream artifacts.
+1 💚 javadoc 0m 29s master passed
-0 ⚠️ patch 4m 25s Used diff version of patch file. Binary files and potentially other changes not applied. Please rebase and squash commits if necessary.
_ Patch Compile Tests _
+1 💚 mvninstall 2m 35s the patch passed
+1 💚 compile 0m 46s the patch passed
+1 💚 javac 0m 46s the patch passed
+1 💚 shadedjars 3m 46s patch has no errors when building our shaded downstream artifacts.
+1 💚 javadoc 0m 26s the patch passed
_ Other Tests _
-1 ❌ unit 202m 37s hbase-server in the patch failed.
220m 41s
Subsystem Report/Notes
Docker ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-4675/7/artifact/yetus-jdk11-hadoop3-check/output/Dockerfile
GITHUB PR #4675
Optional Tests javac javadoc unit shadedjars compile
uname Linux a695044eedf7 5.4.0-90-generic #101-Ubuntu SMP Fri Oct 15 20:00:55 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/hbase-personality.sh
git revision master / 39b496e
Default Java AdoptOpenJDK-11.0.10+9
unit https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-4675/7/artifact/yetus-jdk11-hadoop3-check/output/patch-unit-hbase-server.txt
Test Results https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-4675/7/testReport/
Max. process+thread count 2566 (vs. ulimit of 30000)
modules C: hbase-server U: hbase-server
Console output https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-4675/7/console
versions git=2.17.1 maven=3.6.3
Powered by Apache Yetus 0.12.0 https://yetus.apache.org

This message was automatically generated.

@Apache-HBase

💔 -1 overall

Vote Subsystem Runtime Comment
+0 🆗 reexec 0m 42s Docker mode activated.
-0 ⚠️ yetus 0m 2s Unprocessed flag(s): --brief-report-file --spotbugs-strict-precheck --whitespace-eol-ignore-list --whitespace-tabs-ignore-list --quick-hadoopcheck
_ Prechecks _
_ master Compile Tests _
+1 💚 mvninstall 2m 12s master passed
+1 💚 compile 0m 35s master passed
+1 💚 shadedjars 3m 59s branch has no errors when building our shaded downstream artifacts.
+1 💚 javadoc 0m 21s master passed
-0 ⚠️ patch 4m 26s Used diff version of patch file. Binary files and potentially other changes not applied. Please rebase and squash commits if necessary.
_ Patch Compile Tests _
+1 💚 mvninstall 2m 11s the patch passed
+1 💚 compile 0m 34s the patch passed
+1 💚 javac 0m 34s the patch passed
+1 💚 shadedjars 4m 0s patch has no errors when building our shaded downstream artifacts.
+1 💚 javadoc 0m 21s the patch passed
_ Other Tests _
-1 ❌ unit 8m 8s hbase-server in the patch failed.
23m 53s
Subsystem Report/Notes
Docker ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-4675/8/artifact/yetus-jdk8-hadoop3-check/output/Dockerfile
GITHUB PR #4675
Optional Tests javac javadoc unit shadedjars compile
uname Linux a369d6f85cc9 5.4.0-1081-aws #88~18.04.1-Ubuntu SMP Thu Jun 23 16:29:17 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/hbase-personality.sh
git revision master / 73759be
Default Java AdoptOpenJDK-1.8.0_282-b08
unit https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-4675/8/artifact/yetus-jdk8-hadoop3-check/output/patch-unit-hbase-server.txt
Test Results https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-4675/8/testReport/
Max. process+thread count 562 (vs. ulimit of 30000)
modules C: hbase-server U: hbase-server
Console output https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-4675/8/console
versions git=2.17.1 maven=3.6.3
Powered by Apache Yetus 0.12.0 https://yetus.apache.org

This message was automatically generated.

@Apache-HBase

💔 -1 overall

Vote Subsystem Runtime Comment
+0 🆗 reexec 1m 2s Docker mode activated.
-0 ⚠️ yetus 0m 2s Unprocessed flag(s): --brief-report-file --spotbugs-strict-precheck --whitespace-eol-ignore-list --whitespace-tabs-ignore-list --quick-hadoopcheck
_ Prechecks _
_ master Compile Tests _
+1 💚 mvninstall 2m 41s master passed
+1 💚 compile 0m 48s master passed
+1 💚 shadedjars 3m 47s branch has no errors when building our shaded downstream artifacts.
+1 💚 javadoc 0m 28s master passed
-0 ⚠️ patch 4m 24s Used diff version of patch file. Binary files and potentially other changes not applied. Please rebase and squash commits if necessary.
_ Patch Compile Tests _
+1 💚 mvninstall 2m 32s the patch passed
+1 💚 compile 0m 47s the patch passed
+1 💚 javac 0m 47s the patch passed
+1 💚 shadedjars 3m 45s patch has no errors when building our shaded downstream artifacts.
+1 💚 javadoc 0m 25s the patch passed
_ Other Tests _
-1 ❌ unit 10m 40s hbase-server in the patch failed.
27m 53s
Subsystem Report/Notes
Docker ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-4675/8/artifact/yetus-jdk11-hadoop3-check/output/Dockerfile
GITHUB PR #4675
Optional Tests javac javadoc unit shadedjars compile
uname Linux 9f6f29948798 5.4.0-90-generic #101-Ubuntu SMP Fri Oct 15 20:00:55 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/hbase-personality.sh
git revision master / 73759be
Default Java AdoptOpenJDK-11.0.10+9
unit https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-4675/8/artifact/yetus-jdk11-hadoop3-check/output/patch-unit-hbase-server.txt
Test Results https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-4675/8/testReport/
Max. process+thread count 1019 (vs. ulimit of 30000)
modules C: hbase-server U: hbase-server
Console output https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-4675/8/console
versions git=2.17.1 maven=3.6.3
Powered by Apache Yetus 0.12.0 https://yetus.apache.org

This message was automatically generated.

@Apache-HBase

💔 -1 overall

Vote Subsystem Runtime Comment
+0 🆗 reexec 1m 6s Docker mode activated.
_ Prechecks _
+1 💚 dupname 0m 0s No case conflicting files found.
+1 💚 hbaseanti 0m 0s Patch does not have any anti-patterns.
+1 💚 @author 0m 0s The patch does not contain any @author tags.
_ master Compile Tests _
+1 💚 mvninstall 2m 15s master passed
+1 💚 compile 2m 12s master passed
+1 💚 checkstyle 0m 31s master passed
+1 💚 spotless 0m 43s branch has no errors when running spotless:check.
+1 💚 spotbugs 1m 16s master passed
-0 ⚠️ patch 1m 23s Used diff version of patch file. Binary files and potentially other changes not applied. Please rebase and squash commits if necessary.
_ Patch Compile Tests _
+1 💚 mvninstall 2m 18s the patch passed
+1 💚 compile 2m 13s the patch passed
-0 ⚠️ javac 2m 13s hbase-server generated 1 new + 192 unchanged - 1 fixed = 193 total (was 193)
-0 ⚠️ checkstyle 0m 31s hbase-server: The patch generated 10 new + 1 unchanged - 0 fixed = 11 total (was 1)
-0 ⚠️ whitespace 0m 0s The patch has 2 line(s) that end in whitespace. Use git apply --whitespace=fix <<patch_file>>. Refer https://git-scm.com/docs/git-apply
+1 💚 hadoopcheck 11m 26s Patch does not cause any errors with Hadoop 3.1.2 3.2.2 3.3.1.
-1 ❌ spotless 0m 37s patch has 69 errors when running spotless:check, run spotless:apply to fix.
+1 💚 spotbugs 1m 23s the patch passed
_ Other Tests _
+1 💚 asflicense 0m 9s The patch does not generate ASF License warnings.
31m 27s
Subsystem Report/Notes
Docker ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-4675/8/artifact/yetus-general-check/output/Dockerfile
GITHUB PR #4675
Optional Tests dupname asflicense javac spotbugs hadoopcheck hbaseanti spotless checkstyle compile
uname Linux 88a1ccee7a70 5.4.0-122-generic #138-Ubuntu SMP Wed Jun 22 15:00:31 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/hbase-personality.sh
git revision master / 73759be
Default Java AdoptOpenJDK-1.8.0_282-b08
javac https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-4675/8/artifact/yetus-general-check/output/diff-compile-javac-hbase-server.txt
checkstyle https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-4675/8/artifact/yetus-general-check/output/diff-checkstyle-hbase-server.txt
whitespace https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-4675/8/artifact/yetus-general-check/output/whitespace-eol.txt
spotless https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-4675/8/artifact/yetus-general-check/output/patch-spotless.txt
Max. process+thread count 60 (vs. ulimit of 30000)
modules C: hbase-server U: hbase-server
Console output https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-4675/8/console
versions git=2.17.1 maven=3.6.3 spotbugs=4.2.2
Powered by Apache Yetus 0.12.0 https://yetus.apache.org

This message was automatically generated.

Contributor

@ankitsinghal ankitsinghal left a comment

I have left a few comments that you can address before committing. +1, so that you don't need to wait for me before committing.

Comment on lines 52 to 54
if (uncompressedBlockSize >= adjustedBlockSize) {
adjustedBlockSize = context.getBlocksize() * compressionRatio;
}
Contributor

I'd remove the (uncompressedBlockSize >= adjustedBlockSize) check, so that we adjust the size based on the previous block's compression every time: calculate adjustedBlockSize in updateLatestBlockSizes itself, and have this method simply return it.

Comment on lines 914 to 927
public boolean shouldFinishBlock() throws IOException {
// int uncompressedBlockSize = blockSizeWritten();
int uncompressedBlockSize = baosInMemory.size();
if (uncompressedBlockSize >= fileContext.getBlocksize()) {
if (uncompressedBlockSize < maxSizeUnCompressed) {
int adjustedBlockSize = compressedSizePredicator.
calculateCompressionSizeLimit(fileContext, uncompressedBlockSize);
return uncompressedBlockSize >= adjustedBlockSize;
}
return true;
}
return false;
}

Contributor

I'd also move this to the Predicator and generalize this method.

Suggested change
- public boolean shouldFinishBlock() throws IOException {
- // int uncompressedBlockSize = blockSizeWritten();
- int uncompressedBlockSize = baosInMemory.size();
- if (uncompressedBlockSize >= fileContext.getBlocksize()) {
- if (uncompressedBlockSize < maxSizeUnCompressed) {
- int adjustedBlockSize = compressedSizePredicator.
- calculateCompressionSizeLimit(fileContext, uncompressedBlockSize);
- return uncompressedBlockSize >= adjustedBlockSize;
- }
- return true;
- }
- return false;
- }
+ public boolean checkBoundariesWithPredicate() throws IOException {
+ if(predicator==null){
+ throw new IllegalArgumentException("Expected at least the default BoundariesCheckPredicate");
+ }
+ return predicator.
+ shouldFinishBlock(fileContext, uncompressedBlockSize);
+ }
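To make the shape of this suggestion concrete, here is a minimal sketch of the delegation pattern it describes. It is an illustration only: `BlockBoundaryPredicate`, `UncompressedSizePredicate`, and `SketchBlockWriter` are hypothetical names, the writer is a stand-in that merely counts buffered bytes, and the null check is done once at construction time rather than on every call, which is a design choice of this example and not something the review prescribes.

```java
/** Hypothetical interface; the name and signature are illustrative, not from the patch. */
interface BlockBoundaryPredicate {
  boolean shouldFinishBlock(int configuredBlockSize, int uncompressedBlockSize);
}

/** Default behaviour: close the block once the uncompressed size reaches the block size. */
final class UncompressedSizePredicate implements BlockBoundaryPredicate {
  @Override
  public boolean shouldFinishBlock(int configuredBlockSize, int uncompressedBlockSize) {
    return uncompressedBlockSize >= configuredBlockSize;
  }
}

/** Stand-in for a block writer: it only tracks how many bytes are buffered. */
final class SketchBlockWriter {
  private final int configuredBlockSize;
  private final BlockBoundaryPredicate predicate;
  private int bytesBuffered;

  SketchBlockWriter(int configuredBlockSize, BlockBoundaryPredicate predicate) {
    this.configuredBlockSize = configuredBlockSize;
    // Fail fast if no predicate was supplied, instead of checking on every call.
    this.predicate = java.util.Objects.requireNonNull(predicate,
      "Expected at least the default predicate");
  }

  void write(int numBytes) {
    bytesBuffered += numBytes;
  }

  /** The writer no longer knows the boundary rule; it just asks the predicate. */
  boolean checkBoundariesWithPredicate() {
    return predicate.shouldFinishBlock(configuredBlockSize, bytesBuffered);
  }
}
```

Because the interface has a single method, an alternative rule can be passed as a lambda, for example `new SketchBlockWriter(65536, (limit, size) -> size >= 2 * limit)`.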

*/
@Override
public void updateLatestBlockSizes(int uncompressed, int compressed) {
compressionRatio = uncompressed/compressed;
Contributor

nit:

Suggested change
- compressionRatio = uncompressed/compressed;
+ compressionFactor = uncompressed/compressed;
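One detail worth keeping in mind with this expression, whichever name is used: both arguments are declared as `int` in the method signature quoted above, so in Java the division is an integer division and the fractional part is dropped before the result is stored. The type of the field is not visible in this snippet, so the following is only a small standalone illustration with a hypothetical class name, not a statement about the final patch.

```java
/** Hypothetical demo class showing int versus floating-point division of block sizes. */
final class CompressionRatioDemo {
  /** Integer division: 65536 / 40000 yields 1, so the fractional part is lost. */
  static int truncatedRatio(int uncompressed, int compressed) {
    return uncompressed / compressed;
  }

  /** Casting one operand to double first keeps the fractional part (about 1.64). */
  static double exactRatio(int uncompressed, int compressed) {
    return (double) uncompressed / compressed;
  }
}
```

For a block that compresses from 65536 to 40000 bytes, the truncated ratio would leave the next block's limit unchanged, while the floating-point ratio would raise it by roughly 64%.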

Comment on lines 1266 to 1267
System.out.println(">>>> " + block.getUncompressedSizeWithoutHeader());
System.out.println(">>>> " + blockCount);
Contributor

Remove the System.out calls.

@Apache-HBase

💔 -1 overall

Vote Subsystem Runtime Comment
+0 🆗 reexec 0m 50s Docker mode activated.
-0 ⚠️ yetus 0m 2s Unprocessed flag(s): --brief-report-file --spotbugs-strict-precheck --whitespace-eol-ignore-list --whitespace-tabs-ignore-list --quick-hadoopcheck
_ Prechecks _
_ master Compile Tests _
+1 💚 mvninstall 2m 23s master passed
+1 💚 compile 0m 34s master passed
+1 💚 shadedjars 4m 2s branch has no errors when building our shaded downstream artifacts.
+1 💚 javadoc 0m 24s master passed
-0 ⚠️ patch 4m 32s Used diff version of patch file. Binary files and potentially other changes not applied. Please rebase and squash commits if necessary.
_ Patch Compile Tests _
-1 ❌ mvninstall 1m 9s root in the patch failed.
-1 ❌ compile 0m 34s hbase-server in the patch failed.
-0 ⚠️ javac 0m 34s hbase-server in the patch failed.
-1 ❌ shadedjars 3m 6s patch has 14 errors when building our shaded downstream artifacts.
+1 💚 javadoc 0m 22s the patch passed
_ Other Tests _
-1 ❌ unit 0m 35s hbase-server in the patch failed.
14m 42s
Subsystem Report/Notes
Docker ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-4675/9/artifact/yetus-jdk8-hadoop3-check/output/Dockerfile
GITHUB PR #4675
Optional Tests javac javadoc unit shadedjars compile
uname Linux 1621206a977c 5.4.0-1081-aws #88~18.04.1-Ubuntu SMP Thu Jun 23 16:29:17 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/hbase-personality.sh
git revision master / 9215066
Default Java AdoptOpenJDK-1.8.0_282-b08
mvninstall https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-4675/9/artifact/yetus-jdk8-hadoop3-check/output/patch-mvninstall-root.txt
compile https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-4675/9/artifact/yetus-jdk8-hadoop3-check/output/patch-compile-hbase-server.txt
javac https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-4675/9/artifact/yetus-jdk8-hadoop3-check/output/patch-compile-hbase-server.txt
shadedjars https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-4675/9/artifact/yetus-jdk8-hadoop3-check/output/patch-shadedjars.txt
unit https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-4675/9/artifact/yetus-jdk8-hadoop3-check/output/patch-unit-hbase-server.txt
Test Results https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-4675/9/testReport/
Max. process+thread count 64 (vs. ulimit of 30000)
modules C: hbase-server U: hbase-server
Console output https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-4675/9/console
versions git=2.17.1 maven=3.6.3
Powered by Apache Yetus 0.12.0 https://yetus.apache.org

This message was automatically generated.

@Apache-HBase

💔 -1 overall

Vote Subsystem Runtime Comment
+0 🆗 reexec 1m 12s Docker mode activated.
-0 ⚠️ yetus 0m 2s Unprocessed flag(s): --brief-report-file --spotbugs-strict-precheck --whitespace-eol-ignore-list --whitespace-tabs-ignore-list --quick-hadoopcheck
_ Prechecks _
_ master Compile Tests _
+1 💚 mvninstall 2m 50s master passed
+1 💚 compile 0m 47s master passed
+1 💚 shadedjars 3m 51s branch has no errors when building our shaded downstream artifacts.
+1 💚 javadoc 0m 27s master passed
-0 ⚠️ patch 4m 26s Used diff version of patch file. Binary files and potentially other changes not applied. Please rebase and squash commits if necessary.
_ Patch Compile Tests _
-1 ❌ mvninstall 1m 35s root in the patch failed.
-1 ❌ compile 0m 46s hbase-server in the patch failed.
-0 ⚠️ javac 0m 46s hbase-server in the patch failed.
-1 ❌ shadedjars 3m 7s patch has 14 errors when building our shaded downstream artifacts.
+1 💚 javadoc 0m 27s the patch passed
_ Other Tests _
-1 ❌ unit 0m 49s hbase-server in the patch failed.
16m 45s
Subsystem Report/Notes
Docker ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-4675/9/artifact/yetus-jdk11-hadoop3-check/output/Dockerfile
GITHUB PR #4675
Optional Tests javac javadoc unit shadedjars compile
uname Linux ad054304246c 5.4.0-122-generic #138-Ubuntu SMP Wed Jun 22 15:00:31 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/hbase-personality.sh
git revision master / 9215066
Default Java AdoptOpenJDK-11.0.10+9
mvninstall https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-4675/9/artifact/yetus-jdk11-hadoop3-check/output/patch-mvninstall-root.txt
compile https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-4675/9/artifact/yetus-jdk11-hadoop3-check/output/patch-compile-hbase-server.txt
javac https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-4675/9/artifact/yetus-jdk11-hadoop3-check/output/patch-compile-hbase-server.txt
shadedjars https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-4675/9/artifact/yetus-jdk11-hadoop3-check/output/patch-shadedjars.txt
unit https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-4675/9/artifact/yetus-jdk11-hadoop3-check/output/patch-unit-hbase-server.txt
Test Results https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-4675/9/testReport/
Max. process+thread count 78 (vs. ulimit of 30000)
modules C: hbase-server U: hbase-server
Console output https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-4675/9/console
versions git=2.17.1 maven=3.6.3
Powered by Apache Yetus 0.12.0 https://yetus.apache.org

This message was automatically generated.

@Apache-HBase

💔 -1 overall

Vote Subsystem Runtime Comment
+0 🆗 reexec 0m 59s Docker mode activated.
_ Prechecks _
+1 💚 dupname 0m 0s No case conflicting files found.
+1 💚 hbaseanti 0m 0s Patch does not have any anti-patterns.
+1 💚 @author 0m 0s The patch does not contain any @author tags.
_ master Compile Tests _
+1 💚 mvninstall 2m 14s master passed
+1 💚 compile 2m 12s master passed
+1 💚 checkstyle 0m 30s master passed
+1 💚 spotless 0m 39s branch has no errors when running spotless:check.
+1 💚 spotbugs 1m 15s master passed
-0 ⚠️ patch 1m 21s Used diff version of patch file. Binary files and potentially other changes not applied. Please rebase and squash commits if necessary.
_ Patch Compile Tests _
-1 ❌ mvninstall 1m 14s root in the patch failed.
-1 ❌ compile 1m 15s hbase-server in the patch failed.
-0 ⚠️ javac 1m 15s hbase-server in the patch failed.
+1 💚 checkstyle 0m 29s the patch passed
+1 💚 whitespace 0m 0s The patch has no whitespace issues.
-1 ❌ hadoopcheck 1m 21s The patch causes 14 errors with Hadoop v3.1.2.
-1 ❌ hadoopcheck 2m 43s The patch causes 14 errors with Hadoop v3.2.2.
-1 ❌ hadoopcheck 4m 6s The patch causes 14 errors with Hadoop v3.3.1.
+1 💚 spotless 0m 36s patch has no errors when running spotless:check.
-1 ❌ spotbugs 0m 27s hbase-server in the patch failed.
_ Other Tests _
+1 💚 asflicense 0m 8s The patch does not generate ASF License warnings.
17m 8s
Subsystem Report/Notes
Docker ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-4675/9/artifact/yetus-general-check/output/Dockerfile
GITHUB PR #4675
Optional Tests dupname asflicense javac spotbugs hadoopcheck hbaseanti spotless checkstyle compile
uname Linux 9ae751a1263a 5.4.0-122-generic #138-Ubuntu SMP Wed Jun 22 15:00:31 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/hbase-personality.sh
git revision master / 9215066
Default Java AdoptOpenJDK-1.8.0_282-b08
mvninstall https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-4675/9/artifact/yetus-general-check/output/patch-mvninstall-root.txt
compile https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-4675/9/artifact/yetus-general-check/output/patch-compile-hbase-server.txt
javac https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-4675/9/artifact/yetus-general-check/output/patch-compile-hbase-server.txt
hadoopcheck https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-4675/9/artifact/yetus-general-check/output/patch-javac-3.1.2.txt
hadoopcheck https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-4675/9/artifact/yetus-general-check/output/patch-javac-3.2.2.txt
hadoopcheck https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-4675/9/artifact/yetus-general-check/output/patch-javac-3.3.1.txt
spotbugs https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-4675/9/artifact/yetus-general-check/output/patch-spotbugs-hbase-server.txt
Max. process+thread count 64 (vs. ulimit of 30000)
modules C: hbase-server U: hbase-server
Console output https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-4675/9/console
versions git=2.17.1 maven=3.6.3 spotbugs=4.2.2
Powered by Apache Yetus 0.12.0 https://yetus.apache.org

This message was automatically generated.

@Apache-HBase

🎊 +1 overall

Vote Subsystem Runtime Comment
+0 🆗 reexec 1m 1s Docker mode activated.
_ Prechecks _
+1 💚 dupname 0m 0s No case conflicting files found.
+1 💚 hbaseanti 0m 0s Patch does not have any anti-patterns.
+1 💚 @author 0m 0s The patch does not contain any @author tags.
_ master Compile Tests _
+1 💚 mvninstall 2m 15s master passed
+1 💚 compile 2m 13s master passed
+1 💚 checkstyle 0m 30s master passed
+1 💚 spotless 0m 38s branch has no errors when running spotless:check.
+1 💚 spotbugs 1m 16s master passed
-0 ⚠️ patch 1m 23s Used diff version of patch file. Binary files and potentially other changes not applied. Please rebase and squash commits if necessary.
_ Patch Compile Tests _
+1 💚 mvninstall 2m 15s the patch passed
+1 💚 compile 2m 13s the patch passed
+1 💚 javac 2m 13s the patch passed
+1 💚 checkstyle 0m 31s the patch passed
+1 💚 whitespace 0m 0s The patch has no whitespace issues.
+1 💚 hadoopcheck 11m 28s Patch does not cause any errors with Hadoop 3.1.2 3.2.2 3.3.1.
+1 💚 spotless 0m 39s patch has no errors when running spotless:check.
+1 💚 spotbugs 1m 25s the patch passed
_ Other Tests _
+1 💚 asflicense 0m 9s The patch does not generate ASF License warnings.
31m 32s
Subsystem Report/Notes
Docker ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-4675/10/artifact/yetus-general-check/output/Dockerfile
GITHUB PR #4675
Optional Tests dupname asflicense javac spotbugs hadoopcheck hbaseanti spotless checkstyle compile
uname Linux 4bd6c637d1bd 5.4.0-122-generic #138-Ubuntu SMP Wed Jun 22 15:00:31 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/hbase-personality.sh
git revision master / 9215066
Default Java AdoptOpenJDK-1.8.0_282-b08
Max. process+thread count 60 (vs. ulimit of 30000)
modules C: hbase-server U: hbase-server
Console output https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-4675/10/console
versions git=2.17.1 maven=3.6.3 spotbugs=4.2.2
Powered by Apache Yetus 0.12.0 https://yetus.apache.org

This message was automatically generated.

@Apache-HBase

🎊 +1 overall

Vote Subsystem Runtime Comment
+0 🆗 reexec 0m 49s Docker mode activated.
-0 ⚠️ yetus 0m 3s Unprocessed flag(s): --brief-report-file --spotbugs-strict-precheck --whitespace-eol-ignore-list --whitespace-tabs-ignore-list --quick-hadoopcheck
_ Prechecks _
_ master Compile Tests _
+1 💚 mvninstall 2m 45s master passed
+1 💚 compile 0m 40s master passed
+1 💚 shadedjars 3m 56s branch has no errors when building our shaded downstream artifacts.
+1 💚 javadoc 0m 24s master passed
-0 ⚠️ patch 4m 28s Used diff version of patch file. Binary files and potentially other changes not applied. Please rebase and squash commits if necessary.
_ Patch Compile Tests _
+1 💚 mvninstall 2m 42s the patch passed
+1 💚 compile 0m 42s the patch passed
+1 💚 javac 0m 42s the patch passed
+1 💚 shadedjars 3m 54s patch has no errors when building our shaded downstream artifacts.
+1 💚 javadoc 0m 23s the patch passed
_ Other Tests _
+1 💚 unit 190m 48s hbase-server in the patch passed.
208m 46s
Subsystem Report/Notes
Docker ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-4675/10/artifact/yetus-jdk11-hadoop3-check/output/Dockerfile
GITHUB PR #4675
Optional Tests javac javadoc unit shadedjars compile
uname Linux 65ac2d0d2d03 5.4.0-1081-aws #88~18.04.1-Ubuntu SMP Thu Jun 23 16:29:17 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/hbase-personality.sh
git revision master / 9215066
Default Java AdoptOpenJDK-11.0.10+9
Test Results https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-4675/10/testReport/
Max. process+thread count 2665 (vs. ulimit of 30000)
modules C: hbase-server U: hbase-server
Console output https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-4675/10/console
versions git=2.17.1 maven=3.6.3
Powered by Apache Yetus 0.12.0 https://yetus.apache.org

This message was automatically generated.

@Apache-HBase

🎊 +1 overall

Vote Subsystem Runtime Comment
+0 🆗 reexec 0m 44s Docker mode activated.
-0 ⚠️ yetus 0m 2s Unprocessed flag(s): --brief-report-file --spotbugs-strict-precheck --whitespace-eol-ignore-list --whitespace-tabs-ignore-list --quick-hadoopcheck
_ Prechecks _
_ master Compile Tests _
+1 💚 mvninstall 2m 8s master passed
+1 💚 compile 0m 35s master passed
+1 💚 shadedjars 4m 0s branch has no errors when building our shaded downstream artifacts.
+1 💚 javadoc 0m 21s master passed
-0 ⚠️ patch 4m 28s Used diff version of patch file. Binary files and potentially other changes not applied. Please rebase and squash commits if necessary.
_ Patch Compile Tests _
+1 💚 mvninstall 2m 7s the patch passed
+1 💚 compile 0m 35s the patch passed
+1 💚 javac 0m 35s the patch passed
+1 💚 shadedjars 3m 59s patch has no errors when building our shaded downstream artifacts.
+1 💚 javadoc 0m 21s the patch passed
_ Other Tests _
+1 💚 unit 202m 38s hbase-server in the patch passed.
218m 42s
Subsystem Report/Notes
Docker ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-4675/10/artifact/yetus-jdk8-hadoop3-check/output/Dockerfile
GITHUB PR #4675
Optional Tests javac javadoc unit shadedjars compile
uname Linux e3c337c64bd5 5.4.0-1081-aws #88~18.04.1-Ubuntu SMP Thu Jun 23 16:29:17 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/hbase-personality.sh
git revision master / 9215066
Default Java AdoptOpenJDK-1.8.0_282-b08
Test Results https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-4675/10/testReport/
Max. process+thread count 2708 (vs. ulimit of 30000)
modules C: hbase-server U: hbase-server
Console output https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-4675/10/console
versions git=2.17.1 maven=3.6.3
Powered by Apache Yetus 0.12.0 https://yetus.apache.org

This message was automatically generated.

@wchevreuil wchevreuil merged commit eaa47c5 into apache:master Aug 15, 2022
wchevreuil added a commit that referenced this pull request Aug 15, 2022
…locks during hfile writes (#4675)

Signed-off-by: Tak Lon (Stephen) Wu <taklwu@apache.org>
Signed-off-by: Ankit Singhal <ankit@apache.org>
vinayakphegde pushed a commit to vinayakphegde/hbase that referenced this pull request Apr 4, 2024
…locks during hfile writes (apache#4675)

Signed-off-by: Tak Lon (Stephen) Wu <taklwu@apache.org>
Signed-off-by: Ankit Singhal <ankit@apache.org>

Change-Id: I0d969c83f9bb6a2a8d24e60e6b76b412b659e2b0
6 participants