HBASE-25929 RegionServer JVM crash when compaction #3318

Merged
merged 1 commit into from
Jun 3, 2021

@mymeiyi mymeiyi commented May 27, 2021

No description provided.

@Apache-HBase

🎊 +1 overall

Vote Subsystem Runtime Comment
+0 🆗 reexec 1m 34s Docker mode activated.
_ Prechecks _
+1 💚 dupname 0m 0s No case conflicting files found.
+1 💚 hbaseanti 0m 0s Patch does not have any anti-patterns.
+1 💚 @author 0m 0s The patch does not contain any @author tags.
_ master Compile Tests _
+1 💚 mvninstall 4m 7s master passed
+1 💚 compile 3m 21s master passed
+1 💚 checkstyle 1m 5s master passed
+1 💚 spotbugs 2m 7s master passed
_ Patch Compile Tests _
+1 💚 mvninstall 3m 40s the patch passed
+1 💚 compile 3m 14s the patch passed
+1 💚 javac 3m 14s the patch passed
+1 💚 checkstyle 1m 1s the patch passed
+1 💚 whitespace 0m 0s The patch has no whitespace issues.
+1 💚 hadoopcheck 20m 49s Patch does not cause any errors with Hadoop 3.1.2 3.2.1 3.3.0.
+1 💚 spotbugs 2m 49s the patch passed
_ Other Tests _
+1 💚 asflicense 0m 15s The patch does not generate ASF License warnings.
53m 25s
Subsystem Report/Notes
Docker ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-3318/1/artifact/yetus-general-check/output/Dockerfile
GITHUB PR #3318
Optional Tests dupname asflicense javac spotbugs hadoopcheck hbaseanti checkstyle compile
uname Linux a07eb9aebbed 4.15.0-65-generic #74-Ubuntu SMP Tue Sep 17 17:06:04 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/hbase-personality.sh
git revision master / a22e418
Default Java AdoptOpenJDK-1.8.0_282-b08
Max. process+thread count 96 (vs. ulimit of 30000)
modules C: hbase-server U: hbase-server
Console output https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-3318/1/console
versions git=2.17.1 maven=3.6.3 spotbugs=4.2.2
Powered by Apache Yetus 0.12.0 https://yetus.apache.org

This message was automatically generated.

@Apache9
Contributor

Apache9 commented May 27, 2021

Mind explaining a bit about the fix?

@Apache-HBase

💔 -1 overall

Vote Subsystem Runtime Comment
+0 🆗 reexec 1m 31s Docker mode activated.
-0 ⚠️ yetus 0m 3s Unprocessed flag(s): --brief-report-file --spotbugs-strict-precheck --whitespace-eol-ignore-list --whitespace-tabs-ignore-list --quick-hadoopcheck
_ Prechecks _
_ master Compile Tests _
+1 💚 mvninstall 4m 7s master passed
+1 💚 compile 1m 3s master passed
+1 💚 shadedjars 8m 11s branch has no errors when building our shaded downstream artifacts.
+1 💚 javadoc 0m 40s master passed
_ Patch Compile Tests _
+1 💚 mvninstall 3m 47s the patch passed
+1 💚 compile 1m 1s the patch passed
+1 💚 javac 1m 1s the patch passed
+1 💚 shadedjars 8m 9s patch has no errors when building our shaded downstream artifacts.
+1 💚 javadoc 0m 37s the patch passed
_ Other Tests _
-1 ❌ unit 153m 56s hbase-server in the patch failed.
185m 12s
Subsystem Report/Notes
Docker ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-3318/1/artifact/yetus-jdk8-hadoop3-check/output/Dockerfile
GITHUB PR #3318
Optional Tests javac javadoc unit shadedjars compile
uname Linux 8ab55bf9860a 4.15.0-65-generic #74-Ubuntu SMP Tue Sep 17 17:06:04 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/hbase-personality.sh
git revision master / a22e418
Default Java AdoptOpenJDK-1.8.0_282-b08
unit https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-3318/1/artifact/yetus-jdk8-hadoop3-check/output/patch-unit-hbase-server.txt
Test Results https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-3318/1/testReport/
Max. process+thread count 3869 (vs. ulimit of 30000)
modules C: hbase-server U: hbase-server
Console output https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-3318/1/console
versions git=2.17.1 maven=3.6.3
Powered by Apache Yetus 0.12.0 https://yetus.apache.org

This message was automatically generated.

Contributor Author

@mymeiyi mymeiyi left a comment

Mind explaining a bit about the fix?

Sure, it is not easy to explain. Let me draw a simple picture to show how this error could happen.

// may clear prevBlocks list.
kvs.shipped();
bytesWrittenProgressForShippedCall = 0;
}
Contributor Author

Moved this out of the for loop: kvs.shipped() releases the previous blocks held by the HFile reader, but some cells in the current batch may not yet have been written by writer.append(c) at line 441.
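The failure mode described above can be sketched with a toy model. This is only an illustration, not the HBase code: `ToyBlock`, `ToyCell`, and `ShippedPlacementDemo` are hypothetical names, `append` and `shipped` are stand-ins for `writer.append(c)` and `kvs.shipped()`, and the model assumes every cell in the batch is a view into one block that the scanner has already moved past.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy block of pooled memory that can be released (hypothetical, not an HBase class). */
final class ToyBlock {
  private boolean released;
  void release() { released = true; }
  boolean isReleased() { return released; }
}

/** Toy cell that is only a view into its backing block. */
final class ToyCell {
  final ToyBlock block;
  final int size;
  ToyCell(ToyBlock block, int size) { this.block = block; this.size = size; }
}

final class ShippedPlacementDemo {
  /** Stand-in for writer.append(c): touching a released block models the crash. */
  static void append(ToyCell c) {
    if (c.block.isReleased()) {
      throw new IllegalStateException("cell read after its block was released");
    }
  }

  /** Stand-in for kvs.shipped(): releases every block the batch refers to. */
  static void shipped(List<ToyCell> batch) {
    for (ToyCell c : batch) {
      c.block.release();
    }
  }

  /** Returns true if the whole batch was written without touching released memory. */
  static boolean writeBatch(List<ToyCell> batch, long limit, boolean shipInsideLoop) {
    long written = 0;
    try {
      for (ToyCell c : batch) {
        append(c);
        written += c.size;
        // Old placement: the check runs while later cells are still unwritten.
        if (shipInsideLoop && written > limit) {
          shipped(batch);
          written = 0;
        }
      }
      // New placement: the check runs only after every cell has been written.
      if (!shipInsideLoop && written > limit) {
        shipped(batch);
      }
      return true;
    } catch (IllegalStateException e) {
      return false;
    }
  }

  /** Three 10-byte cells that all point into the same block. */
  static List<ToyCell> newBatch() {
    ToyBlock block = new ToyBlock();
    List<ToyCell> batch = new ArrayList<>();
    for (int i = 0; i < 3; i++) {
      batch.add(new ToyCell(block, 10));
    }
    return batch;
  }
}
```

With a limit of 5 bytes, `writeBatch(newBatch(), 5, true)` fails on the second cell, while `writeBatch(newBatch(), 5, false)` writes all three cells before releasing anything.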

import org.junit.rules.TestName;

@Category(LargeTests.class)
public class TestCompactionWithByteBuff {
Contributor

Does this test guarantee that the early release of the BB happens, so that it can actually hit the issue? I don't think so. Maybe I am missing something!

conf.setInt(ByteBuffAllocator.BUFFER_SIZE_KEY, 1024 * 5);
conf.setInt(CompactSplit.SMALL_COMPACTION_THREADS, REGION_COUNT * 2);
conf.setInt(CompactSplit.LARGE_COMPACTION_THREADS, REGION_COUNT * 2);
conf.set(HConstants.BUCKET_CACHE_IOENGINE_KEY, "offheap");
Contributor

Is the Bucket Cache (BC) actually coming into the picture here? I think not. We write some data and flush, so two files are created, and then we compact them. What we want is for the compaction to read the file blocks from the BC, but cache-on-write is false by default (hbase.rs.cacheblocksonwrite).

Contributor Author

It is not related to the BC.
The IOEngine is set to offheap to make sure the blocks are read into an off-heap ByteBuffer; see HFileReaderImpl#shouldUseHeap.

@@ -451,25 +451,25 @@ protected boolean performCompaction(FileDetails fd, InternalScanner scanner, Cel
progress.cancel();
return false;
}
if (kvs != null && bytesWrittenProgressForShippedCall > shippedCallSizeLimit) {
Contributor

nit: This seems to be only a format change. Can you avoid it?

Contributor Author

It is not a format change. I moved this code out of the for loop.

@Apache-HBase

💔 -1 overall

Vote Subsystem Runtime Comment
+0 🆗 reexec 14m 33s Docker mode activated.
-0 ⚠️ yetus 0m 3s Unprocessed flag(s): --brief-report-file --spotbugs-strict-precheck --whitespace-eol-ignore-list --whitespace-tabs-ignore-list --quick-hadoopcheck
_ Prechecks _
_ master Compile Tests _
+1 💚 mvninstall 6m 54s master passed
+1 💚 compile 1m 51s master passed
+1 💚 shadedjars 11m 35s branch has no errors when building our shaded downstream artifacts.
+1 💚 javadoc 1m 2s master passed
_ Patch Compile Tests _
+1 💚 mvninstall 6m 36s the patch passed
+1 💚 compile 1m 40s the patch passed
+1 💚 javac 1m 40s the patch passed
+1 💚 shadedjars 9m 10s patch has no errors when building our shaded downstream artifacts.
+1 💚 javadoc 0m 41s the patch passed
_ Other Tests _
-1 ❌ unit 205m 28s hbase-server in the patch failed.
261m 41s
Subsystem Report/Notes
Docker ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-3318/1/artifact/yetus-jdk11-hadoop3-check/output/Dockerfile
GITHUB PR #3318
Optional Tests javac javadoc unit shadedjars compile
uname Linux 4e42f9079134 4.15.0-136-generic #140-Ubuntu SMP Thu Jan 28 05:20:47 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/hbase-personality.sh
git revision master / a22e418
Default Java AdoptOpenJDK-11.0.10+9
unit https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-3318/1/artifact/yetus-jdk11-hadoop3-check/output/patch-unit-hbase-server.txt
Test Results https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-3318/1/testReport/
Max. process+thread count 2686 (vs. ulimit of 30000)
modules C: hbase-server U: hbase-server
Console output https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-3318/1/console
versions git=2.17.1 maven=3.6.3
Powered by Apache Yetus 0.12.0 https://yetus.apache.org

This message was automatically generated.

@mymeiyi
Contributor Author

mymeiyi commented May 27, 2021

I wrote up a case showing how this error could happen in this doc: https://docs.google.com/document/d/1_3HXgOSGHsHFqLiOWUsE3m6pHKjebSwrRObcPTqkSTE/edit?usp=sharing. Please have a look, thanks @Apache9 @anoopsjohn

@Apache-HBase

💔 -1 overall

Vote Subsystem Runtime Comment
+0 🆗 reexec 1m 9s Docker mode activated.
-0 ⚠️ yetus 0m 3s Unprocessed flag(s): --brief-report-file --spotbugs-strict-precheck --whitespace-eol-ignore-list --whitespace-tabs-ignore-list --quick-hadoopcheck
_ Prechecks _
_ master Compile Tests _
+1 💚 mvninstall 5m 5s master passed
+1 💚 compile 1m 17s master passed
+1 💚 shadedjars 9m 6s branch has no errors when building our shaded downstream artifacts.
+1 💚 javadoc 0m 43s master passed
_ Patch Compile Tests _
+1 💚 mvninstall 4m 44s the patch passed
+1 💚 compile 1m 18s the patch passed
+1 💚 javac 1m 18s the patch passed
+1 💚 shadedjars 8m 58s patch has no errors when building our shaded downstream artifacts.
+1 💚 javadoc 0m 41s the patch passed
_ Other Tests _
-1 ❌ unit 12m 3s hbase-server in the patch failed.
46m 25s
Subsystem Report/Notes
Docker ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-3318/2/artifact/yetus-jdk11-hadoop3-check/output/Dockerfile
GITHUB PR #3318
Optional Tests javac javadoc unit shadedjars compile
uname Linux 6227e03e4dcc 4.15.0-136-generic #140-Ubuntu SMP Thu Jan 28 05:20:47 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/hbase-personality.sh
git revision master / 63141bf
Default Java AdoptOpenJDK-11.0.10+9
unit https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-3318/2/artifact/yetus-jdk11-hadoop3-check/output/patch-unit-hbase-server.txt
Test Results https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-3318/2/testReport/
Max. process+thread count 646 (vs. ulimit of 30000)
modules C: hbase-server U: hbase-server
Console output https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-3318/2/console
versions git=2.17.1 maven=3.6.3
Powered by Apache Yetus 0.12.0 https://yetus.apache.org

This message was automatically generated.

@Apache-HBase

💔 -1 overall

Vote Subsystem Runtime Comment
+0 🆗 reexec 1m 34s Docker mode activated.
-0 ⚠️ yetus 0m 3s Unprocessed flag(s): --brief-report-file --spotbugs-strict-precheck --whitespace-eol-ignore-list --whitespace-tabs-ignore-list --quick-hadoopcheck
_ Prechecks _
_ master Compile Tests _
+1 💚 mvninstall 5m 0s master passed
+1 💚 compile 1m 12s master passed
+1 💚 shadedjars 10m 29s branch has no errors when building our shaded downstream artifacts.
+1 💚 javadoc 0m 48s master passed
_ Patch Compile Tests _
+1 💚 mvninstall 5m 9s the patch passed
+1 💚 compile 1m 14s the patch passed
+1 💚 javac 1m 14s the patch passed
+1 💚 shadedjars 9m 46s patch has no errors when building our shaded downstream artifacts.
+1 💚 javadoc 0m 42s the patch passed
_ Other Tests _
-1 ❌ unit 13m 10s hbase-server in the patch failed.
50m 29s
Subsystem Report/Notes
Docker ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-3318/2/artifact/yetus-jdk8-hadoop3-check/output/Dockerfile
GITHUB PR #3318
Optional Tests javac javadoc unit shadedjars compile
uname Linux 593c58ccff3a 4.15.0-136-generic #140-Ubuntu SMP Thu Jan 28 05:20:47 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/hbase-personality.sh
git revision master / 63141bf
Default Java AdoptOpenJDK-1.8.0_282-b08
unit https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-3318/2/artifact/yetus-jdk8-hadoop3-check/output/patch-unit-hbase-server.txt
Test Results https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-3318/2/testReport/
Max. process+thread count 650 (vs. ulimit of 30000)
modules C: hbase-server U: hbase-server
Console output https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-3318/2/console
versions git=2.17.1 maven=3.6.3
Powered by Apache Yetus 0.12.0 https://yetus.apache.org

This message was automatically generated.

@Apache-HBase

🎊 +1 overall

Vote Subsystem Runtime Comment
+0 🆗 reexec 0m 38s Docker mode activated.
_ Prechecks _
+1 💚 dupname 0m 0s No case conflicting files found.
+1 💚 hbaseanti 0m 0s Patch does not have any anti-patterns.
+1 💚 @author 0m 0s The patch does not contain any @author tags.
_ master Compile Tests _
+1 💚 mvninstall 4m 29s master passed
+1 💚 compile 3m 39s master passed
+1 💚 checkstyle 1m 13s master passed
+1 💚 spotbugs 2m 18s master passed
_ Patch Compile Tests _
+1 💚 mvninstall 4m 16s the patch passed
+1 💚 compile 3m 28s the patch passed
+1 💚 javac 3m 28s the patch passed
+1 💚 checkstyle 1m 13s the patch passed
+1 💚 whitespace 0m 0s The patch has no whitespace issues.
+1 💚 hadoopcheck 20m 36s Patch does not cause any errors with Hadoop 3.1.2 3.2.1 3.3.0.
+1 💚 spotbugs 2m 37s the patch passed
_ Other Tests _
+1 💚 asflicense 0m 15s The patch does not generate ASF License warnings.
53m 48s
Subsystem Report/Notes
Docker ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-3318/2/artifact/yetus-general-check/output/Dockerfile
GITHUB PR #3318
Optional Tests dupname asflicense javac spotbugs hadoopcheck hbaseanti checkstyle compile
uname Linux 33bf26907363 4.15.0-112-generic #113-Ubuntu SMP Thu Jul 9 23:41:39 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/hbase-personality.sh
git revision master / 63141bf
Default Java AdoptOpenJDK-1.8.0_282-b08
Max. process+thread count 96 (vs. ulimit of 30000)
modules C: hbase-server U: hbase-server
Console output https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-3318/2/console
versions git=2.17.1 maven=3.6.3 spotbugs=4.2.2
Powered by Apache Yetus 0.12.0 https://yetus.apache.org

This message was automatically generated.

@Apache-HBase

🎊 +1 overall

Vote Subsystem Runtime Comment
+0 🆗 reexec 1m 27s Docker mode activated.
_ Prechecks _
+1 💚 dupname 0m 0s No case conflicting files found.
+1 💚 hbaseanti 0m 0s Patch does not have any anti-patterns.
+1 💚 @author 0m 0s The patch does not contain any @author tags.
_ master Compile Tests _
+1 💚 mvninstall 4m 7s master passed
+1 💚 compile 3m 36s master passed
+1 💚 checkstyle 1m 9s master passed
+1 💚 spotbugs 2m 9s master passed
_ Patch Compile Tests _
+1 💚 mvninstall 3m 54s the patch passed
+1 💚 compile 3m 34s the patch passed
+1 💚 javac 3m 34s the patch passed
+1 💚 checkstyle 1m 9s the patch passed
+1 💚 whitespace 0m 0s The patch has no whitespace issues.
+1 💚 hadoopcheck 20m 21s Patch does not cause any errors with Hadoop 3.1.2 3.2.1 3.3.0.
+1 💚 spotbugs 2m 17s the patch passed
_ Other Tests _
+1 💚 asflicense 0m 13s The patch does not generate ASF License warnings.
52m 7s
Subsystem Report/Notes
Docker ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-3318/3/artifact/yetus-general-check/output/Dockerfile
GITHUB PR #3318
Optional Tests dupname asflicense javac spotbugs hadoopcheck hbaseanti checkstyle compile
uname Linux 6011e13b8a6c 4.15.0-112-generic #113-Ubuntu SMP Thu Jul 9 23:41:39 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/hbase-personality.sh
git revision master / 63141bf
Default Java AdoptOpenJDK-1.8.0_282-b08
Max. process+thread count 96 (vs. ulimit of 30000)
modules C: hbase-server U: hbase-server
Console output https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-3318/3/console
versions git=2.17.1 maven=3.6.3 spotbugs=4.2.2
Powered by Apache Yetus 0.12.0 https://yetus.apache.org

This message was automatically generated.

@Apache-HBase

💔 -1 overall

Vote Subsystem Runtime Comment
+0 🆗 reexec 1m 44s Docker mode activated.
-0 ⚠️ yetus 0m 3s Unprocessed flag(s): --brief-report-file --spotbugs-strict-precheck --whitespace-eol-ignore-list --whitespace-tabs-ignore-list --quick-hadoopcheck
_ Prechecks _
_ master Compile Tests _
+1 💚 mvninstall 4m 33s master passed
+1 💚 compile 1m 14s master passed
+1 💚 shadedjars 10m 4s branch has no errors when building our shaded downstream artifacts.
+1 💚 javadoc 0m 41s master passed
_ Patch Compile Tests _
+1 💚 mvninstall 4m 29s the patch passed
+1 💚 compile 1m 18s the patch passed
+1 💚 javac 1m 18s the patch passed
+1 💚 shadedjars 9m 55s patch has no errors when building our shaded downstream artifacts.
+1 💚 javadoc 0m 40s the patch passed
_ Other Tests _
-1 ❌ unit 215m 28s hbase-server in the patch failed.
252m 9s
Subsystem Report/Notes
Docker ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-3318/3/artifact/yetus-jdk8-hadoop3-check/output/Dockerfile
GITHUB PR #3318
Optional Tests javac javadoc unit shadedjars compile
uname Linux 6a3daed19729 4.15.0-136-generic #140-Ubuntu SMP Thu Jan 28 05:20:47 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/hbase-personality.sh
git revision master / 63141bf
Default Java AdoptOpenJDK-1.8.0_282-b08
unit https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-3318/3/artifact/yetus-jdk8-hadoop3-check/output/patch-unit-hbase-server.txt
Test Results https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-3318/3/testReport/
Max. process+thread count 2813 (vs. ulimit of 30000)
modules C: hbase-server U: hbase-server
Console output https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-3318/3/console
versions git=2.17.1 maven=3.6.3
Powered by Apache Yetus 0.12.0 https://yetus.apache.org

This message was automatically generated.

@Apache-HBase

💔 -1 overall

Vote Subsystem Runtime Comment
+0 🆗 reexec 1m 24s Docker mode activated.
-0 ⚠️ yetus 0m 3s Unprocessed flag(s): --brief-report-file --spotbugs-strict-precheck --whitespace-eol-ignore-list --whitespace-tabs-ignore-list --quick-hadoopcheck
_ Prechecks _
_ master Compile Tests _
+1 💚 mvninstall 5m 11s master passed
+1 💚 compile 1m 21s master passed
+1 💚 shadedjars 9m 13s branch has no errors when building our shaded downstream artifacts.
+1 💚 javadoc 0m 46s master passed
_ Patch Compile Tests _
+1 💚 mvninstall 5m 8s the patch passed
+1 💚 compile 1m 20s the patch passed
+1 💚 javac 1m 20s the patch passed
+1 💚 shadedjars 9m 18s patch has no errors when building our shaded downstream artifacts.
+1 💚 javadoc 0m 46s the patch passed
_ Other Tests _
-1 ❌ unit 240m 57s hbase-server in the patch failed.
277m 23s
Subsystem Report/Notes
Docker ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-3318/3/artifact/yetus-jdk11-hadoop3-check/output/Dockerfile
GITHUB PR #3318
Optional Tests javac javadoc unit shadedjars compile
uname Linux 8f0dee8df4f1 4.15.0-136-generic #140-Ubuntu SMP Thu Jan 28 05:20:47 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/hbase-personality.sh
git revision master / 63141bf
Default Java AdoptOpenJDK-11.0.10+9
unit https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-3318/3/artifact/yetus-jdk11-hadoop3-check/output/patch-unit-hbase-server.txt
Test Results https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-3318/3/testReport/
Max. process+thread count 2712 (vs. ulimit of 30000)
modules C: hbase-server U: hbase-server
Console output https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-3318/3/console
versions git=2.17.1 maven=3.6.3
Powered by Apache Yetus 0.12.0 https://yetus.apache.org

This message was automatically generated.

@anoopsjohn
Contributor

I wrote up a case showing how this error could happen in this doc: https://docs.google.com/document/d/1_3HXgOSGHsHFqLiOWUsE3m6pHKjebSwrRObcPTqkSTE/edit?usp=sharing. Please have a look, thanks @Apache9 @anoopsjohn

The fix is simple, move the shipped method out of the for loop

Yes, the issue is very clear to me. Elsewhere in the code we clone the cells for which we keep a reference; it seems that was missed in this place. BTW, in your first patch the clone of the Cell was there, but it is removed now? And instead you moved the shipped call out of that for loop?

Contributor Author

@mymeiyi mymeiyi left a comment

BTW, in your first patch the clone of the Cell was there, but it is removed now? And instead you moved the shipped call out of that for loop?

Thanks for your reply.
In the first patch, I modified two places:

  1. move the shipped call to avoid the RS crash in the StoreFileWriter.append method;
  2. copy the first cell in the block to avoid the RS crash in the HFileWriterImpl.getMidpoint method.

But I found that, with fix 1, the first cell is always copied by ((ShipperListener) writer).beforeShipped() before kvs.shipped() is called to release the read blocks, so problem 2 cannot happen anymore. That is why I removed fix 2.
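The ordering argument above (copy what the writer still references, then release the blocks) can be sketched with a toy model. It is not the HBase implementation: `PooledBlock`, `ViewCell`, `RetainingWriter`, and `BeforeShippedDemo` are hypothetical names, and setting `released = true` is only a stand-in for `kvs.shipped()`.

```java
/** Toy block whose contents become unreadable once released (hypothetical). */
final class PooledBlock {
  boolean released;
}

/** Toy cell: a view into a pooled block, or an on-heap copy when block is null. */
final class ViewCell {
  final PooledBlock block;
  final String value;

  ViewCell(PooledBlock block, String value) {
    this.block = block;
    this.value = value;
  }

  String read() {
    if (block != null && block.released) {
      throw new IllegalStateException("read after release");
    }
    return value;
  }

  /** Detaches the cell from its block by copying the value. */
  ViewCell copyToHeap() {
    return new ViewCell(null, read());
  }
}

/** Toy writer that keeps a reference to the last appended cell. */
final class RetainingWriter {
  ViewCell retained;

  void append(ViewCell c) {
    c.read();
    retained = c;
  }

  /** Copies anything still referenced so it survives the upcoming release. */
  void beforeShipped() {
    if (retained != null) {
      retained = retained.copyToHeap();
    }
  }
}

final class BeforeShippedDemo {
  /** Writes one cell, optionally calls beforeShipped(), releases the block, then re-reads. */
  static String retainedAfterRelease(boolean callBeforeShipped) {
    PooledBlock block = new PooledBlock();
    RetainingWriter writer = new RetainingWriter();
    writer.append(new ViewCell(block, "row-1"));
    if (callBeforeShipped) {
      writer.beforeShipped();
    }
    block.released = true; // stand-in for kvs.shipped()
    try {
      return writer.retained.read();
    } catch (IllegalStateException e) {
      return "invalid";
    }
  }
}
```

In this sketch the retained cell stays readable only when `beforeShipped()` runs before the release, which mirrors why the separate copy of the first cell became unnecessary once the shipped call was moved.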

@mymeiyi
Contributor Author

mymeiyi commented Jun 1, 2021

Are there any other problems with this issue? @Apache9 @anoopsjohn

@anoopsjohn
Contributor

So the latest version moves the check for a possible shipped() call out of the for loop. The downside is that when we have wide rows with many cells, this delays the release of blocks, and so their release from the cache. That is why the check was kept inside the for loop.
But the bug is also a big concern.
Now looking at why and when we use 'lastCleanCell': it is used to reset the seqId on a cell. We set the seqId to 0 when we pass the cell to the writer (the output writer for the new compacted file(s)), but we then try to reset that value on the original cell (see details in HBASE-16931).
PrivateCellUtil.setSequenceId(lastCleanCell, lastCleanCellSeqId) is called inside the for loop before the shipped call, as well as outside the for loop. If the call happened only within the for loop, there would be no chance of corruption. But as it stands, the call can happen on the SAME cell object both inside and outside the loop.
So if we keep the code as is (not moving the shipped call out of the for loop) and do something like
if (kvs != null && bytesWrittenProgressForShippedCall > shippedCallSizeLimit) {
  if (lastCleanCell != null) {
    // HBASE-16931, set back sequence id to avoid affecting scan order unexpectedly.
    // ShipperListener will do a clone of the last cells it refer, so need to set back
    // sequence id before ShipperListener.beforeShipped
    PrivateCellUtil.setSequenceId(lastCleanCell, lastCleanCellSeqId);
    lastCleanCell = null; // The reset of the seqId for this cell object happened already. Just nullify it
  }
we won't run into any issue. Correct? It seems that would be the best way?

@mymeiyi
Contributor Author

mymeiyi commented Jun 3, 2021

So the latest version moves the check for a possible shipped() call out of the for loop. The downside is that when we have wide rows with many cells, this delays the release of blocks, and so their release from the cache. That is why the check was kept inside the for loop.

Yes, the block release may be delayed. The configuration "hbase.hstore.compaction.kv.max" (default value 10) limits the number of cells scanned per batch during compaction. But if a single cell is huge, we cannot avoid the delay.
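The trade-off above can be illustrated with a rough sketch. It assumes the shipped check runs once per batch of at most `kvMax` cells and that all cells have the same size; `ShippedDelayDemo` and its parameters are hypothetical, and `limit` merely plays the role of the size threshold that triggers a shipped call.

```java
final class ShippedDelayDemo {
  /**
   * Returns the largest number of bytes that accumulate before a shipped call
   * when the size check runs only at the end of each batch of kvMax cells.
   */
  static long maxBytesBeforeShipped(int totalCells, int cellSize, int kvMax, long limit) {
    long pending = 0;
    long worst = 0;
    int inBatch = 0;
    for (int i = 0; i < totalCells; i++) {
      pending += cellSize;
      inBatch++;
      boolean batchEnd = inBatch == kvMax || i == totalCells - 1;
      if (batchEnd) {
        inBatch = 0;
        // The check happens only here, after the whole batch is written.
        if (pending > limit) {
          worst = Math.max(worst, pending);
          pending = 0;
        }
      }
    }
    return worst;
  }
}
```

With small cells the batch size keeps the overshoot small: 100 cells of 10 bytes, a batch of 10 and a limit of 500 bytes give a worst case of 600 bytes. With 1,000,000-byte cells and a 5,000,000-byte limit, the same batch size lets 10,000,000 bytes accumulate before the release, which is the delay for huge cells mentioned above.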

@Apache-HBase

🎊 +1 overall

Vote Subsystem Runtime Comment
+0 🆗 reexec 1m 44s Docker mode activated.
_ Prechecks _
+1 💚 dupname 0m 0s No case conflicting files found.
+1 💚 hbaseanti 0m 0s Patch does not have any anti-patterns.
+1 💚 @author 0m 0s The patch does not contain any @author tags.
_ master Compile Tests _
+1 💚 mvninstall 4m 4s master passed
+1 💚 compile 3m 20s master passed
+1 💚 checkstyle 1m 3s master passed
+1 💚 spotbugs 2m 8s master passed
_ Patch Compile Tests _
+1 💚 mvninstall 3m 39s the patch passed
+1 💚 compile 3m 8s the patch passed
+1 💚 javac 3m 8s the patch passed
+1 💚 checkstyle 1m 3s the patch passed
+1 💚 whitespace 0m 0s The patch has no whitespace issues.
+1 💚 hadoopcheck 20m 42s Patch does not cause any errors with Hadoop 3.1.2 3.2.1 3.3.0.
+1 💚 spotbugs 2m 51s the patch passed
_ Other Tests _
+1 💚 asflicense 0m 17s The patch does not generate ASF License warnings.
53m 21s
Subsystem Report/Notes
Docker ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-3318/4/artifact/yetus-general-check/output/Dockerfile
GITHUB PR #3318
Optional Tests dupname asflicense javac spotbugs hadoopcheck hbaseanti checkstyle compile
uname Linux 41ba0197825e 4.15.0-65-generic #74-Ubuntu SMP Tue Sep 17 17:06:04 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/hbase-personality.sh
git revision master / 335305e
Default Java AdoptOpenJDK-1.8.0_282-b08
Max. process+thread count 96 (vs. ulimit of 30000)
modules C: hbase-server U: hbase-server
Console output https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-3318/4/console
versions git=2.17.1 maven=3.6.3 spotbugs=4.2.2
Powered by Apache Yetus 0.12.0 https://yetus.apache.org

This message was automatically generated.

@Apache-HBase

🎊 +1 overall

Vote Subsystem Runtime Comment
+0 🆗 reexec 1m 40s Docker mode activated.
-0 ⚠️ yetus 0m 3s Unprocessed flag(s): --brief-report-file --spotbugs-strict-precheck --whitespace-eol-ignore-list --whitespace-tabs-ignore-list --quick-hadoopcheck
_ Prechecks _
_ master Compile Tests _
+1 💚 mvninstall 4m 3s master passed
+1 💚 compile 1m 4s master passed
+1 💚 shadedjars 8m 22s branch has no errors when building our shaded downstream artifacts.
+1 💚 javadoc 0m 40s master passed
_ Patch Compile Tests _
+1 💚 mvninstall 3m 43s the patch passed
+1 💚 compile 1m 1s the patch passed
+1 💚 javac 1m 1s the patch passed
+1 💚 shadedjars 8m 15s patch has no errors when building our shaded downstream artifacts.
+1 💚 javadoc 0m 37s the patch passed
_ Other Tests _
+1 💚 unit 152m 14s hbase-server in the patch passed.
183m 48s
Subsystem Report/Notes
Docker ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-3318/4/artifact/yetus-jdk8-hadoop3-check/output/Dockerfile
GITHUB PR #3318
Optional Tests javac javadoc unit shadedjars compile
uname Linux 12cbad7d7ee3 4.15.0-65-generic #74-Ubuntu SMP Tue Sep 17 17:06:04 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/hbase-personality.sh
git revision master / 335305e
Default Java AdoptOpenJDK-1.8.0_282-b08
Test Results https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-3318/4/testReport/
Max. process+thread count 4636 (vs. ulimit of 30000)
modules C: hbase-server U: hbase-server
Console output https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-3318/4/console
versions git=2.17.1 maven=3.6.3
Powered by Apache Yetus 0.12.0 https://yetus.apache.org

This message was automatically generated.

@Apache9
Contributor

Apache9 commented Jun 3, 2021

Hi Anoop, after reviewing the code, I do not think the problem is lastCleanCell.

The problem here is that, after calling scanner.next(cells, scannerContext), we write these cells out to the writer. These cells reference HFileBlocks inside the kvs, and after calling kvs.shipped() all of these cells become invalid, because we release all the HFileBlocks they reference.

So we cannot call kvs.shipped inside the loop. The writer.beforeShipped call only clones the cells which have been written to the writer but not yet flushed out by it. But kvs.shipped also releases the HFileBlocks for the cells which have not been written to the writer yet, so calling writer.beforeShipped does not help here: in the next loop iteration we would write an invalid cell to the writer and crash the regionserver.

@mymeiyi Do I understand correctly?

Thanks.

@anoopsjohn
Contributor

Thanks Duo. Yes, I got it. I was looking at that lastCleanCell reference because it was what was initially referred to. It is very clear to me from @mymeiyi's reply. Yes, the change looks good.

@mymeiyi
Contributor Author

mymeiyi commented Jun 3, 2021

Hi Anoop, after reviewing the code, I do not think the problem is lastCleanCell.

The problem here is that, after calling scanner.next(cells, scannerContext), we write these cells out to the writer. These cells reference HFileBlocks inside the kvs, and after calling kvs.shipped() all of these cells become invalid, because we release all the HFileBlocks they reference.

So we cannot call kvs.shipped inside the loop. The writer.beforeShipped call only clones the cells which have been written to the writer but not yet flushed out by it. But kvs.shipped also releases the HFileBlocks for the cells which have not been written to the writer yet, so calling writer.beforeShipped does not help here: in the next loop iteration we would write an invalid cell to the writer and crash the regionserver.

@mymeiyi Do I understand correctly?

Thanks.

Yes, this explanation is very clear.

Contributor

@Apache9 Apache9 left a comment

+1, nice catch!

@Apache-HBase

🎊 +1 overall

| Vote | Subsystem | Runtime | Comment |
| --- | --- | --- | --- |
| +0 🆗 | reexec | 1m 45s | Docker mode activated. |
| | | | _ Prechecks _ |
| +1 💚 | dupname | 0m 0s | No case conflicting files found. |
| +1 💚 | hbaseanti | 0m 0s | Patch does not have any anti-patterns. |
| +1 💚 | @author | 0m 0s | The patch does not contain any @author tags. |
| | | | _ master Compile Tests _ |
| +1 💚 | mvninstall | 3m 43s | master passed |
| +1 💚 | compile | 3m 17s | master passed |
| +1 💚 | checkstyle | 1m 1s | master passed |
| +1 💚 | spotbugs | 2m 4s | master passed |
| | | | _ Patch Compile Tests _ |
| +1 💚 | mvninstall | 3m 43s | the patch passed |
| +1 💚 | compile | 3m 19s | the patch passed |
| +1 💚 | javac | 3m 19s | the patch passed |
| +1 💚 | checkstyle | 1m 2s | the patch passed |
| +1 💚 | whitespace | 0m 0s | The patch has no whitespace issues. |
| +1 💚 | hadoopcheck | 20m 32s | Patch does not cause any errors with Hadoop 3.1.2 3.2.1 3.3.0. |
| +1 💚 | spotbugs | 2m 46s | the patch passed |
| | | | _ Other Tests _ |
| +1 💚 | asflicense | 0m 13s | The patch does not generate ASF License warnings. |
| | | 52m 53s | |

| Subsystem | Report/Notes |
| --- | --- |
| Docker | ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-3318/5/artifact/yetus-general-check/output/Dockerfile |
| GITHUB PR | #3318 |
| Optional Tests | dupname asflicense javac spotbugs hadoopcheck hbaseanti checkstyle compile |
| uname | Linux 8093e724d6e6 4.15.0-65-generic #74-Ubuntu SMP Tue Sep 17 17:06:04 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | dev-support/hbase-personality.sh |
| git revision | master / 426c3c1 |
| Default Java | AdoptOpenJDK-1.8.0_282-b08 |
| Max. process+thread count | 96 (vs. ulimit of 30000) |
| modules | C: hbase-server U: hbase-server |
| Console output | https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-3318/5/console |
| versions | git=2.17.1 maven=3.6.3 spotbugs=4.2.2 |
| Powered by | Apache Yetus 0.12.0 https://yetus.apache.org |

This message was automatically generated.

@mymeiyi mymeiyi merged commit 4671cb1 into apache:master Jun 3, 2021
mymeiyi added a commit that referenced this pull request Jun 3, 2021
Signed-off-by: Duo Zhang <zhangduo@apache.org>
mymeiyi added a commit that referenced this pull request Jun 3, 2021
Signed-off-by: Duo Zhang <zhangduo@apache.org>
mymeiyi added a commit that referenced this pull request Jun 3, 2021
Signed-off-by: Duo Zhang <zhangduo@apache.org>
@Apache-HBase

🎊 +1 overall

| Vote | Subsystem | Runtime | Comment |
| --- | --- | --- | --- |
| +0 🆗 | reexec | 1m 43s | Docker mode activated. |
| -0 ⚠️ | yetus | 0m 3s | Unprocessed flag(s): --brief-report-file --spotbugs-strict-precheck --whitespace-eol-ignore-list --whitespace-tabs-ignore-list --quick-hadoopcheck |
| | | | _ Prechecks _ |
| | | | _ master Compile Tests _ |
| +1 💚 | mvninstall | 3m 46s | master passed |
| +1 💚 | compile | 1m 2s | master passed |
| +1 💚 | shadedjars | 8m 15s | branch has no errors when building our shaded downstream artifacts. |
| +1 💚 | javadoc | 0m 37s | master passed |
| | | | _ Patch Compile Tests _ |
| +1 💚 | mvninstall | 3m 55s | the patch passed |
| +1 💚 | compile | 0m 59s | the patch passed |
| +1 💚 | javac | 0m 59s | the patch passed |
| +1 💚 | shadedjars | 8m 17s | patch has no errors when building our shaded downstream artifacts. |
| +1 💚 | javadoc | 0m 38s | the patch passed |
| | | | _ Other Tests _ |
| +1 💚 | unit | 151m 17s | hbase-server in the patch passed. |
| | | 182m 47s | |

| Subsystem | Report/Notes |
| --- | --- |
| Docker | ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-3318/5/artifact/yetus-jdk8-hadoop3-check/output/Dockerfile |
| GITHUB PR | #3318 |
| Optional Tests | javac javadoc unit shadedjars compile |
| uname | Linux 3f0ca3f7c040 4.15.0-65-generic #74-Ubuntu SMP Tue Sep 17 17:06:04 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | dev-support/hbase-personality.sh |
| git revision | master / 426c3c1 |
| Default Java | AdoptOpenJDK-1.8.0_282-b08 |
| Test Results | https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-3318/5/testReport/ |
| Max. process+thread count | 4256 (vs. ulimit of 30000) |
| modules | C: hbase-server U: hbase-server |
| Console output | https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-3318/5/console |
| versions | git=2.17.1 maven=3.6.3 |
| Powered by | Apache Yetus 0.12.0 https://yetus.apache.org |

This message was automatically generated.

@Apache-HBase

🎊 +1 overall

| Vote | Subsystem | Runtime | Comment |
| --- | --- | --- | --- |
| +0 🆗 | reexec | 1m 26s | Docker mode activated. |
| -0 ⚠️ | yetus | 0m 2s | Unprocessed flag(s): --brief-report-file --spotbugs-strict-precheck --whitespace-eol-ignore-list --whitespace-tabs-ignore-list --quick-hadoopcheck |
| | | | _ Prechecks _ |
| | | | _ master Compile Tests _ |
| +1 💚 | mvninstall | 5m 39s | master passed |
| +1 💚 | compile | 1m 40s | master passed |
| +1 💚 | shadedjars | 10m 37s | branch has no errors when building our shaded downstream artifacts. |
| +1 💚 | javadoc | 0m 49s | master passed |
| | | | _ Patch Compile Tests _ |
| +1 💚 | mvninstall | 5m 42s | the patch passed |
| +1 💚 | compile | 1m 34s | the patch passed |
| +1 💚 | javac | 1m 34s | the patch passed |
| +1 💚 | shadedjars | 9m 46s | patch has no errors when building our shaded downstream artifacts. |
| +1 💚 | javadoc | 0m 46s | the patch passed |
| | | | _ Other Tests _ |
| +1 💚 | unit | 228m 42s | hbase-server in the patch passed. |
| | | 268m 40s | |

| Subsystem | Report/Notes |
| --- | --- |
| Docker | ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-3318/5/artifact/yetus-jdk11-hadoop3-check/output/Dockerfile |
| GITHUB PR | #3318 |
| Optional Tests | javac javadoc unit shadedjars compile |
| uname | Linux 93428b748912 4.15.0-136-generic #140-Ubuntu SMP Thu Jan 28 05:20:47 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | dev-support/hbase-personality.sh |
| git revision | master / 426c3c1 |
| Default Java | AdoptOpenJDK-11.0.10+9 |
| Test Results | https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-3318/5/testReport/ |
| Max. process+thread count | 3006 (vs. ulimit of 30000) |
| modules | C: hbase-server U: hbase-server |
| Console output | https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-3318/5/console |
| versions | git=2.17.1 maven=3.6.3 |
| Powered by | Apache Yetus 0.12.0 https://yetus.apache.org |

This message was automatically generated.

4 participants