HBASE-23601: OutputSink.WriterThread exception gets stuck and repeated indefinietly #956

Merged
merged 4 commits into from Jan 9, 2020

BukrosSzabolcs
Contributor

clear exception after logged
try to restart writer threads if needed

Contributor

@wchevreuil left a comment

Is it possible to add a test that reproduces the problem?

@Apache-HBase

🎊 +1 overall

Vote Subsystem Runtime Comment
+0 🆗 reexec 0m 37s Docker mode activated.
_ Prechecks _
+1 💚 dupname 0m 0s No case conflicting files found.
+1 💚 hbaseanti 0m 0s Patch does not have any anti-patterns.
+1 💚 @author 0m 0s The patch does not contain any @author tags.
-0 ⚠️ test4tests 0m 0s The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch.
_ branch-2.2 Compile Tests _
+1 💚 mvninstall 5m 13s branch-2.2 passed
+1 💚 compile 0m 56s branch-2.2 passed
+1 💚 checkstyle 1m 18s branch-2.2 passed
+1 💚 shadedjars 4m 3s branch has no errors when building our shaded downstream artifacts.
+1 💚 javadoc 0m 36s branch-2.2 passed
+0 🆗 spotbugs 3m 11s Used deprecated FindBugs config; considering switching to SpotBugs.
+1 💚 findbugs 3m 10s branch-2.2 passed
_ Patch Compile Tests _
+1 💚 mvninstall 4m 42s the patch passed
+1 💚 compile 0m 56s the patch passed
+1 💚 javac 0m 56s the patch passed
+1 💚 checkstyle 1m 18s the patch passed
+1 💚 whitespace 0m 0s The patch has no whitespace issues.
+1 💚 shadedjars 4m 4s patch has no errors when building our shaded downstream artifacts.
+1 💚 hadoopcheck 14m 57s Patch does not cause any errors with Hadoop 2.8.5 2.9.2 or 3.1.2.
+1 💚 javadoc 0m 35s the patch passed
+1 💚 findbugs 3m 19s the patch passed
_ Other Tests _
+1 💚 unit 162m 5s hbase-server in the patch passed.
+1 💚 asflicense 0m 34s The patch does not generate ASF License warnings.
214m 26s
Subsystem Report/Notes
Docker Client=19.03.5 Server=19.03.5 base: https://builds.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-956/1/artifact/out/Dockerfile
GITHUB PR #956
JIRA Issue HBASE-23601
Optional Tests dupname asflicense javac javadoc unit spotbugs findbugs shadedjars hadoopcheck hbaseanti checkstyle compile
uname Linux f429b7fb81ec 4.15.0-60-generic #67-Ubuntu SMP Thu Aug 22 16:55:30 UTC 2019 x86_64 GNU/Linux
Build tool maven
Personality /home/jenkins/jenkins-slave/workspace/HBase-PreCommit-GitHub-PR_PR-956/out/precommit/personality/provided.sh
git revision branch-2.2 / 5ec99b4
Default Java 1.8.0_181
Test Results https://builds.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-956/1/testReport/
Max. process+thread count 4984 (vs. ulimit of 10000)
modules C: hbase-server U: hbase-server
Console output https://builds.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-956/1/console
versions git=2.11.0 maven=2018-06-17T18:33:14Z) findbugs=3.1.11
Powered by Apache Yetus 0.11.1 https://yetus.apache.org

This message was automatically generated.

Contributor

@saintstack left a comment

Really good find. Left a few notes.

for (WriterThread t : writerThreads) {
  if (!t.isAlive()){
    LOG.debug("restarting thread" + t);
    t.start();
Contributor

I didn't think you could restart a dead thread. You have to create a new one?

Contributor Author

You are absolutely right. Thanks for pointing it out.
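
For context on this exchange: a Java Thread can be started only once, and calling start() again after it has terminated throws IllegalThreadStateException. A minimal sketch of that behaviour follows. DeadThreadDemo and its methods are hypothetical names used for illustration and are not part of the patch.

```java
// Hypothetical demo class, not part of the HBase patch.
final class DeadThreadDemo {

  /** Returns true if calling start() on an already-terminated thread is rejected. */
  static boolean restartIsRejected() {
    Thread t = new Thread(() -> { }, "writer-0");
    t.start();
    try {
      t.join(); // wait until the thread has terminated
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException(e);
    }
    try {
      t.start(); // a Thread may be started at most once
      return false;
    } catch (IllegalThreadStateException expected) {
      return true;
    }
  }

  /** Builds a fresh, unstarted thread that reuses the name of the dead one. */
  static Thread replacement(Thread dead, Runnable work) {
    return new Thread(work, dead.getName());
  }

  public static void main(String[] args) {
    System.out.println("restart rejected: " + restartIsRejected());
  }
}
```

The only way to get a running worker back is to construct a new Thread, which is the direction the later commits on this PR take.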


public Throwable clearError(){
  return thrown.getAndSet(null);
}
Contributor

I see we create the PipelineController once and reuse it thereafter; it gets passed down into the WriterThread. I was going to suggest just creating a new Controller on each write, but that looks like too much of a refactor.

Would it be cleaner if checkForErrors cleared any errors it found? That is, checkForErrors still throws, but before it throws it clears the exception stored in the Controller.

Contributor Author

Not only is it cleaner, it also covers more use cases. I moved that line into checkForErrors.
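
The "clear before throwing" idea discussed above can be sketched roughly as follows. ErrorHolder is a hypothetical stand-in for the controller, and its method bodies are assumptions rather than the code that was merged; the one detail taken from the diff is the use of getAndSet(null) on an atomic reference.

```java
import java.io.IOException;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical stand-in for the pipeline controller discussed in the review.
final class ErrorHolder {
  private final AtomicReference<Throwable> thrown = new AtomicReference<>();

  /** Called by a writer thread when it fails; only the first error is kept. */
  void writerThreadError(Throwable t) {
    thrown.compareAndSet(null, t);
  }

  /** Throws the stored error once, clearing it so later calls start clean. */
  void checkForErrors() throws IOException {
    Throwable t = thrown.getAndSet(null); // read and clear in one atomic step
    if (t == null) {
      return;
    }
    if (t instanceof IOException) {
      throw (IOException) t;
    }
    throw new IOException(t);
  }
}
```

Because the read and the reset happen in a single atomic operation, a failure is reported to exactly one caller instead of being rethrown on every subsequent check.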

Contributor

@HorizonNet left a comment

Does this problem only occur in HBase 2.2.x? If not it would be better to target the master branch.

…d indefinietly

remove dead writer threads and create new ones reusing the old names
remove the stored exception before re-throwing it
add unit test
…d indefinietly

add missing changes from prev commit
@BukrosSzabolcs
Contributor Author

@wchevreuil
> Is it possible to add a test that reproduces the problem?

Added a small test. Please let me know if you had something else in mind.

@HorizonNet
> Does this problem only occur in HBase 2.2.x? If not it would be better to target the master branch.

Duo rewrote most of the region replication code on master, so this should only affect branch-2 and branch-2.2.

@Apache-HBase

💔 -1 overall

Vote Subsystem Runtime Comment
+0 🆗 reexec 0m 37s Docker mode activated.
_ Prechecks _
+1 💚 dupname 0m 0s No case conflicting files found.
+1 💚 hbaseanti 0m 0s Patch does not have any anti-patterns.
+1 💚 @author 0m 0s The patch does not contain any @author tags.
+1 💚 test4tests 0m 0s The patch appears to include 1 new or modified test files.
_ branch-2.2 Compile Tests _
+1 💚 mvninstall 5m 15s branch-2.2 passed
+1 💚 compile 0m 55s branch-2.2 passed
+1 💚 checkstyle 1m 19s branch-2.2 passed
+1 💚 shadedjars 4m 1s branch has no errors when building our shaded downstream artifacts.
+1 💚 javadoc 0m 36s branch-2.2 passed
+0 🆗 spotbugs 3m 9s Used deprecated FindBugs config; considering switching to SpotBugs.
+1 💚 findbugs 3m 7s branch-2.2 passed
_ Patch Compile Tests _
+1 💚 mvninstall 4m 46s the patch passed
+1 💚 compile 0m 55s the patch passed
+1 💚 javac 0m 55s the patch passed
-1 ❌ checkstyle 1m 17s hbase-server: The patch generated 19 new + 3 unchanged - 0 fixed = 22 total (was 3)
+1 💚 whitespace 0m 0s The patch has no whitespace issues.
-1 ❌ shadedjars 3m 15s patch has 10 errors when building our shaded downstream artifacts.
+1 💚 hadoopcheck 15m 0s Patch does not cause any errors with Hadoop 2.8.5 2.9.2 or 3.1.2.
+1 💚 javadoc 0m 33s the patch passed
+1 💚 findbugs 3m 11s the patch passed
_ Other Tests _
+1 💚 unit 144m 16s hbase-server in the patch passed.
-1 ❌ asflicense 0m 30s The patch generated 1 ASF License warnings.
195m 48s
Subsystem Report/Notes
Docker Client=19.03.5 Server=19.03.5 base: https://builds.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-956/2/artifact/out/Dockerfile
GITHUB PR #956
JIRA Issue HBASE-23601
Optional Tests dupname asflicense javac javadoc unit spotbugs findbugs shadedjars hadoopcheck hbaseanti checkstyle compile
uname Linux 514341326d81 4.15.0-60-generic #67-Ubuntu SMP Thu Aug 22 16:55:30 UTC 2019 x86_64 GNU/Linux
Build tool maven
Personality /home/jenkins/jenkins-slave/workspace/HBase-PreCommit-GitHub-PR_PR-956/out/precommit/personality/provided.sh
git revision branch-2.2 / 8d22b7e
Default Java 1.8.0_181
checkstyle https://builds.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-956/2/artifact/out/diff-checkstyle-hbase-server.txt
shadedjars https://builds.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-956/2/artifact/out/patch-shadedjars.txt
Test Results https://builds.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-956/2/testReport/
asflicense https://builds.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-956/2/artifact/out/patch-asflicense-problems.txt
Max. process+thread count 4065 (vs. ulimit of 10000)
modules C: hbase-server U: hbase-server
Console output https://builds.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-956/2/console
versions git=2.11.0 maven=2018-06-17T18:33:14Z) findbugs=3.1.11
Powered by Apache Yetus 0.11.1 https://yetus.apache.org

This message was automatically generated.

Contributor

@saintstack left a comment

Looking good. Few notes.

while(writerIterator.hasNext()){
  WriterThread t = writerIterator.next();
  if (!t.isAlive()){
    names.add(t.getName());
Contributor

Would it be easier to walk the writerThreads array by index, replacing dead threads with new live ones as you go? I'd also log each replacement, at DEBUG level.

Contributor Author

Thanks for the suggestion. Added it to the next commit.
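
The index-based replacement suggested above might look something like the sketch below. WriterPool, its fields, and its helper methods are hypothetical and stand in for the real OutputSink; the sketch also checks for the TERMINATED state rather than !isAlive(), which is an assumption made here so that threads that were never started are not replaced.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical pool for illustration; not the actual OutputSink code.
final class WriterPool {
  private final List<Thread> writers = new ArrayList<>();
  private final Runnable work;

  WriterPool(int size, Runnable work) {
    this.work = work;
    for (int i = 0; i < size; i++) {
      writers.add(new Thread(work, "writer-" + i));
    }
  }

  void startAll() {
    for (Thread t : writers) {
      t.start();
    }
  }

  void joinAll() {
    for (Thread t : writers) {
      try {
        t.join();
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
        throw new IllegalStateException(e);
      }
    }
  }

  /** Replaces every terminated writer in place and returns how many were replaced. */
  int restartDeadWriters() {
    int replaced = 0;
    for (int i = 0; i < writers.size(); i++) {
      Thread old = writers.get(i);
      if (old.getState() == Thread.State.TERMINATED) {
        Thread fresh = new Thread(work, old.getName()); // reuse the old name
        writers.set(i, fresh);
        fresh.start();
        replaced++;
      }
    }
    return replaced;
  }

  List<String> names() {
    List<String> result = new ArrayList<>();
    for (Thread t : writers) {
      result.add(t.getName());
    }
    return result;
  }
}
```

Walking by index lets each dead entry be swapped in its own slot, so no separate list of names has to be collected first.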

  this.entryBuffers = entryBuffers;
  outputSink = sink;
}

Contributor

Nit: could the constructor just above call this one, passing in the constructed thread name?

Contributor Author

Done
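
The constructor delegation asked for in this nit follows the usual Java pattern of one constructor calling another through this(...). A small sketch under assumed names: NamedWorker, its id field, and the "Writer-" prefix are all hypothetical, and the real WriterThread takes different arguments.

```java
// Hypothetical class for illustration; the real WriterThread takes other arguments.
final class NamedWorker extends Thread {
  private final int workerId;

  /** Convenience constructor: derives the default name, then delegates. */
  NamedWorker(int workerId) {
    this(workerId, "Writer-" + workerId);
  }

  /** Primary constructor: the only place where fields are assigned. */
  NamedWorker(int workerId, String name) {
    super(name);
    this.workerId = workerId;
  }

  int workerId() {
    return workerId;
  }
}
```

Keeping field assignment in a single constructor means a replacement thread created with an explicit name is initialised exactly like one created with the default name.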

…d indefinietly

fix checkstyle validations
add license to test class
code improvements
@Apache-HBase

🎊 +1 overall

Vote Subsystem Runtime Comment
+0 🆗 reexec 0m 39s Docker mode activated.
_ Prechecks _
+1 💚 dupname 0m 0s No case conflicting files found.
+1 💚 hbaseanti 0m 0s Patch does not have any anti-patterns.
+1 💚 @author 0m 0s The patch does not contain any @author tags.
+1 💚 test4tests 0m 0s The patch appears to include 1 new or modified test files.
_ branch-2.2 Compile Tests _
+1 💚 mvninstall 5m 15s branch-2.2 passed
+1 💚 compile 0m 54s branch-2.2 passed
+1 💚 checkstyle 1m 20s branch-2.2 passed
+1 💚 shadedjars 4m 9s branch has no errors when building our shaded downstream artifacts.
+1 💚 javadoc 0m 35s branch-2.2 passed
+0 🆗 spotbugs 3m 3s Used deprecated FindBugs config; considering switching to SpotBugs.
+1 💚 findbugs 3m 2s branch-2.2 passed
_ Patch Compile Tests _
+1 💚 mvninstall 4m 46s the patch passed
+1 💚 compile 0m 57s the patch passed
+1 💚 javac 0m 57s the patch passed
+1 💚 checkstyle 1m 17s the patch passed
+1 💚 whitespace 0m 0s The patch has no whitespace issues.
+1 💚 shadedjars 4m 3s patch has no errors when building our shaded downstream artifacts.
+1 💚 hadoopcheck 14m 57s Patch does not cause any errors with Hadoop 2.8.5 2.9.2 or 3.1.2.
+1 💚 javadoc 0m 33s the patch passed
+1 💚 findbugs 3m 23s the patch passed
_ Other Tests _
+1 💚 unit 151m 24s hbase-server in the patch passed.
+1 💚 asflicense 0m 30s The patch does not generate ASF License warnings.
204m 0s
Subsystem Report/Notes
Docker Client=19.03.5 Server=19.03.5 base: https://builds.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-956/3/artifact/out/Dockerfile
GITHUB PR #956
JIRA Issue HBASE-23601
Optional Tests dupname asflicense javac javadoc unit spotbugs findbugs shadedjars hadoopcheck hbaseanti checkstyle compile
uname Linux 882e4cb2ce67 4.15.0-60-generic #67-Ubuntu SMP Thu Aug 22 16:55:30 UTC 2019 x86_64 GNU/Linux
Build tool maven
Personality /home/jenkins/jenkins-slave/workspace/HBase-PreCommit-GitHub-PR_PR-956/out/precommit/personality/provided.sh
git revision branch-2.2 / aa07d0a
Default Java 1.8.0_181
Test Results https://builds.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-956/3/testReport/
Max. process+thread count 4420 (vs. ulimit of 10000)
modules C: hbase-server U: hbase-server
Console output https://builds.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-956/3/console
versions git=2.11.0 maven=2018-06-17T18:33:14Z) findbugs=3.1.11
Powered by Apache Yetus 0.11.1 https://yetus.apache.org

This message was automatically generated.

@saintstack saintstack merged commit ce7c559 into apache:branch-2.2 Jan 9, 2020
asfgit pushed a commit that referenced this pull request Jan 9, 2020
…d indefinietly (#956)

* HBASE-23601: OutputSink.WriterThread exception gets stuck and repeated indefinietly

clear exception after logged
try to restart writer threads if needed
asfgit pushed a commit that referenced this pull request Jan 9, 2020
asfgit pushed a commit that referenced this pull request Jan 9, 2020
asfgit pushed a commit that referenced this pull request Jan 9, 2020
infraio added a commit that referenced this pull request Jan 10, 2020
BukrosSzabolcs added a commit to BukrosSzabolcs/hbase that referenced this pull request Jan 13, 2020
…d indefinietly (apache#956)

* HBASE-23601: OutputSink.WriterThread exception gets stuck and repeated indefinietly

clear exception after logged
try to restart writer threads if needed
@BukrosSzabolcs
Contributor Author

Created a new PR for branch-2: #1028

symat pushed a commit to symat/hbase that referenced this pull request Feb 17, 2021
…d indefinietly (apache#956)

* HBASE-23601: OutputSink.WriterThread exception gets stuck and repeated indefinietly

clear exception after logged
try to restart writer threads if needed

(cherry picked from commit ce7c559)

Change-Id: I62ede7ebc02213357678e50b9b995b00dfac4a5d
symat pushed a commit to symat/hbase that referenced this pull request Feb 17, 2021
… repeated indefinietly (apache#956)"

This reverts commit ce7c559.

(cherry picked from commit fd6eb38)

Change-Id: I67ab28228bec55a17d2b8c01946be264723ad2a6
5 participants