HDDS-1603. Handle Ratis Append Failure in Container State Machine. Contributed by Supratim Deka #1019

Merged
merged 2 commits into from Jul 10, 2019

supratimdeka

https://issues.apache.org/jira/browse/HDDS-1603

The scope of this jira is to build on https://issues.apache.org/jira/browse/RATIS-573
and define the handling for Ratis log append failure in Ozone Container State Machine.

  1. Enqueue pipeline unhealthy action to SCM, add a reason code to the message.
  2. Trigger immediate heartbeat to SCM

RATIS-573 is not yet available in trunk, so this patch starts with an entry point in XceiverServerRatis, which will be hooked up to the notifyLogFailed() callback defined in StateMachine as part of RATIS-573.

Notifying the Datanode that the Ratis volume is unhealthy is not implemented in this patch.
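
The two steps listed above could be wired together roughly as follows. This is a minimal, self-contained sketch under stated assumptions, not the code in the patch: `DatanodeContext`, `PipelineAction`, `Reason` and `handleLogAppendFailure` are hypothetical stand-ins, and only `XceiverServerRatis` and `notifyLogFailed()` are names taken from the description.

```java
import java.util.ArrayDeque;
import java.util.Queue;

/** Sketch only: all types and method names below are hypothetical stand-ins. */
public class LogAppendFailureSketch {

  /** Hypothetical reason code carried in the message to SCM. */
  enum Reason { LOG_APPEND_FAILED }

  /** Hypothetical "pipeline unhealthy" action queued for SCM. */
  static final class PipelineAction {
    final String pipelineId;
    final Reason reason;
    final String detail;

    PipelineAction(String pipelineId, Reason reason, String detail) {
      this.pipelineId = pipelineId;
      this.reason = reason;
      this.detail = detail;
    }
  }

  /** Stand-in for the datanode-side state that reports to SCM. */
  static final class DatanodeContext {
    final Queue<PipelineAction> pendingActions = new ArrayDeque<>();
    int heartbeatsTriggered = 0;

    void addPipelineAction(PipelineAction action) {
      pendingActions.add(action);
    }

    void triggerHeartbeat() {
      heartbeatsTriggered++;
    }
  }

  /** Entry point that a notifyLogFailed()-style callback could delegate to. */
  static void handleLogAppendFailure(DatanodeContext ctx, String pipelineId,
      Throwable cause) {
    // Step 1: enqueue the pipeline action together with a reason code.
    ctx.addPipelineAction(new PipelineAction(pipelineId,
        Reason.LOG_APPEND_FAILED, String.valueOf(cause.getMessage())));
    // Step 2: do not wait for the next scheduled heartbeat.
    ctx.triggerHeartbeat();
  }

  public static void main(String[] args) {
    DatanodeContext ctx = new DatanodeContext();
    handleLogAppendFailure(ctx, "pipeline-1",
        new java.io.IOException("log append failed"));
    System.out.println(ctx.pendingActions.size()
        + " action(s) queued, heartbeats=" + ctx.heartbeatsTriggered);
  }
}
```

Keeping the handler as a plain method that takes the context makes it callable from a test before the Ratis callback exists, which matches the description's note that the callback will be hooked up later.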

@hadoop-yetus

💔 -1 overall

Vote Subsystem Runtime Comment
0 reexec 32 Docker mode activated.
_ Prechecks _
+1 dupname 0 No case conflicting files found.
+1 @author 0 The patch does not contain any @author tags.
+1 test4tests 0 The patch appears to include 1 new or modified test files.
_ trunk Compile Tests _
0 mvndep 27 Maven dependency ordering for branch
+1 mvninstall 476 trunk passed
+1 compile 260 trunk passed
+1 checkstyle 74 trunk passed
+1 mvnsite 0 trunk passed
+1 shadedclient 884 branch has no errors when building and testing our client artifacts.
+1 javadoc 166 trunk passed
0 spotbugs 313 Used deprecated FindBugs config; considering switching to SpotBugs.
+1 findbugs 504 trunk passed
_ Patch Compile Tests _
0 mvndep 34 Maven dependency ordering for patch
+1 mvninstall 442 the patch passed
+1 compile 265 the patch passed
+1 cc 265 the patch passed
+1 javac 265 the patch passed
-0 checkstyle 38 hadoop-hdds: The patch generated 2 new + 0 unchanged - 0 fixed = 2 total (was 0)
+1 mvnsite 0 the patch passed
+1 whitespace 0 The patch has no whitespace issues.
+1 shadedclient 683 patch has no errors when building and testing our client artifacts.
+1 javadoc 162 the patch passed
+1 findbugs 517 the patch passed
_ Other Tests _
+1 unit 258 hadoop-hdds in the patch passed.
-1 unit 1371 hadoop-ozone in the patch failed.
+1 asflicense 51 The patch does not generate ASF License warnings.
6452
Reason Tests
Failed junit tests hadoop.ozone.client.rpc.TestOzoneAtRestEncryption
hadoop.ozone.client.rpc.TestFailureHandlingByClient
hadoop.ozone.client.rpc.TestOzoneRpcClientWithRatis
hadoop.ozone.client.rpc.TestOzoneRpcClient
hadoop.ozone.client.rpc.TestOzoneClientRetriesOnException
hadoop.ozone.client.rpc.TestSecureOzoneRpcClient
Subsystem Report/Notes
Docker Client=17.05.0-ce Server=17.05.0-ce base: https://builds.apache.org/job/hadoop-multibranch/job/PR-1019/1/artifact/out/Dockerfile
GITHUB PR #1019
Optional Tests dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle cc
uname Linux d1a7aea63de8 4.4.0-138-generic #164-Ubuntu SMP Tue Oct 2 17:16:02 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality personality/hadoop.sh
git revision trunk / 062eb60
Default Java 1.8.0_212
checkstyle https://builds.apache.org/job/hadoop-multibranch/job/PR-1019/1/artifact/out/diff-checkstyle-hadoop-hdds.txt
unit https://builds.apache.org/job/hadoop-multibranch/job/PR-1019/1/artifact/out/patch-unit-hadoop-ozone.txt
Test Results https://builds.apache.org/job/hadoop-multibranch/job/PR-1019/1/testReport/
Max. process+thread count 5018 (vs. ulimit of 5500)
modules C: hadoop-hdds/container-service hadoop-ozone/integration-test U: .
Console output https://builds.apache.org/job/hadoop-multibranch/job/PR-1019/1/console
versions git=2.7.4 maven=3.3.9 findbugs=3.1.0-RC1
Powered by Apache Yetus 0.10.0 http://yetus.apache.org

This message was automatically generated.

@hadoop-yetus

💔 -1 overall

Vote Subsystem Runtime Comment
0 reexec 34 Docker mode activated.
_ Prechecks _
+1 dupname 0 No case conflicting files found.
+1 @author 0 The patch does not contain any @author tags.
+1 test4tests 0 The patch appears to include 1 new or modified test files.
_ trunk Compile Tests _
0 mvndep 68 Maven dependency ordering for branch
+1 mvninstall 481 trunk passed
+1 compile 248 trunk passed
+1 checkstyle 61 trunk passed
+1 mvnsite 0 trunk passed
+1 shadedclient 790 branch has no errors when building and testing our client artifacts.
+1 javadoc 156 trunk passed
0 spotbugs 312 Used deprecated FindBugs config; considering switching to SpotBugs.
+1 findbugs 501 trunk passed
_ Patch Compile Tests _
0 mvndep 29 Maven dependency ordering for patch
+1 mvninstall 424 the patch passed
+1 compile 255 the patch passed
+1 cc 255 the patch passed
+1 javac 255 the patch passed
-0 checkstyle 37 hadoop-hdds: The patch generated 2 new + 0 unchanged - 0 fixed = 2 total (was 0)
+1 mvnsite 0 the patch passed
+1 whitespace 0 The patch has no whitespace issues.
+1 shadedclient 674 patch has no errors when building and testing our client artifacts.
+1 javadoc 156 the patch passed
+1 findbugs 508 the patch passed
_ Other Tests _
+1 unit 237 hadoop-hdds in the patch passed.
-1 unit 1635 hadoop-ozone in the patch failed.
+1 asflicense 41 The patch does not generate ASF License warnings.
6536
Reason Tests
Failed junit tests hadoop.ozone.client.rpc.TestReadRetries
hadoop.ozone.client.rpc.TestOzoneRpcClientWithRatis
hadoop.ozone.client.rpc.TestBlockOutputStream
hadoop.ozone.client.rpc.TestWatchForCommit
hadoop.ozone.client.rpc.TestOzoneRpcClient
hadoop.ozone.TestMiniOzoneCluster
hadoop.ozone.client.rpc.TestOzoneAtRestEncryption
hadoop.ozone.om.TestOzoneManagerHA
Subsystem Report/Notes
Docker Client=17.05.0-ce Server=17.05.0-ce base: https://builds.apache.org/job/hadoop-multibranch/job/PR-1019/2/artifact/out/Dockerfile
GITHUB PR #1019
Optional Tests dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle cc
uname Linux 4cc2bbd8e0ff 4.4.0-138-generic #164-Ubuntu SMP Tue Oct 2 17:16:02 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality personality/hadoop.sh
git revision trunk / e966edd
Default Java 1.8.0_212
checkstyle https://builds.apache.org/job/hadoop-multibranch/job/PR-1019/2/artifact/out/diff-checkstyle-hadoop-hdds.txt
unit https://builds.apache.org/job/hadoop-multibranch/job/PR-1019/2/artifact/out/patch-unit-hadoop-ozone.txt
Test Results https://builds.apache.org/job/hadoop-multibranch/job/PR-1019/2/testReport/
Max. process+thread count 4724 (vs. ulimit of 5500)
modules C: hadoop-hdds/container-service hadoop-ozone/integration-test U: .
Console output https://builds.apache.org/job/hadoop-multibranch/job/PR-1019/2/console
versions git=2.7.4 maven=3.3.9 findbugs=3.1.0-RC1
Powered by Apache Yetus 0.10.0 http://yetus.apache.org

This message was automatically generated.

@hadoop-yetus

💔 -1 overall

Vote Subsystem Runtime Comment
0 reexec 37 Docker mode activated.
_ Prechecks _
+1 dupname 0 No case conflicting files found.
+1 @author 0 The patch does not contain any @author tags.
+1 test4tests 0 The patch appears to include 1 new or modified test files.
_ trunk Compile Tests _
0 mvndep 78 Maven dependency ordering for branch
+1 mvninstall 505 trunk passed
+1 compile 252 trunk passed
+1 checkstyle 63 trunk passed
+1 mvnsite 0 trunk passed
+1 shadedclient 833 branch has no errors when building and testing our client artifacts.
+1 javadoc 155 trunk passed
0 spotbugs 320 Used deprecated FindBugs config; considering switching to SpotBugs.
+1 findbugs 511 trunk passed
_ Patch Compile Tests _
0 mvndep 30 Maven dependency ordering for patch
+1 mvninstall 435 the patch passed
+1 compile 270 the patch passed
+1 cc 270 the patch passed
+1 javac 270 the patch passed
+1 checkstyle 62 the patch passed
+1 mvnsite 0 the patch passed
+1 whitespace 0 The patch has no whitespace issues.
+1 shadedclient 607 patch has no errors when building and testing our client artifacts.
+1 javadoc 140 the patch passed
+1 findbugs 500 the patch passed
_ Other Tests _
+1 unit 233 hadoop-hdds in the patch passed.
-1 unit 1398 hadoop-ozone in the patch failed.
+1 asflicense 39 The patch does not generate ASF License warnings.
6303
Reason Tests
Failed junit tests hadoop.ozone.client.rpc.TestWatchForCommit
hadoop.ozone.client.rpc.TestOzoneRpcClient
hadoop.ozone.client.rpc.TestReadRetries
hadoop.ozone.client.rpc.TestOzoneRpcClientWithRatis
hadoop.hdds.scm.pipeline.TestRatisPipelineCreateAndDestory
hadoop.ozone.client.rpc.TestSecureOzoneRpcClient
hadoop.ozone.om.TestScmSafeMode
hadoop.ozone.client.rpc.TestContainerStateMachine
hadoop.ozone.client.rpc.TestOzoneAtRestEncryption
Subsystem Report/Notes
Docker Client=17.05.0-ce Server=17.05.0-ce base: https://builds.apache.org/job/hadoop-multibranch/job/PR-1019/3/artifact/out/Dockerfile
GITHUB PR #1019
Optional Tests dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle cc
uname Linux eead6f9ba932 4.4.0-138-generic #164-Ubuntu SMP Tue Oct 2 17:16:02 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality personality/hadoop.sh
git revision trunk / 34747c3
Default Java 1.8.0_212
unit https://builds.apache.org/job/hadoop-multibranch/job/PR-1019/3/artifact/out/patch-unit-hadoop-ozone.txt
Test Results https://builds.apache.org/job/hadoop-multibranch/job/PR-1019/3/testReport/
Max. process+thread count 5367 (vs. ulimit of 5500)
modules C: hadoop-hdds/container-service hadoop-ozone/integration-test U: .
Console output https://builds.apache.org/job/hadoop-multibranch/job/PR-1019/3/console
versions git=2.7.4 maven=3.3.9 findbugs=3.1.0-RC1
Powered by Apache Yetus 0.10.0 http://yetus.apache.org

This message was automatically generated.

@mukul1987 mukul1987 left a comment

The patch looks good to me.

@@ -545,18 +545,28 @@ private void handlePipelineFailure(RaftGroupId groupId,
+ roleInfoProto.getRole());
}

triggerPipelineClose(groupId, msg,

Let's have two reasons: a) candidate failed, b) leader failed.
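
One way to carry the two reasons is a small enum passed along with the close message. The sketch below is only an assumption about how that could look: `FailureReason` and `describe` are hypothetical names, not identifiers from the patch, and which reason applies in each branch of `handlePipelineFailure` is left to the caller.

```java
/** Sketch only: the enum and helper below are hypothetical. */
public class PipelineCloseReasonSketch {

  /** The two reasons proposed in the review comment. */
  enum FailureReason { CANDIDATE_FAILED, LEADER_FAILED }

  /** Builds a close message that carries the reason alongside free text. */
  static String describe(String groupId, FailureReason reason, String msg) {
    return "pipeline " + groupId + " closed, reason=" + reason + ": " + msg;
  }

  public static void main(String[] args) {
    System.out.println(
        describe("group-1", FailureReason.LEADER_FAILED, "no response"));
  }
}
```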

try {
pipelineManager.getPipeline(openPipeline.getId());
} catch (PipelineNotFoundException e) {
Assert.assertTrue("pipeline should exist", false);

In JUnit, a test fails if an uncaught exception is thrown, so this try/catch might not be needed.
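
The point can be illustrated without the try/catch. The sketch below deliberately avoids JUnit so that it stays self-contained: `runTest` is a hypothetical stand-in for the framework's runner, and `PipelineManager` and `PipelineNotFoundException` are simplified stand-ins for the real classes. The test body lets the checked exception propagate, and the runner reports a failure when it does.

```java
import java.util.HashSet;
import java.util.Set;

/** Sketch only: simplified stand-ins, not the Ozone classes or JUnit. */
public class UncaughtExceptionSketch {

  static class PipelineNotFoundException extends Exception {
    PipelineNotFoundException(String id) {
      super("pipeline not found: " + id);
    }
  }

  static class PipelineManager {
    private final Set<String> pipelines = new HashSet<>();

    void addPipeline(String id) {
      pipelines.add(id);
    }

    String getPipeline(String id) throws PipelineNotFoundException {
      if (!pipelines.contains(id)) {
        throw new PipelineNotFoundException(id);
      }
      return id;
    }
  }

  /** A test body that is allowed to throw, like a method declared "throws Exception". */
  interface TestBody {
    void run() throws Exception;
  }

  /** Returns true when the body completes, false when anything escapes it. */
  static boolean runTest(TestBody body) {
    try {
      body.run();
      return true;
    } catch (Exception | AssertionError e) {
      return false;
    }
  }

  public static void main(String[] args) {
    PipelineManager manager = new PipelineManager();
    manager.addPipeline("open-1");
    // No try/catch in the body: a missing pipeline fails the test by itself.
    System.out.println("existing: " + runTest(() -> manager.getPipeline("open-1")));
    System.out.println("missing: " + runTest(() -> manager.getPipeline("gone")));
  }
}
```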

@arp7 arp7 left a comment

+1, will hold off submitting since @mukul1987 has a couple of questions.

@mukul1987

+1 from me as well. Let's create a follow-up jira for the review comments.

@mukul1987 mukul1987 closed this Jul 10, 2019
@mukul1987 mukul1987 reopened this Jul 10, 2019
@mukul1987 mukul1987 merged commit ac7a8ac into apache:trunk Jul 10, 2019
@hadoop-yetus

💔 -1 overall

Vote Subsystem Runtime Comment
0 reexec 0 Docker mode activated.
-1 patch 12 #1019 does not apply to trunk. Rebase required? Wrong Branch? See https://wiki.apache.org/hadoop/HowToContribute for help.
Subsystem Report/Notes
GITHUB PR #1019
Console output https://builds.apache.org/job/hadoop-multibranch/job/PR-1019/4/console
versions git=2.7.4
Powered by Apache Yetus 0.10.0 http://yetus.apache.org

This message was automatically generated.

bshashikant pushed a commit to bshashikant/hadoop that referenced this pull request Jul 10, 2019
shanthoosh pushed a commit to shanthoosh/hadoop that referenced this pull request Oct 15, 2019
* Adding parameterized tests for legacy-offset file

* Adding test for checking precedence between offset files
amahussein pushed a commit to amahussein/hadoop that referenced this pull request Oct 29, 2019