HDDS-9635. Trying to close a closed container from CLI results in indefinite retries. #5710
adoroszlai merged 6 commits into apache:master
…efinite retries (cherry picked from commit bfafdfc98191054873f11ff94239bf1a2852e1a2)
adoroszlai left a comment
Thanks @sadanand48 for the patch.
hadoop-hdds/interface-admin/src/main/proto/ScmAdminProtocol.proto
...pache/hadoop/hdds/scm/protocolPB/StorageContainerLocationProtocolClientSideTranslatorPB.java
Thanks @sadanand48 for updating the patch. Now that I've tested it, I noticed that while the container is closing, we may still get some ugly exceptions a few times, before finally getting … I also noticed that after … I would like to suggest considering the following enhancements if possible: …
What do you think?
Thanks, it makes sense, will update the patch.
xBis7 left a comment
@sadanand48 Thanks for the patch.
How does the client know when to retry and when not to? Does it have to do with the exception type or a non-0 exit code?
Will it be better to return the response proto back to the client instead of returning void and throwing an exception?
Command exits now:
bash-4.2$ ozone admin container close 1
Unable to close container
java.io.IOException: Container 1 is in closing state
at org.apache.hadoop.hdds.scm.protocolPB.StorageContainerLocationProtocolClientSideTranslatorPB.closeContainer(StorageContainerLocationProtocolClientSideTranslatorPB.java:592)
at org.apache.hadoop.hdds.scm.cli.ContainerOperationClient.closeContainer(ContainerOperationClient.java:423)
at org.apache.hadoop.hdds.scm.cli.container.CloseSubcommand.execute(CloseSubcommand.java:53)
at org.apache.hadoop.hdds.scm.cli.ScmSubcommand.call(ScmSubcommand.java:39)
at org.apache.hadoop.hdds.scm.cli.ScmSubcommand.call(ScmSubcommand.java:29)
at picocli.CommandLine.executeUserObject(CommandLine.java:1953)
at picocli.CommandLine.access$1300(CommandLine.java:145)
at picocli.CommandLine$RunLast.executeUserObjectOfLastSubcommandWithSameParent(CommandLine.java:2352)
at picocli.CommandLine$RunLast.handle(CommandLine.java:2346)
at picocli.CommandLine$RunLast.handle(CommandLine.java:2311)
at picocli.CommandLine$AbstractParseResultHandler.execute(CommandLine.java:2179)
at picocli.CommandLine.execute(CommandLine.java:2078)
at org.apache.hadoop.hdds.cli.GenericCli.execute(GenericCli.java:100)
at org.apache.hadoop.hdds.cli.OzoneAdmin.lambda$execute$0(OzoneAdmin.java:92)
at org.apache.hadoop.hdds.tracing.TracingUtil.executeInSpan(TracingUtil.java:169)
at org.apache.hadoop.hdds.tracing.TracingUtil.executeInNewSpan(TracingUtil.java:149)
at org.apache.hadoop.hdds.cli.OzoneAdmin.execute(OzoneAdmin.java:91)
at org.apache.hadoop.hdds.cli.GenericCli.run(GenericCli.java:91)
at org.apache.hadoop.hdds.cli.OzoneAdmin.main(OzoneAdmin.java:84)
bash-4.2$ ozone admin container close 1
Unable to close container
java.io.IOException: Container 1 already closed
at org.apache.hadoop.hdds.scm.protocolPB.StorageContainerLocationProtocolClientSideTranslatorPB.closeContainer(StorageContainerLocationProtocolClientSideTranslatorPB.java:592)
at org.apache.hadoop.hdds.scm.cli.ContainerOperationClient.closeContainer(ContainerOperationClient.java:423)
at org.apache.hadoop.hdds.scm.cli.container.CloseSubcommand.execute(CloseSubcommand.java:53)
at org.apache.hadoop.hdds.scm.cli.ScmSubcommand.call(ScmSubcommand.java:39)
at org.apache.hadoop.hdds.scm.cli.ScmSubcommand.call(ScmSubcommand.java:29)
at picocli.CommandLine.executeUserObject(CommandLine.java:1953)
at picocli.CommandLine.access$1300(CommandLine.java:145)
at picocli.CommandLine$RunLast.executeUserObjectOfLastSubcommandWithSameParent(CommandLine.java:2352)
at picocli.CommandLine$RunLast.handle(CommandLine.java:2346)
at picocli.CommandLine$RunLast.handle(CommandLine.java:2311)
at picocli.CommandLine$AbstractParseResultHandler.execute(CommandLine.java:2179)
at picocli.CommandLine.execute(CommandLine.java:2078)
at org.apache.hadoop.hdds.cli.GenericCli.execute(GenericCli.java:100)
at org.apache.hadoop.hdds.cli.OzoneAdmin.lambda$execute$0(OzoneAdmin.java:92)
at org.apache.hadoop.hdds.tracing.TracingUtil.executeInSpan(TracingUtil.java:169)
at org.apache.hadoop.hdds.tracing.TracingUtil.executeInNewSpan(TracingUtil.java:149)
at org.apache.hadoop.hdds.cli.OzoneAdmin.execute(OzoneAdmin.java:91)
at org.apache.hadoop.hdds.cli.GenericCli.run(GenericCli.java:91)
at org.apache.hadoop.hdds.cli.OzoneAdmin.main(OzoneAdmin.java:84)
Thanks @xBis7, yes, the current patch returns the state (CLOSED/CLOSING) in the response proto, which lets the client know not to retry. In the regular case, a retry happens whenever there is an exception.
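For readers following the thread, here is a minimal sketch of the idea in the reply above. It is not the code from the patch. The names `CloseStatus`, `CloseCall`, `callWithRetry` and `closeContainer` are hypothetical, and the real proto enum and retry machinery may look different. The assumption is that retries are triggered only by exceptions thrown by the call itself, so a status carried in a normal response is never retried.

```java
import java.io.IOException;

class CloseContainerRetrySketch {

  /** Hypothetical status values; the real proto enum may differ. */
  enum CloseStatus { OK, CONTAINER_ALREADY_CLOSED, CONTAINER_ALREADY_CLOSING }

  /** Hypothetical stand-in for the RPC to SCM. */
  interface CloseCall {
    CloseStatus call(long containerId) throws IOException;
  }

  /** Retries only when the call itself throws; a returned status ends the loop. */
  static CloseStatus callWithRetry(CloseCall rpc, long containerId, int maxAttempts)
      throws IOException {
    IOException last = new IOException("no attempts made");
    for (int attempt = 1; attempt <= maxAttempts; attempt++) {
      try {
        return rpc.call(containerId);
      } catch (IOException e) {
        last = e; // failure of the call: try again
      }
    }
    throw last;
  }

  /** Interprets the status outside the retry loop, so it is reported once. */
  static void closeContainer(CloseCall rpc, long containerId, int maxAttempts)
      throws IOException {
    CloseStatus status = callWithRetry(rpc, containerId, maxAttempts);
    switch (status) {
      case CONTAINER_ALREADY_CLOSED:
        throw new IOException("Container " + containerId + " already closed");
      case CONTAINER_ALREADY_CLOSING:
        throw new IOException("Container " + containerId + " is in closing state");
      default:
        return;
    }
  }

  public static void main(String[] args) {
    int[] calls = {0};
    CloseCall alreadyClosed = id -> {
      calls[0]++;
      return CloseStatus.CONTAINER_ALREADY_CLOSED;
    };
    try {
      closeContainer(alreadyClosed, 1L, 5);
    } catch (IOException e) {
      System.out.println(e.getMessage() + " (RPC calls: " + calls[0] + ")");
    }
  }
}
```

The point of the split is that the exception for an already closed container is raised after the retry loop has returned, so it reaches the caller once instead of feeding back into the retry logic.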
@sadanand48, I think we should print these stack traces only in verbose mode.
Thanks @adoroszlai, updated.
bash-4.2$ ozone admin container close 1
Unable to close container : Container 1 already closed
bash-4.2$ ozone admin --verbose container close 1
Unable to close container
java.io.IOException: Container 1 already closed
at org.apache.hadoop.hdds.scm.protocolPB.StorageContainerLocationProtocolClientSideTranslatorPB.closeContainer(StorageContainerLocationProtocolClientSideTranslatorPB.java:592)
at org.apache.hadoop.hdds.scm.cli.ContainerOperationClient.closeContainer(ContainerOperationClient.java:423)
at org.apache.hadoop.hdds.scm.cli.container.CloseSubcommand.execute(CloseSubcommand.java:58)
at org.apache.hadoop.hdds.scm.cli.ScmSubcommand.call(ScmSubcommand.java:39)
at org.apache.hadoop.hdds.scm.cli.ScmSubcommand.call(ScmSubcommand.java:29)
at picocli.CommandLine.executeUserObject(CommandLine.java:1953)
at picocli.CommandLine.access$1300(CommandLine.java:145)
at picocli.CommandLine$RunLast.executeUserObjectOfLastSubcommandWithSameParent(CommandLine.java:2352)
at picocli.CommandLine$RunLast.handle(CommandLine.java:2346)
at picocli.CommandLine$RunLast.handle(CommandLine.java:2311)
at picocli.CommandLine$AbstractParseResultHandler.execute(CommandLine.java:2179)
at picocli.CommandLine.execute(CommandLine.java:2078)
at org.apache.hadoop.hdds.cli.GenericCli.execute(GenericCli.java:100)
at org.apache.hadoop.hdds.cli.OzoneAdmin.lambda$execute$0(OzoneAdmin.java:92)
at org.apache.hadoop.hdds.tracing.TracingUtil.executeInSpan(TracingUtil.java:169)
at org.apache.hadoop.hdds.tracing.TracingUtil.executeInNewSpan(TracingUtil.java:149)
at org.apache.hadoop.hdds.cli.OzoneAdmin.execute(OzoneAdmin.java:91)
at org.apache.hadoop.hdds.cli.GenericCli.run(GenericCli.java:91)
at org.apache.hadoop.hdds.cli.OzoneAdmin.main(OzoneAdmin.java:84)
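The two outputs above differ only in whether the stack trace is printed. A minimal sketch of that behaviour follows, assuming a boolean `verbose` flag is available to the subcommand. The class and method names are hypothetical and are not taken from the patch; only the message format mirrors the output shown above.

```java
import java.io.IOException;
import java.io.PrintStream;

class VerboseErrorSketch {

  /**
   * Prints a one-line message by default and the full stack trace
   * only when verbose is true. Hypothetical helper, not the patch code.
   */
  static void reportFailure(PrintStream err, IOException e, boolean verbose) {
    if (verbose) {
      err.println("Unable to close container");
      e.printStackTrace(err);
    } else {
      err.println("Unable to close container : " + e.getMessage());
    }
  }

  public static void main(String[] args) {
    IOException e = new IOException("Container 1 already closed");
    reportFailure(System.err, e, false); // short form
    reportFailure(System.err, e, true);  // with stack trace
  }
}
```

Keeping the short message as the default keeps CLI output readable for operators, while the full trace stays available for debugging.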
xBis7 left a comment
@sadanand48 I've tested it locally. LGTM!
Thanks @sadanand48 for updating the patch, LGTM.
Thanks @sadanand48 for the patch, @xBis7 for the review.
What changes were proposed in this pull request?
When a client tries to close a container that is already closed, SCM should tell the client so, instead of the client retrying.
What is the link to the Apache JIRA
https://issues.apache.org/jira/browse/HDDS-9635
How was this patch tested?
Unit tests