HDDS-4873. Re-replication failure throws NPE #1986
@jojochuang Can you review this PR for me? Thank you.
adoroszlai
left a comment
Thanks @sky76093016 for working on this.
We should not simply log "please retry one more time", as container replication is performed by datanodes automatically. It will be retried eventually when the datanode receives the replication command from SCM again.
If the download fails, the code currently reaches the exception handler with an NPE. The only problem with this is that it's noisy, and we could handle the condition more gracefully. I think if tempTarFile == null, it should simply set the task status to failed and skip the code in the try block.
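A minimal sketch of the suggested null check follows. It is not the actual DownloadAndImportReplicator code: the class ReplicationSketch, the Task, Downloader and Importer types, and the convention that a failed download yields null are all assumptions made for illustration.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical stand-in for the replicator; all names below are illustrative.
class ReplicationSketch {
  enum Status { QUEUED, DONE, FAILED }

  // Simplified replication task that only tracks a container ID and a status.
  static final class Task {
    final long containerId;
    Status status = Status.QUEUED;
    Task(long containerId) { this.containerId = containerId; }
  }

  // Assumption: download returns null when the container could not be fetched.
  interface Downloader { Path download(long containerId); }

  interface Importer {
    void importContainer(long containerId, Path tarFile) throws Exception;
  }

  static void replicate(Task task, Downloader downloader, Importer importer) {
    Path tempTarFile = downloader.download(task.containerId);
    if (tempTarFile == null) {
      // Download failed: mark the task failed and skip the import entirely,
      // so no NullPointerException ever reaches the exception handler.
      task.status = Status.FAILED;
      return;
    }
    try {
      importer.importContainer(task.containerId, tempTarFile);
      task.status = Status.DONE;
    } catch (Exception e) {
      // Genuine import errors are still reported as a failed task.
      task.status = Status.FAILED;
    }
  }

  public static void main(String[] args) {
    Task task = new Task(1L);
    // Simulate a failed download; the importer must not be invoked.
    replicate(task, id -> null, (id, file) -> {
      throw new IllegalStateException("import should be skipped");
    });
    System.out.println("Task status: " + task.status);
  }
}
```

With the early return, a failed download only changes the task status. No retry hint is needed in the log, since the task would be retried once SCM sends the replication command again.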
What Attila said. Thanks for the review!
adoroszlai
left a comment
Thanks @sky76093016 for updating the patch.
...src/main/java/org/apache/hadoop/ozone/container/replication/DownloadAndImportReplicator.java
cku328
left a comment
LGTM. Thanks @sky76093016 for working on this.
What changes were proposed in this pull request?
Improve the logging to help users when re-replication fails.
What is the link to the Apache JIRA?
https://issues.apache.org/jira/browse/HDDS-4873
How was this patch tested?
No test.