HDDS-3117. Recon throws InterruptedException while getting new snapshot from OM. #648
@swagle / @vivekratnavel Please review.
The acceptance test failure is related to this change:
The file size count is spread across many buckets because of the previous tests. We might need to change the file size used when running freon.
+1 LGTM
LGTM +1
Thanks @avijayanhwx for working on this. Can you please check if the acceptance test delay can be improved?
...on/src/main/java/org/apache/hadoop/ozone/recon/spi/impl/OzoneManagerServiceProviderImpl.java
Thanks @avijayanhwx for the fix, @swagle and @vivekratnavel for the reviews.
What changes were proposed in this pull request?
On a cluster where an Ozone client is continuously pushing data into OM, the OM delta updates request can time out or fail due to some unexpected error. In that case, the expected behavior of Recon is to request the whole snapshot from OM. The bug was that we interrupted the thread when the delta updates query threw an exception, which caused the sync process to stop instead of falling back to the snapshot. This change removes the interrupt call.
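The fallback described above can be illustrated with a minimal, standalone sketch. It is not the code from `OzoneManagerServiceProviderImpl`; the class name, the `syncOnce` method, the `Outcome` enum, and the `interruptOnDeltaFailure` flag are all hypothetical and exist only to contrast the old and new behavior. It assumes the full snapshot fetch performs a blocking call, simulated here with `Thread.sleep`, which throws `InterruptedException` as soon as the thread's interrupt flag is set.

```java
import java.io.IOException;
import java.util.concurrent.Callable;

public class DeltaSyncFallbackSketch {

    /** Result of one sync attempt (hypothetical type, for illustration only). */
    public enum Outcome { DELTA_APPLIED, FULL_SNAPSHOT, FAILED }

    /**
     * Tries delta updates first and falls back to a full snapshot on failure.
     * interruptOnDeltaFailure = true mimics the behavior described as the bug.
     */
    public static Outcome syncOnce(Callable<Boolean> deltaUpdates,
                                   Callable<Boolean> fullSnapshot,
                                   boolean interruptOnDeltaFailure) {
        boolean deltaOk;
        try {
            deltaOk = deltaUpdates.call();
        } catch (Exception e) {
            deltaOk = false;
            if (interruptOnDeltaFailure) {
                // Sets the interrupt flag on the very thread that still has
                // to fetch the snapshot, so its next blocking call fails.
                Thread.currentThread().interrupt();
            }
        }
        if (deltaOk) {
            return Outcome.DELTA_APPLIED;
        }
        try {
            return fullSnapshot.call() ? Outcome.FULL_SNAPSHOT : Outcome.FAILED;
        } catch (Exception e) {
            // Includes the InterruptedException raised when the flag was set.
            return Outcome.FAILED;
        } finally {
            // Demo only: clear the flag so it does not leak into later calls.
            Thread.interrupted();
        }
    }

    public static void main(String[] args) {
        // Simulated delta request that fails, as on a busy cluster.
        Callable<Boolean> failingDelta = () -> {
            throw new IOException("delta updates request timed out");
        };
        // Simulated snapshot download containing a blocking call.
        Callable<Boolean> blockingSnapshot = () -> {
            Thread.sleep(10);
            return true;
        };
        System.out.println(syncOnce(failingDelta, blockingSnapshot, true));
        System.out.println(syncOnce(failingDelta, blockingSnapshot, false));
    }
}
```

With the interrupt in place the fallback itself is aborted, which matches the `InterruptedException` in the issue title; without it, the same failure leads to a successful full snapshot.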
What is the link to the Apache JIRA?
https://issues.apache.org/jira/browse/HDDS-3117
How was this patch tested?
Added acceptance tests for Recon's OM-related APIs.
Manually tested by creating 50 million keys on OM and verifying that Recon OM DB sync works as expected.