HDDS-8580. Reduce memory usage in ContainerKeyMapperTask#reprocess #4696
Conversation
@dombizita @smengcl Could you help take a look?
ashishkumar50 left a comment
Hi @ivandika3, Thanks for working on this. Please find my comments inline.
[Resolved review threads on ...op-ozone/recon/src/main/java/org/apache/hadoop/ozone/recon/tasks/ContainerKeyMapperTask.java]
```java
private boolean flushAndCommitContainerKeyMapToDB(
    Map<ContainerKeyPrefix, Integer> containerKeyMap) {
  try {
    writeToTheDB(containerKeyMap, Collections.emptyMap(),
```
I think we should also flush containerKeyCountMap and reduce the memory footprint further if we are introducing this flush mechanism. You could refactor the code a bit and pass containerKeyCountMap instead of an empty map as the second argument.
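The reviewer's suggestion could look roughly like the following minimal, self-contained sketch. The class name, the stand-in `writeToTheDB`, and the simplified key types (`String` instead of `ContainerKeyPrefix`) are illustrative assumptions, not the actual Recon code:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: the flush helper takes containerKeyCountMap as a real
// argument rather than Collections.emptyMap(), and clears both maps afterwards
// so their entries become garbage-collectable.
public class FlushBothMapsSketch {
  // Stand-in for Recon's writeToTheDB(); records how many entries were written.
  static int writtenEntries = 0;

  static void writeToTheDB(Map<String, Integer> containerKeyMap,
                           Map<Long, Long> containerKeyCountMap) {
    // A real implementation would batch-write both maps to the Recon DB here.
    writtenEntries += containerKeyMap.size() + containerKeyCountMap.size();
  }

  static boolean flushAndCommitToDB(Map<String, Integer> containerKeyMap,
                                    Map<Long, Long> containerKeyCountMap) {
    try {
      // Both maps go to the DB in one batch instead of passing an empty map.
      writeToTheDB(containerKeyMap, containerKeyCountMap);
      containerKeyMap.clear();
      containerKeyCountMap.clear();
      return true;
    } catch (RuntimeException e) {
      return false;
    }
  }

  public static void main(String[] args) {
    Map<String, Integer> keys = new HashMap<>();
    keys.put("containerId#keyPrefix", 1);
    Map<Long, Long> counts = new HashMap<>();
    counts.put(1L, 42L);
    flushAndCommitToDB(keys, counts);
    System.out.println("written=" + writtenEntries + " remaining=" + keys.size());
  }
}
```

Clearing both maps after a successful write is what actually releases the heap; flushing only one of them would leave the other growing unbounded.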
Thank you for the suggestion. I have included containerKeyCountMap in the flush and removed the last writeToDB call in the reprocess function. I have also moved the flush function to an outer scope to prevent inaccurate container key info when there are multiple bucket layouts.
Thanks @ivandika3 for addressing the comments. Patch LGTM. +1
@devmadhuu Thank you for the review.
devmadhuu left a comment
Added a few comments, please check.
The CI failure does not seem to be related.
Hi @dombizita, could you help review this?
dombizita left a comment
Thanks for working on this @ivandika3. I only had one question about a method (to make sure that I understand it correctly) and one about a comment change; besides that, it looks good to me!
[Resolved review threads on ...op-ozone/recon/src/main/java/org/apache/hadoop/ozone/recon/tasks/ContainerKeyMapperTask.java]
@ashishkumar50 do you have other comments? I plan to merge this PR. Thanks.
I'm waiting for a reply on this comment.
ashishkumar50 left a comment
@ivandika3, Thanks for updating the patch, LGTM +1.
I just wanted to merge this PR, but there is a merge conflict. Can you resolve it, @ivandika3?
Hi @dombizita, I have resolved the conflict. Thank you.
Thanks for the patch @ivandika3! Thanks for the reviews @devmadhuu @ashishkumar50!
The mvn build failed in https://github.com/apache/ozone/actions/runs/5162988533; it is also failing on the master branch now.
Hi @whbing, there is an incompatibility due to the refactoring done in HDDS-8733. I have raised #4826 with the fix for the incompatibility. @dombizita @szetszwo Could you help reconcile the conflict? Sorry for the inconvenience.
Thanks for finding this @whbing, #4826 is merged. Thanks for the quick fix @ivandika3!
What changes were proposed in this pull request?
While setting up an OM performance test for the Recon OM full snapshot, I removed the Recon DB directory before restarting Recon to trigger a full snapshot (essentially bootstrapping a new Recon).
However, after the OM DB was successfully downloaded, Recon's heap usage increased significantly during the reprocess of ContainerKeyMapperTask because of the large keys table (our cluster has around 350 million keys).
The issue was caused by the in-memory maps that store all of the OM keys during the reprocess. This is a regression introduced in HDDS-6783. In essence, the patch reverts the implementation of HDDS-6783 ONLY for ContainerKeyMapperTask#reprocess.
After the patch is applied, the Recon heap size stays stable during the full snapshot.
Any suggestions for a better approach are welcome.
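To illustrate the general idea of the fix — flushing the in-memory maps to the DB in bounded batches instead of accumulating all entries until the end of reprocess — here is a minimal, self-contained sketch. The class name, the threshold value, the stand-in `writeToTheDB`, and the simplified key types are all illustrative assumptions, not the actual `ContainerKeyMapperTask` code:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: reprocess iterates OM keys and flushes the in-memory
// container-key maps to the DB once they cross a threshold, so the heap stays
// bounded regardless of how many keys the cluster has.
public class BatchedFlushSketch {
  static final int FLUSH_THRESHOLD = 3; // tiny value, for illustration only

  // Stand-in for Recon's writeToTheDB(); counts how many flushes happened.
  static int flushCount = 0;

  static void writeToTheDB(Map<String, Integer> containerKeyMap,
                           Map<Long, Long> containerKeyCountMap) {
    // A real implementation would batch-write both maps to the Recon DB here.
    flushCount++;
  }

  // Flush both maps and clear them so their memory can be reclaimed.
  static void flushAndCommit(Map<String, Integer> containerKeyMap,
                             Map<Long, Long> containerKeyCountMap) {
    writeToTheDB(containerKeyMap, containerKeyCountMap);
    containerKeyMap.clear();
    containerKeyCountMap.clear();
  }

  // Simulate reprocessing `totalKeys` OM keys; returns the number of flushes.
  static int reprocess(int totalKeys) {
    flushCount = 0;
    Map<String, Integer> containerKeyMap = new HashMap<>();
    Map<Long, Long> containerKeyCountMap = new HashMap<>();
    for (int i = 0; i < totalKeys; i++) {
      containerKeyMap.put("container#" + i, 1);
      containerKeyCountMap.merge((long) (i % 2), 1L, Long::sum);
      if (containerKeyMap.size() >= FLUSH_THRESHOLD) {
        flushAndCommit(containerKeyMap, containerKeyCountMap);
      }
    }
    // Final flush for any remaining entries after the iteration ends.
    if (!containerKeyMap.isEmpty() || !containerKeyCountMap.isEmpty()) {
      flushAndCommit(containerKeyMap, containerKeyCountMap);
    }
    return flushCount;
  }

  public static void main(String[] args) {
    System.out.println("flushes for 10 keys: " + reprocess(10));
  }
}
```

With a threshold of N, the heap never holds more than N container-key entries at a time, which matches the observation above that the Recon heap stays stable after the patch even with hundreds of millions of keys.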
What is the link to the Apache JIRA?
https://issues.apache.org/jira/browse/HDDS-8580
How was this patch tested?
Manual test.
Attached are graphs of Recon heap memory before and after the patch.