Conversation

@ivandika3
Contributor

@ivandika3 ivandika3 commented May 11, 2023

What changes were proposed in this pull request?

While setting up an OM performance test for the Recon OM full snapshot, I removed the Recon DB directory before restarting Recon to trigger a full snapshot (essentially bootstrapping a new Recon).

However, after the OM DB was successfully downloaded, Recon heap usage increased significantly during the ContainerKeyMapperTask reprocess for a large keys table (our cluster has around 350 million keys).

The issue was caused by the in-memory maps that store all the OM keys during the reprocess. This is a regression introduced in HDDS-6783. In essence, this patch reverts the HDDS-6783 implementation ONLY for ContainerKeyMapperTask#reprocess.

  • ContainerKeyMapperTask#process should not increase heap memory significantly, since the number of delta updates is already limited by the Recon configurations.
  • HDDS-6783 aims for atomicity during Recon OM task updates. However, since ContainerKeyMapperTask#reprocess truncates the entire Recon Container DB before it starts and then rebuilds it, I think this change is acceptable.

After the patch is applied, the Recon heap size stays stable during the full snapshot.

Any suggestion for a better approach is welcome.
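For illustration, the periodic-flush approach described above can be sketched roughly as follows. All names here are hypothetical stand-ins, not the actual Recon code (the real task writes batches to RocksDB rather than a plain map); the point is only the pattern of flushing and clearing the in-memory map once it reaches a threshold, so heap usage stays bounded instead of growing with the full keys table.

```java
import java.util.HashMap;
import java.util.Map;

// Minimal sketch (hypothetical names) of a bounded-memory reprocess:
// accumulate entries in a map, but flush to the DB and clear the map
// whenever it reaches a threshold, instead of holding every key in
// memory for the whole reprocess.
public class FlushSketch {
  static final int FLUSH_THRESHOLD = 3; // tiny value, for demonstration only

  // Stand-in for the Recon container DB.
  static final Map<String, Integer> db = new HashMap<>();
  static int flushCount = 0;

  static void flushAndCommit(Map<String, Integer> containerKeyMap) {
    db.putAll(containerKeyMap);  // stand-in for the batched DB write
    containerKeyMap.clear();     // release the in-memory entries
    flushCount++;
  }

  public static void main(String[] args) {
    Map<String, Integer> containerKeyMap = new HashMap<>();
    for (int i = 0; i < 10; i++) {
      containerKeyMap.put("container-" + i, i);
      if (containerKeyMap.size() >= FLUSH_THRESHOLD) {
        flushAndCommit(containerKeyMap);
      }
    }
    // Final flush for any remaining entries.
    if (!containerKeyMap.isEmpty()) {
      flushAndCommit(containerKeyMap);
    }
    System.out.println(db.size() + " entries, " + flushCount + " flushes");
  }
}
```

With these numbers, the 10 entries reach the DB via 4 flushes, and at no point does the in-memory map hold more than FLUSH_THRESHOLD entries.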

What is the link to the Apache JIRA

https://issues.apache.org/jira/browse/HDDS-8580

How was this patch tested?

Manual test.

Attached is Recon heap memory before and after the patch.

[image: Recon heap memory usage before and after the patch]

@ivandika3
Contributor Author

@dombizita @smengcl Could you help take a look?

Contributor

@ashishkumar50 ashishkumar50 left a comment


Hi @ivandika3, thanks for working on this. Please find my comments inline.

private boolean flushAndCommitContainerKeyMapToDB(
    Map<ContainerKeyPrefix, Integer> containerKeyMap) {
  try {
    writeToTheDB(containerKeyMap, Collections.emptyMap(),
Contributor


I think we should also flush the "containerKeyCountMap" and reduce the memory footprint further if we are introducing this flush mechanism, so you could refactor the code a bit and pass "containerKeyCountMap" instead of an empty map as the second argument.

Contributor Author


Thank you for the suggestion. I have included containerKeyCountMap in the flush and removed the last writeToDB call in the 'reprocess' function. I have also moved the flush logic to an outer scope to prevent inaccurate container key info when there are multiple bucket layouts.
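A rough sketch of that refactor, with hypothetical names standing in for the real Recon types: both maps are committed in one call and cleared together, so neither the key map nor the per-container key-count map accumulates across the whole reprocess.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of flushing both maps together, instead of
// passing an empty map for the count argument. Names are stand-ins,
// not the actual Recon method signatures.
public class FlushBothSketch {
  static final Map<String, Integer> keyDb = new HashMap<>();
  static final Map<Long, Long> countDb = new HashMap<>();

  static void flushAndCommitToDB(Map<String, Integer> containerKeyMap,
                                 Map<Long, Long> containerKeyCountMap) {
    // Both maps are written in one batch, then cleared so the
    // in-memory footprint is released on every flush.
    keyDb.putAll(containerKeyMap);
    countDb.putAll(containerKeyCountMap);
    containerKeyMap.clear();
    containerKeyCountMap.clear();
  }

  public static void main(String[] args) {
    Map<String, Integer> keys = new HashMap<>();
    Map<Long, Long> counts = new HashMap<>();
    keys.put("container-1-key", 1);
    counts.put(1L, 5L);
    flushAndCommitToDB(keys, counts);
    System.out.println(keyDb.size() + " " + countDb.size() + " " + keys.size());
  }
}
```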

Contributor


Thanks @ivandika3 for addressing the comments. Patch LGTM. +1

Contributor Author


@devmadhuu Thank you for the review.

Contributor

@devmadhuu devmadhuu left a comment


Added a few comments, please check.

@smengcl smengcl requested a review from dombizita May 13, 2023 08:42
@ivandika3
Contributor Author

CI failure does not seem to be related.

@ivandika3
Contributor Author

Hi @dombizita, could you help review this?

Contributor

@dombizita dombizita left a comment


thanks for working on this @ivandika3, I only had one question about a method (to make sure I understand it correctly) and one about a comment change; besides that, it looks good to me!

@ivandika3 ivandika3 requested a review from dombizita May 19, 2023 01:11
@ferhui
Contributor

ferhui commented May 23, 2023

@ashishkumar50 do you have other comments? Plan to merge this PR. Thanks

@dombizita
Contributor

@ashishkumar50 do you have other comments? Plan to merge this PR. Thanks

I'm waiting to get a reply on this comment.

cc @devmadhuu @ivandika3

Contributor

@ashishkumar50 ashishkumar50 left a comment


@ivandika3, Thanks for updating the patch, LGTM +1.

@dombizita
Contributor

I just wanted to merge this PR, but there is a merge conflict; can you resolve it, @ivandika3?

@ivandika3
Contributor Author

Hi @dombizita, I have resolved the conflict. Thank you.

@ivandika3 ivandika3 requested a review from dombizita May 31, 2023 01:30
@dombizita dombizita merged commit a2aa1d4 into apache:master Jun 3, 2023
@dombizita
Contributor

thanks for the patch @ivandika3! thanks for the review @devmadhuu @ashishkumar50!

@ivandika3 ivandika3 deleted the HDDS-8580 branch June 3, 2023 09:39
@whbing
Contributor

whbing commented Jun 4, 2023

The mvn build failed in https://github.com/apache/ozone/actions/runs/5162988533, and it is also failing on the master branch now.
Is it related to this PR? Thanks!

Error:  /home/runner/work/ozone/ozone/hadoop-ozone/recon/src/main/java/org/apache/hadoop/ozone/recon/tasks/ContainerKeyMapperTask.java:[472,49] 
org.apache.hadoop.ozone.recon.api.types.ContainerKeyPrefix is abstract; cannot be instantiated

@ivandika3
Contributor Author

ivandika3 commented Jun 4, 2023

Hi @whbing, there is an incompatibility due to the refactoring done in HDDS-8733. I have raised #4826 with a fix for the incompatibility. @dombizita @szetszwo Could you help resolve this? Sorry for the inconvenience.

@dombizita
Contributor

thanks for finding this @whbing, #4826 is merged. thanks for the quick fix @ivandika3!

@ivandika3 ivandika3 self-assigned this Apr 23, 2024