Skip to content

HDDS-7279. Snapshot Create requires Double Buffer Flush thread to split the commit batch.#3958

Merged
smengcl merged 11 commits intoapache:HDDS-6517-Snapshotfrom
hemantk-12:HDDS-7279
Nov 28, 2022
Merged

HDDS-7279. Snapshot Create requires Double Buffer Flush thread to split the commit batch.#3958
smengcl merged 11 commits intoapache:HDDS-6517-Snapshotfrom
hemantk-12:HDDS-7279

Conversation

@hemantk-12
Copy link
Contributor

@hemantk-12 hemantk-12 commented Nov 14, 2022

What changes were proposed in this pull request?

The OmRequest double buffer flush thread flushes the entire buffer as a batch. Since follower OM's will flush batches with different contents, snapshots can't stay consistent between the leader and the followers.
This change is to flush double buffer snapshot aware and flush all the transactions before snapshot create request.

What is the link to the Apache JIRA

https://issues.apache.org/jira/browse/HDDS-7279

How was this patch tested?

  • Added unit test.
  • Tested it by preventing the flush thread from waking up when a key commit request occurs as described by George in jira

@smengcl
Copy link
Contributor

smengcl commented Nov 14, 2022

Hi @GeorgeJahad , if you are interested you can also take a look at this draft as we discussed earlier.

@hemantk-12 hemantk-12 changed the title HDDS-7279. Snapshot Create requires Double Buffer Flush thread to spl… HDDS-7279. Snapshot Create requires Double Buffer Flush thread to split the commit batch. Nov 16, 2022
@hemantk-12 hemantk-12 marked this pull request as ready for review November 16, 2022 21:02
@kerneltime kerneltime added the snapshot https://issues.apache.org/jira/browse/HDDS-6517 label Nov 21, 2022
@smengcl
Copy link
Contributor

smengcl commented Nov 21, 2022

Looks like there are a bunch of failures in the newly added test:

https://github.com/apache/ozone/actions/runs/3510796319/jobs/5880950816

Error:  Failures: 
Error:    TestOzoneManagerDoubleBuffer.testOzoneManagerDoubleBuffer:201 expected: <1> but was: <2>
Error:    TestOzoneManagerDoubleBuffer.testOzoneManagerDoubleBuffer:208 expected: <3> but was: <4>
Error:    TestOzoneManagerDoubleBuffer.testOzoneManagerDoubleBuffer:208 expected: <6> but was: <7>
Error:    TestOzoneManagerDoubleBuffer.testOzoneManagerDoubleBuffer:208 expected: <10> but was: <11>
Error:    TestOzoneManagerDoubleBuffer.testOzoneManagerDoubleBuffer:208 expected: <14> but was: <15>
[INFO] 
Error:  Tests run: 628, Failures: 5, Errors: 0, Skipped: 1

@hemantk-12
Copy link
Contributor Author

Looks like there are a bunch of failures in the newly added test:

https://github.com/apache/ozone/actions/runs/3510796319/jobs/5880950816

Error:  Failures: 
Error:    TestOzoneManagerDoubleBuffer.testOzoneManagerDoubleBuffer:201 expected: <1> but was: <2>
Error:    TestOzoneManagerDoubleBuffer.testOzoneManagerDoubleBuffer:208 expected: <3> but was: <4>
Error:    TestOzoneManagerDoubleBuffer.testOzoneManagerDoubleBuffer:208 expected: <6> but was: <7>
Error:    TestOzoneManagerDoubleBuffer.testOzoneManagerDoubleBuffer:208 expected: <10> but was: <11>
Error:    TestOzoneManagerDoubleBuffer.testOzoneManagerDoubleBuffer:208 expected: <14> but was: <15>
[INFO] 
Error:  Tests run: 628, Failures: 5, Errors: 0, Skipped: 1

Yes, It worked fine on mac but seems like failing due to some race condition in workflow. I thought it will be fixed if we disabled the daemon thread and enable it after adding responses to the currentBuffer. But still failing.

I'm gone extract out the flushing logic and test it independently.

Copy link
Contributor

@smengcl smengcl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @hemanthboyina for the patch. Overall looks good. I have some minor comments inline.

Comment on lines +439 to +446
// New queue gets created in three conditions:
// 1. It is first element in the response,
// 2. Current request is createSnapshot request.
// 3. Previous request was createSnapshot request.
if (response.isEmpty() ||
omResponse.getCreateSnapshotResponse() != null ||
(previousOmResponse != null &&
previousOmResponse.getCreateSnapshotResponse() != null)) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The logic sounds good to me. But IntelliJ somehow gives scary "condition always true" warning and suggests removal of the condition. I think this is a false alarm. Would you also double check?

image

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I think it is because CreateSnapshotResponse is optional attribute.

Copy link
Contributor

@prashantpogde prashantpogde left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the patch @hemantk-12. LGTM

@smengcl smengcl merged commit 6fe2ab8 into apache:HDDS-6517-Snapshot Nov 28, 2022
@smengcl
Copy link
Contributor

smengcl commented Nov 28, 2022

Thanks @hemantk-12 for the patch, @GeorgeJahad and @prashantpogde for reviewing this.

@hemantk-12 hemantk-12 deleted the HDDS-7279 branch December 8, 2022 18:55
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

snapshot https://issues.apache.org/jira/browse/HDDS-6517

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants