
HDDS-3227. Ensure eviction of stateMachineData from cache only when both followers catch up #2704

Merged: lokeshj1703 merged 16 commits into apache:master from bshashikant:HDDS-3227 on Dec 7, 2021.


@bshashikant
Contributor

What changes were proposed in this pull request?

Change the stateMachine caching guarantees: stateMachineData is evicted from the cache only once both followers have caught up.
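The guarantee in the title can be illustrated with a minimal sketch, assuming a hypothetical cache keyed by Raft log index. `StateMachineDataCache`, `evict`, and the way the applied index and follower match indices are passed in are illustrative names only, not the actual Ozone or Ratis API. An entry is treated as safe to drop only when its index is at or below both the leader's applied index and every follower's match index.

```java
import java.util.concurrent.ConcurrentSkipListMap;

/**
 * Hypothetical illustration (not Ozone's implementation): caches per-entry
 * state machine data and evicts an entry only after the leader has applied it
 * and every follower has replicated it.
 */
class StateMachineDataCache {
  // Raft log index -> cached write payload (hypothetical representation).
  private final ConcurrentSkipListMap<Long, byte[]> cache = new ConcurrentSkipListMap<>();

  void put(long logIndex, byte[] data) {
    cache.put(logIndex, data);
  }

  byte[] get(long logIndex) {
    return cache.get(logIndex);
  }

  int size() {
    return cache.size();
  }

  /**
   * Evicts every entry whose index is <= min(appliedIndex, all follower match indices),
   * so data still needed by a lagging follower stays cached.
   */
  void evict(long appliedIndex, long[] followerMatchIndices) {
    long safeIndex = appliedIndex;
    for (long matchIndex : followerMatchIndices) {
      safeIndex = Math.min(safeIndex, matchIndex);
    }
    // headMap(key, true) is a live view of all keys <= key; clearing it removes them from the cache.
    cache.headMap(safeIndex, true).clear();
  }

  public static void main(String[] args) {
    StateMachineDataCache c = new StateMachineDataCache();
    for (long i = 1; i <= 10; i++) {
      c.put(i, new byte[] {(byte) i});
    }
    // Leader has applied up to index 10; one follower is at 9, the other only at 4.
    c.evict(10, new long[] {9, 4});
    System.out.println(c.size()); // 6 (indices 5..10 are retained)
  }
}
```

One consequence of this rule in the sketch: a slow follower pins entries in the cache, so the cache can keep growing until that follower catches up.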

What is the link to the Apache JIRA

https://issues.apache.org/jira/browse/HDDS-3227

How was this patch tested?

Existing UTs

Contributor

@lokeshj1703 lokeshj1703 left a comment


@bshashikant Thanks for working on this! I think we should make this change configurable along with the follower gap threshold in Ratis.

Contributor

@lokeshj1703 lokeshj1703 left a comment


@bshashikant Thanks for updating the PR! I have a few comments inline.

Contributor

@lokeshj1703 lokeshj1703 left a comment


@bshashikant Thanks for updating the PR! The changes look good to me. Only a minor comment. +1 otherwise.

Contributor

@lokeshj1703 lokeshj1703 left a comment


Rebasing on master might help with the failing CI.

@bshashikant
Contributor Author

The test failures are not related to the patch.

@adoroszlai
Contributor

It seems TestRandomKeyGenerator is timing out consistently (3/3 runs now). The problem occurs with the 2GB key test case (Key size: 2147483657 bytes). It can be reproduced locally, too.

@lokeshj1703 lokeshj1703 merged commit ea53dc1 into apache:master Dec 7, 2021
@lokeshj1703
Contributor

@bshashikant Thanks for the contribution! @bharatviswa504 @adoroszlai Thanks for the review! I have committed the PR to master branch.

@adoroszlai
Contributor

@lokeshj1703 @bshashikant

While TestRandomKeyGenerator did not time out in the last PR run before the merge, I think it still shows a problem. It took more than twice as long as before:

Tests run: 7, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 443.256 s - in org.apache.hadoop.ozone.freon.TestRandomKeyGenerator

vs.

Tests run: 7, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 186.665 s - in org.apache.hadoop.ozone.freon.TestRandomKeyGenerator

And it has timed out on master after the PR was merged:
https://github.com/apache/ozone/runs/4440507728?check_suite_focus=true#step:4:3010

Running it locally, I see appendEntries Timeout in the log.

@adoroszlai
Contributor

OK, I think I found the problem:
https://issues.apache.org/jira/browse/HDDS-6071

@bshashikant
Contributor Author

> OK, I think I found the problem: https://issues.apache.org/jira/browse/HDDS-6071

Thanks @adoroszlai for root causing the issue.
