HDDS-10366. Add new testPrepare() in TestOzoneManagerPrepare by aryangupta1998 · Pull Request #6245 · apache/ozone

aryangupta1998 · 2024-02-21T06:59:15Z

What changes were proposed in this pull request?

This adds a new testPrepare() to TestOzoneManagerPrepare that runs with the snapshot interval set to 1. When an upgrade prepare request arrives, it triggers a forced snapshot from Ratis and waits for that snapshot to complete. Once the forced snapshot is done, the upgrade prepare request is submitted for the remaining work, which marks the upgrade prepare as complete. The test then verifies that the prepare index is always less than the current transaction index.

What is the link to the Apache JIRA

https://issues.apache.org/jira/browse/HDDS-10366

How was this patch tested?

Tested Manually.

adoroszlai · 2024-02-21T07:39:48Z

@aryangupta1998 TestOzoneManagerHA subclasses have historically had several intermittent failures. Please run them in the flaky-test-check workflow to ensure the reduced snapshot threshold does not cause regressions.

Also, please compare test run time before/after the change.

errose28

Thanks for the fix @aryangupta1998. This snapshot race has probably been making prepare slightly flaky for a while.

errose28 · 2024-02-22T23:05:19Z

hadoop-ozone/integration-test/src/test/java/org/apache/hadoop/ozone/om/TestOzoneManagerHA.java

  private static final int OZONE_CLIENT_FAILOVER_MAX_ATTEMPTS = 5;
  private static final int IPC_CLIENT_CONNECT_MAX_RETRIES = 4;
-  private static final long SNAPSHOT_THRESHOLD = 50;
+  private static final long SNAPSHOT_THRESHOLD = 1;


Setting this globally may mask other problems in prepare or other OM HA tests when snapshots are not taken. We should probably set it only in the one new test.

aryangupta1998 · 2024-02-26

Because setup() calls functions such as submitCancelPrepareRequest(), the transaction index increases, so we may not be able to reproduce the current scenario, i.e., SNAPSHOT_THRESHOLD = 1.
Do you think we should write a new test class extending TestOzoneManagerHA?

errose28 · 2024-02-28

I think the desired result is that only the new snapshot test in TestOzoneManagerPrepare has a snapshot threshold of 1, and the rest of the tests in TestOzoneManagerPrepare and the other TestOzoneManagerHA subclasses remain unchanged. Doing that with the current TestOzoneManagerHA setup is difficult because it uses a static method to set the configuration, so some refactoring may be required there.
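One possible shape for that refactoring, as a minimal sketch only: the shared defaults stay in one place, and a single test passes a small customizer that overrides just the snapshot threshold. The class, the method names, and the configuration key below are hypothetical stand-ins and are not taken from TestOzoneManagerHA or from Ozone's configuration classes.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Consumer;

/** Sketch of a per-test config override; all names here are hypothetical. */
public class PerTestConfigSketch {
  // Hypothetical key, used only for this illustration.
  public static final String SNAPSHOT_THRESHOLD_KEY = "example.snapshot.threshold";
  public static final long DEFAULT_SNAPSHOT_THRESHOLD = 50;

  /** Defaults shared by every test in the hierarchy. */
  public static Map<String, String> defaultConf() {
    Map<String, String> conf = new HashMap<>();
    conf.put(SNAPSHOT_THRESHOLD_KEY, Long.toString(DEFAULT_SNAPSHOT_THRESHOLD));
    return conf;
  }

  /** Builds a fresh config and lets one caller adjust it before a cluster would start. */
  public static Map<String, String> confWith(Consumer<Map<String, String>> customizer) {
    Map<String, String> conf = defaultConf();
    customizer.accept(conf);
    return conf;
  }

  public static void main(String[] args) {
    // Existing tests keep the default threshold.
    Map<String, String> shared = confWith(c -> { });
    // Only the new snapshot test lowers it to 1.
    Map<String, String> snapshotTest = confWith(c -> c.put(SNAPSHOT_THRESHOLD_KEY, "1"));

    System.out.println("shared:        " + shared.get(SNAPSHOT_THRESHOLD_KEY));
    System.out.println("snapshot test: " + snapshotTest.get(SNAPSHOT_THRESHOLD_KEY));
  }
}
```

Because each call builds a fresh map, the override in one test cannot leak into the others, which is the isolation the comment above asks for.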

aryangupta1998 · 2024-02-28

If we do so, the transaction index will still increase due to the setup() function in TestOzoneManagerPrepare!

errose28 · 2024-02-28

> the transaction index increases due to which we may not be able to produce the current scenario i.e, SNAPSHOT_THRESHOLD = 1.

> If we do so, the transaction index will still increase due to the setup() function in TestOzoneManagerPrepare!

I think there may be some misunderstanding about the snapshot threshold. SNAPSHOT_THRESHOLD=1 means the OM will take a snapshot on every new request, not that it will only take a snapshot at index 1. The transaction index can be any number at any time during the test, and by setting this config to 1 we know that any given transaction will have a snapshot taken.
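The distinction can be illustrated with a toy model. This is a sketch under the assumption that a snapshot is triggered once the number of transactions applied since the last snapshot reaches the threshold. It is not Ozone or Ratis code, and the class and method names are made up for the example.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of an auto-trigger snapshot threshold; not Ozone or Ratis code. */
public class SnapshotThresholdModel {
  private final long threshold;
  private long lastSnapshotIndex;
  private final List<Long> snapshotIndexes = new ArrayList<>();

  public SnapshotThresholdModel(long threshold, long lastSnapshotIndex) {
    this.threshold = threshold;
    this.lastSnapshotIndex = lastSnapshotIndex;
  }

  /** Applies one transaction and snapshots if enough entries have accumulated. */
  public void apply(long index) {
    if (index - lastSnapshotIndex >= threshold) {
      lastSnapshotIndex = index;
      snapshotIndexes.add(index);
    }
  }

  public List<Long> snapshotIndexes() {
    return snapshotIndexes;
  }

  public static void main(String[] args) {
    // Assume setup() has already pushed the index to 40 before the test body runs.
    SnapshotThresholdModel one = new SnapshotThresholdModel(1, 40);
    SnapshotThresholdModel fifty = new SnapshotThresholdModel(50, 40);
    for (long i = 41; i <= 45; i++) {
      one.apply(i);
      fifty.apply(i);
    }
    System.out.println("threshold=1:  " + one.snapshotIndexes());
    System.out.println("threshold=50: " + fifty.snapshotIndexes());
  }
}
```

In this model, a threshold of 1 snapshots at 41, 42, 43, 44, and 45 even though the run starts at index 40, while a threshold of 50 takes no snapshot in the same window. The starting index that setup() produces does not matter.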

errose28 · 2024-02-22T23:19:07Z

...ozone-manager/src/main/java/org/apache/hadoop/ozone/om/request/upgrade/OMPrepareRequest.java

-    } else if (!ratisStateMachineApplied) {
-      throw new IOException(String.format("After waiting for %d seconds, " +
-              "Ratis state machine applied index %d which is less than" +
-              " the minimum required index %d.",
-          flushTimeout.getSeconds(), lastRatisCommitIndex,
-          minRatisStateMachineIndex));


We should remove minRatisStateMachineIndex and ratisStateMachineApplied from the other parts of this function as well if we are no longer using them. We also need to make sure that when om.getRatisSnapshotIndex() returns an index, that index was actually written to RocksDB. Checking the transaction info table may be a better option.

aryangupta1998 requested a review from sumitagrawl February 21, 2024 06:59

adoroszlai added the test label Feb 22, 2024

errose28 reviewed Feb 22, 2024

Aryan Gupta added 3 commits March 5, 2024 11:51

HDDS-10366. Add new testPrepare() in TestOzoneManagerPrepare

bdc2c8e

Addressed Comments.

4efdbbf

Fixed Build failure.

eddc894

aryangupta1998 force-pushed the HDDS-10366 branch from b3d16b1 to eddc894 March 5, 2024 08:20

Fixed Build failure.

6770087

aryangupta1998 closed this Apr 10, 2024

aryangupta1998 wants to merge 4 commits into apache:master from aryangupta1998:HDDS-10366

aryangupta1998 commented Feb 21, 2024
adoroszlai commented Feb 21, 2024
errose28 left a comment
errose28 Feb 22, 2024
aryangupta1998 Feb 26, 2024
errose28 Feb 28, 2024
aryangupta1998 Feb 28, 2024
errose28 Feb 28, 2024
errose28 Feb 22, 2024


3 participants
