HDDS-10366. Add new testPrepare() in TestOzoneManagerPrepare #6245
aryangupta1998 wants to merge 4 commits into apache:master
@aryangupta1998 Also, please compare test run time before/after the change.
errose28 left a comment
Thanks for the fix @aryangupta1998. This snapshot race has probably been making prepare slightly flaky for a while.
```diff
 private static final int OZONE_CLIENT_FAILOVER_MAX_ATTEMPTS = 5;
 private static final int IPC_CLIENT_CONNECT_MAX_RETRIES = 4;
-private static final long SNAPSHOT_THRESHOLD = 50;
+private static final long SNAPSHOT_THRESHOLD = 1;
```
Setting this globally may mask other problems in prepare or other OM HA tests that only show up when snapshots are not taken. We should probably only set it in the one new test.
Because setup() calls functions such as submitCancelPrepareRequest(), the transaction index increases, so we may not be able to reproduce the current scenario, i.e., SNAPSHOT_THRESHOLD = 1.
Do you feel we should write a new test class extending TestOzoneManagerHA?
I think the desired result is that only the new snapshot test in TestOzoneManagerPrepare has a snapshot threshold of 1, while the rest of the tests in TestOzoneManagerPrepare and the other TestOzoneManagerHA subclasses remain unchanged. Doing that with the current TestOzoneManagerHA setup is difficult because it uses a static method to set the configuration, so some refactoring may be required there.
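One possible shape for that refactoring, sketched under assumptions: instead of building the cluster configuration in a static method, the base class exposes an overridable hook that a single test class can use to change one setting before the cluster starts. `HaTestBase`, `configure`, `DefaultHaTest`, and `SnapshotPrepareTest` are hypothetical names, a plain `Map` stands in for `OzoneConfiguration`, and the key string is assumed to be the OM snapshot threshold setting.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for the HA test base class. A plain Map replaces
// OzoneConfiguration so the sketch is self-contained.
abstract class HaTestBase {
  // Assumed key name for the OM Ratis snapshot auto-trigger threshold.
  static final String SNAPSHOT_THRESHOLD_KEY =
      "ozone.om.ratis.snapshot.auto.trigger.threshold";
  static final long DEFAULT_SNAPSHOT_THRESHOLD = 50;

  // Builds the configuration per instance instead of in a static method,
  // so a subclass can override individual settings.
  final Map<String, String> buildConf() {
    Map<String, String> conf = new HashMap<>();
    conf.put(SNAPSHOT_THRESHOLD_KEY,
        Long.toString(DEFAULT_SNAPSHOT_THRESHOLD));
    configure(conf);
    return conf;
  }

  // Hook for subclasses; the default leaves the shared settings unchanged.
  protected void configure(Map<String, String> conf) {
  }
}

// Existing HA tests keep the shared threshold of 50.
class DefaultHaTest extends HaTestBase {
}

// Only the new snapshot test lowers the threshold to 1.
class SnapshotPrepareTest extends HaTestBase {
  @Override
  protected void configure(Map<String, String> conf) {
    conf.put(SNAPSHOT_THRESHOLD_KEY, "1");
  }
}
```

If the tests use JUnit 5 and the cluster is started from a `@BeforeAll` method, that method must be static unless the class is annotated with `@TestInstance(TestInstance.Lifecycle.PER_CLASS)`, which is one way to let a non-static hook like this run once per class.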
If we do so, the transaction index will still increase due to the setup() function in TestOzoneManagerPrepare!
> the transaction index increases due to which we may not be able to produce the current scenario i.e, SNAPSHOT_THRESHOLD = 1.

> If we do so, the transaction index will still increase due to the setup() function in TestOzoneManagerPrepare!
I think there may be some misunderstanding about the snapshot threshold. SNAPSHOT_THRESHOLD=1 means the OM will take a snapshot on every new request, not that it will only take a snapshot at index 1. The transaction index can be any number at any time during the test; by setting this config to 1, we know that every transaction will have a snapshot taken.
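To make the threshold semantics concrete, here is a toy model, assuming the trigger rule is "take a snapshot once the applied index has advanced by at least `threshold` since the last snapshot". It is not the Ratis implementation, and `SnapshotTrigger` with its methods is hypothetical. The model shows that the starting index does not matter: with a threshold of 1, every applied index gets a snapshot.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of a threshold-based snapshot auto-trigger (not Ratis code).
class SnapshotTrigger {
  private final long threshold;
  private long lastSnapshotIndex;
  private final List<Long> snapshots = new ArrayList<>();

  SnapshotTrigger(long threshold, long lastSnapshotIndex) {
    this.threshold = threshold;
    this.lastSnapshotIndex = lastSnapshotIndex;
  }

  // Called after each transaction is applied to the state machine.
  void onApplied(long appliedIndex) {
    if (appliedIndex - lastSnapshotIndex >= threshold) {
      snapshots.add(appliedIndex);
      lastSnapshotIndex = appliedIndex;
    }
  }

  // Indexes at which a snapshot was taken, in order.
  List<Long> snapshots() {
    return snapshots;
  }
}
```

For example, if setup() has already moved the log to index 37 and the test then applies indexes 38 through 42, a threshold of 1 snapshots all five of them, while a threshold of 50 snapshots none.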
```java
} else if (!ratisStateMachineApplied) {
  throw new IOException(String.format("After waiting for %d seconds, " +
      "Ratis state machine applied index %d which is less than" +
      " the minimum required index %d.",
      flushTimeout.getSeconds(), lastRatisCommitIndex,
      minRatisStateMachineIndex));
```
We should also remove minRatisStateMachineIndex and ratisStateMachineApplied from the other parts of this function if we are no longer using them. We also need to make sure that when om.getRatisSnapshotIndex() returns an index, that index was actually written to RocksDB. Checking the transaction info table may be a better option.
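A sketch of the suggested check, assuming the caller can read the last transaction index persisted in the OM transaction info table through some accessor. `PersistedIndexWait`, `waitForPersistedIndex`, and the `LongSupplier` it polls are hypothetical placeholders, not Ozone APIs. The idea is to poll what is durably recorded in RocksDB instead of trusting the Ratis snapshot index, and to fail with a descriptive message when the timeout expires.

```java
import java.io.IOException;
import java.time.Duration;
import java.util.function.LongSupplier;

final class PersistedIndexWait {
  private PersistedIndexWait() {
  }

  // Polls persistedIndex until it reaches minIndex or the timeout expires.
  // persistedIndex is assumed to read the transaction info table.
  static long waitForPersistedIndex(LongSupplier persistedIndex, long minIndex,
      Duration timeout, Duration pollInterval)
      throws IOException, InterruptedException {
    long deadline = System.nanoTime() + timeout.toNanos();
    long current = persistedIndex.getAsLong();
    while (current < minIndex) {
      // Overflow-safe comparison against the deadline.
      if (System.nanoTime() - deadline >= 0) {
        throw new IOException(String.format(
            "After waiting for %d seconds, persisted transaction index %d"
                + " is less than the minimum required index %d.",
            timeout.getSeconds(), current, minIndex));
      }
      Thread.sleep(pollInterval.toMillis());
      current = persistedIndex.getAsLong();
    }
    return current;
  }
}
```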
Branch force-pushed from b3d16b1 to eddc894.
What changes were proposed in this pull request?
This PR adds a new testPrepare() to TestOzoneManagerPrepare with the snapshot interval set to 1. When an upgrade prepare request arrives, it triggers a forced snapshot through Ratis and waits for the snapshot to complete. Once the forced snapshot has completed, the upgrade prepare request is submitted again for the remaining work that marks upgrade prepare as complete. The test then verifies that the prepare index is always less than the current transaction index.
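The ordering described above can be sketched as a toy model. It is not Ozone code: `PrepareFlowModel`, its methods, and the assumption that the forced snapshot lands exactly at the prepare index are all hypothetical. What it shows is the sequence: apply the prepare request, block until the snapshot completes, and only then apply the follow-up request that marks prepare complete, which leaves the prepare index below the current transaction index.

```java
import java.util.concurrent.CompletableFuture;

// Toy model of the prepare ordering described in this PR (not Ozone code).
class PrepareFlowModel {
  private long lastAppliedIndex;
  private long snapshotIndex = -1;
  private long prepareIndex = -1;
  private boolean prepareComplete;

  PrepareFlowModel(long startIndex) {
    this.lastAppliedIndex = startIndex;
  }

  // Applies one transaction and returns its index.
  private long applyTransaction() {
    return ++lastAppliedIndex;
  }

  void prepare() {
    // Step 1: the upgrade prepare request is applied.
    prepareIndex = applyTransaction();
    long target = prepareIndex;
    // Step 2: force a snapshot asynchronously and wait for it to finish.
    CompletableFuture<Void> snapshot =
        CompletableFuture.runAsync(() -> snapshotIndex = target);
    snapshot.join();
    // Step 3: only after the snapshot completes, submit the follow-up
    // request that marks prepare as complete.
    applyTransaction();
    prepareComplete = true;
  }

  long prepareIndex() { return prepareIndex; }
  long snapshotIndex() { return snapshotIndex; }
  long lastAppliedIndex() { return lastAppliedIndex; }
  boolean isPrepareComplete() { return prepareComplete; }
}
```

Starting from index 100, the model records a prepare index of 101, a snapshot at 101, and a current transaction index of 102, which satisfies the check that the prepare index is less than the current transaction index.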
What is the link to the Apache JIRA?
https://issues.apache.org/jira/browse/HDDS-10366
How was this patch tested?
Tested manually.