HDDS-4914. Failure injection and validating HDDS upgrade.#1998
avijayanhwx merged 10 commits into apache:HDDS-3698-nonrolling-upgrade
@adoroszlai These fault injection tests add significant time to CI runs. Is there any way to 1) run them conditionally on a layout version upgrade, or 2) speed up the tests by running them as separate processes?
adoroszlai
left a comment
These fault injection tests are adding significant time to complete CI runs
We can skip it in regular integration tests by adding an <exclude> for TestHDDSUpgrade in pom.xml:
Lines 2175 to 2191 in 2ce0594
If the pre-existing test case(s) or any future test cases need to run as regular integration tests, then the new injection test cases should be moved into a separate class.
is there any way to 1) run them conditionally on layout version upgrade
We can introduce a new workflow for failure injection tests. It can be scheduled with lower frequency.
Is "layout version upgrade" indicated by changes to specific source files, which we could use as a trigger? Also, are you sure that upgrade functionality is not affected by other code changes?
2) speed up the tests by running as separate processes?
We can override surefire fork parameters for this separate workflow.
...rc/main/java/org/apache/hadoop/ozone/container/common/statemachine/DatanodeStateMachine.java
...rc/main/java/org/apache/hadoop/ozone/container/common/statemachine/DatanodeStateMachine.java
...ervice/src/main/java/org/apache/hadoop/ozone/container/upgrade/DataNodeUpgradeFinalizer.java
...hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/server/StorageContainerManager.java
hadoop-hdds/common/src/main/java/org/apache/hadoop/ozone/upgrade/UpgradeFinalizer.java
hadoop-hdds/common/src/main/java/org/apache/hadoop/ozone/upgrade/BasicUpgradeFinalizer.java
hadoop-hdds/common/src/main/java/org/apache/hadoop/ozone/upgrade/UpgradeFinalizer.java
...rc/main/java/org/apache/hadoop/ozone/container/common/statemachine/DatanodeStateMachine.java
...ervice/src/main/java/org/apache/hadoop/ozone/container/upgrade/DataNodeUpgradeFinalizer.java
.../server-scm/src/main/java/org/apache/hadoop/hdds/scm/server/upgrade/SCMUpgradeFinalizer.java
f67b44d to 054e073
@adoroszlai @avijayanhwx @swagle Addressing all CI failures. The long-running failure injection test is disabled by default. We need to find a way to run it less frequently.
The one failure in CI is unrelated to the changes.
avijayanhwx
left a comment
Thanks for working on this @prashantpogde. This will be useful in the future where complex finalization/rollback scenarios can be tested. I have some comments on the abstractions.
I am yet to review the actual test code that has been added. Will post more comments if needed after reviewing the injected failure testing.
hadoop-hdds/common/src/main/java/org/apache/hadoop/ozone/upgrade/BasicUpgradeFinalizer.java
hadoop-hdds/common/src/main/java/org/apache/hadoop/ozone/upgrade/BasicUpgradeFinalizer.java
...p-hdds/common/src/main/java/org/apache/hadoop/ozone/upgrade/UpgradeFinalizationExecutor.java
...p-hdds/common/src/main/java/org/apache/hadoop/ozone/upgrade/UpgradeFinalizationExecutor.java
hadoop-hdds/common/src/main/java/org/apache/hadoop/ozone/upgrade/BasicUpgradeFinalizer.java
hadoop-hdds/common/src/main/java/org/apache/hadoop/ozone/upgrade/UpgradeFinalizer.java
...ommon/src/main/java/org/apache/hadoop/ozone/upgrade/InjectedUpgradeFinalizationExecutor.java
hadoop-ozone/integration-test/src/test/java/org/apache/hadoop/hdds/upgrade/TestHDDSUpgrade.java
hadoop-ozone/integration-test/src/test/java/org/apache/hadoop/hdds/upgrade/TestHDDSUpgrade.java
hadoop-ozone/integration-test/src/test/java/org/apache/hadoop/hdds/upgrade/TestHDDSUpgrade.java
For now we have disabled the long-running tests, so we do not need to change pom.xml yet. But we do need a way to run these tests less frequently, e.g. every 50th commit or so.
@prashantpogde Can we resolve the merge conflicts? That will trigger CI which will actually run the added tests. |
942d50e to 5212538
Done
Thanks for working on this @prashantpogde. I am merging this, with a follow up item of HDDS-5108. |
* HDDS-3698-nonrolling-upgrade:
- HDDS-5086. Add pre-finalize validation action for SCM HA. (apache#2143)
- HDDS-4914. Failure injection and validating HDDS upgrade. (apache#1998)
- HDDS-5014. Move upgrade user flow to 'feature' folder.
- HDDS-5014. Upgrade usage primer documentation. (apache#2133)
- HDDS-4181. Add acceptance tests for upgrade, finalization and downgrade. (apache#2056)
- HDDS-4828. SCM should go into "safe mode" until there is at least 1 pipeline to work with after finalization. (apache#2101)
What changes were proposed in this pull request?
The goal of this PR is to write a comprehensive framework that injects failures during HDDS upgrade and validates the result.
The HDDS upgrade model can be thought of as a state machine model {states, transitions}, where
- states are specific stages in upgrade finalization, either on the SCM node or on the individual DataNodes
- transitions are events that trigger a state change
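The {states, transitions} view can be sketched in a few lines of Java. This is only an illustration: the stage and event names below are hypothetical placeholders chosen for this example, not the enums used in the Ozone code base.

```java
/**
 * Illustrative sketch only. The Stage and Event names are hypothetical
 * and do not come from the Ozone sources.
 */
public class UpgradeStateMachineSketch {

  /** Hypothetical finalization stages of a single node (SCM or DataNode). */
  enum Stage { PRE_FINALIZED, FINALIZATION_IN_PROGRESS, FINALIZED }

  /** Hypothetical events that trigger a state change. */
  enum Event { START_FINALIZATION, FINALIZATION_DONE }

  /**
   * Returns the stage reached after applying the event.
   * An event that does not apply to the current stage leaves it unchanged.
   */
  static Stage next(Stage current, Event event) {
    switch (current) {
      case PRE_FINALIZED:
        return event == Event.START_FINALIZATION
            ? Stage.FINALIZATION_IN_PROGRESS : current;
      case FINALIZATION_IN_PROGRESS:
        return event == Event.FINALIZATION_DONE
            ? Stage.FINALIZED : current;
      default:
        // FINALIZED is terminal in this sketch.
        return current;
    }
  }

  public static void main(String[] args) {
    Stage stage = Stage.PRE_FINALIZED;
    stage = next(stage, Event.START_FINALIZATION);
    stage = next(stage, Event.FINALIZATION_DONE);
    System.out.println(stage); // prints FINALIZED
  }
}
```

A failure injected between two transitions corresponds to interrupting the node while it sits in one of these stages, which is what the validation framework is meant to exercise.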
Distinct HDDS upgrade stages are defined for both the DataNodes and the SCM.
This validation framework will trigger all possible combinations of failures while the nodes are in different possible states. The combinations will include:
- Try this for all possible SCM-upgrade states
- Try this for all possible SCM-upgrade states
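As a rough sketch of what "try this for all possible SCM-upgrade states" amounts to, the scenarios can be enumerated as the cross product of failure kinds and SCM stages. The failure kinds and stage names here are assumptions made for illustration; the actual test class may organize its cases differently.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Illustrative sketch only. The Failure and ScmStage values are
 * hypothetical and do not come from the Ozone sources.
 */
public class FailureCombinationSketch {

  /** Hypothetical SCM finalization stages. */
  enum ScmStage { PRE_FINALIZED, FINALIZATION_IN_PROGRESS, FINALIZED }

  /** Hypothetical kinds of failure to inject. */
  enum Failure { SCM_RESTART, ONE_DATANODE_RESTART, ALL_DATANODES_RESTART }

  /** One test scenario: a failure injected while the SCM is in a given stage. */
  static final class Scenario {
    final Failure failure;
    final ScmStage scmStage;

    Scenario(Failure failure, ScmStage scmStage) {
      this.failure = failure;
      this.scmStage = scmStage;
    }

    @Override
    public String toString() {
      return failure + " while SCM is " + scmStage;
    }
  }

  /** Builds every (failure, SCM stage) pair. */
  static List<Scenario> allScenarios() {
    List<Scenario> scenarios = new ArrayList<>();
    for (Failure failure : Failure.values()) {
      for (ScmStage stage : ScmStage.values()) {
        scenarios.add(new Scenario(failure, stage));
      }
    }
    return scenarios;
  }

  public static void main(String[] args) {
    for (Scenario scenario : allScenarios()) {
      System.out.println(scenario);
    }
  }
}
```

The scenario count grows multiplicatively with each added failure kind or stage, which is consistent with the concern raised in review about the CI time these tests consume.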
What is the link to the Apache JIRA
https://issues.apache.org/jira/browse/HDDS-4914
How was this patch tested?
By running the newly introduced integration tests.