HDDS-8981. TestRootedOzoneFileSystem runs out of disk space #5029

Merged
merged 1 commit into from
Jul 6, 2023


adoroszlai
Contributor

What changes were proposed in this pull request?

TestRootedOzoneFileSystem#testSafeMode (introduced for HDDS-8436) stops datanodes, restarts SCM and forces it to exit safe mode. This leaves the cluster in a bad state, breaking other test cases. The log is flooded by an "infinite" loop that keeps trying to allocate a block without any datanodes. (The loop is not really infinite: it exits after at most 5 minutes, but by then the tests have been aborted with an out-of-disk-space error.)
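To illustrate why a bounded retry loop can still fill the disk, here is a minimal sketch using simulated time. It is not Ozone code: the class and method names are hypothetical, and both the 10 ms cost per attempt and the one-log-line-per-failure behavior are assumptions made for the example.

```java
// Hypothetical sketch, not Ozone code: models a retry loop that gives up
// only after a timeout and logs once per failed attempt.
class RetryFloodSketch {

  // Returns the number of log lines written before the timeout expires.
  // Time is simulated, so the method returns immediately.
  static long logLinesUntilTimeout(long timeoutMillis, long attemptMillis) {
    long elapsed = 0;
    long logLines = 0;
    while (elapsed < timeoutMillis) {
      // Assumption: allocation always fails because no datanode is running,
      // and each failure writes one log line.
      logLines++;
      elapsed += attemptMillis;
    }
    return logLines;
  }

  public static void main(String[] args) {
    // 5 minutes of retries, assuming 10 ms per attempt.
    long lines = logLinesUntilTimeout(5 * 60 * 1000L, 10L);
    System.out.println("log lines before giving up: " + lines);
  }
}
```

Under these assumed numbers a single stuck caller writes 30000 log lines before the loop exits. Several test cases hitting the same loop, each logging stack traces, would multiply that.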

This PR extracts testSafeMode to a separate class, where the final state of the cluster is not a problem for other test cases.

The same test is run for both OFS and O3FS.
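The effect of moving the test into its own class can be sketched without any Ozone dependency. `FakeCluster` below is a hypothetical stand-in, not the real mini cluster API, and the two "tests" are plain static methods rather than JUnit cases.

```java
// Hypothetical stand-in for a test cluster; not the real Ozone API.
class FakeCluster {
  private int liveDatanodes;

  FakeCluster(int datanodes) {
    this.liveDatanodes = datanodes;
  }

  void stopAllDatanodes() {
    liveDatanodes = 0;
  }

  // Block allocation needs at least one live datanode.
  boolean allocateBlock() {
    return liveDatanodes > 0;
  }
}

class IsolationSketch {

  // A test that needs a healthy cluster.
  static boolean writeTest(FakeCluster cluster) {
    return cluster.allocateBlock();
  }

  // A destructive test, modeled on the description of testSafeMode:
  // it leaves the cluster without datanodes.
  static void safeModeTest(FakeCluster cluster) {
    cluster.stopAllDatanodes();
  }

  // Shared cluster: the destructive test breaks the test that runs after it.
  static boolean sharedClusterRun() {
    FakeCluster shared = new FakeCluster(3);
    safeModeTest(shared);
    return writeTest(shared);
  }

  // Separate clusters: the destructive test cannot affect other tests.
  static boolean isolatedRun() {
    safeModeTest(new FakeCluster(3));
    return writeTest(new FakeCluster(3));
  }

  public static void main(String[] args) {
    System.out.println("shared cluster, write ok: " + sharedClusterRun());
    System.out.println("separate clusters, write ok: " + isolatedRun());
  }
}
```

In this sketch the shared run reports `false` and the isolated run reports `true`, which mirrors the reasoning above: the final state of the cluster no longer matters once no other test case depends on it.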

https://issues.apache.org/jira/browse/HDDS-8981

How was this patch tested?

$ mvn -am -pl :ozone-integration-test -Dtest='TestRootedOzoneFileSystem,TestSafeMode' clean test
...
[WARNING] Tests run: 250, Failures: 0, Errors: 0, Skipped: 9, Time elapsed: 332.357 s - in org.apache.hadoop.fs.ozone.TestRootedOzoneFileSystem
[INFO] Tests run: 2, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 41.188 s - in org.apache.hadoop.fs.ozone.TestSafeMode
...
[INFO] BUILD SUCCESS

CI:
https://github.com/adoroszlai/hadoop-ozone/actions/runs/5475617964

@adoroszlai adoroszlai self-assigned this Jul 6, 2023
@adoroszlai adoroszlai added the test label Jul 6, 2023
Contributor

@szetszwo szetszwo left a comment


+1 the change looks good.

@szetszwo szetszwo merged commit 4980c20 into apache:master Jul 6, 2023
20 checks passed
@adoroszlai adoroszlai deleted the HDDS-8981 branch July 6, 2023 19:57
@adoroszlai
Contributor Author

Thanks @szetszwo for reviewing and merging this.

errose28 added a commit to errose28/ozone that referenced this pull request Jul 10, 2023
* master: (36 commits)
  HDDS-8990. Intermittent timeout waiting on datanode4 9856 to become available (apache#5039)
  Revert "HDDS-7750. Incorrect WRITE ACL check. (apache#4992)"
  HDDS-7750. Incorrect WRITE ACL check. (apache#4992)
  HDDS-8985. Intermittent timeout exiting safe mode in HA secure tests (apache#5033)
  HDDS-8593. Add RootCARotationPoller to CertClient (apache#5030)
  HDDS-7645. Kubernetes check should fail fast if cluster cannot start (apache#5028)
  HDDS-8981. TestRootedOzoneFileSystem runs out of disk space (apache#5029)
  HDDS-8592. Fetch and save all root certificates during service's certificate rotation. (apache#5025)
  HDDS-8981. Disable TestRootedOzoneFileSystem#testSafeMode
  HDDS-8591. Create scheduler to check for new root ca certificates (apache#4961)
  HDDS-8979. error validating kustomization.yaml (apache#5024)
  HDDS-8973. Ozone SCM HA should not allocates duplicate IDs when transferring leadership (apache#5018)
  HDDS-8970. Snapshot Diff should return path relative to bucket root (apache#5015)
  HDDS-8975. Clarify SCM HA auto-bootstrap doc (apache#5021)
  HDDS-8689. Rotate Root CA and Sub CA in SCM. (apache#4943)
  HDDS-8436. Support setSafeMode(), isFileClosed() FileSystem API (apache#4825)
  HDDS-8880. Intermittent fork timeout in TestOMRatisSnapshots (apache#5022)
  HDDS-8962. Ensure docker env is stopped (apache#5011)
  HDDS-7794. [snapshot] SnapshotDiff should throw better error messages for exception handling (apache#5007)
  HDDS-7922. [FSO] S3G folder support fso layout filestatus s3A compatibility (apache#4448)
  ...