HDDS-4167. Acceptance test logs missing if SCM fails to exit safe mode #1366

Merged: 1 commit merged into apache:master on Sep 1, 2020

@adoroszlai adoroszlai commented Aug 29, 2020

What changes were proposed in this pull request?

The acceptance test sometimes fails because SCM does not come out of safe mode. When this happens, the cluster is stopped without running any Robot tests. The rebot command that processes test results then fails due to missing input, and the acceptance check stops abruptly, without fetching docker logs or running tests in the other environments.

Fix:

  1. Only run intermediate rebot processing if input files are available (it is safe to let the final one in test-all.sh fail)
  2. Pre-create the results dir for ozone-mr, which contains multiple test sub-directories, to avoid an error in find
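
The two fixes above can be sketched roughly as follows. This is only an illustration, not the code from the patch: the function name merge_results_if_any and the REBOT variable are hypothetical, and the rebot options shown (--nostatusrc, -o) are just one plausible invocation.

```shell
# Sketch only: merge_results_if_any and REBOT are hypothetical names.

# Command used to merge Robot results; overridable so that the guard
# can be exercised without Robot Framework installed.
REBOT="${REBOT:-rebot}"

# Merge Robot output files found under a result directory, but only if
# there is something to merge.
merge_results_if_any() {
  local result_dir="$1"
  local merged="$2"

  # Pre-create the directory so that find never sees a missing path.
  mkdir -p "${result_dir}"

  if find "${result_dir}" -name '*.xml' | grep -q .; then
    # Input is available: hand every XML file to rebot in one call.
    find "${result_dir}" -name '*.xml' \
      -exec "${REBOT}" --nostatusrc -o "${merged}" {} +
  else
    # No Robot output (e.g. the cluster never left safe mode): skip the
    # intermediate step instead of failing the whole run.
    echo "No Robot output in ${result_dir}, skipping rebot" >&2
  fi
}
```

With a guard like this, a test environment that never produced results no longer aborts the loop over environments, so later steps such as collecting docker logs still run.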

And some cleanup:

  1. Reduce some code duplication between test-all.sh and ozone-mr/test.sh by extracting functions for the shared code being fixed
  2. Replace set +e; ...; set -e with if ! ...; then ... (partly belongs to HDDS-4101) in the code being fixed
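
For readers unfamiliar with the second cleanup item, here is a minimal sketch of the two styles side by side. The helper run_one_test and both counting functions are hypothetical stand-ins, not code from the patch.

```shell
set -e

# Hypothetical stand-in for a test script: exits with the given status.
run_one_test() {
  return "$1"
}

# Old style: switch errexit off around the command, then back on.
count_failures_toggle() {
  local failures=0
  local status rc
  for status in "$@"; do
    set +e
    run_one_test "${status}"
    rc=$?
    set -e
    if [ "${rc}" -ne 0 ]; then
      failures=$((failures + 1))
    fi
  done
  echo "${failures}"
}

# New style: a command tested by "if" is exempt from errexit, so no
# toggling is needed.
count_failures() {
  local failures=0
  local status
  for status in "$@"; do
    if ! run_one_test "${status}"; then
      failures=$((failures + 1))
    fi
  done
  echo "${failures}"
}
```

Both functions count the same failures. The if ! form avoids unconditionally re-enabling errexit (set -e turns it on even if the caller had it off), at the cost of not exposing the original exit code inside the then branch, so it fits cases where only success or failure matters.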

https://issues.apache.org/jira/browse/HDDS-4167

How was this patch tested?

Temporarily reduced the wait time for exiting safe mode to 10 seconds, which made all tests fail early. Verified that docker logs were still added to the bundle:

unzip -t acceptance-misc.zip
Archive:  acceptance-misc.zip
    testing: docker-hadoop27.log      OK
    testing: docker-hadoop31.log      OK
    testing: docker-hadoop32.log      OK
    testing: docker-ozone-csi.log     OK
    testing: docker-ozone-ha.log      OK
    testing: docker-ozone-om-ha-s3.log   OK
    testing: docker-ozone-topology.log   OK
    testing: docker-ozones3-haproxy.log   OK
    testing: docker-ozonesecure-mr.log   OK
    testing: docker-ozonesecure-om-ha.log   OK
    testing: docker-upgrade.log       OK

https://github.com/adoroszlai/hadoop-ozone/runs/1045059176

Regular CI:
https://github.com/adoroszlai/hadoop-ozone/runs/1045057585

@adoroszlai adoroszlai self-assigned this Aug 29, 2020
@adoroszlai adoroszlai requested a review from elek September 1, 2020 08:25

elek commented Sep 1, 2020

Reduce some code duplication between test-all.sh and ozone-mr/test.sh by extracting functions for the shared code being fixed

❤️

@elek elek left a comment

+1 thanks for the patch.
Nice cleanup.

@elek elek merged commit 13fe31b into apache:master Sep 1, 2020
@adoroszlai adoroszlai deleted the HDDS-4167 branch September 1, 2020 18:54
@adoroszlai

Thanks @elek for reviewing and committing it.

rakeshadr pushed a commit to rakeshadr/hadoop-ozone that referenced this pull request Sep 3, 2020
errose28 added a commit to errose28/ozone that referenced this pull request Sep 11, 2020
* master: (26 commits)
  HDDS-4167. Acceptance test logs missing if fails during cluster startup (apache#1366)
  HDDS-4121. Implement OmMetadataMangerImpl#getExpiredOpenKeys. (apache#1351)
  HDDS-3867. Extend the chunkinfo tool to display information from all nodes in the pipeline. (apache#1154)
  HDDS-4077. Incomplete OzoneFileSystem statistics (apache#1329)
  HDDS-3903. OzoneRpcClient support batch rename keys. (apache#1150)
  HDDS-4151. Skip the inputstream while offset larger than zero in s3g (apache#1354)
  HDDS-4147. Add OFS to FileSystem META-INF (apache#1352)
  HDDS-4137. Turn on the verbose mode of safe mode check on testlib (apache#1343)
  HDDS-4146. Show the ScmId and ClusterId in the scm web ui. (apache#1350)
  HDDS-4145. Bump version to 1.1.0-SNAPSHOT on master (apache#1349)
  HDDS-4109. Tests in TestOzoneFileSystem should use the existing MiniOzoneCluster (apache#1316)
  HDDS-4149. Implement OzoneFileStatus#toString (apache#1356)
  HDDS-4153. Increase default timeout in kubernetes tests (apache#1357)
  HDDS-2411. add a datanode chunk validator fo datanode chunk generator (apache#1312)
  HDDS-4140. Auto-close /pending pull requests after 21 days of inactivity (apache#1344)
  HDDS-4152. Archive container logs for kubernetes check (apache#1355)
  HDDS-4056. Convert OzoneAdmin to pluggable model (apache#1285)
  HDDS-3972. Add option to limit number of items displaying through ldb tool. (apache#1206)
  HDDS-4068. Client should not retry same OM on network connection failure (apache#1324)
  HDDS-4062. Non rack aware pipelines should not be created if multiple racks are alive. (apache#1291)
  ...
ayushtkn pushed a commit to ayushtkn/hadoop-ozone that referenced this pull request Oct 31, 2020
* HDDS-1577. Add default pipeline placement policy implementation. (apache#1366)



(cherry picked from commit b640a5f6d53830aee4b9c2a7d17bf57c987962cd)

* HDDS-1571. Create an interface for pipeline placement policy to support network topologies. (apache#1395)

(cherry picked from commit 753fc6703a39154ed6013e44dbae572391748906)

* HDDS-2089: Add createPipeline CLI. (apache#1418)

(cherry picked from commit 326b5acd4a63fe46821919322867f5daff30750c)

* HDDS-1569 Support creating multiple pipelines with same datanode. Contributed by Li Cheng. 

This closes apache#28

* HDDS-1572 Implement a Pipeline scrubber to clean up non-OPEN pipeline. (apache#237)

* Rebase Fix

* HDDS-2650 Fix createPipeline CLI. (apache#340)

* HDDS-2035 Implement datanode level CLI to reveal pipeline relation. (apache#348)

* Revert "HDDS-2650 Fix createPipeline CLI. (apache#340)"

This reverts commit 7c71710.

* HDDS-2650 Fix createPipeline CLI and make it message based. (apache#370)

* HDDS-1574 Average out pipeline allocation on datanodes and add metrcs/test (apache#291)

* Resolve rebase conflict.

* HDDS-2756. Handle pipeline creation failure in different way when it exceeds pipeline limit

Closes apache#401

* HDDS-2115 Add acceptance test for createPipeline CLI and datanode list CLI (apache#375)

* HDDS-2115 Add acceptance test for createPipeline CLI and datanode list CLI.

* HDDS-2772 Better management for pipeline creation limitation. (apache#410)

*  HDDS-2913 Update config names and CLI for multi-raft feature. (apache#462)

* HDDS-2924. Fix Pipeline#nodeIdsHash collision issue. (apache#478)

* HDDS-2923 Add fall-back protection for rack awareness in pipeline creation. (apache#516)

* HDDS-3007 Fix CI test failure for TestSCMNodeManager. (apache#550)

Co-authored-by: Sammi Chen <sammichen@apache.org>
Co-authored-by: Xiaoyu Yao <xyao@apache.org>