-
Notifications
You must be signed in to change notification settings - Fork 502
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
HDDS-8962. Ensure docker env is stopped #5011
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @adoroszlai for the improvement! Change mostly lgtm. I have one question inline.
docker-compose --ansi never down | ||
if ! { docker-compose --ansi never up -d --scale datanode="${datanode_count}" \ | ||
&& wait_for_safemode_exit \ | ||
&& wait_for_om_leader ; }; then | ||
[[ -n "$OUTPUT_NAME" ]] || OUTPUT_NAME="$COMPOSE_ENV_NAME" | ||
stop_docker_env | ||
return 1 | ||
fi | ||
|
||
trap stop_docker_env EXIT HUP INT TERM | ||
|
||
docker-compose --ansi never up -d --scale datanode="${datanode_count}" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IIUC, with the changes in this PR:
copy_daemon_logs
andstop_docker_env
calls are removed fromexecute_robot_test()
copy_daemon_logs
is now called bystop_docker_env()
stop_docker_env
is now called fromstart_docker_env()
Thus whoever calls start_docker_env
last would have to call stop_docker_env
in order to have the logs collected, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @smengcl for the review.
whoever calls
start_docker_env
last would have to callstop_docker_env
in order to have the logs collected
It's like a ShutdownHook
in Java. The shell calls stop_docker_env
when the test script exits. start_docker_env
just sets it up.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ah makes sense now. That is exactly what trap
is for. Thanks @adoroszlai .
Thanks @smengcl for the review. |
* master: (36 commits) HDDS-8990. Intermittent timeout waiting on datanode4 9856 to become available (apache#5039) Revert "HDDS-7750. Incorrect WRITE ACL check. (apache#4992)" HDDS-7750. Incorrect WRITE ACL check. (apache#4992) HDDS-8985. Intermittent timeout exiting safe mode in HA secure tests (apache#5033) HDDS-8593. Add RootCARotationPoller to CertClient (apache#5030) HDDS-7645. Kubernetes check should fail fast if cluster cannot start (apache#5028) HDDS-8981. TestRootedOzoneFileSystem runs out of disk space (apache#5029) HDDS-8592. Fetch and save all root certificates during service's certificate rotation. (apache#5025) HDDS-8981. Disable TestRootedOzoneFileSystem#testSafeMode HDDS-8591. Create scheduler to check for new root ca certificates (apache#4961) HDDS-8979. error validating kustomization.yaml (apache#5024) HDDS-8973. Ozone SCM HA should not allocates duplicate IDs when transferring leadership (apache#5018) HDDS-8970. Snapshot Diff should return path relative to bucket root (apache#5015) HDDS-8975. Clarify SCM HA auto-bootstrap doc (apache#5021) HDDS-8689. Rotate Root CA and Sub CA in SCM. (apache#4943) HDDS-8436. Support setSafeMode(), isFileClosed() FileSystem API (apache#4825) HDDS-8880. Intermittent fork timeout in TestOMRatisSnapshots (apache#5022) HDDS-8962. Ensure docker env is stopped (apache#5011) HDDS-7794. [snapshot] SnapshotDiff should throw better error messages for exception handling (apache#5007) HDDS-7922. [FSO] S3G folder support fso layout filestatus s3A compatibility (apache#4448) ...
What changes were proposed in this pull request?
Currently, if acceptance test script exits abruptly due to failed command (e.g. timeout in
wait_for_port
), the docker environment is left running, logs are not saved, subsequent test scripts may fail due to port conflict, etc. This is happening recently in HA-secure:This change adds a
trap
to ensurestop_docker_env
is executed.https://issues.apache.org/jira/browse/HDDS-8962
How was this patch tested?
Verified that logs are saved for
failures1
, too:Regular CI:
https://github.com/adoroszlai/hadoop-ozone/actions/runs/5424893835