fix(ci): handle disk mounting and logs reading edge-cases (#7690)
* fix: use `exit-nopipe` with consistent `shell` usage

Temporarily disabled the `set -e` option around the docker logs command to handle the broken pipe error gracefully.

Handle more complex scenarios in our `Result of ${{ inputs.test_id }} test` job

* fix: Use single quotes for the outer command

* fix: use same approach for CD

* test: check launch failure logs

* fix: revert CD changes

* fix: do not try to increase the disk size and wait mounting

* fix: increase GB a bit more

* fix: do not fail on pipe failure

* fix: use plain `tee /dev/stderr`

If this does not work, try `(tee … || true)`

* fix: `tee` not stopping on cd config tests

* fix: match logic with GCP tests

* fix(cd): handle pipe and other errors correctly

* try `tee --output-error=exit-nopipe`

* fix: TRAP without pipefail

* test: pipefail with exit and trap

* fix: use a subshell

* fix(ci): wait for mounting and show system logs if fail

* fix(ci): GCP is not always mounting disks in the same order

* fix: use `grep` instead of `awk`

* fix: typo

* fix: use simpler `grep` command

* fix: do not sleep if not required

* chore: reduce diff
gustavovalverde committed Oct 9, 2023
1 parent a2b7859 commit 8d0a17e
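
The subshell approach described in the commit message can be tried without Docker. The following is a minimal sketch, not the workflow's actual code: `emit_logs` is a hypothetical stand-in for `docker logs --follow`, and `wait_for_log_line` is a name invented for illustration. One detail worth noting is that `$?` read immediately after `( … ) || true` is always 0, so this sketch records the pipeline status with `|| status=$?` instead.

```shell
#!/usr/bin/env bash
set -eo pipefail

# Hypothetical stand-in for `docker logs --follow <container>`.
emit_logs() {
  echo "starting up"
  echo "net=Main estimated progress to chain tip BeforeOverwinter"
}

# Returns the pipeline status: 0 if the pattern appeared, non-zero otherwise.
wait_for_log_line() {
  local pattern="$1"
  local status=0
  (
    # Ignore SIGPIPE inside the subshell only, so an early exit by grep
    # does not kill the writer with a signal.
    trap "" PIPE
    emit_logs | grep --max-count=1 --extended-regexp -e "$pattern"
  ) || status=$?   # capture the status here; `|| true` would discard it
  return "$status"
}

logs_exit_status=0
wait_for_log_line "estimated progress to chain tip.*BeforeOverwinter" || logs_exit_status=$?
echo "logs exit status: $logs_exit_status"
```

Because the subshell is the left-hand side of `||`, a failure inside it does not abort the script even with `-e` set, which is the behaviour the commit is after.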
Showing 2 changed files with 162 additions and 81 deletions.
91 changes: 53 additions & 38 deletions .github/workflows/continous-delivery.yml
@@ -29,7 +29,7 @@ on:
type: boolean
default: false

# Temporarily disabled to reduce network load, see #6894.
# TODO: Temporarily disabled to reduce network load, see #6894.
#push:
# branches:
# - main
@@ -132,29 +132,37 @@ jobs:

# Make sure Zebra can sync at least one full checkpoint on mainnet
- name: Run tests using the default config
shell: /usr/bin/bash -exo pipefail {0}
run: |
set -ex
docker pull ${{ vars.GAR_BASE }}/zebrad@${{ needs.build.outputs.image_digest }}
docker run --detach --name default-conf-tests -t ${{ vars.GAR_BASE }}/zebrad@${{ needs.build.outputs.image_digest }}
# show the logs, even if the job times out
docker logs --tail all --follow default-conf-tests | \
tee --output-error=exit /dev/stderr | \
grep --max-count=1 --extended-regexp --color=always \
'net.*=.*Main.*estimated progress to chain tip.*BeforeOverwinter'
# Use a subshell to handle the broken pipe error gracefully
(
trap "" PIPE;
docker logs \
--tail all \
--follow \
default-conf-tests | \
tee --output-error=exit /dev/stderr | \
grep --max-count=1 --extended-regexp --color=always \
-e "net.*=.*Main.*estimated progress to chain tip.*BeforeOverwinter"
) || true
LOGS_EXIT_STATUS=$?
docker stop default-conf-tests
# get the exit status from docker
EXIT_STATUS=$( \
docker wait default-conf-tests || \
docker inspect --format "{{.State.ExitCode}}" default-conf-tests || \
echo "missing container, or missing exit status for container" \
)
docker logs default-conf-tests
echo "docker exit status: $EXIT_STATUS"
if [[ "$EXIT_STATUS" = "137" ]]; then
echo "ignoring expected signal status"
exit 0
EXIT_STATUS=$(docker wait default-conf-tests || echo "Error retrieving exit status");
echo "docker exit status: $EXIT_STATUS";
# If grep found the pattern, exit with the Docker container exit status
if [ $LOGS_EXIT_STATUS -eq 0 ]; then
exit $EXIT_STATUS;
fi
exit "$EXIT_STATUS"
# Handle other potential errors here
echo "An error occurred while processing the logs.";
exit 1;
# Test reconfiguring the docker image for testnet.
test-configuration-file-testnet:
@@ -172,30 +180,37 @@ jobs:

# Make sure Zebra can sync the genesis block on testnet
- name: Run tests using a testnet config
shell: /usr/bin/bash -exo pipefail {0}
run: |
set -ex
docker pull ${{ vars.GAR_BASE }}/zebrad@${{ needs.build.outputs.image_digest }}
docker run --env "NETWORK=Testnet" --detach --name testnet-conf-tests -t ${{ vars.GAR_BASE }}/zebrad@${{ needs.build.outputs.image_digest }}
# show the logs, even if the job times out
docker logs --tail all --follow testnet-conf-tests | \
tee --output-error=exit /dev/stderr | \
grep --max-count=1 --extended-regexp --color=always \
-e 'net.*=.*Test.*estimated progress to chain tip.*Genesis' \
-e 'net.*=.*Test.*estimated progress to chain tip.*BeforeOverwinter'
# Use a subshell to handle the broken pipe error gracefully
(
trap "" PIPE;
docker logs \
--tail all \
--follow \
testnet-conf-tests | \
tee --output-error=exit /dev/stderr | \
grep --max-count=1 --extended-regexp --color=always \
-e "net.*=.*Test.*estimated progress to chain tip.*Genesis" \
-e "net.*=.*Test.*estimated progress to chain tip.*BeforeOverwinter";
) || true
LOGS_EXIT_STATUS=$?
docker stop testnet-conf-tests
# get the exit status from docker
EXIT_STATUS=$( \
docker wait testnet-conf-tests || \
docker inspect --format "{{.State.ExitCode}}" testnet-conf-tests || \
echo "missing container, or missing exit status for container" \
)
docker logs testnet-conf-tests
echo "docker exit status: $EXIT_STATUS"
if [[ "$EXIT_STATUS" = "137" ]]; then
echo "ignoring expected signal status"
exit 0
EXIT_STATUS=$(docker wait testnet-conf-tests || echo "Error retrieving exit status");
echo "docker exit status: $EXIT_STATUS";
# If grep found the pattern, exit with the Docker container exit status
if [ $LOGS_EXIT_STATUS -eq 0 ]; then
exit $EXIT_STATUS;
fi
exit "$EXIT_STATUS"
# Handle other potential errors here
echo "An error occurred while processing the logs.";
exit 1;
# Deploy Managed Instance Groups (MiGs) for Mainnet and Testnet,
# with one node in the configured GCP region.
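
The final exit-status decision shared by both test steps in this diff (exit with the container's status if `grep` found the pattern, otherwise exit 1) can be sketched as a small function. This is an illustration only; `step_exit_status` is a hypothetical helper that does not appear in the workflow, and it assumes both inputs are numeric.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: prints the status the step should exit with.
#   $1: status of the log-matching pipeline (0 means grep found the pattern)
#   $2: container exit status, as reported by `docker wait`
step_exit_status() {
  local logs_status="$1"
  local container_status="$2"

  if [ "$logs_status" -eq 0 ]; then
    # Pattern found: the container's own exit status decides the result.
    echo "$container_status"
  else
    # Pattern not found, or the log pipeline failed for another reason.
    echo 1
  fi
}

step_exit_status 0 0   # pattern found, container exited cleanly
step_exit_status 0 2   # pattern found, container reported an error
step_exit_status 1 0   # pattern not found
```

A step would then end with something like `exit "$(step_exit_status "$LOGS_EXIT_STATUS" "$EXIT_STATUS")"`, assuming `EXIT_STATUS` holds a number rather than the fallback error text.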