[FLINK-34569][e2e] Fail fast if AWS CLI container fails to start #24491

Merged

Commits on Mar 17, 2024

  1. [FLINK-34569][e2e] Fail fast if aws cli container fails to run

    Why:
    An end-to-end test run failed, and the test logs showed that the AWS CLI
    container had failed to start. Because of the way the script is
    organised, the failure inside the subshell did not propagate and
    AWSCLI_CONTAINER_ID was left empty. This led to a loop that kept trying
    to docker exec a command in a container named "" until the test timed
    out after 15 minutes. This change makes the test fail fast instead.
    
    Note that we use 'return' rather than failing the script immediately, so
    that a simple retry can be implemented on top.
    
    Signed-off-by: Robert Young <robeyoun@redhat.com>
    robobario committed Mar 17, 2024
    Commit: 793ca0d
  2. [FLINK-34569][e2e] Add naive retry when creating aws cli container

    Why:
    An end-to-end test run failed with what looked like a transient network
    exception when pulling the AWS CLI image. This change retries creating
    the container once.
    
    Signed-off-by: Robert Young <robeyoun@redhat.com>
    robobario committed Mar 17, 2024
    Commit: 92354f8
  3. [FLINK-34569][e2e] Remove jq containers after use

    Why:
    A large number of exited jq containers was left behind in Docker after
    an operation was retried repeatedly.
    
    Signed-off-by: Robert Young <robeyoun@redhat.com>
    robobario committed Mar 17, 2024
    Commit: ca05028
  4. [FLINK-34569][e2e] Clean up after failed awscli container run

    Why:
    If the command returns a non-zero exit code but still creates a
    container, this change removes that container so it is not left behind
    as an orphan.
    
    Signed-off-by: Robert Young <robeyoun@redhat.com>
    robobario committed Mar 17, 2024
    Commit: 4b233a7
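Taken together, the four commits describe one pattern for starting a helper container from a bash test script: fail fast when the container does not start, retry once, remove any container left behind by a failed run, and run short-lived containers with --rm. The sketch below is hypothetical and is not the code merged in #24491. The function names (start_awscli_container_once, start_awscli_container, awscli_exec, run_jq) and the default image names are assumptions for illustration, and the docker run flags a real AWS CLI container would need (mounts, environment variables, a long-running entrypoint) are left out. It captures the container ID in a plain assignment because in bash 'local x=$(cmd)' or 'export x=$(cmd)' reports the status of local or export rather than of cmd, which is one common way a failing subshell goes unnoticed.

```shell
# Hypothetical sketch of the FLINK-34569 patterns; not the merged Flink code.
# Assumed placeholder images; override them via the environment.
AWSCLI_IMAGE="${AWSCLI_IMAGE:-amazon/aws-cli}"
JQ_IMAGE="${JQ_IMAGE:-ghcr.io/jqlang/jq}"

# Start one detached AWS CLI container and print its ID on stdout.
# Uses 'return' rather than 'exit' so the caller can decide to retry.
start_awscli_container_once() {
  local container_id status
  # Plain assignment keeps docker's exit status; 'local x=$(...)' or
  # 'export x=$(...)' would report 0 and hide the failure.
  # Real flags (mounts, env vars, a long-running entrypoint) are omitted.
  container_id=$(docker run -d "$AWSCLI_IMAGE")
  status=$?
  if [[ $status -ne 0 || -z "$container_id" ]]; then
    echo "AWS CLI container failed to start (exit status $status)" >&2
    # If docker run failed but still printed an ID, remove that container.
    if [[ -n "$container_id" ]]; then
      docker rm -f "$container_id" >/dev/null 2>&1
    fi
    return 1
  fi
  echo "$container_id"
}

# Naive retry: a second attempt covers a transient failure such as an image pull error.
start_awscli_container() {
  local attempt
  for attempt in 1 2; do
    # An assignment with a command substitution takes that substitution's exit status.
    if AWSCLI_CONTAINER_ID=$(start_awscli_container_once); then
      export AWSCLI_CONTAINER_ID
      return 0
    fi
    echo "AWS CLI container start attempt $attempt failed" >&2
  done
  return 1
}

# Fail immediately instead of looping on 'docker exec' into a container named "".
awscli_exec() {
  if [[ -z "${AWSCLI_CONTAINER_ID:-}" ]]; then
    echo "AWSCLI_CONTAINER_ID is empty; the container did not start" >&2
    return 1
  fi
  docker exec "$AWSCLI_CONTAINER_ID" "$@"
}

# --rm deletes the jq container when it exits, so retries leave nothing behind.
# Assumes the image's entrypoint is jq; -i forwards stdin to it.
run_jq() {
  docker run --rm -i "$JQ_IMAGE" "$@"
}
```

A caller that still wants the test to stop on failure can write 'start_awscli_container || exit 1'. Keeping 'return' inside the helpers is what lets the retry loop sit between the failure and the exit.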