Skip to content

[BEAM-6346] Inspect the docker container state#7395

Merged
mxm merged 1 commit intoapache:masterfrom
angoenka:check_container_creation
Jan 9, 2019
Merged

[BEAM-6346] Inspect the docker container state#7395
mxm merged 1 commit intoapache:masterfrom
angoenka:check_container_creation

Conversation

@angoenka
Copy link
Contributor

@angoenka angoenka commented Jan 3, 2019

Please add a meaningful description for your change here


Follow this checklist to help us incorporate your contribution quickly and easily:

  • Format the pull request title like [BEAM-XXX] Fixes bug in ApproximateQuantiles, where you replace BEAM-XXX with the appropriate JIRA issue, if applicable. This will automatically link the pull request to the issue.
  • If this contribution is large, please file an Apache Individual Contributor License Agreement.

It will help us expedite review of your Pull Request if you tag someone (e.g. @username) to look at it.

Post-Commit Tests Status (on master branch)

Lang SDK Apex Dataflow Flink Gearpump Samza Spark
Go Build Status --- --- --- --- --- ---
Java Build Status Build Status Build Status Build Status Build Status Build Status Build Status Build Status
Python Build Status --- Build Status
Build Status
Build Status --- --- ---

@angoenka
Copy link
Contributor Author

angoenka commented Jan 3, 2019

R: @mxm @robertwb

@angoenka angoenka requested review from mxm and robertwb January 3, 2019 01:29
// Wait on a client from the gRPC server.
while (instructionHandler == null) {
Preconditions.checkArgument(
docker.isContainerRunning(containerId), "No container running for id " + containerId);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we assume that the container startup is synchronous? If not, would it make sense to add a retry logic here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The main issue that I observed here was when the container failed to start. As it is started in detached mode, we don't know the state of the container and simply wait on the control client to connect.

Synchronous container start will not work and will be very difficult to orchestrate.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The check is too strict then. We won't get RUNNING directly after starting the container. We need something like a repeated check which times out after some max startup time.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure if the 'docker run -d ' return before the container is in running state. From running state, it can dies because of user code error/completion.

However it makes sense to check the state after the initial suggested wait of 1 min.

docker.isContainerRunning(containerId), "No container running for id " + containerId);
try {
instructionHandler = clientSource.take(workerId, Duration.ofMinutes(2));
instructionHandler = clientSource.take(workerId, Duration.ofSeconds(30));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are there perhaps environments that need a longer time to spin up the services? Would only lower to 1 minute.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

1 Min sounds good to me.

@angoenka angoenka force-pushed the check_container_creation branch from ff627f1 to 02f3a5e Compare January 8, 2019 21:16
@mxm mxm merged commit dffe2c1 into apache:master Jan 9, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants