[BEAM-6346] Inspect the docker container state #7395
// Wait on a client from the gRPC server.
while (instructionHandler == null) {
  Preconditions.checkArgument(
      docker.isContainerRunning(containerId), "No container running for id " + containerId);
Can we assume that the container startup is synchronous? If not, would it make sense to add retry logic here?
The main issue I observed here was the container failing to start. Because it is started in detached mode, we don't know the state of the container and simply wait for the control client to connect.
A synchronous container start will not work and would be very difficult to orchestrate.
The check is too strict, then. We won't get RUNNING directly after starting the container. We need something like a repeated check that times out after some maximum startup time.
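A minimal sketch of the repeated check suggested above, assuming the container state can be queried through a simple boolean probe. `ContainerStartupCheck`, `awaitRunning`, and its parameters are hypothetical names for illustration and are not part of this change or of Beam.

```java
import java.time.Duration;
import java.util.function.BooleanSupplier;

/** Hypothetical helper (not part of Beam): polls a container-state probe until a deadline. */
final class ContainerStartupCheck {

  private ContainerStartupCheck() {}

  /**
   * Returns true as soon as {@code isRunning} reports true, or false once
   * {@code maxStartupTime} has elapsed or the calling thread is interrupted.
   */
  static boolean awaitRunning(
      BooleanSupplier isRunning, Duration maxStartupTime, Duration pollInterval) {
    long deadlineNanos = System.nanoTime() + maxStartupTime.toNanos();
    while (true) {
      if (isRunning.getAsBoolean()) {
        return true;
      }
      long remainingNanos = deadlineNanos - System.nanoTime();
      if (remainingNanos <= 0) {
        return false;
      }
      try {
        // Sleep for the poll interval, but never past the deadline.
        long sleepMillis = Math.min(pollInterval.toMillis(), remainingNanos / 1_000_000L);
        Thread.sleep(Math.max(1L, sleepMillis));
      } catch (InterruptedException e) {
        // Restore the interrupt flag and give up waiting.
        Thread.currentThread().interrupt();
        return false;
      }
    }
  }
}
```

In the code under review, the probe would presumably wrap `docker.isContainerRunning(containerId)`. If that call throws checked exceptions, the lambda would need to catch and translate them.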
      docker.isContainerRunning(containerId), "No container running for id " + containerId);
  try {
-   instructionHandler = clientSource.take(workerId, Duration.ofMinutes(2));
+   instructionHandler = clientSource.take(workerId, Duration.ofSeconds(30));
Are there perhaps environments that need more time to spin up the services? I would only lower it to 1 minute.
1 minute sounds good to me.
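To illustrate the trade-off behind the timeout value, here is a rough sketch of a wait loop in which the per-attempt timeout also bounds how long a dead container can go unnoticed. A `BlockingQueue` stands in for the client source, and `ClientWait` and `awaitClient` are hypothetical names, not the API used in this pull request.

```java
import java.time.Duration;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.function.BooleanSupplier;

/** Hypothetical helper (not part of Beam): waits for a client while checking the container. */
final class ClientWait {

  private ClientWait() {}

  /**
   * Re-checks the container before every wait of at most {@code takeTimeout}. A shorter
   * timeout detects a dead container sooner; it does not limit the total wait, because
   * the loop keeps retrying for as long as the container is reported as running.
   */
  static <T> T awaitClient(
      BlockingQueue<T> clients, BooleanSupplier containerRunning, Duration takeTimeout) {
    while (true) {
      if (!containerRunning.getAsBoolean()) {
        throw new IllegalStateException("Container is not running");
      }
      try {
        T client = clients.poll(takeTimeout.toMillis(), TimeUnit.MILLISECONDS);
        if (client != null) {
          return client;
        }
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
        throw new IllegalStateException("Interrupted while waiting for the client", e);
      }
    }
  }
}
```

In this sketch, slow environments are not cut off by a shorter timeout. They just go through more iterations before the client connects.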
ff627f1 to 02f3a5e