Is there a way to delay container startup to support dependent services with a longer startup time #374

dancrumb opened this Issue Aug 4, 2014 · 301 comments

@dancrumb
dancrumb commented Aug 4, 2014

I have a MySQL container that takes a little time to start up as it needs to import data.

I have an Alfresco container that depends upon the MySQL container.

At the moment, when I use fig, the Alfresco service inside the Alfresco container fails when it attempts to connect to the MySQL container... ostensibly because the MySQL service is not yet listening.

Is there a way to handle this kind of issue in Fig?

@d11wtq
Contributor
d11wtq commented Aug 4, 2014

At work we wrap our dependent services in a script that checks whether the link is up yet. I know one of my colleagues would be interested in this too! Personally I feel it's a container-level concern to wait for services to be available, but I may be wrong :)

@nubs
Contributor
nubs commented Aug 4, 2014

We do the same thing with wrapping. You can see an example here: https://github.com/dominionenterprises/tol-api-php/blob/master/tests/provisioning/set-env.sh

@bfirsh
Collaborator
bfirsh commented Aug 4, 2014

It'd be handy to have an entrypoint script that loops over all of the links and waits until they're working before starting the command passed to it.

This should be built into Docker itself, but the solution is a way off. A container shouldn't be considered started until the link it exposes has opened.

@dancrumb
dancrumb commented Aug 4, 2014

@bfirsh that's more than I was imagining, but would be excellent.

A container shouldn't be considered started until the link it exposes has opened.

I think that's exactly what people need.

For now, I'll be using a variation on https://github.com/aanand/docker-wait
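
For reference, the core of that approach is a small wrapper entrypoint that polls the linked port before handing off to the real command. A minimal sketch (not the actual docker-wait code; the variable names assume Docker's legacy link environment variables for a link called mysql, and an nc that supports -z):

#!/bin/sh
# Poll the linked MySQL port until it accepts TCP connections, then exec the real command.
until nc -z "$MYSQL_PORT_3306_TCP_ADDR" "$MYSQL_PORT_3306_TCP_PORT"; do
    echo "$(date) - waiting for mysql..."
    sleep 1
done
exec "$@"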

@silarsis
silarsis commented Aug 4, 2014

Yeah, I'd be interested in something like this - meant to post about it earlier.

The smallest-impact pattern I can think of that would fix this use case for us would be the following:

Add "wait" as a new key in fig.yml, with similar value semantics to link. Fig would treat this as a prerequisite and wait until this container has exited prior to carrying on.

So, my fig.yml would look something like:

db:
  image: tutum/mysql:5.6

initdb:
  build: /path/to/db
  link:
    - db:db
  command: /usr/local/bin/init_db

app:
  link:
    - db:db
  wait:
    - initdb

On running app, it will start up all the linked containers, then run the wait container, and only progress to the actual app container once the wait container (initdb) has exited. initdb would run a script that waits for the database to be available, then runs any initialisations/migrations/whatever, then exits.

That's my thoughts, anyway.

@dnephin
Member
dnephin commented Aug 5, 2014

(revised, see below)

@dsyer
dsyer commented Aug 14, 2014

+1 here too. It's not very appealing to have to do this in the commands themselves.

@jcalazan

+1 as well. Just ran into this issue. Great tool btw, makes my life so much easier!

@arruda
arruda commented Aug 16, 2014

+1 would be great to have this.

@prologic

+1 also. Recently ran into the same set of problems.

@chymian
chymian commented Aug 19, 2014

+1 also. Any statement from the Docker folks?

@codeitagile

I am writing wrapper scripts as entrypoints to synchronise at the moment. I'm not sure having a mechanism in fig is wise if you have other targets for your containers that perform orchestration a different way. It seems very application-specific to me, and as such the responsibility of the containers doing the work.

@prologic

After some thought and experimentation I do kind of agree with this.

As such, an application I'm building basically has a synchronous waitfor(host, port) function that lets me wait for the services the application depends on (either detected via the environment or configured explicitly via CLI options).

cheers
James


@shuron
Contributor
shuron commented Aug 31, 2014

Yes, some basic "depends on" is needed here...
So if you have 20 containers, you just want to run fig up and have everything start in the correct order...
However, it should also have a timeout option or other failure-catching mechanisms.

@ahknight

Another +1 here. I have Postgres taking longer than Django to start so the DB isn't there for the migration command without hackery.

@dnephin
Member
dnephin commented Oct 23, 2014

@ahknight interesting, why is migration running during run ?

Don't you want to actually run migrate during the build phase? That way you can start up fresh images much faster.

@ahknight

There's a larger startup script for the application in question, alas. For now, we're doing non-DB work first, using nc -w 1 in a loop to wait for the DB, then doing DB actions. It works, but it makes me feel dirty(er).

@dnephin
Member
dnephin commented Oct 23, 2014

I've had a lot of success doing this work during the fig build phase. I have one example of this with a django project (still a work in progress, though): https://github.com/dnephin/readthedocs.org/blob/fig-demo/dockerfiles/database/Dockerfile#L21

No need to poll for startup. Although I've done something similar with mysql, where I did have to poll for startup because the mysqld init script wasn't doing it already. This postgres init script seems to be much better.

@arruda
arruda commented Oct 24, 2014

Here is what I was thinking:

Using the idea from docker/docker#7445, could we implement a "wait_for_health_check" attribute in fig?
That way it would be a fig issue, not a Docker issue?

Is there any way of making fig check the TCP status on the linked container? If so, then I think this is the way to go. =)

@docteurklein

@dnephin can you explain a bit more what you're doing in your Dockerfiles to help with this?
Isn't the build phase unable to influence the runtime?

@dnephin
Member
dnephin commented Nov 10, 2014

@docteurklein I can. I fixed the link from above (https://github.com/dnephin/readthedocs.org/blob/fig-demo/dockerfiles/database/Dockerfile#L21)

The idea is that you do all the slower "setup" operations during the build, so you don't have to wait for anything during container startup. In the case of a database or search index, you would:

  1. start the service
  2. create the users, databases, tables, and fixture data
  3. shutdown the service

all as a single build step. Later when you fig up the database container it's ready to go basically immediately, and you also get to take advantage of the docker build cache for these slower operations.
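
To make that concrete, here is a rough sketch of the kind of build-time initialisation script a Dockerfile RUN step could invoke (file names and the postgres specifics are assumptions, not taken from the linked Dockerfile):

#!/bin/sh
# Run once at image build time so the initialised data directory is baked into a cached layer.
set -e
su postgres -c 'pg_ctl -D "$PGDATA" -w start'        # -w blocks until the server accepts connections
su postgres -c 'psql -f /build/schema.sql'
su postgres -c 'psql -f /build/fixtures.sql'
su postgres -c 'pg_ctl -D "$PGDATA" -m fast -w stop'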

@docteurklein

nice! thanks :)

@arruda
arruda commented Nov 11, 2014

@dnephin nice, hadn't thought of that.

@oskarhane

+1 This is definitely needed.
An ugly time delay hack would be enough in most cases, but a real solution would be welcome.

@dnephin
Member
dnephin commented Dec 5, 2014

Could you give an example of why/when it's needed?

@dacort
dacort commented Dec 5, 2014

In the use case I have, I have an Elasticsearch server and then an application server that's connecting to Elasticsearch. Elasticsearch takes a few seconds to spin up, so I can't simply do a fig up -d because the application server will fail immediately when connecting to the Elasticsearch server.

@ddossot
ddossot commented Dec 5, 2014

Say one container starts MySQL and the other starts an app that needs MySQL and it turns out the other app starts faster. We have transient fig up failures because of that.

@oskarhane

crane has a way around this by letting you create groups that can be started individually. So you can start the MySQL group, wait 5 secs, and then start the other stuff that depends on it.
It works on a small scale, but it's not a real solution.

@arruda
arruda commented Dec 6, 2014

@oskarhane I'm not sure this "wait 5 secs" helps; in some cases you might need to wait longer (or you just can't be sure it won't go over the 5 secs)... it isn't very safe to rely on a fixed delay.
Also, you would have to do this waiting and loading of the other group manually, and that's kind of lame; fig should do that for you =/

@aanand
Contributor
aanand commented Dec 6, 2014

@oskarhane, @dacort, @ddossot: Keep in mind that, in the real world, things crash and restart, network connections come and go, etc. Whether or not Fig introduces a convenience for waiting on a TCP socket, your containers should be resilient to connection failures. That way they'll work properly everywhere.

@ddossot
ddossot commented Dec 6, 2014

You are right, but until we fix all pre-existing apps to do things like gracefully recovering from the absence of their critical resources (like the DB) on start (which is a Great Thing™ but unfortunately seldom supported by frameworks), we should use fig start to start individual containers in a certain order, with delays, instead of fig up.

I can see a shell script coming to control fig to control docker 😉

@anentropic

I am OK with this not being built into fig, but some advice on best practices for waiting on readiness would be good.

I saw in some code linked from an earlier comment this was done:

while ! exec 6<>/dev/tcp/${MONGO_1_PORT_27017_TCP_ADDR}/${MONGO_1_PORT_27017_TCP_PORT}; do
    echo "$(date) - still trying to connect to mongo at ${TESTING_MONGO_URL}"
    sleep 1
done

In my case there is no /dev/tcp path though; maybe it's a different Linux distro(?) - I'm on Ubuntu.

I found instead this method which seems to work ok:

until nc -z postgres 5432; do
    echo "$(date) - waiting for postgres..."
    sleep 1
done

This seems to work, but I don't know enough about such things to know if it's robust... does anyone know if there's a possible race condition between the port showing up to nc and the postgres server really being able to accept commands?

I'd be happier if it was possible to invert the check - instead of polling from the dependent containers, is it possible instead to send a signal from the target (ie postgres server) container to all the dependents?

Maybe it's a silly idea, anyone have any thoughts?

@aanand
Contributor
aanand commented Dec 29, 2014

@anentropic Docker links are one-way, so polling from the downstream container is currently the only way to do it.

does anyone know if there's a possible race condition between the port showing up to nc and the postgres server really being able to accept commands?

There's no way to know in the general case - it might be true for postgres, it might be false for other services - which is another argument for not doing it in Fig.

@mindnuts
mindnuts commented Jan 8, 2015

@aanand I tried using your docker/wait image approach, but I am not sure what is happening. Basically I have this "orientdb" container which a lot of other NodeJS app containers link to. This orientdb container takes some time to start listening on the TCP port, and this makes the other containers get a "Connection refused" error.

I hoped that by linking the wait container to orientdb I would not see this error, but unfortunately I am still getting it randomly. Here is my setup (Docker version 1.4.1, fig 1.0.1 on an Ubuntu 14.04 box):

orientdb:
    build: ./Docker/orientdb
    ports:
        -   "2424:2424"
        -   "2480:2480"
wait:
    build: ./Docker/wait
    links:
        - orientdb:orientdb
....
core:
    build:  ./Docker/core
    ports:
        -   "3000:3000"
    links:
        -   orientdb:orientdb
        -   nsqd:nsqd

Any help is appreciated. Thanks.

@aanand
Contributor
aanand commented Jan 8, 2015

@mindnuts the wait image is more of a demonstration; it's not suitable for use in a fig.yml. You should use the same technique (repeated polling) in your core container to wait for the orientdb container to start before kicking off the main process.

@MrMMorris

+1, just started running into this as I am pulling custom-built images instead of building them in the fig.yml. Node app failing because mongodb is not ready yet...

@kennu
kennu commented Jan 17, 2015

I just spent hours debugging why MySQL was reachable when starting WordPress manually with Docker, and why it was offline when starting with Fig. Only now did I realize that Fig always restarts the MySQL container whenever I start the application, so the WordPress entrypoint.sh dies because it can't yet connect to MySQL.

I added my own overridden entrypoint.sh that waits for 5 seconds before executing the real entrypoint.sh. But clearly this is a use case that needs a general solution, if it's supposed to be easy to launch a MySQL+WordPress container combination with Docker/Fig.

@dnephin
Member
dnephin commented Jan 18, 2015

so the WordPress entrypoint.sh dies because it can't yet connect to MySQL.

I think this is an issue with the WordPress container.

While I was initially a fan of this idea, after reading docker/docker#7445 (comment), I think such a feature would be the wrong approach, and actually encourages bad practices.

There seem to be two cases which this issue aims to address:

A dependency service needs to be available to perform some initialization.

Any container initialization should really be done during build. That way it is cached, and the work doesn't need to be repeated by every user of the image.

A dependency service needs to be available so that a connection can be opened

The application should really be resilient to connection failures and retry the connection.

@kennu
kennu commented Jan 19, 2015

I suppose the root of the problem is that there are no ground rules as to whose responsibility it is to wait for services to become ready. But even if there were, I think it's a bit unrealistic to expect that developers would add database connection retrying to every single initialization script. Such scripts are often needed to prepare empty data volumes that have just been mounted (e.g. create the database).

The problem would actually be much less obtrusive if Fig didn't always restart linked containers (i.e. the database server) when restarting the application container. I don't really know why it does that.

@aanand
Contributor
aanand commented Jan 19, 2015

The problem would actually be much less obtrusive if Fig didn't always restart linked containers (i.e. the database server) when restarting the application container. I don't really know why it does that.

Actually it doesn't just restart containers, it destroys and recreates them, because it's the simplest way to make sure changes to fig.yml are picked up. We should eventually implement a smarter solution that can compare "current config" with "desired config" and only recreate what has changed.

Getting back to the original issue, I really don't think it's unrealistic to expect containers to have connection retry logic - it's fundamental to designing a distributed system that works. If different scripts need to share it, it should be factored out into an executable (or language-specific module if you're not using shell), so each script can just invoke waitfor db at the top.
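
A minimal sketch of such a shared waitfor executable (the name, arguments, and nc-based check are illustrative, not a prescribed implementation):

#!/bin/sh
# waitfor: block until host:port accepts TCP connections, or give up after a timeout.
# Usage: waitfor HOST PORT [TIMEOUT_SECONDS]
host="$1"; port="$2"; timeout="${3:-60}"
until nc -z "$host" "$port" 2>/dev/null; do
    timeout=$((timeout - 1))
    if [ "$timeout" -le 0 ]; then
        echo "waitfor: timed out waiting for $host:$port" >&2
        exit 1
    fi
    sleep 1
done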

@docteurklein

@kennu what about --no-recreate ? /cc @aanand

@kennu
kennu commented Jan 19, 2015

@aanand I meant the unrealism comment from the point of view that the Docker Hub is already full of published images that probably don't handle connection retrying in their initialization scripts, and that it would be quite an undertaking to get everybody to add it. But I guess it could be done if Docker Inc published some kind of official guidelines/requirements.

Personally I'd rather keep containers/images simple though and let the underlying system worry about resolving dependencies. In fact, Docker's restart policy might already solve everything (if the application container fails to connect to the database, it will restart and try again until the database is available).

But relying on the restart policy means that it should be enabled by default, or otherwise people spend hours debugging the problem (like I just did). E.g. Kubernetes defaults to RestartPolicyAlways for pods.
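
For illustration, the restart-policy workaround looks roughly like this with plain docker run (image names are placeholders; whether your Fig version exposes an equivalent option is a separate question):

docker run -d --name db mysql:5.6
# If the app exits because the database isn't accepting connections yet,
# Docker simply starts it again until it eventually succeeds.
docker run -d --name app --link db:db --restart=always example/app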

@MrMMorris

any progress on this? I would like to echo that expecting all docker images to change and the entire community to implement connection-retry practices is not reasonable. Fig is a Docker orchestration tool and the problem lies in the order it does things, so the change needs to be made in Fig, not Docker or the community.

@dnephin
Member
dnephin commented Jan 24, 2015

expecting all docker images to change and the entire community to implement connection-retry practices is not reasonable

It's not that an application should need to retry because of docker or fig. Applications should be resilient to dropped connections because the network is not reliable. Any application should already be built this way.

I personally haven't had to implement retries in any of my containers, and I also haven't needed any delay or waiting on startup. I believe most cases of this problem fall into these two categories (my use of "retry" is probably not great here, I meant more that it would re-establish a connection if the connection was closed, not necessarily poll for some period attempting multiple times).

If you make sure that all initialization happens during the "build" phase, and that connections are re-established on the next request you won't need to retry (or wait on other containers to start). If connections are opened lazily (when the first request is made), instead of eagerly (during startup), I suspect you won't need to retry at all.

the problem lies in the order [fig] does things

I don't see any mention of that in this discussion so far. Fig orders startup based on the links specified in the config, so it should always start containers in the right order. Can you provide a test case where the order is incorrect?

@thaJeztah
Member

I have to agree with @dnephin here. Sure, it would be convenient if compose/fig was able to do some magic and check availability of services, however, what would the expected behavior be if a service doesn't respond? That really depends on the requirements of your application/stack. In some cases, the entire stack should be destroyed and replaced with a new one, in other cases a failover stack should be used. Many other scenarios can be thought of.

Compose/Fig cannot make these decisions, and monitoring services should be the responsibility of the applications running inside the container.

@kennu
kennu commented Jan 25, 2015

I would like to suggest that @dnephin has merely been lucky. If you fork two processes in parallel, one of which will connect to a port that the other will listen to, you are essentially introducing a race condition; a lottery to see which process happens to initialize faster.

I would also like to repeat the WordPress initialization example: It runs a startup shell script that creates a new database if the MySQL container doesn't yet have it (this can't be done when building the Docker image, since it's dependent on the externally mounted data volume). Such a script becomes significantly more complex if it has to distinguish generic database errors from "database is not yet ready" errors and implement some sane retry logic within the shell script. I consider it highly likely that the author of the image will never actually test the startup script against the said race condition.

Still, Docker's built-in restart policy provides a workaround for this, if you're ready to accept that containers sporadically fail to start and regularly print errors in logs. (And if you remember to turn it on.)

Personally, I would make Things Just Work, by making Fig autodetect which container ports are exposed to a linked container, ping them before starting the linked container (with a sane timeout), and ultimately provide a configuration setting to override/disable this functionality.

@thaJeztah
Member

this can't be done when building the Docker image, since it's dependent on the externally mounted data volume

True. An approach here is to start just the database container once (if needed, with a different entrypoint/command), to initialise the database, or use a data-only container for the database, created from the same image as the database container itself.

Such a script becomes significantly more complex if it has to distinguish generic database errors from "database is not yet ready" errors

Compose/Fig will run into the same issue there; How to check if MySQL is up, and accepting connections? (and PostgreSQL, and (insert your service here)). Also, where should the "ping" be executed from? Inside the container you're starting, from the host?

As far as I can tell, the official WordPress image includes a check to see if MySQL is accepting connections in the docker-entrypoint.sh

@kennu
kennu commented Jan 25, 2015

@thaJeztah "Add some simple retry logic in PHP for MySQL connection errors" authored by tianon 2 days ago - Nice. :-) Who knows, maybe this will become a standard approach after all, but I still have my doubts, especially about this kind of retry implementations actually having being tested by all image authors.

About the port pinging - I can't say offhand what the optimal implementation would be. I guess maybe simple connection checking from a temporary linked container and retrying while getting ECONNREFUSED. Whatever solves 80% (or possibly 99%) of the problems, so users don't have to solve them by themselves again and again every time.

@thaJeztah
Member

@kennu Ah! Thanks, wasn't aware it was just added recently, just checked the script now because of this discussion.

To be clear, I understand the problems you're having, but I'm not sure Compose/Fig would be able to solve them in a clean way that works for everyone (and reliably). I understand many images on the registry don't have "safeguards" in place to handle these issues, but I doubt it's Compose/Fig's responsibility to fix that.

@thaJeztah
Member

Having said the above; I do think it would be a good thing to document this in the Dockerfile best practices section.

People should be made aware of this and some examples should be added to illustrate how to handle service "outage". Including a link to the Wikipedia article that @dnephin mentioned (and possibly other sources) for reference.

@soupdiver

I ran into the same problem and like this idea from @kennu

Personally, I would make Things Just Work, by making Fig autodetect which container ports are exposed to a linked container, ping them before starting the linked container (with a sane timeout), and ultimately provide a configuration setting to override/disable this functionality.

I think this would solve a lot of typical use cases, like mine when depending on the official mongodb container.

@MrMMorris

I agree with @soupdiver. I am also having trouble in conjunction with a mongo container, and although I have it working with a start.sh script, the script is not very dynamic and adds another file I need to keep in my repo (I would like to just have a Dockerfile and docker-compose.yml in my node repo). It would be nice if there were some way to just Make It Work, but I think something simple like a wait timer won't cut it in most cases.

@schmunk42
Contributor

IMO pinging is not enough, because the basic network connection may be available while the service itself is still not ready.
This is the case with the MySQL image, for example; using curl or telnet for the connection check on the exposed ports would be safer, although I don't know whether even that would be enough. Most containers don't have these tools installed by default, though.

Could docker or fig handle these checks?

@thaJeztah
Member

Could docker or fig handle these checks?

In short: no. For various reasons;

  • Performing a "ping" from within a container would mean running a second process. Fig/Compose cannot automatically start such a process, and I don't think you'd want Fig/Compose to modify your container by installing software (such as curl or telnet) in it.
  • (As I mentioned in a previous comment), each service requires a different way to check whether it is accepting connections / ready for use. Some services may need credentials or certificates to establish a connection. Fig/Compose cannot automatically invent how to do that.
@schmunk42
Contributor

and I don't think you'd want Fig/Compose to modify your container by installing software (such as curl or telnet) in it.

No, for sure not.

Fig/Compose cannot automatically invent how to do that.

Not invent. I was thinking more about an instruction for fig or docker on how to check it, e.g.:

web:
    image: nginx
    link: db
db:
    is_available: "curl DB_TCP_ADDR:DB_TCP_PORT"

The telnet command would be executed on the Docker host, not in the container.
But I am just thinking out loud; I know that this is not the perfect solution. But the current way of using custom check scripts for the containers could be improved.

@thaJeztah
Member

The telnet command would be executed on the Docker host, not in the container.

Then curl or <name a tool that's needed> would have to be installed on the host. This could even have huge security issues (e.g. someone wants to be funny and uses is_available: "rm -rf /"). Apart from that, being able to access the database from the host is no guarantee that it's also accessible from inside the container.

But I am just thinking loud, ...

I know, and I appreciate it. I just don't think there's a reliable way to automate this that would serve most use cases. In many cases you'd end up with something complex (take, for example, the curl example: how long should it try to connect? Retry?). Such complexity is better moved inside the container, which would also be useful if the container were started with plain Docker, not Fig/Compose.

@schmunk42
Contributor

@thaJeztah I totally agree with you. And it's very likely that there will be no 100% solution.

@silarsis
silarsis commented Feb 9, 2015

I'm going to repeat a suggestion I made earlier: it would be sufficient for me if I could state in the fig.yml "wait for this container to exit before running this other container".

This would allow me to craft a container that knows how to wait for all its dependencies - check ports, initialise databases, whatever - and would require fig to know as little as possible.

I would see it configured as something like:

app:
  links:
    - db:db
  prereqs:
    - runthisfirst

runthisfirst:
  links:
    - db:db

runthisfirst has a link that means the database starts up so it can check access. app will only run once runthisfirst has exited (bonus points if runthisfirst has to exit successfully).

Is this feasible as an answer?


@jgeiger
jgeiger commented Feb 27, 2015

I've just tried migrating my shell script launchers and ran into this issue. It would be nice even just to have a simple sleep/wait key that sleeps for that number of seconds before launching the next container.

db:
  image: tutum/mysql:5.6
  sleep: 10
app:
  link:
    - db:db
@prologic

I really don't like this for a number of reasons.

a) I think it's the wrong place for this
b) How long do you sleep for?
c) What if the timeout is not long enough?

Aside from the obvious issues, I really don't think infrastructure should care about what the application is, and vice versa. IMHO the app should be written to be more tolerant and/or smarter about its own requirements.

That being said, existing and legacy applications will need something -- but it should probably be more along the lines of:

a docker-compose.yml:

db:
  image: tutum/mysql:5.6
app:
  wait: db
  link:
    - db:db

Where wait waits for the "exposed" services on db to become available.

The problem is: how do you determine that?

In the simplest cases, you wait until you can successfully open a TCP or UDP connection to the exposed services.

@mattwallington

This might be overkill for this problem but what would be a nice solution is if docker provided an event triggering system where you could initiate a trigger from one container that resulted in some sort of callback in another container. In the case of waiting on importing data into a MySQL database before starting another service, just monitoring whether the port was available isn't enough.

Having an entrypoint script set an alert to Docker from inside the container (setting a pre-defined environment variable, for example) that triggered an event in another container (perhaps setting the same synchronized environment variable) would enable scripts on both sides to know when certain tasks are complete.

Of course we could set up our own socket server or other means but that's tedious to solve a container orchestration issue.

@n3llyb0y

@aanand I almost have something working using your wait approach as the starting point. However, there is something else happening between docker-compose run and docker run where the former appears to hang whilst the latter works a charm.

example docker-compose.yml:

db:
  image: postgres
  ports:
    - "5432"
es:
  image: dockerfile/elasticsearch
  ports:
    - "9200"
wait:
  image: n3llyb0y/wait
  environment:
    PORTS: "5432 9200"
  links:
    - es
    - db

then using...

docker-compose run wait

However, this is not to be. The linked services start and it looks like we are about to wait, only for it to choke (at least within my VirtualBox env; I get to the nc loop and we get a single dot, then... nothing).

However, with the linked services running I can use this method (which is essentially what I have been doing for our CI builds)

docker run -e PORTS="5432 9200" --links service_db_1:wait1 --links service_es_1:wait2 n3llyb0y/wait

It feels like docker-compose run should work in the same way. The difference is that when using docker-compose run with the detach flag -d you get no wait benefit, as the wait container goes to the background, and I think (at this moment in time) that not using the flag causes the wait to choke on the other non-backgrounded services. I am going to take a closer look.

@n3llyb0y

After a bit of trial and error it seems the above approach does work! It's just that the busybox base doesn't have a netcat util that works very well. My modified version of @aanand's wait utility does work against docker-compose 1.1.0 when using docker-compose run <util label> instead of docker-compose up. Example usage is in the link.

Not sure if it can handle chaining situations as per the original question though. Probably not.

Let me know what you think.

@adrianhurt

This is a very interesting issue. I think it would be really interesting to have a way for one container to wait until another one is ready. But as everybody says, what does "ready" mean? In my case I have a container for MySQL, another one that manages its backups and is also in charge of importing an initial database, and then the containers for each app that needs the database. It's obvious that waiting for the ports to be exposed is not enough. First the mysql container must be started, and then the rest should wait until the mysql service is ready to use, not before. To get that, I needed to implement a simple script to be executed on reboot that uses the docker exec functionality. Basically, the pseudo-code would be like:

run mysql
waitUntil "docker exec -t mysql mysql -u root -prootpass database -e \"show tables\""
run mysql-backup
waitUntil "docker exec -t mysql mysql -u root -prootpass database -e \"describe my_table\""
run web1
waitUntil "dexec web1 curl localhost:9000 | grep '<h1>Home</h1>'"
run web2
waitUntil "dexec web2 curl localhost:9000 | grep '<h1>Home</h1>'"
run nginx

Where the waitUntil function has a loop with a timeout that evals the docker exec … command and checks whether the exit code is 0.

With that I ensure that every container waits until its dependencies are ready to use.
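
A possible shape for that waitUntil helper, as a hedged shell sketch (illustrative only, not the actual script used above):

waitUntil() {
    # Retry the given command every second until it exits 0, or give up after 60 attempts.
    attempts=60
    until eval "$1" > /dev/null 2>&1; do
        attempts=$((attempts - 1))
        if [ "$attempts" -le 0 ]; then
            echo "waitUntil: giving up on: $1" >&2
            return 1
        fi
        sleep 1
    done
}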

So I think it could be an option to integrate into the compose utility. Maybe something like this, where wait_until declares a list of other dependencies (containers) and waits for each one until it responds OK to the corresponding command (or maybe with an optional pattern or regex to check whether the result matches something you expect, even though using the grep command could be enough).

mysql:
  image: mysql
  ...
mysql-backup:
  links:
   - mysql
  wait_until:
   - mysql: mysql -u root -prootpass database -e "show tables"
  ...
web1:
  links:
   - mysql
  wait_until:
   - mysql: mysql -u root -prootpass database -e "describe my_table"
  ...
web2:
  links:
   - mysql
  wait_until:
   - mysql: mysql -u root -prootpass database -e "describe my_table"
  ...
nginx:
  links:
   - web1
   - web2
  wait_until:
   - web1: curl localhost:9000 | grep '<h1>Home</h1>'
   - web2: curl localhost:9000 | grep '<h1>Home</h1>'
  ...
@robsonpeixoto

Why not a simple wait for the port, like this?
http://docs.azk.io/en/azkfilejs/wait.html#

@mattwallington

@robsonpeixoto: Waiting for the port isn't sufficient for a lot of use cases. For example, let's say you are seeding a database with data on creation and don't want the web server to start and connect to it until the data operation has completed. The port will be open the whole time so that wouldn't block the web server from starting.

@mattwallington

Something like AWS CloudFormation's WaitCondition would be nice. http://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/using-cfn-waitcondition.html

@deanpcmad

+1 I'm having the same issue when using Docker for testing my Rails apps which depend on MySQL

@AwokeKnowing

+1 I have this issue too. I like @adrianhurt's idea, where you actually supply the condition to be evaluated to determine whether the wait is complete. That way you still have a nice declarative yml, and you don't have to have an arbitrary definition of "ready".

@rkettelerij

+1

@anentropic

I've had this tab open for a while: http://crosbymichael.com/docker-events.html ...seems relevant

@tuscland
tuscland commented May 4, 2015

+1

@kkamkou
kkamkou commented May 4, 2015

+1 for simple timeout

@ryneeverett

+1 for a ready condition


@fdellavedova

+1

@schmunk42
Contributor

I have been solving this very reliably at the application level for a while now, as was recommended in this thread.

Just to give you an idea how this can be implemented for MySQL + PHP here's my code.

From igorw/retry :)

Since the network is reliable, things should always work. Am I right? For those cases when they don't, there is retry.

@rfink
rfink commented May 20, 2015

+1

@aanand
Contributor
aanand commented May 20, 2015

@schmunk42 Nice stuff - I like that it's a good example of both establishing the connection and performing an idempotent database setup operation.

@thaJeztah
Member

Might be good to create a (some) basic example(s) for inclusion in the docs, for different cases, e.g. NodeJS, Ruby, PHP.

@yeasy
yeasy commented May 21, 2015

+1, it should at least provide some option to add a delay before the container is considered successfully started.

@Silex
Silex commented Jun 3, 2015

+1

@robsonpeixoto

How do you solve this when the services you're trying to connect to aren't your own code?
For example, if I have a service Service and the database InfluxDB: Service requires InfluxDB, and InfluxDB has a slow startup.

How can docker-compose wait for InfluxDB to be ready?

If the code is mine, I can solve it by adding a retry. But for a third-party app I can't change the code.

@artem-sidorenko

@robsonpeixoto there are some examples in this ticket with netcat or similar approaches. You can take a look at my MySQL example in another ticket: docker/docker#7445 (comment)

@adrianhurt

That's the reason I think each container should have the optional ability to indicate its own readiness. For a DB, for example, I want to wait until the service is completely ready, not just until the process is created. I solve this with customized checks using docker exec, checking whether it can resolve a simple query, for example.

An optional flag for docker run to indicate an internal check command would be great, so you could later link to it from another container using a special flag for the link.

Something like:

$ sudo docker run -d --name db training/postgres --readiness-check /bin/sh -c "is_ready.sh"
$ sudo docker run -d -P --name web --link db:db --wait-for-readiness db training/webapp python app.py

Where is_ready.sh is a simple boolean test in charge of deciding when the container is considered ready.
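
For the postgres example above, a hypothetical is_ready.sh could be as small as the following (an assumption; the proposal deliberately leaves the check up to the image author):

#!/bin/sh
# Exit 0 only once the server answers a trivial query, i.e. is actually usable.
psql -U postgres -c 'SELECT 1;' > /dev/null 2>&1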

@ringanta

+1

@zheli
zheli commented Jun 26, 2015

@schmunk42 nice quote!

@realulim

To me it's not a good idea to hard-code an arbitrary collection of "availability checks". There are numerous situations that are specific to one kind of deployment, and you can never cover them all. Just as an example, in my multi-container app I need to wait for a certain log message to appear in a certain log file - only then will the container service be ready.
Instead, what's needed is an SPI that I can implement. If Docker provides some example implementations for the most frequent use cases (e.g. TCP connect), that's fine. But there needs to be a way for me to plug in my own functionality and have Docker call it.
Docker Compose is pretty much useless to me as a whole product if I can't get my containers up and running dependably. So a stable and uniform "container service readiness SPI" is needed. And "ready" should not be a boolean, as there are possibly more levels of readiness (such as "now you can read" and "now you can write").

@gittycat

@realulim Good writeup. I fully agree with the idea of letting us define what a service's "ready" state means via plugins. I also think it's a good idea to have a default plugin that only checks that a service is listening on an HTTP/TCP connection. That would cover the majority of cases right there.

@kulbida
kulbida commented Feb 27, 2016

This is what I came up with, in entrypoint file;

until netcat -z -w 2 database 5432; do sleep 1; done
# do the job here, database host on port 5432 accepts connections
@pgporada

@kulbida ,
I do something very similar with MySQL. "database" in this case is a link in a compose file.

if [[ "$APP_ENV" == "local" ]]; then
    while ! mysqladmin ping -h database --silent; do
        sleep 1
    done
    # Load in the schema or whatever else is needed here.
fi
@mglasgow42

There have been some comments in this thread which claim that startup ordering is only a subset of application level error recovery, which your application should be handling anyway. I would like to offer up one example to illustrate where this might not always be the case. Consider if some services depend on a clustered database, and whenever a quorum is lost due to a crash etc, you do not want to automatically retry from the app. This could be the case for example if database recovery requires some manual steps, and you need services to remain unambiguously down until those steps are performed.

Now the app's error handling logic may be quite different from the startup logic:

  • If the db is down because we're just starting up, wait for it to become available.
  • If the db is down because it crashed, log a critical error and die.

It may not be the most common scenario, but you do see this pattern occasionally. In this case, clustering is used to solve the "network is unreliable" problem in the general case, which changes some of the expectations around which error conditions should be retried in the app. Cluster crashes can be rare enough, and automatically restarting them can be risky enough, that manually restarting services is preferred to retrying in the application. I suspect there are other scenarios as well which might challenge assumptions around when to retry.

More generally, I'm claiming that startup ordering and error handling are not always equivalent, and that it's appropriate for a framework to provide (optional) features to manage startup order. I do wonder if this belongs in docker-engine, though, rather than compose. It could be needed anytime docker starts up, regardless of whether compose is used.

@dnephin
Member
dnephin commented Mar 14, 2016

There is a discussion starting on the docker engine repo in proposal docker/docker#21142 to add support for health checking. Once this support is available it will be possible for Compose to provide a way to configure it, and use it for a delayed start up.


@konobi
konobi commented Mar 25, 2016

How about using the filesystem to check for the existence of a file?

ready_on: /tmp/this_container_is_up_and_ready

That way it's up to the container developer to decide when things are UP, but compose can wait until the container declares itself ready. It's an explicit convention, but it could easily be added as an additional layer to images that don't have that behaviour.
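
As a concrete illustration of that convention (hypothetical, since ready_on is only a proposal), a MySQL-flavoured entrypoint might look something like this:

#!/bin/sh
# Start the real service in the background, wait until it is genuinely usable,
# then touch the agreed-upon ready file so an external watcher can proceed.
mysqld_safe &
until mysqladmin ping --silent; do sleep 1; done
touch /tmp/this_container_is_up_and_ready
wait    # keep the container alive for as long as mysqld runs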


@alexch
alexch commented Apr 13, 2016

Built-in support for health checks will be good; in the meantime here's the hack I got working in my local docker-compose setup:

    nginx:
        image: nginx:latest
        command: /bin/bash -c "sleep 2 && echo starting && nginx -g 'daemon off;'"
        ...

(In production, my app proxies to a few already-running upstream servers using proxy_pass; in local dev and test, I start docker instances of these, and nginx needs to wait a bit for them to start, else it crashes and dies. The daemon off thing keeps nginx in a single process, else docker will stop the container as soon as the parent process spawns its daemon child.)


@1ma
1ma commented Apr 20, 2016 edited

Just to add my two cents: if you happen to be using the ANT build tool, it comes with built-in support for delaying execution until a certain socket is open.

Our Jenkins CI server spins up the project containers with Docker Compose and then runs ANT from within the main container, like this:

docker-compose up -d
docker exec -it projectx-fpm-jenkins ant -f /var/www/projectX/build.xml

This is the relevant piece of configuration from the docker-compose.yml file. Note that, as discussed above, making fpm depend on mysql is not enough to guarantee that the MySQL service will be ready when it is actually needed.

version: '2'
services:
  nginx:
    build: ./docker/nginx
    depends_on:
      - fpm
  fpm:
    build: ./docker/fpm
    depends_on:
      - mysql
  mysql:
    image: mysql:5.7
    environment:
      - MYSQL_ROOT_PASSWORD=projectx
      - MYSQL_DATABASE=projectx

But you can wait for it during the ANT task:

<!-- other targets... -->

<target name="setup db">
    <!-- wait until the 3306 TCP port in the "mysql" host is open -->
    <waitfor>
        <socket server="mysql" port="3306"/>
    </waitfor>

    <exec executable="php">
        <arg value="${consoledir}/console"/>
        <arg value="doctrine:database:create"/>
        <arg value="--no-interaction"/>
    </exec>
</target>
@skorokithakis

@kulbida That did the trick, thanks. Something a bit faster:

while ! nc -w 1 -z db 5432; do sleep 0.1; done
@syamsathyan
syamsathyan commented May 5, 2016 edited

depends_on might solve the issue.
From docker-compose documentation.
Express dependency between services, which has two effects:

  1. docker-compose up will start services in dependency order. In the following example, db and redis will be started before web.
  2. docker-compose up SERVICE will automatically include SERVICE's dependencies. In the following example, docker-compose up web will also create and start db and redis.

version: '2'
services:
  web:
    build: .
    depends_on:
      - db
      - redis
  redis:
    image: redis
  db:
    image: postgres

@alexch: in a performance test at a customer site (a micro-service routed via nginx+), the dockerized nginx test showed a dip in load from very high to near zero, repeating every 1-2 minutes. We finally decided to go with non-dockerized nginx running as a VM (just because of the huge performance difference); maybe it's a network driver plugin / libNetwork issue.

@nottrobin

@syamsathyan depends_on doesn't appear to help.

@nottrobin

@skorokithakis, @kulbida this is a nice solution. Unfortunately, netcat isn't available by default in any of the services that I need to connect to my database (including postgres). Do you know of any alternative method?

@skorokithakis

@nottrobin I'm afraid not, I just installed it in my image :/

@syamsathyan

@nottrobin my team is working on this, will let you know in a day or two!

@typekpb
typekpb commented Jun 9, 2016 edited

For those with a recent bash, there is a netcat-free solution (inspired by http://stackoverflow.com/a/19866239/1581069):

while ! timeout 1 bash -c 'cat < /dev/null > /dev/tcp/db/5432'; do sleep 0.1; done

or less verbose version:

while ! timeout 1 bash -c 'cat < /dev/null > /dev/tcp/db/5432' >/dev/null 2>/dev/null; do sleep 0.1; done
@nottrobin

@typekpb that works perfectly. Thanks!

@CpuID
CpuID commented Jun 9, 2016

Now that HEALTHCHECK support is merged upstream as per docker/docker#23218 - this can be considered to determine when a container is healthy prior to starting the next in the order. Half of the puzzle solved :)

@Soullivaneuh

Now that HEALTHCHECK support is merged upstream as per docker/docker#23218 - this can be considered to determine when a container is healthy prior to starting the next in the order. Half of the puzzle solved :)

Looks good. How to implement it on docker-compose.yml?

@CpuID
CpuID commented Jun 10, 2016

Looks good. How to implement it on docker-compose.yml?

The other piece of the puzzle will be having docker-compose watch for healthy containers, and use something like the depends_on syntax mentioned further up in this issue. Will require patches to docker-compose to get things working.

Also note that the health check feature in Docker is currently unreleased, so will probably need to align with a Docker/Docker Compose release cycle.
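
Once 1.12 is out, the health-check options can also be attached to a single container from the CLI; a sketch (the check command and intervals are arbitrary examples):

docker run -d --name db \
    --health-cmd='mysqladmin ping --silent' \
    --health-interval=2s \
    --health-timeout=2s \
    --health-retries=30 \
    mysql:5.7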

@DmitryEfimenko

I wrote a js library that has a method .waitForPort(). Just as mentioned before, this might not work for all situations, but could do just fine for the majority of use cases.
See my blog.

@aanand
Contributor
aanand commented Jun 13, 2016

The HEALTHCHECK merge is great news.

In the meantime, this document describes the problem and some solutions.

@1ma
1ma commented Jun 28, 2016 edited

@pablofmorales Nope, because depends_on just checks that the container is up.

Some daemons need some extra time to bootstrap themselves and start listening on their assigned ports and addresses, most notably MySQL.

@konobi
konobi commented Jun 29, 2016

I still think a "READY_ON" declaration is the best option overall. It leaves the decision about when something is ready to the container itself, regardless of image; it's explicitly opted into, and the resource-path (within container) functionality in the Docker Remote API ensures minimal changes are needed.

The behaviour of when a container is "up" is the only effect this should have. It would only report as "up" when the READY_ON file exists.

I think this is 90% of the behaviour that everyone's been discussing. I think "healthcheck" here is conflating two different events and trying to cram them into one. One is "ready", for the chain of events when spinning up infrastructure; the other is "health", so that infrastructure can be kept up.

"ready" is totally an appropriate place for docker to be helping out. As for "health", it's so varied across systems that I think it's up to the container to deal with that.

For a better alternative to healthcheck, you might want to look at something like containerpilot, which covers not just health, but service discovery and monitoring too. https://github.com/joyent/containerpilot

@skorokithakis

Yes, this is an accurate and important distinction. However, how will containers write that file without images becoming significantly more complicated? It seems to me that it would require a wrapper script for every single container that wants to use this.

@konobi
konobi commented Jun 29, 2016

Well, you'd have to kick off a script to initialize the instance anyway... the last thing that script needs to do is touch a file. To me, that seems much easier than attempting to run an exec on a remote machine to do a health check. At least with a touch file, it can be watched, etc., entirely and passively via the API, without needing to enter the context of the container.

@skorokithakis

I agree, but many containers don't use a script, they just install a service like Postgres or Redis and let it start up without watching it.

@pablofmorales

In my case, I'm using the Kong API Gateway.

Before running the kong container, I just check whether Cassandra is working with this script:

while true; do
    CHECK=`kong-database/check`
    if [[ $CHECK =~ "system.dateof" ]]; then
        break
    fi
    sleep 1;
done

The check file contains this:

#!/bin/bash
docker cp cassandra-checker kong-database:/root/
docker exec -i kong-database cqlsh -f /root/cassandra-checker

cassandra-checker is just a simple query

SELECT dateof(now()) FROM system.local ;
@konobi
konobi commented Jun 29, 2016

Sure, but the alternative is a healthcheck, which requires a script that you'd have to write anyway, so there's no difference in overhead. It's also an explicit opt-in, which means you're stating that you want this behaviour. As for something that doesn't run a script, you could always have the ready_on path check for a pid file or a unix socket, which wouldn't require a script.

@skorokithakis

That's true, you're right.

@mglasgow42

Checking for the existence of a file may be fine for a lot of cases, but forcing containers to use a startup script when they wouldn't otherwise need one is a nuisance. Why can't there also be checks for other very simple conditions? Especially useful would be waiting until the process is listening on a particular TCP port.

@konobi
konobi commented Jun 30, 2016

This idea is opt-in, so there's no forcing of anything. In fact, you're being explicit in saying what should be expected.

A TCP port listening may not be sufficient to tell when a container has been initialized, as there may be a bunch of setup data that still needs to run. Hell, if you connect to a postgres container too quickly, even over TCP, you'll get an error stating that the db isn't ready yet.

@mglasgow42

If I understand you correctly, it's "opt-in, or else you can't use this feature". Ergo, if I need this feature and my app doesn't use a pid file, I'm forced to use a startup script.

For MySQL (the OP's case), once it's listening, it's ready. They go to a lot of trouble to ensure that's true, probably for cases much like this one. My take is that there is probably a short list of conditions that could be enumerated such that you could "opt-in" configuring a ready check against any of those conditions. I see no reason it has to be done one and only one way.

@konobi
konobi commented Jun 30, 2016

For mysql, once it's listening, it's not necessarily ready. In the simple one-node case it'll be ready, but if you have more than one node, then it certainly won't be ready yet. I understand what you mean by "one and only one way", but I think as a base abstraction it's just perfect. I see it more as a spot where you can apply whatever tooling you want. Heck, your script could even communicate with external services and have them verify the container, in which case your external services could signal your container agent to write the file. Flexibility ftw.

If you attempt anything in this list of "conditions", there will ALWAYS be a case where it doesn't work. However, touching a file will always work, since the image knows when it believes it's ready (oh, I have to wait on other hosts, I need files to be downloaded, I need to make sure $external_service is also available, I spun up properly but for some reason I don't have the correct permissions to the database, why is this image read-only... etc., etc.).

These sorts of scripts already exist all over the place... hell, it's already been necessary to write these scripts because we haven't had functionality like this before. So dropping in a script like this is a minimal change, since it's likely a script already exists.

@konobi
konobi commented Jun 30, 2016

Another likely case is that you'd have something like chef or ansible run against that host and then write the file.

@Joshfindit

If it's a question of a Docker-side check, then something like:

UPCHECK --port=7474 --interval=0.5s --response="Please log in"

For the record I think the file solution has a lot of merit, but it also introduces complexity.
80% of the time, verifying the tcp response would work just fine.

@konobi
konobi commented Jun 30, 2016

well... i suppose:

UPCHECK --file=/tmp/container_is_ready --interval=0.5s --timeout=2m

Is just the same.

@dansteen
dansteen commented Jun 30, 2016 edited

I'm actually working on a re-implementation of docker-compose that adds functionality to wait for specific conditions. It uses libcompose (so I don't have to rebuild the docker interaction) and adds a bunch of config commands for this. Check it out here: https://github.com/dansteen/controlled-compose

Note, that the code is finished, but I'm waiting on a couple of upstream issues to be resolved before this will be able to be really used.

@aelsabbahy

Goss can be used as a fairly flexible shim to delay container startup; I've written a blog post explaining how this can be accomplished with a minor change to your image here:

Kubernetes has the concept of init containers; I wonder if compose/swarm would benefit from a similar concept.

@piotr-s-brainhub

+1

@starx
starx commented Sep 20, 2016 edited

I think it's better to let the service you are exposing in a container decide whether or not it is ready or capable of exposing its service.

For example, a PHP application might depend on a MySQL connection. So in the ENTRYPOINT of the PHP container, I wrote something like this:

#!/bin/bash
cat << EOF > /tmp/wait_for_mysql.php
<?php
\$connected = false;
while(!\$connected) {
    try{
        \$dbh = new pdo( 
            'mysql:host=mysql:3306;dbname=db_name', 'db_user', 'db_pass',
            array(PDO::ATTR_ERRMODE => PDO::ERRMODE_EXCEPTION)
        );
        \$connected = true;
    }
    catch(PDOException \$ex){
        error_log("Could not connect to MySQL");
        error_log(\$ex->getMessage());
        error_log("Waiting for MySQL Connection.");
        sleep(5);
    }
}
EOF
php /tmp/wait_for_mysql.php
# Rest of entry point bootstrapping

This way, I can add any logic to ensure that the dependencies of the service I am exposing (i.e. PHP) have been resolved.

@realulim

Nabin Nepal wrote:

I think it's better to let the service you are exposing on a container decide whether or not it is ready or capable of exposing its service.

You can of course hardcode this behavior into every container that uses your
MySQL container. But if something in your MySQL service changes, then you are
changing all dependent containers, not to mention the repetitive coding
needed in each. This is not DRY; there is no stable contract, and thus it will
lead to brittle systems.

From a software craftsmanship standpoint there should be some kind of
"Container Readiness SPI", which the container developer can implement. On
the other side there should be a "Container Readiness API", which the
services can depend on.

Ulrich

@starx
starx commented Sep 20, 2016 edited

@realulim I agree that any change in the MySQL container has to be replicated or propagated to all affected or linked containers.

However, if the change is about parameters like DB_HOST, DB_NAME, DB_USER and DB_PASSWORD, these could be passed as an ARG (argument) and shared by all related containers. If you are using a docker-compose.yml file, then the change happens in one file.

And I totally agree that an API to check a container's readiness is the real way of solving this, but I still believe that the service being exposed is a better candidate to declare it.

@piotr-s-brainhub

A workaround: until nc -z localhost 27017; do echo Waiting for MongoDB; sleep 1; done
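
Wrapped into an entrypoint, that one-liner becomes something like the sketch below ("mongo" is a placeholder for whatever the service is called in your compose file):

#!/bin/sh
# Entrypoint wrapper: block until MongoDB accepts TCP connections, then start the real process.
until nc -z mongo 27017; do
    echo "Waiting for MongoDB"
    sleep 1
done
exec "$@"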

@starx
starx commented Sep 23, 2016

@piotr-s-brainhub As the comments above mention, having an open port does not mean that the service is ready.

@tartakynov
tartakynov commented Sep 29, 2016 edited

Can we have an optional readiness condition which can be triggered either by logs, port opening, or a time delay? Something like:

ready_when:
  in_logs: `MySQL init process done`
  ports_open:
  - 3306
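
No such key exists today, but the log-based half of that idea can be approximated from the outside with the docker-compose CLI; a rough sketch, assuming mysql and app are service names in the compose file:

# Start the database, wait for its "init done" log line, then start the app.
docker-compose up -d mysql
until docker-compose logs mysql 2>&1 | grep -q "MySQL init process done"; do
    echo "Waiting for MySQL to finish initialising"
    sleep 1
done
docker-compose up -d app
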
@korya
korya commented Nov 2, 2016 edited

I just realized that waiting for dependency containers to become ready can easily be implemented with tools like Ansible. Has anyone used that approach? Can you easily replace docker-compose with Ansible/Chef/Puppet? Is there any project on GitHub demonstrating this approach?

Note: I understand the importance of writing a robust service that can run even when its dependencies are unavailable at the moment. That's not the question.

@djui
djui commented Nov 2, 2016

I solved this nowadays with a tool I wrote: https://github.com/betalo-sweden/await

It can wait until a given list of resources is available and then continue with whatever you want to run next, either by going to the next command implicitly or by calling it explicitly.

@derekmahar

@djui, what does await do while it is waiting for a given resource?

@djui
djui commented Nov 2, 2016

@derekmahar It polls. It has a default timeout of 60 seconds. Every time it can't see the resource, it will just retry in 1s intervals. Currently it doesn't do concurrent resource detection, so it's sequential, but that turned out to be good enough and can be fixed.

I use it in the following scenario:

I spin up a docker-compose infrastructure and then run an integration test driver. The driver service gets started only after all components in the infrastructure are available, using await; so await eventually calls the driver's run command.

@mixja
mixja commented Nov 25, 2016 edited

Here's a way to do this with the new Docker HEALTHCHECK directive using make:

https://gist.github.com/mixja/1ed1314525ba4a04807303dad229f2e1

[UPDATE: updated the gist to deal with the case where the container exits with an error code, as Docker 1.12 somewhat stupidly reports the healthcheck status of a stopped container as "starting"]
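
One way to consume the HEALTHCHECK status from a script is simply to poll what Docker 1.12+ records; a stripped-down sketch (mydb is a placeholder container name, and the image is assumed to define a HEALTHCHECK):

# Wait until Docker reports the container healthy, bailing out if it stops first.
until [ "$(docker inspect -f '{{.State.Health.Status}}' mydb)" = "healthy" ]; do
    if [ "$(docker inspect -f '{{.State.Running}}' mydb)" != "true" ]; then
        echo "mydb exited before becoming healthy" >&2
        exit 1
    fi
    sleep 2
done
echo "mydb is healthy"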

@habitullence

Thanks @mixja, nice solution.

@korya
korya commented Nov 25, 2016

@mixja, nice solution! That's exactly the functionality I would expect to come out of the box. But now the question is: if you start your containers manually, why do you need docker-compose at all?

@sslavic
sslavic commented Nov 25, 2016

For testing I use https://github.com/avast/docker-compose-gradle-plugin and it uses Docker healthcheck as well - no more artificial pauses, faster builds.

@mixja
mixja commented Nov 29, 2016

@korya - Docker compose is not really an orchestration tool - it is more of an environment specification and management tool. I use Make to provide procedural style orchestration over Docker Compose and Docker (and other tools as required). The combination of Make, Docker and Docker Compose is very powerful and you can achieve a lot of different scenarios with these building blocks.

@korya
korya commented Nov 29, 2016

@mixja well, maybe you are right. But as many people have pointed out in this thread, orchestration functionality is badly needed in test environments, and when docker-compose is in your toolbox it is very tempting to expect this kind of functionality from it.

Indeed, according to the docs, "Compose is a tool for defining and running multi-container Docker applications". Although it does not say that compose is an orchestration tool, I think that from a user's perspective (e.g. mine) it is natural to expect "a tool for defining and running multi-container Docker applications" to support basic dependency management between the managed containers out of the box.

I am not saying that the tool has to support it. All I am saying is that it is very natural to expect it. Otherwise everyone has to come up with their super smart ways to do it. In fact, we use a bash script doing something similar to what your makefile does.

@djui
djui commented Nov 30, 2016 edited

@mixja @korya I would like to improve my tool await and would like to ask you for feedback: what do your Makefile versions provide that is missing from, more convenient than, or enabled beyond await?

It seems the healthcheck+make version takes a "global" view: no single container knows the global state (but the makefile does), whereas await takes a "local" view: each enabled container knows (only) what it needs to know, similar to depends_on or links. Furthermore, you prefer to ship the container with the tools required for the healthcheck (which sometimes is the default, e.g. mysqlshow) and otherwise leave the Dockerfile untouched. Additionally, you seem to use docker-compose not mainly for composition anymore but mainly for flexible configuration (e.g. docker-compose up -d mysql should be equivalent to docker run -d -e ... -v ... -p ... mysql).

@mixja
mixja commented Dec 1, 2016

Hi @djui - it's probably a philosophical point of view, but I think the whole premise of the HEALTHCHECK is promoting the right behaviour - i.e. a container can provide a means of establishing container health, without any external dependencies.

This by no means detracts from the value of having something external verify connectivity; however, I would typically run a suite of acceptance tests to cover this, as you want to verify connectivity and a whole lot more (i.e. application functionality). Of course you can't generally run that level of testing until a complete environment has been established. The scope of your await tool, and of other approaches I've used in the past (Ansible playbooks wrapped in an agent container), is really getting the environment setup orchestrated correctly (not the end goal of acceptance testing), and until now that was really the only approach available in a Docker world.

With Docker 1.12 we now have a means to introspect the Docker environment and the ability to use well-established constructs (i.e. bash/shell mechanisms) to "await" a certain state, of course as long as our containers have defined their own health checks. I see more value in leveraging the native capabilities of the platform and encouraging container owners to define their own health checks, rather than relying on the historical external (I've started my application process, it's no longer my problem) approach we have had to resort to.

As a related analogy, consider AWS CloudFormation and the concept of autoscaling groups and orchestrating rolling updates. How does CloudFormation know whether a new instance is "healthy" and ready to go, so that we can kill an old instance and roll in another new one? Do we write an external healthcheck, or do we rely on the instance itself to signal health? The answer is the latter: it means the instance owner can set whatever success criteria are required for his/her instance, and then signal to the overarching orchestration system (i.e. CloudFormation) that the instance is "healthy".

With regards to your comments about Docker Compose - it is a tool that can provide both aspects you mention. The docker-compose.yml part is the desired-state compositional environment specification, whilst the various docker-compose commands provide the ability to interact with the environment in a number of ways. For now we need external orchestration tools because, fundamentally, docker-compose does not perform dependency management between services well enough. As docker-compose gets features like native health check support, the goal of a single docker-compose up command will be more realistic, assuming we'll be able to specify, for example, that a service must be marked healthy before it is considered "up", which then means our dependent services effectively wait until the dependency is healthy.
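
For images that don't ship a HEALTHCHECK of their own, Docker 1.12 also allows attaching one at run time; a sketch (the check command and thresholds are just illustrative):

# Attach a health check to a MySQL container at run time rather than in the Dockerfile.
docker run -d --name mydb \
    --health-cmd='mysqladmin ping -h localhost || exit 1' \
    --health-interval=5s \
    --health-timeout=3s \
    --health-retries=5 \
    mysql:5.7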

@djui
djui commented Dec 1, 2016

@mixja Thanks for the detailed explanation. I think

I see more value in leveraging the native capabilities of the platform

is a good/the main point. I'm just waiting for Docker Compose to leverage the healthchecks natively, either in depends_on or a new key such as await. I just wonder if it should/will go even a step further than that and basically bring down linked containers if e.g. --abort-on-container-exit is set and a health check at runtime sets the healthcheck label to unhealthy.

@desprit
desprit commented Dec 9, 2016

Possible temporary workaround for those of you who are looking for delay functionality to run tests:

I have two docker-compose yml files: one for testing and another for development. The only difference is that docker-compose.test.yml has a sut container, which runs pytest. My goal was to run the test docker-compose file and, if the pytest command in the sut container fails, not run the development one. Here is what I came up with:

# launch test docker-compose; note: I'm starting it with -p argument
docker-compose -f docker-compose.test.yml -p ci up --build -d
# simply get ID of sut container
tests_container_id=$(docker-compose -f docker-compose.test.yml -p ci ps -q sut)
# wait for sut container to finish (pytest will return 0 if all tests passed)
docker wait $tests_container_id
# count sut containers whose exit code is non-zero (0 means all tests passed)
tests_status=$(docker-compose -f docker-compose.test.yml -p ci ps -q sut | xargs docker inspect -f '{{ .State.ExitCode  }}' | grep -v '^0$' | wc -l | tr -d ' ')
# print logs if tests didn't pass and return exit code
if [ $tests_status = "1" ] ; then
    docker-compose -f docker-compose.test.yml -p ci logs sut
    return 1
else
    return 0
fi

Now you can use the code above in any function of your choice (mine is called test) and do something like this:

test
test_result=$?
if [[ $test_result -eq 0 ]] ; then
    docker-compose -f docker-compose.yml up --build -d
fi

Works well for me but I'm still looking forward to see docker-compose support that kind of stuff natively :)

@blockjon

+1

@electrofelix

Perhaps things that are considered outside the core of docker-compose could be supported by allowing plugins? Similar to request #1341, it seems there is additional functionality that some would find useful but that doesn't necessarily align fully with the current vision. Supporting a plugin system such as the one proposed in #3905 would let compose focus on a core set of capabilities, and those who want this for their particular use case could write a plugin that handles up differently.

It would be nice to have docker-compose act as the entry point to all of our local projects' Docker environment setup, rather than having to add a script in front of each one as the default entry point, or relying on people to remember to run the script for the odd cases.
