fin cleanup is potentially disruptive #582

Closed
achekulaev opened this issue Jun 5, 2018 · 13 comments

@achekulaev
Member

fin cleanup removes the networks of existing but stopped projects, which can later lead to errors when you try to start such a project.

[Screenshot: screen shot 2018-06-05 at 10 48 02 am]

@lmakarov
Member

lmakarov commented Jun 6, 2018

True, and I've been running into this myself. Unfortunately, there is a limit of about 30 docker-compose projects/networks that can exist on a host at the same time, so they do need to be cleaned up once in a while. See #184 (comment)

We can take the same approach as in vhost-proxy and re-create the network during fin project start when necessary.
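
A minimal sketch of what "re-create the network when necessary" could look like. This is not the Docksal or vhost-proxy implementation; `ensure_network` is a hypothetical helper, and it assumes the project uses docker-compose's default `<project>_default` network name.

```shell
#!/bin/bash

# Hypothetical helper: create the named Docker network only if it is missing.
ensure_network() {
  local network="$1"
  # `docker network inspect` exits non-zero when the network does not exist.
  if ! docker network inspect "$network" >/dev/null 2>&1; then
    docker network create "$network" >/dev/null
  fi
}
```

It would be called as `ensure_network "myproject_default"` before starting the containers. A re-created network gets a new ID, so existing containers would most likely still have to be reconnected to it afterwards.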

@shelane
Member

shelane commented Jun 7, 2018

I just came across this error after running fin update while one project was stopped. I know I've seen this before, and I'm not sure how to get it going again. fin system reset didn't do it. I ran fin rm for my project and then fin up, but that gave these errors:

ERROR: for llnl_cli_1  UnixHTTPConnectionPool(host='localhost', port=None): Read timed out. (read timeout=60)

ERROR: for cli  UnixHTTPConnectionPool(host='localhost', port=None): Read timed out. (read timeout=60)
ERROR: An HTTP request took too long to complete. Retry with --verbose to obtain debug information.
If you encounter this issue regularly because of slow network conditions, consider setting COMPOSE_HTTP_TIMEOUT to a higher value (current value: 60).

I did a full computer restart which resolved the issues.

@shelane
Member

shelane commented Jun 7, 2018

Actually, this time a full computer restart did not fix one of my projects.

fin up
Identity added: id_rsa (id_rsa)
Starting services...
Starting llnl8_solr_1      ... error
Starting llnl8_cli_1  ... 
Starting llnl8_memcached_1 ... 
Starting llnl8_db_1        ... 
Starting llnl8_memcached_1 ... error
Starting llnl8_cli_1       ... error

ERROR: for llnl8_memcached_1  Cannot start service memcached: network 33adaf93d43b287eaf8a6b86801f972694931b63723234e3f9a047a6188fe297 not found

Starting llnl8_db_1        ... done

ERROR: for solr  Cannot start service solr: network 33adaf93d43b287eaf8a6b86801f972694931b63723234e3f9a047a6188fe297 not found

ERROR: for memcached  Cannot start service memcached: network 33adaf93d43b287eaf8a6b86801f972694931b63723234e3f9a047a6188fe297 not found

ERROR: for cli  Cannot start service cli: network 33adaf93d43b287eaf8a6b86801f972694931b63723234e3f9a047a6188fe297 not found
ERROR: Encountered errors while bringing up the project.

@lmakarov
Member

lmakarov commented Jun 7, 2018

@shelane

ERROR: for llnl8_memcached_1 Cannot start service memcached: network 33adaf93d43b287eaf8a6b86801f972694931b63723234e3f9a047a6188fe297 not found

Restarting your computer will not fix it.

You will either have to reset the project with fin project reset (understandably not a great option) or recreate the network and reconnect the project containers manually.

Here's a script that simplifies this process:

project=$(fin debug -c 'echo $COMPOSE_PROJECT_NAME')
fin docker network create ${project}_default
for service in $(fin docker-compose ps --services); do fin docker network connect ${project}_default ${project}_${service}_1; done

Then run fin up (fin project start).

@achekulaev
Member Author

achekulaev commented Jun 7, 2018

I guess we need some project-level command to re-create the network, or we should remove network prune from the cleanup, as it causes issues with all stopped projects.

@lmakarov
Member

lmakarov commented Jun 7, 2018

Short term, we can stop doing docker network prune -f in fin cleanup.

Long term, we should replace it with a more granular cleanup process, which will disconnect stopped project containers from the network, then drop that network. Alternatively, add logic to recreate the network and reconnect containers in fin project start.
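
The granular cleanup described above might be sketched as follows. This is an illustration only, not what fin cleanup does: `release_project_network` is a hypothetical helper, and it assumes the default `<project>_default` network name plus the `com.docker.compose.project` label that docker-compose puts on the containers it creates.

```shell
#!/bin/bash

# Hypothetical helper: free the network of one stopped docker-compose project.
# Assumes the default network name "<project>_default" and the
# "com.docker.compose.project" label set by docker-compose.
release_project_network() {
  local project="$1"
  local network="${project}_default"
  local running container

  # Leave the network alone while any project container is still running.
  running=$(docker ps -q --filter "label=com.docker.compose.project=${project}")
  if [ -n "$running" ]; then
    echo "skip ${project}: containers still running"
    return 0
  fi

  # Detach every (stopped) project container, then drop the network.
  for container in $(docker ps -aq --filter "label=com.docker.compose.project=${project}"); do
    docker network disconnect -f "$network" "$container" || true
  done
  docker network rm "$network"
}
```

With this approach the project start still has to re-create the network and reconnect the containers, so it only makes sense together with the restore logic.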

vhost-proxy will need to implement the same logic, as it is also affected by this issue and is not able to restart a project with a missing network.

@lmakarov
Member

lmakarov commented Jun 7, 2018

Pushed the short term fix ^

@crittermike
Contributor

Just for the record, here's the Lando issue for this problem: lando/lando#990

And the issue for docker-compose itself: docker/compose#5745

@paulsheldrake
Sponsor Contributor

The script in #582 (comment) didn't work for me.

I've just been running fin init every time it breaks.

lmakarov self-assigned this Jul 18, 2018
@lmakarov
Member

Here's the commit that addresses the issue with network recreation at the vhost-proxy level:
docksal/service-vhost-proxy@b0f3dba

In the context of docksal/docksal, we should do a few things:

  • automatically create/restore the project network if it does not exist
  • remove the project network upon fin stop (to free up the network pool)
  • restore docker network prune in fin cleanup
  • add tests to make sure this case is covered
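
A rough idea of what a regression check for this case could look like, assuming it is run from inside a Docksal project directory on a disposable test host (the prune removes every unused network on the host). `check_start_after_prune` is a hypothetical name; the fin commands are the ones already used in this thread.

```shell
#!/bin/bash

# Hypothetical regression check: a stopped project must survive a network prune.
# Run from inside a Docksal project directory on a throwaway test host.
check_start_after_prune() {
  fin stop || return 1
  # Simulate what the network prune in `fin cleanup` does to stopped projects.
  fin docker network prune -f || return 1
  # The start should succeed once the project network is restored automatically.
  if fin up; then
    echo "PASS"
  else
    echo "FAIL: project did not start after network prune"
    return 1
  fi
}
```

The function returns non-zero on failure, so it could be wired into whatever test runner the project already uses.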

cc @achekulaev

@ey-
Contributor

ey- commented Jul 20, 2018

Follow-up to #582 (comment).
Just for the record, in case someone needs a temporary solution until the next release.

The line

project=$(fin debug -c 'echo $COMPOSE_PROJECT_NAME')

should be

project=$(fin debug -c 'echo $COMPOSE_PROJECT_NAME_SAFE')

So a custom command regenerate_network in the project could look like:

#!/bin/bash

project=$(fin debug -c 'echo $COMPOSE_PROJECT_NAME_SAFE')
fin docker network create ${project}_default
for service in $(fin docker-compose ps --services); do fin docker network connect --alias "$service" ${project}_default ${project}_${service}_1; done
exit 0;

It looks like this can safely be called every time before fin up.
You would also need to update your vhost-proxy to the latest version so that the commit mentioned in #582 (comment) is included (in case you are using the "PROJECT_INACTIVITY_TIMEOUT" feature).

@attiks

attiks commented Jul 30, 2018

I ran into a similar problem, but the network still existed; only the binding with the containers was gone. A lazy fix is to comment out the fin docker network create ${project}_default line.
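
For the case where the network exists but a container is no longer bound to it, a check-before-connect variant might look like the sketch below. `is_attached` and `attach_if_needed` are hypothetical helpers. The check compares network names only, so it would not detect a container that still references a stale network ID.

```shell
#!/bin/bash

# Hypothetical helper: succeed if the container is already attached to the network.
is_attached() {
  local network="$1" container="$2"
  # Print the names of all networks the container is attached to, one per line.
  docker inspect -f '{{range $name, $cfg := .NetworkSettings.Networks}}{{$name}} {{end}}' "$container" \
    | tr ' ' '\n' | grep -qx -- "$network"
}

# Attach the container under the given alias unless it is already attached.
attach_if_needed() {
  local network="$1" container="$2" alias="$3"
  if is_attached "$network" "$container"; then
    echo "already attached: ${container}"
  else
    docker network connect --alias "$alias" "$network" "$container"
  fi
}
```

For example, `attach_if_needed "myproject_default" "myproject_cli_1" "cli"` would reconnect the cli container only when its binding is missing.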

@attiks

attiks commented Jul 30, 2018

Updated the script so that it continues if the network already exists:

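#!/bin/bash

# Resolve the sanitized project name that Docksal uses for Docker resources.
project=$(fin debug -c 'echo $COMPOSE_PROJECT_NAME_SAFE')
# Create the project network; ignore the error if it already exists.
fin docker network create "${project}_default" || true
# Reconnect each service container to the network under its service alias.
for service in $(fin docker-compose ps --services); do
  fin docker network connect --alias "$service" "${project}_default" "${project}_${service}_1"
done
exit 0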
