Disk cleanup question #180

Closed
drmrbrewer opened this issue Feb 11, 2018 · 8 comments

Contributor

drmrbrewer commented Feb 11, 2018

Usually I restart a service (app) by hitting Save & Update Configuration in the app settings on the Console.

If I do the following to clean up on the server:

# docker container prune --force && docker image prune --all --force

and then do a Save & Update Configuration again, the app/service fails to restart. Is this expected? I have to re-deploy the app again.

And how do I do that for a one-click app like postgres after a cleanup, since there is nothing to re-deploy to get things back to normal?

Contributor Author

drmrbrewer commented Feb 11, 2018

@githubsaturn Ironically, exactly this has just happened :)

I think the server must have restarted itself or something, leaving all of my app-based services showing REPLICAS of 0/1 for docker service ls, except for the one mentioned in my OP which I had already re-deployed. That survived whatever event happened at the server, and was showing REPLICAS of 1/1. I re-deployed all non-oneclick apps, and REPLICAS for those is now back to 1/1.

But my postgres service is still showing REPLICAS of 0/1, and I'm not sure how to recover from that because there is nothing to re-deploy... please help! BTW docker ps doesn't show anything for postgres. Perhaps the container has been destroyed for good, and needs to be rebuilt (with db from backup)?

Is this a danger of doing a disk cleanup as per the above (and as per the wiki)? Does it leave the service vulnerable to some event like this? BTW I am certain that when I did the disk cleanup, all were at REPLICAS of 1/1.
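For anyone diagnosing the same symptom, the `0/1` rows can be picked out of the `docker service ls` output mechanically. The following is a minimal sketch, not part of the Captain tooling: `degraded_services` is a hypothetical helper name, and it assumes the two-column `NAME REPLICAS` layout produced by the `--format` template shown in the usage comment.

```shell
#!/usr/bin/env bash
# Hypothetical helper: reads lines of "NAME REPLICAS" on stdin
# (e.g. "srv-captain--postgres 0/1") and prints the names of services
# that are running fewer replicas than desired.
degraded_services() {
  awk '{ split($2, r, "/"); if (r[1] + 0 < r[2] + 0) print $1 }'
}

# Usage on a swarm manager (requires docker):
#   docker service ls --format '{{.Name}} {{.Replicas}}' | degraded_services
```

Running `docker service ps --no-trunc <service>` on any name it prints usually shows the error that kept the task from starting.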

Contributor Author

drmrbrewer commented Feb 11, 2018

OK I managed to get the postgres service back up and running (I remembered this post):

$ docker service update --image postgres:latest srv-captain--postgres

But I'd be interested to hear your view on whether doing a disk cleanup might have led to this, or whether it's unrelated.
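The recovery command above can be wrapped so that it is previewed before it runs. A minimal sketch: `repull_service_image` and the `DRY_RUN` switch are hypothetical names introduced here; only `docker service update --image` comes from this thread. It should only help for images that exist in a registry, since a pruned, locally built image has nowhere to be pulled from.

```shell
#!/usr/bin/env bash
# Hypothetical helper: re-point a service at a registry-hosted image so the
# node pulls it again. With DRY_RUN=1 the command is printed, not executed.
repull_service_image() {
  local service="${1:-}" image="${2:-}"
  if [ -z "$service" ] || [ -z "$image" ]; then
    echo "usage: repull_service_image SERVICE IMAGE" >&2
    return 2
  fi
  ${DRY_RUN:+echo} docker service update --image "$image" "$service"
}

# Preview:  DRY_RUN=1 repull_service_image srv-captain--postgres postgres:latest
# Execute:  repull_service_image srv-captain--postgres postgres:latest
```

For a database, using the tag the service was originally created with rather than `latest` is probably safer, because a newer major version of postgres generally cannot start on an older data directory.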

Collaborator

githubsaturn commented Feb 12, 2018

I'll look into this and will let you know.

Contributor Author

drmrbrewer commented Feb 12, 2018

Thanks for looking into it. Looking back at the sequence of events, it does seem to be significant to me that:

(a) after a disk cleanup, a Save & Update Configuration failed to restart the app... does this simulate what would happen after a server reboot, i.e. docker would try (but fail) to restart the service, which would be left on 0/1?

(b) after the server event (reboot?) the only service (for an app of mine) that recovered back to 1/1 was the one that I'd already re-deployed (captainduckduck, nginx and certbot were OK though)... seems to back up point (a)?

Think maybe I'll stop doing any disk cleanups until further notice.

Collaborator

githubsaturn commented Feb 13, 2018

Excellent find! It turned out to be a Docker issue. Unfortunately, we cannot fix this, as it's out of our hands. I added some notes to our wiki and reported the issue to Docker: moby/moby#36295

The reason I haven't personally experienced this issue is that I typically use a local Docker registry. That way, after a cleanup the images get pulled back from the registry.
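As a rough illustration of that mechanism, independent of how Captain itself sets up its registry, a locally built image becomes pullable once it is tagged with a registry address and pushed. The sketch below only prints the commands rather than running them; `push_to_registry_cmds`, the image name `my-app:1`, and the `localhost:5000` address are assumptions made for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the commands that would copy a locally built
# image into a registry. The registry address defaults to localhost:5000,
# which is an assumed value, not something taken from Captain.
push_to_registry_cmds() {
  local image="${1:-}" registry="${2:-localhost:5000}"
  if [ -z "$image" ]; then
    echo "usage: push_to_registry_cmds IMAGE [REGISTRY]" >&2
    return 2
  fi
  echo "docker tag $image $registry/$image"
  echo "docker push $registry/$image"
}

# Example: push_to_registry_cmds my-app:1
```

The service then has to reference the registry-qualified name for a later pull to succeed after a prune.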

Contributor Author

drmrbrewer commented Feb 13, 2018

Thanks for following up on this.

I'm curious as to why the problem seemed to be limited to apps which I myself have created via the Captain console, i.e. why captainduckduck, nginx and certbot seemed to survive the image prune and restart...

Also, having to recover from a situation like this, where it is necessary to re-deploy the app in order to replace the lost image, prompts another question in my mind. One of my apps is deployed via a github webhook. I don't actually keep a local copy of the repo concerned (it is too big and is pretty much static... no further development ongoing), and instead make any changes directly on github, thereby triggering the webhook. But to force a new deploy, when no changes to the repo are required, I've been making some inconsequential change, like editing a dummy file in the repo. This doesn't seem too elegant, and I wonder if there is a way of triggering the webhook manually to force a re-deploy? Some sort of "manually trigger webhook" button in the Captain console would be nice.

And I can't see any changes to the disk cleanup wiki page.

Collaborator

githubsaturn commented Feb 13, 2018

> And I can't see any changes to the disk cleanup wiki page.

Doh... Forgot to save the page. Thanks.

> Also, having to recover from a situation like this, where it is necessary to re-deploy the app in order to replace the lost image, prompts another question in my mind. One of my apps is deployed via a github webhook. I don't actually keep a local copy of the repo concerned (it is too big and is pretty much static... no further development ongoing), and instead make any changes directly on github, thereby triggering the webhook. But to force a new deploy, when no changes to the repo are required, I've been making some inconsequential change, like editing a dummy file in the repo. This doesn't seem too elegant, and I wonder if there is a way of triggering the webhook manually to force a re-deploy? Some sort of "manually trigger webhook" button in the Captain console would be nice.

Adding a button is easy, but it would only be there to work around a bug. In your scenario, you can force a build by sending a POST request to the webhook URL. Any POST call to that URL triggers the build. You can use a simple curl command to issue that POST request:

curl -d "dummyparam=dummyvalue" -X POST https://YOUR_WEB_HOOK_FULL_URL
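
If this is needed regularly, the call can be wrapped in a small function. A sketch under assumptions: `trigger_webhook` and `DRY_RUN` are hypothetical names, and the URL must be the app's full webhook URL. The `-fsS` flags keep curl quiet on success while still reporting an HTTP error and exiting non-zero.

```shell
#!/usr/bin/env bash
# Hypothetical helper: send an empty-ish POST to a deploy webhook to force a
# build. With DRY_RUN=1 the curl command is printed instead of executed.
trigger_webhook() {
  local url="${1:-}"
  if [ -z "$url" ]; then
    echo "usage: trigger_webhook WEBHOOK_URL" >&2
    return 2
  fi
  # The payload is a dummy value, as in the command above.
  ${DRY_RUN:+echo} curl -fsS -X POST -d "dummyparam=dummyvalue" "$url"
}

# Preview:  DRY_RUN=1 trigger_webhook https://YOUR_WEB_HOOK_FULL_URL
```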

Collaborator

githubsaturn commented Feb 13, 2018

> I'm curious as to why the problem seemed to be limited to apps which I myself have created via the Captain console, i.e. why captainduckduck, nginx and certbot seemed to survive the image prune and restart...

Those are not locally built images; they are hosted on Docker Hub, so they can be pulled again after a prune.
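
One way to tell the two kinds of image apart before pruning is the `RepoDigests` field of `docker image inspect`, which is normally empty for an image that was built locally and never pushed. A hedged sketch: `image_origin` is a hypothetical helper that classifies the JSON value of that field, read from stdin.

```shell
#!/usr/bin/env bash
# Hypothetical helper: reads the JSON value of .RepoDigests on stdin and
# prints "local-only" when it is empty, "registry" otherwise.
image_origin() {
  local digests
  read -r digests
  case "$digests" in
    ""|"[]"|"null") echo "local-only" ;;
    *)              echo "registry" ;;
  esac
}

# Usage (requires docker):
#   docker image inspect --format '{{json .RepoDigests}}' postgres:latest | image_origin
```

An image reported as `local-only` is the kind that, per this thread, needs a re-deploy (or a registry copy) after `docker image prune --all`.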
