dokku ps:restore fails due to CHECKS file #1943

Closed
z0mt3c opened this issue Feb 25, 2016 · 20 comments

@z0mt3c

z0mt3c commented Feb 25, 2016

Description of problem:

It seems that a CHECKS file prevents dokku ps:restore from restoring the applications/instances. I noticed it because my services suddenly stopped coming back after system reboots. Whenever I deployed an app containing a CHECKS file, none of my applications were started after a reboot (ps:restore), even the ones without a CHECKS file. Maybe they were started and then terminated after a few seconds. Maybe it's just my fault? Did I miss something? Any idea?

My CHECKS file looked like this:

WAIT=2
ATTEMPTS=5
/
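
(For reference, as I understand it the checker behaves roughly like the loop below. This is just a sketch of the semantics, not dokku's actual code; the container IP and port are the ones that show up in the log further down.)

WAIT=2       # seconds to wait before each attempt (from the CHECKS file)
ATTEMPTS=5   # number of attempts (from the CHECKS file)
for i in $(seq 1 "$ATTEMPTS"); do
  sleep "$WAIT"
  # the "/" line means: request the root path; the expected content ""
  # matches any successful response body
  if curl --silent --fail http://172.17.0.3:5000/ >/dev/null; then
    exit 0   # check passed; the deploy continues
  fi
  echo "Check attempt $i/$ATTEMPTS failed."
done
exit 1       # every attempt failed; dokku aborts and kills the new container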

My upstart/dokku-redeploy.log looked like this when it went wrong (with the CHECKS file) for the particular app:

Restoring app PROJECT_NAME ...
-----> Releasing PROJECT_NAME (dokku/PROJECT_NAME:latest)...
-----> Deploying PROJECT_NAME (dokku/PROJECT_NAME:latest)...
-----> DOKKU_SCALE file found (/home/dokku/PROJECT_NAME/DOKKU_SCALE)
=====> web=1
-----> Running pre-flight checks
-----> Attempt 1/5 Waiting for 2 seconds ...
       CHECKS expected result:
       http://localhost/ => ""
 !
curl: (7) Failed to connect to 172.17.0.3 port 5000: Connection refused
 !     Check attempt 1/5 failed.
-----> Attempt 2/5 Waiting for 2 seconds ...
       CHECKS expected result:
       http://localhost/ => ""
 !
curl: (7) Failed to connect to 172.17.0.3 port 5000: Connection refused
 !     Check attempt 2/5 failed.
-----> Attempt 3/5 Waiting for 2 seconds ...
       CHECKS expected result:
       http://localhost/ => ""
 !
curl: (7) Failed to connect to 172.17.0.3 port 5000: Connection refused
 !     Check attempt 3/5 failed.
-----> Attempt 4/5 Waiting for 2 seconds ...
       CHECKS expected result:
       http://localhost/ => ""
 !
curl: (7) Failed to connect to 172.17.0.3 port 5000: Connection refused
 !     Check attempt 4/5 failed.
-----> Attempt 5/5 Waiting for 2 seconds ...
       CHECKS expected result:
       http://localhost/ => ""
 !
curl: (7) Failed to connect to 172.17.0.3 port 5000: Connection refused
 !     Check attempt 5/5 failed.
Could not start due to 1 failed checks.
=====> PROJECT_NAME container output:
=====> end PROJECT_NAME container output
Failed to kill container (99ffe4a50feaf10dc61b1713e755af45a13079227e5c5b1566032c6094441933): Error response from daemon: Cannot kill container 99ffe4a50feaf10dc61b1713e755af45a13079227e5c5b1566032c6094441933: Container 99ffe4a50feaf10dc61b1713e755af45a13079227e5c5b1566032c6094441933 is not running

Removing the CHECKS file helped. Afterwards it starts just fine:

Restoring app PROJECT_NAME ...
-----> Releasing PROJECT_NAME (dokku/PROJECT_NAME:latest)...
-----> Deploying PROJECT_NAME (dokku/PROJECT_NAME:latest)...
-----> DOKKU_SCALE file found (/home/dokku/PROJECT_NAME/DOKKU_SCALE)
=====> web=1
-----> Running pre-flight checks
       For more efficient zero downtime deployments, create a file CHECKS.
       See http://dokku.viewdocs.io/dokku/checks-examples.md for examples
       CHECKS file not found in container: Running simple container check...
-----> Waiting for 10 seconds ...
-----> Default container check successful!
=====> PROJECT_NAME container output:
=====> end PROJECT_NAME container output
-----> Running post-deploy
-----> Found previous container(s) (3ae6e2f3da64) named PROJECT_NAME.web.1
=====> renaming container (3ae6e2f3da64) PROJECT_NAME.web.1 to PROJECT_NAME.web.1.1456405107
=====> renaming container (aa05a9783e98) prickly_goldberg to PROJECT_NAME.web.1
-----> Configuring PROJECT_NAME.something.tld...(using /var/lib/dokku/plugins/available/nginx-vhosts/templates/nginx.conf.template)
-----> Creating http nginx.conf
-----> Running nginx-pre-reload
       Reloading nginx
-----> Setting config vars
       DOKKU_APP_RESTORE: 1
-----> Shutting down old containers in 60 seconds
=====> 3ae6e2f3da642b49692b44d1dd0a0741526aef3d9aca44620366432a2672370e
=====> Application deployed:
       http://PROJECT_NAME.something.tld

Output of the following commands

  • uname -a: Linux app 3.19.0-51-generic #57~14.04.1-Ubuntu SMP Fri Feb 19 14:36:55 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
  • docker version: 1.10.2
  • docker run -ti gliderlabs/herokuish:latest herokuish version: 0.3.8
  • dokku version: 0.4.14

My app is just a simple Node application using the default build steps, with a Procfile containing

web: node server.js
@michaelshobbs
Member

Please include the info requested in our issue template. If you're using a data store plugin, perhaps that container isn't up at the time of restore?
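
For example, to check whether a datastore container came back up after the reboot, plain docker works (the name filter below is just a placeholder for whatever your datastore container is called):

docker ps -a --filter "name=mongo" --format "{{.Names}}\t{{.Status}}"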

@josegonzalez
Member

@z0mt3c Are you using a data store plugin?

@z0mt3c
Author

z0mt3c commented Mar 3, 2016

The actual app with the CHECKS file didn't "use" any plugins... I think there was another app linked to a mongo instance. I will try to take a deeper look in the next few days and provide further information. Thanks!

@kblcuk

kblcuk commented Mar 9, 2016

I'm having a somewhat similar issue (at least the symptoms are the same), but I'm not sure it's because of CHECKS in my case.

Basically, dokku ps:start app (for a stopped app) sometimes fails with

Could not start due to 1 failed checks.
=====> dogfish-agenda container output:
=====> end dogfish-agenda container output
Error response from daemon: Cannot kill container 528d8f77568e9c7d039130383410d9023c55571b399c013be180d9026e10c141: notrunning: Container 528d8f77568e9c7d039130383410d9023c55571b399c013be180d9026e10c141 is not running
Error: failed to kill containers: [528d8f77568e9c7d039130383410d9023c55571b399c013be180d9026e10c141]
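
Inspecting the stopped container with plain docker can show why it exited (a sketch; the short ID below is from the log above):

docker inspect --format '{{.State.ExitCode}} {{.State.Error}}' 528d8f77568e
docker logs 528d8f77568e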

If I try to re-build it, it hangs on

dokku ps:rebuild app
-----> Cleaning up...
-----> Building dogfish-agenda from herokuish...
-----> Adding BUILD_ENV to build environment...
-----> Checking deploymentkeys Plugin sanity ...
-----> Installing shared SSH keys in build environment ...
-----> Checking Hostkeys Plugin sanity ...
-----> Installing Hostkeys in build environment ...
       No app keys available.
       Adding shared keys.

So in the end I have to destroy and re-create the app. Destroying the app gives

dokku apps:destroy app
 !     WARNING: Potentially Destructive Action
 !     This command will destroy dogfish-agenda (including all add-ons).
 !     To proceed, type "app"

> app
Destroying app (including all add-ons)
Error response from daemon: no such id: 02539e6d193e9452f9c8ad879314962d15d04fe33c9b9b2e00f40f83002dec5f
Error: failed to stop containers: [02539e6d193e9452f9c8ad879314962d15d04fe33c9b9b2e00f40f83002dec5f]
Error response from daemon: no such id: 02539e6d193e9452f9c8ad879314962d15d04fe33c9b9b2e00f40f83002dec5f
Error: failed to remove containers: [02539e6d193e9452f9c8ad879314962d15d04fe33c9b9b2e00f40f83002dec5f]
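
The stale references can be cleaned up by hand with plain docker (a sketch; the name filter is a placeholder and <container-id> comes from the first command's output):

docker ps -a --filter "name=dogfish-agenda" --format "{{.ID}}\t{{.Names}}\t{{.Status}}"
docker rm -f <container-id>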

@josegonzalez
Member

Sounds like your container is failing to start when you call ps:start on it. It's almost certainly something weird, like a downstream stopped container or your server running out of memory.

@kblcuk

kblcuk commented Mar 9, 2016

Hm, but this happens just a minute after I called ps:stop on the app, so it should have plenty of memory... (nothing else runs on the server except dokku).

@josegonzalez
Member

Can you verify that that is the case? Also, can you run the commands with dokku trace on?

Again, I don't think you're having the same issue. Can you file a new issue?

@wzrdtales

Maybe this helps:
I have encountered this issue on a server with a slow HDD and Node.js apps. If you restore a Node.js app, it will need 2.731 million years to execute the chown /app command, and so it fails completely given the too-short time in your CHECKS file. The node_modules folder can reach a size/file count that is quite challenging for slow disks, especially mechanical ones.

The only solutions are SSDs or longer wait times.
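
Longer wait times just mean more generous values in the CHECKS file, e.g. (hypothetical numbers, tune them to your disk; 30 attempts x 10 seconds gives the app roughly five minutes to come up):

WAIT=10
ATTEMPTS=30
/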

You might be facing a different issue, but it is worth mentioning, and just maybe it already resolves the mystery.

@josegonzalez
Member

That issue was fixed in gliderlabs/herokuish#117 and will be out in the next herokuish release. You can just pull the latest release manually if you wish.
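
One way to pull it manually (assuming you want whatever is currently tagged latest):

docker pull gliderlabs/herokuish:latest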

@wzrdtales

@josegonzalez Oh, cool. I don't have the problem anymore, as I moved those parts to different container hosts. But good to know that this is fixed.

@michaelshobbs
Member

Seems like we're good here. Please feel free to comment if not.

@christiangenco
Contributor

Do we upgrade herokuish with sudo apt-get install herokuish? (from dokku/upgrading)

If so, I don't think it's released yet?

$ dokku version
0.4.14

$ sudo apt-get install herokuish
Reading package lists... Done
Building dependency tree
Reading state information... Done
herokuish is already the newest version.
herokuish set to manually installed.
The following packages were automatically installed and are no longer required:
  linux-headers-3.13.0-57 linux-headers-3.13.0-57-generic
  linux-headers-3.13.0-61 linux-headers-3.13.0-61-generic
  linux-image-3.13.0-57-generic linux-image-3.13.0-61-generic
  linux-image-extra-3.13.0-57-generic linux-image-extra-3.13.0-61-generic
Use 'apt-get autoremove' to remove them.
0 upgraded, 0 newly installed, 0 to remove and 96 not upgraded.

$ sudo apt-get install dokku
Reading package lists... Done
Building dependency tree
Reading state information... Done
dokku is already the newest version.
The following packages were automatically installed and are no longer required:
  linux-headers-3.13.0-57 linux-headers-3.13.0-57-generic
  linux-headers-3.13.0-61 linux-headers-3.13.0-61-generic
  linux-image-3.13.0-57-generic linux-image-3.13.0-61-generic
  linux-image-extra-3.13.0-57-generic linux-image-extra-3.13.0-61-generic
Use 'apt-get autoremove' to remove them.
0 upgraded, 0 newly installed, 0 to remove and 96 not upgraded.

I'd prefer not to make it a habit of pulling the latest release manually, so I'm happy waiting if sudo apt-get install herokuish will eventually fix it.

@josegonzalez
Member

I'll push a release of that out tonight.

@josegonzalez
Member

It's been released :)

@christiangenco
Contributor

Is there a lag between when it's released and when it's available through apt-get? Just tried and got the same thing:

$ dokku version
0.4.14
$ sudo apt-get install dokku herokuish
Reading package lists... Done
Building dependency tree
Reading state information... Done
dokku is already the newest version.
herokuish is already the newest version.
The following packages were automatically installed and are no longer required:
  linux-headers-3.13.0-57 linux-headers-3.13.0-57-generic
  linux-headers-3.13.0-61 linux-headers-3.13.0-61-generic
  linux-image-3.13.0-57-generic linux-image-3.13.0-61-generic
  linux-image-extra-3.13.0-57-generic linux-image-extra-3.13.0-61-generic
Use 'apt-get autoremove' to remove them.
0 upgraded, 0 newly installed, 0 to remove and 99 not upgraded.

@michaelshobbs
Member

You'll want to run apt-get update first if you didn't already.
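
i.e.:

sudo apt-get update
sudo apt-get install dokku herokuish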

@christiangenco
Contributor

@michaelshobbs Ahh, running apt-get update before sudo apt-get install dokku and sudo apt-get install herokuish did the trick.

Though it also took down my app (navigating to the webpage returned 502 Bad Gateway). The only suspicious thing I saw in the upgrade was at the end of sudo apt-get install herokuish:

Removing old herokuish image
Failed to remove image (gliderlabs/herokuish): Error response from daemon: conflict: unable to remove repository reference "gliderlabs/herokuish" (must force) - container 9bf778e8902d is using its referenced image 947e245ddbd9
Importing herokuish into docker (around 5 minutes)

Looks like something's still running:

$ dokku ps dbinbox
-----> running processes in container: 47ebe75a1529040aeadc22f64c44945af63713c537adb3787403d77c30a25363
USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
root         1  0.0  0.0   7776  1808 ?        Ssl  01:07   0:00 /start web
u29336      16  1.1  4.8 420740 100344 ?       Sl   01:07   0:07 puma 2.14.0 (tcp://0.0.0.0:5000) [app]
u29336     190  0.0  4.6 1096660 94348 ?       Sl   01:07   0:00 puma: cluster worker 0: 16 [app]
u29336     192  0.0  4.6 1096660 94332 ?       Sl   01:07   0:00 puma: cluster worker 1: 16 [app]
root       218  0.0  0.0   4440   652 ?        Ss+  01:19   0:00 /bin/sh -c ps auxwww
root       224  0.0  0.0  15564  1140 ?        R+   01:19   0:00 ps auxwww
-----> running processes in container: 9483ad5820df6e2858d046c298c3ea389cd1c2ae6ae570405c55389af99a3757
USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
root         1  0.0  0.0   6720  1820 ?        Ssl  01:07   0:00 /start worker
u29336      12  2.5  6.7 1413000 137788 ?      Sl   01:07   0:17 sidekiq 3.4.2 app [0 of 10 busy]
root       234  0.0  0.0   4440   656 ?        Ss+  01:19   0:00 /bin/sh -c ps auxwww
root       242  0.0  0.0  15564  1140 ?        R+   01:19   0:00 ps auxwww

dokku ps:rebuildall brought it back up, but also changed one of the URLs from HTTP to HTTPS:

$ dokku urls dbinbox
https://dbinbox.com
https://dbinbox.dokku02.gen.co

(https://dbinbox.dokku02.gen.co used to be http://dbinbox.dokku02.gen.co)

This broke my Route 53 failover checking, which is set to use the HTTP endpoint (HTTPS failover checking is extra, and I haven't set up an SSL certificate).

So now I need to figure out how to make that dbinbox.dokku02.gen.co URL use HTTPS, but I'll create a new issue for that.


For the future, is apt-get update && sudo apt-get install dokku herokuish the correct way to upgrade dokku? Was it a fluke that my app went down, or should I expect it to go down when upgrading?

@michaelshobbs
Member

Upgrading is not advised on running apps. You'll want to stop everything first.
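
A sketch of the full sequence (myapp is a placeholder; ps:stop and ps:start are the same commands used earlier in this thread, repeated per app):

dokku ps:stop myapp
sudo apt-get update
sudo apt-get install dokku herokuish
dokku ps:start myapp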

@ncri

ncri commented Apr 20, 2016

@michaelshobbs I know it doesn't fit here, but maybe you can point me to where to look: how do I upgrade the server with little or no downtime? Logging into my DigitalOcean droplet, I see upgrade notices quite frequently. If I stop the app, I obviously have downtime, and many upgrades also need a server restart. I wonder how Heroku does it; I assume they upgrade a new server, start the app there, and switch over to avoid downtime.

@josegonzalez
Member

@ncri This question definitely doesn't belong here. Please open a new issue.

dokku locked and limited conversation to collaborators Apr 20, 2016