
Deploying a unicorn app causes 502s. Can we use sockets instead of a port? #386

Closed
ryana opened this issue Dec 16, 2013 · 20 comments

@ryana

ryana commented Dec 16, 2013

Hey there,

Every time I deploy, I get a 502 for ~20 seconds. I believe the issue is that nginx is pointed at the new port before the new unicorn is up. This might be fixed if nginx pointed at a socket instead of a port. However, if I understand how dokku works, the whole idea of using Unicorn's graceful restart may not work out of the box. That is, nothing tells dokku to send a USR2 to Unicorn (http://unicorn.bogomips.org/SIGNALS.html) to make it restart.

Can anyone with Unicorn/dokku experience chime in? Thanks!

@tspacek

tspacek commented Dec 19, 2013

I have the same issue. I'll let you know if I work it out, but if anyone with more experience with this has any info, it would be much appreciated.

@ryana
Author

ryana commented Dec 19, 2013

After noodling on this for a bit, I think this is less about using sockets vs. ports and more about managing when the host nginx is reloaded. The bottom line is that with dokku, we cannot use USR2 to reload Unicorn. However, Unicorn still has the forking niceties that effectively let us "scale dynos" -- something that appears to be lacking in dokku right now.

I think fixing this will require removing the nginx-vhosts plugin and adding a unicorn-nginx plugin (which I'm going to write) that will:

On initial deploy:

  1. Start a new container
  2. Write the nginx config so that it uses a socket on the host OS (container port proxied to a host socket file) -- see the sketch at the end of this comment

On redeploy:

  1. Start a new container
  2. In a predeploy hook, copy the current CONTAINER file to a safe place and delete the file (dokku looks for this file and, if it exists, automatically kills the container it refers to).
    2a. This stops the current container from being killed
  3. Wait until the new container responds to a GET
  4. Rewrite the host nginx to forward from the correct port
  5. Kill old container

Let me know if this makes sense.
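
For that socket-based host nginx config (step 2 of the initial deploy above), a minimal sketch of what such a plugin's hook could write -- the socket path, domain, and reload command here are assumptions for illustration, not dokku's actual layout:

```bash
#!/usr/bin/env bash
# Hypothetical hook for the proposed unicorn-nginx plugin.
# APP is passed in by dokku; the socket path and server_name are made up here.
APP="$1"
SOCKET="/var/run/dokku/$APP.sock"

cat > "$DOKKU_ROOT/$APP/nginx.conf" <<EOF
upstream $APP {
  server unix:$SOCKET fail_timeout=0;
}

server {
  listen 80;
  server_name $APP.example.com;

  location / {
    proxy_set_header Host \$http_host;
    proxy_set_header X-Forwarded-For \$remote_addr;
    proxy_pass http://$APP;
  }
}
EOF

# Reload the host nginx so the new vhost takes effect
sudo /etc/init.d/nginx reload
```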

@tspacek

tspacek commented Dec 19, 2013

This is the same conclusion I came to over the last few hours. It sounds like, the socket aside, this would be applicable to most deploys though?

@ryana
Author

ryana commented Dec 19, 2013

Yeah this is more of a general "oh crap that's right we can't use the zero-downtime features of unicorn when we're using containers" -- so actually, it could (should) probably be generalized to not be unicorn specific at all. Something like:

  1. start the new container
  2. don't kill old container
  3. wait until a GET succeeds on the new container $PORT
  4. rewrite host nginx conf & reload it
  5. kill old container
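
A rough bash sketch of that sequence, assuming the app exposes port 5000 inside the container the way dokku's standard buildpack apps do (file names, the 30-second cap, and the reload command are illustrative):

```bash
#!/usr/bin/env bash
set -e
APP="$1"

# 2. remember the old container instead of killing it
OLD_CONTAINER=$(cat "$DOKKU_ROOT/$APP/CONTAINER")

# 1. start the new container
NEW_CONTAINER=$(docker run -d "dokku/$APP" /bin/bash -c "/start web")
NEW_PORT=$(docker port "$NEW_CONTAINER" 5000 | cut -d: -f2)

# 3. wait until a GET on the new container's $PORT succeeds
up=""
for i in $(seq 1 30); do
  if curl -fs "http://localhost:$NEW_PORT/" > /dev/null; then up=1; break; fi
  sleep 1
done
if [ -z "$up" ]; then
  echo "new container never answered; keeping the old one" >&2
  docker kill "$NEW_CONTAINER" > /dev/null
  exit 1
fi

# 4. rewrite the host nginx conf to proxy to $NEW_PORT, then reload it
echo "$NEW_PORT" > "$DOKKU_ROOT/$APP/PORT"
echo "$NEW_CONTAINER" > "$DOKKU_ROOT/$APP/CONTAINER"
# ... regenerate $DOKKU_ROOT/$APP/nginx.conf here ...
sudo /etc/init.d/nginx reload

# 5. kill the old container
docker kill "$OLD_CONTAINER" > /dev/null
```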

@tspacek

tspacek commented Dec 19, 2013

This would be really useful for my cases, and I suspect for others too. 

Sounds conceptually similar to https://devcenter.heroku.com/articles/labs-preboot



@ryana
Author

ryana commented Dec 19, 2013

Yeah. I haven't contributed much to open source lately... I think I'm due ;) Give me the weekend and I should have something worked out.

@plietar
Contributor

plietar commented Dec 19, 2013

That sounds good.
The call to the post-deploy hook should only happen once the container is ready.
Killing the old container/replacing CONTAINER should be moved to the post-deploy step.

The problem is getting this to work in a generic manner.
A GET / might not be best for all apps.
Also, what happens if an app crashes/hangs? We would need a (configurable) timeout.

All of that could probably be implemented as an extension to the dokku ps command (PR #298).

@tspacek

tspacek commented Dec 19, 2013

Heroku seems to just wait 3 mins before switching over (with preboot), which is still better than the current sequential stop then start.

The downside is if you're running any workers in the same container they'll be ticking away doubled up for a while.


@ryana
Author

ryana commented Dec 19, 2013

Yeah, what I have in mind:

  • Configurable timeout on the new container start
  • Configurable GET string
  • Another (also configurable) timeout to wait for & make sure the old container actually dies, so that resources aren't getting eaten up

I don't like the 3 minute thing. I'd rather do it right w/ a GET.
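
As a minimal sketch of how those knobs could be exposed, assuming the plugin reads them from the app's environment with sane defaults (the variable names below are hypothetical, not an existing dokku convention):

```bash
# Hypothetical per-app overrides, read from the app's environment
CHECK_PATH="${DOKKU_CHECK_PATH:-/}"          # GET string to probe on the new container
CHECK_TIMEOUT="${DOKKU_CHECK_TIMEOUT:-30}"   # seconds to wait for the new container to answer
KILL_TIMEOUT="${DOKKU_KILL_TIMEOUT:-10}"     # seconds to wait for the old container to die
```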

@plietar when you talk about moving the container kill to the post-deploy, are you suggesting that I look at making changes to dokku? Or should I start w/ wrapping all this up in a plugin that follows the current hook interface?

@tspacek

tspacek commented Jan 13, 2014

Anything I can do to help on this?

@ryana
Author

ryana commented Jan 17, 2014

Finally getting around to it now... hoping for a PR you can look at this evening...

@ryana
Author

ryana commented Jan 17, 2014

Alright here's what I'm proposing: https://github.com/ryana/dokku/compare/zero-downtime?expand=1

You need to disable the nginx-vhosts plugin, but I just tested this a few times and it works. Actually, after thinking about it for a few minutes, this could probably just replace the nginx-vhosts plugin.... What do we think about that?

@tspacek

tspacek commented Jan 17, 2014

I'll give this a whirl over the weekend.

I don't know enough about everything else going on to say whether replacing nginx-vhosts would be a good idea.

@ryana
Author

ryana commented Jan 17, 2014

Awesome. I'll probably be pulling it out into a standalone plugin and putting it into production today. The reason I said it could replace nginx-vhosts is that it's very heavily based on nginx-vhosts. It's really just that plugin with a bunch of code moved around and a looping curl to wait until the new container is up. It needs some configuration options, but I want to make sure it's working first.

@plietar
Contributor

plietar commented Jan 17, 2014

@ryana There's already a lot of good work being done on the nginx integration by @mikexstudios (multiple domains support, ...)
Rather than modifying it, you could make an nginx-prereload hook and loop with curl in there.

You might also want to have a look at #298 for better process management, even though not much work has been done there.

@ryana
Author

ryana commented Jan 17, 2014

Oh cool. Somehow I missed when you mentioned #298 in your earlier comment.

I saw the nginx-prereload hook, but due to its placement after re-writing nginx.conf, if the new container doesn't start you're left with an nginx.conf on the FS that doesn't reflect nginx's current state. That feels wrong, no?

Also, regarding the dokku ps stuff, any idea on the timeline for getting that pulled in?

@plietar
Contributor

plietar commented Jan 18, 2014

This is certainly wrong, I hadn't thought about it.
Also, the nginx part should be kept as separate as possible from the rest of dokku.

I'd suggest adding a deploy-check hook that would run between the actual deploy and the post-deploy hook. (Or nested in a post-deploy hook with high priority, so that it runs before the nginx stuff, e.g. 00_dokku-standard.)

Based on the exit code of that hook, either continue with post-deploy (exit code 0), wait a few seconds and try again, or fail.

This way we can extend the criteria for detection with new plugins. If at least one plugin returns a non-zero value, pluginhook aborts all plugins and returns it.
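
Purely as an illustration, the core deploy path could look something like this -- the hook name check-deploy, the retry count, and the sleep are assumptions here, not an existing dokku interface:

```bash
# After the new container has been started, before nginx is touched:
ATTEMPTS=5
for attempt in $(seq 1 $ATTEMPTS); do
  # every plugin implementing the hook gets a say; pluginhook stops and
  # returns non-zero as soon as any plugin fails
  if pluginhook check-deploy "$APP" "$NEW_CONTAINER" "$NEW_PORT"; then
    break                     # exit code 0: continue with post-deploy
  fi
  if [ "$attempt" -eq "$ATTEMPTS" ]; then
    echo "deploy checks never passed, failing the deploy" >&2
    exit 1
  fi
  sleep 5                     # wait a few seconds and try again
done

pluginhook post-deploy "$APP" "$NEW_PORT"
```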

@plietar
Contributor

plietar commented Jan 18, 2014

Concerning dokku ps, @slaskis put in a lot of effort initially, but hasn't updated it much since.
So there's no timeline for now, until slaskis or someone else has time to complete it.

@slaskis

slaskis commented Jan 18, 2014

Oh this looks really interesting @ryana. Zero downtime is an issue with ps that I didn't get around to solving.

Mainly because of the way I used named docker containers, which was really nice and convenient, but it doesn't allow two containers with the same name (and renaming them was not possible). So to get it to work I'm thinking it would have to move away from named docker containers and store the container id in a file, something like $APP/ps/$name, which would simply contain the container id.

But I have been running ps in production for a few months now and I have found some other kinks that need to be fixed as well. And I'm curious to try it with a newer docker too; that might solve a few instabilities.

@josegonzalez
Member

You can now specify checks in a CHECKS file as of #562 (which was released in v0.3.0). Change the DOKKU_CHECKS_WAIT environment variable to your desired amount (~30 seconds in this case).
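
For reference, the CHECKS file lives in the root of the app repository and lists request paths, optionally followed by content that must appear in the response; the paths and expected strings below are only examples, and the exact semantics are in the dokku checks documentation:

```
# CHECKS -- one check per line: <path> <expected content (optional)>
/             Welcome to My App
/healthcheck  ok
```

The wait can then be tuned per app with something like `dokku config:set myapp DOKKU_CHECKS_WAIT=30` (the app name is just an example).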

Closing.
