Change webserver to puma and adapt command line interface #677

Open
wants to merge 4 commits into
from

Projects

None yet

3 participants

@qbrossard
Contributor

The currently used thin webserver is not compatible with jruby.

The puma webserver is a good replacement as it's compatible with most ruby implementations. Moreover it's very fast.

@tylermauthe
Contributor

Awesome! Someone took a shot at this in #427 and I was going to find time to clean that up so it could be merged... but if you've got it going here that's great!

I don't know that we want to start running the tests with JRuby though, I think the most common use case will still be to run dashing as a Ruby app. My understanding was that switching to Puma will allow us to run in a JVM as well as in a normal Ruby context -- is that correct?

@qbrossard
Contributor

It's correct, puma works fine on all ruby VMs. Thanks for mentioning the work already done in #427. I will integrate it.
I think we should be able to solve the daemonizing issue mentioned in the comments to #427, maybe not for java but at least for MRI. Heroku have good setup instructions for puma (https://devcenter.heroku.com/articles/deploying-rails-applications-with-the-puma-web-server) that might help too.

@qbrossard qbrossard changed the title from Changed webserver dependency to puma (for jruby compatibility) to Change webserver to puma and adapt command line interface Feb 15, 2016
@qbrossard
Contributor

I integrated most of the work previously done to support puma. I also used pumactl for supporting daemon mode (-d) for backwards compatibility. There's still one small open point regarding the location of puma's pidfile (needed in daemon mode by pumactl to stop the server). Otherwise I think this would be ready. Can @tylermauthe or @pushmatrix have a look at this?
Note that I also opened a separate PR (#683) for fixing the travis config.

@tylermauthe
Contributor

At a glance, this looks great. I'll give it a more thorough look at lunch today and let you know if I find any issues. Also, any chance you've tried this on Windows?

This is amazing! I've been putting off doing this for ages, thanks so much!

@tylermauthe
Contributor

Okay, I took a quick peak at this using the generated sample Dashing project -- it looks great. Exactly the behaviour I was hoping for. I thought it might be possible to specify the web server in the config.ru, but this is great!

I'd like to do some testing to see what happens when you try to start Dashing as a daemon on Windows -- I suspect it fails because Windows doesn't implement fork(). I'd also like to make sure the changeset plays nicely with existing dashboards -- though I don't see how moving to Puma would affect them.

I should be able to complete the rest of the testing tonight, so unless @pushmatrix finds any issues, we should be able to merge this sucker within the week. Thanks again!! 👍

@qbrossard
Contributor

Thanks for taking a look at this.
Regarding windows support for forking: it's effectively not working, even on JRuby. The puma README has some info on this under https://github.com/puma/puma#platform-constraints.
I also had a look at how rails configures the puma server, now that it's their default and it seems they generate a puma.rb config file under /config (it gets picked up automatically then). This might be a better long term solution as everything (port, pidfile, # threads) can then be configured in the file.

I will also update the wiki with information on puma when this PR gets merged.
You'll be able to close PRs #609 and #427 when you merge this one as it supercedes them.

@tylermauthe
Contributor

Yeah, I think the lack of Windows support for that is okay in this case. I believe that Thin didn't support it either:
https://github.com/macournoyer/thin/blob/master/lib/thin/daemonizing.rb#L40

Windows does not have the concept of a daemon in this sense, so it's okay that the feature doesn't work. It looks like Puma handles this gracefully as well:
https://github.com/puma/puma/blob/33e0fa9999de17e85ed7e064d5f8ede8dc009014/lib/puma/launcher.rb#L73

I tried this on Heroku and it is working fine so far, though I can't tell if it's actually using Puma -- I don't have a Procfile so I think it's just reading my config.ru and just plugging my code into whatever server is best for their dynos... I may try overriding this behaviour and see what happens.

I was also able to try this locally with two dashboard applications, they both worked well with dashing start and dashing start -d. However, I was a little sad that the port number changed to 9292. I can live with this.

The only issue I've found is that the EventSource is producing constant errors and is essentially being cut off directly after the SSE call returns the first datapoint. This seems to be a problem with Puma itself:
sinatra/sinatra#1035

The issue was identified a long time ago in Puma and was essentially closed as "won't fix":
puma/puma#3

Sadly, I think that this may require changes to Sinatra to resolve the issue properly. However, I did find an interesting article wherein Aaron Patterson describes how he created a Server Sent Event class for use in Puma:
http://tenderlovemaking.com/2012/07/30/is-it-live.html

It might be possible to use this technique to make it work for Dashing, since I am doubtful that Sinatra can be made to support this. The interface of the @io member in the SSE class would need to be adapted for Sinatra's Stream class. Alternatively, we could swap the EventSource for something like SockJS and just do xhr-streaming / long-polling -- not really ideal but it's an option.

@tylermauthe
Contributor

This is a problem for lots of people, not just us! I've pinged the Sinatra team to see if they're cool with adding in a Puma adapter: sinatra/sinatra#1035 (comment)

@qbrossard
Contributor

I missed the SSE related errors when I tested, sorry for that. Then it seems we are blocked by missing sinatra support here. :-(

Regarding the default port, I will change it so it uses the same as before (9292 is the default puma port). I think the best way would be to use a config/puma.rb file that would be autodetected (as rails does now). Not sure if that can be made backwards compatible with the old -d command line option but I'll try...

@tylermauthe
Contributor

Hey, no worries at all on the SSE stuff -- I am so thankful for your assistance.

We can build in our own adapter until Sinatra fixes their Stream class. It could act as a proof of concept for the fix in Sinatra as well, as the adapter would essentially be a re-written copy of their Stream class which has support for EventMachine. This will require the creation of an adapter class which simply blocks the thread for the lifetime of the request. Puma's threading model should handle this gracefully, as proven by the Rails 4 implementation of SSE -- which doesn't rely on EventMachine since the switch to Puma.

I'll see what I can whip up and try to submit a PR to somewhere. Thanks again for this work! 🍻

@tylermauthe
Contributor

Okay, I was able to get this working without changes to Sinatra -- making use of the Stream class already provided in sinatra-contrib.

I'll submit a PR to your PR as soon as I get the test working with the new connection style.

@qbrossard
Contributor

Cool.
I merged your PR (in my repo). I will update mine to setup puma similarly to rails, using the config/puma.rb configuration file as this will make server start / stop and general configuration easier. I'll check how to make it backwards compatible with the -d option but otherwise this should be quite easy.
I'll try to do more testing of the whole too.

@pushmatrix
Member

@qbrossard @tylermauthe this is looking real good :)

@tylermauthe
Contributor

I am not sure how I feel about the fact that we now create a queue of events to send to each connection, this increases the memory requirements by at least 2 * connections * total size of history. To be fair, though, this should really only be a couple megabytes.

Also, when a connection is dropped we simply let this queue grow to 2 x the length of history and then remove it. I couldn't figure out how to actually respond to the disconnect event 👎. Again, there's waste and also a chance that poorly performing connections will lose data. I would argue that naturally throttling poor connections is a good idea anyway and that the waste is kept small by the max queue size.

In any case, this is the best I've come up with. If it performs well, I am inclined to keep it. If anyone has improvements, especially in the above two areas, I greatly appreciate any and all feedback!

Let's let this change stew for a while so we can test it thoroughly.

@tylermauthe tylermauthe referenced this pull request in sinatra/sinatra Feb 22, 2016
Open

Cannot keep stream open with puma server #1035

@tylermauthe
Contributor

As I feared, my dashboard running on an old-school Heroku-free instance is completely unstable with my latest changeset. It was working fine with your changes @qbrossard, though the SSE was failing behind the scenes. It should be noted that the dashboard worked fine, but the browser JS console was strewn with errors and SSE reconnection attempts.

I see this in the logs:

2016-02-23T04:08:41.857508+00:00 heroku[web.1]: Process running mem=561M(109.6%)
2016-02-23T04:08:41.857697+00:00 heroku[web.1]: Error R14 (Memory quota exceeded)

I haven't tested this theory, but my guess is that the need to keep track of events on a per-connection basis combined with the ad-hoc mechanism for removing dead connections has increased the memory requirements too much.

Options I can think of:

  • Find a way to reduce the memory footprint of this change (write to disk, find connection ended event, be smarter, etc)
  • Switch to using a AJAX polling instead of an SSE (super lame)
  • Profile the actual memory difference in running applications -- do we actually care? (pretty lame too)

Hopefully someone else can think of a better idea because I'm not in love with any of those!

@qbrossard
Contributor

This memory usage problem looks strange. Make sure you use ruby 2.3.0, as there seem to be unverified memory problems with puma on older (2.2) rubies?
I cleaned up the configuration to use the standard config/puma.rb config file, including handling of the -d arg to daemonize...
I can also reproduce the SSE errors in the browser using a newly generated project and your latest commits. I see the same behavior, errors in the js console but application working fine.

@tylermauthe
Contributor

That is very odd, I can't reproduce the SSE errors when running locally or on Heroku. I only left the connection open for a few minutes though, perhaps it dies after some time?

I'm starting to think that the memory problem is unrelated to my changes, but I haven't spent much time investigating this further.

A more worrying aspect of this change is that running in JRuby actually causes scary errors:

================================================================================
scheduler caught exception:
Detected invalid array contents due to unsynchronized modifications with concurrent users
org/jruby/RubyArray.java:1113:in `<<'
/Users/tmauthe/.rvm/gems/jruby-9.0.4.0/gems/dashing-1.3.4/lib/dashing/app.rb:142:in `block in send_event'
org/jruby/RubyHash.java:1343:in `each'
/Users/tmauthe/.rvm/gems/jruby-9.0.4.0/gems/dashing-1.3.4/lib/dashing/app.rb:138:in `send_event'
/Users/tmauthe/Developer/opensource/test_puma/jobs/buzzwords.rb:8:in `block in (root)'
org/jruby/RubyProc.java:318:in `call'
/Users/tmauthe/.rvm/gems/jruby-9.0.4.0/gems/rufus-scheduler-2.0.24/lib/rufus/sc/jobs.rb:229:in `trigger_block'
/Users/tmauthe/.rvm/gems/jruby-9.0.4.0/gems/rufus-scheduler-2.0.24/lib/rufus/sc/jobs.rb:204:in `block in trigger'
org/jruby/RubyProc.java:318:in `call'
/Users/tmauthe/.rvm/gems/jruby-9.0.4.0/gems/rufus-scheduler-2.0.24/lib/rufus/sc/scheduler.rb:430:in `block in trigger_job'
================================================================================

This is happening because in one thread I'm appending objects to a data structure and in another thread I'm draining objects out of it.

This page has a fairly good breakdown of what options there are to solve this:
https://github.com/jruby/jruby/wiki/Concurrency-in-jruby

Hopefully there is a clever way to avoid the need to share mutable data across threads as I have implemented. Sharing mutable data across threads is typically the most naive way to implement concurrency -- so maybe if inspiration strikes we can figure a smarter solution.

The trouble comes when we need to actually send events from the jobs. How can we ensure that the data acquired in the Rufus threads is shuttled out exactly once to all connections?

@qbrossard
Contributor

It looks like JRuby actually checks synchronization on array access, cool.
I think if you use a Mutex (JRuby supports them) for reading/writing to the client_events you should be good.

@tylermauthe
Contributor

I took another stab at it:

https://github.com/tylermauthe/dashing/tree/puma_webserver

I'm going to let it run for a bit see how stable it is. So far, it's looking much better.

I will submit another Pull Request as soon as I am done trying JRuby, Rubinius and MRI for a while. I would appreciate it if you could rebase the commits afterward to have a bit cleaner history. I'd do it myself, but I believe the owner of this Pull Request must do that.

I realized that with EventMachine gone, Rufus was also free to spawn many threads -- send_event was being called simultaneously, so by adding a mutex there I was able to send the event directly to the stream instead of needing to queue it. This should drastically decrease the memory requirements.

@tylermauthe
Contributor

Seems to be running quite stable. It's pretty fun to just crank up the MAX_THREADS and watch the sucker take off.

@tylermauthe
Contributor

Oh, I just realized that I forgot synchronize access to the history.yml file. This isn't an issue on Heroku because there is no writable file system...

I'll fix that up too.

@qbrossard
Contributor

@tylermauthe Thanks a lot for taking the time to fix the threading issue. I'll squash my commits to make the history cleaner. This is looking really good now...

Tyler Mauthe and others added some commits Feb 22, 2016
@qbrossard Tyler Mauthe 🐛 Modify SSE connection handling for Puma
- Keep track of events on per-client basis
- Clients get a UUID
- If a client's event queue grows too big it gets dropped
- If a Puma thread tries to send the next batch of events to a client and it's event queue is dropped, the queue will be created
- Updated tests to match above changes
2a82079
@qbrossard Tyler Mauthe 🐛 Properly fix SSE connections in Puma!
- Realized that Rufus can also be multi-threaded now
- Added a mutex for each connection, allowing safe access during send_event
- Don't need to track a queue of events per connection - should lower memory footprint
- Connection termination detected by catching Puma::ConnectionError exceptions during send_event conntion writing
- Updated tests accordingly
64d365c
@qbrossard Tyler Mauthe Cleanup imports 34ec439
@qbrossard qbrossard Switch server to puma, provide puma config and adapt command line
- configure puma using config/puma.rb config file
- add default puma.rb config file for new projects
- adapt dashing cli to use the config file and correctly handle -d (daemonize) arg.
- adapt cli_test.rb to the new setup
- remove tmp dir from gitignore as we want to generate it in the project template
- add tmp/pids/ dir in project template (for saving puma pid / state files.
ee64d7d
@qbrossard
Contributor

@tylermauthe I squashed all my commits together and left yours as they were.

@tylermauthe
Contributor

Oh, sorry my commits could be squashed too. Especially the two that are fix and actually fix - hehe. Oh well!

I am curious if anyone else has run this for any length of time? My dashboards are running fine with it...

@qbrossard - Thanks again for all your work on this! It's been super helpful and it gave me the kick in the pants I needed to get it done.

I think we could probably merge this now if you've given it the Shopify seal of approval @pushmatrix

@pushmatrix
Member

Thanks for you work on this @qbrossard and @tylermauthe. Let's get this thing merged this week. I'll be reviewing it shortly.

@pushmatrix pushmatrix commented on the diff Apr 1, 2016
lib/dashing/app.rb
out << latest_events
- out.callback { settings.connections.delete(out) }
+ settings.connections << connection = {out: out, mutex: Mutex.new, terminated: false}
@pushmatrix
pushmatrix Apr 1, 2016 Member

Each connection gets its own Mutex?

@tylermauthe
tylermauthe Apr 1, 2016 Contributor

Yes, it signifies the out resource which we are injecting events into inside send_event.

@pushmatrix pushmatrix commented on the diff Apr 1, 2016
lib/dashing/app.rb
out << latest_events
- out.callback { settings.connections.delete(out) }
+ settings.connections << connection = {out: out, mutex: Mutex.new, terminated: false}
+ terminated = false
+
+ loop do
+ connection[:mutex].synchronize do
+ terminated = true if connection[:terminated]
+ end
+ break if terminated
+ end
+
@pushmatrix
pushmatrix Apr 1, 2016 Member

Can you guys explain this section a bit more?

@tylermauthe
tylermauthe Apr 1, 2016 Contributor

Each request gets a thread in the thread pool until Puma hits the max and multiplexes the threads. The loop is basically just "keep the SSE request open until marked as terminated".

The connection is marked as terminated in send_event when there is an exception caught when trying to send to a dead connection. send_even is triggered from Rufus -- which is multithreaded in Puma.

@pushmatrix
pushmatrix Apr 1, 2016 Member

There's no way to keep alive a connection without doing an idle loop?

@tylermauthe
tylermauthe Apr 1, 2016 Contributor

I don't think so. I am not sure, but I think this is because Sinatra doesn't properly implement Rack hijack.

The while loop stuff comes from this: http://tenderlovemaking.com/2012/07/30/is-it-live.html

@tylermauthe
tylermauthe Apr 1, 2016 Contributor

The idea that Sinatra doesn't implement rack hijack properly comes from smart dudes comments here: sinatra/sinatra#1035 (comment)

But most of it is over my head.

@pushmatrix
Member

Hey @qbrossard and @tylermauthe.

I've given a lot of thought into this, and unfortunately I have changed my mind on putting this into master. I initially thought that the switch to puma would be straightforward, but it seems like there are many gotchas in the way, and that it doesn't play very nice with Sinatra. As this project has now become no longer actively maintained, I fear that adding a bunch of complexity may introduce new issues that may not be addressed. I also don't want to break any existing dashboards out there.

What I suggest we do is make a branch for it, with a wiki article on how to use it. That way people will be able to use JRuby if they want to, and they won't have to fear about missing out on features because....well....there probably won't be any new ones coming :\

Thanks guys for your help in diving into this! Hopefully it was a good learning process, and I hope it will be of use to other people looking to use Puma and Dashing.

Cheers!

@tylermauthe
Contributor

+1 for the Puma branch -- makes sense to me. There are a lot of major changes and I know I don't have the time to deal with the fallout from the impact of those changes on thousands of dashboards in the wild.

Good thing rubygems are easy to install from a branch :)

https://github.com/gottfrois/dashing-rails - Runs on Rails 4, so that's another option for those looking to get Puma support from Dashing.

@pushmatrix
Member

It's now an official branch over here: https://github.com/Shopify/dashing/tree/puma_webserver

If there any gotchas or specific things someone needs when running it, feel free to post them on this wiki: https://github.com/Shopify/dashing/wiki/How-To%3A-Run-Dashing-on-Puma

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment