Sails Cluster / Auto Reload #1716
Closed
wants to merge 15 commits into from

Conversation

jbielick

screen shot 2014-05-17 at 3 51 47 pm
This PR attempts to address the auto-reloading feature in the development environment using cluster processes.

It also allows sails to run with a cluster of processes.

Approach

The master proc bootstraps the app and starts the grunt task, but stops before actually starting the HTTP server. It then forks n workers in start.js, which all listen on and share the app's port using node's own cluster module. Worker processes are delegated the server.listen() call, and the worker count can be set with --workers n or in sails.config.workers.
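
For anyone unfamiliar with the cluster module, here's a minimal sketch of that master/worker split. The WORKERS environment variable and the bare http server below are stand-ins for this PR's --workers / sails.config.workers and the sails/express server; this is an illustration, not the actual start.js code.

```js
// Simplified illustration of the approach, not the real start.js.
var cluster = require('cluster');
var http = require('http');

// Stand-in for --workers n / sails.config.workers.
var workerCount = parseInt(process.env.WORKERS, 10) || 1;

if (cluster.isMaster) {
  // Master: bootstrap the app and run grunt here, but never call listen().
  for (var i = 0; i < workerCount; i++) {
    cluster.fork();
  }
} else {
  // Workers: only the forks bind to the port; node's cluster module shares
  // the underlying socket, so every fork serves the same port.
  http.createServer(function (req, res) { res.end('ok'); }).listen(1337);
}
```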

In the development environment, the master proc will also watch the ./api file tree and, on change-like events, cycle the cluster of workers with a kill signal. I didn't see a need to gracefully restart workers, since cycling them is a development-only task.

In the event that a worker dies, another is immediately forked in its place. You can test this by sending a kill through your terminal to one of the forks. It will replace itself.

Re-forking processes that die will also occur in the production environment, ensuring that workers are replaced in a long-running application.
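
A hedged sketch of the re-fork and development-cycling behaviour described in the last few paragraphs; the watcher below is illustrative only (a real implementation would walk ./api recursively), not the code in this branch:

```js
// Illustrative only; the PR's actual watch/cycle code may differ.
var cluster = require('cluster');
var fs = require('fs');

// Any worker that dies is immediately replaced (dev and production alike).
cluster.on('exit', function (worker, code, signal) {
  cluster.fork();
});

// In development, cycle the whole pool when something under ./api changes.
if (process.env.NODE_ENV !== 'production') {
  fs.watch('./api', function () {
    Object.keys(cluster.workers).forEach(function (id) {
      cluster.workers[id].kill(); // the 'exit' handler above re-forks it
    });
  });
}
```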

There's a monkey patch for the cluster module in start.js addressing the missing port attribute on the server. The issue hasn't yet been addressed in node, but with the provided patch, the tests and servers all run correctly. (The server runs correctly without it, but attempts to check address().port throw exceptions.)
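
For illustration, the symptom can also be guarded against in application code; this is a hypothetical guard, not the actual patch, and the fallback to a configured port is an assumption:

```js
// Hypothetical guard, not the monkey patch from start.js: inside a cluster
// worker, address() may come back without a usable port, so fall back to a
// configured value instead of throwing.
function safePort(server, configuredPort) {
  var addr = server.address();
  return (addr && addr.port) || configuredPort;
}
```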

I've taken screenshots of some preliminary benchmarks on concurrency and request timing with siege; the results can be seen below.

I'm sure I've missed some issues, but all tests pass and I'd be glad to work through whatever issues arise as well as strategize how to write better/more tests for this.

screen shot 2014-05-18 at 11 09 36 pm

Todo

  • Include warnings about too many workers.
  • Give workers a time to live and cycle them for freshness.
  • Test performance / connectivity with regards to database requests.
  • Gracefully cycle workers (if there's a need?)

Benchmarks

500 concurrent connections, 1 process
screen shot 2014-05-18 at 10 23 30 pm

500 concurrent connections, 10 workers
screen shot 2014-05-18 at 10 21 39 pm

300 concurrent connections, 1 process (master)
300 concurrent connections

300 concurrent connections, 8 workers (master)
300 concurrent connections

1000 concurrent connections, 1 process (master)
1000 concurrent connections

1000 concurrent connections, 8 workers (sails-cluster)
1000 concurrent connections

@@ -28,12 +29,15 @@ module.exports = function() {

// console.time('cli_rc');
var log = captains(rconf.log);
var workerCount = (1 * rconf.workers) || 1;
Contributor

Why 1*?

Author

Simple, concise string-to-int coercion: it returns NaN if the value isn't a coercible string, which then short-circuits to 1.

http://jibbering.com/faq/notes/type-conversion/#tcNumber
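
For illustration, here's how the two coercions compare, assuming rconf.workers arrives as a string from the CLI:

```js
// Both approaches fall back to 1 when coercion fails, via ||.
var workerCount;
workerCount = (1 * '4') || 1;             // 4
workerCount = (1 * 'abc') || 1;           // NaN || 1 -> 1
workerCount = parseInt('4', 10) || 1;     // 4
workerCount = parseInt('4abc', 10) || 1;  // 4 (parseInt tolerates trailing junk)
workerCount = (1 * '4abc') || 1;          // NaN || 1 -> 1 (1 * does not)
```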

Contributor

But doesn't parseInt do the same thing?

Author

No, parseInt counts on its fingers—out loud—while 1 * gets the answer from the person sitting next to it.

Contributor

Thanks for clearing that up..

@RWOverdijk
Contributor

Are there any tests for these changes?

@mikermcneil
Member

this looks really neat guys, just need to test it out

@jbielick thanks!

@luislobo
Contributor

It looks very promising!

@jbielick
Author

There are probably some things I didn't consider, but I wanted to open this PR to start a conversation about what some possible "gotchas" will be so that I can address them. Let me know how I can help!

@loicsaintroch
Contributor

@jbielick Any news about your interesting pull request? It looks great and we would love to merge it.

@Globegitter
Member

Just started to look into clustering myself as well using pm2 and seem to be running into some issues, so this PR would be greatly appreciated. @jbielick, is there still some major work that needs to be done?
Would be more than happy to help out with some testing, etc.

@jbielick
Author

jbielick commented Sep 5, 2014

@loicsaintroch Nothing new to report; I've mostly been waiting for feedback regarding the idea as a whole and any unforeseen issues (from a new contributor's perspective). I was mainly looking for a response on whether this was something the sails contributing team wanted to include in the roadmap and support, as it would obviously change the development environment a little bit. I imagine, with all the work that's been done since this was created, that some of the merge conflicts will take some time to go through.

I could see wanting to keep the multi-process enhancement as a concern outside of the sails framework, but it isn't very obtrusive and the heavy lifting is done in node.js core. The work I did in these commits clusters the whole application, but the same concept can be applied just to the development environment to make code reloading available.

I can get this PR up to speed, but unless there's interest in merging it, there will continue to be merge conflicts as time goes by. The reason for that is that I had to separate the sails framework hooks and bootstrapping into two separate concerns: 1.) the master process (running grunt, holding the cluster, bootstrapping) and 2.) the child processes (which should only run express threads and take requests). So for future development, there might often have to be a cluster.isMaster check for certain tasks. Frameworks like geddy js have even gone so far as to make master and child modules to better handle / organize the separation of concerns between the two.
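
For illustration, the kind of guard that implies going forward looks like this; the helper bodies are hypothetical stand-ins, not functions in this branch:

```js
var cluster = require('cluster');

// Hypothetical stand-ins for the two concerns described above.
function runGruntTasks() { /* grunt, asset building, file watching */ }
function startHttpServer() { /* express / server.listen() */ }

if (cluster.isMaster) {
  // Master-only concerns: grunt, bootstrapping, holding the worker pool.
  runGruntTasks();
  cluster.fork();
} else {
  // Worker-only concern: take HTTP requests.
  startHttpServer();
}
```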

Shall I get the PR up to date?

@felipecm

great work! I hope to see this merge soon...

@luislobo
Contributor

@jbielick Excellent work!
We are about to deploy a big sails app, and we need multi-core/cluster capabilities. A few things are an obstacle: .tmp being generated each time a sails app is run, which should be avoided when running in a cluster; this PR takes care of that by splitting into master/child processes. I like the idea. Also, the children need to run on different ports, or the master should distribute that load; I don't know right now how this is being handled. Finally, how does this work with PM2? Can they work together? Any specific warnings? Thanks!

@mikermcneil what about this? Do you think this could be added?

@campbellanderson
Contributor

@luislobo we are running cluster mode in PM2 and it's launching on different CPU cores. It's been running fine in a production environment for a while now.

You do get errors on startup about the .tmp directory, and binding to your preferred port.

@handleror

I need this. Hope it gets fixed and merged soon.

@gmickel

gmickel commented Oct 28, 2014

awesome, hope to see this merged soon too 👍

@thomasfr

Nice work! Hope this gets merged soon. Really a missing feature!
Anyone already using this somewhere?

@arcseldon

+1 Have done something very similar using cluster / siege on several occasions with MEAN stack apps. But have found in development there was rarely a need for clustering. In fact, it can get in the way of debugging as your debugger attaches to the master process instance, and breakpoints that occur on a child process are not hit. Therefore, in development simply using nodemon and a single process works well. However, for PRD deployments having this built-in cluster support would be excellent. As others have said, a much needed feature that is currently missing.

@jbielick
Author

jbielick commented Nov 4, 2014

@luislobo

I like the idea. Also, the children need to run on different ports, or the master should distribute that load; I don't know right now how this is being handled. Finally, how does this work with PM2? Can they work together? Any specific warnings? Thanks!

Node's cluster module supports load balancing and sharing of the port between the worker processes; Node v0.12 will even support round-robin load balancing. As for using pm2 and cluster together, I think that'd be redundant. It seems that from a DevOps perspective, pm2 would suit the needs of a production environment: a clustered application with worker processes and load balancing. I think it's pretty typical to find clustering / load balancing outside of the application framework, but Node is definitely moving forward with cluster support, and the API is such that it's best integrated with the application bootstrapping / setup, IMO. As stated by campbellanderson, the only thing stopping sails from running error-free in multiple processes is IO collision. There's no reason pm2 couldn't be the production-ready solution for sails apps, but it means that the sails asset build task ought to be separated from the code that starts the application, much like having rake assets:precompile and rails s as separate steps.

@arcseldon, I think the only real goal of clustering in development was to be able to kill / fork (cycle) one worker process to restart the application with new (changed) code. The current climate for debugging worker processes isn't great, and I'm afraid this PR doesn't address that. The best-case scenario I've seen is in Node ~0.11, where each worker will listen on its own debugger port, but killing a worker and forking means a new port. It isn't convenient, but I've got some working code based on that. I've also experimented with a monitor script that uses child_process.fork to fork a new process (in development) each time code changes, which may yield better results, but I'm not eager to go down that path yet.

It sounds like the consensus is that a fork-and-cycle approach in development only would be ideal, and that production clustering should be left the way it is until someone wants to address it or the sails team has comments on a roadmap there. Does that sound accurate?

@luislobo
Contributor

@jbielick Thank you for your answer and thorough analysis. I'm not a dedicated DevOps person, but a developer. Anyway, right now I need to manage our production servers. Can you explain further about "stopping sails from running error-free in multiple processes is IO collision"?
Right now, we have an NGINX EC2 server load balancing to N medium single-core servers (EC2), each running only one instance of a sails app (using PM2). I was thinking of replacing the single-core servers with multi-core servers, using PM2, running each app on a different port. Nginx would then load balance to those servers:ports. Do you see any issue with doing that?

@jbielick
Author

@luislobo Sounds pretty interesting. I'd love to see that in action. To clarify about

error-free in multiple processes

I haven't looked into it in a while, but as of this PR the only issue with clustering the sails process would be that each process tries to build assets into the .tmp/ directory. Since those processes (or forks) are all running in the same directory, they're all going to try to build into the same place: just a filesystem race condition. Otherwise, there might not be anything to worry about. The cool part about Node's cluster API is that it forks the process, runs the forks on the same port, and does the load balancing for you; that job is typically handled by nginx, PM2, or the like.

In fact, it can get in the way of debugging as your debugger attaches to the master process instance, and breakpoints that occur on a child process are not hit.

I've been looking into this quite a bit and there's no elegant solution yet. If you start a master process with --debug and a port, each child process spawned by cluster will use an incremented port. So if one proc is forked by the cluster master, it is given (by default) port 5859 for debug connections, but if that child process is killed and another one forked, the replacement's port will be 5860, then 5861, and so on.

Consider this gist
@arcseldon has a good point—this is cumbersome and won't work because you'd have to change the port of your node debug every time the fork was cycled.

But what about this?

I came up with this pattern in an effort to support development-environment server restarts. Obviously the trigger for a cycle wouldn't be an interval, but a file watch. In this pattern, process.argv is passed through (like sails CLI args) and execArgv is passed in, except that the child process is forked with a different debug port than the master. When the fork is sent a kill signal, it releases the debug port, and by the time it closes we can fork another process and bind to the same port. This way, you can keep the master bound on whatever you want and all of the forks that are created can be bound to the same port, making debugging in dev as simple as it should be. The only part that's potentially hackish is adding a --worker argument to the fork, because I didn't know of another way for a fork to know that it's...a fork. Ok, apparently process.send does not exist for non-forked processes, so without much further thought, it may be safe to check for the existence of process.send to know whether it's a fork.
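
Roughly, the pattern looks like this; it's a sketch of the gist's idea, not its code verbatim, and child.js, the debug port, and the --worker flag are placeholders:

```js
// Sketch: keep every fork on the same debug port by re-forking only after
// the previous fork has exited and released the port.
var fork = require('child_process').fork;

function spawnWorker() {
  var child = fork(
    './child.js',
    process.argv.slice(2).concat('--worker'), // pass CLI args through, mark it as a fork
    { execArgv: ['--debug=5859'] }            // every fork reuses the same debug port
  );
  child.on('exit', function () {
    // Port 5859 is free again by now, so the debugger target never moves.
    spawnWorker();
  });
}

spawnWorker();
```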

Anyway, some of the thoughts here are great. PM2 is probably a better option for clustering a sails app, because you may well want more granular control / monitoring than just ps -ef | grep node. But we'll see what happens with the cluster API.
For a development solution, is this a viable approach to server restarts on file changes? I know much less about how api/ objects are loaded / required, but I guess another option is to play with deleting them from the warm require cache (but that's more hacky).
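
For reference, "deleting them from the require cache" would look something like this for a single module; the controller path is just an example:

```js
// Evict one module from the require cache so the next require() re-reads
// the file from disk with the changed code.
var resolved = require.resolve('./api/controllers/UserController.js');
delete require.cache[resolved];
var UserController = require('./api/controllers/UserController.js');
```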

@mikermcneil
Member

server restarts for file changes

we've got a hook in-progress for that part

@mikermcneil
Member

@jbielick re: clustering and this PR: this has spawned a really awesome discussion. Seems to me the right approach for clustering right now is probably pm2. It seems like the big annoyance is dealing with the production grunt stuff; the trick I always use when I run into trouble with that is to disable grunt in production by setting { hooks: { grunt: false } } as part of the production config (i.e. using a custom .sailsrc file).
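
For anyone following along, one hedged way to apply that override programmatically (an alternative to the custom .sailsrc he mentions; app.js here is the usual sails entry point):

```js
// Hypothetical app.js tweak: lift without the grunt hook in production.
var sails = require('sails');

var liftOptions = {};
if (process.env.NODE_ENV === 'production') {
  liftOptions.hooks = { grunt: false };
}

sails.lift(liftOptions);
```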

@mikermcneil closed this Dec 3, 2014