Sails Cluster / Auto Reload #1716
Closed
wants to merge 15 commits into from

Conversation

jbielick

screen shot 2014-05-17 at 3 51 47 pm
This PR attempts to address the auto-reloading feature in the development environment using cluster processes.

It also allows sails to run with a cluster of processes.

Approach

The master proc bootstraps the app and starts the grunt task, but stops before actually starting the HTTP server. It then forks n workers in start.js, which all listen on and share the app's port using node's own cluster module. Worker processes are delegated the server.listen() call, and the worker count can be set with --workers n or in sails.config.workers.
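
For anyone unfamiliar with the cluster module, here's a minimal sketch of that master/worker split. The WORKERS environment variable and the bare http server below are stand-ins for this PR's --workers / sails.config.workers and the sails/express server; this is an illustration, not the actual start.js code.

```js
// Simplified illustration of the approach, not the real start.js.
var cluster = require('cluster');
var http = require('http');

// Stand-in for --workers n / sails.config.workers.
var workerCount = parseInt(process.env.WORKERS, 10) || 1;

if (cluster.isMaster) {
  // Master: bootstrap the app and run grunt here, but never call listen().
  for (var i = 0; i < workerCount; i++) {
    cluster.fork();
  }
} else {
  // Workers: only the forks bind to the port; node's cluster module shares
  // the underlying socket, so every fork serves the same port.
  http.createServer(function (req, res) { res.end('ok'); }).listen(1337);
}
```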

In the development environment, the master proc will also watch the ./api file tree and, on change-like events, cycle the cluster of workers with a kill signal. I didn't see a need to gracefully restart workers, since cycling them is a development-only task.

In the event that a worker dies, another is immediately forked in its place. You can test this by sending a kill through your terminal to one of the forks. It will replace itself.

Re-forking processes that die will also occur in the production environment, ensuring that workers are replaced in a long-running application.
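
A hedged sketch of the re-fork and development-cycling behaviour described in the last few paragraphs; the watcher below is illustrative only (a real implementation would walk ./api recursively), not the code in this branch:

```js
// Illustrative only; the PR's actual watch/cycle code may differ.
var cluster = require('cluster');
var fs = require('fs');

// Any worker that dies is immediately replaced (dev and production alike).
cluster.on('exit', function (worker, code, signal) {
  cluster.fork();
});

// In development, cycle the whole pool when something under ./api changes.
if (process.env.NODE_ENV !== 'production') {
  fs.watch('./api', function () {
    Object.keys(cluster.workers).forEach(function (id) {
      cluster.workers[id].kill(); // the 'exit' handler above re-forks it
    });
  });
}
```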

There's a monkey patch for the cluster module in start.js addressing the missing port attribute on the server. The issue hasn't yet been addressed in node, but with the provided patch, the tests and servers all run correctly. (The server runs correctly without it, but attempts to check address().port throw exceptions.)
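
For illustration, the symptom can also be guarded against in application code; this is a hypothetical guard, not the actual patch, and the fallback to a configured port is an assumption:

```js
// Hypothetical guard, not the monkey patch from start.js: inside a cluster
// worker, address() may come back without a usable port, so fall back to a
// configured value instead of throwing.
function safePort(server, configuredPort) {
  var addr = server.address();
  return (addr && addr.port) || configuredPort;
}
```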

I've taken screenshots of some preliminary benchmarks on concurrency and request timing with siege; the results can be seen below.

I'm sure I've missed some issues, but all tests pass and I'd be glad to work through whatever issues arise as well as strategize how to write better/more tests for this.

screen shot 2014-05-18 at 11 09 36 pm

Todo

  • Include warnings about too many workers.
  • Give workers a time to live and cycle them for freshness.
  • Test performance / connectivity with regards to database requests.
  • Gracefully cycle workers (if there's a need?)

Benchmarks

500 concurrent connections, 1 process
screen shot 2014-05-18 at 10 23 30 pm

500 concurrent connections, 10 workers
screen shot 2014-05-18 at 10 21 39 pm

300 concurrent connections, 1 process (master)
300 concurrent connections

300 concurrent connections, 8 workers (master)
300 concurrent connections

1000 concurrent connections, 1 process (master)
1000 concurrent connections

1000 concurrent connections, 8 workers (sails-cluster)
1000 concurrent connections

@@ -28,12 +29,15 @@ module.exports = function() {

// console.time('cli_rc');
var log = captains(rconf.log);
var workerCount = (1 * rconf.workers) || 1;
Contributor

Why 1*?

Author

Simple, concise string-to-int coercion: it returns NaN if the value isn't a coercible string, which then short-circuits to 1.

http://jibbering.com/faq/notes/type-conversion/#tcNumber
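
For illustration, here's how the two coercions compare, assuming rconf.workers arrives as a string from the CLI:

```js
// Both approaches fall back to 1 when coercion fails, via ||.
var workerCount;
workerCount = (1 * '4') || 1;             // 4
workerCount = (1 * 'abc') || 1;           // NaN || 1 -> 1
workerCount = parseInt('4', 10) || 1;     // 4
workerCount = parseInt('4abc', 10) || 1;  // 4 (parseInt tolerates trailing junk)
workerCount = (1 * '4abc') || 1;          // NaN || 1 -> 1 (1 * does not)
```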

Contributor

But doesn't parseInt do the same thing?

Author

No, parseInt counts on its fingers—out loud—while 1 * gets the answer from the person sitting next to it.

Contributor

Thanks for clearing that up..

@RWOverdijk
Contributor

Are there any tests for these changes?

@mikermcneil
Member

this looks really neat guys, just need to test it out

@jbielick thanks!

@luislobo
Contributor

It looks very promising!

@jbielick
Author

There are probably some things I didn't consider, but I wanted to open this PR to start a conversation about what some possible "gotchas" will be so that I can address them. Let me know how I can help!

@loicsaintroch
Contributor

@jbielick Any news about your interesting pull request? It looks great and we would love to merge it.

@Globegitter
Member

Just started to look into clustering myself as well using pm2 and seem to be running into some issues, so this PR would be greatly appreciated. @jbielick, is there still some major work that needs to be done?
Would be more than happy to help out with some testing, etc.

@jbielick
Author

jbielick commented Sep 5, 2014

@loicsaintroch Nothing new to report; I've mostly been waiting for feedback regarding the idea as a whole and any unforeseen issues (from a new contributor's perspective). I was mainly looking for a response on whether this was something the sails contributing team wanted to include in the roadmap and support, as it would obviously change the development environment a little bit. I imagine, with all the work that's been done since this was created, that some of the merge conflicts will take some time to go through.

I could see wanting to keep the multi-process enhancement as a concern outside of the sails framework, but it isn't very obtrusive and the heavy lifting is done in node.js core. The work I did in these commits clusters the whole application, but the same concept can be applied just to the development environment to make code reloading available.

I can get this PR up to speed, but unless there's interest in merging it, there will continue to be merge conflicts as time goes by. The reason for that is that I had to separate the sails framework hooks and bootstrapping into two separate concerns: 1.) the master process (running grunt, holding the cluster, bootstrapping) and 2.) the child processes (which should only run express threads and take requests). So for future development, there might often have to be a cluster.isMaster check for certain tasks. Frameworks like geddy js have even gone so far as to make master and child modules to better handle / organize the separation of concerns between the two.
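
For illustration, the kind of guard that implies going forward looks like this; the helper bodies are hypothetical stand-ins, not functions in this branch:

```js
var cluster = require('cluster');

// Hypothetical stand-ins for the two concerns described above.
function runGruntTasks() { /* grunt, asset building, file watching */ }
function startHttpServer() { /* express / server.listen() */ }

if (cluster.isMaster) {
  // Master-only concerns: grunt, bootstrapping, holding the worker pool.
  runGruntTasks();
  cluster.fork();
} else {
  // Worker-only concern: take HTTP requests.
  startHttpServer();
}
```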

Shall I get the PR up to date?

@felipecm

great work! I hope to see this merge soon...

@luislobo
Contributor

@jbielick Excellent work!
We are about to deploy a big sails app, and we need multi-core/cluster capabilities. A few things are an obstacle: .tmp being generated each time a sails app is run, which should be avoided when running in a cluster; this PR takes care of that by splitting into master/child processes. I like the idea. Also, the children need to run on different ports, or the master should distribute that load; I don't know right now how this is being handled. Finally, how does this work with PM2? Can they work together? Any specific warnings? Thanks!

@mikermcneil what about this? Do you think this could be added?

@campbellanderson
Contributor

@luislobo we are running cluster mode in PM2 and it's launching on different CPU cores. It's been running fine in a production environment for a while now.

You do get errors on startup about the .tmp directory, and binding to your preferred port.

@handleror

I need this. Hope it gets fixed and merged soon.

@gmickel

gmickel commented Oct 28, 2014

awesome, hope to see this merged soon too 👍

@thomasfr

Nice work! Hope this gets merged soon. Really a missing feature!
Anyone already using this somewhere?

@arcseldon

+1 Have done something very similar using cluster / siege on several occasions with MEAN stack apps. But have found in development there was rarely a need for clustering. In fact, it can get in the way of debugging as your debugger attaches to the master process instance, and breakpoints that occur on a child process are not hit. Therefore, in development simply using nodemon and a single process works well. However, for PRD deployments having this built-in cluster support would be excellent. As others have said, a much needed feature that is currently missing.

@jbielick
Author

jbielick commented Nov 4, 2014

@luislobo

I like the idea. Also, the children need to run on different ports, or the master should distribute that load; I don't know right now how this is being handled. Finally, how does this work with PM2? Can they work together? Any specific warnings? Thanks!

Node's cluster module supports load balancing and sharing of the port between the worker processes; Node v0.12 will even support round-robin load balancing. As for using pm2 and cluster together, I think that'd be redundant. It seems that from a DevOps perspective, pm2 would suit the needs of a production environment: a clustered application with worker processes and load balancing. I think it's pretty typical to find clustering / load balancing outside of the application framework, but Node is definitely moving forward with cluster support, and the API is such that it's best integrated with the application bootstrapping / setup, IMO. As stated by campbellanderson, the only thing stopping sails from running error-free in multiple processes is IO collision. There's no reason pm2 couldn't be the production-ready solution for sails apps, but it means that the sails asset build task ought to be separated from the code that starts the application, much like having rake assets:precompile and rails s as separate steps.

@arcseldon, I think the only real goal of clustering in development was to be able to kill / fork (cycle) one worker process to restart the application with new (changed) code. The current climate for debugging worker processes isn't great, and I'm afraid this PR doesn't address that. The best-case scenario I've seen is in Node ~0.11, where each worker will listen on its own debugger port, but killing a worker and forking means a new port. It isn't convenient, but I've got some working code based on that. I've also experimented with a monitor script that uses child_process.fork to fork a new process (in development) each time code changes, which may yield better results, but I'm not eager to go down that path yet.

It sounds like the consensus is that a fork-and-cycle approach in development only would be ideal, and that production clustering should be left the way it is until someone wants to address it or the sails team has comments on a roadmap there. Does that sound accurate?

@luislobo
Contributor

@jbielick Thank you for your answer and thorough analysis. I'm not a dedicated DevOps person, but a developer. Anyway, right now I need to manage our production servers. Can you explain further about "stopping sails from running error-free in multiple processes is IO collision"?
Right now, we have an NGINX EC2 server load balancing to N medium single-core servers (EC2), each running only one instance of a sails app (using PM2). I was thinking of replacing the single-core servers with multi-core servers, using PM2, running each app on a different port. Nginx would then load balance to those servers:ports. Do you see any issue with doing that?

@jbielick
Author

@luislobo Sounds pretty interesting. I'd love to see that in action. To clarify about

error-free in multiple processes

I haven't looked into it in a while, but as of this PR the only issue with clustering the sails process would be that each process tries to build assets into the .tmp/ directory. Since those processes (or forks) are all running in the same directory, they're all going to try to build into the same place: just a filesystem race condition. Otherwise, there might not be anything to worry about. The cool part about Node's cluster API is that it forks the process, runs the forks on the same port, and does the load balancing for you; that job is typically handled by nginx, PM2, or the like.

In fact, it can get in the way of debugging as your debugger attaches to the master process instance, and breakpoints that occur on a child process are not hit.

I've been looking into this quite a bit and there's no elegant solution yet. If you start a master process with --debug and a port, each child process spawned by cluster will use an incremented port. So if one proc is forked by the cluster master, it is given (by default) port 5859 for debug connections, but if that child process is killed and another one forked, the replacement's port will be 5860, then 5861, and so on.

Consider this gist
@arcseldon has a good point—this is cumbersome and won't work because you'd have to change the port of your node debug every time the fork was cycled.

But what about this?

I came up with this pattern in an effort to support development-environment server restarts. Obviously the trigger for a cycle wouldn't be an interval, but a file watch. In this pattern, process.argv is passed through (like sails CLI args) and execArgv is passed in, except that the child process is forked with a different debug port than the master. When the fork is sent a kill signal, it releases the debug port, and by the time it closes we can fork another process and bind to the same port. This way, you can keep the master bound on whatever you want and all of the forks that are created can be bound to the same port, making debugging in dev as simple as it should be. The only part that's potentially hackish is adding a --worker argument to the fork, because I didn't know of another way for a fork to know that it's...a fork. Ok, apparently process.send does not exist for non-forked processes, so without much further thought, it may be safe to check for the existence of process.send to know whether it's a fork.
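
Roughly, the pattern looks like this; it's a sketch of the gist's idea, not its code verbatim, and child.js, the debug port, and the --worker flag are placeholders:

```js
// Sketch: keep every fork on the same debug port by re-forking only after
// the previous fork has exited and released the port.
var fork = require('child_process').fork;

function spawnWorker() {
  var child = fork(
    './child.js',
    process.argv.slice(2).concat('--worker'), // pass CLI args through, mark it as a fork
    { execArgv: ['--debug=5859'] }            // every fork reuses the same debug port
  );
  child.on('exit', function () {
    // Port 5859 is free again by now, so the debugger target never moves.
    spawnWorker();
  });
}

spawnWorker();
```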

Anyway, some of the thoughts here are great. PM2 is probably a better option for clustering a sails app, because you may well want more granular control / monitoring than just ps -ef | grep node. But we'll see what happens with the cluster API.
For a development solution, is this a viable approach to server restarts on file changes? I know much less about how api/ objects are loaded / required, but I guess another option is to play with deleting them from the warm require cache (but that's more hacky).
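
For reference, "deleting them from the require cache" would look something like this for a single module; the controller path is just an example:

```js
// Evict one module from the require cache so the next require() re-reads
// the file from disk with the changed code.
var resolved = require.resolve('./api/controllers/UserController.js');
delete require.cache[resolved];
var UserController = require('./api/controllers/UserController.js');
```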

@mikermcneil
Member

server restarts for file changes

we've got a hook in-progress for that part

@mikermcneil
Member

@jbielick re: clustering and this PR: this has spawned a really awesome discussion. Seems to me the right approach for clustering right now is probably pm2. It seems like the big annoyance is dealing with the production grunt stuff; the trick I always use when I run into trouble with that is to disable grunt in production by setting { hooks: { grunt: false } } as part of the production config (i.e. using a custom .sailsrc file).
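
For anyone following along, one hedged way to apply that override programmatically (an alternative to the custom .sailsrc he mentions; app.js here is the usual sails entry point):

```js
// Hypothetical app.js tweak: lift without the grunt hook in production.
var sails = require('sails');

var liftOptions = {};
if (process.env.NODE_ENV === 'production') {
  liftOptions.hooks = { grunt: false };
}

sails.lift(liftOptions);
```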

@mikermcneil closed this Dec 3, 2014