performance issues (pcntl_fork overhead?) #34

Closed
kballenegger opened this issue Jan 6, 2012 · 16 comments

@kballenegger

So I've started playing with a deployment of this in production and seem to be having performance issues with pcntl_fork. Processing an empty job (one that just contains an error_log statement) takes over 50ms, and our queue needs to be able to process more than 1000 items per second.

I'm thinking I might have to fork the project and change the behavior so that instead of forking on every job, it forks once every X jobs (where X > 100). My concern is that I haven't figured out a way to communicate between processes in PHP that would let me pass the job object about to be processed back from the child to the parent. Ideas?

@chrisboulton
Owner

I know it's not really a solution, but running multiple workers checking the same queues might be beneficial for you to get higher throughput.

Forking is by its very nature "slow", so I'm not too shocked to hear the performance isn't where you need it to be. Forking every X jobs might be a nice idea and something that could be built into the core.

In terms of communication between the processes, you could use socket_create_pair to create a pair of connected sockets shared by the parent and child processes. You'll want to look into IPC (inter-process communication).
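
A minimal sketch of that socket_create_pair approach (not php-resque code; the payload and job names are just illustrative): the child reports the job it is about to run over the socket, and the parent reads it before reaping the child.

```php
<?php
// Minimal parent/child IPC sketch with socket_create_pair (illustrative only).
$pair = array();
if (!socket_create_pair(AF_UNIX, SOCK_STREAM, 0, $pair)) {
    die('socket_create_pair failed: ' . socket_strerror(socket_last_error()));
}

$pid = pcntl_fork();
if ($pid === -1) {
    die('fork failed');
}

if ($pid === 0) {
    // Child: tell the parent which job we're about to process, then do the work.
    socket_close($pair[0]);
    $payload = json_encode(array('class' => 'My_Job', 'args' => array('id' => 42)));
    socket_write($pair[1], $payload . "\n");
    socket_close($pair[1]);
    exit(0);
}

// Parent: read the child's report, then reap it.
socket_close($pair[1]);
$line = socket_read($pair[0], 4096, PHP_NORMAL_READ);
socket_close($pair[0]);
pcntl_waitpid($pid, $status);
echo 'child reported: ', trim($line), PHP_EOL;
```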

Let me know what you're thinking.

@kballenegger
Author

With per-job forking, each job takes about 50ms, which caps a single worker at roughly 20 jobs per second. To handle our queue we would need 50-100 workers, which is INSANE. Because of this, I'm leaning towards a fix that doesn't fork on every job.

I noticed that Redis keeps a record of what each worker is working on. I'm working on a fix that will catch failures in a child that processes multiple items, and use that Redis record to re-create the job object and fail it. This would be optional, which means it could play nicely with the current implementation.
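
For reference, Resque's convention is to store the in-progress job (queue, start time, payload) under a worker:<id> key. A rough sketch of reading that record back and recording a failure with phpredis; the key prefix and the shape of the failure entry follow Resque conventions but should be checked against the php-resque version in use.

```php
<?php
// Sketch only: recover the job a dead child was working on and fail it.
// Assumes phpredis and the default "resque:" key prefix.
$redis = new Redis();
$redis->connect('127.0.0.1', 6379);

$workerId = 'hostname:1234:default';            // illustrative worker id
$record   = $redis->get("resque:worker:$workerId");

if ($record) {
    $job = json_decode($record, true);           // array('queue' => ..., 'run_at' => ..., 'payload' => ...)
    $redis->rpush('resque:failed', json_encode(array(
        'failed_at' => date('D M d H:i:s T Y'),
        'payload'   => $job['payload'],
        'exception' => 'Resque_Job_DirtyExitException',   // exception name assumed, adjust as needed
        'error'     => 'Child died while processing this job',
        'worker'    => $workerId,
        'queue'     => $job['queue'],
    )));
    $redis->del("resque:worker:$workerId");
}
```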

If this doesn't work out so well, I'll see what I can do w/ socket_create_pair. Maybe I could serialize the job and pass it through the socket to the parent before each job is attempted… but this seems like a wonkier solution.

@kballenegger
Author

See pull request here: #35

@kballenegger
Author

Any news on this or my pull request above?

@roynasser

I'd be interested in hearing more about the pros, and especially the cons, of this.

@danhunsaker
Contributor

Cons include the loss of process-level isolation of jobs. In the current model (one fork per job), each job operates in complete and total isolation (memory-wise) from every other job. This is a very good idea for failure tolerance - especially when any given job could spontaneously encounter a fatal error, or even segfault. It is possible to figure out some kind of failure detection mechanism and work around this, but the options are each more hackish and/or unreliable than the last - and you still have to replace the now-dead worker and figure out what job(s) to hand it - a process which is likely to take longer than the original fork itself.

I want to also point out, albeit rather belatedly, that error_log() has a certain amount of overhead all its own, namely that it writes to the disk. Solid state drives and RAM disks are about the only places where this overhead isn't going to be particularly noticeable, but even then I wouldn't rely on that for performance. My point being that error_log() is probably a bad function to use to gauge the speed of anything other than your target disk and its accompanying filesystem. The best bet, in fact, is to use a truly empty perform() method, one with literally nothing in it, as this will most accurately gauge the speed of PHP-Resque itself.
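
A benchmark job along those lines might look something like this (class and queue names are placeholders; the enqueue call assumes php-resque's Resque::enqueue API):

```php
<?php
// A do-nothing job for measuring php-resque's own per-job overhead.
class Benchmark_NoopJob
{
    public function perform()
    {
        // intentionally empty: any time spent here is framework/fork cost
    }
}

// Enqueue a batch, then time how long a single worker takes to drain it.
for ($i = 0; $i < 1000; $i++) {
    Resque::enqueue('benchmark', 'Benchmark_NoopJob', array());
}
```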

Something else that will help is having a PHP binary that doesn't contain (and a config that doesn't load) any features you aren't going to use in your workers. So will anything else that limits the amount of memory consumed per process. Each fork duplicates the parent process's memory space (copy-on-write on modern kernels, but there is still per-page bookkeeping), so the less you load, the less the OS needs to copy, and the less time it takes to do so.
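
One quick way to sanity-check that footprint before tuning anything (nothing php-resque specific here):

```php
<?php
// Print what the parent worker is carrying around before any fork happens;
// this is roughly what each fork has to duplicate (copy-on-write aside).
printf(
    "extensions loaded: %d, peak memory: %.1f MB\n",
    count(get_loaded_extensions()),
    memory_get_peak_usage(true) / 1048576
);
```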

I'm guessing you already knew most of that, though, so I hope you take the bits you knew as advice for others who didn't. :)

@Dynom

Dynom commented Aug 30, 2013

I know that your initial post is about two years old, but since it's still open... I'm curious what you came up with since then, @kballenegger.

Another possibility I'm considering is to not fork at all and do some alternate process control (like you can with Supervisord). That removes a tremendous amount of overhead. I have jobs of around 100ms, which is just too slow; shaving 50ms off would save a couple of servers.

@kballenegger
Author

@Dynom I kind of gave up; making nice things in PHP is too hard / impossible. These days I build everything in Ruby or C.

@Rockstar04
Contributor

I agree with @Dynom, as far as Resque itself goes, since it was written in Ruby first. I am going to try to run my daemon and jobs in Ruby, but leave the app code itself in PHP, since we would never get approval to port all of our apps to Ruby.

@Dynom

Dynom commented Aug 31, 2013

An alternative approach might be a solution like this: https://github.com/salimane/php-resque-jobs-per-fork. This idea introduces (at least) the following:

Cons:

  • It introduces potential problems with context, memory, etc., as will any implementation that doesn't fork for every job.

Pros:

  • It can significantly speed up jobs by spreading the fork overhead over N jobs, while still allowing a "clean up" after every N jobs (see the sketch after this list).
  • It's also still fairly "robust", since the workers still fork: on a fatal error only the fork dies, not the main process. A mild "pro", but still :-)
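
The core of the jobs-per-fork idea is roughly the loop below (a sketch only, not the actual php-resque-jobs-per-fork code; reserve_job and process_job are placeholders for the real queue calls):

```php
<?php
// Sketch of a fork-every-N-jobs worker loop (illustrative only).
$jobsPerFork = 100;

while (true) {
    $pid = pcntl_fork();
    if ($pid === -1) {
        die('fork failed');
    }

    if ($pid === 0) {
        // Child: process up to N jobs, then exit so memory/state is reclaimed.
        for ($i = 0; $i < $jobsPerFork; $i++) {
            $job = reserve_job();    // placeholder: pop the next job off the queue
            if ($job === null) {
                break;               // queue is empty
            }
            process_job($job);       // placeholder: run the job's perform()
        }
        exit(0);
    }

    // Parent: wait for the child; if it dies badly, only that fork is lost.
    pcntl_waitpid($pid, $status);
}
```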

@Dynom

Dynom commented Aug 31, 2013

I've done some tests and I have very promising results. Heavy jobs that used to finish in 200ms now finish in 30ms, and other jobs have very similar results. As soon as I find some time I will make a PR, and I hope that @chrisboulton will merge it in asap.

@Dynom

Dynom commented Aug 31, 2013

I've done some tests and I've seen some promising results. Large jobs taking 200ms now take 35ms. I've created PR #130, which is an up-to-date implementation of Salimane's work here: https://github.com/salimane/php-resque-jobs-per-fork

This PR introduces an environment variable, JOBS_PER_FORK, that defines the number of jobs processed per fork. A larger value introduces slightly more risk (in case of errors or unexpected dependencies), but also reduces overhead quite a bit. I'm not sure yet what the optimum value is, but that will be different for each job.
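
As a usage sketch (assuming the JOBS_PER_FORK variable from the PR and php-resque's usual environment-variable style of configuration):

```php
// e.g. started as: QUEUE=default JOBS_PER_FORK=100 php resque.php
$jobsPerFork = (int) getenv('JOBS_PER_FORK');
if ($jobsPerFork < 1) {
    $jobsPerFork = 1; // fall back to the original fork-per-job behaviour
}
```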

@kballenegger
Author

@Dynom - In case you're interested, I have a fork / pull request for this already (#35). Never got merged in though:

#35

@Dynom

Dynom commented Aug 31, 2013

Hi @kballenegger, sorry I did not see your PR. Is it still running successfully in production?

@kballenegger
Author

@Dynom - We don't use Resque in our production systems any longer; we moved to more durable queuing & event stream systems as we grew even further in scale (combination of SQS, Rabbit, & Kafka, nowadays).

From what I recall, however, the fork worked great. We ran it with quite a bit of traffic for a while.

@Dynom

Dynom commented Aug 31, 2013

OK. I actually think this won't hold for long, and I'm already looking at alternatives. But we need some improvements now, so I'll switch to this fix using either PR and make some time to investigate alternatives. SQS is not an option, and I'm unsure about Rabbit.

I liked the Resque approach because it gives me great reliability in the queue, instead of having SPOFs with mindless brokers and the like. How do you handle that?
