Many times, job processes are long and synchronous. The current design does not allow these kinds of processes to run properly because of the locking mechanism: a synchronous job blocks the event loop, so the lock cannot be renewed. However, we can spawn a child process and run the job there instead. If the same child process is reused for every job (we will need as many child processes as the concurrency level), the overhead will be minimal. As an added benefit, if the child process dies for any reason (memory leaks or otherwise), the worker can spawn a replacement and continue working normally.
…for the following reasons:
* A lot of people complain about jobs being double-processed currently, so there must be a lot of poorly written job-processor code out there that stalls the event loop :). Or folks like me forget that we're running our code on tiny instances in the cloud, where the CPU is so limited that a tiny bit of JS work will max it out. A 30sec timeout would give a bit more buffer, at least until we figure out a generic solution like OptimalBits#488.
* An expired lock (due to event-loop stalling) is quite fatal now that we check the lock before moving the job to completed or failed (previously we would still move it even if there was no lock). So if a long-running job (say, 2min) stalls the event loop for even just 5sec, it can never complete at that point: it might still finish processing, but another worker will likely have picked it up as a stalled job and processed it again. And even if it doesn't get picked up as stalled, when it finally completes it still won't be moved to completed, because it lost the lock at some point.
* The tradeoff is that it will take longer for jobs to be considered 'stalled': instead of waiting at most 5sec to find out whether a job stalled, we'd wait at most 30sec. I think this is generally OK; most people aren't running jobs that are that time-sensitive, and actually stalled jobs (due to process crashes) should be extremely rare anyway.
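The arithmetic behind that tradeoff can be sketched with a hypothetical helper (names are illustrative, not Bull's API; it assumes the lock is renewed at half the lock duration):

```javascript
// How long can the event loop stall before a job's lock expires?
function maxTolerableStallMs(lockDurationMs) {
  const lockRenewTimeMs = lockDurationMs / 2; // renew halfway through the lock
  // Worst case: the stall starts just as a renewal was about to run, so only
  // (lockDuration - lockRenewTime) of lock lifetime remains.
  return lockDurationMs - lockRenewTimeMs;
}

console.log(maxTolerableStallMs(5000));  // 2500: a 5sec lock survives at most a 2.5sec stall
console.log(maxTolerableStallMs(30000)); // 15000: a 30sec lock survives up to a 15sec stall
```

So bumping the lock to 30sec turns a fatal 5sec stall into a non-event, at the cost of slower stalled-job detection.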
This also sets the stalledInterval to 30sec, since it does little good to run the stalled-job check more frequently than the lock timeout. The check is also somewhat expensive to run in Redis, as it iterates every job in the 'active' queue (see moveUnlockedJobsToWait.lua), so it'd be nice to run it less often anyway.
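For intuition, here is an in-memory sketch of what that check does conceptually (the real implementation is the Redis Lua script; the data structures and names here are illustrative):

```javascript
// Scan every job in 'active'; any job whose lock has expired is moved back
// to 'wait' so another worker can pick it up. This is O(active queue size),
// which is why running it every 30sec instead of every 5sec is cheaper.
function moveUnlockedJobsToWait(active, locks, wait) {
  const stalled = [];
  for (const jobId of [...active]) {  // copy: we delete while iterating
    if (!locks.has(jobId)) {          // lock expired or was never renewed
      active.delete(jobId);
      wait.push(jobId);
      stalled.push(jobId);
    }
  }
  return stalled;
}

const active = new Set(['1', '2', '3']);
const locks = new Set(['2']);         // only job 2 still holds its lock
const wait = [];
console.log(moveUnlockedJobsToWait(active, locks, wait)); // jobs 1 and 3 are stalled
```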