Automatically reset "Started/Enqueued" jobs to "Pending" on Odoo Start #386

Open
ventor-dev opened this issue Nov 3, 2021 · 7 comments
Labels
enhancement no stale Use this label to prevent the automated stale action from closing this PR/Issue.

Comments

@ventor-dev

Problem

When the Odoo server crashes or is otherwise force-stopped, running jobs are interrupted and the runner has no chance to learn that they were aborted. In such situations, jobs may remain in the started or enqueued state after the Odoo server is halted. Since the runner cannot tell whether they are actually running, and cannot be sure it is safe to restart them, it does not attempt to restart them automatically. Such stale jobs therefore fill the running queue and prevent other jobs from starting. You must therefore requeue them manually, either from the Jobs view or by running the following SQL statement before starting Odoo:
update queue_job set state='pending' where state in ('started', 'enqueued')

The result is that channel capacity is lost: the system believes the job is started, while nothing is actually running.
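To make the effect of that statement concrete, here is a minimal sketch that runs it against a toy table. The in-memory SQLite database and the two-column `queue_job` table are stand-ins assumed for illustration only; the real table lives in PostgreSQL and has many more columns.

```python
import sqlite3

# Toy stand-in for queue_job (assumption: only id and state matter here).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE queue_job (id INTEGER PRIMARY KEY, state TEXT)")
conn.executemany(
    "INSERT INTO queue_job (state) VALUES (?)",
    [("pending",), ("enqueued",), ("started",), ("done",), ("failed",)],
)

# The statement quoted above: in-flight jobs go back to 'pending',
# every other state is left untouched.
cur = conn.execute(
    "UPDATE queue_job SET state='pending' WHERE state IN ('started', 'enqueued')"
)
conn.commit()
requeued = cur.rowcount

states = [row[0] for row in conn.execute("SELECT state FROM queue_job ORDER BY id")]
print(requeued, states)
```

Only the `enqueued` and `started` rows change; `done` and `failed` jobs keep their state.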

Solution

This problem has existed since the beginning of time (that is, since the beginning of queue_job), so I guess lots of brilliant minds have considered different solutions. Still, the problem does not sound that hard to solve with the approach below.

  1. Currently the Job Runner thread is started on Odoo startup by this patch: https://github.com/OCA/queue/blob/14.0/queue_job/jobrunner/__init__.py#L69

  2. Then the initialize_database() method is called before jobs start being processed: https://github.com/OCA/queue/blob/14.0/queue_job/jobrunner/runner.py#L501

  3. And at that point we can already connect to the database: https://github.com/OCA/queue/blob/14.0/queue_job/jobrunner/runner.py#L414

So I suggest adding a method there that runs the statement below before job processing starts. We can of course use "SELECT FOR UPDATE" to be on the safe side and avoid conflicting with other processes that may query the same records.

update queue_job set state='pending' where state in ('started', 'enqueued')

To me this fix sounds safe.
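A minimal sketch of what such a startup reset could look like, assuming a DB-API cursor (for example from psycopg2) is available at that point. The helper name `reset_stale_jobs`, the SQL constant, and the idea of calling it from initialize_database() are all hypothetical; nothing here exists in queue_job. The `FOR UPDATE` clause inside the subquery is PostgreSQL syntax, and the stub cursor only exists so the sketch runs without a database.

```python
# Hypothetical SQL: lock the stale rows first, then flip them to 'pending'.
RESET_STALE_JOBS_SQL = """
    UPDATE queue_job
       SET state = 'pending'
     WHERE id IN (
           SELECT id
             FROM queue_job
            WHERE state IN ('started', 'enqueued')
              FOR UPDATE
     )
"""


def reset_stale_jobs(cursor):
    """Requeue jobs left in 'started' or 'enqueued'; return the affected row count."""
    cursor.execute(RESET_STALE_JOBS_SQL)
    return cursor.rowcount


class StubCursor:
    """Stand-in for a real database cursor, used only to exercise the helper."""

    def __init__(self, rowcount):
        self.rowcount = rowcount
        self.executed = []

    def execute(self, sql):
        self.executed.append(sql)


stub = StubCursor(rowcount=3)
reset_count = reset_stale_jobs(stub)
print(reset_count, len(stub.executed))
```

Whether this is safe with several Odoo instances sharing one database is exactly the open question in this issue: a reset at startup would also requeue jobs that another, still-running instance is executing.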

But @guewen, I believe you have considered this already. So before I propose a PR, do you see any issues with the method above?

@mlaitinen

This kind of exists already. It's not tied to Odoo starting up; instead it's executed by a cron.

<record id="ir_cron_queue_job_garbage_collector" model="ir.cron">
    <field name="name">Jobs Garbage Collector</field>
    <field name="interval_number">5</field>
    <field name="interval_type">minutes</field>
    <field name="numbercall">-1</field>
    <field ref="model_queue_job" name="model_id" />
    <field name="state">code</field>
    <field name="code">model.requeue_stuck_jobs()</field>
</record>

def requeue_stuck_jobs(self, enqueued_delta=5, started_delta=0):
    """Fix jobs that are in a bad state

    :param enqueued_delta: lookup time in minutes for jobs
                           that are in enqueued state
    :param started_delta: lookup time in minutes for jobs
                          that are in started state,
                          0 means that it is not checked
    """
    self._get_stuck_jobs_to_requeue(
        enqueued_delta=enqueued_delta, started_delta=started_delta
    ).requeue()
    return True

@github-actions

github-actions bot commented Jun 5, 2022

There hasn't been any activity on this issue in the past 6 months, so it has been marked as stale and it will be closed automatically if no further activity occurs in the next 30 days.
If you want this issue to never become stale, please ask a PSC member to apply the "no stale" label.

@github-actions github-actions bot added the stale PR/Issue without recent activity, it'll be soon closed automatically. label Jun 5, 2022
@sbidoul sbidoul added no stale Use this label to prevent the automated stale action from closing this PR/Issue. and removed stale PR/Issue without recent activity, it'll be soon closed automatically. labels Jul 1, 2022
@sbidoul
Member

sbidoul commented Jul 1, 2022

A proposed implementation is in #423

@angelvilaplana

Hi, where I work we have the same problem. I've opened a pull request with the changes that solved it for us. I hope it is as useful to you as it has been to us.

@ventor-dev
Author

@angelvilaplana thanks for your suggestion, that looks interesting

@mlaitinen it is not really working, and I'm not sure why. This cron never does what it should do (clean up stuck jobs).

@mlaitinen

@ventor-dev You might have to tweak the cron method call arguments.

def requeue_stuck_jobs(self, enqueued_delta=5, started_delta=0):

The cron job calls this method without arguments, which means that by default it only deals with enqueued jobs, not started ones. In my experience more jobs get stuck in the "started" state, so just changing the cron call to

model.requeue_stuck_jobs(5, 15)

will take care of jobs stuck in the started state for more than 15 minutes.
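To illustrate why the default call skips started jobs, here is a small standalone sketch of the selection rule described above. It is not the queue_job implementation: the `Job` dataclass, the `stuck_jobs` function, and the timestamps are assumptions made up for the example. The only behavior taken from this thread is that each delta is a threshold in minutes and that 0 disables the check.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional


@dataclass
class Job:
    """Toy job record (illustrative, not the Odoo model)."""
    name: str
    state: str
    date_enqueued: Optional[datetime] = None
    date_started: Optional[datetime] = None


def stuck_jobs(jobs: List[Job], now: datetime,
               enqueued_delta: int = 5, started_delta: int = 0) -> List[Job]:
    """Return jobs that stayed enqueued/started longer than the given minutes."""
    stuck = []
    for job in jobs:
        # A delta of 0 is falsy, so the corresponding state is never checked.
        if enqueued_delta and job.state == "enqueued" and job.date_enqueued:
            if job.date_enqueued <= now - timedelta(minutes=enqueued_delta):
                stuck.append(job)
        if started_delta and job.state == "started" and job.date_started:
            if job.date_started <= now - timedelta(minutes=started_delta):
                stuck.append(job)
    return stuck


now = datetime(2021, 11, 3, 12, 0)
jobs = [
    Job("a", "enqueued", date_enqueued=now - timedelta(minutes=10)),
    Job("b", "started", date_started=now - timedelta(minutes=30)),
    Job("c", "started", date_started=now - timedelta(minutes=5)),
]

default_names = [j.name for j in stuck_jobs(jobs, now)]
tuned_names = [j.name for j in stuck_jobs(jobs, now, 5, 15)]
print(default_names, tuned_names)
```

With the defaults only the long-enqueued job is picked up; passing `(5, 15)` also catches the job that has been started for 30 minutes, while the one started 5 minutes ago is left alone.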

@ventor-dev
Author

@mlaitinen thanks!
