[10.0] Add identity key on job to allow limiting redundant execution #66
Conversation
Great, I didn't expect to have a PR on this so soon, thanks for your work :)
We should document somewhere the limitations (and the link to the blueprint #57): a job is still created if another job with the same identity key is done or failed, and it won't prevent another job from being created with the same identity key in another transaction if they happen at the same time. So ultimately, jobs must still be idempotent.
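To make the intended semantics concrete, here is a minimal in-memory sketch of the identity-key behavior described above (illustrative only, not the actual queue_job implementation; class and method names are invented): a new job is skipped only while a job with the same key is still unfinished, while done or failed jobs do not block a new one.

```python
# Illustrative sketch of identity-key deduplication semantics.
# Not queue_job code: JobStore/enqueue are hypothetical names.
PENDING, ENQUEUED, STARTED, DONE, FAILED = (
    'pending', 'enqueued', 'started', 'done', 'failed')


class JobStore:
    def __init__(self):
        self.jobs = []  # each job is a mutable [identity_key, state] pair

    def enqueue(self, identity_key,
                blocking_states=(PENDING, ENQUEUED, STARTED)):
        """Create a job unless one with the same key is still unfinished."""
        for key, state in self.jobs:
            if key == identity_key and state in blocking_states:
                return None  # duplicate: the existing job covers this work
        job = [identity_key, PENDING]
        self.jobs.append(job)
        return job


store = JobStore()
first = store.enqueue('export-partner-42')    # created
dup = store.enqueue('export-partner-42')      # skipped: first still pending
first[1] = DONE
again = store.enqueue('export-partner-42')    # created: done jobs don't block
```

Note that, as the comment above says, this check alone cannot prevent two concurrent transactions from each creating a job with the same key, so job handlers must still be idempotent.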
queue_job/job.py
Outdated
.. attribute:: identity_key

    A key referencing the job; multiple jobs with the same key will not
    be added to a channel.
Worth adding that it concerns only jobs not yet executed.
@@ -60,4 +60,5 @@ def with_delay(self, priority=None, eta=None,
    eta=eta,
Can you add the argument in the docstring?
queue_job/job.py
Outdated
jobs = env['queue.job'].search(
    [('identity_key', '=', identity_key),
     ('state', 'in', [PENDING, ENQUEUED, STARTED])],
    limit=1
We must have a partial index on the state and identity key:
CREATE INDEX queue_job_identity_key_state_partial_index ON queue_job (identity_key) WHERE state in ('pending', 'enqueued', 'started');
It can be created in the init method of the module.
@guewen Do you mean as a post_init_hook in the Odoo module?
No, in the init method; you can find examples in Odoo's code, for example in the mail addon:

@api.model_cr
def init(self):
    self._cr.execute('SELECT indexname FROM pg_indexes WHERE indexname = %s', ('mail_channel_partner_seen_message_id_idx',))
    if not self._cr.fetchone():
        self._cr.execute('CREATE INDEX mail_channel_partner_seen_message_id_idx ON mail_channel_partner (channel_id,partner_id,seen_message_id)')
I see, thanks for your answer.
Besides Guewen's remarks, LGTM.
queue_job/job.py
Outdated
"""Check if a job to be executed with the same key exists."""
jobs = env['queue.job'].search(
    [('identity_key', '=', identity_key),
     ('state', 'in', [PENDING, ENQUEUED, STARTED])],
tl;dr: Please remove STARTED from the search.
On second thought, I'm wondering if we want all states. If the process is quite long, wouldn't we have an issue when we dismiss the creation of a new job because one is already STARTED?
What I have in mind is, for instance:
- an update on record X creates a job to export it
- later the job A starts
- while job A is in started state, a second update is made on record X
- it doesn't trigger a new export job
Thus you will have a mismatch between the local record X and its exported values, and you won't have any job to update it, and might never have another update on that record. This could happen if the current job is running on outdated values.
I see 2 options to avoid this.
- destroy/create or restart the existing job
When the next job would do exactly the same thing, we want to retry with updated data.
(As far as I know, an ongoing job won't be killed by changing the state, so it is more a "do again" than a restart.)
Or could we imagine interrupting the ongoing job to really restart it? But we would have to add some mechanism to interrupt nicely.
- don't consider STARTED as duplicate
This would clearly mean sometimes doing the same thing and creating a new job. But it would close the breach.
Having 2 jobs would show more clearly that you made 2 calls, even if the first was useless.
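The two behaviors can be contrasted with a tiny sketch (illustrative only; `should_create` is a hypothetical helper, not queue_job code). When STARTED blocks duplicates, an update made while the job runs produces no new job and the stale export scenario above can occur:

```python
# Does a new job get created, given the states of existing jobs with
# the same identity key and the set of states that count as duplicates?
def should_create(existing_states, blocking):
    return not any(state in blocking for state in existing_states)


# Job A is running when record X is updated a second time:
running = ['started']
with_started = should_create(
    running, blocking={'pending', 'enqueued', 'started'})
without_started = should_create(
    running, blocking={'pending', 'enqueued'})
# with_started -> False: no new export job, stale values stay exported
# without_started -> True: a second (possibly redundant) job is created
```

Excluding STARTED trades an occasional redundant job (harmless, since jobs must be idempotent anyway) for the guarantee that the last update is always exported.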
I think @yvaucher is correct and a started job should not be considered as a duplicate.
> destroy/create or restart the existing job

No please 🗡️

> don't consider STARTED as duplicate

Yes
The last fixup commit answers the previous remarks.
queue_job/models/queue_job.py
Outdated
self._cr.execute(
    "CREATE INDEX queue_job_identity_key_state_partial_index "
    "ON queue_job (identity_key) WHERE state in ('pending', "
    "'enqueued');"
I only realized now that it would probably be way more efficient to have WHERE state in ('pending', 'enqueued') AND identity_key IS NOT NULL
in the index, so PostgreSQL will not keep updating the index for jobs without an identity_key (not slowing down jobs that don't use this feature is a goal).
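The effect of the extra predicate can be demonstrated with a self-contained sketch. SQLite also supports partial indexes, so it is used here purely for illustration; the real module targets PostgreSQL, and the table schema below is simplified:

```python
# Demonstration of the improved partial index: only rows that can
# actually match the duplicate check are indexed, so jobs without an
# identity_key never touch the index.
import sqlite3

conn = sqlite3.connect(':memory:')
conn.execute(
    "CREATE TABLE queue_job "
    "(id INTEGER PRIMARY KEY, identity_key TEXT, state TEXT)")
conn.execute(
    "CREATE INDEX queue_job_identity_key_state_partial_index "
    "ON queue_job (identity_key) "
    "WHERE state IN ('pending', 'enqueued') AND identity_key IS NOT NULL")
conn.execute(
    "INSERT INTO queue_job (identity_key, state) VALUES ('k1', 'pending')")
conn.execute(
    "INSERT INTO queue_job (identity_key, state) VALUES (NULL, 'done')")
# The duplicate check only ever queries rows covered by the index:
dup = conn.execute(
    "SELECT id FROM queue_job WHERE identity_key = ? "
    "AND state IN ('pending', 'enqueued')", ('k1',)).fetchall()
# dup holds exactly the one pending job with that key
```

Rows with a NULL identity_key (the common case for jobs not using the feature) are never inserted into or removed from the index, which is the write-amplification saving the comment above is after.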
I pushed 2 new commits:
Reapproving with last changes
Providing a hash will probably be insufficient for most needs, because we'd want to include at least the called model, method and records, and maybe some arguments. A function that takes a job and returns a key allows generating the key based on the properties of the job. An identity key function, 'identity_exact', provides an exact match on the delayed method.
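A minimal sketch of such a key function, in the spirit of what is described above (the function name, parameters, and hashed fields are illustrative assumptions, not queue_job's actual identity_exact implementation):

```python
# Hypothetical identity-key function: derive a stable key from the
# properties of the delayed call, so that two identical delayed calls
# collapse to the same identity key.
import hashlib


def identity_exact_key(model_name, method_name, record_ids,
                       args=(), kwargs=None):
    parts = [
        model_name,
        method_name,
        repr(sorted(record_ids)),          # order-independent record set
        repr(args),
        repr(sorted((kwargs or {}).items())),  # order-independent kwargs
    ]
    return hashlib.sha1('|'.join(parts).encode('utf-8')).hexdigest()


key1 = identity_exact_key('res.partner', 'export_record', [42], ('backend',))
key2 = identity_exact_key('res.partner', 'export_record', [42], ('backend',))
key3 = identity_exact_key('res.partner', 'export_record', [43], ('backend',))
# key1 == key2 (same call -> same key), key1 != key3 (different record)
```

Receiving the job object itself, as the comment suggests, would let such a function read these properties directly instead of taking them as arguments.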
…execution Backport from OCA/queue#66
Backported identity key (OCA#66)
This is to implement a solution to the issue raised in issue #57.