[10.0] Add identity key on job to allow limiting redundant execution #66

Merged
2 commits merged into OCA:10.0 from TDu:no-duplicate-job on Jul 30, 2018

Conversation

@TDu (Member) commented Apr 26, 2018

This implements a solution to the issue raised in #57.

@guewen (Member) left a comment:

Great, I didn't expect a PR on this so soon, thanks for your work :)

We should document the limitations somewhere (with a link to the blueprint #57): a job is still created if another job with the same identity key is already done or failed, and it won't prevent another job with the same identity key from being created in a concurrent transaction. Ultimately, jobs must still be idempotent.

@flotho Could you have a look, as you reported #52?
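
For reference, a minimal usage sketch of the feature under review, assuming the new identity_key argument on with_delay added in this PR; export_partner and the key format are purely illustrative:

    # Hypothetical usage: the second call with the same key should not
    # create a duplicate job while the first one is still waiting to run.
    def action_export(self):
        for partner in self:
            key = 'export-partner-%s' % partner.id
            partner.with_delay(identity_key=key).export_partner()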

queue_job/job.py Outdated
    .. attribute:: identity_key

        A key referencing the job; multiple jobs with the same key will not
        be added to a channel.
Member: Worth adding that this concerns only jobs not yet executed.

@@ -60,4 +60,5 @@ def with_delay(self, priority=None, eta=None,
             eta=eta,
Member: Can you add the argument to the docstring?

queue_job/job.py Outdated
    jobs = env['queue.job'].search(
        [('identity_key', '=', identity_key),
         ('state', 'in', [PENDING, ENQUEUED, STARTED])],
        limit=1
Member: We must have a partial index on the state and identity key:

    CREATE INDEX queue_job_identity_key_state_partial_index
    ON queue_job (identity_key)
    WHERE state in ('pending', 'enqueued', 'started');

It can be created in the init method of the module.

@TDu (author): @guewen Do you mean as a post_init_hook in the Odoo module?

Member: No, in the init() method. You can find examples in Odoo's code, for example in the mail addon:

    @api.model_cr
    def init(self):
        self._cr.execute('SELECT indexname FROM pg_indexes WHERE indexname = %s', ('mail_channel_partner_seen_message_id_idx',))
        if not self._cr.fetchone():
            self._cr.execute('CREATE INDEX mail_channel_partner_seen_message_id_idx ON mail_channel_partner (channel_id,partner_id,seen_message_id)')

@TDu (author): I see, thanks for your answer.
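
To make the index suggestion concrete, a minimal sketch of how that pattern could look in queue_job/job.py, assuming the index name and states proposed above; the merged code may differ (later in this thread 'started' is dropped and NULL keys are excluded):

    from odoo import api, models


    class QueueJob(models.Model):
        _inherit = 'queue.job'  # sketch only; the real module defines this model itself

        @api.model_cr
        def init(self):
            # Create the suggested partial index if it does not exist yet.
            self._cr.execute(
                "SELECT indexname FROM pg_indexes WHERE indexname = %s",
                ('queue_job_identity_key_state_partial_index',),
            )
            if not self._cr.fetchone():
                self._cr.execute(
                    "CREATE INDEX queue_job_identity_key_state_partial_index "
                    "ON queue_job (identity_key) "
                    "WHERE state in ('pending', 'enqueued', 'started')"
                )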

@simahawk (Contributor) left a comment:

Besides Guewen's remarks, LGTM.

queue_job/job.py Outdated
"""Check if a job to be executed with the same key exists."""
jobs = env['queue.job'].search(
[('identity_key', '=', identity_key),
('state', 'in', [PENDING, ENQUEUED, STARTED])],
@yvaucher (Member), May 24, 2018:

tl;dr: Please remove STARTED from the search.

On second thought, I'm wondering if we want all these states. If the process is quite long, wouldn't we have an issue when we dismiss the creation of a new job because one is already STARTED?

What I have in mind, for instance:

  • an update on record X creates an export job A
  • later, job A starts
  • while job A is in the started state, a second update is made on record X
  • the second update doesn't trigger a new export job

Thus you will have a mismatch between the local record and its exported values.
You won't have any job to update it, and there might never be another update on that record.

This could happen if the current job is running on outdated values.

I see 2 options to avoid this.

  1. destroy/create or restart the existing job

When the next job would do exactly the same thing, we want to retry with updated data.
(As far as I know, an ongoing job won't be killed by changing its state, so it is more a "do again" than a restart.)

Or could we imagine interrupting the ongoing job to really restart it? But we would have to add some mechanism to interrupt it nicely.

  2. don't consider STARTED as a duplicate

This would sometimes mean doing the same thing and creating a new job, but it would close the breach.
Having two jobs would show more clearly that you made two calls, even if the first was useless.

@TDu (author): I think @yvaucher is correct and a started job should not be considered a duplicate.
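
In code, the duplicate check would then only consider waiting jobs; a fragment sketch reusing the names from the diff above:

    # STARTED is no longer part of the searched states, so a running job
    # does not block the creation of a new one.
    jobs = env['queue.job'].search(
        [('identity_key', '=', identity_key),
         ('state', 'in', [PENDING, ENQUEUED])],
        limit=1,
    )
    duplicate_exists = bool(jobs)  # True when an equivalent job is still waiting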

Member:

Re "destroy/create or restart the existing job": No please 🗡️

Re "don't consider STARTED as duplicate": Yes

@TDu (author) commented May 25, 2018

The last fixup commit answers the previous remarks.

    self._cr.execute(
        "CREATE INDEX queue_job_identity_key_state_partial_index "
        "ON queue_job (identity_key) WHERE state in ('pending', "
        "'enqueued');"
Member: I only realize now that it would probably be much more efficient to have WHERE state in ('pending', 'enqueued') AND identity_key IS NOT NULL in the index, so that it is not kept up to date for jobs without an identity_key (not slowing down jobs that don't use this feature is a goal).
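
A sketch of the execute call with that extra clause, so rows without an identity_key never enter the index (naming as above; the actual commit may differ):

    self._cr.execute(
        "CREATE INDEX queue_job_identity_key_state_partial_index "
        "ON queue_job (identity_key) "
        "WHERE state in ('pending', 'enqueued') "
        "AND identity_key IS NOT NULL"
    )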

@guewen (Member) commented Jun 28, 2018

I pushed 2 new commits:

  • The first (fixup) adds the identity_key IS NOT NULL clause to the partial index.
  • The second extends the identity key feature with the ability to use identity functions (taking the job as parameter). A provided identity function (identity_exact) should cover most cases: if the method call and its arguments are the same for the same record(s), the new job is skipped. Providing a hash string is still possible but is probably an edge case; nearly always you want the key to depend on the job method, record and arguments, and then it makes more sense to compute it from the job properties than to compute the hash beforehand. (A usage sketch follows below.)
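
A usage sketch for the identity-function variant described in the second bullet; the import path is an assumption, and export_record is an illustrative method name:

    # Assumed import path for the provided identity function.
    from odoo.addons.queue_job.job import identity_exact

    # Two identical calls (same method, same record, same arguments) should
    # result in a single job while the first one is still waiting to run.
    partner.with_delay(identity_key=identity_exact).export_record()
    partner.with_delay(identity_key=identity_exact).export_record()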

@yvaucher (Member) left a comment:

Re-approving with the last changes.

TDu and others added 2 commits on July 30, 2018 at 09:06
Providing a hash will probably be insufficient for most needs, because
we'd want to include at least the called model, method and record,
and maybe some arguments.

A function that takes a job and returns a key allows generating this
key based on the properties of the job. An identity key function,
'identity_exact', provides an exact match on the delayed method.
@coveralls commented Jul 30, 2018

Coverage increased (+0.7%) to 77.658% when pulling 17b0220 on TDu:no-duplicate-job into a2a14e0 on OCA:10.0.

@guewen merged commit b5a09c8 into OCA:10.0 on Jul 30, 2018
@lmignon mentioned this pull request on Sep 24, 2018
lmignon added a commit to acsone/connector that referenced this pull request on May 22, 2019
anikeenko-viktor pushed a commit to anikeenko-viktor/queue that referenced this pull request on Nov 17, 2020
okuryan added a commit to xpansa/queue that referenced this pull request on Nov 17, 2020