[10.0] Add identity key on job to allow limiting redundant execution #66

Merged
2 commits merged into OCA:10.0 from TDu:no-duplicate-job on Jul 30, 2018

Conversation

@TDu (Member) commented Apr 26, 2018

This implements a solution to the issue raised in #57.

@guewen (Member) left a comment:

Great, I didn't expect a PR on this so soon, thanks for your work :)

We should document the limitations somewhere (with a link to the blueprint #57): a job is still created if another job with the same identity key is already done or failed, and it won't prevent another job with the same identity key from being created in a concurrent transaction. Ultimately, jobs must still be idempotent.

@flotho Could you have a look, as you reported #52?
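
For reference, a minimal usage sketch of the feature under review, assuming the new identity_key argument on with_delay added in this PR; export_partner and the key format are purely illustrative:

    # Hypothetical usage: the second call with the same key should not
    # create a duplicate job while the first one is still waiting to run.
    def action_export(self):
        for partner in self:
            key = 'export-partner-%s' % partner.id
            partner.with_delay(identity_key=key).export_partner()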

queue_job/job.py Outdated
    .. attribute:: identity_key

        A key referencing the job; multiple jobs with the same key will not
        be added to a channel.
Member: Worth adding that this concerns only jobs not yet executed.

@@ -60,4 +60,5 @@ def with_delay(self, priority=None, eta=None,
             eta=eta,
Member: Can you add the argument to the docstring?

queue_job/job.py Outdated
    jobs = env['queue.job'].search(
        [('identity_key', '=', identity_key),
         ('state', 'in', [PENDING, ENQUEUED, STARTED])],
        limit=1
Member: We must have a partial index on the state and identity key:

    CREATE INDEX queue_job_identity_key_state_partial_index
    ON queue_job (identity_key)
    WHERE state in ('pending', 'enqueued', 'started');

It can be created in the init method of the module.

@TDu (author): @guewen Do you mean as a post_init_hook in the Odoo module?

Member: No, in the init() method. You can find examples in Odoo's code, for example in the mail addon:

    @api.model_cr
    def init(self):
        self._cr.execute('SELECT indexname FROM pg_indexes WHERE indexname = %s', ('mail_channel_partner_seen_message_id_idx',))
        if not self._cr.fetchone():
            self._cr.execute('CREATE INDEX mail_channel_partner_seen_message_id_idx ON mail_channel_partner (channel_id,partner_id,seen_message_id)')

@TDu (author): I see, thanks for your answer.
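
To make the index suggestion concrete, a minimal sketch of how that pattern could look in queue_job/job.py, assuming the index name and states proposed above; the merged code may differ (later in this thread 'started' is dropped and NULL keys are excluded):

    from odoo import api, models


    class QueueJob(models.Model):
        _inherit = 'queue.job'  # sketch only; the real module defines this model itself

        @api.model_cr
        def init(self):
            # Create the suggested partial index if it does not exist yet.
            self._cr.execute(
                "SELECT indexname FROM pg_indexes WHERE indexname = %s",
                ('queue_job_identity_key_state_partial_index',),
            )
            if not self._cr.fetchone():
                self._cr.execute(
                    "CREATE INDEX queue_job_identity_key_state_partial_index "
                    "ON queue_job (identity_key) "
                    "WHERE state in ('pending', 'enqueued', 'started')"
                )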

@simahawk (Contributor) left a comment:

Besides Guewen's remarks, LGTM.

queue_job/job.py Outdated
"""Check if a job to be executed with the same key exists."""
jobs = env['queue.job'].search(
[('identity_key', '=', identity_key),
('state', 'in', [PENDING, ENQUEUED, STARTED])],
@yvaucher (Member), May 24, 2018:

tl;dr: Please remove STARTED from the search.

On second thought, I'm wondering if we want all these states. If the process is quite long, wouldn't we have an issue when we dismiss the creation of a new job because one is already STARTED?

What I have in mind, for instance:

  • an update on record X creates an export job A
  • later, job A starts
  • while job A is in the started state, a second update is made on record X
  • the second update doesn't trigger a new export job

Thus you will have a mismatch between the local record and its exported values.
You won't have any job to update it, and there might never be another update on that record.

This could happen if the current job is running on outdated values.

I see 2 options to avoid this.

  1. destroy/create or restart the existing job

When the next job would do exactly the same thing, we want to retry with updated data.
(As far as I know, an ongoing job won't be killed by changing its state, so it is more a "do again" than a restart.)

Or could we imagine interrupting the ongoing job to really restart it? But we would have to add some mechanism to interrupt it nicely.

  2. don't consider STARTED as a duplicate

This would sometimes mean doing the same thing and creating a new job, but it would close the breach.
Having two jobs would show more clearly that you made two calls, even if the first was useless.

@TDu (author): I think @yvaucher is correct and a started job should not be considered a duplicate.
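
In code, the duplicate check would then only consider waiting jobs; a fragment sketch reusing the names from the diff above:

    # STARTED is no longer part of the searched states, so a running job
    # does not block the creation of a new one.
    jobs = env['queue.job'].search(
        [('identity_key', '=', identity_key),
         ('state', 'in', [PENDING, ENQUEUED])],
        limit=1,
    )
    duplicate_exists = bool(jobs)  # True when an equivalent job is still waiting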

Member:

Re "destroy/create or restart the existing job": No please 🗡️

Re "don't consider STARTED as duplicate": Yes

@TDu (author) commented May 25, 2018

The last fixup commit answers the previous remarks.

    self._cr.execute(
        "CREATE INDEX queue_job_identity_key_state_partial_index "
        "ON queue_job (identity_key) WHERE state in ('pending', "
        "'enqueued');"
Member: I only realize now that it would probably be much more efficient to have WHERE state in ('pending', 'enqueued') AND identity_key IS NOT NULL in the index, so that it is not kept up to date for jobs without an identity_key (not slowing down jobs that don't use this feature is a goal).
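
A sketch of the execute call with that extra clause, so rows without an identity_key never enter the index (naming as above; the actual commit may differ):

    self._cr.execute(
        "CREATE INDEX queue_job_identity_key_state_partial_index "
        "ON queue_job (identity_key) "
        "WHERE state in ('pending', 'enqueued') "
        "AND identity_key IS NOT NULL"
    )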

@guewen (Member) commented Jun 28, 2018

I pushed 2 new commits:

  • The first (fixup) adds the identity_key IS NOT NULL clause to the partial index.
  • The second extends the identity key feature with the ability to use identity functions (taking the job as parameter). A provided identity function (identity_exact) should cover most cases: if the method call and its arguments are the same for the same record(s), the new job is skipped. Providing a hash string is still possible but is probably an edge case; nearly always you want the key to depend on the job method, record and arguments, and then it makes more sense to compute it from the job properties than to compute the hash beforehand. (A usage sketch follows below.)
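
A usage sketch for the identity-function variant described in the second bullet; the import path is an assumption, and export_record is an illustrative method name:

    # Assumed import path for the provided identity function.
    from odoo.addons.queue_job.job import identity_exact

    # Two identical calls (same method, same record, same arguments) should
    # result in a single job while the first one is still waiting to run.
    partner.with_delay(identity_key=identity_exact).export_record()
    partner.with_delay(identity_key=identity_exact).export_record()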

@yvaucher (Member) left a comment:

Re-approving with the last changes.

TDu and others added 2 commits on July 30, 2018 at 09:06
Providing a hash will probably be insufficient for most needs, because
we'd want to include at least the called model, method and record,
and maybe some arguments.

A function that takes a job and returns a key allows generating this
key based on the properties of the job. An identity key function,
'identity_exact', provides an exact match on the delayed method.
@coveralls commented Jul 30, 2018

Coverage increased (+0.7%) to 77.658% when pulling 17b0220 on TDu:no-duplicate-job into a2a14e0 on OCA:10.0.

@guewen merged commit b5a09c8 into OCA:10.0 on Jul 30, 2018
@lmignon mentioned this pull request on Sep 24, 2018
lmignon added a commit to acsone/connector that referenced this pull request on May 22, 2019
anikeenko-viktor pushed a commit to anikeenko-viktor/queue that referenced this pull request on Nov 17, 2020
okuryan added a commit to xpansa/queue that referenced this pull request on Nov 17, 2020