Set job back to 'pending' in case of exception #150

Merged
merged 1 commit into from Nov 19, 2015

Conversation

Projects
None yet
5 participants
@mdietrichc2c
Contributor

mdietrichc2c commented Nov 18, 2015

This PR is for issue #148 in 7.0 ; the goal is to set back the job in state pending if execution of the asynchronous job raised an exception (timeout or other).

connector/jobrunner/runner.py
+ with closing(conn.cursor()) as cr:
+ cr.execute("UPDATE queue_job SET state=%s, "
+ "date_enqueued=NULL, date_started=NULL, worker_uuid=NULL "
+ "WHERE uuid=%s", (PENDING, job_uuid))

This comment has been minimized.

@lmignon

lmignon Nov 18, 2015

Contributor

we should add 'state='enqueued'` to the where clause

@lmignon

lmignon Nov 18, 2015

Contributor

we should add 'state='enqueued'` to the where clause

This comment has been minimized.

@guewen

guewen Nov 19, 2015

Member

Yes, because if the controller changes the state to started and we get a timeout or no response, the jobrunner could reset a running job to enqueued (not sure that could really happen but anyway that's safer).

@guewen

guewen Nov 19, 2015

Member

Yes, because if the controller changes the state to started and we get a timeout or no response, the jobrunner could reset a running job to enqueued (not sure that could really happen but anyway that's safer).

connector/jobrunner/runner.py
try:
# we are not interested in the result, so we set a short timeout
# but not too short so we trap and log hard configuration errors
requests.get(url, timeout=1)
except requests.Timeout:
- pass
+ _logger.exception("timeout in GET %s", url)

This comment has been minimized.

@guewen

guewen Nov 19, 2015

Member

Could you change this log level to error? We don't need the traceback on this one.

@guewen

guewen Nov 19, 2015

Member

Could you change this log level to error? We don't need the traceback on this one.

@mdietrichc2c

This comment has been minimized.

Show comment
Hide comment
@mdietrichc2c

mdietrichc2c Nov 19, 2015

Contributor

@lmignon @guewen thanks for the feedback ; I did the requested changes.

Contributor

mdietrichc2c commented Nov 19, 2015

@lmignon @guewen thanks for the feedback ; I did the requested changes.

@guewen

This comment has been minimized.

Show comment
Hide comment
@guewen

guewen Nov 19, 2015

Member

Thanks for the fix! 👍

Member

guewen commented Nov 19, 2015

Thanks for the fix! 👍

@lmignon

This comment has been minimized.

Show comment
Hide comment
@lmignon

lmignon Nov 19, 2015

Contributor

@mdietrichc2c Thank you for the changes! The code looks good to me. 👍

Contributor

lmignon commented Nov 19, 2015

@mdietrichc2c Thank you for the changes! The code looks good to me. 👍

@laetitia-gangloff

This comment has been minimized.

Show comment
Hide comment
@laetitia-gangloff

laetitia-gangloff Nov 19, 2015

It seems ok 👍
Thank you !

It seems ok 👍
Thank you !

@guewen

This comment has been minimized.

Show comment
Hide comment
@guewen

guewen Nov 19, 2015

Member

Not yet ready to merge, still we have an issue with this fix.

Member

guewen commented Nov 19, 2015

Not yet ready to merge, still we have an issue with this fix.

@lmignon

This comment has been minimized.

Show comment
Hide comment
@lmignon

lmignon Nov 19, 2015

Contributor

@guewen, on our side @laetitia-gangloff is also testing these changes and that's why I don't have merged these changes despite the 3 thumbs up.

Contributor

lmignon commented Nov 19, 2015

@guewen, on our side @laetitia-gangloff is also testing these changes and that's why I don't have merged these changes despite the 3 thumbs up.

@laetitia-gangloff

This comment has been minimized.

Show comment
Hide comment
@laetitia-gangloff

laetitia-gangloff Nov 19, 2015

Traceback (most recent call last):
File "/usr/lib/python2.7/threading.py", line 810, in __bootstrap_inner
self.run()
File "/usr/lib/python2.7/threading.py", line 763, in run
self.__target(_self.__args, *_self.__kwargs)
File "/home/lga/project/openerp/addons-connector/connector/jobrunner/runner.py", line 129, in urlopen
set_job_pending()
File "/home/lga/project/openerp/addons-connector/connector/jobrunner/runner.py", line 111, in set_job_pending
"WHERE uuid=%s and state=%s", (PENDING, job_uuid, ENQUEUED)
ProgrammingError: column "worker_uuid" of relation "queue_job" does not exist
LINE 1: ...'pending', date_enqueued=NULL, date_started=NULL, worker_uui...

Traceback (most recent call last):
File "/usr/lib/python2.7/threading.py", line 810, in __bootstrap_inner
self.run()
File "/usr/lib/python2.7/threading.py", line 763, in run
self.__target(_self.__args, *_self.__kwargs)
File "/home/lga/project/openerp/addons-connector/connector/jobrunner/runner.py", line 129, in urlopen
set_job_pending()
File "/home/lga/project/openerp/addons-connector/connector/jobrunner/runner.py", line 111, in set_job_pending
"WHERE uuid=%s and state=%s", (PENDING, job_uuid, ENQUEUED)
ProgrammingError: column "worker_uuid" of relation "queue_job" does not exist
LINE 1: ...'pending', date_enqueued=NULL, date_started=NULL, worker_uui...

@guewen

This comment has been minimized.

Show comment
Hide comment
@guewen

guewen Nov 19, 2015

Member

👎 until it is fixed

Member

guewen commented Nov 19, 2015

👎 until it is fixed

@laetitia-gangloff

This comment has been minimized.

Show comment
Hide comment
@laetitia-gangloff

laetitia-gangloff Nov 19, 2015

I think ", worker_uuid=NULL" can be removed...
I did the following test :

start odoo
set a lot of job in the queue (7000)
start runner
wait 30 seconds
stop odoo
remove started job from my database
stop runner
start odoo
start runner

And all seems ok.

I think ", worker_uuid=NULL" can be removed...
I did the following test :

start odoo
set a lot of job in the queue (7000)
start runner
wait 30 seconds
stop odoo
remove started job from my database
stop runner
start odoo
start runner

And all seems ok.

@mdietrichc2c

This comment has been minimized.

Show comment
Hide comment
@mdietrichc2c

mdietrichc2c Nov 19, 2015

Contributor

I have another issue: even when removeing the autocommit and adding cr.commit() at the end, the commit does not happen on an enqueued job (I created a timeout for testing).

Contributor

mdietrichc2c commented Nov 19, 2015

I have another issue: even when removeing the autocommit and adding cr.commit() at the end, the commit does not happen on an enqueued job (I created a timeout for testing).

connector/jobrunner/runner.py
try:
# we are not interested in the result, so we set a short timeout
# but not too short so we trap and log hard configuration errors
requests.get(url, timeout=1)
except requests.Timeout:
- pass
+ _logger.error("timeout in GET %s", url)
+ set_job_pending()

This comment has been minimized.

@sbidoul

sbidoul Nov 19, 2015

Member

Timeout is a normal behaviour for long running jobs. So I would not log this as an error.

@sbidoul

sbidoul Nov 19, 2015

Member

Timeout is a normal behaviour for long running jobs. So I would not log this as an error.

@mdietrichc2c

This comment has been minimized.

Show comment
Hide comment
@mdietrichc2c

mdietrichc2c Nov 19, 2015

Contributor

Turns out the error I had was due to my test (instant timeout => job was enqueued, set as pending, then running, enqueued, timeout...)

@sbidoul @laetitia-gangloff I corrected both points : removed the absent column from the SQL request , and the message is now logged as a debug message.

Contributor

mdietrichc2c commented Nov 19, 2015

Turns out the error I had was due to my test (instant timeout => job was enqueued, set as pending, then running, enqueued, timeout...)

@sbidoul @laetitia-gangloff I corrected both points : removed the absent column from the SQL request , and the message is now logged as a debug message.

@mdietrichc2c

This comment has been minimized.

Show comment
Hide comment
@mdietrichc2c

mdietrichc2c Nov 19, 2015

Contributor

Actually, after discussing it with @guewen , every job longer than a second will log a timeout message ; it was decided to remove them for timeouts, and I rebased the whole thing in one commit.

Contributor

mdietrichc2c commented Nov 19, 2015

Actually, after discussing it with @guewen , every job longer than a second will log a timeout message ; it was decided to remove them for timeouts, and I rebased the whole thing in one commit.

@guewen

This comment has been minimized.

Show comment
Hide comment
@guewen

guewen Nov 19, 2015

Member

Thanks a lot @mdietrichc2c :-)
I lift my thumb down 👍

Member

guewen commented Nov 19, 2015

Thanks a lot @mdietrichc2c :-)
I lift my thumb down 👍

try:
# we are not interested in the result, so we set a short timeout
# but not too short so we trap and log hard configuration errors
requests.get(url, timeout=1)
except requests.Timeout:
- pass
+ set_job_pending()

This comment has been minimized.

@lmignon

lmignon Nov 19, 2015

Contributor

What will happen if a job is longer than 1 min? The state will be set to 'pending' and the running job will fails with a concurrent update error?

@lmignon

lmignon Nov 19, 2015

Contributor

What will happen if a job is longer than 1 min? The state will be set to 'pending' and the running job will fails with a concurrent update error?

This comment has been minimized.

@mdietrichc2c

mdietrichc2c Nov 19, 2015

Contributor

Wouldn't the started job have started as a state, and hence not be reset as pending ?

@mdietrichc2c

mdietrichc2c Nov 19, 2015

Contributor

Wouldn't the started job have started as a state, and hence not be reset as pending ?

This comment has been minimized.

@guewen

guewen Nov 19, 2015

Member

Why? The state of the job is set to started when /runjob is called. From my understanding, this use case only cover a timeout that happen before we could hit the controller. If we hit the controller, the job is set to started so the job won't be reset to pending.

@guewen

guewen Nov 19, 2015

Member

Why? The state of the job is set to started when /runjob is called. From my understanding, this use case only cover a timeout that happen before we could hit the controller. If we hit the controller, the job is set to started so the job won't be reset to pending.

This comment has been minimized.

@lmignon

lmignon Nov 19, 2015

Contributor

@mdietrichc2c you're right

@lmignon

lmignon Nov 19, 2015

Contributor

@mdietrichc2c you're right

@lmignon

This comment has been minimized.

Show comment
Hide comment
@lmignon

lmignon Nov 19, 2015

Contributor

Thank you @mdietrichc2c . I can now comment my 👍 with 'Code review and functional test'

Contributor

lmignon commented Nov 19, 2015

Thank you @mdietrichc2c . I can now comment my 👍 with 'Code review and functional test'

guewen added a commit that referenced this pull request Nov 19, 2015

Merge pull request #150 from mdietrichc2c/exception-set-job-pending
Set job back to 'pending' in case of exception

@guewen guewen merged commit bde178d into OCA:7.0 Nov 19, 2015

1 check passed

ci/runbot runbot build 3135449-150-b9ad58 (runtime 14s)
Details

@guewen guewen removed the needs review label Nov 19, 2015

@laetitia-gangloff

This comment has been minimized.

Show comment
Hide comment
@laetitia-gangloff

laetitia-gangloff Nov 19, 2015

I redo the same test than before, and it works as expected 👍

I redo the same test than before, and it works as expected 👍

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment