Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Calling result.ready() twice before task is completed makes result.ready() return false forever dedpite task is completed #6

Closed
zimnyx opened this issue Aug 18, 2010 · 7 comments

Comments

@zimnyx
Copy link

zimnyx commented Aug 18, 2010

One important note is that this doesn't happen under celery alone - occured after I tried task under django-celery.
Below is output from bpython:

In [50]: res = sleep_10_sec_and_add.delay(1,9)

In [51]: res.ready()
Out[51]: False

In [52]: # 10 sec passes

In [53]: res.ready()
Out[53]: True

In [54]: 

In [55]: res = sleep_10_sec_and_add.delay(1,9)

In [56]: res.ready()
Out[56]: False

In [57]: res.ready()
Out[57]: False

In [58]: ## 10 sec passes

In [59]: res.ready()
Out[59]: False    #despite celeryd logs show task was completed successfuly

I provided you very few information - if you cannot reproduce the bug - contact me.

@zimnyx
Copy link
Author

zimnyx commented Aug 18, 2010

My workmate noticed that this bug is connected with result backend used by django-celery (DatabaseBackend).

@ask
Copy link
Contributor

ask commented Aug 19, 2010

Can't reproduce here. This is what I tried:

@task
def sleep_and_add(secs, x, y):
    from time import sleep
    sleep(secs)
    return x + y

>>> r = sleep_and_add.delay(10, 2, 2)
>>> r.ready()
False
>>> r.ready()
False
# 10 seconds passes
>>> r.ready()
True

>>> r = sleep_and_add.delay(10, 2, 2); r.ready(); r.ready()
False
False
# 10 seconds passes
>>> r.ready()
True

@zimnyx
Copy link
Author

zimnyx commented Aug 19, 2010

I've found the reason.

In short: it's issue caused by transaction isolation level - default REPEATABLE READ for MySQL/InnoDB.

Hitting more details:
step 1) r.ready() is performed before task is completed, using connection X to MySQL.
MySQLdb is by default in autocommit=OFF mode, so this SELECT was done inside transaction. Any further same-SELECT will return same result, no matter if row will be modified inside other connection. It's because how transaction works in isolation level REPEATABLE READ.

step 2) Task id completed, celeryd updates task status from connection Y, and commits.

step 3) r.ready() is performed , using connection X to MySQL. Despite row was modified, SELECT retuns same, obsolete result. So we would expect r.ready() to be True, but we get False.

@zimnyx
Copy link
Author

zimnyx commented Aug 19, 2010

Take look on SQL:

SELECT for result is done from conection 466
UPDATE is done from conection 467

100819 14:32:07   466 Connect   root@localhost on test
          466 Query SET storage_engine=INNODB, time_zone=UTC
          466 Query SET NAMES utf8
          466 Query set autocommit=0
          466 Query INSERT INTO `celery_taskmeta` (`task_id`, `status`, `result`, `date_done`, `traceback`) VALUES ('6e1c4e07-4274-42d2-ba1c-f610ccc89c03', 'PENDING', NULL, '2010-08-19 07:32:07', NULL)
          466 Query commit
100819 14:32:08   466 Query SELECT * FROM `celery_taskmeta` WHERE `celery_taskmeta`.`task_id` = '6e1c4e07-4274-42d2-ba1c-f610ccc89c03'
100819 14:32:15   467 Connect   root@localhost on test
          467 Query SET storage_engine=INNODB, time_zone=UTC
          467 Query SET NAMES utf8
          467 Query set autocommit=0
          467 Query SELECT * FROM `celery_taskmeta` WHERE `celery_taskmeta`.`task_id` = '6e1c4e07-4274-42d2-ba1c-f610ccc89c03'
          467 Query SELECT * FROM `celery_taskmeta` WHERE `celery_taskmeta`.`id` = 6  LIMIT 1
          467 Query UPDATE `celery_taskmeta` SET `task_id` = '6e1c4e07-4274-42d2-ba1c-f610ccc89c03', `status` = 'SUCCESS', `result` = 'STMKLg==', `date_done` = '2010-08-19 07:32:15', `traceback` = NULL WHERE `celery_taskmeta`.`id` = 6
          467 Query commit
          467 Quit  
100819 14:32:20   466 Query SELECT * FROM `celery_taskmeta` WHERE `celery_taskmeta`.`task_id` = '6e1c4e07-4274-42d2-ba1c-f610ccc89c03'

@zimnyx
Copy link
Author

zimnyx commented Aug 20, 2010

Reassuming ,

  • Bug occours when Django ORM uses transactional database table for result backend, for example MySQL/InnoDB.
  • Asking for result before task is completed (db SELECT) makes all consecutive SELECTS inside same transaction return same result (ready == False), despite worker has updated task status (SUCCESSFUL) from other, separate connection.
  • I cannot see easy solution, if one want up-to-date result one should use non-transactional table for storing results, like MyISAM for MySQL.
  • As many peoples are using transactional engines, it should be emhasized in documentation for celery.result.AsyncResult that asking for resul using ready() or result properties can return obsolete value, if transaction tables are used by database result backend.
  • It's not a big deal in production systems, because if someone puts task into queue he rather doesn't need result in the same code block, so this is more likely terminal-man bug.

@zimnyx
Copy link
Author

zimnyx commented Aug 20, 2010

My proposition
if settings.DATABASE_SUPPORTS_TRANSACTIONS = True and inside_transaction=True and previous_call_to_result_returned_false:
raise Exception('Result is unknown - you're inside transaction')

@ask
Copy link
Contributor

ask commented Aug 20, 2010

Show warning if polling results with transaction isolation level repeatable-read on MySQL. Closed by b1fb37e. Thanks to zimnyz

This issue was closed.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants