Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Catch UnicodeDecodeError when opening corrupt beat-schedule.db #8806

Merged

Conversation

andyzickler
Copy link
Contributor

There is existing code to detect if celerybeat-schedule.db is corrupted and recreate it, however sometimes a UnicodeDecodeError is thrown in the process of throwing the KeyError. This catches that error and allows Beat to use the existing code to recreate the database.

(Fixes #2907) This issue was closed because the workaround posted helped the user, but the underlying issue was not resolved.

Example of a stack trace this fixes:

[2024-01-19 00:54:01,637: CRITICAL/MainProcess] beat raised exception <class 'UnicodeDecodeError'>: UnicodeDecodeError('ascii', b'\x07\xe4\x02\x0c\x13\x11\x00\t\x11)', 1, 2, 'ordinal not in range(128)')
Traceback (most recent call last):
  File "/opt/homebrew/Cellar/python@3.11/3.11.4_1/Frameworks/Python.framework/Versions/3.11/lib/python3.11/shelve.py", line 111, in __getitem__
    value = self.cache[key]
            ~~~~~~~~~~^^^^^
KeyError: 'entries'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/Users/andy/src/django/venv3.11/lib/python3.11/site-packages/celery/apps/beat.py", line 113, in start_scheduler
    service.start()
  File "/Users/andy/src/django/venv3.11/lib/python3.11/site-packages/celery/beat.py", line 634, in start
    humanize_seconds(self.scheduler.max_interval))
                     ^^^^^^^^^^^^^^
  File "/Users/andy/src/django/venv3.11/lib/python3.11/site-packages/kombu/utils/objects.py", line 40, in __get__
    return super().__get__(instance, owner)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/Cellar/python@3.11/3.11.4_1/Frameworks/Python.framework/Versions/3.11/lib/python3.11/functools.py", line 1001, in __get__
    val = self.func(instance)
          ^^^^^^^^^^^^^^^^^^^
  File "/Users/andy/src/django/venv3.11/lib/python3.11/site-packages/celery/beat.py", line 677, in scheduler
    return self.get_scheduler()
           ^^^^^^^^^^^^^^^^^^^^
  File "/Users/andy/src/django/venv3.11/lib/python3.11/site-packages/celery/beat.py", line 668, in get_scheduler
    return symbol_by_name(self.scheduler_cls, aliases=aliases)(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/andy/src/django/venv3.11/lib/python3.11/site-packages/celery/beat.py", line 513, in __init__
    super().__init__(*args, **kwargs)
  File "/Users/andy/src/django/venv3.11/lib/python3.11/site-packages/celery/beat.py", line 264, in __init__
    self.setup_schedule()
  File "/Users/andy/src/django/venv3.11/lib/python3.11/site-packages/celery/beat.py", line 541, in setup_schedule
    self._create_schedule()
  File "/Users/andy/src/django/venv3.11/lib/python3.11/site-packages/celery/beat.py", line 570, in _create_schedule
    self._store['entries']
    ~~~~~~~~~~~^^^^^^^^^^^
  File "/opt/homebrew/Cellar/python@3.11/3.11.4_1/Frameworks/Python.framework/Versions/3.11/lib/python3.11/shelve.py", line 114, in __getitem__
    value = Unpickler(f).load()
            ^^^^^^^^^^^^^^^^^^^
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe4 in position 1: ordinal not in range(128)

@auvipy auvipy self-requested a review January 19, 2024 09:33
Copy link
Member

@auvipy auvipy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

interesting catch! can we add some unit test for it? although we can go without it as well for this case

@auvipy auvipy added this to the 5.4 milestone Jan 19, 2024
Copy link

codecov bot commented Jan 19, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Comparison is base (d7700e2) 81.19% compared to head (2cebf34) 81.22%.
Report is 2 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #8806      +/-   ##
==========================================
+ Coverage   81.19%   81.22%   +0.03%     
==========================================
  Files         148      148              
  Lines       18523    18523              
  Branches     3165     3165              
==========================================
+ Hits        15039    15045       +6     
+ Misses       3196     3191       -5     
+ Partials      288      287       -1     
Flag Coverage Δ
unittests 81.20% <100.00%> (+0.03%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

Copy link
Member

@Nusnus Nusnus left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, On the same page as @auvipy

@andyzickler
Copy link
Contributor Author

Thanks for checking. I'll see if i can make a test. It doesn't look like there is one for PersistentScheduler._create_schedule, but there is one for setup_schedule that I think I can use as a reference.

There is existing code to detect if celerybeat-schedule.db is corrupted and recreate it, however sometimes a UnicodeDecodeError or TypeError is thrown in the process of throwing the KeyError. This catches that error and allows Beat to use the existing code to recreate the database.

(Fixes celery#2907)
@andyzickler andyzickler force-pushed the fix/celerybeat-schedule-unicodedecodeerror branch from fb1951e to 2cebf34 Compare January 19, 2024 23:42
@Nusnus
Copy link
Member

Nusnus commented Jan 19, 2024

CI approved to run - it will take time though.
Should start in about 1-2h.

@andyzickler
Copy link
Contributor Author

@Nusnus @auvipy I've added two tests that exercise how _create_schedule handles errors. I also added a catch for TypeError as I found that exception would sometimes happen when opening a corrupt database. Here's the trace for that

KeyError: 'entries'
  File "shelve.py", line 111, in __getitem__
    value = self.cache[key]
TypeError: cannot unpack non-iterable ellipsis object
  File "celery/apps/beat.py", line 113, in start_scheduler
    service.start()
  File "celery/beat.py", line 634, in start
    humanize_seconds(self.scheduler.max_interval))
  File "kombu/utils/objects.py", line 31, in __get__
    return super().__get__(instance, owner)
  File "functools.py", line 1001, in __get__
    val = self.func(instance)
  File "celery/beat.py", line 677, in scheduler
    return self.get_scheduler()
  File "celery/beat.py", line 668, in get_scheduler
    return symbol_by_name(self.scheduler_cls, aliases=aliases)(
  File "celery/beat.py", line 513, in __init__
    super().__init__(*args, **kwargs)
  File "celery/beat.py", line 264, in __init__
    self.setup_schedule()
  File "celery/beat.py", line 541, in setup_schedule
    self._create_schedule()
  File "celery/beat.py", line 570, in _create_schedule
    self._store['entries']
  File "shelve.py", line 113, in __getitem__
    f = BytesIO(self.dict[key.encode(self.keyencoding)])
  File "dbm/dumb.py", line 148, in __getitem__
    pos, siz = self._index[key]     # may raise KeyError

It seems a little smelly that we're catching a few different exception types, but I'm not sure of a better method. It feels like shelve's parsing code could be a little more robust.

@andyzickler
Copy link
Contributor Author

Hmm, i see that setup_schedule catches any exception and then destroys the database.

celery/celery/beat.py

Lines 529 to 539 in 8f38999

def setup_schedule(self):
try:
self._store = self._open_schedule()
# In some cases there may be different errors from a storage
# backend for corrupted files. Example - DBPageNotFoundError
# exception from bsddb. In such case the file will be
# successfully opened but the error will be raised on first key
# retrieving.
self._store.keys()
except Exception as exc: # pylint: disable=broad-except
self._store = self._destroy_open_corrupted_schedule(exc)

Should I change _create_schedule to work the same? What do you think? I don't really like blanket catching, but there might be some other parsing error that we don't know about yet.

@auvipy
Copy link
Member

auvipy commented Jan 23, 2024

Should I change _create_schedule to work the same? What do you think? I don't really like blanket catching, but there might be some other parsing error that we don't know about yet.

would love to see how that works. but not fully sure now. may be in another PR? or you want that to try here?

@andyzickler
Copy link
Contributor Author

Let's go with what I've got here for now. I think i have everything covered and we can add more catches later if they appear. I don't want to catch an exception that can't actually be resolved by recreating the database, like an IO error or something.

With that question resolved, I think this is good to merge. 🎉

@Nusnus
Copy link
Member

Nusnus commented Jan 23, 2024

Let's go with what I've got here for now. I think i have everything covered and we can add more catches later if they appear. I don't want to catch an exception that can't actually be resolved by recreating the database, like an IO error or something.

With that question resolved, I think this is good to merge. 🎉

LGTM, your call on the merge @auvipy

@auvipy auvipy merged commit 3e98049 into celery:main Jan 23, 2024
82 checks passed
@auvipy
Copy link
Member

auvipy commented Jan 23, 2024

thanks guys!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Status: Done
Development

Successfully merging this pull request may close these issues.

Celery beat UnicodeDecodeError (Python 3.4) issue
3 participants