Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix Issue #604 in django-celery-beat: Celery Beat Crashing at the End of Daylight Savings #7901

Open
wants to merge 13 commits into
base: main
Choose a base branch
from

Conversation

polarmt
Copy link

@polarmt polarmt commented Nov 11, 2022

Dependency

celery/django-celery-beat#605

Note: The two PRs are codependent and need to be merged together.

Description

These changes are necessary to fix celery/django-celery-beat#604 and is a dependency for celery/django-celery-beat#605. Both issues have more detailed information about the problem and how testing was performed for the changes.

@polarmt polarmt changed the title Remove logic to convert one timestamp to be the same as other Fix Issue #604 in django-celery-beat: Celery Beat Crashing at the End of Daylight Savings Nov 11, 2022
celery/utils/time.py Outdated Show resolved Hide resolved
@auvipy
Copy link
Member

auvipy commented Nov 15, 2022

can you rebase on top top of master again please?

@polarmt
Copy link
Author

polarmt commented Nov 15, 2022

can you rebase on top top of master again please?

Done rebasing.

@auvipy
Copy link
Member

auvipy commented Nov 17, 2022

I am not sure why builds are failing, can you check please? are the tests passing locally?

@polarmt polarmt force-pushed the master branch 2 times, most recently from 36c6fc3 to 405ddfd Compare November 17, 2022 19:46
@polarmt
Copy link
Author

polarmt commented Nov 17, 2022

I am a little confused as to what the behavior should be for Celery beat. Currently, the only unit test that is failing is from https://github.com/celery/celery/blob/master/t/unit/app/test_schedules.py#L441. This unit test was added to deal with #1604.

This is where the confusion lies. During the end of DST, my understanding is that we do want the tasks to run twice. For example, a Celery task runs at 1:30 PDT and 1:30 PST. From our business perspective, this makes the most sense. However, the problem in #1604 seems to be that they do not want to the tasks to run twice.

What should be the expected behavior? Should we add a configuration parameter to control this? If so, what is the default parameter?

@auvipy auvipy requested a review from a team November 18, 2022 05:11
@auvipy
Copy link
Member

auvipy commented Nov 29, 2022

I will come back to this soon

@auvipy
Copy link
Member

auvipy commented Dec 14, 2022

can you please elaborate why a task should run twice?

@Nusnus Nusnus removed this from the 5.3 milestone Feb 19, 2023
@auvipy
Copy link
Member

auvipy commented Mar 2, 2023

I am a little confused as to what the behavior should be for Celery beat. Currently, the only unit test that is failing is from https://github.com/celery/celery/blob/master/t/unit/app/test_schedules.py#L441. This unit test was added to deal with #1604.

This is where the confusion lies. During the end of DST, my understanding is that we do want the tasks to run twice. For example, a Celery task runs at 1:30 PDT and 1:30 PST. From our business perspective, this makes the most sense. However, the problem in #1604 seems to be that they do not want to the tasks to run twice.

What should be the expected behavior? Should we add a configuration parameter to control this? If so, what is the default parameter?

can you please revisit this? I want to know more of your thoughts

@polarmt
Copy link
Author

polarmt commented Jun 30, 2023

For instance, let us assume that we have a task that we want to run every hour. An example of such important task is sending a timely email every hour at the 30 minute mark. Then, we would want this task to run both at 1:30 AM PDT AND 1:30 AM PST.

In our business, we send such timely emails for many of our features hourly at different minute marks and having one hour of downtime is detrimental.

@auvipy
Copy link
Member

auvipy commented Jul 2, 2023

For instance, let us assume that we have a task that we want to run every hour. An example of such important task is sending a timely email every hour at the 30 minute mark. Then, we would want this task to run both at 1:30 AM PDT AND 1:30 AM PST.

In our business, we send such timely emails for many of our features hourly at different minute marks and having one hour of downtime is detrimental.

I think running a task multiple times in different timezone should be allowed.

@polarmt
Copy link
Author

polarmt commented Jul 3, 2023

Thanks, @auvipy ! In that case, keep us posted on the progress of the review.

@auvipy
Copy link
Member

auvipy commented Jul 6, 2023

can't review it as the builds are failing and there are merge conflicts. we need those two resolved before moving further.

@polarmt
Copy link
Author

polarmt commented Jul 6, 2023

Can you run the builds again?

@auvipy
Copy link
Member

auvipy commented Jul 8, 2023

re started and builds are failing

@auvipy
Copy link
Member

auvipy commented Nov 6, 2023

can you restart this?

@polarmt
Copy link
Author

polarmt commented Nov 7, 2023

Yes. Yesterday, we discovered a potential bug with this change that was exposed by the latest DST transition. I will summarize it and fix it when I have time.

@polarmt
Copy link
Author

polarmt commented Nov 7, 2023

Root Cause of Bug

In the existing logic of the Celery source code, it will convert the start timezone to the current timezone (PDT to PST) and then compute the difference between start and now. If we compare 1:15 PM PST and 1:15 PM PDT, the difference should be one hour apart. However, with the unnecessary conversion, the remaining function would return 0 minutes. To fix this issue, we skipped the conversion of start from PDT to PST before computing the expected end time.

This works for periodic tasks that run every X minutes, but the logic seems to be incorrect if the beat task uses a crontab. time.remaining takes in three main parameters: now, ends_in, and start. The ends_in represents when we expect the task to run next. This value can either be a timedelta object or an ffwd object.

An ffwd object does not represent the time difference between two times. Instead, "adding" an ffwd object will replace the current datetime object with the times specified in the ffwd object. For instance, if we add ffwd(hour=7, minute=0, weekday=6) to 1:15 AM on Saturday, the resulting time would become 7:00 AM on the following Sunday.

If the Saturday in the example was prior to the transition and the Sunday was afterwards, then we expect the start (1:15 AM) to be in PDT and end_date (7:00 AM) to be PST. Because we skip the conversion of start in the new logic, the end_date will be incorrectly in PDT even though we have transitioned to PST. For ffwd, the changes will not work.

Symptoms of Bug

The first time that the Celery beat task will run will be an hour earlier than the scheduled time. Then, the Celery beat task will run again at the expected time. This causes the Celery beat task to run an additional time unnecessarily.

Proposed Fix

We can check if ends_in is an ffwd object or not before deciding whether to convert ends_in or not:

   if str(start.tzinfo) == str(now.tzinfo) and now.utcoffset() != start.utcoffset():
        if now.utcoffset() > start.utcoffset() or isinstance(ends_in, ffwd):
            # DST started
            start = start.replace(tzinfo=now.tzinfo)

@polarmt
Copy link
Author

polarmt commented Nov 8, 2023

I think I fixed the issue. We can see whether the CI passes.

celery/utils/time.py Outdated Show resolved Hide resolved
Copy link
Member

@auvipy auvipy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The following test fails as per CI report. we need to fix it to make everything green again

=================================== FAILURES ===================================
________________________________ test_remaining ________________________________

  def test_remaining():
      # Relative
      remaining(datetime.utcnow(), timedelta(hours=1), relative=True)
  
      """
      The upcoming cases check whether the next run is calculated correctly
      """
      eastern_tz = ZoneInfo("US/Eastern")
      tokyo_tz = ZoneInfo("Asia/Tokyo")
      eastern_tz_pytz = pytz.timezone("US/Eastern")
      tokyo_tz_pytz = pytz.timezone("Asia/Tokyo")
  
      # Case 1: `start` in UTC and `now` in other timezone
      start = datetime.now(ZoneInfo("UTC"))
      now = datetime.now(eastern_tz)
      delta = timedelta(hours=1)
      assert str(start.tzinfo) == str(ZoneInfo("UTC"))
      assert str(now.tzinfo) == str(eastern_tz)
      rem_secs = remaining(start, delta, now).total_seconds()
      # assert remaining time is approximately equal to delta
      assert rem_secs == pytest.approx(delta.total_seconds(), abs=1)
  
      # Case 2: `start` and `now` in different timezones (other than UTC)
      start = datetime.now(eastern_tz)
      now = datetime.now(tokyo_tz)
      delta = timedelta(hours=1)
      assert str(start.tzinfo) == str(eastern_tz)
      assert str(now.tzinfo) == str(tokyo_tz)
      rem_secs = remaining(start, delta, now).total_seconds()
      assert rem_secs == pytest.approx(delta.total_seconds(), abs=1)
  
      """
      Case 3: DST check
      Suppose start (which is last_run_time) is in EST while next_run is in EDT,
      then check whether the `next_run` is actually the time specified in the
      start (i.e. there is not an hour diff due to DST).
      In 2019, DST starts on March 10
      """
      start = datetime(day=9, month=3, year=2019, hour=10, minute=0, tzinfo=eastern_tz)         # EST
      now = datetime(day=11, month=3, year=2019, hour=1, minute=0, tzinfo=eastern_tz)           # EDT
      delta = ffwd(hour=10, year=2019, microsecond=0, minute=0, second=0, day=11, weeks=0, month=3)
      # `next_actual_time` is the next time to run (derived from delta)
      next_actual_time = datetime(
          day=11, month=3, year=2019, hour=10, minute=0, tzinfo=eastern_tz)  # EDT
      assert start.tzname() == "EST"
      assert now.tzname() == "EDT"
      assert next_actual_time.tzname() == "EDT"
      rem_time = remaining(start, delta, now)
      next_run = now + rem_time
      assert next_run == next_actual_time
  
      """
      Case 4: DST check (ZoneInfo, timedelta)
      Suppose start (which is last_run_time) is in PDT while next_run is in PST,
      Check whether there is an hour added to the time between now and the `next_run`
      In 2022, DST ends on Nov 6
      """
      start = datetime(day=6, month=11, year=2022, hour=1, minute=15, tzinfo=eastern_tz, fold=0)
      now = datetime(day=6, month=11, year=2022, hour=1, minute=34, tzinfo=eastern_tz, fold=1)
      ends_in = timedelta(minutes=80)
      next_actual_time = datetime(day=6, month=11, year=2022, hour=1, minute=35, tzinfo=eastern_tz, fold=1)
      assert start.tzname() == "EDT"
      assert now.tzname() == "EST"
      assert next_actual_time.tzname() == "EST"
      rem_time = remaining(start, ends_in, now)
      print(start + ends_in - now)
      next_run = now + rem_time
  assert next_run == next_actual_time

E AssertionError: assert datetime.datetime(2022, 11, 6, 2, 35, tzinfo=ZoneInfo(key='US/Eastern')) == datetime.datetime(2022, 11, 6, 1, 35, tzinfo=ZoneInfo(key='US/Eastern'), fold=1)

t/unit/utils/test_time.py:193: AssertionError

t/unit/utils/test_time.py Outdated Show resolved Hide resolved
Copy link

codecov bot commented Nov 9, 2023

Codecov Report

All modified and coverable lines are covered by tests ✅

Comparison is base (7a27725) 87.24% compared to head (f079f86) 87.33%.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #7901      +/-   ##
==========================================
+ Coverage   87.24%   87.33%   +0.09%     
==========================================
  Files         148      148              
  Lines       18637    18526     -111     
  Branches     3199     3167      -32     
==========================================
- Hits        16260    16180      -80     
+ Misses       2080     2060      -20     
+ Partials      297      286      -11     
Flag Coverage Δ
unittests 87.30% <100.00%> (+0.09%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@auvipy
Copy link
Member

auvipy commented Nov 9, 2023

It seems unit tests are now passing. I restarted the failing pypy build as it seems to be a network issue

Copy link
Member

@auvipy auvipy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you need to check/fix the lint issues as well. thanks for all your efforts here!! I will come back to this next morning to check if it is merge able with 5.3.5 release

@auvipy
Copy link
Member

auvipy commented Nov 9, 2023

Integration tests are now passing

@auvipy
Copy link
Member

auvipy commented Nov 9, 2023

t/unit/utils/test_time.py:136:5: F841 local variable 'tokyo_tz_pytz' is assigned to but never used
t/unit/utils/test_time.py:193:1: W293 blank line contains whitespace
t/unit/utils/test_time.py:223:118: E501 line too long (118 > 117 characters)
t/unit/utils/test_time.py:230:1: W293 blank line contains whitespace

@auvipy
Copy link
Member

auvipy commented Nov 10, 2023

Will try to push lint fixes

@auvipy auvipy added this to the 5.4 milestone Nov 12, 2023
@auvipy
Copy link
Member

auvipy commented Nov 12, 2023

I might consider this for 5.3.6 if it do not break backward compat

@auvipy
Copy link
Member

auvipy commented Nov 13, 2023

@Nusnus can we consider this for a patch release?

@Nusnus
Copy link
Member

Nusnus commented Nov 21, 2023

@auvipy

@Nusnus can we consider this for a patch release?

I haven't fully reviewed the entire PR, but from the original description:

celery/django-celery-beat#605

Note: The two PRs are codependent and need to be merged together.

It looks like the other PR isn't ready, besides this PR having failures in CI.

That being said, if this work can be baked enough, I support merging on, but we'll have it for v5.4 if it will be ready by then.

@auvipy
Copy link
Member

auvipy commented Nov 21, 2023

Yeah it's better off for 5.4

@@ -218,12 +218,15 @@ def remaining(
~datetime.timedelta: Remaining time.
"""
now = now or datetime.utcnow()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
Status: No status
Development

Successfully merging this pull request may close these issues.

Celery Beat Crashing at the End of Daylight Savings
4 participants