Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Substantial drift issue #1802

Closed
ashchristopher opened this issue Jan 18, 2014 · 43 comments
Closed

Substantial drift issue #1802

ashchristopher opened this issue Jan 18, 2014 · 43 comments

Comments

@ashchristopher
Copy link

Similar to the other "Substantial drift" issues, I am seeing the following:

11:57:32 flower.1 | [W 140118 11:57:32 state:163] Substantial drift from celery@vagrant.wave-local.com may mean clocks are out of sync.  Current drift is
11:57:32 flower.1 |     3600 seconds.  [orig: 2014-01-18 11:57:32.070727 recv: 2014-01-18 12:57:32.069728]
11:57:32 flower.1 |     
11:57:34 flower.1 | [W 140118 11:57:34 state:163] Substantial drift from celery@vagrant.wave-local.com may mean clocks are out of sync.  Current drift is
11:57:34 flower.1 |     3600 seconds.  [orig: 2014-01-18 11:57:34.657175 recv: 2014-01-18 12:57:34.656147]
11:57:34 flower.1 |     
11:57:37 flower.1 | [W 140118 11:57:37 state:163] Substantial drift from celery@vagrant.wave-local.com may mean clocks are out of sync.  Current drift is
11:57:37 flower.1 |     3600 seconds.  [orig: 2014-01-18 11:57:37.047058 recv: 2014-01-18 12:57:37.045609]
11:57:37 flower.1 |     
11:57:39 flower.1 | [W 140118 11:57:39 state:163] Substantial drift from celery@vagrant.wave-local.com may mean clocks are out of sync.  Current drift is
11:57:39 flower.1 |     3600 seconds.  [orig: 2014-01-18 11:57:39.054193 recv: 2014-01-18 12:57:39.052635]

As you can see, the tasks are being created at the correct time, but it is saying it was received an hour later. I suspect it's a DST issue - do we think this is a problem on Flower's end or a Celery issue?

@ask
Copy link
Contributor

ask commented Jan 20, 2014

Are you running the latest version? If someone changed the timezone of the machine you may have to restart the worker/flower.

Sent from my iPhone

On Jan 18, 2014, at 5:00 PM, Ash Christopher notifications@github.com wrote:

Similar to the other "Substantial drift" issues, I am seeing the following:

11:57:32 flower.1 | [W 140118 11:57:32 state:163] Substantial drift from celery@vagrant.wave-local.com may mean clocks are out of sync. Current drift is
11:57:32 flower.1 | 3600 seconds. [orig: 2014-01-18 11:57:32.070727 recv: 2014-01-18 12:57:32.069728]
11:57:32 flower.1 |

11:57:34 flower.1 | [W 140118 11:57:34 state:163] Substantial drift from celery@vagrant.wave-local.com may mean clocks are out of sync. Current drift is
11:57:34 flower.1 | 3600 seconds. [orig: 2014-01-18 11:57:34.657175 recv: 2014-01-18 12:57:34.656147]
11:57:34 flower.1 |

11:57:37 flower.1 | [W 140118 11:57:37 state:163] Substantial drift from celery@vagrant.wave-local.com may mean clocks are out of sync. Current drift is
11:57:37 flower.1 | 3600 seconds. [orig: 2014-01-18 11:57:37.047058 recv: 2014-01-18 12:57:37.045609]
11:57:37 flower.1 |

11:57:39 flower.1 | [W 140118 11:57:39 state:163] Substantial drift from celery@vagrant.wave-local.com may mean clocks are out of sync. Current drift is
11:57:39 flower.1 | 3600 seconds. [orig: 2014-01-18 11:57:39.054193 recv: 2014-01-18 12:57:39.052635]

As you can see, the tasks are being created at the correct time, but it is saying it was received an hour later. I suspect it's a DST issue - do we think this is a problem on Flower's end or a Celery issue?


Reply to this email directly or view it on GitHub.

@ashchristopher
Copy link
Author

[vagrant@vagrant wave_drop]$ pip freeze
...
celery==3.1.8
flower==0.6.0
...

I am running the latest celery. The timezone of the box has not been changed. This happens every time flower is run against celery.

@ask
Copy link
Contributor

ask commented Jan 21, 2014

You should find out if celery.utils.timeutils.utcoffset() returns the correct value for each node

Sent from my iPhone

On Jan 20, 2014, at 5:02 PM, Ash Christopher notifications@github.com wrote:

[vagrant@vagrant wave_drop]$ pip freeze
...
celery==3.1.8
flower==0.6.0
...
I am running the latest celery. The timezone of the box has not been changed. This happens every time flower is run against celery.


Reply to this email directly or view it on GitHub.

@ashchristopher
Copy link
Author

This is currently running on my local machine. When I run
celery.utils.timeutils.utcoffset() it returns 0.
On Jan 21, 2014 9:25 AM, "Ask Solem Hoel" notifications@github.com wrote:

You should find out if celery.utils.timeutils.utcoffset() returns the
correct value for each node

Sent from my iPhone

On Jan 20, 2014, at 5:02 PM, Ash Christopher notifications@github.com
wrote:

[vagrant@vagrant wave_drop]$ pip freeze
...
celery==3.1.8
flower==0.6.0
...
I am running the latest celery. The timezone of the box has not been
changed. This happens every time flower is run against celery.


Reply to this email directly or view it on GitHub.


Reply to this email directly or view it on GitHubhttps://github.com//issues/1802#issuecomment-32888911
.

@skyl
Copy link

skyl commented Jan 22, 2014

I'm having a similar issue. I'm passing in my whole Django settings.

Celery == 3.1.8
Django == 1.6.X
Flower == 0.6.0

I'm in PST.

[1] % date
Wed Jan 22 15:00:48 PST 2014

In my Django settings, which I'm passing to celery with --config='mysettings', I have:

TIME_ZONE = "America/New_York"

I start flower with the same config:

Current drift is
    10800 seconds.  [orig: 2014-01-22 15:03:12.572711 recv: 2014-01-22 12:03:12.571017]

Even when I try to set:

CELERY_ENABLE_UTC = True
CELERY_TIMEZONE = "UTC"

The drift exists. However, if I set the TIME_ZONE in my Django settings to America/Los_Angeles, the drift is gone. It seems that the celery worker, rather than flower is the one sensitive to the changing TIME_ZONE setting.

@ashchristopher
Copy link
Author

I've pasted the relevant info from my settings file.

TIME_ZONE = 'UTC'
USE_TZ = True

CELERY_ENABLE_UTC = True
CELERY_TIMEZONE = "UTC"

I am in Toronto ("America/Toronto").

@ask
Copy link
Contributor

ask commented Jan 23, 2014

I wonder if maybe Django modifies the timezone in the time module.

@ask
Copy link
Contributor

ask commented Jan 23, 2014

Note that the event timestamps does not use TIME_ZONE, CELERY_ENABLE_UTC or any of the datetime related settings, as it's using the local timezone of the machine only.

@lukesneeringer
Copy link

We're having this problem as well.

Oh, and it absolutely spams logging output to Loggly, and a ton of bandwidth out from Rackspace. :( Does this really need to be logged on every single task, rather than at worker startup?

Please just give us a way to turn this message off.

@vlcinsky
Copy link

I have the same problem.
Running two servers at Digital Ocean with Ubuntu 12.04.3 x64

  • turbo machine runs Redis
  • archiver machine runs just workers and are being connected over SSH tunnel with port forwarding (to see redis port on localhost).

The tasks are rather long running, at least 300 seconds each, some are multiple times longer. None task runs longer then 90 minutes.

The "substantial drift" is reported only on that archiver machine.

Using redis 2.2.12 (so a bit older - using the package which comes with my Ubuntu).

Load on both machines is about 8 to 10 - running 3 workers on turbo and 4-5 workers on archiver. Each worker has --autoscale=5,3. The task is is I/O intensive, reading data from AWS S3 buckets and posting them back (so generally HTTP traffic), sometime posting 1GB large file (in smaller chunks).

@NotSqrt
Copy link
Contributor

NotSqrt commented Apr 10, 2014

I also get a Substantial drift issue with the following setup:

celery==3.1.10, flower==0.6.0
My machine is on the "Europe/Paris" timezone.

On a regular python shell:

import time
def utcoffset():
    if time.daylight:
        return time.altzone // 3600
    return time.timezone // 3600

returns -2 (we are indeed at UTC+2 at the moment).

With a celery shell, and parameters CELERY_ENABLE_UTC = True and CELERY_TIMEZONE = 'UTC', from celery.utils.timeutils import utcoffset; utcoffset() gives 1.

Events in celery events -d report utcoffset=1

But Flower reports:
Substantial drift from celery@hostname may mean clocks are out of sync. Current drift is 3600 seconds. [orig: 2014-04-10 14:07:51.240687 recv: 2014-04-10 15:07:51.238797]

There does not seem to be any effect if I switch CELERY_TIMEZONE to something else..

I am not particularly annoyed by this, I'm just trying to understand !

@ask
Copy link
Contributor

ask commented Apr 10, 2014

@NotSqrt Are you using Django? I seem to remember seing Django having some side effect on timezones, maybe it does some patching of sorts?

@NotSqrt
Copy link
Contributor

NotSqrt commented Apr 10, 2014

I use Django, so yes, that might be it.

@NotSqrt
Copy link
Contributor

NotSqrt commented Apr 10, 2014

Okay : If I set django parameter TIME_ZONE to the timezone of my machine, the Substantial drift message disappears. If I set it to something else (UTC, Pacific/Tahiti or somewhere else), Flower reports a current drift of 3600 seconds all the time..

@ask
Copy link
Contributor

ask commented May 30, 2014

It's definitely Django that is doing this:

$ python manage.py shell
>>> from django.conf import settings
>>> settings.TIME_ZONE
'America/Chicago'
>>> import time, datetime
>>> time.timezone
21600
>>> time.altzone
18000
>>> str(datetime.datetime.fromtimestamp(time.time()))
>>> '2014-05-30 07:48:39.947606'

Without Django:

 $ python
 >>> import time, datetime
 >>> time.timezone
 0
 >>> time.altzone
 -3600
 >>> str(datetime.datetime.fromtimestamp(time.time()))
 '2014-05-30 13:49:38.935562'

So django is changing these somehow

@wolever
Copy link

wolever commented Sep 17, 2014

I should note (for future people who find this issue by googling) is that it can also come up when using 3.1.9 and 3.1.10 together:

>>> import celery
>>> celery.__version__
'3.1.9'
>>> from celery.utils import timeutils
>>> timeutils.utcoffset()
-4

Versus:

>>> import celery
>>> celery.__version__
'3.1.10'
>>> from celery.utils import timeutils
>>> timeutils.utcoffset()
4

@ask
Copy link
Contributor

ask commented Sep 18, 2014

@wolever, you need to upgrade to a more recent version I think, it was probably fixed later than 3.1.9 at least.

@ask
Copy link
Contributor

ask commented Sep 18, 2014

But right, yeah that you are mixing versions may not be immediately obvious, so good idea to remind people to verify that they have upgraded all the hosts :)

@pcompassion
Copy link

I still see this

USE_TZ = True
TIME_ZONE = 'Asia/Seoul'
CELERY_ENABLE_UTC = True
CELERY_TIMEZONE = 'Asia/Seoul'

celery version 3.1.18
flower 0.8.2

when I run 'date' in my ubuntu box
it shows time in UTC, => I see substantial drift

another ubuntu box with timezone KST => no substantial drift message

@thedrow
Copy link
Member

thedrow commented Jul 18, 2015

@pcompassion Is ntpd configured correctly?

@pcompassion
Copy link

      1. (월) 17:11:06 KST (ubuntu box which shows no warning message)
      1. (월) 08:11:45 UTC (ubuntu box with error message)

They look correct.

@antonagestam
Copy link

I'm still seeing this as well:

WARNING 2015-07-29 15:50:26,590 state 20212 140561769498432 Substantial drift from celery@together-dev may mean clocks are out of sync.  Current drift is
7200 seconds.  [orig: 2015-07-29 15:50:26.504807 recv: 2015-07-29 13:50:26.503507]

Server is UTC with these settings:

CELERY_ENABLE_UTC = True
TIME_ZONE = 'Europe/Stockholm'
USE_TZ = True

@ask
Copy link
Contributor

ask commented Oct 22, 2015

Please include celery version, I fixed an issue with this a long time ago

@bowlofeggs
Copy link

I believe that the following line is the offender:

if time.daylight:

According to the Python documentation, time.daylight is non-zero if a DST is defined:

https://docs.python.org/2/library/time.html#time.daylight

However, Celery is treating that value as if it means whether the local time is currently in daylight savings time or standard time. I have a machine set to the America/New_York time zone right now and time.daylight returns 1 even though we are currently in EST not EDT. This causes the utcoffset() method to be off by one in the winter here, which causes adjust_timestamp to be wrong by one hour.

To fix this, I suggest dropping the usage of local time everywhere in Celery and use UTC only for all timestamps. I believe this will greatly simplify things!

@vlcinsky
Copy link

vlcinsky commented Dec 4, 2015

NIce catch @rbarlow .
I completely agree with using only UTC. It helps to have something constant when dealing with moving targets like a time (even when it is only timezone offset). I am dealing with traffic information being created and processed in multiple timezones and without keeping timestamps in UTC it would be one big mess.
On the other hand, someone has to dive into the code to do that and this can be challenging amount of work (probably resulting in better working solution being written in less lines of code).

bowlofeggs pushed a commit to bowlofeggs/pulp that referenced this issue Dec 4, 2015
Pulp had been using the timestamp attribute of Celery's Events to determine when the workers were most
recently available. Due to a bug in Celery, this cause an off-by-one-hour bug for users who live in a locale
that does daylight savings time that is not currently observing daylight savings time (such as New York in
the winter):

celery/celery#1802 (comment)

This fixes the issue by using the local_received time which is computed in the receiver by Celery with
time.time() without any timezone shifting attempts.

https://pulp.plan.io/issues/1380

fixes pulp#1380
@bowlofeggs
Copy link

I believe this was fixed by 85fbe12

@ask
Copy link
Contributor

ask commented Jan 26, 2016

And this patch is part of 3.1.20, so please try upgrading :)

The events don't actually use timestamps for anything but displaying the human time of the event, as fallback ordering, so I just tried finding the simplest backward compatible solution that I could find.

Using pytz/datetime added lots of overhead which was a problem when we have to process 4 times as many events, as there are tasks in the cluster.

@fillest
Copy link

fillest commented Mar 2, 2016

On 3.1.20 I sometimes get e.g. Substantial drift from celery@saver may mean clocks are out of sync. Current drift is 17 seconds. [orig: 2016-03-02 20:31:10.494412 recv: 2016-03-02 20:30:53.695555] too (as on 3.1.19). Host timezone is UTC. ntp is installed and ntpdate -q pool.ntp.org shows offset 0.002794 sec

@Aug-G
Copy link

Aug-G commented Mar 30, 2016

me too ! @fillest +1

@thedrow
Copy link
Member

thedrow commented Mar 31, 2016

@ask This won't go away eh?

@webmakin
Copy link

adding,

USE_TZ = False
TIME_ZONE = 'Asia/Kolkata'

to django settings and adding,
CELERY_TIMEZONE = 'Asia/Kolkata'
then restarting celery and uwsgi, solved the issue for me

@ask
Copy link
Contributor

ask commented Jun 23, 2016

It doesn't matter if the clocks are synced by NTP actually. The clock drift is based on the timestamp in the message, and the time we received it.

So if the worker is not able to keep up with the events, and process them too late, the drift warning will be produced.

You should set the CELERY_EVENT_QUEUE_TTL, CELERY_EVENT_QUEUE_EXPIRES setting to avoid this (these will be enabled by default in 4.0)

@ask ask closed this as completed Jun 23, 2016
antoine-de added a commit to antoine-de/tartare that referenced this issue Aug 1, 2016
we have some strange 'clock out of sync' problem between the workers
because of the timezones of the different machine

seems to be corrected in newer celery version:
celery/celery#1802
@w-A-L-L-e
Copy link

We also noticed this to happen today with celery version 3.1.23 !!!
celery (3.1.23)

2017-01-25 11:36:07,506: WARNING/MainProcess] Substantial drift from celery@Walters-MacBook-Pro.local may mean clocks are out of sync. Current drift is
842 seconds. [orig: 2017-01-25 11:36:07.015840 recv: 2017-01-25 11:22:05.133577]

@thedrow
Copy link
Member

thedrow commented Jan 30, 2017

Did you try upgrading to 4.x?

@SergiiShcherbak
Copy link

I had the same issue, but changing $ celery flower to $ celery proj flower fixed it.

@maxandron
Copy link

maxandron commented Jun 21, 2017

Hi,
I have the same issue unfortunately...

Celery version: 4.0.2
Django version: 1.10.6

USE_TZ = True
TIME_ZONE = 'Israel'
enable_utc = True
timezone = 'Israel'

[2017-06-21 14:52:59,483: WARNING/MainProcess] Substantial drift from celery@2fcd73b4-6bf3-4f39-8e8b-f157d4548bf5 may mean clocks are out of sync. Current drift is 10800 seconds. [orig: 2017-06-21 14:52:59.483377 recv: 2017-06-21 17:52:59.470257]

@thedrow
Copy link
Member

thedrow commented Jul 9, 2017

@maxandron The information provided isn't enough to debug this issue. Please open a new issue with the necessary information for us to debug this problem.

@maxandron
Copy link

@thedrow Thank you for the reply, what kind of information will be helpful?

@thedrow
Copy link
Member

thedrow commented Jul 11, 2017

@maxandron See the checklist when opening a new issue.
Also, please include any ntpd configuration you have.

@hotea
Copy link

hotea commented Sep 1, 2018

I got this warning and can't fix with set timezone, finally I found that I only restart the worker but didn't restart the web app...

@armadadean
Copy link

armadadean commented Aug 17, 2019

It doesn't matter if the clocks are synced by NTP actually. The clock drift is based on the timestamp in the message, and the time we received it.

So if the worker is not able to keep up with the events, and process them too late, the drift warning will be produced.

You should set the CELERY_EVENT_QUEUE_TTL, CELERY_EVENT_QUEUE_EXPIRES setting to avoid this (these will be enabled by default in 4.0)

@ask . How to ensure that the worker can keep up with the events? This happened to me when I scaled my worker to 100 and up.. Should I increase the resource of my RabbitMQ that serves as a broker on my celery workers?

I am using celery 4.2.0

@thedrow
Copy link
Member

thedrow commented Aug 18, 2019

Please use the mailing list, IRC or stack overflow for such questions.

@celery celery locked as off-topic and limited conversation to collaborators Aug 18, 2019
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

No branches or pull requests