Duplicate month name in AutoDateLocator on DST timezones #2737

Closed
nepix32 opened this issue Jan 16, 2014 · 4 comments

@nepix32
Contributor

nepix32 commented Jan 16, 2014

Problem

When executing the following snippet on matplotlib 1.3.1, the month name November shows up twice in a plot_date plot that uses a timezone with daylight saving time (the second November tick sits at November 30, 23:00: almost December, but not quite). This is misleading when interpreting the plot, because events after the December tick are already in January.

# -*- coding: utf-8 -*-

import matplotlib.pyplot as plt

import pytz
import datetime
from dateutil.rrule import rrule, MONTHLY

mytz = pytz.timezone('America/Los_Angeles')

startdt = datetime.datetime(2013, 8, 1, 0, 0) # start during DST period 

# create positions where monthly ticks should be
myrule = rrule(MONTHLY, dtstart = startdt, count = 6)

localizer = lambda dt: mytz.localize(dt)
localized = [localizer(el) for el in myrule] # convert naive to timezone

plt.plot_date(localized, [1 for el in localized], '+-') # shows month name November twice
plt.grid(color = 'k', which = 'major', linestyle = ':', linewidth = 0.5)
plt.tight_layout()

plt.show()

Reason

In the example above, the proper tick positions are produced when using a naive datetime as startdt.

The reason is presumably that dateutil.rrule produces "wrong" dates for the tick locations when startdt is already localized and the results are normalized afterwards. When doing something along the lines of

myrule = rrule(MONTHLY, dtstart = mytz.localize(startdt), count = 6)
print([mytz.normalize(dt) for dt in myrule])

the datetimes after the switch back to standard time in early November are off by the DST offset (which is not what I want for the tick positions).

Could the same thing be happening inside matplotlib?

@pelson
Member

pelson commented Jan 17, 2014

Definitely the same bug and should be fixed upstream at dateutil.

As a workaround, it might be possible to override your rrule to check the distance from the next month and if it is within an hour, tick over to that. I don't think that is something we can add to the underlying AutoDateLocator though.
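A minimal sketch of that "tick over" idea, using only the standard library. The helper name `snap_to_month_start` and its `tolerance` parameter are hypothetical and not part of matplotlib or dateutil. It compares wall-clock values only, so a pytz-aware result would still need to be re-localized afterwards.

```python
from datetime import datetime, timedelta


def snap_to_month_start(dt, tolerance=timedelta(hours=1)):
    """Return the start of the next month if dt lies within `tolerance` before it.

    Hypothetical helper: works on the wall-clock fields of dt and keeps
    whatever tzinfo dt already carries.
    """
    # First instant of the month that follows dt.
    if dt.month == 12:
        next_month = dt.replace(year=dt.year + 1, month=1, day=1,
                                hour=0, minute=0, second=0, microsecond=0)
    else:
        next_month = dt.replace(month=dt.month + 1, day=1,
                                hour=0, minute=0, second=0, microsecond=0)

    # Only move ticks that sit just before the month boundary.
    if timedelta(0) < next_month - dt <= tolerance:
        return next_month
    return dt


# A tick that landed on November 30, 23:00 is moved to December 1, 00:00.
print(snap_to_month_start(datetime(2013, 11, 30, 23, 0)))
```

Applying such a function to every value an rrule yields would hide the one-hour shift, at the cost of hard-coding an assumption about how large a DST offset can be.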

I believe this is a wont-fix at the matplotlib level, but thank you for making the problem easy to reproduce and isolating the problem to the rrule function itself.

Please feel free to clarify if I've missed something glaring.

All the best,

@pelson pelson closed this as completed Jan 17, 2014
@nepix32
Contributor Author

nepix32 commented Jan 17, 2014

I am not so sure that dateutil.rrule should receive the full blame. It does exactly what it has been asked to do: return MONTHLY occurrences. However, what that actually means is ambiguous, depending on the dtstart parameter:

  • Case 1, starting with a naive datetime: Not ambiguous, simply give me the first of each month, starting at 0:00
  • Case 2, starting with a localized datetime, for example in PDT (Pacific Daylight Time): Ambiguous. What we asked for is: give me the first of each month, at 0:00 in PDT! Of course, after normalization this becomes 23:00 PST on the previous day during the non-DST period, which is correct (!).
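The two cases can be compared side by side with a short sketch. It reuses the timezone and start date from the snippet in the original report and assumes pytz and dateutil are installed; the names `case1` and `case2` are only for illustration.

```python
import datetime

import pytz
from dateutil.rrule import rrule, MONTHLY

mytz = pytz.timezone('America/Los_Angeles')
startdt = datetime.datetime(2013, 8, 1, 0, 0)  # naive, falls in the DST period

# Case 1: recur on naive datetimes, attach the timezone afterwards.
case1 = [mytz.localize(dt)
         for dt in rrule(MONTHLY, dtstart=startdt, count=6)]

# Case 2: recur on an already-localized start, normalize afterwards.
# rrule reuses the start's fixed PDT offset for every occurrence.
case2 = [mytz.normalize(dt)
         for dt in rrule(MONTHLY, dtstart=mytz.localize(startdt), count=6)]

for naive_first, aware_first in zip(case1, case2):
    print(naive_first.isoformat(), '|', aware_first.isoformat())
```

The first four rows agree. For the December occurrence, case 1 gives 2013-12-01T00:00:00-08:00 while case 2 gives 2013-11-30T23:00:00-08:00, which is the duplicated November tick.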

The maintainers of rrule should probably clarify what they expect the correct answer to be for a localized datetime in a zone with daylight saving time.

In my opinion, matplotlib.dates AutoDateLocator should ask for ticks in naive datetime (see case 1 above). The first of a month at 0:00 as naive datetime will always be the first of a month, 0:00 in all possible timezone representations! Unfortunately I do not know matplotlib well enough to understand how to pull that off.

See the following dirty hack, which is incorrect but works around the issue so that the plot is at least usable. Omitting the timezone when creating the AutoDateLocator makes it fall back to UTC, so that the month names are what they should be.

# -*- coding: utf-8 -*-

import matplotlib.pyplot as plt

import pytz
import datetime
from dateutil.rrule import rrule, MONTHLY

from matplotlib.dates import AutoDateLocator
from matplotlib.dates import AutoDateFormatter

mytz = pytz.timezone('America/Los_Angeles')

startdt = datetime.datetime(2013, 8, 1, 0, 0) # start during DST period 

# create positions where monthly ticks should be
myrule = rrule(MONTHLY, dtstart = startdt, count = 6)

localizer = lambda dt: mytz.localize(dt)
localized = [localizer(el) for el in myrule] # convert naive to timezone

plt.plot_date(localized, [1 for el in localized], '+-') # month names correct, but ticks are 8 hours off (PST/PDT - UTC) 
plt.grid(color = 'k', which = 'major', linestyle = ':', linewidth = 0.5)
loc = AutoDateLocator() # tz is missing, meaning UTC
plt.gca().xaxis.set_major_locator(loc)

# tz has to be specified here, otherwise labelling is incorrect when zooming in to hourly resolution  
plt.gca().xaxis.set_major_formatter(AutoDateFormatter(loc, tz = mytz))

plt.tight_layout()

plt.show()

It would be very nice if this issue could be reopened so that it stays visible until the best possible solution has been found.

@stefs

stefs commented Oct 26, 2015

To work around this bug, I wrote my own (correct, as far as I can tell) implementation of datetime tick locators. They are not as generic as the original ones, because I tailored them to my use case. They can be adjusted easily, though. The formatter is not changed. I'm posting them here with the hope that someone might find them useful.

import dateutil.rrule

def month_locator(start, end, tz):
    lower = start.astimezone(tz).date().replace(day=1)
    upper = end.astimezone(tz).date()
    rule = dateutil.rrule.rrule(dateutil.rrule.MONTHLY,
                                dtstart=lower, until=upper)
    return [tz.localize(dt) for dt in rule if start <= tz.localize(dt) <= end]

def week_locator(start, end, tz):
    lower = start.astimezone(tz).date()
    upper = end.astimezone(tz).date()
    rule = dateutil.rrule.rrule(dateutil.rrule.WEEKLY,
                                byweekday=dateutil.rrule.MO,
                                dtstart=lower, until=upper)
    return [tz.localize(dt) for dt in rule if start <= tz.localize(dt) <= end]

def day_locator(start, end, tz):
    lower = start.astimezone(tz).date()
    upper = end.astimezone(tz).date()
    rule = dateutil.rrule.rrule(dateutil.rrule.DAILY,
                                dtstart=lower, until=upper)
    return [tz.localize(dt) for dt in rule if start <= tz.localize(dt) <= end]

start and end should be timezone-aware datetime instances, and tz should be a pytz timezone. Intended usage in matplotlib:

ax.set_xticks(month_locator(frame_start, frame_end, TIMEZONE), minor=False)

Reference to dateutil issue: dateutil/dateutil#102

@pganssle
Member

My plan moving forward is to see what can be done to help this situation from the dateutil side, but I'm honestly not sure that there's much that can be done. See the following example:

>>> from datetime import datetime, timedelta
>>> from dateutil import tz
>>> import pytz
>>> 
>>> relocalize = lambda dt, tzinfo: dt.astimezone(pytz.UTC).astimezone(tzinfo)
>>> 
>>> BERLIN = pytz.timezone('Europe/Berlin')
>>> 
>>> dt1 = BERLIN.localize(datetime(2015, 10, 8, 19))
>>> dt2 = dt1 + timedelta(days=31)
>>> dt3 = dt1.replace(month=11)
>>> 
>>> print(relocalize(dt1, BERLIN))
2015-10-08 19:00:00+02:00
>>> print(relocalize(dt2, BERLIN))
2015-11-08 18:00:00+01:00
>>> print(relocalize(dt3, BERLIN))
2015-11-08 18:00:00+01:00
>>> 
>>> print(dt3 - dt1)
31 days, 0:00:00
>>> BERLIN2 = tz.gettz('Europe/Berlin')
>>> dt1 = datetime(2015, 10, 8, 19, tzinfo=BERLIN2)
>>> dt2 = dt1 + timedelta(days=31)
>>> dt3 = dt1.replace(month=11)
>>> 
>>> print(relocalize(dt1, BERLIN2))
2015-10-08 19:00:00+02:00
>>> print(relocalize(dt2, BERLIN2))
2015-11-08 19:00:00+01:00
>>> print(relocalize(dt3, BERLIN2))
2015-11-08 19:00:00+01:00 
>>> print(dt3 - dt1)
31 days, 0:00:00

The problem arises with the caching behavior of the UTC offset in pytz and how it deals with arithmetic on dates as they cross DST boundaries. pytz even acknowledges that it's a problem on their website:

This library differs from the documented Python API for tzinfo implementations; if you want to create local wallclock times you need to use the localize() method documented in this document. In addition, if you perform date arithmetic on local times that cross DST boundaries, the result may be in an incorrect timezone (ie. subtract 1 minute from 2002-10-27 1:00 EST and you get 2002-10-27 0:59 EST instead of the correct 2002-10-27 1:59 EDT). A normalize() method is provided to correct this. Unfortunately these issues cannot be resolved without modifying the Python datetime implementation (see PEP-431).
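The case described in that quote can be reproduced with a few lines. This is a sketch assuming pytz is installed; the variable names are illustrative.

```python
from datetime import datetime, timedelta

import pytz

eastern = pytz.timezone('US/Eastern')

# 2002-10-27 01:00 occurs twice; is_dst=False selects the EST occurrence.
one_am_est = eastern.localize(datetime(2002, 10, 27, 1, 0), is_dst=False)

# Plain arithmetic keeps the stale EST offset attached to the result.
stale = one_am_est - timedelta(minutes=1)

# normalize() recomputes the offset that is actually valid at that instant.
fixed = eastern.normalize(stale)

print(stale.strftime('%Y-%m-%d %H:%M %Z'))  # 2002-10-27 00:59 EST
print(fixed.strftime('%Y-%m-%d %H:%M %Z'))  # 2002-10-27 01:59 EDT
```

Both values denote the same instant; only the wall-clock label differs, which is exactly what shifts an rrule tick across a month boundary.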

This seems likely to be a wontfix in dateutil, and based on that disclaimer, it's likely to be a wontfix in pytz as well, as both libraries are acting as expected, so downstream consumers should be aware of the implications of the tzinfo objects they are using.

Dateutil timezones work as expected (they calculate the UTC offset more lazily than pytz does), so if you can, just use those. If you need to use pytz timezones, they should be normalized and/or localized after rrule has been calculated.
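Both suggestions can be sketched as follows, assuming pytz and dateutil are installed. The timezone and start date are borrowed from the original report, and the variable names are illustrative rather than part of either library.

```python
from datetime import datetime

import pytz
from dateutil import tz
from dateutil.rrule import rrule, MONTHLY

start = datetime(2013, 8, 1)  # naive first-of-month, during DST

# Option 1: a dateutil timezone works out the UTC offset per datetime,
# so it can be attached directly to dtstart.
la_dateutil = tz.gettz('America/Los_Angeles')
ticks_dateutil = list(
    rrule(MONTHLY, dtstart=start.replace(tzinfo=la_dateutil), count=6))

# Option 2: with pytz, run the rule on naive datetimes and localize
# each occurrence only after rrule has produced it.
la_pytz = pytz.timezone('America/Los_Angeles')
ticks_pytz = [la_pytz.localize(dt)
              for dt in rrule(MONTHLY, dtstart=start, count=6)]

for a, b in zip(ticks_dateutil, ticks_pytz):
    print(a.isoformat(), '|', b.isoformat())
```

In both lists every tick falls on the first of the month at 00:00 local time, with the offset changing from -07:00 to -08:00 once DST has ended.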
