tzfile reader only reads 32-bit (version 0) zoneinfo files #462

Open
pganssle opened this issue Sep 20, 2017 · 4 comments · May be fixed by #1130

Comments

@pganssle
Member

pganssle commented Sep 20, 2017

The output of zic includes a 32-bit legacy section and a 64-bit section that encodes the transition times as 64-bit epoch times. Currently we're only reading the first one. What we should do is read the zoneinfo header to determine how long the first section is, seek to the end of that section and see if there's a second "version 2" section, and if so read that one instead, otherwise go back and finish reading the 32-bit version.
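The steps described above can be sketched roughly as follows. This is not dateutil's code: the function and helper names are hypothetical, the layout constants follow the published TZif format (RFC 8536), and the sketch decides from the header's version byte whether a second section exists rather than probing for it and seeking back.

```python
import io
import struct

# TZif header: magic, version byte, 15 reserved bytes, six unsigned 32-bit counts.
_HEADER = struct.Struct(">4sc15x6L")
_COUNT_NAMES = ("isutcnt", "isstdcnt", "leapcnt", "timecnt", "typecnt", "charcnt")


def _read_header(f):
    magic, version, *counts = _HEADER.unpack(f.read(_HEADER.size))
    if magic != b"TZif":
        raise ValueError("not a TZif file")
    return version, dict(zip(_COUNT_NAMES, counts))


def _body_size(c, time_size):
    """Size in bytes of the data block that follows a header."""
    return (c["timecnt"] * (time_size + 1)      # transition times + type indices
            + c["typecnt"] * 6                  # ttinfo records
            + c["charcnt"]                      # abbreviation strings
            + c["leapcnt"] * (time_size + 4)    # leap-second records
            + c["isstdcnt"] + c["isutcnt"])     # std/wall and UT/local flags


def read_transition_times(f):
    """Return (version, transition_times), preferring the 64-bit block."""
    version, counts = _read_header(f)
    time_size, code = 4, "l"
    if version != b"\x00":
        # Skip the 32-bit block; a second header and the 64-bit block follow it.
        f.seek(_body_size(counts, 4), io.SEEK_CUR)
        version, counts = _read_header(f)
        time_size, code = 8, "q"
    raw = f.read(counts["timecnt"] * time_size)
    return version, list(struct.unpack(">%d%s" % (counts["timecnt"], code), raw))
```

Called on a file opened in binary mode, this returns the 64-bit transition times when the file has a version 2 or later section, and the legacy 32-bit ones otherwise.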

This will have almost no real effect at the moment.

@ryanpetrello
Contributor

ryanpetrello commented May 10, 2018

@pganssle would this issue explain why dates beyond the year 2038 seem to have offset discrepancies for certain timezones? For example:

for x in rrulestr('DTSTART;TZID=America/Sao_Paulo:20200101T000001 RRULE:FREQ=YEARLY;INTERVAL=1;COUNT=20').xafter(now):
    print(x)
2020-01-01 00:00:01-02:00
2021-01-01 00:00:01-02:00
2022-01-01 00:00:01-02:00
2023-01-01 00:00:01-02:00
2024-01-01 00:00:01-02:00
2025-01-01 00:00:01-02:00
2026-01-01 00:00:01-02:00
2027-01-01 00:00:01-02:00
2028-01-01 00:00:01-02:00
2029-01-01 00:00:01-02:00
2030-01-01 00:00:01-02:00
2031-01-01 00:00:01-02:00
2032-01-01 00:00:01-02:00
2033-01-01 00:00:01-02:00
2034-01-01 00:00:01-02:00
2035-01-01 00:00:01-02:00
2036-01-01 00:00:01-02:00
2037-01-01 00:00:01-02:00
2038-01-01 00:00:01-03:00           # <-----------
2039-01-01 00:00:01-03:00

@pganssle
Member Author

@ryanpetrello Yes. Past the last transition in the 32-bit data, dateutil reverts to STD and pytz holds the last value. The correct thing to do is to create something like a fallback tzstr object from the 64-bit data.

@md-magenta

md-magenta commented May 1, 2019

I hit this or a related bug today while working on some failing unit tests. The tests used the years 1900, 2000, 2100 and so on to get widely spaced dates. datetime.datetime(1901,1,1, tzinfo=dateutil.tz.gettz('Europe/Copenhagen')) returns different (and wrong) datetimes depending on the system.

# macOS
md@macos> python3 -c "import datetime, dateutil.tz; print(datetime.datetime(1901,1,1, tzinfo=dateutil.tz.gettz('Europe/Copenhagen')))"
1901-01-01 00:00:00+01:00

> zdump -v "Europe/Copenhagen" | head -n 5
Europe/Copenhagen  Fri Dec 13 20:45:52 1901 UTC = Fri Dec 13 21:45:52 1901 CET isdst=0
Europe/Copenhagen  Sat Dec 14 20:45:52 1901 UTC = Sat Dec 14 21:45:52 1901 CET isdst=0
Europe/Copenhagen  Sun May 14 21:59:59 1916 UTC = Sun May 14 22:59:59 1916 CET isdst=0
Europe/Copenhagen  Sun May 14 22:00:00 1916 UTC = Mon May 15 00:00:00 1916 CEST isdst=1
Europe/Copenhagen  Sat Sep 30 20:59:59 1916 UTC = Sat Sep 30 22:59:59 1916 CEST isdst=1

# docker image based on Debian
md@docker> docker run -it python:3.7 bash
> pip3 install python-dateutil
#
> python3 -c "import datetime, dateutil.tz; print(datetime.datetime(1901,1,1, tzinfo=dateutil.tz.gettz('Europe/Copenhagen')))"
1901-01-01 00:00:00+00:50:20

> python3 -c "import datetime
import dateutil.tz
tzinfo = dateutil.tz.gettz('Europe/Copenhagen')
for trans in tzinfo._trans_list_utc[:5]:
    print(datetime.datetime.utcfromtimestamp(trans))"

1901-12-13 20:45:52 # Should have been around 1894-01-01
1916-05-14 22:00:00
1916-09-30 21:00:00
1940-05-14 23:00:00
1942-11-02 01:00:00

> zdump -v "Europe/Copenhagen" | head
Europe/Copenhagen  -9223372036854775808 = NULL
Europe/Copenhagen  -9223372036854689408 = NULL
Europe/Copenhagen  Tue Dec 31 23:09:39 1889 UT = Tue Dec 31 23:59:59 1889 LMT isdst=0 gmtoff=3020
Europe/Copenhagen  Tue Dec 31 23:09:40 1889 UT = Wed Jan  1 00:00:00 1890 CMT isdst=0 gmtoff=3020
Europe/Copenhagen  Sun Dec 31 23:09:39 1893 UT = Sun Dec 31 23:59:59 1893 CMT isdst=0 gmtoff=3020
Europe/Copenhagen  Sun Dec 31 23:09:40 1893 UT = Mon Jan  1 00:09:40 1894 CET isdst=0 gmtoff=3600
Europe/Copenhagen  Sun May 14 21:59:59 1916 UT = Sun May 14 22:59:59 1916 CET isdst=0 gmtoff=3600
Europe/Copenhagen  Sun May 14 22:00:00 1916 UT = Mon May 15 00:00:00 1916 CEST isdst=1 gmtoff=7200
Europe/Copenhagen  Sat Sep 30 20:59:59 1916 UT = Sat Sep 30 22:59:59 1916 CEST isdst=1 gmtoff=7200
Europe/Copenhagen  Sat Sep 30 21:00:00 1916 UT = Sat Sep 30 22:00:00 1916 CET isdst=0 gmtoff=3600

md-magenta added a commit to OS2mo/os2mo that referenced this issue May 2, 2019
dateutil/dateutil#462

Dates before 1901-12-13T20:45:52Z get a wrong timezone offset.
1901-12-13T20:45:52Z is the Unix time -2147483648 = −2^(31). For some reason
the internal structure of `dateutil.tz.gettz()` does not correctly handle this
and gives the first available offset for a given timezone. For the case of
`Europe/Copenhagen` it is an offset of +00:50:20 (3020 s) if the tzdb is
sufficiently updated with historical data.
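The arithmetic behind these numbers can be checked with the standard library alone. The following is a small sketch in which `from_unix` is a hypothetical helper name; it shows the range of a signed 32-bit timestamp and converts the 3020-second offset into hours, minutes and seconds.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def from_unix(seconds):
    """Convert a Unix timestamp to an aware UTC datetime (works for negative values)."""
    return EPOCH + timedelta(seconds=seconds)


earliest = from_unix(-2**31)          # smallest signed 32-bit timestamp
latest = from_unix(2**31 - 1)         # largest signed 32-bit timestamp
lmt_offset = timedelta(seconds=3020)  # gmtoff=3020 from the zdump output

print(earliest.isoformat())  # 1901-12-13T20:45:52+00:00
print(latest.isoformat())    # 2038-01-19T03:14:07+00:00
print(lmt_offset)            # 0:50:20
```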

> python3 -c "import datetime
import dateutil.tz
tzinfo = dateutil.tz.gettz('Europe/Copenhagen')
for trans in tzinfo._trans_list_utc[:10]:
    print(datetime.datetime.utcfromtimestamp(trans))"

1901-12-13 20:45:52 # Should have been around 1894-01-01
1916-05-14 22:00:00
1916-09-30 21:00:00
1940-05-14 23:00:00
1942-11-02 01:00:00
1943-03-29 01:00:00
1943-10-04 01:00:00
1944-04-03 01:00:00
1944-10-02 01:00:00
1945-04-02 01:00:00

> zdump -v /usr/share/zoneinfo/Europe/Copenhagen | head

/usr/share/zoneinfo/Europe/Copenhagen  -9223372036854775808 = NULL
/usr/share/zoneinfo/Europe/Copenhagen  -9223372036854689408 = NULL
/usr/share/zoneinfo/Europe/Copenhagen  Tue Dec 31 23:09:39 1889 UT = Tue Dec 31 23:59:59 1889 LMT isdst=0 gmtoff=3020
/usr/share/zoneinfo/Europe/Copenhagen  Tue Dec 31 23:09:40 1889 UT = Wed Jan  1 00:00:00 1890 CMT isdst=0 gmtoff=3020
/usr/share/zoneinfo/Europe/Copenhagen  Sun Dec 31 23:09:39 1893 UT = Sun Dec 31 23:59:59 1893 CMT isdst=0 gmtoff=3020
/usr/share/zoneinfo/Europe/Copenhagen  Sun Dec 31 23:09:40 1893 UT = Mon Jan  1 00:09:40 1894 CET isdst=0 gmtoff=3600
/usr/share/zoneinfo/Europe/Copenhagen  Sun May 14 21:59:59 1916 UT = Sun May 14 22:59:59 1916 CET isdst=0 gmtoff=3600
/usr/share/zoneinfo/Europe/Copenhagen  Sun May 14 22:00:00 1916 UT = Mon May 15 00:00:00 1916 CEST isdst=1 gmtoff=7200
/usr/share/zoneinfo/Europe/Copenhagen  Sat Sep 30 20:59:59 1916 UT = Sat Sep 30 22:59:59 1916 CEST isdst=1 gmtoff=7200
/usr/share/zoneinfo/Europe/Copenhagen  Sat Sep 30 21:00:00 1916 UT = Sat Sep 30 22:00:00 1916 CET isdst=0 gmtoff=3600

We will hit a similar problem at year 2038 if the upstream bug is not solved by then.

> python3 -c "import datetime
import dateutil.tz
tzinfo = dateutil.tz.gettz('Europe/Copenhagen')
for trans in tzinfo._trans_list_utc[-10:]:
    print(datetime.datetime.utcfromtimestamp(trans))"

2033-03-27 01:00:00
2033-10-30 01:00:00
2034-03-26 01:00:00
2034-10-29 01:00:00
2035-03-25 01:00:00
2035-10-28 01:00:00
2036-03-30 01:00:00
2036-10-26 01:00:00
2037-03-29 01:00:00
2037-10-25 01:00:00
md-magenta added a commit to OS2mo/os2mo that referenced this issue May 9, 2019
dateutil/dateutil#462

md-magenta added a commit to OS2mo/os2mo that referenced this issue May 23, 2019
dateutil/dateutil#462

nickgeorge pushed a commit to google/fhir that referenced this issue Oct 26, 2020
Previously, `_primitive_time_utils.py` leveraged `dateutil` for datetime
timezone arithmetic. However, `dateutil` is unable to properly account for DST
offsets when performing datetime arithmetic for dates in the distant future.
This is a known `dateutil` bug and more info can be found at:
* dateutil/dateutil#462
* dateutil/dateutil#590

This is addressed with PEP615 (https://www.python.org/dev/peps/pep-0615/) for
interpreters >=3.9. `backports.zoneinfo` provides a backwards-compatible API
for interpreters >=3.6, <3.9, which we leverage here.

PiperOrigin-RevId: 336909567
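
As a rough illustration of why the PEP 615 `zoneinfo` module avoids this problem, the sketch below builds a tiny synthetic TZif version 2 file in memory and loads it with `ZoneInfo.from_file`. The helper `tiny_tzif`, the key `Example/CET` and the footer rule are made up for the example; real zone files carry many transitions plus a similar footer. It needs Python 3.9 or later.

```python
import io
import struct
from datetime import datetime, timedelta
from zoneinfo import ZoneInfo


def tiny_tzif(posix_tz):
    """Hypothetical helper: a TZif v2 file with no transitions and one CET type."""
    # Counts: isutcnt, isstdcnt, leapcnt, timecnt, typecnt, charcnt.
    header = struct.pack(">4sc15x6L", b"TZif", b"2", 0, 0, 0, 0, 1, 4)
    # One ttinfo record (UTC+1, not DST, abbreviation index 0) plus "CET\0".
    body = struct.pack(">lBB", 3600, 0, 0) + b"CET\x00"
    # 32-bit block, 64-bit block, then the newline-delimited POSIX TZ footer.
    return header + body + header + body + b"\n" + posix_tz.encode("ascii") + b"\n"


tz = ZoneInfo.from_file(
    io.BytesIO(tiny_tzif("CET-1CEST,M3.5.0,M10.5.0/3")), key="Example/CET"
)

# Both dates lie past 2038; the offsets come from the footer rule.
summer = datetime(2040, 7, 1, 12, tzinfo=tz)
winter = datetime(2040, 1, 1, 12, tzinfo=tz)
print(summer.utcoffset(), winter.utcoffset())  # 2:00:00 1:00:00
```

Because the footer rule is evaluated for any date after the last explicit transition, DST offsets keep alternating beyond 2037 instead of freezing at a single value.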
@eggert

eggert commented Oct 31, 2020

I have created PR#1091, which should fix this issue.

@pganssle pganssle linked a pull request May 20, 2021 that will close this issue
15 tasks