Inconsistent date offsets with timezone handling #11979
Comments
Isn't duckdb's duckdb_plus60 the correct answer according to the linked blog post? |
If your problem is that
import datetime
import pandas
import dateutil
print(pandas.Timestamp("2018-02-14 12:00:00-05:00") + datetime.timedelta(days=60))
print(pandas.Timestamp("2018-02-14 12:00:00", tz="America/New_York") + datetime.timedelta(days=60))
print(pandas.Timestamp("2018-02-14 12:00:00", tz=dateutil.tz.gettz("America/New_York")) + datetime.timedelta(days=60))
|
I interpret it as the opposite: that the correct timezone-aware offset is the one given by Python, and that duckdb is returning the non-normalized version. But let me mull over the pandas version you sent! |
The first example (UTC-05:00) isn't reproducing, since it's a fixed offset w/ no daylight savings component. The behavior I'm expecting is given by your second and third examples:
|
What time zone are you in? |
Could you summarize what you think is wrong in duckdb, at the moment? (I still don't think anything is wrong.) Also, note that your results depend on duckdb's timezone setting, so you should explicitly set that in your example via
If you use UTC instead, you get different results (*):
import duckdb
conn=duckdb.connect()
conn.query("SET TimeZone='UTC'")
query = "SELECT ('2018-02-14 12:00:00 America/New_York'::TIMESTAMPTZ + interval 60 day) - '2018-04-15 17:00:00+00'::TIMESTAMPTZ"
print(conn.query(query))
conn.query("SET TimeZone='America/New_York'")
print(conn.query(query))
(*) and I believe correctly so: a sensible definition of |
from duckdb_settings() where name='TimeZone';
select '2018-02-14 12:00:00 America/New_York'::TIMESTAMPTZ as d, '2018-02-14 12:00:00 America/New_York'::TIMESTAMPTZ + interval 60 day as base_date;
This looks correct to me (at least this is what ICU does when you cross a DST boundary: preserve wall clock time). |
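The difference between preserving wall clock time and adding exactly 60 days to the instant can be sketched in plain Python. This is only an illustration with the standard library's `zoneinfo` module (it assumes the IANA time zone database is available on the system); it does not call DuckDB or ICU, and the variable names are made up for the example.

```python
from datetime import datetime, timedelta, timezone
from zoneinfo import ZoneInfo

NY = ZoneInfo("America/New_York")

# 2018-02-14 12:00 in New York is 17:00 UTC (EST, UTC-05:00).
start = datetime(2018, 2, 14, 12, 0, tzinfo=NY)

# Wall-clock arithmetic: Python adds the timedelta to the local fields and
# keeps the tzinfo, so the local time of day is preserved across the DST change.
wall_clock = start + timedelta(days=60)

# Elapsed-time arithmetic: convert to UTC, add exactly 60 * 24 hours to the
# instant, then convert back to New York time for display.
elapsed = (start.astimezone(timezone.utc) + timedelta(days=60)).astimezone(NY)

print(wall_clock.isoformat())  # 2018-04-15T12:00:00-04:00
print(elapsed.isoformat())     # 2018-04-15T13:00:00-04:00

# Compared as instants, the two results are one hour apart.
gap_between_results = elapsed.astimezone(timezone.utc) - wall_clock.astimezone(timezone.utc)
print(gap_between_results)     # 1:00:00
```

The first result corresponds to the wall-clock behaviour described for ICU; the second corresponds to adding a fixed number of hours to the instant.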
Pandas may do something different here, but what adding 60 days across a DST boundary should return is a matter for theologians really. I'd also like to point out that pandas uses |
I'll answer the other questions this evening, but doesn't DuckDB also use pytz? Or am I misreading the discussion here: #11974 |
No, we use ICU internally (which doesn't have these problems). We only use |
Yup, that's what I was referring to. I was expecting it (a timezone-aware date plus an interval 60 day) to behave the same as the second example above. I haven't (yet) had a chance to review your subsequent comments, which seem to contradict the examples I had at the beginning. |
Not sure what you meant by "the second example above"? It sounds to me like you are expecting the date arithmetic to add exactly 60 days to the instant, which is what pandas does? ICU does not behave that way, so duckdb does not either. Moreover, this is not what Postgres does:
hawkfish=# select '2018-02-14 12:00:00 America/New_York'::TIMESTAMPTZ, '2018-02-14 12:00:00 America/New_York'::TIMESTAMPTZ + interval '60 day' as base_date;
      timestamptz       |       base_date
------------------------+------------------------
 2018-02-14 09:00:00-08 | 2018-04-15 09:00:00-07
(1 row)
(BTW I am in |
Yes, that does what @paultiq is expecting:
select
    '2018-02-14 12:00:00 America/New_York'::TIMESTAMPTZ d,
    '2018-02-14 12:00:00 America/New_York'::TIMESTAMPTZ + interval (24 * 60) hours as base_date;
|
Sorry, didn't mean to make you look like you were talking to yourself; I deleted my comment so I could check first whether it actually works, because I was on my phone at the time. Anyway, glad it works, so duckdb doesn't only do the theologically right thing from the perspective of the blog post, but also offers the heretic option in a simple way.
The only sad thing to take from this is that duckdb can't currently do "add X wall clock days to timestamptz Y from the perspective of timezone Z" with any Z other than the one in the settings (in particular not with a Z that varies across a table). Not that that's surprising given that TIMESTAMPTZ doesn't store timezones, and not that I can see myself needing it anytime soon, but potentially something for a future separate feature request. (Which would strictly speaking also require specifying what to do if the result lands in the midst of a daylight savings fold or a gap, so really should be a request for a function |
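The fold-or-gap question raised above can be made concrete with a small sketch in plain Python. `classify_wall_time` is a hypothetical helper, not a DuckDB or ICU function; it relies on the standard library's `zoneinfo` module and the `fold` attribute from PEP 495, and assumes the IANA time zone database is available on the system.

```python
from datetime import datetime, timedelta, timezone
from zoneinfo import ZoneInfo


def classify_wall_time(naive: datetime, tz: ZoneInfo) -> str:
    """Return "normal", "fold" (ambiguous) or "gap" (nonexistent) for a wall time."""
    early = naive.replace(tzinfo=tz, fold=0)
    late = naive.replace(tzinfo=tz, fold=1)
    # Outside a DST transition, the fold attribute does not change the offset.
    if early.utcoffset() == late.utcoffset():
        return "normal"
    # A wall time that exists survives a round trip through UTC unchanged;
    # a wall time inside a gap comes back shifted.
    round_trip = early.astimezone(timezone.utc).astimezone(tz)
    return "fold" if round_trip.replace(tzinfo=None) == naive else "gap"


NY = ZoneInfo("America/New_York")

# Adding one wall-clock day to 02:30 on 2018-03-10 lands on 02:30 on 2018-03-11,
# a time that was skipped when clocks moved forward in New York.
landed = datetime(2018, 3, 10, 2, 30) + timedelta(days=1)

print(classify_wall_time(landed, NY))                         # gap
print(classify_wall_time(datetime(2018, 11, 4, 1, 30), NY))   # fold
print(classify_wall_time(datetime(2018, 4, 15, 12, 0), NY))   # normal
```

A function that adds wall-clock days from the perspective of an arbitrary time zone would have to pick a policy for the first two cases, which is the extra specification mentioned above.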
Yes, thanks... that's the part I was hung up on. That this version is I totally accept that this is (a) a question for the theologians and (b) how ICU works. So, feel free to close as "as designed" :) Thanks for the discussion.
import duckdb
conn=duckdb.connect()
conn.query("SET TimeZone='UTC'")
query = "SELECT '2018-02-14 23:00:00 America/New_York'::TIMESTAMPTZ start_date , '2018-02-14 23:00:00 America/New_York'::TIMESTAMPTZ + interval 60 day end_date"
print(conn.query(query))
conn.query("SET TimeZone='America/New_York'")
print(conn.query(query))
Yet this version:
import datetime
import pandas
pts = pandas.Timestamp("2018-02-14 23:00:00", tz="America/New_York")
print(pts)
print(pts + datetime.timedelta(days=60))
is
|
(You can close issues yourself as "won't do") |
What happens?
I noticed pytz mentioned in a recent issue and noted the use of localize() matched the issues raised in this blog post (that comes up every time pytz is mentioned): https://blog.ganssle.io/articles/2018/03/pytz-fastest-footgun.html.
Output:
May be related to #9431 ?
To Reproduce
given above
OS:
Windows
DuckDB Version:
10.2 and duckdb-0.10.3.dev781
DuckDB Client:
Python
Full Name:
Paul Timmins
Affiliation:
Iqmo
What is the latest build you tested with? If possible, we recommend testing with the latest nightly build.
I have tested with a nightly build
Did you include all relevant data sets for reproducing the issue?
Yes
Did you include all code required to reproduce the issue?
Did you include all relevant configuration (e.g., CPU architecture, Python version, Linux distribution) to reproduce the issue?