convert postgresql date_trunc() to UTC to prevent pandas error #4319

Merged Feb 7, 2018 (3 commits)

Conversation

habalux
Contributor

@habalux habalux commented Jan 31, 2018

PostgreSQL's DATE_TRUNC() returns type "timestamp with time zone" when applied to a "date" column. This causes an error in pandas when it parses the result with utc=False and the selected time grain differs from that of the source column.

The problem is fixed here by casting the function call to "timestamp without time zone" regardless of the source type. This fix may cause wrong times to be shown in the query result though, depending on the data.
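A minimal sketch of the cast described above, assuming a time grain is expressed as a SQL string built around the column expression. The helper name `truncated_naive` and the column `created_on` are hypothetical and are not taken from the patch itself.

```python
def truncated_naive(grain: str, col: str) -> str:
    """Build a DATE_TRUNC() expression cast to a timezone-naive timestamp.

    Hypothetical helper for illustration only. The grain is inlined as a
    SQL literal, so it is checked against a small allow-list first.
    """
    allowed = {"second", "minute", "hour", "day", "week", "month", "quarter", "year"}
    if grain not in allowed:
        raise ValueError(f"unsupported time grain: {grain!r}")
    return f"CAST(DATE_TRUNC('{grain}', {col}) AS TIMESTAMP WITHOUT TIME ZONE)"


print(truncated_naive("hour", "created_on"))
```

The cast only changes the declared type, so the values that come back still depend on the session time zone, which is the caveat mentioned above.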

Reference to issue #4250

@mistercrunch
Member

From a quick look at the docs, it looks like timestamp without time zone gets rid of the TZ adjustments. Shouldn't we convert to UTC instead?

Sometimes I think about all of the cumulative engineering time wasted around the world thinking about time zones and Unicode, and I sob a little.

@habalux habalux changed the title from "cast postgresql date_trunc() to timestamp without time zone to prevent pandas error" to "convert postgresql date_trunc() to UTC to prevent pandas error" Feb 6, 2018
@habalux
Contributor Author

habalux commented Feb 6, 2018

Couldn't agree more. I changed the cast to a timezone conversion and it seems to do the trick as well, maybe even a bit more robustly. I'm still getting the same incorrect hours on the graphs as before (my data and I are on UTC+2, or +3 with DST), but this may not be the PR to fix that one.

@habalux
Contributor Author

habalux commented Feb 7, 2018

I ran some more tests of my own, and it seems that just converting to UTC will cause invalid times with the type "timestamp without time zone" as it adds the current (session) timezone to the timestamp. It also swaps the return type so "timestamp without time zone" becomes "timestamp with time zone" and vice versa.
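The type swap described above can be illustrated without a database. The following is a rough emulation of PostgreSQL's `AT TIME ZONE` operator using only Python's `datetime` module and fixed-offset zones; the function `at_time_zone` and the UTC+2 session zone are assumptions made for this sketch, not code from the PR.

```python
from datetime import datetime, timedelta, timezone


def at_time_zone(value: datetime, zone: timezone, session_zone: timezone) -> datetime:
    """Emulate PostgreSQL's `value AT TIME ZONE zone` for fixed-offset zones.

    - Naive input ("timestamp without time zone"): the value is interpreted
      as local time in `zone`, and the result is timezone-aware. It is
      returned in the session zone, the way a client would display it.
    - Aware input ("timestamp with time zone"): the value is converted to
      `zone`, and the result is naive.
    """
    if value.tzinfo is None:
        return value.replace(tzinfo=zone).astimezone(session_zone)
    return value.astimezone(zone).replace(tzinfo=None)


session = timezone(timedelta(hours=2))  # assumed session zone, UTC+2

naive = datetime(2018, 2, 7, 12, 0)
aware = datetime(2018, 2, 7, 12, 0, tzinfo=session)

print(at_time_zone(naive, timezone.utc, session))  # aware, shifted for display
print(at_time_zone(aware, timezone.utc, session))  # naive, expressed in UTC
```

With a UTC+2 session, the naive noon value comes back as an aware 14:00+02:00, while the aware noon value comes back as a naive 10:00. The same expression therefore yields different types and different wall-clock hours depending on the source column type.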

I found that the most consistent approach would be to run "SET TIMEZONE TO 'UTC';" at the start of each connection and then cast to "timestamp without time zone". This way all returned timestamps would be in UTC. I couldn't find a proper place to set a query to run at connection time though, any pointers? This would also require the UI to handle TZ conversions when rendering the data.
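One possible way to get a UTC session without running a separate query is to pass the setting through libpq's `options` connection parameter, which SQLAlchemy forwards to psycopg2 via `connect_args`. This is a sketch of that idea only, not something this PR implements; the helper name `utc_session_engine_kwargs` and the connection URL in the comment are placeholders.

```python
def utc_session_engine_kwargs() -> dict:
    """Keyword arguments that ask libpq to start each session in UTC.

    The `options` connection parameter forwards `-c name=value` settings to
    the server at connection start, which should leave the session in the
    same state as running SET TIMEZONE TO 'UTC' right after connecting.
    """
    return {"connect_args": {"options": "-c timezone=UTC"}}


# Assumed usage with SQLAlchemy and psycopg2 (placeholder URL, not run here):
#   from sqlalchemy import create_engine
#   engine = create_engine("postgresql://user:pass@host/db",
#                          **utc_session_engine_kwargs())
print(utc_session_engine_kwargs())
```

Whether this fits depends on where the engine is created and on whether every connection to that database should be forced to UTC.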

@mistercrunch mistercrunch merged commit 3b35ddf into apache:master Feb 7, 2018
michellethomas pushed a commit to michellethomas/panoramix that referenced this pull request May 24, 2018
…e#4319)

* cast postgresql date_trunc() to timestamp without time zone to prevent pandas error

* fix formatting for flake8

* change cast to timezone conversion instead
wenchma pushed a commit to wenchma/incubator-superset that referenced this pull request Nov 16, 2018
…e#4319)

* cast postgresql date_trunc() to timestamp without time zone to prevent pandas error

* fix formatting for flake8

* change cast to timezone conversion instead
@mistercrunch mistercrunch added the 🏷️ bot and 🚢 0.23.0 labels Feb 27, 2024