Cannot compare tz-naive and tz-aware timestamps on concat #6925
Comments
Thanks for raising this. I was able to reproduce it using fastparquet, but found that it works as expected when using pyarrow (…). It seems fastparquet isn't storing the timezone information properly in the metadata file, but it isn't clear to me whether this is a bug in the dask codebase or in fastparquet. Either way, there is an issue with … Ping @martindurant for fastparquet expertise.
Will look... I'll mention that parquet does not store time zones; this information only goes into the pandas metadata (i.e., only for data that originally came from pandas), so it is quite likely that this hasn't been implemented in fastparquet at all.
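Because the timezone survives only in the pandas metadata, a reader that wants tz-aware values has to look it up there. Below is a minimal sketch of that lookup, assuming the JSON layout that pandas-aware writers such as pyarrow produce (a "columns" list in which datetimetz entries carry metadata.timezone). The helper timezone_from_pandas_metadata and the sample metadata are hypothetical, not part of dask or fastparquet.

```python
import json
from typing import Optional


def timezone_from_pandas_metadata(pandas_meta_json: str, column: str) -> Optional[str]:
    """Return the timezone recorded for `column`, or None if it is tz-naive or absent.

    Hypothetical helper. Assumes the pandas metadata layout in which
    "datetimetz" columns store {"metadata": {"timezone": ...}}.
    """
    meta = json.loads(pandas_meta_json)
    for col in meta.get("columns", []):
        # An unnamed index is typically stored under a field name such as
        # "__index_level_0__", so match on either the name or the field name.
        if column in (col.get("name"), col.get("field_name")):
            if col.get("pandas_type") == "datetimetz":
                return (col.get("metadata") or {}).get("timezone")
            return None
    return None


# Hypothetical metadata for a file written from a frame with a UTC index.
example = json.dumps({
    "index_columns": ["__index_level_0__"],
    "columns": [
        {"name": None, "field_name": "__index_level_0__",
         "pandas_type": "datetimetz", "numpy_type": "datetime64[ns]",
         "metadata": {"timezone": "UTC"}},
    ],
})

print(timezone_from_pandas_metadata(example, "__index_level_0__"))
```

A file written without pandas metadata would have nothing to look up, which matches the point that only data originating from pandas can round-trip its timezone this way.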
More about fastparquet and timezones in dask/fastparquet#532
If I can get some guidance on this topic, I can try a PR. I have no insight into what happens before/after, nor into the divisions/partitions, so it may be a bit tough for me.
Actually, it may be worth trying again against fastparquet master, because previously a failure to set the timezone was silently ignored, and that particular failure should no longer happen. However, it might need more work to apply the same thing to the min/max values.
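For the min/max values, the idea would be to attach that same timezone to the row-group statistics before they become divisions. A rough sketch, assuming the naive statistics are UTC wall times and that divisions are built dask-style as the minimum of each partition plus the maximum of the last one. aware_divisions and the sample statistics are hypothetical; a non-UTC zone would need a tzinfo object such as one from zoneinfo (Python 3.9+).

```python
from datetime import datetime, timezone, tzinfo
from typing import Sequence, Tuple


def aware_divisions(stats: Sequence[Tuple[datetime, datetime]], tz: tzinfo) -> Tuple[datetime, ...]:
    """Build n + 1 division boundaries from n per-partition (min, max) pairs.

    Hypothetical helper. Naive values are assumed to be UTC wall times:
    they are marked as UTC first, then converted to `tz`.
    """
    def localize(ts: datetime) -> datetime:
        if ts.tzinfo is not None:
            return ts.astimezone(tz)  # already aware: only convert
        return ts.replace(tzinfo=timezone.utc).astimezone(tz)

    mins = [localize(lo) for lo, _ in stats]
    last_max = localize(stats[-1][1])
    return tuple(mins) + (last_max,)


# Hypothetical statistics for two row groups, as a parquet reader might report them.
stats = [
    (datetime(2020, 1, 1), datetime(2020, 1, 1, 23)),
    (datetime(2020, 1, 2), datetime(2020, 1, 2, 23)),
]
divisions = aware_divisions(stats, timezone.utc)
print(divisions)
```

Doing this conversion in one place keeps the divisions consistent with the tz-aware dtype the reader already reports.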
I have checked with the master branches of dask and fastparquet, and the issue is still there.
What happened:
When concatenating two dask dataframes whose indices have dtype=datetime64[ns, UTC], I get a
TypeError: Cannot compare tz-naive and tz-aware timestamps.
One of the dask dataframes was created with dd.from_pandas and the other with dd.read_parquet.
What you expected to happen:
A happy concatenation ;-)
Minimal Complete Verifiable Example:
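The original reproducer did not survive in this copy of the issue. The sketch below is not that MCVE; it only illustrates the suspected mechanism, assuming concat checks that one frame's last division is smaller than the next frame's first division. With hypothetical boundary values, comparing a tz-aware division (as from dd.from_pandas) against a tz-naive one (as from dd.read_parquet) raises a TypeError. It uses Python's datetime, whose message ("can't compare offset-naive and offset-aware datetimes") differs from the pandas one but has the same cause.

```python
from datetime import datetime, timezone

# Hypothetical boundaries: from_pandas keeps the UTC timezone...
aware_divisions = (datetime(2020, 1, 1, tzinfo=timezone.utc),
                   datetime(2020, 1, 2, tzinfo=timezone.utc))
# ...while the read_parquet boundaries come back tz-naive.
naive_divisions = (datetime(2020, 1, 3), datetime(2020, 1, 4))


def partitions_are_ordered(first, second):
    """Mimic a concat-style check: does `first` end before `second` starts?"""
    return first[-1] < second[0]


try:
    partitions_are_ordered(aware_divisions, naive_divisions)
except TypeError as exc:
    print("TypeError:", exc)
```

Once both sides carry the same timezone, the same check succeeds and returns an ordinary boolean.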
Anything else we need to know?:
When printing both indices, we see that the dtype of both is
datetime64[ns, UTC]
yet the representation of the index coming from read_parquet shows tz-naive dates.
Environment:
Dask version: 2.30.0
Python version: 3.8
Operating System: win10
Install method (conda, pip, source): pip
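On the dtype/repr mismatch noted above: the dtype a dask frame reports presumably comes from its metadata (_meta), while divisions are stored separately, so the two can disagree. One quick check is to look at the timezone of each division boundary (e.g. on ddf.divisions for each frame). A sketch with stand-in datetime values; division_timezones is a hypothetical helper, and it should also work on pandas Timestamps since they subclass datetime.

```python
from datetime import datetime, timezone
from typing import Iterable, Optional, Set


def division_timezones(divisions: Iterable) -> Set[Optional[str]]:
    """Collect the timezone names found on known division boundaries.

    Hypothetical helper. None in the result means a boundary is tz-naive;
    unknown divisions (None entries) are skipped.
    """
    return {d.tzname() for d in divisions if d is not None}


# Stand-ins for the two frames' divisions.
from_pandas_like = (datetime(2020, 1, 1, tzinfo=timezone.utc),
                    datetime(2020, 1, 2, tzinfo=timezone.utc))
read_parquet_like = (datetime(2020, 1, 3), datetime(2020, 1, 4))

print(division_timezones(from_pandas_like))   # {'UTC'}
print(division_timezones(read_parquet_like))  # {None}
```

If the two results differ while both dtypes read datetime64[ns, UTC], the mismatch lives in the divisions rather than in the data.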