[C++][Python] Cannot filter dataset with a timestamp (with timezone) column #37110
Comments
But I think you wanted this?:
or
The code that created this file set the timezone in the schema.
>>> dataset_ts.schema.to_string()
'DATE_OF_BIRTH: timestamp[ns, tz=UTC]'
>>> date
datetime.datetime(1990, 1, 1, 0, 0, tzinfo=datetime.timezone.utc)
>>> dcast = pa.scalar(date, type=pa.timestamp('ms', tz='UTC'))
>>> filter = (ds.field('DATE_OF_BIRTH')>dcast)
>>> df = dataset_ts.to_table(filter=filter, columns=['DATE_OF_BIRTH'])
>>> df
Hmm, I tried this on 9.0 and it seems to work, but on 12.0.1 it fails. Let me check what changed here.
Can you try this as a workaround first?
(I guess it might be related to b56b91e.)
### Rationale for this change

This patch ( #15180 ) adds a `SmallestTypeFor` to handling expression type. However, it lost timezone when handling.

### What changes are included in this PR?

Add `timezone` in `SmallestTypeFor`

### Are these changes tested?

Currently not

### Are there any user-facing changes?

Yeah it's a bugfix

* Closes: #37110

Authored-by: mwish <maplewish117@gmail.com>
Signed-off-by: Antoine Pitrou <antoine@python.org>
Sorry, I just got back from vacation. Do you still need more testing done on this?
@brokenjacobs I've fixed this, but the fix is not in the 13.0.0 release...
Describe the bug, including details regarding any error messages, version, and platform.
Pyarrow: 12.0.1
This is similar to #32366 but not exactly the same. I have a dataset where I am trying to filter by a column in my parquet files that is a timestamp with timezone type. Every way that I try to do the filter, pyarrow coerces the type to a non-timestamp type and fails to convert to a table. For example:
This was just my latest attempt, using an actual Arrow scalar with a timezone, and it still changes the type to a non-timestamp type in the comparison. It also fails when using a datetime in the filter directly.
Component(s)
Python