[C++] Make casting timestamp and duration zero-copy when TimeUnit matches #34210
Comments
FWIW I found this comment about the behavior: https://github.com/apache/arrow/blob/master/cpp/src/arrow/compute/kernels/scalar_cast_temporal.cc#L156
Indeed, that seems to be due to not doing this as a zero-copy cast because the data types are "different" (while only the unit matters, the timezone can be ignored for this operation).
This would definitely be nice to have.
@rok This might be a silly question, but why don't we dynamically dispatch to zero-copy / no-op functions here if the units are the same?
@icexelloss yeah, a no-op seems like the thing to do. I wonder whether zero-copy is possible, or whether the fact that we're working with batches prevents it. @westonpace could you comment?
I don't think batches should be a problem. It seems we only have the comment from @wesm to go on here. Given that it was made during a rather large refactor, my guess is this is more of a "todo" and less of a "this is a concern". I think it should be pretty safe to zero-copy (and it is concerning that we don't).
Ok, I'll give it a shot.
…meUnit matches (#34270)

### Rationale for this change
Casting from e.g. `timestamp(s, "UTC")` to `timestamp(s)` could be a metadata only change, but is currently a multiplication operation.

### What changes are included in this PR?
This change adds a zero-copy casting path for durations that have equal units and timestamps that have equal units and potentially different timezones.

### Are these changes tested?
We test for correctness and zero-copy.

### Are there any user-facing changes?
No.

* Closes: #34210

Authored-by: Rok Mihevc <rok@mihevc.org>
Signed-off-by: Rok Mihevc <rok@mihevc.org>
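The dispatch idea discussed in this thread can be illustrated with a toy model. The sketch below is not Arrow's implementation and does not use pyarrow. `TimestampType`, `TimestampColumn`, `cast_timestamp`, and `TICKS_PER_SECOND` are hypothetical names invented for illustration, and the standard-library `array` module stands in for an Arrow buffer. It only shows the shape of the logic: when the units match, reuse the buffer and swap the type metadata; otherwise rescale into a new buffer.

```python
from array import array
from dataclasses import dataclass
from typing import Optional

# Ticks per second for each unit (toy stand-in for Arrow's TimeUnit).
TICKS_PER_SECOND = {"s": 1, "ms": 1_000, "us": 1_000_000, "ns": 1_000_000_000}


@dataclass(frozen=True)
class TimestampType:
    """Hypothetical type descriptor: a unit plus an optional timezone."""
    unit: str
    tz: Optional[str] = None


@dataclass
class TimestampColumn:
    """Hypothetical column: type metadata plus int64 storage ('q')."""
    type: TimestampType
    values: array


def cast_timestamp(col: TimestampColumn, target: TimestampType) -> TimestampColumn:
    """Cast a column, sharing the buffer when only the timezone differs."""
    if col.type.unit == target.unit:
        # Same unit: the stored integers are already correct, so only the
        # metadata changes and the underlying buffer is reused (zero-copy).
        return TimestampColumn(target, col.values)

    src = TICKS_PER_SECOND[col.type.unit]
    dst = TICKS_PER_SECOND[target.unit]
    if dst > src:
        # Finer target unit: multiply every value into a new buffer.
        factor = dst // src
        converted = array("q", (v * factor for v in col.values))
    else:
        # Coarser target unit: floor-divide into a new buffer. This toy
        # version skips the overflow and lost-precision checks a real cast needs.
        factor = src // dst
        converted = array("q", (v // factor for v in col.values))
    return TimestampColumn(target, converted)


naive = TimestampColumn(TimestampType("s"), array("q", [0, 1, 86_400]))
utc = cast_timestamp(naive, TimestampType("s", "UTC"))
millis = cast_timestamp(naive, TimestampType("ms"))

print(utc.values is naive.values)     # buffer shared: metadata-only cast
print(millis.values is naive.values)  # units differ: values were rescaled
```

In this model the equal-unit branch does no per-element work, which is the property the zero-copy test in the PR is described as checking.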
Describe the bug, including details regarding any error messages, version, and platform.
I am testing the performance of casting datatypes with a pyarrow Table and saw some unexpected results.
In short, it seems that casting a column from "tz-naive" to "tz-utc" is much slower than casting from "tz-naive" to "int64", which is unexpected because I think both of these should be metadata-only changes.
Here is a partial repro:
Component(s)
Python