feat: add support for casting Duration/Interval to Int64Array #1196

Merged 2 commits on Jan 19, 2022

@e-dard e-dard commented Jan 18, 2022

Which issue does this PR close?

Closes #685.

What changes are included in this PR?

This PR adds support for casting from all the Duration types to Int64Array. It also adds support for casting from the following Interval types:

  • Interval(IntervalUnit::YearMonth) (Native=i32)
  • Interval(IntervalUnit::DayTime) (Native=i64)

However, I punted on Interval(IntervalUnit::MonthDayNano), whose native type is i128. Attempting to cast it to an Int64Array results in a cast error.
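
As a rough illustration of the three outcomes described above, here is a std-only Rust sketch: an i32 value is widened to i64, an i64 value passes through unchanged, and an i128 value is rejected. It assumes the arrow crate's native widths (i32 for YearMonth, i64 for DayTime and the Duration types, i128 for MonthDayNano). The `Source` enum and `to_i64` function are hypothetical names invented for this sketch; they are not part of the arrow API and this is not the kernel's implementation.

```rust
/// Hypothetical stand-in for one element of a source array, tagged by the
/// width of its native type. This is NOT an arrow type.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Source {
    /// Interval(YearMonth): native i32.
    YearMonth(i32),
    /// Interval(DayTime): native i64.
    DayTime(i64),
    /// Any Duration unit: native i64.
    Duration(i64),
    /// Interval(MonthDayNano): native i128.
    MonthDayNano(i128),
}

/// Models the cast to Int64: widen, pass through, or report an error.
pub fn to_i64(value: Source) -> Result<i64, String> {
    match value {
        // i32 -> i64 is a lossless widening conversion.
        Source::YearMonth(v) => Ok(i64::from(v)),
        // Already 64 bits wide: the raw value is returned as is.
        Source::DayTime(v) | Source::Duration(v) => Ok(v),
        // 128 bits do not fit in 64, so the cast is refused.
        Source::MonthDayNano(_) => {
            Err("casting Interval(MonthDayNano) to Int64 is not supported".to_string())
        }
    }
}
```

For example, `to_i64(Source::YearMonth(-14))` yields `Ok(-14)`, while any `Source::MonthDayNano` input yields an `Err`.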

I'm happy to change that behaviour if there is a preferable alternative.

Are there any user-facing changes?

No.

@github-actions github-actions bot added the arrow Changes to the arrow crate label Jan 18, 2022
@alamb alamb left a comment

Thanks @e-dard -- I want to sort out the CI failures before merging any more PRs, and I will do so tomorrow morning

match from_type {
    IntervalUnit::YearMonth => true,
    IntervalUnit::DayTime => true,
    IntervalUnit::MonthDayNano => false, // Native type is i128

One possible way to support this would be to cast to Decimal, which can represent the full type.

Though that is not very ideal for various reasons either.

So for now, this looks good 👍
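
To make the i128 limitation concrete: a checked narrowing conversion succeeds only when the value happens to fit in 64 bits, and a plain `as` cast silently drops the upper bits. The following std-only sketch shows both; the function names `narrow` and `truncate` are hypothetical and this is not how the arrow cast kernel is written.

```rust
use std::convert::TryFrom;

/// Attempts to narrow a 128-bit value to 64 bits without losing information.
/// Returns None when the value is outside the i64 range.
pub fn narrow(value: i128) -> Option<i64> {
    i64::try_from(value).ok()
}

/// Truncating cast, shown only to illustrate the information loss:
/// everything above the low 64 bits is discarded.
pub fn truncate(value: i128) -> i64 {
    value as i64
}
```

A value such as `1i128 << 64` has no i64 representation, so `narrow` returns `None` for it, which is the situation a blanket MonthDayNano to Int64 cast would have to handle.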

IntervalUnit::YearMonth => {
    cast_numeric_arrays::<IntervalYearMonthType, Int64Type>(array)
}
IntervalUnit::DayTime => cast_array_data::<Int64Type>(array, to_type.clone()),

I guess this makes sense (to pass back the underlying data). The DayTime value is actually made up of two fields (days / milliseconds): https://github.com/apache/arrow/blob/master/format/Schema.fbs#L361-L365

I wonder if using one of the temporal kernels might make more sense for your use case: https://docs.rs/arrow/7.0.0/arrow/compute/kernels/temporal/index.html#

Contributor Author

For the DayTime value, I was thinking along the lines of this comment, #685 (comment): emit it as is, and the user can then apply another operation to it if they need to.
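
One such follow-up operation could be splitting the raw 64-bit value back into its two fields. The sketch below is std-only and rests on an assumption that should be checked against the arrow version in use: that days occupy the upper 32 bits and milliseconds the lower 32 bits of the i64. The function names `split_day_time` and `pack_day_time` are hypothetical.

```rust
/// Splits a raw 64-bit DayTime value into (days, milliseconds).
/// ASSUMPTION: days are stored in the upper 32 bits and milliseconds in the
/// lower 32 bits. Verify this layout for your arrow version before relying on it.
pub fn split_day_time(raw: i64) -> (i32, i32) {
    // Arithmetic shift keeps the sign of the days field.
    let days = (raw >> 32) as i32;
    // Truncation keeps only the low 32 bits, reinterpreted as signed.
    let millis = raw as i32;
    (days, millis)
}

/// Inverse of `split_day_time` under the same layout assumption.
pub fn pack_day_time(days: i32, millis: i32) -> i64 {
    // Go through u32 so a negative millis value does not sign-extend
    // into the bits reserved for days.
    ((days as i64) << 32) | (millis as u32 as i64)
}
```

Under that assumption, an Int64 value produced by the cast can be passed to `split_day_time` element by element to recover the days and milliseconds.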

@codecov-commenter

Codecov Report

Merging #1196 (69f25df) into master (d68c4ae) will increase coverage by 0.00%.
The diff coverage is 94.11%.

❗ Current head 69f25df differs from pull request most recent head 535b684. Consider uploading reports for the commit 535b684 to get more accurate results

@@           Coverage Diff           @@
##           master    #1196   +/-   ##
=======================================
  Coverage   82.67%   82.68%           
=======================================
  Files         175      175           
  Lines       51561    51595   +34     
=======================================
+ Hits        42630    42661   +31     
- Misses       8931     8934    +3     
Impacted Files                         Coverage Δ
arrow/src/compute/kernels/cast.rs      95.05% <94.11%> (-0.02%) ⬇️
parquet_derive/src/parquet_field.rs    65.98% <0.00%> (-0.23%) ⬇️

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update d68c4ae...535b684.

@alamb alamb merged commit 799330b into apache:master Jan 19, 2022
@e-dard e-dard deleted the er/feat/cast_durations branch January 20, 2022 23:15
Successfully merging this pull request may close these issues.

Implement casting between duration/intervals and numbers