Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ARROW-10947: [Rust][DataFusion] Optimize UTF8 to Date32 Conversion #8943

Closed
wants to merge 1 commit into from

Conversation

seddonm1
Copy link
Contributor

After adding benchmarking capability to the UTF8 to Date32/Date64 CAST functions there was opportunity to improve the performance.

This PR uses inbuilt chrono functionality to calculate the number of days since CE then uses a constant to calculate the offset days relative to 1970-01-01. This improves performance around 10% for this operation relative to the since function presumably as chrono does not have to ensure the from_ymd is a valid date.

Before:

cast utf8 to date32 512 time:   [41.966 us 42.508 us 43.087 us]
cast utf8 to date32 512 time:   [40.591 us 40.661 us 40.740 us]
cast utf8 to date32 512 time:   [40.825 us 40.878 us 40.916 us]

After:

cast utf8 to date32 512 time:   [36.557 us 36.839 us 37.200 us]
cast utf8 to date32 512 time:   [35.997 us 36.442 us 36.919 us]
cast utf8 to date32 512 time:   [35.750 us 35.969 us 36.160 us]

Copy link
Contributor

@Dandandan Dandandan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interesting approach! LGTM

@github-actions
Copy link

@codecov-io
Copy link

Codecov Report

Merging #8943 (639fb71) into master (71e37e2) will increase coverage by 0.00%.
The diff coverage is 100.00%.

Impacted file tree graph

@@           Coverage Diff           @@
##           master    #8943   +/-   ##
=======================================
  Coverage   83.26%   83.27%           
=======================================
  Files         195      195           
  Lines       48066    48062    -4     
=======================================
- Hits        40024    40022    -2     
+ Misses       8042     8040    -2     
Impacted Files Coverage Δ
rust/arrow/src/compute/kernels/cast.rs 96.77% <100.00%> (-0.01%) ⬇️
rust/arrow/src/csv/reader.rs 94.33% <100.00%> (+0.36%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 6cedab0...639fb71. Read the comment docs.

Copy link
Member

@jorgecarleitao jorgecarleitao left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Thanks, @seddonm1 !

@seddonm1 seddonm1 deleted the utf8-date32-optimize branch January 4, 2021 22:47
GeorgeAp pushed a commit to sirensolutions/arrow that referenced this pull request Jun 7, 2021
After adding benchmarking capability to the UTF8 to Date32/Date64 CAST functions there was opportunity to improve the performance.

This PR uses inbuilt `chrono` functionality to calculate the number of days since CE then uses a constant to calculate the offset days relative to 1970-01-01. This improves performance around 10% for this operation relative to the `since` function presumably as `chrono` does not have to ensure the `from_ymd` is a valid date.

Before:
```
cast utf8 to date32 512 time:   [41.966 us 42.508 us 43.087 us]
cast utf8 to date32 512 time:   [40.591 us 40.661 us 40.740 us]
cast utf8 to date32 512 time:   [40.825 us 40.878 us 40.916 us]
```

After:
```
cast utf8 to date32 512 time:   [36.557 us 36.839 us 37.200 us]
cast utf8 to date32 512 time:   [35.997 us 36.442 us 36.919 us]
cast utf8 to date32 512 time:   [35.750 us 35.969 us 36.160 us]
```

Closes apache#8943 from seddonm1/utf8-date32-optimize

Authored-by: Mike Seddon <seddonm1@gmail.com>
Signed-off-by: Jorge C. Leitao <jorgecarleitao@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants