ARROW-10030: [Rust] Add support for FromIter and IntoIter for primitive types #8211

Closed
wants to merge 7 commits into from

Conversation

jorgecarleitao
Member

@jorgecarleitao jorgecarleitao commented Sep 17, 2020

This is the draft implementation accompanying the draft proposal in ARROW-10030. The design document is here: https://docs.google.com/document/d/1d6rV1WmvIH6uW-bcHKrYBSyPddrpXH8Q4CtVfFHtI04/edit

Benchmarks of `cast` where the builder was replaced by the iterator-based construction from this PR: the change in run time ranges from about +5% (slower) to about -25% (faster) relative to the builder, with the same memory utilization.

cast int32 to int32 512 time:   [25.493 ns 25.498 ns 25.505 ns]                                     
                        change: [+1.2840% +1.7037% +2.1989%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 15 outliers among 100 measurements (15.00%)
  4 (4.00%) high mild
  11 (11.00%) high severe

cast int32 to uint32 512                                                                             
                        time:   [6.8259 us 6.8323 us 6.8390 us]
                        change: [+3.6216% +5.6472% +7.0263%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 8 outliers among 100 measurements (8.00%)
  1 (1.00%) high mild
  7 (7.00%) high severe

cast int32 to float32 512                                                                             
                        time:   [5.7733 us 5.7763 us 5.7795 us]
                        change: [-14.264% -13.671% -13.106%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 10 outliers among 100 measurements (10.00%)
  1 (1.00%) low mild
  2 (2.00%) high mild
  7 (7.00%) high severe

cast int32 to float64 512                                                                             
                        time:   [5.2627 us 5.2667 us 5.2711 us]
                        change: [-23.365% -22.746% -22.147%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 10 outliers among 100 measurements (10.00%)
  4 (4.00%) high mild
  6 (6.00%) high severe

cast int32 to int64 512 time:   [5.2307 us 5.2381 us 5.2438 us]                                     
                        change: [-21.490% -20.984% -20.509%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 3 outliers among 100 measurements (3.00%)
  3 (3.00%) high severe

cast float32 to int32 512                                                                             
                        time:   [6.2755 us 6.2794 us 6.2833 us]
                        change: [-10.606% -9.9724% -9.3770%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 9 outliers among 100 measurements (9.00%)
  2 (2.00%) high mild
  7 (7.00%) high severe

cast float64 to float32 512                                                                             
                        time:   [6.1900 us 6.2032 us 6.2206 us]
                        change: [-14.505% -13.963% -13.416%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 7 outliers among 100 measurements (7.00%)
  4 (4.00%) high mild
  3 (3.00%) high severe

cast float64 to uint64 512                                                                             
                        time:   [6.4047 us 6.4189 us 6.4347 us]
                        change: [-12.287% -11.651% -11.050%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 2 outliers among 100 measurements (2.00%)
  1 (1.00%) high mild
  1 (1.00%) high severe

cast int64 to int32 512 time:   [6.3697 us 6.3913 us 6.4168 us]                                     
                        change: [+3.5998% +4.2837% +4.9649%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 9 outliers among 100 measurements (9.00%)
  4 (4.00%) high mild
  5 (5.00%) high severe

cast date64 to date32 512                                                                             
                        time:   [8.4563 us 8.4631 us 8.4702 us]
                        change: [+0.2066% +0.9093% +1.6066%] (p = 0.01 < 0.05)
                        Change within noise threshold.
Found 10 outliers among 100 measurements (10.00%)
  3 (3.00%) high mild
  7 (7.00%) high severe

cast date32 to date64 512                                                                             
                        time:   [8.0482 us 8.0610 us 8.0823 us]
                        change: [+0.2428% +0.8807% +1.5657%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 7 outliers among 100 measurements (7.00%)
  1 (1.00%) high mild
  6 (6.00%) high severe

cast time32s to time32ms 512                                                                             
                        time:   [2.2259 us 2.2294 us 2.2333 us]
                        change: [-2.3114% -1.6027% -0.7039%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 5 outliers among 100 measurements (5.00%)
  2 (2.00%) high mild
  3 (3.00%) high severe

cast time32s to time64us 512                                                                             
                        time:   [7.8948 us 7.9060 us 7.9179 us]
                        change: [-27.022% -25.727% -24.361%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 5 outliers among 100 measurements (5.00%)
  2 (2.00%) high mild
  3 (3.00%) high severe

cast time64ns to time32s 512                                                                             
                        time:   [11.546 us 11.558 us 11.571 us]
                        change: [-0.9659% -0.3235% +0.2847%] (p = 0.33 > 0.05)
                        No change in performance detected.
Found 19 outliers among 100 measurements (19.00%)
  2 (2.00%) low severe
  8 (8.00%) low mild
  1 (1.00%) high mild
  8 (8.00%) high severe

cast timestamp_ns to timestamp_s 512                                                                             
                        time:   [27.676 ns 27.690 ns 27.714 ns]
                        change: [+0.9641% +1.2917% +1.6195%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 14 outliers among 100 measurements (14.00%)
  4 (4.00%) high mild
  10 (10.00%) high severe

cast timestamp_ms to timestamp_ns 512                                                                             
                        time:   [2.8195 us 2.8238 us 2.8300 us]
                        change: [-0.4123% +0.7148% +1.6494%] (p = 0.19 > 0.05)
                        No change in performance detected.
Found 7 outliers among 100 measurements (7.00%)
  1 (1.00%) high mild
  6 (6.00%) high severe

cast timestamp_ms to i64 512                                                                            
                        time:   [398.49 ns 399.60 ns 400.91 ns]
                        change: [-6.2549% -5.5279% -4.7295%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 17 outliers among 100 measurements (17.00%)
  5 (5.00%) high mild
  12 (12.00%) high severe
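
To make the verdict lines in the output above easier to read, here is a minimal sketch of a rule that reproduces them from the reported change interval and p-value. The 1% noise threshold, the 0.05 significance level, and the names `Verdict` and `classify` are assumptions chosen to be consistent with this output; they are not stated anywhere in the PR.

```rust
/// Hypothetical labels mirroring the four verdicts printed above.
#[derive(Debug, PartialEq)]
enum Verdict {
    Improved,
    Regressed,
    WithinNoise,
    NoChange,
}

/// Classify a benchmark from the lower and upper bounds of the relative
/// change in run time (in percent) and the reported p-value.
/// Thresholds are assumptions, not values taken from the PR.
fn classify(lower_pct: f64, upper_pct: f64, p_value: f64) -> Verdict {
    const SIGNIFICANCE: f64 = 0.05; // assumed significance level
    const NOISE_PCT: f64 = 1.0; // assumed noise threshold, in percent

    if p_value > SIGNIFICANCE {
        // The difference is not statistically significant.
        return Verdict::NoChange;
    }
    if lower_pct > NOISE_PCT {
        // The whole interval lies above the noise band: slower.
        Verdict::Regressed
    } else if upper_pct < -NOISE_PCT {
        // The whole interval lies below the noise band: faster.
        Verdict::Improved
    } else {
        // Significant, but the interval overlaps the noise band.
        Verdict::WithinNoise
    }
}
```

Under these assumptions, `cast int32 to int32` (interval +1.2840% to +2.1989%) is a regression, while `cast timestamp_ns to timestamp_s` (interval +0.9641% to +1.6195%) stays within noise because its lower bound is below 1%.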

@jorgecarleitao jorgecarleitao changed the title ARROW-10030: [Rust] Add support for FromIter and IntoIter for primitive types [DRAFT] ARROW-10030: [Rust] Add support for FromIter and IntoIter for primitive types Sep 22, 2020
@jorgecarleitao jorgecarleitao marked this pull request as ready for review September 22, 2020 16:41
@nevi-me
Contributor

nevi-me commented Sep 27, 2020

I'll take a look at this during the week

@alamb
Contributor

alamb commented Sep 30, 2020

As @jorgecarleitao mentions in a comment on #8303, his goal in adding this functionality is to make it easier to improve the performance of common operations, by avoiding an intermediate `Vec` that is then just copied into a raw buffer.
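
A minimal sketch of that idea, using only the standard library. `MiniArray` and `cast_i32_to_f64` are hypothetical stand-ins and not the Arrow API or the code in this PR; the `Vec` fields here play the role of the array's raw buffers, and nulls are tracked in a simple bitmap.

```rust
use std::iter::FromIterator;

/// Hypothetical stand-in for a primitive array: a values buffer plus a
/// validity bitmap (one bit per slot, least-significant bit first).
#[derive(Debug, PartialEq)]
struct MiniArray<T> {
    values: Vec<T>,
    validity: Vec<u8>,
    len: usize,
}

impl<T: Copy + Default> MiniArray<T> {
    fn is_valid(&self, i: usize) -> bool {
        self.validity[i / 8] & (1u8 << (i % 8)) != 0
    }

    /// Iterate over the slots as `Option<T>` (`None` for nulls).
    fn iter(&self) -> impl Iterator<Item = Option<T>> + '_ {
        (0..self.len).map(move |i| {
            if self.is_valid(i) {
                Some(self.values[i])
            } else {
                None
            }
        })
    }
}

impl<T: Copy + Default> FromIterator<Option<T>> for MiniArray<T> {
    /// Write each item straight into the two buffers, with no
    /// intermediate `Vec<Option<T>>`.
    fn from_iter<I: IntoIterator<Item = Option<T>>>(iter: I) -> Self {
        let iter = iter.into_iter();
        // Reserve space using the iterator's lower size bound.
        let (lower, _) = iter.size_hint();
        let mut values = Vec::with_capacity(lower);
        let mut validity = Vec::with_capacity((lower + 7) / 8);
        let mut len = 0;
        for item in iter {
            if len % 8 == 0 {
                validity.push(0u8); // start a new bitmap byte
            }
            match item {
                Some(v) => {
                    values.push(v);
                    validity[len / 8] |= 1u8 << (len % 8);
                }
                // Null slot: store a placeholder and leave the bit unset.
                None => values.push(T::default()),
            }
            len += 1;
        }
        MiniArray { values, validity, len }
    }
}

/// A cast written as one pass: iterate, convert, collect.
fn cast_i32_to_f64(input: &MiniArray<i32>) -> MiniArray<f64> {
    input.iter().map(|slot| slot.map(|v| v as f64)).collect()
}
```

With this in place, `vec![Some(1), None, Some(3)].into_iter().collect::<MiniArray<i32>>()` builds the array in a single pass, and `cast_i32_to_f64` goes from one array to the other without materializing the converted values in a separate collection first.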

Contributor

@alamb alamb left a comment

This PR looks pretty good to me. I don't feel I know the code well enough to approve it, but I think it is a good step forward.

rust/arrow/src/array/array.rs (outdated, resolved)
rust/arrow/src/compute/kernels/cast.rs (resolved)
Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
Contributor

@nevi-me nevi-me left a comment

I like this, thanks

@jorgecarleitao jorgecarleitao deleted the iterator branch October 7, 2020 16:44
emkornfield pushed a commit to emkornfield/arrow that referenced this pull request Oct 16, 2020
…mitive types

Closes apache#8211 from jorgecarleitao/iterator

Lead-authored-by: Jorge C. Leitao <jorgecarleitao@gmail.com>
Co-authored-by: Jorge Leitao <jorgecarleitao@gmail.com>
Signed-off-by: Jorge C. Leitao <jorgecarleitao@gmail.com>
GeorgeAp pushed a commit to sirensolutions/arrow that referenced this pull request Jun 7, 2021
…mitive types
