ARROW-10914: [Rust] Refactor simd arithmetic kernels to use chunked iteration #8929

jhorstmann · 2020-12-15T17:23:47Z

This should avoid out of bounds reads in those kernels and also removes some unsafe blocks. The performance actually seems to increase slightly, I will post the numbers later.

The division kernel looks a bit more complex now, but follows the same pattern as used in the simd aggregation functions, with two separate codepaths for the nullable/non-null cases.

The comparison kernels might also do some out of bounds reads, I will look at those in a separate ticket/PR.

… and avoid out of bounds reads

github-actions · 2020-12-15T17:43:17Z

https://issues.apache.org/jira/browse/ARROW-10914

jhorstmann · 2020-12-16T10:27:41Z

I'm seeing bigger variations on the benchmark numbers between runs, the results below between current master and this pr are from one of the faster runs:

~/arrow/rust/arrow$ cargo +nightly-2020-11-24 bench --features simd --bench arithmetic_kernels
   Compiling arrow v3.0.0-SNAPSHOT (/home/ubuntu/arrow/rust/arrow)
    Finished bench [optimized] target(s) in 1m 39s
     Running /home/ubuntu/arrow/rust/target/release/deps/arithmetic_kernels-4b00176b1c415f0c
Gnuplot not found, using plotters backend
add 512                 time:   [319.21 ns 319.38 ns 319.52 ns]                    
                        change: [-24.243% -24.193% -24.140%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 10 outliers among 100 measurements (10.00%)
  2 (2.00%) low severe
  5 (5.00%) low mild
  2 (2.00%) high mild
  1 (1.00%) high severe

subtract 512            time:   [387.54 ns 387.86 ns 388.40 ns]                         
                        change: [-19.909% -19.766% -19.578%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 6 outliers among 100 measurements (6.00%)
  1 (1.00%) low mild
  3 (3.00%) high mild
  2 (2.00%) high severe

multiply 512            time:   [515.01 ns 517.68 ns 522.09 ns]                          
                        change: [-9.9324% -9.6166% -9.1766%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 4 outliers among 100 measurements (4.00%)
  1 (1.00%) high mild
  3 (3.00%) high severe

divide 512              time:   [430.74 ns 430.80 ns 430.86 ns]                       
                        change: [-41.347% -41.234% -41.143%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 10 outliers among 100 measurements (10.00%)
  1 (1.00%) low mild
  6 (6.00%) high mild
  3 (3.00%) high severe

limit 512, 512          time:   [107.84 ns 107.94 ns 108.12 ns]                           
                        change: [+0.6778% +0.9025% +1.2602%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 7 outliers among 100 measurements (7.00%)
  1 (1.00%) low mild
  4 (4.00%) high mild
  2 (2.00%) high severe

add_nulls_512           time:   [383.53 ns 383.61 ns 383.69 ns]                          
                        change: [-20.450% -20.350% -20.270%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 10 outliers among 100 measurements (10.00%)
  1 (1.00%) low severe
  1 (1.00%) low mild
  3 (3.00%) high mild
  5 (5.00%) high severe

divide_nulls_512        time:   [543.48 ns 543.96 ns 544.84 ns]                              
                        change: [-23.819% -23.700% -23.587%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 10 outliers among 100 measurements (10.00%)
  3 (3.00%) low severe
  1 (1.00%) low mild
  1 (1.00%) high mild
  5 (5.00%) high severe

jorgecarleitao

Thanks a lot, @jhorstmann !

I did not go through all the details of this PR, but what I read looks really good and makes the code much easier to read. So, 👍

…teration This should avoid out of bounds reads in those kernels and also removes some unsafe blocks. The performance actually seems to increase slightly, I will post the numbers later. The division kernel looks a bit more complex now, but follows the same pattern as used in the simd aggregation functions, with two separate codepaths for the nullable/non-null cases. The comparison kernels might also do some out of bounds reads, I will look at those in a separate ticket/PR. Closes apache#8929 from jhorstmann/ARROW-10914-simd-arithmetic-read-out-of-bounds Authored-by: Jörn Horstmann <joern.horstmann@signavio.com> Signed-off-by: Jorge C. Leitao <jorgecarleitao@gmail.com>

ARROW-10914 Refactor simd arithmetic kernels to use chunked iteration…

1a9e7fc

… and avoid out of bounds reads

github-actions bot added the Component: Rust label Dec 15, 2020

jhorstmann added 2 commits December 15, 2020 20:15

ARROW-10914 Extract helper functions to simplify simd divide kernel

262e776

ARROW-10914 Clippy fixes

9c64537

jhorstmann marked this pull request as ready for review December 15, 2020 19:43

alamb mentioned this pull request Dec 15, 2020

[Rust] [Experiment] Vec<u8> vs current allocations #8796

Closed

ARROW-10914 Clippy fixes

35515a8

jorgecarleitao approved these changes Dec 18, 2020

View reviewed changes

jorgecarleitao closed this in 091df20 Dec 19, 2020

asfimport mentioned this pull request Dec 19, 2020

[Rust]: SIMD implementation of arithmetic kernels reads out of bounds #26844

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

ARROW-10914: [Rust] Refactor simd arithmetic kernels to use chunked iteration #8929

ARROW-10914: [Rust] Refactor simd arithmetic kernels to use chunked iteration #8929

jhorstmann commented Dec 15, 2020

github-actions bot commented Dec 15, 2020

jhorstmann commented Dec 16, 2020

jorgecarleitao left a comment

ARROW-10914: [Rust] Refactor simd arithmetic kernels to use chunked iteration #8929

ARROW-10914: [Rust] Refactor simd arithmetic kernels to use chunked iteration #8929

Conversation

jhorstmann commented Dec 15, 2020

github-actions bot commented Dec 15, 2020

jhorstmann commented Dec 16, 2020

jorgecarleitao left a comment

Choose a reason for hiding this comment