ARROW-6718: [DRAFT] [Rust] Remove packed_simd #7037
Conversation
Removing packed_simd would also make this bug obsolete: https://issues.apache.org/jira/browse/ARROW-8598
Thanks @nevi-me. I re-created everything locally and I don't see a reason to keep it. Also, the future of packed_simd is unclear, and it's one less dependency on nightly; now we just need specialization on stable...
Hi,
I attempted to change this to use
to the
and
this led to a significant performance improvement:
@nevi-me @paddyhoran any thoughts on how to improve this further?
This definitely looks great from the code-reduction point of view :D, but yes, it would be better if we can keep the perf loss to a minimum.
I'm not seeing light at the end of this tunnel yet - maybe we could try to replace it with
Another way is to rewrite the code (I think it is mostly used in the Parquet encoder/decoder and Arrow array builders?). It is tedious work, but I think it should be doable.
Hi @yordan-pavlov, we want to remove packed_simd due to the uncertainty around it being stabilised soon. So far we have found that if we optimise some non-SIMD code, we don't lose a lot of performance relative to the explicit SIMD code. It'd be great if we could see where we could further improve the non-SIMD functions.
Hi @nevi-me, I would love to have arrow use stable Rust (but I also want high performance). I have been working on some benchmarks which compare the performance of filtering in-memory data implemented in different ways: for-loop, iterator, arrow without SIMD, arrow with SIMD, etc. I have also done pretty much the same benchmarks for .NET Core and Rust. I hope to find some time to publish those benchmarks in the next few days, but it does look like the Rust compiler is very good at auto-vectorization.

So far the best performance I have managed to produce with arrow and SIMD (357.53 us) is only about twice as fast as filtering using an iterator without arrow (654.22 us), but it is about four times faster than filtering with arrow and loops (1.2704 ms), and many times faster than filtering with arrow using compute::no_simd_compare_op (8.4308 ms). Yes, SIMD is a lot of work, but at the moment it gives the best performance. I wonder if the SIMD features could be moved to another library, separate from the core arrow library.
That's interesting. How are the other benchmarks affected, or are you only focusing on filter? To the extent that there's overlap with
@nevi-me I have only focused on filtering so far; I will probably implement benchmarks for other operations once I have fully explored filtering. |
@yordan-pavlov when you say SIMD you mean the "simd" feature (packed_simd), right? |
@paddyhoran yes that's what I meant, apologies if it was unclear |
Hi @nevi-me, I hope it will be useful in benchmarking and improving arrow performance.
So, I'm not sure now about removing SIMD altogether. It took me some time to add all the SIMD features, but I don't pretend to be an expert on it. I'd hate to remove all the SIMD code only to realize that someone smarter than I am could have improved it (like @yordan-pavlov). I think we shouldn't use it by default because it requires nightly; #7057 changes it to be opt-in. I would love to remove some of the code if it's not required, though. @yordan-pavlov I think the best way to move this forward is for you to contribute your benchmarks to the main arrow repo. Then we can look at removing the SIMD code piece by piece to ensure we don't have performance regressions. Thoughts?
I agree with @paddyhoran - if the goal is just to enable the use of arrow with stable Rust, it would be reasonable to just not enable the SIMD feature by default, but still keep it so it is available as a choice for those users who need the best performance possible. A lot of work has gone into the SIMD feature already and it would be a shame to remove it prematurely, without doing enough benchmarking. Furthermore, I think Rust could have a great future in the big data space and I think this project could play an important part. But SIMD is important in big data. So we should be looking to have SIMD stabilized (in Rust) rather than remove it. If SIMD is removed from arrow, what killer feature would motivate its stabilization in Rust? For convenience here are the results from my filtering benchmarks:
In the table above we can see that SIMD filtering (against scalar values) is 49% faster than a loop, and 76% faster than an iterator implementation. This could mean the difference between waiting 12h or 7h for a job to complete. So I think more benchmarking, profiling and performance improvement work has to be done before it can be decided with confidence whether to remove SIMD. The source code for the benchmarks used to produce the results above is here: I am happy to contribute benchmarks; I just have to figure out how / if they would fit in the main arrow repo.
I'm happy with turning SIMD off by default |
This removes the dependency on packed_simd. I initially thought that boolean kernels were slower than with explicit SIMD, but this was a false alarm as the benchmarks weren't comparing SIMD vs non-SIMD.
While doing this, I noticed that the `divide` kernel appears to be unsound, as it checks if a null is 0 (which can be true when the default data behind the bitmask is 0).

Below is the performance comparison:
From 0.15.0 to 0.16.0
0.16.0 included the change that I made to autovectorize some compute kernels. This mainly resulted in non-SIMD kernels having a smaller performance gap (from 50-60% slower to 10-20% slower).
From 0.16.0 to no `packed_simd`
Are the perf drops worth it?
I suppose it'll boil down to whether getting closer to stable Rust (without feature flags) is worth the slight performance drop.
Outstanding work to do