Datum based comparison kernels (#4596) #4701

tustvold · 2023-08-15T17:17:49Z

Which issue does this PR close?

Rationale for this change

Adds datum based comparison kernels, deprecating the old kernels and drastically reducing the amount of generated code.

Benchmarks show no major performance regressions, and if anything non-trivial performance improvements.

Additionally, a release build on master with dyn_cmp_dict takes:

________________________________________________________
Executed in   39.60 secs    fish           external
   usr time  128.19 secs  583.00 micros  128.19 secs
   sys time    2.71 secs  102.00 micros    2.71 secs

With the changes in this PR, the same build takes:

________________________________________________________
Executed in   15.87 secs    fish           external
   usr time   60.84 secs    0.00 micros   60.84 secs
   sys time    2.28 secs  643.00 micros    2.28 secs

What changes are included in this PR?

Are there any user-facing changes?

tustvold · 2023-08-15T17:19:52Z

arrow-ord/Cargo.toml

-
-[features]
-dyn_cmp_dict = []
-simd = ["arrow-array/simd"]


The SIMD variants added a non-trivial amount of code complexity and behaved differently. I opted to simply remove them; we can always add back some sort of SIMD specialization for primitives in the future should users feel strongly.

For other kernels as I recall we found properly written rust code would be vectorized by llvm more effectively than our hand rolled simd kernels. Do you think that is still the case?

cc @jhorstmann

It is empirically not the case here: LLVM is really bad at optimising horizontal operations such as creating a packed bitmask from a comparison. The LLVM-generated kernels are ~2x slower.

Correct, though I feel like LLVM has gotten a little better at this. On the other hand, the SIMD kernels also did not generate "perfect" code, because they combined multiple masks into a single u64 (except for u8 comparisons, which have 64 lanes).

I think the tradeoff against improved compile time is OK, and longer term this allows more code cleanups. The decimal types, for example, already had quite hacky SIMD support. The last SIMD usages would then be the aggregation kernels.
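
The "horizontal operation" mentioned above, packing one comparison result per element into a bitmask, can be sketched in scalar Rust as follows. This is a minimal illustration and not the arrow-rs implementation: the name `pack_lt` is hypothetical, and it assumes bits are packed least-significant-bit first, 64 results per `u64` word.

```rust
/// Compares `l[i] < r[i]` for each index and packs the results into
/// `u64` words, least-significant bit first (hypothetical helper).
fn pack_lt(l: &[i32], r: &[i32]) -> Vec<u64> {
    assert_eq!(l.len(), r.len());
    let mut out = Vec::with_capacity((l.len() + 63) / 64);
    for (lc, rc) in l.chunks(64).zip(r.chunks(64)) {
        let mut word = 0u64;
        for (bit, (a, b)) in lc.iter().zip(rc).enumerate() {
            // Every element contributes one bit to the same word. This
            // cross-element packing is the "horizontal" step discussed above.
            word |= ((a < b) as u64) << bit;
        }
        out.push(word);
    }
    out
}
```

The comparisons themselves are independent per element; it is the final merge of many results into one word that differs from an ordinary element-wise loop.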

tustvold · 2023-08-15T17:21:01Z

arrow-ord/src/cmp.rs

+    Ok(BooleanArray::new(values, nulls))
+}
+
+fn values(a: &dyn Array) -> (Option<Vec<usize>>, &dyn Array) {


This formulation not only allows for mixed dictionary and non-dictionary arrays (as before), but also allows for mixed dictionary key sizes.

The cost of hydrating the keys in this way is negligible compared to the execution time of the kernels, and it avoids a huge amount of codegen.
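
A rough sketch of the idea, using plain Rust types rather than arrow arrays: keys of any integer width are widened once to `Vec<usize>`, after which a single comparison loop serves every combination of key types. The `Keys` enum and the function names are hypothetical, and null handling is omitted.

```rust
/// Dictionary keys of different widths (stand-in for arrow's key arrays).
enum Keys {
    U8(Vec<u8>),
    I32(Vec<i32>),
}

/// Widens keys to `usize` once, up front.
fn hydrate(keys: &Keys) -> Vec<usize> {
    match keys {
        Keys::U8(k) => k.iter().map(|&x| x as usize).collect(),
        // Assumes keys are non-negative, as valid dictionary keys must be.
        Keys::I32(k) => k.iter().map(|&x| x as usize).collect(),
    }
}

/// Compares two dictionary-encoded string columns for equality.
/// Only this one loop is compiled, whatever the key widths are.
fn eq_dict(lk: &Keys, lv: &[&str], rk: &Keys, rv: &[&str]) -> Vec<bool> {
    let (l, r) = (hydrate(lk), hydrate(rk));
    assert_eq!(l.len(), r.len());
    l.iter().zip(&r).map(|(&a, &b)| lv[a] == rv[b]).collect()
}
```

Without the widening step, a statically typed kernel would need one instantiation per pair of key types, which is the codegen cost the comment above refers to.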

tustvold · 2023-08-15T18:11:11Z

Benchmarks

eq Float32              time:   [7.6371 µs 7.6427 µs 7.6492 µs]
                        change: [-17.269% -17.067% -16.928%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 10 outliers among 100 measurements (10.00%)
  3 (3.00%) high mild
  7 (7.00%) high severe

eq scalar Float32       time:   [5.2425 µs 5.2445 µs 5.2466 µs]
                        change: [-2.5786% -2.3116% -2.0499%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 9 outliers among 100 measurements (9.00%)
  5 (5.00%) high mild
  4 (4.00%) high severe

neq Float32             time:   [7.6537 µs 7.6596 µs 7.6663 µs]
                        change: [-25.233% -25.014% -24.801%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 12 outliers among 100 measurements (12.00%)
  6 (6.00%) high mild
  6 (6.00%) high severe

neq scalar Float32      time:   [5.2445 µs 5.2465 µs 5.2487 µs]
                        change: [-35.310% -35.116% -34.928%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 12 outliers among 100 measurements (12.00%)
  7 (7.00%) high mild
  5 (5.00%) high severe

lt Float32              time:   [15.896 µs 15.905 µs 15.916 µs]
                        change: [-0.6139% -0.3269% -0.0448%] (p = 0.01 < 0.05)
                        Change within noise threshold.
Found 13 outliers among 100 measurements (13.00%)
  4 (4.00%) high mild
  9 (9.00%) high severe

lt scalar Float32       time:   [9.8077 µs 9.8141 µs 9.8213 µs]
                        change: [+0.1201% +0.3694% +0.5205%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 7 outliers among 100 measurements (7.00%)
  3 (3.00%) high mild
  4 (4.00%) high severe

lt_eq Float32           time:   [15.904 µs 15.913 µs 15.923 µs]
                        change: [-10.285% -10.072% -9.9334%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 11 outliers among 100 measurements (11.00%)
  2 (2.00%) high mild
  9 (9.00%) high severe

lt_eq scalar Float32    time:   [9.8315 µs 9.8352 µs 9.8395 µs]
                        change: [-21.082% -21.007% -20.928%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 7 outliers among 100 measurements (7.00%)
  3 (3.00%) high mild
  4 (4.00%) high severe

gt Float32              time:   [16.006 µs 16.070 µs 16.157 µs]
                        change: [+4.4889% +5.8947% +7.4152%] (p = 0.00 < 0.05)
                        Performance has regressed.

gt scalar Float32       time:   [9.8282 µs 9.8316 µs 9.8356 µs]
                        change: [+0.3787% +0.5923% +0.7215%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 11 outliers among 100 measurements (11.00%)
  8 (8.00%) high mild
  3 (3.00%) high severe

gt_eq Float32           time:   [15.906 µs 15.913 µs 15.922 µs]
                        change: [-9.9407% -9.8782% -9.8165%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 8 outliers among 100 measurements (8.00%)
  4 (4.00%) high mild
  4 (4.00%) high severe

gt_eq scalar Float32    time:   [9.7886 µs 9.7941 µs 9.7995 µs]
                        change: [-21.558% -21.436% -21.244%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 5 outliers among 100 measurements (5.00%)
  3 (3.00%) high mild
  2 (2.00%) high severe

eq Int32                time:   [7.6285 µs 7.6364 µs 7.6451 µs]
                        change: [-13.069% -12.808% -12.552%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 8 outliers among 100 measurements (8.00%)
  5 (5.00%) high mild
  3 (3.00%) high severe

eq scalar Int32         time:   [5.1502 µs 5.1527 µs 5.1559 µs]
                        change: [-4.3827% -4.2114% -3.9813%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 10 outliers among 100 measurements (10.00%)
  1 (1.00%) low mild
  4 (4.00%) high mild
  5 (5.00%) high severe

neq Int32               time:   [7.5440 µs 7.5503 µs 7.5576 µs]
                        change: [-20.459% -20.251% -20.113%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 19 outliers among 100 measurements (19.00%)
  4 (4.00%) low mild
  6 (6.00%) high mild
  9 (9.00%) high severe

neq scalar Int32        time:   [5.1496 µs 5.1521 µs 5.1550 µs]
                        change: [-36.346% -36.169% -35.980%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 17 outliers among 100 measurements (17.00%)
  8 (8.00%) high mild
  9 (9.00%) high severe

lt Int32                time:   [7.6004 µs 7.6047 µs 7.6095 µs]
                        change: [-19.626% -18.502% -17.323%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 14 outliers among 100 measurements (14.00%)
  5 (5.00%) high mild
  9 (9.00%) high severe

lt scalar Int32         time:   [5.1520 µs 5.1535 µs 5.1553 µs]
                        change: [-4.4893% -4.2218% -3.9580%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 7 outliers among 100 measurements (7.00%)
  2 (2.00%) high mild
  5 (5.00%) high severe

lt_eq Int32             time:   [7.5444 µs 7.5487 µs 7.5541 µs]
                        change: [-21.125% -20.891% -20.683%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 8 outliers among 100 measurements (8.00%)
  2 (2.00%) high mild
  6 (6.00%) high severe

lt_eq scalar Int32      time:   [5.1585 µs 5.1608 µs 5.1636 µs]
                        change: [-35.574% -35.425% -35.323%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 8 outliers among 100 measurements (8.00%)
  3 (3.00%) high mild
  5 (5.00%) high severe

gt Int32                time:   [7.5334 µs 7.5385 µs 7.5443 µs]
                        change: [-14.111% -13.915% -13.789%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 13 outliers among 100 measurements (13.00%)
  3 (3.00%) low severe
  4 (4.00%) high mild
  6 (6.00%) high severe

gt scalar Int32         time:   [5.2320 µs 5.2342 µs 5.2366 µs]
                        change: [-2.7473% -2.3978% -2.0629%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 14 outliers among 100 measurements (14.00%)
  9 (9.00%) high mild
  5 (5.00%) high severe

gt_eq Int32             time:   [7.5977 µs 7.6027 µs 7.6083 µs]
                        change: [-22.831% -22.594% -22.373%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 13 outliers among 100 measurements (13.00%)
  6 (6.00%) high mild
  7 (7.00%) high severe

gt_eq scalar Int32      time:   [5.1495 µs 5.1518 µs 5.1545 µs]
                        change: [-36.404% -36.230% -36.052%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 13 outliers among 100 measurements (13.00%)
  6 (6.00%) high mild
  7 (7.00%) high severe

eq MonthDayNano         time:   [76.988 µs 77.890 µs 78.540 µs]
                        change: [+9.8624% +11.921% +13.779%] (p = 0.00 < 0.05)
                        Performance has regressed.

eq scalar MonthDayNano  time:   [53.423 µs 53.451 µs 53.479 µs]
                        change: [+0.1588% +0.4683% +0.7225%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 2 outliers among 100 measurements (2.00%)
  2 (2.00%) high mild

eq_dyn_utf8_scalar dictionary[10] string[4])
                        time:   [67.637 µs 67.668 µs 67.702 µs]
                        change: [+27.334% +27.585% +28.005%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 13 outliers among 100 measurements (13.00%)
  7 (7.00%) low mild
  4 (4.00%) high mild
  2 (2.00%) high severe

gt_eq_dyn_utf8_scalar scalar dictionary[10] string[4])
                        time:   [132.88 µs 132.92 µs 132.96 µs]
                        change: [+23.659% +23.938% +24.104%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 6 outliers among 100 measurements (6.00%)
  2 (2.00%) high mild
  4 (4.00%) high severe

eq dictionary[10] string[4])
                        time:   [352.91 µs 353.15 µs 353.44 µs]
                        change: [-16.932% -16.725% -16.514%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 1 outliers among 100 measurements (1.00%)
  1 (1.00%) high severe

tustvold · 2023-08-16T09:15:21Z

arrow-ord/src/cmp.rs

+    }
+}
+
+trait Dictionary: Array {


I could see us making this public, perhaps with some additional methods, so that it can form the basic pattern for "how to handle dictionaries". FYI @alamb

I agree -- methods like "apply unary function to values, returning a Dictionary of the same type" would be very helpful. With the right documentation, I think this would also be less confusing.

One thought I had is whether there is a more general formulation that might be helpful (for example, for DictionaryArrays, REE Arrays, and maybe StringViewArrays), although maybe they could just have their own functions.

I don't have enough experience with REE Arrays to want to propose an abstraction for them, I honestly don't really know how to handle them efficiently...

StringViewArrays shouldn't require any additional logic, as StringViewArray isn't a "nested" type.

tustvold · 2023-08-16T10:03:40Z

Latest benchmarks

eq Float32              time:   [7.6360 µs 7.6405 µs 7.6457 µs]
                        change: [-17.240% -16.996% -16.750%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 15 outliers among 100 measurements (15.00%)
  3 (3.00%) high mild
  12 (12.00%) high severe

eq scalar Float32       time:   [5.2382 µs 5.2407 µs 5.2433 µs]
                        change: [-2.6651% -2.3977% -2.1336%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 7 outliers among 100 measurements (7.00%)
  2 (2.00%) high mild
  5 (5.00%) high severe

neq Float32             time:   [7.3749 µs 7.3801 µs 7.3862 µs]
                        change: [-27.961% -27.771% -27.616%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 12 outliers among 100 measurements (12.00%)
  3 (3.00%) high mild
  9 (9.00%) high severe

neq scalar Float32      time:   [5.2315 µs 5.2346 µs 5.2384 µs]
                        change: [-35.454% -35.241% -34.988%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 10 outliers among 100 measurements (10.00%)
  2 (2.00%) high mild
  8 (8.00%) high severe

lt Float32              time:   [15.455 µs 15.580 µs 15.681 µs]
                        change: [-5.7116% -4.9659% -4.2590%] (p = 0.00 < 0.05)
                        Performance has improved.

lt scalar Float32       time:   [9.8461 µs 9.8686 µs 9.8992 µs]
                        change: [+4.0063% +5.2827% +6.7502%] (p = 0.00 < 0.05)
                        Performance has regressed.

lt_eq Float32           time:   [15.896 µs 15.905 µs 15.915 µs]
                        change: [-10.347% -10.136% -10.003%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 6 outliers among 100 measurements (6.00%)
  3 (3.00%) high mild
  3 (3.00%) high severe

lt_eq scalar Float32    time:   [9.7845 µs 9.7888 µs 9.7932 µs]
                        change: [-21.407% -21.254% -21.051%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 23 outliers among 100 measurements (23.00%)
  11 (11.00%) low mild
  5 (5.00%) high mild
  7 (7.00%) high severe

gt Float32              time:   [15.894 µs 15.905 µs 15.917 µs]
                        change: [-0.3861% -0.0659% +0.2289%] (p = 0.72 > 0.05)
                        No change in performance detected.
Found 7 outliers among 100 measurements (7.00%)
  1 (1.00%) low mild
  4 (4.00%) high mild
  2 (2.00%) high severe

gt scalar Float32       time:   [9.7929 µs 9.7982 µs 9.8039 µs]
                        change: [+0.1360% +0.4044% +0.6769%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 5 outliers among 100 measurements (5.00%)
  3 (3.00%) high mild
  2 (2.00%) high severe

gt_eq Float32           time:   [15.868 µs 15.883 µs 15.901 µs]
                        change: [-9.9332% -9.7737% -9.5779%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 26 outliers among 100 measurements (26.00%)
  12 (12.00%) low mild
  4 (4.00%) high mild
  10 (10.00%) high severe

gt_eq scalar Float32    time:   [9.8059 µs 9.8120 µs 9.8192 µs]
                        change: [-21.359% -21.226% -21.019%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 2 outliers among 100 measurements (2.00%)
  1 (1.00%) high mild
  1 (1.00%) high severe

eq Int32                time:   [7.6663 µs 7.6775 µs 7.6896 µs]
                        change: [-12.682% -12.410% -12.111%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 12 outliers among 100 measurements (12.00%)
  6 (6.00%) high mild
  6 (6.00%) high severe

eq scalar Int32         time:   [5.1598 µs 5.1615 µs 5.1638 µs]
                        change: [-4.2256% -4.0547% -3.8551%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 7 outliers among 100 measurements (7.00%)
  1 (1.00%) high mild
  6 (6.00%) high severe

neq Int32               time:   [7.5694 µs 7.5728 µs 7.5769 µs]
                        change: [-20.136% -19.885% -19.648%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 12 outliers among 100 measurements (12.00%)
  1 (1.00%) high mild
  11 (11.00%) high severe

neq scalar Int32        time:   [5.1674 µs 5.1699 µs 5.1726 µs]
                        change: [-36.189% -36.007% -35.816%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 9 outliers among 100 measurements (9.00%)
  2 (2.00%) high mild
  7 (7.00%) high severe

lt Int32                time:   [7.6795 µs 7.6866 µs 7.6961 µs]
                        change: [-18.855% -17.713% -16.550%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 7 outliers among 100 measurements (7.00%)
  2 (2.00%) high mild
  5 (5.00%) high severe

lt scalar Int32         time:   [5.1625 µs 5.1655 µs 5.1691 µs]
                        change: [-4.2787% -4.0476% -3.8997%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 14 outliers among 100 measurements (14.00%)
  1 (1.00%) low mild
  3 (3.00%) high mild
  10 (10.00%) high severe

lt_eq Int32             time:   [7.5761 µs 7.5806 µs 7.5861 µs]
                        change: [-20.731% -20.486% -20.235%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 14 outliers among 100 measurements (14.00%)
  2 (2.00%) high mild
  12 (12.00%) high severe

lt_eq scalar Int32      time:   [5.2456 µs 5.2493 µs 5.2536 µs]
                        change: [-34.425% -34.242% -34.062%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 11 outliers among 100 measurements (11.00%)
  3 (3.00%) high mild
  8 (8.00%) high severe

gt Int32                time:   [7.5842 µs 7.5915 µs 7.6007 µs]
                        change: [-11.722% -10.713% -9.6121%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 22 outliers among 100 measurements (22.00%)
  1 (1.00%) high mild
  21 (21.00%) high severe

gt scalar Int32         time:   [5.1702 µs 5.1726 µs 5.1753 µs]
                        change: [-3.9266% -3.6285% -3.3424%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 17 outliers among 100 measurements (17.00%)
  2 (2.00%) low mild
  3 (3.00%) high mild
  12 (12.00%) high severe

gt_eq Int32             time:   [7.6899 µs 7.6971 µs 7.7053 µs]
                        change: [-21.943% -21.716% -21.463%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 15 outliers among 100 measurements (15.00%)
  4 (4.00%) high mild
  11 (11.00%) high severe

gt_eq scalar Int32      time:   [5.1667 µs 5.1693 µs 5.1722 µs]
                        change: [-36.230% -36.051% -35.867%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 14 outliers among 100 measurements (14.00%)
  2 (2.00%) low mild
  3 (3.00%) high mild
  9 (9.00%) high severe

eq MonthDayNano         time:   [63.182 µs 63.378 µs 63.600 µs]
                        change: [-4.9198% -4.2991% -3.5572%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 2 outliers among 100 measurements (2.00%)
  1 (1.00%) high mild
  1 (1.00%) high severe

eq scalar MonthDayNano  time:   [54.143 µs 54.196 µs 54.262 µs]
                        change: [+2.4635% +2.9231% +3.3797%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 1 outliers among 100 measurements (1.00%)
  1 (1.00%) high severe

eq_dyn_utf8_scalar dictionary[10] string[4])
                        time:   [53.156 µs 53.176 µs 53.197 µs]
                        change: [-0.0663% +0.1054% +0.3198%] (p = 0.37 > 0.05)
                        No change in performance detected.
Found 6 outliers among 100 measurements (6.00%)
  4 (4.00%) high mild
  2 (2.00%) high severe

gt_eq_dyn_utf8_scalar scalar dictionary[10] string[4])
                        time:   [108.95 µs 108.98 µs 109.02 µs]
                        change: [+1.3318% +1.6177% +1.9018%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 4 outliers among 100 measurements (4.00%)
  1 (1.00%) high mild
  3 (3.00%) high severe

eq dictionary[10] string[4])
                        time:   [388.52 µs 388.82 µs 389.12 µs]
                        change: [-8.7351% -8.5458% -8.4271%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 7 outliers among 100 measurements (7.00%)
  5 (5.00%) high mild
  2 (2.00%) high severe

alamb · 2023-08-16T10:10:23Z

These are very neat results. To what do you attribute the performance improvements like the following?

...
gt Int32
                        time:   [7.5842 µs 7.5915 µs 7.6007 µs]
                        change: [-11.722% -10.713% -9.6121%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 22 outliers among 100 measurements (22.00%)
  1 (1.00%) high mild
  21 (21.00%) high severe
...
eq dictionary[10] string[4])
                        time:   [388.52 µs 388.82 µs 389.12 µs]
                        change: [-8.7351% -8.5458% -8.4271%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 7 outliers among 100 measurements (7.00%)
  5 (5.00%) high mild
  2 (2.00%) high severe

tustvold · 2023-08-16T10:15:12Z

For the case of dictionaries it is likely because the process of normalizing the keys allows for unchecked indexing, which in turn helps LLVM optimise the code properly as there isn't a branch in the main loop.

For the case of primitives I don't have a very good intuition; however, these kernels have historically been extremely sensitive to inlining. It is possible that co-locating more of the code allows LLVM to optimise it better... I haven't looked in too much detail. It could equally be the inline(never) preventing it from inlining too much and blowing its ability to analyze the code.
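
The dictionary explanation above can be sketched as follows. This is an illustration of the general technique under stated assumptions, not the code in this PR: `gather` is a hypothetical helper, and it assumes the keys have already been normalized to `usize`.

```rust
/// Gathers `values[k]` for every key, checking bounds once up front
/// instead of on every access (hypothetical helper).
fn gather(values: &[i64], keys: &[usize]) -> Vec<i64> {
    // One validation pass: after this, every key is known to be in bounds.
    assert!(keys.iter().all(|&k| k < values.len()));
    keys.iter()
        // SAFETY: all keys were checked against `values.len()` above.
        .map(|&k| unsafe { *values.get_unchecked(k) })
        .collect()
}
```

With the bounds check hoisted out, the main loop contains no branch that can panic, which is the property the comment above credits for the better optimisation.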

tustvold · 2023-08-16T10:24:41Z

Ran the benchmarks with the pre-existing SIMD kernels, and they are roughly twice as fast, although they have different semantics for floats. As written, this PR will therefore represent a performance regression for users with the SIMD feature enabled. I can look to bring back the SIMD versions if people feel strongly, although my preference would be to avoid this complexity.

alamb

I read through this PR's code and tests carefully, and I really like the new formulation -- very nicely done @tustvold 👏

For other reviewers, the size of the diff might obscure the fact that the core logic of all the comparisons is now combined into several relatively easy to read generic functions that are all shared.

My biggest comment / suggestion is to make it easier to create Scalars from Rust types, now that Scalar features much more prominently in the kernel API.

I do wonder if some of the compilation speedup is that arrow-rs no longer instantiates all the code for the different comparisons unless it is actually called. Thus the benefits for downstream application binary size may not be as significant (but it certainly seems like an improvement nonetheless).

cc @viirya and @jhorstmann

alamb · 2023-08-16T10:13:26Z

arrow-ord/Cargo.toml

@@ -44,10 +44,3 @@ half = { version = "2.1", default-features = false, features = ["num-traits"] }

 [dev-dependencies]
 rand = { version = "0.8", default-features = false, features = ["std", "std_rng"] }
-
-[package.metadata.docs.rs]
-features = ["dyn_cmp_dict"]


Is this an API change? (if someone was using arrow-ord directly)

I see that dyn_cmp_dict is still used in arrow / arrow-string

alamb · 2023-08-16T10:13:33Z

arrow-ord/Cargo.toml

-
-[features]
-dyn_cmp_dict = []
-simd = ["arrow-array/simd"]


For other kernels as I recall we found properly written rust code would be vectorized by llvm more effectively than our hand rolled simd kernels. Do you think that is still the case?

cc @jhorstmann

alamb · 2023-08-16T10:19:46Z

arrow-flight/src/sql/metadata/db_schemas.rs

@@ -129,7 +129,8 @@ impl GetDbSchemasBuilder {
        }

        if let Some(catalog_filter_name) = catalog_filter {
-            filters.push(eq_utf8_scalar(&catalog_name, &catalog_filter_name)?);
+            let scalar = StringArray::from_iter_values([catalog_filter_name]);


As a user, I find this construction a somewhat awkward way to create a simple scalar.

What would you think about creating convenience functions that would let this code be like:

let scalar = Scalar::new_utf8(catalog_filter_name);

(doesn't have to be part of this PR, I can add it as a follow on ticket/PR)

I'll have a think about what we can do here

alamb · 2023-08-16T10:20:48Z

arrow-flight/src/sql/metadata/sql_info.rs

+                let s = UInt32Array::from(vec![tt]);
+                eq(arr, &Scalar::new(&s))


Similarly, I think this would read better like

let s = Scalar::new_uint32(tt); eq(arr, &Scalar::new(&s))

Proposal in #4704
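
To make the ergonomics discussion concrete, here is a simplified, self-contained model of the pattern: a kernel accepts anything that can present itself as a slice plus an "is scalar" flag, and a scalar is a one-element array in a wrapper. The types below are stand-ins built on `Vec<u32>`, not the arrow-rs definitions, and `Scalar::from_value` is a hypothetical convenience constructor of the kind suggested above.

```rust
/// Simplified stand-in for a datum: values plus a flag saying whether
/// the single value should be broadcast.
trait Datum {
    fn get(&self) -> (&[u32], bool);
}

impl Datum for Vec<u32> {
    fn get(&self) -> (&[u32], bool) {
        (self.as_slice(), false)
    }
}

/// A one-element array treated as a scalar.
struct Scalar(Vec<u32>);

impl Scalar {
    /// Hypothetical convenience constructor from a plain Rust value.
    fn from_value(v: u32) -> Self {
        Scalar(vec![v])
    }
}

impl Datum for Scalar {
    fn get(&self) -> (&[u32], bool) {
        (self.0.as_slice(), true)
    }
}

/// A single kernel covers array/array and array/scalar inputs.
fn eq(l: &dyn Datum, r: &dyn Datum) -> Vec<bool> {
    let ((l, l_s), (r, r_s)) = (l.get(), r.get());
    let len = match (l_s, r_s) {
        (false, false) => {
            assert_eq!(l.len(), r.len());
            l.len()
        }
        (true, true) => 1,
        (true, false) => r.len(),
        (false, true) => l.len(),
    };
    // A scalar always reads its only element; an array reads element `i`.
    let at = |v: &[u32], scalar: bool, i: usize| if scalar { v[0] } else { v[i] };
    (0..len).map(|i| at(l, l_s, i) == at(r, r_s, i)).collect()
}
```

In this model, `eq(&arr, &Scalar::from_value(7))` replaces the two-step "build a one-element array, then wrap it" construction at the call site.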

alamb · 2023-08-16T10:24:17Z

arrow-ord/src/lib.rs

@@ -43,6 +43,8 @@
 //! ```
 //!

+pub mod cmp;
+#[doc(hidden)]


What is the reason for hidden -- is it to discourage new uses of these functions? I also note you added "deprecated" notes to the functions in there, which should help people migrate

So that it is clear where to look for comparison kernels

alamb · 2023-08-16T10:51:32Z

arrow-ord/src/cmp.rs

+    let r = r_v.map(|x| x.values().as_ref()).unwrap_or(r);
+
+    let values = downcast_primitive_array! {
+        (l, r) => apply(op, l.values().as_ref(), l_s, l_v, r.values().as_ref(), r_s, r_v),


Just confirming that this handles Date, Time, Timestamp, etc types, right?

alamb · 2023-08-16T11:00:07Z

arrow-ord/src/cmp.rs

+    match (l_s, r_s) {
+        (None, None) => {
+            assert_eq!(l.len(), r.len());
+            collect_bool(l.len(), neg, |idx| unsafe {


Is it worth adding a safety note here, related to the fact that the index is coming from a valid array?

alamb · 2023-08-16T11:00:49Z

arrow-ord/src/cmp.rs

+    op: impl Fn(T::Item, T::Item) -> bool,
+) -> BooleanBuffer {
+    assert_eq!(l_v.len(), r_v.len());
+    collect_bool(l_v.len(), neg, |idx| unsafe {


Same comment here about documenting the safety of using idx.

alamb · 2023-08-16T11:02:03Z

arrow-ord/src/cmp.rs

+    })
+}
+
+trait ArrayOrd {


Could you possibly add some more documentation to this trait?

alamb · 2023-08-16T11:05:11Z

arrow-ord/src/cmp.rs

+    }
+
+    fn is_eq(l: Self::Item, r: Self::Item) -> bool {
+        l.is_eq(r)


TIL: https://doc.rust-lang.org/std/cmp/enum.Ordering.html#method.is_eq

This is actually https://docs.rs/arrow-array/latest/arrow_array/trait.ArrowNativeTypeOp.html#tymethod.is_eq
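
As I understand the linked trait, its float comparisons follow IEEE 754 total ordering rather than the `==` operator; treat that as an assumption to check against the arrow-array documentation. The standard-library snippet below only shows how the two notions of equality differ. It uses `f32::total_cmp` and does not call arrow-rs; the function names are made up for the example.

```rust
use std::cmp::Ordering;

/// Equality under IEEE 754 total ordering (std `f32::total_cmp`).
fn total_eq(a: f32, b: f32) -> bool {
    a.total_cmp(&b) == Ordering::Equal
}

/// Equality under the ordinary `==` operator.
fn ieee_eq(a: f32, b: f32) -> bool {
    a == b
}
```

Under total ordering a NaN equals an identical NaN and `0.0` differs from `-0.0`, while `==` gives the opposite answer in both cases.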

tustvold · 2023-08-16T11:18:34Z

I do wonder if some of the compilation speedup is that arrow-rs no longer instantiates all the code for the different comparisons unless it is actually called

We don't make use of generics in the public interface, so it will instantiate all the variants regardless of if they're called. This is actually unlike before, so the speedup is quite possibly even more significant
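
The point about a non-generic public interface can be illustrated with a small sketch. The `Column` enum is a stand-in for a runtime-typed array and all names are hypothetical; the real kernels dispatch over many more types.

```rust
/// Runtime-typed column (stand-in for `&dyn Array`).
enum Column {
    I32(Vec<i32>),
    F64(Vec<f64>),
}

/// Generic inner loop: monomorphized once per element type, inside the library.
fn eq_typed<T: PartialEq>(l: &[T], r: &[T]) -> Vec<bool> {
    assert_eq!(l.len(), r.len());
    l.iter().zip(r).map(|(a, b)| a == b).collect()
}

/// Non-generic public entry point. Because it names every variant, the
/// compiler instantiates `eq_typed` for all of them when the library is
/// built, whether or not a downstream caller ever uses each one.
fn eq_columns(l: &Column, r: &Column) -> Result<Vec<bool>, String> {
    match (l, r) {
        (Column::I32(a), Column::I32(b)) => Ok(eq_typed(a, b)),
        (Column::F64(a), Column::F64(b)) => Ok(eq_typed(a, b)),
        _ => Err("mismatched column types".to_string()),
    }
}
```

A generic public function such as `eq_typed` on its own would instead be instantiated only for the types a caller actually uses, which is the contrast drawn in the reply above.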

alamb · 2023-08-16T12:01:10Z

I recommend leaving this PR open for another day or two to allow others the chance to comment (or request more time to comment)

tustvold · 2023-08-16T16:35:09Z

I will probably look to get this in Friday morning GMT to make the next arrow release

jhorstmann · 2023-08-16T18:00:01Z

We are currently several arrow releases behind in our engine, so I can't easily test the performance impact on our internal benchmark queries. The compile-time improvements seem really nice though, and if needed we could implement comparison kernels for our limited set of datatypes using portable_simd.


Datum based comparison kernels (apache#4596)

8cce942

github-actions bot added the arrow Changes to the arrow crate label Aug 15, 2023


Clippy

e309227

More clippy

c78212c

github-actions bot added the parquet Changes to the parquet crate label Aug 15, 2023

tustvold added 2 commits August 15, 2023 20:21

Even more clippy

ceee661

Further clippy

c41d948

github-actions bot added the arrow-flight Changes to the arrow-flight crate label Aug 15, 2023

tustvold added 2 commits August 15, 2023 21:01

Format

0653162

Use take kernel for scalar evaluation

dec1970


Clippy

01dc052

alamb approved these changes Aug 16, 2023


tustvold added the api-change Changes to the arrow API label Aug 16, 2023

Review feedback

023a3ef

alamb mentioned this pull request Aug 16, 2023

Improve ergonomics of Scalar #4704

Merged

tustvold added 2 commits August 17, 2023 11:46

Merge remote-tracking branch 'upstream/master' into scalar-comparison…

7a858a8

…-kernels

Use AnyDictionaryArray

1a934d6

tustvold merged commit 8bbb5c1 into apache:master Aug 18, 2023
31 checks passed

This was referenced Aug 21, 2023

Datum Based Comparison Kernels #4596

Closed

Use Datum for string kernels #4632

Closed

Remove Deprecated Comparison Kernels #4733

Closed

tustvold mentioned this pull request Dec 7, 2023

Remove SIMD Feature #5184

Merged

tustvold mentioned this pull request Mar 16, 2024

add comparation kernel for decimal array #2766

Closed
