ARROW-11572: [Rust] Add a kernel for division by single scalar #9454
Conversation
Codecov Report

```
@@            Coverage Diff             @@
##           master    #9454      +/-   ##
==========================================
- Coverage   82.27%   82.26%   -0.01%
==========================================
  Files         244      244
  Lines       55393    55427     +34
==========================================
+ Hits        45573    45599     +26
- Misses       9820     9828      +8
```

Continue to review the full report at Codecov.
This is a good change, thanks @abreis. I think adding scalar equivalents to functions where we've previously converted scalars to arrays ties us up in the future. The C++ implementation solves this issue by using an enum:

```rust
enum Datum {
    Array(ArrayRef),
    Scalar(ScalarRef),
}
```

This would allow us to avoid … What are your thoughts @jorgecarleitao @alamb @andygrove @ritchie46?
IMO, that would be the superior approach (though it's a much larger rewrite). I tried to mock up what it could look like for the numeric kernels:

```rust
// Datum with concrete types for the arithmetic module
enum DatumNumeric<'a, T>
where
    T: ArrowNumericType,
{
    Array(&'a PrimitiveArray<T>),
    Scalar(T::Native),
}

// lets the user pass arrays without interacting with Datum
impl<'a, T> From<&'a PrimitiveArray<T>> for DatumNumeric<'a, T>
where
    T: ArrowNumericType,
{
    fn from(array: &'a PrimitiveArray<T>) -> Self {
        DatumNumeric::Array(array)
    }
}

// can match and run specialized code for array/array, array/scalar, etcetera
fn datum_math_divide<'a, T, DN>(left: DN, right: DN) -> Result<PrimitiveArray<T>>
where
    T: ArrowNumericType,
    T::Native: Div<Output = T::Native> + Zero,
    DN: Into<DatumNumeric<'a, T>>,
{
    use DatumNumeric::*;
    match (left.into(), right.into()) {
        (Array(left), Array(right)) => todo!("array/array"),
        (Array(array), Scalar(divisor)) => todo!("array/scalar"),
        _ => todo!(),
    }
}

fn test_datum_divide() {
    let a = Int32Array::from(vec![15, 15, 8, 1, 9]);
    let b = Int32Array::from(vec![5, 6, 8, 9, 1]);
    let c = datum_math_divide(&a, &b).unwrap(); // works, same interface as before
}
```

However, Rust doesn't like the `impl From` for scalars (I tried …).
I think @jorgecarleitao was doing something similar to the …
As @abreis' mock-up shows, it can be an elegant way to reduce the public API surface.
@abreis, I am not convinced that that is sufficient, unfortunately, because it excludes all types that are not numeric (i.e. all dates and times for primitives, as well as all other logical types). We could of course offer a …

Generally, the logical operation performed on the data depends on its `DataType`. In my opinion, the data structure should be something like what DataFusion has, but instead of having one enum variant per logical type, we should have one enum variant per physical type, i.e.

```rust
enum ScalarValue {
    Int32(Option<i32>, DataType), // int32, date32, time32
    Int64(Option<i64>, DataType), // int64, date64, time64, timestamp
    List(Option<Vec<ScalarValue>>, DataType),
}
```

The `DataType` … This corresponds to the notion that each variant needs to be treated fundamentally differently because its physical layout (i.e. at the machine level) is different. This is how Rust handles these things with generics, because it relies on type information to compile the instructions.

This would allow us to write our numerical operations neatly using a generic over … We still have a problem over which a …
I agree. I just realized that …

Here is a more fleshed-out mock of a `Datum` type. The signature of …

Base type:

```rust
#[derive(Debug)]
pub enum Datum<'a, T>
where
    T: ArrowPrimitiveType,
{
    Array(&'a PrimitiveArray<T>),
    Scalar(Option<T::Native>),
}

impl<'a, T> From<&'a PrimitiveArray<T>> for Datum<'a, T>
where
    T: ArrowPrimitiveType,
{
    fn from(array: &'a PrimitiveArray<T>) -> Self {
        Datum::Array(array)
    }
}

impl<'a, T> From<Option<T::Native>> for Datum<'a, T>
where
    T: ArrowPrimitiveType,
{
    fn from(scalar: Option<T::Native>) -> Self {
        Datum::Scalar(scalar)
    }
}
```

A (user-facing) method for math division:

```rust
pub fn math_divide<'a1, 'a2, T, DL, DR>(
    left: DL,
    right: DR,
) -> Result<PrimitiveArray<T>>
where
    T: ArrowNumericType,
    T::Native: Div<Output = T::Native> + Zero,
    DL: Into<Datum<'a1, T>>, // left and right may have different lifetimes
    DR: Into<Datum<'a2, T>>, // but `T` must be the same
{
    use Datum::*;
    match (left.into(), right.into()) {
        (Array(left), Array(right)) => todo!(),     // array/array
        (Array(array), Scalar(divisor)) => todo!(), // array/scalar
        _ => todo!(),
    }
}
```

Test code:

```rust
fn test_datum_divide() {
    let array1 = Int32Array::from(vec![15, 15, 8, 1, 9]);
    let array2 = Int32Array::from(vec![5, 6, 8, 9, 1]);
    let scalar = Some(8i32);

    let a_over_a = math_divide(&array1, &array2).unwrap(); // works
    let a_over_s = math_divide(&array1, scalar).unwrap();  // also works
}
```

@jorgecarleitao I'm aware this doesn't address the second half of your comment. I'm just exploring this idea for the current version of arrow.
I left two small suggestions. This looks really good :)
Also, note that there is a potential optimization for all SIMD here, where we create an uninitialized buffer and write to it directly from SIMD (instead of creating a zeroed buffer and writing to it).
Thanks! I went to try to implement this, but it looks like it's already optimized as you suggest. The kernel calls

```rust
let mut result = MutableBuffer::new(buffer_size).with_bitset(buffer_size, false);
```

to prepare the return buffer, which uses …
@abreis I think this PR is ready to go, but it needs a rebase. Can you please do so and I'll merge it in over the weekend?
Apologies, I've been swamped this week. I want to add a couple more tests and take a look at Jorge's comments before it's merged. Let me ping you once I do. Thanks!
Makes sense. Thanks @abreis!
@alamb Thanks, this should be good to go now. I want to point out that the latest benchmarks (#9454 (comment)) suggest that the compiler's auto-vectorized code might be ~2-5% faster than our own SIMD implementation of this particular method. I have little experience with high-performance code though, so I can't say whether that will always be the case.
Note that … One easy win is to use … What I was thinking was using … This is not for this PR, though; it was just a comment that we may be able to do more here. :)
Thanks for the contribution @abreis -- really good stuff.
…nel in arrow

This is a small PR to make DataFusion use the just-merged `divide_scalar` arrow kernel (#9454). Performance-wise:

* on the `arrow` side, this specialized kernel is ~40-50% faster than the standard `divide`, mostly due to not having to check for divide-by-zero on every row;
* on the `datafusion` side, it can now skip the `scalar.to_array_of_size(num_rows)` allocation, which should be a decent win for operations on large arrays.

The eventual goal is to have `op_scalar` variants for every arithmetic operation — `divide` will show the biggest performance gains but all variants should save DataFusion a (possibly expensive) allocation.

Closes #9543 from abreis/datafusion-divide-scalar

Authored-by: Andre Braga Reis <andre@brg.rs>
Signed-off-by: Andrew Lamb <andrew@nerdnetworks.org>
This PR proposes a `divide_scalar` kernel that divides numeric arrays by a single scalar. Benchmarks show ~40-50% gains:

```
# features = []
divide              512  time: [2.3210 us 2.3345 us 2.3490 us]
divide_scalar       512  time: [1.4374 us 1.4425 us 1.4485 us] (-38%)
divide_nulls        512  time: [2.1718 us 2.1799 us 2.1894 us]
divide_scalar_nulls 512  time: [1.3888 us 1.3959 us 1.4036 us] (-36%)

# features = ["simd"]
divide              512  time: [1.0221 us 1.0348 us 1.0481 us]
divide_scalar       512  time: [468.04 ns 471.36 ns 475.19 ns] (-54%)
divide_nulls        512  time: [960.20 ns 964.30 ns 969.15 ns]
divide_scalar_nulls 512  time: [471.33 ns 476.41 ns 482.09 ns] (-51%)
```

The speedups are due to:

- checking for `DivideByZero` only once;
- not having to combine two null bitmaps;
- using `Simd::splat()` to fill the divisor chunks.

Tests are pretty bare right now; if you think this is worth merging I'll write a few more.

Closes apache#9454 from abreis/divide-scalar

Authored-by: Andre Braga Reis <andre@brg.rs>
Signed-off-by: Andrew Lamb <andrew@nerdnetworks.org>
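The first of those wins is easy to see in miniature. This is a std-only sketch of the idea, not the actual arrow kernels: hoisting the zero check out of the loop leaves a branch-free body the compiler can auto-vectorize, while array/array division must test every divisor:

```rust
// Scalar kernel: the divisor is validated once, so the hot loop is plain
// division with no branches.
fn divide_scalar(values: &[i32], divisor: i32) -> Result<Vec<i32>, String> {
    if divisor == 0 {
        return Err("DivideByZero".to_string());
    }
    Ok(values.iter().map(|v| v / divisor).collect())
}

// Array kernel: any element of `right` could be zero, so every iteration
// pays for a check (and a real kernel also merges two null bitmaps).
fn divide(left: &[i32], right: &[i32]) -> Result<Vec<i32>, String> {
    left.iter()
        .zip(right)
        .map(|(l, r)| {
            if *r == 0 {
                Err("DivideByZero".to_string())
            } else {
                Ok(l / r)
            }
        })
        .collect()
}

fn main() {
    println!("{:?}", divide_scalar(&[15, 15, 8], 5));
    println!("{:?}", divide(&[15, 15, 8], &[5, 6, 8]));
    println!("{:?}", divide_scalar(&[1], 0)); // fails once, up front
}
```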