Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ARROW-3880: [Rust] Implement simple math operations for numeric arrays #3033

Closed
wants to merge 19 commits into
base: master
from

Conversation

Projects
None yet
5 participants
@andygrove
Copy link
Contributor

andygrove commented Nov 26, 2018

No description provided.

andygrove added some commits Nov 26, 2018

@codecov-io

This comment has been minimized.

Copy link

codecov-io commented Nov 26, 2018

Codecov Report

Merging #3033 into master will increase coverage by 3.43%.
The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #3033      +/-   ##
==========================================
+ Coverage      87%   90.43%   +3.43%     
==========================================
  Files         496       13     -483     
  Lines       70563     2092   -68471     
==========================================
- Hits        61393     1892   -59501     
+ Misses       9069      200    -8869     
+ Partials      101        0     -101
Impacted Files Coverage Δ
rust/src/lib.rs 100% <ø> (ø) ⬆️
rust/src/array.rs 90.97% <100%> (+1.25%) ⬆️
rust/src/tensor.rs 93.11% <100%> (ø) ⬆️
rust/src/csv/reader.rs 87.5% <100%> (+0.12%) ⬆️
rust/src/error.rs 75% <100%> (+8.33%) ⬆️
rust/src/buffer.rs 93.53% <100%> (+0.02%) ⬆️
python/pyarrow/ipc.pxi
cpp/src/parquet/column_page.h
cpp/src/parquet/bloom_filter-test.cc
cpp/src/plasma/client.cc
... and 479 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 427a219...06bbc4a. Read the comment docs.

@andygrove

This comment has been minimized.

Copy link
Contributor Author

andygrove commented Nov 27, 2018

@paddyhoran @sunchao @kszucs Looking for a review please

}

pub fn divide(&self, other: &PrimitiveArray<$native_ty>) -> PrimitiveArray<$native_ty> {
self.math_helper(other, |a, b| a / b)

This comment has been minimized.

@paddyhoran

paddyhoran Nov 27, 2018

Contributor

Should we check if b is zero and push a null if it is?

This comment has been minimized.

@andygrove

andygrove Nov 27, 2018

Author Contributor

I believe divide by zero should be an error, as it would be usually

This comment has been minimized.

@andygrove

andygrove Nov 27, 2018

Author Contributor

However, we don't want a panic ... so probably divide should return a Result<> and should check first for any 0 values before attempting the operation

This comment has been minimized.

@paddyhoran

paddyhoran Nov 28, 2018

Contributor

Yea, maybe returning a Result is more appropriate.

This comment has been minimized.

@andygrove

andygrove Nov 28, 2018

Author Contributor

This is tricky ... the code is generic and using macros which makes it hard to check for literal zero ... I'm going to have to think about this some more. I probably have to refactor this a bit.

This comment has been minimized.

@paddyhoran

paddyhoran Nov 28, 2018

Contributor

We could use something like this.

In ARROW-3878 @sunchao is introducing the following:
pub trait ArrowNumericType: ArrowPrimitiveType {}

We could make this:
pub trait ArrowNumericType: ArrowPrimitiveType + Zero {} and you could make the above generic over ArrowNumericType.

I think it's fine to merge as is and open a JIRA for the follow up work, as once ARROW-3878 is merged the solution will change.

This comment has been minimized.

@andygrove

andygrove Nov 28, 2018

Author Contributor

Thanks @paddyhoran ... that num crate made things much nicer and now we no longer panic on divide by zero. Glad to see ArrowNumericType coming along ... I actually tried to implement that myself this week and failed.

self.math_helper(other, |a, b| a - b)
}

pub fn multiply(

This comment has been minimized.

@sunchao

sunchao Nov 27, 2018

Contributor

does this work for boolean types as well?

This comment has been minimized.

@andygrove

andygrove Nov 28, 2018

Author Contributor

No, this is only implemented for the numeric types (as with the existing min and max methods).

if self.is_null(i) || other.is_null(i) {
b.push_null().unwrap();
} else {
b.push(op(self.value(index), other.value(index))).unwrap();

This comment has been minimized.

@sunchao

sunchao Nov 27, 2018

Contributor

should we check length: what if other.len < self.len?

This comment has been minimized.

@andygrove

andygrove Nov 28, 2018

Author Contributor

Good point. I've added a check for that.

@@ -200,6 +201,55 @@ impl<T: ArrowNumericType> PrimitiveArray<T> {
&raw[offset as usize..offset as usize + len as usize]
}

//TODO: need help here ...

This comment has been minimized.

@andygrove

andygrove Dec 5, 2018

Author Contributor

@paddyhoran @sunchao I wonder if you can help me figure this out. Since the change to use specialization, I don't know how to make this work for numeric types but not boolean types. I'll keep trying but I feel I am missing something.

@andygrove

This comment has been minimized.

Copy link
Contributor Author

andygrove commented Dec 5, 2018

Well, I made it work using macros .... probably not ideal, but gives me the functionality I want.

@@ -239,6 +241,67 @@ impl<T: ArrowNumericType> PrimitiveArray<T> {
}
}

macro_rules! def_numeric_math_ops {

This comment has been minimized.

@sunchao

sunchao Dec 5, 2018

Contributor

Instead of macro, you can do:

impl<T: ArrowNumericType> PrimitiveArray<T>
where
    T::Native: Add<Output = T::Native>
        + Sub<Output = T::Native>
        + Mul<Output = T::Native>
        + Div<Output = T::Native>
        + Zero,
{
  // actual impl
}
@@ -15,6 +15,7 @@
// specific language governing permissions and limitations
// under the License.

use num::Zero;

This comment has been minimized.

@sunchao

sunchao Dec 5, 2018

Contributor

nit: can you move this line below std and above the use array_data::{ArrayData, ArrayDataRef};?

macro_rules! def_numeric_math_ops {
( $ty:ident, $native_ty:ident ) => {
impl PrimitiveArray<$ty> {
fn math_op<F>(self, other: &PrimitiveArray<$ty>, op: F) -> Result<PrimitiveArray<$ty>>

This comment has been minimized.

@sunchao

sunchao Dec 5, 2018

Contributor

I'm thinking whether we should expose some basic functions on primitive array, such as map, fold, etc. The math_op can be generalized to accept any function that takes two arrays. In functional world, it can be achieved via zip plus map, but I'm not sure how to implement zip on multiple primitive arrays..

@andygrove

This comment has been minimized.

Copy link
Contributor Author

andygrove commented Dec 5, 2018

@sunchao Thanks for the help, I re-implemented using generics. Should be good now. I agree with supporting zip and map and will experiment with that and create a separate PR if I figure it out.

@sunchao
Copy link
Contributor

sunchao left a comment

Thanks @andygrove . Code-wise this looks good to me. However I don't have enough knowledge to decide whether this will be good addition to the core array functionalities. Seems other languages are not doing this. Another option could be to make a math module and implement these there? seems these functions are not accessing array internals.

Maybe @wesm , @xhochy , @kszucs can help reviewing this too?

})
}

pub fn lt_eq(self, other: &PrimitiveArray<T>) -> Result<PrimitiveArray<BooleanType>> {

This comment has been minimized.

@sunchao

sunchao Dec 6, 2018

Contributor

nit: could we use BooleanArray instead of PrimitiveArray<BooleanType>? same below.

"Cannot perform math operation on two batches of different length".to_string(),
));
}
let mut b = PrimitiveArrayBuilder::<BooleanType>::new(self.len());

This comment has been minimized.

@sunchao

sunchao Dec 6, 2018

Contributor

nit: could we use BooleanArrayBuilder instead of PrimitiveArrayBuilder::<BooleanType>?

@andygrove

This comment has been minimized.

Copy link
Contributor Author

andygrove commented Dec 7, 2018

@sunchao Maybe you are right .. I have started a discussion on the mailing list about this.

@andygrove

This comment has been minimized.

Copy link
Contributor Author

andygrove commented Dec 10, 2018

@sunchao I moved the math and comparison operators out of array and into a new array_ops source file.

@sunchao
Copy link
Contributor

sunchao left a comment

LGTM. Thanks @andygrove !

@paddyhoran

This comment has been minimized.

Copy link
Contributor

paddyhoran commented Dec 11, 2018

+1 LGTM, thanks @andygrove

@wesm

This comment has been minimized.

Copy link
Member

wesm commented Dec 11, 2018

Needs rebase

@andygrove andygrove closed this in 2428945 Dec 11, 2018

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.