Skip to content
Closed
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
4 changes: 2 additions & 2 deletions rust/arrow/src/array/array.rs
Original file line number Diff line number Diff line change
Expand Up @@ -157,7 +157,7 @@ pub trait Array: fmt::Debug + Send + Sync + JsonEqual {
/// assert_eq!(array.is_null(1), true);
/// ```
fn is_null(&self, index: usize) -> bool {
self.data().is_null(index)
self.data_ref().is_null(index)
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This alone improved the performance a lot, but directly using ArrayData was even better. Should also improve some other kernels.

I think the data method is a real performance foot-gun and would be in favor of removing it completely. That makes it obvious to every user whether an Arc gets cloned. The cloning itself might not be a problem, but I think it sometimes prohibits other compiler optimizations.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Really great finding!

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FWIW, I agree with you: require an explicit .clone from the user (also broadly speaking, not just for .data() in particular. We should audit the code base for similar occurrences.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I created https://issues.apache.org/jira/browse/ARROW-10683 about removing data method.

}

/// Returns whether the element at `index` is not null.
Expand All @@ -174,7 +174,7 @@ pub trait Array: fmt::Debug + Send + Sync + JsonEqual {
/// assert_eq!(array.is_valid(1), false);
/// ```
fn is_valid(&self, index: usize) -> bool {
self.data().is_valid(index)
self.data_ref().is_valid(index)
}

/// Returns the total number of null values in this array.
Expand Down
10 changes: 5 additions & 5 deletions rust/arrow/src/array/ord.rs
Original file line number Diff line number Diff line change
Expand Up @@ -31,11 +31,11 @@ pub type DynComparator<'a> = Box<dyn Fn(usize, usize) -> Ordering + 'a>;

/// compares two floats, placing NaNs at last
fn cmp_nans_last<T: Float>(a: &T, b: &T) -> Ordering {
match (a, b) {
(x, y) if x.is_nan() && y.is_nan() => Ordering::Equal,
(x, _) if x.is_nan() => Ordering::Greater,
(_, y) if y.is_nan() => Ordering::Less,
(_, _) => a.partial_cmp(b).unwrap(),
match (a.is_nan(), b.is_nan()) {
(true, true) => Ordering::Equal,
(true, false) => Ordering::Greater,
(false, true) => Ordering::Less,
_ => a.partial_cmp(b).unwrap(),
}
}

Expand Down
33 changes: 17 additions & 16 deletions rust/arrow/src/compute/kernels/sort.rs
Original file line number Diff line number Diff line change
Expand Up @@ -486,26 +486,27 @@ pub fn lexsort_to_indices(columns: &[SortColumn]) -> Result<UInt32Array> {
));
};

// convert ArrayRefs to OrdArray trait objects and perform row count check
// map to data and DynComparator
let flat_columns = columns
.iter()
.map(|column| -> Result<(&Array, DynComparator, SortOptions)> {
// flatten and convert build comparators
Ok((
column.values.as_ref(),
build_compare(column.values.as_ref(), column.values.as_ref())?,
column.options.unwrap_or_default(),
))
})
.collect::<Result<Vec<(&Array, DynComparator, SortOptions)>>>()?;
.map(
|column| -> Result<(&ArrayDataRef, DynComparator, SortOptions)> {
// flatten and convert build comparators
// use ArrayData for is_valid checks later to avoid dynamic call
let values = column.values.as_ref();
let data = values.data_ref();
Ok((
data,
build_compare(values, values)?,
column.options.unwrap_or_default(),
))
},
)
.collect::<Result<Vec<(&ArrayDataRef, DynComparator, SortOptions)>>>()?;

let lex_comparator = |a_idx: &usize, b_idx: &usize| -> Ordering {
for column in flat_columns.iter() {
let values = &column.0;
let comparator = &column.1;
let sort_option = column.2;

match (values.is_valid(*a_idx), values.is_valid(*b_idx)) {
for (data, comparator, sort_option) in flat_columns.iter() {
match (data.is_valid(*a_idx), data.is_valid(*b_idx)) {
(true, true) => {
match (comparator)(*a_idx, *b_idx) {
// equal, move on to next column
Expand Down