Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve performance of min and max aggregation kernels without nulls #1373

Closed
HaoYang670 opened this issue Mar 1, 2022 · 0 comments · Fixed by #1374
Closed

Improve performance of min and max aggregation kernels without nulls #1373

HaoYang670 opened this issue Mar 1, 2022 · 0 comments · Fixed by #1374
Labels
arrow Changes to the arrow crate performance

Comments

@HaoYang670
Copy link
Contributor

Is your feature request related to a problem or challenge? Please describe what you are trying to do.
The function min_max_string does some unnecessary checks when null_count > 0 . For example, we don't need to check has_value in every loop because it will always be true after the first loop.

    if null_count == 0 {
        n = array.value(0);
        for i in 1..data.len() {
            let item = array.value(i);
            if cmp(n, item) {
                n = item;
            }
        }
    } else {
        n = "";
        let mut has_value = false;

        for i in 0..data.len() {
            let item = array.value(i);
            if data.is_valid(i) && (!has_value || cmp(n, item)) {
                has_value = true;
                n = item;
            }
        }
    }

https://github.com/apache/arrow-rs/blob/master/arrow/src/compute/kernels/aggregate.rs#L55-L64

Apart from that, I want this function to be cleaned up because the "for loops" here are not pretty.

Describe the solution you'd like

  1. Performance should be improved when null_count > 0
  2. No performance penalty is introduced in other cases
  3. clean up the code. Maybe use some FP skills

Describe alternatives you've considered
We can also replace array.value(i) by array.value_unchecked(i). But it will introduce some "unsafe", so I am not sure.

@HaoYang670 HaoYang670 added the enhancement Any new improvement worthy of a entry in the changelog label Mar 1, 2022
@alamb alamb changed the title Clean up and improve the performance of min_max_string Improve performance of min and max aggregation kernels without nulls Mar 3, 2022
@alamb alamb added arrow Changes to the arrow crate performance and removed enhancement Any new improvement worthy of a entry in the changelog labels Mar 3, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
arrow Changes to the arrow crate performance
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants