@Ted-Jiang
Member

Which issue does this PR close?

Closes #10055.

Rationale for this change

What changes are included in this PR?

Are these changes tested?

Are there any user-facing changes?

@github-actions github-actions bot added the physical-expr Changes to the physical-expr crates label Apr 12, 2024

pub(crate) fn merge_digests(&mut self, digests: &[TDigest]) {
    self.digest = TDigest::merge_digests(digests);
    let mut input_digests = digests.to_vec();
Member Author

@Ted-Jiang Ted-Jiang Apr 12, 2024

When an Accumulator calls merge(), it should not lose its inner state; discarding that state is not good API design.
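To make the concern concrete, here is a minimal sketch. It assumes a hypothetical `ToyDigest` that only tracks a count and a maximum, standing in for the real `TDigest`, and a hypothetical `ToyAccumulator`; neither is DataFusion's actual type. The first method shows the overwriting shape, the second the state-preserving shape that matches the diff lines above.

```rust
/// Toy stand-in for a t-digest (hypothetical, not DataFusion's `TDigest`):
/// it only tracks how many values it has seen and the largest one.
#[derive(Clone, Debug, PartialEq)]
struct ToyDigest {
    count: u64,
    max: f64,
}

impl ToyDigest {
    /// Build a digest from raw values.
    fn from_values(values: &[f64]) -> Self {
        ToyDigest {
            count: values.len() as u64,
            max: values.iter().copied().fold(f64::NEG_INFINITY, f64::max),
        }
    }

    /// Combine any number of digests into a new one.
    fn merge_digests(digests: &[ToyDigest]) -> ToyDigest {
        ToyDigest {
            count: digests.iter().map(|d| d.count).sum(),
            max: digests.iter().map(|d| d.max).fold(f64::NEG_INFINITY, f64::max),
        }
    }
}

/// Hypothetical accumulator that owns one digest as its inner state.
struct ToyAccumulator {
    digest: ToyDigest,
}

impl ToyAccumulator {
    /// Overwriting shape: the accumulator's own digest is replaced,
    /// so everything it had seen before the merge is lost.
    fn merge_digests_overwriting(&mut self, digests: &[ToyDigest]) {
        self.digest = ToyDigest::merge_digests(digests);
    }

    /// State-preserving shape: the current digest takes part in the merge.
    fn merge_digests_preserving(&mut self, digests: &[ToyDigest]) {
        let mut input_digests = digests.to_vec();
        input_digests.push(self.digest.clone());
        self.digest = ToyDigest::merge_digests(&input_digests);
    }
}
```

With an accumulator that already holds five values, merging in a digest of five other values leaves a count of 5 in the overwriting variant and 10 in the preserving one.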

Member

The change makes sense to me, but it is hard to review without a test that demonstrates the incorrect behavior. Would it be possible to add a unit test as part of this PR?

Contributor

100% agree we should add a test for this fix (otherwise we may break the behavior again in a subsequent refactoring, for example)

@Ted-Jiang Ted-Jiang requested a review from alamb April 12, 2024 08:22
@Ted-Jiang
Member Author

@alamb @jychen7 PTAL

@alamb alamb marked this pull request as draft April 13, 2024 13:36
@alamb
Contributor

alamb commented Apr 13, 2024

Marking as draft as I think this PR is waiting on a test

@Ted-Jiang Ted-Jiang marked this pull request as ready for review April 15, 2024 03:29
@Ted-Jiang
Member Author

Ted-Jiang commented Apr 15, 2024

Sorry for the delay. I added a test for the digest merge in the commit "add test for accumulator merge_digests".

Contributor

@alamb alamb left a comment

Looks good to me -- thanks @Ted-Jiang

I ran the test without the changes in this PR and verified it failed:


---- aggregate::approx_percentile_cont::tests::test_combine_approx_percentile_accumulator stdout ----
thread 'aggregate::approx_percentile_cont::tests::test_combine_approx_percentile_accumulator' panicked at datafusion/physical-expr/src/aggregate/approx_percentile_cont.rs:471:9:
assertion `left == right` failed
  left: 50000.0
 right: 100000.0
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace


failures:
    aggregate::approx_percentile_cont::tests::test_combine_approx_percentile_accumulator

test result: FAILED. 0 passed; 1 failed; 0 ignored; 0 measured; 1574 filtered out; finished in 0.01s

I think this code introduces some unnecessary cloning, but that can be avoided using something like Ted-Jiang#118.

pub(crate) fn merge_digests(&mut self, digests: &[TDigest]) {
    self.digest = TDigest::merge_digests(digests);
    let mut input_digests = digests.to_vec();
    input_digests.push(self.digest.clone());
Contributor

I think it is possible to avoid these clones -- here is a proposal that targets this PR Ted-Jiang#118 to do so
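One way the clones could be avoided, sketched under assumptions: if the merge function accepts any iterator of borrowed digests, the accumulator can chain its own digest onto the incoming slice instead of copying everything into a `Vec`. The `ToyDigest` and `ToyAccumulator` types below are hypothetical stand-ins, and this is not necessarily what the linked proposal does.

```rust
use std::iter;

/// Toy stand-in for a t-digest (hypothetical, not DataFusion's `TDigest`).
/// It deliberately does not derive `Clone`, to show that none is needed.
#[derive(Debug, PartialEq)]
struct ToyDigest {
    count: u64,
    max: f64,
}

impl ToyDigest {
    /// Merge borrowed digests from any iterator, so callers do not have to
    /// build an owned collection first.
    fn merge_digests<'a>(digests: impl IntoIterator<Item = &'a ToyDigest>) -> ToyDigest {
        let mut merged = ToyDigest {
            count: 0,
            max: f64::NEG_INFINITY,
        };
        for digest in digests {
            merged.count += digest.count;
            merged.max = merged.max.max(digest.max);
        }
        merged
    }
}

/// Hypothetical accumulator that owns one digest as its inner state.
struct ToyAccumulator {
    digest: ToyDigest,
}

impl ToyAccumulator {
    /// Merge the incoming digests together with the accumulator's own digest.
    /// The borrow of `self.digest` ends when the merged value is returned,
    /// so assigning the result back is accepted by the borrow checker.
    fn merge_digests(&mut self, digests: &[ToyDigest]) {
        let all_digests = digests.iter().chain(iter::once(&self.digest));
        self.digest = ToyDigest::merge_digests(all_digests);
    }
}
```

The accumulator keeps a slice parameter here so that the borrowed iterator and the borrow of `self.digest` can share a lifetime; only the digest-level merge is generic over iterators.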

Member

@andygrove andygrove left a comment

Thanks @Ted-Jiang

Reduce cloneing in ApproxPercentileAccumulator
@Ted-Jiang Ted-Jiang merged commit 74b966e into apache:main Apr 16, 2024
Omega359 pushed a commit to Omega359/arrow-datafusion that referenced this pull request Apr 16, 2024
* improve ApproxPercentileAccumulator merge api and fix bug

* add test for accumulator merge_digests

* fix test

* Reduce cloneing in ApproxPercentileAccumulator

---------

Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
Labels

physical-expr Changes to the physical-expr crates

Development

Successfully merging this pull request may close these issues.

Fix approx_percentile_cont_with_weight update_batch bug

3 participants