Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[C++] Add first and last aggregation #34911

Closed
icexelloss opened this issue Apr 5, 2023 · 3 comments
Closed

[C++] Add first and last aggregation #34911

icexelloss opened this issue Apr 5, 2023 · 3 comments

Comments

@icexelloss
Copy link
Contributor

icexelloss commented Apr 5, 2023

Describe the enhancement requested

Add first and last aggregation kernel and support in Acero

Component(s)

C++

@icexelloss
Copy link
Contributor Author

take

@westonpace
Copy link
Member

We spoke about this a bit externally but to record the conversation:

first and last can be written in two different ways. The unary version first(expr) is a window aggregate. It is an aggregate function that depends on the order of the data. There is also a binary variant (often called arg_min and arg_max) which returns the smallest value in column X given column Y. The advantage of the binary variant is that it doesn't depend on the order. However, it is a little less flexible (e.g. no way to specify custom sort function, not that we support that yet anyways :)).

From our conversation, my understanding is that you are interested in the unary window-aggregate version. We don't have any window aggregates yet but I do think it would be a good idea to start adding some. We have pretty much all the building blocks we need to support a "window aggregate node" (or an extension to the current aggregate nodes). Furthermore, even if we don't build in proper multithreaded support for window functions it should be possible to use window aggregates today as long as your plan is single threaded.

@westonpace westonpace added this to the 13.0.0 milestone Apr 28, 2023
@westonpace
Copy link
Member

Issue resolved by pull request 34912
#34912

westonpace pushed a commit that referenced this issue Apr 28, 2023
### Rationale for this change
This PR adds "first" and "last" aggregator and support using those with Acero's segmented aggregation.

### What changes are included in this PR?
- [x] Numeric Scalar Aggregator (bool, int types, floating types)
- [x] Numeric Hash Aggregator (bool, int types, floating types)
- [x] Docstring
- [x] Non-Numeric Scalar Aggregator (string, binary, fixed binary, temporal)
- [x] Non-Numeric Hash Aggregator (string, binary, fixed binary, temporal)
- [x] Add `ordered` flag in aggregate kernels
- [x] Implement and test skip null
- [x] Update compute.rst

### Are these changes tested?
- [x] Compute Kernel Test (Scalar Kernels, all supported datatypes)
- [x] Hash Aggregate Test (Hash Kernels, all supported datatypes)
- [x] Segmented Aggregation Test (Both Scalar and Hash Kernels)

### Are there any user-facing changes?
Yes. Added First and Last aggregator.

Authored-by: Li Jin <ice.xelloss@gmail.com>
Signed-off-by: Weston Pace <weston.pace@gmail.com>
liujiacheng777 pushed a commit to LoongArch-Python/arrow that referenced this issue May 11, 2023
### Rationale for this change
This PR adds "first" and "last" aggregator and support using those with Acero's segmented aggregation.

### What changes are included in this PR?
- [x] Numeric Scalar Aggregator (bool, int types, floating types)
- [x] Numeric Hash Aggregator (bool, int types, floating types)
- [x] Docstring
- [x] Non-Numeric Scalar Aggregator (string, binary, fixed binary, temporal)
- [x] Non-Numeric Hash Aggregator (string, binary, fixed binary, temporal)
- [x] Add `ordered` flag in aggregate kernels
- [x] Implement and test skip null
- [x] Update compute.rst

### Are these changes tested?
- [x] Compute Kernel Test (Scalar Kernels, all supported datatypes)
- [x] Hash Aggregate Test (Hash Kernels, all supported datatypes)
- [x] Segmented Aggregation Test (Both Scalar and Hash Kernels)

### Are there any user-facing changes?
Yes. Added First and Last aggregator.

Authored-by: Li Jin <ice.xelloss@gmail.com>
Signed-off-by: Weston Pace <weston.pace@gmail.com>
ArgusLi pushed a commit to Bit-Quill/arrow that referenced this issue May 15, 2023
### Rationale for this change
This PR adds "first" and "last" aggregator and support using those with Acero's segmented aggregation.

### What changes are included in this PR?
- [x] Numeric Scalar Aggregator (bool, int types, floating types)
- [x] Numeric Hash Aggregator (bool, int types, floating types)
- [x] Docstring
- [x] Non-Numeric Scalar Aggregator (string, binary, fixed binary, temporal)
- [x] Non-Numeric Hash Aggregator (string, binary, fixed binary, temporal)
- [x] Add `ordered` flag in aggregate kernels
- [x] Implement and test skip null
- [x] Update compute.rst

### Are these changes tested?
- [x] Compute Kernel Test (Scalar Kernels, all supported datatypes)
- [x] Hash Aggregate Test (Hash Kernels, all supported datatypes)
- [x] Segmented Aggregation Test (Both Scalar and Hash Kernels)

### Are there any user-facing changes?
Yes. Added First and Last aggregator.

Authored-by: Li Jin <ice.xelloss@gmail.com>
Signed-off-by: Weston Pace <weston.pace@gmail.com>
rtpsw pushed a commit to rtpsw/arrow that referenced this issue May 16, 2023
### Rationale for this change
This PR adds "first" and "last" aggregator and support using those with Acero's segmented aggregation.

### What changes are included in this PR?
- [x] Numeric Scalar Aggregator (bool, int types, floating types)
- [x] Numeric Hash Aggregator (bool, int types, floating types)
- [x] Docstring
- [x] Non-Numeric Scalar Aggregator (string, binary, fixed binary, temporal)
- [x] Non-Numeric Hash Aggregator (string, binary, fixed binary, temporal)
- [x] Add `ordered` flag in aggregate kernels
- [x] Implement and test skip null
- [x] Update compute.rst

### Are these changes tested?
- [x] Compute Kernel Test (Scalar Kernels, all supported datatypes)
- [x] Hash Aggregate Test (Hash Kernels, all supported datatypes)
- [x] Segmented Aggregation Test (Both Scalar and Hash Kernels)

### Are there any user-facing changes?
Yes. Added First and Last aggregator.

Authored-by: Li Jin <ice.xelloss@gmail.com>
Signed-off-by: Weston Pace <weston.pace@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants