New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Added function arrayFold()
#49794
Added function arrayFold()
#49794
Conversation
This is an automated comment for commit 878e36d with description of existing statuses. It's updated for the latest CI running ❌ Click here to open a full report in a separate page Successful checks
|
This is an automatic comment. The PR descriptions does not match the template. Please, edit it accordingly. The error is: More than one changelog category specified: 'New Feature', 'Implementaеion of function arrayFold(x1,...,xn,accum -> expression, array1,...,arrayn, init_accum) applying lambda function to a number of same sized array columns and collecting result in accumulator. Accumulator can be either constant or column.' |
This is an automatic comment. The PR descriptions does not match the template. Please, edit it accordingly. The error is: More than one changelog category specified: 'New Feature', '### Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):' |
2 similar comments
This is an automatic comment. The PR descriptions does not match the template. Please, edit it accordingly. The error is: More than one changelog category specified: 'New Feature', '### Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):' |
This is an automatic comment. The PR descriptions does not match the template. Please, edit it accordingly. The error is: More than one changelog category specified: 'New Feature', '### Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks a lot! Would you add some docs to docs/en/sql-reference/functions/array-functions.md?
And just out of interest: did you have a look at the previous attempts to add arrayFold? #34155. Asking because there were some performance issues and I am curious how this PR addresses them?
Yes, I had a look at previous solution, and implemented function in vectorized fashion as was suggested by alexey-milovidov. In a week I am planning to add docs and some tests comparing this functions against specified functions such as arraySum |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will check the code probably tomorrow. It would be nice if you could add some comments to the code that document the design decisions (so novice readers like me will have an easier time 😄 )
eah, alexey-milovidov had the same comment about the lack of comments, surely will do it too |
|
||
DataTypePtr getReturnTypeImpl(const ColumnsWithTypeAndName & arguments) const override | ||
{ | ||
if (arguments.size() < 2) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Don't we need at least three arguments?
size_t getNumberOfArguments() const override { return 0; } | ||
bool isSuitableForShortCircuitArgumentsExecution(const DataTypesWithConstInfo & /*arguments*/) const override { return true; } | ||
|
||
void getLambdaArgumentTypes(DataTypes & arguments) const override |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Function getLambdaArgumentTypes
isn't implemented often in the codebase. Suggest to add some comments what the elements arguments
represent. I think they are simply the function arguments (lambda function, array arguments, initial accumulator) but a naive reader could think they are the lambda's arguments.
src/Functions/array/arrayFold.cpp
Outdated
{ | ||
const DataTypeArray * array_type = checkAndGetDataType<DataTypeArray>(&*arguments[i + 1]); | ||
if (!array_type) | ||
throw Exception(ErrorCodes::ILLEGAL_TYPE_OF_ARGUMENT, "Argument {} of function {} must be array. Found {} instead.", toString(i + 2), getName(), arguments[i + 1]->getName()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(toString()
is superfluous)
throw Exception(ErrorCodes::ILLEGAL_TYPE_OF_ARGUMENT, "Argument {} of function {} must be array. Found {} instead.", toString(i + 2), getName(), arguments[i + 1]->getName()); | ||
nested_types[i] = recursiveRemoveLowCardinality(array_type->getNestedType()); | ||
} | ||
nested_types[nested_types.size() - 1] = arguments[arguments.size() - 1]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nested_types.back() = arguments.back()
?
} | ||
} | ||
|
||
std::vector<MutableColumns> data_arrays; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As written elsewhere, below code is a bit hard to grasp. Let's add comments.
Fixes #34155
Implemention of function
arrayFold(x1 ,..., xn, accum -> expression, array1,...,arrayn, init_accum)
, applying a lambda function to a number of array columns, collecting the result in an accumulator. The accumulator can be either constant or a column.The implementation gets the columns of the N-th elements for every array and applies them sequentially.
Examples:
Changelog category :
Changelog entry :
Add function "arrayFold(x1, ..., xn, accum -> expression, array1, ..., arrayn, init_accum)" which applies a lambda function to multiple arrays of the same cardinality and collects the result in an accumulator.