Introduce arrayRemove and Postgres compatibility alias array_remove function#89585
Introduce arrayRemove and Postgres compatibility alias array_remove function#89585Avogar merged 22 commits intoClickHouse:masterfrom
arrayRemove and Postgres compatibility alias array_remove function#89585Conversation
|
Workflow [PR], commit [2f240a5] Summary: ❌
|
aaa8a15 to
bb8e2e3
Compare
There was a problem hiding this comment.
Current implementation is a bit overcomplicated. We can make it simplier:
- Create a filter for array elements, where 1 will mean that the element should stay, 0 that the element should be removed;
- Apply
IColumn::filterfor array elements with the calculated filter. - Create new offsets based on the filter.
Filter can be created in different ways depending on the second argument type:
- If it's a constant
NULLwe can take null mask from the array elements and inverse it. - If it's not
Nullableand array elements are notNullable, we can just call functionnotEqualson array elements and replicated second argument based on offsets (IColumn::replicate, so it has the same number of rows as array elements, for constant columns this method doesn't copy any data actually). - If it's
Nullableor array elements areNullable, we need to take into account NULLs and can execute something likeNOT if(isNull(array_elements) OR isNull(element_to_remove), isNull(array_elements) AND isNull(element_to_remove), equals(array_elements, element_to_remove)to create a filter.
This will be the best way to implement it, because:
- It will support non-constant second argument
- Creating filters using other functions call will help us to deligate all the work to existing functions that are already optimized for every data type and can deal with arguments of different data types, so we don't need to worry about it at all.
- Calling
IColumn::filteris optimized for every data type, it's much more efficient than copying the data from one column to another row by row.
Functions can be created using FunctionFactiry global instance.
|
@Avogar Thanks for the detailed review comments. I've updated the implementation to your suggested approach. |
|
Test failures are in |
Co-authored-by: Pavel Kruglov <48961922+Avogar@users.noreply.github.com>
Co-authored-by: Pavel Kruglov <48961922+Avogar@users.noreply.github.com>
Co-authored-by: Pavel Kruglov <48961922+Avogar@users.noreply.github.com>
Co-authored-by: Pavel Kruglov <48961922+Avogar@users.noreply.github.com>
Co-authored-by: Pavel Kruglov <48961922+Avogar@users.noreply.github.com>
Co-authored-by: Pavel Kruglov <48961922+Avogar@users.noreply.github.com>
Co-authored-by: Pavel Kruglov <48961922+Avogar@users.noreply.github.com>
|
@Avogar The PR is ready for review again. Thanks for all the helpful comments. I took a look at the test failures and they all seem unrelated to this PR. For example, in the amd_debug test, I can see the following failure: |
Avogar
left a comment
There was a problem hiding this comment.
Looks great! Thanks! Last 2 small comments and will be ready for merge
Co-authored-by: Pavel Kruglov <48961922+Avogar@users.noreply.github.com>
|
Stress test (amd_ubsan) - #84669 |
59d7d62
Changelog category (leave one):
Changelog entry (a user-readable short description of the changes that goes into CHANGELOG.md):
Support for
arrayRemove(arr, elem)to remove all elements equal toelemfrom the arrayarr. Resolves #52099Documentation entry for user-facing changes
Motivation: It is useful to be able to remove all elements equal to a certain value from an array. The function
arrayRemovesupports that functionality.Syntax:
arrayRemove(arr, elem)Parameters:
arr: Array(T)elem: TExamples:
SELECT arrayRemove([1, 2, 2, 3], 2)->[1, 3]SELECT arrayRemove(['a', NULL, 'b', NULL], NULL)->['a', 'b']Details
This PR introduces the
arrayRemovefunction and its PostgreSQL compatibility aliasarray_remove. The function removes all elements equal to a specified value from an array.Tests:
00700_decimal_array_functions03707_function_array_removeFixes #52099