
Introduce arrayRemove and Postgres compatibility alias array_remove function#89585

Merged
Avogar merged 22 commits into ClickHouse:master from tiwarysaurav:arrayRemove
Nov 14, 2025

Conversation

@tiwarysaurav
Contributor

@tiwarysaurav tiwarysaurav commented Nov 5, 2025

Changelog category (leave one):

  • New Feature

Changelog entry (a user-readable short description of the changes that goes into CHANGELOG.md):

Support for arrayRemove(arr, elem) to remove all elements equal to elem from the array arr. Resolves #52099

Documentation entry for user-facing changes

  • Documentation is written (mandatory for new features)

Motivation: It is useful to be able to remove all elements equal to a certain value from an array. The function arrayRemove supports that functionality.

Syntax: arrayRemove(arr, elem)
Parameters:

  1. arr: Array(T)
  2. elem: T

Examples:
SELECT arrayRemove([1, 2, 2, 3], 2) -> [1, 3]
SELECT arrayRemove(['a', NULL, 'b', NULL], NULL) -> ['a', 'b']
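As a rough illustration of the documented semantics (not the actual C++ implementation), the behavior above can be modeled in Python, with None standing in for SQL NULL:

```python
def array_remove(arr, elem):
    # Keep only elements that differ from elem. None models SQL NULL;
    # here NULL is considered equal to NULL, so NULLs can be removed too.
    return [x for x in arr if x != elem]

print(array_remove([1, 2, 2, 3], 2))               # [1, 3]
print(array_remove(['a', None, 'b', None], None))  # ['a', 'b']
```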

Details

This PR introduces the arrayRemove function and its PostgreSQL compatibility alias array_remove. The function removes all elements equal to a specified value from an array.

Tests:

  • 00700_decimal_array_functions
  • 03707_function_array_remove

Fixes #52099

@CLAassistant

CLAassistant commented Nov 5, 2025

CLA assistant check
All committers have signed the CLA.

@Avogar Avogar self-assigned this Nov 5, 2025
@Avogar Avogar added the can be tested Allows running workflows for external contributors label Nov 5, 2025
@clickhouse-gh
Contributor

clickhouse-gh bot commented Nov 5, 2025

Workflow [PR], commit [2f240a5]

Summary:

Stress test (amd_ubsan): failure
  • Server died: FAIL (cidb)
  • Hung check failed, possible deadlock found (see hung_check.log): FAIL (cidb)
  • Killed by signal (in clickhouse-server.log): FAIL (cidb)
  • Fatal message in clickhouse-server.log (see fatal_messages.txt): FAIL (cidb)
  • Killed by signal (output files): FAIL (cidb)
  • Found signal in gdb.log: FAIL (cidb)
AST fuzzer (amd_ubsan): failure
  • Logical error: 'Unexpected node type for table expression. Expected table, table function, query, union, join or array join. Actual IDENTIFIER'. FAIL (cidb)
BuzzHouse (amd_debug): failure
  • Buzzing result: failure (cidb)
Performance Comparison (amd_release, master_head, 5/6): failure
  • Insert historical data: failure

@clickhouse-gh clickhouse-gh bot added the pr-feature Pull request with new product feature label Nov 5, 2025
Member

@Avogar Avogar left a comment

The current implementation is a bit overcomplicated. We can make it simpler:

  1. Create a filter for array elements, where 1 will mean that the element should stay, 0 that the element should be removed;
  2. Apply IColumn::filter for array elements with the calculated filter.
  3. Create new offsets based on the filter.
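The three steps above can be sketched on a simplified columnar model, where an array column is stored as a flat data vector plus cumulative end offsets. The names below are illustrative Python stand-ins, not ClickHouse APIs:

```python
def remove_from_arrays(data, offsets, elem):
    # data: flattened elements of all arrays; offsets: cumulative end positions.
    # Step 1: build the filter (1 = element stays, 0 = element is removed).
    keep = [1 if x != elem else 0 for x in data]
    # Step 2: apply the filter to the flat elements (stand-in for IColumn::filter).
    new_data = [x for x, k in zip(data, keep) if k]
    # Step 3: recompute offsets by counting kept elements per array.
    new_offsets, prev, total = [], 0, 0
    for off in offsets:
        total += sum(keep[prev:off])
        new_offsets.append(total)
        prev = off
    return new_data, new_offsets

# Two rows, [1, 2, 2, 3] and [2, 4], with elem = 2:
print(remove_from_arrays([1, 2, 2, 3, 2, 4], [4, 6], 2))  # ([1, 3, 4], [2, 3])
```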

Filter can be created in different ways depending on the second argument type:

  • If it's a constant NULL we can take null mask from the array elements and inverse it.
  • If it's not Nullable and the array elements are not Nullable, we can just call the function notEquals on the array elements and the second argument replicated based on the offsets (IColumn::replicate), so that it has the same number of rows as the array elements; for constant columns this method doesn't actually copy any data.
  • If it's Nullable or the array elements are Nullable, we need to take NULLs into account and can execute something like NOT if(isNull(array_elements) OR isNull(element_to_remove), isNull(array_elements) AND isNull(element_to_remove), equals(array_elements, element_to_remove)) to create a filter.
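The NULL-aware branch can be illustrated with a small Python stand-in (None models SQL NULL; the function name is hypothetical) that mirrors the conditional expression above:

```python
def keep_mask(elements, elem):
    # Build the filter: 1 = keep, 0 = remove.
    mask = []
    for x in elements:
        if x is None or elem is None:
            # If any side is NULL, they are equal only when both are NULL.
            equal = (x is None) and (elem is None)
        else:
            equal = (x == elem)
        mask.append(0 if equal else 1)
    return mask

print(keep_mask(['a', None, 'b', None], None))  # [1, 0, 1, 0]
```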

This will be the best way to implement it, because:

  1. It will support a non-constant second argument.
  2. Creating the filter via calls to other functions delegates all the work to existing functions that are already optimized for every data type and can handle arguments of different data types, so we don't need to worry about that at all.
  3. Calling IColumn::filter is optimized for every data type; it's much more efficient than copying the data from one column to another row by row.

Functions can be created using the FunctionFactory global instance.

@tiwarysaurav tiwarysaurav requested a review from Avogar November 11, 2025 08:17
@tiwarysaurav
Contributor Author

@Avogar Thanks for the detailed review comments. I've updated the implementation to your suggested approach.

@tiwarysaurav
Contributor Author

Test failures are in test_ytsaurus/test_dictionaries.py, which seems unrelated and is likely a flaky or pre-existing issue.

tiwarysaurav and others added 9 commits November 11, 2025 20:25
Co-authored-by: Pavel Kruglov <48961922+Avogar@users.noreply.github.com>
@tiwarysaurav tiwarysaurav requested a review from Avogar November 11, 2025 16:45
@tiwarysaurav
Contributor Author

@Avogar The PR is ready for review again. Thanks for all the helpful comments.

I took a look at the test failures and they all seem unrelated to this PR. For example, in the amd_debug test, I can see the following failure:

(version 25.11.1.2486)
(query: SELECT ALL `t0d0`.`c0`, `t0d0`.`c0`, `t0d0`.`c2.keys` FROM `t10` AS t0d0 WHERE `t0d0`.`c0` = 0.125 OR equals(`t0d0`.`c3`, -4093048032609258357::Int) OR endsWith(`t0d0`.`c2`.`c1`.`😆`, 'leave') GROUP BY toDecimal128('254602871.36919224815240229', 17), `t0d0`.`c0`, `t0d0`.`c2.keys`, `t0d0`.`c0.c1` WITH ROLLUP WINDOW w0 AS (RANGE BETWEEN ('1900-01-01 00:00:00'::DateTime64) FOLLOWING AND CURRENT ROW), w1 AS (RANGE BETWEEN -5434965264930459606 FOLLOWING AND -5246358408012005262 FOLLOWING) ORDER BY ALL LIMIT (-toDecimal128('254602871.36919224815240229', 17)), 0 WITH TIES INTO OUTFILE '/var/lib/clickhouse/user_files/file.data' TRUNCATE FORMAT Null;)
Received exception from server (version 25.11.1):
Code: 47. DB::Exception: Received from localhost:9000. DB::Exception: Identifier 't0d0.c0.c1' cannot be resolved from table with name t0d0. In scope SELECT CAST('{"😆":[{"c1":[],"c0":"名字"},"2069-10-14 08:21:20.58313875",[]],"😆":-826643605,"c1":null,"c0":"915:33:51","😉😉":"help","c0":"日本"}', 'JSON') != t0d0.c3, a1, [1] FROM d1.t35 AS t0d0 FULL OUTER JOIN d0.t13 AS t1d0 ON toUnixTimestamp64Nano(t0d0.`c0.c1`) = t1d0.c2.`Dynamic(max_types=8)` PASTE JOIN d1.t21 AS t2d0 GLOBAL PASTE JOIN d1.t21 AS t3d0 WINDOW
    w0 AS (PARTITION BY ngramSearchCaseInsensitiveUTF8(CAST('4012978702958360433', 'Int16'), t1d0.c3) AS a0, t2d0.`c0.c1`),
    w1 AS (PARTITION BY 1125407332334229629, sumKahanDistinct(t2d0.c0), CAST(CAST('[(6934965930309277685,1540133112), (-167807725,190441940), (1440619609,0.34417241607352030793954571168648206944643024787245), (-63338.033553552123102558414878893255738213479,-40438407991008596874228746296054.77108071252762853098113647237244), (5715074626936495255,-3812622998043258849), (1957482384,-1089155839)]', 'Geometry'), 'UInt256') ORDER BY t1d0.`c0.c1`[c1] ASC NULLS LAST, t2d0.c3 ASC, t1d0.c2 DESC NULLS FIRST ROWS BETWEEN CURRENT ROW AND sipHash64Keyed(CAST('23044531579085759433695536754696325000', 'Int256')) PRECEDING),
    w2 AS (ORDER BY t0d0.`c0.c1` DESC NULLS FIRST, arrayReverseSplit((p0, p1, p2) -> p0, t3d0.c0.`😉😉`.:`Array(JSON)` AS a1) ASC NULLS FIRST). Maybe you meant: ['t0d0.c0']. Stack trace:

0. /home/ubuntu/actions-runner/_work/ClickHouse/ClickHouse/contrib/llvm-project/libcxx/include/__exception/exception.h:113: Poco::Exception::Exception(String const&, int) @ 0x00000000253168b2
1. /home/ubuntu/actions-runner/_work/ClickHouse/ClickHouse/src/Common/Exception.cpp:129: DB::Exception::Exception(DB::Exception::MessageMasked&&, int, bool) @ 0x000000001522c629
2. /home/ubuntu/actions-runner/_work/ClickHouse/ClickHouse/src/Common/Exception.h:123: DB::Exception::Exception(String&&, int, String, bool) @ 0x000000000d57ea0e
3. /home/ubuntu/actions-runner/_work/ClickHouse/ClickHouse/src/Common/Exception.h:58: DB::Exception::Exception(PreformattedMessage&&, int) @ 0x000000000d57e551
...

Member

@Avogar Avogar left a comment


Looks great! Thanks! Two last small comments and it will be ready for merge.

tiwarysaurav and others added 2 commits November 13, 2025 18:28
Co-authored-by: Pavel Kruglov <48961922+Avogar@users.noreply.github.com>
@tiwarysaurav tiwarysaurav requested a review from Avogar November 13, 2025 13:04
@Avogar
Member

Avogar commented Nov 14, 2025

@Avogar Avogar added this pull request to the merge queue Nov 14, 2025
Merged via the queue into ClickHouse:master with commit 59d7d62 Nov 14, 2025
125 of 130 checks passed
@tiwarysaurav tiwarysaurav deleted the arrayRemove branch November 14, 2025 17:38
@robot-ch-test-poll4 robot-ch-test-poll4 added the pr-synced-to-cloud The PR is synced to the cloud repo label Nov 14, 2025

Labels

can be tested Allows running workflows for external contributors pr-feature Pull request with new product feature pr-synced-to-cloud The PR is synced to the cloud repo

Development

Successfully merging this pull request may close these issues.

Function arrayRemove and a compatibility alias array_remove for PostgreSQL

5 participants