
Add array_to_string as alias of arrayStringConcat #105121

Merged
alexey-milovidov merged 3 commits into master from alias-array-to-string on May 18, 2026

Conversation

@alexey-milovidov
Member

@alexey-milovidov alexey-milovidov commented May 16, 2026

PostgreSQL's array_to_string(arr, sep) matches ClickHouse's existing arrayStringConcat(arr, sep) exactly. Adding a case-insensitive alias spares PostgreSQL-dialect workloads a rewrite.

Changelog category (leave one):

  • Improvement

Changelog entry (a user-readable short description of the changes that goes into CHANGELOG.md):

Added array_to_string as a case-insensitive alias of arrayStringConcat for PostgreSQL compatibility.

Documentation entry for user-facing changes

  • Documentation is written (mandatory for new features)

PostgreSQL's `array_to_string(arr, sep)` matches the existing
`arrayStringConcat(arr, sep)` exactly; expose the PostgreSQL name as a
case-insensitive alias so PostgreSQL-dialect queries do not need
rewriting.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
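
To illustrate what a case-insensitive alias means in practice, here is a minimal toy model in Python. It is a sketch, not ClickHouse's function factory: the `FunctionRegistry` class, its methods, and the stand-in implementation of `arrayStringConcat` are all hypothetical names invented for this example.

```python
class FunctionRegistry:
    """Toy registry (assumption, not ClickHouse code): canonical names are
    case-sensitive, aliases may opt into case-insensitive matching."""

    def __init__(self):
        self._functions = {}   # canonical name -> callable
        self._aliases = {}     # exact-case alias -> canonical name
        self._ci_aliases = {}  # lower-cased alias -> canonical name

    def register(self, name, fn):
        self._functions[name] = fn

    def register_alias(self, alias, target, case_insensitive=False):
        if target not in self._functions:
            raise KeyError(f"unknown function: {target}")
        if case_insensitive:
            self._ci_aliases[alias.lower()] = target
        else:
            self._aliases[alias] = target

    def resolve(self, name):
        # Exact matches win; case-insensitive aliases are tried last.
        if name in self._functions:
            return name
        if name in self._aliases:
            return self._aliases[name]
        lowered = name.lower()
        if lowered in self._ci_aliases:
            return self._ci_aliases[lowered]
        raise KeyError(f"unknown function: {name}")

    def call(self, name, *args):
        return self._functions[self.resolve(name)](*args)


registry = FunctionRegistry()
# Stand-in for the two-argument form only; real type handling is not modeled.
registry.register("arrayStringConcat", lambda arr, sep="": sep.join(str(x) for x in arr))
registry.register_alias("array_to_string", "arrayStringConcat", case_insensitive=True)
```

In this model `array_to_string`, `ARRAY_TO_STRING`, and `Array_To_String` all resolve to `arrayStringConcat`, while the canonical name itself stays case-sensitive.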
@clickhouse-gh
Contributor

clickhouse-gh Bot commented May 16, 2026

Workflow [PR], commit [e5cbedb]

AI Review

Summary

This PR adds array_to_string as a case-insensitive alias of arrayStringConcat and adds a focused stateless test for lowercase and uppercase invocation. The change is minimal, behavior-preserving, and I did not find unresolved correctness, safety, or performance concerns in the current diff.

Final Verdict

Status: ✅ Approve

@clickhouse-gh clickhouse-gh Bot added the pr-improvement Pull request with some product improvements label May 16, 2026
alexey-milovidov added a commit that referenced this pull request May 16, 2026
Recent compatibility PRs added case-insensitive aliases and parser sugar
that make several of the SQLStorm rewrites unnecessary:

  - `STDDEV`            -> `stddevPop`            (#105120)
  - `array_to_string`   -> `arrayStringConcat`    (#105121)
  - `REGEXP_SUBSTR`     -> `regexpExtract`        (#105122)
  - `CARDINALITY`       -> `length`               (#105123)
  - `unnest()` function -> `arrayJoin()`          (#105124)
  - `STRING_AGG`        -> `groupConcat`          (#105125)
  - `date_part(unit,e)` -> `EXTRACT(unit FROM e)` (#105127)
  - `expr OP ANY/ALL(array_literal)`              (#105129)

`ARRAY_AGG`, `TRANSLATE`, and `EXTRACT(EPOCH|DOW|... FROM ...)` were
already supported by ClickHouse before these PRs.

Removed the corresponding rewrite calls and helper functions
(`rewrite_string_agg`, `rewrite_array_agg`, `rewrite_date_part`,
`rewrite_stddev`, `rewrite_extract_epoch`, the EXTRACT(DOW) inline
rewrite, `rewrite_any_comparison`, and the trailing
`unnest(...) -> arrayJoin(...)` substitution). Also dropped the
unreferenced no-op helpers (`rewrite_extract_unit`, `rewrite_fetch_offset`,
`rewrite_interval`, `rewrite_cast_timestamp`, `rewrite_current_timestamp`,
`rewrite_bool_literals`, `rewrite_ilike`, `rewrite_no_supertype`).

The PostgreSQL `LATERAL` / `CROSS JOIN UNNEST(...)` table-source forms,
`arrayJoin(...)` in JOIN position, PG-specific casts, `AT TIME ZONE`,
`STRING_AGGDistinct` (a mangled-name artifact), and the still-unsupported
function rewrites (`string_to_array`, `regexp_split_to_array`, `RANDOM`,
`TO_TIMESTAMP`, `ARRAY_LENGTH`, `SPLIT_PART`, `age`) are still rewritten.

Net change: -329 lines from rewrite_queries.py and -75 lines from the
tests (the `TestRewriteAnyComparison` class is removed since the rewrite
it covered no longer exists).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
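
For context, the kind of function-renaming rewrite that such aliases make unnecessary can be sketched as follows. This is a hypothetical illustration, not the code removed from `rewrite_queries.py`: `RENAMES` and `rename_functions` are invented names, and the mapping simply reuses a few pairs from the list above.

```python
import re

# Assumption: a few rename pairs copied from the alias list in the commit message.
RENAMES = {
    "STDDEV": "stddevPop",
    "array_to_string": "arrayStringConcat",
    "REGEXP_SUBSTR": "regexpExtract",
    "CARDINALITY": "length",
    "STRING_AGG": "groupConcat",
}


def rename_functions(sql: str, renames: dict) -> str:
    """Rename function calls in one pass, matching names case-insensitively.

    Deliberately naive: it does not skip string literals or quoted identifiers.
    """
    lookup = {old.lower(): new for old, new in renames.items()}
    names = "|".join(re.escape(name) for name in renames)
    # \b keeps identifiers such as my_stddev( from matching.
    pattern = re.compile(r"\b(" + names + r")\s*\(", re.IGNORECASE)
    return pattern.sub(lambda m: lookup[m.group(1).lower()] + "(", sql)


rewritten = rename_functions(
    "SELECT stddev(x), Array_To_String(tags, ',') FROM t", RENAMES
)
```

Once the server accepts the PostgreSQL names directly, entries like these can be dropped from the mapping, which is the effect the commit describes.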
@alexey-milovidov
Member Author

@groeneai, the following CI failures look unrelated to this trivial alias-only PR (it just registers array_to_string as a case-insensitive alias of arrayStringConcat). Please investigate and provide fixes in separate PRs. If a fix is already in progress, link it here.

  1. Unit tests (asan_ubsan, function_prop_fuzzer): FunctionsStress.stress / AllTests: timeout_not_honored on SELECT arrayResize(...). Tracked in FunctionsStress.stress (#104877). Failure history: ~15–100/day on master and many PRs.
    Report: https://s3.amazonaws.com/clickhouse-test-reports/json.html?PR=105121&sha=e751e896f794f50455eabbaf3359e72aeb681da3&name_0=PR&name_1=Unit%20tests%20%28asan_ubsan%2C%20function_prop_fuzzer%29

  2. Stress test (arm_debug): Hung check failed, possible deadlock found (pipeline executor stuck). >100 failures/day on master recently.
    Report: https://s3.amazonaws.com/clickhouse-test-reports/json.html?PR=105121&sha=e751e896f794f50455eabbaf3359e72aeb681da3&name_0=PR&name_1=Stress%20test%20%28arm_debug%29

  3. AST fuzzer (amd_debug): Logical error: AggregateFunctionArray: parameters mismatch ... (STID: 4870-4f21). Tracked in #104459 ("Logical error: AggregateFunctionArray: parameters mismatch between Array wrapper 'A' and nested function 'B'. Wrapper has C parameter(s): [D], nested function has E parameter(s): [F] (STID: 4870-4f07)"). Daily on master and many PRs.
    Report: https://s3.amazonaws.com/clickhouse-test-reports/json.html?PR=105121&sha=e751e896f794f50455eabbaf3359e72aeb681da3&name_0=PR&name_1=AST%20fuzzer%20%28amd_debug%29

  4. libFuzzer tests: create_parser_fuzzer out-of-memory (used 6266Mb; limit 6144Mb). Failing on ~10+ PRs/day; not specific to this PR.
    Report: https://s3.amazonaws.com/clickhouse-test-reports/json.html?PR=105121&sha=e751e896f794f50455eabbaf3359e72aeb681da3&name_0=PR&name_1=libFuzzer%20tests

@groeneai
Contributor

@alexey-milovidov — confirmed, all four are pre-existing chronic CI failures unrelated to the trivial alias-only change in this PR. Cross-links below.

1. Unit tests (asan_ubsan, function_prop_fuzzer): FunctionsStress.stress / AllTests timeout_not_honored on arrayResize

  • Umbrella issue: #104877 (FunctionsStress non-determinism family, opened May 12)
  • Sibling roll-up: #105040 (AllTests chronic flaky on function_prop_fuzzer, started ~May 4, 215 unique PRs hit)
  • Pattern: gtest_function_prop_fuzzer time-budget exhaustion in non-deterministic stress mode; arrayResize on LowCardinality(Array) is one of several variants. ~67–106 master hits/day across the family.
  • No in-flight fix PR for the arrayResize variant specifically. The closest investigative work is the writeSlice GenericArraySlice/Sink mismatch family (STID 3276-7a1a) which touches the same arrayResize.cpp path but is a different failure mode (LOGICAL_ERROR, not timeout_not_honored).
  • Status: tracked, awaiting bandwidth.

2. Stress test (arm_debug): Hung check failed, possible deadlock found

3. AST fuzzer (amd_debug): Logical error: AggregateFunctionArray: parameters mismatch ... (STID: 4870-4f21)

4. libFuzzer tests: create_parser_fuzzer OOM (used 6266Mb; limit 6144Mb)


None of the four block this trivial array_to_string alias change. Happy to bump create_parser_fuzzer.options in a separate PR — say the word.

@groeneai (session cron:clickhouse-ci-task-worker:20260517-050000)

@alexey-milovidov
Member Author

This was fixed by #105146. Let's update the branch.

@clickhouse-gh
Contributor

clickhouse-gh Bot commented May 17, 2026

LLVM Coverage Report

| Metric | Baseline | Current | Δ |
|---|---|---|---|
| Lines | 84.20% | 84.20% | +0.00% |
| Functions | 91.40% | 91.40% | +0.00% |
| Branches | 76.60% | 76.60% | +0.00% |

Changed lines: 100.00% (3/3)

@groeneai
Contributor

Confirmed — branch is now at e5cbedb5 (merge of master into alias-array-to-string, 2026-05-17T18:50:48Z), which includes the #105146 merge commit 693575cae5. CI on the rebased commit is green so far: 144 success / 17 skipped / 1 still in_progress (Upgrade check, amd_release) and 0 failures.

Mapping #105146's scope onto the 4 items from the prior comment:

  1. FunctionsStress.stress timeout_not_honored on arrayResize: ✅ covered. #105146 ("Stop the bleeding in function_prop_fuzzer") reverts #104694 ("Wire functions stress test to ProcessList for KILL-QUERY-style timeouts"), which was the source of the regression. The arrayResize cancellation gap is then re-addressed by the new PR #105147 ("Make arrayResize honor max_execution_time on heavy element types (#104877)", open), which refactors the gather-utils inner loops with throwIfQueryCancelled(). Until that lands, the timeout class is back to baseline.
  2. Stress test (arm_debug) Hung check: ⛔ orthogonal. This is the chronic GlobalThreadPool shutdown / SYSTEM SYNC MERGES race (chronic task 2026-04-01-ci-hung-check-deadlock-chronic-trunk-is). PR #104944 ("Add cancellation check for system sync merges", Michicosun, merged 2026-05-15) addresses one infinite-loop root cause; if the failure reappears on this PR's CI, it will be incidental trunk noise, not anything array_to_string regresses.
  3. AST fuzzer (amd_debug) STID 4870-4f21 AggregateFunctionArray: parameters mismatch: ⛔ orthogonal. Tracked in issue #104459 ("Logical error: AggregateFunctionArray: parameters mismatch between Array wrapper 'A' and nested function 'B'. Wrapper has C parameter(s): [D], nested function has E parameter(s): [F] (STID: 4870-4f07)"). The canonical fix PR is #104812 ("Preserve parameter types in timeseries aggregate functions, add type annotations", @vitlibar, OPEN, MERGEABLE/BLOCKED, awaiting review). The failure will keep recurring on master and many PRs until #104812 merges.
  4. libFuzzer create_parser_fuzzer OOM (6266 Mb / 6144 limit): ⛔ orthogonal. Sibling of #105134 ("CI: bump rss_limit_mb to 4096 for clickhouse_fuzzer to stop chronic OOMs at 2048 Mb default", merged 2026-05-16), which bumped clickhouse_fuzzer.options rss_limit_mb 2048→4096. Bumping create_parser_fuzzer.options rss_limit_mb 6144→8192 would give equivalent ~33% headroom. Happy to file that bump PR if you want; I held off pending your call last time.

So after the rebase, item 1 and the broken_determinism class on #97540 are addressed. If items 2/3/4 reappear in the post-rebase CI, they will be trunk-wide noise unrelated to this PR. I'll keep monitoring and ping again if anything new shows up.
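
The rss_limit_mb figures quoted in item 4 can be checked with a few lines of arithmetic. This is only a worked example using the numbers from this thread; the helper names `pct_increase` and `headroom_pct` are made up for illustration.

```python
def pct_increase(old_limit_mb: float, new_limit_mb: float) -> float:
    """Relative growth of the limit itself, in percent."""
    return (new_limit_mb - old_limit_mb) / old_limit_mb * 100


def headroom_pct(limit_mb: float, observed_mb: float) -> float:
    """Spare capacity above the observed peak, in percent of that peak.
    A negative value means the limit is below the observed usage."""
    return (limit_mb - observed_mb) / observed_mb * 100


OBSERVED_MB = 6266   # peak RSS reported for create_parser_fuzzer in this thread
CURRENT_MB = 6144    # current limit quoted in this thread
PROPOSED_MB = 8192   # limit proposed in item 4

limit_growth = pct_increase(CURRENT_MB, PROPOSED_MB)        # growth of the limit
current_headroom = headroom_pct(CURRENT_MB, OBSERVED_MB)    # negative: already exceeded
proposed_headroom = headroom_pct(PROPOSED_MB, OBSERVED_MB)  # room above the observed peak
```

Measured against the observed 6266 Mb peak rather than the old limit, the proposed value leaves somewhat less room than the raw increase suggests, which may matter if inputs keep growing.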

tuanpach pushed a commit to tuanpach/ClickHouse that referenced this pull request May 17, 2026
The KQL parser's `IParserKQLFunction::getExpression` converts `arr[X]` to
`arr[ X >=0 ? X + 1 : X]`, duplicating the inner expression three times.
With N levels of nested array indexing (e.g. `arr[arr[arr[...]]]`) the
generated SQL grows as `3^N` because the inner expression is itself the
output of the previous level. The `create_parser_fuzzer` hit this with
~30 unbalanced `[` characters and OOMed at 6+ GB of RSS.

Bind the inner expression to a SQL alias once and reuse it, so the
output grows linearly in the input depth instead of exponentially. This
mirrors how `formatTimespanSQL` avoids the same problem.
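
The growth difference can be sketched in a few lines of Python. This is an illustrative model of the string sizes only, not the KQL parser code or the actual fix: the functions `naive_index`, `aliased_index`, and `nest`, the `_k<level>` alias names, and the exact SQL text emitted are assumptions.

```python
def naive_index(array: str, index_expr: str, level: int) -> str:
    # The inner expression is pasted three times, as in the original conversion.
    # `level` is unused here; it only keeps the two signatures uniform.
    return f"{array}[ {index_expr} >=0 ? {index_expr} + 1 : {index_expr}]"


def aliased_index(array: str, index_expr: str, level: int) -> str:
    # Bind the inner expression to an alias once, then reuse the alias.
    alias = f"_k{level}"  # hypothetical alias naming scheme
    return f"{array}[ (({index_expr}) AS {alias}) >=0 ? {alias} + 1 : {alias}]"


def nest(rewrite, depth: int) -> str:
    """Apply the rewrite `depth` times, feeding each output into the next level."""
    expr = "leaf"
    for level in range(depth):
        expr = rewrite("arr", expr, level)
    return expr


naive_5 = nest(naive_index, 5)
aliased_5 = nest(aliased_index, 5)
```

In this model the innermost token appears `3**N` times in the naive output and exactly once in the aliased output, so the aliased string grows by a roughly constant amount per nesting level.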

CI report: https://s3.amazonaws.com/clickhouse-test-reports/json.html?PR=105121&sha=e751e896f794f50455eabbaf3359e72aeb681da3&name_0=PR
PR: ClickHouse#105121

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Contributor

@m-selmi m-selmi left a comment

Probably obvious, but it seems that array_to_string in Postgres can also take a third argument, a replacement text for NULL elements; calls that use it would still require a rewrite.

@alexey-milovidov
Copy link
Copy Markdown
Member Author

Yes, that's true, but in this change, we don't aim for 100% compatibility. At least this function is non-standard.
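
For readers unfamiliar with the PostgreSQL form discussed here, a small Python model of its documented behavior may help: NULL elements are omitted unless a null-replacement string is supplied. The function name `pg_array_to_string` is hypothetical, `str()` only approximates PostgreSQL's text conversion, and the sketch makes no claim about how ClickHouse treats NULL elements.

```python
from typing import Optional, Sequence


def pg_array_to_string(
    arr: Sequence[object], sep: str, null_string: Optional[str] = None
) -> str:
    """Model of PostgreSQL array_to_string(array, delimiter [, null_string]).

    Assumptions: Python None stands in for SQL NULL, and a NULL delimiter
    (which yields NULL in PostgreSQL) is not modeled.
    """
    parts = []
    for item in arr:
        if item is None:
            if null_string is None:
                continue  # two-argument form: NULL elements are skipped
            parts.append(null_string)
        else:
            parts.append(str(item))
    return sep.join(parts)


two_arg = pg_array_to_string([1, 2, 3, None, 5], ",")
three_arg = pg_array_to_string([1, 2, 3, None, 5], ",", "*")
```

Per the exchange above, only the two-argument call is covered by the alias; queries that pass the third argument would still need a rewrite.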

@alexey-milovidov alexey-milovidov added this pull request to the merge queue May 18, 2026
Merged via the queue into master with commit 4d775de May 18, 2026
167 checks passed
@alexey-milovidov alexey-milovidov deleted the alias-array-to-string branch May 18, 2026 17:44
@robot-ch-test-poll2 robot-ch-test-poll2 added the pr-synced-to-cloud The PR is synced to the cloud repo label May 18, 2026

Labels

pr-improvement: Pull request with some product improvements
pr-synced-to-cloud: The PR is synced to the cloud repo

4 participants