Add array_to_string as alias of arrayStringConcat#105121
Conversation
PostgreSQL's `array_to_string(arr, sep)` matches the existing `arrayStringConcat(arr, sep)` exactly; expose the PostgreSQL name as a case-insensitive alias so PostgreSQL-dialect queries do not need rewriting. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
Workflow [PR], commit [e5cbedb] Summary: ✅ AI ReviewSummaryThis PR adds Final VerdictStatus: ✅ Approve |
Recent compatibility PRs added case-insensitive aliases and parser sugar that make several of the SQLStorm rewrites unnecessary: - `STDDEV` -> `stddevPop` (#105120) - `array_to_string` -> `arrayStringConcat` (#105121) - `REGEXP_SUBSTR` -> `regexpExtract` (#105122) - `CARDINALITY` -> `length` (#105123) - `unnest()` function -> `arrayJoin()` (#105124) - `STRING_AGG` -> `groupConcat` (#105125) - `date_part(unit,e)` -> `EXTRACT(unit FROM e)` (#105127) - `expr OP ANY/ALL(array_literal)` (#105129) `ARRAY_AGG`, `TRANSLATE`, and `EXTRACT(EPOCH|DOW|... FROM ...)` were already supported by ClickHouse before these PRs. Removed the corresponding rewrite calls and helper functions (`rewrite_string_agg`, `rewrite_array_agg`, `rewrite_date_part`, `rewrite_stddev`, `rewrite_extract_epoch`, the EXTRACT(DOW) inline rewrite, `rewrite_any_comparison`, and the trailing `unnest(...) -> arrayJoin(...)` substitution). Also dropped the unreferenced no-op helpers (`rewrite_extract_unit`, `rewrite_fetch_offset`, `rewrite_interval`, `rewrite_cast_timestamp`, `rewrite_current_timestamp`, `rewrite_bool_literals`, `rewrite_ilike`, `rewrite_no_supertype`). The PostgreSQL `LATERAL` / `CROSS JOIN UNNEST(...)` table-source forms, `arrayJoin(...)` in JOIN position, PG-specific casts, `AT TIME ZONE`, `STRING_AGGDistinct` (a mangled-name artifact), and the still-unsupported function rewrites (`string_to_array`, `regexp_split_to_array`, `RANDOM`, `TO_TIMESTAMP`, `ARRAY_LENGTH`, `SPLIT_PART`, `age`) are still rewritten. Net change: -329 lines from rewrite_queries.py and -75 lines from the tests (the `TestRewriteAnyComparison` class is removed since the rewrite it covered no longer exists). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
@groeneai, the following CI failures look unrelated to this trivial alias-only PR (it just registers
|
|
@alexey-milovidov — confirmed, all four are pre-existing chronic CI failures unrelated to the trivial alias-only change in this PR. Cross-links below. 1.
|
|
This was fixed by #105146. Let's update the branch. |
LLVM Coverage Report
Changed lines: 100.00% (3/3) · Uncovered code |
|
Confirmed — branch is now at Mapping #105146's scope onto the 4 items from the prior comment:
So after the rebase, items 1 and (the broken_determinism class on #97540) are addressed. If items 2/3/4 reappear in the post-rebase CI, they will be trunk-wide noise unrelated to this PR. I'll keep monitoring and ping again if anything new shows up. |
The KQL parser's `IParserKQLFunction::getExpression` converts `arr[X]` to `arr[ X >=0 ? X + 1 : X]`, duplicating the inner expression three times. With N levels of nested array indexing (e.g. `arr[arr[arr[...]]]`) the generated SQL grows as `3^N` because the inner expression is itself the output of the previous level. The `create_parser_fuzzer` hit this with ~30 unbalanced `[` characters and OOMed at 6+ GB of RSS. Bind the inner expression to a SQL alias once and reuse it, so the output grows linearly in the input depth instead of exponentially. This mirrors how `formatTimespanSQL` avoids the same problem. CI report: https://s3.amazonaws.com/clickhouse-test-reports/json.html?PR=105121&sha=e751e896f794f50455eabbaf3359e72aeb681da3&name_0=PR PR: ClickHouse#105121 Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
m-selmi
left a comment
There was a problem hiding this comment.
Probably obvious, but seems like array_to_string in Postgres can take a null replacing text, these would still require a rewrite
|
Yes, that's true, but in this change, we don't aim for 100% compatibility. At least this function is non-standard. |
PostgreSQL's
array_to_string(arr, sep)matches ClickHouse's existingarrayStringConcat(arr, sep)exactly. Adding a case-insensitive alias spares PostgreSQL-dialect workloads a rewrite.Changelog category (leave one):
Changelog entry (a user-readable short description of the changes that goes into CHANGELOG.md):
Added
array_to_stringas a case-insensitive alias ofarrayStringConcatfor PostgreSQL compatibility.Documentation entry for user-facing changes