Use the tuple element names as column names in `untuple()` #55123

garcher22 · 2023-09-29T08:54:20Z

Function untuple() currently ignores the element names of named tuples and generates ugly result column names:

:) SELECT untuple(JSONExtract('{"key": "value"}', 'Tuple(key String)'));

┌─tupleElement(JSONExtract('{"key": "value"}', 'Tuple(key String)'), 1)─┐
│ value                                                                 │
└───────────────────────────────────────────────────────────────────────┘

Why would anyone want column names like this? And there's effectively no way to change these names when using untuple(). Let's respect the user's intention and use the explicitly specified tuple element names (new behavior):

:) SELECT untuple(JSONExtract('{"key": "value"}', 'Tuple(key String)'));

┌─key───┐
│ value │
└───────┘

Interestingly, when the untuple() function itself has an alias, the result column names are already usable as of now but with an index as the column name instead of the element alias. This behavior is retained by this PR for backward compatibility:

:) SELECT untuple(JSONExtract('{"key": "value"}', 'Tuple(key String)')) a;

┌─a.1───┐
│ value │
└───────┘

When the element names are not specified, also keep the old behavior (ugly name):

:) SELECT untuple(JSONExtract('{"key": "value"}', 'Tuple(String)'));

┌─tupleElement(JSONExtract('{"key": "value"}', 'Tuple(String)'), 1)─┐
│ value                                                             │
└───────────────────────────────────────────────────────────────────┘

Changelog category (leave one):

Improvement

Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):

When function "untuple()" is called on a tuple with named elements and itself has an alias (e.g. "select untuple(tuple(1)::Tuple(element_alias Int)) AS untuple_alias"), the result column name is now generated from the untuple alias and the tuple element alias (in the example: "untuple_alias.element_alias").
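
A minimal sketch of the naming rule stated in the changelog entry, written in Python purely for illustration. The helper `untuple_column_names` and its parameters are hypothetical and are not part of ClickHouse; the actual change lives in the server's C++ code, which is not shown in this thread. The sketch assumes three cases taken from the examples in this PR: no alias keeps the generated `tupleElement(...)` name, an alias with a named element gives `alias.element`, and an alias with an unnamed element gives `alias.index`.

```python
from typing import List, Optional, Sequence


def untuple_column_names(
    expr: str,
    element_names: Sequence[Optional[str]],
    alias: Optional[str] = None,
) -> List[str]:
    """Model the result column names of `untuple(expr) [AS alias]`.

    `element_names` has one entry per tuple element; None marks an
    unnamed element. Hypothetical helper, not ClickHouse code.
    """
    names = []
    for index, element in enumerate(element_names, start=1):
        if alias is None:
            # No alias on untuple(): the generated name is kept.
            names.append(f"tupleElement({expr}, {index})")
        elif element is not None:
            # Alias plus named element: combine both.
            names.append(f"{alias}.{element}")
        else:
            # Alias plus unnamed element: fall back to the 1-based index.
            names.append(f"{alias}.{index}")
    return names


print(untuple_column_names("t", ["element_alias"], "untuple_alias"))
print(untuple_column_names("t", [None], "a"))
print(untuple_column_names("t", ["key"]))
```

Here `"t"` is only a placeholder for the text of the tuple expression; how the server renders that text in the generated name is outside the scope of the sketch.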

CLAassistant · 2023-09-29T08:54:25Z

All committers have signed the CLA.

UnamedRus · 2023-09-29T14:26:04Z

It would be also great to fix tupleToNameValuePairs back
#36773

garcher22 · 2023-09-29T16:23:24Z

> It would be also great to fix tupleToNameValuePairs back #36773

@UnamedRus I suppose the root cause is tuple() losing the column aliases; that was removed in 6cbdc6a, but I have no idea why, or how to bring it back, or why it's a wontfix. @CurtizJ maybe you know?

garcher22 · 2023-09-29T16:55:53Z

> It would be also great to fix tupleToNameValuePairs back #36773
>
> @UnamedRus I suppose the root cause is tuple() losing the column aliases, it was removed here 6cbdc6a but I have no idea why, or how to bring it back, or why it's a wontfix. @CurtizJ maybe you know?

Thinking about it, restoring that would make perfect sense with this PR especially: untuple(tuple(a, b)) would then be a no-op. The named tuple elements are nice and useful, but there are some unfortunate inconsistencies here and there.

garcher22 · 2023-09-29T19:24:13Z

My laptop is called lg, so I get complaints about undocumented new function localhostamma from 02415_all_new_functions_must_be_documented 🤣

#27093 @amosbird @qoega fyi

garcher22 · 2023-09-29T21:16:07Z

I already have 11 messages in my inbox about "Run cancelled: CherryPick - master (613f8db)", not sure what is going on? Here's a link to one of the runs: https://github.com/garcher22/ClickHouse/actions/runs/6356562956

@Felixoid maybe you can help me as the main author of that workflow

upd.: looks like it somehow got enabled on my fork and was failing, although I don't remember doing anything for it... for now I disabled it manually.

alexey-milovidov · 2023-09-30T01:30:26Z

This sounds dangerous - it previously led to a bug: https://github.com/ClickHouse/ClickHouse/pull/26179/files

robot-clickhouse · 2023-09-30T01:33:11Z

This is an automated comment for commit 558b2ff with description of existing statuses. It's updated for the latest CI running

❌ Click here to open a full report in a separate page

Successful checks

Check name	Description	Status
AST fuzzer	Runs randomly generated queries to catch program errors. The build type is optionally given in parenthesis. If it fails, ask a maintainer for help	✅ success
CI running	A meta-check that indicates the running CI. Normally, it's in success or pending state. The failed status indicates some problems with the PR	✅ success
ClickHouse build check	Builds ClickHouse in various configurations for use in further steps. You have to fix the builds that fail. Build logs often has enough information to fix the error, but you might have to reproduce the failure locally. The cmake options can be found in the build log, grepping for cmake. Use these options and follow the general build process	✅ success
Compatibility check	Checks that clickhouse binary runs on distributions with old libc versions. If it fails, ask a maintainer for help	✅ success
Docker image for servers	The check to build and optionally push the mentioned image to docker hub	✅ success
Docs Check	Builds and tests the documentation	✅ success
Fast test	Normally this is the first check that is ran for a PR. It builds ClickHouse and runs most of stateless functional tests, omitting some. If it fails, further checks are not started until it is fixed. Look at the report to see which tests fail, then reproduce the failure locally as described here	✅ success
Flaky tests	Checks if new added or modified tests are flaky by running them repeatedly, in parallel, with more randomization. Functional tests are run 100 times with address sanitizer, and additional randomization of thread scheduling. Integrational tests are run up to 10 times. If at least once a new test has failed, or was too long, this check will be red. We don't allow flaky tests, read the doc	✅ success
Install packages	Checks that the built packages are installable in a clear environment	✅ success
Mergeable Check	Checks if all other necessary checks are successful	✅ success
Performance Comparison	Measure changes in query performance. The performance test report is described in detail here. In square brackets are the optional part/total tests	✅ success
Push to Dockerhub	The check for building and pushing the CI related docker images to docker hub	✅ success
SQLTest	There's no description for the check yet, please add it to tests/ci/ci_config.py:CHECK_DESCRIPTIONS	✅ success
SQLancer	Fuzzing tests that detect logical bugs with SQLancer tool	✅ success
Sqllogic	Run clickhouse on the sqllogic test set against sqlite and checks that all statements are passed	✅ success
Stateful tests	Runs stateful functional tests for ClickHouse binaries built in various configurations -- release, debug, with sanitizers, etc	✅ success
Stateless tests	Runs stateless functional tests for ClickHouse binaries built in various configurations -- release, debug, with sanitizers, etc	✅ success
Style Check	Runs a set of checks to keep the code style clean. If some of tests failed, see the related log from the report	✅ success
Unit tests	Runs the unit tests for different release types	✅ success

Check name	Description	Status
Integration tests	The integration tests report. In parenthesis the package type is given, and in square brackets are the optional part/total tests	❌ failure
Stress test	Runs stateless functional tests concurrently from several clients to detect concurrency-related errors	❌ failure
Upgrade check	Runs stress tests on server version from last release and then tries to upgrade it to the version from the PR. It checks if the new server can successfully startup without any errors, crashes or sanitizer asserts	❌ failure

amosbird · 2023-09-30T05:52:10Z

I believe it's better to treat untuple as a JOIN variant, meaning it introduces another set of columns within the namespace of a table. We can enforce assigning an alias to untuple, similar to what we did when joining subqueries.

alexey-milovidov · 2023-09-30T06:46:32Z

It will be nice to support tuple.* with Analyzer.

garcher22 · 2023-09-30T11:14:23Z

> This sounds dangerous - it previously led to a bug: https://github.com/ClickHouse/ClickHouse/pull/26179/files

Right, it fails:

:)
SELECT
    untuple(CAST((1., 2), 'Tuple(a Int, b Int)')),
    untuple(CAST((NULL, *), 'Tuple(a Nullable(Int), b Int)'))
FROM
(
    SELECT 123
)

Received exception:
Code: 352. DB::Exception: Block structure mismatch in (columns with identical name must have identical structure) stream: different types:
a Int32 Const(size = 0, Int32(size = 1))
a Nullable(Int32) Const(size = 0, Nullable(size = 1, Int32(size = 1), UInt8(size = 1))). (AMBIGUOUS_COLUMN_NAME)

This should have been an assertion failure though I guess?
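
The failure above comes down to two result columns sharing the name `a` while having different types. A rough Python sketch of that check, assuming the result is modeled as a plain list of (name, type) pairs; the function `conflicting_columns` is hypothetical and only illustrates the condition named in the error message, not how the server detects it.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def conflicting_columns(columns: Iterable[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Return the names that occur with more than one type, with those types."""
    types_by_name: Dict[str, List[str]] = defaultdict(list)
    for name, type_name in columns:
        if type_name not in types_by_name[name]:
            types_by_name[name].append(type_name)
    return {name: types for name, types in types_by_name.items() if len(types) > 1}


# Columns of the failing query if bare element names are used:
# the first untuple() yields a, b; the second yields a, b again.
columns = [
    ("a", "Int32"),
    ("b", "Int32"),
    ("a", "Nullable(Int32)"),
    ("b", "Int32"),
]
print(conflicting_columns(columns))  # {'a': ['Int32', 'Nullable(Int32)']}
```

The duplicate `b` is not reported because both occurrences have the same type, which matches the error message mentioning only `a`.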

I'm also getting some other weird errors even without my changes:

select untuple(tuple(1)::Tuple(a Int)), untuple(tuple(null)::Tuple(Nullable(a Int))) format TSVWithNames;

Received exception:
Code: 223. DB::Exception: Unexpected AST element for data type.: While processing CAST(tuple(1), 'Tuple(a Int)').1, untuple(CAST(tuple(NULL), 'Tuple(Nullable(a Int))')). (UNEXPECTED_AST_STRUCTURE)

No idea what it is, probably also an assertion failure?

Anyway, this looks like too much work. What about a simpler change: use tuple element names only when the untuple alias is set? This will still make the column names usable for me, and won't have this problem. I updated the PR to implement this change instead.

rschu1ze · 2023-10-08T18:14:24Z

ClickHouse Integration Tests (release) [2/4]: #52554
ClickHouse Stress Test (msan): #55148

alexey-milovidov · 2023-10-08T19:08:19Z

@rschu1ze, Doesn't it break query analysis?
I remember that last time I removed the untuple function because it was impossible for it to work correctly.

> ClickHouse Stress Test (msan)

Are you 100% sure it is not related to this pull request?

alexey-milovidov · 2023-10-08T19:09:15Z

@rschu1ze even if the memory sanitizer issue is not related to this pull request, its existence means that currently our repository is in a non-acceptable state, and nothing should be done other than fixing it.

rschu1ze · 2023-10-09T18:57:49Z

> @rschu1ze, Doesn't it break query analysis?
> I remember that last time I removed the untuple function because it was impossible for it to work correctly.

I extended the tests (#55425) and I am pretty sure that this particular PR does not add new problems. There is, however, a problem with untuple() and the analyzer in general that already existed before this PR; I opened #55426 for it. If the problem cannot be fixed easily, then I'd be okay with removing untuple - the same can be achieved with sub-index access (tuple.1) syntax.

Use the tuple element names as column names in untuple()

81cbe2a

rschu1ze self-assigned this Sep 29, 2023

add a test

5150af2

alexey-milovidov added the can be tested Allows running workflows for external contributors label Sep 30, 2023

robot-clickhouse added the pr-improvement Pull request with some product improvements label Sep 30, 2023

simpler

a6968a9

garcher22 added 2 commits September 30, 2023 13:27

tabs

b89ff6e

tabs

00da4e0

Some fixups

8c35ccf

rschu1ze approved these changes Oct 8, 2023

rschu1ze changed the title ~~Use the tuple element names as column names in untuple()~~ Use the tuple element names as column names in untuple() Oct 8, 2023

Fix spelling

558b2ff

rschu1ze merged commit d4e7fa2 into ClickHouse:master Oct 8, 2023
274 of 282 checks passed

This was referenced Oct 9, 2023

Improve tests for untuple() #55425

Merged

untuple() with analyzer can lead to dupliate column names. #55426

Open

UnamedRus mentioned this pull request Nov 13, 2023

untuple function return column names as tuple field names for named tuples. #56676

Closed
