Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use the tuple element names as column names in untuple() #55123

Merged
merged 7 commits into from Oct 8, 2023

Conversation

garcher22
Copy link
Contributor

@garcher22 garcher22 commented Sep 29, 2023

Function untuple() currently ignores the element names of named tuples and generates ugly result column names:

:) SELECT untuple(JSONExtract('{"key": "value"}', 'Tuple(key String)'));

┌─tupleElement(JSONExtract('{"key": "value"}', 'Tuple(key String)'), 1)─┐
│ value                                                                 │
└───────────────────────────────────────────────────────────────────────┘

Why would anyone want column names like this? And there's effectively no way to change these names when using untuple(). Let's respect the user intention and use the explicitly specified tuple element names (new behavior):

:) SELECT untuple(JSONExtract('{"key": "value"}', 'Tuple(key String)'));

┌─key───┐
│ value │
└───────┘

Interestingly, when the untuple() function itself has an alias, the result column names are already usable as of now but with an index as the column name instead of the element alias. This behavior is retained by this PR for backward compatibility:

:) SELECT untuple(JSONExtract('{"key": "value"}', 'Tuple(key String)')) a;

┌─a.1───┐
│ value │
└───────┘

When the element names are not specified, also keep the old behavior (ugly name):

:) SELECT untuple(JSONExtract('{"key": "value"}', 'Tuple(String)'));

┌─tupleElement(JSONExtract('{"key": "value"}', 'Tuple(String)'), 1)─┐
│ value                                                             │
└───────────────────────────────────────────────────────────────────┘

Changelog category (leave one):

  • Improvement

Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):

When function "untuple()" is now called on a tuple with named elements and itself has an alias (e.g. "select untuple(tuple(1)::Tuple(element_alias Int)) AS untuple_alias"), then the result column name is now generated from the untuple alias and the tuple element alias (in the example: "untuple_alias.element_alias").

@CLAassistant
Copy link

CLAassistant commented Sep 29, 2023

CLA assistant check
All committers have signed the CLA.

@UnamedRus
Copy link
Contributor

It would be also great to fix tupleToNameValuePairs back
#36773

@rschu1ze rschu1ze self-assigned this Sep 29, 2023
@garcher22
Copy link
Contributor Author

It would be also great to fix tupleToNameValuePairs back #36773

@UnamedRus I suppose the root caue is tuple() losing the column aliases, it was removed here 6cbdc6a
but I have no idea why, or how to bring it back, or why it's a wontfix. @CurtizJ maybe you know?

@garcher22
Copy link
Contributor Author

It would be also great to fix tupleToNameValuePairs back #36773

@UnamedRus I suppose the root caue is tuple() losing the column aliases, it was removed here 6cbdc6a but I have no idea why, or how to bring it back, or why it's a wontfix. @CurtizJ maybe you know?

Thinking about it, it would make perfect sense with this PR especially, untuple(tuple(a, b)) would be a noop. The named tuple elements are nice and useful, but there are some unfortunate inconsistencies here and there.

@garcher22
Copy link
Contributor Author

My laptop is called lg, so I get complaints about undocumented new function localhostamma from 02415_all_new_functions_must_be_documented 🤣

#27093 @amosbird @qoega fyi

@garcher22
Copy link
Contributor Author

garcher22 commented Sep 29, 2023

I already have 11 messages in my inbox about "Run cancelled: CherryPick - master (613f8db)", not sure what is going on? Here's a link to one of the runs: https://github.com/garcher22/ClickHouse/actions/runs/6356562956

@Felixoid maybe you can help me as the main author of that workflow

upd.: looks like it somehow got enabled on my fork and was failing, although I don't remember doing anything for it... for now I disabled it manually.

@alexey-milovidov
Copy link
Member

This sounds dangerous - it previous led to a bug: https://github.com/ClickHouse/ClickHouse/pull/26179/files

@alexey-milovidov alexey-milovidov added the can be tested Allows running workflows for external contributors label Sep 30, 2023
@robot-clickhouse robot-clickhouse added the pr-improvement Pull request with some product improvements label Sep 30, 2023
@robot-clickhouse
Copy link
Member

robot-clickhouse commented Sep 30, 2023

This is an automated comment for commit 558b2ff with description of existing statuses. It's updated for the latest CI running

❌ Click here to open a full report in a separate page

Successful checks
Check nameDescriptionStatus
AST fuzzerRuns randomly generated queries to catch program errors. The build type is optionally given in parenthesis. If it fails, ask a maintainer for help✅ success
CI runningA meta-check that indicates the running CI. Normally, it's in success or pending state. The failed status indicates some problems with the PR✅ success
ClickHouse build checkBuilds ClickHouse in various configurations for use in further steps. You have to fix the builds that fail. Build logs often has enough information to fix the error, but you might have to reproduce the failure locally. The cmake options can be found in the build log, grepping for cmake. Use these options and follow the general build process✅ success
Compatibility checkChecks that clickhouse binary runs on distributions with old libc versions. If it fails, ask a maintainer for help✅ success
Docker image for serversThe check to build and optionally push the mentioned image to docker hub✅ success
Docs CheckBuilds and tests the documentation✅ success
Fast testNormally this is the first check that is ran for a PR. It builds ClickHouse and runs most of stateless functional tests, omitting some. If it fails, further checks are not started until it is fixed. Look at the report to see which tests fail, then reproduce the failure locally as described here✅ success
Flaky testsChecks if new added or modified tests are flaky by running them repeatedly, in parallel, with more randomization. Functional tests are run 100 times with address sanitizer, and additional randomization of thread scheduling. Integrational tests are run up to 10 times. If at least once a new test has failed, or was too long, this check will be red. We don't allow flaky tests, read the doc✅ success
Install packagesChecks that the built packages are installable in a clear environment✅ success
Mergeable CheckChecks if all other necessary checks are successful✅ success
Performance ComparisonMeasure changes in query performance. The performance test report is described in detail here. In square brackets are the optional part/total tests✅ success
Push to DockerhubThe check for building and pushing the CI related docker images to docker hub✅ success
SQLTestThere's no description for the check yet, please add it to tests/ci/ci_config.py:CHECK_DESCRIPTIONS✅ success
SQLancerFuzzing tests that detect logical bugs with SQLancer tool✅ success
SqllogicRun clickhouse on the sqllogic test set against sqlite and checks that all statements are passed✅ success
Stateful testsRuns stateful functional tests for ClickHouse binaries built in various configurations -- release, debug, with sanitizers, etc✅ success
Stateless testsRuns stateless functional tests for ClickHouse binaries built in various configurations -- release, debug, with sanitizers, etc✅ success
Style CheckRuns a set of checks to keep the code style clean. If some of tests failed, see the related log from the report✅ success
Unit testsRuns the unit tests for different release types✅ success
Check nameDescriptionStatus
Integration testsThe integration tests report. In parenthesis the package type is given, and in square brackets are the optional part/total tests❌ failure
Stress testRuns stateless functional tests concurrently from several clients to detect concurrency-related errors❌ failure
Upgrade checkRuns stress tests on server version from last release and then tries to upgrade it to the version from the PR. It checks if the new server can successfully startup without any errors, crashes or sanitizer asserts❌ failure

@amosbird
Copy link
Collaborator

I believe it's better to treat untuple as a JOIN variant, meaning it introduces another set of columns within the namespace of a table. We can enforce assigning an alias to untuple, similar to what we did when joining subqueries.

@alexey-milovidov
Copy link
Member

It will be nice to support tuple.* with Analyzer.

@garcher22
Copy link
Contributor Author

garcher22 commented Sep 30, 2023

This sounds dangerous - it previous led to a bug: https://github.com/ClickHouse/ClickHouse/pull/26179/files

Right, it fails:

:)
SELECT
    untuple(CAST((1., 2), 'Tuple(a Int, b Int)')),
    untuple(CAST((NULL, *), 'Tuple(a Nullable(Int), b Int)'))
FROM
(
    SELECT 123
)

Received exception:
Code: 352. DB::Exception: Block structure mismatch in (columns with identical name must have identical structure) stream: different types:
a Int32 Const(size = 0, Int32(size = 1))
a Nullable(Int32) Const(size = 0, Nullable(size = 1, Int32(size = 1), UInt8(size = 1))). (AMBIGUOUS_COLUMN_NAME)

This should have been an assertion failure though I guess?

I'm also getting some other weird errors even without my changes:

select untuple(tuple(1)::Tuple(a Int)), untuple(tuple(null)::Tuple(Nullable(a Int))) format TSVWithNames;

Received exception:
Code: 223. DB::Exception: Unexpected AST element for data type.: While processing CAST(tuple(1), 'Tuple(a Int)').1, untuple(CAST(tuple(NULL), 'Tuple(Nullable(a Int))')). (UNEXPECTED_AST_STRUCTURE)

No idea what it is, probably also an assertion failure?

Anyway, this looks like too much work. What about a simpler change, use tuple element names only when the untuple alias is set? This will still make the column names usable for me, and won't have this problem. I updated the PR to implement this change instead.

@garcher22

This comment was marked as outdated.

@Felixoid

This comment was marked as outdated.

@alexey-milovidov

This comment was marked as outdated.

@rschu1ze rschu1ze changed the title Use the tuple element names as column names in untuple() Use the tuple element names as column names in untuple() Oct 8, 2023
@rschu1ze
Copy link
Member

rschu1ze commented Oct 8, 2023

ClickHouse Integration Tests (release) [2/4]: #52554
ClickHouse Stress Test (msan): #55148

@rschu1ze rschu1ze merged commit d4e7fa2 into ClickHouse:master Oct 8, 2023
274 of 282 checks passed
@alexey-milovidov
Copy link
Member

@rschu1ze, Doesn't it break query analysis?
I remember that last time I removed the untuple function because it was impossible for it to work correctly.

ClickHouse Stress Test (msan)

Are you 100% sure it is not related to this pull request?

@alexey-milovidov
Copy link
Member

@rschu1ze even if the memory sanitizer issue is not related to this pull request, its existence means that currently our repository is in a non-acceptable state, and nothing should be done other than fixing it.

@rschu1ze
Copy link
Member

rschu1ze commented Oct 9, 2023

@rschu1ze, Doesn't it break query analysis?
I remember that last time I removed the untuple function because it was impossible for it to work correctly.

I extended the tests (#55425) and I am pretty sure that this particular PR does not add new problems. There is however a problem with untuple() and the analyzer in general which existed before this PR already, I opened #55426. If the problem cannot be fixed easily, then I'd be okay with removing untuple - the same can be achieved with sub-index access (tuple.1) syntax.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
can be tested Allows running workflows for external contributors pr-improvement Pull request with some product improvements
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

8 participants