Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support NULL in function transform. #40493

Closed
wants to merge 1 commit into from

Conversation

amosbird
Copy link
Collaborator

@amosbird amosbird commented Aug 22, 2022

Changelog category (leave one):

  • Improvement

Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):

Support NULL in function transform. This fixes #2655 , #9596 , #38666

This helps evaluate TPC-DS query 39 correctly.

This PR also removes most boilerplate code introduced in #31839 .

Current implementation is not ideal w.r.t performance since nullable checks are added even in not null branches. It's to avoid code bloating.

There is also a proposal to implement a general version of transform #37654 . However there isn't a clear evidence that it will outperform current implementation.

Information about CI checks: https://clickhouse.com/docs/en/development/continuous-integration/

@robot-ch-test-poll robot-ch-test-poll added the pr-improvement Pull request with some product improvements label Aug 22, 2022
@amosbird amosbird force-pushed the better-transform branch 2 times, most recently from 40e0da0 to 049c356 Compare August 23, 2022 03:08
@amosbird
Copy link
Collaborator Author

Also support DateTime64 in transform. This fixes #32387

@amosbird
Copy link
Collaborator Author

Stress test (thread) — Sanitizer assert (in stderr.log)

https://s3.amazonaws.com/clickhouse-test-reports/40493/3a0096a384bc42bae2d423e60260cc267a4aa982/stress_test__thread_/stderr.log

==672==WARNING: invalid path to external symbolizer!
==672==WARNING: Failed to use and restart external symbolizer!

It's somehow related to MergeMutate.

Stress test (debug) — OOM killer (or signal 9) in clickhouse-server.log

Doesn't seem related

@nickitat nickitat self-assigned this Aug 24, 2022
@nickitat
Copy link
Member

Stress test (thread) — Sanitizer assert (in stderr.log)

https://s3.amazonaws.com/clickhouse-test-reports/40493/3a0096a384bc42bae2d423e60260cc267a4aa982/stress_test__thread_/stderr.log

==672==WARNING: invalid path to external symbolizer! ==672==WARNING: Failed to use and restart external symbolizer!

It's somehow related to MergeMutate.

I guess, the main issue is "WARNING: ThreadSanitizer: data race (pid=672)"

@nickitat
Copy link
Member

looks like we have almost zero perf tests for transform. let's add some.

src/Functions/transform.cpp Outdated Show resolved Hide resolved
src/Functions/transform.cpp Outdated Show resolved Hide resolved
src/Functions/transform.cpp Outdated Show resolved Hide resolved
src/Functions/transform.cpp Show resolved Hide resolved
@amosbird
Copy link
Collaborator Author

amosbird commented Aug 25, 2022

looks like we have almost zero perf tests for transform. let's add some.

Done.

I guess, the main issue is "WARNING: ThreadSanitizer: data race (pid=672)"

I don't know why llvm-symbolizer is broken and there is no evidence what's going wrong.

@amosbird
Copy link
Collaborator Author

Performance Comparison [4/4] — 2 errors, 2 faster, 12 slower, 3 unstable

most significant difference is : +1.301x . Maybe it worth adding nullable specializations (doubling the code size to avoid performance degradation for not-null cases)

@nickitat
Copy link
Member

I don't know why llvm-symbolizer is broken and there is no evidence what's going wrong.

fix is on the way afaiu #40608

most significant difference is : +1.301x . Maybe it worth adding nullable specializations (doubling the code size to avoid performance degradation for not-null cases)

we need to fix only the hot loops. maybe it is possible to put null-related code under if constexpr(nullable)?

@amosbird
Copy link
Collaborator Author

we need to fix only the hot loops. maybe it is possible to put null-related code under if constexpr(nullable)?

That's what I mean by "double the code size". The hot code is a bloated template. Adding if constexpr will make it generating x2 more code

@nickitat
Copy link
Member

we need to fix only the hot loops. maybe it is possible to put null-related code under if constexpr(nullable)?

That's what I mean by "double the code size". The hot code is a bloated template. Adding if constexpr will make it generating x2 more code

I don't see a big issue here, since they will appear only in the single translation unit. for me the bigger concern is the source code complexity

@amosbird
Copy link
Collaborator Author

Integration tests (asan) [1/3] — fail: 1, passed: 654, flaky: 0

=================================== FAILURES ===================================
_____________________________ test_default_column ______________________________
[gw4] linux -- Python 3.8.10 /usr/bin/python3

    def test_default_column():
        node1.query(
            "CREATE TABLE dist ON CLUSTER 'test_cluster' (x Int32, y Int32 DEFAULT x + 100, z Int32 DEFAULT x + y) ENGINE = Distributed('test_cluster', currentDatabase(), local, y)"
        )
        node1.query(
            "CREATE TABLE local ON CLUSTER 'test_cluster' (x Int32, y Int32 DEFAULT x + 200, z Int32 DEFAULT x - y) ENGINE = MergeTree() ORDER BY y"
        )
    
        for insert_sync in [0, 1]:
            settings = {"insert_distributed_sync": insert_sync}
    
            # INSERT INTO TABLE dist (x)
            node1.query("TRUNCATE TABLE local ON CLUSTER 'test_cluster'")
            node1.query(
                "INSERT INTO TABLE dist (x) VALUES (1), (2), (3), (4)", settings=settings
            )
            node1.query("SYSTEM FLUSH DISTRIBUTED dist")
            assert node1.query("SELECT x, y, z FROM local") == TSV(
                [[2, 102, 104], [4, 104, 108]]
            )
            assert node2.query("SELECT x, y, z FROM local") == TSV(
                [[1, 101, 102], [3, 103, 106]]
            )
>           assert node1.query("SELECT x, y, z FROM dist") == TSV(
                [[2, 102, 104], [4, 104, 108], [1, 101, 102], [3, 103, 106]]
            )
E           AssertionError: assert '1\t101\t102\n3\t103\t106\n2\t102\t104\n4\t104\t108\n' == 2	102	104\n4	104	108\n1	101	102\n3	103	106
E            +  where '1\t101\t102\n3\t103\t106\n2\t102\t104\n4\t104\t108\n' = <bound method ClickHouseInstance.query of <helpers.cluster.ClickHouseInstance object at 0x7f5e0067f040>>('SELECT x, y, z FROM dist')
E            +    where <bound method ClickHouseInstance.query of <helpers.cluster.ClickHouseInstance object at 0x7f5e0067f040>> = <helpers.cluster.ClickHouseInstance object at 0x7f5e0067f040>.query
E            +  and   2	102	104\n4	104	108\n1	101	102\n3	103	106 = TSV([[2, 102, 104], [4, 104, 108], [1, 101, 102], [3, 103, 106]])

Doesn't seem related.

ClickHouse build check — 19/21 artifact groups are OK

Don't know what's going wrong.

Stress test (thread) — Sanitizer assert (in stderr.log)

Same error as before.

Stateless tests (debug) [1/3] — Tests are not finished, fail: 1, passed: 22, skipped: 2

2022.08.26 00:38:48.424169 [ 632 ] {} <Trace> BaseDaemon: Received signal 15
2022.08.26 00:38:48.424290 [ 632 ] {} <Information> Application: Received termination signal (Terminated)
2022.08.26 00:38:48.424396 [ 629 ] {} <Debug> Application: Received termination signal.

Don't know why.

@amosbird amosbird force-pushed the better-transform branch 5 times, most recently from 6b58557 to f0b7222 Compare August 30, 2022 01:02
@amosbird
Copy link
Collaborator Author

@amosbird
Copy link
Collaborator Author

@nickitat Friendly ping.

@nickitat
Copy link
Member

let's fix the build and make sure there is no issues with correctness and perf

@amosbird
Copy link
Collaborator Author

transform.query6.prewarm0: DB::Exception: First argument and elements of array of second argument of function transform must have compatible types: both numeric or both strings.: While processing transform(rand(10), [2, 4, NULL], [20, 40, 60]). Stack trace:

It's because without this PR the query won't work.

SELECT transform(rand(10), [2, 4, 6], [20, null, 60], null) FROM numbers(10000000) FORMAT Null

It's much slower because without this PR the query result is incorrect.

SELECT transform(rand(10), [2, 4, 6], [20, 40, 60]) FROM numbers(10000000) FORMAT Null

It's slightly slower and I think it's unrelated, because I did some local verification:

Before:

localhost:7000, queries 1079, QPS: 12.609, RPS: 126184662.472, MiB/s: 962.713, result RPS: 0.000, result MiB/s: 0.000.

0.000% 0.031 sec.
10.000% 0.032 sec.
20.000% 0.032 sec.
30.000% 0.032 sec.
40.000% 0.033 sec.
50.000% 0.033 sec.
60.000% 0.034 sec.
70.000% 0.058 sec.
80.000% 0.116 sec.
90.000% 0.162 sec.
95.000% 0.263 sec.
99.000% 0.524 sec.
99.900% 1.486 sec.
99.990% 1.595 sec.

After:

localhost:7000, queries 1550, QPS: 15.328, RPS: 153392659.079, MiB/s: 1170.293, result RPS: 0.000, result MiB/s: 0.000.

0.000% 0.031 sec.
10.000% 0.031 sec.
20.000% 0.031 sec.
30.000% 0.032 sec.
40.000% 0.032 sec.
50.000% 0.032 sec.
60.000% 0.033 sec.
70.000% 0.047 sec.
80.000% 0.089 sec.
90.000% 0.145 sec.
95.000% 0.203 sec.
99.000% 0.365 sec.
99.900% 0.463 sec.
99.990% 0.522 sec.

@amosbird
Copy link
Collaborator Author

The clang-15 ubsan build failed. I guess it's because the translation unit now becomes too large. Except that everything looks good. @nickitat Could you take another look? I don't know what's the proper way of resolving this build error and there is no error logs.

@nickitat
Copy link
Member

The clang-15 ubsan build failed. I guess it's because the translation unit now becomes too large. Except that everything looks good. @nickitat Could you take another look? I don't know what's the proper way of resolving this build error and there is no error logs.

I'm also ok with this pr as soon as CI became happy. let's be optimistic and hope that retry of the ubsan build will succeed (let's just wait).

@nickitat
Copy link
Member

unfortunately looks like that only complete rerun could help

@amosbird
Copy link
Collaborator Author

unfortunately looks like that only complete rerun could help

Maybe we can add some compiler flags for transform.cpp?

@nickitat
Copy link
Member

yes, I think we could. but I would like to see the errors first. maybe you could try to build locally with ubsan and see what will happen?

@amosbird
Copy link
Collaborator Author

amosbird commented Nov 17, 2022

maybe you could try to build locally with ubsan and see what will happen?

When compiling this TU, clang++ exits without any log messages after running for 15 minutes. I guess there is nothing we can do.

@amosbird amosbird marked this pull request as draft November 17, 2022 18:45
@nickitat
Copy link
Member

maybe splitting into multiple cpp files will help. ideally would be to simplify the code somehow.

@alexey-milovidov
Copy link
Member

@amosbird we still need it.

@amosbird
Copy link
Collaborator Author

amosbird commented Jun 5, 2023

I will try if clang 16 works

@alexey-milovidov
Copy link
Member

Merging in #51351.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
pr-improvement Pull request with some product improvements
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Unexpected result when using CASE operator or transform function with NULL
4 participants