Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement in order aggregation (optimize_aggregation_in_order) for projections for tables with fully materialized projections #37469

Conversation

azat
Copy link
Collaborator

@azat azat commented May 23, 2022

Changelog category (leave one):

  • Improvement

Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):

Implement in order aggregation (optimize_aggregation_in_order) for fully materialized projections.

Cc: @amosbird @KochetovNicolai

@robot-clickhouse robot-clickhouse added the pr-improvement Pull request with some product improvements label May 23, 2022
@azat azat changed the title Implement in order aggregation for projections (optimize_aggregation_in_order) Implement in order aggregation (optimize_aggregation_in_order) for projections for tables fully materialized projections May 24, 2022
@azat azat force-pushed the projections-optimize_aggregation_in_order branch from 6a98909 to bb7712b Compare May 24, 2022 10:26
@azat azat changed the title Implement in order aggregation (optimize_aggregation_in_order) for projections for tables fully materialized projections Implement in order aggregation (optimize_aggregation_in_order) for projections for tables with fully materialized projections May 24, 2022
@azat azat force-pushed the projections-optimize_aggregation_in_order branch from bb7712b to 5c793c6 Compare May 24, 2022 10:29
@azat azat marked this pull request as ready for review May 24, 2022 10:29
@azat azat force-pushed the projections-optimize_aggregation_in_order branch 2 times, most recently from 66b6d88 to 41cb029 Compare May 25, 2022 17:28
@azat azat force-pushed the projections-optimize_aggregation_in_order branch from 41cb029 to 2b513ac Compare May 31, 2022 12:04
@azat
Copy link
Collaborator Author

azat commented Jun 4, 2022

@KochetovNicolai can you please take a look? (or maybe someone else could/want?)

@azat
Copy link
Collaborator Author

azat commented Jun 7, 2022

Failures are unrelated.

Stress test (undefined, actions) — Hung check failed

        "elapsed": 2486.29842776,
        "is_cancelled": 1,
        "is_all_data_sent": 0,
        "read_rows": "500000",
        "read_bytes": "4000000",
        "total_rows_approx": "0",
        "written_rows": "500000",
        "written_bytes": "4000000",
        "memory_usage": "6427070",
        "peak_memory_usage": "6427070",
        "query": "INSERT INTO a FORMAT TSV",
$ pigz -cd clickhouse-server.stress.log.gz | grep -a -e '{11}' 
2022.05.31 19:00:26.118816 [ 2481 ] {11} <Debug> executeQuery: (from [::1]:60892) (comment: 01019_parallel_parsing_cancel.sh) INSERT INTO a FORMAT TSV (stage: Complete)
2022.05.31 19:00:26.163092 [ 2481 ] {11} <Trace> ContextAccess (default): Access granted: INSERT(x) ON test_15.a
2022.05.31 19:00:26.187395 [ 2481 ] {11} <Trace> ContextAccess (default): Access granted: SELECT(x) ON test_15.a
2022.05.31 19:00:26.203316 [ 2481 ] {11} <Trace> ContextAccess (default): Access granted: SELECT(x) ON test_15.a
2022.05.31 19:00:26.235590 [ 2481 ] {11} <Trace> ContextAccess (default): Access granted: SELECT(x) ON test_15.a
2022.05.31 19:00:26.447828 [ 2481 ] {11} <Trace> ContextAccess (default): Access granted: SELECT(x) ON test_15.a
2022.05.31 19:00:26.448177 [ 2481 ] {11} <Trace> InterpreterSelectQuery: FetchColumns -> Complete
#19 0x0000000025199eba in DB::ExecutingInnerQueryFromViewTransform::onGenerate (this=0x7f19547dc428) at ../src/Processors/Transforms/buildPushingToViewsChain.cpp:564

@azat azat force-pushed the projections-optimize_aggregation_in_order branch from 2b513ac to 89bbd30 Compare June 7, 2022 06:03
@azat
Copy link
Collaborator Author

azat commented Jun 7, 2022

Rebased (to fix conflicts with #37543)

@azat
Copy link
Collaborator Author

azat commented Jun 15, 2022

Ping on this one, can someone take a look please? (will rebase after)

Copy link
Collaborator

@amosbird amosbird left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, only one suggestion. @KochetovNicolai Could you help take a look?

src/Interpreters/Aggregator.cpp Outdated Show resolved Hide resolved
Copy link
Member

@KochetovNicolai KochetovNicolai left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks ok

@KochetovNicolai KochetovNicolai self-assigned this Jun 15, 2022
@azat azat force-pushed the projections-optimize_aggregation_in_order branch from 89bbd30 to 167aaff Compare June 15, 2022 20:48
azat added 7 commits June 16, 2022 09:58
…ctExecutor

Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
v2: fill AggregateColumnsConstData only for only_merge
    (fixes 01291_aggregation_in_order and some other tests)
Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
This is required for proper optimize_aggregation_in_order for
projections.

Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
v2: use real column name instead of aliases from GROUP BY
    Fixes the following error in 01710_projection_aggregation_in_order:

        Not found column a in block. There are only columns: toStartOfHour(ts), sum(value). (NOT_FOUND_COLUMN_IN_BLOCK)
v2.1: Get back support for projected and non-projected parts
v2.2: merge tests and rename
v3: Reduce copy-paste for optimize_aggregation_in_order for projections
v4: rebase on top of QueryPlanResourceHolder
Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
After AggregatingStep is used, there is not StrictResize processor,
since there is only one stream.

Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
…erge

Suggested-by: @amosbird
Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
@azat azat force-pushed the projections-optimize_aggregation_in_order branch from 167aaff to 0d4f786 Compare June 16, 2022 07:02
@azat
Copy link
Collaborator Author

azat commented Jun 18, 2022

Stateless tests (thread, actions) [1/3] — Some tests restarted, fail: 1, passed: 1397, skipped: 11

  • 01183_custom_separated_format_http
2022-06-16 12:50:34 curl: (28) Operation timed out after 60001 milliseconds with 0 bytes received

Stateless tests (release, s3 storage, actions) — Tests are not finished, fail: 1, passed: 4146, skipped: 52

Remote query hanged:

$ pigz -cd clickhouse-server.log.gz | fgrep -a -e 02293_grouping_function_group_by -e 749e5b8a-8567-4c5b-8b08-8452e7d5c48a
...
2022.06.16 16:56:47.786592 [ 929 ] {749e5b8a-8567-4c5b-8b08-8452e7d5c48a} <Debug> executeQuery: (from [::ffff:127.0.0.1]:43864, initial_query_id: 06c8328c-d06d-417f-b2b0-f46cd50852d2) (comment: 02293_grouping_function_group_by.sql) SELECT `number`, grouping(`number`, `number` % 2) AS `gr` FROM numbers(10) GROUP BY `number`, `number` % 2 WITH ROLLUP WITH TOTALS ORDER BY `number` ASC, `gr` ASC (stage: WithMergeableState)
2022.06.16 16:56:47.786702 [ 929 ] {749e5b8a-8567-4c5b-8b08-8452e7d5c48a} <Trace> ContextAccess (default): Access granted: CREATE TEMPORARY TABLE ON *.*
2022.06.16 16:56:47.786886 [ 929 ] {749e5b8a-8567-4c5b-8b08-8452e7d5c48a} <Trace> InterpreterSelectQuery: FetchColumns -> WithMergeableState
2022.06.16 16:56:47.787244 [ 1958 ] {749e5b8a-8567-4c5b-8b08-8452e7d5c48a} <Trace> AggregatingTransform: Aggregating
2022.06.16 16:56:47.787260 [ 1958 ] {749e5b8a-8567-4c5b-8b08-8452e7d5c48a} <Trace> Aggregator: An entry for key=15678497920870529023 found in cache: sum_of_sizes=10, median_size=10
2022.06.16 16:56:47.787293 [ 1958 ] {749e5b8a-8567-4c5b-8b08-8452e7d5c48a} <Trace> Aggregator: Aggregation method: keys128
2022.06.16 16:56:47.787310 [ 1958 ] {749e5b8a-8567-4c5b-8b08-8452e7d5c48a} <Debug> AggregatingTransform: Aggregated. 10 to 10 rows (from 90.00 B) in 0.000360196 sec. (27762.663 rows/sec., 244.01 KiB/sec.)
2022.06.16 16:56:47.787315 [ 1958 ] {749e5b8a-8567-4c5b-8b08-8452e7d5c48a} <Trace> Aggregator: Merging aggregated data
2022.06.16 16:56:47.787531 [ 929 ] {749e5b8a-8567-4c5b-8b08-8452e7d5c48a} <Information> executeQuery: Read 10 rows, 80.00 B in 0.00090523 sec., 11046 rows/sec., 86.30 KiB/sec.
2022.06.16 16:56:47.787573 [ 929 ] {749e5b8a-8567-4c5b-8b08-8452e7d5c48a} <Debug> MemoryTracker: Peak memory usage (for query): 1.06 MiB.
...
2022.06.16 17:06:46.272537 [ 41897 ] {8f4979f8-dc0c-4bb9-aae0-bcdf2224bc95} <Debug> executeQuery: (from [::1]:45314) (comment: 02293_grouping_function_group_by.sql) DROP DATABASE test_ya7omt  (stage: Complete)
...
pigz: skipping: clickhouse-server.log.gz: corrupted -- incomplete deflate data
pigz: abort: internal threads error

And trace_log is empty, because it is S3.

@KochetovNicolai KochetovNicolai merged commit b8d27aa into ClickHouse:master Jun 21, 2022
@azat azat deleted the projections-optimize_aggregation_in_order branch June 21, 2022 10:56
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
pr-improvement Pull request with some product improvements
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants