Show ranges in explain indexes #79938

Merged
nickitat merged 9 commits into ClickHouse:master from cwurm:explain_indexes_show_ranges on Jul 19, 2025
@cwurm
Member

@cwurm cwurm commented May 7, 2025

When investigating query performance, it can be very useful to see how many ranges ClickHouse will read from. Today, this is shown in a log line in system.text_log, but we can show it earlier, before executing the query, in the output of EXPLAIN indexes = 1.
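
As background, a range here is a contiguous run of granules selected within a part, so the same number of granules can be read as a few long ranges or as many short ones. A minimal sketch of that counting, assuming the selected granules are given as a plain list of indices (the name `count_ranges` is hypothetical, and this illustrates the idea only; it is not ClickHouse's implementation):

```python
def count_ranges(selected_granules):
    """Count contiguous runs in a collection of selected granule indices."""
    ranges = 0
    previous = None
    for granule in sorted(set(selected_granules)):
        # A new range starts whenever the index does not directly follow the last one.
        if previous is None or granule != previous + 1:
            ranges += 1
        previous = granule
    return ranges


scattered = count_ranges([0, 2, 4, 6])  # four separate one-granule ranges
clustered = count_ranges([0, 1, 2, 3])  # one range covering four granules
print(scattered, clustered)
```

Both inputs select four granules, but the first needs four separate reads and the second only one.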

For example, it is useful for a case like this where the number of granules to be read is almost the same, but the number of ranges is vastly different:

create table t1 (x int, y int, z int) order by (x, y);
create table t2 (x int, y int, z int) order by (y, x);

insert into t1 select number % 100, number % 100, number from numbers(100_000_000);
insert into t2 select * from t1;

optimize table t1 final;
optimize table t2 final;

explain indexes = 1 select avg(z) from t1 where y in (select number * 10 from numbers(10));

    ┌─explain────────────────────────────────────────────────┐
 1. │ CreatingSets (Create sets before main query execution) │
 2. │   Expression ((Project names + Projection))            │
 3. │     Aggregating                                        │
 4. │       Expression (Before GROUP BY)                     │
 5. │         Expression                                     │
 6. │           ReadFromMergeTree (default.t1)               │
 7. │           Indexes:                                     │
 8. │             PrimaryKey                                 │
 9. │               Keys:                                    │
10. │                 y                                      │
11. │               Condition: (y in 10-element set)         │
12. │               Parts: 1/1                               │
13. │               Granules: 1310/12208                     │
14. │               Ranges: 90                               │
    └────────────────────────────────────────────────────────┘
    
explain indexes = 1 select avg(z) from t2 where y in (select number * 10 from numbers(10));

    ┌─explain────────────────────────────────────────────────┐
 1. │ CreatingSets (Create sets before main query execution) │
 2. │   Expression ((Project names + Projection))            │
 3. │     Aggregating                                        │
 4. │       Expression (Before GROUP BY)                     │
 5. │         Expression                                     │
 6. │           ReadFromMergeTree (default.t2)               │
 7. │           Indexes:                                     │
 8. │             PrimaryKey                                 │
 9. │               Keys:                                    │
10. │                 y                                      │
11. │               Condition: (y in 10-element set)         │
12. │               Parts: 1/1                               │
13. │               Granules: 1230/12208                     │
14. │               Ranges: 10                               │
    └────────────────────────────────────────────────────────┘
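
To make the difference between the two outputs above concrete, the selected granules can be divided by the number of ranges. The sketch below is a hypothetical helper (`index_stats` is not part of ClickHouse) that assumes the EXPLAIN output is available as plain text and only looks at the first `Granules:` and `Ranges:` lines it finds:

```python
import re


def index_stats(explain_text):
    """Pull the first Granules and Ranges figures out of EXPLAIN indexes = 1 text."""
    granules = re.search(r"Granules:\s*(\d+)/(\d+)", explain_text)
    ranges = re.search(r"Ranges:\s*(\d+)", explain_text)
    if granules is None or ranges is None:
        raise ValueError("expected both a Granules and a Ranges line")
    selected = int(granules.group(1))
    n_ranges = int(ranges.group(1))
    # Guard against a plan that selects nothing at all.
    per_range = selected / n_ranges if n_ranges else 0.0
    return {"selected": selected, "ranges": n_ranges, "granules_per_range": per_range}


# Figures taken from the t1 and t2 outputs above.
t1 = index_stats("Granules: 1310/12208\nRanges: 90")
t2 = index_stats("Granules: 1230/12208\nRanges: 10")
print(round(t1["granules_per_range"], 1), round(t2["granules_per_range"], 1))
```

With these figures, t1 averages roughly 14.6 granules per range while t2 averages 123, even though both read a similar number of granules.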

Changelog category (leave one):

  • Improvement

Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):

Show the number of ranges to be read in the output of EXPLAIN indexes = 1.

@clickhouse-gh
Contributor

clickhouse-gh bot commented May 7, 2025

Workflow [PR], commit [c651977]

@clickhouse-gh clickhouse-gh bot added the pr-improvement Pull request with some product improvements label May 7, 2025
@cwurm
Member Author

cwurm commented May 7, 2025

Interestingly, it can show fewer ranges than expected (like 1 range for 2 parts below) because of things like the query condition cache. I need to think about this a bit.

    ┌─explain─────────────────────────────────────────────────┐
 1. │ Expression ((Project names + Projection))               │
 2. │   Aggregating                                           │
 3. │     Expression (Before GROUP BY)                        │
 4. │       Expression                                        │
 5. │         ReadFromMergeTree (default.t_skip_index_insert) │
 6. │         Indexes:                                        │
 7. │           Skip                                          │
 8. │             Name: idx_a                                 │
 9. │             Description: minmax GRANULARITY 1           │
10. │             Parts: 2/2                                  │
11. │             Granules: 50/50                             │
12. │           Skip                                          │
13. │             Name: idx_b                                 │
14. │             Description: set GRANULARITY 1              │
15. │             Parts: 2/2                                  │
16. │             Granules: 50/50                             │
17. │             Ranges: 1                                   │
    └─────────────────────────────────────────────────────────┘

@nickitat nickitat self-assigned this May 14, 2025
@nickitat
Member

> because of things like the query condition cache

They could be disabled.
Please update the reference files.

@clickhouse-gh
Contributor

clickhouse-gh bot commented Jun 17, 2025

Dear @nickitat, this PR hasn't been updated for a while. You will be unassigned. Will you continue working on it? If so, please feel free to reassign yourself.

@clickhouse-gh
Contributor

clickhouse-gh bot commented Jun 22, 2025

Workflow [PR], commit [9a6a3d9]

Summary:

| job_name | test_name | status | info | comment |
|---|---|---|---|---|
| Stateless tests (amd_debug, distributed plan, s3 storage) | | failure | | |
| | 03352_concurrent_rename_alter | FAIL | | |

@cwurm
Member Author

cwurm commented Jul 19, 2025

@nickitat I finally found time to fix all the test outputs. The failure in 03352_concurrent_rename_alter looks unrelated.

@cwurm cwurm requested a review from nickitat July 19, 2025 16:24
@nickitat nickitat enabled auto-merge July 19, 2025 17:14
@nickitat nickitat added this pull request to the merge queue Jul 19, 2025
Merged via the queue into ClickHouse:master with commit ab73363 Jul 19, 2025
120 of 123 checks passed
@robot-clickhouse robot-clickhouse added the pr-synced-to-cloud The PR is synced to the cloud repo label Jul 19, 2025
@rschu1ze
Member

Gentlemen, please don't forget to update the docs. Did that here.
