
Enhance ClickHouse Profile: generate a uniq id for steps and processors #63518

Open
wants to merge 1 commit into master
Conversation


@qhsong qhsong commented May 8, 2024

Changelog category (leave one):

  • Improvement

Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):

Generate a unique ID for each query plan step and processor so that profiling output can be matched to the query plan.

ClickHouse's current profiling output is sometimes confusing:

  • For explain plan we get step names.
  • For explain pipeline we get processor names.
  • For system.processors_profile_log / system.opentelemetry_span_log, we get a pointer address.

When analyzing a complex query with duplicate names, it is hard to match these entries to one another.
I think we should generate a unique ID for every processor and step, one that is meaningful and does not change between runs of the same query. I use the ${NAME}_${INDEX} pattern as the ID format: ${NAME} is the step/processor name, and ${INDEX} is assigned in generation order.

After this PR, for the query select * from t1 as t join t1 as t2 on t.a=t2.a where t.a=1:

  • explain
  ┌─explain───────────────────────────────────────────┐
1. │ Expression_20 ((Project names + (Projection + ))) │
2. │   Join_6 (JOIN FillRightFirst)                    │
3. │     Expression_21                                 │
4. │       ReadFromMergeTree_0 (default.t1)            │
5. │     Expression_22                                 │
6. │       ReadFromMergeTree_3 (default.t1)            │
   └───────────────────────────────────────────────────┘
  • explain pipeline
    ┌─explain──────────────────────────────────────────────────────────────────┐
 1. │ (Expression_20)                                                          │
 2. │ ExpressionTransform                                                      │
 3. │   (Join_6)                                                               │
 4. │   JoiningTransform 2 → 1                                                 │
 5. │     (Expression_21)                                                      │
 6. │     ExpressionTransform                                                  │
 7. │       (ReadFromMergeTree_0)                                              │
 8. │       MergeTreeSelect(pool: ReadPoolInOrder, algorithm: InOrder) 0 → 1   │
 9. │     (Expression_22)                                                      │
10. │     FillingRightJoinSide                                                 │
11. │       ExpressionTransform                                                │
12. │         (ReadFromMergeTree_3)                                            │
13. │         MergeTreeSelect(pool: ReadPoolInOrder, algorithm: InOrder) 0 → 1 │
    └──────────────────────────────────────────────────────────────────────────┘
  • select id, name, parent_ids, plan_step from system.processors_profile_log; (see the cross-referencing sketch after this list)
     ┌─id──────────────────────────┬─name────────────────────┬─parent_ids──────────────────────┬─plan_step──────────┐
  6. │ SourceFromSingleChunk_1     │ SourceFromSingleChunk   │ ['ExpressionTransform_2']       │                    │
  7. │ ExpressionTransform_2       │ ExpressionTransform     │ ['LimitsCheckingTransform_3']   │ Expression_19      │
  8. │ LimitsCheckingTransform_3   │ LimitsCheckingTransform │ ['LazyOutputFormat_4']          │                    │
  9. │ NullSource_5                │ NullSource              │ ['LazyOutputFormat_4']          │                    │
 10. │ NullSource_6                │ NullSource              │ ['LazyOutputFormat_4']          │                    │
 11. │ LazyOutputFormat_4          │ LazyOutputFormat        │ []                              │                    │
 12. │ SourceFromSingleChunk_18    │ SourceFromSingleChunk   │ ['FilterTransform_19']          │                    │
 13. │ FilterTransform_19          │ FilterTransform         │ ['ExpressionTransform_20']      │ Filter_416         │
 14. │ ExpressionTransform_20      │ ExpressionTransform     │ ['DistinctTransform_83']        │ Expression_167     │
 15. │ SourceFromSingleChunk_21    │ SourceFromSingleChunk   │ ['FilterTransform_22']          │                    │
 16. │ FilterTransform_22          │ FilterTransform         │ ['ExpressionTransform_23']      │ Filter_417         │
 17. │ ExpressionTransform_23      │ ExpressionTransform     │ ['DistinctTransform_84']        │ Expression_433     │
  • select operation_name from system.opentelemetry_span_log;
    ┌─operation_name───────────────────────────────────────────────┐
 1. │ DB::InterpreterSelectQueryAnalyzer::execute()                │
 2. │ ThreadPoolRead                                               │
 3. │ ThreadPoolRead                                               │
 4. │ MergeTreeSource::tryGenerate()                               │
 5. │ MergeTreeSelect(pool: ReadPoolInOrder, algorithm: InOrder)_0 │
 6. │ ExpressionTransform_1                                        │
 7. │ ExpressionTransform_2                                        │
 8. │ LimitsCheckingTransform_3                                    │
 9. │ LazyOutputFormat_4                                           │
10. │ MergeTreeSource::tryGenerate()                               │
11. │ MergeTreeSelect(pool: ReadPoolInOrder, algorithm: InOrder)_0 │
12. │ NullSource_5                                                 │
13. │ NullSource_6                                                 │
14. │ LazyOutputFormat_4                                           │
15. │ PipelineExecutor::execute()                                  │
16. │ QueryPullPipeEx                                              │
17. │ query                                                        │
18. │ TCPHandler                                                   │
    └──────────────────────────────────────────────────────────────┘
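
With these IDs the outputs above can be cross-referenced directly. A minimal sketch of how that could look, assuming the id/plan_step columns shown above plus the pre-existing elapsed_us and query_id columns of processors_profile_log (the query_id value is a placeholder):

    -- total processor time per plan step for one query
    SELECT
        plan_step,
        sum(elapsed_us) AS total_elapsed_us,
        groupArray(id)  AS processor_ids   -- e.g. ['ExpressionTransform_2', ...]
    FROM system.processors_profile_log
    WHERE query_id = '<query_id>'
    GROUP BY plan_step
    ORDER BY total_elapsed_us DESC;

The plan_step values (e.g. Expression_19) match the step names printed by explain, so a slow processor can be traced back to its step in the plan.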

Documentation entry for user-facing changes

  • Documentation is written (mandatory for new features)

Information about CI checks: https://clickhouse.com/docs/en/development/continuous-integration/

Modify your CI run

NOTE: If you merge the PR with modified CI, you MUST KNOW what you are doing
NOTE: Checked options will be applied if set before CI RunConfig/PrepareRunConfig step

Include tests (required builds will be added automatically):

  • Fast test
  • Integration Tests
  • Stateless tests
  • Stateful tests
  • Unit tests
  • Performance tests
  • All with ASAN
  • All with TSAN
  • All with Analyzer
  • All with Azure
  • Add your option here

Exclude tests:

  • Fast test
  • Integration Tests
  • Stateless tests
  • Stateful tests
  • Performance tests
  • All with ASAN
  • All with TSAN
  • All with MSAN
  • All with UBSAN
  • All with Coverage
  • All with Aarch64
  • Add your option here

Extra options:

  • do not test (only style check)
  • disable merge-commit (no merge from master before tests)
  • disable CI cache (job reuse)

Only specified batches in multi-batch jobs:

  • 1
  • 2
  • 3
  • 4

@CLAassistant

CLAassistant commented May 8, 2024

CLA assistant check
All committers have signed the CLA.

@nickitat nickitat self-assigned this May 8, 2024
@robot-ch-test-poll1 robot-ch-test-poll1 added the pr-improvement Pull request with some product improvements label May 8, 2024
@robot-ch-test-poll1
Contributor

robot-ch-test-poll1 commented May 8, 2024

This is an automated comment for commit ee5d22c with a description of existing statuses. It's updated for the latest CI run.

❌ Click here to open a full report in a separate page

Check name | Description | Status
CI running | A meta-check that indicates the running CI. Normally, it's in success or pending state. The failed status indicates some problems with the PR | ⏳ pending
ClickHouse build check | Builds ClickHouse in various configurations for use in further steps. You have to fix the builds that fail. Build logs often have enough information to fix the error, but you might have to reproduce the failure locally. The cmake options can be found in the build log, grepping for cmake. Use these options and follow the general build process | ❌ failure
Mergeable Check | Checks if all other necessary checks are successful | ❌ failure
Successful checks
Check name | Description | Status
A Sync | There's no description for the check yet, please add it to tests/ci/ci_config.py:CHECK_DESCRIPTIONS | ✅ success
Docs check | Builds and tests the documentation | ✅ success
Fast test | Normally this is the first check run for a PR. It builds ClickHouse and runs most of the stateless functional tests, omitting some. If it fails, further checks are not started until it is fixed. Look at the report to see which tests fail, then reproduce the failure locally as described here | ✅ success
PR Check | There's no description for the check yet, please add it to tests/ci/ci_config.py:CHECK_DESCRIPTIONS | ✅ success
Style check | Runs a set of checks to keep the code style clean. If some of the tests fail, see the related log from the report | ✅ success

@nickitat nickitat added the can be tested Allows running workflows for external contributors label May 8, 2024
@qhsong
Author

qhsong commented May 8, 2024

I'm not sure whether this idea works for ClickHouse; if it does, I will add more test cases for it.

@nickitat
Member

nickitat commented May 8, 2024

For explain plan and pipeline I don't think there is a lot of confusion, since they already contain formatting that displays the hierarchy. Also, we have a lot of tests that check the plan or pipeline specifically; they will all break.
Speaking of processors_profile_log — fully agree.
Maybe let's implement it only for processors_profile_log and opentelemetry_span_log?

@qhsong
Author

qhsong commented May 9, 2024

For explain plan and pipeline I don't think there is a lot of confusion, since they already contain formatting that displays the hierarchy. Also, we have a lot of tests that check the plan or pipeline specifically; they will all break. Speaking of processors_profile_log — fully agree. Maybe let's implement it only for processors_profile_log and opentelemetry_span_log?

In fact, I believe the explain plan plays a crucial role in this PR. When we use processors_profile_log or opentelemetry_span_log to identify a specific processor or step with slow execution, how else can we determine the corresponding details for that step? Therefore, I think it's essential.

I have observed that it breaks some stateless test cases. I believe it's worthwhile to update the test case content; it's not hard to change.

@UnamedRus
Contributor

for explain plan and pipeline

I actually think it makes more sense to add them (the IDs) to the JSON output of those statements, which is a more suitable format for consumption by programs, and introducing a new field is simpler there.

@qhsong
Author

qhsong commented May 10, 2024

for explain plan and pipeline

I actually think it makes more sense to add them (the IDs) to the JSON output of those statements, which is a more suitable format for consumption by programs, and introducing a new field is simpler there.

I will also add a field to the explain json output. I will fix it later.

@qhsong
Author

qhsong commented May 15, 2024

Summary of this feature:

  • Add "Node Id" in explain json=1
            {
              "Node Type": "Expression",
              "Node Id": "Expression_22",
              "Plans": [
                {
                  "Node Type": "ReadFromMergeTree",
                  "Node Id": "ReadFromMergeTree_2",
                  "Description": "default.t1"
                }
              ]
            }
  • Add step id and processor id in explain pipeline graph=1
    (screenshot, 2024-05-15: explain pipeline graph=1 output showing step and processor ids)

  • Add processor_uniq_id and step_uniq_id in processors_profile_log

  • Change Processor_id in opentelemetry_span_log
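
For reference, the "Node Id" fragment above comes from explain with json=1. A hypothetical statement of that form (the exact query that produced the fragment is not shown here; default.t1 is just the table from the earlier example, and FORMAT TSVRaw only keeps the JSON output unescaped):

    EXPLAIN json = 1
    SELECT * FROM default.t1
    FORMAT TSVRaw;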

nickitat previously approved these changes May 15, 2024

Review comment threads:
src/Interpreters/ProcessorsProfileLog.cpp (outdated)
src/Interpreters/ProcessorsProfileLog.h (outdated)
src/Interpreters/executeQuery.cpp (outdated)
src/QueryPipeline/QueryPipelineBuilder.cpp
src/Processors/QueryPlan/QueryPlan.h (outdated)
@@ -1336,7 +1336,14 @@ class Context: public ContextData, public std::enable_shared_from_this<Context>
std::shared_ptr<Clusters> getClustersImpl(std::lock_guard<std::mutex> & lock) const;

/// Throttling

size_t step_count = 0;
Member

let's not put it inside Context. e.g. it could be a static data member of IQueryPlanStep. the same for IProcessor

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Putting it in the Context makes the explain result more stable: every time we explain the same query we get the same IDs.
If we used static data, we could not get a stable result, so I put it in the Context.

Member

I think we'd better have unstable results than pollute the Context with these random counters.

Author

For the stateless test case 01786_explain_merge_tree, if the IDs are not stable, the result will not be fixed. So how do we fix this case? Just disable the json output?

Member

just disable json output

I guess it makes no difference to the test logic what output format we use

Author

I'm just worried about a background thread calling the plan. I will remove the json output.

Author

I think we'd better have unstable results than pollute the Context with these random counters.

Recently I've come to think a stable result is an important feature. If we worry about polluting the Context, how about putting a pointer to an int in CurrentThread::ThreadStatus? That would not pollute the Context and would still give a stable index.

Member

Recently I've come to think a stable result is an important feature

what value exactly do you see in it?

Author

It makes debugging easier.
When I find that some processors are slow, we can run an explain query to identify their steps; with a static counter that is hard to do when the query is complex.

If you think this is not important, I will change it to a static counter.
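
For illustration, the debugging workflow I have in mind would look roughly like this (processor_uniq_id and step_uniq_id are the columns proposed in this PR, and the query_id value is a placeholder):

    -- 1. find the slowest processors of a finished query
    SELECT processor_uniq_id, step_uniq_id, elapsed_us
    FROM system.processors_profile_log
    WHERE query_id = '<query_id>'
    ORDER BY elapsed_us DESC
    LIMIT 5;

    -- 2. run EXPLAIN on the same query text; with a per-query counter the step
    --    names (e.g. Join_6) are stable, so they match the step_uniq_id values above
    EXPLAIN SELECT * FROM t1 AS t JOIN t1 AS t2 ON t.a = t2.a WHERE t.a = 1;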
