Add /processors-profile web UI for per-query pipeline heatmap by nikitamikhaylov · Pull Request #104614 · ClickHouse/ClickHouse

nikitamikhaylov · 2026-05-11T17:08:09Z

Inspired by: https://github.com/nickitat/fake_explain_analyze

Changelog category (leave one):

Improvement

Changelog entry (a user-readable short description of the changes that goes into CHANGELOG.md):

Added a new built-in web UI at /processors-profile that visualizes the pipeline of any past SELECT query as a heatmap, sourced from system.query_log and system.processors_profile_log. Each processor is colored by its elapsed_us and shows per-processor stats (rows, bytes, wait times) on hover.

Documentation entry for user-facing changes

Documentation is written (mandatory for new features)

Serves a built-in HTML page at /processors_profile (registered next to /jemalloc, same auth model: user/password URL params, inherits CORS from existing WebUIRequestHandler). The page lets the user pick a recent SELECT from system.query_log via an editable WHERE clause, then renders the query's pipeline as a Graphviz DOT SVG (via @viz-js/viz from CDN) with per-processor heatmap colors derived from sum(elapsed_us) in system.processors_profile_log. Hovering a node shows a tooltip with name, step, elapsed/wait times and input/output rows and bytes joined by processor_uniq_id. Inspired by: https://raw.githubusercontent.com/nickitat/fake_explain_analyze/refs/heads/master/explain.py Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

clickhouse-gh · 2026-05-11T17:09:23Z

Workflow [PR], commit [505be98]

Summary: ❌

job_name	test_name	status	info	comment
Finish Workflow		FAIL
	python3 ./ci/jobs/scripts/workflow_hooks/feature_docs.py	FAIL		IGNORED

AI Review

Summary

This PR adds a new /processors-profile Web UI for rendering per-processor heatmaps from system.query_log and system.processors_profile_log, and also switches /play graph rendering to embedded Viz.js. I found one new major issue and two still-live prior issues (including one blocker), so this is not ready to merge yet.

Findings

❌ Blockers

[programs/server/js/viz-standalone.js:1] / [src/Server/WebUIRequestHandler.cpp:51] The PR commits and embeds a ~1.4 MiB prebuilt third-party bundle (viz-standalone.js, including Graphviz object code). This violates the repository rule against committing large/binary dependency artifacts and permanently bloats git history.
Suggested fix: do not commit the generated bundle; instead integrate a reproducible build/download step in CI/tests or serve a source-built/minimal artifact generated during build.

⚠️ Majors

[programs/server/processors_profile.html:615] The query picker stores only query_id, and both fetchQueryText and fetchProfileData filter only by query_id = {qid:String}. query_id is not guaranteed to identify a single execution (re-use is possible, and cluster mode can return multiple hosts), so the page can show a graph/profile for the wrong execution.
Suggested fix: carry a stable execution key from the picker (for example host + event_time/event_time_microseconds + initial_query_id/query_id) and use that full key in follow-up reads.
[src/Server/HTTPHandlerFactory.cpp:396] /processors-profile is wired only in addCommonDefaultHandlersFactory; there is no corresponding handler.type branch in createHandlersFactoryFromConfig. Deployments with custom http_handlers cannot configure this endpoint, unlike other Web UI pages.
Suggested fix: add a dedicated config handler type (for example processors_profile) mapped to ProcessorsProfileWebUIRequestHandler in the config-driven handler switch.

Final Verdict

Status: ❌ Block
Minimum required actions:

Remove the committed large prebuilt viz-standalone.js artifact approach and replace it with a compliant delivery path.
Disambiguate profiling target selection so follow-up reads identify one concrete query execution, not just query_id.
Add config-driven handler support for /processors-profile in http_handlers.

So the page sees every replica's local log table and any rotated query_log_N / processors_profile_log_N tables on Cloud / sharded setups. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

A `Cluster: on/off` button in the header flips the page between `system.*_log` (single-node default) and `clusterAllReplicas('default', merge('system', '^*_log'))` for both `query_log` and `processors_profile_log`. The preference is persisted in `localStorage` and applies to the dropdown list, the query-text lookup, and the profile-data join. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

The previous `'${queryId.replace(/'/g, "\\'")}'` interpolation only escaped single quotes — a backslash in the input would slip through and end the SQL string literal early, allowing injection. Both `query_id` lookups (fetchQueryText and fetchProfileData) now bind the value as a parameterized `{qid:String}`, so the server treats it as a literal of the declared type and any quote/backslash/etc payload stays inert. `runQuery` gained a `params` object argument that's appended to the request URL as `param_<name>=…`, mirroring ClickHouse's standard query parameter mechanism. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

alexey-milovidov

Rename to processors-profile as these paths are more convenient for URLs.

…web-ui

Per review feedback: hyphenated paths are more URL-friendly. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Per review feedback: the page must work in air-gapped or restricted environments and should not depend on a third-party CDN at runtime. The `@viz-js/viz@3.7.0` standalone bundle is now embedded into the binary via `#embed` and served from `/js/viz-standalone.js`, alongside the other built-in JS assets handled by `JavaScriptWebUIRequestHandler`. `processors_profile.html` now loads the local copy.

clickhouse-gh · 2026-05-17T12:49:04Z

 };
+constexpr unsigned char resource_viz_standalone_js[] =
+{
+#embed "../../programs/server/js/viz-standalone.js"


Embedding programs/server/js/viz-standalone.js pulls a ~1.38 MiB prebuilt dependency bundle (including Graphviz object code) into the repository.

That permanently inflates git history and violates the no-large/fat-binary-in-git rule. Please avoid committing this generated bundle and instead generate/package it from source during build/release (or fetch it in packaging/CI) so the artifact is not stored in-tree.

This is tolerable. Please proceed.

This issue still holds in the current head.

programs/server/js/viz-standalone.js is a ~1.38 MiB prebuilt/minified bundle that includes third-party object code, so we are still checking a fat generated artifact into git. That conflicts with the repo rule against large/binary blobs in-tree, and it now also causes CI failure (Style check -> codespell on this file).

Please keep this asset out of the repository (generate/fetch it in build/release packaging, or vendor the source and build it as part of tooling) so we avoid permanent history bloat and style-check noise.

This is still present in the current head: programs/server/js/viz-standalone.js is a ~1.38 MiB prebuilt third-party bundle (including Graphviz object code) checked directly into the repository and embedded into the server binary. That remains a fat generated dependency artifact in git, with permanent history/binary-size cost.

Please switch to a source-based or build-time fetched artifact flow instead of committing the bundled output file.

I re-checked current head and this is still unresolved.

programs/server/js/viz-standalone.js remains a ~1.38 MiB prebuilt third-party bundle (including embedded Graphviz object code), checked directly into git and then embedded into the server binary via #embed.

The previous reply does not change the underlying invariant issue: this permanently increases repository history with a generated fat artifact instead of source-managed assets/build-time generation.

alexey-milovidov · 2026-05-17T18:39:17Z

This was fixed by #105146. Let's update the branch.

clickhouse-gh · 2026-05-17T18:59:34Z

+            }
+            picker.innerHTML = rows.map(r => {
+                const label = `${r.event_time}  ·  ${r.user}  ·  ${formatDuration(r.query_duration_ms)}  ·  ${r.query_id}  ·  ${r.query}`;
+                return `<option value="${escapeHtml(r.query_id)}">${escapeHtml(label)}</option>`;


query_id alone is not a stable identifier for a single query execution. It can be reused (for example via explicit query_id settings), and in cluster mode the same query_id can appear on multiple hosts/executions.

Because the picker stores only query_id in <option value=...>, subsequent calls (fetchQueryText / fetchProfileData) can pick up rows from a different execution and merge unrelated processor stats. That breaks the page contract of showing the selected query's pipeline.

Please carry a per-execution key from the picker (for example hostname + query_id + event_time_microseconds, or another unique tuple) and filter both follow-up queries by that key (or a precise time window) instead of query_id only.

Not important, please disregard the comment above.

The ambiguity is still present in the current code.

The picker stores only query_id (<option value=...>), then both follow-up queries use only WHERE query_id = {qid:String} (fetchQueryText and fetchProfileData). If query_id is reused (manual SET query_id, retries, or cross-host duplicates in cluster mode), the selected row can resolve to another execution and blend unrelated processor stats.

Please carry an execution key from the picker (for example query_id + event_time_microseconds + hostname) and filter both follow-up reads by that key instead of query_id alone.

This is still live in the current head. The picker stores only query_id as the option value, and both follow-up queries still filter only WHERE query_id = {qid:String}. That conflates distinct executions when query_id is reused (explicit SET query_id, retries, or collisions across hosts in cluster mode), and duplicate options with the same value are not independently selectable.

Please key selection and profile lookup by a tuple that uniquely identifies one log row (for example event_time_microseconds + host + query_id, or a stable row id), and use that same tuple in both query-text and profile-data fetches.

I re-checked current head and this concern is still live.

The picker still stores only query_id as <option value=...>, and both follow-up reads still filter only WHERE query_id = {qid:String} in fetchQueryText and fetchProfileData.

The prior "not important" reply does not address the correctness issue: when query_id is reused (explicit query_id settings, retries, or multi-host contexts), the selected row in the UI is not a stable key for one execution, so the rendered text/heatmap can describe a different execution than the one the user picked.

alexey-milovidov · 2026-05-17T19:55:51Z

Also, please replace the usage of Dagre in the Web UI (play.html) with Viz as well.

The `EXPLAIN GRAPH` renderer in the Play web UI now uses the same vendored `viz-standalone.js` bundle as `/processors_profile`, instead of loading `dagre`, `graphlib-dot`, `dagre-d3`, and `d3` from external CDNs. This removes the last runtime CDN dependency for diagram rendering, so the page works in air-gapped or restricted environments. Also skip `viz-standalone.js` in the codespell typo check — the minified bundle contains the literal token `ofO` which codespell flags as a typo of `of`. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Overlapping `profileQuery` calls (auto-select on `loadQueries`, picker change) could complete out of order, letting a slower earlier call overwrite the graph with stale data for a no-longer-selected query. Each call now captures a monotonically increasing generation token on entry and bails after every `await` if a newer call has been started. `renderDot` accepts the same predicate so the actual SVG mutation is also skipped once the call is stale. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

clickhouse-gh · 2026-05-19T02:44:46Z

    factory.addPathToHints("/jemalloc");
    factory.addHandler(jemalloc_handler);

+    auto processors_profile_handler = std::make_shared<HandlingRuleHTTPHandlerFactory<ProcessorsProfileWebUIRequestHandler>>(server);


/processors-profile is currently added only in addCommonDefaultHandlersFactory, but createHandlersFactoryFromConfig has no matching handler.type branch (unlike play, dashboard, binary, merges, and jemalloc).

That means any deployment that sets http_handlers cannot configure this endpoint at all, so the feature disappears outside default handlers. Please add a dedicated config handler type (for example processors_profile) in the config-driven switch and wire it to ProcessorsProfileWebUIRequestHandler.

clickhouse-gh · 2026-05-19T05:24:36Z

LLVM Coverage Report

Metric	Baseline	Current	Δ
Lines	84.20%	84.10%	-0.10%
Functions	91.40%	91.40%	+0.00%
Branches	76.60%	76.60%	+0.00%

Changed lines: 80.00% (16/20) · Uncovered code

Full report · Diff report

clickhouse-gh Bot added the pr-feature Pull request with new product feature label May 11, 2026

nikitamikhaylov and others added 2 commits May 11, 2026 19:12

Query *_log via clusterAllReplicas(default, merge(system, '^...'))

0f63523

So the page sees every replica's local log table and any rotated query_log_N / processors_profile_log_N tables on Cloud / sharded setups. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

clickhouse-gh Bot reviewed May 11, 2026

View reviewed changes

Comment thread programs/server/processors_profile.html Outdated

clickhouse-gh Bot reviewed May 11, 2026

View reviewed changes

Comment thread programs/server/processors_profile.html Outdated

clickhouse-gh Bot reviewed May 11, 2026

View reviewed changes

Comment thread programs/server/processors_profile.html Outdated

alexey-milovidov reviewed May 11, 2026

View reviewed changes

alexey-milovidov self-assigned this May 11, 2026

alexey-milovidov changed the title ~~Add /processors_profile web UI for per-query pipeline heatmap~~ Add /processors-profile web UI for per-query pipeline heatmap May 12, 2026

alexey-milovidov and others added 3 commits May 17, 2026 12:39

Merge remote-tracking branch 'origin/master' into processors-profile-…

e3ee333

…web-ui

Rename URL to /processors-profile

a7757b9

Per review feedback: hyphenated paths are more URL-friendly. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

clickhouse-gh Bot reviewed May 17, 2026

View reviewed changes

alexey-milovidov mentioned this pull request May 17, 2026

Stop the bleeding in function_prop_fuzzer #105146

Merged

1 task

Merge branch 'master' into processors-profile-web-ui

29da2d9

clickhouse-gh Bot reviewed May 17, 2026

View reviewed changes

alexey-milovidov approved these changes May 18, 2026

View reviewed changes

clickhouse-gh Bot reviewed May 18, 2026

View reviewed changes

Comment thread programs/server/processors_profile.html

clickhouse-gh Bot reviewed May 19, 2026

View reviewed changes

nikitamikhaylov added this pull request to the merge queue May 19, 2026

Merged via the queue into master with commit d3861de May 19, 2026
323 of 326 checks passed

nikitamikhaylov deleted the processors-profile-web-ui branch May 19, 2026 11:45

robot-clickhouse-ci-1 added the pr-synced-to-cloud The PR is synced to the cloud repo label May 19, 2026

Conversation

nikitamikhaylov commented May 11, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Changelog category (leave one):

Changelog entry (a user-readable short description of the changes that goes into CHANGELOG.md):

Documentation entry for user-facing changes

Uh oh!

clickhouse-gh Bot commented May 11, 2026 • edited by nikitamikhaylov Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

AI Review

Summary

Findings

❌ Blockers

⚠️ Majors

Final Verdict

Uh oh!

Uh oh!

Uh oh!

Uh oh!

alexey-milovidov left a comment

Choose a reason for hiding this comment

Uh oh!

clickhouse-gh Bot May 17, 2026

Choose a reason for hiding this comment

Uh oh!

alexey-milovidov May 17, 2026

Choose a reason for hiding this comment

Uh oh!

clickhouse-gh Bot May 17, 2026

Choose a reason for hiding this comment

Uh oh!

clickhouse-gh Bot May 18, 2026

Choose a reason for hiding this comment

Uh oh!

clickhouse-gh Bot May 18, 2026

Choose a reason for hiding this comment

Uh oh!

alexey-milovidov commented May 17, 2026

Uh oh!

clickhouse-gh Bot May 17, 2026

Choose a reason for hiding this comment

Uh oh!

alexey-milovidov May 17, 2026

Choose a reason for hiding this comment

Uh oh!

clickhouse-gh Bot May 17, 2026

Choose a reason for hiding this comment

Uh oh!

clickhouse-gh Bot May 18, 2026

Choose a reason for hiding this comment

Uh oh!

clickhouse-gh Bot May 18, 2026

Choose a reason for hiding this comment

Uh oh!

alexey-milovidov commented May 17, 2026

Uh oh!

Uh oh!

clickhouse-gh Bot May 19, 2026

Choose a reason for hiding this comment

Uh oh!

clickhouse-gh Bot commented May 19, 2026

LLVM Coverage Report

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

nikitamikhaylov commented May 11, 2026 •

edited

Loading

clickhouse-gh Bot commented May 11, 2026 •

edited by nikitamikhaylov

Loading