Show temporary data on disk usage in clickhouse-client progress bar #105190
The progress bar displayed by `clickhouse-client` shows real-time CPU and memory usage. This adds the current amount of temporary data on disk (used by external sort, aggregation, JOIN, etc.) next to RAM. A new `TemporaryDataOnDiskUsage` gauge profile event is emitted by the server for each query, reading the current compressed size from the per-query `TemporaryDataOnDiskScope`. The client accumulates per-host values and renders them as `NN.MM GB disk` after `CPU` and `RAM`. The new field is shown only when it is non-zero, so unaffected queries look exactly the same as before. Example output for an external aggregation: `(0.8 CPU, 148.04 MB RAM, 3.28 GB disk)`
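The rendering rule described above (append the disk figure only when it is non-zero) can be sketched as follows. This is a minimal illustration, not the client's actual code: `format_size` and `render_usage` are hypothetical helper names, and the decimal (1000-based) units with two decimals are an assumption chosen to match the example output.

```python
def format_size(num_bytes: int) -> str:
    """Format a byte count with decimal (1000-based) suffixes and two decimals.

    Assumption: decimal units are used because they reproduce the sample output.
    """
    value = float(num_bytes)
    for unit in ("B", "KB", "MB", "GB"):
        if value < 1000:
            return f"{value:.2f} {unit}"
        value /= 1000
    return f"{value:.2f} TB"


def render_usage(cpu_cores: float, ram_bytes: int, disk_bytes: int) -> str:
    """Build the parenthesised resource summary shown next to the progress bar."""
    parts = [f"{cpu_cores:.1f} CPU", f"{format_size(ram_bytes)} RAM"]
    # The disk field is appended only when the query has spilled to disk,
    # so queries without temporary data render exactly as before.
    if disk_bytes > 0:
        parts.append(f"{format_size(disk_bytes)} disk")
    return "(" + ", ".join(parts) + ")"


print(render_usage(0.8, 148_040_000, 3_280_000_000))
print(render_usage(0.8, 148_040_000, 0))
```

With a disk value of zero the helper returns only the CPU and RAM fields, which is the "looks exactly the same as before" behaviour.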
For distributed queries, append `NN.MM GB max/host` after the temporary data on disk total whenever it differs from the total (i.e., the data spans multiple hosts), mirroring the existing `max/host` annotation for `RAM`. Single-host queries are unaffected since the per-host maximum equals the total.
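The per-host logic can be sketched in the same spirit. The `DiskUsageTracker` class below is hypothetical and only models the behaviour described here; it assumes that, because the event is a gauge, each new value from a host replaces that host's previous value instead of being added to it.

```python
from typing import Dict


class DiskUsageTracker:
    """Hypothetical model of per-host temporary-data tracking on the client side."""

    def __init__(self) -> None:
        # Latest gauge value (bytes) reported by each host.
        self._latest: Dict[str, int] = {}

    def update(self, host: str, usage_bytes: int) -> None:
        # Assumption: a gauge reports the current value, so overwrite, don't sum.
        self._latest[host] = usage_bytes

    def render(self) -> str:
        total = sum(self._latest.values())
        if total == 0:
            return ""  # nothing spilled to disk: no extra field
        text = f"{total / 1e9:.2f} GB disk"
        peak = max(self._latest.values())
        # The max/host annotation appears only when it differs from the total,
        # i.e. when more than one host holds temporary data.
        if peak != total:
            text += f", {peak / 1e9:.2f} GB max/host"
        return text


tracker = DiskUsageTracker()
tracker.update("host1", 1_000_000_000)
tracker.update("host1", 3_000_000_000)  # newer gauge value replaces the old one
tracker.update("host2", 5_000_000_000)
print(tracker.render())
```

A tracker that has seen only one host never prints `max/host`, because the per-host maximum equals the total.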
Workflow [PR], commit [18d0a90] Summary: ✅
AI Review
Summary: This PR adds a new per-query gauge (
Final Verdict
Status: ✅ Approve
@groeneai, investigate the failure: |
@alexey-milovidov, I looked into it. Summary: I can't read the failure log (private repo), and the test in question does not exist in the OSS repo; it's a Cloud-only integration test. To proceed, I need the stack trace pasted here or the failure escalated to the CH-Inc Cloud team. Investigation:
What I need to make progress: the actual error/stack trace from that private job, since I can't open the URL. If the failure is in OSS code (engine, parser, formats, storage), it would also be reproducible publicly with appropriate setup, and I can take it. If it's in - @groeneai (session
LLVM Coverage Report
Changed lines: 94.64% (53/56) | lost baseline coverage: 3 line(s)
The progress bar in `clickhouse-client` already displays real-time CPU and memory (RAM) usage. This adds the current amount of temporary data on disk (used by external sort, aggregation, JOIN, etc.) next to `RAM`, shown only when non-zero.

The server emits a new `TemporaryDataOnDiskUsage` gauge profile event with the current compressed size of the per-query `TemporaryDataOnDiskScope`. The client accumulates per-host values and renders them as `NN.MM GB disk` after `CPU` and `RAM`. For distributed queries with uneven per-host pressure, `NN.MM GB max/host` is appended after the disk total, mirroring the existing `max/host` annotation for `RAM`.

Example output for an external aggregation on a single host:
`(0.8 CPU, 148.04 MB RAM, 3.28 GB disk)`

Example output for a distributed query with uneven per-host disk pressure:
`(0.8 CPU, 4.0 GB RAM, 2.0 GB max/host, 8.0 GB disk, 4.0 GB max/host)`

Queries that don't spill to disk look exactly the same as before.
Changelog category (leave one):
Changelog entry (a user-readable short description of the changes that goes into CHANGELOG.md):
The progress bar in `clickhouse-client` now shows temporary data on disk usage (e.g. for external sort, aggregation, or JOIN) next to RAM, including a per-host breakdown for distributed queries.

Documentation entry for user-facing changes