Skip to content

Improve memory usage and cache overlaps for system.jemalloc_profile_text#99121

Merged
antonio2368 merged 4 commits intomasterfrom
jemalloc-profile-text
Mar 10, 2026
Merged

Improve memory usage and cache overlaps for system.jemalloc_profile_text#99121
antonio2368 merged 4 commits intomasterfrom
jemalloc-profile-text

Conversation

@antonio2368
Copy link
Copy Markdown
Member

@antonio2368 antonio2368 commented Mar 9, 2026

Summary

  • Reduce memory usage in JemallocProfileSource by storing cached symbol vectors as shared_ptr<const vector<string>> instead of copying on every cache hit.
  • Include the symbolize_with_inline flag in the cache key so that queries with different inline settings don't silently reuse each other's results.
  • Extract common resolveAddress helper to deduplicate symbolization code between generateSymbolized and generateCollapsed.
  • Re-read the profile file for heap lines in symbolized mode instead of storing all lines in memory.
  • Extract parseStackAddresses helper to deduplicate address parsing logic.

Test plan

  • Updated 03925_jemalloc_profile_system_table.sh to verify no duplicate lines in collapsed format using count() = uniqExact(line) with max_block_size = 1
  • All three output formats (raw, symbolized, collapsed) pass locally

Changelog category (leave one):

  • Improvement

Changelog entry (a user-readable short description of the changes that goes into CHANGELOG.md):

Reduce memory usage and fix potential duplicate output in system.jemalloc_profile_text collapsed format.

🤖 Generated with Claude Code


Note

Medium Risk
Touches profiling output generation and caching/streaming state; mistakes could change symbolization results or produce incomplete/duplicated profile text under chunked reads.

Overview
Reduces memory usage and improves determinism in system.jemalloc_profile_text processing by refactoring JemallocProfileSource symbolization and parsing logic.

Symbolization now uses an LRU keyed by (address, symbolize_with_inline) and stores cached frames as shared_ptr via a new resolveAddress() helper, avoiding vector copies and cross-setting cache pollution. Symbolized mode no longer buffers heap lines in memory; it re-reads the profile file when streaming the heap section, and address collection is deduplicated via parseStackAddresses() with sorted output.

Collapsed mode now aggregates once into a persisted CollapsedState and streams directly from the map, fixing a bug that could re-aggregate and emit duplicate rows; the test asserts uniqueness with max_block_size=1.

Written by Cursor Bugbot for commit fb39b3a. This will update automatically on new commits. Configure here.

antonio2368 and others added 3 commits March 9, 2026 15:46
Two bugs in the symbolization cache caused cross-mode cache hits to
produce wrong output:

1. Inconsistent frame storage order: symbolized mode cached frames in
   callback order (inline-first), while collapsed mode reversed frames
   before caching (main-first). Cross-mode cache hits silently produced
   wrong inline ordering or double-reversed stacks.

2. Missing `symbolize_with_inline` in cache key: a query with
   `symbolize_with_inline=false` cached a 1-element vector, and a later
   `=true` query silently reused it, dropping all inline frames.

Fix by:

- Changing the cache key to `(address, symbolize_with_inline)` so
  different inline settings get separate cache entries.
- Using `shared_ptr<const vector<string>>` as the value type to avoid
  copying the symbol vector on every cache hit.
- Introducing a single `resolveAddress` helper as the sole cache writer,
  always storing frames in callback order. Both symbolized and collapsed
  modes now reverse only at output time.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Eliminate `profile_lines` vector: instead of storing all file lines in
  memory during the address-collection pass, re-read the file in the
  heap-output phase. This halves peak memory for large profiles.

- Eliminate `collapsed_lines` vector: stream collapsed output directly
  from the `stack_to_metric` map via an iterator, wrapped in a
  `CollapsedState` struct. The map is released once streaming completes.

- Use `WriteBufferFromString` for collapsed stack assembly instead of
  quadratic `operator+=` loop.

- Extract `parseStackAddresses` helper to deduplicate address parsing
  between `collectAddresses` and `generateCollapsed`.

- Sort collected addresses for deterministic symbolized output.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@clickhouse-gh
Copy link
Copy Markdown
Contributor

clickhouse-gh bot commented Mar 9, 2026

Workflow [PR], commit [fb39b3a5]

Summary:

job_name test_name status info comment
Stress test (amd_ubsan) failure
Logical error: 'assertHasValidVersionMetadata()' (STID: 2508-2d5e) FAIL cidb
Upgrade check (amd_release) failure
Error message in clickhouse-server.log (see upgrade_error_messages.txt) FAIL cidb

@clickhouse-gh clickhouse-gh bot added the pr-improvement Pull request with some product improvements label Mar 9, 2026
Copy link
Copy Markdown

@cursor cursor bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cursor Bugbot has reviewed your changes and found 1 potential issue.

Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, have a team admin enable autofix in the Cursor dashboard.

…state

Remove the `aggregated` flag and instead make `CollapsedState` non-copyable
and non-movable to prevent accidental iterator invalidation. Ensure
`collapsed_state.reset()` is called on all completion paths to free memory
promptly.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Copy link
Copy Markdown
Member

@azat azat left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good

Fix a bug in generateCollapsed where collapsed_state.reset() without setting is_finished caused re-aggregation and duplicate output rows.

What was the problem? Wasn't it some intermediate version?

Comment on lines +546 to +547
if (state.iter == state.stack_to_metric.end())
is_finished = true;
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need this second is_finished = true?

Copy link
Copy Markdown
Member Author

@antonio2368 antonio2368 Mar 10, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We know it's last chunk so why not set it instantly to true?

@clickhouse-gh
Copy link
Copy Markdown
Contributor

clickhouse-gh bot commented Mar 10, 2026

LLVM Coverage Report

Metric Baseline Current Δ
Lines 83.80% 83.80% +0.00%
Functions 23.90% 23.90% +0.00%
Branches 76.40% 76.30% -0.10%

PR changed lines: PR changed-lines coverage: 13.50% (22/163)
Diff coverage report
Uncovered code

@azat azat changed the title Improve memory usage and fix duplicate output in JemallocProfileSource Improve memory usage and cache overlaps for system.jemalloc_profile_text Mar 10, 2026
@antonio2368 antonio2368 added this pull request to the merge queue Mar 10, 2026
Merged via the queue into master with commit cbbb056 Mar 10, 2026
160 of 163 checks passed
@antonio2368 antonio2368 deleted the jemalloc-profile-text branch March 10, 2026 12:49
@robot-ch-test-poll2 robot-ch-test-poll2 added the pr-synced-to-cloud The PR is synced to the cloud repo label Mar 10, 2026
@robot-ch-test-poll3 robot-ch-test-poll3 added the pr-must-backport-synced The `*-must-backport` labels are synced into the cloud Sync PR label Mar 13, 2026
robot-clickhouse-ci-2 added a commit that referenced this pull request Mar 13, 2026
Cherry pick #99121 to 26.2: Improve memory usage and cache overlaps for `system.jemalloc_profile_text`
robot-clickhouse added a commit that referenced this pull request Mar 13, 2026
clickhouse-gh bot added a commit that referenced this pull request Mar 13, 2026
Backport #99121 to 26.2: Improve memory usage and cache overlaps for `system.jemalloc_profile_text`
@robot-clickhouse robot-clickhouse added the pr-backports-created Backport PRs are successfully created, it won't be processed by CI script anymore label Mar 23, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

pr-backports-created Backport PRs are successfully created, it won't be processed by CI script anymore pr-improvement Pull request with some product improvements pr-must-backport-synced The `*-must-backport` labels are synced into the cloud Sync PR pr-synced-to-cloud The PR is synced to the cloud repo v25.12-must-backport v26.2-must-backport

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants