Add CGroupMemoryUsedWithoutPageCache async metric and clarify CGroupMemoryUsed description#101513
Conversation
…emoryUsed description Add a new `CGroupMemoryUsedWithoutPageCache` async metric that subtracts the ClickHouse userspace page cache from `CGroupMemoryUsed`, mirroring what `MemoryResidentWithoutPageCache` does for RSS (added in ClickHouse#81233). Also clarify the `CGroupMemoryUsed` description: it previously said "excluding page cache" without specifying which page cache. It now explicitly states that the kernel page cache (OS-level file cache) is excluded because `memory.stat` does not account for the `file` field, while the ClickHouse userspace page cache is NOT excluded - that is what the new metric is for. Closes ClickHouse/support-escalation#7289 Co-authored-by: Claude <noreply@anthropic.com>
|
Workflow [PR], commit [0de40db] Summary: ✅ AI ReviewSummaryThis PR adds Missing context
ClickHouse Rules
Final Verdict
|
|
|
||
| UInt64 cgroup_page_cache_bytes = 0; | ||
| if (context && context->getPageCache()) | ||
| cgroup_page_cache_bytes = context->getPageCache()->sizeInBytes(); |
There was a problem hiding this comment.
We already have two metrics *WithoutPageCache can we expose size of user-space page cache in asynchronous metrics instead?
There was a problem hiding this comment.
we do expose the size of user-space page cache already but is not usable in clickhouse-scraper
because clickhouse-scraper can only fetch metrics from table and aggregate over 1 minute. so we end up with
max_over_1m(CGroupUsedMemory) - max_over_1m(page_cache) which is not the same as max_over_1m(CgroupUsedMemory-page_cache)
We already tried this before but we failed and that's why we ended up creating MemoryResidentWithoutPageCache in #81233
we already had this conversation in https://github.com/ClickHouse/data-plane-application/pull/19933 which lead to exposing a raw metric that already took the WithoutPageCache into account
There was a problem hiding this comment.
Sorry, maybe I wasn't clear, let me clarify.
Right now we expose PageCacheBytes only in system.metrics, I'm suggesting to expose it also in system.asynchronous_metric that way you will get it in one place and can do what ever arithmetic your need.
There was a problem hiding this comment.
ah , yeah i misunderstood. 👍
indeed right now we do graph the page cache usage from metrics and the max page cache from async metrics
I'm suggesting to expose it also in system.asynchronous_metric that way you will get it in one place and can do what ever arithmetic your need.
if you think is best sure, 💯
do you want to do it or should i throw my claude at it ?
Side note , i just noticed with use the page cache also in a warning about memory usage
There was a problem hiding this comment.
can you check the current version ? i have a feeling you meant something slighlty different because i don't really need to expose it as async metric as well as standard metric this way ...
Add `MemoryUserSpacePageCache` async metric that exposes the ClickHouse userspace page cache size in `system.asynchronous_metrics`, alongside the existing `MemoryResident`, `MemoryResidentWithoutPageCache`, `CGroupMemoryUsed` and `CGroupMemoryUsedWithoutPageCache` metrics. Previously this value was only available in `system.metrics` as `PageCacheBytes`. Having it in `system.asynchronous_metrics` means operators can do arbitrary arithmetic directly in one place, e.g.: CGroupMemoryUsed - MemoryUserSpacePageCache MemoryResident - MemoryUserSpacePageCache The value is computed from the already-fetched `context->getPageCache()->sizeInBytes()` local variable, so no extra call is made. Co-authored-by: Claude <noreply@anthropic.com>
…ved metrics `MemoryUserSpacePageCache` is now populated first (single call to `context->getPageCache()->sizeInBytes()`), and both `MemoryResidentWithoutPageCache` and `CGroupMemoryUsedWithoutPageCache` read back that value instead of calling `getPageCache()->sizeInBytes()` independently. Co-authored-by: Claude <noreply@anthropic.com>
…s test
- Reword `CGroupMemoryUsed` description to attribute the page cache
exclusion to ClickHouse's own field selection from `memory.stat`
(anonymous memory, socket buffers, non-reclaimable kernel memory),
rather than implying it is a kernel accounting guarantee.
- Add stateless test 04075 that:
- Verifies `MemoryUserSpacePageCache` is always present in
`system.asynchronous_metrics`
- When cgroup metrics are available, asserts the invariant
`CGroupMemoryUsedWithoutPageCache <= CGroupMemoryUsed`
Co-authored-by: Claude <noreply@anthropic.com>
Replace the WHERE-based filter that silently produced no output when CGroupMemoryUsedWithoutPageCache was missing with explicit if()-based assertions that always produce deterministic output. The existence check and invariant check now properly fail when cgroup metrics are available but the expected metric is absent, while gracefully passing in environments without cgroup support. Co-authored-by: Claude <noreply@anthropic.com>
…description Remove the intermediate MemoryUserSpacePageCache async metric — it is not needed since page_cache_bytes is already in scope. CGroupMemoryUsedWithoutPageCache now reads page_cache_bytes directly. Simplify the CGroupMemoryUsed description to say it excludes the kernel OS page cache, without claiming specifics about cgroup v1/v2 memory.stat field accounting. Update the stateless test accordingly. Co-authored-by: Claude <noreply@anthropic.com>
page_cache_bytes is local to an earlier #if block and not visible in the cgroup metrics section. Read the value from context->getPageCache() directly instead. Co-authored-by: Claude <noreply@anthropic.com>
|
The Stress test (arm_msan) failure is fixed by #101239, which should be merged first. After it is merged, please update the branch to include the fix. |
The if() false-branch used literal `true` (Bool) while the true-branch returned UInt8 from comparisons. ClickHouse promotes both to Bool, printing `true` instead of `1`, causing the reference file mismatch. Co-authored-by: Claude <noreply@anthropic.com>
| (SELECT value FROM system.asynchronous_metrics WHERE metric = 'CGroupMemoryUsedWithoutPageCache') | ||
| <= (SELECT value FROM system.asynchronous_metrics WHERE metric = 'CGroupMemoryUsed'), |
There was a problem hiding this comment.
This will be flaky, you cannot read metric two times
There was a problem hiding this comment.
WITH
(SELECT groupArray((metric, value)) FROM system.asynchronous_metrics
WHERE metric IN ('CGroupMemoryUsed', 'CGroupMemoryUsedWithoutPageCache')) AS metrics,
arrayFirst(x -> x.1 = 'CGroupMemoryUsed', metrics) AS used,
arrayFirst(x -> x.1 = 'CGroupMemoryUsedWithoutPageCache', metrics) AS without_cache
SELECT
if(used.2 > 0, without_cache.2 > 0, 1),
if(used.2 > 0, without_cache.2 <= used.2, 1);There was a problem hiding this comment.
thank you ! updating now with your fix
…kiness Read `CGroupMemoryUsed` and `CGroupMemoryUsedWithoutPageCache` in one `groupArray` query instead of multiple subqueries against `system.asynchronous_metrics`. The metrics can be updated between separate reads, making the comparison racy. ClickHouse#101513 Co-authored-by: Claude <noreply@anthropic.com>
…then test Make `CGroupMemoryUsed` description explicitly document cgroup v1 vs v2 differences (RSS vs anon+sock+kernel). Simplify `CGroupMemoryUsedWithoutPageCache` description to accurately state it subtracts the userspace page cache from `CGroupMemoryUsed`. Strengthen `04075_memory_userspace_pagecache_metrics` test with explicit metric-existence assertions using `countIf` instead of relying on `arrayFirst` default values, which could mask registration/rename regressions. Co-authored-by: Claude <noreply@anthropic.com>
| -- Both metrics must be either both present or both absent. | ||
| has_used = has_without_cache, | ||
| -- When present, without_cache must be positive and <= used. | ||
| if(has_used = 1, without_cache.2 > 0, 1), |
There was a problem hiding this comment.
The assertion if(has_used = 1, without_cache.2 > 0, 1) is too strict for this metric.
CGroupMemoryUsedWithoutPageCache is computed as max(0, CGroupMemoryUsed - userspace_page_cache_bytes), so 0 is a valid value (for example, when userspace page cache fully covers cgroup usage, or when usage is zero). This can make the test fail even when the metric is present and correct.
Please remove the strict > 0 check and keep the presence + without_cache <= used invariant checks.
- Replace countIf (aggregate, no lambda support) with arrayCount. - Remove the without_cache > 0 check: the metric is max(0, CGroupMemoryUsed - page_cache_bytes) so 0 is valid. - Keep only presence and without_cache <= used invariants. Co-authored-by: Claude <noreply@anthropic.com>
|
The |
…userspacepagecache
|
The MSan stress test failure (MemorySanitizer: use-of-uninitialized-value, STID 4179-5154 or 4148-3044) is a known pre-existing issue unrelated to this PR. Fix: #102158 |
|
weird, now i am getting permission denied on filesystem errors according to claude but i will see if i can somehow get the whole pr green |
|
The |
LLVM Coverage Report
Changed lines: 100.00% (20/20) · Uncovered code |
CGroupMemoryUsedis the preferred metric for memory accounting in cgroup environments (as noted in support escalation #7289), where autoscaling and memory overload warnings are moving away fromMemoryResident.However, when the ClickHouse userspace page cache is enabled,
CGroupMemoryUsedincludes that cache's memory in its value — just likeMemoryResidentdoes. PR #81233 addressed this for the RSS-based path by introducingMemoryResidentWithoutPageCache. This PR adds the equivalent for the cgroup path.Additionally, the existing
CGroupMemoryUseddescription said"(excluding page cache)"without specifying which page cache — it actually excludes only the kernel OS page cache, not the ClickHouse userspace page cache. This was confusing.Changes
CGroupMemoryUseddescription to explicitly state it excludes the kernel OS page cache.CGroupMemoryUsedWithoutPageCacheasync metric:max(0, CGroupMemoryUsed - page_cache_bytes)CGroupMemoryUsedMemoryResidentWithoutPageCachein add MemoryResidentWithoutPageCache #81233CGroupMemoryUsedWithoutPageCachepresence and invariant (<= CGroupMemoryUsed).Related: #81233 (added
MemoryResidentWithoutPageCache)Related: #100901 (improves
CGroupMemoryUsedcalculation by subtractingslab_reclaimable)Related: https://github.com/ClickHouse/support-escalation/issues/7289
Changelog category (leave one):
Changelog entry (a user-readable short description of the changes that goes into CHANGELOG.md):
Added
CGroupMemoryUsedWithoutPageCacheasync metric that reports cgroup memory usage excluding both the kernel OS page cache and the ClickHouse userspace page cache, mirroringMemoryResidentWithoutPageCache. Also clarified theCGroupMemoryUsedmetric description.Documentation entry for user-facing changes
Co-authored-by: Claude noreply@anthropic.com