Company or project name
No response
Describe what's wrong
We observed gradual memory accumulation in ClickHouse (v25.3.6.56) instances over 4-5 days while running distributed queries with GLOBAL IN clauses, eventually leading to “Memory limit exceeded” errors during query execution.
Creds: Default
Code: 241. DB::Exception: Received from chi-clickhouse-chcluster1-0-0:9000. DB::Exception: (total) memory limit exceeded: would use 5.43 GiB (attempt to allocate chunk of 0.00 B bytes), current RSS: 11.57 GiB, maximum: 10.80 GiB. OvercommitTracker decision: Query was selected to stop by OvercommitTracker: While executing AggregatingTransform. (MEMORY_LIMIT_EXCEEDED)
container_memory_working_set_bytes is plotted here for clickhouse server pod over last 10 days
Analysis
We checked the cache sizes while memory usage was high, and they were too small to account for the growth.
After jemalloc profiling and code analysis, we identified that temporary tables created by these queries are not fully cleaned up in the DatabaseMemory engine when dropped.
Specifically, the DatabaseMemory::dropTable method does not remove entries from snapshot_detached_tables, causing metadata for dropped tables to remain in memory indefinitely.
This leads to continuous memory growth proportional to the number of temporary tables created, eventually causing “Memory limit exceeded” errors for all queries.
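To illustrate the suspected mechanism, here is a minimal toy model in Python. It is only a sketch: the class `ToyDatabaseMemory` and the function `simulate` are hypothetical, only the name `snapshot_detached_tables` comes from the analysis above, and the point at which an entry is recorded is an assumption, not ClickHouse's actual C++ logic.

```python
# Toy model only: NOT ClickHouse's real implementation.
class ToyDatabaseMemory:
    def __init__(self):
        self.tables = {}                     # live tables by name
        self.snapshot_detached_tables = {}   # per-table metadata snapshots

    def create_table(self, name, metadata):
        self.tables[name] = metadata

    def drop_table(self, name, cleanup_snapshot):
        metadata = self.tables.pop(name)
        # Assumption: the drop path records a snapshot entry for the table.
        self.snapshot_detached_tables[name] = metadata
        if cleanup_snapshot:
            # The cleanup step that the report says is missing.
            del self.snapshot_detached_tables[name]


def simulate(num_temp_tables, cleanup_snapshot):
    """Create and drop temporary tables; return how many snapshot entries remain."""
    db = ToyDatabaseMemory()
    for i in range(num_temp_tables):
        name = f"_tmp_{i}"
        db.create_table(name, {"columns": ["user_id"]})
        db.drop_table(name, cleanup_snapshot)
    return len(db.snapshot_detached_tables)
```

In this model, `simulate(n, cleanup_snapshot=False)` leaves one entry per dropped table, which matches the growth pattern described above, while `cleanup_snapshot=True` leaves none.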
The number of GLOBAL IN queries started in our setup over one hour
Does it reproduce on the most recent release?
Yes
How to reproduce
ClickHouse Version: 25.3.6.56
- Run frequent distributed queries using GLOBAL IN (triggering temporary tables via Memory engine).
- Observe that memory usage of the ClickHouse server grows continuously over time.
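The steps above could be scripted roughly as follows. This is a sketch under assumptions: the table names `events_dist` and `users_dist`, the column `user_id`, and the URL `http://localhost:8123/` are placeholders for a real Distributed setup and the server's HTTP interface, and none of them come from our environment.

```python
import urllib.request


def build_global_in_query(fact_table, dim_table):
    """Build a distributed query whose GLOBAL IN subquery needs a temporary table."""
    return (
        f"SELECT count() FROM {fact_table} "
        f"WHERE user_id GLOBAL IN (SELECT user_id FROM {dim_table})"
    )


def run_query(sql, url="http://localhost:8123/"):
    """Send one query over the ClickHouse HTTP interface and return the raw text."""
    request = urllib.request.Request(url, data=sql.encode("utf-8"), method="POST")
    with urllib.request.urlopen(request) as response:
        return response.read().decode("utf-8").strip()


def reproduce(iterations, url="http://localhost:8123/"):
    """Run the GLOBAL IN query repeatedly and print tracked memory now and then."""
    sql = build_global_in_query("events_dist", "users_dist")  # hypothetical tables
    memory_sql = "SELECT value FROM system.metrics WHERE metric = 'MemoryTracking'"
    for i in range(iterations):
        run_query(sql, url)
        if i % 1000 == 0:
            print(i, run_query(memory_sql, url))
```

Calling `reproduce(100000)` against a test cluster and watching the printed values, alongside the pod's container_memory_working_set_bytes, should show whether memory keeps climbing under a steady query rate.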
Expected behavior
- Memory usage remains stable over time under steady query load.
- The server should not hit memory limits or show continuous growth during normal operation.
Error message and/or stacktrace
Attaching heap profile call graph generated by SYSTEM JEMALLOC FLUSH PROFILE and jeprof
my_current_profile.1.3.m3.heap_result.pdf
Additional context
No response