infra: aggressive ClickHouse idle-baseline tuning (HOL-24)#106

Merged: BrewingCoder merged 1 commit into main from issue-24-clickhouse-tune on May 9, 2026

@BrewingCoder
Owner

Summary

Drops ClickHouse idle memory from ~1.14 GiB to ~600 MiB warm / ~350 MiB cold, thread count from 746 to ~80, and disk usage from 6.8 GB to 1.4 GB on the dev volume.

What changed (in infra/docker/config.xml)

  • max_thread_pool_size 10000 → 128 (the big lever — default 10000 was producing 700+ idle threads on a 16-core box)
  • background_pool_size 16 → 14 (kept ratio×size ≥ 28, where ratio is background_merges_mutations_concurrency_ratio and size is background_pool_size, to clear MergeTree sanity checks like number_of_free_entries_in_pool_to_execute_optimize_entire_partition=25)
  • Other background pools cut to 4 each (or 16 for background_schedule_pool_size, which needs headroom or AsyncLoader stalls during startup with our 25+ default-DB tables)
  • max_concurrent_queries 100 → 20
  • asynchronous_metrics_update_period_s 1 → 30
  • All system *_log tables disabled via remove="remove" (text_log was hoarding 5.5 GB on disk; query_log/trace_log/etc were chatty for no operator benefit on a self-hosted single-tenant deploy)
  • listen_host 0.0.0.0 added (silences a noisy IPv6 listen warning)
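
The pool-sizing constraint in the list above can be checked mechanically before a restart. The following is a minimal sketch, not the actual infra/docker/config.xml: the XML fragment is reconstructed from the bullets, the `<clickhouse>` root and the concurrency ratio of 2 are assumptions (2 is believed to be the ClickHouse default and gives 2 × 14 = 28), and `check_merge_pool` and `disabled_logs` are hypothetical helper names.

```python
import xml.etree.ElementTree as ET

# Illustrative fragment rebuilt from the bullet list; NOT the real config file.
# The ratio value of 2 is an assumption, chosen so that 2 * 14 = 28.
CONFIG_SKETCH = """
<clickhouse>
    <listen_host>0.0.0.0</listen_host>
    <max_thread_pool_size>128</max_thread_pool_size>
    <max_concurrent_queries>20</max_concurrent_queries>
    <asynchronous_metrics_update_period_s>30</asynchronous_metrics_update_period_s>
    <background_pool_size>14</background_pool_size>
    <background_merges_mutations_concurrency_ratio>2</background_merges_mutations_concurrency_ratio>
    <background_schedule_pool_size>16</background_schedule_pool_size>
    <text_log remove="remove"/>
    <query_log remove="remove"/>
    <trace_log remove="remove"/>
</clickhouse>
"""

# Threshold quoted in the PR description for
# number_of_free_entries_in_pool_to_execute_optimize_entire_partition.
FREE_ENTRIES_FOR_OPTIMIZE_PARTITION = 25


def check_merge_pool(config_xml: str,
                     threshold: int = FREE_ENTRIES_FOR_OPTIMIZE_PARTITION) -> int:
    """Return ratio * size, raising ValueError if it is below the threshold."""
    root = ET.fromstring(config_xml)
    size = int(root.findtext("background_pool_size"))
    ratio = int(root.findtext("background_merges_mutations_concurrency_ratio"))
    slots = ratio * size
    if slots < threshold:
        raise ValueError(f"{ratio} * {size} = {slots} is below {threshold}")
    return slots


def disabled_logs(config_xml: str) -> list[str]:
    """List the top-level elements marked with remove="remove"."""
    root = ET.fromstring(config_xml)
    return sorted(el.tag for el in root if el.get("remove") == "remove")


if __name__ == "__main__":
    print(check_merge_pool(CONFIG_SKETCH))
    print(disabled_logs(CONFIG_SKETCH))
```

With the values above the product is 28, which clears the threshold of 25. Dropping background_pool_size to 12 at the same ratio would give 24 and fail the check.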

Floor analysis

ClickHouse binary alone occupies ~580 MiB in shared library mappings + code segments (MemoryShared 309 + MemoryCode 272). Working heap rounds out to ~620–700 MiB on a warm idle box. Going lower would require either a custom-compiled minimal CH build or a different storage engine entirely — that's the open question being explored in HOL-25 onward (Postgres-migration EPIC).

Operational note

The on-disk system-log data must be wiped manually on existing volumes (those volumes pre-date the config disable). The 5.4 GiB of text_log / opentelemetry_span_log / etc data is reclaimed by deleting metadata/system/*_log.sql + their store/ UUIDs. New deployments don't need this step — the disable takes effect at first start.
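
The manual wipe described above could be scripted roughly as follows. This is a dry-run sketch under stated assumptions, not the procedure actually used: it assumes each metadata/system/*_log.sql file contains a `UUID '<uuid>'` clause and that the table data lives under store/<first three characters of the UUID>/<UUID>/. The function names are hypothetical, and it is presumably only safe to run while the server is stopped.

```python
import re
import shutil
from pathlib import Path

# Matches the table UUID in a metadata file, e.g.
#   ATTACH TABLE _ UUID '0123abcd-0000-4000-8000-000000000001'
UUID_RE = re.compile(r"UUID\s+'([0-9a-fA-F-]{36})'")


def plan_system_log_cleanup(data_dir: Path) -> list[Path]:
    """Collect the *_log.sql metadata files and their store/ directories."""
    targets: list[Path] = []
    for sql_file in sorted((data_dir / "metadata" / "system").glob("*_log.sql")):
        targets.append(sql_file)
        match = UUID_RE.search(sql_file.read_text())
        if match:
            uuid = match.group(1)
            # Assumed layout: store/<first three chars of UUID>/<UUID>/
            store_dir = data_dir / "store" / uuid[:3] / uuid
            if store_dir.is_dir():
                targets.append(store_dir)
    return targets


def wipe_system_logs(data_dir: Path, dry_run: bool = True) -> list[Path]:
    """Print the targets, and delete them only when dry_run is False."""
    targets = plan_system_log_cleanup(data_dir)
    for path in targets:
        if dry_run:
            print(f"would delete {path}")
        elif path.is_dir():
            shutil.rmtree(path)
        else:
            path.unlink()
    return targets
```

Calling `wipe_system_logs(Path("/var/lib/clickhouse"))` (the path is an assumption about the volume mount) only prints what would be removed. Passing `dry_run=False` performs the deletion, so review the dry-run output first.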

Test plan

  • CH starts cleanly, accepts connections, all 86 application tables load
  • Backend reconnects, /health returns Healthy
  • Idle stack memory: backend 84 MiB / postgres 26 MiB / clickhouse 595 MiB warm = 705 MiB total (down from 815 MiB)
  • Smoke ingest + query still works end-to-end

Closes HOL-24.

🤖 Generated with Claude Code

Drops ClickHouse idle memory from ~1.14 GiB to ~600 MiB (warm) / ~350 MiB
(cold) and thread count from 746 to ~80, and disk usage from 6.8 GB to
1.4 GB on the dev volume.

Specifically:
- max_thread_pool_size 10000 -> 128 (this is the big one — default
  10000 means CH allocated 700+ threads on a 16-core idle box)
- background_pool_size 16 -> 14 (kept ratio*size >= 28 to clear
  MergeTree sanity checks like
  number_of_free_entries_in_pool_to_execute_optimize_entire_partition=25)
- Other background pools cut to 4 each (or 16 for schedule_pool, which
  needs headroom or AsyncLoader stalls during startup with our 25+
  default-DB tables)
- max_concurrent_queries 100 -> 20
- asynchronous_metrics_update_period_s 1 -> 30
- All system *_log tables disabled via remove="remove" (text_log was
  hoarding 5.5 GB on disk; query_log/trace_log/etc were chatty for no
  operator benefit on a self-hosted single-tenant deploy)
- listen_host 0.0.0.0 added (silences a noisy IPv6 listen warning)

Floor analysis: CH binary alone occupies ~580 MiB in shared library
mappings + code segments (MemoryShared 309 + MemoryCode 272). Working
heap rounds out to ~620-700 MiB on a warm idle box. Going lower
would require either a custom-compiled minimal CH build or a different
storage engine entirely.

Note: the on-disk system log data must be wiped manually for existing
volumes (these volumes pre-date the config disable). The 5.4 GiB of
text_log/opentelemetry_span_log/etc data is reclaimed by deleting
metadata/system/*_log.sql + their store/ UUIDs.

Refs HOL-24.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@BrewingCoder BrewingCoder merged commit 5fc03b7 into main May 9, 2026
4 checks passed
@BrewingCoder BrewingCoder deleted the issue-24-clickhouse-tune branch May 9, 2026 15:59