Release v0.2.0 — large-fleet debugging hardening · Kryst4lskyxx/clickhouse-debug-skills

Harden the skill against friction surfaced by a real large-fleet, chproxy-fronted investigation: fleet-scale resource caps, silent metric NAs, transient connectivity, shell-state loss across agent Bash calls, and config contamination from the wrapper's own caps. Scripts gain self-diagnosing behavior; the references gain the two recipes that incident most wanted.

Added

chq.sh cap-trip hints — names the exact knob to raise (CH_MAX_ROWS/BYTES/TIME/EST_TIME/MEM) when a safety cap aborts a query, with the clusterAllReplicas fan-out reminder.
Transient-failure retry in both scripts (curl exit 6/7 no longer reads as an outage).
promq.sh empty-result + error detection — distinguishes 0 series (wrong/absent metric) from a real 0, surfaces Prometheus in-band errors, and no longer crashes jq on empty/non-JSON bodies.
Fleet-aware cap recipe in SKILL.md (discover node count → narrow window → scale caps).
Per-node iowait + disk-straggler recipe in cluster-state.md (avg by/max by (instance), latency-profile proxy).
Per-node latency-profile query (p50/p99/p999 by hostName()) in query-state.md.
Shell-persistence guidance + gitignored .chenv pattern.

Changed

Cap-contamination warning across SKILL.md, query-state.md, and the chq.sh header (read system.settings, not query_log.Settings of probes).
README helper descriptions updated; dropped the stale readonly=1 mention.

Full changelog: https://github.com/Kryst4lskyxx/clickhouse-debug-skills/blob/main/CHANGELOG.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

v0.2.0 — large-fleet debugging hardening

Choose a tag to compare

Sorry, something went wrong.

Sorry, something went wrong.

Uh oh!

No results found

Added

Changed

Uh oh!