Skip to content

Releases: Kryst4lskyxx/clickhouse-debug-skills

v0.2.0 — large-fleet debugging hardening

18 Jun 07:53
b01e88a

Choose a tag to compare

Harden the skill against friction surfaced by a real large-fleet, chproxy-fronted investigation: fleet-scale resource caps, silent metric NAs, transient connectivity, shell-state loss across agent Bash calls, and config contamination from the wrapper's own caps. Scripts gain self-diagnosing behavior; the references gain the two recipes that incident most wanted.

Added

  • chq.sh cap-trip hints — names the exact knob to raise (CH_MAX_ROWS/BYTES/TIME/EST_TIME/MEM) when a safety cap aborts a query, with the clusterAllReplicas fan-out reminder.
  • Transient-failure retry in both scripts (curl exit 6/7 no longer reads as an outage).
  • promq.sh empty-result + error detection — distinguishes 0 series (wrong/absent metric) from a real 0, surfaces Prometheus in-band errors, and no longer crashes jq on empty/non-JSON bodies.
  • Fleet-aware cap recipe in SKILL.md (discover node count → narrow window → scale caps).
  • Per-node iowait + disk-straggler recipe in cluster-state.md (avg by/max by (instance), latency-profile proxy).
  • Per-node latency-profile query (p50/p99/p999 by hostName()) in query-state.md.
  • Shell-persistence guidance + gitignored .chenv pattern.

Changed

  • Cap-contamination warning across SKILL.md, query-state.md, and the chq.sh header (read system.settings, not query_log.Settings of probes).
  • README helper descriptions updated; dropped the stale readonly=1 mention.

Full changelog: https://github.com/Kryst4lskyxx/clickhouse-debug-skills/blob/main/CHANGELOG.md

v0.1.2

18 Jun 06:06
d47d1c9

Choose a tag to compare

Integrate the Altinity companion skills and deepen the source-confirmation step. Documentation/methodology only — no script behavior changes.

Added

  • Three-tier companion model. Alongside clickhouse-best-practices (Fix canon), diagnosis now routes into altinity-expert-clickhouse-* (per-domain system.* specialists, with a symptom→specialist routing table and a "caps don't travel with borrowed SQL" safety note) and altinity-profiler-clickhouse (per-cluster <cluster>-analyst schema map for the Frame stage).
  • references/source-map.md — the Confirm-stage playbook for navigating the matched ClickHouse source tree: version-match check, error-code → throw-site grep patterns, metric/setting/system.*-column lookup locations, crash-mechanism pointers, and worked examples for the codes the skill names.
  • Install instructions for both Altinity suites in SKILL.md and README.md; normalized the best-practices install path to clickhouse/agent-skills.

Full changelog: https://github.com/Kryst4lskyxx/clickhouse-debug-skills/blob/main/CHANGELOG.md

v0.1.1

18 Jun 03:38
a516d6a

Choose a tag to compare

Wrapper fixes from a real chproxy-fronted production debugging session, plus docs for the proxy/fleet realities it surfaced. No methodology changes.

Fixed

  • chq.sh passed settings in the POST body, parsed as raw SQL → Code: 62 ... Syntax error at position 29 (&). Settings now ride in the URL via curl -G.
  • readonly=1 was sent unconditionally, rejected by read-only accounts → Code: 164 ... Cannot modify 'readonly' setting in readonly mode. Now opt-in via CH_READONLY.
  • result_overflow_mode=break collided with the query cacheCode: 731. Wrapper now always sends use_query_cache=0.
  • Arg parsing assumed a TTY ([ -t 0 ]), ignoring the SQL arg under the Bash tool → Code: 62 ... Empty query. Now prefers $1/-f, falls back to stdin.

Added

  • Heavy-read override path + clusterAllReplicas() fan-out caution (Code: 307).
  • "Single node vs. proxy-fronted fleet" guidance (clusterAllReplicas() + hostName()).
  • Metric-name discovery via __name__ regex instead of dumping the label catalog.

Full changelog: https://github.com/Kryst4lskyxx/clickhouse-debug-skills/blob/main/CHANGELOG.md

v0.1.0 — clickhouse-debug skill

17 Jun 10:26

Choose a tag to compare

Initial public release of the clickhouse-debug agent skill.

Diagnose live ClickHouse cluster and query incidents from a version-matched source checkout: Prometheus triage → read-only system.* forensics → source-confirmed root cause, with every probe resource-capped so it can't OOM a node. Fix-stage routing delegates to the official clickhouse-best-practices skill.

Install

# npx skills (Claude Code, Cursor, Copilot…)
npx skills add Kryst4lskyxx/clickhouse-debug-skills

# Claude Code plugin marketplace
/plugin marketplace add Kryst4lskyxx/clickhouse-debug-skills
/plugin install clickhouse-debug@clickhouse-debug-skills

See CHANGELOG.md for full details.