Merged
8 changes: 3 additions & 5 deletions .github/workflows/ci.yml
@@ -39,12 +39,10 @@ jobs:
run: python3 -m unittest discover -s tests -v

- name: Run the validators on this repo (self-application)
# Corpus-agnostic checks: this repo ships templates, not a DEC corpus or
# a topic taxonomy, so topic-tags/frontmatter-schema do not apply here.
# --checks all: this repo has no DEC corpus or topic taxonomy, so
# topic-tags reports SKIPPED (green) and the DEC checks pass vacuously.
run: |
python3 templates/consistency-validators/validators.py \
--root . \
--checks counter-atomicity,principle-count-coherence,entity-count-coherence,band-unit,llm-ci-cost
python3 templates/consistency-validators/validators.py --root . --checks all

- name: Secrets scan (gitleaks)
# Working-tree scan; fails the job on any finding. Pinned image for
8 changes: 4 additions & 4 deletions README.md
@@ -2,7 +2,7 @@

> CI guardrail templates, validators, and tooling for SliceOps™ adopters.

**Status: public · v0.1.0.** Companion to [sliceops-spec](https://github.com/SliceOps/spec). Licensed under the [MIT License](LICENSE) (ratified 2026-06-15, `DR-2026-06-15-sliceops-license-ratification`).
**Status: public · v0.1.1.** Companion to [sliceops-spec](https://github.com/SliceOps/spec). Licensed under the [MIT License](LICENSE) (ratified 2026-06-15, `DR-2026-06-15-sliceops-license-ratification`).

## What's here

@@ -11,8 +11,8 @@
| `templates/ci-guardrails/` | **Layer B.2 CI/Pipeline Cost Economy** reference templates (5 levers) — bootstrap defaults materializing P9 Shared-Resource Pre-flight |
| `templates/llm-ci-economy/` | **Layer B.2 sub-domain LLM-Inference-Cost-Economy** — workflow demonstrating prompt-caching, model-tier, diff-only context, trigger-set minimalism LLM-aware, and green-not-skipped draft gate |
| `templates/cost-ledger/` | **Layer B.1** cost-ledger template with three dimensions: token (billed-equivalent), infra/CI, and LLM-API-in-CI (P9) |
| `templates/consistency-validators/` | **Layer B.1 Layer 3** consistency validators — workflow and deterministic `validators.py` (cross-references-bidirectional, no-orphan-decs, frontmatter-schema, topic-tags, counter-atomicity) |
| `calibration/` | **Layer B.1 Calibration discipline** — deterministic `calibrate.py` (stdlib) parses session `.jsonl`, then percentiles, then bands; `band-calibration-register.md` is the append-only audit trail (v1 baseline 2026-06-15) |
| `templates/consistency-validators/` | **Layer B.1 Layer 3** consistency validators — workflow + deterministic `validators.py` (9 checks: frontmatter-schema, no-orphan-decs, cross-references-bidirectional, topic-tags, counter-atomicity, principle/entity-count-coherence, band-unit, llm-ci-cost). Stdlib-only; uses PyYAML automatically when installed |
| `calibration/` | **Layer B.1 Calibration discipline** — deterministic `calibrate.py` (stdlib) parses session `.jsonl` → percentiles (clamped to the observed range) → **canonical** + data-driven **observed** bands; `band-calibration-register.md` is the append-only audit trail |

## Use it

@@ -39,7 +39,7 @@ python3 calibration/calibrate.py --root path/to/session-jsonl/ --label my-baseli

**4. Track cost** with the three-dimension [`templates/cost-ledger/`](templates/cost-ledger/) template (token billed-equivalent + infra/CI + LLM-API-in-CI).

> Design posture: these are **reference templates you adapt**, not a black-box dependency — bind `--root` and the conventions to your layout, swap the stdlib frontmatter parser for a real YAML one if you prefer, and so on.
> Design posture: these are **reference templates you adapt**, not a black-box dependency — bind `--root` and the conventions to your layout. The validator is stdlib-only but **uses PyYAML automatically when it's installed** (robust parsing), falling back to a documented YAML subset otherwise; path checks are OS-agnostic (Windows/Linux); an unconfigured `--topic-taxonomy` reports `SKIPPED` (green), a *configured-but-missing* one is a hard error.

## Roadmap (pending)

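A minimal sketch of the two behaviours the README's design-posture note describes; it is not the code in `validators.py`. It shows an optional PyYAML import with a stdlib fallback, and the three-way outcome for `--topic-taxonomy`. The names `parse_frontmatter` and `topic_tags_status`, the flat `key: value` fallback subset, and the `"RUN"` and `"ERROR"` labels are assumptions made for illustration; only `SKIPPED` comes from the README text.

```python
import os
from typing import Optional

try:  # PyYAML is optional: use it when present, otherwise fall back.
    import yaml
except ImportError:
    yaml = None


def parse_frontmatter(text: str) -> dict:
    """Hypothetical parser: PyYAML if installed, else flat `key: value` lines."""
    if yaml is not None:
        return yaml.safe_load(text) or {}
    data = {}
    for line in text.splitlines():
        # Assumed subset: one scalar per line, '#' starts a comment line.
        if ":" in line and not line.lstrip().startswith("#"):
            key, _, value = line.partition(":")
            data[key.strip()] = value.strip()
    return data


def topic_tags_status(taxonomy_path: Optional[str]) -> str:
    """Hypothetical three-way outcome for the topic-tags check."""
    if taxonomy_path is None:
        return "SKIPPED"   # not configured: reported, but the job stays green
    if not os.path.isfile(taxonomy_path):
        return "ERROR"     # configured but missing: fail hard
    return "RUN"           # configured and present: run the check
```

Keeping "not configured" distinct from "configured but missing" means a typo in the taxonomy path cannot silently turn a real check into a skipped one.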
82 changes: 64 additions & 18 deletions calibration/calibrate.py
@@ -79,24 +79,63 @@ def session_metrics(path):
return peak_context, int(billed_eq), net_new, turns


MIN_SAMPLE = 8 # below this, percentiles are advisory only (flagged in output)


def percentiles(values, points=(25, 50, 75, 90, 95)):
"""Percentiles clamped to the observed range, using the *inclusive* method
(interpolates within [min, max]). The default 'exclusive' method assumes the
data is a sample of a larger population and can return a percentile ABOVE the
largest observed value (or below the smallest) on small corpora — misleading
in an audit. Inclusive + clamp guarantees min <= pN <= max; n==1 is handled."""
if not values:
return {p: 0 for p in points}
qs = quantiles(values, n=100)
return {p: int(qs[p - 1]) for p in points}


def propose_bands(p_context, p_billed):
"""Anchor proposed bands to canonical breakpoints (model windows and spec)."""
context_bands = [
("XS", "<32K"), ("S", "32-128K"), ("M", "128-200K"),
("L", "200-512K"), ("XL", ">512K"),
vals = sorted(values)
lo, hi = vals[0], vals[-1]
if len(vals) == 1:
return {p: int(lo) for p in points}
qs = quantiles(vals, n=100, method="inclusive")
return {p: int(min(max(qs[p - 1], lo), hi)) for p in points}


def canonical_bands():
"""The FIXED canonical breakpoints from the spec (model windows + baseline).
These are the reference; calibration compares the observed distribution
against them — it does not move them automatically. Renamed from the former
`propose_bands`, which misleadingly accepted percentiles it never used."""
return {
"context-band": [
("XS", "<32K"), ("S", "32-128K"), ("M", "128-200K"),
("L", "200-512K"), ("XL", ">512K"),
],
"token-band": [
("XS", "<2M"), ("S", "2-5M"), ("M", "5-10M"),
("L", "10-20M"), ("XL", ">20M"),
],
}


def _fmt(n):
if n >= 1_000_000:
return f"{n / 1_000_000:.1f}M"
if n >= 1_000:
return f"{round(n / 1000)}K"
return str(int(n))


def observed_bands(p):
"""Data-driven band edges derived from THIS corpus's percentiles
(p25/p50/p75/p90) — the actual proposal. Compare against canonical_bands()
to decide whether the canon needs recalibration. Percentiles are genuinely
used here (the former function ignored them)."""
e = [p[25], p[50], p[75], p[90]]
return [
("XS", f"<{_fmt(e[0])}"),
("S", f"{_fmt(e[0])}-{_fmt(e[1])}"),
("M", f"{_fmt(e[1])}-{_fmt(e[2])}"),
("L", f"{_fmt(e[2])}-{_fmt(e[3])}"),
("XL", f">{_fmt(e[3])}"),
]
token_bands = [
("XS", "<2M"), ("S", "2-5M"), ("M", "5-10M"),
("L", "10-20M"), ("XL", ">20M"),
]
return {"context-band": context_bands, "token-band": token_bands}


def main():
@@ -125,6 +164,10 @@ def main():
print(f"::error::no sessions found under {args.root}", file=sys.stderr)
sys.exit(2)

if n_sessions < MIN_SAMPLE:
print(f"::warning::small sample (N={n_sessions} < {MIN_SAMPLE}) — "
f"percentiles are advisory and clamped to the observed range.\n")

p_ctx = percentiles(contexts)
p_billed = percentiles(billeds)
p_netnew = percentiles(netnews)
@@ -145,10 +188,13 @@ def main():
for p, v in p_netnew.items():
print(f" p{p:>2}: {v:>12}")
print()
bands = propose_bands(p_ctx, p_billed)
print("Proposed bands (anchored to canonical breakpoints and observed distribution):")
for axis, bs in bands.items():
print(f" {axis}: {', '.join(f'{n}{r}' for n, r in bs)}")
print("Canonical bands (spec reference — fixed):")
for axis, bs in canonical_bands().items():
print(f" {axis}: {', '.join(f'{n} {r}' for n, r in bs)}")
print()
print("Observed bands (data-driven from this corpus's p25/p50/p75/p90):")
print(f" context-band: {', '.join(f'{n} {r}' for n, r in observed_bands(p_ctx))}")
print(f" token-band: {', '.join(f'{n} {r}' for n, r in observed_bands(p_billed))}")
print()
print("Register one-line summary (copy into band-calibration-register.md):")
print(
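The `percentiles()` docstring in this diff says the default exclusive method can return a value above the largest observation on a small corpus. A short check of that claim with `statistics.quantiles`, using a three-value sample that is illustrative only and not taken from the PR:

```python
from statistics import quantiles

sample = [10, 20, 30]  # illustrative values, not from the PR

exclusive = quantiles(sample, n=100)                      # default method="exclusive"
inclusive = quantiles(sample, n=100, method="inclusive")  # method used by the new code

# Index p - 1 holds the p-th percentile, matching qs[p - 1] in percentiles().
p95_exclusive = exclusive[94]
p95_inclusive = inclusive[94]

print(p95_exclusive > max(sample))                          # True: extrapolates past the max
print(min(sample) <= p95_inclusive <= max(sample))          # True: stays inside the range
```

With the inclusive method the result already lies within `[min, max]`, so the explicit `min(max(...))` clamp in the new `percentiles()` acts as a safety net, not as the main mechanism.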