A maintainability ratchet for AI-assisted Python.
riskratchet computes a function-level risk score from coverage gaps, cyclomatic complexity, churn, public surface, and sprawl signals. Snapshot the current state as a baseline, then fail CI or block the commit whenever risk grows. The bar can only move down, never up.
The review workflow is directly inspired by two practical review surfaces:
cargo-crap, a Rust tool that made
the CRAP metric practical in CI with threshold gates, baseline deltas,
GitHub annotations, sticky PR comments, suppressions, missing-coverage
policies, and schema-backed JSON; and Cursor's
thermo-nuclear-code-quality-review
agent prompt, especially its emphasis on maintainability, structure,
unjustified file sprawl, ad-hoc branching growth, explicit boundaries, and
reviewing only what the diff shows. riskratchet is not a Python port of
cargo-crap or an agent prompt: it keeps CRAP as a reported metric, then adds
Python-specific signals such as branch gaps, churn, public surface, and
file/function sprawl.
Four real scenarios for how riskratchet earns its keep.
You've been vibe-coding a FastAPI backend with an AI agent for eight
months. It works, tests are green-ish (62% coverage), but you just
noticed that services/billing.py::reconcile_subscriptions quietly grew
to 180 lines and an 11-way match statement you don't remember writing.
pip install riskratchet
pytest --cov --cov-branch --cov-report=json:coverage.json
riskratchet scan src --coverage coverage.json --top 10

reconcile_subscriptions shows up at score 71 (high) with
structural_complexity: 90, sprawl: 55, coverage_gap: 60. You also
spot a surprise: a 12-line utility _normalize_plan_id scoring
48 because it has zero tests. Snapshot the bar:
riskratchet baseline src --coverage coverage.json --output .riskratchet.json
git add .riskratchet.json && git commit -m "Add riskratchet baseline"

From here, every time the agent adds a webhook handler or "refactors the
billing flow," run riskratchet check before committing. If it quietly
bloated reconcile_subscriptions from 180 to 220 lines, the check exits
1 and names the regression. You stop having to remember to look.
Why this is the canonical use case: AI agents are excellent at adding code, mediocre at noticing they've made things worse. The baseline is your memory.
Five engineers maintain an internal Python SDK that 30 other services
consume. Coverage is 87%, reviews are decent, but PRs occasionally land
functions with CC > 25 because reviewers don't catch it. One-time
setup on main:
pytest --cov --cov-branch --cov-report=json:coverage.json
riskratchet baseline src --coverage coverage.json --output .riskratchet.json
git add .riskratchet.json

In GitHub Actions, on every PR:
- run: pytest --cov --cov-branch --cov-report=json:coverage.json
- run: |
    # GitHub Actions runs bash with -e, so capture the exit code with
    # "|| status=$?"; otherwise the step aborts before the comment is posted.
    status=0
    riskratchet check src \
      --coverage coverage.json \
      --baseline .riskratchet.json \
      --format pr-comment > regressions.md || status=$?
    if [ "$status" -eq 1 ]; then
      gh pr comment ${{ github.event.pull_request.number }} --body-file regressions.md
    fi
    exit "$status"

Tune pyproject.toml to reflect priorities. This SDK changes
constantly by design, so churn matters less than public surface:
[tool.riskratchet.weights]
churn = 0.0
public_surface = 0.20

The pr-comment format starts with <!-- riskratchet-report -->, so a bot
or workflow script can find that marker and update the same comment on
each push instead of posting duplicates. check --format pr-comment uses
the richer diff body, so reviewers see failing regressions plus collapsed
improvements, moves, removals, and unchanged functions in one comment,
while the command still exits 1 only for configured failing regressions.
Why this works for teams: the ratchet is mechanical and unowned. Nobody has to be "the complexity cop" in code review.
A data scientist has a pipelines/ repo. No CI, no PR review — just one
person pushing to main. They want a local guardrail before each
commit. Using Pattern A from the Pre-commit integration
section:
repos:
  - repo: local
    hooks:
      - id: pytest-cov
        name: pytest --cov (produces coverage.json for riskratchet)
        entry: pytest --cov --cov-branch --cov-report=json:coverage.json -q
        language: system
        pass_filenames: false
        always_run: true
  - repo: https://github.com/KayhanB21/riskratchet
    rev: v0.2.4
    hooks:
      - id: riskratchet
        args: ["pipelines", "--coverage", "coverage.json", "--baseline", ".riskratchet.json"]

After the initial riskratchet baseline pipelines …, every git commit regenerates coverage and gates the commit on no regressions. If
they try to commit a 90-line transform with no tests, the commit fails
with a list of regressed functions. They can use --allow "pipelines.experiments.*" to scope out an experimental folder where
rough code is intentional.
Why this matters here: without a team, there's no second pair of eyes. The hook is the review.
An engineer is assigned the dreaded inflect.engine._plnoun (CC=100,
from the Sample output section
below). They want to know why riskratchet flagged it and whether their
planned refactor actually helps.
riskratchet explain src --coverage coverage.json --qualname "engine._plnoun"

This dumps the six component scores, the CRAP value, line numbers, and
what's driving the risk — e.g. structural_complexity: 100, sprawl: 78, coverage_gap: 2. Now they know the problem isn't tests; it's the
function shape.
They branch, spend two days breaking _plnoun into seven smaller
functions, then before opening the PR:
riskratchet diff src --coverage coverage.json --baseline .riskratchet.json --json \
  | jq '.improved[], .regressed[]'

diff shows improvements as well as regressions. _plnoun dropped from
36.2 to 18.4, and the seven new helpers all score under 20. They paste
that into the PR description as evidence that the refactor was
net-positive — not just rearranging deck chairs.
Why this scenario matters: scan tells you what's risky,
explain tells you why, diff tells you whether your change helped.
The classic CRAP score (CC^2 * (1 - line_coverage)^3 + CC) is great at
catching one specific shape of bad code: complex and poorly tested.
That's a real problem, but it misses several others that ship to production
just as often:
- A function with low complexity that has zero tests because no one wrote
  any. At 0% coverage CRAP reduces to CC^2 + CC, which is a single digit
  when CC is 1 or 2. Risk is real but not visible.
- A function with full line coverage whose tests take every branch the same way. CRAP only looks at line coverage.
- A function in a 2,000-line module that everyone is afraid to touch. Sprawl is invisible to CRAP.
- A function that changed in 40 of the last 90 commits. Churn is invisible to CRAP.
riskratchet keeps CRAP as a reported metric (it's still useful as a single-number signal) and computes its own composite score from six weighted components so those other risks show up too.
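To make the CRAP formula quoted above concrete, here is a minimal sketch. The function name `crap_score` is illustrative and is not riskratchet's API; the tool's own implementation may differ. It shows why a simple untested function stays in single digits and why full line coverage collapses CRAP to plain CC.

```python
def crap_score(cc: int, line_coverage: float) -> float:
    """Classic CRAP: CC^2 * (1 - line_coverage)^3 + CC.

    `line_coverage` is a fraction in [0, 1]. Name and signature are
    illustrative assumptions, not riskratchet's internal API.
    """
    if not 0.0 <= line_coverage <= 1.0:
        raise ValueError("line_coverage must be a fraction between 0 and 1")
    return cc**2 * (1.0 - line_coverage) ** 3 + cc


# A simple, untested function stays low: CC=2 at 0% coverage.
print(crap_score(2, 0.0))   # 6.0
# A complex, untested function explodes: CC=25 at 0% coverage.
print(crap_score(25, 0.0))  # 650.0
# Full line coverage collapses CRAP to CC, whatever the branch coverage is.
print(crap_score(25, 1.0))  # 25.0
```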
The practical inspiration here is cargo-crap's treatment of CRAP as a CI and review signal instead of a static dashboard number. riskratchet borrows that operational shape for Python while widening the risk model beyond CRAP alone.
AI coding agents are very good at producing code that compiles, runs, and passes the tests it ships with. They are less good at:
- writing meaningful tests for the new code
- noticing when a 30-line function quietly became 130 lines
- catching that the public API now exposes a function with no callers in the tests
- realising that a small refactor turned an if ladder into a 14-way cyclomatic monster
A traditional review catches some of this. A ratchet catches all of it, mechanically, every time. It pairs well with AI-assisted work because it turns "did this change introduce risk?" into a yes/no question with a diffable baseline.
# install
pip install riskratchet
# or run directly without installing
uvx riskratchet --help

# 1. run your tests with coverage in JSON form
pytest --cov --cov-report=json:coverage.json
# 2. snapshot the current risk profile
riskratchet baseline src --coverage coverage.json --output .riskratchet.json
# 3. inspect what was captured
riskratchet scan src --coverage coverage.json
# 4. fail the build when risk regresses
riskratchet check src --coverage coverage.json --baseline .riskratchet.json

riskratchet check exits with code 1 when any regression is detected,
2 for usage errors (e.g. missing baseline), and 0 otherwise.
Two things about pre-commit matter for riskratchet:
- Pre-commit hides your unstaged edits before running hooks. It
  "stashes" anything you've edited but not git add-ed, so hooks only see
  the code you're actually about to commit. Useful in general, but it
  means riskratchet sees a different source tree than the one open in
  your editor.
- Each language: python hook runs in its own isolated virtualenv. That
  venv contains riskratchet and its declared dependencies, not your
  project's pytest, your application code, your fixtures, or your test
  plugins. So riskratchet can't simply "run your tests" from inside the
  hook environment; pytest there would fail to import your package.
Together these create one requirement: the coverage.json riskratchet
reads must reflect the same stashed source tree it's analyzing. If you
reuse an old coverage.json from before pre-commit stashed your edits,
the source and coverage drift out of sync — you may see phantom
"uncovered" lines for code that no longer exists, or score functions
against the wrong line ranges.
That's why the published hook ships with --no-auto-cov --allow-missing-coverage by default: it's safe but limited, and assumes
you'll wire coverage in yourself. Pick one of the two patterns below to
make it actually useful.
Add a sibling hook that runs pytest --cov inside the same pre-commit
chain. Because that hook runs after pre-commit has already stashed
unstaged edits, the coverage it produces matches the stashed source tree
exactly — riskratchet then reads a consistent picture.
repos:
  - repo: local
    hooks:
      - id: pytest-cov
        name: pytest --cov (produces coverage.json for riskratchet)
        entry: pytest --cov --cov-branch --cov-report=json:coverage.json -q
        language: system
        pass_filenames: false
        always_run: true
  - repo: https://github.com/KayhanB21/riskratchet
    rev: v0.2.4
    hooks:
      - id: riskratchet
        args:
          - "src"
          - "--coverage"
          - "coverage.json"
          - "--baseline"
          - ".riskratchet.json"

riskratchet uses the freshly produced coverage.json directly, no auto-cov
needed. The pytest-cov hook also catches test failures early.
If your project uses uv (or poetry) and you'd rather not have pre-commit
manage a parallel venv at all, declare both hooks as language: system so
they run inside your project's environment. This is also what riskratchet
itself runs against its own source tree (see this repo's
.pre-commit-config.yaml):
repos:
  - repo: local
    hooks:
      - id: pytest-cov
        name: pytest --cov (produces coverage.json for riskratchet)
        entry: uv run pytest --cov --cov-branch --cov-report=json:coverage.json -q
        language: system
        pass_filenames: false
        always_run: true
      - id: riskratchet
        name: riskratchet check (gate on baseline regressions)
        entry: uv run riskratchet check src --coverage coverage.json --baseline .riskratchet.json --no-auto-cov
        language: system
        pass_filenames: false
        always_run: true

Two upsides over the published-hook form: a single environment for both
hooks (no isolated-venv surprises), and uv run resolves the same Python
and deps uv sync set up. The downside is that contributors must have
uv installed locally; for repos with non-uv contributors, the published
form above is more portable.
To escape the isolated venv, override the hook to language: system so
it inherits your shell PATH (and finds your real pytest, your project,
and its deps):
repos:
  - repo: local
    hooks:
      - id: riskratchet
        name: riskratchet check (gate on baseline regressions)
        entry: riskratchet check src --baseline .riskratchet.json
        language: system
        pass_filenames: false
        always_run: true

riskratchet will run the configured [tool.riskratchet] test_command
(default pytest --cov --cov-branch --cov-report=json:{output} -q) and
cache the result under .riskratchet/coverage.json. The cache is reused
until any .py file under the scan paths is newer.
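The reuse rule described above (keep the cached coverage until some .py file under the scan paths is newer) can be sketched with modification times. This is an illustration under stated assumptions: the function name is hypothetical, and riskratchet's real invalidation logic may compare something other than mtimes.

```python
from pathlib import Path


def coverage_cache_is_fresh(cache_file: Path, scan_paths: list[Path]) -> bool:
    """Return True when no .py file under scan_paths is newer than the cache.

    Hypothetical helper: it mirrors the rule stated in this README, not
    riskratchet's actual implementation.
    """
    if not cache_file.is_file():
        return False  # nothing cached yet, coverage must be regenerated
    cache_mtime = cache_file.stat().st_mtime
    for root in scan_paths:
        # A scan path may be a single file or a directory tree.
        candidates = [root] if root.is_file() else root.rglob("*.py")
        for source in candidates:
            if source.suffix == ".py" and source.stat().st_mtime > cache_mtime:
                return False  # a source file changed after the cache was written
    return True
```

A caller would regenerate coverage whenever this returns False and otherwise read the cached file as-is.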
For local development outside pre-commit, the auto-coverage default applies
to plain riskratchet scan|baseline|check invocations as well; pass
--no-auto-cov to opt out.
riskratchet is designed to be called from agents and parsed without
screen-scraping. See AGENTS.md for the full operational
contract; the recipes below cover the common cases.
One-shot: list the top three highest-risk functions.
riskratchet scan src --coverage coverage.json --json \
  | jq '.functions[:3] | .[] | {qualname, score, severity}'

Show the full baseline diff, including improvements and removed functions.

riskratchet diff src --coverage coverage.json \
  --baseline .riskratchet.json --json

Gate a CI job on regressions, printing the list when it fails.

riskratchet check src \
  --coverage coverage.json \
  --baseline .riskratchet.json \
  --baseline-format riskratchet \
  --json > regressions.json
status=$?
if [ "$status" -eq 1 ]; then
  jq -r '.regressions[] | "- \(.qualname): \(.reason)"' regressions.json
  exit 1
fi
exit "$status"

Post regressions as a PR comment.

riskratchet check src --coverage coverage.json \
  --baseline .riskratchet.json --format markdown \
  | gh pr comment --body-file -

For a sticky PR-bot body, use --format pr-comment. The output starts with
<!-- riskratchet-report --> so a GitHub Actions script can update an
existing comment instead of posting duplicates. For inline workflow warnings,
use --format github.
Markdown and PR-comment output can link each row back to source:
riskratchet scan src --format pr-comment \
  --repo-url https://github.com/acme/project \
  --commit-ref "$GITHUB_SHA"

In GitHub Actions, riskratchet fills those values from
GITHUB_SERVER_URL, GITHUB_REPOSITORY, and GITHUB_SHA when available.
JSON output is validated against the schemas under
schemas/ on every release:
- schemas/report.schema.json: scan --json
- schemas/regressions.schema.json: check --json
- schemas/diff.schema.json: diff --json
- schemas/baseline.schema.json: .riskratchet.json on disk
- schemas/summary.schema.json: scan|check|diff --summary --json
- schemas/config.schema.json: config show --json
Native JSON output includes $schema and version fields so consumers can
pin parsing behavior.
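A consumer can use the version field to refuse reports it does not understand before reading anything else. The sketch below is a hedged example: the field names follow the sample report in this README, the set of accepted versions is an assumption, and `top_functions` is a hypothetical helper rather than part of riskratchet.

```python
import json

# Assumption: "0.2" is the version string shown in this README's sample report.
SUPPORTED_VERSIONS = {"0.2"}


def top_functions(report_text: str, limit: int = 3) -> list[tuple[str, float, str]]:
    """Parse `scan --json` output and return the highest-scoring functions."""
    report = json.loads(report_text)
    version = report.get("version")
    if version not in SUPPORTED_VERSIONS:
        raise ValueError(f"unsupported report version: {version!r}")
    # Sort explicitly instead of relying on the order the tool emitted.
    ranked = sorted(report["functions"], key=lambda fn: fn["score"], reverse=True)
    return [(fn["qualname"], fn["score"], fn["severity"]) for fn in ranked[:limit]]


sample = json.dumps({
    "version": "0.2",
    "functions": [
        {"qualname": "a.low", "score": 12.0, "severity": "low"},
        {"qualname": "Foo.bar", "score": 62.3, "severity": "high"},
    ],
})
print(top_functions(sample, limit=1))  # [('Foo.bar', 62.3, 'high')]
```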
- Running check without a baseline. riskratchet baseline must run first (typically on main) and the resulting .riskratchet.json checked in. Exits with code 2 when missing.
- Passing coverage.xml to --coverage. riskratchet reads coverage.json. Generate it with pytest --cov --cov-report=json:coverage.json or let riskratchet auto-generate it (see Pre-commit integration).
- Relying on the auto-coverage runner inside a sandbox with no pytest installed. Pass --no-auto-cov plus --allow-missing-coverage, or set [tool.riskratchet] test_command to a runner that does work in your environment.
- Running without --no-git inside a sandbox that has no git history. Churn collection will be empty rather than failing, but pass --no-git to be explicit and slightly faster.
- Parsing stdout as both prose and JSON. Pick a format. With --json, stdout is a single JSON object; status messages go to stderr.
- When check exits 1, it prints a short hint to stderr with the two ways out: regenerate the baseline (riskratchet baseline ...) or loosen the per-component gate (--no-component-regression-gate, --fail-component-regression-above). Stdout stays clean so --json consumers are unaffected.
- Bumping the baseline to silence a regression. The baseline is the bar; if it has to move up, do it in a dedicated PR with a written justification.
- Treating a "new" finding as necessarily part of the current commit. In check output, new means absent from the baseline. A function added in an earlier commit can still appear as new until the baseline intentionally accepts it.
For the broader trust boundaries and non-goals, see
docs/threat-model.md.
Use --exclude to skip files at discovery time. Use --allow to analyze a
file but suppress matching functions from reporting and gating:
riskratchet check src --baseline .riskratchet.json \
  --allow "GeneratedModel.*" \
  --allow "src/generated/**"

Function patterns match dotted qualified names. Patterns containing / or
** match repo-relative POSIX paths.
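The dispatch between name patterns and path patterns can be sketched as follows. This is an approximation: `fnmatch` stands in for whatever glob engine riskratchet actually uses, so edge cases (for example how `*` treats `/`) may differ, and `is_allowed` is a hypothetical helper.

```python
from fnmatch import fnmatchcase


def is_allowed(qualname: str, path: str, patterns: list[str]) -> bool:
    """Return True if any --allow pattern suppresses this function.

    Patterns containing "/" or "**" are matched against the repo-relative
    POSIX path; every other pattern is matched against the dotted qualname.
    fnmatch is an assumed stand-in for the tool's real matcher.
    """
    for pattern in patterns:
        is_path_pattern = "/" in pattern or "**" in pattern
        subject = path if is_path_pattern else qualname
        if fnmatchcase(subject, pattern):
            return True
    return False


patterns = ["GeneratedModel.*", "src/generated/**"]
print(is_allowed("GeneratedModel.save", "src/models.py", patterns))        # True
print(is_allowed("Client.fetch", "src/generated/api/client.py", patterns)) # True
print(is_allowed("Client.fetch", "src/core/client.py", patterns))          # False
```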
When a coverage file is present but a scanned source file has no matching coverage entry, riskratchet warns on stderr. The default missing-coverage policy is pessimistic: treat those functions as uncovered. For partial local runs you can choose:
riskratchet scan src --coverage coverage.json --missing-coverage optimistic
riskratchet scan src --coverage coverage.json --missing-coverage skip

optimistic treats missing file coverage as fully covered. skip drops
functions from unmapped files and reports the skipped count in the JSON summary.
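The three policies amount to a small decision table. A minimal sketch, assuming coverage is represented as a fraction and that `None` signals "drop this function"; the helper name and return convention are illustrative, not riskratchet's API:

```python
from typing import Optional


def resolve_line_coverage(measured: Optional[float], policy: str = "pessimistic") -> Optional[float]:
    """Pick the line-coverage fraction to score with.

    `measured` is None when the source file has no entry in coverage.json.
    Returning None means the function is skipped (hypothetical convention).
    """
    if measured is not None:
        return measured  # real data always wins over the policy
    if policy == "pessimistic":
        return 0.0  # default: treat as uncovered
    if policy == "optimistic":
        return 1.0  # treat as fully covered
    if policy == "skip":
        return None  # drop from the report and count it as skipped
    raise ValueError(f"unknown missing-coverage policy: {policy!r}")
```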
riskratchet ships a pytest plugin that runs check as part of your test
session. After pip install riskratchet:
pytest \
  --cov --cov-report=json:coverage.json \
  --riskratchet \
  --riskratchet-paths src \
  --riskratchet-baseline .riskratchet.json

The session exits non-zero when riskratchet finds regressions, so you can
gate CI on pytest alone. Available flags:
- --riskratchet (required to enable the plugin)
- --riskratchet-paths (default: src, repeatable)
- --riskratchet-baseline (default: .riskratchet.json)
- --riskratchet-coverage (default: coverage.json)
- --riskratchet-fail-new-above (default: 50)
- --riskratchet-fail-regression-above (default: 5)
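One plausible reading of the two threshold flags is sketched below. It assumes that a function absent from the baseline fails when its score exceeds the "new" threshold, and that an existing function fails when its score rises by more than the "regression" threshold. The README does not spell out these semantics, so treat the comparison rules and the helper name as assumptions and check `riskratchet check --help` for the authoritative behavior.

```python
def failing_functions(
    baseline: dict[str, float],
    current: dict[str, float],
    fail_new_above: float = 50.0,
    fail_regression_above: float = 5.0,
) -> list[str]:
    """List functions that would fail the gate under the assumed semantics."""
    failures = []
    for qualname, score in sorted(current.items()):
        previous = baseline.get(qualname)
        if previous is None:
            # Assumed: new functions are only gated by an absolute score.
            if score > fail_new_above:
                failures.append(f"{qualname}: new at {score:.1f}")
        elif score - previous > fail_regression_above:
            # Assumed: known functions are gated by how much the score grew.
            failures.append(f"{qualname}: {previous:.1f} -> {score:.1f}")
    return failures


baseline = {"billing.reconcile": 51.0, "utils.slug": 12.0}
current = {"billing.reconcile": 58.5, "utils.slug": 14.0, "hooks.on_event": 61.0}
print(failing_functions(baseline, current))
# ['billing.reconcile: 51.0 -> 58.5', 'hooks.on_event: new at 61.0']
```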
Each function gets six component scores in [0, 100]:
| Component | Weight | What it measures |
|---|---|---|
| coverage_gap | 30% | 1 - line_coverage |
| structural_complexity | 25% | cyclomatic complexity, saturating at CC=20 |
| branch_gap | 15% | 1 - branch_coverage when branch coverage is known |
| churn | 10% | commits in the last 90 days, saturating at 10 |
| public_surface | 10% | coverage gap penalised harder when the function is public |
| sprawl | 10% | function length and file length blended |
The total risk is the weighted sum. Severity bands: 0-24 low, 25-49 medium, 50-74 high, 75-100 critical.
Each component is rescaled to [0, 100] (where 100 = maximum risk for
that signal) before being weighted into the total. Here's what each one
actually means, with a concrete example.
coverage_gap — "is this function tested at all?"
The fraction of lines in the function that your test suite never
executes. A function with 100% line coverage scores 0; a function with
0% line coverage scores 100.
Example: a 40-line parse_invoice where your tests only exercise the happy path (28 lines covered, 12 missed) → coverage_gap = 30. A brand-new migrate_to_v2 with no tests at all → coverage_gap = 100.
structural_complexity — "how many ways can this function go?"
Cyclomatic complexity, which roughly counts independent paths through
the function (each if, elif, and, or, for, except adds
one). Saturates at CC=20 — anything past that is already "very
complex" and we don't need to keep counting.
Example: a getter with one return statement → CC=1, score 0. A validate_user_input with 6 chained if/elif branches → CC=7, score ~35. A 14-way match statement → CC=15, score ~75.
branch_gap — "are both sides of every if tested?"
Like coverage_gap, but for branches. A function whose tests only ever
take the True path of an if with no else can have full line coverage
but only 50% branch coverage, because the path that skips the if body
never runs. Only counts when your coverage run included --cov-branch.
Example: discount(user) starts with rate = 0.0, sets rate = 0.2 inside if user.is_premium:, and returns rate. A test that only passes premium users → 100% line coverage but 50% branch coverage → branch_gap = 50.
churn — "how often does this function change?"
Number of git commits touching the function's line range in the
configured churn window (default 90 days, set with --churn-days or
[tool.riskratchet] churn_window_days). Saturates at 10 commits.
High churn means the function has been edited often in a short period,
which correlates with bugs.
Example: a stable parse_iso_date last touched two years ago → churn = 0. A pricing_engine.calculate_total touched by 14 commits in the last 90 days → saturates at 10 → churn = 100.
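Saturation here means the raw count is clamped at a cap before being rescaled to [0, 100]. A minimal sketch, assuming the rescaling is linear up to the cap (the README's churn examples are consistent with that, but the exact curve is not documented here) and using a hypothetical helper name:

```python
def saturating_score(raw: float, cap: float) -> float:
    """Clamp `raw` to [0, cap], then rescale linearly to [0, 100].

    Linear scaling is an assumption; riskratchet may use another curve.
    """
    if cap <= 0:
        raise ValueError("cap must be positive")
    clamped = min(max(raw, 0.0), cap)
    return clamped * 100.0 / cap


print(saturating_score(0, cap=10))   # 0.0   (untouched in the churn window)
print(saturating_score(4, cap=10))   # 40.0
print(saturating_score(14, cap=10))  # 100.0 (14 commits saturate at 10)
```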
public_surface — "if this breaks, do callers we can't see break too?"
A multiplier on coverage gap: when a function is part of your public
API, its missing coverage is penalized harder than the same gap on a
private helper. A private helper with 40% coverage is a problem you can
fix locally; a public function with 40% coverage is a contract problem.
How "public" is determined:
- No __all__ in the module: by qualname. foo is public; _foo is private; Foo.__init__ is public (dunder exception).
- Module declares __all__: additive promotion. A leading-underscore top-level name (e.g. _LegacyExposed) is treated as public if it appears in __all__. Omission never demotes: a public-by-name function not in __all__ stays public, because __all__ only controls import *, not reachability. Nested segments still follow the naming rule: _LegacyExposed.bar is public, but _LegacyExposed._helper is not.
- Dynamic __all__ (__all__ += [...], concatenation, conditional) falls back to the naming rule.

Example: _normalize_path with 50% coverage → public_surface = 25. Public format_currency with 50% coverage → public_surface = 50. _LegacyExposed in __all__ with 50% coverage → public_surface = 50 (promoted to public despite the underscore).
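The visibility rules above can be expressed as a short predicate. This sketch is an interpretation of the prose, not riskratchet's code: `exported` is assumed to hold the names from a statically readable `__all__` (or None when the module has no `__all__` or a dynamic one), and the half-weight for private functions is inferred from the examples rather than documented.

```python
from typing import Optional


def _segment_is_public(segment: str) -> bool:
    """Naming rule: dunders and names without a leading underscore are public."""
    is_dunder = segment.startswith("__") and segment.endswith("__")
    return is_dunder or not segment.startswith("_")


def is_public(qualname: str, exported: Optional[set[str]] = None) -> bool:
    """Apply the naming rule, with additive promotion from a static __all__."""
    head, *rest = qualname.split(".")
    # __all__ can promote an underscore-prefixed top-level name, never demote.
    head_public = _segment_is_public(head) or (exported is not None and head in exported)
    # Nested segments always follow the naming rule.
    return head_public and all(_segment_is_public(part) for part in rest)


def public_surface(line_coverage: float, public: bool) -> float:
    """Coverage gap on a 0-100 scale, halved for private functions (assumed factor)."""
    gap = (1.0 - line_coverage) * 100.0
    return gap if public else gap / 2.0


exported = {"_LegacyExposed"}
print(is_public("Foo.__init__"))                      # True
print(is_public("_LegacyExposed.bar", exported))      # True
print(is_public("_LegacyExposed._helper", exported))  # False
print(public_surface(0.5, is_public("_normalize_path")))  # 25.0
```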
sprawl — "is this function (or its file) just too big?"
A blend of function length and the surrounding file's length. Long
functions are harder to hold in your head; long files mean any function
in them has more neighbors competing for attention. Both contribute.
Example: a 12-line function in a 200-line file → sprawl = 5. A 180-line function in a 2,000-line module → sprawl = 85.
Suppose services/billing.py::reconcile_subscriptions is 180 lines,
public, has CC=14, 55% line coverage, 40% branch coverage, no recent
churn, and lives in a 900-line file. Its components might look like:
| Component | Raw signal | Score | Weight | Contribution |
|---|---|---|---|---|
| coverage_gap | 45% uncovered | 45 | 0.30 | 13.5 |
| structural_complexity | CC=14 of 20 saturating | 70 | 0.25 | 17.5 |
| branch_gap | 60% uncovered branches | 60 | 0.15 | 9.0 |
| churn | 0 commits in 90 days | 0 | 0.10 | 0.0 |
| public_surface | public + 45% gap | 45 | 0.10 | 4.5 |
| sprawl | long function, big file | 65 | 0.10 | 6.5 |
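| total | | | | 51.0 |

Score 51 → high severity. The largest contributions come from complexity (17.5) and line coverage (13.5), with branch coverage third (9.0). If you wanted to lower it without rewriting the function, the cheapest path is adding tests, not deleting lines: new tests can shrink coverage_gap, branch_gap, and public_surface together.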
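The arithmetic in the worked example can be checked in a few lines. The weights and component scores are copied from the tables above; the helper names and the exact cut-offs between bands (25, 50, 75) are assumptions based on the stated severity ranges.

```python
DEFAULT_WEIGHTS = {
    "coverage_gap": 0.30,
    "structural_complexity": 0.25,
    "branch_gap": 0.15,
    "churn": 0.10,
    "public_surface": 0.10,
    "sprawl": 0.10,
}


def total_risk(components: dict[str, float], weights: dict[str, float] = DEFAULT_WEIGHTS) -> float:
    """Weighted sum of component scores, each already on a 0-100 scale."""
    return sum(weights[name] * components[name] for name in weights)


def severity(score: float) -> str:
    """Map a total score to a band (cut-offs assumed from the documented ranges)."""
    if score >= 75:
        return "critical"
    if score >= 50:
        return "high"
    if score >= 25:
        return "medium"
    return "low"


reconcile_subscriptions = {
    "coverage_gap": 45, "structural_complexity": 70, "branch_gap": 60,
    "churn": 0, "public_surface": 45, "sprawl": 65,
}
score = total_risk(reconcile_subscriptions)
print(round(score, 1), severity(score))  # 51.0 high
```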
Weights are configurable per project. Drop a [tool.riskratchet.weights]
table in pyproject.toml to override any subset; the remaining
components keep their defaults and the whole vector is renormalized so the
total still maps to [0, 100]. For example, to ignore churn entirely and
double-weight coverage:
[tool.riskratchet.weights]
coverage_gap = 0.6
churn = 0.0

Unknown keys and negative values are rejected at startup so a typo cannot silently weaken the score.
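Renormalization means the merged weights are divided by their sum, so they add up to 1 again. A sketch of that step under the rules stated above (defaults kept for unspecified keys, unknown keys and negative values rejected); `resolve_weights` is a hypothetical name and the real validation may be stricter:

```python
DEFAULT_WEIGHTS = {
    "coverage_gap": 0.30, "structural_complexity": 0.25, "branch_gap": 0.15,
    "churn": 0.10, "public_surface": 0.10, "sprawl": 0.10,
}


def resolve_weights(overrides: dict[str, float]) -> dict[str, float]:
    """Merge overrides into the defaults and rescale so the weights sum to 1."""
    unknown = set(overrides) - set(DEFAULT_WEIGHTS)
    if unknown:
        raise ValueError(f"unknown weight keys: {sorted(unknown)}")
    if any(value < 0 for value in overrides.values()):
        raise ValueError("weights must be non-negative")
    merged = {**DEFAULT_WEIGHTS, **overrides}
    total = sum(merged.values())
    if total <= 0:
        raise ValueError("at least one weight must be positive")
    return {name: value / total for name, value in merged.items()}


# The override from the example: merged weights sum to 1.2 before rescaling.
resolved = resolve_weights({"coverage_gap": 0.6, "churn": 0.0})
print(round(resolved["coverage_gap"], 3))           # 0.5
print(round(resolved["structural_complexity"], 3))  # 0.208
print(resolved["churn"])                            # 0.0
```

Under this reading, coverage_gap ends up carrying half of the total score rather than a literal 0.6, because the other defaults are kept and the vector is rescaled.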
riskratchet scan src --coverage coverage.json --format table # default
riskratchet scan src --coverage coverage.json --json # shortcut for --format json
riskratchet scan src --coverage coverage.json --format markdown # for PR comments
riskratchet scan src --coverage coverage.json --format sarif # for SARIF consumers
riskratchet scan src --coverage coverage.json --format github # GitHub Actions annotations
riskratchet scan src --coverage coverage.json --format pr-comment
riskratchet scan src --coverage coverage.json --summary # aggregate lines only
riskratchet scan src --coverage coverage.json --summary --json # schema-backed summary envelope
riskratchet scan src --coverage coverage.json --quiet # drops the trailing summary line
riskratchet scan src --coverage coverage.json --min-score 50 # hide lower-risk functions
riskratchet scan src --coverage coverage.json --top 10 # emit only the top N

riskratchet check accepts --baseline-format riskratchet, which is the
default and currently the only supported baseline format.
SARIF intentionally has a narrower contract than native JSON: scan --format sarif emits current findings after the same score filter used for
annotations, while check --format sarif and diff --format sarif emit only
failing regressions. A clean baseline still produces valid SARIF with an empty
results array.
Validate project config before relying on it in CI:
riskratchet config validate --config pyproject.toml
riskratchet config show --config pyproject.toml --json

config validate exits 2 for malformed TOML, unknown keys, invalid value
types, invalid weights, or invalid group definitions. config show --json
prints the resolved config with CLI defaults filled where riskratchet already
has defaults.
Use [tool.riskratchet.groups] to roll function-level results up by package
or workspace area:
[tool.riskratchet.groups]
core = "src/core"
api = ["src/api", "src/public_api"]

Each function is assigned to the longest matching repo-relative prefix.
Ungrouped functions are reported as null in JSON fields and ungrouped in
text or markdown summaries.
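Longest-prefix assignment can be sketched as below. It assumes prefixes match whole path components (so `src/api` does not claim `src/api_extra/...`), which the README does not state explicitly; `assign_group` is a hypothetical helper.

```python
from pathlib import PurePosixPath
from typing import Optional, Union


def assign_group(path: str, groups: dict[str, Union[str, list[str]]]) -> Optional[str]:
    """Return the group whose prefix is the longest match for `path`, else None."""
    target = PurePosixPath(path)
    best_name: Optional[str] = None
    best_depth = -1
    for name, prefixes in groups.items():
        for prefix in ([prefixes] if isinstance(prefixes, str) else prefixes):
            prefix_path = PurePosixPath(prefix)
            # Component-wise match: the prefix is the path itself or an ancestor.
            matches = prefix_path == target or prefix_path in target.parents
            if matches and len(prefix_path.parts) > best_depth:
                best_name, best_depth = name, len(prefix_path.parts)
    return best_name


groups = {"core": "src/core", "api": ["src/api", "src/public_api"]}
print(assign_group("src/core/db.py", groups))              # core
print(assign_group("src/public_api/v1/users.py", groups))  # api
print(assign_group("src/cli.py", groups))                  # None
```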
For packages/* / services/* layouts where a single coverage.json is
not practical, declare one coverage file per repo-relative prefix:
[tool.riskratchet]
paths = ["packages/alpha", "packages/beta"]
[tool.riskratchet.coverage_map]
"packages/alpha" = "packages/alpha/coverage.json"
"packages/beta" = "packages/beta/coverage.json"
[tool.riskratchet.groups]
alpha = "packages/alpha"
beta = "packages/beta"

Or pass the map on the CLI (repeatable; longest matching prefix wins):
riskratchet scan packages/alpha packages/beta \
  --coverage-map packages/alpha=packages/alpha/coverage.json \
  --coverage-map packages/beta=packages/beta/coverage.json

Two workflows are supported:
- One repo-level baseline (recommended for tight coupling): a single .riskratchet.json covers every package; groups partition the reporting but the ratchet is global.
- One baseline per package: each package has its own pyproject.toml and .riskratchet.json, and you invoke riskratchet once per package directory. Useful when packages release independently.
Every command prints a diagnostic banner to stderr summarizing the
resolved root, scan paths, and coverage source (coverage=map=...,
coverage=single=..., or coverage=none). Stdout stays payload-only.
Since 0.2.5, the comparison logic recognizes function renames and moves
before classifying them as new. A unique body-fingerprint or signature
match becomes MOVED; a tie between multiple plausible candidates
becomes AMBIGUOUS_RENAME and stays in the gating block of the PR
comment so risk growth isn't silently masked. See
AGENTS.md for the
full signal weighting.
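The unique-match versus tie distinction can be illustrated with a toy classifier. Everything here is a simplification for illustration: the real tool also uses signatures and weighted signals (see AGENTS.md), and the whitespace-normalized SHA-256 fingerprint and the helper names are assumptions, not riskratchet's algorithm.

```python
import hashlib


def fingerprint(body: str) -> str:
    """Hash a function body after stripping indentation and blank lines."""
    normalized = "\n".join(line.strip() for line in body.splitlines() if line.strip())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def classify(new_body: str, removed_bodies: dict[str, str]) -> tuple[str, list[str]]:
    """Classify a function missing from the baseline against removed functions.

    One fingerprint match -> MOVED, several -> AMBIGUOUS_RENAME, none -> NEW.
    """
    target = fingerprint(new_body)
    matches = sorted(name for name, body in removed_bodies.items() if fingerprint(body) == target)
    if len(matches) == 1:
        return "MOVED", matches
    if len(matches) > 1:
        return "AMBIGUOUS_RENAME", matches
    return "NEW", []


removed = {
    "old.parse": "x = load()\nreturn x",
    "old.parse_copy": "  x = load()\n  return x",
    "old.save": "store(x)",
}
print(classify("store(x)", removed))              # ('MOVED', ['old.save'])
print(classify("x = load()\nreturn x", removed))  # ('AMBIGUOUS_RENAME', ['old.parse', 'old.parse_copy'])
print(classify("return 42", removed))             # ('NEW', [])
```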
For early adoption before a baseline exists, scan can also fail on an
absolute gate:
riskratchet scan src --coverage coverage.json --fail-above 75
riskratchet scan src --coverage coverage.json --fail-severity high

Baseline gating is still the recommended mode for mature codebases.
JSON output (truncated):
{
"$schema": "https://github.com/KayhanB21/riskratchet/schemas/report.schema.json",
"version": "0.2",
"summary": {
"total_functions": 10,
"analyzed_functions": 42,
"emitted_functions": 10,
"total_files": 6,
"coverage_status": "present",
"suppressed_functions": 1,
"skipped_missing_coverage": 0,
"by_severity": {
"low": 1,
"medium": 6,
"high": 3,
"critical": 0
}
},
"functions": [
{
"path": "src/foo.py",
"qualname": "Foo.bar",
"score": 62.3,
"severity": "high",
"components": {
"coverage_gap": 80.0,
"structural_complexity": 55.0,
"branch_gap": 70.0,
"churn": 30.0,
"public_surface": 80.0,
"sprawl": 10.0
},
"crap": 12.4
}
]
}

Markdown output is suitable for posting as a PR comment via gh pr comment.
See docs/ide-integration.md for how to view
findings inline in VS Code (via the SARIF Viewer extension) and JetBrains
IDEs.
I ran riskratchet against four widely-used Python libraries to show
what its output looks like on production code. Each was cloned fresh,
its own test suite run with pytest --cov --cov-report=json:coverage.json,
then scanned. Top findings:
| Library | Function | Score | CC | Line cov |
|---|---|---|---|---|
| python-slugify | __main__::main | 53.1 (high) | 3 | 11% (0% branch) |
| python-slugify | slugify | 33.3 | 27 | 88% |
| tabulate | _CustomTextWrap._wrap_chunks | 44.4 | 31 | 60% |
| tabulate | _normalize_tabular_data | 42.6 | 76 | 78% |
| tabulate | tabulate (entry) | 37.1 | 62 | 97% |
| humanize | precisedelta | 32.9 | 26 | 100% |
| humanize | naturaldelta | 32.4 | 33 | 100% |
| inflect | engine._sinoun | 36.7 | 108 | 98% |
| inflect | engine._plnoun | 36.2 | 100 | 99% |
The point is not that these libraries are bad. They have all-green CI and many users. The point is that even mature, well-tested code accumulates functions where complexity, coverage, and sprawl combine into something worth a second pair of eyes. A CC=108 function with 98% coverage is not on fire. It is a function that works and is tested. The ratchet's job is to keep those numbers from getting worse over time.
| Tool | Per-function risk | Baseline / ratchet | Combines complexity + coverage + churn |
|---|---|---|---|
| coverage.py | line / branch only | no | no |
| radon | complexity only | no | no |
| xenon | complexity only | yes (threshold) | no |
| pytest-crap | yes (CRAP) | no | partial (CC + line coverage) |
| riskratchet | yes | yes | yes |
The same commands run in GitHub Actions. Run them locally before pushing.
uv sync --locked
uv run ruff check .
uv run ruff format --check .
uv run mypy src tests
uv run pytest --cov=src/riskratchet --cov-branch --cov-report=term-missing
uv build --clear

Strict typing covers both src/ and tests/.
Releases are published to PyPI via GitHub Actions Trusted Publishing (OIDC, no API tokens). To cut a release:
# 1. Bump `version` in pyproject.toml and move CHANGELOG entries from
# "Unreleased" into a new dated section. Commit on master.
# 2. Tag and push:
git tag vX.Y.Z
git push origin vX.Y.Z

The publish.yml workflow runs the same quality gates as CI (ruff, format
check, mypy, pytest), builds the distribution, verifies wheel metadata and
README metadata, runs isolated wheel/source install smoke tests, validates SARIF
output, and publishes via OIDC with PEP 740 attestations. The pypi GitHub
environment gates the upload step with a required-reviewer rule. If CI is red on
master, do not tag — the workflow's quality gate will fail anyway.
