ci: add ClusterFuzzLite + SLSA provenance file in releases by cohenrobinson · Pull Request #28 · Utilified/aemo-mdff-reader

cohenrobinson · 2026-05-10T13:45:43Z

Summary

This PR targets the two OpenSSF Scorecard checks that are tractable without external process changes: Fuzzing and Signed-Releases. The rest are out of reach for now: Code-Review and Contributors are structural, and the CII badge and the Branch-Protection PAT need user action.

Fuzzing (0 → up to 10)

fuzz/ — three atheris harnesses targeting the streaming parser entry points (parse, parse_to_columns, parse_accumulations). Each catches the parser's documented exceptions (NEM12ParseError, ValueError, IndexError, KeyError, UnicodeDecodeError) so legitimate parser rejections aren't reported as crashes — anything else surfaces as a real bug.
.clusterfuzzlite/ — minimal Python project config (project.yaml, Dockerfile, build.sh) using OSS-Fuzz's base-builder-python image. build.sh runs compile_python_fuzzer on each fuzz/fuzz_*.py.
.github/workflows/cflite_pr.yml — per-PR fuzz: 5 minutes per sanitizer (address + undefined), only on PRs touching parser / fuzz / cflite paths.
.github/workflows/cflite_batch.yml — weekly corpus-extending run, Sundays 02:00 UTC. 30 minutes per sanitizer.
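
As a rough illustration of the harness shape described above, here is a minimal sketch. It is not the code in fuzz/: the `parse` function and the `NEM12ParseError` class below are hypothetical stand-ins (the real entry points and their signatures live in aemo_mdff_reader), and the "must start with a 100 record" rule is a toy check chosen only to give the stand-in something to reject.

```python
import sys


class NEM12ParseError(Exception):
    """Stand-in for the parser's own error type (hypothetical definition)."""


def parse(data: bytes) -> list[list[str]]:
    """Hypothetical stand-in for the real parse entry point."""
    text = data.decode("utf-8")  # may raise UnicodeDecodeError
    rows = [line.split(",") for line in text.splitlines() if line]
    if rows and rows[0][0] != "100":
        # Toy rule for illustration only.
        raise NEM12ParseError("file must start with a 100 record")
    return rows


# The exceptions the PR description lists as documented parser rejections.
EXPECTED = (NEM12ParseError, ValueError, IndexError, KeyError, UnicodeDecodeError)


def TestOneInput(data: bytes) -> None:
    """Fuzz entry point: expected rejections are swallowed, the rest escape."""
    try:
        parse(data)
    except EXPECTED:
        return


def main() -> None:
    # atheris is imported lazily so the module can be loaded without it.
    import atheris

    atheris.Setup(sys.argv, TestOneInput)
    atheris.Fuzz()
```

Under the fuzzer, the harness script would call `main()`; calling `TestOneInput(b"...")` directly is enough to replay a single input by hand.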

Signed-Releases (8 → 10)

The current pipeline already produces SLSA build provenance via actions/attest-build-provenance, but stores it in GitHub's attestations API rather than as a file. Scorecard's signed-releases check scans release assets, so the attestation is invisible to it. This PR captures the action's bundle-path output, copies the bundle to provenance/aemo_mdff_reader.intoto.jsonl, and attaches it to the GitHub Release alongside the existing dist/, signatures/, and sbom/ payloads. Score-wise this should close the 8/10 → 10/10 gap.
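
The staging step amounts to a file copy from the action's bundle-path output into the release payload. The sketch below shows that logic in Python purely for illustration; the actual change is in release.yml, and the function name `stage_provenance` and its parameters are assumptions, not part of the repository.

```python
import shutil
from pathlib import Path

# Destination named in the PR description.
DEFAULT_DEST = "provenance/aemo_mdff_reader.intoto.jsonl"


def stage_provenance(bundle_path: str, dest: str = DEFAULT_DEST) -> Path:
    """Copy the attestation bundle to a path that can be attached as a release asset."""
    src = Path(bundle_path)
    if not src.is_file():
        raise FileNotFoundError(f"attestation bundle not found: {src}")
    out = Path(dest)
    out.parent.mkdir(parents=True, exist_ok=True)  # create provenance/ if missing
    shutil.copyfile(src, out)
    return out
```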

Out of scope (acknowledged)

Code-Review (0) — structural; solo-maintainer project.
Contributors (3) — structural; needs commits from 2+ orgs.
CII Best Practices (0) — requires you to register the project at https://www.bestpractices.dev/ and complete the ~50-question self-attestation.
Branch-Protection (-1) — scorecard-action's GITHUB_TOKEN can't read classic protection rules; needs a fine-grained PAT secret (SCORECARD_TOKEN) that you create.

Test plan

python -c "import ast; ..." syntax-checks all three fuzz harnesses
ruff check fuzz/ and ruff format --check fuzz/ clean
Per-PR cflite_pr.yml will run on this PR itself (since .clusterfuzzlite/ and fuzz/ paths are in the trigger filter), exercising the build + 5-min crash search end-to-end
Provenance file path will be exercised on the next release

Expected score impact

Realistic ceiling ~9.3 — Code-Review (0) and Contributors (3) cap us until the project grows beyond a single org/maintainer.

🤖 Generated with Claude Code

Targets the two Scorecard checks that are tractable without external process changes:

- Fuzzing (0 → ~10): adds atheris harnesses for the three streaming parser entry points (parse, parse_to_columns, parse_accumulations) under fuzz/, with a ClusterFuzzLite project under .clusterfuzzlite/. cflite_pr.yml runs a 5-minute crash search on every PR touching parser/fuzz code (address + undefined sanitizers in matrix). cflite_batch.yml runs a 30-minute weekly corpus-extending pass on Sundays at 02:00 UTC, off-cycle from CodeQL/Scorecard.
- Signed-Releases (8 → 10): release.yml now stages the build provenance bundle emitted by actions/attest-build-provenance to provenance/aemo_mdff_reader.intoto.jsonl and attaches it to the GitHub Release. Scorecard's signed-releases check scans release assets (not GitHub's attestations API), so the file presence is what unlocks the last two points.

Out of scope:

- Code-Review (0): structural; solo-maintainer project, can't approve own PRs.
- Contributors (3): structural; needs commits from 2+ orgs.
- CII Best Practices (0): requires you to register the project at bestpractices.dev and complete the self-attestation.
- Branch-Protection (-1): scorecard-action's GITHUB_TOKEN can't read classic protection rules; needs a fine-grained PAT secret you create.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Pin both cflite_pr.yml and cflite_batch.yml to the actual commit SHA of google/clusterfuzzlite v1 (884713a) — the previous SHA didn't resolve and both jobs failed at "Set up job". Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

compile_python_fuzzer just needs the harness path; --add-binary was trying to add a non-existent binary and pyinstaller bailed with 'Unable to find /src/aemo-mdff-reader/fuzz_parse'. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

ClusterFuzzLite immediately found `csv.Error: new-line character seen in unquoted field` from `_open_rows` — a legitimate parser rejection that the harness must classify as expected, not a crash. Pull the allowlist out into a shared `_EXPECTED` tuple per harness and add `csv.Error` and `OverflowError` (from int parsing of long numeric literals). Anything outside the allowlist (AttributeError, TypeError, RecursionError, …) still escapes and is reported as a real bug. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
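
The allowlist approach in this commit can be sketched as follows. This is an illustration, not the harness code: `NEM12ParseError` is a stand-in class, and `is_expected_rejection` and `read_rows` are hypothetical helpers. The `read_rows` call shows one way the standard csv module raises `csv.Error` for a bare carriage return inside an unquoted field.

```python
import csv


class NEM12ParseError(Exception):
    """Stand-in for the parser's own error type (hypothetical definition)."""


# Shared allowlist: the documented rejections plus the two added in this commit.
_EXPECTED = (
    NEM12ParseError,
    ValueError,
    IndexError,
    KeyError,
    UnicodeDecodeError,
    csv.Error,
    OverflowError,
)


def is_expected_rejection(exc: BaseException) -> bool:
    """True if the exception is a legitimate parser rejection, not a bug."""
    return isinstance(exc, _EXPECTED)


def read_rows(lines):
    """Parse an iterable of text lines with the stdlib csv reader."""
    return list(csv.reader(lines))


if __name__ == "__main__":
    try:
        read_rows(["a\rb,c"])  # bare '\r' in the middle of an unquoted field
    except Exception as exc:
        print(type(exc).__name__, is_expected_rejection(exc))
```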

A pure-Python parser is memory-safe; coverage-guided fuzzing's value is hangs / infinite loops / pathological memory growth, not C-style crashes. The parser's expected behavior on malformed input is to raise — csv.Error, ValueError, IndexError, KeyError, NEM12ParseError, or whatever the stdlib happens to surface — and atheris was reporting each as an "uncaught Python exception" failure on first hit. Replace the per-class allowlist with `except Exception` in all three harnesses, with a comment explaining the design choice. SystemExit / KeyboardInterrupt deliberately propagate. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
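
A minimal sketch of why `except Exception` fits this design: `SystemExit` and `KeyboardInterrupt` derive from `BaseException`, not `Exception`, so they pass through the handler untouched. The `run_one` wrapper and its string return values are hypothetical and exist only to make the behaviour observable.

```python
def run_one(target, data: bytes) -> str:
    """Call a parser-like target; treat any ordinary exception as a rejection.

    Hangs, infinite loops, and runaway memory are left for the fuzzer's own
    timeout and memory limits to detect.
    """
    try:
        target(data)
    except Exception:
        # Covers csv.Error, ValueError, KeyError, and anything else the stdlib
        # raises. SystemExit and KeyboardInterrupt are not caught here because
        # they derive from BaseException.
        return "rejected"
    return "accepted"
```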

Per-PR fuzzing was running 5 min × 2 sanitizers = up to 10 min of matrix execution on every parser-touching PR. That's overkill for PR feedback: the value of cflite on a PR is "did the build still work + does a quick crash search find anything obvious", not a deep corpus pass.

- Drop the address+undefined matrix; PR runs only address.
- Cut fuzz-seconds from 300 to 60.
- timeout-minutes 30 -> 10.
- Job name is now "fuzz (address, 60s)" so the check is self-describing.

The longer 30-min/sanitizer corpus run lives in cflite_batch.yml (scheduled Sundays 02:00 UTC) and still runs both sanitizers.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Three intelligent skips so fuzz only runs when it'd add signal:

- paths is now enumerated explicitly: only __init__.py, parser.py, types.py, spec.py under aemo_mdff_reader/ trigger fuzz. CLI, reader, aggregate, and sql/* changes don't (the harnesses don't reach them).
- skip on draft PRs: `if: github.event.pull_request.draft == false`. Fuzz on the final form, not the WIP.
- fuzz-seconds 60 -> 30; timeout-minutes 10 -> 6.

Combined with the setup+build steps, a fuzz-touching PR now finishes in ~3 min instead of ~6. Deeper passes still happen weekly via cflite_batch.yml.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
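
The skip rules above boil down to a small decision function. The sketch below models that decision in Python for illustration only; the real logic is expressed as a `paths` filter and an `if:` condition in cflite_pr.yml. The `should_fuzz` name is hypothetical, and the inclusion of the fuzz/ and .clusterfuzzlite/ prefixes is an assumption based on the PR's test plan.

```python
# Parser modules that the commit message says trigger a fuzz run.
FUZZ_TRIGGER_FILES = {
    "aemo_mdff_reader/__init__.py",
    "aemo_mdff_reader/parser.py",
    "aemo_mdff_reader/types.py",
    "aemo_mdff_reader/spec.py",
}

# Assumed: harness and ClusterFuzzLite config changes also trigger a run.
FUZZ_TRIGGER_PREFIXES = ("fuzz/", ".clusterfuzzlite/")


def should_fuzz(changed_paths, is_draft: bool) -> bool:
    """Return True if a PR with these changed files should run the fuzz job."""
    if is_draft:
        return False  # fuzz the final form, not the work in progress
    return any(
        path in FUZZ_TRIGGER_FILES or path.startswith(FUZZ_TRIGGER_PREFIXES)
        for path in changed_paths
    )
```

For example, a ready-for-review PR that only touches a file under aemo_mdff_reader/sql/ would not trigger the job, while one touching aemo_mdff_reader/parser.py would.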

Cuts a 2.2.1 patch so the in-toto provenance file (#28) and the SHA-pinned OSS-Fuzz base image (#29) ship in a tagged GitHub Release. Both were `ci:` commits that release-please skipped. Scorecard's Signed-Releases check will see the provenance bundle in v2.2.1's release assets and lift from 8/10 → 10/10. No source/behaviour change versus v2.2.0: wheel and sdist bytes are identical apart from version metadata.

Release-As: 2.2.1

cohenrobinson and others added 7 commits May 10, 2026 23:45

ci: fix ClusterFuzzLite action SHA

bd9cb98


cohenrobinson merged commit ef7953d into main May 10, 2026
18 of 19 checks passed

cohenrobinson deleted the chore/scorecard-fuzzing-and-provenance branch May 10, 2026 14:12

This was referenced May 10, 2026

ci: pin OSS-Fuzz base-builder-python by SHA #29

Merged

docs: bake CI hardening into a 2.2.1 release #30

Merged
