ci: add ClusterFuzzLite + SLSA provenance file in releases #28
Merged
Targets the two Scorecard checks that are tractable without external process changes:

- Fuzzing (0 → ~10): adds atheris harnesses for the three streaming parser entry points (parse, parse_to_columns, parse_accumulations) under fuzz/, with a ClusterFuzzLite project under .clusterfuzzlite/. cflite_pr.yml runs a 5-minute crash search on every PR touching parser/fuzz code (address + undefined sanitizers in matrix). cflite_batch.yml runs a 30-minute weekly corpus-extending pass on Sundays at 02:00 UTC, off-cycle from CodeQL/Scorecard.
- Signed-Releases (8 → 10): release.yml now stages the build provenance bundle emitted by actions/attest-build-provenance to provenance/aemo_mdff_reader.intoto.jsonl and attaches it to the GitHub Release. Scorecard's signed-releases check scans release assets (not GitHub's attestations API), so the file presence is what unlocks the last two points.

Out of scope:

- Code-Review (0): structural. Solo-maintainer project, can't approve own PRs.
- Contributors (3): structural. Needs commits from 2+ orgs.
- CII Best Practices (0): requires you to register the project at bestpractices.dev and complete the self-attestation.
- Branch-Protection (-1): scorecard-action's GITHUB_TOKEN can't read classic protection rules; needs a fine-grained PAT secret you create.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Pin both cflite_pr.yml and cflite_batch.yml to the actual commit SHA of google/clusterfuzzlite v1 (884713a) — the previous SHA didn't resolve and both jobs failed at "Set up job". Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
compile_python_fuzzer just needs the harness path; --add-binary was trying to add a non-existent binary and pyinstaller bailed with 'Unable to find /src/aemo-mdff-reader/fuzz_parse'. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
ClusterFuzzLite immediately found `csv.Error: new-line character seen in unquoted field` from `_open_rows` — a legitimate parser rejection that the harness must classify as expected, not a crash. Pull the allowlist out into a shared `_EXPECTED` tuple per harness and add `csv.Error` and `OverflowError` (from int parsing of long numeric literals). Anything outside the allowlist (AttributeError, TypeError, RecursionError, …) still escapes and is reported as a real bug. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
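The allowlist pattern described in this commit can be sketched roughly as below. This is a minimal illustration, not the project's harness: `make_allowlist_harness` and `stub_parse` are hypothetical names, the parser is assumed to take a text stream and yield records, and the real tuple would also list the project's own `NEM12ParseError`.

```python
import csv
import io
from typing import Callable, Iterable, Iterator, TextIO

# Exceptions treated as legitimate parser rejections (assumed set; the real
# harnesses would also include the project's NEM12ParseError).
_EXPECTED = (
    csv.Error,
    ValueError,
    IndexError,
    KeyError,
    UnicodeDecodeError,
    OverflowError,
)


def make_allowlist_harness(
    parse_fn: Callable[[TextIO], Iterable[object]],
) -> Callable[[bytes], None]:
    """Wrap a parser entry point in a one-argument fuzz callback."""

    def test_one_input(data: bytes) -> None:
        try:
            text = data.decode("utf-8")
            # Drain the parser so lazily raised errors surface here.
            for _ in parse_fn(io.StringIO(text)):
                pass
        except _EXPECTED:
            # Expected rejection of malformed input: not a finding.
            return
        # Anything else (AttributeError, TypeError, ...) propagates and
        # would be reported by the fuzzer as a real bug.

    return test_one_input


def stub_parse(stream: TextIO) -> Iterator[int]:
    """Hypothetical stand-in parser: the first field of each row as an int."""
    for row in csv.reader(stream):
        yield int(row[0])
```

A callback built this way (for example `make_allowlist_harness(stub_parse)`) is the kind of function one would pass to `atheris.Setup`.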
A pure-Python parser is memory-safe; coverage-guided fuzzing's value is hangs / infinite loops / pathological memory growth, not C-style crashes. The parser's expected behavior on malformed input is to raise — csv.Error, ValueError, IndexError, KeyError, NEM12ParseError, or whatever the stdlib happens to surface — and atheris was reporting each as an "uncaught Python exception" failure on first hit. Replace the per-class allowlist with `except Exception` in all three harnesses, with a comment explaining the design choice. SystemExit / KeyboardInterrupt deliberately propagate. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
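A rough sketch of the catch-all design this commit describes, under the same assumptions as any illustrative harness: `make_catch_all_harness` and `run_fuzzer` are hypothetical names, and the parser is assumed to accept a text stream. `except Exception` swallows ordinary errors, while `SystemExit` and `KeyboardInterrupt` derive from `BaseException` and therefore still propagate.

```python
import io
import sys
from typing import Callable, Iterable, TextIO


def make_catch_all_harness(
    parse_fn: Callable[[TextIO], Iterable[object]],
) -> Callable[[bytes], None]:
    """Build a fuzz callback that only cares about hangs and memory growth."""

    def test_one_input(data: bytes) -> None:
        try:
            text = data.decode("utf-8", errors="replace")
            # Drain the parser; a hang or runaway allocation here is what
            # the fuzzer's timeout and memory limits are meant to catch.
            for _ in parse_fn(io.StringIO(text)):
                pass
        except Exception:
            # Raising on malformed input is the parser's expected behavior,
            # so no ordinary exception is treated as a finding.
            return

    return test_one_input


def run_fuzzer(parse_fn: Callable[[TextIO], Iterable[object]]) -> None:
    """Wire the callback into atheris (imported lazily; needs atheris installed)."""
    import atheris

    atheris.Setup(sys.argv, make_catch_all_harness(parse_fn))
    atheris.Fuzz()
```

The trade-off is that logic bugs surfacing as `AttributeError` or `TypeError` are no longer reported, which the commit accepts in exchange for a harness that stops failing on the first legitimate rejection.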
Per-PR fuzzing was running 5 min × 2 sanitizers = up to 10 min of matrix execution on every parser-touching PR. That's overkill for PR feedback: the value of cflite on a PR is "did the build still work + does a quick crash search find anything obvious", not a deep corpus pass.

- Drop the address+undefined matrix; PR runs only address.
- Cut fuzz-seconds from 300 to 60.
- timeout-minutes 30 -> 10.
- Job name is now "fuzz (address, 60s)" so the check is self-describing.

The longer 30-min/sanitizer corpus run lives in cflite_batch.yml (scheduled Sundays 02:00 UTC) and still runs both sanitizers.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Three intelligent skips so fuzz only runs when it'd add signal:

- paths is now enumerated explicitly: only __init__.py, parser.py, types.py, spec.py under aemo_mdff_reader/ trigger fuzz. CLI, reader, aggregate, and sql/* changes don't (the harnesses don't reach them).
- skip on draft PRs: `if: github.event.pull_request.draft == false`. Fuzz on the final form, not the WIP.
- fuzz-seconds 60 -> 30; timeout-minutes 10 -> 6.

Combined with the setup+build steps, a fuzz-touching PR now finishes in ~3 min instead of ~6. Deeper passes still happen weekly via cflite_batch.yml.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
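The gating rules above live in the workflow's `paths` filter and job-level `if:`, not in Python, but their combined effect can be modeled as a small predicate. The sketch below is only an illustration: the function names are hypothetical, `aemo_mdff_reader/` is assumed to sit at the repository root, and only the package files named in the commit message are modeled (any other trigger paths the workflow keeps are left out).

```python
from pathlib import PurePosixPath
from typing import Iterable

# Assumed location of the package relative to the repository root.
PACKAGE_DIR = PurePosixPath("aemo_mdff_reader")

# The modules the commit message says the harnesses actually reach.
FUZZED_MODULES = frozenset({"__init__.py", "parser.py", "types.py", "spec.py"})


def touches_fuzzed_code(changed_paths: Iterable[str]) -> bool:
    """True if any changed file is one of the enumerated parser modules."""
    for raw in changed_paths:
        path = PurePosixPath(raw)
        # Only files directly under the package match; sql/* and other
        # subdirectories have a different parent and are ignored.
        if path.parent == PACKAGE_DIR and path.name in FUZZED_MODULES:
            return True
    return False


def should_run_pr_fuzz(changed_paths: Iterable[str], is_draft: bool) -> bool:
    """Model of the PR gate: skip drafts, then apply the path filter."""
    if is_draft:
        return False
    return touches_fuzzed_code(changed_paths)
```

Under this model a ready-for-review PR that edits `aemo_mdff_reader/parser.py` is fuzzed, while a draft PR, or one that only touches files outside the enumerated set, is skipped.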
This was referenced May 10, 2026
cohenrobinson added a commit that referenced this pull request on May 10, 2026
Cuts a 2.2.1 patch so the in-toto provenance file (#28) and the SHA-pinned OSS-Fuzz base image (#29) ship in a tagged GitHub Release. Both were `ci:` commits that release-please skipped. Scorecard's Signed-Releases check will see the provenance bundle in v2.2.1's release assets and lift from 8/10 → 10/10. No source/behaviour change versus v2.2.0: wheel and sdist bytes are identical apart from version metadata.

Release-As: 2.2.1
Summary
Targeted at the two OpenSSF Scorecard checks that are tractable without external process changes (Code-Review/Contributors are structural; CII badge and Branch-Protection PAT need user action).
Fuzzing (0 → up to 10)
- fuzz/: three atheris harnesses targeting the streaming parser entry points (parse, parse_to_columns, parse_accumulations). Each catches the parser's documented exceptions (NEM12ParseError, ValueError, IndexError, KeyError, UnicodeDecodeError) so legitimate parser rejections aren't reported as crashes; anything else surfaces as a real bug.
- .clusterfuzzlite/: minimal Python project config (project.yaml, Dockerfile, build.sh) using OSS-Fuzz's base-builder-python image. build.sh runs compile_python_fuzzer on each fuzz/fuzz_*.py.
- .github/workflows/cflite_pr.yml: per-PR fuzz, 5 minutes per sanitizer (address + undefined), only on PRs touching parser / fuzz / cflite paths.
- .github/workflows/cflite_batch.yml: weekly corpus-extending run, Sundays 02:00 UTC. 30 minutes per sanitizer.
Signed-Releases (8 → 10)
The current pipeline already produces SLSA build provenance via actions/attest-build-provenance, but stores it in GitHub's attestations API rather than as a file. Scorecard's signed-releases check scans release assets, so the attestation is invisible to it. This PR captures the action's bundle-path output, copies the bundle to provenance/aemo_mdff_reader.intoto.jsonl, and attaches it to the GitHub Release alongside the existing dist/, signatures/, and sbom/ payloads. Score-wise this should close the 8/10 → 10/10 gap.
Out of scope (acknowledged)
- Branch-Protection: scorecard-action's GITHUB_TOKEN can't read classic protection rules; needs a fine-grained PAT secret (SCORECARD_TOKEN) that you create.
Test plan
- python -c "import ast; ..." syntax-checks all three fuzz harnesses
- ruff check fuzz/ and ruff format --check fuzz/ clean
- cflite_pr.yml will run on this PR itself (since .clusterfuzzlite/ and fuzz/ paths are in the trigger filter), exercising the build + 5-min crash search end-to-end
Expected score impact
Realistic ceiling ~9.3 — Code-Review (0) and Contributors (3) cap us until the project grows beyond a single org/maintainer.
🤖 Generated with Claude Code