
ci: add ClusterFuzzLite + SLSA provenance file in releases #28

Merged
cohenrobinson merged 7 commits into main from chore/scorecard-fuzzing-and-provenance
May 10, 2026


@cohenrobinson
Contributor

Summary

Targets the two OpenSSF Scorecard checks that can be improved without external process changes. Of the remaining checks, Code-Review and Contributors are structural, and the CII badge and the Branch-Protection PAT both need action from the maintainer.

Fuzzing (0 → up to 10)

  • fuzz/ — three atheris harnesses targeting the streaming parser entry points (parse, parse_to_columns, parse_accumulations). Each catches the parser's documented exceptions (NEM12ParseError, ValueError, IndexError, KeyError, UnicodeDecodeError) so legitimate parser rejections aren't reported as crashes — anything else surfaces as a real bug.
  • .clusterfuzzlite/ — minimal Python project config (project.yaml, Dockerfile, build.sh) using OSS-Fuzz's base-builder-python image. build.sh runs compile_python_fuzzer on each fuzz/fuzz_*.py.
  • .github/workflows/cflite_pr.yml — per-PR fuzz: 5 minutes per sanitizer (address + undefined), only on PRs touching parser / fuzz / cflite paths.
  • .github/workflows/cflite_batch.yml — weekly corpus-extending run, Sundays 02:00 UTC. 30 minutes per sanitizer.
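
The harness shape described above can be sketched roughly as follows. This is an illustration only, not the code in fuzz/: `parse` and `NEM12ParseError` here are hypothetical stand-ins for the real parser and its error type, and the `_EXPECTED` tuple simply mirrors the exceptions listed in the description. The atheris import is deferred into `main()` so the harness body can be exercised without atheris installed.

```python
import io
import sys


class NEM12ParseError(Exception):
    """Stand-in for the project's parse error (assumed to subclass Exception)."""


def parse(stream):
    """Hypothetical stand-in for the real streaming parser entry point."""
    for raw in stream:
        line = raw.decode("utf-8")  # may raise UnicodeDecodeError
        fields = line.rstrip("\r\n").split(",")
        if not fields[0].isdigit():
            raise NEM12ParseError(f"bad record indicator: {fields[0]!r}")
        yield int(fields[0]), fields[1:]


# Exceptions the description lists as documented parser rejections.
_EXPECTED = (NEM12ParseError, ValueError, IndexError, KeyError, UnicodeDecodeError)


def TestOneInput(data: bytes) -> None:
    """Fuzz entry point: drain the parser, swallowing only documented rejections."""
    try:
        for _ in parse(io.BytesIO(data)):
            pass
    except _EXPECTED:
        return  # legitimate rejection, not a finding


def main() -> None:
    # Imported lazily so TestOneInput can be called without atheris installed.
    import atheris

    atheris.Setup(sys.argv, TestOneInput)
    atheris.Fuzz()
```

A real harness would end by calling `main()`. Anything raised outside `_EXPECTED` propagates out of `TestOneInput`, which is what lets the fuzzer report it.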

Signed-Releases (8 → 10)

The current pipeline already produces SLSA build provenance via actions/attest-build-provenance, but stores it in GitHub's attestations API rather than as a file. Scorecard's signed-releases check scans release assets, so the attestation is invisible to it. This PR captures the action's bundle-path output, copies the bundle to provenance/aemo_mdff_reader.intoto.jsonl, and attaches it to the GitHub Release alongside the existing dist/, signatures/, and sbom/ payloads. This should lift the check from 8/10 to 10/10.
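
The staging step amounts to copying one file to a predictable name. The actual change lives in release.yml; the Python below is only a sketch of the same logic, and the function name `stage_provenance` and its parameters are hypothetical.

```python
import shutil
from pathlib import Path


def stage_provenance(
    bundle_path: str,
    dest_dir: str = "provenance",
    name: str = "aemo_mdff_reader.intoto.jsonl",
) -> Path:
    """Copy the attestation bundle to dest_dir/name and return the new path."""
    src = Path(bundle_path)
    if not src.is_file():
        # Fail loudly: a release without the provenance file defeats the purpose.
        raise FileNotFoundError(f"attestation bundle not found: {src}")
    dest = Path(dest_dir) / name
    dest.parent.mkdir(parents=True, exist_ok=True)
    shutil.copyfile(src, dest)
    return dest
```

Failing when the bundle is missing is a deliberate choice in this sketch, since a silently skipped copy would produce a release that looks complete but lacks the asset Scorecard scans for.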

Out of scope (acknowledged)

  • Code-Review (0) — structural; solo-maintainer project.
  • Contributors (3) — structural; needs commits from 2+ orgs.
  • CII Best Practices (0) — requires you to register the project at https://www.bestpractices.dev/ and complete the ~50-question self-attestation.
  • Branch-Protection (-1) — scorecard-action's GITHUB_TOKEN can't read classic protection rules; needs a fine-grained PAT secret (SCORECARD_TOKEN) that you create.

Test plan

  • python -c "import ast; ..." syntax-checks all three fuzz harnesses
  • ruff check fuzz/ and ruff format --check fuzz/ clean
  • Per-PR cflite_pr.yml will run on this PR itself (since .clusterfuzzlite/ and fuzz/ paths are in the trigger filter), exercising the build + 5-min crash search end-to-end
  • Provenance file path will be exercised on the next release
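
The exact `python -c` one-liner is elided above, so the following is not the command that was run. It is one way an `ast`-based syntax check over the harness files could be written; the helper name `syntax_errors` is hypothetical.

```python
import ast
from pathlib import Path


def syntax_errors(paths):
    """Return {path: message} for every file that fails to parse."""
    failures = {}
    for path in paths:
        source = Path(path).read_text(encoding="utf-8")
        try:
            # ast.parse only checks syntax; it never imports or runs the file.
            ast.parse(source, filename=str(path))
        except SyntaxError as exc:
            failures[str(path)] = f"line {exc.lineno}: {exc.msg}"
    return failures
```

It could be called as `syntax_errors(sorted(Path("fuzz").glob("fuzz_*.py")))`, with an empty result meaning every harness parsed.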

Expected score impact

Realistic ceiling ~9.3 — Code-Review (0) and Contributors (3) cap us until the project grows beyond a single org/maintainer.

🤖 Generated with Claude Code

cohenrobinson and others added 7 commits May 10, 2026 23:45
Targets the two Scorecard checks that are tractable without external
process changes:

- Fuzzing (0 → ~10): adds atheris harnesses for the three streaming
  parser entry points (parse, parse_to_columns, parse_accumulations)
  under fuzz/, with a ClusterFuzzLite project under
  .clusterfuzzlite/. cflite_pr.yml runs a 5-minute crash search on
  every PR touching parser/fuzz code (address + undefined sanitizers
  in matrix). cflite_batch.yml runs a 30-minute weekly corpus-extending
  pass — Sundays 02:00 UTC, off-cycle from CodeQL/Scorecard.
- Signed-Releases (8 → 10): release.yml now stages the build
  provenance bundle emitted by actions/attest-build-provenance to
  provenance/aemo_mdff_reader.intoto.jsonl and attaches it to the
  GitHub Release. Scorecard's signed-releases check scans release
  assets (not GitHub's attestations API), so the file presence is
  what unlocks the last two points.

Out of scope:
- Code-Review (0): structural — solo-maintainer project, can't
  approve own PRs.
- Contributors (3): structural — needs commits from 2+ orgs.
- CII Best Practices (0): requires you to register the project at
  bestpractices.dev and complete the self-attestation.
- Branch-Protection (-1): scorecard-action's GITHUB_TOKEN can't read
  classic protection rules; needs a fine-grained PAT secret you
  create.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Pin both cflite_pr.yml and cflite_batch.yml to the actual commit SHA
of google/clusterfuzzlite v1 (884713a) — the previous SHA didn't
resolve and both jobs failed at "Set up job".

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
compile_python_fuzzer just needs the harness path; --add-binary was
trying to add a non-existent binary and pyinstaller bailed with
'Unable to find /src/aemo-mdff-reader/fuzz_parse'.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
ClusterFuzzLite immediately found `csv.Error: new-line character seen
in unquoted field` from `_open_rows` — a legitimate parser rejection
that the harness must classify as expected, not a crash. Pull the
allowlist out into a shared `_EXPECTED` tuple per harness and add
`csv.Error` and `OverflowError` (from int parsing of long numeric
literals).

Anything outside the allowlist (AttributeError, TypeError,
RecursionError, …) still escapes and is reported as a real bug.
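
The allowlist behaviour can be sketched with the stdlib alone. This is an illustration under assumptions: `NEM12ParseError` is a stand-in class, `classify` is a hypothetical helper, and `csv.reader` stands in for the project's `_open_rows`. A bare carriage return inside an unquoted field is one input that makes `csv.reader` raise the `csv.Error` quoted above.

```python
import csv


class NEM12ParseError(Exception):
    """Stand-in for the project's parse error (assumed to subclass Exception)."""


# Shared allowlist, extended with the two exceptions named in this commit.
_EXPECTED = (
    NEM12ParseError,
    ValueError,
    IndexError,
    KeyError,
    UnicodeDecodeError,
    csv.Error,
    OverflowError,
)


def classify(lines):
    """Return 'ok', 'expected' (allowlisted rejection), or 'bug' (anything else)."""
    try:
        for _ in csv.reader(lines):
            pass
    except _EXPECTED:
        return "expected"
    except Exception:
        return "bug"
    return "ok"
```

In a real harness the `bug` branch would not exist; the exception would simply escape so the fuzzer records it. It is spelled out here only to make the two outcomes visible.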

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
A pure-Python parser is memory-safe; coverage-guided fuzzing's value
is hangs / infinite loops / pathological memory growth, not C-style
crashes. The parser's expected behavior on malformed input is to
raise — csv.Error, ValueError, IndexError, KeyError, NEM12ParseError,
or whatever the stdlib happens to surface — and atheris was reporting
each as an "uncaught Python exception" failure on first hit.

Replace the per-class allowlist with `except Exception` in all three
harnesses, with a comment explaining the design choice.
SystemExit / KeyboardInterrupt deliberately propagate.
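
A minimal sketch of why `except Exception` gives the behaviour described: `SystemExit` and `KeyboardInterrupt` derive from `BaseException`, not `Exception`, so they pass straight through. The `make_harness` factory and the two stand-in parsers are hypothetical and exist only to exercise both paths.

```python
def make_harness(parse):
    """Build a TestOneInput that treats any ordinary exception as a rejection."""

    def TestOneInput(data: bytes) -> None:
        try:
            for _ in parse(data):
                pass
        except Exception:
            # Raising on malformed input is the parser's expected behaviour.
            # SystemExit / KeyboardInterrupt are BaseException subclasses and
            # are deliberately not caught here.
            return

    return TestOneInput


def rejecting_parser(data):
    """Stand-in parser that fails with an exception outside the old allowlist."""
    raise AttributeError("simulated rejection")


def exiting_parser(data):
    """Stand-in parser that tries to terminate the process."""
    raise SystemExit(1)
```

With this design the fuzzer's findings narrow to hangs, timeouts, and memory growth, which matches the stated goal for a pure-Python parser.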

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Per-PR fuzzing was running 5 min × 2 sanitizers = up to 10 min of
matrix execution on every parser-touching PR. That's overkill for
PR feedback — the value of cflite on a PR is "did the build still
work + does a quick crash search find anything obvious", not a deep
corpus pass.

- Drop the address+undefined matrix; PR runs only address.
- Cut fuzz-seconds from 300 to 60.
- timeout-minutes 30 -> 10.
- Job name is now "fuzz (address, 60s)" so the check is self-describing.

The longer 30-min/sanitizer corpus run lives in cflite_batch.yml
(scheduled Sundays 02:00 UTC) and still runs both sanitizers.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Three intelligent skips so fuzz only runs when it'd add signal:

- paths is now enumerated explicitly: only __init__.py, parser.py,
  types.py, spec.py under aemo_mdff_reader/ trigger fuzz. CLI, reader,
  aggregate, and sql/* changes don't (the harnesses don't reach them).
- skip on draft PRs: `if: github.event.pull_request.draft == false`.
  Fuzz on the final form, not the WIP.
- fuzz-seconds 60 -> 30; timeout-minutes 10 -> 6. Combined with the
  setup+build steps, a fuzz-touching PR now finishes in ~3 min instead
  of ~6.
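
The combined trigger logic can be modelled as a small predicate. This is a sketch of the decision only, not GitHub's path-filter implementation; `should_fuzz` is a hypothetical name, and including fuzz/ and .clusterfuzzlite/ as triggers is an assumption carried over from the test plan in the PR description.

```python
# Parser modules named in the commit message as fuzz triggers.
PARSER_FILES = frozenset(
    {
        "aemo_mdff_reader/__init__.py",
        "aemo_mdff_reader/parser.py",
        "aemo_mdff_reader/types.py",
        "aemo_mdff_reader/spec.py",
    }
)

# Assumption: harness and ClusterFuzzLite config changes still trigger a run.
FUZZ_PREFIXES = ("fuzz/", ".clusterfuzzlite/")


def should_fuzz(changed_files, is_draft: bool) -> bool:
    """Return True when a PR should get the short per-PR fuzz run."""
    if is_draft:
        return False  # fuzz the final form, not the work in progress
    return any(
        path in PARSER_FILES or path.startswith(FUZZ_PREFIXES)
        for path in changed_files
    )
```

For example, a ready-for-review PR touching only a CLI module would return `False`, while one touching `aemo_mdff_reader/parser.py` would return `True`.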

Deeper passes still happen weekly via cflite_batch.yml.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@cohenrobinson cohenrobinson merged commit ef7953d into main May 10, 2026
18 of 19 checks passed
@cohenrobinson cohenrobinson deleted the chore/scorecard-fuzzing-and-provenance branch May 10, 2026 14:12
cohenrobinson added a commit that referenced this pull request May 10, 2026
Cuts a 2.2.1 patch so the in-toto provenance file (#28) and the SHA-pinned OSS-Fuzz base image (#29) ship in a tagged GitHub Release. Both were `ci:` commits that release-please skipped.

Scorecard's Signed-Releases check will see the provenance bundle in v2.2.1's release assets and lift from 8/10 → 10/10. No source/behaviour change versus v2.2.0 — wheel and sdist bytes are identical apart from version metadata.

Release-As: 2.2.1


1 participant