Skip to content

verify-action-build: skip binary-download check for pure data fetches#775

Open
potiuk wants to merge 1 commit intoapache:mainfrom
potiuk:verify-action-build-skip-data-fetches
Open

verify-action-build: skip binary-download check for pure data fetches#775
potiuk wants to merge 1 commit intoapache:mainfrom
potiuk:verify-action-build-skip-data-fetches

Conversation

@potiuk
Copy link
Copy Markdown
Member

@potiuk potiuk commented Apr 25, 2026

Summary

The binary-download check currently flags every tc.downloadTool / fetch / https.get in JS source that doesn't sit alongside a hash/signature pattern. That over-flags actions that fetch HTTP responses as data — a metadata badge regex'd for an integer, an API JSON parsed for a version number — where there is no binary or executable to verify in the first place.

Add a per-file heuristic that exempts pure data fetches from the check.

The false-positive that motivated this

PR #752 (dependabot/fetch-metadata 3.0.0 → 3.1.0) was flagged on src/dependabot/verified_commits.ts:99:

const svg = await new Promise<string>((resolve) => {
  https.get(`https://dependabot-badges.githubapp.com/badges/compatibility_score?...`, res => { ... })
})
const scoreChunk = svg.match(/<title>compatibility: (?<score>\d+)%<\/title>/m)
return scoreChunk?.groups ? parseInt(scoreChunk.groups.score) : 0

This is a computed-on-demand SVG badge — no possible "expected hash" exists, and a regex-extracted integer doesn't cross a security boundary. Adding a checksum upstream would be cargo-cult.

The heuristic

A file's download calls are exempted iff both hold:

  1. The file contains at least one data-parse marker: .match / .matchAll / .replace / .test / .split / .includes / .startsWith / .endsWith / .toLowerCase() / .toUpperCase() / JSON.parse / parseInt / parseFloat / Number(...).
  2. The file contains no binary-handling markers: tc.extractTar / tc.extractZip / tc.extract7z / tc.extractXar, tc.cacheFile / tc.cacheDir, fs.writeFile family, exec.exec / child_process / spawn, chmod calls, 'chmod +x' strings.

Per-file scope (not per-finding): a file that mixes a metadata fetch with a real archive download keeps the strict check on every download in it. Per-finding scoping would need a real AST to track which response gets passed to which use, and the conservative blanket rule is preferable to a flaky one.

Verified behaviour

Action Before After
dependabot/fetch-metadata@v3.1.0 (the false positive) ✗ 1 unverified JS download ✓ no downloads or all verified
leafo/gh-actions-luarocks@v6.1.0 (real tc.downloadTool + tc.extractTar) ✗ 1 unverified JS download ✗ 1 unverified JS download (still flagged — correct)

Test plan

  • 7 new unit tests in test_security.py::TestPureDataFetchExemption covering: pure-data-fetch exempted, real-binary still flagged, mixed-file not exempt, no-data-parse-marker not exempt, chmod +x string disables exemption, JSON.parse alone exempts, .split alone exempts.
  • Full suite: 146 tests pass.
  • Live smoke against dependabot/fetch-metadata@25dd0e34Binary download verification: ✓ (was ✗).
  • Live smoke against leafo/gh-actions-luarocks@35d062de — still ✗ (the real flag we want).

Generated-by: Claude Opus 4.7 (1M context)

The binary-download check flags any tc.downloadTool / fetch / https.get
in JS source unless the file also has a verification pattern (sha256,
gpg --verify, etc.). That over-flags actions that fetch HTTP responses
purely as data — a metadata badge regex'd for an integer, an API JSON
parsed for a version number — where there is no binary or executable
to verify in the first place.

Add a per-file heuristic: a download is exempt iff the file contains
data-parse markers (.match, .matchAll, JSON.parse, parseInt, .split,
.test, .replace, .startsWith, etc.) AND no binary-handling markers
(tc.extractTar/Zip/7z/Xar, tc.cacheFile/Dir, fs.writeFile family,
exec.exec, child_process, spawn, chmod). Both conditions must hold —
a file that mixes a metadata fetch and a real archive download keeps
the strict check, since per-finding scoping without a real AST would
be fragile.

The motivating false-positive was apache/infrastructure-actions PR
apache#752 (dependabot/fetch-metadata 3.0.0 → 3.1.0), which was flagged for
this code in src/dependabot/verified_commits.ts:

  https.get(`https://dependabot-badges.githubapp.com/.../compatibility_score?...`, res => { ... })
  const scoreChunk = svg.match(/<title>compatibility: (?<score>\d+)%/)
  return scoreChunk?.groups ? parseInt(scoreChunk.groups.score) : 0

A computed-on-demand badge with no possible 'expected hash' to pin
against, and a regex-extracted integer with no security boundary
crossed. With this change the file is recognised as a data fetch and
the check passes; meanwhile leafo/gh-actions-luarocks (real
tc.downloadTool + tc.extractTar) is still correctly flagged.
@potiuk potiuk requested review from dave2wave, dfoulks1 and raboof April 25, 2026 18:25
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant