feat(core): brotli-compress .socket.facts.json on full-scan upload #219

Merged: Martin Torp (mtorp) merged 1 commit into main from martin/eng-5093-tier-1-reachability-scans-failing-with-502-error on Jun 2, 2026


@mtorp
Contributor

Summary

Brotli-compress the reachability facts file (.socket.facts.json) before it is uploaded as part of a full scan. The Socket API transparently decompresses any multipart part whose basename is exactly .socket.facts.json.br and stores it as plain .socket.facts.json, so the stored result is unchanged — but the on-the-wire payload shrinks dramatically (typically ~10–40×).

Motivation

Large tier‑1 reachability facts files can exceed the API's per‑file upload size cap. When that happens the full‑scan upload fails (surfaced to the CLI as an HTTP 4xx/“502”), leaving the scan stuck with no report. Compressing on upload keeps even very large facts files well under the cap, relying on the API's existing transparent .br decompression.

What changed

  • Compression happens at the single upload boundary (Core.create_full_scan), so it covers every full‑scan path (normal, diff, API‑mode, --only-facts-file, pre‑generated SBOMs).
  • The on‑disk .socket.facts.json is left untouched — local consumers (SARIF/JSON output, tier‑1 finalize, alert selection) keep reading the plain file. Only the uploaded multipart part is swapped to a temporary .socket.facts.json.br sibling, which is deleted after upload.
  • Only a file whose basename is exactly .socket.facts.json is compressed (the API matches that exact name). A custom --reach-output-file name and empty baseline‑scan placeholder files are uploaded plain, as before.
  • Compression never blocks an upload: any failure (e.g. unwritable dir) falls back to uploading the plain file.
  • Streams the source in 1 MiB chunks so large files aren't held fully in memory.
  • Adds a brotli (CPython) / brotlicffi (PyPy) dependency. Patch version bump 2.3.0 → 2.3.1.

Testing

  • Unit (tests/core/test_facts_compression.py): roundtrip, multipart‑entry rewrite, directory‑prefix preservation, empty‑file skip, custom‑name skip, and compression‑failure fallback. Full suite: 261 passed, 2 skipped.
  • End‑to‑end: drove the real socketdev SDK + requests multipart encoder against a local capture server. Confirmed the server receives a part named exactly .socket.facts.json.br containing valid brotli that decompresses to the byte‑exact original facts JSON; the plain part is not sent; the temp file is cleaned up; the on‑disk file is preserved.

Not exercised in CI: the live server‑side decompression round‑trip (depends on the deployed API) and a real reachability (Coana) run — the analysis/generation code path is unchanged by this PR.
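
As a rough illustration of the multipart-entry rewrite, directory-prefix preservation, and temp-file cleanup listed above, the sketch below swaps the facts path in a list of upload paths and removes the temporary file afterwards. It is an assumption-laden sketch, not the PR's implementation: `facts_swapped` and its `compress` argument are hypothetical, and the real SDK's multipart entry shape is not shown in this PR.

```python
import os
from contextlib import contextmanager

FACTS_BASENAME = ".socket.facts.json"


@contextmanager
def facts_swapped(paths, compress):
    """Yield the upload list with the facts file replaced by its .br sibling.

    `compress` is a hypothetical callable: given the plain path, it returns
    the path of the compressed file it wrote, or None to keep the plain file.
    """
    upload, temps = [], []
    for p in paths:
        compressed = None
        if os.path.basename(p) == FACTS_BASENAME:
            try:
                compressed = compress(p)
            except Exception:
                compressed = None  # never block the upload on a compression error
        if compressed is None:
            upload.append(p)
        else:
            upload.append(compressed)  # same directory prefix, new basename
            temps.append(compressed)
    try:
        yield upload
    finally:
        # Delete the temporary .br files whether or not the upload succeeded.
        for t in temps:
            try:
                os.remove(t)
            except OSError:
                pass
```

Wrapping the upload in a context manager is one way to guarantee that the temporary sibling is deleted even if the upload raises. Other designs, such as an explicit `try`/`finally` in the upload function, would work equally well.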

🤖 Generated with Claude Code

@socket-security

socket-security Bot commented Jun 2, 2026

Review the following changes in direct dependencies. Learn more about Socket for GitHub.

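| Diff | Package | Supply Chain Security | Vulnerability | Quality | Maintenance | License |
| --- | --- | --- | --- | --- | --- | --- |
| Added | pypi/brotli@1.2.0 | 100 | 100 | 100 | 100 | 100 |
| Added | pypi/brotlicffi@1.2.0.1 | 100 | 100 | 100 | 100 | 100 |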

View full report

@github-actions

github-actions Bot commented Jun 2, 2026

🚀 Preview package published!

Install with:

pip install --index-url https://test.pypi.org/simple/ --extra-index-url https://pypi.org/simple socketsecurity==2.3.1.dev3

Docker image: socketdev/cli:pr-219


Compress the reachability facts file to a `.socket.facts.json.br` multipart
part before uploading it as part of a full scan. The Socket API transparently
decompresses parts named exactly `.socket.facts.json.br` and stores plain JSON,
so the stored result is unchanged while the on-the-wire payload shrinks by
roughly 10-40x for typical facts files.

This keeps large tier-1 reachability facts files under the API's per-file
upload size cap. Previously an oversized facts file made the full-scan upload
fail (surfaced as an HTTP 4xx/502 with the scan stuck and no report produced).

- Compress at the upload boundary (Core.create_full_scan); the on-disk file is
  left untouched so local consumers still read plain .socket.facts.json.
- Only files whose basename is exactly .socket.facts.json are compressed (the
  API matches that exact name); a custom --reach-output-file name and empty
  placeholder files are left as plain uploads.
- Stream in 1 MiB chunks so large files aren't held fully in memory.
- Never blocks an upload: any compression failure falls back to the plain file,
  and a partially-written .socket.facts.json.br is removed rather than left
  behind in the target directory.
- Add brotli (CPython) / brotlicffi (PyPy) dependency.
Martin Torp (mtorp) force-pushed the martin/eng-5093-tier-1-reachability-scans-failing-with-502-error branch from 9eb52f5 to 52a5a7e on June 2, 2026 11:12
Martin Torp (mtorp) marked this pull request as ready for review on June 2, 2026 11:32
Martin Torp (mtorp) requested a review from a team as a code owner on June 2, 2026 11:32
Martin Torp (mtorp) merged commit 152ea21 into main on Jun 2, 2026
29 checks passed