ci(pr-leakage): add reusable workflow that scans PRs for customer-data leaks#85

Closed
btipling wants to merge 1 commit into main from bt/pr-leakage-check

@btipling
Contributor

Summary

Add a reusable workflow and stdlib Python scanner that mechanically block PRs whose title, body, or commit messages contain customer-identifying data, internal tenant identifiers, internal service names, or internal infra URLs.

What changed

  • Add .github/workflows/pr-leakage-check.yaml, a workflow_call-shaped reusable workflow. It takes pr_number, fetches the title, body, and commit messages via gh pr view --json title,body,commits, and runs the scanner.
  • Add .github/scripts/pr_leakage_scan.py — pure-stdlib scanner. Twelve always-on regexes and five context-sensitive rules with adjacency or quote-or-error gating. Supports an --expect-fail inversion for regression-fixture use.
  • Add .github/pr-leakage-banned-tokens.yaml — externalized rule set so updates are a one-file PR here; consumer stubs pin @main and pick up changes on the next run.
  • Add .github/pr-leakage-customer-names.txt — whole-word, case-insensitive customer-name denylist seeded from the captured fixtures.
  • Add .github/pr-leakage-skip-allowlist.txt — empty file documenting the skip-token escape hatch (see the README for the literal token and allowlist semantics).
  • Add .github/workflows/pr-leakage-self-test.yaml — runs the scanner against three captured leaky fixtures with --expect-fail and against a clean-fixture set without the flag. Triggers on push to main and on PRs that touch any pr-leakage file.
  • Add tests/fixtures/leakage/{781,863,865}.txt plus a clean/ subfolder.
  • Update README.md with consumer wiring instructions.
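
As a rough illustration of the three rule kinds listed above (always-on regexes, context-gated rules, and the whole-word customer-name denylist), here is a minimal sketch. It is not the code in pr_leakage_scan.py: the patterns, the 40-character adjacency window, the rule names, and the helpers flatten_pr_payload, compile_names, and scan are all hypothetical. The only assumption about gh is that gh pr view --json title,body,commits returns commit objects carrying messageHeadline and messageBody.

```python
import json
import re

# Hypothetical always-on rule: any match is a finding.
ALWAYS_ON = {
    "internal-url": re.compile(
        r"https?://[\w.-]+\.internal\.example\b", re.IGNORECASE
    ),
}

# Hypothetical context-sensitive rule: the token only counts when a trigger
# word appears within 40 characters of it on the same line (adjacency gating).
SERVICE_TOKEN = re.compile(r"\bbilling-sync\b", re.IGNORECASE)
TRIGGER = re.compile(r"\b(?:service|pod|cluster)\b", re.IGNORECASE)


def flatten_pr_payload(payload):
    """Join title, body, and commit messages from `gh pr view --json` output."""
    data = json.loads(payload)
    parts = [data.get("title"), data.get("body")]
    for commit in data.get("commits", []):
        parts.append(commit.get("messageHeadline"))
        parts.append(commit.get("messageBody"))
    return "\n".join(p for p in parts if p)


def compile_names(names):
    """Build one whole-word, case-insensitive pattern from a denylist."""
    cleaned = [re.escape(n.strip()) for n in names if n.strip()]
    if not cleaned:
        return None
    return re.compile(r"\b(?:" + "|".join(cleaned) + r")\b", re.IGNORECASE)


def scan(text, customer_names=()):
    """Return (line_number, rule_name) pairs for every finding in text."""
    findings = []
    names_re = compile_names(customer_names)
    for lineno, line in enumerate(text.splitlines(), start=1):
        for rule, pattern in ALWAYS_ON.items():
            if pattern.search(line):
                findings.append((lineno, rule))
        if names_re and names_re.search(line):
            findings.append((lineno, "customer-name"))
        match = SERVICE_TOKEN.search(line)
        if match:
            window = line[max(0, match.start() - 40): match.end() + 40]
            if TRIGGER.search(window):
                findings.append((lineno, "internal-service"))
    return findings
```

In this sketch a bare mention of the hypothetical token billing-sync passes, while the same token next to "service" is flagged. That is the point of gating: it keeps ordinary words from tripping the check on their own.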

Why

config-validation — Public connector repos must not name a specific customer or expose internal service topology in a permanent, world-readable artifact. The motivating examples are three real PRs whose bodies and commit messages named a customer and quoted internal service names.

The check is reusable across every baton-* repo via a small caller stub. The companion stub PR wires it up on ConductorOne/baton-sdk as the pilot consumer.

Test plan

  • python3 .github/scripts/pr_leakage_scan.py --tokens ... --input tests/fixtures/leakage/<N>.txt for each of the three captured leaky fixtures — all three produce findings.
  • python3 .github/scripts/pr_leakage_scan.py --tokens ... --input tests/fixtures/leakage/clean/<N>.txt for each clean fixture — all pass.
  • The scanner was run against this PR's own title and body before opening and produced zero findings.
  • The self-test workflow encodes both checks under .github/workflows/pr-leakage-self-test.yaml so future rule changes cannot silently weaken the regex set.
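
The --expect-fail inversion used by the self-test can be pictured with the sketch below. It is an assumption about the flag's semantics, not the scanner's actual code: the exit codes 0 and 1, the placeholder pattern, and the helpers count_findings and exit_code are hypothetical, and the --tokens argument from the commands above is left out for brevity.

```python
import argparse
import re

# Placeholder rule standing in for the real rule set (assumption).
PLACEHOLDER = re.compile(r"\bexample-customer\b", re.IGNORECASE)


def count_findings(text):
    """Count matches of the placeholder rule in text."""
    return len(PLACEHOLDER.findall(text))


def exit_code(findings, expect_fail):
    """Map a finding count to a process exit code.

    Normal mode: 0 when clean, 1 when anything was found.
    With expect_fail the result is inverted, so a leaky fixture that
    stops producing findings makes the run fail.
    """
    found = findings > 0
    if expect_fail:
        return 0 if found else 1
    return 1 if found else 0


def main(argv=None):
    """Parse arguments, scan the input file, and return the exit code."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--input", required=True)
    parser.add_argument("--expect-fail", action="store_true")
    args = parser.parse_args(argv)
    with open(args.input, encoding="utf-8") as fh:
        text = fh.read()
    # A real script would pass this value to sys.exit().
    return exit_code(count_findings(text), args.expect_fail)
```

Under these assumptions, a rule change that stops matching a captured leaky fixture turns a passing --expect-fail run into a failing one, which is how the self-test would catch a weakened regex set.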

Risk

low — No source code changes outside the pr-leakage feature. The reusable workflow is opt-in per consumer repo and ships with if: github.repository != 'ConductorOne/github-workflows' so it does not loop on this repo's own PRs. Branch-protection enforcement is intentionally out of scope and is a follow-up.

Follow-ups

  • After bake-in, a repo admin adds pr-leakage / check to the consumer's required checks.
  • The caller stub currently pins @main; pin to a tagged ref once this repo cuts one.

Refs: ConductorOne/baton-sdk#781, ConductorOne/baton-sdk#863, ConductorOne/baton-sdk#865

…a leaks

What: Add reusable workflow pr-leakage-check.yaml, stdlib Python scanner
      pr_leakage_scan.py, externalized banned-tokens YAML, customer-name
      denylist, skip-allowlist, and a self-test workflow with leaky and clean
      fixtures.
How:  Workflow is invoked via workflow_call from a per-repo caller stub;
      scanner pulls title, body, and commit messages via gh pr view and runs
      twelve always-on regexes plus five context-sensitive rules.
Why:  config-validation — Public connector repos must not name a specific
      customer or expose internal service topology in a permanent
      world-readable artifact.

Refs: ConductorOne/baton-sdk#781, ConductorOne/baton-sdk#863, ConductorOne/baton-sdk#865
@btipling
Contributor Author

btipling commented May 24, 2026

Closing —

@btipling btipling closed this May 24, 2026
@btipling btipling deleted the bt/pr-leakage-check branch May 24, 2026 20:35