ci(pr-leakage): add reusable workflow that scans PRs for customer-data leaks #85
Closed
btipling wants to merge 1 commit into
What: Add reusable workflow pr-leakage-check.yaml, stdlib Python scanner
pr_leakage_scan.py, externalized banned-tokens YAML, customer-name
denylist, skip-allowlist, and a self-test workflow with leaky and clean
fixtures.
How: Workflow is invoked via workflow_call from a per-repo caller stub;
scanner pulls title, body, and commit messages via gh pr view and runs
twelve always-on regexes plus five context-sensitive rules.
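The fetch step described above can be sketched roughly as follows. This is a minimal illustration, not the PR's actual scanner code: the helper names `fetch_pr_json` and `pr_text_from_json` are hypothetical, and the `messageHeadline` / `messageBody` keys reflect how `gh` is understood to serialize commits, which should be verified against real `gh pr view --json commits` output.

```python
import json
import subprocess


def fetch_pr_json(pr_number):
    """Return the raw JSON that gh prints for one PR (hypothetical helper)."""
    result = subprocess.run(
        ["gh", "pr", "view", str(pr_number), "--json", "title,body,commits"],
        check=True,
        capture_output=True,
        text=True,
    )
    return result.stdout


def pr_text_from_json(raw):
    """Flatten title, body, and commit messages into one scannable string."""
    data = json.loads(raw)
    parts = [data.get("title") or "", data.get("body") or ""]
    for commit in data.get("commits") or []:
        # Assumed field names; check them against real gh output.
        parts.append(commit.get("messageHeadline") or "")
        parts.append(commit.get("messageBody") or "")
    # Drop empty pieces so a missing body does not add blank lines.
    return "\n".join(p for p in parts if p)
```

Keeping the JSON parsing separate from the `gh` call lets the flattening logic be exercised against fixture files without network access or a GitHub token.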
Why: config-validation — Public connector repos must not name a specific
customer or expose internal service topology in a permanent
world-readable artifact.
Refs: ConductorOne/baton-sdk#781, ConductorOne/baton-sdk#863, ConductorOne/baton-sdk#865
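For the customer-name denylist mentioned in the commit message, a whole-word, case-insensitive match could look something like the sketch below. The file format (one name per line, `#` comments ignored) and the function names are assumptions for illustration; the names used in any example input are made up and do not come from the real denylist.

```python
import re


def load_denylist(text):
    """Parse denylist text: one name per line, blank lines and # comments skipped (assumed format)."""
    names = []
    for line in text.splitlines():
        line = line.strip()
        if line and not line.startswith("#"):
            names.append(line)
    return names


def compile_denylist(names):
    """Build one case-insensitive, whole-word pattern, or None for an empty list."""
    if not names:
        return None
    # Longest names first so a multi-word name wins over a shorter prefix.
    ordered = sorted(names, key=len, reverse=True)
    alternation = "|".join(re.escape(name) for name in ordered)
    # Lookarounds instead of \b so names that start or end with punctuation still match.
    return re.compile(r"(?<!\w)(?:" + alternation + r")(?!\w)", re.IGNORECASE)


def find_customer_names(pattern, text):
    """Return every denylisted name found in text, as it was written there."""
    if pattern is None:
        return []
    return [match.group(0) for match in pattern.finditer(text)]
```

The whole-word guard is what keeps a short customer name from firing on an unrelated longer word that happens to contain it.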
Closing — |
Summary
Add a reusable workflow and stdlib Python scanner that mechanically block PRs whose title, body, or commit messages contain customer-identifying data, internal tenant identifiers, internal service names, or internal infra URLs.
What changed
- `.github/workflows/pr-leakage-check.yaml`: `workflow_call`-shaped reusable workflow. Takes `pr_number`, fetches the title, body, and commit messages via `gh pr view --json title,body,commits`, and runs the scanner.
- `.github/scripts/pr_leakage_scan.py`: pure-stdlib scanner. Twelve always-on regexes and five context-sensitive rules with adjacency or quote-or-error gating. Supports an `--expect-fail` inversion for regression-fixture use.
- `.github/pr-leakage-banned-tokens.yaml`: externalized rule set so updates are a one-file PR here; consumer stubs pin `@main` and pick up changes on the next run.
- `.github/pr-leakage-customer-names.txt`: whole-word, case-insensitive customer-name denylist seeded from the captured fixtures.
- `.github/pr-leakage-skip-allowlist.txt`: empty file documenting the skip-token escape hatch (see the README for the literal token and allowlist semantics).
- `.github/workflows/pr-leakage-self-test.yaml`: runs the scanner against three captured leaky fixtures with `--expect-fail` and against a clean-fixture set without the flag. Triggers on push to main and on PRs that touch any pr-leakage file.
- `tests/fixtures/leakage/{781,863,865}.txt` plus a `clean/` subfolder.
- `README.md` with consumer wiring instructions.

Why
config-validation — Public connector repos must not name a specific customer or expose internal service topology in a permanent, world-readable artifact. The motivating examples are three real PRs whose bodies and commit messages named a customer and quoted internal service names.
The check is reusable across every `baton-*` repo via a small caller stub. The companion stub PR wires it up on `ConductorOne/baton-sdk` as the pilot consumer.

Test plan
- `python3 .github/scripts/pr_leakage_scan.py --tokens ... --input tests/fixtures/leakage/<N>.txt` for each of the three captured leaky fixtures: all three produce findings.
- `python3 .github/scripts/pr_leakage_scan.py --tokens ... --input tests/fixtures/leakage/clean/<N>.txt` for each clean fixture: all pass.
- Both fixture sets run in `.github/workflows/pr-leakage-self-test.yaml` so future rule changes cannot silently weaken the regex set.

Risk
Low: no source code changes outside the pr-leakage feature. The reusable workflow is opt-in per consumer repo and ships with `if: github.repository != 'ConductorOne/github-workflows'` so it does not loop on this repo's own PRs. Branch-protection enforcement is intentionally out of scope and is a follow-up.

Follow-ups
- Add `pr-leakage / check` to the consumer's required checks.
- Consumer stubs pin `@main`; pin to a tagged ref once this repo cuts one.

Refs: ConductorOne/baton-sdk#781, ConductorOne/baton-sdk#863, ConductorOne/baton-sdk#865
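The `--expect-fail` inversion used by the self-test can be pictured as a small exit-code rule. The sketch below is an assumption about the behaviour implied by the description (leaky fixtures must produce findings, clean fixtures must not), not the scanner's actual code; the function name `exit_code` is hypothetical.

```python
def exit_code(findings, expect_fail=False):
    """Map scan findings to a process exit code (0 = pass, 1 = fail)."""
    has_findings = bool(findings)
    if expect_fail:
        # Regression fixtures: the run passes only if the leak is still caught.
        return 0 if has_findings else 1
    # Normal PR scan: any finding fails the check.
    return 1 if has_findings else 0
```

With this shape, a rule change that stops matching a captured leaky fixture turns the self-test red, which is the "cannot silently weaken the regex set" property the test plan relies on.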