## Summary
Add a fixture-file corpus test harness to the repository so that every masking rule can be regression-tested against an arbitrarily large set of (input, expected_output) pairs without touching Go source code. New edge-cases reported by users or found during audits can be added by appending a single line to a text file — no code change required.
## Background and motivation
The current test pyramid is solid:
- Unit tests (rules_*_test.go): cover the happy path and a curated set of fail-closed cases per rule.
- BDD scenarios (tests/bdd/features/*.feature): machine-readable specification examples from the requirements document; run with godog under the bdd build tag.
- Matrix tests (rules_matrices_test.go): cross-cutting idempotency and mask-character-override contracts for identity, financial, and health categories.
- Fuzz targets (rules_fuzz_test.go): invariant-checking (no panic, valid UTF-8, fail-closed) for parsing-heavy rules including email_address, phone_number, url, iban, postal_code, jwt_token, ipv6_address.
What is missing is a high-volume, human-editable fixture corpus. The BDD scenarios are excellent specifications, but they are deliberately minimal — each scenario demonstrates a distinct behavioural rule, not exhaustive format coverage. The unit tests are comprehensive for the inputs the original author thought of, but real-world data quickly surfaces formats nobody anticipated:
- Phone numbers: 00352 vs +352, 352 bare, leading zeros, extension suffixes (+1-212-555-0100 x42), ITU-T vs NANP notation, dot separators (+1.212.555.0100), parentheses variants ((0044) 7911 123456), non-ASCII digits (Arabic-Indic ٠٧٩١١).
- IBANs: space-separated vs compact, lowercase country codes, total lengths from 15 to 34 characters depending on country.
- Payment card PANs: 13-digit Visa, 15-digit Amex, 19-digit Maestro, formatted with spaces/hyphens/none.
- Postal codes: every country's regex variants, leading zeros, extended ZIP+4, lowercase, mixed case.
- IP addresses: leading-zero octets, IPv6 abbreviations, zone IDs, mapped addresses.
- Dates of birth: all three spec formats (ISO, slash, month-name), partial dates, edge years.
When a new real-world input is reported as incorrectly masked, there is currently no lightweight way to pin the fix. The fix goes into the unit test, which requires a Go code change, a PR, and a review cycle. A plain-text fixture file lowers this friction to near zero.
## Proposed design
### Directory layout
tests/
  corpus/
    phone_number.txt
    mobile_phone_number.txt
    email_address.txt
    payment_card_pan.txt
    payment_card_pan_first6.txt
    payment_card_pan_last4.txt
    payment_card_cvv.txt
    payment_card_pin.txt
    iban.txt
    swift_bic.txt
    bank_account_number.txt
    uk_sort_code.txt
    us_aba_routing_number.txt
    monetary_amount.txt
    us_ssn.txt
    ca_sin.txt
    uk_nino.txt
    in_aadhaar.txt
    in_pan.txt
    au_medicare_number.txt
    sg_nric_fin.txt
    br_cpf.txt
    br_cnpj.txt
    mx_curp.txt
    mx_rfc.txt
    cn_resident_id.txt
    za_national_id.txt
    es_dni_nif_nie.txt
    ipv4_address.txt
    ipv6_address.txt
    mac_address.txt
    hostname.txt
    url.txt
    url_credentials.txt
    jwt_token.txt
    api_key.txt
    password.txt
    uuid.txt
    imei.txt
    imsi.txt
    msisdn.txt
    postal_code.txt
    geo_latitude.txt
    geo_longitude.txt
    geo_coordinates.txt
    date_of_birth.txt
    person_name.txt
    given_name.txt
    family_name.txt
    street_address.txt
    username.txt
    passport_number.txt
    driver_license_number.txt
    generic_national_id.txt
    tax_identifier.txt
    medical_record_number.txt
    health_plan_beneficiary_id.txt
    medical_device_identifier.txt
    diagnosis_code.txt
    prescription_text.txt
One file per rule. The rule name is the filename stem — the loader derives the rule name directly from the stem, so no mapping table is needed.
### Fixture file format
Plain UTF-8 text, one pair per line:
# comment lines (start with #) and blank lines are ignored
<TAB-separated> input<TAB>expected_output
Example — phone_number.txt:
# E.164 — international dialling prefix
+44 7911 123456 +44 **** **3456
+1-800-555-0199 +1-***-***-0199
+33 1 42 86 83 26 +33 * ** ** **26
+352 26 12 34 +352 ** **34
# NANP local format
(555) 123-4567 (***) ***-4567
555-123-4567 ***-***-4567
# UK domestic (no + prefix)
07911 123456 ***** **3456
# 00-prefix international: with spaces (no + prefix, fail-closed)
00352 26 12 34 **************
0033 1 42 86 83 26 ******************
0044 7911 123456 ****************
# 00-prefix international — compact no spaces (also fail-closed — no + prefix)
00352261234 ***********
00441234567890 **************
# Dot separator
+1.212.555.0100 +1.***.***.0100
# Fail-closed cases
1-800-FLOWERS *************
+ *
+44 ***
# Arabic-Indic digits (fail-closed, one mask character per rune)
٠٧٩١١ *****
Rules:
- Lines beginning with # (after optional whitespace) are ignored.
- Blank lines are ignored. A line consisting only of a tab is not blank: it is the empty-input, empty-output pair.
- Fields are separated by a single tab (\t). Tab was chosen over comma because masked output can contain commas, and over pipe because the library targets log scrubbing, where pipes appear in DSNs and connection strings.
- The input field may be empty (an empty string before the tab); this encodes the "" → "" contract.
- The expected-output field may also be empty.
- A line with no tab at all is a format error and must fail the test loudly (not silently skip).
- Trailing newline is optional; \r\n line endings are normalised.
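The format rules above can be sketched as a single line-classifying function. This is a minimal illustration, not the harness itself: `parseFixtureLine` and `errNoTab` are hypothetical names, and treating any line whose first non-whitespace character is `#` as a comment (even if it contains a tab) is an assumption.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// errNoTab reports a data line that lacks the tab separator (hypothetical name).
var errNoTab = errors.New("malformed fixture line: no tab separator")

// parseFixtureLine classifies one raw line of a fixture file.
// skip is true for blank and comment lines; otherwise the line is split
// on the first tab into input and expected output.
func parseFixtureLine(raw string) (input, want string, skip bool, err error) {
	raw = strings.TrimSuffix(raw, "\r") // normalise \r\n endings
	trimmed := strings.TrimSpace(raw)
	if strings.HasPrefix(trimmed, "#") {
		return "", "", true, nil // comment line
	}
	if !strings.Contains(raw, "\t") {
		if trimmed == "" {
			return "", "", true, nil // blank line
		}
		return "", "", false, errNoTab // data without a separator: fail loudly
	}
	// A lone tab reaches this point and yields the "" -> "" pair.
	input, want, _ = strings.Cut(raw, "\t")
	return input, want, false, nil
}

func main() {
	for _, line := range []string{"+44\t***", "\t", "# comment", "", "no separator"} {
		in, want, skip, err := parseFixtureLine(line)
		fmt.Printf("%q -> in=%q want=%q skip=%v err=%v\n", line, in, want, skip, err)
	}
}
```

Splitting on the first tab only means the expected output may itself contain tabs, which matches the `strings.SplitN(raw, "\t", 2)` call in the runner.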
### Go test runner
New file: tests/corpus/corpus_test.go
//go:build corpus

package corpus_test

import (
	"bufio"
	"os"
	"path/filepath"
	"strings"
	"testing"
	"unicode/utf8"

	"github.com/axonops/mask"
)

// TestCorpus iterates every tests/corpus/*.txt file, derives the rule
// name from the filename stem, and asserts that mask.Apply(rule, input)
// == expected for each non-comment, non-blank line.
func TestCorpus(t *testing.T) {
	t.Parallel()
	// go test runs with the package directory as the working directory,
	// so a relative glob finds the fixture files next to this test.
	files, err := filepath.Glob("*.txt")
	if err != nil {
		t.Fatal(err)
	}
	if len(files) == 0 {
		t.Fatal("no corpus files found in tests/corpus/")
	}
	for _, path := range files {
		path := path
		rule := strings.TrimSuffix(filepath.Base(path), ".txt")
		t.Run(rule, func(t *testing.T) {
			t.Parallel()
			runCorpusFile(t, rule, path)
		})
	}
}

func runCorpusFile(t *testing.T, rule, path string) {
	t.Helper()
	f, err := os.Open(path)
	if err != nil {
		t.Fatalf("open %s: %v", path, err)
	}
	defer f.Close()

	lineNo := 0
	passed, failed := 0, 0
	sc := bufio.NewScanner(f)
	for sc.Scan() {
		lineNo++
		raw := sc.Text()
		// Normalise Windows line endings.
		raw = strings.TrimRight(raw, "\r")
		// Skip comment lines and blank lines. A line that contains a tab
		// is never blank: a lone tab encodes the "" -> "" contract.
		trimmed := strings.TrimSpace(raw)
		if strings.HasPrefix(trimmed, "#") || (trimmed == "" && !strings.Contains(raw, "\t")) {
			continue
		}
		parts := strings.SplitN(raw, "\t", 2)
		if len(parts) != 2 {
			t.Errorf("%s:%d: malformed line (no tab separator): %q", path, lineNo, raw)
			failed++
			continue
		}
		input, want := parts[0], parts[1]
		got := mask.Apply(rule, input)
		if got != want {
			t.Errorf("%s:%d:\n rule: %s\n input: %q\n want: %q\n got: %q",
				path, lineNo, rule, input, want, got)
			failed++
		} else {
			passed++
		}
		// Invariant: output is always valid UTF-8.
		if !utf8.ValidString(got) {
			t.Errorf("%s:%d: rule %s produced invalid UTF-8 for input %q: % x",
				path, lineNo, rule, input, []byte(got))
		}
	}
	if err := sc.Err(); err != nil {
		t.Fatalf("%s: scanner error: %v", path, err)
	}
	t.Logf("corpus %s: %d passed, %d failed (total %d)", rule, passed, failed, passed+failed)
	if failed > 0 {
		t.Logf("To update: edit %s and re-run 'make test-corpus'", path)
	}
	if passed+failed == 0 {
		t.Errorf("%s: file contained no test cases; add at least one fixture or delete the file", path)
	}
}

// TestCorpusCompleteness verifies that every rule registered in
// rule_names.go has a corresponding corpus fixture file. This prevents
// adding a new masking rule without a corpus file.
func TestCorpusCompleteness(t *testing.T) {
	files, err := filepath.Glob("*.txt")
	if err != nil {
		t.Fatal(err)
	}
	have := make(map[string]bool, len(files))
	for _, f := range files {
		have[strings.TrimSuffix(filepath.Base(f), ".txt")] = true
	}
	for _, name := range mask.RuleNames() {
		if !have[name] {
			t.Errorf("rule %q has no corpus file: create tests/corpus/%s.txt", name, name)
		}
	}
}

// TestCorpusUnknownRule pins the behaviour the harness relies on for a
// misnamed corpus file: mask.Apply with an unregistered rule name must
// not return the input unchanged, so fixture lines in such a file surface
// as mismatches instead of passing silently.
func TestCorpusUnknownRule(t *testing.T) {
	unknown := "zz_unknown_rule_for_test"
	result := mask.Apply(unknown, "anything")
	// mask.Apply returns full-redact for unknown rules. Only check that
	// the input does not come back unchanged, so this test does not
	// depend on an unexported constant.
	if result == "anything" {
		t.Fatal("expected full-redact for unknown rule, got original value back")
	}
}

// BenchmarkCorpus_PhoneNumber provides a rough throughput figure for the
// phone_number corpus to catch any unexpected O(n²) behaviour introduced
// when the corpus grows.
func BenchmarkCorpus_PhoneNumber(b *testing.B) {
	f, err := os.Open("phone_number.txt")
	if err != nil {
		b.Skip("phone_number.txt not present")
	}
	defer f.Close()

	type pair struct{ in, want string }
	var cases []pair
	sc := bufio.NewScanner(f)
	for sc.Scan() {
		raw := strings.TrimRight(sc.Text(), "\r")
		// Same skip rule as runCorpusFile: a lone tab is a data line.
		trimmed := strings.TrimSpace(raw)
		if strings.HasPrefix(trimmed, "#") || (trimmed == "" && !strings.Contains(raw, "\t")) {
			continue
		}
		parts := strings.SplitN(raw, "\t", 2)
		if len(parts) == 2 {
			cases = append(cases, pair{parts[0], parts[1]})
		}
	}
	if len(cases) == 0 {
		b.Skip("no cases")
	}
	b.ResetTimer()
	for i := 0; i < b.N; i++ {
		c := cases[i%len(cases)]
		got := mask.Apply("phone_number", c.in)
		if got != c.want {
			b.Fatalf("mismatch: input=%q want=%q got=%q", c.in, c.want, got)
		}
	}
}
Note on the corpus build tag: This test file is gated behind //go:build corpus, meaning go test ./... will NOT run corpus tests. This is intentional — corpus tests are a separate quality gate. Contributors must use make test-corpus (or make check, which includes it). The CONTRIBUTING.md note (see below) must make this explicit.
Note on TestCorpusCompleteness: This test requires a mask.RuleNames() function that returns all registered built-in rule names. If this function does not already exist, it must be added as part of this issue. It should return a []string of all built-in rule name constants from rule_names.go.
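One possible shape for `RuleNames()` is sketched below. Everything except the function name and the `RuleXxx` naming pattern is an assumption: the three constants stand in for the real list in rule_names.go, `builtinRules` is a hypothetical registry variable, and returning a sorted copy is a design suggestion rather than existing behaviour.

```go
package main

import (
	"fmt"
	"sort"
)

// Hypothetical subset of the rule-name constants; the real list lives in
// rule_names.go and is much longer.
const (
	RulePhoneNumber  = "phone_number"
	RuleIBAN         = "iban"
	RuleEmailAddress = "email_address"
)

// builtinRules is an assumed package-level registry of built-in rules.
var builtinRules = []string{RulePhoneNumber, RuleIBAN, RuleEmailAddress}

// RuleNames returns the built-in rule names as a sorted copy, so callers
// get deterministic output and cannot mutate the registry.
func RuleNames() []string {
	out := make([]string, len(builtinRules))
	copy(out, builtinRules)
	sort.Strings(out)
	return out
}

func main() {
	fmt.Println(RuleNames())
}
```

Returning a copy keeps `TestCorpusCompleteness` free of side effects, and the sort makes its failure messages stable from run to run.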
Note on TestCorpusUnknownRule: The original version referenced mask.FullRedactMarker. To avoid depending on an unexported or potentially non-existent constant, the test instead verifies the result is not the original input value. If FullRedactMarker is an exported constant, prefer using it directly.
### Makefile targets
CORPUS_PKG := ./tests/corpus/...

.PHONY: test-corpus
test-corpus: ## Run corpus fixture tests
	@if [ -d tests/corpus ]; then \
		cd tests/corpus && $(GO) test -race -count=1 -tags corpus .; \
	else \
		echo "tests/corpus not present yet; skipping corpus run"; \
	fi

.PHONY: check
check: fmt-check vet lint tidy-check test test-bdd test-corpus coverage security
Add test-corpus to the check target so that the corpus runs as part of the full quality gate.
### CI workflow addition
In .github/workflows/ci.yml, add a job alongside the existing bdd job:
  corpus:
    name: Corpus fixture tests
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-go@v5
        with:
          go-version-file: go.mod
      - name: Run corpus tests
        run: make test-corpus
Ensure the CI step runs from the repo root (the Makefile cd tests/corpus && handles the working directory).
## Initial seed fixtures: priority matrix
The table below ranks categories by their real-world format surface area and therefore the return-on-investment for seeding fixtures first.
| Priority | Category | Rule(s) | Why high value |
|---|---|---|---|
| 🔴 P1 | Telecom | phone_number, mobile_phone_number | International dialling prefixes, 00NNN vs +NNN, dot separators, extensions, domestic formats for every major country code |
| 🔴 P1 | Financial | payment_card_pan, iban, swift_bic | Card scheme length variants (13/15/16/19), space vs hyphen vs no separator; IBAN per-country length (GB=22, DE=22, FR=27, NL=18, etc.) |
| 🔴 P1 | Country identity | us_ssn, uk_nino, br_cpf, in_aadhaar | Dashed vs compact, uppercase vs lowercase, leading zeros |
| 🟠 P2 | Technology | ipv4_address, ipv6_address, url, jwt_token | IPv6 abbreviation forms; URL userinfo with special chars; JWTs with non-standard padding |
| 🟠 P2 | Telecom | postal_code | UK outcodes (1-2 letters + 1-2 digits), US ZIP+4, Canadian FSA+LDU |
| 🟡 P3 | Identity | date_of_birth, email_address, person_name | ISO vs slash vs month-name dates; email punycode domains; CJK names |
| 🟡 P3 | Health | medical_record_number, health_plan_beneficiary_id | Prefix walk on non-ASCII letters (documented edge case) |
### Phone number seed examples (illustrative, not exhaustive)
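The following groups illustrate the kind of cases the phone_number.txt file should cover. This is not the complete file; it is a structured catalogue to guide the initial author. The expected outputs shown are illustrative: run the rule against each input and pin the actual output before committing a line.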
# ── E.164 (leading +, up to 3-digit country code) ─────────────────────────
+1 212 555 0100 +1 *** *** 0100
+1-212-555-0100 +1-***-***-0100
+1.212.555.0100 +1.***.***.0100
+44 7911 123456 +44 **** **3456
+44-7911-123456 +44-****-**3456
+352 26 12 34 +352 ** **34
+33 1 42 86 83 26 +33 * ** ** **26
+81 3 1234 5678 +81 * **** **78
+86 138 0013 8000 +86 *** **** **00
+49 89 636 48018 +49 ** *** ***18
+55 11 91234 5678 +55 ** ***** **78
+61 2 9374 4000 +61 * **** **00
+7 495 123-45-67 +7 *** ***-**-67
+34 91 123 45 67 +34 ** *** **67
+39 02 1234 5678 +39 ** **** **78
+27 21 123 4567 +27 ** *** **67
+971 4 123 4567 +971 * *** **67
+82 2 1234 5678 +82 * **** **78
+65 6234 5678 +65 **** **78
+91 98765 43210 +91 ***** **210
# ── 00-prefix with spaces (ITU-T alternative to +) ───────────────────────
# These do NOT match the +NN prefix rule — no + is present.
# Expected output per fail-closed contract: same-length mask.
00352 26 12 34 **************
0044 7911 123456 ****************
001 212 555 0100 ****************
# ── 00-prefix compact (no spaces, no + prefix) ───────────────────────────
# Also fail-closed — pins the behaviour for the compact variant explicitly.
# If the rule is later taught to treat 00 as equivalent to +, these fixtures
# will surface the change as a deliberate, visible regression.
00352261234 ***********
00441234567890 **************
0013105551234 *************
# ── NANP domestic ─────────────────────────────────────────────────────────
(555) 123-4567 (***) ***-4567
555-123-4567 ***-***-4567
5551234567 ******4567
555 123 4567 *** *** 4567
# ── UK domestic ───────────────────────────────────────────────────────────
07911 123456 ***** **3456
020 7946 0958 *** **** **58
01632 960961 ***** **0961
# ── Fail-closed cases ─────────────────────────────────────────────────────
1-800-FLOWERS *************
+ *
+44 ***
٠٧٩١١ ١٢٣٤٥٦ ************
Note on 00NNN vs +NNN: the current phone_number rule requires a literal + to identify the country code prefix. 00352 has no +, so the entire string is treated as a body — if it fails the digit-only check (because it contains spaces), it routes to SameLengthMask. The corpus file pins both the spaced and compact 00-prefix variants so any future change (e.g. teaching the rule to treat 00 as equivalent to +) is a deliberate, visible regression.
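The fail-closed fixtures above all have one mask character per input character, including separators. A minimal sketch of such a same-length mask follows. `sameLengthMask` is a hypothetical stand-in for the library's SameLengthMask, and counting runes rather than bytes is an assumption inferred from the Arabic-Indic fixture, whose 12 characters map to 12 mask characters.

```go
package main

import (
	"fmt"
	"strings"
	"unicode/utf8"
)

// sameLengthMask replaces every rune of s, separators included, with
// maskChar. Counting runes (not bytes) keeps the visible length stable
// for multi-byte input such as Arabic-Indic digits.
func sameLengthMask(s string, maskChar rune) string {
	return strings.Repeat(string(maskChar), utf8.RuneCountInString(s))
}

func main() {
	for _, in := range []string{"00352 26 12 34", "1-800-FLOWERS", "٠٧٩١١ ١٢٣٤٥٦"} {
		fmt.Printf("%q -> %q\n", in, sameLengthMask(in, '*'))
	}
}
```

A byte-based implementation would emit 23 mask characters for the Arabic-Indic input instead of 12, which is exactly the kind of difference a corpus line pins.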
### IBAN seed examples
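The IBAN rule preserves the first 4 characters (country code + check digits) and the last 4 non-separator characters. Fixtures should cover all country codes in ISO 13616 because total IBAN length varies by country: among the seeds below it ranges from 15 characters (Norway) to 31 (Malta), and the standard allows up to 34.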
Note: The v0.9.0 requirements document states that IBAN should "preserve grouping spaces if present." The implementer must verify whether the current implementation handles space-grouped IBANs. If it does, the space-grouped fixture below should show preserved separators with masking applied. If it does not, the fixture should show fail-closed behaviour. Either way, the corpus pins the actual current behaviour — any future change to add or remove space handling will surface as a regression.
# Compact form
GB82WEST12345698765432 GB82**************5432
DE89370400440532013000 DE89**************3000
FR7630006000011234567890189 FR76*******************0189
NL91ABNA0417164300 NL91**********4300
NO9386011117947 NO93*******7947
MT84MALT011000012345MTLCAST001S MT84***********************001S
# Space-grouped — verify current behaviour and pin it
# If the rule handles spaces: preserve separators with masking
# If the rule does not: fail closed to same-length mask
# The implementer MUST run the rule against these inputs and record
# the actual output as the expected value.
GB82 WEST 1234 5698 7654 32 <VERIFY_AND_PIN_ACTUAL_OUTPUT>
# Lowercase country code (fail closed)
gb82WEST12345698765432 **********************
# Too short (fail closed)
GB82WEST ********
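The keep-first-4, keep-last-4 rule for compact IBANs can be sketched as follows. This is an illustration under stated assumptions, not the library's implementation: `maskIBANCompact` and `looksLikeCompactIBAN` are hypothetical names, and the validity check (15 to 34 characters, two uppercase letters, two digits, then uppercase alphanumerics) is assumed. Space-grouped input is deliberately out of scope here, since the real behaviour still has to be verified.

```go
package main

import (
	"fmt"
	"strings"
	"unicode/utf8"
)

// looksLikeCompactIBAN applies an assumed shape check: 15-34 ASCII
// characters, two uppercase letters, two digits, then uppercase
// letters or digits.
func looksLikeCompactIBAN(s string) bool {
	if len(s) < 15 || len(s) > 34 {
		return false
	}
	for i := 0; i < len(s); i++ {
		c := s[i]
		isUpper := c >= 'A' && c <= 'Z'
		isDigit := c >= '0' && c <= '9'
		switch {
		case i < 2 && !isUpper:
			return false
		case i >= 2 && i < 4 && !isDigit:
			return false
		case !isUpper && !isDigit:
			return false
		}
	}
	return true
}

// maskIBANCompact keeps the first 4 and last 4 characters of a compact
// IBAN and masks the rest; anything else fails closed to a same-length mask.
func maskIBANCompact(s string) string {
	if !looksLikeCompactIBAN(s) {
		return strings.Repeat("*", utf8.RuneCountInString(s))
	}
	// Byte slicing is safe: the check above guarantees ASCII.
	return s[:4] + strings.Repeat("*", len(s)-8) + s[len(s)-4:]
}

func main() {
	for _, in := range []string{"GB82WEST12345698765432", "NO9386011117947", "gb82WEST12345698765432", "GB82WEST"} {
		fmt.Printf("%s -> %s\n", in, maskIBANCompact(in))
	}
}
```

Because the output always has the same length as the input, a fixture whose two fields differ in length is a quick sign of a typo in the corpus file.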
## Minimum fixture count, tiered by rule complexity
Not all rules have the same format surface area. The minimum fixture count is tiered:
| Tier | Rule type | Minimum fixtures | Examples |
|---|---|---|---|
| Simple | Full-redact or same-length wrappers | 10 | payment_card_cvv, payment_card_pin, password, private_key_pem, diagnosis_code, prescription_text, monetary_amount |
| Standard | Format-aware with limited variants | 20 | us_ssn, uk_nino, bank_account_number, mac_address, uuid, username, given_name, family_name |
| High-variance | Multiple separator styles, international formats, complex parsing | 50+ | phone_number, iban, payment_card_pan, postal_code, email_address, url, ipv6_address, date_of_birth, person_name |
## CONTRIBUTING.md addition
Add the following section to CONTRIBUTING.md under the testing guidance:
### Corpus Fixture Tests
The `tests/corpus/` directory contains bulk fixture files for regression testing. Each file
is named after a masking rule (e.g., `phone_number.txt`) and contains tab-separated
`input<TAB>expected_output` pairs, one per line.
**To run corpus tests:**
```bash
make test-corpus
```
**Important:** `go test ./...` does NOT run corpus tests; they are gated behind the
`corpus` build tag. Always use `make test-corpus` or `make check` (which includes it).
**To add a fixture for a bug report:**
- Identify the masking rule (e.g., `phone_number`).
- Determine the expected masked output for the problematic input.
- Append a line to `tests/corpus/<rule_name>.txt`:
  +33 (0)1 42 86 83 26 +33 (*)* ** ** **26
- Run `make test-corpus` to confirm the test fails (red).
- Fix the rule implementation.
- Run `make test-corpus` to confirm the test passes (green).
- Open a PR. The fixture line IS the regression test.
***
## Contribution workflow
Once the harness is in place, the workflow for a reported bug becomes:
1. Receive report: `phone_number` applied to `+33 (0)1 42 86 83 26` returns wrong output.
2. Confirm the bug locally: `printf '+33 (0)1 42 86 83 26\t<expected>\n' >> tests/corpus/phone_number.txt && make test-corpus`. Use `printf` rather than `echo`, which in bash writes `\t` literally instead of a tab.
3. Fix the rule in `rules_telecom.go`.
4. Re-run `make test-corpus` — green.
5. Open a PR. The fixture line is the regression test. No Go test code written.
***
## Out of scope for this issue
- A generator script that auto-produces fixture lines from public number registries (useful, separate issue).
- Integration with the fuzz seed corpus (`testdata/fuzz/`) — the fuzz targets already have their own seeds and the corpus files serve a different purpose (exact input/output pinning rather than invariant checking).
- A `--update` / golden-file rewrite flag — the corpus format is deliberately simple and human-editable; an update flag would undermine the regression-detection purpose.
***
## Implementation notes
- **`mask.RuleNames()` function:** The `TestCorpusCompleteness` test requires a function that returns all registered built-in rule names as a `[]string`. If this function does not already exist in the public API, it must be added as part of this issue. It should return the names from the `RuleXxx` constants in `rule_names.go`.
- **Working directory:** The Makefile `test-corpus` target uses `cd tests/corpus &&` to set the working directory. The CI step runs from the repo root and relies on the Makefile to handle this. Do not set a custom working directory in the CI workflow step.
- **IBAN space-grouped behaviour:** The implementer must verify whether the current `iban` rule handles space-grouped input or fails closed, and pin whichever behaviour exists. If this reveals a gap between the requirements doc and the implementation, file a separate issue to track the discrepancy.
***
## Acceptance criteria
- [ ] `tests/corpus/` directory exists with one `.txt` file per rule (matching every `RuleXxx` constant in `rule_names.go`).
- [ ] `tests/corpus/corpus_test.go` compiles under `-tags corpus` and passes `go vet`.
- [ ] `TestCorpusCompleteness` exists and verifies every rule in `mask.RuleNames()` has a corresponding `.txt` file.
- [ ] `mask.RuleNames()` function exists in the public API if it does not already (returns `[]string` of all built-in rule names).
- [ ] `make test-corpus` passes on a clean checkout.
- [ ] `make check` includes `test-corpus`.
- [ ] CI `corpus` job is green on `main`.
- [ ] Fixture count meets the tiered minimum: 10 for simple rules, 20 for standard rules, 50+ for high-variance rules.
- [ ] `phone_number.txt` covers `+NNN`, NANP, UK domestic, dot separators, and explicitly pins both spaced and compact `00NNN` fail-closed behaviour.
- [ ] `iban.txt` covers at least GB, DE, FR, NL, NO, and MT (range of body lengths), and pins the space-grouped behaviour (whichever it currently is).
- [ ] `payment_card_pan.txt` covers 13-digit (legacy Visa), 15-digit (Amex), 16-digit (Visa/MC), 19-digit (Maestro), all separator variants.
- [ ] `CONTRIBUTING.md` is updated with the corpus fixture section explaining how to add fixtures, with an explicit note that `make test-corpus` (not `go test ./...`) is the command to run.