v0.6.0 -- Privacy-Preserving Record Linkage (PPRL)
Phase 2 of the v1.0.0 Roadmap
Privacy-Preserving Record Linkage
Match records across organizations without sharing raw data. Two modes are available:
- Trusted Third Party (default): Both parties compute Bloom filters locally and send them to a coordinator, which computes similarity and returns cluster IDs.
- SMC (Secure Multi-Party Computation): Dice similarity is computed on secret-shared data, so only match/no-match bits are revealed. Neither party sees the other's Bloom filters.
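Both modes score candidate pairs with Dice similarity over Bloom filters. As a rough illustration only (not goldenmatch's internal code), the sketch below computes the Dice coefficient for two filters stored as Python integers. The function name `dice_similarity`, the 8-bit example filters, and the 0.8 threshold are assumptions made for this example.

```python
def dice_similarity(filter_a: int, filter_b: int) -> float:
    """Dice coefficient of two Bloom filters stored as integer bitmasks."""
    bits_a = bin(filter_a).count("1")
    bits_b = bin(filter_b).count("1")
    if bits_a + bits_b == 0:
        # Two empty filters carry no evidence of a match.
        return 0.0
    common = bin(filter_a & filter_b).count("1")
    return 2 * common / (bits_a + bits_b)


# Toy 8-bit filters; real filters are 512 to 2048 bits wide.
filter_a = 0b10110110  # 5 bits set
filter_b = 0b10100111  # 5 bits set, 4 shared with filter_a

score = dice_similarity(filter_a, filter_b)  # 2 * 4 / (5 + 5) = 0.8
is_match = score >= 0.8  # hypothetical threshold, not a goldenmatch default
print(score, is_match)
```

In Trusted Third Party mode the coordinator would run a comparison like this on the filters it receives. In SMC mode the same quantity is evaluated on secret shares, so only the final match/no-match bit is revealed.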
# Parties A and B each have their own CSV
goldenmatch pprl link \
--file-a hospital_a.csv \
--file-b hospital_b.csv \
--fields first_name,last_name,dob,zip \
--security high \
--output clusters.csv
Bloom Filter Security Levels
| Level | Filter Size | Hash Functions | Features |
|---|---|---|---|
| standard | 512 bits | 20 | Basic CLK |
| high | 1024 bits | 30 | + per-field HMAC salting |
| paranoid | 2048 bits | 40 | + balanced padding + trigrams |
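To make the table concrete, here is a minimal sketch of how a CLK (cryptographic long-term key) with per-field HMAC salting might be built using the `high` parameters (1024 bits, 30 hash functions). It is an illustration under stated assumptions, not goldenmatch's implementation: the names `bigrams` and `encode_clk`, the use of bigrams, the key-derivation scheme, and applying all 30 hash functions to every bigram are choices made for this example.

```python
import hashlib
import hmac


def bigrams(value: str) -> list[str]:
    """Split a normalized, padded value into overlapping 2-character tokens."""
    padded = f"_{value.strip().lower()}_"
    return [padded[i:i + 2] for i in range(len(padded) - 1)]


def encode_clk(record: dict[str, str], secret: bytes,
               size: int = 1024, num_hashes: int = 30) -> int:
    """Encode all fields of a record into one Bloom filter (as an int)."""
    clk = 0
    for field, value in record.items():
        # Per-field salting (assumed scheme): derive a distinct key per field,
        # so the same bigram in different fields maps to different bit positions.
        field_key = hmac.new(secret, field.encode(), hashlib.sha256).digest()
        for gram in bigrams(value):
            for k in range(num_hashes):
                digest = hmac.new(field_key, f"{k}:{gram}".encode(),
                                  hashlib.sha256).digest()
                position = int.from_bytes(digest[:8], "big") % size
                clk |= 1 << position
    return clk


shared_secret = b"agreed-out-of-band"  # hypothetical key shared by both parties
record = {"first_name": "Anna", "last_name": "Smith",
          "dob": "1980-01-02", "zip": "12345"}
clk = encode_clk(record, shared_secret)
print(bin(clk).count("1"), "of 1024 bits set")
```

Both parties need the same secret and parameters for their filters to be comparable. Without the secret, a coordinator that sees only the filters cannot recompute the bit positions for a guessed value.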
Stats
- 894 tests passing (19 new, 0 regressions)
- CI green on Python 3.11/3.12/3.13
Install / Upgrade
pip install --upgrade goldenmatch
pip install "goldenmatch[pprl]"  # for SMC protocol (optional)
What's Next
- v1.0.0: API freeze, production-stable release