
v0.6.0 -- Privacy-Preserving Record Linkage (PPRL)


benzsevern released this on 23 Mar 13:55

Phase 2 of the v1.0.0 Roadmap

Privacy-Preserving Record Linkage

Match records across organizations without sharing raw data. Two modes:

Trusted Third Party (default): Both parties compute Bloom filters locally and send them to a coordinator, which computes their similarity and returns cluster IDs.
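The coordinator's job can be sketched as follows. This is a minimal illustration, not goldenmatch's implementation: it assumes the similarity is the Dice coefficient (the measure the SMC mode names), and the threshold, the union-find clustering and the names `dice` and `coordinator_link` are all hypothetical.

```python
def dice(a, b):
    """Dice coefficient of two equal-length bit lists: 2|A∩B| / (|A| + |B|)."""
    inter = sum(x & y for x, y in zip(a, b))
    total = sum(a) + sum(b)
    return 2 * inter / total if total else 0.0


def coordinator_link(filters_a, filters_b, threshold=0.8):
    """Compare every A filter with every B filter and assign cluster IDs.

    Pairs at or above the (hypothetical) threshold are merged with union-find,
    so transitively linked records end up sharing one cluster ID.
    """
    nodes = [("A", i) for i in range(len(filters_a))]
    nodes += [("B", j) for j in range(len(filters_b))]
    parent = {n: n for n in nodes}

    def find(n):
        while parent[n] != n:
            parent[n] = parent[parent[n]]  # path halving
            n = parent[n]
        return n

    for i, fa in enumerate(filters_a):
        for j, fb in enumerate(filters_b):
            if dice(fa, fb) >= threshold:
                parent[find(("A", i))] = find(("B", j))

    # Number the clusters 0, 1, 2, ... in order of first appearance.
    roots = {}
    return {n: roots.setdefault(find(n), len(roots)) for n in nodes}
```

The coordinator only ever sees bit vectors, never names or dates of birth; union-find is used here simply because it turns pairwise matches into cluster IDs in near-linear time.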

SMC (Secure Multi-Party Computation): Dice similarity is computed over secret-shared Bloom filters, and only the match/no-match bit for each pair is revealed. Neither party sees the other's Bloom filters.
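To show how a Dice threshold test can run on secret shares, here is a toy two-party sketch. It is not the goldenmatch protocol: it assumes additive sharing over Z_{2^64}, Beaver multiplication triples from a simulated trusted dealer, and a hypothetical 0.8 threshold. Dice(A, B) >= num/den is rewritten as 2·den·|A∩B| − num·(|A| + |B|) >= 0, so each party folds in its own popcount locally and only the sign of a randomly masked value is opened. A production protocol would use a dedicated secure-comparison step rather than this masking shortcut.

```python
import secrets

MOD = 2**64  # additive secret sharing over the ring Z_{2^64}


def share(x):
    """Split x into two additive shares that sum to x mod 2^64."""
    r = secrets.randbelow(MOD)
    return r, (x - r) % MOD


def reconstruct(s0, s1):
    """Recombine two shares and interpret the result as a signed integer."""
    v = (s0 + s1) % MOD
    return v - MOD if v >= MOD // 2 else v


def dealer_triple():
    """Simulated dealer (an assumption): shares of random u, v and w = u*v."""
    u, v = secrets.randbelow(MOD), secrets.randbelow(MOD)
    return share(u), share(v), share(u * v % MOD)


def beaver_mul(x, y):
    """Multiply shared values x and y using one Beaver triple."""
    (u0, u1), (v0, v1), (w0, w1) = dealer_triple()
    # The parties open e = x - u and f = y - v; uniform u and v hide x and y.
    e = (x[0] - u0 + x[1] - u1) % MOD
    f = (y[0] - v0 + y[1] - v1) % MOD
    z0 = (w0 + e * v0 + f * u0 + e * f) % MOD  # party 0 adds the public e*f term
    z1 = (w1 + e * v1 + f * u1) % MOD
    return z0, z1


def secure_dice_match(bits_a, bits_b, num=4, den=5):
    """Return only whether Dice(A, B) >= num/den (hypothetical default 0.8)."""
    inter = (0, 0)
    for a, b in zip(bits_a, bits_b):
        p = beaver_mul(share(a), share(b))  # shares of a_i * b_i
        inter = ((inter[0] + p[0]) % MOD, (inter[1] + p[1]) % MOD)
    # Each party subtracts its own popcount term locally (linear, no interaction).
    d0 = (2 * den * inter[0] - num * sum(bits_a)) % MOD
    d1 = (2 * den * inter[1] - num * sum(bits_b)) % MOD
    # Multiply by a shared random positive r and open r*d: its sign is the answer.
    m0, m1 = beaver_mul((d0, d1), share(secrets.randbelow(2**20) + 1))
    return reconstruct(m0, m1) >= 0
```

Every intermediate value a party handles is uniformly masked, which is the property the release note describes; the dealer and the multiplicative mask are simplifications chosen to keep the sketch short.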

```bash
# Parties A and B each have their own CSV file
goldenmatch pprl link \
  --file-a hospital_a.csv \
  --file-b hospital_b.csv \
  --fields first_name,last_name,dob,zip \
  --security high \
  --output clusters.csv
```

Bloom Filter Security Levels

| Level | Filter size | Hash functions | Features |
|---|---|---|---|
| `standard` | 512 bits | 20 | Basic CLK |
| `high` | 1024 bits | 30 | + per-field HMAC salting |
| `paranoid` | 2048 bits | 40 | + balanced padding + trigrams |
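A cryptographic long-term key (CLK) packs several fields into one Bloom filter by hashing each field's q-grams k times into an m-bit array. The sketch below uses the filter sizes and hash counts from the table, but everything else is an assumption: bigrams for `standard`/`high`, double hashing with HMAC-SHA256 and HMAC-SHA1, and a hypothetical per-field salt built by appending the field name to the key. Balanced padding is not modelled.

```python
import hashlib
import hmac

# (filter bits, hash functions, q-gram length) per level. Sizes and counts are
# from the table; bigrams for standard/high are an assumption of this sketch.
LEVELS = {"standard": (512, 20, 2), "high": (1024, 30, 2), "paranoid": (2048, 40, 3)}


def qgrams(value, q=2):
    """Split a normalised value into overlapping q-grams, padded with spaces."""
    padded = f" {value.strip().lower()} "
    return [padded[i : i + q] for i in range(len(padded) - q + 1)]


def clk_encode(record, fields, key, level="high"):
    """Encode the chosen fields of one record into a single Bloom filter (CLK)."""
    m, k, q = LEVELS[level]
    bits = [0] * m
    for field in fields:
        # Hypothetical per-field salting: mix the field name into the HMAC key
        # so the same q-gram in different fields sets different bits.
        field_key = key + field.encode() if level != "standard" else key
        for gram in qgrams(str(record[field]), q):
            msg = gram.encode()
            h1 = int.from_bytes(hmac.new(field_key, msg, hashlib.sha256).digest(), "big")
            h2 = int.from_bytes(hmac.new(field_key, msg, hashlib.sha1).digest(), "big")
            h2 |= 1  # odd step keeps the k positions distinct when m is a power of two
            for i in range(k):  # double hashing: position_i = (h1 + i * h2) mod m
                bits[(h1 + i * h2) % m] = 1
    return bits


record = {"first_name": "Ann", "last_name": "Lee", "dob": "1980-02-01", "zip": "02139"}
clk = clk_encode(record, ["first_name", "last_name", "dob", "zip"], b"shared-secret")
print(sum(clk), "of", len(clk), "bits set")
```

Both parties must use the same secret key and level, otherwise their filters are not comparable. Larger filters with more hash functions make frequency attacks on individual bits harder at the cost of bigger payloads.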

Stats

  • 894 tests passing (19 new, 0 regressions)
  • CI green on Python 3.11/3.12/3.13

Install / Upgrade

```bash
pip install --upgrade goldenmatch
pip install "goldenmatch[pprl]"  # optional extra, needed for the SMC protocol
```

What's Next

  • v1.0.0: API freeze, production-stable release