This repository was archived by the owner on Jun 13, 2026. It is now read-only.
Scorers
bsevern edited this page Apr 10, 2026 · 2 revisions
| Scorer | Weight | Signal |
|---|---|---|
| ExactScorer | 1.0 | Case-insensitive exact column name match |
| AliasScorer | 0.95 | Built-in synonym registry + schema file aliases + Domain Dictionaries |
| InitialismScorer | 0.75 | Abbreviation matching (assay_id ↔ ASSI, confidence_score ↔ CONSC) (new in v0.3) |
| PatternTypeScorer | 0.7 | Regex-based semantic type detection |
| ProfileScorer | 0.5 | Statistical profile comparison |
| FuzzyNameScorer | 0.4 | Jaro-Winkler fuzzy name similarity (with common-prefix canonicalization) |
For each (source, target) pair:
```
score = sum(scorer.weight * result.score) / sum(scorer.weight)
```

- Scorers returning `None` are excluded (abstain)
- Scorers returning `0.0` are included (real "no match" signal)
- Pairs with fewer than 2 non-None scorers get score 0.0
- Pairs below `min_confidence` (default 0.2 since v0.3) are dropped
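
The combination rules above can be sketched in a few lines of plain Python. This is an illustration of the arithmetic only, not infermap's actual code: the function name `combine_scores` and the `(weight, score)` tuple layout are assumptions made for the example.

```python
from typing import Optional, Sequence, Tuple


def combine_scores(
    results: Sequence[Tuple[float, Optional[float]]],
    min_confidence: float = 0.2,
) -> Optional[float]:
    """Combine (weight, score) pairs for one (source, target) pair.

    A score of None means the scorer abstained. Returns None when the
    pair would be dropped for falling below min_confidence.
    (Hypothetical helper, not part of the infermap API.)
    """
    # Abstaining scorers are excluded; 0.0 scores stay in and pull the mean down.
    active = [(w, s) for w, s in results if s is not None]
    if len(active) < 2:
        # Fewer than 2 non-None scorers: the pair gets score 0.0.
        combined = 0.0
    else:
        combined = sum(w * s for w, s in active) / sum(w for w, _ in active)
    return combined if combined >= min_confidence else None


# Exact (1.0) says "no match", Alias (0.95) matches, Fuzzy (0.4) is lukewarm,
# and the remaining scorers abstain.
pair = [(1.0, 0.0), (0.95, 1.0), (0.75, None), (0.7, None), (0.5, None), (0.4, 0.5)]
print(combine_scores(pair))
```

Because the denominator only sums the weights of scorers that returned a value, an abstaining scorer neither helps nor hurts a pair, while a `0.0` from a high-weight scorer lowers the result noticeably.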
The Hungarian algorithm then finds the globally optimal 1:1 assignment.
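
To see why a globally optimal assignment matters, the sketch below compares every possible 1:1 assignment on a tiny score matrix. It deliberately uses brute-force enumeration rather than the Hungarian algorithm, which reaches the same optimum in polynomial time; the function name `best_assignment` and the matrix layout are assumptions for illustration, and the sketch assumes there are no more sources than targets.

```python
from itertools import permutations
from typing import Dict, List, Optional


def best_assignment(scores: List[List[Optional[float]]]) -> Dict[int, int]:
    """Return {source_index: target_index} maximizing the total score.

    scores[i][j] is the combined score for source i and target j, or None
    if that pair was dropped. Brute force: only suitable for tiny inputs.
    (Hypothetical helper, not part of the infermap API.)
    """
    n_src = len(scores)
    n_tgt = len(scores[0]) if scores else 0
    if n_src > n_tgt:
        raise ValueError("this sketch assumes no more sources than targets")

    best_total, best_perm = -1.0, ()
    # Each permutation assigns source i to target perm[i], with no target reused.
    for perm in permutations(range(n_tgt), n_src):
        total = sum(scores[i][j] or 0.0 for i, j in enumerate(perm))
        if total > best_total:
            best_total, best_perm = total, perm

    # Dropped pairs (None) contribute nothing and are left unmapped.
    return {i: j for i, j in enumerate(best_perm) if scores[i][j] is not None}


# Picking the single best pair first (0 -> 0 at 0.9) would force 1 -> 1 at 0.1.
# The global optimum swaps them for a higher total.
matrix = [
    [0.9, 0.8],
    [0.85, 0.1],
]
print(best_assignment(matrix))
```

For realistic column counts, a polynomial-time solver such as SciPy's `scipy.optimize.linear_sum_assignment` is the usual choice for this kind of problem.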
The built-in synonym registry used by AliasScorer ships with 15 canonical groups:
| Canonical | Aliases |
|---|---|
| first_name | fname, first, given_name, first_nm, forename |
| last_name | lname, last, surname, family_name, last_nm |
| email | email_address, e_mail, email_addr, mail, contact_email |
| phone | phone_number, ph, telephone, tel, mobile, cell |
| zip | zipcode, zip_code, postal_code, postal, postcode |
| address | addr, street_address, addr_line_1, mailing_address |
| name | full_name, fullname, customer_name, display_name |
| company | organization, org, business, employer, firm |
| dob | date_of_birth, birth_date, birthdate, birthday |
| ... | + city, state, country, gender, id, created_at |
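
A registry like the one in the table can be modeled as a reverse index from every alias to its canonical name. The sketch below is a minimal illustration under that assumption: the names `build_index`, `canonical_of`, and `same_group` are hypothetical, and the case-insensitive normalization is a guess rather than documented behavior.

```python
from typing import Dict, List, Optional

# A subset of the groups from the table above.
GROUPS: Dict[str, List[str]] = {
    "first_name": ["fname", "first", "given_name", "first_nm", "forename"],
    "zip": ["zipcode", "zip_code", "postal_code", "postal", "postcode"],
    "dob": ["date_of_birth", "birth_date", "birthdate", "birthday"],
}


def build_index(groups: Dict[str, List[str]]) -> Dict[str, str]:
    """Map every canonical name and alias (lowercased) to its canonical name."""
    index: Dict[str, str] = {}
    for canonical, aliases in groups.items():
        for name in [canonical, *aliases]:
            index[name.lower()] = canonical
    return index


def canonical_of(name: str, index: Dict[str, str]) -> Optional[str]:
    """Return the canonical group for a column name, or None if unknown."""
    return index.get(name.strip().lower())


def same_group(a: str, b: str, index: Dict[str, str]) -> bool:
    """True when both column names resolve to the same canonical group."""
    group_a, group_b = canonical_of(a, index), canonical_of(b, index)
    return group_a is not None and group_a == group_b


INDEX = build_index(GROUPS)
print(same_group("fname", "Forename", INDEX))
print(same_group("postal_code", "birthday", INDEX))
```
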

Extend via `infermap.yaml`:

```yaml
aliases:
  mrn: [medical_record_number, patient_id, chart_number]
```

PatternTypeScorer recognizes: email, phone, zip_us, date_iso, uuid, url, currency, ip_v4. A field is classified when 60%+ of sampled values match a pattern.
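
The 60% rule can be illustrated with a small classifier over sampled values. The regular expressions below are simplified stand-ins written for this example, not the patterns infermap uses, and `classify` is a hypothetical helper; only four of the eight listed types are covered.

```python
import re
from typing import Iterable, Optional

# Illustrative patterns only; the library's real expressions may differ.
PATTERNS = {
    "email": re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+"),
    "zip_us": re.compile(r"\d{5}(-\d{4})?"),
    "date_iso": re.compile(r"\d{4}-\d{2}-\d{2}"),
    "uuid": re.compile(
        r"[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-"
        r"[0-9a-fA-F]{4}-[0-9a-fA-F]{12}"
    ),
}


def classify(values: Iterable[object], threshold: float = 0.6) -> Optional[str]:
    """Return the type whose pattern matches at least `threshold` of the
    non-empty sampled values, or None if no pattern reaches the threshold."""
    samples = [str(v).strip() for v in values if v is not None and str(v).strip()]
    if not samples:
        return None

    best_type, best_ratio = None, 0.0
    for name, pattern in PATTERNS.items():
        matched = sum(1 for s in samples if pattern.fullmatch(s))
        ratio = matched / len(samples)
        # Keep the type with the highest match ratio above the threshold.
        if ratio >= threshold and ratio > best_ratio:
            best_type, best_ratio = name, ratio
    return best_type


# 4 of 5 values look like emails, so the dirty "n/a" entry is tolerated.
print(classify(["a@x.com", "b@y.org", "n/a", "c@z.net", "d@w.io"]))
```

A threshold below 100% lets a column with a few placeholder or malformed values still be recognized, at the cost of occasionally classifying a mixed column.
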
Register a custom scorer with the `@infermap.scorer` decorator:

```python
import infermap
from infermap.types import FieldInfo, ScorerResult


@infermap.scorer(name="my_scorer", weight=0.7)
def my_scorer(source: FieldInfo, target: FieldInfo) -> ScorerResult | None:
    # Access source.name, source.sample_values, source.dtype, etc.
    # Return None to abstain, or a ScorerResult carrying the score.
    return ScorerResult(score=0.8, reasoning="my logic")
```