This repository was archived by the owner on Jun 13, 2026. It is now read-only.

Scorers

bsevern edited this page Apr 10, 2026 · 2 revisions

Built-in Scorers

| Scorer | Weight | Signal |
|---|---|---|
| ExactScorer | 1.0 | Case-insensitive exact column name match |
| AliasScorer | 0.95 | Built-in synonym registry + schema file aliases + Domain Dictionaries |
| InitialismScorer | 0.75 | Abbreviation matching (assay_id ↔ ASSI, confidence_score ↔ CONSC) (new in v0.3) |
| PatternTypeScorer | 0.7 | Regex-based semantic type detection |
| ProfileScorer | 0.5 | Statistical profile comparison |
| FuzzyNameScorer | 0.4 | Jaro-Winkler fuzzy name similarity (with common-prefix canonicalization) |

Score Combination

For each (source, target) pair:

score = sum(scorer.weight * result.score) / sum(scorer.weight)
  • Scorers returning None are excluded (abstain)
  • Scorers returning 0.0 are included (real "no match" signal)
  • Pairs with fewer than 2 non-None scorers get score 0.0
  • Pairs below min_confidence (default 0.2 since v0.3) are dropped
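
The rules above can be sketched in a few lines of Python. This is an illustrative reading, not infermap's actual code: `combine_scores` is a hypothetical helper, and it assumes the denominator sums the weights of the non-abstaining scorers only, and that a pair scoring exactly `min_confidence` is kept.

```python
from typing import Optional

def combine_scores(results: list[tuple[float, Optional[float]]],
                   min_confidence: float = 0.2) -> Optional[float]:
    """Combine (weight, score) results for one (source, target) pair.

    A score of None means the scorer abstained. Returns the combined
    score, or None when the pair would be dropped.
    """
    # Abstaining scorers are excluded; 0.0 scores stay in as real signal.
    voted = [(w, s) for w, s in results if s is not None]
    if len(voted) < 2:
        combined = 0.0  # fewer than 2 non-None scorers
    else:
        # Assumption: only the weights of voting scorers are summed.
        combined = sum(w * s for w, s in voted) / sum(w for w, _ in voted)
    return combined if combined >= min_confidence else None

# Exact abstains, Alias scores 1.0, FuzzyName scores 0.5.
print(combine_scores([(1.0, None), (0.95, 1.0), (0.4, 0.5)]))
```

Here the combined score is (0.95 * 1.0 + 0.4 * 0.5) / (0.95 + 0.4), roughly 0.85. A pair with a single voting scorer gets 0.0 and therefore falls below the default threshold.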

The Hungarian algorithm then finds the globally optimal 1:1 assignment.
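
To show what "globally optimal" means here, the sketch below solves the same assignment problem by exhaustive search over permutations. This is a stand-in for illustration only, not the Hungarian algorithm and not infermap's code; `best_assignment` is a hypothetical name, and the approach is only practical for a handful of columns. In practice, SciPy's `scipy.optimize.linear_sum_assignment` (with `maximize=True`) solves the same problem in polynomial time.

```python
from itertools import permutations

def best_assignment(scores: list[list[float]]) -> list[tuple[int, int]]:
    """Return the 1:1 (source, target) index pairs with the highest total score.

    scores[i][j] is the combined score for source i and target j.
    Assumes there are at least as many targets as sources.
    """
    n_src, n_tgt = len(scores), len(scores[0])
    best_total, best_perm = -1.0, ()
    # Try every way of giving each source a distinct target.
    for perm in permutations(range(n_tgt), n_src):
        total = sum(scores[i][j] for i, j in enumerate(perm))
        if total > best_total:
            best_total, best_perm = total, perm
    return list(enumerate(best_perm))

scores = [
    [0.90, 0.80],  # source 0 vs targets 0, 1
    [0.85, 0.10],  # source 1 vs targets 0, 1
]
print(best_assignment(scores))
```

A greedy matcher would take the 0.90 pair first and be left with 0.10 for the other source (total 1.00). The optimal assignment pairs source 0 with target 1 and source 1 with target 0 (total 1.65).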

Alias Registry

15 canonical groups ship built-in:

| Canonical | Aliases |
|---|---|
| first_name | fname, first, given_name, first_nm, forename |
| last_name | lname, last, surname, family_name, last_nm |
| email | email_address, e_mail, email_addr, mail, contact_email |
| phone | phone_number, ph, telephone, tel, mobile, cell |
| zip | zipcode, zip_code, postal_code, postal, postcode |
| address | addr, street_address, addr_line_1, mailing_address |
| name | full_name, fullname, customer_name, display_name |
| company | organization, org, business, employer, firm |
| dob | date_of_birth, birth_date, birthdate, birthday |

... plus city, state, country, gender, id, created_at

Extend via infermap.yaml:

aliases:
  mrn: [medical_record_number, patient_id, chart_number]
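
One way to picture how alias groups are used is a reverse index from every alias to its canonical name. The sketch below is hypothetical and does not reflect infermap's internals: `build_alias_index` and `same_group` are made-up helpers, and the case-insensitive lookup is an assumption.

```python
def build_alias_index(groups: dict[str, list[str]]) -> dict[str, str]:
    """Map every canonical name and alias (lowercased) to its canonical name."""
    index: dict[str, str] = {}
    for canonical, aliases in groups.items():
        index[canonical.lower()] = canonical
        for alias in aliases:
            index[alias.lower()] = canonical
    return index

def same_group(a: str, b: str, index: dict[str, str]) -> bool:
    """True when both column names resolve to the same canonical group."""
    ca, cb = index.get(a.lower()), index.get(b.lower())
    return ca is not None and ca == cb

# One built-in group from the table plus the custom group from infermap.yaml.
index = build_alias_index({
    "zip": ["zipcode", "zip_code", "postal_code", "postal", "postcode"],
    "mrn": ["medical_record_number", "patient_id", "chart_number"],
})
print(same_group("Postal_Code", "zipcode", index))
print(same_group("patient_id", "zip", index))
```

With this index, `Postal_Code` and `zipcode` land in the same group, while `patient_id` and `zip` do not.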

Semantic Type Patterns

PatternTypeScorer recognizes the following semantic types: email, phone, zip_us, date_iso, uuid, url, currency, ip_v4. A field is assigned a type when at least 60% of its sampled values match that type's pattern.
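
The 60% rule can be sketched as follows. The regular expressions are simplified placeholders written for this example, not the patterns infermap ships, and `classify` is a hypothetical helper covering only three of the listed types.

```python
import re
from typing import Optional

# Illustrative patterns only; the real ones are likely stricter.
PATTERNS = {
    "email": re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+"),
    "zip_us": re.compile(r"\d{5}(-\d{4})?"),
    "uuid": re.compile(
        r"[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}"
        r"-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}"
    ),
}

def classify(values: list[str], threshold: float = 0.6) -> Optional[str]:
    """Return the first type matched by at least `threshold` of the values."""
    if not values:
        return None
    for name, pattern in PATTERNS.items():
        # Count values that match the pattern in full, ignoring outer whitespace.
        hits = sum(1 for v in values if pattern.fullmatch(v.strip()))
        if hits / len(values) >= threshold:
            return name
    return None

samples = ["a@x.com", "b@y.org", "n/a", "c@z.net", "d@w.io"]
print(classify(samples))
```

Four of the five sample values look like email addresses (80%), so the column is classified as `email` despite the stray `n/a`.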

Custom Scorers

import infermap
from infermap.types import FieldInfo, ScorerResult

@infermap.scorer(name="my_scorer", weight=0.7)
def my_scorer(source: FieldInfo, target: FieldInfo) -> ScorerResult | None:
    # Access source.name, source.sample_values, source.dtype, etc.
    # Return None to abstain; abstaining scorers are excluded from the combined score.
    return ScorerResult(score=0.8, reasoning="my logic")
