This repository was archived by the owner on Jun 13, 2026. It is now read-only.
Architecture
benzsevern edited this page Mar 29, 2026 · 1 revision
Source (file/DB/DataFrame)
  |
  v
SchemaProvider.extract() --> SchemaInfo (fields + metadata)
  |
Target (file/DB/DataFrame)
  |
  v
SchemaProvider.extract() --> SchemaInfo
  |
  v
MapEngine
  |-- For each (source_field, target_field) pair:
  |     |-- Run all scorers --> ScorerResult | None
  |     |-- Weighted average (min 2 contributors)
  |     \-- --> Score matrix (M x N)
  |
  |-- scipy.optimize.linear_sum_assignment (Hungarian algorithm)
  |-- Filter by min_confidence
  |-- Generate warnings for unmapped required fields
  |
  v
MapResult
  |-- .report() --> structured dict with per-scorer breakdown
  |-- .apply(df) --> remapped DataFrame
  |-- .to_config() --> saveable YAML
  \-- .to_json() --> JSON string
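The last two MapEngine steps in the flow above (filtering by min_confidence and warning about unmapped required fields) might look roughly like the sketch below. The `Assignment` dataclass, the `finalize` function, the 0.3 default threshold, and the warning text are all hypothetical names and values chosen for illustration; they are not taken from the infermap source.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class Assignment:
    """Hypothetical record for one source -> target pairing."""
    source: str
    target: str
    confidence: float


def finalize(
    assignments: Sequence[Assignment],
    required_targets: Sequence[str],
    min_confidence: float = 0.3,  # assumed default, not from the source
) -> Tuple[List[Assignment], List[str]]:
    """Drop low-confidence pairs, then warn about required targets left unmapped."""
    kept = [a for a in assignments if a.confidence >= min_confidence]
    mapped = {a.target for a in kept}
    warnings = [
        f"required field '{t}' has no mapping"
        for t in required_targets
        if t not in mapped
    ]
    return kept, warnings


assignments = [
    Assignment("cust_email", "email", 0.92),
    Assignment("misc", "phone", 0.12),  # below the threshold, so it is dropped
]
kept, warnings = finalize(assignments, required_targets=["email", "phone"])
```

Here only the `email` pairing survives, and `phone` produces a warning because its only candidate fell below the threshold.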
- Consumer Layer — CLI, Python API, TypeScript SDK (v1.1)
- Orchestrator — MapEngine coordinates everything
- Scorer Pipeline — Independent scorers, each returns (score, reasoning)
- Schema Providers — Normalize any source into SchemaInfo
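To make the Scorer Pipeline layer concrete, here is a minimal sketch of what a scorer interface returning a score plus reasoning could look like. `ScorerResult` is named in this page, but its fields here are assumed; the `Scorer` protocol attributes (`name`, `weight`) and the `ToyExactScorer` class are hypothetical and only illustrate the shape.

```python
from dataclasses import dataclass
from typing import Optional, Protocol


@dataclass
class ScorerResult:
    score: float      # assumed range: 0.0 (no match) to 1.0 (certain match)
    reasoning: str    # human-readable explanation of the score


class Scorer(Protocol):
    """Assumed interface: every scorer judges one pair independently."""
    name: str
    weight: float

    def score(self, source_field: str, target_field: str) -> Optional[ScorerResult]:
        ...


class ToyExactScorer:
    """Illustrative scorer: case-insensitive name equality."""
    name = "toy_exact"
    weight = 1.0

    def score(self, source_field: str, target_field: str) -> Optional[ScorerResult]:
        if source_field.strip().lower() == target_field.strip().lower():
            return ScorerResult(1.0, "names match (case-insensitive)")
        return ScorerResult(0.0, "names differ")


scorer: Scorer = ToyExactScorer()
match = scorer.score("Email", "email")
miss = scorer.score("zip", "postal_code")
```

Because each scorer only sees one pair and returns a self-describing result, the orchestrator can run them in any order and report a per-scorer breakdown afterwards.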
Weighted average, not staged filtering: All scorers run on all pairs, so no pair is locked in early and no target is taken away from a better match.
Optimal assignment, not greedy: The Hungarian algorithm finds the globally optimal 1:1 mapping, not the locally best one. This matters when two source fields compete for the same target.
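The difference between greedy and optimal assignment can be shown on a tiny score matrix. The sketch below uses a brute-force search over permutations as a stand-in for the Hungarian algorithm (it returns the same optimum but only scales to very small square matrices); the function names and the score values are made up for illustration.

```python
from itertools import permutations
from typing import List, Sequence, Tuple

Matrix = Sequence[Sequence[float]]


def greedy_assignment(scores: Matrix) -> List[Tuple[int, int]]:
    """Each source row, in order, grabs its best still-free target column."""
    taken, pairs = set(), []
    for i, row in enumerate(scores):
        free = [j for j in range(len(row)) if j not in taken]
        if not free:
            break
        j = max(free, key=lambda c: row[c])
        taken.add(j)
        pairs.append((i, j))
    return pairs


def optimal_assignment(scores: Matrix) -> List[Tuple[int, int]]:
    """Brute-force optimum for a small square matrix (stand-in for Hungarian)."""
    n = len(scores)
    best = max(
        permutations(range(n)),
        key=lambda perm: sum(scores[i][perm[i]] for i in range(n)),
    )
    return list(enumerate(best))


def total(scores: Matrix, pairs: Sequence[Tuple[int, int]]) -> float:
    return sum(scores[i][j] for i, j in pairs)


# Rows are source fields, columns are target fields.
# Both sources like target 0, but source 1 has no good alternative.
scores = [
    [0.90, 0.80],
    [0.85, 0.10],
]
greedy = greedy_assignment(scores)    # source 0 takes target 0 first
optimal = optimal_assignment(scores)  # source 0 yields target 0 to source 1
```

Greedy locks source 0 onto target 0 and leaves source 1 with a 0.10 match, while the optimal assignment accepts a slightly lower score for source 0 to give source 1 its 0.85 match. For real workloads, `scipy.optimize.linear_sum_assignment(scores, maximize=True)` solves the same problem in polynomial time and also accepts rectangular matrices.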
None vs 0.0: Scorers return None to abstain (excluded from the calculation) or ScorerResult(0.0) to signal a real negative (included in the denominator). Because real negatives stay in the denominator, one favorable scorer cannot inflate a match that the others reject.
Minimum 2 contributors: A pair needs at least 2 non-None scorer results to receive a score. Single-scorer matches are too unreliable.
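The weighted-average rules described above (abstentions excluded, real zeros counted, at least 2 contributors) can be sketched as a single function. The name `combine`, the `(weight, score)` tuple layout, and the choice to return None for pairs with too few contributors are assumptions for illustration, not the library's actual API.

```python
from typing import Optional, Sequence, Tuple


def combine(
    results: Sequence[Tuple[float, Optional[float]]],
    min_contributors: int = 2,
) -> Optional[float]:
    """Weighted average over (weight, score) pairs; a score of None means abstain."""
    # Abstentions are dropped entirely: they affect neither numerator nor denominator.
    contributing = [(w, s) for w, s in results if s is not None]
    if len(contributing) < min_contributors:
        return None  # assumed behavior: too little evidence to score the pair
    total_weight = sum(w for w, _ in contributing)
    if total_weight == 0:
        return None
    # A real 0.0 adds nothing to the numerator but its weight stays in the denominator.
    return sum(w * s for w, s in contributing) / total_weight


one_voice = combine([(1.0, 0.9), (1.0, None), (1.0, None)])       # only 1 contributor
with_abstain = combine([(1.0, 0.9), (1.0, 0.7), (1.0, None)])     # about 0.8
with_negative = combine([(1.0, 0.9), (1.0, 0.7), (1.0, 0.0)])     # about 0.53
```

The last two calls differ only in whether the third scorer abstains or reports a real negative, which is exactly the None vs 0.0 distinction: the abstention leaves the average at about 0.8, while the negative pulls it down to about 0.53.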
infermap/
├── __init__.py # Public API
├── engine.py # MapEngine orchestrator
├── types.py # Core dataclasses
├── errors.py # Exception hierarchy
├── assignment.py # Hungarian algorithm
├── config.py # YAML config + from_config()
├── cli.py # Typer CLI (4 commands)
├── scorers/
│ ├── base.py # Scorer protocol
│ ├── exact.py # ExactScorer
│ ├── alias.py # AliasScorer + registry
│ ├── pattern_type.py # PatternTypeScorer + regex
│ ├── profile.py # ProfileScorer
│ ├── fuzzy_name.py # FuzzyNameScorer
│ └── llm.py # LLMScorer (v1.1 stub)
└── providers/
    ├── base.py # Provider protocol
    ├── file.py # CSV/Parquet/Excel
    ├── db.py # SQLite/Postgres/DuckDB/MySQL
    ├── schema_file.py # YAML/JSON definitions
    └── memory.py # DataFrame/dict
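As an illustration of what the providers/ package is responsible for, the sketch below normalizes an in-memory dict of columns into a SchemaInfo-like object. `SchemaProvider.extract()` and `SchemaInfo` (fields + metadata) are named in this page; the `FieldInfo` dataclass, its attributes, the metadata keys, and the `ToyDictProvider` class are hypothetical and may not match the real types in types.py.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Protocol


@dataclass
class FieldInfo:
    """Hypothetical per-field description."""
    name: str
    dtype: str
    sample_values: List[Any] = field(default_factory=list)


@dataclass
class SchemaInfo:
    fields: List[FieldInfo]
    metadata: Dict[str, Any] = field(default_factory=dict)


class SchemaProvider(Protocol):
    """Assumed interface: turn any source into a SchemaInfo."""

    def extract(self, source: Any) -> SchemaInfo:
        ...


class ToyDictProvider:
    """Illustrative provider for a dict of column name -> list of values."""

    def extract(self, source: Dict[str, List[Any]]) -> SchemaInfo:
        fields = []
        for name, values in source.items():
            non_null = [v for v in values if v is not None]
            # Infer a type name from the first non-null value, if there is one.
            dtype = type(non_null[0]).__name__ if non_null else "unknown"
            fields.append(FieldInfo(name, dtype, non_null[:3]))
        row_count = max((len(v) for v in source.values()), default=0)
        return SchemaInfo(fields, {"source_type": "dict", "row_count": row_count})


schema = ToyDictProvider().extract(
    {"email": ["a@example.com", None], "age": [31, 40]}
)
```

Once every source is reduced to the same structure, the scorers and the engine never need to know whether the data came from a file, a database, or a DataFrame.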