Merged
13 changes: 13 additions & 0 deletions .gitignore
@@ -28,3 +28,16 @@ docs/superpowers/

# Detailed internal blocklist (committed config is .gitleaks.toml)
.gitleaks-internal.toml

# rp1:start
!.rp1/
.rp1/*
!.rp1/context/
!.rp1/context/**
!.rp1/config/
!.rp1/config/**
!.rp1/work/
!.rp1/work/**
.rp1/context/meta.json
.rp1/settings.toml
# rp1:end
28 changes: 28 additions & 0 deletions .pre-commit-config.yaml
@@ -1,4 +1,32 @@
repos:
- repo: https://github.com/pre-commit/pre-commit-hooks
rev: v5.0.0
hooks:
- id: trailing-whitespace
- id: end-of-file-fixer
- id: check-yaml
- id: check-json
- id: check-toml
- id: check-added-large-files
- id: check-merge-conflict
- id: debug-statements
- id: mixed-line-ending

- repo: https://github.com/astral-sh/ruff-pre-commit
rev: v0.11.6
hooks:
- id: ruff
args: [--fix]
- id: ruff-format

- repo: https://github.com/pre-commit/mirrors-mypy
rev: v1.15.0
hooks:
- id: mypy
args: [--config-file=pyproject.toml, src/]
additional_dependencies: [pydantic>=2.0, httpx>=0.28]
pass_filenames: false

- repo: https://github.com/gitleaks/gitleaks
rev: v8.24.3
hooks:
219 changes: 219 additions & 0 deletions .rp1/context/architecture.md
@@ -0,0 +1,219 @@
# System Architecture

**Project**: model-ledger
**Architecture Pattern**: Event-Sourced, Protocol-First, Layered with Tool-Shaped API
**Last Updated**: 2026-04-16

## High-Level Architecture

```mermaid
graph TB
subgraph Presentation["Presentation Layer"]
CLI["CLI<br/>(Typer)"]
MCP["MCP Server<br/>(FastMCP)"]
REST["REST API<br/>(FastAPI)"]
end

subgraph Tools["Tool Layer"]
T_record["record"]
T_query["query"]
T_investigate["investigate"]
T_trace["trace"]
T_changelog["changelog"]
T_discover["discover"]
end

subgraph SDK["SDK Layer"]
Ledger["Ledger SDK<br/>(register, record, tag,<br/>add, connect, trace,<br/>members, groups)"]
end

subgraph Backends["Backend Layer"]
SQLite["SQLiteLedgerBackend"]
Snowflake["SnowflakeLedgerBackend"]
HTTP_BE["HttpLedgerBackend"]
JSON["JsonFileLedgerBackend"]
Memory["InMemoryLedgerBackend"]
end

subgraph Connectors["Connector Layer"]
SQL_C["sql_connector"]
REST_C["rest_connector"]
GitHub_C["github_connector"]
Prefect_C["prefect_connector"]
end

Agent["AI Agent"] -->|MCP stdio| MCP
User["User / Script"] -->|HTTP| REST
User -->|CLI| CLI

CLI --> Ledger
MCP --> T_record
MCP --> T_query
MCP --> T_investigate
MCP --> T_trace
MCP --> T_changelog
MCP --> T_discover
REST --> T_record
REST --> T_query
REST --> T_investigate
REST --> T_trace
REST --> T_changelog
REST --> T_discover
T_record --> Ledger
T_query --> Ledger
T_investigate --> Ledger
T_trace --> Ledger
T_changelog --> Ledger
T_discover --> Ledger

Ledger --> SQLite
Ledger --> Snowflake
Ledger --> HTTP_BE
Ledger --> JSON
Ledger --> Memory

Connectors --> Ledger

HTTP_BE -.->|pass-through| REST

SQL_C -->|queries| ExtDB[("External DB")]
REST_C -->|fetches| ExtAPI["External API"]
GitHub_C -->|reads| GitHubAPI["GitHub API"]
Prefect_C -->|discovers| PrefectCloud["Prefect Cloud"]
```

## Component Architecture

### Presentation Layer
**Purpose**: User-facing interfaces — CLI, MCP server for AI agents, REST API for programmatic access
**Components**:
- `src/model_ledger/cli/app.py` — Typer CLI (list, show, validate, audit-log, export, introspect, mcp, serve)
- `src/model_ledger/mcp/server.py` — FastMCP server with 6 tools + 3 resources, stdio transport
- `src/model_ledger/rest/app.py` — FastAPI with 6 endpoints mirroring tools, uvicorn

### Tool Layer
**Purpose**: Six agent-protocol tool functions with Pydantic I/O contracts — the canonical API surface
**Components**:
- `src/model_ledger/tools/schemas.py` — Pydantic I/O models (single source of truth)
- `src/model_ledger/tools/{record,query,investigate,trace,changelog,discover}.py`
**Pattern**: Pure functions with signature `(Input, Ledger) -> Output`; all outputs are JSON-serializable

### SDK Layer
**Purpose**: Core business logic — Ledger class orchestrates registration, recording, tagging, dependency linking, membership, and change propagation
**Components**:
- `src/model_ledger/sdk/ledger.py` — Ledger (v0.3.0+ event-log paradigm)
- `src/model_ledger/sdk/inventory.py` — Inventory (v0.2.0 legacy with DraftVersion context manager)

### Backend Layer (Storage)
**Purpose**: Pluggable persistence implementing LedgerBackend protocol
**Components**:
- `SQLiteLedgerBackend` — WAL mode, zero-dep, stdlib sqlite3
- `SnowflakeLedgerBackend` — Production, batched writes with pandas/SQL MERGE fallback
- `HttpLedgerBackend` — REST API pass-through via httpx
- `JsonFileLedgerBackend` — Git-friendly directory tree (models/, snapshots/, tags/)
- `InMemoryLedgerBackend` — Testing and demo
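The pluggable-backend idea rests on structural typing: any class exposing the protocol's methods is a valid backend, with no inheritance required. A minimal sketch, assuming hypothetical method names (`write_snapshot` / `read_snapshots` are illustrative, not the real `LedgerBackend` surface):

```python
from typing import Protocol, runtime_checkable


@runtime_checkable
class LedgerBackend(Protocol):
    """Illustrative slice of a storage protocol; real method names may differ."""
    def write_snapshot(self, model_id: str, payload: dict) -> None: ...
    def read_snapshots(self, model_id: str) -> list[dict]: ...


class TinyMemoryBackend:
    """Satisfies the protocol structurally, without subclassing it."""
    def __init__(self) -> None:
        self._store: dict[str, list[dict]] = {}

    def write_snapshot(self, model_id: str, payload: dict) -> None:
        self._store.setdefault(model_id, []).append(payload)

    def read_snapshots(self, model_id: str) -> list[dict]:
        return list(self._store.get(model_id, []))


backend = TinyMemoryBackend()
backend.write_snapshot("credit_scorer", {"event": "registered"})
print(isinstance(backend, LedgerBackend))  # → True (runtime structural check)
```

`@runtime_checkable` makes the `isinstance` check work at runtime, while type checkers verify conformance statically, which is the "duck typing with type-checker support" trade-off noted in the patterns table below.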

### Connector Layer (Discovery)
**Purpose**: Config-driven factories that discover models from external data sources
**Components**:
- `sql_connector` — SQL query to DataNode mapping with table parsing
- `rest_connector` — REST API pagination and JSON field mapping
- `github_connector` — GitHub Contents API, repo config scanning
- `prefect_connector` — Prefect Cloud deployment discovery
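A config-driven factory layer of this kind is commonly a registry keyed by connector type. The sketch below is an assumption about the shape, not the package's actual wiring; the `"static"` connector, `register`, and `make_connector` names are invented for illustration:

```python
from typing import Callable

# A connector, once built, is just a callable that yields discovered items
ConnectorFn = Callable[[], list[dict]]
_FACTORIES: dict[str, Callable[[dict], ConnectorFn]] = {}


def register(kind: str):
    """Decorator registering a factory under its connector-type key."""
    def deco(factory: Callable[[dict], ConnectorFn]):
        _FACTORIES[kind] = factory
        return factory
    return deco


@register("static")
def make_static_connector(config: dict) -> ConnectorFn:
    # Trivial connector: "discovers" exactly the names listed in its config
    return lambda: [{"name": n} for n in config.get("names", [])]


def make_connector(config: dict) -> ConnectorFn:
    """Dispatch on config['type'] and return a fully wired connector."""
    return _FACTORIES[config["type"]](config)


connector = make_connector({"type": "static", "names": ["risk_model"]})
print(connector())  # → [{'name': 'risk_model'}]
```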

## Data Flow

### Agent Tool Invocation (MCP)
```mermaid
sequenceDiagram
participant Agent as AI Agent
participant MCP as FastMCP Server
participant Tool as Tool Function
participant SDK as Ledger SDK
participant BE as LedgerBackend

Agent->>MCP: Tool call (stdio)
MCP->>Tool: Construct Pydantic Input
Tool->>SDK: Call Ledger methods
SDK->>BE: Read/Write via protocol
BE-->>SDK: Data
SDK-->>Tool: Results
Tool-->>MCP: Pydantic Output (JSON)
MCP-->>Agent: Tool response
```

### Model Discovery via Connectors
```mermaid
sequenceDiagram
participant Conn as Connector Factory
participant Ext as External Source
participant SDK as Ledger.add()
participant Linker as Ledger.connect()
participant BE as LedgerBackend

Conn->>Ext: Query (SQL/REST/GitHub/Prefect)
Ext-->>Conn: Raw rows/items
Conn->>Conn: Map to DataNodes with DataPorts
Conn->>SDK: Register DataNodes as ModelRefs
SDK->>BE: Content-hash dedup + append snapshots
Linker->>Linker: Match DataPort I/O across all nodes
Linker->>BE: Create dependency links
```
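The "content-hash dedup + append snapshots" step in the diagram can be illustrated with canonical JSON hashing, so that two payloads with the same content (regardless of key order) produce the same digest. The helper names below are hypothetical:

```python
import hashlib
import json


def content_hash(payload: dict) -> str:
    """Hash a canonical JSON rendering so identical content always dedupes."""
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def append_if_new(log: list[dict], payload: dict) -> bool:
    """Append-only write with content-hash dedup; returns True if appended."""
    digest = content_hash(payload)
    if any(entry["hash"] == digest for entry in log):
        return False  # already recorded: re-discovery is a no-op
    log.append({"hash": digest, "payload": payload})
    return True


log: list[dict] = []
print(append_if_new(log, {"name": "scorer", "inputs": ["features"]}))  # → True
print(append_if_new(log, {"inputs": ["features"], "name": "scorer"}))  # → False
```

This is what makes connector runs idempotent: re-scanning an unchanged source appends nothing.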

### Composite Change Propagation
```mermaid
sequenceDiagram
participant Caller as record()
participant SDK as Ledger SDK
participant Groups as groups()
participant Parent as Parent Composite

Caller->>SDK: record(member_model, event)
SDK->>SDK: Check event not in _INTERNAL_EVENTS
SDK->>Groups: Find parent composites
Groups-->>SDK: List of parent ModelRefs
SDK->>Parent: record(parent, member_changed, metadata)
Note over Parent: _propagating=True prevents cascading
```
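The guard-flag mechanics in the note above can be sketched in a few lines. `MiniLedger` and its string-based events are a simplification invented for illustration; the real SDK works with `ModelRef`s and snapshot metadata:

```python
class MiniLedger:
    """Sketch: propagate member changes one level up, never cascade further."""
    _INTERNAL_EVENTS = {"member_changed"}

    def __init__(self, parents: dict[str, list[str]]):
        self._parents = parents      # member -> parent composites
        self._propagating = False
        self.log: list[tuple[str, str]] = []

    def record(self, model: str, event: str) -> None:
        self.log.append((model, event))
        if self._propagating or event in self._INTERNAL_EVENTS:
            return  # internal events never trigger another propagation round
        self._propagating = True
        try:
            for parent in self._parents.get(model, []):
                self.record(parent, "member_changed")
        finally:
            self._propagating = False


ledger = MiniLedger(parents={"scorer": ["credit_suite"],
                             "credit_suite": ["portfolio"]})
ledger.record("scorer", "retrained")
print(ledger.log)  # → [('scorer', 'retrained'), ('credit_suite', 'member_changed')]
```

Note that `portfolio` never receives an event: the flag stops grandparent cascades, keeping propagation bounded even in deep composite hierarchies.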

## Integration Points

### External Services
| Service | Purpose | Integration Type |
|---------|---------|-----------------|
| Snowflake | Production storage | Database (MERGE, write_pandas) |
| SQLite | Local persistent storage | Embedded database (stdlib) |
| GitHub API | Discover models from config files | REST API (v3 Contents) |
| Prefect Cloud | Discover orchestration deployments | Python SDK (async) |
| PyPI | Package distribution | CI/CD (GitHub Actions) |
| FastMCP | Expose tools to AI agents | MCP protocol (stdio) |

### MCP-to-REST Pass-Through
The MCP server supports `HttpLedgerBackend`, which enables a pass-through mode: all tool calls are forwarded as HTTP requests to a remote REST API deployment.

## Architectural Patterns

| Pattern | Evidence | Description |
|---------|----------|-------------|
| Event Sourcing | ModelRef + Snapshot | All state changes are immutable Snapshots. History replayed for current state. |
| Protocol-First | `@runtime_checkable Protocol` | All extension points use Protocols, not ABCs. Duck typing with type-checker support. |
| Tool-Shaped SDK | 6 tool functions | Every method has clear inputs, JSON-serializable outputs, no side effects beyond the ledger. |
| Factory Pattern | Connector + Backend factories | Config-driven factories return fully-wired instances. |
| Plugin Architecture | Entry points | `importlib.metadata.entry_points()` for introspectors and scanners. |
| Composite Governance | Member tracking via events | Business composites aggregate technical nodes. Membership tracked via snapshots, not FK. |
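The Event Sourcing row deserves one concrete illustration: current state is never stored directly, only derived by replaying the append-only snapshot log. The field names and `replay` helper below are illustrative, not the package's API:

```python
def replay(snapshots: list[dict]) -> dict:
    """Derive current state by folding the append-only log, oldest first."""
    state: dict = {}
    for snap in snapshots:
        state.update(snap.get("fields", {}))  # later snapshots win
    return state


history = [
    {"event": "registered", "fields": {"owner": "mrm", "tier": "high"}},
    {"event": "retrained",  "fields": {"version": "2.1"}},
    {"event": "reassigned", "fields": {"owner": "fraud-team"}},
]
print(replay(history))
# → {'owner': 'fraud-team', 'tier': 'high', 'version': '2.1'}
```

Since snapshots are immutable and ordered, the same replay always yields the same state, which is precisely what makes the history auditable.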

## Deployment Architecture

**Distribution**: Python package via PyPI (Apache-2.0)
**Build System**: hatchling
**Python**: >=3.10
**Core Dependencies**: pydantic, httpx (minimal)

**Runtime Modes**:
- CLI: `model-ledger <command>`
- MCP server: `model-ledger mcp [--backend sqlite|snowflake|json|http|memory]`
- REST API: `model-ledger serve [--backend ...] [--port 8000]`
- Python SDK: `from model_ledger.sdk.ledger import Ledger`
- Embedded: `Ledger(backend=SQLiteLedgerBackend('db.sqlite'))`

**Optional Dependency Groups**: cli, mcp, rest-api, snowflake, github, excel, introspect-*
80 changes: 80 additions & 0 deletions .rp1/context/charter.md
@@ -0,0 +1,80 @@
# Project Charter: model-ledger

**Version**: 1.0.0
**Status**: Complete
**Created**: 2026-04-16

### Problem & Context

Financial institutions operate hundreds to thousands of ML models and rules-based systems across 10+ platforms, yet typically track only a fraction in any formal inventory. Regulatory mandates — SR 11-7, EU AI Act (enforcement August 2026), OSFI E-23, PRA SS1/23 — require comprehensive, auditable model inventories with full lineage and change trails.

Existing model registries (MLflow, SageMaker, Weights & Biases) are single-platform silos. They track what was trained in their environment but cannot provide a unified, cross-platform view of every deployed model, its dependencies, or its governance status. The result: MRM teams resort to spreadsheets, coverage gaps go undetected, and regulatory audits surface material findings.

**Why now**: Regulatory frameworks like SR 11-7 and the EU AI Act are raising the bar for model governance. Organizations increasingly need a solution that can discover and govern models across all platforms — this is a growing industry need, not a single-deadline event.

### Target Users

| Segment | Role | Primary Need |
|---------|------|--------------|
| Model Risk Management (MRM) teams | Govern and inventory all models | Complete, living inventory satisfying regulatory requirements |
| ML Engineers | Build and deploy models | Discover dependencies, trace impact of changes, understand lineage |
| AI Agents (via MCP) | Query inventory conversationally | Tool-shaped API for natural-language governance queries |
| Regulators / Auditors | Examine compliance posture | Audit trails, compliance documentation, coverage reports |

### Business Rationale

model-ledger is the missing governance layer that sits above platform-specific registries and provides a unified, cross-platform model inventory.

**Core value delivered**:
1. **Unified discovery**: Discovers models across all platforms as one connected graph, eliminating blind spots from platform silos
2. **Immutable audit trail**: Every change is tracked as a content-addressed, append-only event — satisfying regulatory auditability requirements
3. **Dependency mapping**: Maps upstream/downstream dependencies so teams can trace the blast radius of any change
4. **Regulatory compliance**: Validates inventory against SR 11-7, EU AI Act, and NIST AI RMF compliance profiles out of the box
5. **Agent-native interface**: Exposes everything through a tool-shaped API (MCP, REST, CLI) so AI agents can query and manage the inventory conversationally
6. **Composite governance**: Aggregates technical components into business-level composite models, letting regulators examine governable entities rather than raw artifacts

**Differentiation**: Unlike MLflow/SageMaker/W&B (single-platform training registries), model-ledger is a cross-platform governance framework. Unlike GRC tools, it is code-native, event-sourced, and agent-accessible. Apache-2.0 licensing enables adoption without vendor lock-in.

### Scope Guardrails

**In Scope (v0.7.x)**:
- Model registration with content-addressed identity (ModelRef, Snapshot, Tag)
- Append-only event-log paradigm with immutable audit trail
- Dependency graph construction and traversal (add, connect, trace)
- Composite model governance (groups, members, automatic change propagation)
- 6 agent tools: record, query, investigate, trace, changelog, discover
- Three transport surfaces: MCP server, REST API, CLI
- 5 pluggable backends: InMemory, SQLite, Snowflake, HTTP pass-through, JSON files
- 4 source connectors: SQL, REST, GitHub, Prefect
- 3 regulatory compliance profiles: SR 11-7, EU AI Act, NIST AI RMF
- ML model introspection plugins (sklearn, xgboost, lightgbm)
- Audit pack export (HTML, JSON, Markdown)
- Observations, validation runs, and feedback lifecycle
- Scanner protocol for platform-level model discovery
- Plugin discovery via entry_points

**Out of Scope (by design)**:
- Model training / experiment tracking (MLflow/W&B territory)
- Real-time monitoring / alerting
- Automated remediation of findings
- Model serving / deployment
- Feature stores
- Data quality monitoring
- UI / dashboard frontend (REST API exists but no bundled frontend)
- Organization-specific connectors, auth, backends (separate companion packages)
- Model comparison / A/B testing

### Success Criteria

**Success looks like**:
1. **Regulatory readiness**: Model inventory is comprehensive enough to satisfy SR 11-7 and EU AI Act (August 2026 deadline) audit requirements — complete coverage of deployed models with audit trails
2. **Coverage at scale**: Organizations move from partial tracking (~15%) to >90% coverage across all platforms
3. **OSS adoption**: External organizations (banks, fintechs) adopt model-ledger as their model inventory solution — evidenced by PyPI downloads, GitHub stars, and external contributions
4. **Agent-native usage**: AI agents (via MCP) become the primary interface for querying and managing the inventory — model governance becomes conversational
5. **Composite governance**: Business-level composites successfully aggregate thousands of technical nodes with automatic change propagation, enabling regulators to examine governable entities

**Failure looks like**:
- Regulatory audit finds significant gaps in model inventory coverage
- Framework is too complex for MRM teams to adopt — they fall back to spreadsheets
- OSS project stays internal-only with no external adoption
- Event-log paradigm creates performance bottlenecks at scale (>10K models)