dart corr

Bang Juwon edited this page May 17, 2026 · 6 revisions

dart-corr · Cross-artifact correlation engine

Surfaces contradictions between artifacts as UNRESOLVED rather than smoothing them over. This is the single most important component for the project's "architecture-first" claim — without dart-corr, the agent would just believe whatever the first source told it.


What it owns

  • The correlation rule pack (dart_corr/correlation-rules.yaml)
  • DuckDB-backed in-process joins for time-proximity correlation
  • The contradiction state machine: OPEN → RESOLVED | UNRESOLVED
  • Two MCP-surface functions: correlate_events and correlate_timeline
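
The contradiction state machine in the list above can be sketched as a small enum with an explicit transition table. This is a minimal sketch, not dart-corr's actual code. It assumes a contradiction starts OPEN and settles into RESOLVED or UNRESOLVED, and that an UNRESOLVED contradiction can later be RESOLVED once the agent revises its hypothesis. The names ContradictionState, ALLOWED, and advance are hypothetical.

```python
from enum import Enum


class ContradictionState(Enum):
    OPEN = "OPEN"
    RESOLVED = "RESOLVED"
    UNRESOLVED = "UNRESOLVED"


# Assumed transitions (hypothetical, not taken from dart-corr's source).
ALLOWED = {
    ContradictionState.OPEN: {
        ContradictionState.RESOLVED,
        ContradictionState.UNRESOLVED,
    },
    ContradictionState.UNRESOLVED: {ContradictionState.RESOLVED},
    ContradictionState.RESOLVED: set(),  # terminal in this sketch
}


def advance(current, target):
    """Return target if the transition is allowed, otherwise raise ValueError."""
    if target not in ALLOWED[current]:
        raise ValueError(f"illegal transition: {current.value} -> {target.value}")
    return target
```

Keeping the transitions in one table makes an illegal move (for example reopening a RESOLVED contradiction) fail loudly instead of silently changing state.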

What it does not own

  • Hypothesis revision — that's dart-agent's job
  • Storing audit entries — that's dart-audit's job
  • The artifacts themselves — those come from dart-mcp functions

The mechanical guarantee

When two artifacts disagree on a fact, dart-corr flags it.

Example from Case-PtH-Timestomp:

| Source | Claim |
| --- | --- |
| Auth events (4624) | Pass-the-Hash at 14:23:09 UTC |
| MFT $SI vs $FN | Timestomp at 14:21:55 UTC (74 sec earlier) |

A naïve LLM agent might pick whichever claim supports its current hypothesis. dart-corr raises UNRESOLVED and forces the agent to revise — there must be a third explanation that reconciles both, or the hypothesis is wrong.

# Illustrative only. The shipped engine lives in dart_corr/src/dart_corr/__init__.py;
# dart_mcp exposes correlate_events and correlate_timeline as thin wrappers
# around it (see the implementation note under "Files").
def correlate(events_a, events_b, time_window_sec=15):
    contradictions = []
    for a in events_a:
        for b in events_b:
            if abs((a.ts - b.ts).total_seconds()) <= time_window_sec:
                if a.fact != b.fact:  # disagreement
                    contradictions.append({
                        "claim_a": a.fact, "source_a": a.source, "ts_a": a.ts,
                        "claim_b": b.fact, "source_b": b.source, "ts_b": b.ts,
                        "status": "UNRESOLVED",
                    })
    return contradictions

The agent's playbook requires it to handle UNRESOLVED before emitting findings. Skipping is not an option — the finding emitter inside DeterministicAnalyst (in dart_agent/__init__.py) refuses to write a finding while a relevant UNRESOLVED contradiction is open.
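
The gate described above can be sketched as follows. This is an illustration only, not the code inside DeterministicAnalyst. The function emit_finding, the exception UnresolvedContradictionError, and the hypothesis_id field used to decide relevance are all assumptions made for the example.

```python
class UnresolvedContradictionError(RuntimeError):
    """Raised when a finding is blocked by an open contradiction (hypothetical)."""


def emit_finding(finding, contradictions):
    """Return the finding only if no relevant contradiction is UNRESOLVED.

    Assumption: a contradiction is "relevant" when it carries the same
    hypothesis_id as the finding. The real relevance test may differ.
    """
    blocking = [
        c for c in contradictions
        if c["status"] == "UNRESOLVED"
        and c.get("hypothesis_id") == finding["hypothesis_id"]
    ]
    if blocking:
        raise UnresolvedContradictionError(
            f"{len(blocking)} unresolved contradiction(s) block finding "
            f"for hypothesis {finding['hypothesis_id']}"
        )
    return finding
```

Raising instead of returning a flag means a caller cannot ignore the block by accident: the finding is either returned or the call fails.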


Why DuckDB

dart-corr runs in-process (no server, no port). On multi-million-row MFT timelines, naïve Python joins run out of memory. DuckDB handles joins over 5M+ rows in seconds, with window functions for time-proximity, all without leaving the process.

import duckdb

con = duckdb.connect(":memory:")  # in-process, nothing persisted to disk
# CSV reading is built in, so no extension needs to be installed or loaded.
con.execute("CREATE TABLE auth AS SELECT * FROM read_csv('auth.csv')")
con.execute("CREATE TABLE mft  AS SELECT * FROM read_csv('mft.csv')")
# Pair each auth event with timestomped MFT entries within +/- 15 seconds.
rows = con.execute("""
    SELECT a.user, a.ts, m.path, m.timestomp
    FROM auth a
    JOIN mft m
      ON a.ts BETWEEN m.ts - INTERVAL 15 SECOND AND m.ts + INTERVAL 15 SECOND
    WHERE m.timestomp = TRUE
""").fetchall()

The agent doesn't write SQL. dart-corr exposes correlate_events and correlate_timeline as typed MCP calls — the agent supplies the source files and a hypothesis ID, the engine returns the contradictions.


Files

dart_corr/
├── README.md                  # design contract + usage
├── pyproject.toml             # package metadata (duckdb + PyYAML)
├── correlation-rules.yaml     # operator-tunable rule pack (9 default rules)
├── src/dart_corr/
│   └── __init__.py            # the engine — three public correlate_* functions
└── tests/
    └── test_dart_corr.py      # 14 unit tests, run independently of dart_mcp

Implementation note (v0.7.1): dart_corr is now a real standalone package, not a docs-only scaffold. The three public functions (correlate_events, correlate_timeline, correlate_download_to_execution) and load_rules() live in dart_corr/src/dart_corr/__init__.py and are covered by 14 dedicated tests in dart_corr/tests/test_dart_corr.py (all passing). The MCP wire surface is unchanged: dart_mcp.correlate_events and friends are thin wrappers that delegate to dart_corr, and correlate_timeline additionally enforces a SQL-injection allow-list at the boundary before calling the engine. Both call paths produce identical output.
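
An allow-list check of the kind mentioned in the implementation note might look like the following. This is a sketch under assumptions, not the check dart_mcp actually performs. The names ALLOWED_TABLES, ALLOWED_COLUMNS, and validate_identifiers are hypothetical, as is the idea that the caller passes table and column names.

```python
# Hypothetical allow-lists; the real ones are not shown on this page.
ALLOWED_TABLES = frozenset({"auth", "mft"})
ALLOWED_COLUMNS = frozenset({"ts", "user", "path", "timestomp"})


def validate_identifiers(table, columns):
    """Reject any table or column name that is not explicitly allow-listed.

    Identifiers cannot be bound as SQL parameters, so they are checked by
    exact match against a fixed set before being interpolated into a query.
    """
    if table not in ALLOWED_TABLES:
        raise ValueError(f"table not allowed: {table!r}")
    rejected = [c for c in columns if c not in ALLOWED_COLUMNS]
    if rejected:
        raise ValueError(f"columns not allowed: {rejected!r}")
    return table, list(columns)
```

Exact-match checking is deliberately strict: a value such as "auth; DROP TABLE mft" is rejected because it is not in the set, with no attempt to sanitize or escape it.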

