dart corr

Bang Juwon edited this page May 17, 2026 · 6 revisions

dart-corr · Cross-artifact correlation engine

Surfaces contradictions between artifacts as UNRESOLVED rather than smoothing them over. This is the single most important component for the project's "architecture-first" claim — without dart-corr, the agent would just believe whatever the first source told it.


What it owns

  • The correlation rule pack (dart_corr/correlation-rules.yaml)
  • DuckDB-backed in-process joins for time-proximity correlation
  • The contradiction state machine: OPEN → RESOLVED | UNRESOLVED
  • Two MCP-surface functions: correlate_events and correlate_timeline
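
As a rough picture of the contradiction state machine listed above, the sketch below models the three states as an enum with a transition table. Everything in it is an assumption for illustration: `Status`, `TRANSITIONS`, and `advance` are hypothetical names, and the rule that an UNRESOLVED contradiction can later become RESOLVED is inferred from the playbook requirement rather than taken from correlation-rules.yaml.

```python
from enum import Enum


class Status(Enum):
    OPEN = "OPEN"
    RESOLVED = "RESOLVED"
    UNRESOLVED = "UNRESOLVED"


# Assumed transitions (hypothetical, not read from the rule pack):
# a new contradiction starts OPEN, then is either reconciled or flagged;
# a flagged one can still be reconciled later by a revised hypothesis.
TRANSITIONS = {
    Status.OPEN: {Status.RESOLVED, Status.UNRESOLVED},
    Status.UNRESOLVED: {Status.RESOLVED},
    Status.RESOLVED: set(),
}


def advance(current: Status, target: Status) -> Status:
    """Return target if the move is allowed, otherwise raise ValueError."""
    if target not in TRANSITIONS[current]:
        raise ValueError(f"illegal transition {current.name} -> {target.name}")
    return target
```

Keeping the allowed moves in a table makes an illegal jump (for example, reopening a RESOLVED contradiction) fail loudly instead of silently rewriting state.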

What it does not own

  • Hypothesis revision — that's dart-agent's job
  • Storing audit entries — that's dart-audit's job
  • The artifacts themselves — those come from dart-mcp functions

The mechanical guarantee

When two artifacts disagree on a fact, dart-corr flags it.

Example from Case-PtH-Timestomp:

Source               | Claim
Auth events (4624)   | Pass-the-Hash at 14:23:09 UTC
MFT $SI vs $FN       | Timestomp at 14:21:55 UTC (74 sec earlier)

A naïve LLM agent might pick whichever claim supports its current hypothesis. dart-corr raises UNRESOLVED and forces the agent to revise — there must be a third explanation that reconciles both, or the hypothesis is wrong.

# Illustrative — real implementation lives in dart_mcp/__init__.py
# (correlate_events, correlate_timeline). dart_corr is currently a docs-only
# scaffold; see "Files" below.
def correlate(events_a, events_b, time_window_sec=15):
    contradictions = []
    for a in events_a:
        for b in events_b:
            if abs((a.ts - b.ts).total_seconds()) <= time_window_sec:
                if a.fact != b.fact:  # disagreement
                    contradictions.append({
                        "claim_a": a.fact, "source_a": a.source, "ts_a": a.ts,
                        "claim_b": b.fact, "source_b": b.source, "ts_b": b.ts,
                        "status": "UNRESOLVED",
                    })
    return contradictions

The agent's playbook requires it to handle UNRESOLVED before emitting findings. Skipping is not an option — the finding emitter inside DeterministicAnalyst (in dart_agent/__init__.py) refuses to write a finding while a relevant UNRESOLVED contradiction is open.
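
The gate described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the DeterministicAnalyst code: `emit_finding` and `OpenContradictionError` are hypothetical names, contradictions are assumed to be dicts carrying a `status` and a `hypothesis_id`, and "relevant" is assumed to mean "same hypothesis ID".

```python
class OpenContradictionError(RuntimeError):
    """Raised when a finding is blocked by an unresolved contradiction."""


def emit_finding(finding: dict, contradictions: list[dict]) -> dict:
    """Return the finding only if no relevant contradiction is UNRESOLVED.

    Relevance is assumed to be a hypothesis_id match; the real test
    inside the agent may be different.
    """
    blocking = [
        c for c in contradictions
        if c["status"] == "UNRESOLVED"
        and c.get("hypothesis_id") == finding["hypothesis_id"]
    ]
    if blocking:
        raise OpenContradictionError(
            f"{len(blocking)} unresolved contradiction(s) block the finding "
            f"for hypothesis {finding['hypothesis_id']}"
        )
    return finding


# Hypothetical data: H1 has an open contradiction, H2 does not.
open_items = [{"hypothesis_id": "H1", "status": "UNRESOLVED"}]
accepted = emit_finding({"hypothesis_id": "H2"}, open_items)
```

Raising an exception rather than returning a flag means a caller cannot ignore the block by accident; it has to resolve the contradiction or handle the error explicitly.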


Why DuckDB

dart-corr runs in-process (no server, no port). On multi-million-row MFT timelines, naïve pure-Python joins run out of memory. DuckDB handles joins over 5M+ rows in seconds, using range predicates and window functions for time-proximity matching, all without leaving the process.

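import duckdb

# In-memory database: nothing is written to disk and no server is started.
con = duckdb.connect(":memory:")
# read_csv infers column names and types; ts must parse as a TIMESTAMP.
con.execute("CREATE TABLE auth AS SELECT * FROM read_csv('auth.csv')")
con.execute("CREATE TABLE mft  AS SELECT * FROM read_csv('mft.csv')")
# Pair each auth event with timestomped MFT entries within ±15 seconds.
# "user" is quoted because USER is also an SQL keyword.
rows = con.execute("""
    SELECT a."user", a.ts, m.path, m.timestomp
    FROM auth a
    JOIN mft m
      ON a.ts BETWEEN m.ts - INTERVAL 15 SECOND AND m.ts + INTERVAL 15 SECOND
    WHERE m.timestomp = TRUE
""").fetchall()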
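
To make the ±15 second predicate concrete without a database, the same match can be sketched in plain Python with a binary search over sorted timestamps. This is a swapped-in illustration, not how DuckDB executes the join and not part of dart-corr: `proximity_pairs` is a hypothetical helper and the timestamps are made up for the demo.

```python
from bisect import bisect_left, bisect_right
from datetime import datetime, timedelta


def proximity_pairs(left, right, window_sec=15):
    """Yield (l, r) timestamp pairs with |l - r| <= window_sec.

    Sorting `right` once and binary-searching it avoids comparing
    every pair, unlike a nested loop.
    """
    right_sorted = sorted(right)
    window = timedelta(seconds=window_sec)
    for ts in left:
        lo = bisect_left(right_sorted, ts - window)
        hi = bisect_right(right_sorted, ts + window)
        for candidate in right_sorted[lo:hi]:
            yield ts, candidate


base = datetime(2026, 1, 1, 12, 0, 0)  # made-up timestamps for the demo
auth = [base, base + timedelta(minutes=5)]
mft = [base + timedelta(seconds=11), base + timedelta(seconds=40)]
pairs = list(proximity_pairs(auth, mft))
# pairs holds one match: the first auth event and the MFT entry 11 s later
```

This version still keeps both timestamp lists in memory, which is the limitation that motivates handing the join to DuckDB on large timelines.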

The agent doesn't write SQL. dart-corr exposes correlate_events and correlate_timeline as typed MCP calls: the agent supplies the source files and a hypothesis ID, and the engine returns the contradictions.


Files

dart_corr/
└── README.md                  # implementation status + design contract

Implementation today (v0.7.1):
  dart_mcp/src/dart_mcp/__init__.py
    ├── correlate_events                 (line 657)
    ├── correlate_timeline               (line 694)
    └── correlate_download_to_execution  (line 1409)
  dart_mcp/src/dart_mcp/_v04_expansion.py
    └── supporting helpers

Implementation note (v0.7.1): dart_corr/ is currently a docs-only scaffold — it contains only README.md (the design contract). The actual correlation logic lives inside dart_mcp/__init__.py: three @register'd MCP functions (correlate_events, correlate_timeline, correlate_download_to_execution) plus a DuckDB engine helper in dart_mcp/_v04_expansion.py. They are reachable on the MCP wire today, verified by tests/test_mcp_surface.py and exercised in case-04 (download → execution) and the PtH-Timestomp wiki walkthrough. The mid-2026 milestone is to extract this code into a standalone dart_corr package with its own rule pack (correlation-rules.yaml) and event-driven state machine; the MCP-facing API will stay the same so case studies and the agent loop do not break.

