
dart corr

Juwon1405 edited this page Apr 30, 2026 · 6 revisions

dart-corr · Cross-artifact correlation engine

Surfaces contradictions between artifacts as UNRESOLVED rather than smoothing them over. This is the single most important component for the project's "architecture-first" claim — without dart-corr, the agent would just believe whatever the first source told it.


What it owns

  • The correlation rule pack (dart_corr/correlation-rules.yaml)
  • DuckDB-backed in-process joins for time-proximity correlation
  • The contradiction state machine: OPEN → RESOLVED | UNRESOLVED
  • Two MCP-surface functions: correlate_events and correlate_timeline
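The contradiction state machine can be sketched as a transition table. This is a minimal illustration, not dart-corr's code: `ContradictionState`, `ALLOWED` and `transition` are hypothetical names. This page does not say whether an UNRESOLVED contradiction can later move to RESOLVED after the agent revises its hypothesis, so the sketch keeps both end states terminal and only allows moves out of OPEN.

```python
from enum import Enum


class ContradictionState(Enum):
    """Hypothetical states mirroring OPEN -> RESOLVED | UNRESOLVED."""
    OPEN = "OPEN"
    RESOLVED = "RESOLVED"
    UNRESOLVED = "UNRESOLVED"


# Allowed moves: OPEN may become RESOLVED or UNRESOLVED; the end states never move
# (an assumption of this sketch, see the lead-in).
ALLOWED = {
    ContradictionState.OPEN: {ContradictionState.RESOLVED, ContradictionState.UNRESOLVED},
    ContradictionState.RESOLVED: set(),
    ContradictionState.UNRESOLVED: set(),
}


def transition(current: ContradictionState, target: ContradictionState) -> ContradictionState:
    """Return target if the move is allowed, otherwise raise ValueError."""
    if target not in ALLOWED[current]:
        raise ValueError(f"illegal transition {current.value} -> {target.value}")
    return target
```

Encoding the transitions as data keeps the rule in one place, so an illegal move fails loudly instead of silently overwriting a status.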

What it does not own

  • Hypothesis revision — that's dart-agent's job
  • Storing audit entries — that's dart-audit's job
  • The artifacts themselves — those come from dart-mcp functions

The mechanical guarantee

When two artifacts disagree on a fact, dart-corr flags the disagreement as UNRESOLVED rather than choosing between them.

Example from Case-PtH-Timestomp:

Source                 Claim
Auth events (4624)     Pass-the-Hash at 14:23:09 UTC
MFT $SI vs $FN         Timestomp at 14:21:55 UTC (74 sec earlier)

A naïve LLM agent might pick whichever claim supports its current hypothesis. dart-corr instead raises UNRESOLVED and forces the agent to revise: either a third explanation reconciles both claims, or the hypothesis is wrong.

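# dart_corr/__init__.py (simplified)
# Each event exposes .ts (a datetime), .fact and .source.
def correlate(events_a, events_b, time_window_sec=15):
    """Return every pair of events that lie within time_window_sec of each
    other but assert different facts, each marked UNRESOLVED."""
    contradictions = []
    for a in events_a:
        for b in events_b:
            # Only compare events close enough in time to describe the same moment.
            if abs((a.ts - b.ts).total_seconds()) <= time_window_sec:
                if a.fact != b.fact:  # the two sources disagree
                    contradictions.append({
                        "claim_a": a.fact, "source_a": a.source, "ts_a": a.ts,
                        "claim_b": b.fact, "source_b": b.source, "ts_b": b.ts,
                        "status": "UNRESOLVED",
                    })
    return contradictions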
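To see how the time window interacts with the Case-PtH-Timestomp claims, the sketch below feeds the two table rows through a copy of the simplified `correlate` above. The `Event` dataclass, the calendar date, and the fact strings are hypothetical stand-ins; only the clock times come from the table. The two timestamps are more than 15 seconds apart, so the default window does not pair them, while a wider window does.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass
class Event:
    """Hypothetical event shape: just the fields the simplified correlate() reads."""
    ts: datetime
    fact: str
    source: str


def correlate(events_a, events_b, time_window_sec=15):
    """Copy of the simplified correlate() above."""
    contradictions = []
    for a in events_a:
        for b in events_b:
            if abs((a.ts - b.ts).total_seconds()) <= time_window_sec:
                if a.fact != b.fact:  # disagreement
                    contradictions.append({
                        "claim_a": a.fact, "source_a": a.source, "ts_a": a.ts,
                        "claim_b": b.fact, "source_b": b.source, "ts_b": b.ts,
                        "status": "UNRESOLVED",
                    })
    return contradictions


# Clock times from the table; the date is an arbitrary placeholder.
auth = [Event(datetime(2026, 1, 1, 14, 23, 9, tzinfo=timezone.utc),
              "Pass-the-Hash", "Auth events (4624)")]
mft = [Event(datetime(2026, 1, 1, 14, 21, 55, tzinfo=timezone.utc),
             "Timestomp", "MFT $SI vs $FN")]

gap = abs((auth[0].ts - mft[0].ts).total_seconds())    # 74.0 seconds
default_hits = correlate(auth, mft)                    # empty: gap exceeds 15 s
wide_hits = correlate(auth, mft, time_window_sec=120)  # one UNRESOLVED pair
```

This simplified version treats any two different fact strings inside the window as a disagreement, so the window size alone decides whether the pair is raised.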

The agent's playbook requires it to handle every UNRESOLVED contradiction before emitting findings. Skipping is not an option: the serializer (dart_agent/serializer.py) refuses to emit findings while any remain unhandled.
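A gate of this kind can be sketched as a check that runs before anything is serialized. This is not the contents of dart_agent/serializer.py: `UnresolvedContradictionError` and `serialize_findings` are hypothetical names, and the contradiction dicts are assumed to follow the shape returned by the simplified `correlate` above.

```python
import json


class UnresolvedContradictionError(RuntimeError):
    """Raised when findings are serialized while contradictions remain UNRESOLVED."""


def serialize_findings(findings, contradictions):
    """Serialize findings to JSON only if no contradiction is still UNRESOLVED."""
    pending = [c for c in contradictions if c.get("status") == "UNRESOLVED"]
    if pending:
        # Refuse outright rather than emitting findings that ignore a disagreement.
        raise UnresolvedContradictionError(
            f"{len(pending)} UNRESOLVED contradiction(s) must be handled first"
        )
    return json.dumps({"findings": findings}, sort_keys=True)
```

Raising, rather than logging a warning, is what makes skipping impossible: the only way to get output is to clear every UNRESOLVED entry first.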


Why DuckDB

dart-corr runs in-process (no server, no port). On multi-million-row MFT timelines, naïve Python joins run out of memory. DuckDB handles joins over 5M+ rows in seconds, using window functions for time proximity, without leaving the process.

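import duckdb

# In-memory database: no server, no port.
con = duckdb.connect(":memory:")
# read_csv sniffs column names and types (including timestamps) by default.
con.execute("CREATE TABLE auth AS SELECT * FROM read_csv('auth.csv')")
con.execute("CREATE TABLE mft  AS SELECT * FROM read_csv('mft.csv')")
# Pair each logon with timestomped MFT entries within ±15 seconds.
# BETWEEN needs a regular range join: DuckDB's ASOF JOIN accepts a single
# inequality condition and returns at most one match per row.
rows = con.execute("""
    SELECT a."user", a.ts, m.path, m.timestomp
    FROM auth a
    JOIN mft m
      ON a.ts BETWEEN m.ts - INTERVAL 15 SECOND AND m.ts + INTERVAL 15 SECOND
    WHERE m.timestomp = TRUE
""").fetchall()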

The agent doesn't write SQL. dart-corr exposes correlate_events and correlate_timeline as typed MCP calls: the agent supplies the source files and a hypothesis ID, and the engine returns the contradictions.
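MCP servers describe each tool with a name, a description and a JSON Schema `inputSchema`. A typed surface for correlate_events might look like the sketch below. The parameter names `source_files` and `hypothesis_id` are assumptions drawn from the sentence above, not the project's published schema, and `validate_args` is a hypothetical hand-rolled check standing in for a real JSON Schema validator.

```python
# Hypothetical MCP tool descriptor for correlate_events (parameter names assumed).
CORRELATE_EVENTS_TOOL = {
    "name": "correlate_events",
    "description": "Correlate artifacts and return contradictions for a hypothesis.",
    "inputSchema": {
        "type": "object",
        "properties": {
            "source_files": {"type": "array", "items": {"type": "string"}, "minItems": 1},
            "hypothesis_id": {"type": "string"},
        },
        "required": ["source_files", "hypothesis_id"],
    },
}


def validate_args(tool, args):
    """Check the schema's required keys, then the two assumed types by hand.

    Not a full JSON Schema validator; it only covers this one descriptor.
    """
    missing = [k for k in tool["inputSchema"]["required"] if k not in args]
    if missing:
        raise ValueError(f"missing arguments: {missing}")
    files = args["source_files"]
    if not isinstance(files, list) or not files or not all(isinstance(f, str) for f in files):
        raise ValueError("source_files must be a non-empty list of strings")
    if not isinstance(args["hypothesis_id"], str):
        raise ValueError("hypothesis_id must be a string")
    return args
```

Keeping the SQL behind a fixed schema like this is what lets the agent call the engine without composing queries itself.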


Files

dart_corr/
├── README.md                  # implementation status
├── correlation-rules.yaml     # rule pack (operator-tunable)
└── (joins live in dart_mcp/_v04_expansion.py + dart_mcp/__init__.py)

Implementation note: dart_corr is currently a thin scaffold. The actual correlation logic lives inside dart_mcp/__init__.py (functions correlate_events, correlate_timeline, correlate_download_to_execution). Phase 2 will move it into a proper package; the API surface is already stable.

