dart corr

dart-corr · Cross-artifact correlation engine

Surfaces contradictions between artifacts as UNRESOLVED rather than smoothing them over. This is the single most important component for the project's "architecture-first" claim — without dart-corr, the agent would just believe whatever the first source told it.

What it owns

The correlation rule pack (dart_corr/correlation-rules.yaml)
DuckDB-backed in-process joins for time-proximity correlation
The contradiction state machine: OPEN → RESOLVED | UNRESOLVED
Two MCP-surface functions: correlate_events and correlate_timeline
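
The contradiction state machine listed above can be pictured with a minimal sketch. The names `ContradictionStatus` and `transition` are hypothetical and do not come from the dart-corr source. The sketch also assumes that RESOLVED and UNRESOLVED are both terminal, because the notation above shows transitions out of OPEN only; the real engine may allow UNRESOLVED to become RESOLVED after the agent revises its hypothesis.

```python
from enum import Enum


class ContradictionStatus(Enum):
    """Hypothetical status values mirroring OPEN -> RESOLVED | UNRESOLVED."""
    OPEN = "OPEN"
    RESOLVED = "RESOLVED"
    UNRESOLVED = "UNRESOLVED"


# Assumption: only OPEN has outgoing transitions in this sketch.
_ALLOWED = {
    ContradictionStatus.OPEN: {
        ContradictionStatus.RESOLVED,
        ContradictionStatus.UNRESOLVED,
    },
    ContradictionStatus.RESOLVED: set(),
    ContradictionStatus.UNRESOLVED: set(),
}


def transition(current, target):
    """Return the new status, or raise ValueError if the move is not allowed."""
    if target not in _ALLOWED[current]:
        raise ValueError(f"illegal transition: {current.value} -> {target.value}")
    return target
```

Keeping the allowed moves in a single table makes it hard for a caller to quietly flip a contradiction back to OPEN or skip the decision entirely.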

What it does not own

Hypothesis revision — that's dart-agent's job
Storing audit entries — that's dart-audit's job
The artifacts themselves — those come from dart-mcp functions

The mechanical guarantee

When two artifacts disagree on a fact, dart-corr flags it.

Example from Case-PtH-Timestomp:

Source	Claim
Auth events (4624)	Pass-the-Hash at `14:23:09 UTC`
MFT `$SI` vs `$FN`	Timestomp at `14:21:55 UTC` (74 sec earlier)

A naïve LLM agent might pick whichever claim supports its current hypothesis. dart-corr raises UNRESOLVED and forces the agent to revise — there must be a third explanation that reconciles both, or the hypothesis is wrong.

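# dart_corr/__init__.py (simplified)
def correlate(events_a, events_b, time_window_sec=15):
    """Return every pair of events that are close in time but disagree.

    Each event needs three attributes: ts (a datetime), fact, and source.
    The nested loop is O(len(events_a) * len(events_b)), which is fine for
    illustration but not for multi-million-row timelines.
    """
    contradictions = []
    for a in events_a:
        for b in events_b:
            # Only compare events that fall inside the proximity window.
            if abs((a.ts - b.ts).total_seconds()) <= time_window_sec:
                if a.fact != b.fact:  # disagreement
                    contradictions.append({
                        "claim_a": a.fact, "source_a": a.source, "ts_a": a.ts,
                        "claim_b": b.fact, "source_b": b.source, "ts_b": b.ts,
                        "status": "UNRESOLVED",
                    })
    return contradictions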

The agent's playbook requires it to handle every UNRESOLVED contradiction before emitting findings. Skipping is not an option: the serializer (dart_agent/serializer.py) refuses to emit findings while any contradiction is still UNRESOLVED.
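
To illustrate the kind of guard described here, the following is a minimal sketch and not the actual contents of dart_agent/serializer.py. The function `serialize_findings` and the exception `UnresolvedContradictionError` are hypothetical names. The contradiction dictionaries are assumed to carry the same `"status"` key that `correlate` produces above.

```python
import json


class UnresolvedContradictionError(Exception):
    """Raised when findings are serialized while contradictions remain."""


def serialize_findings(findings, contradictions):
    """Serialize findings to JSON, refusing if anything is still UNRESOLVED."""
    pending = [c for c in contradictions if c["status"] == "UNRESOLVED"]
    if pending:
        raise UnresolvedContradictionError(
            f"{len(pending)} contradiction(s) still UNRESOLVED"
        )
    return json.dumps({"findings": findings}, sort_keys=True)
```

Putting the check at the serialization boundary means the refusal does not depend on the agent following its playbook: output cannot be produced at all until the list is clean.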

Why DuckDB

dart-corr runs in-process (no server, no port). For multi-million-row MFT timelines, naïve Python joins OOM. DuckDB handles 5M+ row joins in seconds with window functions for time-proximity, all without leaving the process.

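import duckdb

# In-process, in-memory database: no server, no port.
con = duckdb.connect(":memory:")
con.execute("CREATE TABLE auth AS SELECT * FROM read_csv('auth.csv')")
con.execute("CREATE TABLE mft  AS SELECT * FROM read_csv('mft.csv')")

# Range join: pair each auth event with every timestomped MFT entry that
# falls within 15 seconds on either side. A two-sided window needs a regular
# JOIN; ASOF JOIN accepts a single inequality and matches one nearest row.
rows = con.execute("""
    SELECT a.user, a.ts, m.path, m.timestomp
    FROM auth a
    JOIN mft m ON a.ts BETWEEN m.ts - INTERVAL 15 SECOND AND m.ts + INTERVAL 15 SECOND
    WHERE m.timestomp = TRUE
""").fetchall()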

The agent doesn't write SQL. dart-corr exposes correlate_events and correlate_timeline as typed MCP calls — the agent supplies the source files and a hypothesis ID, the engine returns the contradictions.
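
As a rough picture of what a typed call could look like, here is a minimal sketch. The class names `CorrelateEventsRequest` and `CorrelateEventsResponse` and their fields are assumptions made for illustration; the real MCP schema is defined in dart_mcp and may differ. The only details taken from the text above are that the agent supplies source files and a hypothesis ID and gets contradictions back.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CorrelateEventsRequest:
    """Hypothetical request shape: what the agent sends."""
    source_files: tuple
    hypothesis_id: str

    def __post_init__(self):
        # Correlation needs at least two sources to compare.
        if len(self.source_files) < 2:
            raise ValueError("at least two source files are required")
        if not self.hypothesis_id:
            raise ValueError("hypothesis_id must not be empty")


@dataclass(frozen=True)
class CorrelateEventsResponse:
    """Hypothetical response shape: what the engine returns."""
    hypothesis_id: str
    contradictions: tuple = ()

    @property
    def has_unresolved(self):
        return any(c["status"] == "UNRESOLVED" for c in self.contradictions)
```

A request such as `CorrelateEventsRequest(("auth.csv", "mft.csv"), "H1")` validates on construction, so malformed calls fail before any join runs.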

Files

dart_corr/
├── README.md                  # implementation status
├── correlation-rules.yaml     # rule pack (operator-tunable)
└── (joins live in dart_mcp/_v04_expansion.py + dart_mcp/__init__.py)

Implementation note: dart_corr is currently a thin scaffold. The actual correlation logic lives inside dart_mcp/__init__.py (functions correlate_events, correlate_timeline, correlate_download_to_execution). Phase 2 will move it into a proper package; the API surface is already stable.
