Parse multi-format agent session traces into Polars DataFrames and Parquet.
Reads JSONL files from Pi, Claude Code, Codex, and ATIF formats → normalized three-table layout (sessions, events, content) → Parquet. Designed for analytical workflows: behavioral analysis, cost tracking, error patterns, training data curation.
## Installation

```bash
pip install "agent-traces @ git+https://github.com/davanstrien/agent-traces.git"
```

Or with uv:

```bash
uv pip install "agent-traces @ git+https://github.com/davanstrien/agent-traces.git"
```

## Quick start

```python
from agent_traces import TraceDataset

# Load from the Hugging Face Hub
ds = TraceDataset.from_hub("badlogicgames/pi-mono")

# Three normalized tables
ds.sessions  # 1 row/session: model, counts, tokens, cost
ds.events    # 1 row/entry: type, role, tool_name, is_error
ds.content   # 1 row/entry that has text

# Convenience views for common analyses
ds.user_messages       # turn, nTurns, model, msg + session metadata
ds.assistant_messages  # content_text, thinking, tool_calls
ds.tool_calls          # tool_name + session metadata

# Aggregates
ds.tool_counts(group_by="model")
ds.token_stats(group_by="model")
ds.error_rate(group_by="model")
ds.summary()

# Export
ds.to_parquet("output/")            # sessions.parquet + events.parquet + content.parquet
ds.to_flat_parquet("flat.parquet")  # single 44-column table (backward compat)
```
```python
# Load multiple datasets at once
ds = TraceDataset.from_hub_batch([
    "badlogicgames/pi-mono",
    "0xSero/pi-sessions",
    "moikapy/0xKobolds",
])

# Search the Hub by tag
for repo_id, ds in TraceDataset.from_hub_search(limit=10):
    print(ds.summary())

# Merge datasets
combined = ds1 + ds2
```

Load from a local directory:

```python
ds = TraceDataset.from_dir("path/to/sessions/")
```

Or use the parser directly for the raw 44-column flat table:
```python
from agent_traces import parse_sessions

df = parse_sessions("path/to/*.jsonl")
```
## CLI

```bash
# Convert a Hub dataset to Parquet
agent-traces convert badlogicgames/pi-mono -o output.parquet

# Convert local files
agent-traces convert ./sessions/*.jsonl -o output.parquet

# Batch convert from a file of repo IDs
agent-traces convert --from-file repos.txt --output-dir parsed/

# Inspect a dataset
agent-traces inspect badlogicgames/pi-mono
```

## Supported formats

| Format | Example Dataset | Detection |
|---|---|---|
| Pi | badlogicgames/pi-mono | `type` field in JSON |
| Claude Code | ultralazr/claude-code-traces | `type` + `uuid`/`sessionId` |
| Codex | cfahlgren1/agent-sessions-list | `type: "item"` + payload wrapper |
| ATIF | vinhnx90/vtcode-sessions | First-byte ` |
Format is auto-detected per file. The `agent` column in the events table identifies the source format.
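The per-file detection can be pictured as peeking at the first JSONL record and checking its shape against the table above. This is an illustrative sketch, not the library's actual detector — the key checks mirror the Detection column, the real logic may differ, and the ATIF first-byte check is omitted:

```python
import json

def detect_format(first_line: str) -> str:
    """Guess the trace format from the first JSONL record (illustrative only)."""
    record = json.loads(first_line)
    if record.get("type") == "item" and "payload" in record:
        return "codex"  # Codex wraps entries in a payload
    if "type" in record and ("uuid" in record or "sessionId" in record):
        return "claude-code"  # Claude Code carries per-entry ids
    if "type" in record:
        return "pi"  # Pi entries have a bare type field
    return "unknown"

print(detect_format('{"type": "item", "payload": {}}'))  # codex
```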
## Architecture

```
JSONL lines → msgspec decode → Entry (dataclass)
                   │
       ┌───────────┼───────────┐
       ▼           ▼           ▼
   sessions      events      content
   (1 row/      (1 row/     (1 row per
   session)      entry)      entry with
                             text only)
```
The three-table layout avoids the 80%-null problem of a single wide table:
| Table | Rows | Columns | Size (pi-mono) |
|---|---|---|---|
| sessions | 626 | 15 | 64 KB |
| events | 36,791 | 20 | 1.4 MB |
| content | 16,841 | 9 | 13 MB |
| flat (legacy) | 36,791 | 44 | 16 MB |
For example, comparing high-error and error-free sessions on cost:

```python
import polars as pl

from agent_traces import TraceDataset

ds = TraceDataset.from_hub("badlogicgames/pi-mono")
ds.error_rate(group_by="model")
ds.tool_counts(group_by="model")

s = ds.sessions
struggling = s.filter(pl.col("n_errors") >= 5)
healthy = s.filter(pl.col("n_errors") == 0)
print(f"Struggling: ${struggling['cost_total'].mean():.2f}/session")
print(f"Healthy: ${healthy['cost_total'].mean():.2f}/session")
```

The behavioral features (tool-choice ratios, verbosity, error rates) predict session outcomes with AUC 0.84 after just two turns, even without seeing error counts. See the blog post for details.
## Development

```bash
# Install with dev dependencies
uv sync

# Run tests
uv run pytest

# Lint
uv run ruff check .

# Type check
uv run ty check agent_traces/

# Format
uv run black agent_traces/ tests/
```

## License

MIT