Parse multi-format agent session traces into Polars DataFrames and Parquet.
Reads JSONL files from Pi, Claude Code, Codex, and ATIF formats → normalized three-table layout (sessions, events, content) → Parquet. Designed for analytical workflows: behavioral analysis, cost tracking, error patterns, training data curation.
## Installation

```bash
pip install "agent-traces @ git+https://github.com/davanstrien/agent-traces.git"
```

Or with uv:

```bash
uv pip install "agent-traces @ git+https://github.com/davanstrien/agent-traces.git"
```

## Quick start

```python
from agent_traces import TraceDataset

# Load from the Hugging Face Hub
ds = TraceDataset.from_hub("badlogicgames/pi-mono")

# Three normalized tables
ds.sessions  # 1 row/session: model, counts, tokens, cost
ds.events    # 1 row/entry: type, role, tool_name, is_error
ds.content   # 1 row/entry that has text

# Convenience views for common analyses
ds.user_messages       # turn, nTurns, model, msg + session metadata
ds.assistant_messages  # content_text, thinking, tool_calls
ds.tool_calls          # tool_name + session metadata

# Aggregates
ds.tool_counts(group_by="model")
ds.token_stats(group_by="model")
ds.error_rate(group_by="model")
ds.summary()

# Export
ds.to_parquet("output/")            # sessions.parquet + events.parquet + content.parquet
ds.to_flat_parquet("flat.parquet")  # single 44-column table (backward compat)
```
```python
# Load multiple datasets at once
ds = TraceDataset.from_hub_batch([
    "badlogicgames/pi-mono",
    "0xSero/pi-sessions",
    "moikapy/0xKobolds",
])

# Search the Hub by tag
for repo_id, ds in TraceDataset.from_hub_search(limit=10):
    print(ds.summary())

# Merge datasets
combined = ds1 + ds2
```

Load from a local directory:

```python
ds = TraceDataset.from_dir("path/to/sessions/")
```

Or use the parser directly for the raw 44-column flat table:
```python
from agent_traces import parse_sessions

df = parse_sessions("path/to/*.jsonl")
```
## CLI

```bash
# Convert a Hub dataset to Parquet
agent-traces convert badlogicgames/pi-mono -o output.parquet

# Convert local files
agent-traces convert ./sessions/*.jsonl -o output.parquet

# Batch convert from a file of repo IDs
agent-traces convert --from-file repos.txt --output-dir parsed/

# Inspect a dataset
agent-traces inspect badlogicgames/pi-mono
```

## Supported formats

| Format | Example Dataset | Detection |
|---|---|---|
| Pi | badlogicgames/pi-mono | `type` field in JSON |
| Claude Code | ultralazr/claude-code-traces | `type` + `uuid`/`sessionId` |
| Codex | cfahlgren1/agent-sessions-list | `type: "item"` + payload wrapper |
| ATIF | vinhnx90/vtcode-sessions | First-byte ` |
Format is auto-detected per file. The `agent` column in the events table identifies the source format.
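The per-file detection can be pictured as peeking at the first JSONL record and checking its shape against the table above. This is an illustrative sketch, not the library's actual detector — the key checks mirror the Detection column, the real logic may differ, and the ATIF first-byte check is omitted:

```python
import json

def detect_format(first_line: str) -> str:
    """Guess the trace format from the first JSONL record (illustrative only)."""
    record = json.loads(first_line)
    if record.get("type") == "item" and "payload" in record:
        return "codex"  # Codex wraps entries in a payload
    if "type" in record and ("uuid" in record or "sessionId" in record):
        return "claude-code"  # Claude Code carries per-entry ids
    if "type" in record:
        return "pi"  # Pi entries have a bare type field
    return "unknown"

print(detect_format('{"type": "item", "payload": {}}'))  # codex
```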
## Architecture

```
JSONL lines → msgspec decode → Entry (dataclass)
                   │
       ┌───────────┼───────────┐
       ▼           ▼           ▼
   sessions      events      content
   (1 row/      (1 row/     (1 row per
   session)      entry)      entry with
                             text only)
```
The three-table layout avoids the 80%-null problem of a single wide table:
| Table | Rows | Columns | Size (pi-mono) |
|---|---|---|---|
| sessions | 626 | 15 | 64 KB |
| events | 36,791 | 20 | 1.4 MB |
| content | 16,841 | 9 | 13 MB |
| flat (legacy) | 36,791 | 44 | 16 MB |
For example, comparing high-error and error-free sessions on cost:

```python
import polars as pl

from agent_traces import TraceDataset

ds = TraceDataset.from_hub("badlogicgames/pi-mono")
ds.error_rate(group_by="model")
ds.tool_counts(group_by="model")

s = ds.sessions
struggling = s.filter(pl.col("n_errors") >= 5)
healthy = s.filter(pl.col("n_errors") == 0)
print(f"Struggling: ${struggling['cost_total'].mean():.2f}/session")
print(f"Healthy: ${healthy['cost_total'].mean():.2f}/session")
```

The behavioral features (tool-choice ratios, verbosity, error rates) predict session outcomes with AUC 0.84 after just two turns, even without seeing error counts. See the blog post for details.
## Development

```bash
# Install with dev dependencies
uv sync

# Run tests
uv run pytest

# Lint
uv run ruff check .

# Type check
uv run ty check agent_traces/

# Format
uv run black agent_traces/ tests/
```

## License

MIT