Chatmap

Python 3.10+ | License: MIT

AI red team conversation mapper


What is Chatmap?

Chatmap is a transparent HTTP proxy and analysis toolkit for capturing, indexing, and searching LLM interactions during security assessments. It intercepts API traffic to OpenAI, Anthropic, Ollama, and MCP endpoints, auto-detects sensitive data (credentials, file paths, errors, network breadcrumbs), and stores everything in a searchable SQLite database with full-text search. When the assessment is done, export markdown evidence blocks for your report.

55 built-in detection patterns across 5 tag categories, extensible via YAML. 517 tests. Zero external database dependencies.

Installation

git clone https://github.com/McKern3l/chatmap.git
cd chatmap
pip install -e .

Requires Python 3.10+ and mitmproxy 10.0+.

Quick Start

Start intercepting LLM traffic in three steps:

# 1. Start the proxy
chatmap proxy --port 8888 --session "pentest-2026-04-05"

# 2. Point your LLM client at the proxy
#    export HTTPS_PROXY=http://127.0.0.1:8888

# 3. Search for leaked credentials
chatmap search "api_key" --tag-type credential

Or import existing conversation logs — including Claude Code sessions:

# Import JSONL, JSON, or ChatGPT exports
chatmap import ~/Downloads/conversations.jsonl --session "external"

# Import a Claude Code conversation directly
chatmap import ~/.claude/projects/-Users-me/abc123.jsonl --session "cc-session"

# See what got tagged
chatmap tags --type credential

That's it. No config files, no external databases. SQLite + FTS5 handles everything.

Commands

| Command | Description |
| --- | --- |
| proxy | Start mitmproxy intercepting LLM API traffic (--max-db-size for capture limits) |
| import | Load conversation logs from JSONL, JSON, ChatGPT exports, or Claude Code sessions |
| search | Full-text search across all captured messages (with tag/provider filters) |
| list | Show all captured conversations with metadata |
| show | Display a single conversation with messages and detected tags |
| tags | List all auto-detected sensitive patterns (filterable by type) |
| export | Export evidence as markdown, JSON, or CSV (-f with json, csv, or markdown; --session for bulk; --no-content for lightweight CSV) |
| diff | Compare tag profiles across two capture sessions |
| stats | Summary dashboard with database size, tag breakdown, and provider token usage (--session for per-session) |
| sessions | Manage capture sessions: list, archive, delete, rename |
| correlate | Cross-session tag correlation: find values appearing in multiple sessions |
| timeline | Chronological view of tag detections (per-session or global) |
| threads | Auto-group conversations into threads by endpoint and time window |
| suppress | Manage tag suppression rules: list, add, remove |
| mcp | Start MCP server (JSON-RPC over stdio) for Claude Code integration |
| setup-ca | Configure mitmproxy CA certificate for HTTPS interception |

Supported Providers

Chatmap auto-detects and parses structured content from:

| Provider | Detection | Parsed Fields |
| --- | --- | --- |
| OpenAI | /v1/chat/completions | Messages, model, usage tokens, finish reason |
| Anthropic | /v1/messages | Messages, model, usage tokens, stop reason |
| Ollama | /api/generate, /api/chat | Messages, model, context length |
| MCP | JSON-RPC method calls | Method, params, results |

Auto-Tagging

The tagger scans every intercepted message for security-relevant patterns:

| Category | Patterns | Detects |
| --- | --- | --- |
| Credential | 20 | API keys (OpenAI, Anthropic, AWS, GitHub, GitLab, GCP, Slack), JWTs, bearer tokens, NTLM/NetNTLMv2 hashes, private keys, connection strings, password assignments |
| File Path | 7 | /etc/* configs, home directories, dotfiles, Windows paths, path traversal, temp files |
| Error | 14 | Python tracebacks, SQL errors, HTTP 401/403, stack traces (Java, .NET, Go, Node.js) |
| Breadcrumb | 10 | URLs, IPv4/IPv6, CIDR ranges, MAC addresses, email addresses, domain names |
| Sensitive | 4 | PII patterns, internal hostnames, service discovery indicators |

All patterns are tuned for precision: case-sensitive where it matters (API key prefixes), case-insensitive where it helps (error messages). Ordering prevents greedy shorter patterns from shadowing specific ones (e.g., Anthropic sk-ant- before generic sk-).
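
The ordering rule can be sketched as follows. This is a minimal example under stated assumptions: the three regexes, their labels, and the `tag_text` helper are invented for illustration and are not the built-in patterns from `tagger.py`. It shows a specific prefix claiming a match before a generic one can, and a case-insensitive error pattern next to case-sensitive key patterns.

```python
import re

# Ordered list: specific prefixes first, generic fallbacks last.
# Regexes and labels are hypothetical, not Chatmap's built-in patterns.
PATTERNS = [
    ("anthropic_api_key", re.compile(r"sk-ant-[A-Za-z0-9_-]{8,}")),
    ("generic_sk_key", re.compile(r"sk-[A-Za-z0-9_-]{8,}")),
    ("python_traceback", re.compile(r"traceback \(most recent call last\)", re.IGNORECASE)),
]


def tag_text(text: str) -> list[tuple[str, str]]:
    """Return (label, value) pairs; a character span belongs to the first pattern that matches it."""
    claimed: list[tuple[int, int]] = []
    tags: list[tuple[str, str]] = []
    for label, regex in PATTERNS:
        for match in regex.finditer(text):
            start, end = match.span()
            # Skip matches that overlap a span already claimed by an earlier pattern.
            if any(start < c_end and end > c_start for c_start, c_end in claimed):
                continue
            claimed.append((start, end))
            tags.append((label, match.group(0)))
    return tags
```

Without the overlap check, a key such as `sk-ant-...` would be reported twice, once under each label.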

Architecture

chatmap/
├── cli.py          # Click CLI — 18 commands, lazy imports for fast startup
├── proxy.py        # mitmproxy addon — intercepts LLM traffic by URL pattern
├── parsers.py      # Per-provider request/response extractors (never raise)
├── tagger.py       # 55-pattern regex engine across 5 categories
├── db.py           # SQLite + FTS5 — all DB ops centralized here
├── exporter.py     # Markdown evidence block generator with inline tags
├── importer.py     # JSONL / JSON / ChatGPT format auto-detection
├── models.py       # Pydantic v2 data models
├── config.py       # YAML config loading, suppression rules, defaults
├── mcp_server.py   # MCP server — JSON-RPC 2.0 over stdio
└── ca.py           # Proxy CA certificate setup for HTTPS interception

Storage: SQLite with FTS5 full-text search and WAL journaling. Default location: ~/.chatmap/chatmap.db. Override with --db flag or CHATMAP_DB env var.

Design principles:

  • Parsers never raise on malformed input — return empty dicts and continue
  • Lazy imports in CLI keep chatmap --help fast even with mitmproxy installed
  • All database operations go through db.py — no raw SQL in other modules
  • Pydantic v2 models for all data structures
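
The "never raise" principle might look like the sketch below. The helper name `parse_openai_response` and the returned keys are assumptions for this example; only the response layout (`choices`, `message`, `finish_reason`, `usage`) follows the public OpenAI chat completions format.

```python
import json
from typing import Any


def parse_openai_response(raw_body: bytes) -> dict[str, Any]:
    """Extract a few fields from an OpenAI-style chat completion body.

    Hypothetical helper: returns {} on any malformed input instead of raising.
    """
    try:
        data = json.loads(raw_body)
        choice = data["choices"][0]
        return {
            "model": data.get("model"),
            "content": choice["message"]["content"],
            "finish_reason": choice.get("finish_reason"),
            "usage": data.get("usage", {}),
        }
    except (ValueError, KeyError, IndexError, TypeError, AttributeError):
        # Malformed or unexpected payload: record nothing and keep the proxy running.
        return {}
```

Returning an empty dict lets the caller store the raw body anyway, so a parsing failure does not lose evidence.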

Credential hygiene: Chatmap detects credentials but doesn't hoard them:

| Layer | What's redacted | Where |
| --- | --- | --- |
| Proxy headers | Authorization, x-api-key, api-key, x-auth-token, proxy-authorization | Replaced with [REDACTED] before DB storage |
| Tag values | Passwords in connection_string and url_credentials patterns | redis://user:[REDACTED]@host:6379 (preserves scheme/user/host/port) |
| File permissions | Database directory 0700, database file 0600 | Owner-only access on multi-user systems |
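
The first two layers can be approximated with the sketch below. The function names and the URL regex are assumptions for illustration; the header names and the `[REDACTED]` marker come from the table above. The regex only handles passwords that contain no `/` or `@`, which is enough to show the idea.

```python
import re

# Header names from the table above; compared case-insensitively.
SENSITIVE_HEADERS = {"authorization", "x-api-key", "api-key", "x-auth-token", "proxy-authorization"}

# scheme://user:password@host -> keep everything except the password.
_URL_PASSWORD = re.compile(
    r"(?P<prefix>[a-zA-Z][a-zA-Z0-9+.-]*://[^\s:/@]+:)[^\s@/]+(?P<suffix>@)"
)


def redact_headers(headers: dict[str, str]) -> dict[str, str]:
    """Return a copy of headers with sensitive values replaced by [REDACTED]."""
    return {
        name: "[REDACTED]" if name.lower() in SENSITIVE_HEADERS else value
        for name, value in headers.items()
    }


def redact_url_password(text: str) -> str:
    """Mask the password in any scheme://user:password@host URL found in text."""
    return _URL_PASSWORD.sub(r"\g<prefix>[REDACTED]\g<suffix>", text)
```

Keeping the scheme, user, host, and port means the finding is still useful in a report even though the secret itself is never written to disk.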

Database Schema

sessions ──┐
           ├── conversations ──┐
           │                   ├── messages ──── tags
           │                   │        │
           │                   │        └── messages_fts (FTS5)

| Table | Purpose |
| --- | --- |
| sessions | Proxy capture runs (name, timestamps, metadata) |
| conversations | Grouped API exchanges (endpoint, provider, model) |
| messages | Request/response pairs (headers, body, parsed content, role, tokens) |
| tags | Auto-detected findings (type, value, pattern, confidence) |
| messages_fts | FTS5 virtual table for full-text search on content + raw body |
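
To make the relationships concrete, here is a reduced sketch of such a schema using Python's built-in `sqlite3` module. Column names and types are assumptions based on the "Purpose" descriptions above, many columns are omitted, and the real `db.py` may keep the FTS index in sync differently (for example with triggers). It requires an SQLite build with FTS5 enabled.

```python
import sqlite3

# Reduced, hypothetical schema: only enough columns to show the relationships.
SCHEMA = """
CREATE TABLE sessions (id INTEGER PRIMARY KEY, name TEXT NOT NULL, started_at TEXT);
CREATE TABLE conversations (
    id INTEGER PRIMARY KEY,
    session_id INTEGER NOT NULL REFERENCES sessions(id),
    endpoint TEXT, provider TEXT, model TEXT
);
CREATE TABLE messages (
    id INTEGER PRIMARY KEY,
    conversation_id INTEGER NOT NULL REFERENCES conversations(id),
    role TEXT, content TEXT, raw_body TEXT
);
CREATE TABLE tags (
    id INTEGER PRIMARY KEY,
    message_id INTEGER NOT NULL REFERENCES messages(id),
    type TEXT, value TEXT, pattern TEXT, confidence REAL
);
CREATE VIRTUAL TABLE messages_fts USING fts5(content, raw_body);
"""


def add_message(conn: sqlite3.Connection, conversation_id: int, role: str,
                content: str, raw_body: str = "") -> int:
    """Insert a message and index it in FTS5 under the same rowid."""
    cur = conn.execute(
        "INSERT INTO messages (conversation_id, role, content, raw_body) VALUES (?, ?, ?, ?)",
        (conversation_id, role, content, raw_body),
    )
    conn.execute(
        "INSERT INTO messages_fts (rowid, content, raw_body) VALUES (?, ?, ?)",
        (cur.lastrowid, content, raw_body),
    )
    return cur.lastrowid


def search(conn: sqlite3.Connection, query: str) -> list[tuple[int, str]]:
    """Full-text search; returns (message id, content) ordered by FTS5 relevance."""
    rows = conn.execute(
        "SELECT messages.id, messages.content FROM messages_fts "
        "JOIN messages ON messages.id = messages_fts.rowid "
        "WHERE messages_fts MATCH ? ORDER BY messages_fts.rank",
        (query,),
    )
    return rows.fetchall()
```

A quick way to try it is `conn = sqlite3.connect(":memory:")` followed by `conn.executescript(SCHEMA)`, then a few `add_message` calls and a `search`.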

Testing

python3 -m pytest tests/ -v

517 tests across 13 modules (+ 4 performance benchmarks). Tests use real SQLite databases via tmp_path (no DB mocking). Mitmproxy flows are mocked. Sample API payloads in tests/conftest.py. Performance tests (@pytest.mark.slow) validate 10K-message scale operations.

Examples

# Start proxy on custom port with named session
chatmap proxy --host 0.0.0.0 --port 9090 --session "engagement-alpha"

# Start proxy with DB size limit (warns at 500MB)
chatmap proxy --port 8888 --session "big-target" --max-db-size 500

# Search for leaked AWS keys
chatmap search "AKIA" --tag-type credential

# List all Anthropic conversations
chatmap list --provider anthropic

# Show conversation 42 with all messages and tag annotations
chatmap show 42

# Export evidence as markdown, JSON, or CSV
chatmap export 42 55 67 --tag-type credential -o evidence.md
chatmap export 42 -f json -o findings.json
chatmap export 42 -f csv -o findings.csv

# Compare two sessions
chatmap diff 1 2

# Import Claude Code session logs (auto-detected)
chatmap import ~/.claude/projects/-Users-me/session.jsonl

# Import with custom detection patterns
chatmap import ~/Downloads/chatgpt-export.json --session "review" --patterns my-patterns.yaml

# View assessment stats (includes DB size and token usage)
chatmap stats

# Per-session stats
chatmap stats --session 3

# Bulk export entire session as JSON
chatmap export --session 5 -f json -o engagement-report.json

# Lightweight CSV export (tags only, no message content)
chatmap export --session 5 -f csv --no-content -o tags-only.csv

# Cross-session correlation: find reused credentials
chatmap correlate --type credential --min 2

# Drill into a specific API key across sessions
chatmap correlate --value "AKIA1234EXAMPLE5678"

# Timeline of all credential detections since April
chatmap timeline --type credential --since 2026-04-01

# Chronological tag feed for a specific session
chatmap timeline --session 3

# Auto-thread conversations in a session
chatmap threads 1 --window 10

# Suppress noisy tags from known test infrastructure
chatmap suppress add value "sk-test-not-real-key"
chatmap suppress add pattern internal_host

# Install mitmproxy CA for HTTPS interception
chatmap setup-ca

Custom Patterns

Add your own detection patterns via YAML:

# ~/.chatmap/patterns.yaml (auto-loaded) or pass via --patterns flag
patterns:
  credential:
    - pattern: "MYCOMPANY_KEY_[A-Z0-9]{32}"
      label: "mycompany_api_key"
      case_sensitive: true
  breadcrumb:
    - pattern: "internal-[a-z]+-\\d+\\.mycompany\\.com"
      label: "internal_host"

Custom patterns require PyYAML. Install it through the custom extra (quoted so shells such as zsh do not expand the brackets): pip install "chatmap[custom]"
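
For readers who want to see how such a file could be turned into regexes, the sketch below compiles the same structure. It is an assumption-laden example: the dict mirrors the YAML above so that the code runs without PyYAML, `compile_patterns` is a hypothetical helper, and the default of case-insensitive matching when `case_sensitive` is absent is a guess rather than documented behaviour.

```python
import re

# Same structure as the YAML example, written as a dict so no YAML parser is needed.
CUSTOM = {
    "patterns": {
        "credential": [
            {"pattern": "MYCOMPANY_KEY_[A-Z0-9]{32}", "label": "mycompany_api_key", "case_sensitive": True},
        ],
        "breadcrumb": [
            {"pattern": r"internal-[a-z]+-\d+\.mycompany\.com", "label": "internal_host"},
        ],
    }
}


def compile_patterns(config: dict) -> list[tuple[str, str, re.Pattern]]:
    """Turn the config into (category, label, compiled regex) tuples, skipping invalid entries."""
    compiled = []
    for category, entries in config.get("patterns", {}).items():
        for entry in entries:
            # Assumption: patterns are case-insensitive unless case_sensitive is true.
            flags = 0 if entry.get("case_sensitive", False) else re.IGNORECASE
            try:
                compiled.append((category, entry["label"], re.compile(entry["pattern"], flags)))
            except (re.error, KeyError):
                continue  # ignore a broken entry rather than abort loading
    return compiled
```

Skipping a broken entry instead of failing keeps one typo in a pattern file from disabling every custom pattern.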

Configuration

Persist defaults in ~/.chatmap/config.yaml:

db_path: ~/.chatmap/chatmap.db
proxy:
  host: 127.0.0.1
  port: 8080
  max_db_size_mb: 500
patterns: ~/.chatmap/patterns.yaml
suppress:
  - value: "192.168.1.10"     # my attack box
  - pattern: internal_hostname  # suppress all internal hostnames

Priority: CLI flags > environment variables > config file > built-in defaults. Pass --config to use a non-default config path.

Session Management

Organize and manage capture runs:

# List all sessions
chatmap sessions list

# Include archived sessions
chatmap sessions list --all

# Rename a session
chatmap sessions rename 3 "engagement-final"

# Archive a completed session
chatmap sessions archive 3

# Delete a session and all its data
chatmap sessions delete 5

Tag Correlation

Find the same credentials, hosts, or paths appearing across multiple sessions — a strong signal that infrastructure is shared or reused:

# Show tag values appearing in 2+ sessions
chatmap correlate

# Raise the threshold
chatmap correlate --min 3

# Filter to credentials only
chatmap correlate --type credential

# Drill into a specific value — see every session where it appeared
chatmap correlate --value "AKIA..."
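
Conceptually, correlation is a group-by on tag value followed by a count of distinct sessions. The sketch below works on plain tuples rather than the database; the `correlate` function and its argument names are assumptions that loosely mirror the `--min` and `--type` flags.

```python
from collections import defaultdict


def correlate(tags, min_sessions=2, tag_type=None):
    """Find tag values seen in several sessions.

    tags: iterable of (session_id, tag_type, value) tuples.
    Returns {value: sorted list of session ids} for values in at least min_sessions sessions.
    """
    seen = defaultdict(set)
    for session_id, ttype, value in tags:
        if tag_type is None or ttype == tag_type:
            seen[value].add(session_id)
    return {value: sorted(ids) for value, ids in seen.items() if len(ids) >= min_sessions}
```

Using a set per value means a credential that appears many times within one session still counts as a single session.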

Timeline

Chronological view of tag detections, useful for understanding when sensitive data was exposed:

# Global timeline: first/last seen for every tag value
chatmap timeline

# Filter to a specific tag type
chatmap timeline --type credential

# Only tags detected after a date
chatmap timeline --since 2026-04-01

# Per-session chronological feed
chatmap timeline --session 3
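
The global view boils down to tracking the earliest and latest detection per value. A minimal sketch, assuming detections are available as `(value, timestamp)` pairs; `first_last_seen` and its `since` parameter are hypothetical names that echo the `--since` flag.

```python
from datetime import datetime


def first_last_seen(detections, since=None):
    """Compute first and last detection time per tag value.

    detections: iterable of (value, detected_at) with datetime timestamps.
    since: optional datetime; earlier detections are ignored.
    Returns {value: (first_seen, last_seen)}.
    """
    result = {}
    for value, ts in detections:
        if since is not None and ts < since:
            continue
        first, last = result.get(value, (ts, ts))
        result[value] = (min(first, ts), max(last, ts))
    return result
```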

Conversation Threading

Auto-group conversations into threads by endpoint and time proximity. Useful for tracing multi-turn exchanges:

# Group conversations in session 1 (default: 5-minute window)
chatmap threads 1

# Custom time window (10 minutes)
chatmap threads 1 --window 10
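
One plausible way to implement this grouping is shown below: sort conversations by start time, then attach each one to the most recent thread for the same endpoint if it started within the window. This is a sketch of the general technique, not necessarily the algorithm Chatmap uses; `group_threads` and the tuple layout are assumptions.

```python
from datetime import datetime, timedelta


def group_threads(conversations, window_minutes=5):
    """Group conversations into threads by endpoint and time proximity.

    conversations: iterable of (conv_id, endpoint, started_at) with datetime values.
    Returns a list of threads, each a list of conversation ids in time order.
    """
    window = timedelta(minutes=window_minutes)
    threads = []
    last = {}  # endpoint -> (index into threads, timestamp of latest conversation)
    for conv_id, endpoint, started_at in sorted(conversations, key=lambda c: c[2]):
        prev = last.get(endpoint)
        if prev is not None and started_at - prev[1] <= window:
            # Close enough to the previous exchange on this endpoint: same thread.
            threads[prev[0]].append(conv_id)
            last[endpoint] = (prev[0], started_at)
        else:
            threads.append([conv_id])
            last[endpoint] = (len(threads) - 1, started_at)
    return threads
```

The window is measured from the latest conversation in a thread, so a long multi-turn exchange stays together as long as no single gap exceeds it.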

Tag Suppression

Suppress noisy or expected tags via config-based rules. Suppression rules persist in ~/.chatmap/config.yaml:

# List active suppression rules
chatmap suppress list

# Suppress a specific value (e.g., your own test API key)
chatmap suppress add value "sk-test-1234..."

# Suppress an entire tag type
chatmap suppress add type breadcrumb

# Suppress by pattern name
chatmap suppress add pattern internal_host

# Remove a rule
chatmap suppress remove type breadcrumb

MCP Server

Chatmap exposes its search, tags, and stats as an MCP server for integration with Claude Code and other MCP-aware tools:

# Start the MCP server (JSON-RPC over stdio)
chatmap mcp

Add to your project's .mcp.json for Claude Code integration:

{
  "mcpServers": {
    "chatmap": {
      "command": "chatmap",
      "args": ["mcp"]
    }
  }
}

CA Setup

Configure the mitmproxy CA certificate so Chatmap can intercept HTTPS traffic from LLM clients:

# Guided CA certificate installation (detects OS automatically)
chatmap setup-ca

This installs the mitmproxy-generated CA into your system trust store (macOS Keychain, Linux update-ca-certificates, or Windows certutil). Required once per machine for HTTPS interception to work without certificate errors.

Disclaimer

Chatmap is intended for authorized security testing and research only.

Intercepting network traffic requires appropriate authorization. Use of this tool against systems or networks without explicit permission is illegal and unethical. The author is not responsible for misuse.

License

MIT - See LICENSE for details.

Author

pitl0rd / github.com/McKern3l


"Every LLM conversation is a potential evidence trail. Chatmap makes sure you never lose one."
