Turn email threads into a searchable knowledge base. Parse EML files, index with embeddings, and use RAG to learn how your best engineers analyze issues.
MailWise reads .eml files (exported from Outlook, Thunderbird, etc.), splits email threads into individual replies, and builds a semantic search index. You can then:
- Search for similar past issues using natural language
- Analyze new issues with RAG — Claude reads how your experts solved similar problems and synthesizes advice
- Tag expert engineers whose replies get boosted in search results and highlighted in output
If your team handles bugs and incidents via email, years of tribal knowledge are buried in threads. MailWise makes that knowledge searchable and actionable.
- Python 3.10+
- Claude Code (for the `analyze` command; uses your existing auth, no API key needed)
git clone https://github.com/PetrGuan/MailWise.git
cd MailWise
pip install -e .
cp config.example.yaml config.yaml

Edit `config.yaml` with your settings:

eml_directory: /path/to/your/eml/files
database: data/index.db
markdown_directory: markdown
embedding_model: all-MiniLM-L6-v2
expert_boost: 1.5
experts:
  - email: senior.dev@company.com
    name: Jane Doe

# Index your emails (incremental; only processes new/changed files)
mailwise index
# Search for similar past issues
mailwise search "sync failure after folder migration"
# Search with previews
mailwise search "calendar not updating" --show-body
# Only show expert replies
mailwise search "deleted emails reappear" --expert-only
# Deep analysis — Claude reasons over similar expert threads
mailwise analyze "User reports emails moved to local folder keep reappearing in Inbox"
# View full markdown of a specific email thread
mailwise show 42
# Check index stats
mailwise stats

# Add an expert
mailwise experts add engineer@company.com --name "Jane Doe"
# List all experts
mailwise experts list
# Remove an expert
mailwise experts remove engineer@company.com

EML files → Parser → Markdown + Embeddings → SQLite index
                                                  ↓
                                   Query → Semantic search → Top matches
                                                                  ↓
                                                          Claude (via RAG) → Expert-informed analysis

- Parse: EML files are parsed in parallel and threads are split into individual replies using Outlook-style `From:`/`Sent:` delimiters
- Clean: Microsoft SafeLinks are unwrapped, mailto artifacts are removed
- Markdown: Each thread becomes a structured markdown file with `[Expert]` tags on replies from your designated engineers
- Embed: Each reply is embedded using `all-MiniLM-L6-v2` (runs locally, no API calls)
- Index: Embeddings and metadata are stored in SQLite for fast retrieval
- Search: Cosine similarity with expert score boosting finds relevant past issues
- Analyze: Top matches are fed to Claude (via Claude Code CLI) with a system prompt that focuses on expert reasoning patterns
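
As an illustration of the Parse and Clean steps, here is a minimal sketch using only the standard library. It assumes each quoted reply starts with a `From:` line immediately followed by a `Sent:` line, and that a SafeLink carries the original address in its `url` query parameter. The function names and regexes are hypothetical and are not taken from `parser.py` or `safelinks.py`.

```python
import re
from urllib.parse import parse_qs, urlparse

# Assumption: a reply boundary is a "From:" line directly followed by a "Sent:" line.
REPLY_START = re.compile(r"^(?=From: .*\nSent: )", re.MULTILINE)

# Assumption: SafeLinks look like https://<region>.safelinks.protection.outlook.com/?url=...
SAFELINK = re.compile(r"https?://[\w.-]*safelinks\.protection\.outlook\.com/\S*")


def split_replies(body: str) -> list[str]:
    """Split a thread body into individual replies, newest first."""
    # The pattern is zero-width, so each delimiter stays attached to its reply.
    return [part.strip() for part in REPLY_START.split(body) if part.strip()]


def unwrap_safelinks(text: str) -> str:
    """Replace each SafeLinks wrapper with the URL it points to."""
    def _unwrap(match: re.Match) -> str:
        query = parse_qs(urlparse(match.group(0)).query)
        # parse_qs already percent-decodes the value; keep the wrapper if "url" is missing.
        return query["url"][0] if "url" in query else match.group(0)

    return SAFELINK.sub(_unwrap, text)
```

Real mail clients vary in header order and localization, so a production parser would likely need more patterns than this one.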
Designed for large mailboxes (25K+ emails, 16GB+):
| Operation | Performance |
|---|---|
| Incremental check (no changes) | ~2-3s for 25K files (stat-based, no file reads) |
| Full index | ~5-10 min (parallel parsing + batch embedding) |
| Search query | <100ms (single matrix multiply over 100K+ vectors) |
| RAG analysis | ~10-20s (retrieval + Claude response) |
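
The search figure above depends on scoring every stored vector with one matrix multiply. A minimal sketch of that idea follows, assuming embeddings are held in a single NumPy array and that the expert boost is a simple multiplier on the cosine score (the README gives `expert_boost: 1.5` but does not say how it is applied). The function name and signature are hypothetical.

```python
import numpy as np


def top_k_boosted(query_vec, embeddings, is_expert, k=5, expert_boost=1.5):
    """Return [(row_index, score), ...] for the k best matches, best first."""
    embeddings = np.asarray(embeddings, dtype=np.float32)
    query_vec = np.asarray(query_vec, dtype=np.float32)
    k = min(k, len(embeddings))
    if k == 0:
        return []

    # Normalize so that a dot product equals cosine similarity.
    matrix = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    query = query_vec / np.linalg.norm(query_vec)

    # One matrix-vector multiply scores every stored reply.
    scores = matrix @ query

    # Assumption: expert replies get their score multiplied by the boost.
    scores = np.where(np.asarray(is_expert, dtype=bool), scores * expert_boost, scores)

    # argpartition finds the top k without sorting the whole array.
    top = np.argpartition(-scores, k - 1)[:k]
    top = top[np.argsort(-scores[top])]
    return [(int(i), float(scores[i])) for i in top]
```

Only the returned row indices would then need a metadata lookup, which keeps the per-query database work proportional to `k` rather than to the index size.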
Key optimizations:
- Two-phase change detection: mtime+size stat check before SHA256 hashing
- Parallel EML parsing: multiprocessing with configurable workers
- Batch embedding: pre-computed offset arrays, no O(n²) lookups
- Optimized search: loads only embedding BLOBs into contiguous numpy array; fetches metadata only for top-k results
- SQLite tuning: WAL journal, 64MB cache, 256MB mmap, batch inserts via `executemany`
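
The two-phase change detection can be sketched as follows. This is an illustration under stated assumptions, not the code in `indexer.py`: the `known` dictionary stands in for whatever the SQLite store records per file, and both function names are hypothetical.

```python
import hashlib
from pathlib import Path


def file_sha256(path, chunk_size=1 << 20):
    """Hash a file in chunks so large EML files are never fully loaded."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def needs_reindex(path, known):
    """Return True if the file is new or its content changed.

    `known` maps path -> (mtime_ns, size, sha256) from the previous run
    (a hypothetical stand-in for the index database).
    """
    stat = Path(path).stat()
    previous = known.get(str(path))

    # Phase 1: cheap stat comparison, no file read.
    if previous is not None and previous[:2] == (stat.st_mtime_ns, stat.st_size):
        return False

    # Phase 2: stat differs (or file is new), so confirm with a content hash.
    digest = file_sha256(path)
    changed = previous is None or previous[2] != digest
    known[str(path)] = (stat.st_mtime_ns, stat.st_size, digest)
    return changed
```

A file that was merely touched or re-exported with identical content fails phase 1 but passes phase 2, so it is not re-embedded.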
src/email_issue_indexer/
├── cli.py # Click-based CLI
├── parser.py # EML parsing + thread splitting (parallel-safe)
├── markdown.py # Markdown conversion with expert tags
├── safelinks.py # Microsoft SafeLinks URL cleaning
├── embeddings.py # sentence-transformers embeddings + vector search
├── store.py # SQLite storage layer (performance-tuned)
├── indexer.py # Parallel batch orchestrator with progress tracking
├── search.py # Optimized similarity search with expert boosting
└── rag.py # RAG layer using Claude Code CLI
Indexing and search run entirely on your machine; only `analyze` sends data out:
- Embeddings run on your machine (no data sent to any API for indexing)
- Email content stays in your local SQLite database and markdown files
- The `analyze` command sends relevant thread excerpts to Claude, the same as chatting in Claude Code
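
To make concrete what leaves your machine during `analyze`, here is a rough sketch of assembling retrieved excerpts into one prompt and passing it to the Claude Code CLI. It assumes the CLI is on `PATH` as `claude` and uses its non-interactive `-p` (print) mode; the match fields (`sender`, `body`, `is_expert`), the prompt layout, and both function names are hypothetical, since this README does not show how `rag.py` builds its request.

```python
import subprocess


def build_prompt(issue, matches, max_chars=2000):
    """Combine the new issue with truncated excerpts of similar past replies."""
    parts = [f"New issue:\n{issue}", "Similar past replies:"]
    for number, match in enumerate(matches, start=1):
        tag = "[Expert] " if match.get("is_expert") else ""
        # Truncate each excerpt so only a bounded amount of text is sent.
        excerpt = match["body"][:max_chars]
        parts.append(f"--- Match {number} {tag}({match['sender']}) ---\n{excerpt}")
    return "\n\n".join(parts)


def analyze(issue, matches):
    """Send the prompt to Claude Code and return its printed response."""
    prompt = build_prompt(issue, matches)
    result = subprocess.run(
        ["claude", "-p", prompt],  # assumption: `claude` is installed and authenticated
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout
```

Everything outside the string returned by `build_prompt` stays local in this sketch, so inspecting that string is a simple way to audit what an analysis would share.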
Your `config.yaml` file and the `emails/`, `data/`, and `markdown/` directories are gitignored by default. Only `config.example.yaml` (with no real data) is committed. A pre-commit hook (`scripts/install-hooks.sh`) scans for accidental PII leaks.
MIT