MailWise

Turn email threads into a searchable knowledge base. Parse EML files, index with embeddings, and use RAG to learn how your best engineers analyze issues.

What it does

MailWise reads .eml files (exported from Outlook, Thunderbird, etc.), splits email threads into individual replies, and builds a semantic search index. You can then:

Search for similar past issues using natural language
Analyze new issues with RAG — Claude reads how your experts solved similar problems and synthesizes advice
Tag expert engineers whose replies get boosted in search results and highlighted in output

Why

If your team handles bugs/incidents via email, years of tribal knowledge is buried in threads. MailWise makes that knowledge searchable and actionable.

Quick start

Prerequisites

Python 3.10+
Claude Code (for the analyze command — uses your existing auth, no API key needed)

Install

git clone https://github.com/PetrGuan/MailWise.git
cd MailWise
pip install -e .

Configure

cp config.example.yaml config.yaml

Edit config.yaml with your settings:

eml_directory: /path/to/your/eml/files
database: data/index.db
markdown_directory: markdown
embedding_model: all-MiniLM-L6-v2
expert_boost: 1.5

experts:
  - email: senior.dev@company.com
    name: Jane Doe

Usage

# Index your emails (incremental — only processes new/changed files)
mailwise index

# Search for similar past issues
mailwise search "sync failure after folder migration"

# Search with previews
mailwise search "calendar not updating" --show-body

# Only show expert replies
mailwise search "deleted emails reappear" --expert-only

# Deep analysis — Claude reasons over similar expert threads
mailwise analyze "User reports emails moved to local folder keep reappearing in Inbox"

# View full markdown of a specific email thread
mailwise show 42

# Check index stats
mailwise stats

Managing experts

# Add an expert
mailwise experts add engineer@company.com --name "Jane Doe"

# List all experts
mailwise experts list

# Remove an expert
mailwise experts remove engineer@company.com
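
Registered experts drive the [Expert] tags in the generated markdown (the Markdown step under How it works). As an illustration only, a minimal sketch of that tagging could look like this. The Reply class, render_thread function, and heading layout are hypothetical and are not the actual output of markdown.py.

```python
from dataclasses import dataclass


@dataclass
class Reply:
    sender: str  # sender address taken from the reply header
    body: str


def render_thread(subject: str, replies: list[Reply], experts: dict[str, str]) -> str:
    """Render one thread as markdown, tagging replies sent by experts.

    `experts` maps a lowercase email address to a display name, a
    hypothetical in-memory form of the `experts:` list in config.yaml.
    """
    lines = [f"# {subject}", ""]
    for number, reply in enumerate(replies, start=1):
        name = experts.get(reply.sender.lower())
        # Only replies from registered experts get the [Expert] tag.
        tag = f" [Expert] {name}" if name else ""
        lines += [f"## Reply {number}: {reply.sender}{tag}", "", reply.body.strip(), ""]
    return "\n".join(lines)


experts = {"senior.dev@company.com": "Jane Doe"}
thread = [
    Reply("user@company.com", "Deleted emails keep reappearing in Inbox."),
    Reply("Senior.Dev@company.com", "Check the local folder sync rules first."),
]
markdown = render_thread("Deleted emails reappear", thread, experts)
print(markdown)
```

Matching is case-insensitive on the address, so a reply from Senior.Dev@company.com is still tagged.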

How it works

EML files → Parser → Markdown + Embeddings → SQLite index
                                                    ↓
                              Query → Semantic search → Top matches
                                                            ↓
                                          Claude (via RAG) → Expert-informed analysis
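
The Parser stage splits each flattened thread at Outlook-style From:/Sent: header pairs. The sketch below assumes the reply text has already been extracted from the .eml (for example with Python's standard email package) and that a reply header is a From: line immediately followed by a Sent: line. REPLY_HEADER and split_thread are hypothetical names, and parser.py may handle more header variants than this.

```python
import re

# Assumed reply header: a "From:" line immediately followed by a "Sent:" line,
# as Outlook writes when quoting earlier messages. Real headers may vary.
REPLY_HEADER = re.compile(r"^From:.*\n(?=Sent:)", re.MULTILINE)


def split_thread(body: str) -> list[str]:
    """Split a flattened email body into individual replies, newest first."""
    body = body.replace("\r\n", "\n")  # .eml bodies often use CRLF line endings
    starts = [m.start() for m in REPLY_HEADER.finditer(body)]
    bounds = [0] + starts + [len(body)]
    parts = (body[a:b].strip() for a, b in zip(bounds, bounds[1:]))
    return [part for part in parts if part]  # drop the empty slice if body starts with a header


thread = (
    "Looks fixed now, thanks.\n\n"
    "From: Jane Doe <senior.dev@company.com>\n"
    "Sent: Monday 9:12 AM\n"
    "Subject: RE: sync failure after folder migration\n\n"
    "Try re-creating the local folder mapping.\n"
    "From: the logs it looks like a retry loop.\n"
)
replies = split_thread(thread)
for reply in replies:
    print(reply.splitlines()[0])
```

Requiring the Sent: line keeps ordinary sentences that happen to start with "From:" (like the last line above) inside their reply.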

Parse: EML files are parsed in parallel and threads are split into individual replies using Outlook-style From:/Sent: delimiters
Clean: Microsoft SafeLinks are unwrapped and mailto artifacts are removed
Markdown: Each thread becomes a structured markdown file with [Expert] tags on replies from your designated engineers
Embed: Each reply is embedded using all-MiniLM-L6-v2 (runs locally, no API calls)
Index: Embeddings and metadata are stored in SQLite for fast retrieval
Search: Cosine similarity with expert score boosting finds relevant past issues
Analyze: Top matches are fed to Claude (via the Claude Code CLI) with a system prompt that focuses on expert reasoning patterns
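
The Search step scores the query embedding against every stored reply and boosts replies from experts. A minimal pure-Python sketch, assuming vectors are plain lists of floats and that expert_boost is a simple multiplier on cosine similarity; MailWise itself does the scoring as one numpy matrix multiply, and search and normalize are hypothetical names.

```python
import heapq
import math


def normalize(vec: list[float]) -> list[float]:
    """Scale to unit length so that a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else list(vec)


def search(query, vectors, expert_flags, k=5, expert_boost=1.5):
    """Return the top-k (score, row) pairs, boosting rows written by experts."""
    q = normalize(query)
    scored = []
    for row, (vec, is_expert) in enumerate(zip(vectors, expert_flags)):
        score = sum(a * b for a, b in zip(q, normalize(vec)))  # cosine similarity
        if is_expert:
            score *= expert_boost  # assumption: boost is a plain multiplier
        scored.append((score, row))
    return heapq.nlargest(k, scored)


vectors = [[1.0, 0.0], [0.8, 0.6], [0.0, 1.0]]
expert_flags = [False, True, False]
results = search([1.0, 0.0], vectors, expert_flags, k=2)
print(results)
```

The expert reply (row 1) ranks first even though row 0 matches the query exactly, which shows how strongly a 1.5 boost can reorder results. Returning only row indices means metadata needs to be fetched for the top-k rows alone.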

Performance

Designed for large mailboxes (25K+ emails, 16GB+):

Operation	Performance
Incremental check (no changes)	~2-3s for 25K files (stat-based, no file reads)
Full index	~5-10 min (parallel parsing + batch embedding)
Search query	<100ms (single matrix multiply over 100K+ vectors)
RAG analysis	~10-20s (retrieval + Claude response)
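
The full-index time relies on batch embedding: instead of encoding each email's replies separately, all replies go into one flat batch and a prefix-sum offset array maps each email back to its slice of the results. A minimal sketch, assuming this offset layout (the actual one in MailWise may differ); fake_embed is a hypothetical stand-in for a batched model call such as sentence-transformers' SentenceTransformer.encode.

```python
from itertools import accumulate


def flatten_with_offsets(threads: list[list[str]]) -> tuple[list[str], list[int]]:
    """Flatten per-email reply lists into one batch plus prefix-sum offsets.

    Email i owns batch[offsets[i]:offsets[i + 1]], so mapping results back
    is a slice per email instead of a search through the whole batch.
    """
    batch = [reply for replies in threads for reply in replies]
    offsets = [0] + list(accumulate(len(replies) for replies in threads))
    return batch, offsets


def fake_embed(texts: list[str]) -> list[list[float]]:
    """Hypothetical stand-in for one batched model call (one vector per text)."""
    return [[float(len(text))] for text in texts]


threads = [["first reply", "second reply"], [], ["only reply"]]
batch, offsets = flatten_with_offsets(threads)
vectors = fake_embed(batch)  # a single call for every reply in every email
per_email = [vectors[offsets[i]:offsets[i + 1]] for i in range(len(threads))]
print(offsets, per_email)
```

An email with no replies simply gets an empty slice, so no special casing is needed when writing results back.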

Key optimizations:

Two-phase change detection: mtime+size stat check before SHA256 hashing
Parallel EML parsing: multiprocessing with configurable workers
Batch embedding: pre-computed offset arrays, no O(n²) lookups
Optimized search: loads only embedding BLOBs into contiguous numpy array; fetches metadata only for top-k results
SQLite tuning: WAL journal, 64MB cache, 256MB mmap, batch inserts via executemany
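
Two-phase change detection can be sketched with the standard library alone. This assumes the previous run's (mtime_ns, size, sha256) tuple is available per path; sha256_of, needs_reindex, and the known dict are hypothetical and stand in for whatever MailWise keeps in SQLite.

```python
import hashlib
import os
import tempfile


def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks (1 MiB by default) so large .eml files are never fully loaded."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def needs_reindex(path: str, known: dict) -> bool:
    """Two-phase check: compare stat first, hash only if the stat changed.

    `known` maps path -> (mtime_ns, size, sha256) from the previous run.
    """
    st = os.stat(path)
    if path not in known:
        return True  # never indexed
    mtime_ns, size, digest = known[path]
    # Phase 1: stat-only comparison, no file read.
    if st.st_mtime_ns == mtime_ns and st.st_size == size:
        return False
    # Phase 2: stat moved, so confirm with a content hash.
    new_digest = sha256_of(path)
    known[path] = (st.st_mtime_ns, st.st_size, new_digest)  # refresh for next run
    return new_digest != digest


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "sample.eml")
    with open(path, "wb") as fh:
        fh.write(b"Subject: sync failure\r\n\r\nbody")
    known = {}
    first_run = needs_reindex(path, known)  # not in `known` yet
    st = os.stat(path)
    known[path] = (st.st_mtime_ns, st.st_size, sha256_of(path))
    unchanged = needs_reindex(path, known)  # stat matches, no read
    os.utime(path, ns=(st.st_atime_ns, st.st_mtime_ns + 1_000_000_000))
    touched = needs_reindex(path, known)  # newer mtime, same bytes
    print(first_run, unchanged, touched)
```

The stat comparison handles the common no-change case without opening any file; the hash runs only when size or mtime moved, so a file that was merely touched is not re-parsed.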

Architecture

src/email_issue_indexer/
├── cli.py          # Click-based CLI
├── parser.py       # EML parsing + thread splitting (parallel-safe)
├── markdown.py     # Markdown conversion with expert tags
├── safelinks.py    # Microsoft SafeLinks URL cleaning
├── embeddings.py   # sentence-transformers embeddings + vector search
├── store.py        # SQLite storage layer (performance-tuned)
├── indexer.py      # Parallel batch orchestrator with progress tracking
├── search.py       # Optimized similarity search with expert boosting
└── rag.py          # RAG layer using Claude Code CLI
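
safelinks.py unwraps Microsoft SafeLinks, which rewrite links as https://<region>.safelinks.protection.outlook.com/?url=<encoded target>&data=.... A minimal sketch using only urllib.parse, assuming the original target is always carried in the url query parameter; SAFELINK, unwrap_safelink, and clean_safelinks are hypothetical names, not the module's real API.

```python
import re
from urllib.parse import parse_qs, urlparse

# Matches a SafeLinks wrapper up to the next whitespace, quote, or bracket.
SAFELINK = re.compile(r"https?://[\w.-]*safelinks\.protection\.outlook\.com/\?[^\s<>\"')\]]+")


def unwrap_safelink(url: str) -> str:
    """Return the decoded `url` parameter of a SafeLinks URL, or the URL unchanged."""
    target = parse_qs(urlparse(url).query).get("url")  # parse_qs percent-decodes values
    return target[0] if target else url


def clean_safelinks(text: str) -> str:
    """Replace every SafeLinks wrapper in `text` with its original target."""
    return SAFELINK.sub(lambda m: unwrap_safelink(m.group(0)), text)


text = (
    "See https://nam12.safelinks.protection.outlook.com/?url=https%3A%2F%2F"
    "example.com%2Fdocs%3Fid%3D7&data=05%7C01&reserved=0 for details."
)
print(clean_safelinks(text))
```

Links that are not SafeLinks wrappers pass through untouched, as does a wrapper with no url parameter.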

Privacy

All processing is local:

Embeddings run on your machine (no data sent to any API for indexing)
Email content stays in your local SQLite database and markdown files
The analyze command sends relevant thread excerpts to Claude — same as chatting in Claude Code

Your config.yaml file and the emails/, data/, and markdown/ directories are gitignored by default. Only config.example.yaml (with no real data) is committed. A pre-commit hook, installed by scripts/install-hooks.sh, scans for accidental PII leaks.
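
As a hypothetical illustration of the kind of check such a hook might run, the sketch below flags email addresses whose domain is not on an allowlist of placeholder domains. EMAIL, ALLOWED_DOMAINS, and find_pii are assumptions and do not describe what the installed hook actually checks.

```python
import re

EMAIL = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
# Hypothetical allowlist of placeholder domains that are safe to commit.
ALLOWED_DOMAINS = {"company.com", "example.com"}


def find_pii(text: str) -> list[str]:
    """Return email addresses in `text` whose domain is not allowlisted."""
    return [
        address
        for address in EMAIL.findall(text)
        if address.rsplit("@", 1)[1].lower() not in ALLOWED_DOMAINS
    ]


leaks = find_pii("cc: senior.dev@company.com, alice.smith@contoso.org")
print(leaks)
```

A hook could run a check like this over the output of git diff --cached and abort the commit when the list is non-empty.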

License

MIT
