Deterministic, rule-first email classifier for Fastmail: sweep newsletters, file everything into labels, flag what needs action — dry-run first, never deletes.
fast-classifier was born from a live session that organized a real 6,551-email inbox: 3,849 newsletters swept in one run, 5,573 emails filed into 14 labels, and 87% rule coverage reached by iterating recon reports instead of training anything. The one-off scripts from that session are preserved read-only in reference/; this repo is those scripts hardened into a typed, tested library with a CLI and an MCP server. Every core concept below shipped from that session.
▶ Watch the 20-second demo — the whole pitch in one terminal session (also attached to the v0.1.0 release).
Read this before installing — it is the point of the project.
- Dry-run by default. Every mutating command plans and reports but writes nothing until you pass `--execute`. Dry runs print a `DRY RUN — no changes will be made (pass --execute to apply)` banner, and in dry-run mode the provider is wrapped in a proxy whose mutating methods throw, so a planning-pass bug physically cannot mutate mail.
- The never-delete guarantee. The `MailProvider` interface has no delete methods and never will. The JMAP layer additionally runtime-asserts that no request carries a `destroy` key before any bytes leave the process. `archive()` only removes the Inbox label (and adds Archive), so the worst any bug can do is mislabel or archive mail, never lose it.
- Confirmation above 100 mutations. A run planning more than 100 mutations prompts for confirmation. `--yes` skips the prompt; a non-interactive run without `--yes` refuses the whole batch (`sweep`/`file` then exit 1).
- `--max` cap. Every scanning command accepts a hard cap on emails scanned per run.
- Append-only TSV audit + resume. `sweep` and `file` log every mutated email id to an append-only TSV before the next batch runs, so an interrupted run resumes without duplicating work.
- Tokens live in the environment only. The config loader rejects secret-shaped keys and token-shaped values in config files, and errors/reports pass through a redaction chokepoint.
- MCP-server writes are gated. The MCP server forces every tool into dry-run unless started with `--allow-execute` (or `FAST_CLASSIFIER_ALLOW_WRITES=1`), and marks forced results with `forcedDryRun: true`.
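The first two guarantees can be pictured with a small sketch. This is not the library's actual code: the `MailProvider` shape, the helper names `dryRunGuard` and `assertNoDestroy`, and the set of mutating method names are assumptions made for illustration.

```typescript
// Illustrative sketch only: interface shape and helper names are assumptions.
interface MailProvider {
  search(query: string): Promise<string[]>
  addLabels(ids: string[], labels: string[]): Promise<void>
  archive(ids: string[]): Promise<void>
  // deliberately no delete/destroy method
}

// Hypothetical list of methods that would change mail.
const MUTATING_METHODS: ReadonlySet<PropertyKey> = new Set<PropertyKey>(['addLabels', 'archive'])

// Wrap a provider so reads pass through and every mutating method throws.
function dryRunGuard<T extends object>(provider: T): T {
  return new Proxy(provider, {
    get(target, prop, receiver) {
      if (MUTATING_METHODS.has(prop)) {
        return () => {
          throw new Error(`dry run: ${String(prop)}() is blocked`)
        }
      }
      const value: unknown = Reflect.get(target, prop, receiver)
      return typeof value === 'function' ? value.bind(target) : value
    },
  })
}

// Walk a request body and refuse to continue if any object has a `destroy` key.
function assertNoDestroy(value: unknown): void {
  if (Array.isArray(value)) {
    for (const entry of value) assertNoDestroy(entry)
    return
  }
  if (value !== null && typeof value === 'object') {
    for (const [key, child] of Object.entries(value)) {
      if (key === 'destroy') throw new Error('refusing to send a request with a "destroy" key')
      assertNoDestroy(child)
    }
  }
}
```

In this sketch the guard throws synchronously, so a planning pass that accidentally calls `archive()` fails at the call site instead of reaching the network.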
Not yet on npm — install from a clone (bun ≥ 1.1):
git clone https://github.com/ahmadatallah/fast-classifier.git
cd fast-classifier
bun install
bun run build && bun link # puts `fast-classifier` and `fast-classifier-mcp` on your PATH
(Or skip building and run straight from source: `bun src/cli/main.ts <command>`.)
Get a scoped Fastmail token. In Fastmail: Settings → Privacy & Security → Integrations → API tokens. Scope it to Mail read + write only — no contacts, calendars, files, or sending. Use a read-only token if you only run analyze/plan. Note: the JMAP API token and the MCP endpoint token are different credentials — one does not work against the other's endpoint.
cp .env.example .env # bun loads .env automatically; otherwise export in your shell
export FASTMAIL_API_TOKEN=... # JMAP transport (the default)
export FASTMAIL_MCP_TOKEN=... # MCP transport (-p mcp)
The canonical loop (recon, build your rules from your inbox, then execute):
fast-classifier init # writes a starter fast-classifier.config.ts
fast-classifier analyze # read-only recon: top senders and root domains
fast-classifier suggest # match your top domains against the built-in catalog:
# rule suggestions + unknown domains + paste-ready fragment
# paste the accepted suggestions into fast-classifier.config.ts
fast-classifier plan # coverage % + top unmatched senders
# add rules for the unmatched senders and unknown domains, re-run plan — iterate
fast-classifier sweep # dry run — inspect the report
fast-classifier sweep --execute # label + archive bulk mail
fast-classifier file --execute # file everything the rules match
fast-classifier needs-action # score what likely needs a human reply
fast-classifier verify --cleared newsletter@example.com --label 'Inbox/Dev>=1'
What: a hand-rolled, dependency-free client for Fastmail's official MCP endpoint (Streamable HTTP): captures the server-issued `mcp-session-id` header and echoes it on every later call, sends the dual `Accept: application/json, text/event-stream` header the server requires, parses SSE `data:` frames by taking the last JSON line, unwraps `structuredContent` (falling back to the text body), and converts `isError` results into typed transport errors.
Origin: a faithful, typed port of the session script reference/mcp.mjs that did all 9,400+ mutations.
Quirk it encodes: Fastmail wraps even single results in SSE frames — and the MCP token is a distinct credential from the JMAP token.
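The frame handling described above could look roughly like the sketch below. The names `parseSseBody`, `unwrapToolResult`, and `TransportError` are hypothetical, the JSON-RPC envelope around the result is left out, and the real client may be organized differently.

```typescript
// Illustrative sketch only: names are assumptions, not the library's API.
interface ToolResult {
  isError?: boolean
  structuredContent?: unknown
  content?: Array<{ type: string; text?: string }>
}

class TransportError extends Error {}

// Return the last `data:` line of an SSE body that parses as JSON.
function parseSseBody(body: string): unknown {
  const dataLines = body
    .split(/\r?\n/)
    .filter((line) => line.startsWith('data:'))
    .map((line) => line.slice('data:'.length).trim())
  for (let i = dataLines.length - 1; i >= 0; i--) {
    const line = dataLines[i]
    if (line === undefined || line === '') continue
    try {
      return JSON.parse(line)
    } catch {
      // not JSON: keep looking at earlier frames
    }
  }
  throw new TransportError('no JSON data: frame in SSE body')
}

// Prefer structuredContent, fall back to the text body; isError becomes a thrown error.
function unwrapToolResult(result: ToolResult): unknown {
  const text = result.content?.find((part) => part.type === 'text')?.text
  if (result.isError === true) throw new TransportError(text ?? 'tool call failed')
  if (result.structuredContent !== undefined) return result.structuredContent
  if (text === undefined) return undefined
  try {
    return JSON.parse(text)
  } catch {
    return text
  }
}
```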
What: session discovery (apiUrl + primary mail account), batched methodCalls (Email/query + Email/get chained by back-reference in one round trip), and bulk Mailbox/set label creation with client ids c0, c1, … plus modal-parent inference — new labels nest under whatever parent your existing labels use most.
Origin: the session's label-creation probes (reference/jmapcreate.mjs, colortest.mjs).
Quirks it encodes: created/notCreated partial results are surfaced per label name; label color is not settable over JMAP (the session tried); Fastmail signals per-record throttling as a SetError of type rateLimit inside notUpdated/notCreated rather than HTTP 429 — the client converts it into the same backoff path.
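As a rough illustration of the two ideas above, the sketch below builds a single JMAP request in which `Email/get` takes its ids from the preceding `Email/query` through a result reference (the `#ids` form defined by the JMAP core spec), and infers a modal parent from existing label paths. The function names, the chosen properties, and the path-based view of labels are assumptions; the repo's client may differ.

```typescript
// Illustrative sketch only: function names and the property list are assumptions.
type MethodCall = [name: string, args: Record<string, unknown>, callId: string]

interface JmapRequest {
  using: string[]
  methodCalls: MethodCall[]
}

// One round trip: query ids, then fetch those ids via a back-reference.
function buildScanRequest(accountId: string, inboxId: string, limit: number): JmapRequest {
  return {
    using: ['urn:ietf:params:jmap:core', 'urn:ietf:params:jmap:mail'],
    methodCalls: [
      [
        'Email/query',
        {
          accountId,
          filter: { inMailbox: inboxId },
          sort: [{ property: 'receivedAt', isAscending: false }],
          limit,
          calculateTotal: true,
        },
        'q0',
      ],
      [
        'Email/get',
        {
          accountId,
          // Result reference: take /ids from the response to call "q0".
          '#ids': { resultOf: 'q0', name: 'Email/query', path: '/ids' },
          properties: ['id', 'from', 'subject', 'receivedAt'],
        },
        'g0',
      ],
    ],
  }
}

// Most common parent among existing 'Parent/Child' label paths, or null if none nest.
function modalParent(paths: string[]): string | null {
  const counts = new Map<string, number>()
  for (const path of paths) {
    const cut = path.lastIndexOf('/')
    if (cut <= 0) continue
    const parent = path.slice(0, cut)
    counts.set(parent, (counts.get(parent) ?? 0) + 1)
  }
  let best: string | null = null
  let bestCount = 0
  counts.forEach((count, parent) => {
    if (count > bestCount) {
      best = parent
      bestCount = count
    }
  })
  return best
}
```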
What: a pure function (sender) -> category | null with 5-tier precedence: exact sender → registrable root domain → display-name rules on relay domains → relay account/personal fallbacks → your own domains. No state, no network, no ML.
Origin: 87% of 6,551 emails filed with zero machine learning — just rules iterated against recon reports.
Quirk it encodes: the session's naive slice(-2) domain split called amazon.co.uk "co.uk" — fixed with public-suffix-aware tldts and locked in by regression tests.
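A simplified sketch of first-match-wins precedence follows. It is an assumption-heavy illustration: the `Rules` shape is invented, tier 4 (relay account/personal fallbacks) is omitted, and the tiny hardcoded suffix set stands in for the public-suffix lookup that the library delegates to tldts.

```typescript
// Illustrative sketch only: the Rules shape is hypothetical and tier 4 is omitted.
interface Sender {
  address: string
  name?: string
}

interface Rules {
  exact: Map<string, string> // full address -> category
  domains: Map<string, string> // registrable root domain -> category
  relayDomains: Set<string> // shared sending platforms
  displayNames: Array<{ includes: string; category: string }>
  ownDomains: Map<string, string> // your own domains -> category
}

// Stand-in for a real public-suffix list; only here so the sketch runs without dependencies.
const TWO_LEVEL_SUFFIXES = new Set(['co.uk', 'com.au', 'co.jp'])

function rootDomain(address: string): string {
  const host = address.slice(address.lastIndexOf('@') + 1).toLowerCase()
  const parts = host.split('.')
  const lastTwo = parts.slice(-2).join('.')
  // A naive slice(-2) would stop here and return "co.uk" for amazon.co.uk.
  const take = TWO_LEVEL_SUFFIXES.has(lastTwo) ? 3 : 2
  return parts.slice(-take).join('.')
}

function classify(sender: Sender, rules: Rules): string | null {
  const address = sender.address.toLowerCase()
  const root = rootDomain(address)

  // Tier 1: exact sender.
  const exact = rules.exact.get(address)
  if (exact !== undefined) return exact

  // Tier 2: registrable root domain.
  const byDomain = rules.domains.get(root)
  if (byDomain !== undefined) return byDomain

  // Tier 3: display-name rules, only consulted on relay domains.
  if (rules.relayDomains.has(root) && sender.name !== undefined) {
    const name = sender.name.toLowerCase()
    const hit = rules.displayNames.find((rule) => name.includes(rule.includes.toLowerCase()))
    if (hit !== undefined) return hit.category
  }

  // Tier 5: your own domains (tier 4 is not modelled in this sketch).
  return rules.ownDomains.get(root) ?? null
}
```

Because the function takes only the sender and a rules object, the same input always yields the same category, which is what makes dry-run plans reproducible.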
What: full-text unsubscribe heuristic in, keep-list out, label + archive. The keep-list is enforced twice: as a server-side notFrom filter and re-checked client-side against every result.
Origin: 3,849 newsletters swept in one run, zero keepers lost.
Quirk it encodes: Fastmail's -from: search operator takes literal addresses only — domain negation is silently ignored — so the client-side re-check is mandatory, not defensive. An empty keep-list logs a warning.
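The double enforcement might be sketched as below: only literal addresses are turned into `-from:` search terms, and every result is re-checked locally. The helper names and the idea that keep-list entries may also be bare domains are assumptions for illustration.

```typescript
// Illustrative sketch only: helper names and domain-style keep entries are assumptions.
interface FoundEmail {
  id: string
  from: string
}

const isLiteralAddress = (entry: string): boolean => /^[^@\s]+@[^@\s]+$/.test(entry)

// Server side: only literal addresses are worth sending as -from: terms.
function serverNegation(keepList: string[]): string[] {
  return keepList.filter(isLiteralAddress).map((address) => `-from:${address}`)
}

// Client side: an entry matches an exact address, or a domain and its subdomains.
function isKept(from: string, keepList: string[]): boolean {
  const address = from.toLowerCase()
  const host = address.slice(address.lastIndexOf('@') + 1)
  return keepList.some((raw) => {
    const entry = raw.toLowerCase()
    if (isLiteralAddress(entry)) return address === entry
    const domain = entry.replace(/^@/, '')
    return host === domain || host.endsWith(`.${domain}`)
  })
}

// Re-check every search result, whatever the server claimed to filter.
function splitByKeepList(results: FoundEmail[], keepList: string[]): { sweep: FoundEmail[]; kept: FoundEmail[] } {
  const sweep: FoundEmail[] = []
  const kept: FoundEmail[] = []
  for (const email of results) (isKept(email.from, keepList) ? kept : sweep).push(email)
  return { sweep, kept }
}
```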
What: the shared read loop with a seen-set (nothing yielded twice), stall detection (consecutive all-seen pages → backoff, then stop), and careful cursor arithmetic split into two modes — drain (cursor advances only past items deliberately skipped) and scan (plain offset paging for read-only passes).
Origin: the session's hard-won answer to a trap: offset paging collapses when you mutate the very result you are paging — archiving shifts the window and the search re-serves already-processed ids.
Shipped refinement: pipelines collect the full plan in scan mode, then execute — so dry-run and --execute see the identical plan.
Quirks it encodes: the 50-item MCP search cap, and servers that answer either with a bare array or { items: [...] }.
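A minimal sketch of the scan-mode loop follows, assuming a hypothetical `fetchPage(position, limit)` callback. It shows the seen-set, the stall counter, and the normalization of both response shapes; the backoff sleep between stalled pages and the drain-mode cursor rules are left out to keep the example short.

```typescript
// Illustrative sketch only: fetchPage and the option names are assumptions.
type Page<T> = T[] | { items: T[] }
type FetchPage<T> = (position: number, limit: number) => Promise<Page<T>>

interface ScanOptions {
  pageSize: number // e.g. 50 on MCP, 100 on JMAP
  maxStalls: number // consecutive all-seen pages tolerated before stopping
  max: number // hard cap on items yielded
}

// Accept either a bare array or an { items } wrapper.
function normalizePage<T>(page: Page<T>): T[] {
  return Array.isArray(page) ? page : page.items
}

async function* scan<T extends { id: string }>(fetchPage: FetchPage<T>, opts: ScanOptions): AsyncGenerator<T> {
  const seen = new Set<string>()
  let position = 0
  let stalls = 0
  let yielded = 0
  while (yielded < opts.max) {
    const items = normalizePage(await fetchPage(position, opts.pageSize))
    if (items.length === 0) return
    let fresh = 0
    for (const item of items) {
      if (seen.has(item.id)) continue // never yield the same id twice
      seen.add(item.id)
      fresh++
      yield item
      yielded++
      if (yielded >= opts.max) return
    }
    // A page with nothing new counts as a stall; enough of them in a row ends the scan.
    stalls = fresh === 0 ? stalls + 1 : 0
    if (stalls >= opts.maxStalls) return
    position += items.length
  }
}
```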
What: a weighted keyword score over subject + snippet + sender name: 63 EN+DE phrases at +3 ("action required", "deadline", "Frist", "Mahnung", …), 22 receipt/shipping/newsletter exclusions at −2, unread +1, personal-sender-awaiting-reply +4, threshold 3, over a 60-day window.
Origin: surfaced the handful of emails needing a human among thousands.
Quirk it encodes: bilingual by necessity; a German inbox scores "Frist" exactly like "deadline".
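The weights above can be tried out with a small sketch. The keyword lists here are short stand-ins (four of the quoted phrases plus three invented exclusions), and two details are assumptions: each matching phrase contributes its weight once, and a score equal to the threshold counts as a hit.

```typescript
// Illustrative sketch only: shortened keyword lists; matching details are assumptions.
interface Candidate {
  subject: string
  snippet: string
  senderName: string
  unread: boolean
  personalAwaitingReply: boolean
}

const HIGH_KEYWORDS = ['action required', 'deadline', 'frist', 'mahnung'] // +3 each
const EXCLUSION_KEYWORDS = ['receipt', 'shipped', 'newsletter'] // -2 each (hypothetical entries)
const THRESHOLD = 3

function score(candidate: Candidate): number {
  const text = `${candidate.subject} ${candidate.snippet} ${candidate.senderName}`.toLowerCase()
  let total = 0
  for (const keyword of HIGH_KEYWORDS) if (text.includes(keyword)) total += 3
  for (const keyword of EXCLUSION_KEYWORDS) if (text.includes(keyword)) total -= 2
  if (candidate.unread) total += 1
  if (candidate.personalAwaitingReply) total += 4
  return total
}

const needsAction = (candidate: Candidate): boolean => score(candidate) >= THRESHOLD
```

Under these assumptions an unread mail with "Frist" in the subject scores 4 and is flagged, while a receipt that happens to mention a deadline scores 1 and is not.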
What: read-only aggregation by sender and public-suffix-aware root domain, coverage %, and the top-unmatched-senders list.
Origin: the flywheel that reached 87% — run plan, add rules for the top unmatched senders, run plan again. No mutations at any point.
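One pass of that loop amounts to a read-only aggregation, sketched below. The `planReport` name and report shape are hypothetical; the classifier is passed in as a plain callback so the example stays self-contained.

```typescript
// Illustrative sketch only: function name and report shape are assumptions.
interface PlanReport {
  total: number
  matched: number
  coveragePct: number
  topUnmatched: Array<{ sender: string; count: number }>
}

function planReport(
  senders: string[],
  classify: (address: string) => string | null,
  top = 10,
): PlanReport {
  const unmatched = new Map<string, number>()
  let matched = 0
  for (const raw of senders) {
    const address = raw.toLowerCase()
    if (classify(address) !== null) matched++
    else unmatched.set(address, (unmatched.get(address) ?? 0) + 1)
  }
  // Coverage as a percentage with one decimal place.
  const coveragePct = senders.length === 0 ? 0 : Math.round((matched / senders.length) * 1000) / 10
  const topUnmatched = Array.from(unmatched.entries())
    .sort((a, b) => b[1] - a[1])
    .slice(0, top)
    .map(([sender, count]) => ({ sender, count }))
  return { total: senders.length, matched, coveragePct, topUnmatched }
}
```

The top of `topUnmatched` is where the next rule pays off most, which is why each iteration raises coverage quickly at first.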
What: "a human probably wrote this" = a spaced display name (First Last) AND an address that does not look automated (no-reply, notifications@, billing@, …) AND no big-brand token in address or name.
Origin: the difference between archiving a receipt and archiving your accountant.
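The three-part test can be sketched as a single predicate. The automated-address pattern covers only the examples named above, and the brand pattern is a two-name placeholder for whatever `detection.brandNamePattern` actually holds; both are assumptions.

```typescript
// Illustrative sketch only: both patterns are placeholders, not the shipped defaults.
const AUTOMATED_LOCAL_PART = /no-?reply|notifications?|billing/i
const BRAND_PATTERN = /amazon|paypal/i // hypothetical stand-in for detection.brandNamePattern

function looksPersonal(displayName: string, address: string): boolean {
  // "First Last": at least two whitespace-separated tokens.
  const spacedName = /^\S+(?:\s+\S+)+$/.test(displayName.trim())
  const localPart = address.split('@')[0] ?? ''
  const automated = AUTOMATED_LOCAL_PART.test(localPart)
  const branded = BRAND_PATTERN.test(displayName) || BRAND_PATTERN.test(address)
  return spacedName && !automated && !branded
}
```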
What: post-run assertions: label exists / has exactly N / at least N emails, keep-senders still present in the inbox, cleared-senders gone. Read-only by construction.
Origin: every session run ended by proving it did what the plan said.
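The three label checks map onto a tiny spec grammar ('Name', 'Name=N', 'Name>=N'). A parser and checker for it might look like the sketch below; the type and function names are hypothetical, and label totals are passed in as a plain number, with `undefined` meaning the label does not exist.

```typescript
// Illustrative sketch only: type and function names are assumptions.
type LabelSpec =
  | { name: string; op: 'exists' }
  | { name: string; op: 'eq' | 'gte'; count: number }

function parseLabelSpec(spec: string): LabelSpec {
  // Lazy name match so 'Inbox/Dev>=1' splits into 'Inbox/Dev', '>=', '1'.
  const match = /^(.*?)(>=|=)(\d+)$/.exec(spec)
  if (match === null) return { name: spec, op: 'exists' }
  const name = match[1] ?? ''
  const count = Number(match[3] ?? '0')
  return { name, op: match[2] === '>=' ? 'gte' : 'eq', count }
}

// total is the label's email count, or undefined when the label is missing.
function checkLabel(spec: LabelSpec, total: number | undefined): boolean {
  if (total === undefined) return false
  if (spec.op === 'exists') return true
  return spec.op === 'eq' ? total === spec.count : total >= spec.count
}
```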
What: mutations chunked at ≤ 50 per batch, 220 ms pacing between chunks, rate-limit retries with growing backoff, 1200 ms stall backoff in the pager, progress logged every 5 chunks, append-only TSV audit, JSON report per command.
Origin: 9,400+ mutations without a rate-limit death spiral.
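Chunking, pacing, and retry fit together roughly as in the sketch below. The batch size (50) and pacing (220 ms) come from the description above; the retry count, the doubling backoff starting at 500 ms, and the `RateLimitError` class are assumptions. Progress logging and the audit log are left out.

```typescript
// Illustrative sketch only: retry count and backoff schedule are assumptions.
class RateLimitError extends Error {}

const sleep = (ms: number): Promise<void> => new Promise((resolve) => setTimeout(resolve, ms))

function chunk<T>(items: readonly T[], size: number): T[][] {
  const out: T[][] = []
  for (let i = 0; i < items.length; i += size) out.push(items.slice(i, i + size))
  return out
}

interface BatchOptions {
  size: number
  paceMs: number
  maxRetries: number
  baseBackoffMs: number
}

const DEFAULTS: BatchOptions = { size: 50, paceMs: 220, maxRetries: 3, baseBackoffMs: 500 }

// Apply batches in order; retry rate-limited batches with a growing delay.
async function runBatches<T>(
  items: readonly T[],
  apply: (batch: T[]) => Promise<void>,
  options: Partial<BatchOptions> = {},
): Promise<number> {
  const opts = { ...DEFAULTS, ...options }
  const batches = chunk(items, opts.size)
  let done = 0
  for (let i = 0; i < batches.length; i++) {
    const batch = batches[i]
    if (batch === undefined) continue
    for (let attempt = 0; ; attempt++) {
      try {
        await apply(batch)
        break
      } catch (err) {
        if (!(err instanceof RateLimitError) || attempt >= opts.maxRetries) throw err
        await sleep(opts.baseBackoffMs * 2 ** attempt)
      }
    }
    done += batch.length
    if (i < batches.length - 1) await sleep(opts.paceMs)
  }
  return done
}
```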
| | JMAP (default) | MCP |
|---|---|---|
| Token env var | `FASTMAIL_API_TOKEN` | `FASTMAIL_MCP_TOKEN` |
| Endpoint | `api.fastmail.com/jmap/session` | `api.fastmail.com/mcp` (official) |
| Page size | 100 | 50 (server hard cap) |
| Keep-list negation | full server-side NOT filter | `-from:` literal addresses only; re-checked client-side |
| Label creation | explicit `Mailbox/set` (nested paths, modal parent) | no create tool; `addLabels` auto-creates missing labels |
| Search totals | real totals (`calculateTotal`) | none (position-only paging) |
Pick with -p jmap / -p mcp or provider.type in config. The tokens are distinct credentials — see Quickstart.
// fast-classifier.config.ts
import { defineConfig } from 'fast-classifier/config'
export default defineConfig({
categories: [{ name: 'Dev', label: 'Inbox/Dev' }],
rules: [{ kind: 'domain', domain: 'github.com', category: 'Dev' }],
keepList: ['important@example.com'],
sweep: { targetLabel: 'Promotion' },
needsAction: { label: 'Needs action', windowDays: 60 },
})
Discovery order: explicit `--config <path>` → `fast-classifier.config.ts` → `.mjs` → `.js` → `.json` in the working directory → built-in defaults. JSON configs work identically (rule patterns are plain strings in every format, so nothing is lost). Config files must never contain credentials: the loader rejects secret-shaped keys and token-shaped values.
Build your config from your inbox. fast-classifier suggest (or the suggest_rules MCP tool, or suggestRules/toConfigFragment from the library) scans your inbox read-only, matches uncovered root domains against a built-in catalog of well-known senders in 12 generic categories, and prints a paste-ready config fragment — anything not in the catalog is listed as unknown for you to decide. The defaults stay generic on purpose: the needs-action scorer's keyword packs are English-only unless you opt in with needsAction: { languages: ['en', 'de'] } (explicit highKeywords/exclusionKeywords replace the packs), and detection.brandNamePattern defaults to global household names — override it with your regional brands.
See examples/ for a full real-world config — the German-power-user setup that reached 87% coverage, with the bilingual packs and the full brand-pattern override — plus its JSON twin and minimal variants.
fast-classifier-mcp runs an MCP server on stdio: it loads the config from the working directory, connects the configured provider, and reads the matching token from the environment. Register it with an MCP client, e.g.:
claude mcp add fast-classifier \
--env FASTMAIL_API_TOKEN=your-jmap-token \
  -- fast-classifier-mcp
Tools (11): `classify_sender`, `analyze_inbox`, `plan_classification`, `suggest_rules`, `sweep_newsletters`, `file_inbox`, `score_needs_action`, `list_labels`, `ensure_labels`, `verify_run`, `get_effective_config`.
Mutating tools default to dryRun: true — the intended agent loop is call dry, inspect the report, then pass dryRun: false. Unless the server was started with --allow-execute (or FAST_CLASSIFIER_ALLOW_WRITES=1), dryRun is forced true regardless of arguments and results carry forcedDryRun: true, so agents can tell the difference between "planned" and "cannot execute". Human-facing output goes to stderr; stdout carries the protocol.
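The gating rule can be summarized in a few lines. This sketch rests on one reading of the behaviour described above, namely that `forcedDryRun` is set only when the caller asked to execute and the server overrode it; the function names are hypothetical, and argv and the environment are passed in as plain values.

```typescript
// Illustrative sketch only: names are hypothetical; forcedDryRun semantics are an assumption.
function writesAllowed(argv: string[], env: Record<string, string | undefined>): boolean {
  return argv.includes('--allow-execute') || env['FAST_CLASSIFIER_ALLOW_WRITES'] === '1'
}

interface DryRunDecision {
  dryRun: boolean
  forcedDryRun: boolean
}

function resolveDryRun(requestedDryRun: boolean | undefined, allowWrites: boolean): DryRunDecision {
  const wantsDryRun = requestedDryRun ?? true // mutating tools default to dry-run
  if (!allowWrites) {
    // Gate closed: always plan only, and flag the cases where execution was requested.
    return { dryRun: true, forcedDryRun: !wantsDryRun }
  }
  return { dryRun: wantsDryRun, forcedDryRun: false }
}
```

An agent that receives `forcedDryRun: true` knows that repeating the call with `dryRun: false` will not help until the server is restarted with writes enabled.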
fast-classifier [global options] <command>
Global options (accepted before or after the subcommand):
| Flag | Meaning |
|---|---|
| `-c, --config <path>` | path to `fast-classifier.config.{ts,mjs,js,json}` |
| `-p, --provider <type>` | mail transport: `jmap` or `mcp` (default: from config) |
| `--execute` | apply changes; mutating commands are dry-run by default |
| `--max <n>` | cap on emails scanned per run |
| `--yes` | skip confirmation prompts for large mutation batches |
| `--json` | print the full report JSON to stdout instead of a summary |
| `--report-dir <dir>` | directory for reports and audit logs (default `./.fast-classifier`) |
Commands:
| Command | Description |
|---|---|
| `analyze` | read-only recon: who fills the inbox, by sender and root domain |
| `plan` | classify without touching anything: coverage + unmatched senders |
| `suggest [dir] [--interactive] [--no-interactive] [--write]` | read-only scan: suggest config rules for your senders from the built-in domain catalog, plus a paste-ready config fragment |
| `sweep` | label + archive bulk mail (keep-list wins; dry-run by default); audits to `sweep.log.tsv` |
| `file` | file classified mail into per-category labels (dry-run by default); audits to `file.log.tsv` |
| `needs-action [--apply]` | score the recent window for mail needing a human response; `--apply` tags candidates with the needs-action label (never archives); tagging requires `--apply` and `--execute` |
| `labels list` | list labels with email totals |
| `labels ensure <names...>` | create the given labels if missing, nested as 'Parent/Child' (dry-run by default) |
| `verify [--contains <addrs...>] [--cleared <addrs...>] [--label <specs...>]` | post-run assertions; label specs are 'Name' (exists), 'Name=N' (exactly N), 'Name>=N' (at least N) |
| `init [dir] [--from-inbox]` | write a starter `fast-classifier.config.ts` (refuses to overwrite); `--from-inbox` builds it from your inbox's suggestions instead |
| `mcp` | run the MCP server on stdio; prefer the standalone `fast-classifier-mcp` bin |
Every command writes <command>-report.json (credentials redacted) into the report dir and echoes the path on stderr. Exit code 1 on errors, on failed verify checks, and when a sweep/file run is aborted by a declined confirmation.
Everything the session learned the hard way, all encoded in code and tests:
- 50-item search cap (MCP). `search_email` hard-caps at 50 per page; JMAP pages at 100. Expressed as `caps.maxPageSize`, clamped everywhere.
- Array vs `{ items }`. The MCP server answers either with a bare array or `{ items: [...] }` for the same tool. Adapters normalize both.
- ISO-only `after:`. `after:YYYY-MM-DD` is the only date operator Fastmail search reliably accepts.
- `-from:` is literal-address-only. Domain negation is silently ignored, so keep-lists must be re-checked client-side (`caps.serverSideNotFrom: 'address-only'`).
- Spaced values need quoting in the search DSL. Unquoted, `in:Needs action` parses as `in:Needs` plus a free-text term `action`, which silently scopes the search wrongly. The query builder double-quotes any value containing whitespace.
- `addLabels` auto-creates labels (MCP). There is no create-label tool, but `update_email` `addLabels` auto-creates missing labels. `ensureLabels` pre-flights stay as typo protection.
- Archive ≠ `removeLabels: ['Inbox']`. The server rejects removing Inbox via `removeLabels`; `archive_email` (MCP) / the paired inbox-null + archive-true patch (JMAP) is the sanctioned path.
- Label color is not settable via JMAP. The session tried; Fastmail ignores it.
- Distinct tokens. The JMAP API token and the MCP endpoint token are different credentials.
- `rateLimit` SetError backoff. Fastmail signals per-record throttling as a `SetError` of type `rateLimit` in `notUpdated`/`notCreated`, not HTTP 429. Both are converted into the same retry/backoff path.
- MCP handshake details. Capture-and-echo `mcp-session-id`, dual `Accept` header required, SSE `data:` frames even for single results, `structuredContent` unwrap with text-body fallback, rate limiting reported as a tool error text, not a status code.
- JMAP sort tie-break limitation. No total-order tiebreaker exists among JMAP's standard sort properties, so equal `receivedAt` values at a page boundary can reorder between calls; the pager's seen-set and audit-resume pattern recover anything stepped over.
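Two of the search-DSL quirks above (quoting spaced values and ISO-only `after:`) lend themselves to a short sketch. The helper names are hypothetical, values that themselves contain double quotes are not handled, and the date is formatted in UTC as an assumption.

```typescript
// Illustrative sketch only: helper names are assumptions; embedded quotes are not handled.
function searchTerm(operator: string, value: string): string {
  // in:Needs action would parse as in:Needs plus the free-text term "action".
  const needsQuotes = /\s/.test(value)
  return `${operator}:${needsQuotes ? `"${value}"` : value}`
}

// Format a date as after:YYYY-MM-DD (UTC), the one date form treated as reliable above.
function afterTerm(date: Date): string {
  return searchTerm('after', date.toISOString().slice(0, 10))
}

function buildQuery(terms: string[]): string {
  return terms.join(' ')
}
```

For example, combining `searchTerm('in', 'Needs action')` with `afterTerm(...)` keeps the label scope intact instead of silently searching a label called "Needs".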
bun test # 287 tests, no network — everything runs against MemoryMailProvider
bun run typecheck # strict tsc (exactOptionalPropertyTypes, noUncheckedIndexedAccess)
bun run lint # eslint + prettier check
bun run hygiene # credential/PII gate (also runs in CI)
bun run build # emits dist/ for plain Node (>= 20)
`reference/` holds the origin session's scripts: read-only ground truth for behavior questions; never edit them.
docs/ is an Astro Starlight site: the narrative guides plus an API reference generated straight from the source by starlight-typedoc — one theme, one sidebar, one search index.
bun install # once, at the repo root (bun workspaces)
cd docs && bunx --bun astro dev # live-reloading docs at localhost:4321
cd docs && bunx --bun astro build # static site in docs/dist/
CI builds the site on every push (`.github/workflows/docs.yml`); deployment to GitHub Pages is parked until the repo is public. The workflow file documents the one-step re-enable.
- Contributing: CONTRIBUTING.md
- Security policy: SECURITY.md
- License: MIT
Built with Claude Code from a real inbox-organizing session.