feat(email): voice/style-matched drafting from Sent history (#1607) by kovtcharov · Pull Request #1925 · amd/gaia

kovtcharov · 2026-07-01T22:14:22Z

Closes #1607.

Draft replies came out as neutral boilerplate no matter how the user actually writes. Now the agent can learn the user's voice from their own Sent mail — usual greeting, sign-off, typical length, formality — and every drafted reply comes out sounding like them, while staying approval-gated (#1264): building the profile and drafting trigger zero send side-effects, with a regression test. Local-first by construction: only derived features are persisted to the agent's SQLite state.db (never raw Sent content), and greeting/sign-off extraction matches a closed vocabulary, so hostile text in a Sent body cannot reach the system prompt.
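
As an illustration of the closed-vocabulary idea, here is a minimal sketch of an analyzer that can only ever emit a word from a fixed list plus a few numbers. The vocabularies, regexes, field names, and the `analyze_bodies` function are assumptions made for this example; they are not the PR's actual `_GREETING_RE`/`_SIGNOFF_RE` or profile schema.

```python
import re
from collections import Counter
from statistics import median
from typing import Dict, List

# Hypothetical closed vocabularies: only these words can ever be emitted.
GREETING_RE = re.compile(r"^\s*(hi|hello|hey|dear)\b", re.IGNORECASE)
SIGNOFF_RE = re.compile(r"^\s*(best|thanks|cheers|regards)\b[,!.]?\s*$", re.IGNORECASE)
# Matches straight and curly apostrophes (can't, I’m, we'll, ...).
CONTRACTION_RE = re.compile(r"\b\w+['\u2019](?:t|s|re|ve|ll|d|m)\b", re.IGNORECASE)


def analyze_bodies(bodies: List[str]) -> Dict[str, object]:
    """Derive style features from Sent bodies without keeping any body text."""
    greetings: Counter = Counter()
    signoffs: Counter = Counter()
    word_counts: List[int] = []
    with_contractions = 0

    for body in bodies:
        # Drop blank lines and quoted reply text (lines starting with '>').
        lines = [
            ln.strip()
            for ln in body.splitlines()
            if ln.strip() and not ln.lstrip().startswith(">")
        ]
        if not lines:
            continue

        match = GREETING_RE.match(lines[0])
        if match:
            # Store the matched vocabulary word, never the rest of the line.
            greetings[match.group(1).lower()] += 1

        # Scan upward from the end so a sign-off above a signature is found.
        for ln in reversed(lines):
            match = SIGNOFF_RE.match(ln)
            if match:
                signoffs[match.group(1).lower()] += 1
                break

        text = " ".join(lines)
        word_counts.append(len(text.split()))
        with_contractions += 1 if CONTRACTION_RE.search(text) else 0

    if not word_counts:
        raise ValueError("no usable Sent bodies")

    return {
        "greeting": greetings.most_common(1)[0][0] if greetings else None,
        "signoff": signoffs.most_common(1)[0][0] if signoffs else None,
        "median_words": int(median(word_counts)),
        "uses_contractions": with_contractions * 2 >= len(word_counts),
        "sample_count": len(word_counts),
    }
```

Because every string in the result comes from a regex alternation rather than from the message, a sentence such as "ignore previous instructions" in a Sent body has no way into the returned dict.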

Per the issue notes, the profile plugs into the draft-composition prompt (the LLM composes draft_reply bodies in the agent loop), not into draft_reply itself — the sidecar REST contract is unchanged.

Test plan

PYTHONPATH=$PWD/hub/agents/python/email python -m pytest hub/agents/python/email/tests/test_email_voice_profile.py -q — 16 new tests: analyzer (greeting/sign-off/length/formality, quoted-text stripping, curly apostrophes, long signature blocks), persistence round-trip + restart survival, SENT-label-only sampling, prompt injection before/after build, no-raw-content-persisted sentinel, and the no-send-side-effect acceptance test.
Full email sweep (hub package tests + tests/unit/agents/test_email_agent*.py + tests/integration/test_never_auto_send.py + tests/unit/agents/email/): 748 passed; the 2 failures are pre-existing environment skew and fail identically on a clean tree.
black/isort clean on all touched files; util/lint.py --all passes.

Deliberately deferred

Judge-scored style_match eval AC: needs the email eval harness extended (draft-quality judging against the corpus from #1230, "feat(eval): generate + commit labelled email-triage corpus + gemma4 baseline") plus a Strix Halo hardware run; shipping an unrunnable scenario file would be half-finished work. Proposing a follow-up issue for it.
Multi-mailbox voice routing: with several mailboxes connected, the prompt uses the most-recently-built profile rather than routing per-message; this ties into the multi-inbox work in #1603, "feat(email): derive triage mailbox(es) from connected connectors (drop GAIA_EMAIL_PROVIDER; multi-inbox scan)".

Draft replies came out as neutral boilerplate regardless of how the user actually writes. Now `build_voice_profile` samples recent Sent mail, derives a local style profile (usual greeting, sign-off, typical length, formality signals), and stores it in the agent's SQLite state.db; the system prompt carries that guidance every turn, so LLM-composed draft bodies come out in the user's own voice. `clear_voice_profile` forgets it. Local-first by construction: only derived features are persisted — never raw Sent content — and greeting/sign-off values are matched against a closed vocabulary, so hostile text in a Sent body cannot reach the system prompt. Drafts stay approval-gated (#1264); building the profile and drafting trigger no send side-effect, with a regression test.
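
To make the "system prompt carries that guidance every turn" behaviour concrete, the following is a rough sketch of rendering a stored profile into prompt text and re-reading it on each call. The base prompt, the profile keys, and both function signatures are assumptions for illustration; the PR's real `render_style_guidance` and `_get_system_prompt` may look different.

```python
from typing import Callable, Optional

# Hypothetical base prompt; the agent's real prompt is not shown in this PR page.
BASE_PROMPT = "You are an email assistant. Drafts always require user approval."


def render_guidance(profile: Optional[dict]) -> str:
    """Turn derived style features into prompt text; '' when no profile exists."""
    if not profile:
        return ""
    lines = ["Match the user's writing style when drafting replies:"]
    if profile.get("greeting"):
        lines.append(f"- Open with '{profile['greeting'].capitalize()}'.")
    if profile.get("signoff"):
        lines.append(f"- Close with '{profile['signoff'].capitalize()}'.")
    if profile.get("median_words"):
        lines.append(f"- Aim for about {int(profile['median_words'])} words.")
    if profile.get("uses_contractions"):
        lines.append("- Contractions are fine.")
    else:
        lines.append("- Avoid contractions.")
    return "\n".join(lines)


def get_system_prompt(load_profile: Callable[[], Optional[dict]]) -> str:
    """Re-read the profile on every call so a cleared profile drops out at once."""
    guidance = render_guidance(load_profile())
    return BASE_PROMPT if not guidance else f"{BASE_PROMPT}\n\n{guidance}"
```

Passing a loader rather than a cached profile is the design choice that matters here: clearing the profile changes the very next prompt without restarting the agent.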

github-actions · 2026-07-01T22:17:43Z

Verdict: Approve ✅

This teaches the email agent to draft in the user's own voice by learning greeting/sign-off/length/formality from their Sent mail, gated behind approval as before. The design is genuinely local-first: only derived features (fixed-vocabulary phrases + a few numbers) are persisted, never raw Sent content — and there's a sentinel test proving a private sentence from a Sent body never lands in the stored profile.

The headline risk for a feature like this is a hostile Sent email smuggling text into the system prompt (prompt injection). That's cleanly avoided here: the analyzer only emits matches from a closed set of greeting/sign-off words plus word-count/contraction/exclamation stats — arbitrary body text has no path into the prompt. Verified against the code and the tests. No blocking issues; one optional logging nit below.

🔍 Technical details

Strengths

Injection surface closed by construction. analyze_sent_bodies (voice_profile.py:425) only ever stores group(1) of the closed _GREETING_RE/_SIGNOFF_RE alternations plus ints/bools/floats — no free-text from the body reaches profile_json or render_style_guidance. test_stored_profile_has_no_raw_sent_content locks this in with a sentinel. This is the right way to build the feature.
Persistence is loss-safe. save_voice_profile (action_store.py:67) does update-then-insert with a clear WHY comment, so a crash between statements can't drop an existing profile. _get_system_prompt re-reads per turn, so clear_voice_profile takes effect immediately.
Test coverage is thorough: analyzer edge cases (quoted-text stripping, curly apostrophes, sign-off under a long signature block), SENT-label-only sampling, restart survival, and the no-send-side-effect acceptance test, which extends the never-auto-send invariant from #1264, "feat(email): send with confirmation (never auto-send)".
Docs kept in sync. docs/guides/email.mdx updated; REST surfaces (openapi.email.json, specification.html) correctly untouched since these are agent-loop tools, not sidecar endpoints — matching the PR's "REST contract unchanged" claim.
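
The update-then-insert pattern mentioned under "Persistence is loss-safe" can be sketched with the standard-library `sqlite3` module. The table name, columns, and function names below are hypothetical, not the schema in action_store.py; the point is only that the existing row is never deleted before its replacement is written.

```python
import json
import sqlite3
from typing import Optional

# Hypothetical schema for illustration only.
SCHEMA = (
    "CREATE TABLE IF NOT EXISTS voice_profile ("
    "mailbox TEXT PRIMARY KEY, profile_json TEXT NOT NULL)"
)


def save_profile(conn: sqlite3.Connection, mailbox: str, profile: dict) -> None:
    """Overwrite in place if a row exists, otherwise insert a new one."""
    payload = json.dumps(profile)
    with conn:  # commits on success, rolls back if an exception is raised
        cur = conn.execute(
            "UPDATE voice_profile SET profile_json = ? WHERE mailbox = ?",
            (payload, mailbox),
        )
        # WHY update first: unlike delete-then-insert, there is no point at
        # which the old profile is gone and the new one is not yet stored.
        if cur.rowcount == 0:
            conn.execute(
                "INSERT INTO voice_profile (mailbox, profile_json) VALUES (?, ?)",
                (mailbox, payload),
            )


def load_profile(conn: sqlite3.Connection, mailbox: str) -> Optional[dict]:
    """Return the stored profile, or None if none has been built."""
    row = conn.execute(
        "SELECT profile_json FROM voice_profile WHERE mailbox = ?", (mailbox,)
    ).fetchone()
    return json.loads(row[0]) if row else None
```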

🟢 Minor (optional): expected errors log full tracebacks (tools/voice_tools.py:321, :336) — the broad except Exception: log.exception(...) also catches the two expected ValueError paths (no Sent history, no usable bodies), emitting an ERROR-level traceback for what is really a user-actionable message. If this matches the existing email-tool convention, leave it; otherwise a narrower except ValueError returning the envelope without log.exception would keep logs quieter. Not blocking.
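
One way the narrower handler could look, assuming a tool that returns a dict envelope. The envelope keys, logger name, and `run_build_voice_profile` wrapper are invented for this sketch and are not taken from tools/voice_tools.py.

```python
import logging
from typing import Callable

log = logging.getLogger("email.voice_tools")  # hypothetical logger name


def run_build_voice_profile(build: Callable[[], dict]) -> dict:
    """Call the profile builder and wrap the outcome in a result envelope."""
    try:
        profile = build()
    except ValueError as exc:
        # Expected, user-actionable cases (no Sent history, no usable bodies):
        # report the message without an ERROR-level traceback.
        log.info("voice profile not built: %s", exc)
        return {"ok": False, "error": str(exc)}
    except Exception:
        # Anything else is unexpected, so keep the full traceback.
        log.exception("voice profile build failed")
        return {"ok": False, "error": "internal error while building voice profile"}
    return {"ok": True, "profile": profile}
```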

Note (no action needed): build_voice_profile_impl does one get_message per sample (N+1, up to 50 calls on first run) — already called out in the code comment and acceptable for a one-time "learn my style" action.

kovtcharov requested a review from kovtcharov-amd as a code owner July 1, 2026 22:14

github-actions Bot added the documentation (Documentation changes), tests (Test changes), and agent::email (Email agent changes) labels Jul 1, 2026

kovtcharov enabled auto-merge July 1, 2026 22:40

itomek approved these changes Jul 2, 2026

kovtcharov added this pull request to the merge queue Jul 2, 2026

Merged via the queue into main with commit 1ff68ca Jul 2, 2026
38 checks passed

kovtcharov deleted the claudia/task-b9c5d931 branch July 2, 2026 12:37

kovtcharov merged 1 commit into main from claudia/task-b9c5d931
