feat(email): LLM-assisted triage classification when heuristic confidence is low

The heuristic classifier caps at ~70% accuracy. Messages where `confident=False` currently fall back to `informational`, which is wrong. These messages should be sent to the LLM for classification instead of being assigned a default category.

**Tasks**
- [ ] Wire LLM classification into `triage_inbox_impl`: when `confident=False`, send the email body to the LLM with the classification prompt
- [ ] Add a structured triage prompt template with `{category, confidence, reasoning}` JSON output
- [ ] If LLM classification fails, raise an actionable error (do not silently default to `informational`)
- [ ] Benchmark heuristic-only vs heuristic+LLM on test corpus
- [ ] Update integration test `xfail` to `pass` (`test_heuristic_triage_meets_baseline_minus_tolerance`)

**Key files:** `src/gaia/agents/email/tools/triage_heuristics.py`, `src/gaia/agents/email/tools/read_tools.py`
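
A minimal sketch of the fallback flow described in the tasks, to make the intended behavior concrete. Everything named here is an assumption for illustration: `classify`, `parse_llm_reply`, `TriageResult`, `TriageClassificationError`, the `llm` callable, and the category set are hypothetical and do not reflect the actual signatures in `triage_heuristics.py` or `read_tools.py`.

```python
import json
from dataclasses import dataclass
from typing import Callable, Optional

# Hypothetical category set; the issue itself only names `informational`.
ALLOWED_CATEGORIES = {"urgent", "actionable", "informational"}
REQUIRED_KEYS = {"category", "confidence", "reasoning"}

# Structured prompt asking for {category, confidence, reasoning} as JSON.
PROMPT_TEMPLATE = """Classify the email below into one of: {categories}.
Reply with JSON only: {{"category": "...", "confidence": 0.0, "reasoning": "..."}}

Email:
{body}
"""


class TriageClassificationError(RuntimeError):
    """Raised when the LLM path cannot produce a valid classification."""


@dataclass(frozen=True)
class TriageResult:
    category: str
    confidence: Optional[float]  # None when the heuristic decided
    reasoning: str
    source: str  # "heuristic" or "llm"


def parse_llm_reply(reply: str) -> TriageResult:
    """Validate the LLM reply; raise instead of defaulting to a category."""
    try:
        data = json.loads(reply)
    except json.JSONDecodeError as exc:
        raise TriageClassificationError(
            f"LLM reply is not valid JSON ({exc}); reply began with {reply[:80]!r}"
        ) from exc
    if not isinstance(data, dict):
        raise TriageClassificationError("LLM reply must be a JSON object")
    missing = REQUIRED_KEYS - set(data)
    if missing:
        raise TriageClassificationError(f"LLM reply is missing keys: {sorted(missing)}")
    if data["category"] not in ALLOWED_CATEGORIES:
        raise TriageClassificationError(f"Unknown category: {data['category']!r}")
    try:
        confidence = float(data["confidence"])
    except (TypeError, ValueError) as exc:
        raise TriageClassificationError("confidence must be a number") from exc
    if not 0.0 <= confidence <= 1.0:
        raise TriageClassificationError(f"confidence out of range: {confidence}")
    return TriageResult(data["category"], confidence, str(data["reasoning"]), "llm")


def classify(
    body: str,
    heuristic_category: str,
    confident: bool,
    llm: Callable[[str], str],
) -> TriageResult:
    """Keep the heuristic result when confident; otherwise ask the LLM."""
    if confident:
        return TriageResult(heuristic_category, None, "heuristic match", "heuristic")
    prompt = PROMPT_TEMPLATE.format(
        categories=", ".join(sorted(ALLOWED_CATEGORIES)), body=body
    )
    try:
        reply = llm(prompt)
    except Exception as exc:  # surface transport/model failures to the caller
        raise TriageClassificationError(f"LLM call failed: {exc}") from exc
    return parse_llm_reply(reply)
```

Passing the LLM as a plain callable keeps the sketch testable with a stub, which may also be convenient for the heuristic-only vs heuristic+LLM benchmark.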

feat(email): LLM-assisted triage classification when heuristic confidence is low #1107
