Demo: Voice Agent for memos, meeting transcription, and voice queries (ASR + TTS + DatabaseMixin) #389

## Summary

Create a **Voice Agent** that combines voice memo dictation, meeting transcription, and voice-based querying into a single domain agent. This agent handles all voice-first workflows: it records and transcribes audio, **uses an LLM to automatically clean and improve transcriptions** (similar to [Wispr Flow](https://wisprflow.ai)), **auto-labels and categorizes entries**, stores them in a database, **exports notes as markdown**, answers questions about stored content, and provides a **simple web UI for browsing and viewing notes**. It uses Lemonade v9.4.1 streaming ASR, TTS, and reranking.

---

## LLM-Powered Transcription Enhancement

Raw ASR output is noisy: filler words, missing punctuation, run-on sentences, and misheard terms. The Voice Agent passes every transcription through an LLM post-processing step before storage, except in `verbatim` mode, which stores the raw ASR output unchanged.

### Enhancement Pipeline

```
Microphone → Lemonade ASR (raw transcript) → LLM Enhancement → Markdown Formatting → Database Storage + .md File Export
```

### What the LLM Fixes

| Issue | Raw ASR | After LLM Enhancement |
|-------|---------|----------------------|
| Filler words | "So um we decided to uh use the Flux model" | "We decided to use the Flux model" |
| Punctuation | "launch target is march 15th budget approved for two gpus" | "Launch target is March 15th. Budget approved for two GPUs." |
| Grammar | "me and the team was discussing" | "The team and I were discussing" |
| Proper nouns | "we're using lennon aid server" | "We're using Lemonade Server" |
| Technical terms | "the cue wen model" | "the Qwen model" |

### Enhancement Modes

| Mode | Behavior | Use Case |
|------|----------|----------|
| `clean` (default) | Remove fillers, fix punctuation/grammar, preserve meaning exactly | Quick memos |
| `structured` | Clean + organize into sections/bullet points with headings | Meeting minutes |
| `verbatim` | No LLM processing, raw ASR output | Legal/compliance recording |
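The mode table above can be sketched as a small dispatch function. This is a minimal illustration, not the agent's actual implementation: `enhance_transcript` and the injected `llm` callable are hypothetical names, and the short prompt here is a placeholder for the full enhancement prompt.

```python
from typing import Callable

VALID_MODES = ("clean", "structured", "verbatim")


def enhance_transcript(raw_text: str, mode: str, llm: Callable[[str], str]) -> str:
    """Return the text to store as the enhanced version of raw_text.

    `llm` is a hypothetical callable that takes a prompt and returns the reply.
    """
    if mode not in VALID_MODES:
        raise ValueError(f"unknown enhancement mode: {mode!r}")
    if mode == "verbatim":
        # Verbatim means no LLM processing, so the model is never called.
        return raw_text
    # Placeholder prompt; a real agent would use its full enhancement prompt.
    prompt = f"Enhancement mode: {mode}\n\nRaw transcription:\n{raw_text}"
    return llm(prompt).strip()
```

Short-circuiting `verbatim` before any model call keeps legal and compliance recordings byte-for-byte identical to the ASR output.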

### Context-Aware Enhancement

- **Domain vocabulary:** Custom word list (e.g., "Lemonade", "Qwen", "GAIA", "NPU") stored in `vocabulary` table
- **Previous entries:** Recent entries provide context for ambiguous terms
- **User corrections:** When the user manually corrects a transcription, the agent stores the correction and applies it to future transcriptions
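One way the domain vocabulary could be applied is as a deterministic pass over the text, before or after the LLM step. The sketch below assumes the `term` and `correction` columns of the `vocabulary` table have been loaded into a dict; `apply_vocabulary` is a hypothetical helper, not an existing GAIA function.

```python
import re


def apply_vocabulary(text: str, corrections: dict[str, str]) -> str:
    """Replace known mishearings with their corrections, ignoring case."""
    # Longest terms first, so a longer mishearing wins over a shorter one it contains.
    for term in sorted(corrections, key=len, reverse=True):
        pattern = re.compile(r"\b" + re.escape(term) + r"\b", re.IGNORECASE)
        # A function replacement avoids backslash handling in the correction text.
        text = pattern.sub(lambda _match, fix=corrections[term]: fix, text)
    return text


corrections = {"lennon aid": "Lemonade", "cue wen": "Qwen"}
print(apply_vocabulary("we're using lennon aid server with the cue wen model", corrections))
```

Word-boundary matching keeps the pass from rewriting substrings of unrelated words; capitalization of the rest of the sentence is still left to the LLM.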

### Command Mode

Voice-edit stored content:

```
User: [speaks] "Edit memo 12 — make it more concise"
Agent: [rewrites via LLM] "Updated memo #12."

User: [speaks] "Turn memo 13 into bullet points"
Agent: [reformats via LLM] "Done — memo #13 reformatted."

User: [speaks] "Fix the spelling of Lemonade in all my memos"
Agent: [batch-corrects] "Fixed 3 occurrences across memos #8, #11, and #12."
```
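Before an edit request reaches the LLM, the agent needs to know which entry it targets. The following is a rough sketch of that step under the assumption that single-entry commands name the entry as "memo N" or "meeting N"; `parse_edit_command` and `EditCommand` are hypothetical names.

```python
import re
from typing import NamedTuple, Optional


class EditCommand(NamedTuple):
    entry_type: str   # "memo" or "meeting"
    entry_id: int
    instruction: str  # free-form text handed to the LLM


# Entry reference, optional separators (spaces, punctuation, dashes), then the instruction.
_COMMAND = re.compile(r"\b(memo|meeting)\s+#?(\d+)\b[\s,:;\u2014-]*(.*)", re.IGNORECASE)


def parse_edit_command(utterance: str) -> Optional[EditCommand]:
    """Extract the target entry and instruction, or None if no entry is named."""
    match = _COMMAND.search(utterance)
    if match is None:
        return None
    entry_type, entry_id, rest = match.groups()
    return EditCommand(entry_type.lower(), int(entry_id), rest.strip())
```

Utterances that name no single entry, such as the batch spelling fix above, return `None` here and would need a separate path.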

---

## Markdown Note Export

Every entry is stored in the database and also exported as a markdown file for portability and readability.

### Markdown Output Format

**Memos** (`~/.gaia/voice/notes/memo_012.md`):
```markdown
---
id: 12
type: memo
title: Design Team Meeting
labels: [gpu, budget, launch, infrastructure]
category: engineering
created: 2026-02-27T14:05:00
enhancement: clean
---

# Design Team Meeting

Meeting with design team. Decided to use the Flux model for the image
pipeline. Launch target is March 15th. Budget approved for two additional GPUs.

**Tags:** gpu, budget, launch, infrastructure
```

**Meetings** (`~/.gaia/voice/notes/meeting_007.md`):
```markdown
---
id: 7
type: meeting
title: Q2 Planning
labels: [roadmap, npu, gpu, budget, action-items]
category: planning
duration: 31 min
word_count: 4230
created: 2026-02-27T14:01:00
enhancement: structured
---

# Q2 Planning — Feb 27, 2026

## Attendees
(auto-detected from transcript if speaker identification available)

## Discussion

### NPU Optimization
- Work is ahead of schedule
- Performance targets exceeded on Ryzen AI 300 series

### Infrastructure Budget
- Approved two additional GPUs for inference cluster
- Sarah to handle procurement

## Action Items
- [ ] Sarah: Prepare customer demo by March 10th
- [ ] Team: Finalize Q2 milestones by March 3rd

**Tags:** roadmap, npu, gpu, budget, action-items
```

### Export Behavior

- **Auto-export:** Every entry is automatically saved as `.md` in `~/.gaia/voice/notes/`
- **Sync:** The database is the source of truth; markdown files are regenerated on edit
- **Batch export:** `gaia voice --export ./my-notes/` exports all entries as markdown
- **Custom template:** Users can override the markdown template
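The memo format and file naming shown above can be sketched as follows. The function names and the dict-shaped `entry` are assumptions for illustration; the frontmatter keys and the `memo_012.md` naming follow the examples in this section.

```python
from pathlib import Path


def note_filename(entry_type: str, entry_id: int) -> str:
    """Zero-pad the id to three digits, e.g. memo_012.md."""
    return f"{entry_type}_{entry_id:03d}.md"


def render_note(entry: dict) -> str:
    """Render a memo as YAML frontmatter followed by a markdown body."""
    labels = ", ".join(entry["labels"])
    frontmatter = [
        "---",
        f"id: {entry['id']}",
        f"type: {entry['type']}",
        f"title: {entry['title']}",
        f"labels: [{labels}]",
        f"category: {entry['category']}",
        f"created: {entry['created']}",
        f"enhancement: {entry['enhancement']}",
        "---",
    ]
    body = ["", f"# {entry['title']}", "", entry["content"], "", f"**Tags:** {labels}", ""]
    return "\n".join(frontmatter + body)


def export_note(entry: dict, notes_dir: Path) -> Path:
    """Write (or overwrite) the markdown file for one entry and return its path."""
    notes_dir.mkdir(parents=True, exist_ok=True)
    path = notes_dir / note_filename(entry["type"], entry["id"])
    path.write_text(render_note(entry), encoding="utf-8")
    return path
```

Because the database is the source of truth, overwriting the file on each export is safe. This sketch does not quote YAML values, so a title containing a colon would need escaping in a real implementation.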

---

## Auto-Labeling and Categorization

The LLM automatically assigns labels and a category to each entry upon creation.

### Label Generation

```
LLM prompt: "Given this note, generate 3-6 short labels (1-2 words each)
that capture the key topics. Return as a JSON array of strings."

Input: "Meeting with design team. Decided to use the Flux model..."
Output: ["gpu", "budget", "launch", "flux-model", "design-team", "infrastructure"]
```
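LLM replies do not always follow the requested format, so the label output likely needs normalizing before storage. The sketch below is one possible approach; `parse_labels` is a hypothetical helper that accepts either a JSON array or a comma-separated string and enforces the six-label cap.

```python
import json
import re


def parse_labels(llm_output: str, max_labels: int = 6) -> list[str]:
    """Turn the model's reply into a deduplicated list of hyphenated lowercase labels."""
    text = llm_output.strip()
    try:
        parsed = json.loads(text)
    except json.JSONDecodeError:
        parsed = None
    # Fall back to comma splitting when the reply is not a JSON array.
    items = parsed if isinstance(parsed, list) else text.split(",")
    labels: list[str] = []
    for item in items:
        # Lowercase, then collapse runs of non-alphanumerics into single hyphens.
        slug = re.sub(r"[^a-z0-9]+", "-", str(item).lower()).strip("-")
        if slug and slug not in labels:
            labels.append(slug)
    return labels[:max_labels]
```

Normalizing to one canonical form matters because `labels.name` is `UNIQUE`: without it, "GPU" and "gpu" would become two separate labels.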

### Category System

Predefined categories (LLM selects the best match):

| Category | Description | Example |
|----------|-------------|---------|
| `engineering` | Technical decisions, code, architecture | "Decided to use Flux model" |
| `planning` | Roadmaps, timelines, milestones | "Q2 planning meeting" |
| `action-items` | Tasks, to-dos, assignments | "Sarah to prepare demo" |
| `ideas` | Brainstorming, feature proposals | "What if we added voice to SD agent" |
| `reference` | Facts, specs, documentation notes | "NPU supports INT8 quantization" |
| `personal` | Personal notes, reminders | "Pick up groceries" |
| `other` | Anything else | Fallback |
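Since the category must be one of the seven predefined values, the model's reply can be validated and mapped to the `other` fallback when it does not match. This is a minimal sketch; `normalize_category` is a hypothetical name.

```python
CATEGORIES = (
    "engineering", "planning", "action-items",
    "ideas", "reference", "personal", "other",
)


def normalize_category(llm_output: str) -> str:
    """Map the model's reply onto a known category, falling back to 'other'."""
    candidate = llm_output.strip().strip("\"'`.").lower()
    # Tolerate "Action Items" or "action_items" style replies.
    candidate = candidate.replace(" ", "-").replace("_", "-")
    return candidate if candidate in CATEGORIES else "other"
```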

### Database Schema for Labels

```sql
CREATE TABLE labels (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    name TEXT NOT NULL UNIQUE,
    color TEXT,              -- hex color for UI display
    entry_count INTEGER DEFAULT 0
);

CREATE TABLE entry_labels (
    entry_id INTEGER REFERENCES entries(id),
    label_id INTEGER REFERENCES labels(id),
    PRIMARY KEY (entry_id, label_id)
);
```
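With this schema, attaching labels to an entry means creating missing labels, linking them, and keeping `entry_count` accurate. The following sketch uses Python's standard `sqlite3` module directly rather than `DatabaseMixin`, and the `entries` table is reduced to two columns for brevity; `attach_labels` and `label_counts` are hypothetical helpers.

```python
import sqlite3

SCHEMA = """
CREATE TABLE entries (id INTEGER PRIMARY KEY AUTOINCREMENT, title TEXT);
CREATE TABLE labels (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    name TEXT NOT NULL UNIQUE,
    color TEXT,
    entry_count INTEGER DEFAULT 0
);
CREATE TABLE entry_labels (
    entry_id INTEGER REFERENCES entries(id),
    label_id INTEGER REFERENCES labels(id),
    PRIMARY KEY (entry_id, label_id)
);
"""


def attach_labels(conn: sqlite3.Connection, entry_id: int, names: list[str]) -> None:
    """Link labels to an entry, creating labels as needed and updating counts."""
    with conn:  # commit everything together, or roll back on error
        for name in names:
            conn.execute("INSERT OR IGNORE INTO labels (name) VALUES (?)", (name,))
            (label_id,) = conn.execute(
                "SELECT id FROM labels WHERE name = ?", (name,)
            ).fetchone()
            already_linked = conn.execute(
                "SELECT 1 FROM entry_labels WHERE entry_id = ? AND label_id = ?",
                (entry_id, label_id),
            ).fetchone()
            if already_linked:
                continue  # do not count the same entry twice
            conn.execute(
                "INSERT INTO entry_labels (entry_id, label_id) VALUES (?, ?)",
                (entry_id, label_id),
            )
            conn.execute(
                "UPDATE labels SET entry_count = entry_count + 1 WHERE id = ?",
                (label_id,),
            )


def label_counts(conn: sqlite3.Connection) -> dict[str, int]:
    """Return {label name: number of entries carrying it}."""
    return dict(conn.execute("SELECT name, entry_count FROM labels"))
```

Storing `entry_count` denormalized makes the sidebar counts cheap to read, at the cost of having to update it on every link and unlink.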

### Querying by Label

```
$ gaia voice --search --label gpu
  #12  Design team meeting    [gpu, budget, launch]         Feb 27
  #7   Q2 Planning            [roadmap, npu, gpu, budget]   Feb 27

$ gaia voice --search --category planning
  #7   Q2 Planning            [roadmap, npu, gpu, budget]   Feb 27
  #3   Sprint Retrospective   [sprint, velocity, planning]  Feb 21
```
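Behind `--search --label`, the lookup is presumably a join across the three tables. The sketch below shows one such query using the standard `sqlite3` module; the trimmed tables and the sample rows in `demo_connection` exist only for illustration.

```python
import sqlite3


def demo_connection() -> sqlite3.Connection:
    """Build an in-memory database with two sample entries (illustrative data)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE entries (id INTEGER PRIMARY KEY, title TEXT, category TEXT);
        CREATE TABLE labels (id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE);
        CREATE TABLE entry_labels (
            entry_id INTEGER, label_id INTEGER, PRIMARY KEY (entry_id, label_id)
        );
        INSERT INTO entries VALUES (12, 'Design team meeting', 'engineering');
        INSERT INTO entries VALUES (7, 'Q2 Planning', 'planning');
        INSERT INTO labels VALUES (1, 'gpu');
        INSERT INTO labels VALUES (2, 'roadmap');
        INSERT INTO entry_labels VALUES (12, 1);
        INSERT INTO entry_labels VALUES (7, 1);
        INSERT INTO entry_labels VALUES (7, 2);
    """)
    return conn


def entries_with_label(conn: sqlite3.Connection, label: str) -> list[tuple[int, str]]:
    """Return (id, title) for every entry carrying the label, newest id first."""
    rows = conn.execute(
        """
        SELECT e.id, e.title
        FROM entries AS e
        JOIN entry_labels AS el ON el.entry_id = e.id
        JOIN labels AS l ON l.id = el.label_id
        WHERE l.name = ?
        ORDER BY e.id DESC
        """,
        (label,),
    )
    return rows.fetchall()
```

A category filter needs no join, since `category` is a plain column on `entries`.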

---

## Simple Web UI for Viewing Notes

A lightweight web viewer served by GAIA's existing FastAPI server, following the same HTML template pattern used by the summarize app (`src/gaia/apps/summarize/templates/`).

### UI Features

```
┌──────────────────────────────────────────────────────────────────┐
│  GAIA Voice Notes                              🔍 Search...      │
├──────────────┬───────────────────────────────────────────────────┤
│              │                                                   │
│  CATEGORIES  │  # Design Team Meeting                           │
│  ───────────│  📅 Feb 27, 2026  ·  memo  ·  42 words           │
│  All (24)    │                                                   │
│  engineering │  Meeting with design team. Decided to use the    │
│  planning    │  Flux model for the image pipeline. Launch       │
│  action-items│  target is March 15th. Budget approved for two   │
│  ideas       │  additional GPUs.                                │
│  reference   │                                                   │
│              │  Labels: gpu  budget  launch  infrastructure      │
│  LABELS      │                                                   │
│  ───────────│  ┌─────────┐ ┌──────────┐ ┌────────┐            │
│  gpu (3)     │  │ ✏️ Edit  │ │ 📋 Copy MD│ │ 🗑 Del  │            │
│  budget (2)  │  └─────────┘ └──────────┘ └────────┘            │
│  launch (2)  │                                                   │
│  npu (1)     │───────────────────────────────────────────────── │
│              │                                                   │
│  RECENT      │  # Q2 Planning                                   │
│  ───────────│  📅 Feb 27, 2026  ·  meeting  ·  31 min          │
│  Meeting #7  │                                                   │
│  Memo #13    │  ## NPU Optimization                             │
│  Memo #12    │  - Work is ahead of schedule...                  │
│              │                                                   │
└──────────────┴───────────────────────────────────────────────────┘
```

### Implementation

| File | Content |
|------|---------|
| `src/gaia/apps/voice/webui/index.html` | Single-page app with sidebar (categories, labels, recent) + main content area (rendered markdown) |
| `src/gaia/apps/voice/webui/style.css` | Clean, minimal styling — dark/light mode |
| `src/gaia/apps/voice/app.config.json` | Electron app config (window size, dev server port) |
| `src/gaia/api/voice_endpoints.py` | FastAPI endpoints for the UI |

### REST Endpoints (served by GAIA API)

| Method | Path | Description |
|--------|------|-------------|
| `GET` | `/api/voice/entries` | List all entries (filterable by type, category, label) |
| `GET` | `/api/voice/entries/{id}` | Get single entry with rendered markdown |
| `GET` | `/api/voice/entries/{id}/raw` | Get raw markdown source |
| `PUT` | `/api/voice/entries/{id}` | Update entry (edit content, labels, category) |
| `DELETE` | `/api/voice/entries/{id}` | Delete entry |
| `GET` | `/api/voice/labels` | List all labels with counts |
| `GET` | `/api/voice/categories` | List categories with counts |
| `GET` | `/api/voice/search?q=...&label=...&category=...` | Search entries |
| `GET` | `/api/voice/export` | Export all entries as zip of markdown files |
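From a client's point of view, the search endpoint combines up to three optional query parameters. The sketch below builds such a URL with the standard library; `search_url` is a hypothetical helper, and the base URL in the usage line is the address shown in the Browse UI demo.

```python
from typing import Optional
from urllib.parse import urlencode


def search_url(
    base_url: str,
    q: Optional[str] = None,
    label: Optional[str] = None,
    category: Optional[str] = None,
) -> str:
    """Build a /api/voice/search URL, omitting parameters that are not set."""
    candidates = (("q", q), ("label", label), ("category", category))
    params = {key: value for key, value in candidates if value}
    path = f"{base_url.rstrip('/')}/api/voice/search"
    query = urlencode(params)  # handles spaces and special characters
    return f"{path}?{query}" if query else path


print(search_url("http://localhost:8080", q="GPU budget", label="gpu"))
```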

### UI Technology

- **Pure HTML/CSS/JS** — no React/build step required (matches summarize app pattern)
- Markdown rendered client-side via lightweight library (e.g., `marked.js`)
- Responsive layout for desktop and mobile
- YAML frontmatter displayed as metadata badges
- **Access via:** `gaia voice --ui` opens in browser, or load in Electron via app config

---

## Demo Scenarios

### Voice Memos
```
$ gaia voice

User: [speaks] "New memo um meeting with design team so we decided to use
       the flux model for the image pipeline uh launch target is march 15th
       and budget approved for two additional gpus"
Agent: [transcribes → LLM cleans → auto-labels → saves to DB + markdown]
       "Saved memo #12 — Design team meeting
        Labels: gpu, budget, launch, infrastructure
        Category: engineering
        Exported to ~/.gaia/voice/notes/memo_012.md"
```

### Meeting Transcription
```
$ gaia voice --meeting "Q2 Planning"

Agent: Recording... (live captions displayed)
  [14:01] "Welcome everyone. Today we're discussing the Q2 roadmap."
  [14:15] "Budget approved for two additional GPUs."
  [Ctrl+C to stop]

Agent: Meeting saved — 4,230 words, 31 minutes.
       Labels: roadmap, npu, gpu, budget, action-items
       Category: planning
       Exported to ~/.gaia/voice/notes/meeting_007.md
```

### Browse UI
```
$ gaia voice --ui
Agent: Voice Notes UI running at http://localhost:8080/voice
```

### List, Search & Export
```
$ gaia voice --list
  TYPE      ID   TITLE                  CATEGORY      LABELS                        DATE
  meeting   #7   Q2 Planning            planning      roadmap, npu, gpu, budget     Feb 27
  memo      #13  Customer demo prep     action-items  demo, customer, preparation   Feb 27
  memo      #12  Design team meeting    engineering   gpu, budget, launch           Feb 27

$ gaia voice --search "GPU budget"
  Meeting #7 [14:15]: "Budget approved for two additional GPUs..."

$ gaia voice --search --label gpu
  #12  Design team meeting    [gpu, budget, launch]
  #7   Q2 Planning            [roadmap, npu, gpu, budget]

$ gaia voice --export ./my-notes/
  Exported 24 entries to ./my-notes/ (14 memos, 10 meetings)
```

---

## Architecture

```python
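from pathlib import Path

# Agent and DatabaseMixin are GAIA's existing base classes; import them from
# their modules in the GAIA package.
class VoiceAgent(Agent, DatabaseMixin):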
    """Voice-first agent with LLM-enhanced transcription, auto-labeling,
    markdown export, database storage, and semantic search."""

    def __init__(self, db_path=".gaia/voice.db", notes_dir="~/.gaia/voice/notes", **kwargs):
        super().__init__(**kwargs)
        self.init_db(db_path)
        self.notes_dir = Path(notes_dir).expanduser()
        self.notes_dir.mkdir(parents=True, exist_ok=True)

        if not self.table_exists("entries"):
            self.execute("""
                CREATE TABLE entries (
                    id INTEGER PRIMARY KEY AUTOINCREMENT,
                    type TEXT NOT NULL,              -- 'memo' or 'meeting'
                    title TEXT,
                    content_raw TEXT NOT NULL,        -- original ASR output
                    content TEXT NOT NULL,            -- LLM-enhanced version
                    content_markdown TEXT NOT NULL,   -- full markdown with frontmatter
                    category TEXT DEFAULT 'other',
                    enhancement_mode TEXT DEFAULT 'clean',
                    duration_seconds INTEGER,
                    word_count INTEGER,
                    created_at TEXT DEFAULT CURRENT_TIMESTAMP,
                    updated_at TEXT DEFAULT CURRENT_TIMESTAMP
                )
            """)

        if not self.table_exists("segments"):
            self.execute("""
                CREATE TABLE segments (
                    id INTEGER PRIMARY KEY AUTOINCREMENT,
                    entry_id INTEGER REFERENCES entries(id),
                    timestamp_offset REAL,
                    text_raw TEXT NOT NULL,
                    text TEXT NOT NULL,
                    speaker TEXT
                )
            """)

        if not self.table_exists("labels"):
            self.execute("""
                CREATE TABLE labels (
                    id INTEGER PRIMARY KEY AUTOINCREMENT,
                    name TEXT NOT NULL UNIQUE,
                    color TEXT,
                    entry_count INTEGER DEFAULT 0
                )
            """)

        if not self.table_exists("entry_labels"):
            self.execute("""
                CREATE TABLE entry_labels (
                    entry_id INTEGER REFERENCES entries(id),
                    label_id INTEGER REFERENCES labels(id),
                    PRIMARY KEY (entry_id, label_id)
                )
            """)

        if not self.table_exists("vocabulary"):
            self.execute("""
                CREATE TABLE vocabulary (
                    id INTEGER PRIMARY KEY AUTOINCREMENT,
                    term TEXT NOT NULL UNIQUE,
                    correction TEXT,
                    context TEXT
                )
            """)
```
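Storing a memo in the `entries` table could look like the sketch below. It uses the standard `sqlite3` module because the issue only shows `init_db`, `table_exists`, and `execute` on `DatabaseMixin`; `insert_entry` is a hypothetical helper, and the table is trimmed to the columns it writes.

```python
import sqlite3

ENTRIES_SCHEMA = """
CREATE TABLE entries (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    type TEXT NOT NULL,
    title TEXT,
    content_raw TEXT NOT NULL,
    content TEXT NOT NULL,
    content_markdown TEXT NOT NULL,
    enhancement_mode TEXT DEFAULT 'clean',
    word_count INTEGER
)
"""


def insert_entry(
    conn: sqlite3.Connection,
    entry_type: str,
    title: str,
    content_raw: str,
    content: str,
    content_markdown: str,
    enhancement_mode: str = "clean",
) -> int:
    """Insert one entry, keeping both raw and enhanced text, and return its id."""
    if entry_type not in ("memo", "meeting"):
        raise ValueError(f"unknown entry type: {entry_type!r}")
    word_count = len(content.split())  # counted on the enhanced text
    with conn:
        cursor = conn.execute(
            "INSERT INTO entries (type, title, content_raw, content,"
            " content_markdown, enhancement_mode, word_count)"
            " VALUES (?, ?, ?, ?, ?, ?, ?)",
            (entry_type, title, content_raw, content,
             content_markdown, enhancement_mode, word_count),
        )
    return cursor.lastrowid
```

Keeping `content_raw` alongside `content` means an entry can be re-enhanced later in a different mode without re-recording.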

---

## Agent Capabilities

| Capability | Description |
|------------|-------------|
| **Memo dictation** | Voice notes → LLM-enhanced transcription → auto-title → auto-label → categorize → store + export md |
| **Meeting recording** | Long-form recording with live captions, timestamped segments, structured markdown output |
| **LLM enhancement** | Filler removal, punctuation, grammar, proper noun correction, formatting |
| **Auto-labeling** | LLM generates 3-6 topic labels per entry, stored in normalized label table |
| **Auto-categorization** | LLM assigns one category (engineering, planning, action-items, ideas, reference, personal, other) |
| **Markdown export** | Every entry exported as `.md` with YAML frontmatter; batch export supported |
| **Command mode** | Voice-edit stored content ("make this concise", "relabel this", "change category") |
| **Custom vocabulary** | Domain-specific word list for proper noun correction |
| **Voice queries** | Ask questions about stored content via reranking + LLM |
| **File import** | Transcribe pre-recorded WAV/audio files via REST endpoint |
| **Web UI** | Browse, search, filter by label/category, view rendered markdown |
| **List & search** | CLI: browse and search with label/category filters |

---

## Demo Deliverables

| File | Content |
|------|---------|
| `src/gaia/agents/voice/agent.py` | `VoiceAgent(Agent, DatabaseMixin)` — full agent implementation |
| `src/gaia/agents/voice/prompts.py` | System prompts for enhancement, labeling, categorization |
| `src/gaia/agents/voice/markdown.py` | Markdown generation with YAML frontmatter |
| `src/gaia/apps/voice/webui/index.html` | Single-page note viewer (HTML/CSS/JS) |
| `src/gaia/apps/voice/webui/style.css` | UI styling (dark/light mode) |
| `src/gaia/apps/voice/app.config.json` | Electron app config |
| `src/gaia/api/voice_endpoints.py` | FastAPI REST endpoints for UI |
| `src/gaia/cli.py` | `gaia voice` subcommand with all flags |
| `examples/voice_agent_demo.md` | Walkthrough of all workflows |
| `tests/unit/test_voice_agent.py` | Unit tests (mocked ASR, LLM, database) |

---

## What This Exercises

- **Streaming ASR** (real-time transcription via WebSocket) — new v9.4.1
- **Streaming TTS** (voice responses via audio/speech) — new v9.4.1
- **REST audio transcription** (file import via `/audio/transcriptions`) — new v9.4.1
- **Reranking** (accurate search over stored content) — new v9.4.1
- **LLM chat completions** (enhancement, labeling, categorization, command mode) — existing
- **DatabaseMixin** (structured storage with SQLite, normalized labels) — existing
- **FastAPI** (REST endpoints for UI) — existing
- **Electron** (optional desktop app wrapper) — existing
- Auto-detection of Lemonade audio backends — new (#386)

---

## LLM Prompts

### Enhancement Prompt
```
You are a transcription enhancer. Given raw speech-to-text output, produce
clean, well-formatted text.

Rules:
- Remove filler words (um, uh, like, you know, so)
- Add proper punctuation and capitalization
- Fix obvious grammar errors while preserving the speaker's intent
- Correct known terms from the vocabulary list: {vocabulary}
- Do NOT add information that wasn't spoken
- Do NOT change the meaning or omit substantive content

Enhancement mode: {mode}
- clean: Fix errors, preserve structure
- structured: Fix errors + organize into sections/bullets
- verbatim: Return as-is (no changes)

Raw transcription:
{raw_text}
```

### Labeling Prompt
```
Given this note, generate 3-6 short labels (1-2 words each) that capture
the key topics. Return as a JSON array of strings.

Note: {content}
Output: ["label1", "label2", ...]
```

### Categorization Prompt
```
Classify this note into exactly one category.
Categories: engineering, planning, action-items, ideas, reference, personal, other

Note: {content}
Output: category_name
```

---

## Dependencies

- #386 (TalkSDK auto-detection integration)
- #372 (streaming ASR)
- #373 (server-side TTS)
- #374 (REST audio transcription — for file import)
- #375 (reranking — for accurate search)

## Acceptance Criteria

- [ ] Single VoiceAgent handles memos, meetings, editing, and queries
- [ ] LLM enhancement pipeline cleans raw ASR output before storage
- [ ] Both `content_raw` and `content` (enhanced) stored in database
- [ ] Three enhancement modes: clean, structured, verbatim
- [ ] Auto-labeling generates 3-6 topic labels per entry
- [ ] Auto-categorization assigns one of 7 categories
- [ ] Labels stored in normalized table with counts
- [ ] Every entry exported as markdown with YAML frontmatter
- [ ] Batch export via `gaia voice --export`
- [ ] Command mode allows voice-editing stored content
- [ ] Custom vocabulary table improves domain-specific transcription
- [ ] Web UI is served via FastAPI, browsable at `/voice`
- [ ] UI supports: list, search, filter by label/category, view rendered markdown
- [ ] REST endpoints for CRUD on entries, labels, categories
- [ ] CLI supports `--list`, `--search`, `--label`, `--category`, `--export`, `--ui`
- [ ] Unit tests pass with mocked ASR, LLM, and database
- [ ] Demo walkthrough documented
