Summary
Create a Voice Agent that combines voice memo dictation, meeting transcription, and voice-based querying into a single domain agent. This agent handles all voice-first workflows: it records, transcribes, uses an LLM to automatically clean and improve transcriptions (similar to Wispr Flow), exports notes as markdown, auto-labels and categorizes entries, stores them in a database, answers questions about stored content, and provides a simple web UI for browsing and viewing notes. It uses Lemonade v9.4.1 streaming ASR, TTS, and reranking.
LLM-Powered Transcription Enhancement
Raw ASR output is noisy: filler words, missing punctuation, run-on sentences, misheard terms. The Voice Agent pipes every transcription through an LLM post-processing step before storage:
What the LLM Fixes
| Issue | Raw ASR | Enhanced |
| --- | --- | --- |
| Punctuation | "launch target is march 15th budget approved for two gpus" | "Launch target is March 15th. Budget approved for two GPUs." |
| Grammar | "me and the team was discussing" | "The team and I were discussing" |
| Proper nouns | "we're using lennon aid server" | "We're using Lemonade Server" |
| Technical terms | "the cue wen model" | "the Qwen model" |
Enhancement Modes
| Mode | Behavior | Use Case |
| --- | --- | --- |
| clean (default) | Remove fillers, fix punctuation/grammar, preserve meaning exactly | Quick memos |
| structured | Clean + organize into sections/bullet points with headings | Meeting minutes |
| verbatim | No LLM processing, raw ASR output | Legal/compliance recording |
Context-Aware Enhancement
Domain vocabulary: Custom word list (e.g., "Lemonade", "Qwen", "GAIA", "NPU") stored in vocabulary table
Previous entries: Recent entries provide context for ambiguous terms
User corrections: When user manually corrects a transcription, agent learns the correction
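The vocabulary lookup could run as a deterministic pre-pass before the LLM sees the text. The sketch below is one way to do that, under assumptions the issue does not confirm: that each vocabulary row maps a misheard `term` to its preferred `correction`, and that a helper named `apply_vocabulary` (hypothetical) receives those rows as a dict.

```python
import re


def apply_vocabulary(text: str, vocabulary: dict[str, str]) -> str:
    """Replace misheard terms with their corrections, ignoring case.

    `vocabulary` maps a misheard form to the preferred spelling
    (an assumed shape for the rows of the vocabulary table).
    """
    # Longest terms first so multi-word phrases win over their substrings.
    for term in sorted(vocabulary, key=len, reverse=True):
        pattern = re.compile(r"\b" + re.escape(term) + r"\b", re.IGNORECASE)
        # A function replacement avoids backslash handling in the correction.
        text = pattern.sub(lambda _m, c=vocabulary[term]: c, text)
    return text


vocab = {"lennon aid": "Lemonade", "cue wen": "Qwen"}
print(apply_vocabulary("we're using lennon aid server with the cue wen model", vocab))
```

A pre-pass like this keeps known corrections out of the LLM's hands, so they stay reproducible; capitalization and anything context-dependent is still left to the enhancement step.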
Command Mode
Voice-edit stored content:
User: [speaks] "Edit memo 12 — make it more concise"
Agent: [rewrites via LLM] "Updated memo #12."
User: [speaks] "Turn memo 13 into bullet points"
Agent: [reformats via LLM] "Done — memo #13 reformatted."
User: [speaks] "Fix the spelling of Lemonade in all my memos"
Agent: [batch-corrects] "Fixed 3 occurrences across memos #8, #11, and #12."
Markdown Note Export
All entries are both stored in the database and exported as markdown files for portability and readability.
Markdown Output Format
Memos (~/.gaia/voice/notes/memo_012.md):
---
id: 12
type: memo
title: Design Team Meeting
labels: [gpu, budget, launch, infrastructure]
category: engineering
created: 2026-02-27T14:05:00
enhancement: clean
---
# Design Team Meeting
Meeting with design team. Decided to use the Flux model for the image
pipeline. Launch target is March 15th. Budget approved for two additional GPUs.
**Tags:** gpu, budget, launch, infrastructure
Meetings (~/.gaia/voice/notes/meeting_007.md):
---
id: 7
type: meeting
title: Q2 Planning
labels: [roadmap, npu, gpu, budget, action-items]
category: planning
duration: 31 min
word_count: 4230
created: 2026-02-27T14:01:00
enhancement: structured
---
# Q2 Planning — Feb 27, 2026
## Attendees
(auto-detected from transcript if speaker identification available)
## Discussion
### NPU Optimization
- Work is ahead of schedule
- Performance targets exceeded on Ryzen AI 300 series
### Infrastructure Budget
- Approved two additional GPUs for inference cluster
- Sarah to handle procurement
## Action Items
- [ ] Sarah: Prepare customer demo by March 10th
- [ ] Team: Finalize Q2 milestones by March 3rd
**Tags:** roadmap, npu, gpu, budget, action-items
Export Behavior
Auto-export: Every entry automatically saved as .md in ~/.gaia/voice/notes/
Sync: Database is source of truth; markdown files regenerated on edit
Batch export: gaia voice --export ./my-notes/ exports all entries as markdown
Custom template: Users can override the markdown template
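The memo format and file naming shown above could be produced by something like the following. This is a sketch, not the planned `markdown.py`: `render_markdown` and `note_path` are hypothetical names, the entry is assumed to arrive as a plain dict, and only the memo fields are covered (meetings add `duration` and `word_count`).

```python
from pathlib import Path


def render_markdown(entry: dict) -> str:
    """Render a memo as markdown with YAML frontmatter (memo fields only)."""
    labels = ", ".join(entry["labels"])
    frontmatter = [
        "---",
        f"id: {entry['id']}",
        f"type: {entry['type']}",
        f"title: {entry['title']}",
        f"labels: [{labels}]",
        f"category: {entry['category']}",
        f"created: {entry['created']}",
        f"enhancement: {entry['enhancement']}",
        "---",
    ]
    body = [f"# {entry['title']}", "", entry["content"], "", f"**Tags:** {labels}"]
    return "\n".join(frontmatter + body) + "\n"


def note_path(notes_dir: Path, entry: dict) -> Path:
    """Build the file name pattern used in the examples, e.g. memo_012.md."""
    return notes_dir / f"{entry['type']}_{entry['id']:03d}.md"


memo = {
    "id": 12,
    "type": "memo",
    "title": "Design Team Meeting",
    "labels": ["gpu", "budget", "launch", "infrastructure"],
    "category": "engineering",
    "created": "2026-02-27T14:05:00",
    "enhancement": "clean",
    "content": "Meeting with design team. Launch target is March 15th.",
}
print(note_path(Path("notes"), memo))
print(render_markdown(memo))
```

Plain string formatting does not escape YAML, so a title containing a colon or a leading quote would need quoting or a real YAML serializer. Because the database is the source of truth, regenerating a file on edit only requires calling the renderer again and overwriting the same path.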
Auto-Labeling and Categorization
The LLM automatically assigns labels and a category to each entry upon creation.
Label Generation
LLM prompt: "Given this note, generate 3-6 short labels (1-2 words each)
that capture the key topics. Return as comma-separated list."
Input: "Meeting with design team. Decided to use the Flux model..."
Output: "gpu, budget, launch, flux-model, design-team, infrastructure"
Category System
Predefined categories (LLM selects the best match):
| Category | Description | Example |
| --- | --- | --- |
| engineering | Technical decisions, code, architecture | "Decided to use Flux model" |
| planning | Roadmaps, timelines, milestones | "Q2 planning meeting" |
| action-items | Tasks, to-dos, assignments | "Sarah to prepare demo" |
| ideas | Brainstorming, feature proposals | "What if we added voice to SD agent" |
| reference | Facts, specs, documentation notes | "NPU supports INT8 quantization" |
| personal | Personal notes, reminders | "Pick up groceries" |
| other | Anything else | Fallback |
Database Schema for Labels
CREATE TABLE labels (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    name TEXT NOT NULL UNIQUE,
    color TEXT, -- hex color for UI display
    entry_count INTEGER DEFAULT 0
);
CREATE TABLE entry_labels (
    entry_id INTEGER REFERENCES entries(id),
    label_id INTEGER REFERENCES labels(id),
    PRIMARY KEY (entry_id, label_id)
);
Querying by Label
$ gaia voice --search --label gpu
#12 Design team meeting [gpu, budget, launch] Feb 27
#7 Q2 Planning [roadmap, npu, gpu, budget] Feb 27
$ gaia voice --search --category planning
#7 Q2 Planning [roadmap, npu, gpu, budget] Feb 27
#3 Sprint Retrospective [sprint, velocity, planning] Feb 21
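A label search like the one above comes down to a join across `entries`, `entry_labels`, and `labels`. The sketch below uses Python's built-in `sqlite3` with a trimmed-down copy of the schema (the `color` and `entry_count` columns and most `entries` columns are omitted); `add_label` and `entries_with_label` are hypothetical helpers, not part of GAIA's `DatabaseMixin`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE entries (id INTEGER PRIMARY KEY, title TEXT);
CREATE TABLE labels (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    name TEXT NOT NULL UNIQUE
);
CREATE TABLE entry_labels (
    entry_id INTEGER REFERENCES entries(id),
    label_id INTEGER REFERENCES labels(id),
    PRIMARY KEY (entry_id, label_id)
);
""")


def add_label(conn, entry_id, name):
    """Create the label if needed, then link it to the entry."""
    conn.execute("INSERT OR IGNORE INTO labels (name) VALUES (?)", (name,))
    (label_id,) = conn.execute(
        "SELECT id FROM labels WHERE name = ?", (name,)
    ).fetchone()
    conn.execute(
        "INSERT OR IGNORE INTO entry_labels VALUES (?, ?)", (entry_id, label_id)
    )


def entries_with_label(conn, name):
    """Return (id, title) for entries carrying the label, newest id first."""
    rows = conn.execute(
        """SELECT e.id, e.title
           FROM entries e
           JOIN entry_labels el ON el.entry_id = e.id
           JOIN labels l ON l.id = el.label_id
           WHERE l.name = ?
           ORDER BY e.id DESC""",
        (name,),
    )
    return rows.fetchall()


conn.execute("INSERT INTO entries VALUES (7, 'Q2 Planning')")
conn.execute("INSERT INTO entries VALUES (12, 'Design team meeting')")
conn.execute("INSERT INTO entries VALUES (13, 'Customer demo prep')")
for entry_id, name in [(7, "gpu"), (7, "roadmap"), (12, "gpu"), (12, "budget"), (13, "demo")]:
    add_label(conn, entry_id, name)

print(entries_with_label(conn, "gpu"))
# [(12, 'Design team meeting'), (7, 'Q2 Planning')]
```

The composite primary key on `entry_labels` together with `INSERT OR IGNORE` makes re-labeling idempotent, which matters if labels are regenerated whenever an entry is edited.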
Simple Web UI for Viewing Notes
A lightweight web viewer served by GAIA's existing FastAPI server, following the same HTML template pattern used by the summarize app (src/gaia/apps/summarize/templates/).
UI Features
┌──────────────────────────────────────────────────────────────────┐
│ GAIA Voice Notes 🔍 Search... │
├──────────────┬───────────────────────────────────────────────────┤
│ │ │
│ CATEGORIES │ # Design Team Meeting │
│ ───────────│ 📅 Feb 27, 2026 · memo · 42 words │
│ All (24) │ │
│ engineering │ Meeting with design team. Decided to use the │
│ planning │ Flux model for the image pipeline. Launch │
│ action-items│ target is March 15th. Budget approved for two │
│ ideas │ additional GPUs. │
│ reference │ │
│ │ Labels: gpu budget launch infrastructure │
│ LABELS │ │
│ ───────────│ ┌─────────┐ ┌──────────┐ ┌────────┐ │
│ gpu (3) │ │ ✏️ Edit │ │ 📋 Copy MD│ │ 🗑 Del │ │
│ budget (2) │ └─────────┘ └──────────┘ └────────┘ │
│ launch (2) │ │
│ npu (1) │───────────────────────────────────────────────── │
│ │ │
│ RECENT │ # Q2 Planning │
│ ───────────│ 📅 Feb 27, 2026 · meeting · 31 min │
│ Meeting #7 │ │
│ Memo #13 │ ## NPU Optimization │
│ Memo #12 │ - Work is ahead of schedule... │
│ │ │
└──────────────┴───────────────────────────────────────────────────┘
Implementation
| File | Content |
| --- | --- |
| src/gaia/apps/voice/webui/index.html | Single-page app with sidebar (categories, labels, recent) + main content area (rendered markdown) |
| src/gaia/apps/voice/webui/style.css | Clean, minimal styling with dark/light mode |
| src/gaia/apps/voice/app.config.json | Electron app config (window size, dev server port) |
| src/gaia/api/voice_endpoints.py | FastAPI endpoints for the UI |
REST Endpoints (served by GAIA API)
| Method | Path | Description |
| --- | --- | --- |
| GET | /api/voice/entries | List all entries (filterable by type, category, label) |
| GET | /api/voice/entries/{id} | Get single entry with rendered markdown |
| GET | /api/voice/entries/{id}/raw | Get raw markdown source |
| PUT | /api/voice/entries/{id} | Update entry (edit content, labels, category) |
| DELETE | /api/voice/entries/{id} | Delete entry |
| GET | /api/voice/labels | List all labels with counts |
| GET | /api/voice/categories | List categories with counts |
| GET | /api/voice/search?q=...&label=...&category=... | Search entries |
| GET | /api/voice/export | Export all entries as zip of markdown files |
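The filtering behind `GET /api/voice/entries` can be illustrated without any web framework. The sketch below assumes entries are available as dicts with `type`, `category`, and `labels` keys; `filter_entries` is a hypothetical helper, and in the real endpoint the same conditions would more likely become SQL `WHERE` clauses.

```python
def filter_entries(entries, entry_type=None, category=None, label=None):
    """Keep entries matching every filter that was provided (None = no filter)."""
    results = []
    for entry in entries:
        if entry_type is not None and entry["type"] != entry_type:
            continue
        if category is not None and entry["category"] != category:
            continue
        if label is not None and label not in entry["labels"]:
            continue
        results.append(entry)
    return results


entries = [
    {"id": 7, "type": "meeting", "category": "planning",
     "labels": ["roadmap", "npu", "gpu", "budget"]},
    {"id": 13, "type": "memo", "category": "action-items",
     "labels": ["demo", "customer", "preparation"]},
    {"id": 12, "type": "memo", "category": "engineering",
     "labels": ["gpu", "budget", "launch"]},
]

print([e["id"] for e in filter_entries(entries, label="gpu")])
print([e["id"] for e in filter_entries(entries, entry_type="memo", category="engineering")])
```

Treating each query parameter as optional and combining them with AND matches the "filterable by type, category, label" description; whether the actual API combines filters this way is not stated in the issue.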
UI Technology
Pure HTML/CSS/JS — no React/build step required (matches summarize app pattern)
Markdown rendered client-side via lightweight library (e.g., marked.js)
Responsive layout for desktop and mobile
YAML frontmatter displayed as metadata badges
Access via: gaia voice --ui opens in browser, or load in Electron via app config
Demo Scenarios
Voice Memos
$ gaia voice
User: [speaks] "New memo um meeting with design team so we decided to use
the flux model for the image pipeline uh launch target is march 15th
and budget approved for two additional gpus"
Agent: [transcribes → LLM cleans → auto-labels → saves to DB + markdown]
"Saved memo #12 — Design team meeting
Labels: gpu, budget, launch, infrastructure
Category: engineering
Exported to ~/.gaia/voice/notes/memo_012.md"
Meeting Transcription
$ gaia voice --meeting "Q2 Planning"
Agent: Recording... (live captions displayed)
[14:01] "Welcome everyone. Today we're discussing the Q2 roadmap."
[14:15] "Budget approved for two additional GPUs."
[Ctrl+C to stop]
Agent: Meeting saved — 4,230 words, 31 minutes.
Labels: roadmap, npu, gpu, budget, action-items
Category: planning
Exported to ~/.gaia/voice/notes/meeting_007.md
List, Search & Edit
$ gaia voice --list
TYPE     ID   TITLE                CATEGORY      LABELS                       DATE
meeting  #7   Q2 Planning          planning      roadmap, npu, gpu, budget    Feb 27
memo     #13  Customer demo prep   action-items  demo, customer, preparation  Feb 27
memo     #12  Design team meeting  engineering   gpu, budget, launch          Feb 27
$ gaia voice --search "GPU budget"
Meeting #7 [14:15]: "Budget approved for two additional GPUs..."
$ gaia voice --search --label gpu
#12 Design team meeting [gpu, budget, launch]
#7 Q2 Planning [roadmap, npu, gpu, budget]
$ gaia voice --export ./my-notes/
Exported 24 entries to ./my-notes/ (14 memos, 10 meetings)
Architecture
from pathlib import Path

# Agent and DatabaseMixin are GAIA base classes; their import paths are not shown in this issue.


class VoiceAgent(Agent, DatabaseMixin):
    """Voice-first agent with LLM-enhanced transcription, auto-labeling,
    markdown export, database storage, and semantic search."""

    def __init__(self, db_path=".gaia/voice.db", notes_dir="~/.gaia/voice/notes", **kwargs):
        super().__init__(**kwargs)
        self.init_db(db_path)
        self.notes_dir = Path(notes_dir).expanduser()
        self.notes_dir.mkdir(parents=True, exist_ok=True)

        # One row per memo or meeting; keeps both the raw and the enhanced text.
        if not self.table_exists("entries"):
            self.execute("""
                CREATE TABLE entries (
                    id INTEGER PRIMARY KEY AUTOINCREMENT,
                    type TEXT NOT NULL,              -- 'memo' or 'meeting'
                    title TEXT,
                    content_raw TEXT NOT NULL,       -- original ASR output
                    content TEXT NOT NULL,           -- LLM-enhanced version
                    content_markdown TEXT NOT NULL,  -- full markdown with frontmatter
                    category TEXT DEFAULT 'other',
                    enhancement_mode TEXT DEFAULT 'clean',
                    duration_seconds INTEGER,
                    word_count INTEGER,
                    created_at TEXT DEFAULT CURRENT_TIMESTAMP,
                    updated_at TEXT DEFAULT CURRENT_TIMESTAMP
                )
            """)

        # Timestamped transcript segments belonging to an entry.
        if not self.table_exists("segments"):
            self.execute("""
                CREATE TABLE segments (
                    id INTEGER PRIMARY KEY AUTOINCREMENT,
                    entry_id INTEGER REFERENCES entries(id),
                    timestamp_offset REAL,
                    text_raw TEXT NOT NULL,
                    text TEXT NOT NULL,
                    speaker TEXT
                )
            """)

        if not self.table_exists("labels"):
            self.execute("""
                CREATE TABLE labels (
                    id INTEGER PRIMARY KEY AUTOINCREMENT,
                    name TEXT NOT NULL UNIQUE,
                    color TEXT,
                    entry_count INTEGER DEFAULT 0
                )
            """)

        # Many-to-many link between entries and labels.
        if not self.table_exists("entry_labels"):
            self.execute("""
                CREATE TABLE entry_labels (
                    entry_id INTEGER REFERENCES entries(id),
                    label_id INTEGER REFERENCES labels(id),
                    PRIMARY KEY (entry_id, label_id)
                )
            """)

        # Domain terms and learned corrections used during enhancement.
        if not self.table_exists("vocabulary"):
            self.execute("""
                CREATE TABLE vocabulary (
                    id INTEGER PRIMARY KEY AUTOINCREMENT,
                    term TEXT NOT NULL UNIQUE,
                    correction TEXT,
                    context TEXT
                )
            """)
LLM Prompts
Enhancement Prompt
You are a transcription enhancer. Given raw speech-to-text output, produce
clean, well-formatted text.
Rules:
- Remove filler words (um, uh, like, you know, so)
- Add proper punctuation and capitalization
- Fix obvious grammar errors while preserving the speaker's intent
- Correct known terms from the vocabulary list: {vocabulary}
- Do NOT add information that wasn't spoken
- Do NOT change the meaning or omit substantive content
Enhancement mode: {mode}
- clean: Fix errors, preserve structure
- structured: Fix errors + organize into sections/bullets
- verbatim: Return as-is (no changes)
Raw transcription:
{raw_text}
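Filling this prompt and handling the three modes might look like the sketch below. It is illustrative only: `enhance` and the `llm` callable are hypothetical, and `PROMPT_TEMPLATE` is an abbreviated stand-in for the full prompt text. Because the modes table describes verbatim as "No LLM processing", the sketch skips the model call for that mode instead of sending the text with a "return as-is" instruction.

```python
from typing import Callable, Sequence

VALID_MODES = ("clean", "structured", "verbatim")

# Abbreviated stand-in for the full enhancement prompt; same three placeholders.
PROMPT_TEMPLATE = (
    "You are a transcription enhancer.\n"
    "Correct known terms from the vocabulary list: {vocabulary}\n"
    "Enhancement mode: {mode}\n"
    "Raw transcription:\n"
    "{raw_text}"
)


def enhance(
    raw_text: str,
    mode: str,
    vocabulary: Sequence[str],
    llm: Callable[[str], str],
) -> str:
    """Return the enhanced transcription for the given mode.

    `llm` is any callable that takes a prompt and returns the model's text
    (a hypothetical interface, not GAIA's actual LLM client).
    """
    if mode not in VALID_MODES:
        raise ValueError(f"unknown enhancement mode: {mode!r}")
    if mode == "verbatim":
        # Verbatim entries never reach the model, so the raw text cannot drift.
        return raw_text
    prompt = PROMPT_TEMPLATE.format(
        vocabulary=", ".join(vocabulary), mode=mode, raw_text=raw_text
    )
    return llm(prompt).strip()


def fake_llm(prompt: str) -> str:
    """Stand-in model for demonstration; returns a fixed cleaned sentence."""
    return " Launch target is March 15th. \n"


print(enhance("um launch target is march 15th", "clean", ["Lemonade", "Qwen"], fake_llm))
print(enhance("um launch target is march 15th", "verbatim", [], fake_llm))
```

Bypassing the model for verbatim mode guarantees that legal or compliance recordings are stored exactly as transcribed, which a prompt instruction alone cannot promise.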
Labeling Prompt
Given this note, generate 3-6 short labels (1-2 words each) that capture
the key topics. Return as a JSON array of strings.
Note: {content}
Output: ["label1", "label2", ...]
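Model output rarely matches the requested shape exactly, and this document shows two shapes for labels: a JSON array here and a comma-separated list in the Label Generation example. A tolerant parser, sketched below with the hypothetical name `parse_labels`, could accept both. The normalization rules (lowercase, hyphen-joined words, duplicates dropped, capped at six) are assumptions inferred from the sample labels.

```python
import json


def parse_labels(llm_output: str, max_labels: int = 6) -> list[str]:
    """Parse labels from a JSON array or a comma-separated string."""
    text = llm_output.strip()
    try:
        parsed = json.loads(text)
        items = parsed if isinstance(parsed, list) else [text]
    except json.JSONDecodeError:
        # Not valid JSON, so fall back to the comma-separated form.
        items = text.split(",")

    labels = []
    for item in items:
        # Lowercase and join words with hyphens, e.g. "Design Team" -> "design-team".
        label = "-".join(str(item).strip().strip('"').lower().split())
        if label and label not in labels:
            labels.append(label)
    return labels[:max_labels]


print(parse_labels('["GPU", "budget", "Design Team"]'))
print(parse_labels("gpu, budget, launch, gpu"))
```

Normalizing before storage keeps the `UNIQUE` constraint on `labels.name` meaningful, since "GPU" and "gpu" would otherwise become two separate labels.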
Categorization Prompt
Classify this note into exactly one category.
Categories: engineering, planning, action-items, ideas, reference, personal, other
Note: {content}
Output: category_name
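Since the category must be exactly one of seven values, the model's answer is worth validating before it is written to `entries.category`. A minimal sketch, assuming a hypothetical `normalize_category` helper and using `other` as the fallback, as the category table indicates:

```python
CATEGORIES = (
    "engineering",
    "planning",
    "action-items",
    "ideas",
    "reference",
    "personal",
    "other",
)


def normalize_category(llm_output: str) -> str:
    """Map raw model output to a known category, falling back to 'other'."""
    # Trim whitespace, then surrounding periods and quotes, then normalize case and spaces.
    candidate = llm_output.strip().strip(".\"'").lower().replace(" ", "-")
    return candidate if candidate in CATEGORIES else "other"


print(normalize_category(" Planning.\n"))
print(normalize_category("Action Items"))
print(normalize_category("finance"))
```

Falling back rather than raising keeps a note from being lost just because the classifier answered outside the list; the entry can still be recategorized later through the edit endpoint.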
Agent Capabilities
Export: .md with YAML frontmatter; batch export supported
Demo Deliverables
src/gaia/agents/voice/agent.py: VoiceAgent(Agent, DatabaseMixin), full agent implementation
src/gaia/agents/voice/prompts.py
src/gaia/agents/voice/markdown.py
src/gaia/apps/voice/webui/index.html
src/gaia/apps/voice/webui/style.css
src/gaia/apps/voice/app.config.json
src/gaia/api/voice_endpoints.py
src/gaia/cli.py: gaia voice subcommand with all flags
examples/voice_agent_demo.md
tests/unit/test_voice_agent.py
What This Exercises
Streaming ASR (/audio/transcriptions), new in v9.4.1
Dependencies
Acceptance Criteria
content_raw and content (enhanced) stored in database
gaia voice --export
/voice
--list, --search, --label, --category, --export, --ui