A macOS desktop app that records meetings, transcribes them, and produces structured notes, with action items pushed directly as GitHub issues and notes committed to a git repository.
Built with Tauri + Svelte 5 on the frontend, Rust on the backend.
- Records two audio streams in parallel (local mic and remote source, i.e. system audio via BlackHole) with live dual VU meters
- Transcribes both streams live using local whisper.cpp with Metal GPU acceleration; no audio leaves your machine. Silero VAD detects speech boundaries and feeds chunks to Whisper as you speak, so the transcript appears during the meeting.
- Merges the two transcripts into a single time-sorted conversation, each segment labelled with the correct speaker name (one stream = one speaker, no diarization needed)
- Summarizes with Claude Sonnet (Anthropic) or any Together.ai chat model, using a context you define (meeting type, participants, domain vocabulary)
- Exports structured Markdown notes (`YYYY-MM-DD_HHmm_Context.md`) to a configurable folder
- Commits the notes to a git repository automatically (optional)
- Creates GitHub issues for every action item extracted from the notes (optional)
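The merge step in the list above can be sketched as follows. This is a minimal illustration, not the app's actual code: the `Segment` and `Labelled` types and the `merge_transcripts` function are hypothetical names, and timestamps are assumed to be seconds measured from a start time shared by both streams.

```rust
/// One transcribed chunk from a single audio stream (hypothetical type).
#[derive(Debug, Clone, PartialEq)]
pub struct Segment {
    /// Seconds since the shared recording start (assumed common to both streams).
    pub start: f64,
    pub text: String,
}

/// A segment tagged with the speaker assigned to its stream.
#[derive(Debug, Clone, PartialEq)]
pub struct Labelled {
    pub start: f64,
    pub speaker: String,
    pub text: String,
}

/// Tag every segment of a stream with that stream's speaker name.
fn label(segments: &[Segment], speaker: &str) -> Vec<Labelled> {
    segments
        .iter()
        .map(|s| Labelled {
            start: s.start,
            speaker: speaker.to_string(),
            text: s.text.clone(),
        })
        .collect()
}

/// Merge both streams into one conversation ordered by start time.
pub fn merge_transcripts(
    local: &[Segment],
    local_speaker: &str,
    remote: &[Segment],
    remote_speaker: &str,
) -> Vec<Labelled> {
    let mut all = label(local, local_speaker);
    all.extend(label(remote, remote_speaker));
    // `sort_by` is stable, so segments with equal timestamps keep
    // their local-before-remote order.
    all.sort_by(|a, b| a.start.total_cmp(&b.start));
    all
}
```

Because each stream maps to exactly one speaker, labelling is a plain tag on every segment and the only real work is the sort.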
Privacy: All audio is processed locally on-device. Only the final text transcript is sent to the cloud for summarization (Anthropic or Together.ai). A Groq Whisper cloud fallback is available in settings if preferred.
Each meeting produces two files:
Meetings/
2026-03-20_1430_welqin.md: structured notes
2026-03-20_1430_welqin_transcript.md: raw transcript with named speakers
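
The naming pattern shown above could be produced along these lines. This is only a sketch: `meeting_file_names` is a hypothetical helper, and the date and time components are passed in already split so that no date library is assumed.

```rust
/// Build the notes and transcript file names for one meeting.
/// Returns `(notes_file, transcript_file)`.
pub fn meeting_file_names(
    year: u32,
    month: u32,
    day: u32,
    hour: u32,
    minute: u32,
    context: &str,
) -> (String, String) {
    // Zero-pad each component so names sort chronologically.
    let stem = format!("{year:04}-{month:02}-{day:02}_{hour:02}{minute:02}_{context}");
    (format!("{stem}.md"), format!("{stem}_transcript.md"))
}
```

Zero-padded components keep the files in chronological order in any directory listing.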
The notes follow a consistent structure:
## Participants
## Summary
## Key Discussion Points
## Decisions Made
## Action Items
- [ ] **David**: Review the API design document
- [ ] **Yannick**: Set up CI pipeline for staging

Action items are automatically created as GitHub issues with the `meeting-action` label.
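
As an illustration of how action items might be turned into issues, the sketch below parses the `- [ ] **Owner**: task` lines under `## Action Items` and builds an argument list for `gh issue create`. The function names, and the choice to put the task in the title and the owner in the body, are assumptions made for this example; the app's real mapping may differ.

```rust
/// An action item parsed from a `- [ ] **Owner**: task` line.
#[derive(Debug, PartialEq)]
pub struct ActionItem {
    pub owner: String,
    pub task: String,
}

/// Collect unchecked items found under the `## Action Items` heading.
pub fn parse_action_items(notes: &str) -> Vec<ActionItem> {
    let mut items = Vec::new();
    let mut in_section = false;
    for line in notes.lines() {
        let line = line.trim();
        if let Some(title) = line.strip_prefix("## ") {
            // Any other level-2 heading ends the section.
            in_section = title.trim() == "Action Items";
            continue;
        }
        if !in_section {
            continue;
        }
        let Some(rest) = line.strip_prefix("- [ ] **") else { continue };
        let Some((owner, task)) = rest.split_once("**:") else { continue };
        items.push(ActionItem {
            owner: owner.trim().to_string(),
            task: task.trim().to_string(),
        });
    }
    items
}

/// Arguments for `gh issue create` (title/body layout is an assumption).
pub fn gh_issue_args(repo: &str, item: &ActionItem) -> Vec<String> {
    vec![
        "issue".into(),
        "create".into(),
        "--repo".into(),
        repo.into(),
        "--title".into(),
        item.task.clone(),
        "--body".into(),
        format!("Owner: {}", item.owner),
        "--label".into(),
        "meeting-action".into(),
    ]
}
```

The resulting vector can be handed to `std::process::Command::new("gh").args(...)`; passing arguments as a list rather than a shell string avoids quoting problems with task text.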
- macOS on Apple Silicon (M1 or later; Metal GPU required for local Whisper)
- BlackHole 2ch: virtual audio driver for capturing system audio (see setup guide)
- `cmake`: required to build whisper.cpp (`brew install cmake`)
- One of (for summarization):
  - Anthropic API key: to use Claude Sonnet
  - Together.ai API key: to use open-source models (Llama, etc.)
- `gh` CLI installed and authenticated: for GitHub issues (optional)
- `git`: for committing notes (optional)
- Groq API key: only if using cloud transcription fallback (optional)
The Whisper model (`large-v3-turbo`, quantized q5_0, ~550 MB) is downloaded automatically on first launch to `~/.sidekick2000/models/`.
npm install

Launch the app and click the gear icon in the top-right corner. All settings are stored in `~/.sidekick2000/settings.json`.
| Tab | What to configure |
|---|---|
| API Keys | Transcription mode (Local Whisper or Groq); summarization provider (Claude or Together.ai) and the corresponding API key/model |
| Devices | Local mic device + your speaker name; remote source device + remote speaker name |
| Repository | Working folder (git root), meetings subfolder, GitHub repo (owner/repo), default language, pipeline step toggles |
| Contexts | Meeting context templates: instructions that shape how the AI summarizes each meeting type |
| Speakers | Default meeting attendees pre-loaded at startup (for AI context) |
API keys set in Settings take priority over environment variables. You can still use a `.env` file as a fallback.
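
The precedence rule above can be expressed as a small function. This is a sketch under assumptions: `resolve_api_key` is a hypothetical name, and treating an empty or whitespace-only Settings value as "not set" is a guess based on the empty strings in the sample settings file.

```rust
/// Pick the API key to use: the Settings value wins when it is non-empty,
/// otherwise fall back to the environment value (e.g. loaded from `.env`).
pub fn resolve_api_key(settings_value: &str, env_value: Option<&str>) -> Option<String> {
    let from_settings = settings_value.trim();
    if !from_settings.is_empty() {
        return Some(from_settings.to_string());
    }
    // No usable Settings value: use the environment if it holds something.
    env_value
        .map(str::trim)
        .filter(|v| !v.is_empty())
        .map(str::to_string)
}
```

A caller would pass something like `std::env::var("ANTHROPIC_API_KEY").ok().as_deref()` as the second argument; that variable name is an assumption, not one confirmed by this README.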
npm run tauri dev

npm run tauri build

Contexts are the core of Sidekick2000's flexibility. Each context is a Markdown document that gives Claude background knowledge about a meeting type: who the participants are, domain vocabulary, and how to structure the notes.
Examples of contexts you might create:
- General: neutral instructions, works for any meeting
- Product review: focus on decisions, feature requests, backlog items
- Client call: highlight commitments, risks, next steps
- Training session: track exercises, Q&A, shortcuts mentioned
Contexts are managed entirely in the Settings UI (no external files needed).
Record local mic ─────────────────┐  (parallel, ring buffers)
Record remote source ─────────────┤
                                  │
                  ┌───────────────┴───────────────┐
                  ▼                               ▼
            Worker thread                   Worker thread
         (drain every 200ms)             (drain every 200ms)
                  │                               │
                  ▼                               ▼
            VAD (Silero)                    VAD (Silero)
       silence ≥ 300ms → flush         silence ≥ 300ms → flush
                  │                               │
                  ▼                               ▼
           Whisper (Metal)                 Whisper (Metal)
          whisper.cpp local               whisper.cpp local
                  │                               │
                  ▼                               ▼
         emit "live-segment"             emit "live-segment"
         → frontend display              → frontend display
                  └───────────────┬───────────────┘
                                  │
                                  ▼
          Merge: sort by timestamp, speakers already known
                                  │
                                  ▼
    Summarize (Claude Sonnet or Together.ai; skipped if disabled)
                                  │
                                  ▼
                  Export YYYY-MM-DD_HHmm_Context.md
                                  │
                                  ▼
        Git commit (if enabled and working folder configured)
                                  │
                                  ▼
        Create GitHub issues (if enabled and repo configured)
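
The "silence ≥ 300ms → flush" step in the pipeline above can be sketched as a small state machine. This is an illustration only: `Chunker` is a hypothetical type, the speech/silence verdict is assumed to come from the VAD for each fixed-length frame, and keeping trailing silence inside the chunk is a design choice of this example rather than documented behavior.

```rust
/// Groups fixed-length audio frames into speech chunks and flushes a chunk
/// once trailing silence reaches a threshold.
pub struct Chunker {
    frame_ms: u32,
    silence_flush_ms: u32,
    silence_ms: u32,
    chunk: Vec<f32>,
}

impl Chunker {
    pub fn new(frame_ms: u32, silence_flush_ms: u32) -> Self {
        Self { frame_ms, silence_flush_ms, silence_ms: 0, chunk: Vec::new() }
    }

    /// Feed one frame plus the VAD verdict for it.
    /// Returns a finished chunk when the silence threshold is reached.
    pub fn push(&mut self, frame: &[f32], is_speech: bool) -> Option<Vec<f32>> {
        if is_speech {
            // Speech resets the silence counter and grows the chunk.
            self.silence_ms = 0;
            self.chunk.extend_from_slice(frame);
            return None;
        }
        if self.chunk.is_empty() {
            // Silence before any speech: nothing to buffer or flush.
            return None;
        }
        // Keep trailing silence so word endings are not clipped.
        self.chunk.extend_from_slice(frame);
        self.silence_ms += self.frame_ms;
        if self.silence_ms >= self.silence_flush_ms {
            self.silence_ms = 0;
            // Hand the chunk to the caller and start a fresh one.
            return Some(std::mem::take(&mut self.chunk));
        }
        None
    }
}
```

Each worker thread would own one such chunker per stream, call `push` for every frame it drains from the ring buffer, and send any returned chunk to Whisper.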
`~/.sidekick2000/settings.json` is created automatically on first save.
{
  "transcription_mode": "LocalWhisper",
  "groq_api_key": "",
  "anthropic_api_key": "sk-ant-...",
  "together_ai_api_key": "",
  "summarization_provider": "claude",
  "together_ai_model": "meta-llama/Llama-3.3-70B-Instruct-Turbo",
  "default_input_device": "MacBook Pro Microphone",
  "local_speaker_name": "David",
  "remote_device": "BlackHole 2ch",
  "remote_speaker_name": "Remote",
  "working_folder": "/Users/you/my-repo",
  "github_repo": "owner/repo",
  "meetings_subfolder": "Meetings",
  "default_language": "fr",
  "enable_summary": true,
  "enable_git_commit": true,
  "enable_github_issues": true,
  "default_speakers": [
    { "name": "Alice", "organization": "Acme" }
  ],
  "contexts": [
    {
      "id": "general",
      "label": "General",
      "content": "Be factual. Group by theme. Use professional tone."
    }
  ]
}

| Layer | Technology |
|---|---|
| UI | Svelte 5, Tailwind CSS 4 |
| Desktop shell | Tauri 2 |
| Backend | Rust (async with Tokio) |
| Transcription | Local whisper.cpp via whisper-rs (large-v3-turbo q5_0, Metal GPU). Groq Whisper API as optional cloud fallback. |
| VAD | Silero VAD via voice_activity_detector (ONNX): detects speech/silence for live chunking |
| Summarization | Anthropic Claude Sonnet or Together.ai (configurable) |
| Speaker identification | Device-based β each stream has a pre-assigned speaker name |
| Audio capture | CPAL (two simultaneous input streams, ring buffers, shared t=0 origin) |
| GitHub integration | gh CLI |