```
 ██████╗ ██╗  ██╗ ██████╗ ███████╗████████╗
██╔════╝ ██║  ██║██╔═══██╗██╔════╝╚══██╔══╝
██║  ███╗███████║██║   ██║███████╗   ██║
██║   ██║██╔══██║██║   ██║╚════██║   ██║
╚██████╔╝██║  ██║╚██████╔╝███████║   ██║
 ╚═════╝ ╚═╝  ╚═╝ ╚═════╝ ╚══════╝   ╚═╝
```
Real-Time Voice Sales Co-Pilot
Your AI doesn't coach you. It becomes you.
Sales reps lose deals in the objection moment — not because they don't know the answer, but because they freeze, fumble, or respond with the wrong frame under pressure.
Existing tools record calls and coach after the fact. Zero help when it matters.
When a prospect throws an objection during a live call, the rep holds SPACEBAR.
Ghost:
- Captures the mic stream in real time
- Transcribes the objection via ElevenLabs STT
- Classifies it into one of 5 types using NVIDIA Llama 3.3 70B
- Generates a response styled after an elite sales coach
- Speaks the response back in the rep's own cloned voice through the call
The prospect hears no gap. No robot. No fumble. Just the rep handling it perfectly.
Ghost was never there.
Prospect says: "I think the price is a bit high for us."
Rep holds SPACE → releases →
Ghost responds in rep's voice: "That tells me the value isn't fully clear yet. Let me ask — what would it be worth to you if this solved the problem in 30 days?"
Prospect never knew.
```
SPACE held
    │
    ▼
Web Audio API (mic capture)
    │
    ▼
POST /api/ghost/takeover
    │
    ├─► ElevenLabs STT (scribe_v1)
    │       └─► transcript string
    │
    ├─► NVIDIA NIM Llama 3.3 70B
    │       └─► { objection_type, confidence, response }
    │
    └─► ElevenLabs TTS (Turbo v2.5)
            └─► audio/mpeg stream → browser → BlackHole → call
```
Target: a sub-2-second pipeline, end to end.
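A minimal sketch of how the `/api/ghost/takeover` handler could chain the three stages, assuming Node 18+. The stages run strictly in sequence because each consumes the previous one's output. `runTakeover` and the injected `transcribe`, `classify`, `synthesize` and `now` functions are hypothetical names (in the repo they would presumably wrap `stt.js`, `classifier.js` and `tts.js`); injecting them keeps the flow testable without network calls and yields per-stage timings for the debrief.

```javascript
// Hypothetical orchestration of POST /api/ghost/takeover.
// Stage functions are injected so the flow runs without network access;
// `now` is injectable so timings can be tested deterministically.
async function runTakeover(audio, { transcribe, classify, synthesize, now = Date.now }) {
  const timings = {};
  const start = now();

  // 1. STT: recorded audio -> transcript string
  const transcript = await transcribe(audio);
  timings.sttMs = now() - start;

  // 2. LLM: transcript -> { objection_type, confidence, response }
  const llmStart = now();
  const { objection_type, confidence, response } = await classify(transcript);
  timings.llmMs = now() - llmStart;

  // 3. TTS: response text -> audio stream in the rep's cloned voice
  const ttsStart = now();
  const audioStream = await synthesize(response);
  timings.ttsMs = now() - ttsStart;

  timings.totalMs = now() - start;
  return { transcript, objection_type, confidence, response, audioStream, timings };
}
```

In an Express route, the handler would call `runTakeover` with the uploaded clip and pipe `audioStream` into the HTTP response.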
| Layer | Technology |
|---|---|
| Frontend | React 18 + Vite + Tailwind CSS + Zustand |
| Backend | Node.js + Express |
| STT | ElevenLabs Scribe v1 |
| Voice Clone | ElevenLabs Instant Voice Clone |
| TTS | ElevenLabs Turbo v2.5 (streaming) |
| LLM | NVIDIA NIM — Llama 3.3 70B Instruct |
| Audio Routing | BlackHole 2ch (M1/M2 Mac) |
| Observability | PostHog LLM events |
Core Pipeline
- SPACEBAR hotkey trigger with key-repeat guard and ESC emergency cancel
- Real-time mic capture via Web Audio API
- ElevenLabs STT via a direct `form-data` REST call
- NVIDIA Llama 3.3 70B objection classifier with strict JSON schema output
- ElevenLabs Turbo v2.5 streaming TTS in the rep's cloned voice
- Audio routed to BlackHole → call app hears it as the rep's microphone
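The two ElevenLabs calls can be sketched as plain `fetch()` requests, assuming Node 18+ globals (`fetch`, `FormData`, `Blob`). The endpoints, the `xi-api-key` header and the model IDs (`scribe_v1`, `eleven_turbo_v2_5`) follow ElevenLabs' public REST API; the helper names and the `objection.webm` filename are illustrative assumptions, not the repo's code.

```javascript
// Sketch of the two ElevenLabs REST calls as plain fetch() requests (Node 18+ or browser).
const ELEVEN_BASE = "https://api.elevenlabs.io/v1";

// STT: multipart/form-data upload of the recorded clip to Scribe v1.
function buildSttRequest(audioBlob, apiKey) {
  const form = new FormData();
  form.append("model_id", "scribe_v1");
  form.append("file", audioBlob, "objection.webm"); // illustrative filename
  return {
    url: `${ELEVEN_BASE}/speech-to-text`,
    // No Content-Type header: fetch sets the multipart boundary itself.
    init: { method: "POST", headers: { "xi-api-key": apiKey }, body: form },
  };
}

// TTS: streaming synthesis in the rep's cloned voice with Turbo v2.5.
function buildTtsRequest(voiceId, text, apiKey) {
  return {
    url: `${ELEVEN_BASE}/text-to-speech/${encodeURIComponent(voiceId)}/stream`,
    init: {
      method: "POST",
      headers: { "xi-api-key": apiKey, "Content-Type": "application/json", Accept: "audio/mpeg" },
      body: JSON.stringify({ text, model_id: "eleven_turbo_v2_5" }),
    },
  };
}

// Example STT call (performs a network request when invoked).
async function transcribe(audioBlob, apiKey) {
  const { url, init } = buildSttRequest(audioBlob, apiKey);
  const res = await fetch(url, init);
  if (!res.ok) throw new Error(`STT failed with HTTP ${res.status}`);
  const data = await res.json();
  return data.text; // Scribe returns the transcript in `text`
}
```

`buildTtsRequest` is consumed the same way; `res.body` is then a stream of `audio/mpeg` chunks, so playback can begin before synthesis finishes.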
Voice Profiles
- Up to 3 rep profiles per workspace
- One-sentence voice clone onboarding (30 seconds of audio)
- Per-profile coach persona preference
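A sketch of the profile cap and the clone upload, under stated assumptions: `addProfile`, `buildVoiceCloneRequest` and the persona IDs are hypothetical, and the upload targets ElevenLabs' Instant Voice Clone endpoint (`POST /v1/voices/add` with `name` and `files` form fields), whose JSON reply carries the new `voice_id` to store on the profile.

```javascript
// Hypothetical profile helpers: enforce the 3-profile cap and build the clone upload.
const MAX_PROFILES = 3;
const PERSONAS = ["hormozi", "voss", "cardone"]; // hypothetical IDs for the three coaches

function addProfile(profiles, { name, voiceId, persona }) {
  if (profiles.length >= MAX_PROFILES) {
    throw new Error(`Limit of ${MAX_PROFILES} profiles per workspace reached`);
  }
  if (!PERSONAS.includes(persona)) throw new Error(`Unknown persona: ${persona}`);
  // Return a new array rather than mutating, which suits a Zustand store update.
  return [...profiles, { name, voiceId, persona }];
}

// Instant Voice Clone upload: one ~30 s sample as multipart form data.
function buildVoiceCloneRequest(name, sampleBlob, apiKey) {
  const form = new FormData();
  form.append("name", name);
  form.append("files", sampleBlob, "sample.webm"); // illustrative filename
  return {
    url: "https://api.elevenlabs.io/v1/voices/add",
    init: { method: "POST", headers: { "xi-api-key": apiKey }, body: form },
  };
}
```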
3 Coach Personas
- Alex Hormozi — direct, urgency-first, value/ROI framing
- Chris Voss — FBI negotiation, tactical empathy, labeling
- Grant Cardone — high-energy, follow-up pressure, 10X close
5 Objection Types
- Stall · Price · Authority · Timing · Competitor
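One way to wire the personas and objection types into a single NIM call, sketched under assumptions. NIM exposes an OpenAI-compatible chat-completions API, so the body below would be POSTed to `https://integrate.api.nvidia.com/v1/chat/completions` with `Authorization: Bearer $NVIDIA_API_KEY`, and the model text read from `choices[0].message.content`. The model ID, prompt wording, persona IDs, sampling settings and helper names are assumptions. Rather than trusting the model, `parseClassification` rejects anything outside the `{ objection_type, confidence, response }` shape, which gives the caller a clean point to fall back.

```javascript
// Sketch of the classifier request body and a strict validator for the model's reply.
const OBJECTION_TYPES = ["stall", "price", "authority", "timing", "competitor"];

// Hypothetical persona IDs, with styles paraphrased from the persona list.
const PERSONA_STYLE = {
  hormozi: "direct, urgency-first, value/ROI framing",
  voss: "tactical empathy and labeling, FBI negotiation style",
  cardone: "high-energy, follow-up pressure, 10X close",
};

function buildClassifierRequest(transcript, persona) {
  const style = PERSONA_STYLE[persona];
  if (!style) throw new Error(`Unknown persona: ${persona}`);
  const system =
    `Classify the prospect's objection as exactly one of: ${OBJECTION_TYPES.join(", ")}. ` +
    `Then write a one- or two-sentence reply for the rep in this style: ${style}. ` +
    'Reply with JSON only: {"objection_type": string, "confidence": number between 0 and 1, "response": string}';
  return {
    model: "meta/llama-3.3-70b-instruct", // assumed NIM model ID
    temperature: 0.3,
    max_tokens: 200,
    messages: [
      { role: "system", content: system },
      { role: "user", content: transcript },
    ],
  };
}

// Returns a normalized classification, or null if the reply breaks the schema.
function parseClassification(raw) {
  let data;
  try {
    data = JSON.parse(raw);
  } catch {
    return null; // not JSON at all
  }
  if (!data || typeof data !== "object") return null;
  const type = typeof data.objection_type === "string" ? data.objection_type.toLowerCase() : "";
  if (!OBJECTION_TYPES.includes(type)) return null;
  const { confidence, response } = data;
  if (typeof confidence !== "number" || confidence < 0 || confidence > 1) return null;
  if (typeof response !== "string" || response.trim() === "") return null;
  return { objection_type: type, confidence, response: response.trim() };
}
```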
Phase 5 Hardening
- Pre-warm cache: all 5 objection types pre-generated at session start
- Client-side audio blob cache for instant playback
- Demo run-through checker with 6-step verification
- Graceful fallbacks at every external API boundary
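A minimal sketch of the pre-warm and fallback ideas, assuming Node 18+. `prewarm` and `withFallback` are hypothetical helpers: the first generates one response per objection type in parallel and skips types whose generation fails, so one bad call does not block the session; the second races any external call against a timeout and resolves to a fallback value on error or timeout.

```javascript
// Hypothetical pre-warm cache: one response per objection type, generated at session start.
async function prewarm(types, generate) {
  // Wrap each call so a synchronous throw is treated like a rejected promise.
  const results = await Promise.allSettled(
    types.map((type) => Promise.resolve().then(() => generate(type)))
  );
  const cache = new Map();
  results.forEach((result, i) => {
    // Skip failed types instead of aborting the session; they fall back to live generation.
    if (result.status === "fulfilled") cache.set(types[i], result.value);
  });
  return cache;
}

// Race an external call against a timeout; resolve to `fallback` on error or timeout.
function withFallback(promise, ms, fallback) {
  let timer;
  const timeout = new Promise((resolve) => {
    timer = setTimeout(() => resolve(fallback), ms);
  });
  return Promise.race([promise.catch(() => fallback), timeout]).finally(() => clearTimeout(timer));
}
```

At takeover time, the classified `objection_type` could be looked up in the cache first, with live generation (wrapped in `withFallback`) used only on a miss.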
Post-Call Debrief
- Full session transcript and objection log
- AI-generated 3-bullet call summary
- Latency metrics per objection
- JSON export
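A sketch of the per-objection latency metrics and JSON export, assuming each logged objection carries a hypothetical `totalMs` field (for example the total from the pipeline timings); `summarizeLatency` and `exportDebrief` are illustrative names, not the repo's API.

```javascript
// Hypothetical debrief helpers: per-type latency stats plus a JSON export.
// Each objection entry is assumed to look like { objection_type, totalMs, ... }.
function summarizeLatency(objections) {
  const byType = {};
  for (const { objection_type, totalMs } of objections) {
    (byType[objection_type] ??= []).push(totalMs);
  }
  const stats = {};
  for (const [type, values] of Object.entries(byType)) {
    const sum = values.reduce((a, b) => a + b, 0);
    stats[type] = {
      count: values.length,
      avgMs: Math.round(sum / values.length),
      maxMs: Math.max(...values),
    };
  }
  return stats;
}

// Serialize the whole session with the computed latency summary attached.
function exportDebrief(session) {
  return JSON.stringify({ ...session, latency: summarizeLatency(session.objections) }, null, 2);
}
```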
Prerequisites
- Node.js 18+
- ElevenLabs API key: elevenlabs.io/app/settings/api-keys
- NVIDIA NIM API key: build.nvidia.com → any model → Get API Key
- BlackHole 2ch (M1/M2 Mac): `brew install blackhole-2ch`
- Chrome browser (required for `setSinkId` audio routing)
Installation

```bash
# Clone the repo
git clone https://github.com/yourusername/ghost-sales-copilot
cd ghost-sales-copilot

# Install all dependencies
npm install
cd client && npm install && cd ..
cd server && npm install && cd ..

# Set up environment
cp .env.example .env
```

Edit `.env`:

```
ELEVENLABS_API_KEY=your_elevenlabs_key
NVIDIA_API_KEY=nvapi-your_nvidia_key
# Optional
POSTHOG_API_KEY=your_posthog_key
```

```bash
# Start everything
npm run dev
```

Open http://localhost:5173
Audio routing (macOS)
- Open Audio MIDI Setup (Spotlight → Audio MIDI Setup)
- Click `+` → Create Multi-Output Device
- Check: MacBook Speakers + BlackHole 2ch
- Right-click → Use This Device For Sound Output
- In your call app (Zoom/Meet): set Microphone → BlackHole 2ch
- In Ghost: sidebar → Audio Output → BlackHole 2ch
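In the browser, the last step corresponds to choosing an output device for the audio element. A sketch assuming Chrome: `navigator.mediaDevices.enumerateDevices()` and `HTMLMediaElement.setSinkId()` are standard Web APIs, while `findBlackHole` and `playThroughBlackHole` are hypothetical helpers that auto-pick the device by label instead of using the sidebar selector. Device labels are only populated after the page has been granted microphone permission.

```javascript
// Pick the BlackHole output from a device list as returned by enumerateDevices().
function findBlackHole(devices) {
  return devices.find((d) => d.kind === "audiooutput" && /blackhole/i.test(d.label)) ?? null;
}

// Browser-only: play a synthesized clip through BlackHole so the call app hears it as the mic.
async function playThroughBlackHole(audioBlob) {
  const devices = await navigator.mediaDevices.enumerateDevices();
  const sink = findBlackHole(devices);
  const url = URL.createObjectURL(audioBlob);
  const audio = new Audio(url);
  audio.addEventListener("ended", () => URL.revokeObjectURL(url), { once: true });
  if (sink && typeof audio.setSinkId === "function") {
    await audio.setSinkId(sink.deviceId); // Chrome; otherwise the default output is used
  }
  await audio.play();
  return audio;
}
```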
Usage
1. Open Ghost → create a voice profile (30s recording)
2. Click "Demo Check" → verify all 6 systems pass
3. Click "Launch Ghost" → you land on the live call page
4. Join your call in another tab
5. When prospect says an objection → hold SPACE
6. Speak the objection, or let it come through the mic
7. Release SPACE → Ghost processes and responds in your voice
8. ESC at any time to cancel
9. End Call → review the debrief
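The SPACE/ESC behaviour in steps 5 to 8 amounts to a small state machine. A sketch, assuming hypothetical callback names (`onStart`, `onRelease`, `onCancel`) standing in for the recorder and takeover hooks: auto-repeat `keydown` events are ignored so holding SPACE starts exactly one recording, and ESC cancels regardless of state.

```javascript
// Hypothetical hotkey controller: hold SPACE to record, release to process, ESC to cancel.
function createHotkeyController({ onStart, onRelease, onCancel }) {
  let recording = false;

  function keydown(e) {
    if (e.code === "Space") {
      // Key-repeat guard: holding the key fires repeated keydowns; start only once.
      if (e.repeat || recording) return;
      recording = true;
      onStart();
    } else if (e.code === "Escape") {
      // Emergency cancel at any time: stop recording and abort any in-flight takeover.
      recording = false;
      onCancel();
    }
  }

  function keyup(e) {
    if (e.code === "Space" && recording) {
      recording = false;
      onRelease(); // hand the clip to the takeover pipeline
    }
  }

  return { keydown, keyup, isRecording: () => recording };
}

// Browser wiring (not run under Node):
// const ctl = createHotkeyController({ onStart: startRec, onRelease: stopAndSend, onCancel: abortAll });
// window.addEventListener("keydown", (e) => { if (e.code === "Space") e.preventDefault(); ctl.keydown(e); });
// window.addEventListener("keyup", ctl.keyup);
```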
Project structure

```
ghost/
├── client/                 # React frontend
│   └── src/
│       ├── components/     # GhostOverlay, LiveCaptions, DemoMode, AudioDeviceSelector
│       ├── hooks/          # useGhostTakeover, useAudioRecorder
│       ├── pages/          # Dashboard, Onboarding, Call, Debrief
│       ├── stores/         # ghostStore, profileStore (Zustand)
│       └── lib/            # api.js
│
└── server/                 # Node.js backend
    └── src/
        ├── lib/            # stt.js, tts.js, classifier.js, posthog.js
        └── routes/         # ghost.js, voice.js, debrief.js, engine.js
```
Environment variables

| Variable | Required | Description |
|---|---|---|
| `ELEVENLABS_API_KEY` | ✅ | ElevenLabs API key for STT, TTS, and voice cloning |
| `NVIDIA_API_KEY` | ✅ | NVIDIA NIM key for Llama 3.3 70B |
| `POSTHOG_API_KEY` | Optional | PostHog project key for observability |
| `NGROK_URL` | Optional | ngrok tunnel URL (future Speech Engine WS) |
| `PORT` | Optional | Server port (default: 3001) |
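A sketch of startup validation that mirrors the environment-variable table, assuming the variables are already loaded into `process.env` (for instance by `dotenv`); `loadConfig` is a hypothetical helper. It fails fast on missing required keys and applies the documented `PORT` default of 3001.

```javascript
// Hypothetical config loader: required keys must be set, optional ones default sensibly.
const REQUIRED = ["ELEVENLABS_API_KEY", "NVIDIA_API_KEY"];

function loadConfig(env = process.env) {
  const missing = REQUIRED.filter((name) => !env[name]);
  if (missing.length > 0) throw new Error(`Missing required env vars: ${missing.join(", ")}`);

  const port = env.PORT ? Number.parseInt(env.PORT, 10) : 3001; // documented default
  if (!Number.isInteger(port) || port <= 0 || port > 65535) {
    throw new Error(`Invalid PORT: ${env.PORT}`);
  }

  return {
    elevenLabsKey: env.ELEVENLABS_API_KEY,
    nvidiaKey: env.NVIDIA_API_KEY,
    posthogKey: env.POSTHOG_API_KEY || null, // observability skipped when unset
    ngrokUrl: env.NGROK_URL || null,
    port,
  };
}
```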
| Step | Target | Notes |
|---|---|---|
| STT transcription | ~400ms | ElevenLabs Scribe v1 REST |
| LLM classify + response | ~600ms | NVIDIA NIM Llama 3.3 70B |
| TTS first audio chunk | ~700ms | ElevenLabs Turbo v2.5 streaming |
| Audio routing | ~100ms | BlackHole near-zero |
| Total | ~1.8s | Pre-warm cache reduces to ~0.8s |
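The ~1.8 s total is the sum of the per-stage targets, because each stage needs the previous stage's output before it can start. Since the TTS row is measured to the first audio chunk, the total is time to first audible audio rather than time to finish speaking:

```math
T_{\text{total}} \approx T_{\text{STT}} + T_{\text{LLM}} + T_{\text{TTS}} + T_{\text{route}}
= 400 + 600 + 700 + 100 = 1800\ \text{ms} \approx 1.8\ \text{s}
```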
ElevenLabs Worldwide Hackathon #9
Built solo in 48 hours for the ElevenLabs Hack!
Ghost. The invisible closer.