A local full-stack project (backend: Flask + Google STT + ElevenLabs TTS; frontend: React + Vite + Supabase). This README explains how to set up, run, and deploy the project on your machine. It reflects the current development workflow (two terminal windows).

**Contents**
- Quick start
- Prerequisites
- Environment variables / secrets
- Project structure & important files
- Running (development)
- Building (frontend)
- Git & .gitignore notes
- Troubleshooting
- Production / deployment notes
## Quick start

Open two terminals.

**Terminal 1 (backend):**

```
cd D:\ai-voicebot-2\ai-voice-banking\backend
.\venv\Scripts\activate
pip install -r requirements.txt
python tts_server.py
```

**Terminal 2 (frontend):**

```
cd D:\ai-voicebot-2\ai-voice-banking\frontend
.\venv\Scripts\activate   # optional, only if you have a frontend venv
npm install               # only the first time
npm run dev
```

The backend exposes a WebSocket at `ws://localhost:5000/stream` (this is used by the frontend).
## Prerequisites

- Python 3.10+ (the same major version you develop with; the Google client warns when using older, unsupported versions)
- Node.js (v16+ recommended) and npm
- A Git client (for repo management)
- Google Cloud service account key for Speech-to-Text (JSON file)
- ElevenLabs API key + voice id
- (Optional) Supabase account & keys if you want the database features working
- (Optional) OpenAI API key for the assistant logic
## Environment variables / secrets

Create `.env` files (never commit them to Git). Example entries required by the code:

**`backend/.env`**

```
ELEVEN_API_KEY=your_elevenlabs_api_key
ELEVEN_VOICE_ID=your_voice_id
# The Google key is supplied as a JSON file; the code uses GOOGLE_APPLICATION_CREDENTIALS to point to it.
# Put the JSON file in backend/ and set the variable below (or set it in your system environment).
GOOGLE_APPLICATION_CREDENTIALS=stt_key.json
```
The backend code currently sets `GOOGLE_KEY = "stt_key.json"` and then assigns `os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = GOOGLE_KEY`. You can either set the system environment variable as above or place `stt_key.json` inside `backend/`.
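That pattern can be sketched as a small helper (the name `resolve_google_key` is hypothetical; only the `GOOGLE_APPLICATION_CREDENTIALS` variable and the `stt_key.json` default come from the code described above):

```python
import os

# Hypothetical helper illustrating the credential lookup described above:
# prefer an already-set GOOGLE_APPLICATION_CREDENTIALS, otherwise fall back
# to a key file sitting next to the backend code (stt_key.json).
def resolve_google_key(default_path: str = "stt_key.json") -> str:
    path = os.environ.get("GOOGLE_APPLICATION_CREDENTIALS", default_path)
    os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = path
    return path
```

This keeps the current behaviour (a relative `stt_key.json`) while letting a system-level environment variable win, so a deployment can point at a secure path without code changes.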
**`frontend/.env`**

```
VITE_OPENAI_KEY=your_openai_key
VITE_SUPABASE_URL=https://your-supabase-url
VITE_SUPABASE_ANON_KEY=your-anon-key
```

The frontend also falls back to `import.meta.env.VITE_OPENAI_KEY` in `src/App.jsx`.
## Project structure & important files

```
ai-voice-banking/
├─ backend/
│  ├─ tts_server.py      # main Flask + websockets + Google STT + ElevenLabs TTS
│  ├─ requirements.txt
│  └─ stt_key.json       # your Google service key (keep private)
├─ frontend/
│  ├─ src/App.jsx        # main React app and WebSocket client
│  ├─ package.json
│  └─ node_modules/
└─ .gitignore
```
Local file references (for quick inspection):

- Backend WebSocket server: `file:///D:/ai-voicebot-2/ai-voice-banking/backend/tts_server.py`
- Frontend main app: `file:///D:/ai-voicebot-2/ai-voice-banking/frontend/src/App.jsx`
How it works:

- The frontend opens a WebSocket to `ws://localhost:5000/stream`.
- When the user speaks, the frontend captures microphone audio at 16 kHz, encodes it to base64, and sends chunks as `{ type: 'audio_input', data: <base64> }`.
- The backend receives chunks and feeds them into a Google STT streaming recognizer (running in a separate thread). Final transcriptions are sent back to the client as `{ type: 'transcription', text: <transcript> }`.
- The frontend passes the transcribed text to an assistant (OpenAI) via REST (inside `getAIResponse`). The assistant returns structured intent JSON.
- If the assistant wants to speak back, the frontend sends `{ type: 'tts_request', text: '...' }` over the WebSocket; the backend uses ElevenLabs to stream audio back as base64 `{ type: 'audio_chunk', data: <base64> }` messages, followed by `{ type: 'audio_end' }` when done.
- The frontend collects the base64 chunks and plays them as a single WAV blob.
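The message shapes above can be sketched in Python (the helper names `make_audio_input` and `parse_message` are illustrative; only the field names come from the protocol described here):

```python
import base64
import json

def make_audio_input(pcm_bytes: bytes) -> str:
    """Wrap a raw 16 kHz PCM chunk as an 'audio_input' WebSocket message."""
    return json.dumps({
        "type": "audio_input",
        "data": base64.b64encode(pcm_bytes).decode("ascii"),
    })

def parse_message(raw: str) -> dict:
    """Decode a server message; 'audio_chunk' payloads are base64-decoded to bytes."""
    msg = json.loads(raw)
    if msg.get("type") == "audio_chunk":
        msg["data"] = base64.b64decode(msg["data"])
    return msg
```

Messages like `transcription` and `audio_end` pass through `parse_message` unchanged; only `audio_chunk` payloads need decoding before the chunks are concatenated into a WAV blob.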
## Git & .gitignore notes

- **Line endings:** if Git prints `LF will be replaced by CRLF`, it's harmless on Windows.
- Make sure `venv/` is in `.gitignore` so you don't accidentally commit your virtual environment.
- **Remove an already-tracked venv** (if accidentally added):

```
git rm -r --cached backend/venv
git rm -r --cached frontend/venv
git add .gitignore
git commit -m "Remove venv from repo"
git push
```
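A minimal `.gitignore` covering the points above (the entries are suggestions based on the files mentioned in this README; adjust paths to your layout):

```
# virtual environments
backend/venv/
frontend/venv/

# secrets
backend/.env
frontend/.env
backend/stt_key.json

# dependencies / build output
frontend/node_modules/
frontend/dist/
__pycache__/
```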
## Production / deployment notes

- **CORS:** `CORS(app)` is enabled in the Flask backend. If you serve the frontend from a different host in the future, update the CORS settings accordingly.
- **Google STT:** keep `stt_key.json` secret. The code currently sets `os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = "stt_key.json"`; ensure the file is present in `backend/` or change the code to read from a secure path.
- **Audio sample rate:** the frontend uses an `AudioContext` configured for 16000 Hz and the backend expects `sample_rate_hertz=16000`. Keep these consistent.
- **Supabase:** the frontend expects tables like `bank_accounts`, `bank_transactions`, and `bank_recipients`. If you don't have Supabase set up, the app shows onboarding and limited features.
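For reference, the 16 kHz expectation corresponds to a Google STT streaming config along these lines (a sketch using the `google-cloud-speech` client; `language_code` and `interim_results` are assumptions here, so check `tts_server.py` for the actual values):

```python
from google.cloud import speech

# Sketch: a streaming-recognizer config matching the frontend's 16 kHz capture.
# LINEAR16 + 16000 Hz must match what the browser sends; language_code is an
# assumed value -- use whatever tts_server.py actually configures.
config = speech.RecognitionConfig(
    encoding=speech.RecognitionConfig.AudioEncoding.LINEAR16,
    sample_rate_hertz=16000,   # must match the frontend AudioContext
    language_code="en-US",     # assumption, not taken from the code
)
streaming_config = speech.StreamingRecognitionConfig(
    config=config,
    interim_results=False,     # the client only acts on final transcriptions
)
```

If the two sample rates ever drift apart, STT quality degrades silently, so this is the first thing to check when transcriptions look garbled.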
## Troubleshooting

- **WebSocket fails to connect:** ensure the backend is running and listening on port 5000, and that the firewall allows local connections. The `serverStatus` indicator in the UI helps debug whether the WebSocket is connected.
- **No transcription appears:** check that the Google service account has the Speech-to-Text API enabled and that `stt_key.json` is correct. Monitor the backend logs for thread exceptions.
- **TTS audio not playing:** confirm the frontend receives `audio_chunk` messages and that `audioChunksRef` accumulates data. If ElevenLabs responds with a streaming error, check `ELEVEN_API_KEY` and `ELEVEN_VOICE_ID`.
- **OpenAI errors:** the frontend uses `VITE_OPENAI_KEY`. If the assistant returns non-JSON text, `getAIResponse` tries to extract the first JSON object. Keep prompts conservative and test with simple messages first.