A sophisticated, production-ready AI voice assistant application with multi-provider support, real-time voice conversation, advanced customization capabilities, and conversation templates.
- OpenAI (GPT-4, GPT-3.5 Turbo) - Built-in support, no API key required
- OpenRouter - Access to Mistral, Claude, Llama, and 200+ models
- Mistral AI - Support for normal and custom fine-tuned models
- Anthropic Claude - Claude 3.5 Sonnet, Opus, and Haiku
- Hume AI - Emotion-aware AI conversations
- Easy provider switching per conversation
- Custom model configuration with temperature control (0-2.0)
- Per-conversation system prompts
- Real-time Voice Recording with waveform visualization
- Voice Activity Detection (VAD) with configurable sensitivity (0-100%)
- Automatic Silence Detection with adjustable thresholds (500-3000ms)
- Speech-to-Text powered by:
- OpenAI Whisper (built-in, no API key required)
- Deepgram (with custom API key)
- Text-to-Speech via:
- ElevenLabs (multiple voice options)
- Hume AI (emotion-aware voices)
- Voice Call Mode - Continuous conversation with automatic turn detection
- Audio Playback Controls - Play/pause AI responses on demand
- TTS Speed Control (50-200%)
- Auto-play Toggle for AI responses
- Conversation Templates - Pre-configured AI settings for specific use cases:
- Creative Writing - High creativity (temp: 0.9) for storytelling
- Code Assistant - Precise (temp: 0.3) for programming help
- Business Advisor - Analytical (temp: 0.5) for strategy
- Personal Tutor - Patient (temp: 0.4) for learning
- Brainstorm Partner - Creative (temp: 0.85) for idea generation
- Quick Answers - Fast (temp: 0.2) with GPT-3.5 Turbo
- Per-Conversation Settings Menu - Customize provider, model, temperature, and system prompt without leaving the chat
- Export Conversations - Download as:
- JSON (full data with metadata)
- Markdown (formatted documentation)
- Plain Text (simple transcript)
- Conversation History with timestamps
- Thread Organization and management
- Delete Conversations with confirmation
- Voice Settings Tab:
- Silence threshold (500-3000ms)
- VAD sensitivity (0-100%)
- TTS speed (50-200%)
- Auto-play AI responses toggle
- Default STT provider selection
- Default TTS provider selection
- API Keys Tab - Manage provider credentials:
- OpenRouter API key
- Mistral AI API key
- Anthropic API key
- ElevenLabs API key
- Deepgram API key
- Hume AI API key
- Optional labels for identification
- Account Tab:
- User profile information
- Usage statistics
- Account settings
- Glassmorphism Design with dark gradient background
- Mobile-Responsive Layout with collapsible sidebar
- Keyboard Shortcuts:
Ctrl+Enter/Cmd+Enter- Send messageEnter- Send message (when not multiline)
- Real-time Message Streaming with markdown rendering
- Token Usage Display per message
- Loading States & Skeletons for smooth UX
- Typing Indicators during AI processing
- Error Handling with retry mechanisms
- Smooth Animations and transitions
- Message Bubbles with user/assistant distinction
- Node.js 22.x
- pnpm package manager
- MySQL/TiDB database (provided by platform)
-
Clone the repository
-
Install dependencies:
pnpm install
-
Configure environment variables (automatically injected by platform):
DATABASE_URL- Database connection stringJWT_SECRET- Session signing secretVITE_APP_ID- OAuth application IDBUILT_IN_FORGE_API_KEY- API key for built-in servicesBUILT_IN_FORGE_API_URL- Built-in services URL
-
Push database schema:
pnpm db:push
-
Start development server:
pnpm dev
Option 1: From Templates (Recommended)
- Click "Templates" button in sidebar
- Choose a template that matches your use case
- Start chatting immediately with optimized settings
Option 2: Manual Creation
- Click "New Conversation" button in sidebar
- Enter a title for your conversation
- Configure AI settings via the settings menu in chat header
Voice Recording:
- Click the microphone button in the chat input
- Speak your message clearly
- Recording stops automatically after silence is detected
- Transcription appears in the input field
- Review and edit if needed, then send
Voice Call Mode:
- Open a conversation
- Click "Voice Call" button in chat header
- Start speaking naturally when ready
- AI listens and responds automatically with voice
- Turn detection handles conversation flow
- Mute/unmute with microphone button
- Click red phone icon to end call
TTS Playback:
- Click the speaker icon next to any AI message
- Audio plays automatically after generation
- Click again to pause/resume
- Configure voice provider and speed in Settings
Adding API Keys:
- Go to Settings (gear icon in sidebar)
- Navigate to "API Keys" tab
- Select provider from dropdown
- Enter your API key
- Optional: Add a label for identification
- Click "Add API Key"
Supported Providers:
- OpenRouter - Access 200+ models from various providers
- Mistral AI - Mistral models including fine-tuned versions
- Anthropic - Claude 3.5 Sonnet, Opus, Haiku
- ElevenLabs - High-quality TTS with multiple voices
- Deepgram - Advanced STT with custom models
- Hume AI - Emotion-aware AI and TTS
Per-Conversation Settings:
- Open a conversation
- Click the settings icon (βοΈ) in chat header
- Choose AI provider and model
- Adjust temperature (0 = deterministic, 2 = creative)
- Add custom system prompt (optional)
- Click "Save Settings"
- Open the conversation you want to export
- Click the download icon (β¬οΈ) in chat header
- Choose format:
- JSON - Full data including metadata, timestamps, token counts
- Markdown - Formatted for documentation with headers
- Text - Simple plain text transcript
Ctrl+EnterorCmd+Enter- Send messageEnter- Send message (in single-line mode)Esc- Close dialogs and modals
Frontend:
- React 19 with TypeScript
- Tailwind CSS 4 with custom design system
- shadcn/ui components
- tRPC for type-safe API calls
- Wouter for routing
- Streamdown for markdown rendering
- Web Audio API for voice processing
Backend:
- Express 4
- tRPC 11
- Drizzle ORM
- MySQL/TiDB database
- OpenAI API (GPT-4, Whisper)
- Multi-provider LLM integration
Voice Processing:
- MediaRecorder API for audio capture
- Web Audio API for analysis
- Voice Activity Detection (VAD)
- Silence detection algorithms
- Real-time audio streaming
- users - User accounts and authentication
- id, openId, name, email, role, timestamps
- conversations - Chat sessions with AI settings
- id, userId, title, llmProvider, llmModel, temperature, systemPrompt, timestamps
- messages - Individual messages with metadata
- id, conversationId, role, content, audioUrl, provider, model, tokenCount, timestamps
- userSettings - User preferences and defaults
- userId, silenceThreshold, vadSensitivity, ttsSpeed, autoPlayTts, defaultLlmProvider, defaultLlmModel, defaultSttProvider, defaultTtsProvider
- providerConfigs - API keys and provider settings
- id, userId, provider, apiKey, label, isActive, timestamps
- voiceProfiles - Voice customization settings
- id, userId, name, provider, voiceId, settings, timestamps
- usageTracking - Token and audio usage statistics
- id, userId, conversationId, provider, model, tokensUsed, audioMinutes, cost, timestamps
The application features a modern glassmorphism design with:
- Dark gradient background (purple to blue)
- Translucent panels with backdrop blur
- Smooth animations and micro-interactions
- Accessible color contrast for readability
- Mobile-first responsive layout
- Consistent spacing and typography
- Visual hierarchy with shadows and borders
- Encrypted API Keys - All provider credentials encrypted at rest
- Session-based Authentication with JWT
- No Data Sharing - Your data stays private
- User Data Isolation - Each user's data is separate
- HTTPS Enforced - All connections encrypted
- Input Validation - Protection against injection attacks
- Rate Limiting - Protection against abuse (future)
- Lazy Loading - Components loaded on demand
- Optimistic UI Updates - Instant feedback
- Efficient Re-rendering - React hooks optimization
- Database Query Optimization - Indexed queries
- Response Streaming - Real-time AI responses
- Audio Processing - Efficient VAD algorithms
- Caching Strategy - Reduced API calls (future)
- Chrome/Edge 90+
- Firefox 88+
- Safari 14+
- Mobile browsers with WebRTC support
This is a private project. For issues or feature requests, please contact the project owner.
Proprietary - All rights reserved
- OpenAI - GPT models and Whisper API
- Anthropic - Claude models
- Mistral AI - Mistral models
- ElevenLabs - Voice synthesis
- Deepgram - Speech recognition
- Hume AI - Emotion-aware AI
- Manus Platform - Infrastructure and deployment
- shadcn/ui - UI component library
Version: 2.0.0
Last Updated: November 2025
Built with β€οΈ using Manus AI