A vibe-coded Mix Assistant AI that provides "expert" audio engineering feedback:
- React (Vite) frontend: Upload audio, waveform playback with region selection (WaveSurfer), chat-based consultation
- FastAPI backend: Trims selected audio regions, generates Mel spectrograms, and provides AI-powered mixing advice
- Multi-model support: Choose between Gemini (with spectrogram analysis) or OpenAI GPT Audio models
- 🎵 Audio Analysis: Upload WAV, MP3, or FLAC files and select specific regions for analysis
- 📊 Spectrogram Generation: Visual frequency analysis to identify issues
- 🤖 AI Consultation: Get professional mixing and mastering advice from AI models
- 💬 Chat Interface: Follow-up conversations to drill deeper into specific issues
- 🎛️ Multiple Models: Support for Gemini 3, Gemini 2, and OpenAI GPT Audio models
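The spectrogram step above works on the Mel scale, which spaces frequency bins to match perceived pitch rather than raw Hz. A minimal sketch of the standard Hz↔Mel conversion (the O'Shaughnessy formula; the function names are illustrative, not from this repo):

```python
import math

def hz_to_mel(hz: float) -> float:
    """Convert a frequency in Hz to the Mel scale."""
    return 2595.0 * math.log10(1.0 + hz / 700.0)

def mel_to_hz(mel: float) -> float:
    """Inverse conversion: Mel back to Hz."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)

# Equal steps in Mel are perceptually even but increasingly wide in Hz,
# which is why a Mel spectrogram gives finer resolution to low-end mix issues.
```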
```bash
cd backend
python -m venv .venv
# Windows:
.venv\Scripts\activate
# macOS/Linux:
# source .venv/bin/activate
pip install -r requirements.txt
```

Copy the example environment file and configure your API keys:
```bash
# Windows:
copy .env.example .env
# macOS/Linux:
cp .env.example .env
```

Then edit `.env` with your API keys:
```env
# Required for Gemini models (gemini-3-pro, gemini-3-flash, gemini-2.0, etc.)
GEMINI_API_KEY=your_gemini_api_key_here

# Required for OpenAI GPT Audio model
OPENAI_API_KEY=your_openai_api_key_here

# Optional: Only needed if FFmpeg is not in your system PATH
FFMPEG_PATH=
```

Note: You only need to configure the API key(s) for the model(s) you plan to use.
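On the backend, keys like these are typically read from the environment at startup. A hedged sketch (the variable names match the `.env` file above; the helper itself is illustrative, not the app's actual code):

```python
import os

def load_model_keys() -> dict:
    """Read optional API keys from the environment. Missing keys stay None
    so the app can disable the corresponding models instead of crashing."""
    return {
        "gemini": os.getenv("GEMINI_API_KEY"),
        "openai": os.getenv("OPENAI_API_KEY"),
        "ffmpeg_path": os.getenv("FFMPEG_PATH") or None,
    }

# Only models whose key is present get offered to the frontend
keys = load_model_keys()
available = [name for name in ("gemini", "openai") if keys[name]]
```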
FFmpeg is required for audio processing. Install it based on your OS:
Windows (WinGet):
```bash
winget install Gyan.FFmpeg
```

macOS (Homebrew):
```bash
brew install ffmpeg
```

Linux (apt):
```bash
sudo apt install ffmpeg
```

If FFmpeg is installed but the app can't find it, set the `FFMPEG_PATH` variable in your `.env` file to point to the directory containing `ffmpeg.exe` (Windows) or the `ffmpeg` binary.
```bash
uvicorn app:app --reload --port 8000
```

Health check: Open http://localhost:8000/health
```bash
cd frontend
npm install
npm run dev
```

Open http://localhost:5173
- Upload Audio: Select a WAV, MP3, or FLAC file
- Select Region: Click and drag on the waveform to select the section you want analyzed
- Choose Model: Select from available Gemini or OpenAI models
- Enter Prompt: Describe what you want feedback on (e.g., "Check the overall frequency balance")
- Start Analysis: Click the button to get AI-powered mixing advice
- Follow Up: Use the chat to ask follow-up questions about specific issues
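Under the hood, "Select Region" boils down to mapping the selected start/end times onto sample indices before trimming. A minimal sketch with plain Python lists (the real backend works on decoded audio; this helper is illustrative):

```python
def trim_region(samples, sample_rate, start_s, end_s):
    """Return the slice of samples between start_s and end_s (in seconds),
    clamped to the bounds of the audio."""
    start = max(0, int(start_s * sample_rate))
    end = min(len(samples), int(end_s * sample_rate))
    return samples[start:end]

# e.g. at 8 samples/second, the region 0.5s-1.0s covers sample indices 4..7
clip = trim_region(list(range(16)), 8, 0.5, 1.0)  # -> [4, 5, 6, 7]
```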
| Model | Spectrogram | Thinking Mode | Best For |
|---|---|---|---|
| Gemini 3 Pro | ✅ | ✅ | Deep analysis with visual + audio |
| Gemini 3 Flash | ✅ | ✅ | Fast analysis with visual + audio |
| Gemini 2.0 Thinking | ✅ | ✅ | Complex problem-solving |
| Gemini 2.0 Flash | ✅ | ❌ | Quick responses |
| GPT Audio | ❌ | ❌ | Audio-only analysis |
- "Preview" generates only the spectrogram (no AI call) so you can see the visual first
- "Start Analysis" trims audio + generates spectrogram + sends to AI for comprehensive feedback
- Gemini models receive both the audio file and spectrogram image for analysis
- GPT Audio model receives only the audio (does not support image input)
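The model-specific behavior above (Gemini gets audio plus the spectrogram image; GPT Audio gets audio only) amounts to a small capability check when assembling the request. A hedged sketch with made-up field names, not the app's actual request schema:

```python
# Which models accept an image part, per the capability table above
SUPPORTS_SPECTROGRAM = {
    "gemini-3-pro": True,
    "gemini-3-flash": True,
    "gemini-2.0-thinking": True,
    "gemini-2.0-flash": True,
    "gpt-audio": False,
}

def build_payload(model: str, audio: bytes, spectrogram_png: bytes, prompt: str) -> dict:
    """Attach the spectrogram image only for models that can consume it."""
    parts = [
        {"type": "text", "text": prompt},
        {"type": "audio", "data": audio},
    ]
    if SUPPORTS_SPECTROGRAM.get(model, False):
        parts.append({"type": "image", "data": spectrogram_png})
    return {"model": model, "parts": parts}
```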