Records microphone audio in the browser, sends it to a Python ML pipeline, and classifies it as correct, flat, sharp, or off_rhythm. Results are pushed back to the frontend over WebSocket.
- Frontend: React (Create React App, runs at project root)
- Backend: Node.js with ES modules, Express, FFmpeg
- ML: Python, Wav2Vec2 (HuggingFace transformers), librosa, scikit-learn
- User records audio in the browser via microphone
- Audio blob is POSTed to the Node.js backend
- FFmpeg converts it to WAV
- Python extracts features using Wav2Vec2 (facebook/wav2vec2-base)
- A trained classifier labels the audio as correct, flat, sharp, or off_rhythm
- Result is pushed to the frontend via WebSocket
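The feature-extraction and labeling steps above can be sketched roughly as follows. This is an illustration, not the project's actual ML/analyze.py: the function names are hypothetical, mean-pooling over time is an assumed way to get a fixed-size vector, and the classifier is assumed to be a fitted scikit-learn style estimator trained with the four string labels.

```python
"""Hypothetical sketch of the Wav2Vec2 feature step and the labeling step."""
from typing import Sequence

MODEL_NAME = "facebook/wav2vec2-base"  # model named in this README
SAMPLE_RATE = 16_000  # wav2vec2-base was pretrained on 16 kHz audio
LABELS = ("correct", "flat", "sharp", "off_rhythm")  # classes named in this README


def extract_embedding(wav_path: str):
    """Return one fixed-size vector for a WAV file (assumed: mean over time)."""
    # Heavy dependencies are imported here so importing this module stays cheap.
    import librosa
    import torch
    from transformers import Wav2Vec2FeatureExtractor, Wav2Vec2Model

    extractor = Wav2Vec2FeatureExtractor.from_pretrained(MODEL_NAME)
    model = Wav2Vec2Model.from_pretrained(MODEL_NAME)
    model.eval()

    # librosa resamples to SAMPLE_RATE and downmixes to mono while loading.
    waveform, _ = librosa.load(wav_path, sr=SAMPLE_RATE, mono=True)
    inputs = extractor(waveform, sampling_rate=SAMPLE_RATE, return_tensors="pt")

    with torch.no_grad():
        hidden = model(**inputs).last_hidden_state  # shape: (1, frames, 768)

    # Average over the frame axis to get a single 768-dimensional vector.
    return hidden.mean(dim=1).squeeze(0).numpy()


def classify(embedding: Sequence[float], classifier) -> str:
    """Label one embedding using any object with a scikit-learn style predict()."""
    # predict() expects a 2-D input: one row per sample.
    label = str(classifier.predict([embedding])[0])
    if label not in LABELS:
        raise ValueError(f"unexpected label: {label!r}")
    return label
```

With a fitted classifier in hand, `classify(extract_embedding("take.wav"), clf)` would return one of the four labels. The file name and `clf` are placeholders.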
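First run: Wav2Vec2 downloads its model weights on the first analysis (wav2vec2-base has about 95M parameters, roughly 380 MB in float32). Subsequent runs load the cached weights and skip the download.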
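To avoid paying the download cost during the first real analysis, the weights can be fetched ahead of time. The helper below is a hypothetical sketch (its name is not part of this project). It relies on `from_pretrained` storing files in the local Hugging Face cache, so later loads read from disk.

```python
MODEL_NAME = "facebook/wav2vec2-base"  # model named in this README


def warm_model_cache(model_name: str = MODEL_NAME) -> None:
    """Download the model files once so later loads come from the local cache."""
    # Imported lazily so defining this helper does not require transformers.
    from transformers import Wav2Vec2FeatureExtractor, Wav2Vec2Model

    # Each call downloads its files if missing, then reuses the cached copy.
    Wav2Vec2FeatureExtractor.from_pretrained(model_name)
    Wav2Vec2Model.from_pretrained(model_name)
```

Calling `warm_model_cache()` once inside the activated virtual environment, after the requirements are installed, should be enough.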
Create backend/.env before starting the server. All fields have defaults, but PORT and PYTHON_PATH should be set explicitly:
DATABASE_URL=your_neon_connection_string_here
PORT=4000
PYTHON_PATH=../.venv/Scripts/python.exe # Windows
# PYTHON_PATH=../.venv/bin/python # Mac/Linux
ML_SCRIPT_PATH=../ML/analyze.py
UPLOAD_DIR=./uploads
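A quick way to sanity-check the file before starting the server is to parse it and confirm the two recommended keys are present. The sketch below is a simplified, hypothetical checker: it assumes plain KEY=value lines with `#` comments as in the example above, does not handle quoted values, and is not the loader the backend itself uses.

```python
from typing import Dict, List

# The two keys this README recommends setting explicitly.
RECOMMENDED_KEYS = ("PORT", "PYTHON_PATH")


def parse_env(text: str) -> Dict[str, str]:
    """Parse simple KEY=value lines, skipping blanks and comment lines."""
    values: Dict[str, str] = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop a trailing inline comment such as " # Windows".
        value = value.split(" #", 1)[0].strip()
        values[key.strip()] = value
    return values


def missing_recommended(values: Dict[str, str]) -> List[str]:
    """Return the recommended keys that are absent or empty."""
    return [key for key in RECOMMENDED_KEYS if not values.get(key)]
```

For example, `missing_recommended(parse_env(open("backend/.env").read()))` returns an empty list when both PORT and PYTHON_PATH are set.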
python -m venv .venv
# Windows
.venv\Scripts\activate
# Mac/Linux
source .venv/bin/activate
pip install -r ML/requirements.txt
cd backend
npm install
npm run dev
npm run dev uses Node's built-in --watch flag, so nodemon is not needed.
Use npm start for production.
npm install
npm start
Runs on http://localhost:3000. Backend must be running on port 4000.