Train a small audio classification CNN, serve it through Modal, and inspect predictions, waveforms, spectrograms, and convolutional feature maps from a minimal Next.js app.
Audion is an ML-first audio classification project. The repository keeps the model pipeline and the web interface separate so the local and Modal inference paths stay simple before more UI is added.
The ML side downloads FSD50K metadata from Hugging Face, prepares deterministic manifests, converts audio to fixed-size log-mel spectrograms, trains a residual CNN on Modal, and exposes a Modal FastAPI endpoint for inference.
The web app uploads one .wav file at a time, forwards it to the Modal endpoint through a server route, and renders the top predictions along with the waveform, input spectrogram, and selected convolutional activations.
```
audion/
  ml/   Python ML pipeline, Modal training, and the Modal inference endpoint
  app/  Next.js app and API bridge for running inference from the browser
```
- Python 3.11
- Node.js 20 or newer
- pnpm
- Modal account and CLI authentication
- Hugging Face access for the `Fhrozen/FSD50k` dataset
```
cd ml
python -m venv .venv
.venv\Scripts\activate
pip install -r requirements.txt
```

```
cd app
pnpm install
```

Run these from the `ml/` directory:
```
python bootstrap.py
python dataset.py
python subsets.py
python preprocess.py
python model.py
python overfit.py
python -m modal run train.py
python -m modal deploy infer.py
```

The scripts are intentionally standalone:

- `bootstrap.py` prepares the local directory layout.
- `dataset.py` downloads FSD50K label metadata and writes `artifacts/labels.json`.
- `subsets.py` creates deterministic tiny and full manifests.
- `preprocess.py` converts audio to mono 16 kHz, 4-second log-mel inputs.
- `model.py` defines `AudionCNN` and saves an initial checkpoint.
- `overfit.py` runs a local tiny overfit check.
- `train.py` trains on Modal and stores artifacts in Modal volumes.
- `infer.py` deploys a Modal FastAPI endpoint for `.wav` inference.
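As a sketch of the fixed-size input step, the snippet below assumes only the 16 kHz / 4-second figures stated above; the helper name and zero-padding strategy are illustrative, not necessarily what `preprocess.py` does (the log-mel transform itself would follow, e.g. via `torchaudio.transforms.MelSpectrogram`):

```python
import numpy as np

SAMPLE_RATE = 16_000            # mono 16 kHz, per the pipeline description
CLIP_SECONDS = 4                # fixed 4-second window
CLIP_SAMPLES = SAMPLE_RATE * CLIP_SECONDS

def fix_length(waveform: np.ndarray) -> np.ndarray:
    """Truncate long clips and zero-pad short ones to exactly 4 s,
    so every spectrogram downstream has the same shape."""
    if len(waveform) >= CLIP_SAMPLES:
        return waveform[:CLIP_SAMPLES]
    out = np.zeros(CLIP_SAMPLES, dtype=waveform.dtype)
    out[: len(waveform)] = waveform
    return out
```

Forcing a deterministic input length is what lets the CNN use fixed-size convolutional feature maps end to end.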
Create `app/.env.local`, then run:

```
cd app
pnpm dev
```

Open http://localhost:3000 and upload one `.wav` file. The app sends the file to `/api/infer`, which proxies the request to the configured Modal endpoint.
Useful web commands:

```
pnpm dev
pnpm build
pnpm lint
```

Add these to `app/.env.local`:

```
AUDION_MODAL_INFER_URL=
AUDION_MODAL_KEY=
AUDION_MODAL_SECRET=
```

`AUDION_MODAL_INFER_URL` is required. `AUDION_MODAL_KEY` and `AUDION_MODAL_SECRET` are used when the Modal endpoint requires proxy authentication.
- Python
- PyTorch and torchaudio
- Hugging Face Hub
- FSD50K
- Modal
- FastAPI
- Next.js
- React
- TypeScript
- Tailwind CSS