Portable Music Server v1.0.0
Portable multi-GPU music generation server for Windows. 8 AI models, gateway + worker architecture, 8-stage audio mastering pipeline, CLAP scoring, one-click install. No system Python, no Docker, no admin rights.
Setup
1. Download or clone this repository
2. Double-click install.bat
3. Install model environments from the GUI (Setup tab)
4. Download model weights (Setup tab → Download button)
5. Start the API server (API Server tab, or launcher.bat api)
6. Send requests to http://127.0.0.1:9150/api/music/{model}
install.bat automatically downloads and configures embedded Python 3.10, portable Git, FFmpeg, eSpeak NG, and all gateway dependencies. Nothing is installed system-wide.
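Step 6 can be exercised with a few lines of standard-library Python. The sketch below is a minimal client, not the project's official one: the gateway address and the `/api/music/{model}` path come from the steps above, while the use of POST with a JSON body, the `musicgen` model slug, and the `prompt` / `duration` fields are assumptions made for illustration. Check the gateway's own API documentation for the real request schema.

```python
import json
import urllib.request

GATEWAY = "http://127.0.0.1:9150"  # gateway address from the setup steps


def build_request(model: str, payload: dict) -> urllib.request.Request:
    """Build a request for /api/music/{model}.

    Assumption: the endpoint accepts POST with a JSON body.
    """
    url = f"{GATEWAY}/api/music/{model}"
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(model: str, payload: dict, timeout: float = 600.0) -> bytes:
    """Send the request to a running gateway and return the raw response body."""
    request = build_request(model, payload)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read()


# Example call (requires a running server; slug and fields are hypothetical):
# generate("musicgen", {"prompt": "warm lo-fi piano", "duration": 10})
```

The generous default timeout is a guess that reflects how long generation can take. Tune it to the model you are calling.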
8 Music Models
| Model | Key Capability | VRAM |
|---|---|---|
| ACE-Step v1.5 | Lyrics-to-song, DiT + 5Hz LLM, 51 languages, CoT | <4 GB |
| ACE-Step v1 | Original ACE-Step pipeline | ~8 GB |
| HeartMuLa 3B | Lyrics-to-music with RL optimization | ~16 GB |
| DiffRhythm | Diffusion-based full-song generation with lyrics | 8 GB |
| YuE | Chain-of-thought lyrics-to-song, two-stage codec | 24 GB+ |
| MusicGen | Meta's text-to-music (AudioCraft), melody conditioning | 8-16 GB |
| Riffusion | Stable Diffusion fine-tuned for spectrograms | 6-8 GB |
| Stable Audio Open | Stability AI's latent diffusion for audio | ~8 GB |
Architecture
- Gateway (port 9150) — orchestrates generation, delegates inference to workers via HTTP
- Workers (ports 9151-9249) — each runs one model on one GPU as an isolated subprocess
- CLAP Scorer (port 9250) — optional audio-text similarity scoring micro-service
- Each worker injects only its venv's site-packages — zero cross-environment conflicts
- Same model can run multiple instances across GPUs for concurrent inference
- Workers auto-spawn on first request, fail over to sibling instances, and are health-checked every 10 s
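The auto-spawn and failover behavior described above can be sketched as a small in-memory registry. This is an illustration only, not the gateway's actual code: `Worker`, `WorkerPool`, the lowest-free-port policy, and the "first healthy sibling" selection rule are all hypothetical. Only the port range 9151-9249 is taken from the list above. A real gateway would launch a subprocess in `spawn` and update `healthy` from its periodic health check.

```python
from dataclasses import dataclass, field

WORKER_PORTS = range(9151, 9250)  # 9151-9249 inclusive


@dataclass
class Worker:
    model: str
    gpu: int
    port: int
    healthy: bool = True  # would be updated by the periodic health check


@dataclass
class WorkerPool:
    workers: list = field(default_factory=list)

    def spawn(self, model: str, gpu: int) -> Worker:
        """Register a worker on the lowest free port in the worker range."""
        used = {w.port for w in self.workers}
        port = next((p for p in WORKER_PORTS if p not in used), None)
        if port is None:
            raise RuntimeError("no free worker ports")
        worker = Worker(model, gpu, port)
        self.workers.append(worker)
        return worker

    def pick(self, model: str, default_gpu: int = 0) -> Worker:
        """Return a healthy worker for the model, spawning one on first use."""
        siblings = [w for w in self.workers if w.model == model]
        if not siblings:
            return self.spawn(model, default_gpu)  # auto-spawn on first request
        for worker in siblings:
            if worker.healthy:
                return worker  # fail over to the first healthy sibling
        raise RuntimeError(f"no healthy worker for {model}")
```

Calling `pool.spawn("musicgen", gpu=1)` a second time for the same model registers a sibling on another GPU, which is what lets `pick` route around an unhealthy instance.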
Audio Pipeline
Every generation passes through an 8-stage mastering chain, in this order:
1. Denoise (noisereduce)
2. Highpass filter (scipy)
3. Multiband compression (scipy)
4. Stereo widening (numpy)
5. Parametric EQ (scipy)
6. Silence trimming (pydub)
7. LUFS normalization (pyloudnorm)
8. Peak limiting

The chain exposes 13 configurable parameters. Each stage degrades gracefully if its library is unavailable.
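One way to get the graceful degradation described above is to wrap each stage so that it passes audio through unchanged when its library cannot be imported. The sketch below shows that pattern in pure Python. It is not the project's pipeline: `make_stage`, `run_pipeline`, and the hard-clipping `peak_limit` are hypothetical stand-ins, and samples are modeled as a plain list of floats in the range -1.0 to 1.0.

```python
import importlib


def peak_limit(samples, ceiling=0.98):
    """Hard-clip samples to +/- ceiling (a crude stand-in for a real limiter)."""
    return [max(-ceiling, min(ceiling, s)) for s in samples]


def make_stage(name, module_name, func):
    """Wrap func so the stage is skipped when module_name cannot be imported.

    Pass module_name=None for stages that need no third-party library.
    """
    def stage(samples):
        if module_name is not None:
            try:
                importlib.import_module(module_name)
            except ImportError:
                # Library missing: pass the audio through untouched.
                return samples, f"{name}: skipped ({module_name} missing)"
        return func(samples), f"{name}: applied"
    return stage


def run_pipeline(samples, stages):
    """Run stages in order and collect a log line for each one."""
    log = []
    for stage in stages:
        samples, message = stage(samples)
        log.append(message)
    return samples, log
```

Returning a log line per stage makes it easy to see which stages were actually applied on a given machine.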
Key Features
- Multi-GPU — Pin workers to any NVIDIA GPU, run the same model on multiple GPUs simultaneously
- 35 REST API endpoints — Generation, worker management, install management, CLAP scoring, output library, GPU discovery
- CLAP scoring — Audio-text similarity scoring via LAION CLAP to evaluate generation quality
- Persistent output library — All generations auto-saved with metadata, browsable and exportable
- Format export — WAV, MP3, OGG, FLAC via FFmpeg
- GUI — 5-tab Tkinter interface (Setup, API Server, Testing, Outputs, Log) with right-click context menus
- 308 unit tests — All mocked, no GPU required to run
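For the format export feature, a conversion through FFmpeg can be as small as one command. The helper below is a sketch under assumptions: `ffmpeg_command` and `export` are hypothetical names, the output file is placed next to the input, and FFmpeg is left to choose its default encoder for each container. The server's real export code may set bitrates or codecs explicitly.

```python
import subprocess
from pathlib import Path

FORMATS = {"wav", "mp3", "ogg", "flac"}  # formats listed under Key Features


def ffmpeg_command(src: str, fmt: str, ffmpeg: str = "ffmpeg") -> list:
    """Build an FFmpeg command converting src to the given format."""
    fmt = fmt.lower()
    if fmt not in FORMATS:
        raise ValueError(f"unsupported format: {fmt}")
    dst = str(Path(src).with_suffix("." + fmt))
    if dst == src:
        raise ValueError("source is already in the requested format")
    # -y overwrites an existing output file; -i names the input file.
    return [ffmpeg, "-y", "-i", src, dst]


def export(src: str, fmt: str, ffmpeg: str = "ffmpeg") -> str:
    """Run the conversion and return the output path."""
    command = ffmpeg_command(src, fmt, ffmpeg)
    subprocess.run(command, check=True)
    return command[-1]
```

In a portable install the `ffmpeg` argument would point at the bundled executable rather than relying on `PATH`.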
Requirements
- Windows 10/11 (64-bit)
- NVIDIA GPU with CUDA (any VRAM size; individual models need the amounts listed in the model table), or CPU-only
- Internet connection (first run only, for downloading dependencies and model environments)
- No admin rights needed