Release v1.0.0 -- Portable Multi-GPU Music Server · aivrar/portable-music-server

Portable Music Server v1.0.0

Portable multi-GPU music generation server for Windows. 8 AI models, gateway + worker architecture, 8-stage audio mastering pipeline, CLAP scoring, one-click install. No system Python, no Docker, no admin rights.

Setup

1. Download or clone this repository
2. Double-click install.bat
3. Install model environments from the GUI (Setup tab)
4. Download model weights (Setup tab → Download button)
5. Start the API server (API Server tab, or launcher.bat api)
6. Send requests to http://127.0.0.1:9150/api/music/{model}
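
As a rough sketch of step 6, the snippet below builds a request for the generation endpoint using only Python's standard library. The URL pattern comes from the steps above. The POST method, the JSON body, the field names (`prompt`, `duration`), and the model key `musicgen` are assumptions for illustration, not the server's documented schema.

```python
import json
import urllib.request

GATEWAY = "http://127.0.0.1:9150"  # gateway address from the setup steps


def build_generation_request(model: str, payload: dict) -> urllib.request.Request:
    """Build (but do not send) a JSON request for /api/music/{model}.

    The POST method and JSON body are assumptions; check the server's API docs.
    """
    url = f"{GATEWAY}/api/music/{model}"
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical model key and payload fields, for illustration only.
req = build_generation_request("musicgen", {"prompt": "warm lo-fi piano", "duration": 10})
print(req.full_url)

# To actually send it once the API server is running:
# with urllib.request.urlopen(req, timeout=600) as resp:
#     data = resp.read()
```

The request is only constructed here, so the snippet runs without a server; uncomment the last lines to send it.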

install.bat automatically downloads and configures embedded Python 3.10, portable Git, FFmpeg, eSpeak NG, and all gateway dependencies. Nothing is installed system-wide.

8 Music Models

Model	Key Capability	VRAM
ACE-Step v1.5	Lyrics-to-song, DiT + 5Hz LLM, 51 languages, CoT	<4 GB
ACE-Step v1	Original ACE-Step pipeline	~8 GB
HeartMuLa 3B	Lyrics-to-music with RL optimization	~16 GB
DiffRhythm	Diffusion-based full-song generation with lyrics	8 GB
YuE	Chain-of-thought lyrics-to-song, two-stage codec	24 GB+
MusicGen	Meta's text-to-music (AudioCraft), melody conditioning	8-16 GB
Riffusion	Stable Diffusion fine-tuned for spectrograms	6-8 GB
Stable Audio Open	Stability AI's latent diffusion for audio	~8 GB

Architecture

Gateway (port 9150) — orchestrates generation, delegates inference to workers via HTTP
Workers (ports 9151-9249) — each runs one model on one GPU as an isolated subprocess
CLAP Scorer (port 9250) — optional audio-text similarity scoring micro-service
Each worker injects only its venv's site-packages — zero cross-environment conflicts
Same model can run multiple instances across GPUs for concurrent inference
Workers auto-spawn on first request, fail over to siblings, health-checked every 10s
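
The worker behaviour described above (one port per worker in 9151-9249, several instances of a model across GPUs, failover to siblings) can be sketched roughly as follows. This is a simplified illustration, not the gateway's actual code: the `Worker` class, `next_free_port`, `dispatch`, and the `send` callback are all hypothetical names, and a real gateway would talk to workers over HTTP and run its health checks on a timer.

```python
from dataclasses import dataclass
from typing import Callable, List

WORKER_PORTS = range(9151, 9250)  # worker ports 9151-9249, as listed above


@dataclass
class Worker:
    model: str
    gpu: int
    port: int
    healthy: bool = True


def next_free_port(workers: List[Worker]) -> int:
    """Return the lowest worker port not yet in use."""
    used = {w.port for w in workers}
    for port in WORKER_PORTS:
        if port not in used:
            return port
    raise RuntimeError("no free worker port in 9151-9249")


def dispatch(model: str, workers: List[Worker], send: Callable[[Worker], str]) -> str:
    """Try each healthy worker for `model`; on failure, fall through to a sibling."""
    for worker in workers:
        if worker.model != model or not worker.healthy:
            continue
        try:
            return send(worker)
        except ConnectionError:
            worker.healthy = False  # skip this worker until it is re-checked
    raise RuntimeError(f"no healthy worker available for {model!r}")


# Two instances of the same (hypothetical) model key on different GPUs.
pool = [Worker("musicgen", gpu=0, port=9151), Worker("musicgen", gpu=1, port=9152)]


def fake_send(worker: Worker) -> str:
    """Stand-in for an HTTP call; the first worker is simulated as down."""
    if worker.port == 9151:
        raise ConnectionError("worker down")
    return f"handled on port {worker.port}"


result = dispatch("musicgen", pool, fake_send)
print(result)
```

In this sketch a failed worker is simply marked unhealthy; the periodic 10-second health check mentioned above would be the natural place to mark it healthy again.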

Audio Pipeline

8-stage post-processing mastering is applied to each generation: denoise (noisereduce), highpass filter (scipy), multiband compression (scipy), stereo widening (numpy), parametric EQ (scipy), silence trimming (pydub), LUFS normalization (pyloudnorm), peak limiting. 13 configurable parameters. Each stage degrades gracefully if its library is unavailable.
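
One way to picture the "degrades gracefully" behaviour is a stage list where each stage names the library it needs and is skipped when that library cannot be found. The sketch below is an assumption about the general pattern, not the project's implementation: the stage functions are pure-Python stand-ins, and `definitely_not_installed_denoiser` is a made-up module name used to trigger the skip path.

```python
import importlib.util
from typing import Callable, List, Optional, Tuple

Samples = List[float]
# (stage name, required module or None, processing function)
Stage = Tuple[str, Optional[str], Callable[[Samples], Samples]]


def trim_silence(samples: Samples, threshold: float = 0.01) -> Samples:
    """Drop leading and trailing samples quieter than `threshold`."""
    loud = [i for i, s in enumerate(samples) if abs(s) >= threshold]
    return samples[loud[0]: loud[-1] + 1] if loud else []


def peak_limit(samples: Samples, ceiling: float = 0.9) -> Samples:
    """Hard-clip every sample into the range [-ceiling, +ceiling]."""
    return [max(-ceiling, min(ceiling, s)) for s in samples]


def run_pipeline(samples: Samples, stages: List[Stage]) -> Tuple[Samples, List[str]]:
    """Run stages in order, skipping any whose required module is missing."""
    applied = []
    for name, required_module, fn in stages:
        if required_module and importlib.util.find_spec(required_module) is None:
            continue  # library unavailable: leave the audio untouched
        samples = fn(samples)
        applied.append(name)
    return samples, applied


stages: List[Stage] = [
    # Hypothetical missing dependency, so this stage is skipped.
    ("denoise", "definitely_not_installed_denoiser", lambda s: s),
    ("silence_trim", None, trim_silence),
    ("peak_limit", None, peak_limit),
]

audio = [0.0, 0.005, 0.5, 1.4, -1.2, 0.002]
mastered, applied = run_pipeline(audio, stages)
print(applied, mastered)
```

Skipping a stage rather than raising keeps a generation usable even when an optional dependency failed to install, at the cost of less processed output.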

Key Features

Multi-GPU — Pin workers to any NVIDIA GPU, run the same model on multiple GPUs simultaneously
35 REST API endpoints — Generation, worker management, install management, CLAP scoring, output library, GPU discovery
CLAP scoring — Audio-text similarity scoring via LAION CLAP to evaluate generation quality
Persistent output library — All generations auto-saved with metadata, browsable and exportable
Format export — WAV, MP3, OGG, FLAC via FFmpeg
GUI — 5-tab Tkinter interface (Setup, API Server, Testing, Outputs, Log) with right-click context menus
308 unit tests — All mocked, no GPU required to run
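
For the CLAP scoring feature listed above: CLAP maps audio and text into a shared embedding space, and a similarity score is typically the cosine similarity between the two embeddings. The sketch below shows only that final comparison on toy vectors. The embeddings are invented for illustration, and how the scorer service actually computes, scales, or returns its score is not specified here.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


# Toy 2-D "embeddings"; real CLAP embeddings have hundreds of dimensions.
audio_embedding = [1.0, 0.0]
matching_prompt = [2.0, 0.0]   # same direction as the audio
unrelated_prompt = [0.0, 3.0]  # orthogonal to the audio

score_match = cosine_similarity(audio_embedding, matching_prompt)
score_other = cosine_similarity(audio_embedding, unrelated_prompt)
print(score_match, score_other)
```

A higher score means the generated audio sits closer to the prompt in embedding space, which is what makes it usable as a rough quality signal.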

Requirements

Windows 10/11 (64-bit)
NVIDIA GPU with CUDA (any VRAM), or CPU-only
Internet connection (first run only, for downloading dependencies and model environments)
No admin rights needed
