v1.0.0 -- Portable Multi-GPU Music Server

@aivrar aivrar released this 17 Feb 01:05

Portable Music Server v1.0.0

Portable multi-GPU music generation server for Windows. 8 AI models, gateway + worker architecture, 8-stage audio mastering pipeline, CLAP scoring, one-click install. No system Python, no Docker, no admin rights.

Setup

1. Download or clone this repository
2. Double-click install.bat
3. Install model environments from the GUI (Setup tab)
4. Download model weights (Setup tab → Download button)
5. Start the API server (API Server tab, or launcher.bat api)
6. Send requests to http://127.0.0.1:9150/api/music/{model}
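A request to the endpoint in step 6 might be built as follows. This is a minimal sketch: the gateway address and the /api/music/{model} path come from the steps above, but the HTTP method (POST), the JSON body, the "prompt" field, and the "musicgen" model key are assumptions for illustration, not the documented schema.

```python
import json
import urllib.request

GATEWAY = "http://127.0.0.1:9150"  # gateway address from step 6


def build_generation_request(model: str, payload: dict) -> urllib.request.Request:
    """Build (but do not send) a JSON request for /api/music/{model}.

    Assumption: the gateway accepts a JSON POST body. The real schema
    is not given in these notes.
    """
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url=f"{GATEWAY}/api/music/{model}",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request: urllib.request.Request, timeout: float = 600.0) -> bytes:
    """Send the request and return the raw response body."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read()
```

With the API server running, `send(build_generation_request("musicgen", {"prompt": "lo-fi piano"}))` would issue the call. Both the model key and the field name are placeholders, so check the server's own endpoint documentation for the actual parameters.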

install.bat automatically downloads and configures: embedded Python 3.10, portable Git, FFmpeg, eSpeak NG, and all gateway dependencies. Nothing touches your system.

8 Music Models

Model | Key Capability | VRAM
--- | --- | ---
ACE-Step v1.5 | Lyrics-to-song, DiT + 5Hz LLM, 51 languages, CoT | <4 GB
ACE-Step v1 | Original ACE-Step pipeline | ~8 GB
HeartMuLa 3B | Lyrics-to-music with RL optimization | ~16 GB
DiffRhythm | Diffusion-based full-song generation with lyrics | 8 GB
YuE | Chain-of-thought lyrics-to-song, two-stage codec | 24 GB+
MusicGen | Meta's text-to-music (AudioCraft), melody conditioning | 8-16 GB
Riffusion | Stable Diffusion fine-tuned for spectrograms | 6-8 GB
Stable Audio Open | Stability AI's latent diffusion for audio | ~8 GB

Architecture

  • Gateway (port 9150) — orchestrates generation, delegates inference to workers via HTTP
  • Workers (ports 9151-9249) — each runs one model on one GPU as an isolated subprocess
  • CLAP Scorer (port 9250) — optional audio-text similarity scoring micro-service
  • Each worker injects only its venv's site-packages — zero cross-environment conflicts
  • Same model can run multiple instances across GPUs for concurrent inference
  • Workers auto-spawn on first request, fail over to siblings, health-checked every 10s
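The routing behaviour described in this list (spawn on first request, one port per worker, fail over to a healthy sibling) can be sketched roughly as below. Only the port range is taken from the text. The class names, the `healthy` flag, and the choice to raise an error when every sibling is down are hypothetical, and the periodic 10-second health check that would update `healthy` is left out.

```python
from dataclasses import dataclass, field

WORKER_PORTS = range(9151, 9250)  # 9151-9249 inclusive, per the list above


@dataclass
class Worker:
    model: str
    gpu: int
    port: int
    healthy: bool = True  # would be refreshed by a periodic health check


@dataclass
class WorkerPool:
    workers: list = field(default_factory=list)

    def spawn(self, model: str, gpu: int) -> Worker:
        """Register a worker for `model` on `gpu` using the first free port."""
        used = {w.port for w in self.workers}
        port = next((p for p in WORKER_PORTS if p not in used), None)
        if port is None:
            raise RuntimeError("no free worker port in 9151-9249")
        worker = Worker(model, gpu, port)
        self.workers.append(worker)
        return worker

    def route(self, model: str, default_gpu: int = 0) -> Worker:
        """Pick a worker for a request, spawning one on first use."""
        siblings = [w for w in self.workers if w.model == model]
        if not siblings:
            return self.spawn(model, default_gpu)  # auto-spawn on first request
        for worker in siblings:
            if worker.healthy:
                return worker  # first healthy sibling wins
        raise RuntimeError(f"no healthy worker for {model!r}")
```

Running the same model on a second GPU is then `pool.spawn("musicgen", gpu=1)`. If the first instance is marked unhealthy, `route` returns the sibling.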

Audio Pipeline

An 8-stage post-processing mastering chain is applied to every generation: denoise (noisereduce), highpass filter (scipy), multiband compression (scipy), stereo widening (numpy), parametric EQ (scipy), silence trimming (pydub), LUFS normalization (pyloudnorm), and peak limiting. 13 parameters are configurable. Each stage degrades gracefully if its library is unavailable.
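One way to read "degrades gracefully" is that a stage is skipped when its library cannot be found, and the rest of the chain still runs. The sketch below illustrates that idea only. The `(name, required_module, function)` stage layout, the function names, and the hard-clip limiter are assumptions for illustration, not the project's actual code.

```python
from importlib.util import find_spec


def peak_limit(samples, ceiling=0.98):
    """Hard-clip samples to +/- ceiling (a crude stand-in for a real limiter)."""
    return [max(-ceiling, min(ceiling, s)) for s in samples]


def run_pipeline(samples, stages):
    """Run each stage in order, skipping stages whose library is missing.

    `stages` is a list of (name, required_module_or_None, function) tuples.
    Returns the processed samples plus the names of applied and skipped stages.
    """
    applied, skipped = [], []
    for name, module, fn in stages:
        if module is not None and find_spec(module) is None:
            skipped.append(name)  # library unavailable: pass audio through
            continue
        samples = fn(samples)
        applied.append(name)
    return samples, applied, skipped
```

A stage entry such as `("denoise", "noisereduce", some_denoise_fn)` would run only when `noisereduce` is importable. A pure-Python stage like `("peak_limit", None, peak_limit)` always runs.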

Key Features

  • Multi-GPU — Pin workers to any NVIDIA GPU, run the same model on multiple GPUs simultaneously
  • 35 REST API endpoints — Generation, worker management, install management, CLAP scoring, output library, GPU discovery
  • CLAP scoring — Audio-text similarity scoring via LAION CLAP to evaluate generation quality
  • Persistent output library — All generations auto-saved with metadata, browsable and exportable
  • Format export — WAV, MP3, OGG, FLAC via FFmpeg
  • GUI — 5-tab Tkinter interface (Setup, API Server, Testing, Outputs, Log) with right-click context menus
  • 308 unit tests — All mocked, no GPU required to run
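For the CLAP scoring feature, audio-text similarity is commonly computed as the cosine similarity between an audio embedding and a text embedding. The sketch below shows only that final comparison on plain Python lists. It assumes the embeddings have already been produced by a CLAP model, and it is not necessarily the exact formula the scorer service uses.

```python
import math


def cosine_similarity(audio_embedding, text_embedding):
    """Cosine similarity between two equal-length vectors, in [-1, 1]."""
    if len(audio_embedding) != len(text_embedding):
        raise ValueError("embeddings must have the same length")
    dot = sum(a * t for a, t in zip(audio_embedding, text_embedding))
    norm_a = math.sqrt(sum(a * a for a in audio_embedding))
    norm_t = math.sqrt(sum(t * t for t in text_embedding))
    if norm_a == 0.0 or norm_t == 0.0:
        return 0.0  # treat an all-zero embedding as "no similarity"
    return dot / (norm_a * norm_t)
```

A higher score means the generated audio sits closer to the prompt text in the shared embedding space, which is what makes the value usable for ranking several generations of the same prompt.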

Requirements

  • Windows 10/11 (64-bit)
  • NVIDIA GPU with CUDA (any VRAM), or CPU-only
  • Internet connection (first run only, for downloading dependencies and model environments)
  • No admin rights needed