# 🎵 YuE Music Generation with SGLang

This notebook runs YuE music generation using SGLang for faster inference, with native classifier-free guidance (CFG) support.
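For readers new to CFG, here is how it is usually defined for autoregressive models. The model runs two passes, one conditioned on the prompt and one unconditioned. The next-token logits are then extrapolated toward the conditioned prediction. The sketch below shows only that standard combination on plain Python lists. The function name, the example logits, and the `guidance_scale` value are illustrative assumptions, not code from `infer_sglang.py`.

```python
def cfg_logits(cond: list[float], uncond: list[float], guidance_scale: float) -> list[float]:
    """Standard classifier-free guidance: uncond + scale * (cond - uncond), element-wise.

    A scale of 1.0 returns the conditioned logits unchanged; larger values push
    the prediction further toward the prompt-conditioned pass.
    """
    if len(cond) != len(uncond):
        raise ValueError("logit vectors must have the same length")
    return [u + guidance_scale * (c - u) for c, u in zip(cond, uncond)]

# Hypothetical logits over a 3-token vocabulary
print(cfg_logits([2.0, 0.5, -1.0], [1.0, 1.0, 0.0], guidance_scale=1.5))
# → [2.5, 0.25, -1.5]
```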

**Requirements:**
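- A100 GPU (recommended) or V100; note that a V100 has at most 32 GB of memory and is not supported by FlashAttention-2 (`flash-attn`)
- ~40 GB of GPU memory for Stage 1 (7B model)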

## 1️⃣ Set Up the Environment

In [None]:
# Check GPU
!nvidia-smi
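`nvidia-smi` only prints the GPU status. The sketch below compares each GPU's total memory against the ~40 GB Stage 1 estimate from the Requirements list instead. It treats that estimate as a hard threshold, which is an assumption. It uses `nvidia-smi`'s `--query-gpu=memory.total --format=csv,noheader,nounits` mode, which prints one MiB value per GPU.

```python
import shutil
import subprocess

# Assumption: the ~40 GB Stage 1 estimate is used as a hard threshold.
REQUIRED_GB = 40.0

def parse_memory_mib(csv_output: str) -> list[int]:
    """Parse one total-memory value (MiB) per line, as printed by nvidia-smi."""
    lines = (line.strip() for line in csv_output.splitlines())
    return [int(line) for line in lines if line]

def has_enough_memory(mib_values: list[int], required_gb: float = REQUIRED_GB) -> bool:
    # Convert MiB to decimal GB (1 MiB = 1,048,576 bytes; 1 GB = 10**9 bytes).
    return any(mib * 1024 * 1024 / 1e9 >= required_gb for mib in mib_values)

if shutil.which("nvidia-smi"):
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=memory.total", "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    )
    mem = parse_memory_mib(result.stdout)
    print(f"Total memory per GPU (MiB): {mem}")
    if not has_enough_memory(mem):
        print(f"Warning: no GPU reports >= {REQUIRED_GB:.0f} GB; Stage 1 may run out of memory.")
else:
    print("nvidia-smi not found: no NVIDIA GPU/driver visible in this runtime.")
```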

In [None]:
# Clone the repository
!git clone https://github.com/AdamZinebii/yue3.git
%cd yue3

In [None]:
# Install dependencies
!pip install -q torch torchaudio einops omegaconf soundfile tqdm transformers
!pip install -q "sglang[all]"
!pip install -q flash-attn --no-build-isolation
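To confirm the installs succeeded before moving on, you can check that each package is importable. The sketch below is a minimal check under one assumption: a package counts as installed if `importlib.util.find_spec` can locate it. It does not import the modules or verify CUDA compatibility. Note that the pip package `flash-attn` is imported as `flash_attn`.

```python
import importlib.util

# Import names for the pip packages installed above (flash-attn imports as flash_attn).
REQUIRED_MODULES = [
    "torch", "torchaudio", "einops", "omegaconf", "soundfile",
    "tqdm", "transformers", "sglang", "flash_attn",
]

def missing_modules(names: list[str]) -> list[str]:
    """Return the top-level module names that cannot be found on the import path."""
    return [name for name in names if importlib.util.find_spec(name) is None]

missing = missing_modules(REQUIRED_MODULES)
print("All packages found." if not missing else f"Missing: {missing}")
```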

In [None]:
# Download xcodec model weights (if not included in repo)
# Uncomment if needed:
# !wget -P inference/xcodec_mini_infer/final_ckpt/ <xcodec_checkpoint_url>
# !wget -P inference/xcodec_mini_infer/decoders/ <decoder_weights_url>
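To decide whether the downloads above are needed, you can check the two target directories from the `wget` commands. The sketch below treats "the directory exists and contains at least one file" as a proxy for "weights present". That is an assumption: it does not validate the checkpoint files themselves. The paths are relative to the repository root (`yue3`).

```python
from pathlib import Path

# Checkpoint directories taken from the wget targets above, relative to the repo root.
CKPT_DIRS = [
    "inference/xcodec_mini_infer/final_ckpt",
    "inference/xcodec_mini_infer/decoders",
]

def empty_or_missing(dirs: list[str], root: str = ".") -> list[str]:
    """Return the directories that do not exist or contain no files (searched recursively)."""
    bad = []
    for d in dirs:
        path = Path(root) / d
        if not path.is_dir() or not any(p.is_file() for p in path.rglob("*")):
            bad.append(d)
    return bad

problems = empty_or_missing(CKPT_DIRS)
print("xcodec weights present." if not problems else f"Missing or empty: {problems}")
```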

## 2️⃣ Prepare Input Files

In [None]:
# Create genre file
genre_content = """pop, emotional, female vocals, melodic, upbeat"""

with open("genre.txt", "w") as f:
    f.write(genre_content)

print("Genre tags:")
print(genre_content)
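If you build the genre tags programmatically, a small helper can write them in the same one-line, comma-separated format shown above. The sketch below is hypothetical. It strips whitespace, drops empty tags, and removes case-insensitive duplicates. That cleanup is an added convenience, not a documented YuE requirement.

```python
from pathlib import Path

def write_genre_file(tags: list[str], path: str = "genre.txt") -> str:
    """Write tags as one comma-separated line and return the text written.

    Tags are stripped, empty tags are dropped, and duplicates are removed
    case-insensitively while keeping the first occurrence's order.
    """
    seen = set()
    cleaned = []
    for tag in tags:
        tag = tag.strip()
        key = tag.lower()
        if tag and key not in seen:
            seen.add(key)
            cleaned.append(tag)
    text = ", ".join(cleaned)
    Path(path).write_text(text)
    return text
```

For example, `write_genre_file(["pop", "emotional", " female vocals", "Pop", "melodic", "upbeat"])` writes the same line as the cell above.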

In [None]:
# Create lyrics file
lyrics_content = """[verse]
Walking through the city lights
Dreaming of a brighter day
Every star that shines tonight
Guides me on my way

[chorus]
We can fly so high
Touch the endless sky
Nothing's gonna stop us now
We'll find a way somehow
"""

with open("lyrics.txt", "w") as f:
    f.write(lyrics_content)

print("Lyrics:")
print(lyrics_content)
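The lyrics above are organized into labeled sections (`[verse]`, `[chorus]`). The sketch below is a hypothetical helper that splits lyrics at each `[label]` header. It lets you count sections before choosing a value for `--run_n_segments`. The assumption that one labeled section maps to one generation segment is ours; check `inference/infer_sglang.py` to confirm how the script actually segments lyrics.

```python
import re

# Hypothetical helper: match a "[label]" header line, then lazily capture text
# up to the next "[label]" header or the end of the string.
SECTION_RE = re.compile(r"\[(\w+)\]\s*\n(.*?)(?=\n\[\w+\]|\Z)", re.DOTALL)

def split_sections(lyrics: str) -> list[tuple[str, str]]:
    """Return (label, text) pairs, with surrounding whitespace stripped from each text."""
    return [(label, body.strip()) for label, body in SECTION_RE.findall(lyrics)]

sample = "[verse]\nline a\nline b\n\n[chorus]\nline c\n"
sections = split_sections(sample)
print(len(sections), [label for label, _ in sections])
# → 2 ['verse', 'chorus']
```

In the notebook, `split_sections(lyrics_content)` gives two sections for the lyrics above. Under the stated assumption, that matches `--run_n_segments 2`.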

## 3️⃣ Run Inference with SGLang

In [None]:
# Run SGLang-based inference
!python inference/infer_sglang.py \
    --genre_txt genre.txt \
    --lyrics_txt lyrics.txt \
    --output_dir ./output \
    --run_n_segments 2 \
    --stage2_batch_size 4
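To call the script from Python instead, for example to loop over settings, you can build the same command as a list. The sketch below uses only the flags shown in the cell above and runs with the kernel's own interpreter (`sys.executable`). It only launches the script when `inference/infer_sglang.py` exists in the current directory.

```python
import subprocess
import sys
from pathlib import Path

def build_command(genre_txt: str, lyrics_txt: str, output_dir: str,
                  run_n_segments: int = 2, stage2_batch_size: int = 4) -> list[str]:
    """Assemble the infer_sglang.py invocation using only the flags shown above."""
    return [
        sys.executable, "inference/infer_sglang.py",
        "--genre_txt", genre_txt,
        "--lyrics_txt", lyrics_txt,
        "--output_dir", output_dir,
        "--run_n_segments", str(run_n_segments),
        "--stage2_batch_size", str(stage2_batch_size),
    ]

cmd = build_command("genre.txt", "lyrics.txt", "./output")
print(" ".join(cmd))
if Path("inference/infer_sglang.py").exists():
    # check=True raises CalledProcessError if the script exits with a non-zero status.
    subprocess.run(cmd, check=True)
```

Passing arguments as a list avoids shell quoting problems when file paths contain spaces.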

## 4️⃣ Play Generated Audio

In [None]:
import IPython.display as ipd
import os

# Find the output mix file
output_dir = "./output"
mix_files = []

for root, dirs, files in os.walk(output_dir):
    for file in files:
        if "mixed" in file and (file.endswith(".mp3") or file.endswith(".wav")):
            mix_files.append(os.path.join(root, file))

if mix_files:
    print(f"Found {len(mix_files)} mix file(s):")
    for f in mix_files:
        print(f"  - {f}")
    
    # Play the first mix
    print(f"\nPlaying: {mix_files[0]}")
    ipd.display(ipd.Audio(mix_files[0]))
else:
    print("No mix files found. Check the output directory.")
    !ls -la output/
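The search above returns mixes in `os.walk` order. After several runs into the same `./output` directory, the first match may therefore not be the newest. The sketch below keeps the same matching rule ("mixed" in the file name, `.mp3` or `.wav` extension) but uses `pathlib`. Sorting newest first by modification time is an added assumption about what you want to play.

```python
from pathlib import Path

def find_mix_files(output_dir: str = "./output") -> list[Path]:
    """Return files under output_dir whose names contain "mixed", newest first.

    Matching mirrors the cell above; ordering by modification time is an
    added assumption so that the latest mix comes first.
    """
    root = Path(output_dir)
    if not root.is_dir():
        return []
    matches = [
        p for p in root.rglob("*")
        if p.is_file() and "mixed" in p.name and p.suffix in {".mp3", ".wav"}
    ]
    return sorted(matches, key=lambda p: p.stat().st_mtime, reverse=True)

print(find_mix_files())
```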

In [None]:
# Download the generated audio (Google Colab only; google.colab is unavailable elsewhere)
from google.colab import files

if mix_files:
    files.download(mix_files[0])
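Outside Colab, `google.colab` cannot be imported. Even in Colab, you may want every generated file rather than only the first mix. The sketch below zips the whole output directory with the standard library's `shutil.make_archive`. The archive name `yue_output` is a hypothetical choice.

```python
import shutil
from pathlib import Path
from typing import Optional

def package_outputs(output_dir: str = "./output", archive_base: str = "yue_output") -> Optional[str]:
    """Zip the whole output directory; return the archive path, or None if the directory is missing."""
    if not Path(output_dir).is_dir():
        return None
    # make_archive appends ".zip" to archive_base and returns the archive's path.
    return shutil.make_archive(archive_base, "zip", root_dir=output_dir)

archive = package_outputs()
print(archive or "No output directory to package.")
```

In Colab, the returned path can be passed to `files.download` like a single audio file.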

## 🔧 Alternative: Run Original Inference (transformers)

If the SGLang pipeline fails (for example, during installation or at runtime), fall back to the original transformers-based inference script:

In [None]:
# Uncomment to run original inference
# !python inference/infer.py \
#     --genre_txt genre.txt \
#     --lyrics_txt lyrics.txt \
#     --output_dir ./output_transformers
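If you want the notebook to pick a script automatically, a simple heuristic is to use the SGLang script when `sglang` is importable and fall back otherwise. This is a hypothetical helper. Importability does not guarantee that SGLang will run correctly on your GPU, so a manual fallback may still be needed.

```python
import importlib.util

def choose_inference_script(sglang_available: bool) -> str:
    """Pick the SGLang script when sglang is available, else the transformers fallback."""
    return "inference/infer_sglang.py" if sglang_available else "inference/infer.py"

script = choose_inference_script(importlib.util.find_spec("sglang") is not None)
print(f"Using {script}")
```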