# Music Multiple - Style-Controlled Music Generation
Welcome to **Music Multiple**'s MusicGen-Style demo notebook. Here you will find self-contained examples of how to use style-controlled music generation within the Music Multiple ecosystem.

**Music Multiple** introduces advanced style conditioning for precise musical control.

In [1]:
# Music Multiple - Style Model Initialization
from audiocraft.models import MusicGen
from audiocraft.models import MultiBandDiffusion

USE_DIFFUSION_DECODER = False

model = MusicGen.get_pretrained('facebook/musicgen-style')
if USE_DIFFUSION_DECODER:
    mbd = MultiBandDiffusion.get_mbd_musicgen()

## Configuration
Next, let us configure the generation parameters. You can control:
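* `use_sampling` (bool): Use sampling if True, else argmax decoding
* `top_k` (int): Number of most likely tokens to sample from
* `top_p` (float): Cumulative-probability threshold for nucleus sampling (0 disables it)
* `temperature` (float): Softmax temperature; higher values give more varied output
* `duration` (float): Duration of the generated waveform in seconds (up to 30)
* `cfg_coef` (float): Classifier-free guidance coefficient; higher values follow the conditioning more closely
* `cfg_coef_beta` (float or None): Double CFG coefficient that boosts text conditioning when text and style are combined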
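To make `use_sampling`, `top_k`, `top_p` and `temperature` concrete, here is a minimal pure-Python sketch of drawing one token from a vector of logits. It is a conceptual illustration, not audiocraft's implementation: the function name `sample_next_token` is hypothetical, and the rule that a positive `top_p` takes precedence over `top_k` is an assumption of this sketch.

```python
import math
import random


def sample_next_token(logits, use_sampling=True, top_k=0, top_p=0.0,
                      temperature=1.0, rng=None):
    """Choose one token index from raw logits (conceptual sketch, not audiocraft code)."""
    if not use_sampling or temperature <= 0:
        # argmax decoding: always take the highest-scoring token
        return max(range(len(logits)), key=lambda i: logits[i])
    rng = rng or random.Random()
    # temperature-scaled softmax; subtracting the max keeps exp() from overflowing
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]
    # candidate indices, most likely first
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    if top_p > 0.0:
        # nucleus sampling: smallest prefix whose cumulative probability reaches top_p
        kept, cumulative = [], 0.0
        for i in order:
            kept.append(i)
            cumulative += probs[i]
            if cumulative >= top_p:
                break
        order = kept
    elif top_k > 0:
        # top-k sampling: keep only the k most likely tokens
        order = order[:top_k]
    # random.choices normalises the weights, so the kept probabilities are renormalised
    return rng.choices(order, weights=[probs[i] for i in order], k=1)[0]
```

Under this sketch's semantics, `top_k=250` as used in the cells below means each decoding step samples among the 250 most likely tokens.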

### Style Conditioner Parameters
* `eval_q` (int): Quantization level passed through the style conditioner (1 to 6); higher values pass more style information
* `excerpt_length` (float): Length in seconds of the reference-audio excerpt used for style extraction (1.5 to 4.5)
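The documented ranges can be checked before calling the model. `check_style_params` is a hypothetical helper, not part of audiocraft; it also converts `excerpt_length` into a sample count, assuming the reference is handled at the model's 32 kHz output rate (an assumption; this notebook only shows 32 kHz as the playback rate).

```python
def check_style_params(eval_q, excerpt_length, sample_rate=32000):
    """Hypothetical helper: validate style-conditioner settings against the
    ranges documented in this notebook and return the excerpt size in samples."""
    if not isinstance(eval_q, int) or not 1 <= eval_q <= 6:
        raise ValueError(f"eval_q must be an integer in [1, 6], got {eval_q!r}")
    if not 1.5 <= excerpt_length <= 4.5:
        raise ValueError(f"excerpt_length must be in [1.5, 4.5] s, got {excerpt_length!r}")
    # seconds -> samples at the assumed sample rate
    return round(excerpt_length * sample_rate)


print(check_style_params(1, 3.0))  # 96000 samples for the 3 s excerpt used below
```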

**Music Multiple Tip:** Use `cfg_coef_beta` to balance text vs style conditioning in combined generation.

In [None]:
# Music Multiple - Basic Generation Configuration
model.set_generation_params(
    use_sampling=True,
    top_k=250,
    duration=30
)
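The `duration` setting determines how many autoregressive steps the model runs. As a rough sketch, assuming MusicGen's 32 kHz tokenizer emits 50 frames per second with 4 parallel codebooks per frame (check `model.frame_rate` on the loaded model rather than trusting these constants), the token budget grows linearly with duration; any extra steps introduced by codebook interleaving are ignored here.

```python
# Assumed constants for illustration; check model.frame_rate on the loaded model.
FRAME_RATE_HZ = 50   # assumption: token frames per second of the 32 kHz tokenizer
N_CODEBOOKS = 4      # assumption: parallel codebook streams per frame


def token_budget(duration_s, frame_rate=FRAME_RATE_HZ, n_codebooks=N_CODEBOOKS):
    """Return (frames, total tokens) needed for `duration_s` seconds of audio."""
    frames = int(duration_s * frame_rate)
    return frames, frames * n_codebooks


print(token_budget(30))  # (1500, 6000) for duration=30, as in the cell above
print(token_budget(8))   # (400, 1600) for duration=8, as in the cells below
```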

## Generation Modes
**Music Multiple** supports three advanced generation modes:
* **Text-to-Music**: Standard text conditioning
* **Style-to-Music**: Generate music matching a reference audio style
* **Text-and-Style-to-Music**: Combine text descriptions with audio style references

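Text-to-Music uses `model.generate`. The two style-conditioned modes use `model.generate_with_chroma`, which takes the reference audio through `melody_wavs` and `melody_sample_rate`; for Style-to-Music, pass `None` for each description.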
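The mapping from mode to method can be captured in a small helper. `generate_any` is hypothetical; it only calls the two `MusicGen` methods used in this notebook, with the keyword arguments the notebook passes. Double CFG (`cfg_coef_beta`) still has to be configured through `model.set_generation_params` beforehand.

```python
def generate_any(model, descriptions, style_wavs=None, style_sr=None):
    """Hypothetical helper: route a request to the MusicGen method that serves it.

    - text only      -> model.generate(descriptions)
    - style only     -> model.generate_with_chroma([None, ...], wavs, sr)
    - text and style -> model.generate_with_chroma(descriptions, wavs, sr)
    """
    if style_wavs is None:
        return model.generate(descriptions=descriptions,
                              progress=True, return_tokens=True)
    if style_sr is None:
        raise ValueError("style_sr is required when style_wavs is given")
    return model.generate_with_chroma(
        descriptions=descriptions,
        melody_wavs=style_wavs,
        melody_sample_rate=style_sr,
        progress=True, return_tokens=True,
    )
```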

### Text-to-Music

In [None]:
# Music Multiple - Text-Only Generation
from audiocraft.utils.notebook import display_audio

model.set_generation_params(
    duration=8, # generate 8 seconds, can go up to 30
    use_sampling=True, 
    top_k=250,
    cfg_coef=3., # Classifier Free Guidance coefficient 
    cfg_coef_beta=None, # double CFG is only useful for text-and-style conditioning
)

output = model.generate(
    descriptions=[
        '80s pop track with bassy drums and synth',
        '90s rock song with loud guitars and heavy drums',
        'Progressive rock drum and bass solo',
        'Punk Rock song with loud drum and power guitar',
        'Bluesy guitar instrumental with soulful licks and a driving rhythm section',
        'Jazz Funk song with slap bass and powerful saxophone',
        'drum and bass beat with intense percussions'
    ],
    progress=True, return_tokens=True
)
display_audio(output[0], sample_rate=32000)
if USE_DIFFUSION_DECODER:
    out_diffusion = mbd.tokens_to_wav(output[1])
    display_audio(out_diffusion, sample_rate=32000)
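`display_audio` only plays results inline. To keep a clip, a minimal standard-library sketch can write it to a 16-bit WAV file. The helper name `save_wav_16bit` is hypothetical; it assumes mono output at 32 kHz (the rate passed to `display_audio` above) and takes a plain list of floats in [-1, 1], for example `output[0][0, 0].cpu().tolist()` for the first clip in the batch.

```python
import struct
import wave


def save_wav_16bit(path, samples, sample_rate=32000):
    """Write mono float samples in [-1, 1] to a 16-bit PCM WAV file (stdlib only)."""
    frames = bytearray()
    for x in samples:
        x = max(-1.0, min(1.0, float(x)))  # clip so the int16 conversion cannot overflow
        frames += struct.pack("<h", int(round(x * 32767)))  # little-endian int16
    with wave.open(path, "wb") as f:
        f.setnchannels(1)       # mono
        f.setsampwidth(2)       # 2 bytes per sample = 16-bit
        f.setframerate(sample_rate)
        f.writeframes(bytes(frames))
```

audiocraft also provides `audio_write` in `audiocraft.data.audio`, which accepts tensors directly and applies normalization; the sketch above just avoids that dependency.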

### Style-to-Music
Generate music that matches the style of a reference audio without text descriptions.

In [None]:
# Music Multiple - Style-Only Generation
import torchaudio
from audiocraft.utils.notebook import display_audio

model.set_generation_params(
    duration=8, # generate 8 seconds, can go up to 30
    use_sampling=True, 
    top_k=250,
    cfg_coef=3., # Classifier Free Guidance coefficient 
    cfg_coef_beta=None, # double CFG is only useful for text-and-style conditioning
)

model.set_style_conditioner_params(
    eval_q=1,  # integer between 1 and 6
               # eval_q is the level of quantization that passes
               # through the conditioner. When low, the model adheres less to the
               # audio conditioning
    excerpt_length=3.,  # length in seconds of the excerpt taken from the reference audio
)

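melody_waveform, sr = torchaudio.load("../assets/electronic.mp3")  # shape [channels, time]
# Add a batch dimension and repeat the reference once per generated sample (batch size 2)
melody_waveform = melody_waveform.unsqueeze(0).repeat(2, 1, 1)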
output = model.generate_with_chroma(
    descriptions=[None, None], 
    melody_wavs=melody_waveform,
    melody_sample_rate=sr,
    progress=True, return_tokens=True
)
display_audio(output[0], sample_rate=32000)
if USE_DIFFUSION_DECODER:
    out_diffusion = mbd.tokens_to_wav(output[1])
    display_audio(out_diffusion, sample_rate=32000)
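The `eval_q` comment above says it sets the level of quantization that passes through the style conditioner. Assuming those levels behave like residual vector quantization (RVQ), where each level encodes what the previous levels missed, this toy scalar example shows why a low `eval_q` carries less detail from the reference: decoding with fewer levels gives a coarser reconstruction. The codebooks and function names are illustrative only, not the model's.

```python
# Toy 1-D codebooks; each contains 0.0, so adding a level can never increase the error.
CODEBOOKS = [
    [-1.0, 0.0, 1.0],
    [-0.5, -0.25, 0.0, 0.25, 0.5],
    [-0.125, -0.0625, 0.0, 0.0625, 0.125],
]


def rvq_encode(x, codebooks):
    """Greedy residual quantization of a scalar: one code index per level."""
    codes, residual = [], x
    for book in codebooks:
        idx = min(range(len(book)), key=lambda i: abs(residual - book[i]))
        codes.append(idx)
        residual -= book[idx]  # the next level only sees what is still missing
    return codes


def rvq_decode(codes, codebooks, eval_q):
    """Reconstruct from only the first eval_q levels, mimicking a lower eval_q."""
    return sum(codebooks[level][codes[level]] for level in range(eval_q))


codes = rvq_encode(0.8, CODEBOOKS)
for q in (1, 2, 3):
    print(q, rvq_decode(codes, CODEBOOKS, q))  # 1.0, then 0.75, then 0.8125
```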

### Text-and-Style-to-Music
**Music Multiple Advanced Feature:** Combine text descriptions with audio style references using Double Classifier Free Guidance.

The double CFG formula:
$$l_{\text{double CFG}} = l_{\emptyset} + \alpha [l_{style} + \beta(l_{text, style} - l_{style}) - l_{\emptyset}]$$

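Here $l_{\emptyset}$, $l_{style}$ and $l_{text, style}$ are the model's logits with no conditioning, with style conditioning only, and with both text and style conditioning; $\alpha$ is the usual CFG coefficient (`cfg_coef`) and $\beta$ is `cfg_coef_beta`. With $\beta = 1$ the formula reduces to standard CFG on the text-and-style logits, while $\beta > 1$ boosts the influence of the text.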
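A direct transcription of the double CFG formula into code makes the role of each term explicit. This sketch operates on plain Python lists of logits (in the model the same arithmetic would apply elementwise to logit tensors); `alpha` stands for `cfg_coef`, `beta` for `cfg_coef_beta`, and both function names are hypothetical.

```python
def double_cfg(l_uncond, l_style, l_text_style, alpha, beta):
    """Double CFG: l_uncond + alpha * [l_style + beta * (l_text_style - l_style) - l_uncond]."""
    return [
        u + alpha * (s + beta * (ts - s) - u)
        for u, s, ts in zip(l_uncond, l_style, l_text_style)
    ]


def cfg(l_uncond, l_cond, alpha):
    """Standard classifier-free guidance, for comparison."""
    return [u + alpha * (c - u) for u, c in zip(l_uncond, l_cond)]


print(double_cfg([0.0], [0.5], [1.0], alpha=3.0, beta=5.0))  # [9.0]
```

Setting `beta=1.0` makes `double_cfg` agree with `cfg` applied to the text-and-style logits, matching the reduction described above.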

In [None]:
# Music Multiple - Combined Text and Style Generation
import torchaudio
from audiocraft.utils.notebook import display_audio

model.set_generation_params(
    duration=8, # generate 8 seconds, can go up to 30
    use_sampling=True, 
    top_k=250,
    cfg_coef=3., # Classifier Free Guidance coefficient 
    cfg_coef_beta=5.,  # double CFG is necessary for text-and-style conditioning;
                       # beta in the double CFG formula, between 1 and 9.
                       # When set to 1 it is equivalent to normal CFG.
)

model.set_style_conditioner_params(
    eval_q=1,  # integer between 1 and 6
               # eval_q is the level of quantization that passes
               # through the conditioner. When low, the model adheres less to the
               # audio conditioning
    excerpt_length=3.,  # length in seconds of the excerpt taken from the reference audio
)

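melody_waveform, sr = torchaudio.load("../assets/electronic.mp3")  # shape [channels, time]
# Add a batch dimension and repeat the reference once per description (batch size 3)
melody_waveform = melody_waveform.unsqueeze(0).repeat(3, 1, 1)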

descriptions = ["8-bit old video game music", "Chill lofi remix", "80s New wave with synthesizer"]

output = model.generate_with_chroma(
    descriptions=descriptions,
    melody_wavs=melody_waveform,
    melody_sample_rate=sr,
    progress=True, return_tokens=True
)
display_audio(output[0], sample_rate=32000)
if USE_DIFFUSION_DECODER:
    out_diffusion = mbd.tokens_to_wav(output[1])
    display_audio(out_diffusion, sample_rate=32000)
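Expanding the double CFG formula shows how the values in this cell weight the three sets of logits, taking $\alpha$ as `cfg_coef` and $\beta$ as `cfg_coef_beta`:

$$\begin{aligned}
l_{\text{double CFG}} &= (1-\alpha)\,l_{\emptyset} + \alpha(1-\beta)\,l_{style} + \alpha\beta\,l_{text, style} \\
&= -2\,l_{\emptyset} - 12\,l_{style} + 15\,l_{text, style} \qquad (\alpha = 3,\ \beta = 5)
\end{aligned}$$

The weights always sum to 1, since $(1-\alpha) + \alpha(1-\beta) + \alpha\beta = 1$; with $\beta = 1$ the style-only term vanishes and standard CFG on $l_{text, style}$ remains.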

## 🎵 About Music Multiple - Style Control

**Music Multiple** provides advanced style conditioning for precise musical control:

### 🎭 Generation Modes
- **Text-to-Music**: Traditional text-based generation
- **Style-to-Music**: Extract and replicate audio style characteristics
- **Hybrid Generation**: Combine text and style for precise control

### ⚙️ Style Conditioner Parameters
- **eval_q (1-6)**: Controls style adherence level
  - Lower values: More creative freedom
  - Higher values: Closer style matching
- **excerpt_length (1.5-4.5s)**: Audio segment used for style extraction

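### 🎛️ Double CFG Technology
- **Balanced Control**: `cfg_coef_beta` ($\beta$) sets how strongly text is weighted against style
- **Reduces to Standard CFG**: with $\beta = 1$, double CFG is equivalent to normal CFG
- **Creative Flexibility**: raise $\beta$ to emphasize the text description over the style reference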

### 💡 Usage Guidelines
- Start with `eval_q=1` and increase for stronger style matching
- Use `cfg_coef_beta=5` as starting point for hybrid generation
- Adjust `cfg_coef_beta` based on results:
  - Too much text adherence → Decrease beta
  - Too much style adherence → Increase beta
- Experiment with different excerpt lengths for varied style capture
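The guidelines above amount to a small grid search. A hypothetical helper can enumerate the settings to try; the default values are examples chosen inside the ranges stated in this notebook (`eval_q` 1 to 6, beta 1 to 9, excerpt 1.5 to 4.5 s), not recommendations from the model authors.

```python
from itertools import product


def sweep_grid(eval_qs=(1, 2, 3), betas=(3.0, 5.0, 7.0), excerpt_lengths=(3.0,)):
    """Hypothetical helper: list every combination of style/CFG settings to audition."""
    return [
        {"eval_q": q, "cfg_coef_beta": b, "excerpt_length": e}
        for q, b, e in product(eval_qs, betas, excerpt_lengths)
    ]


# Example use with the notebook's model (not executed here):
# for params in sweep_grid():
#     model.set_style_conditioner_params(eval_q=params["eval_q"],
#                                        excerpt_length=params["excerpt_length"])
#     model.set_generation_params(duration=8, use_sampling=True, top_k=250,
#                                 cfg_coef=3., cfg_coef_beta=params["cfg_coef_beta"])
#     output = model.generate_with_chroma(descriptions=descriptions,
#                                         melody_wavs=melody_waveform,
#                                         melody_sample_rate=sr)
```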

### 🎯 Professional Applications
- **Music Production**: Maintain consistent style across tracks
- **Content Creation**: Match specific audio aesthetics
- **Sound Design**: Precise control over musical characteristics
- **Creative Exploration**: Combine disparate styles and descriptions

*Part of the Music Multiple ecosystem - Advanced style-controlled music generation*