# 🎧 Text-to-Audio Generation with Stable Audio

This notebook demonstrates how to generate **audio from text prompts** with [Stability AI](https://stability.ai/)'s open `stable-audio-open-1.0` model, using the `StableAudioPipeline` from Hugging Face's 🤗 `diffusers` library.

---

## 📌 What This Notebook Does

- Authenticates with Hugging Face to access pre-trained models
- Installs required dependencies (`diffusers`, `torchsde`, etc.)
- Loads the `stable-audio-open-1.0` model
- Accepts a **text prompt** and generates a **10-second audio clip**
- Saves the audio as a `.wav` file locally
- Wraps generation in a simple **Gradio** web interface

---

## 🧪 Example Prompt

```python
prompt = "Baby crying with louder noise."
negative_prompt = "Low quality."
```
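
How the negative prompt is used: in the standard classifier-free guidance scheme that 🤗 `diffusers` pipelines implement, the model is evaluated once with the prompt and once with the negative prompt, and the two outputs are combined. A sketch of that combination, where $f_\theta$ stands for the denoising network's output at step $t$, $c_{\text{pos}}$ and $c_{\text{neg}}$ for the prompt and negative-prompt embeddings, and $w$ for the pipeline's `guidance_scale` (which this notebook leaves at its default); the notation is ours, not the notebook's:

$$
\hat{f} = f_\theta(x_t, t, c_{\text{neg}}) + w\,\bigl(f_\theta(x_t, t, c_{\text{pos}}) - f_\theta(x_t, t, c_{\text{neg}})\bigr)
$$

With $w = 1$ this reduces to $f_\theta(x_t, t, c_{\text{pos}})$; larger values of $w$ push each denoising step further away from what the negative prompt (here, "Low quality.") describes.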


### 🔹 Step 1: Check GPU Availability
Confirm that a CUDA-capable GPU is attached: the cells below move the pipeline to `"cuda"`, which fails on a CPU-only runtime.

```python
!nvidia-smi
```

```text
Sat Jul 26 08:37:03 2025
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.54.15              Driver Version: 550.54.15      CUDA Version: 12.4     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|   0  Tesla T4                       Off |   00000000:00:04.0 Off |                    0 |
| N/A   40C    P8              9W /   70W |       0MiB /  15360MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
```
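
To check the GPU programmatically instead of reading the table, `nvidia-smi` can print selected fields as CSV via `--query-gpu` and `--format=csv,noheader,nounits`. A minimal sketch, assuming Python 3.9+; the helper names `parse_gpu_query` and `query_gpus` are hypothetical, not part of any library. It reports free memory per GPU and returns an empty list when no NVIDIA driver is available:

```python
import subprocess

# Fields requested from nvidia-smi; `nounits` makes the memory columns plain MiB integers
QUERY = [
    "nvidia-smi",
    "--query-gpu=name,memory.total,memory.used",
    "--format=csv,noheader,nounits",
]


def parse_gpu_query(output: str) -> list[dict]:
    """Parse CSV lines such as 'Tesla T4, 15360, 0' into one dict per GPU."""
    gpus = []
    for line in output.strip().splitlines():
        name, total, used = (field.strip() for field in line.split(","))
        gpus.append({"name": name, "total_mib": int(total), "free_mib": int(total) - int(used)})
    return gpus


def query_gpus() -> list[dict]:
    """Run the query; return [] if nvidia-smi is missing or fails (e.g. CPU-only runtime)."""
    try:
        result = subprocess.run(QUERY, capture_output=True, text=True, check=True)
    except (OSError, subprocess.CalledProcessError):
        return []
    return parse_gpu_query(result.stdout)


gpus = query_gpus()
if not gpus:
    print("No NVIDIA GPU detected; switch the runtime to a GPU.")
for gpu in gpus:
    print(f"{gpu['name']}: {gpu['free_mib']} MiB free of {gpu['total_mib']} MiB")
```

On the Tesla T4 shown above, the query line would read `Tesla T4, 15360, 0`.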
                                                

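### 🔹 Step 2: Log In to Hugging Face

Log in to download the pre-trained model. The `stabilityai/stable-audio-open-1.0` repository is gated, so also accept its license terms on the model page with the same account.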

```python
from huggingface_hub import notebook_login
notebook_login()
```

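`notebook_login()` opens an interactive widget. For non-interactive runs, here is a minimal sketch, assuming you have exported your access token as the `HF_TOKEN` environment variable (in Colab, a secret named `HF_TOKEN` can be read with `google.colab.userdata.get("HF_TOKEN")`). The helpers `get_hf_token` and `login_from_env` are hypothetical; `huggingface_hub.login(token=...)` is the library's own login function:

```python
import os
from typing import Optional


def get_hf_token() -> Optional[str]:
    """Return the token stored in the HF_TOKEN environment variable, or None if unset or empty."""
    token = os.environ.get("HF_TOKEN", "").strip()
    return token or None


def login_from_env() -> bool:
    """Log in to the Hugging Face Hub without a widget; return False if no token is set."""
    token = get_hf_token()
    if token is None:
        return False
    # Imported lazily so get_hf_token() works even where huggingface_hub is not installed
    from huggingface_hub import login

    login(token=token)
    return True


if login_from_env():
    print("Logged in with HF_TOKEN.")
else:
    print("HF_TOKEN is not set; fall back to notebook_login().")
```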

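### 🔹 Step 3: Install Required Libraries

Install the extra dependencies: `torchsde`, which the Stable Audio pipeline's scheduler depends on, and the latest `diffusers`. Gradio is installed later, in Step 8.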

```python
!pip install torchsde
!pip install -U diffusers
```
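
Before loading the model, it can help to confirm that every package the notebook imports is actually installed. A small sketch using only the standard library's `importlib.util.find_spec`; the `REQUIRED` list is our reading of the notebook's imports, and `missing_packages` is a hypothetical helper:

```python
import importlib.util

# Import names used in this notebook (for these packages the pip names match)
REQUIRED = ["torch", "torchsde", "diffusers", "soundfile", "gradio"]


def missing_packages(names):
    """Return the import names that cannot be found in the current environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_packages(REQUIRED)
if missing:
    print("Missing packages, install with: pip install " + " ".join(missing))
else:
    print("All required packages are installed.")
```

`find_spec` only locates a top-level module without importing it, so the check stays fast even for heavy packages like `torch`.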



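### 🔹 Steps 4–7: Load, Prompt, Generate, and Save

The next cell covers four steps at once:

4. **Load the Stable Audio model**: download `stable-audio-open-1.0` from Hugging Face and move it to the GPU in half precision (`float16`).
5. **Define a prompt**: describe the audio you want, plus a negative prompt describing what to avoid.
6. **Generate the audio**: run the diffusion model for 200 inference steps with a fixed seed for reproducibility.
7. **Save the audio file**: convert the generated tensor into a `.wav` file.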
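
The fixed seed matters because a diffusion model starts from random noise: the `generator` passed to the pipeline drives its random draws, most importantly that starting noise, so reusing the same seed with the same settings (and typically the same hardware and library versions) reproduces the same clip. A sketch of the idea that uses NumPy's `default_rng` as a stand-in for `torch.Generator`; the helper `initial_noise` and the shape `(1, 64, 256)` are purely illustrative, not the model's real latent shape:

```python
import numpy as np


def initial_noise(seed: int, shape=(1, 64, 256)) -> np.ndarray:
    """Draw Gaussian starting noise from a generator seeded with `seed` (shape is illustrative)."""
    rng = np.random.default_rng(seed)
    return rng.standard_normal(shape, dtype=np.float32)


print(np.array_equal(initial_noise(0), initial_noise(0)))  # True: same seed, same noise
print(np.array_equal(initial_noise(0), initial_noise(1)))  # False: new seed, new noise
```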

```python
import torch
import soundfile as sf
from diffusers import StableAudioPipeline

# Step 4: load the stable-audio-open-1.0 model in half precision and move it to the GPU
pipe = StableAudioPipeline.from_pretrained(
    "stabilityai/stable-audio-open-1.0", torch_dtype=torch.float16
)
pipe = pipe.to("cuda")

# Step 5: define the prompts
prompt = "Baby crying with louder noise."
negative_prompt = "Low quality."

# Step 6: seed the generator so the same prompt reproduces the same audio
generator = torch.Generator("cuda").manual_seed(0)

# Generate one 10-second waveform (raise num_waveforms_per_prompt to get several candidates)
audio = pipe(
    prompt,
    negative_prompt=negative_prompt,
    num_inference_steps=200,
    audio_end_in_s=10.0,
    num_waveforms_per_prompt=1,
    generator=generator,
).audios

# Step 7: each waveform is (channels, samples); soundfile expects (samples, channels)
output = audio[0].T.float().cpu().numpy()
sf.write("Baby Crying.wav", output, pipe.vae.sampling_rate)
```

```text
The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.
```


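`soundfile` expects an array shaped `(samples, channels)`, which is why the cell transposes the `(channels, samples)` waveform with `.T` before writing. If `soundfile` is unavailable, the same conversion can be done with the standard library's `wave` module. A sketch assuming float samples in [-1, 1], which are clipped and written as 16-bit PCM; `write_wav_pcm16` is a hypothetical helper, and the 440 Hz test tone stands in for real model output:

```python
import wave

import numpy as np


def write_wav_pcm16(path: str, audio: np.ndarray, sample_rate: int) -> None:
    """Write a float waveform shaped (channels, samples), values in [-1, 1], as 16-bit PCM WAV."""
    frames = audio.T  # -> (samples, channels), the interleaved layout WAV expects
    pcm = (np.clip(frames, -1.0, 1.0) * 32767).astype("<i2")  # little-endian int16
    with wave.open(path, "wb") as wav_file:
        wav_file.setnchannels(frames.shape[1])
        wav_file.setsampwidth(2)  # 2 bytes = 16 bits per sample
        wav_file.setframerate(sample_rate)
        # tobytes() emits C order: sample 0 of every channel, then sample 1, and so on
        wav_file.writeframes(pcm.tobytes())


# Demo: one second of a stereo 440 Hz tone, shaped (channels, samples) like one item of `.audios`
sample_rate = 44_100  # assumed rate for the demo; the notebook uses pipe.vae.sampling_rate
t = np.arange(sample_rate) / sample_rate
tone = 0.5 * np.sin(2 * np.pi * 440.0 * t)
write_wav_pcm16("tone.wav", np.stack([tone, tone]), sample_rate)
```

Clipping before the integer cast avoids wrap-around distortion if the model ever produces samples slightly outside [-1, 1].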



### 🔹 Step 8: 🎛️ Add a Gradio Interface

Create a simple web UI to input text and generate audio.

```python
!pip install gradio
```



```python
import torch
import soundfile as sf
import gradio as gr
from diffusers import StableAudioPipeline

# Load the Stable Audio pipeline once at startup, not inside the request handler
pipe = StableAudioPipeline.from_pretrained(
    "stabilityai/stable-audio-open-1.0", torch_dtype=torch.float16
)
pipe = pipe.to("cuda")

# Function to generate audio for one request
def generate_audio(prompt, negative_prompt, duration, seed):
    """Generate one clip and return the path of the saved .wav file."""
    # gr.Number may deliver a float, so cast the seed to int
    generator = torch.Generator("cuda").manual_seed(int(seed))

    result = pipe(
        prompt=prompt,
        negative_prompt=negative_prompt,
        num_inference_steps=200,
        audio_end_in_s=float(duration),
        num_waveforms_per_prompt=1,
        generator=generator,
    )

    # (channels, samples) -> (samples, channels) for soundfile
    audio = result.audios[0].T.float().cpu().numpy()
    file_path = "output.wav"
    sf.write(file_path, audio, pipe.vae.sampling_rate)
    return file_path

# Gradio interface
interface = gr.Interface(
    fn=generate_audio,
    inputs=[
        gr.Textbox(label="Prompt", placeholder="e.g., A baby crying loudly"),
        gr.Textbox(label="Negative Prompt", value="Low quality"),
        gr.Slider(2, 20, value=10, step=1, label="Duration (seconds)"),
        gr.Number(label="Seed", value=42),
    ],
    outputs=gr.Audio(type="filepath", label="Generated Audio"),
    title="🎧 Text-to-Audio using Stable Audio",
    description="Generate audio clips from text prompts using Stability AI's Stable Audio Open model.",
)

interface.launch()
```




```text
It looks like you are running Gradio on a hosted Jupyter notebook, which requires `share=True`. Automatically setting `share=True` (you can turn this off by setting `share=False` in `launch()` explicitly).

Colab notebook detected. To show errors in colab notebook, set debug=True in launch()
* Running on public URL: https://a63b88bffb7ae94a09.gradio.live

This share link expires in 1 week. For free permanent hosting and GPU upgrades, run `gradio deploy` from the terminal in the working directory to deploy to Hugging Face Spaces (https://huggingface.co/spaces)
```
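
Because `generate_audio` always writes to the fixed name `output.wav`, each new clip replaces the previous one on disk. If you want to keep every result, or later allow overlapping requests, each call can get its own file. A minimal sketch using the standard library's `tempfile.mkstemp`; `unique_wav_path` is a hypothetical helper that you would call in place of the literal `"output.wav"`:

```python
import os
import tempfile


def unique_wav_path(prefix: str = "stable_audio_") -> str:
    """Reserve a new, uniquely named .wav file in the system temp directory and return its path."""
    fd, path = tempfile.mkstemp(suffix=".wav", prefix=prefix)
    os.close(fd)  # sf.write() reopens the file by name, so release this handle
    return path


# Inside generate_audio(), replace
#     file_path = "output.wav"
# with
#     file_path = unique_wav_path()
first, second = unique_wav_path(), unique_wav_path()
print(first != second)  # True: each call reserves its own file
```

These temporary files are not deleted automatically; cleaning them up is left out of this sketch.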


