# **1. Environment Setup**
Before generating audio, the environment must be configured. This means mounting Google Drive so the notebook can read and write files, and installing the Python packages needed to run the model: `diffusers`, `transformers`, and `accelerate`.


In [None]:
from google.colab import drive
# Mount Google Drive
drive.mount('/content/drive')

Mounted at /content/drive


In [None]:
%pip install --upgrade diffusers transformers accelerate


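Because `--upgrade` installs whatever release is newest at run time, it can help to record which versions were actually resolved. The following is a minimal sketch and not part of the original notebook: the helper name `installed_versions` is hypothetical, and it relies only on the standard-library `importlib.metadata` module.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages):
    """Return {package name: version string, or None if it is not installed}."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


# Report the three packages this notebook installs
report = installed_versions(["diffusers", "transformers", "accelerate"])
for name, ver in report.items():
    print(f"{name}: {ver or 'NOT INSTALLED'}")
```

Saving this report next to the generated audio makes it easier to reproduce a run later.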

# **2. Load AudioLDM2 Model**
The AudioLDM2 model, available through the Hugging Face Diffusers library, generates audio from text prompts. The pipeline is initialized from the pretrained `cvssp/audioldm2` weights in half precision (`torch.float16`) and moved to the GPU via CUDA for faster inference.

In [None]:
from diffusers import AudioLDM2Pipeline
import torch

repo_id = "cvssp/audioldm2"
pipe = AudioLDM2Pipeline.from_pretrained(repo_id, torch_dtype=torch.float16)
pipe = pipe.to("cuda")



The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.


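The loading cell assumes a CUDA GPU is attached to the runtime. As a hedged sketch, the device and precision could be chosen at run time instead. The helper `select_device_and_dtype` is hypothetical, and the CPU fallback with `float32` is an assumption added for illustration, not something the original notebook does.

```python
def select_device_and_dtype(cuda_available):
    """Pick a device string and a torch dtype name for loading the pipeline.

    The GPU path mirrors the notebook (CUDA, float16). The CPU path with
    float32 is an illustrative assumption, not the notebook's behavior.
    """
    if cuda_available:
        return "cuda", "float16"
    return "cpu", "float32"


# Intended use inside the notebook (requires torch and diffusers):
#   device, dtype_name = select_device_and_dtype(torch.cuda.is_available())
#   pipe = AudioLDM2Pipeline.from_pretrained(
#       repo_id, torch_dtype=getattr(torch, dtype_name)
#   ).to(device)
print(select_device_and_dtype(False))  # prints ('cpu', 'float32')
```

If this fallback were adopted, the `torch.Generator("cuda")` call in `generate_fake_audio` would need to use the selected device as well, and CPU inference would likely be much slower.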

# **3. Prepare Output Directories**
We create two folders in Google Drive to store the generated audio samples:

*   Healthy coughs
*   Unhealthy coughs

These directories are used later for saving the output .wav files.



In [None]:
import os

# Define output directories for fake coughs
generated_healthy_dir = "/content/drive/MyDrive/AML/generated_healthy_audioldm"
generated_unhealthy_dir = "/content/drive/MyDrive/AML/generated_unhealthy_audioldm"

# Create directories if they don't exist
os.makedirs(generated_healthy_dir, exist_ok=True)
os.makedirs(generated_unhealthy_dir, exist_ok=True)

print("✅ Paths set successfully!")


✅ Paths set successfully!


# **4. Define Prompts for Audio Generation**
To generate realistic audio samples, diverse text prompts are defined to guide the AudioLDM2 model. These prompts are designed to simulate different types of cough sounds:

*   Healthy Prompts simulate harmless, normal coughs such as throat-clearing or light single coughs.
*   Unhealthy Prompts simulate harsh or painful coughs associated with illness or respiratory issues.

Only high-quality and distinctive prompts are retained, while vague or ambiguous ones are commented out to enhance the reliability of generated samples.

In [None]:
import random

# Define diverse prompts for healthy and unhealthy coughs
healthy_prompts = [
    # "A person coughing normally, without illness.",  # bad
    # "A short, dry cough from a healthy person.",  # bad
    "A normal human cough with no background noise.",  # good
    # "A mild cough from a person who is not sick.",
    # "A gentle throat-clear sound, indicating a minor cough.",
    # "A soft, single cough from a person in a quiet room.",
    "A casual throat-clearing cough with no health issues.",  # good
    "A brief, controlled cough from a normal individual.",  # good
    "A quick, light cough that sounds completely healthy.",  # good
    "A shallow, harmless cough from someone without any infection.",
    "A light cough, similar to someone clearing their throat naturally.",  # good
    # "A quiet, polite cough from a person in a formal setting.",
    # "A momentary, barely noticeable cough from a healthy adult.",
    "A single, subtle cough that is completely non-threatening.",  # good
    # "A normal cough from a person who just drank cold water.",
    # "A light, habitual cough from someone in a normal conversation.",
    # "A sharp but non-threatening cough from a healthy individual.",
    # "A short and crisp cough from a speaker before a speech.",
    "A slight throat irritation leading to a single harmless cough.",  # good
    # "A mild and momentary cough caused by slight dust inhalation."
]

unhealthy_prompts = [
    # "A rough, dry cough with a raspy sound.",
    # "A deep, chesty cough with mucus buildup.",
    # "A long, intense coughing fit with gasping.",
    "A harsh and painful cough from a sick person.",
    # "A short, wheezy cough followed by deep breathing.",
    # "A loud, forceful cough with a strained throat.",
    # "A dry, persistent cough that sounds irritated.",
    # "A hoarse, wet cough from a congested person.",
    # "A rough, crackling cough with difficulty breathing.",
    # "A deep, heavy cough that doesn't stop quickly.",
    "A harsh cough with a sore throat sound.", # good
    # "A long, wheezing cough from a struggling person.",
    # "A tight, dry cough with an uncomfortable tone.",
    # "A painful-sounding cough with chest discomfort.",
    # "A rattling, rough cough from a sick patient.",
    # "A persistent, hacking cough with difficulty breathing.",
    # "A hoarse, breathy cough from an exhausted person.",
    # "A severe, choking cough with an intense sound.",
    # "A deep, wet cough that lingers for a few seconds.",
    # "A dry, strained cough followed by a deep inhale."
]
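The prompt lists are sampled at random during generation. If the mapping from sample index to prompt should be repeatable across runs, one option is a private random number generator seeded with the index. This is only a sketch under that assumption: `pick_prompt` is a hypothetical helper, and the notebook itself uses the unseeded `random.choice`.

```python
import random


def pick_prompt(prompts, sample_index):
    """Choose a prompt deterministically from the sample index."""
    if not prompts:
        raise ValueError("prompt list is empty")
    # A private RNG keeps the global random state untouched
    rng = random.Random(sample_index)
    return rng.choice(prompts)


example_prompts = [
    "A harsh and painful cough from a sick person.",
    "A harsh cough with a sore throat sound.",
]
# The same index always maps to the same prompt
assert pick_prompt(example_prompts, 7) == pick_prompt(example_prompts, 7)
```

Raising on an empty list also guards against the case where every prompt has been commented out.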


# **5. Generate Fake Audio Samples**

A custom function `generate_fake_audio()` is defined to synthesize realistic cough sounds using the AudioLDM2 pipeline. The function performs the following tasks:

- Ensures the output directory exists.
- Appends new audio files without overwriting existing ones by tracking the latest file index.
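- Randomly selects a prompt from the predefined list for each sample (this choice is not seeded).
- Seeds a PyTorch generator with the file index, so the diffusion sampling is reproducible for a given prompt and index.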
- Generates 5-second audio clips using the selected prompt and a defined negative prompt.
- Converts the audio to 16-bit PCM format (`int16`) for WAV storage.
- Saves each audio sample with a unique filename in the specified directory.
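The index-tracking step can be sketched in isolation. The helper below is hypothetical (the notebook inlines this logic with string splitting); it assumes files follow the `generated_<label>_<n>.wav` naming used in this section and returns the first unused index.

```python
import re


def next_index(filenames, label):
    """Return the first unused index for files named generated_<label>_<n>.wav."""
    pattern = re.compile(rf"^generated_{re.escape(label)}_(\d+)\.wav$")
    indices = [int(m.group(1)) for name in filenames if (m := pattern.match(name))]
    # An empty directory yields -1 + 1 = 0
    return max(indices, default=-1) + 1


names = ["generated_unhealthy_0.wav", "generated_unhealthy_615.wav", "notes.txt"]
print(next_index(names, "unhealthy"))  # prints 616
print(next_index([], "healthy"))       # prints 0
```

Because the next index is derived from the largest existing one, an interrupted run can be restarted without overwriting earlier files.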

A detailed **negative prompt** steers the model away from undesirable audio characteristics. An abbreviated version reads:
> *"Low quality, distorted, unnatural, robotic, synthetic, non-human sounds, background noise, static, and unrealistic tones."*

The generation is triggered at the end of the script using the following call:

```python
generate_fake_audio(generated_unhealthy_dir, "unhealthy", unhealthy_prompts, negative_prompt, num_samples=200)
```


In [None]:
import os
import random
import scipy.io.wavfile as wav
import torch

# Function to generate fake audio using AudioLDM2Pipeline without overwriting existing files
def generate_fake_audio(output_dir, label, prompts, negative_prompt, num_samples=200):
    os.makedirs(output_dir, exist_ok=True)  # Ensure output directory exists

    # Get all existing files and find the last index
    existing_files = [f for f in os.listdir(output_dir) if f.startswith(f"generated_{label}_") and f.endswith(".wav")]
    existing_indices = [int(f.split("_")[-1].split(".")[0]) for f in existing_files if f.split("_")[-1].split(".")[0].isdigit()]
    start_index = max(existing_indices, default=-1) + 1  # Start from the next available index

    for i in range(start_index, start_index + num_samples):
        # Select a random prompt from the list
        prompt = random.choice(prompts)

        # Set a seed for reproducibility
        generator = torch.Generator("cuda").manual_seed(i)

        # Generate fake audio (5 seconds long)
        audio = pipe(
            prompt,
            negative_prompt=negative_prompt,
            num_inference_steps=200,
            audio_length_in_s=5.0,
            num_waveforms_per_prompt=1,
            generator=generator
        ).audios[0]

        # Clip to [-1, 1], then convert to int16 for WAV saving (avoids integer overflow)
        audio = (audio.clip(-1.0, 1.0) * 32767).astype("int16")

        # Save the generated fake audio with a unique index
        output_audio_path = os.path.join(output_dir, f"generated_{label}_{i}.wav")
        wav.write(output_audio_path, rate=16000, data=audio)

        print(f"✅ Generated fake 5s audio: {output_audio_path} using prompt: {prompt}")

print("🎉 Function `generate_fake_audio` is now set to append new audios without replacing old ones!")


🎉 Function `generate_fake_audio` is now set to append new audios without replacing old ones!
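The conversion to 16-bit PCM deserves a closer look: a float sample outside [-1.0, 1.0] scales to a value beyond the `int16` range, which can corrupt the waveform when cast. The sketch below shows the conversion with explicit clipping, using only the standard library. Both helper names are hypothetical, and the notebook itself writes files with `scipy.io.wavfile`.

```python
import struct
import wave


def float_to_pcm16(samples):
    """Clip floats to [-1.0, 1.0] and scale them to 16-bit integers."""
    return [int(round(max(-1.0, min(1.0, s)) * 32767)) for s in samples]


def write_wav_mono(path, samples, sample_rate=16000):
    """Write float samples as a mono 16-bit PCM WAV file."""
    pcm = float_to_pcm16(samples)
    with wave.open(path, "wb") as wav_file:
        wav_file.setnchannels(1)
        wav_file.setsampwidth(2)  # 2 bytes = 16 bits per sample
        wav_file.setframerate(sample_rate)
        wav_file.writeframes(struct.pack(f"<{len(pcm)}h", *pcm))


# Out-of-range inputs are clamped instead of overflowing
print(float_to_pcm16([0.0, 1.0, 1.7, -2.0]))  # prints [0, 32767, 32767, -32767]
```

The default `sample_rate=16000` matches the rate passed to `wav.write` in this notebook.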


In [None]:
# Define prompts for healthy and unhealthy cough sounds
# healthy_prompt = "A person coughing normally, without illness."
# unhealthy_prompt = "A person coughing with a respiratory infection."

# Define a common negative prompt
# negative_prompt = "Low quality, distorted, unrealistic."
negative_prompt = (
    "Low quality, distorted, unnatural, robotic, synthetic, "
    "non-human sounds, whispers, music, talking, background noise, "
    "breathing, barking, wind, mechanical, static, radio interference, "
    "electronic buzzing, echoing, unrealistic tones, overlapping voices, "
    "overprocessed sound, unnatural pitch shifts, muffled audio, weird artifacts"
)


# Generate 200 fake unhealthy audios (the healthy run of 50 samples is commented out)
# generate_fake_audio(generated_healthy_dir, "healthy", healthy_prompts, negative_prompt, num_samples=50)
generate_fake_audio(generated_unhealthy_dir, "unhealthy", unhealthy_prompts, negative_prompt, num_samples=200)

print("🎉 ✅ Fake audio generation completed successfully!")


✅ Generated fake 5s audio: /content/drive/MyDrive/AML/generated_unhealthy_audioldm/generated_unhealthy_616.wav using prompt: Unhealthy cough
✅ Generated fake 5s audio: /content/drive/MyDrive/AML/generated_unhealthy_audioldm/generated_unhealthy_617.wav using prompt: Unhealthy cough
✅ Generated fake 5s audio: /content/drive/MyDrive/AML/generated_unhealthy_audioldm/generated_unhealthy_618.wav using prompt: Unhealthy cough
✅ Generated fake 5s audio: /content/drive/MyDrive/AML/generated_unhealthy_audioldm/generated_unhealthy_619.wav using prompt: Unhealthy cough
✅ Generated fake 5s audio: /content/drive/MyDrive/AML/generated_unhealthy_audioldm/generated_unhealthy_620.wav using prompt: Unhealthy cough
✅ Generated fake 5s audio: /content/drive/MyDrive/AML/generated_unhealthy_audioldm/generated_unhealthy_621.wav using prompt: Unhealthy cough
✅ Generated fake 5s audio: /content/drive/MyDrive/AML/generated_unhealthy_audioldm/generated_unhealthy_622.wav using prompt: Unhealthy cough
✅ Generated fake 5s audio: /content/drive/MyDrive/AML/generated_unhealthy_audioldm/generated_unhealthy_623.wav using prompt: Unhealthy cough
✅ Generated fake 5s audio: /content/drive/MyDrive/AML/generated_unhealthy_audioldm/generated_unhealthy_624.wav using prompt: Unhealthy cough
✅ Generated fake 5s audio: /content/drive/MyDrive/AML/generated_unhealthy_audioldm/generated_unhealthy_625.wav using prompt: Unhealthy cough
✅ Generated fake 5s audio: /content/drive/MyDrive/AML/generated_unhealthy_audioldm/generated_unhealthy_626.wav using prompt: Unhealthy cough
✅ Generated fake 5s audio: /content/drive/MyDrive/AML/generated_unhealthy_audioldm/generated_unhealthy_627.wav using prompt: Unhealthy cough
✅ Generated fake 5s audio: /content/drive/MyDrive/AML/generated_unhealthy_audioldm/generated_unhealthy_628.wav using prompt: Unhealthy cough

KeyboardInterrupt: 

# **6. Count and Review Generated Audio Files**

After audio generation, it is important to verify the number of files created for each class. A utility function counts the files in the healthy and unhealthy output directories. Note that it counts every regular file and does not filter by the `.wav` extension.

This step ensures that the expected number of healthy and unhealthy samples has been generated and stored correctly.

The output displays the total number of audio files in each category:

- ✅ Healthy audio files  
- ✅ Unhealthy audio files  

📌 **Next step recommendation**:  
Manually review the generated audio samples to ensure sound quality. Remove or flag any clips that are distorted, unrealistic, or otherwise unusable before proceeding to model training.
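A simple automated pre-screen can narrow down which clips to listen to first. The sketch below is an illustration only, not part of the original notebook: `wav_stats` and `needs_review` are hypothetical helpers, the thresholds are arbitrary assumptions, and the check does not replace listening to the audio. It assumes the 16-bit mono WAV format written in the previous section.

```python
import struct
import wave


def wav_stats(path):
    """Return (duration in seconds, peak amplitude in [0, 1]) for a 16-bit mono WAV."""
    with wave.open(path, "rb") as wav_file:
        if wav_file.getsampwidth() != 2 or wav_file.getnchannels() != 1:
            raise ValueError("expected 16-bit mono audio")
        n_frames = wav_file.getnframes()
        frames = wav_file.readframes(n_frames)
        duration = n_frames / wav_file.getframerate()
    # Unpack little-endian signed 16-bit samples
    samples = struct.unpack(f"<{len(frames) // 2}h", frames)
    peak = max((abs(s) for s in samples), default=0) / 32768
    return duration, peak


def needs_review(path, min_peak=0.05, max_peak=0.999):
    """Flag clips that look near-silent or clipped (thresholds are illustrative)."""
    _, peak = wav_stats(path)
    return peak < min_peak or peak > max_peak
```

Clips flagged this way are candidates for manual inspection; unflagged clips can still sound unrealistic and should be spot-checked too.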


In [None]:
import os

# Function to count files in a directory
def count_files(directory):
    if not os.path.exists(directory):
        print(f"❌ Directory does not exist: {directory}")
        return 0
    return len([f for f in os.listdir(directory) if os.path.isfile(os.path.join(directory, f))])

# Count files in each folder
num_healthy_files = count_files(generated_healthy_dir)
num_unhealthy_files = count_files(generated_unhealthy_dir)

# Print the results
print(f"✅ Number of healthy audio files: {num_healthy_files}")
print(f"✅ Number of unhealthy audio files: {num_unhealthy_files}")


✅ Number of healthy audio files: 185
✅ Number of unhealthy audio files: 166
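Since `count_files` counts every file in a directory, stray non-audio files would inflate the totals. A stricter variant that counts only `.wav` files might look like the sketch below; `count_wav_files` is a hypothetical name, and returning 0 for a missing directory is an assumption carried over from `count_files`.

```python
from pathlib import Path


def count_wav_files(directory):
    """Count files ending in .wav (case-insensitive) directly inside directory."""
    root = Path(directory)
    if not root.is_dir():
        return 0
    return sum(1 for p in root.iterdir() if p.is_file() and p.suffix.lower() == ".wav")


# Intended use in the notebook:
#   count_wav_files(generated_healthy_dir)
#   count_wav_files(generated_unhealthy_dir)
```

Comparing this count with the result of `count_files` is a quick way to spot unexpected files in the output folders.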
