<a href="https://colab.research.google.com/github/Maya7991/gsc_classification/blob/main/synthetic_dataset_gen.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [1]:
%pip install TTS audiomentations soundfile



In [5]:
!apt-get update && apt-get install -y espeak-ng sox

Hit:1 http://archive.ubuntu.com/ubuntu jammy InRelease
Hit:2 http://archive.ubuntu.com/ubuntu jammy-updates InRelease
Hit:3 https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64  InRelease
Hit:4 https://cloud.r-project.org/bin/linux/ubuntu jammy-cran40/ InRelease
Hit:5 http://archive.ubuntu.com/ubuntu jammy-backports InRelease
Hit:6 https://r2u.stat.illinois.edu/ubuntu jammy InRelease
Hit:7 http://security.ubuntu.com/ubuntu jammy-security InRelease
Hit:8 https://pp

In [6]:
import os
import random
import subprocess
import numpy as np
import soundfile as sf
from TTS.api import TTS
from audiomentations import Compose, AddBackgroundNoise

# === CONFIG ===
KEYWORDS = ["mask", "frame"]
NUM_SPEAKERS = 5
OUTPUT_DIR = "kws_samples"
TTS_MODEL = "tts_models/en/vctk/vits"
BACKGROUND_NOISE_DIR = "background_noises"  # must exist and contain audio files when APPLY_NOISE is True
SAMPLE_RATE = 22050

# Augmentation toggles
APPLY_AUGMENTATIONS = True
PITCH_SHIFT_STEPS = [-100, 100]  # sox "pitch" takes cents: ±100 cents = ±1 semitone
SPEED_FACTORS = [0.9, 1.1]       # sox "speed" changes tempo and pitch together
APPLY_NOISE = True

# === SETUP ===
os.makedirs(OUTPUT_DIR, exist_ok=True)
tts = TTS(model_name=TTS_MODEL)

# Setup noise augmenter
augment = Compose([
    AddBackgroundNoise(
        sounds_path=BACKGROUND_NOISE_DIR,
        min_snr_in_db=5.0,
        max_snr_in_db=20.0,
        p=1.0
    )
]) if APPLY_NOISE else None

# === FUNCTION: apply noise ===
def apply_noise(input_wav_path, output_path):
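    # audiomentations works on float32 arrays; soundfile returns float64 unless told otherwise
    samples, sr = sf.read(input_wav_path, dtype="float32")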
    if sr != SAMPLE_RATE:
        raise ValueError(f"Expected {SAMPLE_RATE} Hz, got {sr}")
    noisy_samples = augment(samples=samples, sample_rate=SAMPLE_RATE)
    sf.write(output_path, noisy_samples, SAMPLE_RATE)

# === GENERATE KEYWORD SAMPLES ===
for keyword in KEYWORDS:
    for speaker_id in range(NUM_SPEAKERS):
        base_filename = f"{keyword}_speaker{speaker_id}"
        output_path = os.path.join(OUTPUT_DIR, f"{base_filename}.wav")
        print(f"Generating: {output_path}")
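        # Multi-speaker VCTK models take a speaker *name* from tts.speakers, not an integer index
        tts.tts_to_file(text=keyword, speaker=tts.speakers[speaker_id], file_path=output_path)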

        paths_to_augment = [output_path]

        # Pitch + speed augmentations via sox (needs the sox binary; check=True raises if sox exits with an error)
        if APPLY_AUGMENTATIONS:
            for shift in PITCH_SHIFT_STEPS:
                aug_path = os.path.join(OUTPUT_DIR, f"{base_filename}_pitch{shift}.wav")
                subprocess.run(["sox", output_path, aug_path, "pitch", str(shift)], check=True)
                paths_to_augment.append(aug_path)

            for speed in SPEED_FACTORS:
                aug_path = os.path.join(OUTPUT_DIR, f"{base_filename}_speed{speed}.wav")
                subprocess.run(["sox", output_path, aug_path, "speed", str(speed)], check=True)
                paths_to_augment.append(aug_path)

        # Apply background noise
        if APPLY_NOISE:
            for clean_path in paths_to_augment:
                noisy_path = clean_path.replace(".wav", "_noisy.wav")
                print(f"Adding noise to: {clean_path}")
                apply_noise(clean_path, noisy_path)

print("✅ All keyword samples generated with augmentations and noise.")


 > tts_models/en/vctk/vits is already downloaded.
 > Using model: vits
 > Setting up Audio Processor...
 | > sample_rate:22050
 | > resample:False
 | > num_mels:80
 | > log_func:np.log10
 | > min_level_db:0
 | > frame_shift_ms:None
 | > frame_length_ms:None
 | > ref_level_db:None
 | > fft_size:1024
 | > power:None
 | > preemphasis:0.0
 | > griffin_lim_iters:None
 | > signal_norm:None
 | > symmetric_norm:None
 | > mel_fmin:0
 | > mel_fmax:None
 | > pitch_fmin:None
 | > pitch_fmax:None
 | > spec_gain:20.0
 | > stft_pad_mode:reflect
 | > max_norm:1.0
 | > clip_norm:True
 | > do_trim_silence:False
 | > trim_db:60
 | > do_sound_norm:False
 | > do_amp_to_db_linear:True
 | > do_amp_to_db_mel:True
 | > do_rms_norm:False
 | > db_level:None
 | > stats_path:None
 | > base:10
 | > hop_length:256
 | > win_length:1024


Exception:  [!] No espeak backend found. Install espeak-ng or espeak to your system.
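As a numeric reference for the augmentation settings in the config, the sketch below converts them into concrete quantities. It assumes sox's `pitch` effect takes cents (1200 cents per octave), that sox's `speed` effect scales tempo and pitch together, and that SNR is an RMS ratio in decibels. `cents_to_ratio`, `speed_to_duration` and `mix_at_snr` are hypothetical helpers for illustration; `mix_at_snr` is not audiomentations' actual `AddBackgroundNoise` implementation.

```python
import numpy as np


def cents_to_ratio(cents: float) -> float:
    """Hypothetical helper: frequency ratio of a pitch shift in cents (1200 cents = 1 octave)."""
    return 2.0 ** (cents / 1200.0)


def speed_to_duration(duration_s: float, speed: float) -> float:
    """Hypothetical helper: clip length after a sox-style speed change (tempo and pitch together)."""
    return duration_s / speed


def mix_at_snr(signal: np.ndarray, noise: np.ndarray, snr_db: float) -> np.ndarray:
    """Hypothetical helper: scale noise so that signal RMS / noise RMS = 10 ** (snr_db / 20)."""
    noise = np.resize(noise, signal.shape)  # loop or trim the noise to the signal length
    signal_rms = np.sqrt(np.mean(signal ** 2))
    noise_rms = np.sqrt(np.mean(noise ** 2))
    target_noise_rms = signal_rms / (10.0 ** (snr_db / 20.0))
    return signal + noise * (target_noise_rms / noise_rms)


for cents in (-100, 100):
    print(f"pitch {cents:+d} cents -> frequency ratio {cents_to_ratio(cents):.4f}")
for speed in (0.9, 1.1):
    print(f"speed {speed} -> a 1.0 s clip lasts {speed_to_duration(1.0, speed):.3f} s")

# Mix a 1 s, 440 Hz tone with white noise at the lowest SNR used in the config (5 dB)
rng = np.random.default_rng(0)
sr = 22050
t = np.arange(sr) / sr
tone = 0.5 * np.sin(2 * np.pi * 440 * t)
noisy = mix_at_snr(tone, rng.standard_normal(sr // 2), snr_db=5.0)
residual = noisy - tone
measured = 20 * np.log10(np.sqrt(np.mean(tone ** 2)) / np.sqrt(np.mean(residual ** 2)))
print(f"measured SNR: {measured:.2f} dB")
```

Under these assumptions, the `min_snr_in_db=5.0` setting means the added noise RMS is about 56% of the keyword's RMS, while 20 dB brings it down to 10%.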

In [3]:
!apt-get update && apt-get install -y espeak-ng sox

Get:1 https://cloud.r-project.org/bin/linux/ubuntu jammy-cran40/ InRelease [3,632 B]
Get:2 https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64  InRelease [1,581 B]
Hit:3 https://r2u.stat.illinois.edu/ubuntu jammy InRelease
Get:4 http://security.ubuntu.com/ubuntu jammy-security InRelease [129 kB]
Hit:5 http://archive.ubuntu.com/ubuntu jammy InRelease
Get:6 http://archive.ubuntu.com/ubuntu jammy-updates InRelease [128 kB]
Get:7 https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64  Packages [1,776 kB]
Get:8 https://ppa.launchpadcontent.net/deadsnakes/ppa/ubuntu jammy InRelease [18.1 kB]
Hit:9 https://ppa.launchpadcontent.net/graphics-drivers/ppa/ubuntu jammy InRelease
Get:10 http://archive.ubuntu.com/ubuntu jammy-backports InRelease [127 kB]
Hit:11 https://ppa.launchpadcontent.net/ubuntugis/ppa/ubuntu jammy InRelease
Get:12 http://security.ubuntu.com/ubuntu jammy-security/main amd64 Packages [2,986 kB]
Get:13 https://

# Task
Explain the error in the selected code. If possible, fix the error and incorporate the changes into the existing code. Otherwise, try to diagnose the error.

## Verify espeak-ng installation

### Subtask:
Check if `espeak-ng` is correctly installed and accessible in the environment where the notebook is running.


**Reasoning**:
Import and instantiate Coqui TTS's own `ESpeak` phonemizer wrapper to check whether TTS itself can find an espeak backend.



In [12]:
try:
    from TTS.tts.utils.text.phonemizers.espeak_wrapper import ESpeak
    print("Successfully imported ESpeak class.")
    # Try to instantiate the ESpeak class with a language
    espeak_instance = ESpeak(language='en')
    print("Successfully instantiated ESpeak class.")
except Exception as e:
    print(f"Error importing or accessing ESpeak class: {e}")

Successfully imported ESpeak class.
Error importing or accessing ESpeak class:  [!] No espeak backend found. Install espeak-ng or espeak to your system.


### Subtask:
Check if `espeak-ng` executable is accessible via `subprocess`.

In [11]:
import subprocess

try:
    # Run a simple espeak-ng command to check if it's accessible
    result = subprocess.run(['espeak-ng', '--version'], capture_output=True, text=True, check=True)
    print("espeak-ng command executed successfully.")
    print("Output:")
    print(result.stdout)
except FileNotFoundError:
    print("Error: espeak-ng command not found.")
except subprocess.CalledProcessError as e:
    print(f"Error executing espeak-ng command: {e}")
    print(f"Stderr: {e.stderr}")
except Exception as e:
    print(f"An unexpected error occurred: {e}")

espeak-ng command executed successfully.
Output:
eSpeak NG text-to-speech: 1.50  Data at: /usr/lib/x86_64-linux-gnu/espeak-ng-data
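The two checks disagree: the `espeak-ng` binary runs from `subprocess`, yet `ESpeak(language='en')` still reports no backend. One plausible explanation, which is an assumption about Coqui TTS internals rather than something this notebook shows, is that the phonemizer module looks for `espeak-ng`/`espeak` on `PATH` once, when it is first imported, and keeps that result. If TTS was imported in this kernel before `espeak-ng` was installed, it would keep reporting "no backend" until the runtime is restarted. The hypothetical `diagnose_espeak` helper below checks both conditions using only the standard library; the module name it inspects is taken from the import in the `ESpeak` check above.

```python
import shutil
import sys

# Module assumed to decide the espeak backend when it is first imported
WRAPPER_MODULE = "TTS.tts.utils.text.phonemizers.espeak_wrapper"


def diagnose_espeak() -> str:
    """Hypothetical helper: explain why TTS may not see espeak in this kernel."""
    binary = shutil.which("espeak-ng") or shutil.which("espeak")
    if binary is None:
        return "No espeak binary on PATH: run `apt-get install -y espeak-ng` first."
    if WRAPPER_MODULE in sys.modules:
        return (f"espeak found at {binary}, but TTS was already imported in this kernel; "
                "if that import happened before the install, restart the runtime.")
    return f"espeak found at {binary} and TTS not imported yet; importing TTS now should detect it."


print(diagnose_espeak())
```

Restarting the Colab session keeps system packages installed with `apt-get`, so after a restart only the Python cells need to be re-run.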

