<a href="https://colab.research.google.com/github/prabhmeharbedi/102165002-SESS_LE1/blob/main/102165002_PRABHMEHAR.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Cell 1: Introduction and Paper Summary

# Speech Commands Classification - Lab Evaluation

This notebook follows the evaluation tasks based on the paper
[Speech Commands: A Dataset for Limited-Vocabulary Speech Recognition](https://arxiv.org/abs/1804.03209).

### Tasks:
1. Summarize the paper in about 50 words.
2. Download, analyze, and statistically describe the dataset.
3. Train a classifier to distinguish commands.
4. Report performance results using standard benchmarks.
5. Record 30 samples of each command in your voice and create a new dataset.
6. Fine-tune the classifier on your voice.
7. Report the results.

## 1. Paper Summary

The Speech Commands dataset is designed for training and evaluating keyword-spotting models. It provides a standardized collection of audio clips of isolated words spoken by a diverse set of speakers. It targets on-device speech recognition, where models must be small and efficient enough to run on mobile hardware. The dataset was crowdsourced, processed to ensure quality, and released under a Creative Commons license. It has supported work on recognition accuracy, noise tolerance, and adversarial robustness. In the paper's baseline results, a model trained on Version 2 outperforms one trained on Version 1, reaching up to 89.7% Top-One accuracy.

In [2]:
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


## 2. Analyze Dataset

In [4]:
# Cell 2: Download and Analyze the Dataset
import os
import torchaudio
from collections import Counter
from torch.utils.data import Subset

In [5]:
# Create the data directory if it doesn't exist
data_dir = './data'
os.makedirs(data_dir, exist_ok=True)

In [6]:
# Download the Speech Commands dataset
dataset = torchaudio.datasets.SPEECHCOMMANDS(root=data_dir, download=True)

100%|██████████| 2.26G/2.26G [01:14<00:00, 32.4MB/s]


In [7]:
# Select 10 commands to work with
selected_commands = ['yes', 'no', 'up', 'down', 'left', 'right', 'go', 'stop', 'on', 'off']

# Limit the number of samples per command (e.g., 100 samples per command)
samples_per_command = 100

In [8]:
# Create a subset of the dataset by filtering for the selected commands
subset_indices = []
command_counter = Counter()

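# Note: indexing the dataset decodes each clip from disk, so this scan can take a while
for idx, sample in enumerate(dataset):
    label = sample[2]  # Items are (waveform, sample_rate, label, speaker_id, utterance_number)
    if label in selected_commands and command_counter[label] < samples_per_command:
        subset_indices.append(idx)
        command_counter[label] += 1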

    # Stop when we have enough samples for each command
    if all(command_counter[cmd] >= samples_per_command for cmd in selected_commands):
        break

# Create a subset of the dataset
subset_dataset = Subset(dataset, subset_indices)

# Check the sample count for each command in the subset
print(f"Sample counts in subset: {command_counter}")
print(f"Total subset size: {len(subset_dataset)}")

Sample counts in subset: Counter({'down': 100, 'go': 100, 'left': 100, 'no': 100, 'off': 100, 'on': 100, 'right': 100, 'stop': 100, 'up': 100, 'yes': 100})
Total subset size: 1000
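
The counts above describe only the 1,000-clip subset. To describe the downloaded clips statistically, one option is to read the WAV headers directly, which avoids decoding the audio. The following is a minimal sketch, assuming the clips sit in one folder per label (`<root>/<label>/<clip>.wav`) and are plain PCM WAV files; the function name `describe_clips` and the example path are illustrative, not part of the original notebook.

```python
import statistics
import wave
from collections import defaultdict
from pathlib import Path


def describe_clips(root):
    """Return {label: stats} for WAV files laid out as <root>/<label>/<clip>.wav."""
    durations = defaultdict(list)
    for wav_path in sorted(Path(root).glob("*/*.wav")):
        # Only the header is needed to compute the clip duration
        with wave.open(str(wav_path), "rb") as wav_file:
            seconds = wav_file.getnframes() / wav_file.getframerate()
        durations[wav_path.parent.name].append(seconds)

    return {
        label: {
            "count": len(values),
            "mean_s": statistics.mean(values),
            "min_s": min(values),
            "max_s": max(values),
        }
        for label, values in durations.items()
    }


# Hypothetical usage; the extraction path is an assumption about where
# torchaudio unpacked the archive under data_dir:
# stats = describe_clips('./data/SpeechCommands/speech_commands_v0.02')
# for label in selected_commands:
#     print(label, stats[label])
```

Any folder that contains WAV files is treated as a label here, so non-command folders (for example, background noise) would need to be filtered out by the caller.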


## 3. Data Preprocessing

In [9]:
# Cell 3: Data Preprocessing (Padding and Truncating)
import torch
from torch.utils.data import DataLoader

In [10]:
# Define a fixed length for all audio samples (1 second = 16000 samples at 16kHz)
fixed_length = 16000

# Custom collate function to pad and truncate audio data
def collate_fn(batch):
    waveforms = []
    labels = []

    for item in batch:
        waveform = item[0]
        label = item[2]

        if waveform.shape[1] > fixed_length:
            waveform = waveform[:, :fixed_length]
        elif waveform.shape[1] < fixed_length:
            pad_amount = fixed_length - waveform.shape[1]
            waveform = torch.nn.functional.pad(waveform, (0, pad_amount))

        waveforms.append(waveform)
        labels.append(label)

    waveforms = torch.stack(waveforms)
    return waveforms, labels

In [11]:
# DataLoader for the subset dataset
loader = DataLoader(subset_dataset, batch_size=32, shuffle=True, collate_fn=collate_fn)
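
The loader above covers the whole subset, so any accuracy measured on it would be training accuracy. A held-out split is needed before reporting results. The sketch below is modelled on the hash-based assignment described in the Speech Commands paper, which keeps all clips from one speaker in the same partition. The function name, the default percentages, and the assumption that file names follow the `<speaker>_nohash_<n>.wav` pattern are illustrative, and this sketch is not guaranteed to reproduce the dataset's official validation and testing lists.

```python
import hashlib
import os
import re

MAX_NUM_WAVS_PER_CLASS = 2**27 - 1  # Large constant used to spread hashes over 0-100


def which_set(filename, validation_percentage=10, testing_percentage=10):
    """Assign a clip to 'training', 'validation', or 'testing' based on its file name."""
    base_name = os.path.basename(filename)
    # Drop everything from '_nohash_' onward so all clips by one speaker share a key
    speaker_key = re.sub(r"_nohash_.*$", "", base_name)
    digest = hashlib.sha1(speaker_key.encode("utf-8")).hexdigest()
    percentage = (int(digest, 16) % (MAX_NUM_WAVS_PER_CLASS + 1)) * (
        100.0 / MAX_NUM_WAVS_PER_CLASS
    )
    if percentage < validation_percentage:
        return "validation"
    if percentage < validation_percentage + testing_percentage:
        return "testing"
    return "training"
```

Because the assignment depends only on the file name, it stays stable when clips are added later, and the same speaker never appears in both the training and the evaluation partitions.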

## 4. CNN Classifier

In [12]:
# Cell 4: Define and Train a CNN Classifier (with correct fully connected layer input size)
import torch.nn as nn
import torch.optim as optim
import torchaudio.transforms as transforms

# Define the MelSpectrogram transform to convert audio waveforms into spectrograms
mel_spectrogram = transforms.MelSpectrogram(
    sample_rate=16000, n_mels=128, n_fft=400, hop_length=160
)

# Define a simple CNN model for speech command classification
class SimpleCNN(nn.Module):
    def __init__(self, num_classes):
        super(SimpleCNN, self).__init__()
        self.conv1 = nn.Conv2d(1, 16, kernel_size=3, stride=1, padding=1)
        self.conv2 = nn.Conv2d(16, 32, kernel_size=3, stride=1, padding=1)
        self.pool = nn.MaxPool2d(kernel_size=2, stride=2, padding=0)
        self.fc1 = nn.Linear(32 * 25 * 32, 128)  # 32 channels x 32 mel bins x 25 frames = 25600 features after two pooling stages
        self.fc2 = nn.Linear(128, num_classes)
        self.relu = nn.ReLU()

    def forward(self, x):
        x = self.pool(self.relu(self.conv1(x)))
        x = self.pool(self.relu(self.conv2(x)))
        x = x.view(x.size(0), -1)  # Flatten the output for the fully connected layers
        x = self.relu(self.fc1(x))
        x = self.fc2(x)
        return x

# Get the number of classes (commands)
num_classes = len(selected_commands)

# Create a dictionary to map commands (labels) to numerical values
label_to_index = {label: idx for idx, label in enumerate(selected_commands)}

# Function to convert string labels to numerical indices
def label_to_tensor(label):
    return torch.tensor(label_to_index[label])

# Instantiate the model
model = SimpleCNN(num_classes=num_classes).to('cuda')

# Define optimizer and loss function
optimizer = optim.Adam(model.parameters(), lr=0.0001)
criterion = nn.CrossEntropyLoss()

# Training loop for 50 epochs
num_epochs = 50

for epoch in range(num_epochs):
    model.train()
    running_loss = 0.0

    for waveforms, labels in loader:
        # Convert waveforms to mel spectrograms: (B, 1, 16000) -> (B, 1, 128, 101)
        waveforms = mel_spectrogram(waveforms)
        waveforms = waveforms.to('cuda')  # Channel dimension is already present, so no reshaping is needed
        labels = torch.tensor([label_to_index[label] for label in labels]).to('cuda')

        optimizer.zero_grad()
        outputs = model(waveforms)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()

        running_loss += loss.item()

    print(f'Epoch {epoch + 1}/{num_epochs}, Loss: {running_loss / len(loader)}')

print('Training completed!')



Epoch 1/50, Loss: 2.735433742403984
Epoch 2/50, Loss: 1.7791217155754566
Epoch 3/50, Loss: 1.5099541135132313
Epoch 4/50, Loss: 1.256291102617979
Epoch 5/50, Loss: 1.0833818912506104
Epoch 6/50, Loss: 0.9143548309803009
Epoch 7/50, Loss: 0.8382234964519739
Epoch 8/50, Loss: 0.9330774340778589
Epoch 9/50, Loss: 0.7458871146664023
Epoch 10/50, Loss: 0.6382007179781795
Epoch 11/50, Loss: 0.5344852702692151
Epoch 12/50, Loss: 0.5043157776817679
Epoch 13/50, Loss: 0.4485340788960457
Epoch 14/50, Loss: 0.40360586065799
Epoch 15/50, Loss: 0.38896756747271866
Epoch 16/50, Loss: 0.3600412090308964
Epoch 17/50, Loss: 0.32153833215124905
Epoch 18/50, Loss: 0.33028649259358644
Epoch 19/50, Loss: 0.4526979032671079
Epoch 20/50, Loss: 0.50119389872998
Epoch 21/50, Loss: 0.39286120724864304
Epoch 22/50, Loss: 0.292071697069332
Epoch 23/50, Loss: 0.22603586642071605
Epoch 24/50, Loss: 0.18977399868890643
Epoch 25/50, Loss: 0.1651693576714024
Epoch 26/50, Loss: 0.14813494449481368
Epoch 27/50, Loss: 0.
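
The input size of `fc1` in `SimpleCNN` follows from the spectrogram shape and the two conv + pool stages. The arithmetic can be checked without running the model. This is a small sketch using the standard output-size formula for convolution and pooling layers; the helper names are illustrative, and the frame count assumes `MelSpectrogram` keeps its default centered framing.

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Output length of a conv or pooling layer along one dimension."""
    return (size + 2 * padding - kernel) // stride + 1


def flattened_size(num_samples=16000, hop_length=160, n_mels=128, out_channels=32):
    # With centered framing, the spectrogram has 1 + num_samples // hop_length frames
    height, width = n_mels, 1 + num_samples // hop_length
    for _ in range(2):  # Two stages of 3x3 conv (padding 1) followed by 2x2 max pooling
        height = conv_out(height, kernel=3, stride=1, padding=1)
        width = conv_out(width, kernel=3, stride=1, padding=1)
        height = conv_out(height, kernel=2, stride=2)
        width = conv_out(width, kernel=2, stride=2)
    return out_channels * height * width


print(flattened_size())  # 25600, i.e. 32 channels * 32 mel bins * 25 frames
```

Changing `fixed_length`, `hop_length`, or `n_mels` changes this value, so `fc1` would need to be resized accordingly.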

## 5. Model Evaluation
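
No evaluation code survives under this heading. As a minimal sketch of how results could be reported, the function below computes overall accuracy, per-class recall, and a confusion count from two lists of labels. The function name `summarize_predictions` is illustrative. It assumes predicted labels have already been obtained, for example by taking `outputs.argmax(dim=1)` on held-out batches and mapping each index back through `selected_commands`.

```python
from collections import Counter


def summarize_predictions(true_labels, predicted_labels):
    """Return accuracy, per-class recall, and (true, predicted) pair counts."""
    if len(true_labels) != len(predicted_labels):
        raise ValueError("true_labels and predicted_labels must have the same length")
    if not true_labels:
        raise ValueError("at least one prediction is required")

    confusion = Counter(zip(true_labels, predicted_labels))
    correct = sum(count for (true, pred), count in confusion.items() if true == pred)

    per_class_recall = {}
    for label in sorted(set(true_labels)):
        support = sum(count for (true, _), count in confusion.items() if true == label)
        per_class_recall[label] = confusion[(label, label)] / support

    return {
        "accuracy": correct / len(true_labels),
        "per_class_recall": per_class_recall,
        "confusion": confusion,
    }


# Toy example with hand-written labels (not results from the model above)
report = summarize_predictions(["yes", "yes", "no", "no"], ["yes", "no", "no", "no"])
print(report["accuracy"])          # 0.75
print(report["per_class_recall"])  # {'no': 1.0, 'yes': 0.5}
```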

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).
Files in the recordings folder: ['no_22.wav', 'yes_1.wav', 'yes_19.wav', 'no_1.wav', 'yes_27.wav', 'yes_4.wav', 'yes_14.wav', 'no_15.wav', 'yes_24.wav', 'yes_22.wav', 'yes_20.wav', 'yes_18.wav', 'yes_6.wav', 'no_16.wav', 'no_27.wav', 'no_19.wav', 'no_5.wav', 'yes_10.wav', 'yes_25.wav', 'no_14.wav', 'yes_8.wav', 'no_3.wav', 'yes_28.wav', 'no_8.wav', 'yes_2.wav', 'no_12.wav', 'yes_17.wav', 'yes_12.wav', 'no_4.wav', 'yes_23.wav', 'yes_21.wav', 'yes_9.wav', 'no_28.wav', 'no_2.wav', 'no_21.wav', 'yes_7.wav', 'yes_13.wav', 'yes_29.wav', 'yes_30.wav', 'no_26.wav', 'yes_3.wav', 'no_11.wav', 'no_24.wav', 'no_23.wav', 'no_20.wav', 'no_7.wav', 'yes_26.wav', 'yes_5.wav', 'yes_16.wav', 'no_29.wav', 'no_30.wav', 'no_6.wav', 'yes_11.wav', 'no_25.wav', 'no_10.wav', 'no_13.wav', 'yes_15.wav', 'no_17.wav', 'no_9.wav', 'no_18.wav', 'down_6.wav', 'down_1.wav', 'down_30.wav', 'do

## 6. Recording

In [16]:
# Cell 6: Access the 30 recorded samples per command stored on Google Drive
# Step 1: Mount Google Drive
from google.colab import drive
drive.mount('/content/drive')


Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).


In [17]:
from google.colab import drive
import os

# Mount Google Drive
drive.mount('/content/drive')

# Path to the recordings folder in Drive
recordings_path = '/content/drive/MyDrive/speech/Recordings'

# Verify the recordings are accessible
try:
    recording_files = os.listdir(recordings_path)
    print("Recordings found:")
    for file in recording_files:
        print(file)
except FileNotFoundError:
    print(f"The folder {recordings_path} does not exist.")
except Exception as e:
    print(f"An error occurred: {e}")


Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).
The folder /content/drive/MyDrive/speech/Recordings does not exist.
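
The verification cell that follows expects per-command lists such as `yes_files` and `no_files`, which are not defined in the cells shown. One way to build them, assuming the recordings are named `<label>_<take>.wav` as in the folder listing printed earlier (for example `yes_1.wav`, `no_22.wav`), is sketched below. The helper name `group_recordings` is hypothetical.

```python
import os
import re
from collections import defaultdict

# Matches names such as 'yes_1.wav' or 'down_30.wav'
RECORDING_PATTERN = re.compile(r"^(?P<label>[a-z]+)_(?P<take>\d+)\.wav$")


def group_recordings(folder, filenames):
    """Group recording file names by label and return full paths sorted by take number."""
    grouped = defaultdict(list)
    for name in filenames:
        match = RECORDING_PATTERN.match(name)
        if match is None:
            continue  # Skip files that do not follow the naming convention
        take = int(match.group("take"))
        grouped[match.group("label")].append((take, os.path.join(folder, name)))
    return {label: [path for _, path in sorted(items)] for label, items in grouped.items()}


# Hypothetical usage once the recordings folder is reachable:
# files_by_label = group_recordings(recordings_path, os.listdir(recordings_path))
# yes_files = files_by_label.get("yes", [])
# no_files = files_by_label.get("no", [])
```

Sorting by the numeric take keeps `yes_2.wav` ahead of `yes_10.wav`, which a plain string sort would not.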


In [None]:
import os

# Check if all files exist before loading
def verify_file_paths(file_list, label):
    missing_files = [file for file in file_list if not os.path.isfile(file)]
    if missing_files:
        print(f"Missing {label} files:", missing_files)
    else:
        print(f"All {label} files are present.")

# Verify files for each label
verify_file_paths(yes_files, "yes")
verify_file_paths(no_files, "no")
verify_file_paths(up_files, "up")
verify_file_paths(down_files, "down")
verify_file_paths(left_files, "left")