# Demo: Loading Pre-Trained Speech Command Model from Google Drive
This notebook loads pre-trained speech command model weights from Google Drive and runs inference on a single audio sample.

### Steps:
1. Mount Google Drive.
2. Load the pre-trained model weights.
3. Prepare an audio sample for inference.
4. Perform inference and display the predicted command.


In [1]:
# Step 1: Mount Google Drive to access the model weights
from google.colab import drive
drive.mount('/content/drive')

### Step 2: Load the Pre-Trained Model
Make sure to provide the correct path to your saved model weights in Google Drive.

In [2]:
import torch
import torch.nn as nn
import torchaudio

# Define the model architecture (it must match the one used in training,
# otherwise load_state_dict will fail on mismatched keys or shapes)
class SpeechCommandModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(1, 16, kernel_size=3, stride=1)
        self.conv2 = nn.Conv2d(16, 32, kernel_size=3, stride=1)
        # 32 channels x 61 x 13 feature map after the two convolutions;
        # this fixes the spectrogram shape the model can accept
        self.fc1 = nn.Linear(32*61*13, 128)
        self.fc2 = nn.Linear(128, 35)  # One output per command (35 assumed)

    def forward(self, x):
        x = torch.relu(self.conv1(x))
        x = torch.relu(self.conv2(x))
        x = x.view(x.size(0), -1)
        x = torch.relu(self.fc1(x))
        x = self.fc2(x)
        return x

# Instantiate the model
model = SpeechCommandModel()

# Load the pre-trained weights from Google Drive (update the path accordingly)
model_path = '/content/drive/MyDrive/path_to_your_model_weights.pth'
# map_location='cpu' lets weights saved on a GPU load on a CPU-only runtime
model.load_state_dict(torch.load(model_path, map_location='cpu'))
model.eval()
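
The input size of `fc1` depends on the spectrogram shape that reaches the convolutions, so it is worth checking the arithmetic before loading weights. The sketch below is a minimal, pure-Python calculation; the helper names `conv_out` and `flattened_size` are illustrative and not part of the original notebook, and the 64 x 81 input assumes a one-second clip at 16 kHz with torchaudio's default hop length of 200.

```python
def conv_out(size: int, kernel: int = 3, stride: int = 1, padding: int = 0) -> int:
    """Output length of one spatial dimension after a Conv2d layer."""
    return (size + 2 * padding - kernel) // stride + 1

def flattened_size(n_mels: int, n_frames: int, out_channels: int = 32) -> int:
    """Features entering fc1 after conv1 and conv2 (both 3x3, stride 1, no padding)."""
    h = conv_out(conv_out(n_mels))
    w = conv_out(conv_out(n_frames))
    return out_channels * h * w

# Assumed input: 64 mel bins x 81 frames (1 s at 16 kHz, hop length 200)
print(flattened_size(64, 81))  # 147840
# A 65 x 17 input reproduces the 32*61*13 used in the class definition
print(flattened_size(65, 17))  # 25376
```

If the number printed for your preprocessing differs from the `in_features` of `fc1`, the forward pass will raise a shape error, which suggests the preprocessing here does not match the one used in training.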


### Step 3: Prepare an Audio Sample for Inference
Load an audio sample and convert it to the mel-spectrogram input that the model expects.

In [3]:
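# Load an audio file for inference (provide the path to the audio file)
waveform, sample_rate = torchaudio.load('/content/drive/MyDrive/sample_audio.wav')

# Mix down to mono so the model receives a single input channel
waveform = waveform.mean(dim=0, keepdim=True)

# Resample if the file's rate differs from the rate the transform assumes
target_sample_rate = 16000
if sample_rate != target_sample_rate:
    waveform = torchaudio.functional.resample(waveform, sample_rate, target_sample_rate)

# Transform the waveform into a mel-spectrogram (required by the model)
transform = torchaudio.transforms.MelSpectrogram(
    sample_rate=target_sample_rate,
    n_mels=64
)
mel_spectrogram = transform(waveform)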

# Add a batch dimension: (channels, n_mels, frames) -> (1, channels, n_mels, frames)
input_tensor = mel_spectrogram.unsqueeze(0)
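
Because the model ends in fully connected layers, it accepts only one spectrogram shape, so clips of different lengths need to be brought to a common length before the mel transform. A minimal sketch, assuming one-second clips at 16 kHz; the helper `pad_or_trim` and the 16000-sample target are illustrative choices, not taken from the original notebook:

```python
import torch
import torch.nn.functional as F

def pad_or_trim(waveform: torch.Tensor, target_len: int = 16000) -> torch.Tensor:
    """Zero-pad or truncate a (channels, time) waveform to target_len samples."""
    num_samples = waveform.shape[-1]
    if num_samples < target_len:
        # Pad zeros on the right of the time axis
        return F.pad(waveform, (0, target_len - num_samples))
    return waveform[..., :target_len]

# Random stand-ins for a short and a long recording
short_clip = torch.randn(1, 12000)
long_clip = torch.randn(1, 20000)
print(pad_or_trim(short_clip).shape)
print(pad_or_trim(long_clip).shape)
```

In this notebook the call would sit between loading the waveform and applying the mel transform. The target length should match whatever clip length was used during training.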


### Step 4: Perform Inference and Display the Predicted Command
Pass the processed audio through the model and interpret the result.

In [4]:
# Perform inference
with torch.no_grad():
    output = model(input_tensor)

# Get the predicted label (assuming it's a classification task)
predicted_label = torch.argmax(output, dim=1)

# Define the list of commands (update according to your dataset; the order
# must match the label indices used during training)
commands = ['yes', 'no', 'up', 'down', 'left', 'right', 'on', 'off', 'stop', 'go', 'zero', 'one', 'two', 'three', 'four',
            'five', 'six', 'seven', 'eight', 'nine', 'bed', 'bird', 'cat', 'dog', 'happy', 'house', 'marvin',
            'sheila', 'tree', 'wow', 'visual', 'backward', 'forward', 'follow', 'learn']

# Display the predicted command
print(f'Predicted command: {commands[predicted_label.item()]}')
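
A single argmax hides how confident the model is. As a hedged sketch, the logits can be converted to probabilities with softmax and the top few candidates listed; the helper `top_k_commands` and the toy logits below are made up for illustration and are not real model output.

```python
import torch

def top_k_commands(logits: torch.Tensor, labels: list, k: int = 3) -> list:
    """Return the k most likely (label, probability) pairs for one sample.

    `logits` is a 1-D tensor with one score per label.
    """
    probs = torch.softmax(logits, dim=-1)
    top_probs, top_idx = torch.topk(probs, k)
    return [(labels[i], p) for i, p in zip(top_idx.tolist(), top_probs.tolist())]

# Toy three-class example (hypothetical values)
example_logits = torch.tensor([2.0, 0.5, 1.0])
example_labels = ['yes', 'no', 'up']
for label, prob in top_k_commands(example_logits, example_labels, k=2):
    print(f'{label}: {prob:.2f}')
```

With the notebook's variables, the equivalent call would be `top_k_commands(output[0], commands)`, since `output` carries a leading batch dimension.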