
# Advanced Certification in AIML
## A Program by IIIT-H and TalentSprint

## Learning Objectives

By the end of this experiment, you will be able to:

- Understand how to implement neural networks on MFCC features


In [None]:
#@title Experiment Walkthrough Video
from IPython.display import HTML

HTML("""<video width="850" height="480" controls>
  <source src="https://cdn.iiith.talentsprint.com/aiml/Experiment_related_data/Walkthrough/MFCC_Pytorch_Walkthrough.mp4" type="video/mp4">
</video>
""")

## Dataset

### Description

In this experiment, we will use TensorFlow's Speech Commands dataset, which contains over 100,000 samples. Each sample is a one-second-long utterance of one of 30 short commands. The dataset was recorded by thousands of different people and is open source under a Creative Commons BY 4.0 license.

Example commands: 'Yes', 'No', 'Up', 'Down', 'Left', etc.


## Domain Information

When we listen to an audio sample, its content changes constantly over time; in other words, speech is a non-stationary signal. Therefore, standard signal processing techniques, which assume a stationary signal, cannot be applied directly to extract features from audio. However, if the speech signal is observed through a very short window (typically 20-30 ms), the speech content within that window appears approximately stationary. This observation led to the concept of short-time processing of speech.

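MFCC (Mel-Frequency Cepstral Coefficients) is a feature extraction technique built on short-time processing of speech: the signal is split into short, overlapping frames, and each frame is summarised by a small set of coefficients that describe its spectral envelope on the perceptually motivated mel scale.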
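To make short-time processing concrete, here is a minimal NumPy/SciPy sketch of the standard MFCC pipeline: framing, windowing, power spectrum, mel filterbank, logarithm and DCT. The frame length (25 ms), hop (10 ms), 26 mel filters and 13 coefficients are common defaults that we assume here; they are not necessarily the settings used to produce the precomputed features in this experiment.

```python
import numpy as np
from scipy.fft import dct


def hz_to_mel(f):
    # Mel scale: roughly linear below 1 kHz and logarithmic above it
    return 2595.0 * np.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mfcc(signal, sr=16000, frame_ms=25, hop_ms=10, n_fft=512, n_mels=26, n_mfcc=13):
    """Minimal MFCC sketch: returns an array of shape (n_frames, n_mfcc)."""
    frame_len = int(sr * frame_ms / 1000)   # 400 samples at 16 kHz
    hop = int(sr * hop_ms / 1000)           # 160 samples at 16 kHz
    n_frames = 1 + (len(signal) - frame_len) // hop

    # 1. Slice the signal into short overlapping frames (assumed quasi-stationary)
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = signal[idx] * np.hamming(frame_len)

    # 2. Power spectrum of each windowed frame
    power = np.abs(np.fft.rfft(frames, n=n_fft, axis=1)) ** 2 / n_fft

    # 3. Triangular filters spaced evenly on the mel scale
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_points) / sr).astype(int)
    fbank = np.zeros((n_mels, n_fft // 2 + 1))
    for m in range(1, n_mels + 1):
        left, center, right = bins[m - 1], bins[m], bins[m + 1]
        for k in range(left, center):
            fbank[m - 1, k] = (k - left) / (center - left)
        for k in range(center, right):
            fbank[m - 1, k] = (right - k) / (right - center)

    # 4. Log filterbank energies (a small constant avoids log(0))
    log_mel = np.log(power @ fbank.T + 1e-10)

    # 5. The DCT decorrelates the log energies; keep the first n_mfcc coefficients
    return dct(log_mel, type=2, axis=1, norm="ortho")[:, :n_mfcc]


rng = np.random.default_rng(0)
one_second = rng.standard_normal(16000)  # stand-in for a 1 s clip at 16 kHz
feats = mfcc(one_second)
print(feats.shape)  # (98, 13)
```

With these settings, a one-second clip yields 98 frames of 13 coefficients each; changing the frame, hop or coefficient counts changes the feature size accordingly.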


In [None]:
! wget  https://cdn.talentsprint.com/aiml/Experiment_related_data/week3/Exp1/AIML_DS_AUDIO_STD.zip
! unzip AIML_DS_AUDIO_STD.zip

### Importing required packages


In [None]:
import scipy.io as sio

# Importing torch packages
import torch
import torch.nn as nn      
import torch.nn.functional as F
import torch.optim as optim

# Importing python packages
import numpy as np

### Load the Dataset

The raw dataset is about 10 GB in size, and processing it directly would take a lot of time, so we have left that as a homework exercise for those who are interested in going into that level of detail.
Instead, our team has precomputed the MFCC features, which can be loaded and used directly.

The raw dataset can be downloaded from the link below: <br>[http://download.tensorflow.org/data/speech_commands_v0.02.tar.gz](http://download.tensorflow.org/data/speech_commands_v0.02.tar.gz)


### Loading MFCC features

In this experiment, the term Validation (short name: val) refers to the same data as the 'Test' dataset. In other words, we use a two-way Train/Val split, where Val also serves as the test set.

**Note:** Refer to [sio.loadmat](https://docs.scipy.org/doc/scipy/reference/generated/scipy.io.loadmat.html)

In [None]:
# Load MFCC Features
saved_vars = sio.loadmat('AIML_DS_AUDIO_STD/mfcc_feats/tf_speech_mfcc_31st_jan18.mat')
# print(saved_vars.keys())

mfcc_features_train = saved_vars['mfcc_features_train']
mfcc_labels_train = saved_vars['mfcc_labels_train']

mfcc_features_val = saved_vars['mfcc_features_val']
mfcc_labels_val = saved_vars['mfcc_labels_val']
print(mfcc_features_train.shape, mfcc_features_val.shape)

(57923, 416) (6798, 416)
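`sio.loadmat` always returns MATLAB-style arrays with at least two dimensions, so a label vector saved from a 1-D array comes back as a row (1, N) or column (N, 1) matrix. The training loop indexes `mfcc_labels_train[i]` one sample at a time, which is consistent with a column layout, but we have not verified the file's exact shapes. The round trip below uses a small in-memory `.mat` file with made-up labels to show the behaviour and how `np.ravel` flattens the result.

```python
import io

import numpy as np
import scipy.io as sio

# Save a made-up 1-D label vector to an in-memory .mat file and read it back
buf = io.BytesIO()
sio.savemat(buf, {"labels": np.array([0, 3, 29])})
buf.seek(0)
loaded = sio.loadmat(buf)

# loadmat also returns metadata keys such as '__header__'; filter them out
print(sorted(k for k in loaded if not k.startswith("__")))  # ['labels']
print(loaded["labels"].shape)  # (1, 3): 1-D arrays are saved as rows by default

# np.ravel gives a flat (N,) vector of class indices
flat_labels = np.ravel(loaded["labels"])
```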


In [None]:
# Print the unique class labels in the training set
print(np.unique(mfcc_labels_train))
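`nn.CrossEntropyLoss` expects targets to be integer class indices in the range 0 to C - 1, where C = 30 is the number of outputs of the network's final layer. The hypothetical helper `check_labels` below flattens a label array and verifies that condition; `np.unique(..., return_counts=True)` then reports how many samples each class has. The demo labels are made up, so how the labels are actually encoded in the `.mat` file should be confirmed from the unique values printed above.

```python
import numpy as np


def check_labels(labels, num_classes=30):
    """Hypothetical helper: flatten labels and verify they are valid class indices."""
    flat = np.ravel(labels)
    if not np.all(flat == np.round(flat)):
        raise ValueError("labels must be whole numbers")
    flat = flat.astype(np.int64)
    if flat.min() < 0 or flat.max() >= num_classes:
        raise ValueError(f"labels must lie in [0, {num_classes - 1}]")
    return flat


# Made-up labels shaped like a loadmat column vector (N, 1)
demo_labels = np.array([[0], [5], [29], [5]], dtype=np.float64)
flat_labels = check_labels(demo_labels)

# Per-class sample counts, useful for spotting class imbalance
classes, counts = np.unique(flat_labels, return_counts=True)
print(classes, counts)
```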

### Initializing CUDA

CUDA is NVIDIA's parallel computing platform; PyTorch uses it as the interface between our code and the GPU.

By default, code runs on the CPU. To run it on the GPU, we need CUDA. Check whether CUDA is available:

In [None]:
# Check whether a CUDA-capable GPU is available on this system
use_cuda = torch.cuda.is_available()
print('Using PyTorch version:', torch.__version__, 'CUDA:', use_cuda)

Using PyTorch version: 1.9.0+cu102 CUDA: True


In [None]:
device = torch.device("cuda" if use_cuda else "cpu")
device

device(type='cuda')
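`model.to(device)` moves the network's weights to the selected device, but input tensors are created on the CPU by default and must be moved too: a forward pass that mixes CPU and GPU tensors raises a `RuntimeError`. The sketch below uses a single stand-in `nn.Linear` layer and random inputs; on a machine without CUDA, everything simply stays on the CPU.

```python
import torch
import torch.nn as nn

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

layer = nn.Linear(416, 30).to(device)  # moves the weight and bias to `device`
x = torch.randn(4, 416)                # new tensors live on the CPU by default
x = x.to(device)                       # move the inputs to the same device
out = layer(x)                         # works because both are on `device`

print(x.device, next(layer.parameters()).device, out.device)
```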

### Defining the Neural Network

In [None]:
class Net(nn.Module):
    def __init__(self):
        super().__init__()
    
        self.fc1 = nn.Linear(416, 208)  # First fully connected layer
        self.fc2 = nn.Linear(208, 104)  # Second fully connected layer
        self.fc3 = nn.Linear(104, 30)   # Third fully connected layer; outputs one score (logit) per class (30 labels)

    def forward(self, x):  

        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)

        return x 
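`nn.Linear` acts on the last dimension of its input, so `Net` accepts either a single 416-value feature vector or a batch of shape (batch, 416). The quick shape check below, on random inputs, shows that a single vector yields 30 logits with no batch dimension, which is why the training loop adds one with `unsqueeze(0)` before computing the loss. The outputs are raw logits; no softmax is applied because `nn.CrossEntropyLoss` handles that internally.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class Net(nn.Module):
    # Same architecture as above: 416 -> 208 -> 104 -> 30
    def __init__(self):
        super().__init__()
        self.fc1 = nn.Linear(416, 208)
        self.fc2 = nn.Linear(208, 104)
        self.fc3 = nn.Linear(104, 30)

    def forward(self, x):
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        return self.fc3(x)  # raw logits, one per class


net = Net()
batch_out = net(torch.randn(8, 416))   # a batch of 8 feature vectors
single_out = net(torch.randn(416))     # one feature vector, no batch dimension
print(batch_out.shape)   # torch.Size([8, 30])
print(single_out.shape)  # torch.Size([30])
```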

### Creating Instance for the Model

In [None]:
model = Net()   
model = model.to(device)
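Each `nn.Linear(in_features, out_features)` layer holds an `out_features x in_features` weight matrix plus `out_features` biases, and ReLU has no parameters. The sketch below counts the trainable parameters of an `nn.Sequential` stand-in with the same layer sizes as `Net` and compares the total with the hand calculation.

```python
import torch.nn as nn

# Stand-in with the same Linear layers as Net (ReLU adds no parameters)
layers = nn.Sequential(
    nn.Linear(416, 208), nn.ReLU(),
    nn.Linear(208, 104), nn.ReLU(),
    nn.Linear(104, 30),
)
n_params = sum(p.numel() for p in layers.parameters() if p.requires_grad)

# weights + biases for each layer
by_hand = (416 * 208 + 208) + (208 * 104 + 104) + (104 * 30 + 30)
print(n_params, by_hand)  # 111622 111622
```

Almost 80% of the parameters sit in the first layer, because it connects all 416 input features to 208 hidden units.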

### Defining Loss Function and Optimizer

In [None]:
criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD(model.parameters(), lr=0.001, momentum=0.5)
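`nn.CrossEntropyLoss` applies log-softmax to the raw scores and then takes the negative log-likelihood of the correct class, which is why `Net` returns unnormalised logits. The first part of the sketch below checks that equivalence on random logits. The second part runs two steps of `optim.SGD` with `lr=0.001, momentum=0.5` on a made-up toy loss p² to show PyTorch's momentum update, v = momentum * v + grad followed by p = p - lr * v (with the default dampening of 0).

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim

# Part 1: CrossEntropyLoss == mean of -log_softmax at the target class
torch.manual_seed(0)
logits = torch.randn(4, 30)              # raw scores for 4 samples, 30 classes
targets = torch.tensor([0, 7, 29, 3])    # integer class indices

ce = nn.CrossEntropyLoss()(logits, targets)
manual = -F.log_softmax(logits, dim=1)[torch.arange(4), targets].mean()
print(torch.allclose(ce, manual))  # True

# Part 2: SGD with momentum on a single toy parameter, minimising p**2
p = torch.tensor([1.0], requires_grad=True)
opt = optim.SGD([p], lr=0.001, momentum=0.5)
p_values = []
for _ in range(2):
    opt.zero_grad()
    (p ** 2).sum().backward()  # gradient is 2 * p
    opt.step()
    p_values.append(p.item())

# Step 1: v = 2.0,               p = 1.0   - 0.001 * 2.0   = 0.998
# Step 2: v = 0.5 * 2 + 1.996,   p = 0.998 - 0.001 * 2.996 = 0.995004
```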

### Training the Model

In [None]:
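epochs = 5
accuracy = []
losses = []

model.train()
for epoch in range(epochs):
    correct = 0
    train_loss = 0.0  # Reset the running loss at the start of every epoch
    for i, feature in enumerate(mfcc_features_train):
        # Convert the 416-dim feature vector to a float tensor with a batch dimension: (1, 416)
        feature = torch.tensor(feature, dtype=torch.float32, device=device).unsqueeze(0)
        # CrossEntropyLoss expects integer class indices of shape (batch,)
        labels = torch.tensor(mfcc_labels_train[i], dtype=torch.long, device=device).view(-1)

        # Zero out the gradients from the previous step
        optimizer.zero_grad()

        # Forward pass: outputs has shape (1, 30)
        outputs = model(feature)

        # Calculate the loss
        loss = criterion(outputs, labels)
        train_loss += loss.item()

        # Backward pass
        loss.backward()

        # optimizer.step() updates the weights using the computed gradients
        optimizer.step()

        # Accuracy calculation: the predicted class is the index of the largest logit
        predicted = outputs.argmax(dim=1)
        correct += (predicted == labels).sum().item()

    accuracy.append(correct / len(mfcc_features_train))
    losses.append(train_loss / len(mfcc_features_train))
    print(f'Epoch {epoch + 1}: loss = {losses[-1]:.4f}, accuracy = {accuracy[-1]:.4f}')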
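The loop above updates the weights after every single sample and reports only training accuracy. A common alternative, sketched below under our own assumptions, is mini-batch training with `torch.utils.data.TensorDataset` and `DataLoader`, plus an evaluation pass on the held-out Val split. The helpers `make_loader`, `train_one_epoch` and `evaluate`, the batch size of 64 and the random stand-in arrays are all hypothetical; the `nn.Sequential` model reproduces the same 416 -> 208 -> 104 -> 30 architecture as `Net`, and `demo_` prefixes avoid overwriting the notebook's own `model` and `optimizer`.

```python
import numpy as np
import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import DataLoader, TensorDataset


def make_loader(features, labels, batch_size=64, shuffle=False):
    """Hypothetical helper: wrap NumPy features/labels in a DataLoader of (x, y) batches."""
    x = torch.as_tensor(features, dtype=torch.float32)
    y = torch.as_tensor(np.ravel(labels), dtype=torch.long)  # (N, 1) or (1, N) -> (N,)
    return DataLoader(TensorDataset(x, y), batch_size=batch_size, shuffle=shuffle)


def train_one_epoch(model, loader, criterion, optimizer, device):
    model.train()
    total_loss, correct, seen = 0.0, 0, 0
    for x, y in loader:
        x, y = x.to(device), y.to(device)
        optimizer.zero_grad()
        outputs = model(x)                 # (batch, 30) logits
        loss = criterion(outputs, y)       # mean loss over the batch
        loss.backward()
        optimizer.step()
        total_loss += loss.item() * y.size(0)
        correct += (outputs.argmax(dim=1) == y).sum().item()
        seen += y.size(0)
    return total_loss / seen, correct / seen  # average loss, accuracy


@torch.no_grad()  # gradients are not needed for evaluation
def evaluate(model, loader, device):
    model.eval()
    correct, seen = 0, 0
    for x, y in loader:
        x, y = x.to(device), y.to(device)
        correct += (model(x).argmax(dim=1) == y).sum().item()
        seen += y.size(0)
    return correct / seen


# Random stand-in data with the same shapes as the MFCC arrays (NOT the real features)
torch.manual_seed(0)
rng = np.random.default_rng(0)
X_tr, y_tr = rng.standard_normal((512, 416)), rng.integers(0, 30, size=(512, 1))
X_va, y_va = rng.standard_normal((128, 416)), rng.integers(0, 30, size=(128, 1))

demo_device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
demo_model = nn.Sequential(
    nn.Linear(416, 208), nn.ReLU(), nn.Linear(208, 104), nn.ReLU(), nn.Linear(104, 30)
).to(demo_device)
demo_criterion = nn.CrossEntropyLoss()
demo_optimizer = optim.SGD(demo_model.parameters(), lr=0.001, momentum=0.5)

train_loader = make_loader(X_tr, y_tr, shuffle=True)
val_loader = make_loader(X_va, y_va)
history = []  # (train loss, train accuracy, val accuracy) per epoch
for epoch in range(2):
    tr_loss, tr_acc = train_one_epoch(demo_model, train_loader, demo_criterion,
                                      demo_optimizer, demo_device)
    history.append((tr_loss, tr_acc, evaluate(demo_model, val_loader, demo_device)))
```

On the real data, one would pass `mfcc_features_train`/`mfcc_labels_train` and `mfcc_features_val`/`mfcc_labels_val` to `make_loader` instead of the random arrays. Batching usually makes each epoch much faster than per-sample updates, but the batch size of 64 is an arbitrary choice, and the learning rate may need retuning when switching from single-sample to mini-batch updates.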