# Advanced Certification in AIML
## A Program by IIIT-H and TalentSprint

In [None]:
#@title Explanation Video
from IPython.display import HTML

HTML("""<video width="800" height="500" controls>
  <source src="https://cdn.talentsprint.com/aiml/aiml_2020_b14_hyd/experiment_details_backup/Hackathon_Voice_based.mp4" type="video/mp4">
</video>
""")

# Hackathon: Voice commands based food ordering system
The goal of the hackathon is to train your model on different types of voice data: studio data, noisy data, and finally your own team's data.

## Grading = 40 Marks

### **Objectives:**

Stage 0 - Obtain Features from Audio samples using Pre-trained Network

Stage 1 (17 Marks) - Train a classifier on Studio data and deploy the model in the server 

Stage 2 (10 Marks) - Use 'Noisy_data' and 'Studio_data' together, train a classifier on the same, and deploy the model in the server.

Stage 3 (13 Marks) - Collect your voice samples (team data) and refine the classifier trained on Studio_data and Noisy_data. Deploy the model in the server.

## Dataset Description

The data contains voice samples of classes - Zero, One, Two, Three, Four, Five, Six, Seven, Eight, Nine, Yes and No. Each class is denoted by a numerical label from 0 to 11.

The audio files in the Studio dataset contain very little noise, whereas the audio files in the Noisy dataset contain more. In both datasets, noise and speech are mixed, and all files are in WAV format.

The audio files recorded for the studio and noisy data are saved with the following naming convention: 

* Class Representation + user_id + sample_ID (or n + sample_ID)

* For example: the voice sample by the user b2, which is “Yes”, is saved as 10_b2_35.wav. Here, 35 is the sample ID.

* The ‘10’ that you see above is the label of that sample.
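As a minimal sketch of how the naming convention can be read back in code, the helper below takes the text before the first underscore as the class label. The name `parse_label` and the `CLASS_NAMES` list are illustrative assumptions, not part of the hackathon material; the label order follows the class list in the dataset description, with "Yes" as 10 as in the example above.

```python
import os

# Assumed label order, following the class list in the dataset description.
CLASS_NAMES = ["Zero", "One", "Two", "Three", "Four", "Five",
               "Six", "Seven", "Eight", "Nine", "Yes", "No"]


def parse_label(filename):
    """Return the integer class label encoded before the first underscore."""
    stem = os.path.splitext(os.path.basename(filename))[0]
    return int(stem.split("_", 1)[0])


label = parse_label("10_b2_35.wav")
print(label, CLASS_NAMES[label])
```

Splitting only on the first underscore means the rest of the name (user ID and sample ID) does not affect the label, whichever variant of the convention a file uses.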


In [None]:
#@title Please run the setup to download the dataset

from IPython import get_ipython
ipython = get_ipython()
  
notebook= "Hackathon1 - Voice Food Ordering System" #name of the notebook

def setup():
    ipython.magic("sx wget https://cdn.talentsprint.com/aiml/Experiment_related_data/Week8/Hackathon2/Noisy_data.zip")
    ipython.magic("sx wget https://cdn.iiith.talentsprint.com/aiml/Hackathon_data/studio_rev_data.zip")
    ipython.magic("sx wget https://cdn.talentsprint.com/aiml/Experiment_related_data/Week8/Hackathon2/net_speech_89.pt")
    ipython.magic("sx unzip studio_rev_data.zip")
    ipython.magic("sx unzip Noisy_data.zip")
    ipython.magic("sx pip install torch torchvision")
    ipython.magic("sx pip install librosa")
    print ("Setup completed successfully")

setup()

Setup completed successfully


In [None]:
import torch
from torch.autograd import Variable
import numpy as np
import librosa
import os
import warnings
from time import sleep
import sys
warnings.filterwarnings('ignore')

## **Stage 0:** Obtain Features from Audio samples using Pre-trained Network
---

### Pretrained Network for deep features


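The following function loads a pre-trained network that produces deep features for an audio sample. The network was trained on delta MFCC features of mono-channel audio sampled at 8000 Hz. Only the first 17 modules of the saved network are kept, so the output is the 50-dimensional activation of the last retained linear layer.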

In [None]:
def get_network():

    net = torch.nn.Sequential()

    saved_net = torch.load("net_speech_89.pt").cpu()

    for index, module in enumerate(saved_net):
        net.add_module("layer"+str(index),module)
        if (index+1)%17 == 0 :
            break
    return net

In [None]:
get_network()

Sequential(
  (layer0): Linear(in_features=900, out_features=800, bias=True)
  (layer1): ReLU()
  (layer2): Linear(in_features=800, out_features=700, bias=True)
  (layer3): ReLU()
  (layer4): Linear(in_features=700, out_features=600, bias=True)
  (layer5): ReLU()
  (layer6): Linear(in_features=600, out_features=500, bias=True)
  (layer7): ReLU()
  (layer8): Linear(in_features=500, out_features=400, bias=True)
  (layer9): ReLU()
  (layer10): Linear(in_features=400, out_features=300, bias=True)
  (layer11): ReLU()
  (layer12): Linear(in_features=300, out_features=200, bias=True)
  (layer13): ReLU()
  (layer14): Linear(in_features=200, out_features=100, bias=True)
  (layer15): ReLU()
  (layer16): Linear(in_features=100, out_features=50, bias=True)
)

### Obtaining Features from Audio samples
Generate features from an audio sample in '.wav' format:
* Generate delta MFCC features of order 1 and 2
* Pass them through the deep neural network above to obtain deep features
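The shape bookkeeping behind these steps can be sketched with NumPy alone. This is an illustration under stated assumptions, not the extraction code itself: random arrays stand in for real MFCCs, zero arrays stand in for the delta features, and `fix_frames` is a hypothetical helper name. It shows why the network input has 900 values: 30 coefficients times 15 frames, for each of the two delta orders.

```python
import numpy as np

N_MFCC, FRAMES = 30, 15  # the default n_mfcc and frames used in this notebook


def fix_frames(mfcc, frames=FRAMES):
    """Zero-pad or truncate the time axis so the matrix has exactly `frames` columns."""
    n_mfcc, n_cols = mfcc.shape
    if n_cols < frames:
        return np.hstack((mfcc, np.zeros((n_mfcc, frames - n_cols))))
    return mfcc[:, :frames]


rng = np.random.default_rng(0)
short_clip = fix_frames(rng.normal(size=(N_MFCC, 9)))   # padded up to 15 frames
long_clip = fix_frames(rng.normal(size=(N_MFCC, 40)))   # cut down to 15 frames

# Stand-ins for the first- and second-order delta matrices (same shape as the MFCCs).
delta1 = np.zeros_like(short_clip)
delta2 = np.zeros_like(short_clip)

# Flatten both delta matrices and join them into one row vector for the network.
network_input = np.hstack((delta1.flatten(), delta2.flatten()))[np.newaxis, :]
print(short_clip.shape, long_clip.shape, network_input.shape)
```

The final shape matches the `in_features=900` of the first linear layer printed above.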

Parameters: filepath (path of the audio sample),
                       sr (sampling rate; all the samples provided are sampled at 8000 Hz)
         
  Caution: Do not change the default parameters

Returns: the deep features of the audio sample as a flat NumPy array of 50 values

In [None]:
def get_features(filepath, sr=8000, n_mfcc=30, n_mels=128, frames = 15):
    
    #loads and decodes the audio as a time series y, 
    #represented as a one-dimensional NumPy floating point array.
    #sr is sampling rate of y, the number of samples per second of audio. 
    y, sr = librosa.load(filepath, sr=sr)

    # Compute a mel-scaled power spectrogram directly from the time series:
    # librosa first computes the short-time Fourier transform (STFT) power
    # spectrogram of y over short overlapping windows, then maps it onto
    # the mel scale.
    S = librosa.feature.melspectrogram(y=y, sr=sr, n_mels=n_mels)

    # Convert the power spectrogram (amplitude squared) to decibel (dB) units,
    # normalised so that its maximum corresponds to 0 dB.
    log_S = librosa.power_to_db(S,ref=np.max)
    

    # Mel-frequency cepstral coefficients (MFCCs) on log-power Mel spectrogram
    features = librosa.feature.mfcc(S=log_S, n_mfcc=n_mfcc)

    if features.shape[1] < frames :
        features = np.hstack((features, np.zeros((n_mfcc, frames - features.shape[1]))))
    elif features.shape[1] > frames:
        features = features[:, :frames]

    # Find 1st order delta_mfcc
    # Delta features are a local estimate of the derivative of the input data
    # along the selected axis, computed using Savitzky-Golay filtering.
    delta1_mfcc = librosa.feature.delta(features, order=1)

    # Find 2nd order delta_mfcc
    delta2_mfcc = librosa.feature.delta(features, order=2)

    features = np.hstack((delta1_mfcc.flatten(), delta2_mfcc.flatten()))
    features = features.flatten()[np.newaxis, :]
    # Convert to a float tensor and run it through the pre-trained network.
    features = torch.from_numpy(features).float()
    deep_net = get_network()
    with torch.no_grad():  # inference only, no gradients needed
        deep_features = deep_net(features)
    return deep_features.numpy().flatten()

### All the voice samples needed for training are present across the folders "Noisy_data" and "studio_data"

In [None]:
%ls

net_speech_89.pt  Noisy_data.zip  studio_data/
Noisy_data/       sample_data/    studio_rev_data.zip


## **Stage 1**: Train a classifier on the Studio data and Deploy the model in the server

---


### a) Extract features of Studio data (5 Marks)

 Load 'Studio data' and extract deep features

 **Evaluation Criteria:**

 * Complete the code in the load_data function
 * The function should take path of the folder containing audio samples as input
 * It should return the features of all the audio samples in the specified folder as a single array (a list of lists or a 2-D NumPy array), together with their respective labels

In [None]:
def load_data(dirname):
    features = []
    labels = []
    for root, directories, files in os.walk(dirname):
        for filename in files:
            # Skip anything that is not an audio sample
            if not filename.lower().endswith('.wav'):
                continue
            filepath = os.path.join(root, filename)
            features.append(get_features(filepath))
            # The class label is the part of the file name before the first underscore
            labels.append(int(filename.split('_', 1)[0]))
    return features, labels


Load data from studio_data folder for extracting all features and labels

In [None]:
studio_recorded_features, studio_recorded_labels = load_data('/content/studio_data')

In [None]:
len(studio_recorded_features)

8178

In [None]:
len(studio_recorded_labels)

8178

In [None]:
# convert the list to numpy array
studio_recorded_features = np.array(studio_recorded_features)

In [None]:
len(studio_recorded_features)

8178

In [None]:
studio_recorded_features.shape

(8178, 50)

In [None]:
X = studio_recorded_features
y = studio_recorded_labels
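Before training, it can help to confirm that the features and labels line up and that every class is represented. The following is a small sketch of such a check; `check_dataset` is a hypothetical helper, and the demo arrays are stand-ins for the real `X` and `y`.

```python
import numpy as np


def check_dataset(features, labels, n_classes=12):
    """Validate shapes and label range, then return the number of samples per class."""
    X_arr = np.asarray(features)
    y_arr = np.asarray(labels)
    if X_arr.ndim != 2:
        raise ValueError("features must be 2-D: (n_samples, n_features)")
    if len(X_arr) != len(y_arr):
        raise ValueError("features and labels must have the same length")
    if y_arr.min() < 0 or y_arr.max() >= n_classes:
        raise ValueError("labels must lie in the range 0..n_classes-1")
    return np.bincount(y_arr, minlength=n_classes)


# Stand-in data: six samples with 50 features each.
demo_features = np.zeros((6, 50))
demo_labels = [0, 0, 10, 11, 3, 10]
counts = check_dataset(demo_features, demo_labels)
print(counts)
```

On the real data, the call would be `check_dataset(X, y)`; a class with a count of zero or a very low count would be worth investigating before training.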

### b) Train and classify on the studio_data (5 Marks)

The goal here is to train a classifier on the voice samples in the studio data and evaluate how well it classifies them

**Evaluation Criteria:**
* Train the classifier
* Expected accuracy is above 85%

In [None]:
# YOUR CODE HERE
# Splitting the dataset into the Training set and Test set
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.25, random_state = 0)



In [None]:
# Feature Scaling
from sklearn.preprocessing import StandardScaler
sc = StandardScaler()
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)

In [None]:
#Training the Decision Tree Classification model on the Training set
from sklearn.tree import DecisionTreeClassifier
classifier = DecisionTreeClassifier(criterion = 'entropy', random_state = 0)
classifier.fit(X_train, y_train)

DecisionTreeClassifier(ccp_alpha=0.0, class_weight=None, criterion='entropy',
                       max_depth=None, max_features=None, max_leaf_nodes=None,
                       min_impurity_decrease=0.0, min_impurity_split=None,
                       min_samples_leaf=1, min_samples_split=2,
                       min_weight_fraction_leaf=0.0, presort='deprecated',
                       random_state=0, splitter='best')

In [None]:
# Predicting the Test set results
y_pred = classifier.predict(X_test)

In [None]:
#Making the Confusion Matrix
from sklearn.metrics import confusion_matrix, accuracy_score
cm = confusion_matrix(y_test, y_pred)
print(cm)
accuracy_score(y_test, y_pred)

[[149   1   2   0   0   0   1   2   1   1   3   0]
 [  0 144   0   0   1   3   0   6   1   2   2   2]
 [  3   0 146   0   1   0   0   1   0   1   2   5]
 [  1   0   1 155   0   2   3   0   2   0   0   0]
 [  0   0   1   0 163   0   0   0   0   0   0   1]
 [  0   4   0   0   0 171   0   3   0   6   0   0]
 [  3   2   0   3   0   1 149   5  14   5  13   1]
 [  1   5   0   0   0   5   3 160   0   5   1   1]
 [  0   1   0   3   0   0  15   2 141   3   1   0]
 [  0   1   0   0   0   6   2   4   0 130   0   1]
 [  3   4   0   2   0   1   9   6   5   3 168   1]
 [  2   3   3   2   2   1   0   0   0   1   1 148]]


0.8919315403422983
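The overall accuracy hides which classes are confused with each other. As a sketch, per-class recall can be read off the confusion matrix printed above: each row is a true class, so the diagonal entry divided by the row sum gives the fraction of that class predicted correctly. The matrix values below are copied from the output above; the variable names are illustrative.

```python
import numpy as np

# Confusion matrix copied from the output above (rows: true labels, columns: predictions).
cm = np.array([
    [149,   1,   2,   0,   0,   0,   1,   2,   1,   1,   3,   0],
    [  0, 144,   0,   0,   1,   3,   0,   6,   1,   2,   2,   2],
    [  3,   0, 146,   0,   1,   0,   0,   1,   0,   1,   2,   5],
    [  1,   0,   1, 155,   0,   2,   3,   0,   2,   0,   0,   0],
    [  0,   0,   1,   0, 163,   0,   0,   0,   0,   0,   0,   1],
    [  0,   4,   0,   0,   0, 171,   0,   3,   0,   6,   0,   0],
    [  3,   2,   0,   3,   0,   1, 149,   5,  14,   5,  13,   1],
    [  1,   5,   0,   0,   0,   5,   3, 160,   0,   5,   1,   1],
    [  0,   1,   0,   3,   0,   0,  15,   2, 141,   3,   1,   0],
    [  0,   1,   0,   0,   0,   6,   2,   4,   0, 130,   0,   1],
    [  3,   4,   0,   2,   0,   1,   9,   6,   5,   3, 168,   1],
    [  2,   3,   3,   2,   2,   1,   0,   0,   0,   1,   1, 148],
])

# Overall accuracy: correct predictions (the diagonal) over all test samples.
accuracy = np.trace(cm) / cm.sum()

# Per-class recall: correct predictions for a class over all true samples of that class.
recall = np.diag(cm) / cm.sum(axis=1)

worst_class = int(np.argmin(recall))
print(f"accuracy = {accuracy:.4f}")
print(f"lowest recall: class {worst_class} ({recall[worst_class]:.3f})")
```

Here the weakest class is label 6 ("Six"), which is most often confused with labels 8 and 10, so those are the commands most likely to be misrecognised when ordering.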

### c) Save and download your model (2 Marks)

Save your trained model

**Hint:** You can use joblib package to save the model.

In [None]:
# Save the trained model
from joblib import dump, load
dump(classifier, 'studio_data_DT.sav')

# Load it back later with the same file name:
# clf = load('studio_data_DT.sav')
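One thing to keep in mind: the classifier above was trained on features transformed by `StandardScaler`, but only the classifier is saved. This notebook does not say whether the server scales features before prediction. If it passes raw features to the loaded model, one option is to bundle the scaler and the classifier into a single scikit-learn pipeline and save that instead. The sketch below uses random stand-in data and a hypothetical file name to show the round trip.

```python
import os
import tempfile

import numpy as np
from joblib import dump, load
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

# Random stand-in data with the same layout as the deep features (50 columns, 12 classes).
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(120, 50))
y_demo = rng.integers(0, 12, size=120)

# The pipeline applies the scaler and then the classifier, for both fit and predict.
model = make_pipeline(
    StandardScaler(),
    DecisionTreeClassifier(criterion="entropy", random_state=0),
)
model.fit(X_demo, y_demo)

# Save and reload through a temporary directory (hypothetical file name).
with tempfile.TemporaryDirectory() as tmp:
    model_path = os.path.join(tmp, "demo_pipeline.sav")
    dump(model, model_path)
    restored = load(model_path)

same_predictions = bool(
    np.array_equal(model.predict(X_demo), restored.predict(X_demo))
)
print(same_predictions)
```

With this approach, whoever loads the file calls `predict` on unscaled features and the stored scaler is applied automatically.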

Download your trained model using the code below
* Give the path of model file to download through the browser

In [None]:
from google.colab import files
files.download('/content/studio_data_DT.sav')

### d) Deploy your model in the server (5 Marks).

(This can be done on the day of the Hackathon, once the mentors have provided the login username and password in the lab.)

To deploy your model on the server, check the hackathon document (2-Server Access and File transfer For Voice based food ordering) for details.

To order food in user interface, go through the document (3-Hackathon 1 Application Interface Documentation) for details.


**Evaluation Criteria:**

There are two stages in the food ordering application
        
* Ordering Item
* Providing the number of servings
    
If both stages are cleared with correct predictions, you will get full marks.
Otherwise, no marks will be awarded.


#### Now deploy the model trained on studio_data in the server to order food correctly. 


## **Stage 2**: Use 'Noisy_data' and 'Studio_data' together, train a Classifier on the same and deploy the model in the server 

---

### a) Extract features and train the classifier (3 Marks)

The goal here is to train your model on voice samples collected in both noisy and studio data

**Evaluation Criteria:**
* Load 'Noisy_data' and extract features
* Combine noisy features with the studio features
* Train the classifier
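The combining step can be sketched as follows. This is only an illustration with small stand-in arrays, not the expected solution; in the notebook, the studio features are already a 2-D NumPy array, while `load_data` returns the noisy features as a list of 1-D arrays.

```python
import numpy as np

# Stand-ins for the real data: 5 studio samples and 3 noisy samples, 50 features each.
studio_X = np.zeros((5, 50))
studio_y = [0, 1, 2, 3, 4]
noisy_X = [np.ones(50) for _ in range(3)]  # list of 1-D arrays, as load_data returns
noisy_y = [10, 11, 5]

# Stack the feature rows and join the labels in the same order.
combined_X = np.vstack((studio_X, np.asarray(noisy_X)))
combined_y = np.concatenate((studio_y, noisy_y))

print(combined_X.shape, combined_y.shape)
```

Keeping features and labels in the same order matters: row `i` of `combined_X` must still correspond to entry `i` of `combined_y` before the data is split and used for training.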


Load data from Noisy_data folder for extracting all features and labels

In [None]:
noisy_data, noisy_data_labels = load_data('Noisy_data')

In [None]:
# Combine the features of Studio and Noisy data
# YOUR CODE HERE

Train a classifier on the features obtained from noisy_data and studio_data

In [None]:
# YOUR CODE HERE

### b) Save and download your model (2 Marks)
Save your trained model

**Hint:** You can use joblib package to save the model.

In [None]:
# YOUR CODE HERE

Download your trained model using the code below
* Give the path of model file to download through the browser

In [None]:
from google.colab import files
files.download('<model_file_path>')

### c) Deploy your model in the server (5 Marks).

(This can be done on the day of the Hackathon, once the mentors have provided the login username and password in the lab.)

To deploy your model on the server, check the hackathon document (2-Server Access and File transfer For Voice based food ordering) for details.

To order food in user interface, go through the document (3-Hackathon 1 Application Interface Documentation) for details.

**Evaluation Criteria:**

There are two stages in the food ordering application
        
* Ordering Item
* Providing the number of servings
    
If both stages are cleared with correct predictions, you will get full marks.
Otherwise, no marks will be awarded.


#### Now deploy the model trained on studio_data and noisy_data in the server to order food correctly. 

## **Stage 3:** Collect your voice samples and refine the classifier trained on studio_data and Noisy_data
---

### a) Collect your Team Voice Samples and extract features (5 Marks)

(This can be done on the day of the Hackathon, once the mentors have provided the login username and password in the lab.)

* In order to collect the team data, ensure the server is active (Refer document: 2-Server Access and File transfer For Voice based food ordering)

* Refer document "3-Hackathon_1 Application Interface Documentation" for collecting your team voice samples. These will get stored in your server

**Evaluation Criteria:**
* Load 'Team_data' and extract features
* Combine features of team data with the extracted features of studio and noisy data

In [None]:
!mkdir team_data

In [None]:
# Replace <YOUR_GROUP_ID> with your Username given in the lab
!wget -r -A .wav https://aiml-sandbox1.talentsprint.com/audio_recorder/<YOUR_GROUP_ID>/team_data/ -nH --cut-dirs=100  -P ./team_data

In [None]:
# YOUR CODE HERE to Load data from teamdata folder for extracting all features and labels

In [None]:
# Combine the features of all voice samples (studio_data, noisy_data and teamdata)
# YOUR CODE HERE

### b) Classify and download the model (3 Marks)

The goal here is to train your model on all voice samples collected in noisy, studio and team data
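A scikit-learn decision tree cannot be updated incrementally, so one way to "refine" it is to retrain on the combined data. Because the team data is likely to be much smaller than the studio and noisy data, a possible option is to give team samples a larger weight through the `sample_weight` argument of `fit`. The sketch below uses random stand-in data; the weight of 5.0 is an arbitrary assumption that would need tuning, and this is one possible approach rather than the required one.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

# Random stand-ins: 200 studio+noisy samples and 20 team samples, 50 features each.
rng = np.random.default_rng(0)
base_X = rng.normal(size=(200, 50))
base_y = rng.integers(0, 12, size=200)
team_X = rng.normal(size=(20, 50))
team_y = rng.integers(0, 12, size=20)

all_X = np.vstack((base_X, team_X))
all_y = np.concatenate((base_y, team_y))

# Hypothetical weighting: each team sample counts as much as five other samples.
TEAM_WEIGHT = 5.0
weights = np.concatenate(
    (np.ones(len(base_y)), np.full(len(team_y), TEAM_WEIGHT))
)

refined = DecisionTreeClassifier(criterion="entropy", random_state=0)
refined.fit(all_X, all_y, sample_weight=weights)

team_predictions = refined.predict(team_X)
print(team_predictions.shape)
```

Whether weighting helps depends on how different the team recordings are from the other data; comparing accuracy on held-out team samples with and without weights is a reasonable way to decide.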


In [None]:
# YOUR CODE HERE for refining your classifier

Save your trained model

**Hint:** You can use joblib package to save the model

In [None]:
# YOUR CODE HERE

Download your trained model using the code below
* Give the path of model file to download through the browser

In [None]:
from google.colab import files
files.download('<model_file_path>')

### c) Deploy your model in the server (5 Marks).

(This can be done on the day of the Hackathon, once the mentors have provided the login username and password in the lab.)

To deploy your model on the server, check the hackathon document (2-Server Access and File transfer For Voice based food ordering) for details.

To order food in user interface, go through the document (3-Hackathon 1 Application Interface Documentation) for details.


**Evaluation Criteria:**

There are two stages in the food ordering application
        
* Ordering Item
* Providing the number of servings
    
If both stages are cleared with correct predictions, you will get full marks.
Otherwise, no marks will be awarded.


#### Now deploy the model trained on studio_data, Noisy_data and team_data in the server to order food correctly. 