# Part A: Classification Problem

Part A of this assignment aims to build neural networks that perform polarity detection from voice recordings, using data from the National Speech Corpus, which can be obtained [here](https://www.imda.gov.sg/programme-listing/digital-services-lab/national-speech-corpus).

The National Speech Corpus is an initiative by the Infocomm Media Development Authority (IMDA), and it is the first large-scale Singapore English corpus. The dataset has 6 parts. In the fifth part, speakers are asked to communicate in several different styles, including Positive Emotions and Negative Emotions. The original recordings are approximately 20 minutes long. Using the librosa library, the recordings are split into shorter segments and converted into features such as chromagrams, Mel spectrograms and MFCCs, among others.

The preprocessed data is provided with this assignment as a CSV file named simplified.csv. Each row contains the filename of an audio segment together with 77 engineered features that you can use. The aim is to determine the speech polarity from these engineered features. The labels are encoded in the “filename” column.


| Type of features | Explanation |
| -----------------| ------------ |
| Chroma (e.g. chroma_stft_mean) | Describes the tonal content of a musical audio signal in a condensed form (Stein et al, 2009) [2] |
| Rms (e.g. rms_mean) |Square root of average of a squared signal (Andersson) [3] |
| Spectral (e.g. spectral_centroid_mean) | Spectral Centroid is a metric of the centre of gravity of the frequency power spectrum (Andersson) [3] |
| Rolloff (e.g. rolloff_mean) | Spectral rolloff is a metric of how high in the frequency spectrum a certain part of energy lies (Andersson) [3] |
| Zero crossing (e.g. zero_crossing_mean) | Zero-crossing rate is the number of time domain zero-crossings within a processing window (Andersson) [3] |
| Harmonics (e.g. harmony_mean) | Sound wave that has a frequency that is an integer multiple of a fundamental tone. Refer to link: https://professionalcomposers.com/what-are-harmonics-in-music/ |
| Tempo | Periodicity of note onset pulses (Alonso et al, 2004) |
| MFCC (Mel Frequency Cepstral Coefficient) | Small set of features (usually about 10-20) which concisely describe the overall shape of a spectral envelope. Refer to link: https://musicinformationretrieval.com/mfcc.html |
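
To make a few of the definitions in the table concrete, here is a minimal NumPy sketch that computes the RMS, a zero-crossing count and the spectral centroid of a synthetic 440 Hz tone. The helper names (`rms`, `zero_crossings`, `spectral_centroid`), the sample rate and the test tone are illustrative assumptions. The features in simplified.csv were produced with librosa, whose framing and normalisation may differ from this simplified version.

```python
import numpy as np

def rms(frame):
    """Square root of the average of the squared signal."""
    return float(np.sqrt(np.mean(np.square(frame))))

def zero_crossings(frame):
    """Number of sign changes between consecutive samples in the window."""
    signs = np.signbit(frame)
    return int(np.count_nonzero(signs[1:] != signs[:-1]))

def spectral_centroid(frame, sample_rate):
    """Magnitude-weighted mean frequency (centre of gravity of the spectrum)."""
    magnitudes = np.abs(np.fft.rfft(frame))
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / sample_rate)
    return float(np.sum(freqs * magnitudes) / np.sum(magnitudes))

# Assumed test signal: one second of a 440 Hz sine with amplitude 0.5
sample_rate = 16000
t = np.arange(sample_rate) / sample_rate
tone = 0.5 * np.sin(2 * np.pi * 440 * t)

print("rms:", rms(tone))
print("zero crossings:", zero_crossings(tone))
print("spectral centroid (Hz):", spectral_centroid(tone, sample_rate))
```

For a pure sine the RMS should be close to the amplitude divided by the square root of 2, and the centroid should sit near the tone's frequency, which makes this a convenient sanity check.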



Part A consists of **four** parts. Please use the question templates provided for submitting your answer. In each question template, you are guided step by step in answering the questions. Use CPU for all the questions to ensure reproducibility. Best of luck!

In [1]:
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


In [2]:
!ls

drive  sample_data


In [3]:
import torch
import pandas as pd
import torch.nn as nn
from torch.utils.data import DataLoader, Dataset, random_split

In [4]:
pd.set_option('display.max_rows', 500)
pd.set_option('display.max_columns', 500)
pd.set_option('display.width', 1000)

# Getting label from filename

In [5]:
df = pd.read_csv('/content/drive/MyDrive/SC4001/Assignment/simplified.csv')
df.head()

Unnamed: 0,filename,tempo,total_beats,average_beats,chroma_stft_mean,chroma_stft_var,chroma_cq_mean,chroma_cq_var,chroma_cens_mean,chroma_cens_var,melspectrogram_mean,melspectrogram_var,mfcc_mean,mfcc_var,mfcc_delta_mean,mfcc_delta_var,rmse_mean,rmse_var,cent_mean,cent_var,spec_bw_mean,spec_bw_var,contrast_mean,contrast_var,rolloff_mean,rolloff_var,poly_mean,poly_var,tonnetz_mean,tonnetz_var,zcr_mean,zcr_var,harm_mean,harm_var,perc_mean,perc_var,frame_mean,frame_var,mfcc0_mean,mfcc0_var,mfcc1_mean,mfcc1_var,mfcc2_mean,mfcc2_var,mfcc3_mean,mfcc3_var,mfcc4_mean,mfcc4_var,mfcc5_mean,mfcc5_var,mfcc6_mean,mfcc6_var,mfcc7_mean,mfcc7_var,mfcc8_mean,mfcc8_var,mfcc9_mean,mfcc9_var,mfcc10_mean,mfcc10_var,mfcc11_mean,mfcc11_var,mfcc12_mean,mfcc12_var,mfcc13_mean,mfcc13_var,mfcc14_mean,mfcc14_var,mfcc15_mean,mfcc15_var,mfcc16_mean,mfcc16_var,mfcc17_mean,mfcc17_var,mfcc18_mean,mfcc18_var,mfcc19_mean,mfcc19_var
0,app_3001_4001_phnd_neg_0000.wav,184.570312,623,69.222222,0.515281,0.093347,0.443441,0.082742,0.249143,0.021261,0.038422,0.087981,-16.29088,8822.263672,0.01436,7.908705,0.04347,0.000818,1833.579533,511344.031721,1746.559035,144881.971359,19.095815,319.628529,3827.14775,3827.14775,0.294635,0.294635,0.01577,0.012313,0.114622,0.004777,2.8527e-06,0.001529,7.4703e-06,0.000618,1.729204,0.945134,-389.5784,1394.284424,134.581345,694.73645,-39.877445,331.621368,55.018433,417.293945,-36.944489,246.965225,18.573177,270.046539,-19.398455,136.647842,4.641793,166.485138,-5.455597,105.498589,-6.548687,143.077621,1.620288,80.328003,-14.974999,55.536694,1.443957,105.00219,-10.213489,52.869869,0.71876,75.744896,-10.669799,63.340282,1.811605,58.117188,-3.286546,54.268448,-2.719069,59.548176,-4.559987,70.774803
1,app_3001_4001_phnd_neg_0001.wav,151.999081,521,74.428571,0.487201,0.094461,0.542182,0.073359,0.274423,0.008025,0.204988,5.152482,-16.18387,7335.709961,-0.025494,18.772476,0.090213,0.008415,1927.253538,354369.575716,1627.620214,68783.641466,19.186873,305.084512,3762.586531,3762.586531,0.583882,0.583882,0.015399,0.006057,0.122172,0.003331,-1.6512e-06,0.002638,-2.78816e-05,0.009359,1.793741,0.910349,-350.381317,5990.534668,112.355591,596.321411,-50.575706,1418.432983,39.114021,507.006927,-33.239597,416.781708,3.573578,236.576492,-11.785189,178.042618,-1.014654,178.834152,4.223846,226.874054,-8.432135,133.631943,-0.922831,75.74511,-14.040901,129.677872,-1.542051,89.679306,-2.871657,86.87146,-2.855503,106.239403,-5.666375,90.256195,1.573594,105.070496,-0.742024,82.417496,-1.961745,119.312355,1.51366,101.014572
2,app_3001_4001_phnd_neg_0002.wav,112.347147,1614,146.727273,0.444244,0.099268,0.442014,0.083224,0.26443,0.01341,0.218063,3.372185,-15.555374,7140.790039,-0.001268,10.85019,0.099754,0.005438,1558.350787,286662.686733,1480.320551,108552.760715,19.694916,271.168203,3027.93896,3027.93896,0.626042,0.626042,0.000772,0.012586,0.094763,0.002338,-2.344e-07,0.005676,1.9256e-06,0.005432,2.204735,1.657315,-340.841705,2853.95874,139.396652,639.750854,-44.360332,786.586487,34.030853,405.441681,-37.146648,447.909576,1.16685,360.854797,-11.257973,170.027328,-3.371944,226.6996,1.764457,140.997101,-9.14403,123.745407,0.545947,68.511703,-12.346964,91.306229,-3.44801,96.648567,-4.782896,96.846092,-3.135671,85.535561,-5.50239,73.07975,0.202623,72.04055,-4.021009,73.844353,-5.916223,103.834824,-2.939086,113.598824
3,app_3001_4001_phnd_neg_0003.wav,107.666016,2060,158.461538,0.454156,0.100834,0.42437,0.084435,0.257672,0.016938,0.214154,3.943239,-16.38241,7671.897461,-0.017487,10.714126,0.092214,0.006496,1501.958914,236170.752891,1468.111222,100434.245015,19.731574,280.614702,2981.342123,2981.342123,0.544611,0.544611,0.024137,0.015121,0.085925,0.001861,-4.205e-07,0.006873,-2.248e-07,0.004422,1.789098,1.241672,-359.523376,3351.339844,135.395157,589.953613,-40.197311,840.56427,32.70483,312.519379,-28.228338,411.952454,0.862422,276.24884,-9.016964,178.003738,-6.123117,168.513107,1.593995,121.375755,-7.000763,103.869049,-3.331117,89.101837,-12.368806,100.109505,-4.700512,99.216591,-5.343085,74.244865,-3.944259,76.465134,-8.812989,93.791893,-0.429413,60.002579,-4.013513,82.54454,-5.858006,84.402092,0.686969,90.126389
4,app_3001_4001_phnd_neg_0004.wav,75.99954,66,33.0,0.47878,0.1,0.414859,0.089313,0.252143,0.019757,0.128487,0.79246,-17.22436,8488.603516,0.012062,8.329871,0.079344,0.00256,1395.230033,281802.432273,1441.533673,145394.091098,19.596951,282.358081,2799.582248,2799.582248,0.463039,0.463039,0.002951,0.011803,0.0761,0.001372,-1.6971e-06,0.003972,-2.2884e-06,0.002766,1.885705,1.133871,-377.545197,2084.74292,149.727615,806.048401,-39.882656,449.930267,41.879189,472.226166,-30.462873,235.521164,7.132905,275.070709,-14.036365,134.910324,-3.452546,128.165848,-8.529839,109.861977,-12.82693,100.156181,-2.546823,54.763493,-15.552882,68.453285,-1.928557,86.908035,-9.201928,76.01844,-3.700468,72.502159,-6.584204,64.973305,0.744403,68.908516,-6.354805,66.414391,-6.555534,47.85284,-4.809713,73.033966


In [6]:
df.columns

Index(['filename', 'tempo', 'total_beats', 'average_beats', 'chroma_stft_mean', 'chroma_stft_var', 'chroma_cq_mean', 'chroma_cq_var', 'chroma_cens_mean', 'chroma_cens_var', 'melspectrogram_mean', 'melspectrogram_var', 'mfcc_mean', 'mfcc_var', 'mfcc_delta_mean', 'mfcc_delta_var', 'rmse_mean', 'rmse_var', 'cent_mean', 'cent_var', 'spec_bw_mean', 'spec_bw_var', 'contrast_mean', 'contrast_var', 'rolloff_mean', 'rolloff_var', 'poly_mean', 'poly_var', 'tonnetz_mean', 'tonnetz_var', 'zcr_mean', 'zcr_var', 'harm_mean', 'harm_var', 'perc_mean', 'perc_var', 'frame_mean', 'frame_var', 'mfcc0_mean', 'mfcc0_var', 'mfcc1_mean', 'mfcc1_var', 'mfcc2_mean', 'mfcc2_var', 'mfcc3_mean', 'mfcc3_var', 'mfcc4_mean', 'mfcc4_var', 'mfcc5_mean', 'mfcc5_var', 'mfcc6_mean', 'mfcc6_var', 'mfcc7_mean', 'mfcc7_var', 'mfcc8_mean', 'mfcc8_var', 'mfcc9_mean', 'mfcc9_var', 'mfcc10_mean', 'mfcc10_var', 'mfcc11_mean', 'mfcc11_var', 'mfcc12_mean', 'mfcc12_var', 'mfcc13_mean', 'mfcc13_var', 'mfcc14_mean', 'mfcc14_var',
    

In [7]:
df['sentiment'] = df['filename'].str.contains('pos').astype('int')

In [8]:
df = df.drop(columns='filename', errors='ignore')

In [9]:
df.shape

(12057, 78)
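
The label derivation above can be checked on a tiny frame before trusting it on the full file. In this sketch the first filename is taken from the `df.head()` output, while the second filename and the second `tempo` value are hypothetical, made up only to show a positive example. It also assumes that the substring `pos` appears in a filename only when the recording is a Positive Emotions segment.

```python
import pandas as pd

# The 'neg' filename follows the pattern shown in df.head();
# the 'pos' filename is a hypothetical example of the same pattern.
toy = pd.DataFrame({
    "filename": [
        "app_3001_4001_phnd_neg_0000.wav",
        "app_3001_4001_phnd_pos_0000.wav",
    ],
    "tempo": [184.570312, 120.0],  # second value is made up
})

# 1 if the filename contains 'pos', otherwise 0
toy["sentiment"] = toy["filename"].str.contains("pos").astype("int")
toy = toy.drop(columns="filename")

print(toy["sentiment"].tolist())
print(toy["sentiment"].value_counts().to_dict())
```

Running `value_counts()` on the real `df['sentiment']` column in the same way would show how balanced the two classes are.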

# Model Creation

In [10]:
device = (
    "cuda"
    if torch.cuda.is_available()
    else "mps"
    if torch.backends.mps.is_available()
    else "cpu"
)
print(f"Using {device} device")


Using cuda device
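
The assignment text asks for CPU to be used for reproducibility, whereas the cell above selects a GPU when one is available. A minimal sketch of pinning the device to CPU and seeding PyTorch's random number generator is shown here. The helper `seeded_weights` and the seed value are hypothetical and serve only to demonstrate that the same seed gives the same initial weights.

```python
import torch

# Force CPU regardless of available hardware (an assumption based on the
# assignment's "use CPU" instruction, not on the notebook's own cell).
device = torch.device("cpu")

def seeded_weights(seed):
    """Return freshly initialised Linear weights after seeding the global RNG."""
    torch.manual_seed(seed)
    layer = torch.nn.Linear(77, 128).to(device)
    return layer.weight.detach().clone()

first = seeded_weights(0)
second = seeded_weights(0)

# Identical seeds should give identical initial weights on CPU
print(torch.equal(first, second))
```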


In [11]:
class DNN(nn.Module):
  def __init__(self, num_neurons=128):
    super().__init__()
    self.relu = nn.ReLU()
    self.dropout = nn.Dropout(p=0.2)
    self.layer1 = nn.Linear(77, num_neurons)  # 77 engineered input features
    self.layer2 = nn.Linear(num_neurons, num_neurons)
    self.layer3 = nn.Linear(num_neurons, num_neurons)
    self.sigmoid = nn.Sigmoid()

  def forward(self, x):
    # Each linear layer is followed by ReLU and dropout
    x = self.layer1(x)
    x = self.relu(x)
    x = self.dropout(x)
    x = self.layer2(x)
    x = self.relu(x)
    x = self.dropout(x)
    x = self.layer3(x)
    x = self.relu(x)
    x = self.dropout(x)
    out = self.sigmoid(x)
    return out


In [12]:
model = DNN().to(device)
print(model)

DNN(
  (relu): ReLU()
  (dropout): Dropout(p=0.2, inplace=False)
  (layer1): Linear(in_features=77, out_features=128, bias=True)
  (layer2): Linear(in_features=128, out_features=128, bias=True)
  (layer3): Linear(in_features=128, out_features=128, bias=True)
  (sigmoid): Sigmoid()
)
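
The printed model ends in a 128-unit layer followed by a sigmoid, so a forward pass yields 128 values per sample. If a single probability per recording is wanted for the binary polarity label, one common arrangement is to add a final linear layer with one output unit. The sketch below illustrates that idea under this assumption; it is not the architecture prescribed by the assignment template, and `BinaryDNN`, `num_features` and `p_drop` are hypothetical names.

```python
import torch
import torch.nn as nn

class BinaryDNN(nn.Module):
    """Hypothetical variant: three hidden layers plus a single-unit output."""

    def __init__(self, num_features=77, num_neurons=128, p_drop=0.2):
        super().__init__()
        self.hidden = nn.Sequential(
            nn.Linear(num_features, num_neurons), nn.ReLU(), nn.Dropout(p_drop),
            nn.Linear(num_neurons, num_neurons), nn.ReLU(), nn.Dropout(p_drop),
            nn.Linear(num_neurons, num_neurons), nn.ReLU(), nn.Dropout(p_drop),
        )
        self.output = nn.Linear(num_neurons, 1)  # one value per sample
        self.sigmoid = nn.Sigmoid()

    def forward(self, x):
        return self.sigmoid(self.output(self.hidden(x)))

sketch_model = BinaryDNN()
sketch_model.eval()  # disable dropout for the shape check
with torch.no_grad():
    probs = sketch_model(torch.randn(4, 77))  # 4 random samples, 77 features

print(probs.shape)
```

With this layout the output has shape `(batch_size, 1)`, which pairs naturally with `nn.BCELoss` and float labels of the same shape.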


DataLoader

In [13]:
class data(Dataset):
  def __init__(self, path):
    df = pd.read_csv(path)
    # Label is 1 when the filename contains 'pos', otherwise 0
    labels = df['filename'].str.contains('pos').astype('int')
    features = df.drop(columns='filename', errors='ignore')
    # Keep features and labels as float32 tensors so they can be indexed directly
    self.features = torch.tensor(features.values, dtype=torch.float32)
    self.labels = torch.tensor(labels.values, dtype=torch.float32)
  def __len__(self):
    return self.features.shape[0]
  def __getitem__(self, idx):
    return self.features[idx], self.labels[idx]

In [14]:
dataset = data('/content/drive/MyDrive/SC4001/Assignment/simplified.csv')

In [15]:
generator = torch.Generator().manual_seed(19)
train, test = random_split(dataset, [0.7, 0.3], generator=generator)

In [16]:
train_loader = DataLoader(train, batch_size=128, shuffle=True)
test_loader = DataLoader(test, batch_size=128, shuffle=True)

In [17]:
train_features, train_labels = next(iter(train_loader))
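
A quick way to check a split-and-load pipeline independently of the CSV file is to run it on synthetic tensors. The sketch below uses `TensorDataset` with random data as a stand-in for the real features; the sizes, the batch size of 32 and all variable names are assumptions chosen for illustration. Passing fractions such as `[0.7, 0.3]` to `random_split` requires a reasonably recent PyTorch release (1.13 or later, to the best of my knowledge).

```python
import torch
from torch.utils.data import DataLoader, TensorDataset, random_split

# Synthetic stand-in for the real data: 100 rows, 77 features, binary labels
features = torch.randn(100, 77)
labels = torch.randint(0, 2, (100,)).float()
toy_dataset = TensorDataset(features, labels)

# Seeded 70/30 split, mirroring the cells above
generator = torch.Generator().manual_seed(19)
toy_train, toy_test = random_split(toy_dataset, [0.7, 0.3], generator=generator)

toy_loader = DataLoader(toy_train, batch_size=32, shuffle=True)
batch_features, batch_labels = next(iter(toy_loader))

print(len(toy_train), len(toy_test))
print(batch_features.shape, batch_labels.shape)
```

If the batch shapes come out as `(batch_size, 77)` for features and `(batch_size,)` for labels, the same `Dataset` interface should work with the training loop.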