**Lab 3** for the course *Selected Topics in Music and Acoustic Engineering*:

***Machine Learning for Audio and Acoustic Engineering***
---

# **Before you start**

*   Go to "*File*" --> "*Save a copy in Drive*"
*   Open that copy (might open automatically)
*   Then continue below

# **Lab 3: Neural Networks**

In this lab we will start to work with deep learning models. We will begin by looking at simple examples with synthetically generated data. Then, you will move to a more challenging and realistic problem.


### **Exercise 1**: Approximating Synthetic Data

Execute the following cells to create a synthetically generated dataset:

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from pathlib import Path
import librosa
import sklearn
import tensorflow as tf
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
plt.style.use("seaborn-v0_8")

In [None]:
def gen_data(size, a, b):
  x = np.random.rand(size,1)-0.5
  y = a*x + b
  y = y*(x>0)
  y = y + 0.2*(np.random.randn(*x.shape))
  return x, y

In [None]:
# Create data and plot
Xdata, Ydata = gen_data(1000, 2, 1)
plt.scatter(Xdata,Ydata);
plt.xlabel('x');
plt.ylabel('y');

Describe the function underlying the model used to generate the data by filling in the values marked with "?":

\begin{equation}
  y(x)=\begin{cases}
    ? \cdot x + 1 + ? \cdot \mathcal{N}(0, 1), & \text{if } x > 0,\\
    ? \cdot \mathcal{N}(0, 1), & \text{otherwise},
  \end{cases}
  \qquad \text{where } x \sim \mathcal{U}(0, 1) - ?
\end{equation}


### **Exercise 2**: Create an MLP neural network model using Keras

Create the following fully-connected feedforward network using Keras' `Sequential` model:

| Layer | Type  | Units | Activation | Description                                                  |
|-------|-------|-------|------------|--------------------------------------------------------------|
| 1     | Dense | 5     | ReLU       | First hidden layer with 5 neurons, applies non-linearity     |
| 2     | Dense | 5     | ReLU       | Second hidden layer, also with 5 neurons                     |
| 3     | Dense | 1     | Linear     | Output layer, returns a single continuous value (regression) |

Show the model's summary.

![](https://drive.google.com/uc?export=view&id=1UJwycQXQG8kkF0N8CmDW-ED-hY-Uck5o)



How many parameters does the model have? (Hint: use `model.summary()`.)
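To cross-check the count reported by `model.summary()` by hand: a `Dense` layer with $n_{in}$ inputs and $n_{out}$ units stores an $n_{in} \times n_{out}$ weight matrix plus $n_{out}$ biases. The helper below (a hypothetical name, not part of Keras) sums this over a list of layer widths; the example uses a made-up 2-3-1 network rather than the one above.

```python
def dense_param_count(layer_sizes):
    """Total trainable parameters of a stack of Dense layers.

    layer_sizes[0] is the input dimension; every following entry is the
    number of units of one Dense layer (weights + biases per layer).
    """
    total = 0
    for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:]):
        total += n_in * n_out + n_out  # weight matrix + bias vector
    return total


# Hypothetical 2-input network: Dense(3) -> Dense(1)
print(dense_param_count([2, 3, 1]))  # 13
```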

Compile the model and train it on (Xdata, Ydata) using MSE as the loss function and the SGD optimizer with a learning rate of 0.01. Train the model for 300 epochs.

In [None]:
# model.compile(....)
# history = model.fit(Xdata, Ydata, epochs=...)
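Conceptually, compiling with `loss='mse'` and `optimizer=tf.keras.optimizers.SGD(learning_rate=0.01)` and then calling `fit` repeatedly computes the mean squared error on a mini-batch and moves every weight a step of size `learning_rate` against its gradient. The NumPy sketch below does this by hand for a plain linear model $\hat{y} = w x + c$. This is a simplification: the MLP has more weights and Keras computes the gradients automatically. The batch size of 32 mirrors the default of Keras' `fit`, and the toy data is purely linear, not the piecewise data produced by `gen_data`.

```python
import numpy as np

def sgd_linear_mse(x, y, lr=0.01, epochs=300, batch_size=32, seed=0):
    """Mini-batch SGD on y_hat = w*x + c with an MSE loss.

    Returns the fitted (w, c) and the full-data MSE after each epoch.
    """
    rng = np.random.default_rng(seed)
    w, c = 0.0, 0.0
    losses = []
    n = len(x)
    for _ in range(epochs):
        idx = rng.permutation(n)                 # reshuffle every epoch
        for start in range(0, n, batch_size):
            b = idx[start:start + batch_size]
            err = (w * x[b] + c) - y[b]          # residuals on the mini-batch
            w -= lr * 2.0 * np.mean(err * x[b])  # step against d(MSE)/dw
            c -= lr * 2.0 * np.mean(err)         # step against d(MSE)/dc
        losses.append(float(np.mean((w * x + c - y) ** 2)))
    return w, c, losses

# Purely linear toy data (not the piecewise data of gen_data)
rng = np.random.default_rng(1)
x = rng.random(1000) - 0.5
y = 2.0 * x + 1.0 + 0.2 * rng.standard_normal(1000)
w, c, losses = sgd_linear_mse(x, y)
print(round(w, 2), round(c, 2))   # close to 2 and 1
```

Keras reports each epoch's loss as a running average over its batches, so its numbers differ slightly from the end-of-epoch MSE recorded here.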

Plot the training history of the network, showing the evolution of the training loss.

What is the minimum loss achieved by the model, and at which epoch was it reached?

In [None]:
# ...

print('Minimum Loss on the Training Set: ', min_loss , ' obtained at epoch: ' , ''.join(map(str, min_loss_index[0])) )
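`history.history['loss']` is a plain Python list with one value per epoch, so `np.argmin` gives the (0-based) index of the lowest loss; adding 1 converts it to the 1-based epoch numbering that Keras prints during training. A sketch with a hypothetical helper `best_epoch` and a made-up loss list:

```python
import numpy as np

def best_epoch(loss_history):
    """Return (min_loss, epoch) with epochs counted from 1, as in Keras' progress log."""
    losses = np.asarray(loss_history)
    idx = int(np.argmin(losses))  # first occurrence if the minimum repeats
    return float(losses[idx]), idx + 1

# Made-up history; with Keras use best_epoch(history.history['loss'])
min_loss, epoch = best_epoch([0.90, 0.41, 0.37, 0.39, 0.37])
print(min_loss, epoch)  # 0.37 3
```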

Plot the true training data together with the approximated data using the predictions.

In [None]:
# ...

# plt.scatter(Xdata,Ydata)
# plt.scatter(Xdata,preds);

Now re-initialize the model (build it again from scratch) and train it for 1000 epochs.

In [None]:
# Define the Fully-connected MLP
# ....

# Compiling the model
# ...

# Training the model
# history_1000 = model.fit(Xdata, Ydata, epochs=...)

Plot the original data and the predicted data. What differences do you observe with respect to the previous case?

What is the best loss achieved in this case?

### **Exercise 3**: Classification

Generate two synthetic bivariate Gaussian sample sets (see `np.random.multivariate_normal`), each with 1000 samples:

*   Xdata0, with mean [-1,-1] and covariance [[4,0],[0,4]]
*   Xdata1, with mean [1,1] and covariance [[3,0],[0,3]]
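`np.random.multivariate_normal(mean, cov, size)` returns an array of shape `(size, len(mean))`, one sample per row. The sketch below uses NumPy's newer `Generator` interface, which exposes the same method, with illustrative parameters (not the ones requested above) to show the output shape and check the empirical statistics:

```python
import numpy as np

rng = np.random.default_rng(42)
mean = [0.0, 2.0]
cov = [[1.0, 0.0], [0.0, 0.25]]   # diagonal covariance: independent components
samples = rng.multivariate_normal(mean, cov, size=5000)

print(samples.shape)                            # (5000, 2)
print(samples.mean(axis=0).round(1))            # close to mean
print(np.cov(samples, rowvar=False).round(2))   # close to cov
```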





In [None]:
#Bivariate Gaussian
mean0 = [-1, -1]
cov0 = [[4, 0], [0, 4]]

mean1 = [1, 1]
cov1 = [[3, 0], [0, 3]]

# Xdata0 = ...
# Xdata1 = ...

# print(Xdata0.shape, Xdata1.shape)

Expected output:
```
(1000, 2) (1000, 2)
```

Stack the two Gaussian sample sets to build a feature matrix Xdatac of shape (2000, 2), and build the corresponding label vector Ydatac of zeros and ones with shape (2000,) (stored as labels_gt in the cell below).

In [None]:
#Features
# Xdatac = ...
# Xdatac.shape

#Labels
labels0 = np.zeros(Xdata0.shape[0])
labels1 = np.ones(Xdata1.shape[0])
labels_gt = np.concatenate((labels0,labels1),axis=0).T

# print(Xdatac.shape, labels_gt.shape)

Expected output:
```
(2000, 2) (2000,)
```
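`np.vstack` (or `np.concatenate(..., axis=0)`) places the rows of the second array below those of the first, so the labels must be concatenated in the same order. A small sketch with made-up arrays:

```python
import numpy as np

a = np.zeros((3, 2))          # stand-in for class-0 samples
b = np.ones((4, 2))           # stand-in for class-1 samples
X = np.vstack((a, b))         # rows of a, then rows of b
y = np.concatenate((np.zeros(len(a)), np.ones(len(b))))  # matching labels
print(X.shape, y.shape)       # (7, 2) (7,)
```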

Create a scatterplot of the two classes:

Split Xdatac and its labels into a training partition and a validation partition using `train_test_split` from sklearn. Use 30% of the data for validation.

In [None]:
from sklearn.model_selection import train_test_split
# X_train, X_valid, y_train, y_valid = train_test_split(....)

Create a model identical to the one in Exercise 2, but use a sigmoid activation in the output layer. You now also need to specify that the input has two features.

Train the model on the training partition. Use "binary_crossentropy" as the loss function and monitor the training accuracy with metrics=["accuracy"]. Also pass the validation partition so that the validation accuracy is tracked at each epoch.

In [None]:
# Define the Fully-connected MLP
# ...

# Compiling the model
# ...

# Training the model
# history = model.fit(X_train, y_train, validation_data=(X_valid, y_valid), epochs=300)
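For a sigmoid output $p = \sigma(z)$ and a label $y \in \{0, 1\}$, binary cross-entropy is $-[y \log p + (1-y) \log(1-p)]$, averaged over the batch. The NumPy sketch below mirrors that formula; the clipping constant `eps` is an assumption, similar in spirit to the small epsilon Keras uses to avoid $\log 0$.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def binary_crossentropy(y_true, p, eps=1e-7):
    """Mean binary cross-entropy between 0/1 labels and predicted probabilities."""
    p = np.clip(p, eps, 1.0 - eps)  # keep log() finite
    return float(np.mean(-(y_true * np.log(p) + (1 - y_true) * np.log(1 - p))))

y = np.array([1, 0, 1, 0])
p = sigmoid(np.zeros(4))                     # every probability is 0.5
print(round(binary_crossentropy(y, p), 4))   # 0.6931 (= ln 2)
```

A loss of $\ln 2 \approx 0.693$ is therefore what a classifier that always outputs 0.5 would get, a useful reference when reading the training curves.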

Plot the training history showing the training accuracy and validation accuracy.

Predict over the training data and create a scatter plot showing the predicted class of each example:

In [None]:
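# model.predict returns shape (N, 1); flatten it so it can index the rows of X_train
preds = model.predict(X_train).ravel()
plt.figure(figsize=(20,13))
plt.scatter(X_train[preds<0.5,0], X_train[preds<0.5,1], c='b')
plt.scatter(X_train[preds>=0.5,0], X_train[preds>=0.5,1], c='r')
plt.title('Predicted class on the training data')
plt.legend(('class 0','class 1'));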
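`model.predict` returns sigmoid probabilities of shape `(N, 1)`; flattening and thresholding them at 0.5 turns them into hard class labels, and comparing those with the ground truth gives a binary accuracy of the kind Keras reports. A sketch on made-up values:

```python
import numpy as np

probs = np.array([[0.10], [0.80], [0.45], [0.97]])  # made-up model.predict output
labels = np.array([0, 1, 1, 1])

pred_class = (probs.ravel() > 0.5).astype(int)   # (N, 1) -> (N,), then threshold
accuracy = np.mean(pred_class == labels)
print(pred_class, accuracy)  # [0 1 0 1] 0.75
```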

### **Exercise 4**: Data Preparation


Follow the same steps as in Lab 2 to download the ESC-50 dataset.

In [None]:
# !wget https://github.com/karolpiczak/ESC-50/archive/master.zip
# !unzip master.zip

Create a list of the files belonging to the first 10 classes. These files (400 signals) will form our dataset.

In [None]:
fn_csv = 'ESC-50-master/meta/esc50.csv'

files = []  # File list
labels = []  # Class list

# ...

print(f'Lengths: esc5_X: {len(files)}, esc5_y: {len(labels)}')


Expected output:

```
Lengths: esc5_X: 400, esc5_y: 400
```
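A sketch of the file selection with pandas, assuming the `esc50.csv` columns `filename`, `target` (integer class id 0-49) and `category` (class name), and assuming that "the first 10 classes" means `target` 0-9. The helper `first_n_classes` is hypothetical, and the small in-memory table with made-up rows only stands in for `pd.read_csv(fn_csv)`.

```python
import pandas as pd

def first_n_classes(meta, audio_dir, n_classes=10):
    """Return (file paths, class names) of the clips whose target id is < n_classes."""
    subset = meta[meta["target"] < n_classes]
    files = [f"{audio_dir}/{name}" for name in subset["filename"]]
    labels = subset["category"].tolist()
    return files, labels

# Made-up stand-in for meta = pd.read_csv('ESC-50-master/meta/esc50.csv')
meta = pd.DataFrame({
    "filename": ["a.wav", "b.wav", "c.wav"],
    "target": [0, 14, 0],
    "category": ["dog", "other", "dog"],
})
files, labels = first_n_classes(meta, "ESC-50-master/audio")
print(len(files), labels)  # 2 ['dog', 'dog']
```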

Convert the labels to integer class indices (a rank-1 array), e.g. 0, 1, 2, ..., 9.

In [None]:
# labels = ...
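If `labels` holds class names (strings), `np.unique(..., return_inverse=True)` maps each name to an integer index, giving a rank-1 integer array. The indices follow the alphabetical order of the names, not necessarily the ESC-50 `target` numbering; if that matters, the `target` column can be used directly. A sketch with a made-up label list:

```python
import numpy as np

names = ["rooster", "dog", "pig", "dog", "rooster"]   # made-up label list
class_names, label_idx = np.unique(names, return_inverse=True)
print(class_names)   # ['dog' 'pig' 'rooster']
print(label_idx)     # [2 0 1 0 2]
```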

Create a list storing the signals from all the files:

In [None]:
# signals = ...

For each signal in the list, compute its mel spectrogram with librosa using the default parameters:

In [None]:
mel_spegrams = []
# ...
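With default parameters, `librosa.load` resamples to 22050 Hz and `librosa.feature.melspectrogram` uses `n_mels=128`, `hop_length=512` and centred frames (`center=True`), so a signal of $N$ samples yields $1 + \lfloor N / 512 \rfloor$ frames. A small helper (hypothetical, not part of librosa) to check this for a given signal length:

```python
def n_stft_frames(n_samples, hop_length=512):
    """Number of frames of a centred STFT (librosa's default center=True)."""
    return 1 + n_samples // hop_length

# Example: a 5-second signal at librosa's default 22050 Hz
print(n_stft_frames(5 * 22050))  # 216
```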

Convert the list to a numpy array called Xdata. You should end up with an array of shape (400, 128, 216). What do these numbers mean?

In [None]:
Xdata = np.asarray(mel_spegrams)
Xdata.shape

Expected output:

```
(400, 128, 216)
```

ANSWER THE QUESTION HERE!

### **Exercise 5**: MLP Classification

Now let's try to classify the audio files using the computed mel spectrograms. First, flatten each spectrogram into a one-dimensional array, so that you end up with a new array Xdata_f of shape (400, 27648), since 128 × 216 = 27648. You can do this with NumPy's `reshape`.

In [None]:
# Xdata_f = ...
# print(Xdata_f.shape)

Expected output:
```
(400, 27648)
```
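`reshape(n_examples, -1)` keeps the first axis and lets NumPy infer the second. Flattening is row-major, so all frames of the first mel band come first, then those of the second band, and so on. A sketch on a tiny array:

```python
import numpy as np

specs = np.arange(2 * 3 * 4).reshape(2, 3, 4)   # 2 "spectrograms" of 3 bands x 4 frames
flat = specs.reshape(specs.shape[0], -1)        # keep axis 0, infer 3 * 4 = 12
print(flat.shape)      # (2, 12)
print(flat[0, :4])     # [0 1 2 3]  (first band of the first spectrogram)
```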

First, scale the data with sklearn's `StandardScaler` class (store the output in Xdata_s).

In [None]:
# scaler = ...
# Xdata_s = ...
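`StandardScaler` standardizes each column (here, each of the 27648 time-frequency bins) independently, using the population standard deviation (`ddof=0`) and leaving constant columns unscaled. The NumPy sketch below, with a hypothetical helper name, reproduces that computation. Note that fitting the scaler on all 400 examples before splitting lets validation statistics leak into the training features; fitting on the training partition only and transforming both partitions with those statistics avoids that.

```python
import numpy as np

def standardize_columns(X_fit, X_apply, eps=1e-12):
    """Scale X_apply column-wise with the mean/std estimated on X_fit."""
    mu = X_fit.mean(axis=0)
    sigma = X_fit.std(axis=0)                   # ddof=0, as in StandardScaler
    sigma = np.where(sigma < eps, 1.0, sigma)   # constant columns: leave unscaled
    return (X_apply - mu) / sigma

rng = np.random.default_rng(0)
X_tr = rng.normal(5.0, 3.0, size=(100, 4))
X_va = rng.normal(5.0, 3.0, size=(20, 4))
X_tr_s = standardize_columns(X_tr, X_tr)
X_va_s = standardize_columns(X_tr, X_va)   # validation uses training statistics
```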

Split the data into training and validation partitions, using 20% of the samples for validation. Call the splits X_train, y_train, X_val, y_val.

In [None]:
# X_train, X_val, y_train, y_val = ...

Now, create an MLP-based network to classify these audio clips. You can use the same layer structure as in the previous examples, but adapt the output layer so that its size equals the number of classes and it uses a 'softmax' activation. You can also try increasing the number of neurons in the hidden layers.

Proposed architecture:

| Layer | Type  | Units | Activation | Output Shape           | Description                                  |
|-------|-------|-------|------------|-------------------------|----------------------------------------------|
| Input | Input | -     | -          | (None, shape_size)      | Input layer with `shape_size` features       |
| 1     | Dense | 16    | ReLU       | (None, 16)              | First hidden layer with 16 neurons           |
| 2     | Dense | 16    | ReLU       | (None, 16)              | Second hidden layer with 16 neurons          |
| 3     | Dense | 10    | Softmax    | (None, 10)              | Output layer for 10-class classification     |

Fit the model using "sparse_categorical_crossentropy" as the loss function. Your first attempts will probably overfit.
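With `sparse_categorical_crossentropy` the labels stay as integer indices (0-9) instead of one-hot vectors, and the loss for one example is $-\log p_{y}$, the negative log of the softmax probability assigned to the true class. A NumPy sketch (function names are illustrative):

```python
import numpy as np

def softmax(logits):
    z = logits - logits.max(axis=1, keepdims=True)   # subtract row max for stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def sparse_categorical_crossentropy(y_idx, probs, eps=1e-7):
    """Mean of -log(probability of the true class); y_idx holds integer labels."""
    p_true = probs[np.arange(len(y_idx)), y_idx]
    return float(np.mean(-np.log(np.clip(p_true, eps, 1.0))))

probs = softmax(np.zeros((2, 10)))   # uniform over 10 classes
loss = sparse_categorical_crossentropy(np.array([3, 7]), probs)
print(round(loss, 4))                # 2.3026 (= ln 10)
```

So a validation loss stuck near $\ln 10 \approx 2.30$ means the network is doing no better than guessing uniformly among the 10 classes.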

Try different strategies to prevent overfitting:

*   Dropout
*   Regularization
*   Reduce number of neurons/layers

What is the best accuracy you can get with a fully-connected MLP network?


In [None]:
# Compile the model

# Training the model
# history = model.fit(X_train, y_train, validation_data=(X_val, y_val), epochs=300)


What is the minimum loss achieved by the model, and at which epoch was it reached?

Plot the training history of the network, showing the evolution of the training/validation loss.

* REPEAT EXPERIMENTS USING THE AFOREMENTIONED TECHNIQUES TO PREVENT OVERFITTING

### **Exercise 6**: CNN

Create training and validation partitions from Xdata. Remember that Xdata has shape (400, 128, 216), i.e. it stores 400 mel spectrograms of size (128, 216). Name the partitions X_train, X_test, y_train and y_test (here the "test" partition plays the role of the validation set).

In [None]:
# X_train, X_test, y_train, y_test = train_test_split(..., test_size=0.2)
# print(X_train.shape, y_train.shape, X_test.shape, y_test.shape)

Scale each spectrogram by subtracting its mean and dividing by its standard deviation.

In [None]:
scaler = sklearn.preprocessing.StandardScaler()
X_train_s = []
X_test_s = []

# ...

X_train_s = np.asarray(X_train_s)
X_test_s = np.asarray(X_test_s)
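Taken literally, "subtract its mean and divide by its standard deviation" uses one scalar mean and one scalar standard deviation per spectrogram. That differs from calling `StandardScaler().fit_transform` on a single 128 × 216 matrix, which would normalize each time frame (column) separately. A NumPy sketch of the per-spectrogram version, with an illustrative function name and made-up data:

```python
import numpy as np

def standardize_each(specs, eps=1e-8):
    """Standardize every spectrogram in a (n, bands, frames) array by its own mean/std."""
    mu = specs.mean(axis=(1, 2), keepdims=True)
    sigma = specs.std(axis=(1, 2), keepdims=True)
    return (specs - mu) / (sigma + eps)   # eps guards against silent (constant) inputs

rng = np.random.default_rng(0)
specs = rng.gamma(2.0, 3.0, size=(4, 128, 216))   # made-up non-negative "spectrograms"
specs_s = standardize_each(specs)
print(specs_s.shape)   # (4, 128, 216)
```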

Create a convolutional neural network model. Remember to adapt the input shape of the first layer to the new input. 

You can start with a model like the following one. Remember to include regularization strategies such as dropout layers.

| Layer | Type        | Filters/Units | Kernel/Pool Size | Activation | Output Shape        | Description                                  |
|-------|-------------|----------------|------------------|------------|---------------------|----------------------------------------------|
| Input | Input        | -              | -                | -          | (None, H, W, C)      | Input shape from `X_train_sx.shape[1:]`      |
| 1     | Conv2D       | 16             | (3, 3)           | ReLU       | (None, H-2, W-2, 16) | First convolutional layer                    |
| 2     | MaxPooling2D | -              | (3, 3), stride 3 | -          | (None, (H-2)//3, (W-2)//3, 16) | Downsamples feature maps                     |
| 3     | Conv2D       | 16             | (3, 3)           | ReLU       | (None, ..., ..., 16) | Second convolutional layer                   |
| 4     | MaxPooling2D | -              | (2, 2), stride 2 | -          | (None, ..., ..., 16) | Second pooling layer                         |
| 5     | Conv2D       | 32             | (2, 2)           | ReLU       | (None, ..., ..., 32) | Third convolutional layer                    |
| 6     | Flatten      | -              | -                | -          | (None, N)            | Flattens 2D features to 1D vector            |
| 7     | Dense        | 32             | -                | ReLU       | (None, 32)           | Fully connected hidden layer                 |
| 8     | Dropout      | -              | -                | -          | (None, 32)           | Dropout for regularization (rate=0.1)        |
| 9     | Dense        | 10             | -                | Softmax    | (None, 10)           | Output layer for 10-class classification     |
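With Keras' default `padding='valid'`, a stride-1 `Conv2D` with a $k \times k$ kernel shrinks each spatial axis from $n$ to $n - k + 1$, and `MaxPooling2D(pool_size=p, strides=s)` gives $\lfloor (n - p)/s \rfloor + 1$. The pure-Python sketch below (hypothetical helper names) chains these rules through the layers of the table to obtain the size $N$ of the `Flatten` output, assuming 128 × 216 inputs with one channel.

```python
def conv_valid(n, k):
    return n - k + 1                 # stride 1, no padding

def pool_valid(n, p, s):
    return (n - p) // s + 1          # no padding

def flatten_size(h, w):
    """Trace (h, w) through the CNN in the table; return final (h, w) and Flatten length."""
    h, w = conv_valid(h, 3), conv_valid(w, 3)          # Conv2D 16, 3x3
    h, w = pool_valid(h, 3, 3), pool_valid(w, 3, 3)    # MaxPooling2D 3x3, stride 3
    h, w = conv_valid(h, 3), conv_valid(w, 3)          # Conv2D 16, 3x3
    h, w = pool_valid(h, 2, 2), pool_valid(w, 2, 2)    # MaxPooling2D 2x2, stride 2
    h, w = conv_valid(h, 2), conv_valid(w, 2)          # Conv2D 32, 2x2
    return h, w, h * w * 32                            # 32 filters in the last conv

print(flatten_size(128, 216))  # (19, 33, 20064)
```

Most of the model's parameters therefore sit in the first `Dense` layer (20064 × 32 weights), which is one reason the dropout after it matters.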

In [None]:
# expanding X_train_s and X_test_s to fit conv2d
# ...
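`Conv2D` expects a trailing channel axis, i.e. inputs of shape `(batch, height, width, channels)` with the default `data_format='channels_last'`. For single-channel spectrograms, `np.expand_dims(..., axis=-1)` adds it. A sketch with a zero-filled stand-in array:

```python
import numpy as np

X = np.zeros((8, 128, 216))            # stand-in for X_train_s
X_x = np.expand_dims(X, axis=-1)       # add the channel axis
print(X_x.shape)                       # (8, 128, 216, 1)
```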

Fit the model and try to improve the results obtained with the MLP model.

In [None]:
#compile the model
# ...

#fit the model
# history = model.fit(X_train_sx, y_train, validation_data=(X_test_sx, y_test), batch_size=32, epochs=100)

Plot the training history (train/val loss/accuracy)

What is the minimum loss achieved by the model, and at which epoch was it reached?

In [None]:
# ...

# print('Minimum Loss on the Validation Set: ', min_val_loss ,' obtained at epoch: ' , ''.join(map(str, min_val_loss_index[0])), '  with an Accuracy of: ', val_accuracy_history[int(min_val_loss_index[0].item())] )

Tune your model and try to achieve an accuracy above 60%.