# Homework 6
**Total Points: 5**

**Instructions:**
1. Complete parts 1 through 5, filling in code or responses where marked with `# YOUR CODE HERE` or `# YOUR ANALYSIS HERE`.
2. The libraries you need, in the order you need them, have already been coded. Do not import additional libraries or move import commands.
3. When finished, run the full notebook by selecting <b>Kernel > Restart & Run All</b>. </li>
4. Submit this completed notebook file to <b>NYU Classes</b>. </li>**(Important: Only submit your .ipynb file! Do not submit the entire dataset.)**

In this assignment you will test different ML techniques to classify solo instruments. This assignment uses a large dataset (9+ GB) which you will download separately: *Medley-solos-DB*:

<blockquote>
V. Lostanlen, C.E. Cella. Deep convolutional networks on the pitch spiral for musical instrument recognition. Proceedings of the International Society for Music Information Retrieval Conference (ISMIR), 2016.
</blockquote>

**Grading:** Each part is worth 1 point.

**Important Note**: The way you implement the code in your work for each assignment is entirely up to you. There are often many ways to solve a particular problem, so use whatever method works for you. The only requirement is that you follow the instructions, which may prohibit or require certain libraries or commands. Refer to https://scikit-learn.org/ for implementation instructions and tutorials.

In [1]:
import numpy as np
import pandas as pd
import librosa
from librosa import feature
from sklearn import neighbors
from sklearn import svm
from sklearn import neural_network
from sklearn.metrics import f1_score
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import cross_val_score
import matplotlib.pyplot as plt

## Prologue: Download the Dataset and Metadata

The data you will need is a folder containing wav audio files, and a separate .csv file with metadata. You can download both from the following page:

https://zenodo.org/record/3464194#.X4G_oi2z3kJ

Place both the folder and csv file into the same directory as your `Homework-6.ipynb` file, such that the folder stucture is as follows:

`
 .
 <--   Homework-6.ipynb
 <--   Medley-solos-DB_metadata.csv
 <--   Medley-solos-DB
 |     <--   *.wav files
`

The audio files contain recordings from 8 different instuments which have already been labeled and separated into training, validation, and test sets. Each audio file is the same length, and there are many example files from each instrument.

Each audio file has a unique id number associated with it ('uuid4'). This id is important when extracting the audio data and making sure that the file has the correct label, as referenced in the csv file. The following two cells will load and display the metadata into a `Medley_Data` DataFrame. No changes should be made to the following code.

In [2]:
# Load amd Check the csv file

Medley_Data = pd.read_csv("Medley-solos-DB_metadata.csv")
Medley_Data.head()

Unnamed: 0,subset,instrument,instrument_id,song_id,uuid4
0,test,clarinet,0,0,0e4371ac-1c6a-51ab-fdb7-f8abd5fbf1a3
1,test,clarinet,0,0,33383119-fd64-59c1-f596-d1a23e8a0eff
2,test,clarinet,0,0,b2b7a288-e169-5642-fced-b509c06b11fc
3,test,clarinet,0,0,151b6ee4-313a-58d9-fbcb-bab73e0d426b
4,test,clarinet,0,0,b43999d1-9b5e-557f-f9bc-1b3759659858


### Helper Function: `get_file_name_and_label()` and `get_ids()`

The following helper functions have been provided for you.

In [3]:
def get_file_name_and_label(uuid, path='Medley-solos-DB/', dataset=Medley_Data):
    """ Returns full file name and path from a uuid
    
    Parameters
    ----------
    
    uuid: str 
        the unique id (uuid4) for the audio file
    
    path: str
        relative path to audio files
        
    dataset: pandas.DataFrame
        the DataFrame to consult (Medley_Data)
    
    Returns
    -------
    
    filename: str
        relative path and filename
    label: int
        the label associated with that filename
    
    """
    
    rd = dataset.loc[ (dataset['uuid4'] == uuid) ]
    file = path + 'Medley-solos-DB' + '_' + str(rd.values[0,0]) + '-'  + str(rd.values[0,2]) + '_' + rd.values[0,4] + '.wav'
    label = rd.values[0,2]
    return(file, label)
                       
def get_ids(subset, path = 'Medley-solos-DB/', dataset = Medley_Data):

    """ Get a np array of all uuids or a subset of files in the dataset
    
    Parameters
    ----------
    
        subset: str
            one of 'training', 'validation, 'test', or 'all'
            
        path: str
            relative path to the audio files
            
        dataset: pd.DataFrame
            The Medley-solos-DB dataframe to search
         
    Returns
    -------
        filename: np.array
            Medley-solos-DB file name (or 0 if not found)
    
    """
    
    file_array = np.array([])
    rd = dataset.loc[ (dataset['subset'] == subset) ]
    if len(rd.index) < 1:
        file_array = np.array([0])
    else:
        k = 0
        for i in range(len(rd.index)):
            file_array = np.append(file_array,rd.iloc[k,4])
            k += 1
    return(file_array)


# Divides up file names into training, validation, and test sets
tracks_train =  get_ids('training')
tracks_validate = get_ids('validation')
tracks_test = get_ids('test')

print("There are {} tracks in the training set".format(len(tracks_train)))
print("There are {} tracks in the validation set".format(len(tracks_validate)))
print("There are {} tracks in the test set".format(len(tracks_test)))

There are 5841 tracks in the training set
There are 3494 tracks in the validation set
There are 12236 tracks in the test set


## Part 1a: Compute Features

Create a function `compute_features()` such that the input is one audio file and the output is a single feature vector. This function should do the following:
1. Load audio into a sample array.
2. Compute the MFCCs of the input audio, and remove the first (0th) coeficient.
3. Compute the summary statistics of the MFCCs over time:
    1. Find the mean and standard deviation for each feature (2 values for each feature)
    2. stack these statistics into single 1-d vector of size (2*(n_mfccs-1))
4. Return the 1-d vector.

In [4]:
def compute_features(audiofile, n_fft=2048, hop_length=512, n_mels=128, n_mfcc=20):
    """Compute features for an audio file
    
    Parameters
    ----------
    audiofile : str
        name of audio file (with relative directory path)
    n_fft : int
        Number of points for computing the fft
    hop_length : int
        Number of samples to advance between frames
    n_mels : int
        Number of mel frequency bands to use
    n_mfcc : int
        Number of mfccs to compute
    
    Returns
    -------
    features: np.array (1, 2* (n_mfcc - 1))
        feature vector

    """
    # Get data from file to an array
    data, fs = librosa.load(audiofile)
    
    # Compute MFCCs using Librosa
    mfcc = librosa.feature.mfcc(data, n_mfcc=n_mfcc, n_fft=n_fft, hop_length=hop_length)
    
    # Removing the first feature
    mfcc = mfcc[ 1 : , : ]
    
    # Compute mean and std for each feature
    mean = np.mean(mfcc, axis = 1)
    std = np.std(mfcc, axis = 1)
    
    # Appending std to mean to form the feature vector
    features = np.append(mean, std).reshape(1 , -1)
    
    return features

features = compute_features("ocean.wav")
features.shape

(1, 38)

## Part 1b: Create Feature Set

Create a function `create_feature_set()` where the input is an array of audio files and output is a normalized feature set and an accompanying vector of class labels. This function should:
1. Iterate through all audio files in a list of uuids. The training, test, and validation lists have been created for you. For each uuid:
    1. Use `get_file_name_and_label()` to retrieve the audio file name and associated label
    2. Use `compute_features()` to get the 1-d vector for that audio file.
    3. Append the feature vector and label to their respective arrays, and continue to the next file.
2. When finished, output 2 numpy arrays: the feature matrix (n_samples, 2*(mfccs-1)) and the label (n_samples,)

In [5]:
def create_feature_set(id_list):
    """Create feature set from list of input ids.

    Parameters
    ----------
    id_list: np.array
        array of uuid (track_test, track_validate, track_train)

    Returns
    -------

    features: np.array (n_samples, n_features)
        The standard deviation of the features
    labels: np.array (n_samples)
        corresponding label for each feature

    """
    
    # Creating a starting array that contains the first track's features
    first_file, label = get_file_name_and_label( id_list[0] )
    features = compute_features( first_file )
    labels = [ label ]
    
    # Iterating through the rest of the uuid list to obtain features and labels. 
    for uuid in id_list[1: ]:
        filename, label = get_file_name_and_label( uuid )
        feature = compute_features( filename )
        features = np.append( features, feature, axis = 0 )
        labels.append( label )

    return features, labels

# Obaining features the first 10 tracks in the validation set
id_list = tracks_validate[ :10 ]
features, labels = create_feature_set(id_list)
print( features.shape )
print( len(labels) )
labels

(10, 38)
10


[0, 0, 0, 0, 0, 0, 0, 0, 0, 0]

## Part 2a: Get Mean and Standard Deviation

Create a fucnction `get_stats()` which gets the mean and standard deviation for each feature in the input matrix.


In [6]:
def get_stats(features):
    """ Get mean and standard deviation of each feature in a set
 
    Parameters
    ---------
    
    features: np.array (n_samples, n_features)
        feature set
 
    Returns
    -------
     
    mean: np.array (n_features)
        mean of input feature set
    std_dev: np.array (n_features)
        standard deviation of input feature set

    """
  
    mean = np.mean(features, axis = 0)
    std = np.std(features, axis = 0)

    return mean, std

### Getting Everything Ready

The code in the following cell has been done for you. When all is well, run the code to compute features and training labels for the 3 data sets in Medley-solos-DB.

**Hint:** Since you are processing many GB of data, this code will take a while to run. To make sure everything works as expected, you may want to test on small subset of the data, like `tracks_train[0:500]`. Although the output won't be valid for the ML experiments, you can verify that the shapes of the output matrices and vectors are correct. 

**Another Hint:** This code will save feature sets and labels to your computer so it won't need to be re-computed if not necessary. 

In [7]:
# THIS CODE IS PROVIDED FOR YOU

# Change this to True if you want to load prevously-computed features
load_saved_tests = True

if not load_saved_tests:
    test_set, test_labels = create_feature_set(tracks_test)
    print("Test Set: " + str(test_set.shape))
    train_set, train_labels = create_feature_set(tracks_train)
    print("Training Set: " + str(train_set.shape))
    validate_set, validate_labels = create_feature_set(tracks_validate)
    print("Validation Set: " + str(validate_set.shape))
    np.savetxt('test_set.csv', test_set, delimiter=',')
    np.savetxt('test_labels.csv', test_labels, delimiter=',')
    np.savetxt('train_set.csv', train_set, delimiter=',')
    np.savetxt('train_labels.csv', train_labels, delimiter=',')
    np.savetxt('validate_set.csv', validate_set, delimiter=',')
    np.savetxt('validate_labels.csv', validate_labels, delimiter=',')
else:
    test_set = np.loadtxt('test_set.csv',delimiter=',')
    test_labels = np.loadtxt('test_labels.csv',delimiter=',')
    train_set = np.loadtxt('train_set.csv',delimiter=',')
    train_labels = np.loadtxt('train_labels.csv',delimiter=',')
    validate_set = np.loadtxt('validate_set.csv',delimiter=',')
    validate_labels = np.loadtxt('validate_labels.csv',delimiter=',')
    print("Test Set: " + str(test_set.shape))
    print("Training Set: " + str(train_set.shape))
    print("Validation Set: " + str(validate_set.shape))

Test Set: (12236, 38)
Training Set: (5841, 38)
Validation Set: (3494, 38)


## Part 2b: Normalize Feature Sets

Using `get_stats()` find the mean and standard deviations for the training set. Then use those statistics to make all 3 data sets have a mean of 0 and standard deviation of 1.

In [8]:
# Obtaining mean and standard deviation of the training set
mean, std = get_stats( train_set )

# Standardizes a data_set using a given mean and standard deviation
def standardize( data_set, mean, std ):  
    data_set_norm = ( data_set - mean ) / std
    return data_set_norm

# Using the mean and standard deviation of the traning set to normalize all three sets
train_set_norm = standardize( train_set, mean, std )
test_set_norm = standardize( test_set, mean, std )
validate_set_norm = standardize( validate_set, mean, std )

## Part 3: k-Nearest Neighbor

Using the data from part 1, run a kNN classification experienment:

- Use `sklearn` entirely
- Run tests on the validation set with k = 1, 5, 20, and 50
- When you decide on the best settings (best f-measure), run the experiment on the test set and output the f-measure and a confusion matrix.

In [9]:
# The k that gives the best f-score on the validation set is 1. 
k = 1

# Create a new classifier by using neighbors.KNeighborsClassifier()
KN_classifier = neighbors.KNeighborsClassifier( n_neighbors = k )

# "Fit" the training data to the model.
KN_classifier.fit( train_set_norm, train_labels )

# Get prediction data
validation_predict = KN_classifier.predict( validate_set_norm )

# Compute f1 score from validation set
f_measure = f1_score(validate_labels, validation_predict, average = 'weighted')
f_measure

0.7444470859111986

In [10]:
# ------------------------------------- #
#    Running predictions on test set    #
# ------------------------------------- #
test_predict_knn = KN_classifier.predict( test_set_norm )

f_meaure_knn = f1_score( test_labels, test_predict_knn, average = 'weighted' )

print("F-measure: ", f_meaure_knn)

con_matrix_knn = confusion_matrix(test_labels, test_predict_knn)
print("Confusion matrix: ")
print(con_matrix_knn)

F-measure:  0.5720595894974918
Confusion matrix: 
[[ 323   23   10   31  198    4    5  138]
 [   0  847    2    0   70    2    4   30]
 [   0    2  964    0   50   12    0  114]
 [ 345   56  105  569  866   17  137 1072]
 [   1   19    2    1 2550    0    0   36]
 [   0   83    3    0  133   19    6   81]
 [  44    1    1    6   29   43  200   82]
 [  85   50    3   80  572    5   12 2093]]


## Part 4: Multi-Layered Perceptron (Neural Network)

Using the same data, run the same test using the MLP classifier.

- Use `sklearn` entirely
- Run tests on the validation set to experiment with the number of iterations and size and number of hidden layers.
- Initially, try setting `max_iter=100` and `hidden_layer_sizes=(5 2))` (meaning 2 hiddden layers of size 5 and 2.
- When you decide on the best settings (best f-measure), run the experiment on the test set and output the f-measure and a confusion matrix.


In [11]:
# I've found that just a single layer gives me consistantly good results on the validation set
MLP = neural_network.MLPClassifier( hidden_layer_sizes=(8), 
                                   max_iter=1000, verbose = False).fit(train_set_norm, train_labels)

In [12]:
y_predict = MLP.predict( validate_set_norm )

f_meaure_validate = f1_score( validate_labels, y_predict, average = 'weighted')
f_meaure_validate

0.7811419769374388

In [13]:
# ------------------------------------- #
#    Running predictions on test set    #
# ------------------------------------- #
test_predict_MLP = MLP.predict( test_set_norm )

f_meaure_MLP = f1_score( test_labels, test_predict_MLP, average = 'weighted')

print("F-measure: ", f_meaure_MLP )
con_matrix_mlp = confusion_matrix(test_labels, test_predict_MLP)
print("Confusion matrix: ")
print(con_matrix_mlp)

F-measure:  0.6099499461646267
Confusion matrix: 
[[ 357   12   11   22  249    0    6   75]
 [   2  884    5    0   48    1    1   14]
 [  28    2  941    3   21    5    1  141]
 [1229   36   84  488  843    3  166  318]
 [   2   14    0    0 2593    0    0    0]
 [  40  120    3    0   60    7    8   87]
 [  43    0    0   10   12    7  297   37]
 [ 121  152   22   52  137    1    1 2414]]


## Part 5: Analysis

For each machine learning method:

1. Predict the labels for the test set using hyperparameters from the validation set
2. Compute & print the f-measures
3. Compute and print the confusion matrix

For each method, report on the following:

4. Which instrument class has the best & worst performance?
5. For the worst source, what other sources are commonly confused? Why?
6. Listen to the audio for examples the classifier got wrong. What do they have in common?

## KNN Analysis

In [14]:
# The k that gives the best f-score on the validation set is 1. 
k = 1

# Create a new classifier by using neighbors.KNeighborsClassifier()
KN_classifier = neighbors.KNeighborsClassifier( n_neighbors = k )

# "Fit" the training data to the model.
KN_classifier.fit( train_set_norm, train_labels )

# Get prediction data
validation_predict = KN_classifier.predict( validate_set_norm )

# Compute f1 score from validation set
f_measure_weighted = f1_score( validate_labels, validation_predict, average = 'weighted' )
print("f-measure weighted", f_measure_weighted, "\n")
f_measure_by_label = f1_score( validate_labels, validation_predict, average = None )
print("f-measure by category: ", f_measure_by_label)

f-measure weighted 0.7444470859111986 

f-measure by category:  [0.42857143 0.90441932 0.88073394 0.32586558 0.84392564 0.125
 0.77777778 0.74167508]


In [15]:
# ------------------------------------- #
#    Running predictions on test set    #
# ------------------------------------- #
test_predict_knn = KN_classifier.predict( test_set_norm )

# Printing F-measure
f_meaure_knn_weighted = f1_score( test_labels, test_predict_knn, average = 'weighted' )
print("F-measure (weighted): ", f_meaure_knn_weighted)
f_meaure_knn_category = f1_score( test_labels, test_predict_knn, average = None )
print("F-measure (by category): ", f_meaure_knn_category)

# Printing the confusion matrix
con_matrix_knn = confusion_matrix(test_labels, test_predict_knn)
print("Confusion matrix: ")
index = ["Clarinet (true)", "Distorted Electric Guitar (true)", "Female Singer (true)", "Flute (true)", "Piano (true)", "Tenor Saxophone (true)", "Trumpet (true)", "Violin (true)"]
columns = ["Clarinet (predict)", "Distorted Electric Guitar (predict)", "Female Singer (predict)", "Flute (predict)", "Piano (predict)", "Tenor Saxophone (predict)", "Trumpet (predict)", "Violin (predict)"]
pd.DataFrame(con_matrix_knn, index = index, columns = columns )

F-measure (weighted):  0.5720595894974918
F-measure (by category):  [0.42222222 0.83202358 0.86379928 0.29527763 0.72064434 0.08899297
 0.51948052 0.63947449]
Confusion matrix: 


Unnamed: 0,Clarinet (predict),Distorted Electric Guitar (predict),Female Singer (predict),Flute (predict),Piano (predict),Tenor Saxophone (predict),Trumpet (predict),Violin (predict)
Clarinet (true),323,23,10,31,198,4,5,138
Distorted Electric Guitar (true),0,847,2,0,70,2,4,30
Female Singer (true),0,2,964,0,50,12,0,114
Flute (true),345,56,105,569,866,17,137,1072
Piano (true),1,19,2,1,2550,0,0,36
Tenor Saxophone (true),0,83,3,0,133,19,6,81
Trumpet (true),44,1,1,6,29,43,200,82
Violin (true),85,50,3,80,572,5,12,2093


#### Which instrument class has the best & worst performance?##

`Best: Distorted Electric Guitar. It also does well on Female Singer and Piano.`

`Worst: Tenor Saxaphone. KNN doesn't seem to do well on wind instruments.`

#### For the worst source, what other sources are commonly confused? Why?

`For Tenor Saxophone, it's often predicted to be piano, distorted electric guitar, and violin. Perhaps the way data is represented ( summary statistics of summary statistics ) blurs away some harmonic information? It can also be the relative small data for tenor sax, and the comparative large size of piano samples.`

#### Listen to the audio for examples the classifier got wrong. What do they have in common?`

`For Violin and Distorted guitar and Tenor Sax, they all have very rich higher harmonic content, which could be confusing to the KNN.`

## MLP Analysis

In [18]:
# I've found that just a single hidden layer gives me consistantly good results on the validation set
MLP = neural_network.MLPClassifier( hidden_layer_sizes=(8), 
                                   max_iter=1000, verbose = False).fit(train_set_norm, train_labels)

validation_predict = MLP.predict( validate_set_norm )

# Compute f1 score from validation set
f_measure_weighted = f1_score( validate_labels, validation_predict, average = 'weighted' )
print("f-measure weighted", f_measure_weighted)
f_measure_by_label = f1_score( validate_labels, validation_predict, average = None )
print("f-measure by category: ", f_measure_by_label)

f-measure weighted 0.8326423143101185
f-measure by category:  [0.53181077 0.92768595 0.86666667 0.36856369 0.92728115 0.26923077
 0.66       0.91231433]


In [19]:
# ------------------------------------- #
#    Running predictions on test set    #
# ------------------------------------- #
test_predict_MLP = MLP.predict( test_set_norm )

# Printing F-measure
f_meaure_MLP_weighted = f1_score( test_labels, test_predict_MLP, average = 'weighted')
print("F-measure (weighted): ", f_meaure_MLP )
f_meaure_MLP_category = f1_score( test_labels, test_predict_MLP, average = None)
print("F-measure (by category): ", f_meaure_MLP_category )

con_matrix_mlp = confusion_matrix(test_labels, test_predict_MLP)
# Printing the confusion matrix
print("MLP Confusion matrix: ")
index = ["Clarinet (true)", "Distorted Electric Guitar (true)", "Female Singer (true)", "Flute (true)", "Piano (true)", "Tenor Saxophone (true)", "Trumpet (true)", "Violin (true)"]
columns = ["Clarinet (predict)", "Distorted Electric Guitar (predict)", "Female Singer (predict)", "Flute (predict)", "Piano (predict)", "Tenor Saxophone (predict)", "Trumpet (predict)", "Violin (predict)"]
pd.DataFrame(con_matrix_mlp, index = index, columns = columns )


F-measure (weighted):  0.6099499461646267
F-measure (by category):  [0.26285714 0.85380117 0.83368961 0.47333486 0.75745178 0.22105263
 0.64970646 0.84461289]
MLP Confusion matrix: 


Unnamed: 0,Clarinet (predict),Distorted Electric Guitar (predict),Female Singer (predict),Flute (predict),Piano (predict),Tenor Saxophone (predict),Trumpet (predict),Violin (predict)
Clarinet (true),253,14,24,38,333,0,20,50
Distorted Electric Guitar (true),0,876,5,0,58,7,1,8
Female Singer (true),26,2,975,4,9,0,0,126
Flute (true),674,44,188,1034,916,2,240,69
Piano (true),3,13,1,0,2592,0,0,0
Tenor Saxophone (true),12,80,1,0,132,42,10,48
Trumpet (true),42,0,1,9,19,3,332,0
Violin (true),183,68,2,117,176,1,13,2340


#### Which instrument class has the best & worst performance?##

`Best: Piano. It also does well on Distorted Electric Guitar, Violin, and Female Singer`

`Worst: Tenor Sax and Clarinet. MLP also doesn't do well on wind instruments as well`

#### For the worst source, what other sources are commonly confused? Why?

`For Clarinet, it's most commonly confused with Piano, and Violin. Again, This could be a result of the massive data size of piano samples, which contributes to a large variety of musicalities and playing styles, as well as frequency range. All of these factors can contribute to mis-classification. `

`For Tenor Sax, MLP performs similarly poor, compared to KNN. There are multiple explanations for this. First, the way audio data is represented in the classifier (Summary statistics of summary statics) might obfuscate certain important information. Second, The wind instruments have a lot of harmonic contents that are similar. They also have a considerable amount of reverb in the mix. Lastly, in comparason to piano samples, which contains a large variety of playing styles and frequency ranges, other information might be confused with instrument classification, such as musical patterns etc.`

#### Listen to the audio for examples the classifier got wrong. What do they have in common?`

`Between the clarinet, tenor sax, and the violin, both samples contain a similar amount of reverb. The playing style also frequently include vibrato. Lastly, when it comes to lower notes, they resemble a similar timbre. These factors could be a reason for how the classifier can get them wrong.`