# Pipeline Evaluation

We noticed that the output of the drum_extraction function varies slightly from run to run, which in turn changes the dataframe produced by the drum_to_frame function. It is therefore not feasible to provide manually created labels for every possible outcome of drum_to_frame. Instead, we pre-processed 2 songs, converted each into a dataframe with drum_to_frame, and manually transcribed them for evaluation.

To avoid potential copyright issues, we also hide the song names and label them song_1 and song_2.

In [1]:
# To begin evaluation, we first need to load the dataframe from pickles
import pandas as pd
df_song_1=pd.read_pickle('song_1.pkl')
df_song_2=pd.read_pickle('song_2.pkl')
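
Before running the evaluation, it can help to confirm that both dataframes carry the columns the later cells rely on: `audio_clip` plus the manually transcribed label columns `SD_T` through `CC_T`. The helper below is a minimal sketch of such a check. `missing_columns` and the two constants are hypothetical names introduced here for illustration; they are not part of the original pipeline.

```python
# Hypothetical helper, not part of the original pipeline.
DRUM_HITS = ['SD', 'HH', 'KD', 'RC', 'TT', 'CC']
# Columns used by the later cells: the audio clips and one manual label per drum type
REQUIRED_COLUMNS = ['audio_clip'] + [f'{hit}_T' for hit in DRUM_HITS]

def missing_columns(columns, required=REQUIRED_COLUMNS):
    """Return the required column names that are absent from `columns`."""
    present = set(columns)
    return [name for name in required if name not in present]
```

For example, `missing_columns(df_song_1.columns)` should return an empty list if the pickle has the expected layout.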

In [2]:
# Load the pre-trained model
from tensorflow import keras
model = keras.models.load_model('../inference/pretrained_models/annoteators/complete_network.h5')

In [30]:
# Define a function here to convert the df.audio_clip into mel-frequency spectrogram, and make the prediction
import librosa
import numpy as np

def evaluate(df, song_sampling_rate):
    df=df.copy()
    pred_x = []

    for i in range(df.shape[0]):
        pred_x.append(librosa.feature.melspectrogram(y=df.audio_clip.iloc[i], 
                                                 sr=song_sampling_rate, n_mels=128, fmax=8000))
        
    X = np.array(pred_x)
    X = X.reshape(X.shape[0],X.shape[1],X.shape[2],1)
    result = []
    pred_raw=model.predict(X)
    
    pred = np.round(pred_raw)

    for i in range(pred_raw.shape[0]):
        prediction = pred[i]
        if sum(prediction) == 0:
            raw = pred_raw[i]
            new = np.zeros(6)
            ind = raw.argmax()
            new[ind] = 1
            result.append(new)
        else:
            result.append(prediction)

    result = np.array(result)

    drum_hits = ['SD','HH','KD','RC','TT','CC']
    prediction = pd.DataFrame(result, columns = drum_hits)
    
    # Align the predictions with the input rows by position, then join them column-wise.
    # Dropping the old index keeps the alignment correct even if df has a non-default index.
    df = df.reset_index(drop=True)
    result = pd.concat([df, prediction], axis=1)
    
    return result
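
The fallback rule inside `evaluate` (round every score, and if a clip ends up with no predicted hit at all, mark the single highest-scoring drum) can be illustrated in isolation. The following is a minimal vectorized sketch on made-up scores. `round_with_argmax_fallback` is a hypothetical name, and the sketch assumes the model outputs scores in the range 0 to 1.

```python
import numpy as np

def round_with_argmax_fallback(pred_raw):
    """Round scores to 0/1; rows that round to all zeros get their top-scoring class set to 1."""
    pred_raw = np.asarray(pred_raw, dtype=float)
    pred = np.round(pred_raw)
    # Indices of rows where no class reached the rounding threshold
    rows = np.flatnonzero(pred.sum(axis=1) == 0)
    # For those rows only, switch on the class with the highest raw score
    pred[rows, pred_raw[rows].argmax(axis=1)] = 1
    return pred

# Made-up scores for two clips and three classes
toy = np.array([[0.9, 0.2, 0.7],
                [0.1, 0.4, 0.3]])
labels = round_with_argmax_fallback(toy)
```

The first row keeps both of its confident hits, while the second row, which would otherwise be empty, receives a single hit at its highest score. As a consequence, every clip is assigned at least one drum type.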

In [40]:
# Then we derive precision, recall, accuracy and F1 for each drum type from its confusion matrix
from sklearn.metrics import confusion_matrix
pred_df = evaluate(df_song_1, 44100)

c_matrix = []

for drum in ['SD', 'HH', 'KD', 'RC', 'TT', 'CC']:
    # The <drum>_T column (manual transcription) is passed as y_true, the <drum> column as y_pred
    tn, fp, fn, tp = confusion_matrix(pred_df[drum + '_T'], pred_df[drum]).ravel()
    # precision is NaN when the model never predicts this drum (tp + fp == 0)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    accuracy = (tp + tn) / (tn + fp + fn + tp)
    f_1 = (2 * tp) / (2 * tp + fp + fn)
    c_matrix.append([drum, round(precision, 2), round(recall, 2), round(accuracy, 2), round(f_1, 2)])


In [43]:
pd.DataFrame(c_matrix, columns=['Drum type', 'precision', 'recall', 'accuracy', 'F1'])

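| | Drum type | precision | recall | accuracy | F1 |
|---|---|---|---|---|---|
| 0 | SD | 0.5 | 0.49 | 0.86 | 0.5 |
| 1 | HH | 0.9 | 0.6 | 0.66 | 0.72 |
| 2 | KD | 0.94 | 0.74 | 0.88 | 0.83 |
| 3 | RC | 0.0 | 0.0 | 0.99 | 0.0 |
| 4 | TT | 0.55 | 0.6 | 0.92 | 0.57 |
| 5 | CC | NaN | 0.0 | 0.99 | 0.0 |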
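
The missing precision value for CC follows from the formula tp / (tp + fp): when the model never predicts a drum type, both tp and fp are zero and the ratio is undefined. One way to make that case explicit is sketched below. `safe_ratio` and `binary_metrics` are hypothetical helpers, and the counts in the example are made up for illustration; they are not taken from song_1.

```python
def safe_ratio(numerator, denominator):
    """Return numerator / denominator, or None when the denominator is zero."""
    return numerator / denominator if denominator else None

def binary_metrics(tn, fp, fn, tp):
    """Compute the four metrics from confusion-matrix counts, with None for undefined ratios."""
    return {
        'precision': safe_ratio(tp, tp + fp),
        'recall': safe_ratio(tp, tp + fn),
        'accuracy': safe_ratio(tp + tn, tn + fp + fn + tp),
        'F1': safe_ratio(2 * tp, 2 * tp + fp + fn),
    }

# Made-up counts: the drum is labeled twice but never predicted
example = binary_metrics(tn=98, fp=0, fn=2, tp=0)
```

With these counts, precision is undefined (`None`) while recall and F1 are 0 and accuracy stays high, the same pattern as the CC row. Accuracy is dominated by true negatives for rare drum types, so it says little about how well those hits are detected.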
