<a href="https://colab.research.google.com/github/DrRC81/unikore/blob/main/Lab.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Preliminaries

Import the libraries

In [None]:
#accessing google drive folders
from google.colab import drive

#generic libraries
import pandas as pd
import numpy as np
import os
import glob as gl
import librosa
import pickle
import sys
import time
import matplotlib.pyplot as plt
import shutil

#svm
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import accuracy_score
from sklearn.svm import SVC

#tensorflow (cnn)
import tensorflow as tf
from tensorflow.keras import layers, models, optimizers, callbacks
from tensorflow import keras

#confusion matrix, recall and precision, roc
from sklearn.metrics import confusion_matrix, roc_auc_score, roc_curve
from sklearn.metrics import ConfusionMatrixDisplay, RocCurveDisplay, precision_recall_curve, PrecisionRecallDisplay

# Database analysis
First of all, we have to mount the Drive folder where the database is located.




In [None]:
drive.mount('/content/drive')

Mounted at /content/drive


The database is located in the directory /content/drive/MyDrive/pc-gita/PC-GITA_per_task_44100Hz/

Let's have a look at the file metadata.xlsx, which collects all the metadata about the patients. To read it, we use the pandas library.

In [None]:
metadata = pd.read_excel('/content/drive/MyDrive/pc-gita/metadata.xlsx')
metadata.head()

Unnamed: 0,ID,RECODING ORIGINAL NAME,UPDRS,UPDRS-speech,H/Y,SEX,AGE,time after diagnosis
0,A0001,AVPEPUDEA0001,28.0,1.0,2.0,M,64,3.0
1,A0002,AVPEPUDEA0002,19.0,0.0,1.0,F,72,2.5
2,A0003,AVPEPUDEA0003,52.0,2.0,3.0,F,75,3.0
3,A0005,AVPEPUDEA0005,32.0,1.0,2.0,M,65,12.0
4,A0006,AVPEPUDEA0006,28.0,1.0,2.0,F,66,4.0


About the columns:

* RECODING ORIGINAL NAME: file name of the recording
* UPDRS: Unified Parkinson's Disease Rating Scale score
* UPDRS-speech: Parkinson's disease rating scale related to speech
* H/Y: Hoehn & Yahr scale, a Parkinson's disease rating scale

SEX, AGE and time after diagnosis are self-explanatory.

We can take advantage of pandas to compute some summary statistics.

In [None]:
print('Entries total: ' + str(len(metadata)))
print('Age average: ' + str(metadata['AGE'].mean()) + '\n')

print(metadata['SEX'].value_counts())
print('\n')

print('Age average by sex\n')
print(metadata.groupby('SEX')['AGE'].mean())

# Database refining

It is convenient to build a csv file that stores the paths of all files in the database. We first replace all blank spaces with underscores to get clean file paths (otherwise the disvoice features cannot be loaded).

Note: looking at the IDs, not all the speakers are allowed into the training dataset, so we need to renumber the IDs to get a uniform distribution across the k folds.

In [None]:
# Rename a folder and its subfolders by replacing spaces with underscores

path = "/content/drive/MyDrive/pc-gita/PC-GITA_per_task_44100Hz"

# Split the path into parent directory and folder name
parent_dir, folder_name = os.path.split(path)

# Replace spaces with underscores in the folder name
new_folder_name = folder_name.replace(" ", "_")

# Create the new path by joining the parent directory and new folder name
new_folder_path = os.path.join(parent_dir, new_folder_name)

# Rename the main folder
os.rename(path, new_folder_path)

# Traverse the subfolders bottom-up and rename them. With the default top-down
# order, os.walk would still try to descend into the old (renamed) folder names.
for root, dirs, files in os.walk(new_folder_path, topdown=False):
    for directory in dirs:
        # Create the path for the subfolder
        subfolder_path = os.path.join(root, directory)

        # Replace spaces with underscores in the subfolder name
        new_subfolder_name = directory.replace(" ", "_")

        # Create the new path for the renamed subfolder
        new_subfolder_path = os.path.join(root, new_subfolder_name)

        # Rename the subfolder
        os.rename(subfolder_path, new_subfolder_path)

        # Print the original and renamed subfolder paths
        print(subfolder_path + " > " + new_subfolder_path)
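
Renaming directories while walking them is easy to get wrong, so it can help to try the logic on a throwaway tree before touching the Drive folder. The following is a minimal sketch, assuming a hypothetical folder layout created in a temporary directory; the folder names and the helper `replace_spaces_in_dirs` are illustrative and not part of the database. It walks bottom-up so that a rename never changes a path that still has to be visited.

```python
import os
import tempfile


def replace_spaces_in_dirs(top):
    """Rename every sub-directory under `top`, replacing spaces with underscores."""
    renamed = []
    # topdown=False yields the deepest directories first, so renaming a
    # directory never invalidates a path that os.walk still has to visit.
    for root, dirs, _files in os.walk(top, topdown=False):
        for name in dirs:
            new_name = name.replace(" ", "_")
            if new_name != name:
                os.rename(os.path.join(root, name), os.path.join(root, new_name))
                renamed.append((name, new_name))
    return renamed


# Hypothetical directory tree, created in a temporary folder for the demo.
demo_root = tempfile.mkdtemp()
os.makedirs(os.path.join(demo_root, "DDK analysis", "sin normalizar", "PD group"))
renamed = replace_spaces_in_dirs(demo_root)

# Collect any directory name that still contains a space (should be none).
remaining = [d for _r, dirs, _f in os.walk(demo_root) for d in dirs if " " in d]
print(sorted(renamed), remaining)
```

Skipping names that contain no space also avoids a pointless `os.rename` call for every already clean folder.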


# Setting up the csv file of all recordings

We collect all waveforms according to their category. Namely there are four groups: ...

In [None]:
# Get file paths using glob for different categories

# Get file paths for DDK analysis category
ddk = np.array(gl.glob('/content/drive/MyDrive/pc-gita/PC-GITA_per_task_44100Hz/DDK_analysis/*/*/*/*.wav'))

# Get file paths for Vowels category
vowels = np.array(gl.glob('/content/drive/MyDrive/pc-gita/PC-GITA_per_task_44100Hz/Vowels/*/*/*.wav'))

# Get file paths for Words category
words = np.array(gl.glob('/content/drive/MyDrive/pc-gita/PC-GITA_per_task_44100Hz/Words/*/*/*/*.wav'))

# Get file paths for Sentences category
sentences = np.array(gl.glob('/content/drive/MyDrive/pc-gita/PC-GITA_per_task_44100Hz/sentences/*/*/*/*.wav'))

# Get file paths for Monologues category
monologues = np.array(gl.glob('/content/drive/MyDrive/pc-gita/PC-GITA_per_task_44100Hz/monologue/*/*/*.wav'))

We create a csv file with the metadata of all wav files:

* set: the group of the wav file, such as DDK, sentences...
* text: the sentence, consonant-vowel sequence or word spoken in the wav file
* group: PD or HC, i.e. speaker with the disease or healthy control
* id: the speaker id
* cnn-label: 0/1 according to group (0 for HC, 1 for PD)
* svm-label: -1/1 according to group (-1 for HC, 1 for PD)
* librosa: will be filled with the corresponding feature file made with librosa
* disvoice: will be filled with the corresponding feature file made with disvoice
* kfold: the partition used to run a k-fold training
* sets: train, validation and test grouping used for the cnn training
* predict, accuracy: these two columns keep track of the prediction and accuracy values in the cnn test phase, so that we can build the confusion matrix, the precision and recall and the ROC curve graphs

In [None]:
# Generate a CSV file with information from file paths

# Create an empty list to store the data
o = []

# Define the header for the CSV file
header = ['allowed', 'set', 'text', 'group', 'id_temp', 'id', 'path', 'cnn-label', 'svm-label', 'librosa', 'disvoice', 'kfold', 'sets', 'predict', 'accuracy']

# Append the header to the data list
o.append(header)

# Process file paths for DDK analysis category
for idx in range(len(ddk)):
    # Split the file path using "/" as the separator
    row = ddk[idx].split("/")
    # Append the processed information to the data list
    o.append(['', 'ddk', row[7], row[9], '', row[10], ddk[idx], '', '', '', '', '', '', '', ''])

# Process file paths for Vowels category
for idx in range(len(vowels)):
    # Split the file path using "/" as the separator
    row = vowels[idx].split("/")
    # Append the processed information to the data list
    o.append(['', 'vowels', row[8], row[5], '', row[9], vowels[idx], '', '', '', '', '', '', '', ''])

# Process file paths for Words category
for idx in range(len(words)):
    # Split the file path using "/" as the separator
    row = words[idx].split("/")
    # Append the processed information to the data list
    o.append(['', 'words', row[9], row[8], '', row[10], words[idx], '', '', '', '', '', '', '', ''])

# Process file paths for Sentences category
for idx in range(len(sentences)):
    # Split the file path using "/" as the separator
    row = sentences[idx].split("/")
    # Append the processed information to the data list
    o.append(['', 'sentences', row[7], row[9], '', row[10], sentences[idx], '', '', '', '', '', '', '', ''])

# Process file paths for Monologues category
for idx in range(len(monologues)):
    # Split the file path using "/" as the separator
    row = monologues[idx].split("/")
    # Append the processed information to the data list
    o.append(['', 'monologues', row[6], row[8], '', row[9], monologues[idx], '', '', '', '', '', '', '', ''])

# Create a DataFrame from the data list, excluding the header row
df = pd.DataFrame(o[1:], columns=o[0])

# Modify values in certain columns of the DataFrame
df['group'] = df['path'].apply(lambda x: 'pd' if ('PD' in x or 'Patologica' in x or 'pd' in x or 'Patologicas' in x) else 'hc')
df['id'] = df['id'].apply(lambda x: x[8:14] if ('AC' in x) else x[8:13])
df['cnn-label'] = df['group'].apply(lambda x: 1 if 'pd' in x else 0)   # 1 for disease, 0 for control
df['svm-label'] = df['group'].apply(lambda x: 1 if 'pd' in x else -1)  # 1 for disease, -1 for control

# allow the only ids listed in metadata file
allowed_ids = list(metadata['RECODING ORIGINAL NAME'])
df['allowed'] = df['id'].apply(lambda x: 'yes' if 'AVPEPUDE' + x in allowed_ids else 'no')

#set a progressive id with no missing ids numeration
df['id_temp'] = df['id'].apply(lambda x :
                                   metadata.loc[metadata.loc[metadata['RECODING ORIGINAL NAME'] == 'AVPEPUDE' + x].index.item(), 'ID']
                                   if set(df.loc[df['id']==x]['allowed']) == {'yes'} else '')

# Add k-fold labels: we have 100 speakers in 2 groups, PD and HC. To run a k-fold training, we make 10 groups that do not share any speaker.
for idx in range(len(df)):
  if df.loc[idx]['allowed'] == 'no':
    continue
  ordinal = int(df.loc[idx]['id_temp'][-4:])
  df.at[idx, 'kfold'] =  str(ordinal % 10)

df = df.drop('id_temp', axis=1)

# Save the DataFrame to a CSV file
header = ['allowed', 'set', 'text', 'group', 'id', 'path', 'cnn-label', 'svm-label', 'librosa', 'disvoice', 'kfold', 'sets', 'predict', 'accuracy']
df.to_csv('/content/drive/MyDrive/pc-gita/wav.csv', sep=",", index=False, header=header)
df.head()
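
The fold assignment above relies on the numeric suffix of the progressive ID taken modulo 10, which keeps all recordings of a speaker in the same fold. A minimal sketch of that idea, assuming hypothetical IDs of the form `A0001` to `A0100` (the helper `fold_of` and the ID list are illustrative; in the notebook the IDs come from the metadata `ID` column):

```python
from collections import defaultdict

N_FOLDS = 10


def fold_of(progressive_id):
    """Fold index from the numeric suffix of a progressive ID such as 'A0012'."""
    return int(progressive_id[-4:]) % N_FOLDS


# Hypothetical progressive IDs with no gaps in the numbering.
speaker_ids = [f"A{n:04d}" for n in range(1, 101)]

folds = defaultdict(set)
for speaker in speaker_ids:
    folds[fold_of(speaker)].add(speaker)

# With gap-free numbering every fold receives the same number of speakers.
fold_sizes = {k: len(v) for k, v in sorted(folds.items())}
print(fold_sizes)
```

Because the fold depends only on the speaker ID, no speaker can appear in both the training and the test part of a fold; gaps in the numbering would instead make the folds uneven, which is why the IDs are renumbered first.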


# Feature extraction

We will take advantage of two different sets of features: those computed by the librosa library, which we will use to train a neural network, and those computed by disvoice, which we will use to train an SVM.

## Librosa

This script extracts audio features using the librosa library and saves them as files. Here's a breakdown of the steps:

1. The variable `features_path` is set to the directory path where the extracted features will be stored.
2. The directory for storing the features is created using `os.makedirs()` if it doesn't already exist.
3. The script iterates over each file path in the 'path' column of the DataFrame.
4. The file name is extracted from the file path using `os.path.basename()`, and the extension is replaced with '.feat'.
5. The index of the current file path is retrieved from the DataFrame.
6. The 'librosa' column of the DataFrame is updated with the path to the extracted features file.
7. A message indicating the file being processed is printed.
8. If the file already exists, the script skips further processing for that file.
9. The audio file is loaded using `librosa.load()` with a sampling rate of 44100.
10. The audio is resampled to 16kHz and normalized.
11. The mel spectrogram is computed using `librosa.feature.melspectrogram()`, and the MFCC features are computed from the mel spectrogram using `librosa.feature.mfcc()`.
12. A temporary DataFrame is created from the extracted features.
13. The features are saved as a CSV file with `df_temp.to_csv()`.
14. The updated DataFrame is saved to the original CSV file.

Overall, the script processes each audio file, extracts features, and saves them as separate files in the specified directory.
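
Two details of steps 10 and 11 are worth checking in isolation: peak normalization divides by the maximum absolute amplitude, which is zero for a silent recording, and the number of feature frames follows from the hop length. The sketch below uses only NumPy; the helpers `peak_normalize` and `expected_frames`, the `eps` threshold and the test tone are assumptions made for illustration, and the frame formula assumes librosa's default centered framing.

```python
import numpy as np


def peak_normalize(y, eps=1e-12):
    """Scale a waveform to a peak amplitude of 1, leaving silent input unchanged."""
    peak = np.max(np.abs(y))
    if peak < eps:  # assumption: treat near-zero peaks as silence
        return y.copy()
    return y / peak


def expected_frames(n_samples, hop_length=160):
    """Frame count for centered framing: 1 + n_samples // hop_length."""
    return 1 + n_samples // hop_length


# One second of a 220 Hz test tone sampled at 16 kHz (hypothetical input).
tone = 0.25 * np.sin(2 * np.pi * 220 * np.arange(16000) / 16000)
normalized = peak_normalize(tone)
silent = peak_normalize(np.zeros(16000))

print(np.max(np.abs(normalized)), expected_frames(len(tone)))
```

With a hop of 160 samples at 16 kHz there are about 100 frames per second, so each saved feature file holds a 13-row matrix with roughly 100 columns per second of audio.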

In [None]:
# Extract audio features using librosa and save them as files

# Define the path to store the extracted features
features_path = '/content/drive/MyDrive/pc-gita/features/librosa/'

# Create the directory for storing the features if it doesn't exist
os.makedirs(features_path, exist_ok=True)

# Iterate over each file path in the 'path' column of the DataFrame
for file_path in df['path']:

    # Extract the file name from the file path and replace the extension with '.feat'
    file_name = os.path.basename(file_path)
    file_name = file_name.replace('.wav','.feat')

    # Get the index label of the current file path in the DataFrame
    idx = df.index[df['path'] == file_path][0]

    # Update the 'librosa' column of the DataFrame with the path of the extracted features
    df.loc[idx, 'librosa'] = features_path + file_name

    # Skip processing if the file already exists
    if os.path.exists(features_path + file_name):
        print('skipping: ' +  file_name)
        continue
    else:
        print("making: " +  file_name)

    # Load the audio file using librosa
    y, fs = librosa.load(file_path, sr=44100)

    # Resample the audio to 16kHz and normalize the amplitude
    y16 = librosa.resample(y, orig_sr=44100, target_sr=16000)
    y16 = y16 / np.max(np.abs(y16))

    # Compute the mel spectrogram (on the resampled, normalized signal) and MFCC features
    S = librosa.feature.melspectrogram(y=y16, sr=16000, n_fft=1024, hop_length=160, win_length=400, n_mels=80)
    feat = librosa.feature.mfcc(S=librosa.power_to_db(S), n_mfcc=13)

    # Create a temporary DataFrame from the extracted features
    df_temp = pd.DataFrame(data=np.around(feat, 3), index=None)

    # Save the features as a CSV file
    df_temp.to_csv(features_path + file_name, index=False, header=False)

# Save the updated DataFrame to the original CSV file
df.to_csv('/content/drive/MyDrive/pc-gita/wav.csv', sep=",", index=False, header=df.columns)


In [None]:
df= pd.read_csv('/content/drive/MyDrive/pc-gita/wav.csv')
df.head()

## Disvoice

We first have to install praat, then the disvoice library. All installations are temporary and available until the end of the session. (link to the paper or its title)

Installing praat

In [None]:
!apt-get install praat --fix-missing

Installing disvoice

In [None]:
!pip install disvoice

Let's find the location of articulation.py, the script that generates the features.

In [None]:
def find_files(filename, search_path):
    result = []

    # Walking top-down from the root
    for root, dirs, files in os.walk(search_path):
        if filename in files:
            result.append(os.path.join(root, filename))
    return result

print(find_files("articulation.py", "../../"))

To extract the features, we need to run the script extract_features.py from the directory where articulation.py is located.

In [None]:
%cd /usr/local/lib/python3.10/dist-packages/disvoice/articulation/

In [None]:
!cp /content/drive/MyDrive/pc-gita/extract_features.py /usr/local/lib/python3.10/dist-packages/disvoice/articulation/
!ls -al

This script performs the following tasks:

1. Imports the required libraries: numpy, os, and pandas.
2. Imports the `Articulation` class from an external module.
3. Defines the paths for storing the disvoice features and the existing librosa features.
4. Creates the directory for storing the disvoice features using `os.makedirs()`.
5. Reads the DataFrame containing information about the audio files from a CSV file.
6. Initializes an instance of the `Articulation` class for feature extraction.
7. Iterates over each file path in the 'path' column of the DataFrame.
8. Extracts the file name from the file path using `os.path.basename()`.
9. Retrieves the index of the current file path in the DataFrame.
10. Updates the 'disvoice' column of the DataFrame with the path to the extracted disvoice features file.
11. Checks if the disvoice features file already exists. If so, skips further processing for that file.
12. Calls the `extract_features_file()` method of the `Articulation` class to extract disvoice features from the file.
13. Saves the disvoice features as a numpy array using `np.save()`.
14. Saves the updated DataFrame to the original CSV file.

Overall, the script processes each audio file, extracts disvoice features using the `Articulation` class, and saves them as separate numpy files. The DataFrame is updated with the paths to the disvoice features files.

In [None]:
!cat /content/drive/MyDrive/pc-gita/extract_features.py

In [None]:
!python extract_features.py

In [None]:
df= pd.read_csv('/content/drive/MyDrive/pc-gita/wav.csv')
df.head()

Unnamed: 0,allowed,set,text,group,id,path,cnn-label,svm-label,librosa,disvoice,kfold,sets,predict,accuracy
0,yes,ddk,pakata,pd,A0031,/content/drive/MyDrive/pc-gita/PC-GITA_per_tas...,1,1,/content/drive/MyDrive/pc-gita/features/libros...,,1.0,,,
1,yes,ddk,pakata,pd,A0013,/content/drive/MyDrive/pc-gita/PC-GITA_per_tas...,1,1,/content/drive/MyDrive/pc-gita/features/libros...,,3.0,,,
2,yes,ddk,pakata,pd,A0032,/content/drive/MyDrive/pc-gita/PC-GITA_per_tas...,1,1,/content/drive/MyDrive/pc-gita/features/libros...,,2.0,,,
3,yes,ddk,pakata,pd,A0054,/content/drive/MyDrive/pc-gita/PC-GITA_per_tas...,1,1,/content/drive/MyDrive/pc-gita/features/libros...,,9.0,,,
4,yes,ddk,pakata,pd,A0008,/content/drive/MyDrive/pc-gita/PC-GITA_per_tas...,1,1,/content/drive/MyDrive/pc-gita/features/libros...,,8.0,,,


# SVM

We build the train and test file lists, with the matching label lists, for each of the 10 groups.

In [None]:
df = pd.read_csv('/content/drive/MyDrive/pc-gita/wav.csv')

groups = []

for idx in range(10):

    train_list = df.loc[(df['set'] != 'vowels') & (df['kfold'] != idx)]['disvoice'].values.tolist()
    train_lab = df.loc[(df['set'] != 'vowels') & (df['kfold'] != idx)]['svm-label'].values.tolist()

    test_list = df.loc[(df['set'] != 'vowels') & (df['kfold'] == idx)]['disvoice'].values.tolist()
    test_lab = df.loc[(df['set'] != 'vowels') & (df['kfold'] == idx)]['svm-label'].values.tolist()

    #################################################################################################

    print("\nMaking group: " + str(idx))
    print("Train list files: " +  str(len(train_list)) + " - Train list lab: " +  str(len(train_lab)))
    print("Test list files: " +  str(len(test_list)) + " - Test list lab: " +  str(len(test_lab)))

    X_train = []

    i = 0
    for idx in range(len(train_list)):
        print("File " + str(i) + " out of " + str(len(train_list)-1), end='\r', flush=True)
        X_train.append(np.nan_to_num(np.hstack(np.load(train_list[idx]).T), nan=0.0))
        i+=1
    print("")
    y_train = train_lab

    X_test = []
    i = 0
    for idx in range(len(test_list)):
        print("File " + str(i) + " out of " + str(len(test_list)-1), end='\r', flush=True)
        X_test.append(np.nan_to_num(np.hstack(np.load(test_list[idx]).T), nan=0.0))
        i+=1
    print("")
    y_test = test_lab

    #################################################################################################

    groups.append({'X_train': X_train,
                    'y_train': y_train,
                    'X_test': X_test,
                    'y_test': y_test})
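
Each disvoice file is turned into a single vector by stacking the columns of its feature matrix and replacing NaN values with zeros. A minimal sketch of that transformation on a hypothetical 2x3 matrix (the helper `to_feature_vector` and the values are illustrative only):

```python
import numpy as np


def to_feature_vector(features):
    """Flatten a (rows, columns) matrix column by column and zero out NaNs."""
    flat = np.hstack(features.T)  # same ordering as features.ravel(order="F")
    return np.nan_to_num(flat, nan=0.0)


# Hypothetical feature matrix with one missing value.
demo = np.array([[1.0, 2.0, np.nan],
                 [4.0, 5.0, 6.0]])
vector = to_feature_vector(demo)
print(vector)
```

The SVM expects every sample to have the same length, so this flattening only works if all feature files share the same shape.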

In [None]:
# try with less data and let the execution run to the end

kernels = ['linear', 'poly', 'rbf', 'sigmoid', 'precomputed']
kernel = kernels[0]

output = []
header = ['group', 'kernel', 'C', 'gamma', 'accuracy_train', 'accuracy_test']

accuracy_best = 0
factors = [0.001, 0.01, 0.1, 1, 10, 100, 1000]

for C in factors:
    for gamma in factors:

        accuracies_test_list = []
        accuracies_train_list = []

        for idx in range(10):

            print(".", end='.', flush=True)

            X_train = groups[idx]['X_train']
            y_train = groups[idx]['y_train']
            X_test = groups[idx]['X_test']
            y_test = groups[idx]['y_test']

            ################################################################################################
            svm = make_pipeline(StandardScaler(), SVC(C = C, gamma = gamma, kernel=kernel, probability=False,
                                                      random_state=42, verbose=False, max_iter = -1))
            svm.fit(X_train, y_train)

            with open('model_' + str(idx) + "_kernel_"+ kernel + "_" + "C_" + str(C) + "_gamma_" + str(gamma) + '.pkl','wb') as f:
                pickle.dump(svm, f)

            #################################################################################################

            print("")

            print("Starting predict for group: " + str(idx))

            y_pred = svm.predict(X_train)
            accuracy_train = accuracy_score(y_train, y_pred)
            print("Train accuracy:" + str(accuracy_train))

            y_pred = svm.predict(X_test)
            accuracy_test = accuracy_score(y_test, y_pred)
            print("Test accuracy:" + str(accuracy_test))

            output.append([idx, kernel, C, gamma, accuracy_train, accuracy_test])

            accuracies_train_list.append(accuracy_train)
            accuracies_test_list.append(accuracy_test)

            print(str(idx) + ": Accuracy train: " + str(accuracy_train) + " Accuracy test: " + str(accuracy_test))

            # Use a separate name so that the wav.csv DataFrame `df` is not overwritten
            results_df = pd.DataFrame(data = np.array(output), index = None)
            results_df.to_csv('svm_predict.csv',index=False, header = header)

        accuracy_test_mean = sum(accuracies_test_list)/len(accuracies_test_list)
        accuracy_train_mean = sum(accuracies_train_list)/len(accuracies_train_list)

        if accuracy_test_mean > accuracy_best:
            print('Accuracy train: ' + str(accuracy_train_mean) +   ', Accuracy test mean:' + str(accuracy_test_mean)+ " for C_" + str(C) + "_gamma_" + str(gamma) + ', last accuracy: ' + str(accuracy_best))
            accuracy_best = accuracy_test_mean

        print('Accuracy train mean:' + str(accuracy_train_mean)+ ', Accuracy test mean:' + str(accuracy_test_mean) + " for C_" + str(C) + "_gamma_" + str(gamma) + ', last accuracy: ' + str(accuracy_best))

        output.append([10, kernel, C, gamma, accuracy_train_mean, accuracy_test_mean])
        results_df = pd.DataFrame(data = np.array(output), index = None)
        results_df.to_csv('svm_predict.csv',index=False, header = header)
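
The grid search keeps the (C, gamma) pair with the highest test accuracy averaged over the folds. The selection step alone can be sketched as follows, assuming hypothetical per-fold records (the accuracy values are made up for illustration and are not results of this notebook):

```python
from collections import defaultdict
from statistics import mean

# Hypothetical per-fold records: (fold, C, gamma, test_accuracy).
records = [
    (0, 0.1, 0.01, 0.70), (1, 0.1, 0.01, 0.74),
    (0, 1.0, 0.01, 0.80), (1, 1.0, 0.01, 0.76),
    (0, 10.0, 0.01, 0.78), (1, 10.0, 0.01, 0.72),
]

# Group the fold accuracies by hyperparameter pair.
by_params = defaultdict(list)
for _fold, C, gamma, acc in records:
    by_params[(C, gamma)].append(acc)

# Average over the folds and keep the best pair.
mean_acc = {params: mean(accs) for params, accs in by_params.items()}
best_params = max(mean_acc, key=mean_acc.get)
print(best_params, round(mean_acc[best_params], 3))
```

Selecting hyperparameters on the same folds used to report accuracy gives an optimistic estimate, so a separate held-out set is a reasonable addition when the data allow it.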




# CNN

The model

In [None]:
def conv2d_feature(seq_len, feat_dim, n_channels):
    i =  keras.layers.Input(shape=(seq_len,feat_dim, n_channels))
    h2 = keras.layers.Conv2D(64, 5, strides=2, padding='same', use_bias=True)(i)
    h2 = keras.layers.BatchNormalization()(h2)
    h2 = keras.layers.Activation('relu')(h2)
    h2 = keras.layers.Conv2D(64, 5, strides=1, padding='same', use_bias=True)(h2)
    h2 = keras.layers.BatchNormalization()(h2)
    h2 = keras.layers.Activation('relu')(h2)
    h2 = keras.layers.SpatialDropout2D(0.2)(h2, training=False)
    h2 = keras.layers.Conv2D(64, 5, strides=2, padding='same', use_bias=True)(h2)
    h2 = keras.layers.BatchNormalization()(h2)
    h2 = keras.layers.Activation('relu')(h2)
    h2 = keras.layers.SpatialDropout2D(0.2)(h2, training=False)
    h2 = keras.layers.Conv2D(64, 3, strides=1, padding='same', use_bias=True, activation='relu')(h2)
    h3 = keras.layers.GlobalMaxPooling2D()(h2)

    o =  keras.layers.Dense(1, 'sigmoid')(h3)

    model = keras.models.Model(i, o)

    return model
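
With `padding='same'`, a convolution with stride `s` maps an axis of length `n` to `ceil(n / s)`. The short sketch below follows the feature axis (13 MFCC coefficients) through the four convolutions of the model; the helper name `same_padding_out` is illustrative.

```python
import math


def same_padding_out(size, stride):
    """Output length of a convolution with padding='same': ceil(size / stride)."""
    return math.ceil(size / stride)


feat_dim = 13            # MFCC coefficients per frame
strides = [2, 1, 2, 1]   # strides of the four Conv2D layers above
sizes = [feat_dim]
for s in strides:
    sizes.append(same_padding_out(sizes[-1], s))
print(sizes)  # [13, 7, 7, 4, 4]
```

The time axis shrinks by the same factors, and `GlobalMaxPooling2D` then removes both axes, which is what lets the model accept sequences of any length.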

The following class is used by TensorFlow to load the data in batches.

In [None]:
class DataGenFeat(keras.utils.Sequence):
    'Generates data for Keras'
    def __init__(self, list_file, list_lab, batch_size=32, n_channels=1,
                 shuffle=True, features_number = 0):
        'Initialization'

        # self.dim = dim
        self.batch_size = batch_size
        self.list_lab = list_lab
        self.list_file = list_file
        self.n_channels = n_channels
        self.shuffle = shuffle
        self.features_number = features_number

        self.indexes = np.arange(len(self.list_file))
        self.on_epoch_end()

    # TODO: add normalization

    def __len__(self):
        'Denotes the number of batches per epoch'
        if self.batch_size == 0:
            return 0
        else:
            return int(np.floor(len(self.list_file) / self.batch_size))

    def __getitem__(self, index):
        'Generate one batch of data'
        # Generate indexes of the batch
        indexes = self.indexes[index*self.batch_size:(index+1)*self.batch_size]

        # Find list of IDs
        list_files_temp = [self.list_file[k] for k in indexes]
        list_labels_temp = [self.list_lab[k] for k in indexes]

        # Generate data
        X, Y = self.__data_generation(list_files_temp, list_labels_temp)

        return X, Y

    def on_epoch_end(self):
        'Updates indexes after each epoch'
        if self.shuffle == True:
            np.random.shuffle(self.indexes)

    def __data_generation(self, list_files_temp, list_labels_temp):
        'Generates data containing batch_size samples' # X : (n_samples, *dim, n_channels)
        # Initialization
        x_tmp = []
        max_len = 0
        features_number = 0

        # Generate data
        for i_file in list_files_temp:
            # Store sample
            feat_tmp = pd.read_csv(i_file, header=None)

            sig_feat = feat_tmp.to_numpy()

            max_len = np.max([max_len, sig_feat.shape[1]])

            x_tmp.append(sig_feat.T)

        X = np.zeros((len(list_files_temp), max_len, self.features_number, self.n_channels))
        for idx in range(len(list_files_temp)):
            X[idx, 0:x_tmp[idx].shape[0], :, 0] = x_tmp[idx]
            # x_tmp[idx] has shape (frames, features), so the deltas are taken along axis 0 (time)
            X[idx, 0:x_tmp[idx].shape[0], :, 1] = librosa.feature.delta(x_tmp[idx], axis=0)
            X[idx, 0:x_tmp[idx].shape[0], :, 2] = librosa.feature.delta(x_tmp[idx], order=2, axis=0)


        Y = np.hstack(list_labels_temp)

        return X, Y
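
The core of the generator is the zero-padding of variable-length feature sequences into one batch tensor with three channels (static, delta, delta-delta). A minimal NumPy-only sketch of that step follows; it swaps `librosa.feature.delta` for `np.gradient` as a stand-in, and the helper `pad_batch` and the random input are assumptions made for illustration.

```python
import numpy as np


def pad_batch(sequences, n_features):
    """Zero-pad (frames, n_features) arrays into a (batch, max_frames, n_features, 3) tensor."""
    max_len = max(seq.shape[0] for seq in sequences)
    batch = np.zeros((len(sequences), max_len, n_features, 3))
    for i, seq in enumerate(sequences):
        n = seq.shape[0]
        d1 = np.gradient(seq, axis=0)  # stand-in for librosa.feature.delta
        d2 = np.gradient(d1, axis=0)   # stand-in for the order-2 delta
        batch[i, :n, :, 0] = seq
        batch[i, :n, :, 1] = d1
        batch[i, :n, :, 2] = d2
    return batch


rng = np.random.default_rng(0)
# Two hypothetical MFCC sequences of 5 and 8 frames.
sequences = [rng.normal(size=(5, 13)), rng.normal(size=(8, 13))]
batch = pad_batch(sequences, n_features=13)
print(batch.shape)
```

Padding to the longest sequence in the batch, rather than in the whole dataset, keeps the tensors small, at the cost of a different time length for every batch.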


Training

We now fill the 'sets' column of our csv file to group the speakers into train, validation and test sets.

In [None]:
df.to_csv('/content/drive/MyDrive/pc-gita/wav.csv', sep=",", index=False, header = df.columns)
df.head()

In [None]:
train_ids = ['A0005','A0006','A0007','A0008','A0009','A0010','A0011','A0013','A0014','A0015',
             'A0017','A0021','A0022','A0023','A0025','A0026','A0029','A0030','A0031','A0032',
             'A0034','A0039','A0041','A0042','A0043','A0045','A0046','A0049','A0050','A0051',
             'A0052','A0053','A0054','A0055','A0056','A0057','A0058','A0059',
             'AC0001','AC0003','AC0004','AC0005','AC0006','AC0007','AC0008','AC0010','AC0013',
             'AC0014','AC0015','AC0016','AC0017','AC0018','AC0019','AC0020','AC0022','AC0023',
             'AC0024','AC0025','AC0026','AC0027','AC0028','AC0031','AC0033','AC0034','AC0035',
             'AC0039','AC0040','AC0042','AC0043','AC0045','AC0046','AC0047','AC0048','AC0049',
             'AC0050','AC0051','AC0053','AC0057']

validation_ids = ['A0038','A0035','A0037','A0047','A0003','AC0052','AC0037','AC0021','AC0029','AC0011']

test_ids = ['A0024','A0027','A0016','A0048','A0020','AC0044','AC0041','AC0054','AC0030','AC0012']

df['sets'] = df['id'].apply(lambda x: 'train' if (x in train_ids) else 'valid' if (x in validation_ids) else 'test')

df.to_csv('/content/drive/MyDrive/pc-gita/wav.csv', sep=",", index=False, header = df.columns)
df.head()

Unnamed: 0,allowed,set,text,group,id,path,cnn-label,svm-label,librosa,disvoice,kfold,sets,predict,accuracy
0,yes,ddk,pakata,pd,A0031,/content/drive/MyDrive/pc-gita/PC-GITA_per_tas...,1,1,/content/drive/MyDrive/pc-gita/features/libros...,,1.0,train,,
1,yes,ddk,pakata,pd,A0013,/content/drive/MyDrive/pc-gita/PC-GITA_per_tas...,1,1,/content/drive/MyDrive/pc-gita/features/libros...,,3.0,train,,
2,yes,ddk,pakata,pd,A0032,/content/drive/MyDrive/pc-gita/PC-GITA_per_tas...,1,1,/content/drive/MyDrive/pc-gita/features/libros...,,2.0,train,,
3,yes,ddk,pakata,pd,A0054,/content/drive/MyDrive/pc-gita/PC-GITA_per_tas...,1,1,/content/drive/MyDrive/pc-gita/features/libros...,,9.0,train,,
4,yes,ddk,pakata,pd,A0008,/content/drive/MyDrive/pc-gita/PC-GITA_per_tas...,1,1,/content/drive/MyDrive/pc-gita/features/libros...,,8.0,train,,
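
Note that the lambda above labels as 'test' every speaker that is in neither the train nor the validation list, whether or not the speaker appears in `test_ids`. A minimal sketch of a stricter assignment, using shortened ID lists for illustration (the helper `assign_split` and the 'unassigned' label are assumptions, not part of the notebook):

```python
def assign_split(speaker_id, train_ids, valid_ids, test_ids):
    """Return the split of a speaker, or 'unassigned' if it is in none of the lists."""
    if speaker_id in train_ids:
        return "train"
    if speaker_id in valid_ids:
        return "valid"
    if speaker_id in test_ids:
        return "test"
    return "unassigned"


# Shortened ID lists for illustration.
train_ids = {"A0005", "A0006", "AC0001"}
valid_ids = {"A0038", "AC0052"}
test_ids = {"A0024", "AC0044"}

# No speaker should belong to more than one split.
overlap = (train_ids & valid_ids) | (train_ids & test_ids) | (valid_ids & test_ids)
splits = [assign_split(s, train_ids, valid_ids, test_ids)
          for s in ["A0005", "AC0052", "A0024", "A0001"]]
print(overlap, splits)
```

Checking that the overlap is empty guards against the same speaker leaking from the training set into the validation or test set.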


We now extract the train, validation and test lists of features, applying further filters on 'set' if necessary.

In [None]:
df = pd.read_csv('/content/drive/MyDrive/pc-gita/wav.csv')

train_list = df.loc[df['sets'] == 'train']['librosa'].values.tolist()
valid_list = df.loc[df['sets'] == 'valid']['librosa'].values.tolist()
test_list = df.loc[df['sets'] == 'test']['librosa'].values.tolist()

train_lab = df.loc[df['sets'] == 'train']['cnn-label'].values.tolist()
valid_lab = df.loc[df['sets'] == 'valid']['cnn-label'].values.tolist()
test_lab = df.loc[df['sets'] == 'test']['cnn-label'].values.tolist()

We now set the parameters and compile the model.

In [None]:
# model parameters
features_number = 13
n_channels = 3

# training parameters
training_learning_rate = 0.0001
training_factor = 0.5
training_early_stop_patience = 3
training_lr_patience = 3
training_batches_size = 64
training_workers = 8
training_use_multiprocessing = True
training_epochs = 1000
training_verbose = 1
training_shuffle = True

os.environ["CUDA_VISIBLE_DEVICES"] = '0' # 0 for GPU # core = '-1' # for CPU
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '3'

#################################################################

train_generator = DataGenFeat(
    train_list, train_lab,
    n_channels = n_channels,
    batch_size = training_batches_size,
    shuffle = training_shuffle,
    features_number= features_number)

valid_generator = DataGenFeat(
    valid_list, valid_lab,
    n_channels = n_channels,
    batch_size = training_batches_size,
    shuffle = training_shuffle,
    features_number= features_number)

#################################################################

model = conv2d_feature(seq_len = None, feat_dim = features_number , n_channels = n_channels)

model.compile(optimizer=optimizers.Adam(learning_rate=training_learning_rate),
            loss='binary_crossentropy',
            metrics=['accuracy'])

model.summary()

lr = callbacks.ReduceLROnPlateau(monitor = 'val_accuracy', factor=training_factor, patience=training_lr_patience, verbose=training_verbose, min_lr=0.000001, mode='max')
early_stop = callbacks.EarlyStopping(monitor = 'val_accuracy', patience=training_early_stop_patience, mode = 'max', restore_best_weights=True)

Model: "model"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 input_1 (InputLayer)        [(None, None, 13, 3)]     0         
                                                                 
 conv2d (Conv2D)             (None, None, 7, 64)       4864      
                                                                 
 batch_normalization (Batch  (None, None, 7, 64)       256       
 Normalization)                                                  
                                                                 
 activation (Activation)     (None, None, 7, 64)       0         
                                                                 
 conv2d_1 (Conv2D)           (None, None, 7, 64)       102464    
                                                                 
 batch_normalization_1 (Bat  (None, None, 7, 64)       256       
 chNormalization)                                            

We are now ready to fit the model (and save it to an h5 file, with the training data stored in history).

In [None]:
training_save_model_to = ''  + 'model.h5' # to edit
training_save_history_to = '' # to edit

#################################################################

hist = model.fit(train_generator, validation_data = valid_generator,
                batch_size = training_batches_size,
                epochs = training_epochs,
                verbose = training_verbose,
                callbacks = [lr,  early_stop],
                shuffle = training_shuffle,
                workers = training_workers,
                use_multiprocessing = training_use_multiprocessing)

#################################################################

# model.save(training_save_model_to)

#################################################################

# train_loss = hist.history['loss'][len(hist.history['loss'])-1]
# train_acc = hist.history['accuracy'][len(hist.history['accuracy'])-1]
# val_loss = hist.history['val_loss'][len(hist.history['val_loss'])-1]
# val_acc = hist.history['val_accuracy'][len(hist.history['val_accuracy'])-1]

#################################################################

# f = open(training_save_history_to + 'history.py', 'a')
# f.write('hist = ' + str(hist.history))
# f.close()

Epoch 1/1000
 3/64 [>.............................] - ETA: 6:50 - loss: 2.0068 - accuracy: 0.4948

ResourceExhaustedError: ignored

Test

In [None]:
# test parameters
eval_batches_size = 8
eval_workers = 8
eval_use_multiprocessing = True
eval_epochs = 1000
eval_verbose = 1

#################################################################
# load the model in case of asynchronous execution; for instance, the pretrained model
# '/content/drive/MyDrive/pretrained_model.h5' is available

o = []
o.append(['model', 'loss', 'accuracy'])

for idx in range(6):
    model = models.load_model('/content/drive/MyDrive/pc-gita'+str(idx)+'_model.h5')

    test_list = df.loc[(df['allowed'] == 'yes') & (df['set'] != 'vowels') & (df['kfold'] == idx)]['librosa'].values.tolist()
    test_lab = df.loc[(df['allowed'] == 'yes') & (df['set'] != 'vowels') & (df['kfold'] == idx)]['cnn-label'].values.tolist()

    #################################################################

    if (len(test_list) < eval_batches_size):
        eval_batches_size = len(test_list)

    test_generator = DataGenFeat(
        test_list, test_lab,
        n_channels = n_channels,
        batch_size = eval_batches_size, #config.batch_size,
        shuffle = False,
        features_number= features_number)

    try:
        hist = model.evaluate(  test_generator, #x = X_eval, y = Y_eval,
                                batch_size = eval_batches_size,
                                verbose = eval_verbose,
                                workers = eval_workers,
                                use_multiprocessing = eval_use_multiprocessing)

        o.append([idx, np.around(hist[0], 3), np.around(hist[1], 3)])
    except Exception as e:
        print("An exception occurred while evaluating model " + str(idx) + ": " + str(e))

print(o)


# Confusion matrix, precision and recall, ROC curve

Assuming all test data (prediction and accuracy) have been collected into the dataframe read from wav.csv, we can compute the confusion matrix, the precision and recall and the ROC curve. The script below fills the two columns with random placeholder values; skip it if you have already collected the data.

In [None]:
import random

df = pd.read_csv('/content/drive/MyDrive/pc-gita/wav.csv')

# unbalanced random numbers
choices = [0, 1]
weights = [0.2, 0.8]  # Higher probability for 1

df['predict'] = df['predict'].apply(lambda x: random.choices(choices, weights)[0])
df['accuracy'] = df['accuracy'].apply(lambda x: random.random())

df.head()

Load the data from df to perform computation

In [None]:
label = np.array(df['cnn-label'], dtype=int)
predict = np.array(df['predict'], dtype=float)
# Named `score` so that it does not shadow sklearn's accuracy_score function
score = np.array(df['accuracy'], dtype=float)

Confusion matrix

In [None]:
cm = confusion_matrix(y_true = label, y_pred = predict)
cm_display = ConfusionMatrixDisplay(cm).plot()
#plt.savefig('./cm.png')


Precision and Recall

In [None]:
prec, recall, _ = precision_recall_curve(label, score, pos_label=None)
pr_display = PrecisionRecallDisplay(precision=prec, recall=recall).plot()
#plt.savefig('./pr.png')

ROC

In [None]:
fpr, tpr, thresholds = roc_curve(y_true = label, y_score = score, pos_label = None)
roc_display = RocCurveDisplay(fpr=fpr, tpr=tpr).plot()
#plt.savefig('./roc.png')
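
To make the quantities behind the plots concrete, the confusion-matrix counts and the resulting precision and recall can be worked out by hand. A minimal sketch on hypothetical labels and predictions (the values are made up for illustration; 1 stands for PD and 0 for HC, as in the cnn-label column):

```python
def binary_counts(y_true, y_pred):
    """Return (tn, fp, fn, tp) for binary labels in {0, 1}."""
    pairs = list(zip(y_true, y_pred))
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    return tn, fp, fn, tp


# Hypothetical ground truth and predictions.
y_true = [1, 1, 1, 0, 0, 1, 0, 1]
y_pred = [1, 0, 1, 0, 1, 1, 0, 1]

tn, fp, fn, tp = binary_counts(y_true, y_pred)
precision = tp / (tp + fp)  # share of predicted PD that is really PD
recall = tp / (tp + fn)     # share of real PD that is detected
print((tn, fp, fn, tp), precision, recall)
```

The precision-recall and ROC curves repeat this computation for every threshold applied to the continuous score, which is why they need scores rather than hard 0/1 predictions.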