<a href="https://colab.research.google.com/github/ageroul/music_emotion_recognition_cnn/blob/main/music_emotion_recognition_cnn.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Code for classification experiments (A, B, and part of C) with Deep Neural Networks
as described in the paper titled "Emotion recognition in music using deep neural networks"



# Importing libraries and installing packages


In [None]:
%matplotlib inline

In [None]:
!pip install torch-lr-finder 

In [None]:
from __future__ import print_function 
from __future__ import division
import torch
import torch.nn as nn
import torch.optim as optim
import numpy as np
import torchvision
from torchvision import datasets, models, transforms
from  torch.utils.data import WeightedRandomSampler
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.metrics import classification_report
from sklearn.metrics  import confusion_matrix
import time
import os
import copy
import pandas as pd
import PIL.Image as Image
from torch.optim import lr_scheduler
from torch.optim.lr_scheduler import ReduceLROnPlateau 
try:
  from torch_lr_finder import LRFinder
except ImportError:
  # fall back to a local copy of the package in the parent directory
  import sys
  sys.path.insert(0, '..')
  from torch_lr_finder import LRFinder

print("PyTorch Version: ",torch.__version__)
print("Torchvision Version: ",torchvision.__version__)

PyTorch Version:  1.8.1+cu101
Torchvision Version:  0.9.1+cu101


# Input parameters






These are the basic parameters that we adjust for each run:

1.   **data_dir** - The path of the set to load (big-set or 360-set)
2.   **model_name** - The architecture that runs the classification task
3.   **workers** - The number of dataloader workers, i.e. the number of processes that run in parallel to create the batches. We used 2 workers in our experiments, as the overhead on Colab's GPU was prohibitive for more
4.   **num_classes** - The number of classes: 3 (the default) for the tasks with three classes (valence, energy, tension) and 5 for the emotion classification (anger, fear, happy, sad, tender)
5.   **batch_size** - The number of samples in each batch. 32 was the largest value that worked without problems within Colab's GPU memory limits
6.   **num_epochs** - The number of training epochs. All models were trained for 20 epochs for consistency across the experiments
7.   **feature_extract** - Boolean that selects the type of Transfer Learning: if True, only the weights of the classifier are updated (feature extraction); if False, the pre-trained weights of all network layers are fine-tuned


The CNN architectures used in the experiments are listed below:

*   ResNeXt-101-32x8d
*   AlexNet
*   VGG16_bn
*   SqueezeNet 1.0
*   DenseNet 121
*   Inception v3
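
The parameters above can be collected in one place and checked before a long training run starts. The following is a minimal sketch and not part of the original experiments: the `ExperimentConfig` class and the `INPUT_SIZES` table are hypothetical helpers, while the model names, the class counts and the input sizes (299 for Inception v3, 224 for all other models) are the ones used in this notebook.

```python
from dataclasses import dataclass

# Input resolution expected by each architecture in this notebook.
INPUT_SIZES = {
    "resnext": 224,
    "alexnet": 224,
    "vgg": 224,
    "squeezenet": 224,
    "densenet": 224,
    "inception": 299,
}


@dataclass
class ExperimentConfig:
    """Hypothetical container for the notebook's input parameters."""
    model_name: str = "alexnet"
    num_classes: int = 3
    workers: int = 2
    batch_size: int = 32
    num_epochs: int = 20
    feature_extract: bool = False

    def __post_init__(self):
        # Fail early, before any data is loaded or any model is built.
        if self.model_name not in INPUT_SIZES:
            raise ValueError(f"Unknown model name: {self.model_name!r}")
        if self.num_classes not in (3, 5):
            raise ValueError("num_classes must be 3 or 5 for the tasks described here")

    @property
    def input_size(self):
        return INPUT_SIZES[self.model_name]


config = ExperimentConfig(model_name="inception", num_classes=5)
print(config.input_size)
```

A misspelled model name then fails immediately with a `ValueError` instead of surfacing later during model initialization.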

In [None]:

data_dir = "/content/drive/MyDrive/styleGAN2-train&val"

# CNN architectures[resnext, alexnet, vgg, squeezenet, densenet, inception]
model_name = "alexnet"
# 2 workers in all experiments on Colab
workers = 2
# Task dependent number: "emotions" has 5 classes, all others 3 classes
num_classes = 3
# fixed batch size (32)
batch_size = 32
# all models were trained for 20 epochs
num_epochs = 20
# boolean for transfer learning type
feature_extract = False

# Helper functions






The train_model function handles the training and validation phases and has the following inputs:

*   **model** - The selected model for training
*   **dataloaders** - A dictionary with the training and validation DataLoader objects (torch.utils.data), which organize and shuffle the mini-batches of samples
*   **criterion** - The cost function (we have chosen CrossEntropyLoss)
*   **optimizer** - The optimization algorithm (we have chosen SGD with momentum)
*   **scheduler** - The learning rate scheduler, which reduces the learning rate when a selected metric stops improving (we used ReduceLROnPlateau)
*   **num_epochs** - The number of training epochs
*   **is_inception** - Boolean for the Inception v3 model, whose architecture has two outputs during training, so the loss takes both into account




In [None]:
def train_model(model, dataloaders, criterion, optimizer, scheduler, num_epochs=25, is_inception=False):
    since = time.time()

    val_acc_history = []
    epoch_loss_tr = []
    epoch_acc_tr = []
    epoch_loss_val = [] 
    epoch_acc_val = []


    best_model_wts = copy.deepcopy(model.state_dict())
    best_acc = 0.0

    for epoch in range(num_epochs):
        print('Epoch {}/{}'.format(epoch, num_epochs - 1))
        print('-' * 10)
       
        for phase in ['train', 'val']:
            if phase == 'train':
                model.train()  
            else:
                model.eval()   

            running_loss = 0.0
            running_corrects = 0

            predicted_labels = []
            true_labels = []
           

            for inputs, labels in dataloaders[phase]:
                inputs = inputs.to(device)
                labels = labels.to(device)
                
                # resetting the gradients of the current batch
                optimizer.zero_grad()

                # forward pass; gradients are tracked only in the training phase
                with torch.set_grad_enabled(phase == 'train'):
                    # Special case for Inception, which has 2 outputs in the
                    # train phase: both losses are combined to calculate the error
                    if is_inception and phase == 'train':
                        # From https://discuss.pytorch.org/t/how-to-optimize-inception-model-with-auxiliary-classifiers/7958
                        outputs, aux_outputs = model(inputs)
                        loss1 = criterion(outputs, labels)
                        loss2 = criterion(aux_outputs, labels)
                        loss = loss1 + 0.4*loss2
                    # while in the validation phase we only take into account the final output
                    else:
                        outputs = model(inputs)
                        loss = criterion(outputs, labels)

                    # collect the true labels in both branches, so that they
                    # stay aligned with predicted_labels
                    true_labels.extend(labels.cpu().numpy())


                    _, preds = torch.max(outputs, 1)
                    predicted_labels.extend(preds.cpu().numpy()) 

                    
                    if phase == 'train':
                        # backpropagation: calculate the gradients
                        loss.backward()
                        # update the weights
                        optimizer.step()
              

                running_loss += loss.item() * inputs.size(0)
                running_corrects += torch.sum(preds == labels.data)


            if phase == 'train':
                epoch_loss = running_loss / len(dataloaders[phase].dataset)
                epoch_loss_tr.append(epoch_loss)
                # .item() stores a plain Python float, which can be plotted later
                epoch_acc = running_corrects.double().item() / len(dataloaders[phase].dataset)
                epoch_acc_tr.append(epoch_acc)
                print('{} Loss: {:.4f} Acc: {:.4f}'.format(phase, epoch_loss, epoch_acc))

            else:
                epoch_loss = running_loss / len(dataloaders[phase].dataset)
                epoch_loss_val.append(epoch_loss)
                epoch_acc = running_corrects.double().item() / len(dataloaders[phase].dataset)
                epoch_acc_val.append(epoch_acc)
                # scheduler operation according to the values of epoch loss
                scheduler.step(epoch_loss)
                print('{} Loss: {:.4f} Acc: {:.4f}'.format(phase, epoch_loss, epoch_acc))
                # read the learning rate from the optimizer passed to the function
                print('Learning rate at this epoch is:', optimizer.param_groups[0]['lr'])

            # keeping the best model
            if phase == 'val' and epoch_acc > best_acc:
                best_acc = epoch_acc
                best_model_wts = copy.deepcopy(model.state_dict())
                # classification report from scikit learn
                print(classification_report(true_labels, predicted_labels, target_names=class_names))

            if phase == 'val':
                val_acc_history.append(epoch_acc)

        print()

    time_elapsed = time.time() - since
    print('Training complete in {:.0f}m {:.0f}s'.format(time_elapsed // 60, time_elapsed % 60))
    print('Best val Acc: {:.4f}'.format(best_acc))
    
    # loading best model
    model.load_state_dict(best_model_wts)
    return model, val_acc_history, epoch_loss_tr, epoch_acc_tr,epoch_loss_val, \
           epoch_acc_val, predicted_labels, true_labels
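
CrossEntropyLoss returns the mean loss of a batch by default, which is why train_model multiplies `loss.item()` by `inputs.size(0)` before accumulating and divides by the size of the dataset at the end of the epoch. The plain-Python sketch below uses made-up numbers to show why this matters when the last batch is smaller than the others; the `epoch_loss` helper is illustrative only.

```python
def epoch_loss(batch_mean_losses, batch_sizes):
    """Sample-weighted average, accumulated the same way as in train_model."""
    running_loss = 0.0
    for mean_loss, size in zip(batch_mean_losses, batch_sizes):
        running_loss += mean_loss * size
    return running_loss / sum(batch_sizes)


# Illustrative values: two full batches of 32 and a final batch of 6 samples.
losses = [0.5, 0.5, 2.0]
sizes = [32, 32, 6]

weighted = epoch_loss(losses, sizes)
naive = sum(losses) / len(losses)
print(f"sample-weighted: {weighted:.4f}, mean of batch means: {naive:.4f}")
```

The simple mean of the batch means gives the small final batch the same influence as a full one, whereas the sample-weighted average treats every sample equally.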

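The **set_parameter_requires_grad** function is called when we need to "freeze" the pre-trained layers of the network: with param.requires_grad = False, gradients are no longer calculated for these parameters, so they are not updated during training. The new classifier layer is created afterwards, so it remains trainable.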

In [None]:
def set_parameter_requires_grad(model, feature_extracting):
    if feature_extracting:
        for param in model.parameters():
            param.requires_grad = False

# Initialization of the selected Networks



The initialize_model function takes as arguments the name of the model, the number of classes, the type of transfer learning and whether the selected model will be loaded with pre-trained weights. Each architecture requires different handling and configuration of its output layer(s). (Experiment A)


In [None]:
def initialize_model(model_name, num_classes, feature_extract, use_pretrained=True):

    model_ft = None
    input_size = 0

    if model_name == "resnext":
        """ ResNext101_32x8d
        """
        model_ft = models.resnext101_32x8d(pretrained=use_pretrained)
        set_parameter_requires_grad(model_ft, feature_extract)
        num_ftrs = model_ft.fc.in_features
        model_ft.fc = nn.Linear(num_ftrs, num_classes)
        input_size = 224

    elif model_name == "alexnet":
        """ Alexnet
        """
        model_ft = models.alexnet(pretrained=use_pretrained)
        set_parameter_requires_grad(model_ft, feature_extract)
        num_ftrs = model_ft.classifier[6].in_features
        model_ft.classifier[6] = nn.Linear(num_ftrs,num_classes)
        input_size = 224

    elif model_name == "vgg":
        """ VGG16_bn
        """
        model_ft = models.vgg16_bn(pretrained=use_pretrained)
        set_parameter_requires_grad(model_ft, feature_extract)
        num_ftrs = model_ft.classifier[6].in_features
        model_ft.classifier[6] = nn.Linear(num_ftrs,num_classes)
        input_size = 224

    elif model_name == "squeezenet":
        """ Squeezenet
        """
        model_ft = models.squeezenet1_0(pretrained=use_pretrained)
        set_parameter_requires_grad(model_ft, feature_extract)
        model_ft.classifier[1] = nn.Conv2d(512, num_classes, kernel_size=(1,1), stride=(1,1))
        model_ft.num_classes = num_classes
        input_size = 224

    elif model_name == "densenet":
        """ Densenet
        """
        model_ft = models.densenet121(pretrained=use_pretrained)
        set_parameter_requires_grad(model_ft, feature_extract)
        num_ftrs = model_ft.classifier.in_features
        model_ft.classifier = nn.Linear(num_ftrs, num_classes) 
        input_size = 224

    elif model_name == "inception":
        """ Inception v3 
        Be careful, expects (299,299) sized images and has auxiliary output
        """
        model_ft = models.inception_v3(pretrained=use_pretrained)
        set_parameter_requires_grad(model_ft, feature_extract)
        # Handle the auxiliary net
        num_ftrs = model_ft.AuxLogits.fc.in_features
        model_ft.AuxLogits.fc = nn.Linear(num_ftrs, num_classes)
        # Handle the primary net
        num_ftrs = model_ft.fc.in_features
        model_ft.fc = nn.Linear(num_ftrs,num_classes)
        input_size = 299
    else:
        # raise instead of calling exit(), which would stop the notebook kernel
        raise ValueError("Invalid model name: {}".format(model_name))
    
    return model_ft, input_size

# Initialization of the model
model_ft, input_size = initialize_model(model_name, num_classes, feature_extract, use_pretrained=True)


print(model_ft)



The following version of initialize_model loads the models that we have already pre-trained on Big-set and 360-set for Energy and Valence. (Experiment B)

In [None]:
def initialize_model(model_name, num_classes, feature_extract, use_pretrained=True):

    model_ft = None
    input_size = 0

    if model_name == "resnext":
        """ ResNext101_32x8d
        """
        model_ft = models.resnext101_32x8d()
        # pre-trained in big-set & 360-set (Energy,Valence)
        model_ft = torch.load('path of the appropriate .pt file')
        set_parameter_requires_grad(model_ft, feature_extract)
        num_ftrs = model_ft.fc.in_features
        model_ft.fc = nn.Linear(num_ftrs, num_classes)
        input_size = 224

    elif model_name == "alexnet":
        """ Alexnet
        """
        model_ft = models.alexnet()
        # pre-trained in big-set & 360-set (Energy,Valence)
        model_ft = torch.load('path of the appropriate .pt file')
        set_parameter_requires_grad(model_ft, feature_extract)
        num_ftrs = model_ft.classifier[6].in_features
        model_ft.classifier[6] = nn.Linear(num_ftrs,num_classes)
        input_size = 224

    elif model_name == "vgg":
        """ VGG16_bn
        """
        model_ft = models.vgg16_bn()
        # pre-trained in big-set & 360-set (Energy,Valence)
        model_ft = torch.load('path of the appropriate .pt file')
        set_parameter_requires_grad(model_ft, feature_extract)
        num_ftrs = model_ft.classifier[6].in_features
        model_ft.classifier[6] = nn.Linear(num_ftrs,num_classes)
        input_size = 224

    elif model_name == "squeezenet":
        """ Squeezenet
        """
        model_ft = models.squeezenet1_0()
        # pre-trained in big-set & 360-set (Energy,Valence)
        model_ft = torch.load('path of the appropriate .pt file')
        set_parameter_requires_grad(model_ft, feature_extract)
        model_ft.classifier[1] = nn.Conv2d(512, num_classes, kernel_size=(1,1), stride=(1,1))
        model_ft.num_classes = num_classes
        input_size = 224

    elif model_name == "densenet":
        """ Densenet
        """
        model_ft = models.densenet121()
        # pre-trained in big-set & 360-set (Energy,Valence)
        model_ft = torch.load('path of the appropriate .pt file')

        set_parameter_requires_grad(model_ft, feature_extract)
        num_ftrs = model_ft.classifier.in_features
        model_ft.classifier = nn.Linear(num_ftrs, num_classes) 
        input_size = 224

    elif model_name == "inception":
        """ Inception v3 
        Be careful, expects (299,299) sized images and has auxiliary output
        """
        model_ft = models.inception_v3()
        # pre-trained in big-set & 360-set (Energy,Valence)
        model_ft = torch.load('path of the appropriate .pt file')

        set_parameter_requires_grad(model_ft, feature_extract)
        # Handle the auxiliary net
        num_ftrs = model_ft.AuxLogits.fc.in_features
        model_ft.AuxLogits.fc = nn.Linear(num_ftrs, num_classes)
        # Handle the primary net
        num_ftrs = model_ft.fc.in_features
        model_ft.fc = nn.Linear(num_ftrs,num_classes)
        input_size = 299
    else:
        # raise instead of calling exit(), which would stop the notebook kernel
        raise ValueError("Invalid model name: {}".format(model_name))
    
    return model_ft, input_size

# Initialization of the model
model_ft, input_size = initialize_model(model_name, num_classes, feature_extract, use_pretrained=True)


print(model_ft)



---



# Data augmentation, class imbalance handling








Using the torchvision.transforms library we apply transformations to the images (the mel-spectrograms) that augment the training data, in order to avoid overfitting, especially when only a small number of training samples is available. The validation data are only resized, center-cropped and normalized:

*   **RandomResizedCrop** - Crops a random area of the image (0.08 to 1.0 of the original area) with a random aspect ratio (3/4 to 4/3) and resizes the crop to input_size
*   **RandomHorizontalFlip** - Flips the image horizontally with a predefined probability (p=0.5)
*   **Resize** - Resizes the input image so that its shorter side equals the desired dimension (input_size)
*   **CenterCrop** - Crops the central input_size x input_size area of the image
*   **ToTensor** - Converts the image to a tensor with values in the range 0-1, before normalization
*   **Normalize** - Normalizes each channel of the tensor with a specified mean and standard deviation. (This helps to avoid the exploding gradient problem, which can arise when the input values stray far beyond the 0-1 range)
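
To make the random crop more concrete, the sketch below samples a crop rectangle in plain Python in roughly the way RandomResizedCrop does: a target area is drawn from the scale range and an aspect ratio from the ratio range (uniformly in log space). The `sample_crop` function is a hypothetical helper written for illustration. Its fallback is simplified to the whole image, so it should be read as an approximation of the torchvision behaviour rather than a copy of it.

```python
import math
import random


def sample_crop(width, height, scale=(0.08, 1.0), ratio=(3 / 4, 4 / 3), rng=random):
    """Return (left, top, crop_width, crop_height) of a random crop."""
    area = width * height
    for _ in range(10):
        target_area = area * rng.uniform(*scale)
        # The aspect ratio is drawn uniformly in log space.
        aspect = math.exp(rng.uniform(math.log(ratio[0]), math.log(ratio[1])))
        w = round(math.sqrt(target_area * aspect))
        h = round(math.sqrt(target_area / aspect))
        # Accept the candidate only if it fits inside the image.
        if 0 < w <= width and 0 < h <= height:
            left = rng.randint(0, width - w)
            top = rng.randint(0, height - h)
            return left, top, w, h
    # Simplified fallback after repeated failures: use the whole image.
    return 0, 0, width, height


print(sample_crop(640, 480, rng=random.Random(0)))
```

The crop returned here would then be resized to input_size x input_size, so every training image reaches the network at the same resolution.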

WeightedRandomSampler draws the training samples with a probability that is inversely proportional to the frequency of their class, so that all classes are represented approximately equally in the batches.
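
The effect of inverse-frequency weights can be checked with a small plain-Python experiment. The label list below is hypothetical, and `random.choices` stands in for the sampler (weighted sampling with replacement); this is a sketch of the idea, not the sampler's implementation.

```python
import random
from collections import Counter

# Hypothetical, imbalanced label list (class ids as in ImageFolder.targets).
targets = [0] * 80 + [1] * 15 + [2] * 5

class_count = Counter(targets)
# One weight per class: the inverse of its frequency.
class_weights = {c: 1.0 / n for c, n in class_count.items()}
# One weight per sample, looked up from its class.
sample_weights = [class_weights[t] for t in targets]

# Draw with replacement, proportionally to the sample weights.
rng = random.Random(0)
drawn = rng.choices(targets, weights=sample_weights, k=len(targets) * 100)
drawn_share = {c: n / len(drawn) for c, n in sorted(Counter(drawn).items())}
print(drawn_share)
```

Each class carries the same total weight (its count times the inverse of its count), so the three classes are drawn about equally often even though class 0 holds 80% of the samples.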


In [None]:
data_transforms = {
    'train': transforms.Compose([
        transforms.RandomResizedCrop(input_size),
        transforms.RandomHorizontalFlip(),
        transforms.ToTensor(),
        transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
    ]),
    'val': transforms.Compose([
        transforms.Resize(input_size),
        transforms.CenterCrop(input_size),
        transforms.ToTensor(),
        transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
    ]),
}

print("Initializing Datasets and Dataloaders...")

# Create train - val sets
image_datasets = {x: datasets.ImageFolder(os.path.join(data_dir, x), data_transforms[x]) for x in ['train', 'val']}


# The class_to_idx attribute maps the class names to numbers (id's) e.g. 0,1,2,3
# idx2class performs the reverse mapping of class_to_idx
idx2class = {v: k for k, v in image_datasets['train'].class_to_idx.items()}

# The function for getting the distribution of classes in the dataset.
# It counts the labels stored in dataset_obj.targets, so no image has to be loaded.
def get_class_distribution_train(dataset_obj):
  count_dict = {k:0 for k,v in dataset_obj.class_to_idx.items()}
  for y_lbl in dataset_obj.targets:
    count_dict[idx2class[y_lbl]] += 1
  return count_dict

# computed once and reused for the class weights below
class_distribution = get_class_distribution_train(image_datasets['train'])
print("Distribution of classes in train set: \n", class_distribution)

target_list = torch.tensor(image_datasets['train'].targets)
class_count = list(class_distribution.values())

# class weights calculation
class_weights = 1./torch.tensor(class_count, dtype=torch.float)
print(class_weights)

# Weights per class assigned to all samples 
class_weights_all = class_weights[target_list]
print(class_weights_all)

# Passing weights and number of samples to WeightedRandomSampler
sampler_train = WeightedRandomSampler(class_weights_all, \
                num_samples = len(class_weights_all), replacement=True)

# Passing the Sampler to the train set's dataloader
dataloader_train = torch.utils.data.DataLoader(image_datasets['train'], batch_size=batch_size, \
                   shuffle=False, sampler=sampler_train, num_workers=workers)
dataloader_val = torch.utils.data.DataLoader(image_datasets['val'], batch_size=batch_size, \
                 shuffle=True, num_workers=workers)


dataloaders_dict = {'train': dataloader_train, 'val':dataloader_val}

class_names = image_datasets['train'].classes
print(image_datasets['val'].class_to_idx)
print(image_datasets['val'].classes)
print(len(image_datasets['val']))

device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

In [None]:
# Saving the weights for future use in the next classifications of the corresponding set
#torch.save(class_weights_all, 'weights to save as pt files')

In [None]:
# loading class weights
class_weights_all_saved = torch.load('path to the appropriate weight file (pt)')
class_weights_all_saved

Since the class weights were computed and stored during the 1st classification task, they do not need to be recalculated. Therefore:

In [None]:
data_transforms = {
    'train': transforms.Compose([
        transforms.RandomResizedCrop(input_size),
        transforms.RandomHorizontalFlip(),
        transforms.ToTensor(),
        transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
    ]),
    'val': transforms.Compose([
        transforms.Resize(input_size),
        transforms.CenterCrop(input_size),
        transforms.ToTensor(),
        transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
    ]),
}

print("Initializing Datasets and Dataloaders...")

image_datasets = {x: datasets.ImageFolder(os.path.join(data_dir, x), data_transforms[x]) for x in ['train', 'val']}
sampler_train = WeightedRandomSampler(class_weights_all_saved, \
                num_samples = len(class_weights_all_saved), replacement=True)

dataloader_train = torch.utils.data.DataLoader(image_datasets['train'], batch_size=batch_size, shuffle=False, \
                   sampler=sampler_train, num_workers=workers)
dataloader_val = torch.utils.data.DataLoader(image_datasets['val'], batch_size=batch_size, shuffle=True,\
                 num_workers=workers)
dataloaders_dict = {'train': dataloader_train, 'val':dataloader_val}
class_names = image_datasets['train'].classes
print(image_datasets['val'].class_to_idx)

device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

# Sending to GPU and Optimization


The model is moved to the GPU and, depending on the type of training we have selected, the corresponding parameters are updated. If we fine-tune the model, the printed list of parameters will be long (depending on the size of the model). If we update only the classifier weights, the list will be short, showing only the parameters of the classifier layer.

In [None]:
# Send the model to GPU
model_ft = model_ft.to(device)


params_to_update = model_ft.parameters()
print("Params to learn:")
if feature_extract:
    params_to_update = []
    for name,param in model_ft.named_parameters():
        if param.requires_grad == True:
            params_to_update.append(param)
            print("\t",name)
else:
    for name,param in model_ft.named_parameters():
        if param.requires_grad == True:
            print("\t",name)


The optimization algorithm we used in all experiments is Stochastic Gradient Descent (SGD) with momentum. The learning rate is chosen with the help of the LRFinder tool, which increases the learning rate exponentially over 100 iterations, from 0.001 up to 1.0, and records the loss at each step so that a suitable value can be read from the resulting plot. In addition, the ReduceLROnPlateau scheduler lowers the learning rate when the validation loss has not improved for more than 3 consecutive epochs (patience=3).

In [None]:
optimizer_ft = optim.SGD(params_to_update, lr=0.001, momentum=0.9)
scheduler = ReduceLROnPlateau(optimizer_ft, patience=3,  mode='min')
##### learning rate finder: https://github.com/davidtvs/pytorch-lr-finder
lr_finder = LRFinder(model_ft, optimizer_ft, criterion = nn.CrossEntropyLoss(), device=device)
lr_finder.range_test(dataloaders_dict['val'], end_lr=1, num_iter=100, step_mode="exp")    
lr_finder.plot()                 
print(optimizer_ft)
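
An exponential range test visits learning rates that are evenly spaced on a logarithmic scale. The sketch below computes such a sweep in plain Python for the values used here (0.001 to 1.0 in 100 iterations). The `exponential_lr_sweep` function is a hypothetical helper, and the exact formula inside torch-lr-finder may differ in detail, so this only illustrates the general shape of the schedule.

```python
def exponential_lr_sweep(start_lr, end_lr, num_iter):
    """Learning rates spaced evenly on a logarithmic scale."""
    return [
        start_lr * (end_lr / start_lr) ** (i / (num_iter - 1))
        for i in range(num_iter)
    ]


lrs = exponential_lr_sweep(start_lr=0.001, end_lr=1.0, num_iter=100)
print(lrs[0], lrs[49], lrs[-1])
```

Consecutive values differ by a constant factor, so the sweep spends as many iterations between 0.001 and 0.01 as between 0.1 and 1.0.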

Resetting the lr_finder restores the model and the optimizer to their initial state, so that the updates made during the learning rate test do not carry over into the actual training, according to the instructions of the developers of the utility.

In [None]:
lr_finder.reset()

Applying the learning rate value estimated above:

In [None]:
model_ft = model_ft.to(device)

params_to_update = model_ft.parameters()
print("Params to learn:")
if feature_extract:
    params_to_update = []
    for name,param in model_ft.named_parameters():
        if param.requires_grad == True:
            params_to_update.append(param)
            print("\t",name)
else:
    for name,param in model_ft.named_parameters():
        if param.requires_grad == True:
            print("\t",name)
# After finding the ideal value of the learning rate from lr_finder we place it here
optimizer_ft = optim.SGD(params_to_update, lr= 0.001, momentum=0.9)

scheduler = ReduceLROnPlateau(optimizer_ft, patience=3,  mode='min')
print(optimizer_ft)
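
The behaviour of the scheduler with patience=3 can be followed in a small simulation. The sketch below is a simplified plain-Python model of the plateau logic: it ignores the improvement threshold and the cooldown of the real scheduler, and it assumes the reduction factor of 0.1, which is the PyTorch default. The `simulate_plateau` function and the loss values are illustrative only.

```python
def simulate_plateau(losses, lr=0.001, factor=0.1, patience=3):
    """Return the learning rate in effect after each epoch."""
    best = float("inf")
    bad_epochs = 0
    history = []
    for loss in losses:
        if loss < best:
            best = loss
            bad_epochs = 0
        else:
            bad_epochs += 1
        # Reduce only once the number of bad epochs exceeds the patience.
        if bad_epochs > patience:
            lr *= factor
            bad_epochs = 0
        history.append(lr)
    return history


# Made-up validation losses: the loss stalls for four epochs after epoch 2.
val_losses = [1.0, 0.9, 0.95, 0.95, 0.95, 0.95, 0.8]
lr_history = simulate_plateau(val_losses)
print(lr_history)
```

In this run the learning rate stays at its initial value during the first three epochs without improvement and is reduced on the fourth.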

# Visualization
With the imshow function we display the images (Mel-Spectrograms) of one batch from the training set.

In [None]:
def imshow(inp, title=None):
    """Imshow for Tensor."""
    inp = inp.numpy().transpose((1, 2, 0))
    mean = np.array([0.485, 0.456, 0.406])
    std = np.array([0.229, 0.224, 0.225])
    inp = std * inp + mean
    inp = np.clip(inp, 0, 1)
    plt.imshow(inp)
    if title is not None:
        plt.title(title)
    plt.pause(0.001)  # pause a bit so that plots are updated
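
The line `inp = std * inp + mean` in imshow undoes the normalization that was applied by transforms.Normalize, so that the colours are displayed correctly. The round trip can be verified on a single pixel in plain Python; the `normalize` and `denormalize` helpers are written for this illustration only.

```python
MEAN = [0.485, 0.456, 0.406]
STD = [0.229, 0.224, 0.225]


def normalize(pixel):
    """Per-channel (x - mean) / std, the operation of transforms.Normalize."""
    return [(x - m) / s for x, m, s in zip(pixel, MEAN, STD)]


def denormalize(pixel):
    """Inverse operation, as in imshow: std * x + mean."""
    return [s * x + m for x, m, s in zip(pixel, MEAN, STD)]


rgb = [0.2, 0.5, 0.9]  # one pixel after ToTensor, values in [0, 1]
restored = denormalize(normalize(rgb))
print(restored)
```

Up to floating point error, the restored pixel equals the original one, which is why imshow also clips the result to the range 0-1 before plotting.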

In [None]:
inputs, classes = next(iter(dataloaders_dict['train']))

In [None]:
out = torchvision.utils.make_grid(inputs)

In [None]:
imshow(out, title=[class_names[x] for x in classes])

# Starting the training of the network
We call the **train_model** function which starts the training and validation process.

In [None]:
# Loss function
criterion = nn.CrossEntropyLoss()

model_ft, val_acc_history, epoch_loss_tr, epoch_acc_tr,epoch_loss_val, epoch_acc_val,\
predicted_labels, true_labels = train_model(model_ft, dataloaders_dict, criterion, optimizer_ft,\
                                scheduler, num_epochs=num_epochs, is_inception=(model_name=="inception"))


# Model Visualization
We call the visualize_model function to display Mel-Spectrograms from the validation set together with the predictions of the model; with num_images=4, as below, 4 spectrograms are shown per call. Each spectrogram is titled with the predicted label, while the actual label is printed.


In [None]:
def visualize_model(model, num_images=6):
    plt.rcParams['axes.grid'] = False 
    was_training = model.training
    model.eval()
    images_so_far = 0
    plt.figure()

    with torch.no_grad():
        for i, (inputs, labels,) in enumerate(dataloaders_dict['val']):
            inputs = inputs.to(device)
            labels = labels.to(device)

            outputs = model(inputs)
            _, preds = torch.max(outputs, 1)
            for j in range(inputs.shape[0]):
                images_so_far += 1
                print('ground truth: {}'.format(class_names[labels[j]]))
                ax = plt.subplot(2, num_images//2, images_so_far)
                ax.axis('equal')
                ax.set_title('predicted: {}'.format(class_names[preds[j]]))
                X = inputs.cpu().data[j]
                imshow(X)
                if images_so_far == num_images:
                    model.train(mode=was_training)
                    return 
        model.train(mode=was_training)


In [None]:
visualize_model(model_ft, 4)

# Loss and accuracy plots
Using the matplotlib package we plot the loss and the accuracy of the model against the training epochs.


In [None]:
plt.title("Val & Train Loss")
plt.xlabel("Training Epochs")
plt.ylabel("Loss")
plt.plot(range(1,num_epochs+1),epoch_loss_val,label="Validation loss")
plt.plot(range(1,num_epochs+1),epoch_loss_tr,label="Training loss")
plt.legend()
plt.show()

In [None]:
plt.title("Val & Train Accuracy")
plt.xlabel("Training Epochs")
plt.ylabel("Accuracy")
plt.plot(range(1,num_epochs+1), epoch_acc_val, label="Validation accuracy")
plt.plot(range(1,num_epochs+1), epoch_acc_tr, label="Training accuracy")
plt.legend()
plt.show()



---



# Predictions (Classification Report & Confusion Matrix)
Using the sklearn.metrics package we create the classification report and the confusion matrix.

First, with the get_predictions function we get the model predictions (y_pred) and the actual values (y_test), which become the inputs to sklearn's classification report and confusion matrix:

In [None]:
def get_predictions(model, data_loader):
  model = model.eval()
  predictions = []
  real_values = []
  with torch.no_grad():
    # iterate over the loader passed as argument, not over a global one
    for inputs, labels in data_loader:
      inputs = inputs.to(device)
      labels = labels.to(device)

      outputs = model(inputs)
      _, preds = torch.max(outputs, 1)
      predictions.extend(preds)
      real_values.extend(labels)
  predictions = torch.as_tensor(predictions).cpu()
  real_values = torch.as_tensor(real_values).cpu()
  return predictions, real_values

# pass the validation DataLoader itself, not its underlying dataset
y_pred, y_test = get_predictions(model_ft, dataloaders_dict['val'])


In [None]:
def show_confusion_matrix(confusion_matrix, class_names):
  cm = confusion_matrix.copy()
  cell_counts = cm.flatten()
  cm_row_norm = cm /cm.sum(axis=1)[:, np.newaxis]

  row_percentages = ["{0:.2f}".format(value) for value in cm_row_norm.flatten()]

  cell_labels = [f"{cnt}\n{per}" for cnt, per in zip(cell_counts, row_percentages)]
  cell_labels = np.asarray(cell_labels).reshape(cm.shape[0], cm.shape[1])

  df_cm = pd.DataFrame(cm_row_norm, index=class_names, columns=class_names)

  hmap = sns.heatmap(df_cm, annot=cell_labels, fmt="", cmap="Blues")
  hmap.yaxis.set_ticklabels(hmap.yaxis.get_ticklabels(), rotation=0, ha='right')
  hmap.xaxis.set_ticklabels(hmap.xaxis.get_ticklabels(), rotation=30, ha='right')
  plt.ylabel('True labels')
  plt.xlabel('Predicted labels');
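
The heatmap produced by show_confusion_matrix is coloured by the row-normalized matrix, i.e. every row (true class) is divided by its total. The diagonal of that matrix is the recall of each class. A minimal plain-Python sketch with made-up counts follows; the `row_normalize` helper is illustrative and assumes that every class has at least one sample.

```python
def row_normalize(cm):
    """Divide every row of a confusion matrix by its row sum."""
    return [[cell / sum(row) for cell in row] for row in cm]


# Illustrative counts for three classes (rows: true labels, columns: predictions).
cm = [
    [8, 1, 1],
    [2, 6, 2],
    [0, 5, 5],
]

normalized = row_normalize(cm)
recall_per_class = [normalized[i][i] for i in range(len(cm))]
print(recall_per_class)
```

Row normalization makes classes with different numbers of samples directly comparable, which is useful for the imbalanced sets handled in this notebook.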

In [None]:
print(classification_report(y_test, y_pred, target_names=class_names))
cm=confusion_matrix(y_test,y_pred)
show_confusion_matrix(cm,class_names)

# Save and recall the model


After the necessary adjustments for improving the quality of training, the model is stored so that it can be used in the future to classify a new, unknown set.


In [None]:
#torch.save(model_ft, 'path to the appropriate pt file')

In [None]:
 #test_load = torch.load('path to the appropriate pt file')

# Classifying an unknown set (test set)
Once the trained model has been saved, we can load it again and make predictions on a new set. The steps are similar to those for the validation set above.

In [None]:
# input_size = 224 for all, except inception 299
new_input_size = 224
test_set_dir = 'path to the test set'
pretrained_model = 'path to the appropriate .pt file'
test_load = torch.load(pretrained_model)

In [None]:
data_transforms = {
    'test': transforms.Compose([
        transforms.Resize(new_input_size),
        transforms.CenterCrop(new_input_size),
        transforms.ToTensor(),
        transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
    ]),
}

print("Initializing Datasets and Dataloaders...")

image_datasets = {'test': datasets.ImageFolder(test_set_dir, data_transforms['test'])}
dataloader_test = torch.utils.data.DataLoader(image_datasets['test'], batch_size=32, shuffle=True, num_workers=2)

class_names = image_datasets['test'].classes
print(image_datasets['test'].class_to_idx)

device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

In [None]:
def get_predictions(model, data_loader):
  # evaluation mode: disables dropout and uses the running batch-norm statistics
  model = model.eval()
  predictions = []
  real_values = []
  with torch.no_grad():
    # iterate over the loader passed as argument, not over a global one
    for inputs, labels in data_loader:
      inputs = inputs.to(device)
      labels = labels.to(device)

      outputs = model(inputs)
      _, preds = torch.max(outputs, 1)
      predictions.extend(preds)
      real_values.extend(labels)
  predictions = torch.as_tensor(predictions).cpu()
  real_values = torch.as_tensor(real_values).cpu()
  return predictions, real_values

y_pred, y_test = get_predictions(test_load, dataloader_test)

In [None]:
def show_confusion_matrix(confusion_matrix, class_names):
  cm = confusion_matrix.copy()
  cell_counts = cm.flatten()
  cm_row_norm = cm /cm.sum(axis=1)[:, np.newaxis]

  row_percentages = ["{0:.2f}".format(value) for value in cm_row_norm.flatten()]

  cell_labels = [f"{cnt}\n{per}" for cnt, per in zip(cell_counts, row_percentages)]
  cell_labels = np.asarray(cell_labels).reshape(cm.shape[0], cm.shape[1])

  df_cm = pd.DataFrame(cm_row_norm, index=class_names, columns=class_names)

  hmap = sns.heatmap(df_cm, annot=cell_labels, fmt="", cmap="Blues")
  hmap.yaxis.set_ticklabels(hmap.yaxis.get_ticklabels(), rotation=0, ha='right')
  hmap.xaxis.set_ticklabels(hmap.xaxis.get_ticklabels(), rotation=30, ha='right')
  plt.ylabel('True labels')
  plt.xlabel('Predicted labels');
cm=confusion_matrix(y_test,y_pred)
show_confusion_matrix(cm,class_names)
print(classification_report(y_test, y_pred, target_names=class_names))
  

# Prediction of a single mel-spectrogram
With the following code we can load the model we trained and see its class prediction, with a probability for each class, on a new, unknown image (mel-spectrogram).

In [None]:
# input_size = 224 for all, except inception 299
new_input_size = 224
# test_set_dir = 'path to the test set'
pretrained_model = 'path to the appropriate .pt file'
test_load = torch.load(pretrained_model)
test_load.eval()

In [None]:
data_transforms = {
    'test': transforms.Compose([
        transforms.Resize(new_input_size),
        transforms.CenterCrop(new_input_size),
        transforms.ToTensor(),
        transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
    ])
}

In [None]:
def predict_proba(model, image_path):
  img = Image.open(image_path)
  img = img.convert('RGB')
  img = data_transforms['test'](img).unsqueeze(0)
  device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
  with torch.no_grad():
    logits = model(img.to(device))
    # softmax turns the raw network outputs (logits) into class probabilities
    pred = torch.softmax(logits, dim=1)
  return pred.cpu().numpy().flatten()

In [None]:
pred = predict_proba(test_load, 'path to a single spectrogram image')
pred
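
A bar chart limited to the range 0-1 reads naturally when the plotted values are probabilities. The usual way to map the raw outputs of a classifier (logits) to probabilities is the softmax function, sketched below in plain Python. The `softmax` helper and the example scores are illustrative and do not come from a trained model.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of raw scores."""
    shift = max(logits)  # subtracting the maximum avoids overflow in exp
    exps = [math.exp(x - shift) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Illustrative raw outputs for the five emotion classes.
logits = [2.0, 0.5, -1.0, 0.0, 1.0]
probs = softmax(logits)
print([round(p, 3) for p in probs])
```

The resulting values are non-negative and sum to 1, and the class with the largest raw score keeps the largest probability.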

In [None]:
def show_prediction_confidence(prediction, class_names):
  pred_df = pd.DataFrame({
    'class_names': class_names,
    'values': prediction
  })
  sns.barplot(x='values', y='class_names', data=pred_df, orient='h')
  plt.xlim([0, 1]);

In [None]:
### emotions - choose from
class_names = ['anger', 'fear', 'happy', 'sad', 'tender']
### energy or tension - choose from
# class_names = ['high', 'low', 'medium']
### valence - choose from
# class_names = ['negative', 'neutral', 'positive']

In [None]:
show_prediction_confidence(pred, class_names)