# Our ResNet50 Classifier

### Link to ReadMe:

https://gitlab.cs.vt.edu/sdeepti/facial-expression-recognition/-/tree/main/#baseline-model-vs-our-classifier

### Citations:

- https://pytorch.org/tutorials/beginner/finetuning_torchvision_models_tutorial.html#create-the-optimizer
- https://pytorch.org/tutorials/beginner/transfer_learning_tutorial.html
- https://pytorch.org/tutorials/beginner/blitz/cifar10_tutorial.html
- https://pytorch.org/tutorials/beginner/basics/saveloadrun_tutorial.html
- https://scikit-learn.org/stable/modules/model_evaluation.html
- https://discuss.pytorch.org/t/discussion-why-normalise-according-to-imagenet-mean-and-std-dev-for-transfer-learning/115670
- https://pytorch.org/vision/stable/transforms.html


**Motivation**: Our goal is to classify the seven standard facial expressions: happy, sad, angry, afraid, disgusted, surprised, and neutral. These expressions serve as the foundation for the study of human emotional responses and have applications in areas such as education, medicine, criminal justice, and public safety. We chose the ResNet architecture because it is a well-established, high-performing image classification architecture, and we were motivated to see how well it could perform on this task.

#### 1. Initial Set-Up

This cell adds all the imports the code needs. `torch` and `torchvision` are used to build, train and test the model and to load our datasets. Additionally, `sklearn` provides the evaluation metrics that are reported.

In [1]:
# python libraries
import os
import time
import copy
import csv
from pprint import pprint

# machine learning libraries
import torch
from torchvision import transforms, models
from torchvision.datasets import ImageFolder
from torch.utils.data import DataLoader

from sklearn.metrics import confusion_matrix, classification_report, accuracy_score, f1_score

The code below sets up some of the constant parameters that will be used throughout.

The dataset used for this experiment is the **KDEF Dataset**, which can be found at the following link:
https://www.kdef.se/.
The data is split at an 80/10/10 ratio: 80% is used for training the model, 10% for validation, and the last 10% for testing.

The number of classes is also stored as a constant. If a GPU is available, training runs on it, since that is faster.
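The notebook loads an already-split folder and does not show how the split was produced. As a rough illustration only, an 80/10/10 split of one class's image files could be sketched as follows; the function name `split_80_10_10`, the seed, and the file names are hypothetical and not taken from the project.

```python
import random

def split_80_10_10(items, seed=0):
    """Shuffle items and split them into train/val/test lists (80/10/10)."""
    items = list(items)
    random.Random(seed).shuffle(items)  # fixed seed keeps the split reproducible
    n_train = int(0.8 * len(items))
    n_val = int(0.1 * len(items))
    return {
        'train': items[:n_train],
        'val': items[n_train:n_train + n_val],
        'test': items[n_train + n_val:],  # remainder goes to the test set
    }

# hypothetical file names standing in for the images of one expression class
files = [f'img_{i:03d}.jpg' for i in range(100)]
splits = split_80_10_10(files)
print({name: len(part) for name, part in splits.items()})
```

Splitting each expression class separately, as assumed here, keeps the class proportions the same in all three subsets.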



In [None]:
# path to load KDEF data: use 80-10-10 train-val-test split for main classifier
data_dir = 'data/face_images_80_10_10'
print(f'using {data_dir} as data folder')

# where to save trained model to be loaded for later experiments
# use only the folder name so the file name contains no path separator
model_save_path = 'FEC_resnet50_trained_' + os.path.basename(data_dir) + '.pt'

# number of classes in dataset
# afraid  angry  disgusted  happy  neutral  sad  surprised
num_classes = 7

# pytorch: set to cuda gpu device if available for faster training
# will use cpu if gpu not available
device = 'cuda' if torch.cuda.is_available() else 'cpu'
print(f'Using {device} device')

# expected input size for resnet
input_size = 224

The flag below tells the model whether to finetune all layers (if `True`) or only the last classification layer (if `False`).

*Note: We initially set this to `False` to measure accuracy when finetuning only the last classification layer of ResNet, but obtained an accuracy of only 75%. Hence, the model was later trained again with this flag set to `True`.*

In [3]:
# when True, finetune the entire model (all layers are updated)
# when False, feature extraction only: update just the reshaped layer parameters
# (last fully connected classification layer)
finetune_all_parmas = True

#### 2. Perform Image Pre-Processing and Data Augmentation

This performs the desired pre-processing and data augmentation steps. It splits the necessary transformations based on whether the image is used for training, validation or testing. 

The training images are resized, randomly rotated by up to 10 degrees, and randomly flipped horizontally. Their brightness, contrast and saturation are also randomly varied. Lastly, they are normalized using the ImageNet channel means and standard deviations.

The validation and testing images are only resized and normalized.
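To make the normalization step concrete, the sketch below applies the same per-channel arithmetic, `(value - mean) / std`, to a single pixel in plain Python. The helper `normalize_pixel` and the example pixel are hypothetical; the mean and standard deviation values are the ones used in the transforms of this notebook.

```python
# ImageNet per-channel statistics, as passed to transforms.Normalize
mean = [0.485, 0.456, 0.406]
std = [0.229, 0.224, 0.225]

def normalize_pixel(rgb):
    """Apply (value - mean) / std to one RGB pixel with values in [0, 1]."""
    return [(v - m) / s for v, m, s in zip(rgb, mean, std)]

# hypothetical mid-gray pixel, already scaled to [0, 1] as ToTensor does
mid_gray = [0.5, 0.5, 0.5]
normalized = normalize_pixel(mid_gray)
print([round(v, 3) for v in normalized])
```

Because the three channels have different statistics, the same gray input maps to a different value in each channel.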

In [4]:
# transformations to apply to images
# data augmentation and normalization for training
# just normalization for validation and testing
# https://pytorch.org/vision/stable/transforms.html
data_transforms = {
	'train': transforms.Compose([
		transforms.Resize(size=(input_size, input_size)),
		# transforms.Grayscale(), (cannot use greyscale with resnet)
		# rotation augmentation
		transforms.RandomRotation(10),
		# random flip augmentation
		transforms.RandomHorizontalFlip(),
		# jitter brightness, contrast, saturation augmentation
		transforms.ColorJitter(brightness=0.2, contrast=0.1, saturation=0.1, hue=0),
		# convert to tensor and normalize
		transforms.ToTensor(),
		# use ImageNet standard mean and std dev for transfer learning
		transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
	]),
	'val': transforms.Compose([
		transforms.Resize(size=(input_size, input_size)),
		transforms.ToTensor(),
		# use ImageNet standard mean and std dev for transfer learning
		transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
	]),
	'test': transforms.Compose([
		transforms.Resize(size=(input_size, input_size)),
		transforms.ToTensor(),
		# use ImageNet standard mean and std dev for transfer learning
		transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
	])
}

#### 3. Create Training & Validation & Testing Datasets

The following code sets the batch size and creates the training, validation and testing datasets. It then performs the respective transformations on the images of each dataset and creates data loaders for each.
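As a small illustration of how a data loader groups samples, the sketch below batches a stand-in dataset of 50 tiny fake images with a batch size of 16. The tensor shapes and the dataset size are assumptions chosen for the example, not properties of KDEF.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# hypothetical stand-in data: 50 fake 3x8x8 "images" with labels in 0..6
images = torch.zeros(50, 3, 8, 8)
labels = torch.arange(50) % 7

loader = DataLoader(TensorDataset(images, labels), batch_size=16, shuffle=True)

# the last batch is smaller because 50 is not a multiple of 16
batch_sizes = [batch_images.size(0) for batch_images, _ in loader]
print(len(loader), batch_sizes)
```

This uneven last batch is the reason a per-epoch loss is best computed as a sample-weighted average rather than a plain mean over batches.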

In [5]:
# hyperparameters
batch_size = 16

# create train, validation and test datasets and apply transforms
datasets_dict = {dataset_type: ImageFolder(os.path.join(data_dir, dataset_type),
	transform=data_transforms[dataset_type]) for dataset_type in ['train', 'val', 'test']}

# create train, validation and test dataloaders
dataloaders_dict = {dataset_type: DataLoader(datasets_dict[dataset_type],
	batch_size=batch_size, shuffle=True) for dataset_type in ['train', 'val', 'test']}



#### 4. Function to Initialize Our Pretrained ResNet Model

This code loads a pretrained ResNet model of a specified size and marks either all layers as trainable or just the last fully connected classification layer, based on the `finetune_all_parmas` flag.

The line `model.fc = torch.nn.Linear(model.fc.in_features, num_classes)` replaces ResNet's original fully connected classification layer with a new one that has 7 outputs, placed after the ResNet convolutional layers.
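The effect of freezing and then swapping the classification layer can be seen on a toy network, which avoids downloading pretrained weights. `ToyNet` is a hypothetical stand-in for ResNet with a 1000-way head; only the freeze-then-replace pattern mirrors the function below.

```python
import torch

class ToyNet(torch.nn.Module):
    """Hypothetical stand-in for a pretrained network with a 1000-class head."""
    def __init__(self):
        super().__init__()
        self.backbone = torch.nn.Linear(32, 16)
        self.fc = torch.nn.Linear(16, 1000)

    def forward(self, x):
        return self.fc(torch.relu(self.backbone(x)))

toy = ToyNet()

# freeze every existing parameter (the case where only the last layer is trained)
for param in toy.parameters():
    param.requires_grad = False

# replace the head; a newly created layer has requires_grad=True by default
toy.fc = torch.nn.Linear(toy.fc.in_features, 7)

trainable = [name for name, p in toy.named_parameters() if p.requires_grad]
n_trainable = sum(p.numel() for p in toy.parameters() if p.requires_grad)
print(trainable, n_trainable)
print(toy(torch.zeros(4, 32)).shape)
```

Because the new layer is created after the freeze, it is the only part that receives gradient updates; skipping the freeze loop leaves every layer trainable.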

In [6]:
# initialize a pretrained resnet model
def init_model(num_classes, resnet_size, finetune_all_parmas,
				print_model=True, class_to_idx=None):

	model = None

	if resnet_size == 18:
		model = models.resnet18(pretrained=True)
	elif resnet_size == 34:
		model = models.resnet34(pretrained=True)
	elif resnet_size == 50:
		model = models.resnet50(pretrained=True)
	elif resnet_size == 101:
		model = models.resnet101(pretrained=True)
	elif resnet_size == 152:
		model = models.resnet152(pretrained=True)
	else:
		raise ValueError(f'Invalid size of {resnet_size} given for resnet size.')


	# sets requires_grad attribute of parameters in model to false if not finetuning all parameters
	if not finetune_all_parmas:
		# don't relearn weights when transfer learning
		for param in model.parameters():
			param.requires_grad = False

	# when transfer learning, set last layer to be fully connected with num_classes number of outputs
	model.fc = torch.nn.Linear(model.fc.in_features, num_classes)

	# map classes to indexes
	if class_to_idx:
		model.class_to_idx = class_to_idx
		model.idx_to_class = {idx: class_ for class_, idx in model.class_to_idx.items()}


	# print model information
	if print_model:
		# print model summary
		print()
		print(f'Using resnet size: {resnet_size}')
		print('Model summary:')
		print(model)

		# print model parameters (counts are numbers of scalar weights)
		total_params = sum(p.numel() for p in model.parameters())
		trainable_names = [name for name, p in model.named_parameters() if p.requires_grad]
		total_trainable_params = sum(p.numel() for p in model.parameters() if p.requires_grad)
		print()
		print(f'Model parameters ({total_params} total, {total_trainable_params} trainable)')
		print('List of trainable parameters:')
		pprint(trainable_names, width=80, compact=True)
		print()

		# print mapping for class indices
		if class_to_idx:
			print('Model index to class mappings:')
			print(model.idx_to_class)
			print()

	return model

#### 5. Function to Train the Model

The function below performs the training for the model by taking in the model, a dictionary of dataloaders, a loss function, an optimizer, and a specified number of epochs to train and validate for. It saves the train and validation loss and accuracy history for each of the epochs and writes it to a csv file.
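The training function multiplies each batch's mean loss by the batch size before summing, then divides by the dataset size. The plain-Python sketch below, using made-up batch results, shows why: the result is a per-sample average that is not skewed by a small final batch.

```python
# hypothetical per-batch results: (mean loss of batch, correct predictions, batch size)
batches = [(0.90, 10, 16), (0.70, 12, 16), (0.50, 2, 2)]

running_loss = 0.0
running_corrects = 0
n_samples = 0
for batch_loss, batch_correct, batch_size in batches:
    running_loss += batch_loss * batch_size  # undo the per-batch mean
    running_corrects += batch_correct
    n_samples += batch_size

epoch_loss = running_loss / n_samples
epoch_acc = running_corrects / n_samples

# for comparison: a plain mean over batches over-weights the 2-sample batch
naive_mean = sum(loss for loss, _, _ in batches) / len(batches)
print(f'epoch loss {epoch_loss:.4f}, acc {epoch_acc:.4f}, naive mean {naive_mean:.4f}')
```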

In [7]:
# model training and validation
def train(model, dataloaders, criterion, optimizer,
			num_epochs=25, save_path=None, save_history_to_csv=True):

	since = time.time()

	# save train and val loss/accuracy history for each epoch
	train_loss_history = []
	val_loss_history = []
	train_acc_history = []
	val_acc_history = []


	best_model_wts = copy.deepcopy(model.state_dict())
	best_acc = 0.0

	for epoch in range(num_epochs):
		print(f'Epoch {epoch + 1}/{num_epochs}')
		print('-' * 10)

		# Each epoch has a training and validation phase
		for phase in ['train', 'val']:
			if phase == 'train':
				model.train()  # Set model to training mode
			else:
				model.eval()   # Set model to evaluate mode

			running_loss = 0.0
			running_corrects = 0

			# Iterate over data.
			for inputs, labels in dataloaders[phase]:
				inputs = inputs.to(device)
				labels = labels.to(device)

				# zero the parameter gradients
				optimizer.zero_grad()

				# forward
				# track history if only in train
				with torch.set_grad_enabled(phase == 'train'):
					# Get model outputs and calculate loss
					outputs = model(inputs)
					loss = criterion(outputs, labels)

					_, preds = torch.max(outputs, 1)

					# backward + optimize only if in training phase
					if phase == 'train':
						loss.backward()
						optimizer.step()

				# statistics
				running_loss += loss.item() * inputs.size(0)
				running_corrects += torch.sum(preds == labels.data)

			epoch_loss = running_loss / len(dataloaders[phase].dataset)
			epoch_acc = running_corrects.double() / len(dataloaders[phase].dataset)

			print(f'{phase} loss: {epoch_loss:.4f} acc: {epoch_acc:.4f}')

			# deep copy the model if best accuracy
			if phase == 'val' and epoch_acc > best_acc:
				best_acc = epoch_acc
				best_model_wts = copy.deepcopy(model.state_dict())

			# save loss and accuracy to history
			if phase == 'train':
				train_loss_history.append(epoch_loss)
				train_acc_history.append(epoch_acc.item())
			elif phase == 'val':
				val_loss_history.append(epoch_loss)
				val_acc_history.append(epoch_acc.item())

		print()

	time_elapsed = time.time() - since
	print(f'Training complete in {(time_elapsed // 60):.0f}m {(time_elapsed % 60):.0f}s')
	print(f'Best val Acc: {best_acc:.4f}')

	# load best model weights
	model.load_state_dict(best_model_wts)

	# write history csv file next to the saved model
	if save_history_to_csv:
		history_header = ['train_loss', 'val_loss', 'train_acc', 'val_acc']
		# strip only the file extension; fall back to the global path if save_path is None
		history_filename = os.path.splitext(save_path or model_save_path)[0] + '_history.csv'
		history = zip(train_loss_history, val_loss_history, train_acc_history, val_acc_history)
		with open(history_filename, 'w', newline='') as csv_file:
			writer = csv.writer(csv_file)
			writer.writerow(history_header)
			writer.writerows(history)

	# save trained model to disk (independent of the history flag)
	if save_path:
		torch.save(model.state_dict(), save_path)

	return model

#### 6. Function to Test the Model

The following code creates the function used to test the model on the testing set. The function takes a trained model, and a test dataloader. The model will predict the label for each batch in the test dataloader and compare to the actual label. Overall, it evaluates the model's performance and computes metrics that are later used for analysis, such as the accuracy, F1 Score and confusion matrix.
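To show what the reported metrics measure, the sketch below computes a confusion matrix, accuracy, and a support-weighted F1 score by hand on a toy 3-class problem. The helper functions and the label lists are hypothetical; the notebook itself relies on the `sklearn.metrics` functions.

```python
from collections import Counter

def confusion(actual, predicted, n_classes):
    """Rows are actual classes, columns are predicted classes."""
    matrix = [[0] * n_classes for _ in range(n_classes)]
    for a, p in zip(actual, predicted):
        matrix[a][p] += 1
    return matrix

def weighted_f1(actual, predicted, n_classes):
    """Per-class F1 averaged with weights equal to each class's support."""
    matrix = confusion(actual, predicted, n_classes)
    support = Counter(actual)
    total = 0.0
    for c in range(n_classes):
        tp = matrix[c][c]
        fp = sum(matrix[r][c] for r in range(n_classes)) - tp
        fn = sum(matrix[c]) - tp
        denom = 2 * tp + fp + fn
        f1 = 2 * tp / denom if denom else 0.0
        total += f1 * support[c]
    return total / len(actual)

# hypothetical labels for a 3-class toy problem
actual = [0, 0, 1, 1, 2, 2]
predicted = [0, 1, 1, 1, 2, 0]

matrix = confusion(actual, predicted, 3)
accuracy = sum(matrix[c][c] for c in range(3)) / len(actual)
f1 = weighted_f1(actual, predicted, 3)
print(matrix, round(accuracy, 4), round(f1, 4))
```

The diagonal of the matrix holds the correct predictions, so accuracy is the diagonal sum divided by the number of samples.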

In [8]:
# tests performance on test set and computes metrics
def test(model, test_loader):
	# list of predicted labels of all batches
	predicted_labels = torch.zeros(0, dtype=torch.long, device='cpu')
	# list of actual labels of all batches
	actual_labels = torch.zeros(0, dtype=torch.long, device='cpu')

	with torch.no_grad():
		model.eval()
		# get batch of inputs (image) and outputs (expression label) from test_loader
		for inputs, labels in test_loader:
			inputs = inputs.to(device)
			labels = labels.to(device)

			# use model to predict label
			outputs = model(inputs)
			_, preds = torch.max(outputs, dim=1)

			# append batch prediction labels and actual labels
			predicted_labels = torch.cat([predicted_labels, preds.view(-1).cpu()])
			actual_labels = torch.cat([actual_labels, labels.view(-1).cpu()])

	print('\nTest Metrics:')
	# print confusion matrix
	print('Confusion Matrix:')
	print(confusion_matrix(actual_labels.numpy(), predicted_labels.numpy()))

	print('Test Accuracy:', accuracy_score(actual_labels.numpy(), predicted_labels.numpy()))
	print('F1 score:', f1_score(actual_labels.numpy(), predicted_labels.numpy(), average='weighted'))
	# print classification report
	print('Classification Report:')
	print(classification_report(actual_labels.numpy(), predicted_labels.numpy()))

	return predicted_labels

#### 7. Initialize the Model as ResNet50

Initialize the model with the number of classes (7), the ResNet size in layers (50), the finetuning flag, and the class-to-index mapping from the training dataset, then move it to the GPU if one is available.

In [None]:
# load pretrained ResNet-50 model
model = init_model(num_classes, 50, finetune_all_parmas,
					class_to_idx=datasets_dict['train'].class_to_idx)

# transfer model to gpu if available
model = model.to(device)

#### 8. Set the Parameters to Be Optimized/Updated
We are optimizing both the fully connected classification layer and the ResNet convolutional base. We are using 50 epochs with a learning rate of 0.0005 and Adam as our chosen optimizer, as per our baseline.

An optimizer is the algorithm that updates the network's weights using the gradients of the loss. The learning rate is a configurable hyperparameter with a small positive value (typically well below 1.0) that scales the size of each update and therefore controls how quickly the model adapts to the problem. Adam additionally adapts the effective step size for each parameter.

Moreover, as per our literature review, cross entropy is one of the best loss functions to use when performing categorical classification.
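For a single image, cross entropy is the negative log of the softmax probability assigned to the true class. The sketch below computes it in plain Python; the logits are made-up values for seven classes, and the helper `cross_entropy` is only illustrative (`torch.nn.CrossEntropyLoss` applies the same formula to raw logits and averages over the batch by default).

```python
import math

def cross_entropy(logits, target):
    """Negative log of the softmax probability assigned to the target class."""
    peak = max(logits)  # subtract the max for numerical stability
    log_sum_exp = peak + math.log(sum(math.exp(z - peak) for z in logits))
    return log_sum_exp - logits[target]

# hypothetical logits for the 7 expression classes of one image
logits = [2.0, 0.5, 0.1, 0.0, -1.0, 0.3, 0.2]

loss_correct = cross_entropy(logits, 0)  # true class has the highest score
loss_wrong = cross_entropy(logits, 4)    # true class has the lowest score
uniform = cross_entropy([0.0] * 7, 3)    # no preference between classes: log(7)
print(round(loss_correct, 4), round(loss_wrong, 4), round(uniform, 4))
```

The loss is small when the true class receives the highest score and grows as that score falls, which is what drives the weights toward correct predictions.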

In [10]:
num_epochs = 50
learning_rate = 0.0005

# set parameters needed to be optimized/updated
params_to_update = None
if not finetune_all_parmas:
	params_to_update = [param for param in model.parameters() if param.requires_grad]
else:
	params_to_update = model.parameters()

# set optimizer
optimizer = torch.optim.Adam(params_to_update, lr=learning_rate)
# optimizer = torch.optim.SGD(params_to_update, lr=learning_rate)

# set loss function
criterion = torch.nn.CrossEntropyLoss()

#### 9. Call to Train the Model

Train the model with the parameters set above.

In [None]:
# train model
trained_model = train(model, dataloaders_dict, criterion, optimizer,
		num_epochs, model_save_path)

#### 10. Call to Test the Model

In [None]:
# test model
test(trained_model, dataloaders_dict['test'])

### Results

<div align="center">
<img src="https://git.cs.vt.edu/sdeepti/facial-expression-recognition/-/raw/main/main_resnet50/main_model_results.png">
</div>

The figure above summarizes the evaluation metrics obtained by our model.

<div align="center">
<img src="https://git.cs.vt.edu/sdeepti/facial-expression-recognition/-/raw/main/Images/train_val_graph.png">
</div>

By applying transfer learning on the ResNet-50 model, we achieve 95.7% accuracy on the KDEF dataset.
We trained our model for 50 epochs using hyperparameter settings similar to those of our baseline paper. Our model was able to converge relatively quickly due to transfer learning.

<div align="center">
<img src="https://git.cs.vt.edu/sdeepti/facial-expression-recognition/-/raw/main/main_resnet50/confusion_matrix.png">
</div>