# Bias Experiment

#### Link to README section:

[Exploring Bias](https://gitlab.cs.vt.edu/sdeepti/facial-expression-recognition/-/blob/main/README.md#exploring-bias-in-our-model)

#### Citations:
- https://pytorch.org/tutorials/beginner/basics/saveloadrun_tutorial.html
- https://www.sciencedirect.com/science/article/pii/S0165178117321893

**Motivation:** A prominent issue in the facial recognition domain is that most datasets severely lack diversity, particularly with respect to race. Machine learning models trained on these datasets can therefore be significantly biased, performing worse on faces that are underrepresented in the training data.

To explore how biased our classifier is, we designed an experiment in which we first load our original model, which was trained on [KDEF](https://www.kdef.se/home/aboutKDEF.html), and then evaluate it on the racially diverse [Radiate](https://www.sciencedirect.com/science/article/pii/S0165178117321893) dataset.

### 1. Initial Set-Up

As in our other experiments, we import the required PyTorch classes and functions. We use scikit-learn for the evaluation metrics.

In [None]:
import os
import numpy as np

import torch
from torchvision import datasets, transforms, models
from torchvision.datasets import ImageFolder
from torch.utils.data import DataLoader

from sklearn.metrics import confusion_matrix, classification_report, accuracy_score, f1_score


#### Datasets used in experiment
- [KDEF](https://www.kdef.se/home/aboutKDEF.html) (Karolinska Directed Emotional Faces) dataset used to train model
- [Radiate](https://www.sciencedirect.com/science/article/pii/S0165178117321893) (RAcially DIverse AffecTive Expression) dataset used to evaluate the trained model

In [None]:
# constant parameters

#data_dir = 'face_images_80_10_10'
data_dir = 'radiate_faces_80_10_10'
print(f'using {data_dir} as data folder')

model_save_path = 'FEC_bias' + data_dir + '.pt'

num_classes = 7

device = 'cuda' if torch.cuda.is_available() else 'cpu'
print(f'Using {device} device')

### Load the model trained on KDEF (not diverse) images

We load the trained weights, transfer the model to a GPU if available, and then set the model to evaluation mode so that layers such as batch normalization use their inference behavior.

In [None]:
# load the trained model
model = models.resnet50(num_classes=num_classes)
model.load_state_dict(torch.load(model_save_path, map_location='cpu'))
# transfer model to gpu if available
model = model.to(device)
# set model to evaluation mode
model.eval()

### Loading Radiate Images

We load the Radiate images to use as the test set.

We apply the same preprocessing steps as in our other experiments to the Radiate test images, notably resizing and normalizing.

In [None]:
# transformations to apply to test images
# data augmentation and normalization for training
# just normalization for validation and testing
# https://pytorch.org/vision/stable/transforms.html
test_transforms = transforms.Compose([
	transforms.Resize(size=(224, 224)),
	transforms.ToTensor(),
	# use ImageNet standard mean and std dev for transfer learning
	transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
])

batch_size = 16
# create test_set, applying the test transforms to each image
test_set = ImageFolder(os.path.join(data_dir, 'test'), transform=test_transforms)

# create test_loader
test_loader = DataLoader(test_set, batch_size=batch_size, shuffle=False)
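
To make the normalization step concrete, the following sketch reproduces in plain Python the per-channel arithmetic that `transforms.Normalize` applies, `(x - mean) / std`. The pixel value is made up for illustration, and `normalize_pixel` is a hypothetical helper, not part of torchvision. The mean and standard deviation are the ImageNet statistics used above.

```python
# ImageNet channel statistics (RGB), as used in test_transforms above
IMAGENET_MEAN = [0.485, 0.456, 0.406]
IMAGENET_STD = [0.229, 0.224, 0.225]


def normalize_pixel(rgb, mean=IMAGENET_MEAN, std=IMAGENET_STD):
    """Apply (x - mean) / std per channel to one RGB pixel scaled to [0, 1]."""
    return [(x - m) / s for x, m, s in zip(rgb, mean, std)]


# hypothetical mid-gray pixel; ToTensor() scales 8-bit values to [0, 1] first
pixel_8bit = (128, 128, 128)
pixel = [v / 255 for v in pixel_8bit]

normalized = normalize_pixel(pixel)
print([round(v, 3) for v in normalized])

# a pixel exactly equal to the channel means maps to zero in every channel
mean_pixel = normalize_pixel(IMAGENET_MEAN)
print(mean_pixel)
```

Because the model was trained on inputs normalized this way, test images from a different dataset need the same statistics for the evaluation to be a fair comparison.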

### Predict Test Images

We record the predictions our model makes on the test set, and then compute evaluation metrics.

In [None]:
# tests performance on test set and computes metrics
def test(model, test_loader):
	# list of predicted labels of all batches
	predicted_labels = torch.zeros(0, dtype=torch.long, device='cpu')
	# list of actual labels of all batches
	actual_labels = torch.zeros(0, dtype=torch.long, device='cpu')

	with torch.no_grad():
		model.eval()
		# get batch of inputs (image) and outputs (expression label) from test_loader
		for inputs, labels in test_loader:
			inputs = inputs.to(device)
			labels = labels.to(device)

			# use model to predict label
			outputs = model(inputs)
			_, preds = torch.max(outputs, dim=1)

			# append batch prediction labels and actual labels
			predicted_labels = torch.cat([predicted_labels, preds.view(-1).cpu()])
			actual_labels = torch.cat([actual_labels, labels.view(-1).cpu()])

	print('\nTest Metrics:')
	# print confusion matrix
	print('Confusion Matrix:')
	print(confusion_matrix(actual_labels.numpy(), predicted_labels.numpy()))

	print('Test Accuracy:', accuracy_score(actual_labels.numpy(), predicted_labels.numpy()))
	print('F1 score:', f1_score(actual_labels.numpy(), predicted_labels.numpy(), average='weighted'))
	# print classification report
	print('Classification Report:')
	print(classification_report(actual_labels.numpy(), predicted_labels.numpy()))

	return predicted_labels
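
As a sanity check on the printed metrics, this sketch shows how accuracy, per-class recall, and the weighted F1 score follow from a confusion matrix whose rows are actual labels and whose columns are predicted labels (the scikit-learn convention). The 3-class matrix is invented for illustration and is not a result from this experiment; `metrics_from_confusion` is a hypothetical helper.

```python
def metrics_from_confusion(cm):
    """Compute accuracy, per-class recall, and weighted F1 from a confusion matrix.

    cm[i][j] counts samples with actual class i that were predicted as class j.
    """
    n = len(cm)
    total = sum(sum(row) for row in cm)
    correct = sum(cm[i][i] for i in range(n))

    recalls, f1_scores, supports = [], [], []
    for i in range(n):
        support = sum(cm[i])                         # actual samples of class i
        predicted = sum(cm[r][i] for r in range(n))  # samples predicted as class i
        recall = cm[i][i] / support if support else 0.0
        precision = cm[i][i] / predicted if predicted else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        recalls.append(recall)
        f1_scores.append(f1)
        supports.append(support)

    # weighted F1: per-class F1 averaged with class support as the weight
    weighted_f1 = sum(f * s for f, s in zip(f1_scores, supports)) / total
    return correct / total, recalls, weighted_f1


# invented 3-class confusion matrix (rows: actual, columns: predicted)
toy_cm = [
    [8, 1, 1],
    [2, 6, 2],
    [0, 3, 7],
]
accuracy, recalls, weighted_f1 = metrics_from_confusion(toy_cm)
print('accuracy:', round(accuracy, 3))
print('per-class recall:', [round(r, 3) for r in recalls])
print('weighted F1:', round(weighted_f1, 3))
```

Per-class recall is the most direct way to see which expressions the model struggles with when the test distribution differs from the training data.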




# load model
model = models.resnet50(num_classes=num_classes)
model.load_state_dict(torch.load('dataset_size_experiment/dataset_size_70/FEC_resnet50_trained_face_images_70_10_20.pt', map_location='cpu'))
model.eval()

# transfer model to gpu if available
model = model.to(device)

# test model on radiate images
test(model, test_loader)

#### 3. Load Test Dataset and Create Dataloader

Now we load the Radiate test dataset with the test transformations applied, and then create the dataloader.

In [None]:
# load test dataset and create dataloader
batch_size = 16
test_set = datasets.ImageFolder(os.path.join(data_dir, 'test'), transform=test_transforms)
test_loader = DataLoader(test_set, batch_size=batch_size, shuffle=True)
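
`ImageFolder` derives integer labels from the sorted names of the class subdirectories, so the Radiate folders need the same class names as the KDEF folders used in training. Otherwise the label indices will not line up. The sketch below mimics that mapping with the standard library only. The directory layout and the seven expression names are hypothetical stand-ins, not the project's actual folder names, and `class_to_idx` here is an illustrative helper function.

```python
import os
import tempfile


def class_to_idx(root):
    """Map each class subdirectory of root to an index, in sorted name order."""
    classes = sorted(entry.name for entry in os.scandir(root) if entry.is_dir())
    return {name: idx for idx, name in enumerate(classes)}


# hypothetical expression names; the real folder names may differ
expressions = ['afraid', 'angry', 'disgusted', 'happy', 'neutral', 'sad', 'surprised']

with tempfile.TemporaryDirectory() as tmp:
    kdef_test = os.path.join(tmp, 'kdef', 'test')
    radiate_test = os.path.join(tmp, 'radiate', 'test')
    for root in (kdef_test, radiate_test):
        for name in expressions:
            os.makedirs(os.path.join(root, name))

    kdef_map = class_to_idx(kdef_test)
    radiate_map = class_to_idx(radiate_test)

# labels are only comparable if both datasets produce the same mapping
labels_match = kdef_map == radiate_map
print(kdef_map)
print('label mappings match:', labels_match)
```

With the real datasets, the same check can be made by comparing the `class_to_idx` attribute of the two `ImageFolder` objects.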


#### 4. Test Model Performance on Test Set and Compute Metrics

We reuse the `test` function defined above to evaluate our model's performance on the Radiate test set. It computes and prints the metrics we decided to use for all experiments: a confusion matrix, accuracy, weighted F1 score, and classification report.

In [None]:
# test model on radiate images
test(model, test_loader)

<img src="https://git.cs.vt.edu/sdeepti/facial-expression-recognition/-/raw/main/Images/bias-results.png">