# 1 Author

**Student Name**:  Liang Zheyu 梁哲与   
**Student ID**:  210977800



# 2 Problem formulation

Describe the machine learning problem that you want to solve and explain what's interesting about it.   
-   
Using the genki4k dataset, build a machine learning pipeline that takes an image as input and 1) predicts whether the person in the image is smiling and 2) estimates the 3D head pose labels of that person.   
What makes this problem interesting is that it is a multi-task problem: from the same image, the pipeline must both classify whether the person is smiling and estimate the 3D head pose labels.

# 3 Machine Learning pipeline

Describe your ML pipeline. Clearly identify its input and output, any intermediate stages (for instance, transformation -> models), and intermediate data moving from one stage to the next. It's up to you to decide which stages to include in your pipeline.   
-   
The pipeline takes image data as input and outputs the smile prediction and the 3D head pose labels. It has two intermediate stages: a transformation stage, which converts each image into the format the model can use, and a model stage, which trains the model and produces the predictions.   
   
Preprocess dataset => Train the model => Evaluate the model.   
   
**Classification:**   
The input of the classification model is the image data. The output is the smile prediction, expressed as a binary label (0 or 1).   
   
**Regression:**   
The input of the regression model is the image data. The output is the prediction of the 3D head pose labels: yaw, pitch and roll, each a floating-point number.
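
As a rough illustration of the two output formats described above, the sketch below turns raw model outputs for one image into final predictions. The function names and the sample numbers are assumptions made for illustration; they are not taken from the notebook's runs.

```python
def smile_prediction(logits):
    """Return the index (0 or 1) of the larger of the two class scores."""
    return max(range(len(logits)), key=lambda i: logits[i])


def pose_prediction(outputs):
    """Name the three regression outputs as yaw, pitch and roll."""
    yaw, pitch, roll = outputs
    return {"yaw": float(yaw), "pitch": float(pitch), "roll": float(roll)}


# Hypothetical raw outputs for a single image (not from a real run).
example_logits = [0.3, 1.2]
example_pose = [-0.02, 0.06, -0.04]

print(smile_prediction(example_logits))
print(pose_prediction(example_pose))
```

The classifier needs an extra decision step (picking the larger score), whereas the regressor's three outputs are used directly as the pose estimate.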

In [14]:
import torch
import torch.nn as nn
import torchvision
import torchvision.transforms as transforms
from torch.utils.data import DataLoader, Dataset, random_split
from sklearn.metrics import accuracy_score, mean_squared_error
import os
from PIL import Image
import datetime


# Parameters -------------------------------
epochs = 10
isSaveModel = False
# Parameters -------------------------------

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# 4 Transformation stage

Describe any transformations, such as feature extraction. Identify input and output. Explain why you have chosen this transformation stage.   
-   
The transformation stage loads the images and labels and converts each image into the format the model can use. Its input is a raw image file; its output is a normalized 3x224x224 tensor.   
   
Consistency: Transforming all images into the same RGB format and size ensures a consistent input format.   
   
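Data Augmentation: Random horizontal flipping enhances data diversity, helping the model generalize to varied samples. In this notebook the flip is part of the single shared transform, so it is also applied to validation and test images, and the pose labels are not adjusted when an image is flipped, which may add noise to the regression targets.   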
   
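Normalization: Standardizing the input accelerates training and helps the model handle pixel values from different ranges. The per-channel means and standard deviations used are the ImageNet statistics, which match the data the pre-trained AlexNet was trained on.   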
   
The selection of this transformation stage aims to ensure the consistency, diversity, and stability of input data, thereby improving the model's performance.   
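
To make the normalization step concrete, here is a minimal sketch of what happens to a single pixel: it is first scaled to the range [0, 1] and then standardized per channel. The mean and standard deviation values are those passed to `transforms.Normalize` in this notebook; the helper name and the sample pixel are assumptions for illustration.

```python
# Channel statistics used in the notebook's transforms.Normalize call.
MEAN = [0.485, 0.456, 0.406]
STD = [0.229, 0.224, 0.225]


def normalize_pixel(rgb):
    """Map one 8-bit RGB pixel to the values the model would see.

    Mirrors scaling to [0, 1] followed by (value - mean) / std,
    applied to a single pixel instead of a whole image tensor.
    """
    scaled = [channel / 255.0 for channel in rgb]
    return [(s - m) / d for s, m, d in zip(scaled, MEAN, STD)]


# A mid-grey pixel chosen purely for illustration.
print([round(v, 3) for v in normalize_pixel((128, 128, 128))])
```

A pixel whose scaled value equals the channel mean maps to zero, so after normalization the inputs are roughly centred around zero in each channel.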

# 5 Modelling

Describe the ML model(s) that you will build. Explain why you have chosen them.   
-   
The code defines two classes, AlexNet2 and AlexNet3, both of which inherit from PyTorch's nn.Module.

In the constructor of each class, the pre-trained AlexNet model is loaded and assigned to self.alexnet. Then, by iterating through all the parameters of the model and setting their requires_grad attribute to False, all layers of the pre-trained model are frozen. This means that during training, the weights of these layers will not be updated.

Next, the constructor reads the number of input features of the last layer (a fully connected layer) of the original AlexNet classifier and replaces that layer with a new fully connected layer. The new layer has the same number of input features as the original, but its number of output features is set to 2 in AlexNet2, which makes it a binary classification model, and to 3 in AlexNet3, which makes it a regression model for yaw, pitch and roll. Because the new layer is created after the freeze, it is the only part of the network that is trained.

In the forward method, the input data x is passed to self.alexnet for forward propagation and the output result is returned.

The reason for choosing AlexNet is that its pre-trained model has been trained on a large amount of image data and can extract effective features. Moreover, by freezing the pre-trained layers and replacing the final fully connected layer, it can be easily customized for a specific task. Compared with other models, AlexNet also trains faster and performs better.
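
To give a sense of how little is actually trained once the pre-trained layers are frozen, the arithmetic below counts the parameters of the replacement layer alone. It assumes the final AlexNet classifier layer has 4096 input features, which is the value in torchvision's AlexNet; the helper name is hypothetical.

```python
def linear_param_count(in_features, out_features):
    """Parameters in a fully connected layer.

    One weight per (input, output) pair plus one bias per output.
    """
    return in_features * out_features + out_features


# Assumed: in_features of classifier[6] in torchvision's AlexNet.
IN_FEATURES = 4096

print("AlexNet2 trainable parameters:", linear_param_count(IN_FEATURES, 2))
print("AlexNet3 trainable parameters:", linear_param_count(IN_FEATURES, 3))
```

Under that assumption, only a few thousand parameters are updated per model, which is consistent with the fast training noted above.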

In [13]:
class AlexNet2(nn.Module):
    def __init__(self):
        super(AlexNet2, self).__init__()
        self.alexnet = torchvision.models.alexnet(pretrained=True)
        # Freeze all layers of the pre-trained model
        for param in self.alexnet.parameters():
            param.requires_grad = False
        num_ftrs = self.alexnet.classifier[6].in_features
        self.alexnet.classifier[6] = nn.Linear(num_ftrs, 2)  # Replace the final fully connected layer (2 classes)

    def forward(self, x):
        x = self.alexnet(x)
        return x

class AlexNet3(nn.Module):
    def __init__(self):
        super(AlexNet3, self).__init__()
        self.alexnet = torchvision.models.alexnet(pretrained=True)
        # Freeze all layers of the pre-trained model
        for param in self.alexnet.parameters():
            param.requires_grad = False
        num_ftrs = self.alexnet.classifier[6].in_features
        self.alexnet.classifier[6] = nn.Linear(num_ftrs, 3)  # Replace the final fully connected layer (yaw, pitch, roll)

    def forward(self, x):
        x = self.alexnet(x)
        return x

# 6 Methodology

Describe how you will train and validate your models, how model performance is assessed (i.e. accuracy, confusion matrix, etc)   
- 
Training: The model is trained over a specified number of epochs. In each epoch, the model is set to training mode and the dataset is looped over using a DataLoader (train_loader). For each batch of data, the inputs and labels are moved to the selected device (the GPU if one is available, otherwise the CPU). The optimizer's gradients are zeroed out, and a forward pass is performed to get the model's outputs. The loss between the outputs and the labels is calculated using a classification loss function (cla_loss). The loss is then backpropagated through the network, and the optimizer performs a step to update the model's parameters. The accuracy of the model on the training data is calculated by comparing the model's predictions (obtained by taking the argmax of the outputs) with the true labels.
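
The training code in this section also creates a StepLR scheduler with step_size=5 and gamma=0.5 and steps it once per epoch. As a small sketch of what that schedule implies, the function below reproduces the step-decay formula in plain Python; the function name is hypothetical and the initial learning rate of 0.005 is the one passed to Adam in the notebook.

```python
def step_lr(initial_lr, epoch_index, step_size=5, gamma=0.5):
    """Learning rate in effect during a zero-based epoch under step decay.

    The rate is multiplied by gamma once every step_size epochs.
    """
    return initial_lr * gamma ** (epoch_index // step_size)


# Print the rate for each of the 10 epochs used in this notebook.
for epoch_index in range(10):
    print(f"epoch {epoch_index + 1}: lr = {step_lr(0.005, epoch_index):.4f}")
```

With these settings the first five epochs run at the initial rate and the last five at half of it.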

Validation: After each epoch of training, the model is set to evaluation mode and validated on a validation set. The process is similar to the training loop, but no backpropagation or parameter updating is performed. The validation loss and accuracy are calculated and printed.

Testing: After all epochs, the model is evaluated on a test set. The process is the same as the validation loop. The test loss and accuracy are calculated and printed.

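The performance of the classification model is assessed primarily through the loss and accuracy. The loss provides a measure of how well the model's predictions match the true labels, while the accuracy provides a measure of the proportion of inputs that the model correctly classified. The pose regression model follows the same training, validation and testing loops, but uses a mean squared error loss (nn.MSELoss) and reports the mean squared error instead of accuracy.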
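
One detail of the evaluation loops is that they average the per-batch accuracy over the number of batches. When the last batch is smaller than the others, this differs slightly from dividing total correct predictions by total samples. The sketch below compares the two under made-up counts; both function names and all numbers are assumptions for illustration.

```python
def batch_mean_accuracy(correct_per_batch, size_per_batch):
    """Average of per-batch accuracies, one term per batch."""
    accs = [c / n for c, n in zip(correct_per_batch, size_per_batch)]
    return sum(accs) / len(accs)


def sample_weighted_accuracy(correct_per_batch, size_per_batch):
    """Total correct predictions divided by total samples."""
    return sum(correct_per_batch) / sum(size_per_batch)


# Hypothetical counts: two full batches of 32 and a final batch of 20.
correct = [24, 24, 10]
sizes = [32, 32, 20]

print(f"batch mean:      {batch_mean_accuracy(correct, sizes):.4f}")
print(f"sample weighted: {sample_weighted_accuracy(correct, sizes):.4f}")
```

The gap is small when only one batch is short, but the sample-weighted figure is the one that corresponds to "proportion of inputs correctly classified".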

In [31]:
def TrainFunc2(model):
    cla_loss = nn.CrossEntropyLoss()
    optimizer = torch.optim.Adam(model.parameters(), lr=0.005, weight_decay=0.0001)
    scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=5, gamma=0.5) 

    for epoch in range(epochs):  # loop over the dataset multiple times
        model.train()
        print("Epoch", epoch+1, "start training...")
        for i, data in enumerate(train_loader, 0):
            inputs, labels_smile = data[0].to(device), data[1].to(device)
            optimizer.zero_grad()
            outputs_smile = model(inputs)
            loss_cla = cla_loss(outputs_smile, labels_smile.long())
            loss_cla.backward()
            optimizer.step()
            
            # Compute the accuracy on this batch
            preds_smile = torch.argmax(outputs_smile, dim=1)
            acc_smile = torch.eq(preds_smile, labels_smile).sum().item() / labels_smile.size(0)
            print(f'\rEpoch {epoch+1}, Batch {i+1}, Loss(smile): {loss_cla.item():.4f}, Accuracy(smile): {acc_smile:.4f}',end='')
        
        scheduler.step()
        # Validate on the validation set
        model.eval()
        val_loss_cla = 0
        val_acc_smile = 0
        with torch.no_grad():
            for i, data in enumerate(val_loader, 0):
                inputs, labels_smile = data[0].to(device), data[1].to(device)
                outputs_smile = model(inputs)
                loss_cla = cla_loss(outputs_smile, labels_smile.long())
                val_loss_cla += loss_cla.item()
                preds_smile = torch.argmax(outputs_smile, dim=1)
                val_acc_smile += accuracy_score(labels_smile.cpu(), preds_smile.cpu())
        val_loss_cla /= len(val_loader)
        val_acc_smile /= len(val_loader)
        print(f'\nEpoch {epoch+1}, Validation Loss(smile): {val_loss_cla:.4f}, Validation Accuracy(smile): {val_acc_smile:.4f}')

    # 7. Evaluate the model
    # Test on the test set
    model.eval()
    test_loss_cla = 0
    test_acc_smile = 0
    with torch.no_grad():
        for i, data in enumerate(test_loader, 0):
            inputs, labels_smile= data[0].to(device), data[1].to(device)
            outputs_smile = model(inputs)
            loss_cla = cla_loss(outputs_smile, labels_smile.long())
            test_loss_cla += loss_cla.item()
            preds_smile = torch.argmax(outputs_smile, dim=1)
            test_acc_smile += accuracy_score(labels_smile.cpu(), preds_smile.cpu())
    test_loss_cla /= len(test_loader)
    test_acc_smile /= len(test_loader)
    print(f'\nTest Loss(smile): {test_loss_cla:.4f}, Test Accuracy(smile): {test_acc_smile:.4f}')




reg_loss = nn.MSELoss()
def TrainFunc3(model):
    optimizer = torch.optim.Adam(model.parameters(), lr=0.005, weight_decay=0.0001)
    scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=5, gamma=0.5) 

    for epoch in range(epochs):  # loop over the dataset multiple times
        model.train()
        print("Epoch", epoch+1, "start training...")
        for i, data in enumerate(train_loader, 0):
            inputs, labels_pose = data[0].to(device), data[2].to(device)
            optimizer.zero_grad()
            outputs_pose = model(inputs)
            # loss_reg = torch.nn.functional.mse_loss(outputs_pose, labels_pose.float())
            loss_reg = reg_loss(outputs_pose, labels_pose.float())
            loss_reg.backward()
            optimizer.step()
            mse_pose = mean_squared_error(labels_pose.cpu().numpy(), outputs_pose.detach().cpu().numpy())
            print(f'\rEpoch {epoch+1}, Batch {i+1}, MSE(pose): {mse_pose:.4f}',end='')
        
        scheduler.step()
        # Validate on the validation set
        model.eval()
        val_loss_reg = 0
        val_mse_pose = 0
        with torch.no_grad():
            for i, data in enumerate(val_loader, 0):
                inputs, labels_pose = data[0].to(device), data[2].to(device)
                outputs_pose = model(inputs)
                # loss_reg = torch.nn.functional.mse_loss(outputs_pose, labels_pose.float())
                loss_reg = reg_loss(outputs_pose, labels_pose.float())
                val_loss_reg += loss_reg.item()
                val_mse_pose += mean_squared_error(labels_pose.cpu().numpy(), outputs_pose.detach().cpu().numpy())
        val_loss_reg /= len(val_loader)
        val_mse_pose /= len(val_loader)
        print(f'\nEpoch {epoch+1}, Validation MSE(pose): {val_mse_pose:.4f}')

    # 7. Evaluate the model
    # Test on the test set
    model.eval()
    test_loss_reg = 0
    test_mse_pose = 0
    with torch.no_grad():
        for i, data in enumerate(test_loader, 0):
            inputs, labels_pose= data[0].to(device), data[2].to(device)
            outputs_pose = model(inputs)
            # loss_reg = torch.nn.functional.mse_loss(outputs_pose, labels_pose.float())
            loss_reg = reg_loss(outputs_pose, labels_pose.float())
            test_loss_reg += loss_reg.item()
            test_mse_pose += mean_squared_error(labels_pose.cpu().numpy(), outputs_pose.detach().cpu().numpy())
    test_loss_reg /= len(test_loader)
    test_mse_pose /= len(test_loader)
    print(f'\nTest MSE(pose): {test_mse_pose:.4f}')

# 7 Dataset

Describe the dataset that you will use to create your models and validate them. If you need to preprocess it, do it here. Include visualisations too. You can visualise raw data samples or extracted features.
- 
The dataset used in this project is the genki4k dataset. It contains 4000 images of faces, both smiling and non-smiling, each labelled with a smile label and three head pose values. The dataset is randomly divided into a training set, a validation set, and a test set of 3000, 500, and 500 images respectively. The images are preprocessed by converting them all to RGB, resizing them to 224x224, applying random horizontal flipping, and standardizing the input.
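
The split sizes quoted above follow from the fractions used in the code (0.75 for training, 0.125 for validation, the remainder for testing). The short sketch below reproduces that arithmetic and also the number of batches per epoch for a batch size of 32, assuming the last partial batch is kept; the function names are hypothetical.

```python
import math


def split_sizes(total, train_frac=0.75, val_frac=0.125):
    """Return (train, val, test) sizes; the test set takes the remainder."""
    train = int(train_frac * total)
    val = int(val_frac * total)
    test = total - train - val
    return train, val, test


def batches_per_epoch(num_samples, batch_size=32):
    """Number of batches when the final partial batch is kept."""
    return math.ceil(num_samples / batch_size)


train, val, test = split_sizes(4000)
print(train, val, test)          # 3000 500 500
print(batches_per_epoch(train))  # 94
```

The 94 training batches match the "Batch 94" shown at the end of each epoch in the training logs.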

In [5]:
dataset_path = './genki4k/'
labels = []
with open(os.path.join(dataset_path, 'labels.txt'), 'r') as file:
    lines = file.readlines()
    for line in lines:
        items = line.split()
        label = [int(items[0])] + [float(x) for x in items[1:]]  # first item is an integer, the next three are floats
        labels.append(label)

class Genki4kDataset(Dataset):
    def __init__(self, img_dir, labels, transform=None):
        self.img_dir = img_dir
        self.labels = labels
        self.transform = transform

    def __len__(self):
        return len(self.labels)

    def __getitem__(self, idx):
        img_name = os.path.join(self.img_dir, 'files', f'file{idx+1:04}.jpg')  # image names follow the pattern file0001.jpg, file0002.jpg, ...
        image = Image.open(img_name)
        label_smile = self.labels[idx][0]
        label_pose = torch.tensor([self.labels[idx][1], self.labels[idx][2], self.labels[idx][3]])

        if self.transform:
            image = self.transform(image)

        return image, label_smile, label_pose

transform = transforms.Compose([
    transforms.Lambda(lambda image: image.convert('RGB')),  # convert every image to RGB
    transforms.Resize((224, 224)),  # resize every image to 224x224
    transforms.RandomHorizontalFlip(p=0.5),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])  # normalize each channel
])

dataset = Genki4kDataset(dataset_path, labels, transform)
total_size = len(dataset)  # dataset size
train_size = int(0.75 * total_size)  # training set
val_size = int(0.125 * total_size)    # validation set
test_size = total_size - train_size - val_size  # the remainder is the test set
train_dataset, val_dataset, test_dataset = random_split(dataset, [train_size, val_size, test_size])

train_loader = DataLoader(train_dataset, batch_size=32, shuffle=True)
val_loader = DataLoader(val_dataset, batch_size=32, shuffle=True)
test_loader = DataLoader(test_dataset, batch_size=32, shuffle=True)

# 8 Results

Carry out your experiments here, explain your results.

In [18]:
model2 = AlexNet2().to(device)
print("Training model2...")
TrainFunc2(model2)


Training model2...
Epoch: 1 start training...
Epoch 1, Batch 94, Loss(smile): 1.9177, Accuracy(smile): 0.3750
Epoch 1, Validation Loss(smile): 0.8908, Validation Accuracy(smile): 0.6328
Epoch: 2 start training...
Epoch 2, Batch 94, Loss(smile): 0.8532, Accuracy(smile): 0.6667
Epoch 2, Validation Loss(smile): 1.0150, Validation Accuracy(smile): 0.6602
Epoch: 3 start training...
Epoch 3, Batch 94, Loss(smile): 0.6484, Accuracy(smile): 0.6667
Epoch 3, Validation Loss(smile): 0.9225, Validation Accuracy(smile): 0.6559
Epoch: 4 start training...
Epoch 4, Batch 94, Loss(smile): 1.5133, Accuracy(smile): 0.6250
Epoch 4, Validation Loss(smile): 1.1140, Validation Accuracy(smile): 0.6480
Epoch: 5 start training...
Epoch 5, Batch 94, Loss(smile): 1.2884, Accuracy(smile): 0.5417
Epoch 5, Validation Loss(smile): 1.1631, Validation Accuracy(smile): 0.6113
Epoch: 6 start training...
Epoch 6, Batch 94, Loss(smile): 0.9984, Accuracy(smile): 0.6667
Epoch 6, Validation Loss(smile): 0.9705, Validation Acc

In [32]:
model3 = AlexNet3().to(device)
print("----------------------------------------------------------------------")
print("Training model3...")
TrainFunc3(model3)



----------------------------------------------------------------------
Training model3...
Epoch 1 start training...
Epoch 1, Batch 94, MSE(pose): 0.3644
Epoch 1, Validation MSE(pose): 0.2625
Epoch 2 start training...
Epoch 2, Batch 94, MSE(pose): 0.6048
Epoch 2, Validation MSE(pose): 0.6885
Epoch 3 start training...
Epoch 3, Batch 94, MSE(pose): 1.3643
Epoch 3, Validation MSE(pose): 0.9089
Epoch 4 start training...
Epoch 4, Batch 94, MSE(pose): 1.8211
Epoch 4, Validation MSE(pose): 1.1087
Epoch 5 start training...
Epoch 5, Batch 94, MSE(pose): 2.8323
Epoch 5, Validation MSE(pose): 1.0578
Epoch 6 start training...
Epoch 6, Batch 94, MSE(pose): 1.5832
Epoch 6, Validation MSE(pose): 0.5532
Epoch 7 start training...
Epoch 7, Batch 94, MSE(pose): 0.4120
Epoch 7, Validation MSE(pose): 0.3473
Epoch 8 start training...
Epoch 8, Batch 94, MSE(pose): 0.8337
Epoch 8, Validation MSE(pose): 0.2682
Epoch 9 start training...
Epoch 9, Batch 94, MSE(pose): 0.4791
Epoch 9, Validation MSE(pose): 0.2468
E

# 9 Conclusions

Your conclusions, improvements, etc should go here

# 10 Code

In [2]:
import tensorflow as tf
import keras
tf.__version__




'2.15.0'