# 1 Author

**Student Name**:  Liang Zheyu   
**Student ID**:  210977800



# 2 Problem formulation

Describe the machine learning problem that you want to solve and explain what's interesting about it.
- 
Using the genki4k dataset, the goal is to build a machine learning pipeline that takes an image as input and predicts whether the person in the image is female or male. I manually labeled all 4,000 images in the dataset as male or female.

# 3 Machine Learning pipeline

Describe your ML pipeline. Clearly identify its input and output, any intermediate stages (for instance, transformation -> models), and intermediate data moving from one stage to the next. It's up to you to decide which stages to include in your pipeline. 
- 
The input of the pipeline is an image. The output is a prediction of whether the person in the image is male or female. There are two intermediate stages: the transformation stage converts the raw image into a tensor in the format the model expects, and the model stage is trained on those tensors and produces the prediction.   
   
Preprocess dataset => Train the model => Evaluate the model.  

**Classification:**   
The input of the classification model is the transformed image tensor. The output is the predicted sex, encoded as a class index: 1 for male and 0 for female.
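
As an illustration of how a two-class output can be turned into a 0/1 prediction, here is a minimal sketch in plain Python. The logits are made up, and `CLASS_NAMES`, `softmax` and `decode` are hypothetical helpers, not part of the project code. The mapping of 0 to female is an assumption based on the label file, where 1 marks male.

```python
import math

# Hypothetical label names; the label file encodes 1 as male, so 0 is assumed to mean female.
CLASS_NAMES = {0: "female", 1: "male"}

def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def decode(logits):
    """Return (class index, class name, probability) of the highest-scoring class."""
    probs = softmax(logits)
    idx = max(range(len(probs)), key=probs.__getitem__)
    return idx, CLASS_NAMES[idx], probs[idx]

idx, name, prob = decode([0.3, 1.7])  # made-up logits for one image
print(idx, name, round(prob, 3))
```

The class index is simply the position of the largest logit; the softmax step is only needed when a probability is wanted alongside the label.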

In [1]:
import torch
import torch.nn as nn
import torchvision
import torchvision.transforms as transforms
from torch.utils.data import DataLoader, Dataset, random_split
from sklearn.metrics import accuracy_score, mean_squared_error
import os
from PIL import Image
import datetime

# Parameters -------------------------------
epochs = 10
isSaveModel = False
# Parameters -------------------------------

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
# device = torch.device("cpu")

# 4 Transformation stage

Describe any transformations, such as feature extraction. Identify input and output. Explain why you have chosen this transformation stage.
- 
The transformation stage converts each raw image into the format the model expects. Its input is an image file loaded from disk together with its label; its output is a normalized 3x224x224 tensor.   
   
Consistency: Transforming all images into the same RGB format and size ensures a consistent input format.   
   
Data Augmentation: Introducing random horizontal flipping during training enhances data diversity, aiding the model in better generalizing to various samples.   
   
Normalization: Standardizing the input accelerates the training process and assists the model in handling pixel values from different ranges.   
   
The selection of this transformation stage aims to ensure the consistency, diversity, and stability of input data, thereby improving the model's performance.   
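
To make the flipping and normalization steps concrete, here is a small worked example in plain Python on a made-up 1x3 image. The mean and standard deviation values are the ones used in this project's `Normalize` step; `normalize_pixel` and `hflip` are hypothetical helpers that only mimic the arithmetic, not the torchvision implementation.

```python
# Channel statistics, the same values passed to transforms.Normalize in this project.
MEAN = [0.485, 0.456, 0.406]
STD = [0.229, 0.224, 0.225]

def normalize_pixel(rgb):
    """Scale an 8-bit RGB pixel to [0, 1], then standardize each channel."""
    return [(value / 255.0 - m) / s for value, m, s in zip(rgb, MEAN, STD)]

def hflip(row):
    """Mirror one row of pixels left to right."""
    return row[::-1]

row = [(255, 0, 0), (0, 255, 0), (0, 0, 255)]  # made-up image row: red, green, blue
flipped = hflip(row)
normalized = [normalize_pixel(pixel) for pixel in flipped]
print(flipped)
print([round(channel, 3) for channel in normalized[0]])
```

After normalization the values are no longer in [0, 1]; a fully saturated channel maps to a value above 2 and an empty channel to a value below -2.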

# 5 Modelling

Describe the ML model(s) that you will build. Explain why you have chosen them.
- 
This code defines a class named AlexNet2, which inherits from the nn.Module class in PyTorch.

In the constructor of this class, the pre-trained AlexNet model is loaded and assigned to self.alexnet. Then, by iterating through all the parameters of the model and setting their requires_grad attribute to False, all layers of the pre-trained model are frozen. This means that during training, the weights of these layers will not be updated.

Next, it reads the number of input features of the last layer of the original AlexNet classifier (a fully connected layer) and replaces that layer with a small trainable head: three fully connected layers with 512, 128 and 2 output units, with ReLU activations and dropout between them. The final layer has 2 outputs, which makes the model a binary classifier.

In the forward method, the input data x is passed to self.alexnet for forward propagation and the output result is returned.

The reason for choosing AlexNet is that its pre-trained model has been trained on a large amount of image data and can extract effective features. Moreover, by freezing the pre-trained layers and replacing the last fully connected layer, it can easily be adapted to a specific task. Compared with the other models I tried, AlexNet also trained faster and performed better.
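
Because the pre-trained layers are frozen, only the replacement head is trained. The following sketch counts the trainable parameters of the head defined in the `AlexNet2` class in this section (fully connected layers of width 512, 128 and 2). It assumes the last AlexNet classifier layer in torchvision takes 4096 input features; `linear_params` is a hypothetical helper.

```python
def linear_params(in_features, out_features):
    """Weights plus biases of one fully connected layer."""
    return in_features * out_features + out_features

# (in_features, out_features) of each Linear layer in the new head.
# 4096 is assumed to be the input width of AlexNet's final classifier layer.
head = [(4096, 512), (512, 128), (128, 2)]
trainable = sum(linear_params(i, o) for i, o in head)
print(trainable)  # 2163586
```

ReLU and dropout layers add no parameters, so roughly 2.2 million weights are updated during training while the rest of the network stays fixed.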

In [2]:
class AlexNet2(nn.Module):
    def __init__(self):
        super(AlexNet2, self).__init__()
        self.alexnet = torchvision.models.alexnet(pretrained=True)
        # Freeze all layers of the pre-trained model
        for param in self.alexnet.parameters():
            param.requires_grad = False
        num_ftrs = self.alexnet.classifier[6].in_features
        # Replace the last fully connected layer with a new trainable head
        self.alexnet.classifier[6] = nn.Sequential(
            nn.Linear(num_ftrs, 512),
            nn.ReLU(),
            nn.Dropout(),
            nn.Linear(512, 128),
            nn.ReLU(),
            nn.Dropout(),
            nn.Linear(128, 2)
        )

    def forward(self, x):
        x = self.alexnet(x)
        return x



# 6 Methodology

Describe how you will train and validate your models, how model performance is assessed (i.e. accuracy, confusion matrix, etc)
- 
Training: The model is trained over a specified number of epochs. In each epoch, the model is set to training mode and the dataset is looped over using a DataLoader (train_loader). For each batch of data, the inputs and labels are moved to the selected device (the GPU if one is available, otherwise the CPU). The optimizer's gradients are zeroed out, and a forward pass is performed to get the model's outputs. The loss between the outputs and the labels is calculated using a classification loss function (cla_loss). The loss is then backpropagated through the network, and the optimizer performs a step to update the model's parameters. The accuracy of the model on each training batch is calculated by comparing the model's predictions (obtained by taking the argmax of the outputs) with the true labels.
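
The classification loss used here is cross-entropy. As a minimal sketch of what that loss computes for a batch, the following plain-Python example works directly from logits. The logits and labels are made up, and `cross_entropy` is a hypothetical helper that mirrors the formula rather than PyTorch's implementation.

```python
import math

def cross_entropy(logits, target):
    """Negative log-probability of the target class under softmax(logits)."""
    peak = max(logits)  # log-sum-exp trick for numerical stability
    log_sum_exp = peak + math.log(sum(math.exp(z - peak) for z in logits))
    return log_sum_exp - logits[target]

# Made-up logits for a batch of three images and their true labels.
batch_logits = [[2.0, 0.5], [0.1, 0.3], [-1.0, 1.5]]
batch_labels = [0, 0, 1]

losses = [cross_entropy(z, y) for z, y in zip(batch_logits, batch_labels)]
batch_loss = sum(losses) / len(losses)  # mean over the batch, the default reduction of nn.CrossEntropyLoss
print([round(loss, 4) for loss in losses], round(batch_loss, 4))
```

The second sample is slightly misclassified (the wrong class has the larger logit), so its loss is higher than the loss of the two confidently correct samples.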

Validation: After each epoch of training, the model is set to evaluation mode and validated on a validation set. The process is similar to the training loop, but no backpropagation or parameter updating is performed. The validation loss and accuracy are calculated and printed.
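
One detail worth noting: the validation loop averages the per-batch accuracies, which gives a smaller final batch the same weight as a full one. The sketch below compares that with a sample-weighted accuracy. The counts are made up, and both functions are hypothetical helpers; the batch layout assumes 400 validation images at a batch size of 32.

```python
def batch_mean_accuracy(batches):
    """Average of per-batch accuracies, as the validation loop computes it."""
    return sum(correct / size for correct, size in batches) / len(batches)

def sample_weighted_accuracy(batches):
    """Total correct predictions divided by total samples."""
    return sum(c for c, _ in batches) / sum(s for _, s in batches)

# Made-up (correct, size) pairs: 12 full batches of 32 and a final batch of 16.
batches = [(28, 32)] * 12 + [(10, 16)]
print(round(batch_mean_accuracy(batches), 4))
print(round(sample_weighted_accuracy(batches), 4))
```

The two numbers differ only when the last batch is smaller than the others and has a different accuracy; the difference is usually small but can matter when comparing close results.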

Testing: After all epochs, the model is evaluated on a test set. The process is the same as the validation loop. The test loss and accuracy are calculated and printed.

The performance of the model is assessed primarily through the loss and accuracy. The loss provides a measure of how well the model's predictions match the true labels, while the accuracy provides a measure of the proportion of inputs that the model correctly classified.
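
Accuracy alone does not show which class the errors fall on. A confusion matrix is one way to see that; the sketch below builds one in plain Python from made-up labels and predictions. `confusion_matrix` here is a hypothetical helper, not the scikit-learn function of the same name, and it is not part of the project code.

```python
def confusion_matrix(y_true, y_pred, num_classes=2):
    """matrix[t][p] counts samples with true class t that were predicted as class p."""
    matrix = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(y_true, y_pred):
        matrix[t][p] += 1
    return matrix

# Made-up labels and predictions (0 = female, 1 = male).
y_true = [0, 0, 0, 1, 1, 1, 1, 0]
y_pred = [0, 1, 0, 1, 1, 0, 1, 0]

cm = confusion_matrix(y_true, y_pred)
accuracy = (cm[0][0] + cm[1][1]) / len(y_true)
recall_per_class = [cm[c][c] / sum(cm[c]) for c in range(2)]
print(cm, accuracy, recall_per_class)
```

The diagonal holds the correct predictions, so accuracy is the diagonal sum divided by the number of samples, and each row gives the recall for one class.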

In [3]:
cla_loss = nn.CrossEntropyLoss()
def TrainFunc2(model):
    cla_loss = nn.CrossEntropyLoss()
    optimizer = torch.optim.Adam(model.parameters(), lr=0.005, weight_decay=0.0001)
    scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=5, gamma=0.5) 

    for epoch in range(epochs):  # loop over the dataset multiple times
        model.train()
        print("Epoch", epoch+1, "start training...")
        for i, data in enumerate(train_loader, 0):
            inputs, labels_gender = data[0].to(device), data[1].to(device)
            optimizer.zero_grad()
            outputs_gender = model(inputs)
            loss_cla = cla_loss(outputs_gender, labels_gender.long())
            loss_cla.backward()
            optimizer.step()
            
            # Compute the accuracy on this batch
            preds_gender = torch.argmax(outputs_gender, dim=1)
            acc_gender = torch.eq(preds_gender, labels_gender).sum().item() / labels_gender.size(0)
            print(f'\rEpoch {epoch+1}, Batch {i+1}, Loss(gender): {loss_cla.item():.4f}, Accuracy(gender): {acc_gender:.4f}',end='')
        
        scheduler.step()
        # Evaluate on the validation set
        model.eval()
        val_loss_cla = 0
        val_acc_gender = 0
        with torch.no_grad():
            for i, data in enumerate(val_loader, 0):
                inputs, labels_gender = data[0].to(device), data[1].to(device)
                outputs_gender = model(inputs)
                loss_cla = cla_loss(outputs_gender, labels_gender.long())
                val_loss_cla += loss_cla.item()
                preds_gender = torch.argmax(outputs_gender, dim=1)
                val_acc_gender += accuracy_score(labels_gender.cpu(), preds_gender.cpu())
        val_loss_cla /= len(val_loader)
        val_acc_gender /= len(val_loader)
        print(f'\nEpoch {epoch+1}, Validation Loss(gender): {val_loss_cla:.4f}, Validation Accuracy(gender): {val_acc_gender:.4f}')

    # Evaluate the model on the test set
    model.eval()
    test_loss_cla = 0
    test_acc_gender = 0
    with torch.no_grad():
        for i, data in enumerate(test_loader, 0):
            inputs, labels_gender= data[0].to(device), data[1].to(device)
            outputs_gender = model(inputs)
            loss_cla = cla_loss(outputs_gender, labels_gender.long())
            test_loss_cla += loss_cla.item()
            preds_gender = torch.argmax(outputs_gender, dim=1)
            test_acc_gender += accuracy_score(labels_gender.cpu(), preds_gender.cpu())
    test_loss_cla /= len(test_loader)
    test_acc_gender /= len(test_loader)
    print(f'\nTest Loss(gender): {test_loss_cla:.4f}, Test Accuracy(gender): {test_acc_gender:.4f}')
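
The training function starts the optimizer at a learning rate of 0.005 and uses a StepLR scheduler with step_size=5 and gamma=0.5. The following sketch shows the learning rate that this kind of schedule implies for each of the 10 epochs, written as a closed-form formula instead of a call into PyTorch; `step_lr` is a hypothetical helper.

```python
def step_lr(base_lr, epoch, step_size=5, gamma=0.5):
    """Learning rate in effect during a 0-indexed epoch under a step-decay schedule."""
    return base_lr * gamma ** (epoch // step_size)

schedule = [step_lr(0.005, epoch) for epoch in range(10)]
print(schedule)
```

With these settings the rate stays at 0.005 for the first five epochs and is halved to 0.0025 for the remaining five.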



# 7 Dataset

Describe the dataset that you will use to create your models and validate them. If you need to preprocess it, do it here. Include visualisations too. You can visualise raw data samples or extracted features.
- 
The dataset used in this project is the genki4k dataset. It contains 4,000 face images of men and women. The dataset is randomly split into a training set, a validation set, and a test set in an 80/10/10 ratio, giving 3,200, 400, and 400 images respectively. Preprocessing converts all images to RGB, resizes them to 224x224, applies random horizontal flipping for augmentation, and normalizes each channel.
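
As a quick check of the split arithmetic, this sketch applies the 0.8/0.1/0.1 ratios used by the splitting code in this section to 4,000 images and derives the number of training batches per epoch at a batch size of 32. It assumes the label file contains exactly 4,000 entries.

```python
import math

total_size = 4000  # assumed number of labeled images
train_size = int(0.8 * total_size)
val_size = int(0.1 * total_size)
test_size = total_size - train_size - val_size  # the remainder goes to the test set

batch_size = 32
# A DataLoader keeps the last partial batch by default, hence the ceiling.
batches_per_epoch = math.ceil(train_size / batch_size)
print(train_size, val_size, test_size, batches_per_epoch)  # 3200 400 400 100
```

The result of 100 training batches per epoch matches the batch counter printed in the training log.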

In [4]:
dataset_path = './genki4k/'
labels = []
with open(os.path.join(dataset_path, 'labels-sex.txt'), 'r') as file:
    lines = file.readlines()
    for line in lines:
        items = line.split()
        label = [int(items[0])]  # the first field is an integer label; 1 means male
        labels.append(label)

class Genki4kDataset(Dataset):
    def __init__(self, img_dir, labels, transform=None):
        self.img_dir = img_dir
        self.labels = labels
        self.transform = transform

    def __len__(self):
        return len(self.labels)

    def __getitem__(self, idx):
        img_name = os.path.join(self.img_dir, 'files', f'file{idx+1:04}.jpg')  # image files are named file0001.jpg, file0002.jpg, ...
        image = Image.open(img_name)
        label_gender = self.labels[idx][0]  # gender label for this image

        if self.transform:
            image = self.transform(image)

        return image, label_gender

transform = transforms.Compose([
    transforms.Lambda(lambda image: image.convert('RGB')),  # convert every image to RGB
    transforms.Resize((224, 224)),  # resize every image to 224x224
    transforms.RandomHorizontalFlip(p=0.5),  # random flip; applied to all splits because one transform is shared
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])  # normalize each channel
])

dataset = Genki4kDataset(dataset_path, labels, transform)
total_size = len(dataset)  # total number of images
train_size = int(0.8 * total_size)  # training set
val_size = int(0.1 * total_size)    # validation set
test_size = total_size - train_size - val_size  # the remainder is the test set
train_dataset, val_dataset, test_dataset = random_split(dataset, [train_size, val_size, test_size])

train_loader = DataLoader(train_dataset, batch_size=32, shuffle=True)
val_loader = DataLoader(val_dataset, batch_size=32, shuffle=True)
test_loader = DataLoader(test_dataset, batch_size=32, shuffle=True)

# 8 Results

Carry out your experiments here, explain your results.

In [5]:
model4 = AlexNet2().to(device)
print("Training model4...")
TrainFunc2(model4)



Training model4...
Epoch 1 start training...
Epoch 1, Batch 100, Loss(gender): 0.4264, Accuracy(gender): 0.7812
Epoch 1, Validation Loss(gender): 0.4225, Validation Accuracy(gender): 0.8221
Epoch 2 start training...
Epoch 2, Batch 100, Loss(gender): 0.5373, Accuracy(gender): 0.8125
Epoch 2, Validation Loss(gender): 0.3832, Validation Accuracy(gender): 0.8413
Epoch 3 start training...
Epoch 3, Batch 100, Loss(gender): 0.5744, Accuracy(gender): 0.8438
Epoch 3, Validation Loss(gender): 0.3699, Validation Accuracy(gender): 0.8678
Epoch 4 start training...
Epoch 4, Batch 100, Loss(gender): 0.5686, Accuracy(gender): 0.6875
Epoch 4, Validation Loss(gender): 0.3732, Validation Accuracy(gender): 0.8606
Epoch 5 start training...
Epoch 5, Batch 100, Loss(gender): 0.5782, Accuracy(gender): 0.7500
Epoch 5, Validation Loss(gender): 0.4072, Validation Accuracy(gender): 0.8606
Epoch 6 start training...
Epoch 6, Batch 100, Loss(gender): 0.4142, Accuracy(gender): 0.8750
Epoch 6, Validation Loss(gender):

# 9 Conclusions

Your conclusions, improvements, etc should go here
- 
The model performs reasonably well, reaching an accuracy of 0.84 on the test set and 0.87 on the validation set. The gap between the two may indicate some overfitting, and because the validation and test sets are both small, each accuracy estimate is noisy. Larger validation and test sets would give more reliable estimates. To improve the model, we can try a deeper pre-trained network such as ResNet or VGG, try different optimizers such as SGD or RMSprop, or tune the learning rate and other hyperparameters. We can also experiment with additional data augmentation, such as random rotation and cropping alongside the horizontal flipping already used, to improve generalization.
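
To get a rough sense of how much of the gap between 0.84 and 0.87 could be sampling noise, here is a sketch that computes a normal-approximation 95% interval for each accuracy. It assumes 400 images in each held-out split (the 0.1 ratio applied to 4,000 images) and treats predictions as independent; `accuracy_interval` is a hypothetical helper.

```python
import math

def accuracy_interval(acc, n, z=1.96):
    """Normal-approximation 95% interval for an accuracy measured on n samples."""
    half_width = z * math.sqrt(acc * (1 - acc) / n)
    return acc - half_width, acc + half_width

# 400 is the assumed size of each held-out split.
test_low, test_high = accuracy_interval(0.84, 400)
val_low, val_high = accuracy_interval(0.87, 400)
overlap = test_high >= val_low  # True when the two intervals share some range
print(round(test_low, 3), round(test_high, 3))
print(round(val_low, 3), round(val_high, 3))
print(overlap)
```

Under these assumptions the two intervals overlap, so the difference between validation and test accuracy is not strong evidence of a real gap on its own.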

# 10 Github
https://github.com/Guest-Liang/Mini-Project