# CS4487 Project: Deep Fake Detection
#### Group20: Name: SHAN Jinyun, SID: 55670256, EID: jshan9

## Abstract
In this project, we use deep learning to distinguish fake faces from real faces. Creating fake faces is now cheap and easy for anyone, which poses a threat to public privacy and safety, so effective models for classifying fake faces are essential. After many trials, including but not limited to upsampling the dataset, different data augmentation methods, and different model structures, the model presented below performed best. The code and the analysis follow in this order:
1. Background
2. Environment Setting 
3. Data Preprocessing 
4. Model Construction 
5. Training Process
6. Testing
7. Conclusion 
8. Reference

## 1. Background
Fake faces can now be made with little effort and at a low cost in time and money. There are two main reasons for this: free access to large public databases, and rapid advances in deep learning techniques, particularly Generative Adversarial Networks (GANs). Face images are freely available both through large public datasets and on the Internet, which provides a large amount of training data. In addition, numerous deep learning techniques that perform very well at creating fake faces have been developed, and many of them are open resources that the general public can easily access. Therefore, developing models to recognize fake faces is critical for public safety and privacy.

Fake faces may be divided into four categories: entire face synthesis, identity swap, attribute manipulation, and expression swap. Face synthesis builds entirely new faces from nothing. Identity swap replaces the face of one person with the face of another and is commonly used in videos. Attribute manipulation modifies some attributes of the face, such as the hair style. Expression swap transfers the facial expression of one person onto another while keeping the original facial features.

In this project, the fake faces are generated by Deepfake and Face2Face. Our task is to distinguish fake faces from real faces, which is a binary classification problem.

## 2. Environment Setting

### 2.1. Importing all the packages required 

In [1]:
import os, glob
import torch
from torch.utils.data import Dataset
from torch.utils.data import DataLoader
from PIL import Image
import json
from torchvision import transforms
from torchvision import models
import numpy as np
import pandas as pd
import random
import math
import torch.utils.model_zoo as model_zoo
import torch.nn as nn
from torch.utils.tensorboard import SummaryWriter

### 2.2. Setting the GPU

In [2]:
os.environ['OMP_NUM_THREADS'] = '4'
os.environ['CUDA_VISIBLE_DEVICES'] = '0'
cuda = torch.cuda.is_available()

### 2.3 Setting the random seed
The random seed is fixed here so that the random state is consistent across all models. As a result, the outcomes of the models can be compared fairly.

Note that the exact same model should be reproducible by setting both the random seed and torch.backends.cudnn.deterministic = True in the same environment. To install the same environment, run the following command:

``conda env create --name fake-face-detection -f fake-face-detection.yml``

In [3]:
seed = 0
random.seed(seed)
np.random.seed(seed=seed)
torch.manual_seed(seed)
torch.cuda.manual_seed(seed)
torch.backends.cudnn.deterministic = True
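
As a small illustration of why fixing the seed matters, the sketch below shows that a shuffle driven by the same seed always produces the same order. It uses only the standard library and a local `random.Random` generator rather than the global state seeded above; the helper name `shuffled_indices` is hypothetical and not part of the project code.

```python
import random

def shuffled_indices(n, seed):
    """Return a shuffled list of 0..n-1 that depends only on the seed."""
    rng = random.Random(seed)  # independent generator, global state is untouched
    indices = list(range(n))
    rng.shuffle(indices)
    return indices

first = shuffled_indices(10, seed=0)
second = shuffled_indices(10, seed=0)

print(first == second)                   # True: same seed, same order
print(sorted(first) == list(range(10)))  # True: still a permutation
```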

## 3. Data Preprocessing

There are 4000 fake face images generated by Face2Face, 4000 fake face images generated by Deepfake, and 4000 real face images. With these three types of images, the ratio between fake faces and real faces is 2:1.

To keep the data balanced, I tried upsampling the real faces while splitting the dataset. However, upsampling may cause overfitting, and a 2:1 ratio is milder than the 3:1 ratio at which a dataset is typically considered unbalanced. In the trials, models without upsampling outperformed those with upsampling.

After splitting, the training set, validation set, and test set have a ratio of 8:1:1. We create the JSON files first, then the Dataset and the DataLoader.

### 3.1 Generating the json files
Three JSON files containing the image paths and labels are created in this part. Fake face images are labeled as 0, and real face images are labeled as 1. The training JSON file contains 9600 images, 3200 from each image type. The validation JSON file contains 1200 images, 400 from each image type. The testing JSON file also contains 1200 images, 400 from each image type.

#### 3.1.1  Getting image paths

In [4]:
img_dir1 = '/home/felix/disk1/sss/deepfake_detection/data/fake_deepfake'
img_dir2 = '/home/felix/disk1/sss/deepfake_detection/data/fake_face2face'
img_dir3 =  '/home/felix/disk1/sss/deepfake_detection/data/real'

imgs = sorted(glob.glob('{:s}/*.png'.format(img_dir1)))
imgs2 = sorted(glob.glob('{:s}/*.png'.format(img_dir2)))
imgs3 = sorted(glob.glob('{:s}/*.png'.format(img_dir3)))

#### 3.1.2  Generating random index

A shuffled index is used to build the training set, the validation set, and the test set by randomly selecting data from the original dataset. Because the random seed has already been fixed, the resulting datasets are identical for all models.

In [5]:
index = [i for i in range(0, 4000)]
random.shuffle(index)
train_index = index[0:3200]
valid_index = index[3200:3600]
test_index = index[3600:4000]
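
The split can be sanity-checked with a short sketch. It repeats the slicing above with only the standard library, then confirms that the three parts are disjoint and cover every index once. The multiplication by 3 reflects that the same indices are reused for each of the three image types.

```python
import random

random.seed(0)
index = list(range(4000))
random.shuffle(index)

train_index = index[:3200]
valid_index = index[3200:3600]
test_index = index[3600:]

# The three parts are disjoint and together cover every image index exactly once.
assert set(train_index).isdisjoint(valid_index)
assert set(train_index).isdisjoint(test_index)
assert set(valid_index).isdisjoint(test_index)
assert sorted(train_index + valid_index + test_index) == list(range(4000))

# The same indices are applied to all three image types, hence the factor 3.
sizes = {
    'train': 3 * len(train_index),
    'valid': 3 * len(valid_index),
    'test': 3 * len(test_index),
}
print(sizes)  # {'train': 9600, 'valid': 1200, 'test': 1200}
```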

#### 3.1.3 Adding image paths and labels into the corresponding json files
The image paths and labels are saved as dictionaries.

In [6]:
items = {}
items_valid = {}
items_test = {}
for i in range(4000):
    sample_imgs = imgs[i]
    label = 0
    if i in train_index:
        items[i] = {
            'img_paths': sample_imgs,
            'label': label
        }
    elif i in valid_index:
        items_valid[i] = {
            'img_paths': sample_imgs, 
            'label': label
        }
    else:
        items_test[i] = {
            'img_paths': sample_imgs, 
            'label': label
        }

train_index = [i + 4000 for i in train_index]
valid_index = [i + 4000 for i in valid_index]
test_index = [i + 4000 for i in test_index]
for i in range(4000, 8000):
    sample_imgs = imgs2[i-4000]
    label = 0
    if i in train_index:
        items[i] = {
            'img_paths': sample_imgs,
            'label': label
        }
    elif i in valid_index:
        items_valid[i] = {
            'img_paths': sample_imgs, 
            'label': label
        }
    else:
        items_test[i] = {
            'img_paths': sample_imgs, 
            'label': label
        }

train_index = [i + 4000 for i in train_index]
valid_index = [i + 4000 for i in valid_index]
test_index = [i + 4000 for i in test_index]
for i in range(8000, 12000):
    sample_imgs = imgs3[i-8000]
    label = 1
    if i in train_index:
        items[i] = {
            'img_paths': sample_imgs,
            'label': label
        }
    elif i in valid_index:
        items_valid[i] = {
            'img_paths': sample_imgs, 
            'label': label
        }
    else:
        items_test[i] = {
            'img_paths': sample_imgs, 
            'label': label
        }
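
The three loops above share the same logic, so they could also be written as one helper. The following is only a sketch under stated assumptions: the function name `build_items` is hypothetical, every image type is assumed to have the same number of images (as in this project), and the file names in the usage example are placeholders rather than the real dataset paths. Converting the index lists to sets makes each membership test constant-time.

```python
def build_items(path_lists, labels, train_index, valid_index):
    """Assign every image to a split using the same per-type indices.

    path_lists: one list of image paths per image type (equal lengths assumed)
    labels:     one integer label per image type (0 = fake, 1 = real)
    Indices that are in neither train_index nor valid_index go to the test split.
    """
    train_set, valid_set = set(train_index), set(valid_index)
    items, items_valid, items_test = {}, {}, {}
    for type_id, (paths, label) in enumerate(zip(path_lists, labels)):
        offset = type_id * len(paths)  # keeps dictionary keys unique across types
        for i, path in enumerate(paths):
            entry = {'img_paths': path, 'label': label}
            if i in train_set:
                items[offset + i] = entry
            elif i in valid_set:
                items_valid[offset + i] = entry
            else:
                items_test[offset + i] = entry
    return items, items_valid, items_test

# Toy usage with placeholder file names.
fake_a = ['a{}.png'.format(i) for i in range(5)]
fake_b = ['b{}.png'.format(i) for i in range(5)]
real = ['r{}.png'.format(i) for i in range(5)]
items, items_valid, items_test = build_items(
    [fake_a, fake_b, real], [0, 0, 1], train_index=[0, 1, 2], valid_index=[3])
print(len(items), len(items_valid), len(items_test))  # 9 3 3
```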

#### 3.1.4 Saving the json files

In [7]:
json_path_train = '/home/felix/disk1/sss/deepfake_detection/data/train_2-1.json'
json_path_valid = '/home/felix/disk1/sss/deepfake_detection/data/valid_2-1.json'
json_path_test = '/home/felix/disk1/sss/deepfake_detection/data/test_2-1.json'
with open(json_path_train, 'w') as f:
    json.dump(items, f, sort_keys=True, indent=4)
with open(json_path_valid, 'w') as f:
    json.dump(items_valid, f, sort_keys=True, indent=4)
with open(json_path_test, 'w') as f:
    json.dump(items_test, f, sort_keys=True, indent=4)

### 3.2 Constructing the Dataset
For data augmentation and overfitting prevention, the methods tried here include random cropping, random horizontal flip, color jitter, and normalization.

Randomly cropping an image to 224*224 not only augments the data but also matches the default input size of ResNet, which suits the default parameter settings.

Color jitter randomly changes the brightness, contrast, saturation, and hue of the images to augment the data and reduce overfitting on the training set. The brightness is the overall lightness or darkness of the image. The contrast is the degree of color or grayscale difference between image features. The saturation describes the intensity of the color, and the hue is the basic color as seen in the visible spectrum of a rainbow. The images are taken in different environments and of different people, so changing these properties is intended to improve generalization.

The trials show that the model trained with random cropping, random horizontal flip, and normalization performs best, so color jitter is left commented out in the code.

In [8]:
class Imageset(Dataset):
    def __init__(self, json_path, train = True):
        # Use a context manager so the file handle is closed after loading.
        with open(json_path) as f:
            data = json.load(f)

        self.items = []
        for item in data.values():
            term = {'img_paths': item['img_paths'], 'label': item['label']}  
            self.items.append(term)
    
        if train:
            self.transform = transforms.Compose([
                transforms.Resize(256),
                transforms.RandomCrop(224),
                transforms.RandomHorizontalFlip(),
                # transforms.RandomVerticalFlip(),
                # transforms.ColorJitter(brightness=0.2, contrast=0.2, saturation=0.1, hue=0.1),
                # transforms.RandomAffine(degrees=10, translate=None, scale=(1, 2), shear=15, resample=False, fillcolor=0),
                transforms.ToTensor(),
                transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
            ])
        else:
            self.transform = transforms.Compose([
                transforms.Resize(256),
                transforms.CenterCrop(224),
                transforms.ToTensor(),
                transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
            ])

    def __len__(self):
        return len(self.items)
    
    def __getitem__(self, idx):
        item = self.items[idx]
        img_paths = item['img_paths']
        label = item['label']
        img = Image.open(img_paths)
        img = self.transform(img)

        label = int(label)
        label = torch.LongTensor([label])[0]
        
        return {'imgs': img, 'label': label}

###  3.3 Constructing the Dataloader
The DataLoaders are built in this part from the Datasets constructed above. The batch size is 32 for the training data and 1 for the validation and test data. The training data are shuffled, so the images appear in a different order in each epoch, which helps the model generalize.

In [9]:
train_dataset = Imageset('/home/felix/disk1/sss/deepfake_detection/data/train_2-1.json', train=True)
train_dataloader = DataLoader(
    train_dataset,
    batch_size=32,
    num_workers=4,
    shuffle=True,
    pin_memory=False,
)

valid_dataset = Imageset('/home/felix/disk1/sss/deepfake_detection/data/valid_2-1.json', train=False)
valid_dataloader = DataLoader(
    valid_dataset,
    batch_size=1,
    num_workers=0,
    shuffle=False,
    pin_memory=False,
)

test_dataset = Imageset('/home/felix/disk1/sss/deepfake_detection/data/test_2-1.json', train=False)
test_dataloader = DataLoader(
    test_dataset,
    batch_size=1,
    num_workers=0,
    shuffle=False,
    pin_memory=False,
)
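
As a quick check of the loader settings, the number of batches per epoch follows from the dataset size and the batch size. The sketch below is plain arithmetic with the sizes stated in this section. The result of 300 training steps per epoch agrees with the step counter in the training log, which reaches 300 at the end of epoch 0.

```python
import math

n_train, train_batch_size = 9600, 32
n_valid, valid_batch_size = 1200, 1

# Without dropping the last batch, a loader yields ceil(N / batch_size) batches.
train_steps = math.ceil(n_train / train_batch_size)
valid_steps = math.ceil(n_valid / valid_batch_size)

print(train_steps)  # 300
print(valid_steps)  # 1200
```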

##  4. Model Construction
The base model in this project is ResNet50. ResNet was introduced to address the degradation problem of deep neural networks. Compared with plain networks such as VGG19, it adds residual learning: shortcut connections skip over groups of layers, so the input of a block is added to the output of its stacked layers to form the input of the following block. As a result, the stacked layers only need to learn the residual. The model can therefore converge faster and achieve better results.

The ResNet50 provided by PyTorch expects an input of size 224 and produces 1000 outputs. It has 4 stages (layer1 to layer4), and each stage contains several bottleneck blocks. Each bottleneck has 3 convolution layers, giving 48 convolution layers in the bottlenecks in total. For the classification task, a ReLU activation and a linear layer converting the 1000 outputs into 2 are appended after the original fully connected layer. Note that in the code below only conv1 through layer3 take the pretrained weights, while layer4 and the fully connected layer come from a second, randomly initialized ResNet50. The detailed structure is shown as follows.
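
The count of 48 convolution layers can be reproduced with simple arithmetic. This sketch assumes the standard ResNet50 layout of 3, 4, 6, and 3 bottleneck blocks in the four stages; it also shows the totals once the initial 7x7 convolution and the 1x1 downsample convolutions on the shortcuts are included.

```python
# Number of bottleneck blocks in layer1..layer4 of the standard ResNet50.
blocks_per_stage = [3, 4, 6, 3]
convs_per_bottleneck = 3  # 1x1, 3x3, 1x1

bottleneck_convs = sum(blocks_per_stage) * convs_per_bottleneck
stem_convs = 1                            # the initial 7x7 convolution (conv1)
downsample_convs = len(blocks_per_stage)  # one 1x1 shortcut projection per stage

print(bottleneck_convs)                                  # 48
print(bottleneck_convs + stem_convs)                     # 49
print(bottleneck_convs + stem_convs + downsample_convs)  # 53
```

Adding the fully connected layer to the 49 main-path convolutions gives the 50 weighted layers that the name ResNet50 refers to.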

In [10]:
class resnet(torch.nn.Module):
    def __init__(self, pretrained=False):
        super(resnet, self).__init__()
        # Pass the flag straight through instead of branching on it.
        model = models.resnet50(pretrained=pretrained)
        # model.conv1.in_channels = 3

        self.slice1 = torch.nn.Sequential()
        self.slice1.add_module(str(1), model.conv1)
        self.slice1.add_module(str(2), model.bn1)
        self.slice1.add_module(str(3), model.relu)
        self.slice1.add_module(str(4), model.maxpool)
        self.slice1.add_module(str(5), model.layer1)
        self.slice1.add_module(str(6), model.layer2)
        self.slice1.add_module(str(7), model.layer3)
        
        model_other = models.resnet50(pretrained = False)

        self.slice2 = model_other.layer4
        self.avgpool = model_other.avgpool

        self.classifier = torch.nn.Sequential()
        self.classifier.add_module(str(1), model_other.fc)
        self.classifier.add_module(str(2), torch.nn.ReLU(inplace=True))
        self.classifier.add_module(str(3), torch.nn.Linear(1000, 2))
        # self.classifier.add_module(str(3), torch.nn.Dropout())
        # self.classifier.add_module(str(4), torch.nn.Linear(1000, 2))

    def forward(self, x):
        x = self.slice1(x)
        x = self.slice2(x)
        x = self.avgpool(x)
        x = torch.flatten(x, 1)
        x = self.classifier(x)
        return x
resnet(pretrained = True)

resnet(
  (slice1): Sequential(
    (1): Conv2d(3, 64, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3), bias=False)
    (2): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (3): ReLU(inplace=True)
    (4): MaxPool2d(kernel_size=3, stride=2, padding=1, dilation=1, ceil_mode=False)
    (5): Sequential(
      (0): Bottleneck(
        (conv1): Conv2d(64, 64, kernel_size=(1, 1), stride=(1, 1), bias=False)
        (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
        (bn2): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (conv3): Conv2d(64, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
        (bn3): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (relu): ReLU(inplace=True)
        (downsample): Sequential(
          (0): Conv2d(64, 25

## 5. Training Process

### 5.1. Setting the tensorboard and preparing to save the experiments

In [11]:
tb_logger_dir = './tb_logger/'
exp_name = 'aug3_v2'
tb_logger = SummaryWriter(log_dir=tb_logger_dir + exp_name)

def log_and_print(logger, message):
    logger.write(message+'\n')
    logger.flush()
    print(message)

In [12]:
experiment_dir = os.path.join('./experiments', exp_name)
if not os.path.exists(experiment_dir):
    os.makedirs(experiment_dir)

logger = open(os.path.join(experiment_dir, 'train.log'), 'w')

### 5.2. Defining the hyperparameters
The optimizer applied here is Adaptive Moment Estimation (Adam), which combines root mean square propagation (RMSprop) with momentum and performs well for model training. The update rule is:
$$
\begin{array}{l}
m_{t}=\mu * m_{t-1}+(1-\mu) * g_{t} \\
n_{t}=\nu * n_{t-1}+(1-\nu) * g_{t}^{2} \\
\hat{m}_{t}=\frac{m_{t}}{1-\mu^{t}} \\
\hat{n}_{t}=\frac{n_{t}}{1-\nu^{t}} \\
\Delta \theta_{t}=-\frac{\hat{m}_{t}}{\sqrt{\hat{n}_{t}}+\epsilon} * \eta
\end{array}
$$
where $g_t$ is the gradient at step $t$, $m_t$ and $n_t$ are the exponentially decaying estimates of the first and second moments of the gradient, $\mu$ and $\nu$ are the corresponding exponential decay rates, $\hat{m}_{t}$ and $\hat{n}_{t}$ are the bias-corrected versions of $m_t$ and $n_t$, $\eta$ is the learning rate, and $\epsilon$ is a small constant for numerical stability.
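
To make the update rule concrete, here is a minimal sketch of one Adam step for a single scalar parameter, written directly from the formulas above. The function name `adam_step` and the toy objective are assumptions for illustration; the default values of `mu`, `nu`, and `eps` are the defaults of `torch.optim.Adam`, and `lr` is the initial learning rate used in this project.

```python
import math

def adam_step(theta, grad, m, n, t, lr=2e-4, mu=0.9, nu=0.999, eps=1e-8):
    """One Adam update for a scalar parameter; t is the 1-based step count."""
    m = mu * m + (1 - mu) * grad        # first-moment estimate m_t
    n = nu * n + (1 - nu) * grad ** 2   # second-moment estimate n_t
    m_hat = m / (1 - mu ** t)           # bias-corrected first moment
    n_hat = n / (1 - nu ** t)           # bias-corrected second moment
    theta = theta - lr * m_hat / (math.sqrt(n_hat) + eps)
    return theta, m, n

# Toy problem: minimise f(theta) = theta^2, whose gradient is 2 * theta.
theta, m, n = 1.0, 0.0, 0.0
for t in range(1, 4):
    theta, m, n = adam_step(theta, 2 * theta, m, n, t)
    print(t, theta)
```

On the first step the bias correction makes $\hat{m}_1 = g_1$ and $\hat{n}_1 = g_1^2$, so the parameter moves by almost exactly the learning rate regardless of the gradient scale.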

The criterion is the cross entropy loss between the model output and the target, which is commonly used for classification problems.

We also apply a learning rate scheduler to obtain a better model. The scheduler halves the learning rate (factor 0.5) when the validation loss has not improved for more than 10 epochs (the patience). The initial learning rate is $2*10^{-4}$, and the lower bound for the learning rate is $10^{-6}$. This gives faster convergence at the beginning with relatively large learning rates and a more precise model later with relatively small learning rates.
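
The behaviour of the scheduler can be sketched in a few lines of plain Python. This is a simplified stand-in, not the PyTorch implementation: the class name `PlateauSchedule` is hypothetical, and it ignores the improvement threshold and cooldown options of `ReduceLROnPlateau`, treating any strictly lower loss as an improvement.

```python
import math

class PlateauSchedule:
    """Halve the learning rate after more than `patience` epochs without improvement."""

    def __init__(self, lr, factor=0.5, patience=10, min_lr=1e-6):
        self.lr = lr
        self.factor = factor
        self.patience = patience
        self.min_lr = min_lr
        self.best = math.inf
        self.bad_epochs = 0

    def step(self, loss):
        if loss < self.best:
            self.best = loss
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        if self.bad_epochs > self.patience:
            self.lr = max(self.lr * self.factor, self.min_lr)  # never below min_lr
            self.bad_epochs = 0
        return self.lr

schedule = PlateauSchedule(lr=2e-4)
schedule.step(1.0)                                  # first epoch sets the best loss
history = [schedule.step(1.0) for _ in range(11)]   # 11 epochs without improvement
print(history[9], history[10])                      # 0.0002 0.0001
```

With a loss that never improves, repeated halving gives 2e-4, 1e-4, 5e-5, and so on down to 1.5625e-6, after which the next reduction is clipped to the lower bound of 1e-6.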

In [13]:
# define optimizer, scheduler, and loss
model = resnet(pretrained = True).cuda()
learning_rate = 2e-4
optimizer = torch.optim.Adam(model.parameters(), lr=learning_rate)
criterion = nn.CrossEntropyLoss()
lr_scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(optimizer, mode='min', factor=0.5, patience=10, min_lr=1e-6)
n_epochs = 200
current_step = 0
valid_loss_min = math.inf


### 5.3 Training the model
The model is trained with the training data and evaluated with the validation data. The best model is saved whenever the validation loss reaches a new minimum. The training loss and the validation loss, accuracy, precision, and recall are recorded in TensorBoard for evaluation.
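
The validation metrics are computed from the confusion-matrix counts, with real faces (label 1) as the positive class. The sketch below uses a hypothetical helper `classification_metrics` that also guards against division by zero, for example when the model predicts no positives at all. The counts in the usage example are not logged anywhere; they are inferred as one set of values consistent with 1200 validation images (400 real) and the epoch-0 line of the training log.

```python
def classification_metrics(tp, tn, fp, fn):
    """Return (accuracy, precision, recall) from confusion-matrix counts."""
    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0  # no positive predictions
    recall = tp / (tp + fn) if (tp + fn) else 0.0     # no positive samples
    return accuracy, precision, recall

# Inferred counts: 400 real faces (350 found, 50 missed), 800 fake faces (20 misjudged).
acc, prec, rec = classification_metrics(tp=350, tn=780, fp=20, fn=50)
print('{:.4f} {:.4f} {:.4f}'.format(acc, prec, rec))  # 0.9417 0.9459 0.8750
```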

In [14]:
for epoch in range(n_epochs):
    log_and_print(logger, 'Training epoch: {}. Learning rate: {:.4e}'.format(epoch, optimizer.param_groups[0]['lr']))
    train_loss = 0.0
    model.train()
    for i, item in enumerate(train_dataloader):
        data = item['imgs']
        target = item['label']
        data, target = data.cuda(), target.cuda()
        optimizer.zero_grad() # clear the gradients of all optimized variables
        output = model(data) # forward pass: compute predicted outputs by passing inputs to the model
        loss = criterion(output, target)
        loss.backward()
        optimizer.step()

        tb_logger.add_scalar('loss_train', loss.item(), current_step)
        current_step += 1
        if current_step % 100 == 0:
            log_and_print(logger, 'epoch = {:d} | current step = {:d} | train loss = {:.4f}'.format(epoch, current_step, loss.item()))
        train_loss += loss.item()

    train_loss /= len(train_dataloader)
    log_and_print(logger, 'Average train loss: {:.6f}\n'.format(train_loss))

    log_and_print(logger, 'Validating epoch: {}'.format(epoch))
    valid_loss = 0.0
    TP = 0
    TN = 0
    FP = 0
    FN = 0
    model.eval()
    with torch.no_grad():
        for i, item in enumerate(valid_dataloader):
            data = item['imgs']
            target = item['label']
            data, target = data.cuda(), target.cuda()
            output = model(data) # forward pass: compute predicted outputs by passing inputs to the model
            loss = criterion(output, target)

            label_true = target.item()
            label_pred = torch.argmax(output).item()
            if label_true == 1 and label_pred == 1:
                TP += 1
            elif label_true == 0 and label_pred == 0:
                TN += 1
            elif label_true == 0 and label_pred == 1:
                FP += 1
            elif label_true == 1 and label_pred == 0:
                FN += 1
            else:
                assert False, 'error label'
            if i % 500 == 0:
                log_and_print(logger, 'epoch = {:d} | i = {:d} | valid loss = {:.4f}'.format(epoch, i, loss.item()))
            valid_loss += loss.item()

    valid_loss /= len(valid_dataloader)
    lr_scheduler.step(valid_loss)
    accuracy = (TP + TN) / (TP + TN + FP + FN)
    precision = (TP) / (TP + FP)
    recall = (TP) / (TP + FN)
    tb_logger.add_scalar('loss_valid', valid_loss, epoch)
    tb_logger.add_scalar('accuracy_valid', accuracy, epoch)
    tb_logger.add_scalar('precision_valid', precision, epoch)
    tb_logger.add_scalar('recall_valid', recall, epoch)
    log_and_print(logger, 'Average valid loss: {:.6f} | Accuracy: {:.4f} | Recall: {:.4f} | Precision: {:.4f}'.format(valid_loss, accuracy, recall, precision))

    if valid_loss < valid_loss_min:
        valid_loss_min = valid_loss

        save_dict = {
            "epoch": epoch,
            "iter": current_step,
            "state_dict": model.state_dict(),
            "loss": valid_loss_min,
            "optimizer": optimizer.state_dict(),
            "lr_scheduler": lr_scheduler.state_dict(),
        }
        save_path = os.path.join(experiment_dir, 'checkpoint_best_loss.pth.tar')
        torch.save(save_dict, save_path)
        log_and_print(logger, 'best checkpoint saved!')

    if (epoch + 1) % 10 == 0:
        save_dict = {
            "epoch": epoch,
            "iter": current_step,
            "state_dict": model.state_dict(),
            "loss": valid_loss_min,
            "optimizer": optimizer.state_dict(),
            "lr_scheduler": lr_scheduler.state_dict(),
        }
        save_path = os.path.join(experiment_dir, 'checkpoint_epoch_{:03d}.pth.tar'.format(epoch + 1))
        torch.save(save_dict, save_path)
        log_and_print(logger, 'checkpoint at epoch {:03d} saved!'.format(epoch + 1))

    log_and_print(logger, '')

Training epoch: 0. Learning rate: 2.0000e-04
epoch = 0 | current step = 100 | train loss = 0.1845
epoch = 0 | current step = 200 | train loss = 0.1803
epoch = 0 | current step = 300 | train loss = 0.0238
Average train loss: 0.260384

Validating epoch: 0
epoch = 0 | i = 0 | valid loss = 0.0000
epoch = 0 | i = 500 | valid loss = 0.0000
epoch = 0 | i = 1000 | valid loss = 1.6779
Average valid loss: 0.158219 | Accuracy: 0.9417 | Recall: 0.8750 | Precision: 0.9459
best checkpoint saved!

Training epoch: 1. Learning rate: 2.0000e-04
epoch = 1 | current step = 400 | train loss = 0.0660
epoch = 1 | current step = 500 | train loss = 0.1375
epoch = 1 | current step = 600 | train loss = 0.1671
Average train loss: 0.119841

Validating epoch: 1
epoch = 1 | i = 0 | valid loss = 0.0000
epoch = 1 | i = 500 | valid loss = 0.0002
epoch = 1 | i = 1000 | valid loss = 0.0021
Average valid loss: 0.085271 | Accuracy: 0.9683 | Recall: 0.9500 | Precision: 0.9548
best checkpoint saved!

Training epoch: 2. Learn

epoch = 17 | current step = 5200 | train loss = 0.0088
epoch = 17 | current step = 5300 | train loss = 0.0155
epoch = 17 | current step = 5400 | train loss = 0.0353
Average train loss: 0.020009

Validating epoch: 17
epoch = 17 | i = 0 | valid loss = 0.0000
epoch = 17 | i = 500 | valid loss = 0.0000
epoch = 17 | i = 1000 | valid loss = 0.0017
Average valid loss: 0.075531 | Accuracy: 0.9767 | Recall: 0.9350 | Precision: 0.9947

Training epoch: 18. Learning rate: 1.0000e-04
epoch = 18 | current step = 5500 | train loss = 0.0008
epoch = 18 | current step = 5600 | train loss = 0.0002
epoch = 18 | current step = 5700 | train loss = 0.0021
Average train loss: 0.015632

Validating epoch: 18
epoch = 18 | i = 0 | valid loss = 0.0000
epoch = 18 | i = 500 | valid loss = 0.0000
epoch = 18 | i = 1000 | valid loss = 0.0000
Average valid loss: 0.036623 | Accuracy: 0.9883 | Recall: 0.9875 | Precision: 0.9777
best checkpoint saved!

Training epoch: 19. Learning rate: 1.0000e-04
epoch = 19 | current step

epoch = 34 | current step = 10300 | train loss = 0.1215
epoch = 34 | current step = 10400 | train loss = 0.0001
epoch = 34 | current step = 10500 | train loss = 0.0025
Average train loss: 0.011304

Validating epoch: 34
epoch = 34 | i = 0 | valid loss = 0.0000
epoch = 34 | i = 500 | valid loss = 0.0000
epoch = 34 | i = 1000 | valid loss = 0.0002
Average valid loss: 0.047619 | Accuracy: 0.9842 | Recall: 0.9800 | Precision: 0.9727

Training epoch: 35. Learning rate: 1.0000e-04
epoch = 35 | current step = 10600 | train loss = 0.0006
epoch = 35 | current step = 10700 | train loss = 0.0003
epoch = 35 | current step = 10800 | train loss = 0.0004
Average train loss: 0.010883

Validating epoch: 35
epoch = 35 | i = 0 | valid loss = 0.0000
epoch = 35 | i = 500 | valid loss = 0.0000
epoch = 35 | i = 1000 | valid loss = 0.0001
Average valid loss: 0.041684 | Accuracy: 0.9867 | Recall: 0.9825 | Precision: 0.9776

Training epoch: 36. Learning rate: 1.0000e-04
epoch = 36 | current step = 10900 | train 

epoch = 51 | current step = 15400 | train loss = 0.0004
epoch = 51 | current step = 15500 | train loss = 0.0000
epoch = 51 | current step = 15600 | train loss = 0.0001
Average train loss: 0.002819

Validating epoch: 51
epoch = 51 | i = 0 | valid loss = 0.0000
epoch = 51 | i = 500 | valid loss = 0.0000
epoch = 51 | i = 1000 | valid loss = 0.0000
Average valid loss: 0.040010 | Accuracy: 0.9883 | Recall: 0.9775 | Precision: 0.9874

Training epoch: 52. Learning rate: 5.0000e-05
epoch = 52 | current step = 15700 | train loss = 0.0008
epoch = 52 | current step = 15800 | train loss = 0.0314
epoch = 52 | current step = 15900 | train loss = 0.1170
Average train loss: 0.004944

Validating epoch: 52
epoch = 52 | i = 0 | valid loss = 0.0000
epoch = 52 | i = 500 | valid loss = 0.0000
epoch = 52 | i = 1000 | valid loss = 0.0000
Average valid loss: 0.048680 | Accuracy: 0.9858 | Recall: 0.9750 | Precision: 0.9824

Training epoch: 53. Learning rate: 5.0000e-05
epoch = 53 | current step = 16000 | train 

epoch = 68 | current step = 20600 | train loss = 0.0000
epoch = 68 | current step = 20700 | train loss = 0.0000
Average train loss: 0.000435

Validating epoch: 68
epoch = 68 | i = 0 | valid loss = 0.0000
epoch = 68 | i = 500 | valid loss = 0.0000
epoch = 68 | i = 1000 | valid loss = 0.0000
Average valid loss: 0.049642 | Accuracy: 0.9900 | Recall: 0.9775 | Precision: 0.9924

Training epoch: 69. Learning rate: 1.2500e-05
epoch = 69 | current step = 20800 | train loss = 0.0000
epoch = 69 | current step = 20900 | train loss = 0.0000
epoch = 69 | current step = 21000 | train loss = 0.0021
Average train loss: 0.000322

Validating epoch: 69
epoch = 69 | i = 0 | valid loss = 0.0000
epoch = 69 | i = 500 | valid loss = 0.0000
epoch = 69 | i = 1000 | valid loss = 0.0000
Average valid loss: 0.051609 | Accuracy: 0.9900 | Recall: 0.9850 | Precision: 0.9850
checkpoint at epoch 070 saved!

Training epoch: 70. Learning rate: 1.2500e-05
epoch = 70 | current step = 21100 | train loss = 0.0000
epoch = 70 

epoch = 85 | current step = 25700 | train loss = 0.0000
epoch = 85 | current step = 25800 | train loss = 0.0000
Average train loss: 0.000047

Validating epoch: 85
epoch = 85 | i = 0 | valid loss = 0.0000
epoch = 85 | i = 500 | valid loss = 0.0000
epoch = 85 | i = 1000 | valid loss = 0.0000
Average valid loss: 0.048405 | Accuracy: 0.9908 | Recall: 0.9850 | Precision: 0.9875

Training epoch: 86. Learning rate: 6.2500e-06
epoch = 86 | current step = 25900 | train loss = 0.0000
epoch = 86 | current step = 26000 | train loss = 0.0000
epoch = 86 | current step = 26100 | train loss = 0.0001
Average train loss: 0.000044

Validating epoch: 86
epoch = 86 | i = 0 | valid loss = 0.0000
epoch = 86 | i = 500 | valid loss = 0.0000
epoch = 86 | i = 1000 | valid loss = 0.0000
Average valid loss: 0.049528 | Accuracy: 0.9917 | Recall: 0.9850 | Precision: 0.9899

Training epoch: 87. Learning rate: 6.2500e-06
epoch = 87 | current step = 26200 | train loss = 0.0002
epoch = 87 | current step = 26300 | train 

epoch = 102 | current step = 30800 | train loss = 0.0000
epoch = 102 | current step = 30900 | train loss = 0.0000
Average train loss: 0.000031

Validating epoch: 102
epoch = 102 | i = 0 | valid loss = 0.0000
epoch = 102 | i = 500 | valid loss = 0.0000
epoch = 102 | i = 1000 | valid loss = 0.0000
Average valid loss: 0.056371 | Accuracy: 0.9900 | Recall: 0.9800 | Precision: 0.9899

Training epoch: 103. Learning rate: 1.5625e-06
epoch = 103 | current step = 31000 | train loss = 0.0000
epoch = 103 | current step = 31100 | train loss = 0.0000
epoch = 103 | current step = 31200 | train loss = 0.0000
Average train loss: 0.000192

Validating epoch: 103
epoch = 103 | i = 0 | valid loss = 0.0000
epoch = 103 | i = 500 | valid loss = 0.0000
epoch = 103 | i = 1000 | valid loss = 0.0000
Average valid loss: 0.056583 | Accuracy: 0.9900 | Recall: 0.9825 | Precision: 0.9874

Training epoch: 104. Learning rate: 1.5625e-06
epoch = 104 | current step = 31300 | train loss = 0.0000
epoch = 104 | current step

epoch = 119 | current step = 35800 | train loss = 0.0001
epoch = 119 | current step = 35900 | train loss = 0.0000
epoch = 119 | current step = 36000 | train loss = 0.0000
Average train loss: 0.000033

Validating epoch: 119
epoch = 119 | i = 0 | valid loss = 0.0000
epoch = 119 | i = 500 | valid loss = 0.0000
epoch = 119 | i = 1000 | valid loss = 0.0000
Average valid loss: 0.057899 | Accuracy: 0.9900 | Recall: 0.9800 | Precision: 0.9899
checkpoint at epoch 120 saved!

Training epoch: 120. Learning rate: 1.0000e-06
epoch = 120 | current step = 36100 | train loss = 0.0000
epoch = 120 | current step = 36200 | train loss = 0.0000
epoch = 120 | current step = 36300 | train loss = 0.0000
Average train loss: 0.000017

Validating epoch: 120
epoch = 120 | i = 0 | valid loss = 0.0000
epoch = 120 | i = 500 | valid loss = 0.0000
epoch = 120 | i = 1000 | valid loss = 0.0000
Average valid loss: 0.053695 | Accuracy: 0.9900 | Recall: 0.9825 | Precision: 0.9874

Training epoch: 121. Learning rate: 1.0000

Average valid loss: 0.061692 | Accuracy: 0.9908 | Recall: 0.9800 | Precision: 0.9924

Training epoch: 136. Learning rate: 1.0000e-06
epoch = 136 | current step = 40900 | train loss = 0.0000
epoch = 136 | current step = 41000 | train loss = 0.0000
epoch = 136 | current step = 41100 | train loss = 0.0000
Average train loss: 0.000012

Validating epoch: 136
epoch = 136 | i = 0 | valid loss = 0.0000
epoch = 136 | i = 500 | valid loss = 0.0000
epoch = 136 | i = 1000 | valid loss = 0.0000
Average valid loss: 0.057278 | Accuracy: 0.9892 | Recall: 0.9800 | Precision: 0.9874

Training epoch: 137. Learning rate: 1.0000e-06
epoch = 137 | current step = 41200 | train loss = 0.0000
epoch = 137 | current step = 41300 | train loss = 0.0000
epoch = 137 | current step = 41400 | train loss = 0.0000
Average train loss: 0.000041

Validating epoch: 137
epoch = 137 | i = 0 | valid loss = 0.0000
epoch = 137 | i = 500 | valid loss = 0.0000
epoch = 137 | i = 1000 | valid loss = 0.0000
Average valid loss: 0.0625

epoch = 152 | i = 500 | valid loss = 0.0000
epoch = 152 | i = 1000 | valid loss = 0.0000
Average valid loss: 0.068233 | Accuracy: 0.9917 | Recall: 0.9825 | Precision: 0.9924

Training epoch: 153. Learning rate: 1.0000e-06
epoch = 153 | current step = 46000 | train loss = 0.0000
epoch = 153 | current step = 46100 | train loss = 0.0000
epoch = 153 | current step = 46200 | train loss = 0.0000
Average train loss: 0.000025

Validating epoch: 153
epoch = 153 | i = 0 | valid loss = 0.0000
epoch = 153 | i = 500 | valid loss = 0.0000
epoch = 153 | i = 1000 | valid loss = 0.0000
Average valid loss: 0.057524 | Accuracy: 0.9917 | Recall: 0.9850 | Precision: 0.9899

Training epoch: 154. Learning rate: 1.0000e-06
epoch = 154 | current step = 46300 | train loss = 0.0000
epoch = 154 | current step = 46400 | train loss = 0.0000
epoch = 154 | current step = 46500 | train loss = 0.0000
Average train loss: 0.000005

Validating epoch: 154
epoch = 154 | i = 0 | valid loss = 0.0000
epoch = 154 | i = 500 | va

epoch = 169 | i = 500 | valid loss = 0.0000
epoch = 169 | i = 1000 | valid loss = 0.0000
Average valid loss: 0.062878 | Accuracy: 0.9908 | Recall: 0.9800 | Precision: 0.9924
checkpoint at epoch 170 saved!

Training epoch: 170. Learning rate: 1.0000e-06
epoch = 170 | current step = 51100 | train loss = 0.0000
epoch = 170 | current step = 51200 | train loss = 0.0000
epoch = 170 | current step = 51300 | train loss = 0.0000
Average train loss: 0.000004

Validating epoch: 170
epoch = 170 | i = 0 | valid loss = 0.0000
epoch = 170 | i = 500 | valid loss = 0.0000
epoch = 170 | i = 1000 | valid loss = 0.0000
Average valid loss: 0.061231 | Accuracy: 0.9908 | Recall: 0.9850 | Precision: 0.9875

Training epoch: 171. Learning rate: 1.0000e-06
epoch = 171 | current step = 51400 | train loss = 0.0000
epoch = 171 | current step = 51500 | train loss = 0.0000
epoch = 171 | current step = 51600 | train loss = 0.0000
Average train loss: 0.000023

Validating epoch: 171
epoch = 171 | i = 0 | valid loss = 0.

epoch = 186 | i = 500 | valid loss = 0.0000
epoch = 186 | i = 1000 | valid loss = 0.0000
Average valid loss: 0.067315 | Accuracy: 0.9917 | Recall: 0.9825 | Precision: 0.9924

Training epoch: 187. Learning rate: 1.0000e-06
epoch = 187 | current step = 56200 | train loss = 0.0000
epoch = 187 | current step = 56300 | train loss = 0.0000
epoch = 187 | current step = 56400 | train loss = 0.0000
Average train loss: 0.000005

Validating epoch: 187
epoch = 187 | i = 0 | valid loss = 0.0000
epoch = 187 | i = 500 | valid loss = 0.0000
epoch = 187 | i = 1000 | valid loss = 0.0000
Average valid loss: 0.061629 | Accuracy: 0.9917 | Recall: 0.9825 | Precision: 0.9924

Training epoch: 188. Learning rate: 1.0000e-06
epoch = 188 | current step = 56500 | train loss = 0.0000
epoch = 188 | current step = 56600 | train loss = 0.0000
epoch = 188 | current step = 56700 | train loss = 0.0000
Average train loss: 0.000006

Validating epoch: 188
epoch = 188 | i = 0 | valid loss = 0.0000
epoch = 188 | i = 500 | va

## 6. Testing

We evaluate two models on the held-out validation and test sets: the checkpoint with the lowest validation loss (denoted as model_best) and the checkpoint from the last epoch, epoch 200 (denoted as model_last).

In [15]:
def test(img_path_list, true_class_list, path):
    from PIL import Image
    import torch
    from torchvision import models
    from torchvision import transforms

    class resnet(torch.nn.Module):
        def __init__(self, pretrained=False):
            super(resnet, self).__init__()
            # build the backbone, loading ImageNet weights only when requested
            model = models.resnet50(pretrained=pretrained)

            self.slice1 = torch.nn.Sequential()
            self.slice1.add_module(str(1), model.conv1)
            self.slice1.add_module(str(2), model.bn1)
            self.slice1.add_module(str(3), model.relu)
            self.slice1.add_module(str(4), model.maxpool)
            self.slice1.add_module(str(5), model.layer1)
            self.slice1.add_module(str(6), model.layer2)
            self.slice1.add_module(str(7), model.layer3)
            
            model_other = models.resnet50(pretrained = False)

            self.slice2 = model_other.layer4
            self.avgpool = model_other.avgpool

            self.classifier = torch.nn.Sequential()
            self.classifier.add_module(str(1), model_other.fc)
            self.classifier.add_module(str(2), torch.nn.ReLU(inplace=True))
            self.classifier.add_module(str(3), torch.nn.Linear(1000, 2))

        def forward(self, x):
            x = self.slice1(x)
            x = self.slice2(x)
            x = self.avgpool(x)
            x = torch.flatten(x, 1)
            x = self.classifier(x)
            return x

    # load the model
    model = resnet(pretrained=False)
    checkpoint = torch.load(path)
    model.load_state_dict(checkpoint['state_dict'])
    model = model.cuda()
    model.eval()

    # prepare data
    transform = transforms.Compose([
                    transforms.Resize(256),
                    transforms.CenterCrop(224),
                    transforms.ToTensor(),
                    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
                ])

    TP, TN, FP, FN = 0, 0, 0, 0
    for img_path, true_class in zip(img_path_list, true_class_list):
        # force 3 channels so grayscale or RGBA files match the 3-channel normalization
        img = Image.open(img_path).convert('RGB')
        img = transform(img).unsqueeze(0)
        img = img.cuda()
        with torch.no_grad():
            output = model(img)
        pred_class = torch.argmax(output).item()

        if true_class == 1 and pred_class == 1:
            TP += 1
        elif true_class == 0 and pred_class == 0:
            TN += 1
        elif true_class == 0 and pred_class == 1:
            FP += 1
        elif true_class == 1 and pred_class == 0:
            FN += 1

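    # guard each ratio so an empty class or prediction set does not raise ZeroDivisionError
    total = TP + TN + FP + FN
    accuracy = (TP + TN) / total if total else 0.0
    precision = TP / (TP + FP) if (TP + FP) else 0.0
    recall = TP / (TP + FN) if (TP + FN) else 0.0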
    print('TP = {:d}, TN = {:d}, FP = {:d}, FN = {:d}'.format(TP, TN, FP, FN))

    return accuracy, recall, precision

In [16]:
json_path_valid = '/home/felix/disk1/sss/deepfake_detection/data/valid_2-1.json'
with open(json_path_valid) as f:
    data = json.load(f)
img_path_list = [value['img_paths'] for value in data.values()]
true_class_list = [value['label'] for value in data.values()]

# validation for model_best
print('validation results for model_best:')
model_best_path = '/home/felix/disk1/sss/deepfake_detection/script/experiments/aug3_v2/checkpoint_best_loss.pth.tar'
accuracy, recall, precision = test(img_path_list, true_class_list, model_best_path)
print('accuracy = {:.6f}, recall = {:.6f}, precision = {:.6f}\n'.format(accuracy, recall, precision))

# validation for model_last
print('validation results for model_last:')
model_last_epoch = '/home/felix/disk1/sss/deepfake_detection/script/experiments/aug3_v2/checkpoint_epoch_200.pth.tar'
accuracy, recall, precision = test(img_path_list, true_class_list, model_last_epoch)
print('accuracy = {:.6f}, recall = {:.6f}, precision = {:.6f}'.format(accuracy, recall, precision))

validation results for model_best:
TP = 395, TN = 797, FP = 3, FN = 5
accuracy = 0.993333, recall = 0.987500, precision = 0.992462

validation results for model_last:
TP = 393, TN = 797, FP = 3, FN = 7
accuracy = 0.991667, recall = 0.982500, precision = 0.992424
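
The loading code in this cell implies that each split file is a JSON object whose values hold an `img_paths` string and an integer `label`. The sketch below illustrates that layout under this assumption; the keys (`"0"`, `"1"`) and the image file names are made up for illustration and do not come from the project data.

```python
import json

# Hypothetical split file contents: keys and image paths are invented for illustration.
example_json = '''
{
  "0": {"img_paths": "images/sample_a.jpg", "label": 0},
  "1": {"img_paths": "images/sample_b.jpg", "label": 1}
}
'''

data = json.loads(example_json)

# Same list comprehensions as in the evaluation cells.
img_path_list = [value['img_paths'] for value in data.values()]
true_class_list = [value['label'] for value in data.values()]

print(img_path_list)
print(true_class_list)
```

The two lists stay aligned because both iterate over `data.values()` in the same order, which is what allows `test` to pair them with `zip`.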


In [17]:
json_path_test = '/home/felix/disk1/sss/deepfake_detection/data/test_2-1.json'
with open(json_path_test) as f:
    data = json.load(f)
img_path_list = [value['img_paths'] for value in data.values()]
true_class_list = [value['label'] for value in data.values()]

# testing for model_best
print('testing results for model_best:')
model_best_path = '/home/felix/disk1/sss/deepfake_detection/script/experiments/aug3_v2/checkpoint_best_loss.pth.tar'
accuracy, recall, precision = test(img_path_list, true_class_list, model_best_path)
print('accuracy = {:.6f}, recall = {:.6f}, precision = {:.6f}\n'.format(accuracy, recall, precision))

# testing for model_last
print('testing results for model_last:')
model_last_epoch = '/home/felix/disk1/sss/deepfake_detection/script/experiments/aug3_v2/checkpoint_epoch_200.pth.tar'
accuracy, recall, precision = test(img_path_list, true_class_list, model_last_epoch)
print('accuracy = {:.6f}, recall = {:.6f}, precision = {:.6f}'.format(accuracy, recall, precision))

testing results for model_best:
TP = 399, TN = 794, FP = 6, FN = 1
accuracy = 0.994167, recall = 0.997500, precision = 0.985185

testing results for model_last:
TP = 399, TN = 797, FP = 3, FN = 1
accuracy = 0.996667, recall = 0.997500, precision = 0.992537


## 7. Conclusion

After experimenting with different data-splitting methods, data augmentation methods, and model structures, we validated and tested two models (model_best and model_last) on the held-out validation and test sets, respectively.

The corresponding performance on the validation set is as follows (TP: true positive, TN: true negative, FP: false positive, FN: false negative):

validation results for model_best:
* TP = 395, TN = 797, FP = 3, FN = 5
* accuracy = 0.993333, recall = 0.987500, precision = 0.992462

validation results for model_last:
* TP = 393, TN = 797, FP = 3, FN = 7
* accuracy = 0.991667, recall = 0.982500, precision = 0.992424

The corresponding performance on the test set is as follows:

testing results for model_best:
* TP = 399, TN = 794, FP = 6, FN = 1
* accuracy = 0.994167, recall = 0.997500, precision = 0.985185

testing results for model_last:
* TP = 399, TN = 797, FP = 3, FN = 1
* accuracy = 0.996667, recall = 0.997500, precision = 0.992537
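
The reported accuracy, recall, and precision follow directly from the confusion counts. As a minimal sketch, the helper below (the name `metrics` is ours, not part of the project code) recomputes them from the counts listed above, so the figures can be checked independently of the model.

```python
def metrics(tp, tn, fp, fn):
    """Return (accuracy, recall, precision) from confusion-matrix counts."""
    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total if total else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    return accuracy, recall, precision


# Counts (TP, TN, FP, FN) copied from the results listed above.
reported = {
    "validation / model_best": (395, 797, 3, 5),
    "validation / model_last": (393, 797, 3, 7),
    "test / model_best": (399, 794, 6, 1),
    "test / model_last": (399, 797, 3, 1),
}

for name, counts in reported.items():
    acc, rec, prec = metrics(*counts)
    print(f"{name}: accuracy = {acc:.6f}, recall = {rec:.6f}, precision = {prec:.6f}")
```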

The two models perform similarly on the test set: model_last has slightly better accuracy and precision than model_best, and their recall is identical. The test performance of our model is very good and consistent with the validation performance. Beyond these metrics, we select the model from the last epoch, epoch 200 (model_last), as our final model for the following two reasons:

1. The model at the last epoch (epoch 200) has been trained much longer.
2. The model with the best validation loss may have overfit the validation set at an early epoch.

So, we decide to submit model_last as our final model.
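
To gauge how close the two test accuracies are, one option is to compare confidence intervals for each model. The sketch below uses the Wilson score interval, which is our choice for illustration and not part of the original experiments; `wilson_interval` is a hypothetical helper name, and `z = 1.96` assumes a roughly 95% level.

```python
import math


def wilson_interval(correct, total, z=1.96):
    """Wilson score interval for a proportion (z = 1.96 gives roughly 95%)."""
    p = correct / total
    denom = 1 + z**2 / total
    center = (p + z**2 / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z**2 / (4 * total**2)) / denom
    return center - half, center + half


# Correct predictions on the test set: TP + TN out of 1200 images.
best_lo, best_hi = wilson_interval(399 + 794, 1200)  # model_best
last_lo, last_hi = wilson_interval(399 + 797, 1200)  # model_last

print(f"model_best: [{best_lo:.4f}, {best_hi:.4f}]")
print(f"model_last: [{last_lo:.4f}, {last_hi:.4f}]")
print("intervals overlap:", last_lo <= best_hi and best_lo <= last_hi)
```

With these counts the two intervals overlap, so the three-image gap between the models is within what sampling variation on 1,200 test images could produce. This is only a rough indication, not a formal significance test, since both models are scored on the same images.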

## 8. Reference
1. K. Simonyan and A. Zisserman, “Very Deep Convolutional Networks for Large-Scale Image Recognition,” 3rd International Conference on Learning Representations (ICLR), 2015. https://arxiv.org/abs/1409.1556
2. K. He, X. Zhang, S. Ren and J. Sun, “Deep Residual Learning for Image Recognition,” 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016, pp. 770-778, doi: 10.1109/CVPR.2016.90.
3. R. Tolosana, R. Vera-Rodriguez, J. Fierrez, A. Morales and J. Ortega-Garcia, “DeepFakes and Beyond: A Survey of Face Manipulation and Fake Detection,” Information Fusion, vol. 64, pp. 131-148, 2020.