# Introduction

## Foreword

This notebook contains Chris's iterations of neural network (NN) models.

Model scores are appended to `../../Results/model_eval.csv`.

## Problem Statement

The rapid evolution of generative artificial intelligence (GPAI, LLMs) and social media has greatly increased the public’s access to powerful, deceptive tools. One resulting concern is the increasing prevalence of deepfake images, which pose a significant threat to public trust and undermine the epistemic integrity of visual media (Source).

These manipulated images can be used to spread false information, manipulate public opinion, and polarize communities, with serious consequences for both social and political discourse. In this project, we aim to develop a machine learning model that distinguishes deepfakes from real images, to help combat the spread of manipulated visual media and protect the integrity of social discourse.

## Imports, Global Variables, and Helper Functions

In [1]:
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from functools import partial
import sys

from tensorflow.keras.utils import set_random_seed
from tensorflow.keras.preprocessing.image import img_to_array, load_img
from tensorflow.keras.layers import Dense, Dropout, Flatten, Conv2D, MaxPooling2D, BatchNormalization, Input, Rescaling
from tensorflow.keras.models import Sequential
from tensorflow.keras.callbacks import EarlyStopping, ReduceLROnPlateau
from tensorflow.keras.preprocessing import image_dataset_from_directory
from tensorflow.keras.losses import BinaryCrossentropy
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.metrics import BinaryAccuracy, AUC, Precision, Recall, TrueNegatives, TruePositives, FalsePositives, FalseNegatives

In [2]:
# Importing global variables
sys.path.append('../../Helper')
import config

# Setting random value
set_random_seed(config.random_seed_value)
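
The `config` module lives outside this notebook, so its contents are not shown here. As a rough sketch only, the column names it supplies later (`config.column_names`) can be reconstructed from the header of the scores table printed in the Saving Files section. The names `SPLITS` and `METRICS` are illustrative assumptions and are not taken from the real module.

```python
# Hypothetical reconstruction of config.column_names; the real module may differ.
SPLITS = ("train", "val")
METRICS = ("loss", "acc", "precision", "recall", "auc", "fn", "fp", "tn", "tp")

# One column per (split, metric) pair, train_* first and val_* second
column_names = [f"{split}_{metric}" for split in SPLITS for metric in METRICS]
print(column_names)
```

Generating the names this way keeps the train and validation columns in the same metric order, which matters because the scores row is later filled in by position.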

## Data Loading

In [11]:
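# Extracting image dimensions from a sample image.
# img_to_array returns (height, width, channels), so `w` actually holds the
# height and `h` the width. Passing (w, h) as image_size below is still
# correct, because Keras expects (height, width).
image_shape = img_to_array(load_img("../../../Data/Train/Real/real_1.jpg")).shape
w = image_shape[0]
h = image_shape[1]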

In [12]:
# Setting filepaths to image data
train_directory = "../../../Data/Train"
validation_directory = "../../../Data/Validation/"
test_directory = "../../../Data/Test/"
directories = [train_directory, validation_directory, test_directory]

In [13]:
# Loading data (train, validation, test) with identical settings
mega_data = []
for directory in directories:
    mega_data.append(image_dataset_from_directory(
        directory=directory,
        image_size=(w, h),
        batch_size=64,
        seed=config.random_seed_value,
        label_mode='binary'
    ))

Found 140002 files belonging to 2 classes.
Found 39428 files belonging to 2 classes.
Found 10905 files belonging to 2 classes.
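
The file counts above determine how many batches each dataset yields per epoch. The sketch below works that out with plain arithmetic, assuming the final partial batch is kept rather than dropped. The helper name `batch_summary` is illustrative only.

```python
import math

BATCH_SIZE = 64  # same value passed to image_dataset_from_directory above

# File counts reported in the loader output
file_counts = {"train": 140002, "validation": 39428, "test": 10905}

def batch_summary(n_files, batch_size=BATCH_SIZE):
    """Return (number of batches, size of the final batch)."""
    n_batches = math.ceil(n_files / batch_size)
    last_batch = n_files - (n_batches - 1) * batch_size
    return n_batches, last_batch

for split, n_files in file_counts.items():
    n_batches, last_batch = batch_summary(n_files)
    print(f"{split}: {n_batches} batches, last batch holds {last_batch} images")
```

Under that assumption, the training set gives 2188 steps per epoch, which is the step count to expect in the Keras progress bar during fitting.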


In [14]:
# Assigning each loaded dataset to a named variable
train_ds = mega_data[0]
val_ds = mega_data[1]
test_ds = mega_data[2]

# Modeling

## Baseline Model (Sequential)

### Preprocessing (Topology + Compiling)

In [19]:
# instantiate
model = Sequential()

# input layer
model.add(Input(shape=(w, h, 3)))
model.add(Rescaling(1./255))
model.add(BatchNormalization())

# convolutional layers
model.add(Dropout(0.3))
model.add(Conv2D(32, (3,3), activation="relu"))
model.add(MaxPooling2D(pool_size=(2,2)))

model.add(Dropout(0.3))
model.add(Conv2D(8, (3,3), activation="relu"))
model.add(MaxPooling2D(pool_size=(2,2)))

model.add(Flatten())

# hidden layers
model.add(Dropout(0.3))
model.add(Dense(32, activation="relu"))
model.add(BatchNormalization())

model.add(Dropout(0.3))
model.add(Dense(8, activation="relu"))
model.add(BatchNormalization())

# output layer
model.add(Dense(1, activation="sigmoid"))

# compile
model.compile(
    optimizer=Adam(learning_rate=0.005),
    loss=BinaryCrossentropy(),
    metrics=config.standard_metrics
)
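
To get a feel for the size of this topology, the sketch below traces the spatial dimensions through the two convolution and pooling stages and tallies the parameters layer by layer. It assumes the Keras defaults used above (`'valid'` padding, stride 1 for `Conv2D`, pool stride equal to pool size). The 256x256 input is an assumed example size, not the actual image size of this dataset, and the helper names are illustrative.

```python
def conv_out(size, kernel=3):
    """Spatial size after a 'valid' convolution with stride 1."""
    return size - kernel + 1

def pool_out(size, pool=2):
    """Spatial size after max pooling with stride equal to the pool size."""
    return size // pool

def baseline_summary(height, width, channels=3):
    """Return (flattened feature count, total parameter count).

    The total includes the non-trainable moving statistics of each
    BatchNormalization layer (4 values per feature).
    """
    params = 4 * channels                      # BatchNormalization on the input
    h, w = conv_out(height), conv_out(width)
    params += 3 * 3 * channels * 32 + 32       # Conv2D(32, (3,3))
    h, w = pool_out(h), pool_out(w)
    h, w = conv_out(h), conv_out(w)
    params += 3 * 3 * 32 * 8 + 8               # Conv2D(8, (3,3))
    h, w = pool_out(h), pool_out(w)
    flat = h * w * 8                           # Flatten
    params += flat * 32 + 32 + 4 * 32          # Dense(32) + BatchNormalization
    params += 32 * 8 + 8 + 4 * 8               # Dense(8) + BatchNormalization
    params += 8 * 1 + 1                        # Dense(1)
    return flat, params

# 256x256 is an assumed input size for illustration only
flat, params = baseline_summary(256, 256)
print(flat, params)
```

Almost all of the parameters sit in the first `Dense` layer, because it connects every flattened feature to 32 units. `model.summary()` gives the authoritative figures for the real input size.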

### Fitting

In [20]:
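# es = EarlyStopping(patience=3)
# Cut the learning rate to 20% of its value after 5 epochs without
# improvement in val_loss, with a floor of 0.0005
reduce_lr = ReduceLROnPlateau(monitor='val_loss', factor=0.2,
                              patience=5, min_lr=0.0005)
res = model.fit(train_ds,
                validation_data=val_ds,
                epochs=20,
                callbacks=[reduce_lr])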

Epoch 1/20
Epoch 2/20
Epoch 3/20
Epoch 4/20
Epoch 5/20
Epoch 6/20
Epoch 7/20
Epoch 8/20
Epoch 9/20
Epoch 10/20
Epoch 11/20
Epoch 12/20
Epoch 13/20
Epoch 14/20
Epoch 15/20
Epoch 16/20
Epoch 17/20
Epoch 18/20
Epoch 19/20
Epoch 20/20
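
The effect of the `ReduceLROnPlateau` settings can be sketched without TensorFlow. The function below is a simplified imitation of the callback's rule (no cooldown, default `min_delta` of 1e-4), not the Keras implementation itself. The validation losses fed to it are made up for illustration, since the per-epoch values of this run are not shown.

```python
def simulate_reduce_lr(val_losses, lr=0.005, factor=0.2, patience=5,
                       min_lr=0.0005, min_delta=1e-4):
    """Return the learning rate in effect after each epoch."""
    best = float("inf")
    wait = 0
    history = []
    for loss in val_losses:
        if loss < best - min_delta:
            # val_loss improved: remember it and reset the patience counter
            best = loss
            wait = 0
        else:
            wait += 1
            if wait >= patience and lr > min_lr:
                # plateau reached: shrink the rate, but never below min_lr
                lr = max(lr * factor, min_lr)
                wait = 0
        history.append(lr)
    return history

# Made-up losses: one good epoch followed by ten epochs without improvement
history = simulate_reduce_lr([0.50] + [0.60] * 10)
print(history)
```

With these settings the rate can only take three values: 0.005, then 0.001, then the floor of 0.0005. The training log quoted in the next section reports `lr: 1.0000e-03`, which is consistent with exactly one reduction from the starting rate.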


### Saving Files

In [28]:
# Metrics copied by hand from the training log
'''
loss: 0.2020 - binary_accuracy: 0.9176 - auc: 0.9747 - precision: 0.9100 - recall: 0.9270 - true_negatives: 63583.0000 - true_positives: 64889.0000 - false_positives: 6418.0000 - false_negatives: 5112.0000
val_loss: 0.4216 - val_binary_accuracy: 0.8216 - val_auc: 0.9169 - val_precision: 0.7723 - val_recall: 0.9139 - val_true_negatives: 14310.0000 - val_true_positives: 18083.0000 - val_false_positives: 5331.0000 - val_false_negatives: 1704.0000 - lr: 1.0000e-03
'''
# Same values, reordered to match config.column_names (train_* first, then val_*)
data = [[0.2020, 0.9176, 0.9100, 0.9270, 0.9747, 5112, 6418, 63583, 64889, 0.4216, 0.8216, 0.7723, 0.9139, 0.9169, 1704, 5331, 14310, 18083]]
df = pd.DataFrame(data=np.array(data), columns=config.column_names, index=["cnn_baseline"])
df

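| Unnamed: 0 | train_loss | train_acc | train_precision | train_recall | train_auc | train_fn | train_fp | train_tn | train_tp | val_loss | val_acc | val_precision | val_recall | val_auc | val_fn | val_fp | val_tn | val_tp |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| cnn_baseline | 0.202 | 0.9176 | 0.91 | 0.927 | 0.9747 | 5112.0 | 6418.0 | 63583.0 | 64889.0 | 0.4216 | 0.8216 | 0.7723 | 0.9139 | 0.9169 | 1704.0 | 5331.0 | 14310.0 | 18083.0 |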
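
Because the row above was typed in by hand, it is worth checking that the rates agree with the confusion counts. The sketch below recomputes accuracy, precision, and recall from the four counts for each split. The helper name `metrics_from_counts` is illustrative.

```python
def metrics_from_counts(tp, tn, fp, fn):
    """Derive the standard rates from binary confusion-matrix counts."""
    total = tp + tn + fp + fn
    return {
        "total": total,
        "accuracy": (tp + tn) / total,
        "precision": tp / (tp + fp),
        "recall": tp / (tp + fn),
    }

# Counts taken from the training log quoted above
train = metrics_from_counts(tp=64889, tn=63583, fp=6418, fn=5112)
val = metrics_from_counts(tp=18083, tn=14310, fp=5331, fn=1704)

for name, m in (("train", train), ("val", val)):
    print(name, m["total"],
          round(m["accuracy"], 4), round(m["precision"], 4), round(m["recall"], 4))
```

The recomputed values match the logged ones to four decimals, and the totals (140002 and 39428) equal the file counts reported when the datasets were loaded.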


In [29]:
test = pd.read_csv("../../Results/model_eval.csv")
test

| Unnamed: 0.1 | Unnamed: 0 | train_loss | train_acc | train_precision | train_recall | train_auc | train_fn | train_fp | train_tn | train_tp | val_loss | val_acc | val_precision | val_recall | val_auc | val_fn | val_fp | val_tn | val_tp |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | models | | | | | | | | | | | | | | | | | | |
| 1 | model_1 | 0.046418 | 0.983007 | 0.981008 | 0.985086 | 0.998207 | 1044.0 | 1335.0 | 68666.0 | 68957.0 | 0.728149 | 0.859973 | 0.848666 | 0.877445 | 0.918686 | 2425.0 | 3096.0 | 16545.0 | 17362.0 |
| 2 | efficientnetv2-b0_retrain | 0.003119 | 0.977746 | 0.99427 | 0.963431 | 0.993228 | 86257.0 | 89180.0 | 3385.0 | 608.0 | 0.119601 | 0.965126 | 0.991854 | 0.949819 | 0.982413 | 18614.0 | 19439.0 | 1027.0 | 348.0 |
| 3 | cnn_reid | 0.052528 | 0.978279 | 0.998641 | 0.963815 | 0.993872 | 67389.0 | 69572.0 | 2612.0 | 429.0 | 0.367717 | 0.902252 | 0.965261 | 0.853517 | 0.972052 | 16340.0 | 19234.0 | 3301.0 | 553.0 |


In [30]:
df.to_csv("../../Results/model_eval.csv", mode="a", header=False)

In [31]:
test = pd.read_csv("../../Results/model_eval.csv")
test

| Unnamed: 0.1 | Unnamed: 0 | train_loss | train_acc | train_precision | train_recall | train_auc | train_fn | train_fp | train_tn | train_tp | val_loss | val_acc | val_precision | val_recall | val_auc | val_fn | val_fp | val_tn | val_tp |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | models | | | | | | | | | | | | | | | | | | |
| 1 | model_1 | 0.046418 | 0.983007 | 0.981008 | 0.985086 | 0.998207 | 1044.0 | 1335.0 | 68666.0 | 68957.0 | 0.728149 | 0.859973 | 0.848666 | 0.877445 | 0.918686 | 2425.0 | 3096.0 | 16545.0 | 17362.0 |
| 2 | efficientnetv2-b0_retrain | 0.003119 | 0.977746 | 0.99427 | 0.963431 | 0.993228 | 86257.0 | 89180.0 | 3385.0 | 608.0 | 0.119601 | 0.965126 | 0.991854 | 0.949819 | 0.982413 | 18614.0 | 19439.0 | 1027.0 | 348.0 |
| 3 | cnn_reid | 0.052528 | 0.978279 | 0.998641 | 0.963815 | 0.993872 | 67389.0 | 69572.0 | 2612.0 | 429.0 | 0.367717 | 0.902252 | 0.965261 | 0.853517 | 0.972052 | 16340.0 | 19234.0 | 3301.0 | 553.0 |
| 4 | cnn_baseline | 0.202 | 0.9176 | 0.91 | 0.927 | 0.9747 | 5112.0 | 6418.0 | 63583.0 | 64889.0 | 0.4216 | 0.8216 | 0.7723 | 0.9139 | 0.9169 | 1704.0 | 5331.0 | 14310.0 | 18083.0 |
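
The `Unnamed: 0` column in the table above is what `pd.read_csv` produces when a CSV's first column has no header name. One possible way to avoid it, sketched below on a temporary file rather than the real results file, is to give the index a label when writing and to read it back with `index_col`. The helper names, the `model` label, and the demo file name are assumptions made for illustration. The two `val_acc` values are taken from the table above.

```python
import os
import tempfile

import pandas as pd

def append_scores(row, path):
    """Append `row` to the CSV at `path`, writing the header only once."""
    write_header = not os.path.exists(path)
    row.to_csv(path, mode="a", header=write_header, index_label="model")

def load_scores(path):
    """Read the scores back with the model name as the index."""
    return pd.read_csv(path, index_col="model")

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "scores_demo.csv")  # throwaway file for the demo
    append_scores(pd.DataFrame({"val_acc": [0.8216]}, index=["cnn_baseline"]), path)
    append_scores(pd.DataFrame({"val_acc": [0.859973]}, index=["model_1"]), path)
    scores = load_scores(path)

print(scores)
```

Writing the header only when the file does not yet exist keeps later appends from repeating it. Naming the index column means the model names come back as the index instead of an unnamed data column.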
