In [1]:
import os
import util
import numpy as np
import pandas as pd
import tensorflow as tf
import keras
from keras.models import Sequential
from keras.layers import Input, TimeDistributed, GRU, Dense, Dropout, Lambda, GlobalAveragePooling2D, BatchNormalization
from keras.callbacks import EarlyStopping
from keras.optimizers import Adam
from sklearn.metrics import accuracy_score

Working plan - Since it's seen that MobileNetV2 + GRU gave an adequately good result, I want to try the architecture (the best alone or the top 5) on some variations:
1. on the same videos decomposed into more number of frames (I did 16, try 32 and 64)
2. on a different dataset (I used DFD, try CelebDF)
3. using xceptionnet or some other pretrained model (so adjust image sizes accordingly)

Make generic functions so that any data, any number of frames, and any pretrained model can be used. Save all the best ones.

In [None]:
base_dir = r'data'
data_dir = os.path.join(base_dir,'CelebDF')

In [3]:
seed = 42
np.random.seed(seed)
tf.random.set_seed(seed)

# Convert Video to Frames & Split Data

In [4]:
X_train, X_val, X_test, y_train, y_val, y_test = util.load_split_3d_data(data)

In [5]:
X_train.shape, X_test.shape, X_val.shape

((280, 16, 224, 224, 3), (60, 16, 224, 224, 3), (60, 16, 224, 224, 3))

In [6]:
num_frames, img_size = X_train.shape[1], X_train.shape[2:4]
print(num_frames, img_size)

16 (224, 224)


In [7]:
y_train.shape, y_test.shape, y_val.shape

((280,), (60,), (60,))

In [8]:
num_train, num_val, num_test = X_train.shape[0], X_val.shape[0], X_test.shape[0]
num_train, num_val, num_test

(280, 60, 60)

## Factorizing Target

In [9]:
labels = pd.factorize(y_val)[1]
print(labels)

['real' 'fake']


In [10]:
y_train, y_val, y_test = pd.factorize(y_train)[0], pd.factorize(y_val)[0], pd.factorize(y_test)[0]

In [11]:
y_train.shape, y_test.shape, y_val.shape

((280,), (60,), (60,))

# MobileNetV2 & GRU

In [12]:
base_cnn = keras.applications.MobileNetV2(
    input_shape=img_size+(3,),
    include_top=False,
    weights='imagenet')
# unfreezing a few of the last layers
base_cnn.trainable = True
for layer in base_cnn.layers[:-30]:
    layer.trainable = False

In [13]:
# adding data augmentation for make model more robust
data_augmentation = Sequential([
    keras.layers.RandomFlip("horizontal"),
    keras.layers.RandomRotation(0.1),
    keras.layers.RandomZoom(0.1),
    keras.layers.RandomBrightness(0.2),
    keras.layers.RandomContrast(0.2)
])

In [14]:
(num_frames,)+img_size+(3,)

(16, 224, 224, 3)

In [15]:
model = Sequential()
model.add(Input(shape=(num_frames,)+img_size+(3,)))
# Using TimeDistributed to apply Augmentation, MobileNetV2 Preprocessing, & MobileNetV2 CNN frame-by-frame
model.add(TimeDistributed(data_augmentation))
model.add(TimeDistributed(Lambda(keras.applications.mobilenet_v2.preprocess_input)))
model.add(TimeDistributed(base_cnn))
model.add(TimeDistributed(GlobalAveragePooling2D()))
model.add(GRU(256, return_sequences=False))
model.add(Dropout(0.5))
model.add(Dense(128, activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(1, activation='sigmoid'))
model.summary()




In [16]:
model.compile(
    optimizer=Adam(learning_rate=1e-5),
    loss='binary_crossentropy',
    metrics=['accuracy'])

In [15]:
estop = EarlyStopping(monitor='val_loss', mode='min',
                      min_delta=1e-5, patience=5,
                      restore_best_weights=True, verbose=1)
model.fit(X_train, y_train, 
           validation_data=(X_val, y_val),
           epochs=500, batch_size=batch_size,
           callbacks=[estop], verbose=1)

Epoch 1/500
[1m9/9[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m177s[0m 10s/step - accuracy: 0.4536 - loss: 0.9709 - val_accuracy: 0.5333 - val_loss: 0.7297
Epoch 2/500
[1m9/9[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m61s[0m 7s/step - accuracy: 0.5321 - loss: 0.8538 - val_accuracy: 0.5167 - val_loss: 0.7160
Epoch 3/500
[1m9/9[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m61s[0m 7s/step - accuracy: 0.5214 - loss: 0.8434 - val_accuracy: 0.5500 - val_loss: 0.7121
Epoch 4/500
[1m9/9[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m62s[0m 7s/step - accuracy: 0.4821 - loss: 0.8644 - val_accuracy: 0.5000 - val_loss: 0.7065
Epoch 5/500
[1m9/9[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m61s[0m 7s/step - accuracy: 0.5357 - loss: 0.8430 - val_accuracy: 0.4667 - val_loss: 0.7049
Epoch 6/500
[1m9/9[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m60s[0m 7s/step - accuracy: 0.5143 - loss: 0.8350 - val_accuracy: 0.4667 - val_loss: 0.7067
Epoch 7/500
[1m9/9[0m [32m━━━━━━━━━━━━━━━

<keras.src.callbacks.history.History at 0x19f43734110>

In [16]:
test_loss, test_accuracy = model.evaluate(X_test, y_test, verbose=0)
print("Test Accuracy:", test_accuracy)
print("Test Loss:", test_loss)

Test Accuracy: 0.4833333194255829
Test Loss: 0.7482662796974182


In [20]:
model.save('artifacts/mobilenetv2_gru.keras')

**Didn't perform as good as the majority voting technique. The reason is the small size of the dataset (just 400 videos in total). Deep Neural Networks particularly one that includes some type of RNNs will require several hundred thousands of samples to train and generalize well.**

* We know that the MobileNetV2 model is able to extract features from the frames, because it is already trained on ImageNet (1.2M images), hence, it knows how to see.
* So, instead of training that base model again and again and wasting the 400 videos teaching it basic vision, we can simply obtain those features as 'embeddings'.Then these embeddings can be passed to the GRU or any other model for the final classification task.
* The use of this is that we can add regularization and try tuning without having to include the training of base-CNN as well, reducing the training time and number of learnable parameters.
* Each embedding vector will be like a semantic summary of the video frames. So, feeding those instead of raw pixels, should make it easier for the classifier to learn, even with little data.

# Embeddings & GRU

In [12]:
def get_embeddings(X_frames, batch_size, num_data, num_frames):
    base_cnn = keras.applications.MobileNetV2(
        input_shape=img_size+(3,),
        include_top=False,
        weights='imagenet',
        pooling='avg')
    preprocessed = keras.applications.mobilenet_v2.preprocess_input(X_frames)
    embeddings = base_cnn.predict(preprocessed, batch_size=batch_size, verbose=1)
    print("Embeddings shape:",embeddings.shape)
    return embeddings.reshape(num_data, num_frames, -1)

In [13]:
train_frames, val_frames, test_frames = util.convert_3d_to_2d(split=3,
                                                         train=(X_train, y_train),
                                                         val=(X_val,y_val),
                                                         test=(X_test, y_test))
X_train_frames, _ = train_frames
X_val_frames, _ = val_frames
X_test_frames, _ = test_frames

In [14]:
train_embeddings = get_embeddings(X_train_frames, batch_size, num_train, num_frames)
train_embeddings.shape

[1m140/140[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m63s[0m 411ms/step
Embeddings shape: (4480, 1280)


(280, 16, 1280)

In [15]:
val_embeddings = get_embeddings(X_val_frames, 8, num_val, num_frames)
val_embeddings.shape

[1m120/120[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m20s[0m 136ms/step
Embeddings shape: (960, 1280)


(60, 16, 1280)

In [16]:
test_embeddings = get_embeddings(X_val_frames, 8, num_val, num_frames)
test_embeddings.shape

[1m120/120[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m26s[0m 194ms/step
Embeddings shape: (960, 1280)


(60, 16, 1280)

In [17]:
num_features = train_embeddings.shape[-1]
num_features

1280

In [95]:
LR = 0.00001

In [96]:
model1 = Sequential([
    Input(shape=(num_frames, num_features)),
    GRU(128, return_sequences=False),
    Dropout(0.5),
    Dense(1, activation='sigmoid')
])
model1.summary()

In [97]:
model1.compile(
    optimizer=Adam(learning_rate=LR),
    loss='binary_crossentropy',
    metrics=['accuracy'])
estop = EarlyStopping(monitor='val_loss', mode='min',
                      min_delta=1e-5, patience=10,
                      restore_best_weights=True, verbose=1)
model1.fit(train_embeddings, y_train, 
           validation_data=(val_embeddings, y_val),
           epochs=500, batch_size=batch_size,
           callbacks=[estop], verbose=1)

Epoch 1/500
[1m9/9[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m29s[0m 403ms/step - accuracy: 0.4821 - loss: 0.8666 - val_accuracy: 0.4667 - val_loss: 0.7930
Epoch 2/500
[1m9/9[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m1s[0m 109ms/step - accuracy: 0.4714 - loss: 0.8088 - val_accuracy: 0.4500 - val_loss: 0.7853
Epoch 3/500
[1m9/9[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m2s[0m 187ms/step - accuracy: 0.4964 - loss: 0.7754 - val_accuracy: 0.4167 - val_loss: 0.7822
Epoch 4/500
[1m9/9[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m1s[0m 129ms/step - accuracy: 0.5000 - loss: 0.7860 - val_accuracy: 0.4167 - val_loss: 0.7802
Epoch 5/500
[1m9/9[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m1s[0m 109ms/step - accuracy: 0.4893 - loss: 0.7676 - val_accuracy: 0.4333 - val_loss: 0.7781
Epoch 6/500
[1m9/9[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m1s[0m 108ms/step - accuracy: 0.5393 - loss: 0.7696 - val_accuracy: 0.4333 - val_loss: 0.7761
Epoch 7/500
[1m9/9[0m [32m━━━━

<keras.src.callbacks.history.History at 0x213628c2690>

In [98]:
test_loss1, test_accuracy1 = model1.evaluate(test_embeddings, y_test, verbose=0)
print("Test Accuracy:", test_accuracy1)
print("Test Loss:", test_loss1)

Test Accuracy: 0.550000011920929
Test Loss: 0.718789279460907


In [99]:
model2 = Sequential([
    Input(shape=(num_frames, num_features)),
    GRU(256, return_sequences=False),
    Dropout(0.5),
    Dense(128, activation='relu'),
    Dropout(0.5),
    Dense(1, activation='sigmoid'),
])
model2.summary()

In [100]:
model2.compile(
    optimizer=Adam(learning_rate=LR),
    loss='binary_crossentropy',
    metrics=['accuracy'])
estop = EarlyStopping(monitor='val_loss', mode='min',
                      min_delta=1e-5, patience=10,
                      restore_best_weights=True, verbose=1)
model2.fit(train_embeddings, y_train, 
           validation_data=(val_embeddings, y_val),
           epochs=500, batch_size=batch_size,
           callbacks=[estop], verbose=1)

Epoch 1/500
[1m9/9[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m24s[0m 634ms/step - accuracy: 0.4857 - loss: 0.9411 - val_accuracy: 0.5167 - val_loss: 0.7769
Epoch 2/500
[1m9/9[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m2s[0m 172ms/step - accuracy: 0.5000 - loss: 0.8654 - val_accuracy: 0.5333 - val_loss: 0.7458
Epoch 3/500
[1m9/9[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m2s[0m 202ms/step - accuracy: 0.4750 - loss: 0.8590 - val_accuracy: 0.5833 - val_loss: 0.7280
Epoch 4/500
[1m9/9[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m2s[0m 155ms/step - accuracy: 0.5179 - loss: 0.8107 - val_accuracy: 0.5500 - val_loss: 0.7172
Epoch 5/500
[1m9/9[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m1s[0m 137ms/step - accuracy: 0.5179 - loss: 0.7987 - val_accuracy: 0.5167 - val_loss: 0.7107
Epoch 6/500
[1m9/9[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m1s[0m 128ms/step - accuracy: 0.5214 - loss: 0.7749 - val_accuracy: 0.5000 - val_loss: 0.7083
Epoch 7/500
[1m9/9[0m [32m━━━━

<keras.src.callbacks.history.History at 0x21390be20c0>

In [101]:
test_loss2, test_accuracy2 = model2.evaluate(test_embeddings, y_test, verbose=0)
print("Test Accuracy:", test_accuracy2)
print("Test Loss:", test_loss2)

Test Accuracy: 0.6166666746139526
Test Loss: 0.67915278673172


In [102]:
model3 = Sequential([
    Input(shape=(num_frames, num_features)),
    GRU(256, return_sequences=True),
    GRU(256, return_sequences=False),
    Dropout(0.5),
    Dense(1, activation='sigmoid'),
])
model3.summary()

In [103]:
model3.compile(
    optimizer=Adam(learning_rate=LR),
    loss='binary_crossentropy',
    metrics=['accuracy'])
estop = EarlyStopping(monitor='val_loss', mode='min',
                      min_delta=1e-5, patience=10,
                      restore_best_weights=True, verbose=1)
model3.fit(train_embeddings, y_train, 
           validation_data=(val_embeddings, y_val),
           epochs=500, batch_size=batch_size,
           callbacks=[estop], verbose=1)

Epoch 1/500
[1m9/9[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m32s[0m 895ms/step - accuracy: 0.5643 - loss: 0.7153 - val_accuracy: 0.4500 - val_loss: 0.7214
Epoch 2/500
[1m9/9[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m2s[0m 231ms/step - accuracy: 0.5036 - loss: 0.7490 - val_accuracy: 0.4667 - val_loss: 0.7136
Epoch 3/500
[1m9/9[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m2s[0m 204ms/step - accuracy: 0.5179 - loss: 0.7347 - val_accuracy: 0.4667 - val_loss: 0.7093
Epoch 4/500
[1m9/9[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m2s[0m 225ms/step - accuracy: 0.4964 - loss: 0.7296 - val_accuracy: 0.5000 - val_loss: 0.7062
Epoch 5/500
[1m9/9[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m2s[0m 212ms/step - accuracy: 0.5036 - loss: 0.7366 - val_accuracy: 0.5167 - val_loss: 0.7035
Epoch 6/500
[1m9/9[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m2s[0m 177ms/step - accuracy: 0.5321 - loss: 0.7219 - val_accuracy: 0.5167 - val_loss: 0.7008
Epoch 7/500
[1m9/9[0m [32m━━━━

<keras.src.callbacks.history.History at 0x213953d4770>

In [104]:
test_loss3, test_accuracy3 = model3.evaluate(test_embeddings, y_test, verbose=0)
print("Test Accuracy:", test_accuracy3)
print("Test Loss:", test_loss3)

Test Accuracy: 0.6666666865348816
Test Loss: 0.6623032689094543


In [105]:
model4 = Sequential([
    Input(shape=(num_frames, num_features)),
    GRU(256, return_sequences=True),
    GRU(256, return_sequences=False),
    Dropout(0.5),
    Dense(128, activation='relu'),
    Dropout(0.5),
    Dense(1, activation='sigmoid'),
])
model4.summary()

In [106]:
model4.compile(
    optimizer=Adam(learning_rate=LR),
    loss='binary_crossentropy',
    metrics=['accuracy'])
estop = EarlyStopping(monitor='val_loss', mode='min',
                      min_delta=1e-5, patience=10,
                      restore_best_weights=True, verbose=1)
model4.fit(train_embeddings, y_train, 
           validation_data=(val_embeddings, y_val),
           epochs=500, batch_size=batch_size,
           callbacks=[estop], verbose=1)

Epoch 1/500
[1m9/9[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m41s[0m 527ms/step - accuracy: 0.4643 - loss: 0.7880 - val_accuracy: 0.4667 - val_loss: 0.7058
Epoch 2/500
[1m9/9[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m1s[0m 68ms/step - accuracy: 0.5357 - loss: 0.7671 - val_accuracy: 0.5000 - val_loss: 0.7021
Epoch 3/500
[1m9/9[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m1s[0m 60ms/step - accuracy: 0.4714 - loss: 0.7720 - val_accuracy: 0.4833 - val_loss: 0.7005
Epoch 4/500
[1m9/9[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m1s[0m 62ms/step - accuracy: 0.4821 - loss: 0.7670 - val_accuracy: 0.5167 - val_loss: 0.7004
Epoch 5/500
[1m9/9[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m1s[0m 119ms/step - accuracy: 0.4786 - loss: 0.7529 - val_accuracy: 0.4833 - val_loss: 0.7003
Epoch 6/500
[1m9/9[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m1s[0m 153ms/step - accuracy: 0.4893 - loss: 0.7445 - val_accuracy: 0.4500 - val_loss: 0.6998
Epoch 7/500
[1m9/9[0m [32m━━━━━━━

<keras.src.callbacks.history.History at 0x21395add880>

In [107]:
test_loss4, test_accuracy4 = model4.evaluate(test_embeddings, y_test, verbose=0)
print("Test Accuracy:", test_accuracy4)
print("Test Loss:", test_loss4)

Test Accuracy: 0.5666666626930237
Test Loss: 0.6789262294769287


In [108]:
model5 = Sequential([
    Input(shape=(num_frames, num_features)),
    GRU(256, return_sequences=False),
    Dropout(0.5),
    Dense(256, activation='relu'),
    Dropout(0.5),
    Dense(128, activation='relu'),
    Dropout(0.5),
    Dense(1, activation='sigmoid'),
])
model5.summary()

In [109]:
model5.compile(
    optimizer=Adam(learning_rate=LR),
    loss='binary_crossentropy',
    metrics=['accuracy'])
estop = EarlyStopping(monitor='val_loss', mode='min',
                      min_delta=1e-5, patience=10,
                      restore_best_weights=True, verbose=1)
model5.fit(train_embeddings, y_train, 
           validation_data=(val_embeddings, y_val),
           epochs=500, batch_size=batch_size,
           callbacks=[estop], verbose=1)

Epoch 1/500
[1m9/9[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m28s[0m 486ms/step - accuracy: 0.4821 - loss: 0.8488 - val_accuracy: 0.4500 - val_loss: 0.7103
Epoch 2/500
[1m9/9[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m1s[0m 77ms/step - accuracy: 0.4250 - loss: 0.9178 - val_accuracy: 0.4167 - val_loss: 0.7098
Epoch 3/500
[1m9/9[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m1s[0m 115ms/step - accuracy: 0.5250 - loss: 0.7944 - val_accuracy: 0.4500 - val_loss: 0.7090
Epoch 4/500
[1m9/9[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m2s[0m 190ms/step - accuracy: 0.5071 - loss: 0.8340 - val_accuracy: 0.4500 - val_loss: 0.7085
Epoch 5/500
[1m9/9[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m1s[0m 78ms/step - accuracy: 0.5321 - loss: 0.7924 - val_accuracy: 0.4833 - val_loss: 0.7083
Epoch 6/500
[1m9/9[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m1s[0m 57ms/step - accuracy: 0.5429 - loss: 0.7688 - val_accuracy: 0.5333 - val_loss: 0.7085
Epoch 7/500
[1m9/9[0m [32m━━━━━━━

<keras.src.callbacks.history.History at 0x2139a8b3260>

In [110]:
test_loss5, test_accuracy5 = model5.evaluate(test_embeddings, y_test, verbose=0)
print("Test Accuracy:", test_accuracy5)
print("Test Loss:", test_loss5)

Test Accuracy: 0.4833333194255829
Test Loss: 0.7083225250244141


In [111]:
model6 = Sequential([
    Input(shape=(num_frames, num_features)),
    GRU(256, return_sequences=True),
    BatchNormalization(),
    GRU(256, return_sequences=False),
    BatchNormalization(),
    Dense(128, activation='relu'),
    BatchNormalization(),
    Dense(1, activation='sigmoid'),
])
model6.summary()

In [112]:
model6.compile(
    optimizer=Adam(learning_rate=LR),
    loss='binary_crossentropy',
    metrics=['accuracy'])
estop = EarlyStopping(monitor='val_loss', mode='min',
                      min_delta=1e-5, patience=10,
                      restore_best_weights=True, verbose=1)
model6.fit(train_embeddings, y_train, 
           validation_data=(val_embeddings, y_val),
           epochs=500, batch_size=batch_size,
           callbacks=[estop], verbose=1)

Epoch 1/500
[1m9/9[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m59s[0m 322ms/step - accuracy: 0.4500 - loss: 0.9287 - val_accuracy: 0.4833 - val_loss: 0.7343
Epoch 2/500
[1m9/9[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m2s[0m 220ms/step - accuracy: 0.4857 - loss: 0.8289 - val_accuracy: 0.4500 - val_loss: 0.7337
Epoch 3/500
[1m9/9[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m2s[0m 189ms/step - accuracy: 0.5107 - loss: 0.7576 - val_accuracy: 0.4333 - val_loss: 0.7333
Epoch 4/500
[1m9/9[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m2s[0m 191ms/step - accuracy: 0.5500 - loss: 0.6991 - val_accuracy: 0.4167 - val_loss: 0.7335
Epoch 5/500
[1m9/9[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m2s[0m 194ms/step - accuracy: 0.6214 - loss: 0.6501 - val_accuracy: 0.4167 - val_loss: 0.7340
Epoch 6/500
[1m9/9[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m2s[0m 167ms/step - accuracy: 0.6643 - loss: 0.6089 - val_accuracy: 0.4500 - val_loss: 0.7348
Epoch 7/500
[1m9/9[0m [32m━━━━

<keras.src.callbacks.history.History at 0x2139a8b3c50>

In [113]:
test_loss6, test_accuracy6 = model6.evaluate(test_embeddings, y_test, verbose=0)
print("Test Accuracy:", test_accuracy6)
print("Test Loss:", test_loss6)

Test Accuracy: 0.4333333373069763
Test Loss: 0.7332785725593567


In [114]:
model7 = Sequential([
    Input(shape=(num_frames, num_features)),
    GRU(256, return_sequences=True),
    GRU(256, return_sequences=False),
    Dropout(0.5),
    Dense(256, activation='relu'),
    Dropout(0.5),
    Dense(128, activation='relu'),
    Dropout(0.5),
    Dense(1, activation='sigmoid'),
])
model7.summary()

In [115]:
model7.compile(
    optimizer=Adam(learning_rate=LR),
    loss='binary_crossentropy',
    metrics=['accuracy'])
estop = EarlyStopping(monitor='val_loss', mode='min',
                      min_delta=1e-5, patience=10,
                      restore_best_weights=True, verbose=1)
model7.fit(train_embeddings, y_train, 
           validation_data=(val_embeddings, y_val),
           epochs=500, batch_size=batch_size,
           callbacks=[estop], verbose=1)

Epoch 1/500
[1m9/9[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m32s[0m 670ms/step - accuracy: 0.4929 - loss: 0.7564 - val_accuracy: 0.4667 - val_loss: 0.7015
Epoch 2/500
[1m9/9[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m2s[0m 194ms/step - accuracy: 0.5250 - loss: 0.7444 - val_accuracy: 0.4500 - val_loss: 0.7020
Epoch 3/500
[1m9/9[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m2s[0m 187ms/step - accuracy: 0.5286 - loss: 0.7112 - val_accuracy: 0.4500 - val_loss: 0.7020
Epoch 4/500
[1m9/9[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m2s[0m 214ms/step - accuracy: 0.4750 - loss: 0.7571 - val_accuracy: 0.4500 - val_loss: 0.7013
Epoch 5/500
[1m9/9[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m2s[0m 206ms/step - accuracy: 0.4821 - loss: 0.7547 - val_accuracy: 0.4500 - val_loss: 0.7010
Epoch 6/500
[1m9/9[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m2s[0m 190ms/step - accuracy: 0.4857 - loss: 0.7636 - val_accuracy: 0.4333 - val_loss: 0.7008
Epoch 7/500
[1m9/9[0m [32m━━━━

<keras.src.callbacks.history.History at 0x2134145dd60>

In [116]:
test_loss7, test_accuracy7 = model7.evaluate(test_embeddings, y_test, verbose=0)
print("Test Accuracy:", test_accuracy7)
print("Test Loss:", test_loss7)

Test Accuracy: 0.46666666865348816
Test Loss: 0.6986681818962097


In [117]:
model8 = Sequential([
    Input(shape=(num_frames, num_features)),
    GRU(256, return_sequences=True),
    GRU(256, return_sequences=False),
    Dropout(0.5),
    Dense(512, activation='relu'),
    Dropout(0.5),
    Dense(256, activation='relu'),
    Dropout(0.5),
    Dense(128, activation='relu'),
    Dropout(0.5),
    Dense(1, activation='sigmoid'),
])
model8.summary()

In [118]:
model8.compile(
    optimizer=Adam(learning_rate=LR),
    loss='binary_crossentropy',
    metrics=['accuracy'])
estop = EarlyStopping(monitor='val_loss', mode='min',
                      min_delta=1e-5, patience=10,
                      restore_best_weights=True, verbose=1)
model8.fit(train_embeddings, y_train, 
           validation_data=(val_embeddings, y_val),
           epochs=500, batch_size=batch_size,
           callbacks=[estop], verbose=1)

Epoch 1/500
[1m9/9[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m46s[0m 879ms/step - accuracy: 0.5179 - loss: 0.7840 - val_accuracy: 0.5000 - val_loss: 0.7045
Epoch 2/500
[1m9/9[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m2s[0m 151ms/step - accuracy: 0.5036 - loss: 0.7731 - val_accuracy: 0.5000 - val_loss: 0.7025
Epoch 3/500
[1m9/9[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m1s[0m 110ms/step - accuracy: 0.4964 - loss: 0.7820 - val_accuracy: 0.5167 - val_loss: 0.7012
Epoch 4/500
[1m9/9[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m2s[0m 196ms/step - accuracy: 0.4893 - loss: 0.7772 - val_accuracy: 0.4833 - val_loss: 0.7003
Epoch 5/500
[1m9/9[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m2s[0m 189ms/step - accuracy: 0.4786 - loss: 0.7339 - val_accuracy: 0.4833 - val_loss: 0.6995
Epoch 6/500
[1m9/9[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m2s[0m 179ms/step - accuracy: 0.5321 - loss: 0.7303 - val_accuracy: 0.5167 - val_loss: 0.6987
Epoch 7/500
[1m9/9[0m [32m━━━━

<keras.src.callbacks.history.History at 0x213a2942840>

In [119]:
test_loss8, test_accuracy8 = model8.evaluate(test_embeddings, y_test, verbose=0)
print("Test Accuracy:", test_accuracy8)
print("Test Loss:", test_loss8)

Test Accuracy: 0.5166666507720947
Test Loss: 0.6958596706390381


In [120]:
model9 = Sequential([
    Input(shape=(num_frames, num_features)),
    GRU(256, return_sequences=True),
    GRU(256, return_sequences=True),
    GRU(256, return_sequences=False),
    Dropout(0.5),
    Dense(128, activation='relu'),
    Dropout(0.5),
    Dense(1, activation='sigmoid'),
])
model9.summary()

In [None]:
model9.compile(
    optimizer=Adam(learning_rate=LR),
    loss='binary_crossentropy',
    metrics=['accuracy'])
estop = EarlyStopping(monitor='val_loss', mode='min',
                      min_delta=1e-5, patience=10,
                      restore_best_weights=True, verbose=1)
model9.fit(train_embeddings, y_train, 
           validation_data=(val_embeddings, y_val),
           epochs=500, batch_size=batch_size,
           callbacks=[estop], verbose=1)

Epoch 1/500
[1m9/9[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m50s[0m 920ms/step - accuracy: 0.5179 - loss: 0.7134 - val_accuracy: 0.5000 - val_loss: 0.6932
Epoch 2/500
[1m9/9[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m2s[0m 228ms/step - accuracy: 0.4821 - loss: 0.7267 - val_accuracy: 0.5333 - val_loss: 0.6927
Epoch 3/500
[1m9/9[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m1s[0m 154ms/step - accuracy: 0.5393 - loss: 0.6992 - val_accuracy: 0.5500 - val_loss: 0.6922
Epoch 4/500
[1m9/9[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m2s[0m 187ms/step - accuracy: 0.5214 - loss: 0.7086 - val_accuracy: 0.5167 - val_loss: 0.6914
Epoch 5/500
[1m9/9[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m1s[0m 157ms/step - accuracy: 0.5071 - loss: 0.7084 - val_accuracy: 0.5333 - val_loss: 0.6909
Epoch 6/500
[1m9/9[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m2s[0m 183ms/step - accuracy: 0.5429 - loss: 0.7007 - val_accuracy: 0.5333 - val_loss: 0.6907
Epoch 7/500
[1m9/9[0m [32m━━━━

In [None]:
test_loss9, test_accuracy9 = model9.evaluate(test_embeddings, y_test, verbose=0)
print("Test Accuracy:", test_accuracy9)
print("Test Loss:", test_loss9)

In [None]:
model10 = Sequential([
    Input(shape=(num_frames, num_features)),
    GRU(256, return_sequences=True),
    GRU(256, return_sequences=True),
    GRU(256, return_sequences=False),
    Dropout(0.5),
    Dense(256, activation='relu'),
    Dropout(0.5),
    Dense(128, activation='relu'),
    Dropout(0.5),
    Dense(1, activation='sigmoid'),
])
model10.summary()

In [None]:
model10.compile(
    optimizer=Adam(learning_rate=LR),
    loss='binary_crossentropy',
    metrics=['accuracy'])
estop = EarlyStopping(monitor='val_loss', mode='min',
                      min_delta=1e-5, patience=10,
                      restore_best_weights=True, verbose=1)
model10.fit(train_embeddings, y_train, 
           validation_data=(val_embeddings, y_val),
           epochs=500, batch_size=batch_size,
           callbacks=[estop], verbose=1)

In [None]:
test_loss10, test_accuracy10 = model10.evaluate(test_embeddings, y_test, verbose=0)
print("Test Accuracy:", test_accuracy10)
print("Test Loss:", test_loss10)

In [None]:
model11 = Sequential([
    Input(shape=(num_frames, num_features)),
    GRU(256, return_sequences=True),
    GRU(256, return_sequences=True),
    GRU(256, return_sequences=False),
    Dropout(0.5),
    Dense(512, activation='relu'),
    Dropout(0.5),
    Dense(256, activation='relu'),
    Dropout(0.5),
    Dense(128, activation='relu'),
    Dropout(0.5),
    Dense(1, activation='sigmoid'),
])
model11.summary()

In [None]:
model11.compile(
    optimizer=Adam(learning_rate=LR),
    loss='binary_crossentropy',
    metrics=['accuracy'])
estop = EarlyStopping(monitor='val_loss', mode='min',
                      min_delta=1e-5, patience=10,
                      restore_best_weights=True, verbose=1)
model11.fit(train_embeddings, y_train, 
           validation_data=(val_embeddings, y_val),
           epochs=500, batch_size=batch_size,
           callbacks=[estop], verbose=1)

In [None]:
test_loss11, test_accuracy11 = model11.evaluate(test_embeddings, y_test, verbose=0)
print("Test Accuracy:", test_accuracy11)
print("Test Loss:", test_loss11)

In [None]:
model12 = Sequential([
    Input(shape=(num_frames, num_features)),
    GRU(256, return_sequences=True),
    BatchNormalization(),
    GRU(256, return_sequences=True),
    BatchNormalization(),
    GRU(256, return_sequences=False),
    Dropout(0.5),
    Dense(256, activation='relu'),
    Dropout(0.5),
    Dense(128, activation='relu'),
    Dropout(0.5),
    Dense(1, activation='sigmoid'),
])
model12.summary()

In [None]:
model12.compile(
    optimizer=Adam(learning_rate=LR),
    loss='binary_crossentropy',
    metrics=['accuracy'])
estop = EarlyStopping(monitor='val_loss', mode='min',
                      min_delta=1e-5, patience=10,
                      restore_best_weights=True, verbose=1)
model12.fit(train_embeddings, y_train, 
           validation_data=(val_embeddings, y_val),
           epochs=500, batch_size=batch_size,
           callbacks=[estop], verbose=1)

In [None]:
test_loss12, test_accuracy12 = model12.evaluate(test_embeddings, y_test, verbose=0)
print("Test Accuracy:", test_accuracy12)
print("Test Loss:", test_loss12)

In [None]:
model13 = Sequential([
    Input(shape=(num_frames, num_features)),
    GRU(256, return_sequences=True),
    BatchNormalization(),
    GRU(256, return_sequences=False),
    BatchNormalization(),
    Dense(256, activation='relu'),
    Dropout(0.5),
    Dense(128, activation='relu'),
    Dropout(0.5),
    Dense(1, activation='sigmoid'),
])
model13.summary()

In [None]:
model13.compile(
    optimizer=Adam(learning_rate=LR),
    loss='binary_crossentropy',
    metrics=['accuracy'])
estop = EarlyStopping(monitor='val_loss', mode='min',
                      min_delta=1e-5, patience=10,
                      restore_best_weights=True, verbose=1)
model13.fit(train_embeddings, y_train, 
           validation_data=(val_embeddings, y_val),
           epochs=500, batch_size=batch_size,
           callbacks=[estop], verbose=1)

In [None]:
test_loss13, test_accuracy13 = model13.evaluate(test_embeddings, y_test, verbose=0)
print("Test Accuracy:", test_accuracy13)
print("Test Loss:", test_loss13)

In [None]:
model_records = [
    (model1, test_accuracy1),
    (model2, test_accuracy2),
    (model3, test_accuracy3),
    (model4, test_accuracy4),
    (model5, test_accuracy5),
    (model6, test_accuracy6),
    (model7, test_accuracy7),
    (model8, test_accuracy8),
    (model9, test_accuracy9),
    (model10, test_accuracy10),
    (model11, test_accuracy11),
    (model12, test_accuracy12),
    (model13, test_accuracy13),
]
best = np.argmax([accuracy for model,accuracy in model_records])
best_model, best_acc = model_records[best]
print("Best accuracy:", best_acc)
best_model.summary()

In [None]:
best_model.save('artifacts/best_mobilenetv2_embeddings_gru.keras')

* Adding more FC layers and more GRUs seems to improve the performance, but when the number of FCs exceed GRUs the performance drops.
* Including dropouts between GRU and FC layer and between the FCs, also result in better performing models, while including BatchNormalization gives mixed results.
* The best performing achitecture is one with 2 GRUs and 2 FCs with dropouts between GRU-FC and between FCs, with an accuracy of **66.7%** which is also the highest accuracy obtained among all the experimented models. The second highest accuracy seen is **63.3%** from the model having 3 GRUs with BatchNormalization between every pair and 2 FC layers with dropouts before and after each.

**It should also be noted that these models were extremely quick to train, which made trying out several different architectures very easy.**

* This architecture can be tuned further with the inclusion of different regularization parameters, different dropout rates, different optimizers, and momentum-based or scheduled learning rates.
  * Since, the training is fast as it is, adding momentum may not necessarily help.
  * Scheduling the learning rates and making it slower after a while may have higher scope of giving an improvement, even though all of the above trials used a small learning rate of 1e-5 (perhaps even smaller learning rates could help in this case).

* Furthermore, we could also try other pretrained models for obtaining the embeddings. But we need to take care of the sizes of the images that are fed into those models.
  * densenet, efficientnetb0, mobilenetv2, resnet50 -> 224x224
  * xception, inceptionv3 -> 299x299
  * efficientnetb3 -> 300x300

**Regardless of the model and technique used, we don't appear to get any high values of accuracy. This is due to the small size of the dataset and also because of the nature of the dataset. The deepfake videos aren't entirely AI-generated, instead the faces/expressions alone, of the people in the videos, have been swapped/altered. So, our model needs to identify the fakeness of the video from a very small spatial range of the frames. That is a sensitive task, and a model will only be able to handle that if it were fed a significantly large dataset to learn from.**