<a href="https://www.kaggle.com/code/aruneembhowmick/real-vs-ai-generated-face-classifier-resnet50?scriptVersionId=200925808" target="_blank"><img align="left" alt="Kaggle" title="Open in Kaggle" src="https://kaggle.com/static/images/open-in-kaggle.svg"></a>

# Introduction

In this notebook, we will build a binary classification model to distinguish between **real human faces** and **AI-generated images**. With the growing sophistication of AI-generated media, being able to accurately classify real and generated content has become increasingly important in fields like security, digital art, and media verification. Using a pre-trained **ResNet50** architecture, we'll fine-tune a model to tackle this problem with a custom dataset.

## Key Steps
1. **Data Preparation**: We'll start by organizing the dataset into training, validation, and test sets. Our dataset consists of two categories: "Real Images" and "AI-Generated Images." We'll shuffle and split the data into 70% training, 15% validation, and 15% test sets to ensure robust evaluation of the model's performance.
2. **Model Development with Transfer Learning**: Using **transfer learning** with **ResNet50**, we'll reuse this model, originally trained on **ImageNet**, to extract features from our images. By freezing the convolutional base and adding custom fully connected layers, we adapt the model to our binary classification task.
3. **Data Augmentation and Preprocessing**: We'll preprocess the images using the `ImageDataGenerator` to rescale pixel values and generate batches of data for training and validation. This step helps ensure the model is trained efficiently with normalized inputs.
4. **Model Training and Evaluation**: The model will be trained using binary cross-entropy loss, with accuracy as the main evaluation metric. We'll track the training and validation accuracy to monitor the model's learning process over several epochs. Once training is complete, we'll evaluate the model on the test set to determine its generalization ability.
5. **Visualizing Predictions**: Finally, we will implement a function to visually inspect the model's predictions by displaying real and AI-generated images alongside their true labels and predicted outcomes. This will help us analyze the model's performance in greater detail.

![](https://miro.medium.com/v2/resize:fit:1400/0*nwS9AyXvtcU1Ek1a.jpg)

# 1.) Dataset Preparation and Splitting

We organize our dataset of human faces by splitting it into training, validation, and test sets. Our dataset contains two categories of images: **Real Images** and **AI-Generated Images**. The code prepares these images for model training by distributing them into appropriate directories while maintaining a 70/15/15 split across training, validation, and test sets.

## Key Steps
1. **Defining Paths**: We begin by specifying the paths to the original dataset, which contains subdirectories for **Real Images** and **AI-Generated Images**. We then set up the paths for our output directories (`/kaggle/working/`) where the images will be copied for training, validation, and testing purposes.
2. **Creating Directories**: The code creates subdirectories under `train`, `val`, and `test` for both **Real Images** and **AI-Generated Images** using the `os.makedirs()` function. Passing `exist_ok = True` means no error is raised if a directory already exists.
3. **Splitting Ratios**: We define the split ratios:
    * **70%** for the training set
    * **15%** for the validation set
    * **15%** for the test set
   
   These ratios are stored as `train_split`, `val_split`, and `test_split` respectively.
4. **Image Shuffling and Distribution**: The `split_and_copy_images` function copies images from the source directories to the respective train, validation, and test folders. First, it retrieves a list of images in the source directory and shuffles them with `random.shuffle()`, so the split is random; because no seed is set, each run produces a different split. Then, it calculates the training and validation set sizes from the total number of images and the split ratios, truncating each to an integer; the remaining images go to the test set. Finally, it copies each image into its designated directory using `shutil.copy()`. The function works for both **Real Images** and **AI-Generated Images**.

In [None]:
import shutil
import random
import os

dataset_path = '/kaggle/input/human-faces-dataset/Human Faces Dataset'
real_images_path = os.path.join(dataset_path, 'Real Images')
ai_generated_images_path = os.path.join(dataset_path, 'AI-Generated Images')

output_path = '/kaggle/working/'
train_path = os.path.join(output_path, 'train')
val_path = os.path.join(output_path, 'val')
test_path = os.path.join(output_path, 'test')

for path in [train_path, val_path, test_path]:
    os.makedirs(os.path.join(path, 'Real Images'), exist_ok = True)
    os.makedirs(os.path.join(path, 'AI-Generated Images'), exist_ok = True)

train_split = 0.7
val_split = 0.15
test_split = 0.15

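def split_and_copy_images(source_dir, dest_dirs, split_ratios):
    """Shuffle the files in source_dir and copy them into the train/val/test folders in dest_dirs."""
    # Keep regular files only, so a stray subdirectory does not break shutil.copy().
    images = [f for f in os.listdir(source_dir) if os.path.isfile(os.path.join(source_dir, f))]
    random.shuffle(images)

    # The first two ratios set the train and validation sizes; the remainder becomes the test set.
    train_size = int(len(images) * split_ratios[0])
    val_size = int(len(images) * split_ratios[1])

    for i, img in enumerate(images):
        if i < train_size:
            dest_dir = dest_dirs[0]
        elif i < train_size + val_size:
            dest_dir = dest_dirs[1]
        else:
            dest_dir = dest_dirs[2]

        shutil.copy(os.path.join(source_dir, img), os.path.join(dest_dir, img))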

# Apply the same split to both classes.
for class_name, source_dir in [('Real Images', real_images_path), ('AI-Generated Images', ai_generated_images_path)]:
    dest_dirs = [os.path.join(path, class_name) for path in [train_path, val_path, test_path]]
    split_and_copy_images(source_dir, dest_dirs, [train_split, val_split, test_split])
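
Because the sizes are truncated with `int()`, the test split receives whatever is left over, so the three folders will not always hit 70/15/15 exactly. A minimal sketch of that arithmetic follows; `split_sizes` is a hypothetical helper that is not part of the notebook and mirrors only the size calculation, and the image count is made up for illustration.

```python
def split_sizes(n_images, train_ratio=0.7, val_ratio=0.15):
    """Return (train, val, test) counts the way split_and_copy_images assigns them."""
    train_size = int(n_images * train_ratio)       # truncated, not rounded
    val_size = int(n_images * val_ratio)           # truncated, not rounded
    test_size = n_images - train_size - val_size   # remainder goes to the test set
    return train_size, val_size, test_size


# Hypothetical image count, chosen only to show the rounding effect.
print(split_sizes(101))  # (70, 15, 16)
```

With 101 images the test folder ends up one image larger than the validation folder, even though both ratios are 0.15.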

## Result
After running this code, we have our dataset neatly organized in the following folder structure:

`/kaggle/working/
    ├── train/
    │   ├── Real Images/
    │   └── AI-Generated Images/
    ├── val/
    │   ├── Real Images/
    │   └── AI-Generated Images/
    └── test/
        ├── Real Images/
        └── AI-Generated Images/`

# 2.) Data Preprocessing and Augmentation with `ImageDataGenerator`
We use TensorFlow’s `ImageDataGenerator` to handle the loading and preprocessing of images for our deep learning model. The generator simplifies loading batches of images from directories, optionally applying real-time data augmentation (not enabled in this notebook), and preparing the images for training, validation, and testing. Below is an explanation of the key components:

## Key Steps
1. **Image Dimensions and Batch Size**: We specify the target dimensions for our images as **224 x 224** pixels (which is the default input size for the **ResNet50** model), and set the **batch size** to 32, meaning the model will process 32 images at a time during training.
2. **Rescaling**: Each image’s pixel values are rescaled from the range [0, 255] to the range [0, 1] using `rescale = 1./255`. Keeping inputs in a small, consistent range generally helps a neural network converge faster during training.
3. **Data Generators**: We create three `ImageDataGenerator` instances for training, validation, and testing, each responsible for loading images from specific directories.
    * `train_datagen`: Used for generating the training data.
    * `val_datagen`: Used for generating validation data during model training.
    * `test_datagen`: Used for generating test data to evaluate the model after training.
4. **Flow from Directory**: The `flow_from_directory` function allows us to load images directly from folders:
    * `train_path`, `val_path`, and `test_path`: These are the paths where the training, validation, and test images are stored, respectively.
    * `target_size`: Specifies the dimensions to which all images will be resized (224 x 224 in this case).
    * `batch_size`: Defines how many images will be processed in one iteration.
    * `class_mode`: We use `'binary'` because this is a binary classification problem (real vs AI-generated images).

In [None]:
from tensorflow.keras.preprocessing.image import ImageDataGenerator

img_height, img_width = 224, 224
batch_size = 32

train_datagen = ImageDataGenerator(rescale = 1./255)
val_datagen = ImageDataGenerator(rescale = 1./255)
test_datagen = ImageDataGenerator(rescale = 1./255)

train_generator = train_datagen.flow_from_directory(
    train_path,
    target_size = (img_height, img_width),
    batch_size = batch_size,
    class_mode = 'binary'
)

val_generator = val_datagen.flow_from_directory(
    val_path,
    target_size = (img_height, img_width),
    batch_size = batch_size,
    class_mode = 'binary'
)

test_generator = test_datagen.flow_from_directory(
    test_path,
    target_size = (img_height, img_width),
    batch_size = batch_size,
    class_mode = 'binary'
)

## Result 
* **Training Generator** (`train_generator`): Loads images from the training directory, resizes them to 224x224, normalizes their pixel values, and feeds them to the model in batches of 32.
* **Validation Generator** (`val_generator`): Works similarly to the training generator but for validation data, used to monitor model performance during training.
* **Test Generator** (`test_generator`): Loads the test data in the same manner and will be used for final evaluation after training.

## Benefits of Using `ImageDataGenerator`
* Efficient loading and processing of images in batches.
* Real-time augmentation (if needed) during training.
* Automatic shuffling of training data for better generalization.
* Scalable for large datasets without loading everything into memory at once.

This process ensures that our images are properly preprocessed and ready for model training and evaluation.
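
One thing worth checking against this setup: the ImageNet weights that Keras ships for ResNet50 were trained on inputs prepared by `tensorflow.keras.applications.resnet50.preprocess_input`, which reorders channels from RGB to BGR and subtracts a per-channel ImageNet mean without scaling to [0, 1]. With a frozen base, `rescale = 1./255` therefore feeds the network values in a different range from the one it saw during pre-training. The sketch below compares the two transforms on a single pixel in plain Python. The helper names are hypothetical, and the mean values are the ones commonly quoted for this preprocessing mode, so treat them as an assumption to verify against your installed Keras version.

```python
# Per-channel ImageNet means in BGR order (assumed values, see the note above).
IMAGENET_MEAN_BGR = (103.939, 116.779, 123.68)


def rescale_pixel(rgb):
    """What ImageDataGenerator(rescale=1./255) does to one RGB pixel."""
    return tuple(c / 255.0 for c in rgb)


def caffe_style_pixel(rgb):
    """Sketch of ResNet50-style preprocessing: RGB -> BGR, then subtract the mean."""
    r, g, b = rgb
    bgr = (b, g, r)
    return tuple(c - m for c, m in zip(bgr, IMAGENET_MEAN_BGR))


pixel = (255, 128, 0)  # hypothetical orange pixel
print(rescale_pixel(pixel))      # values between 0 and 1
print(caffe_style_pixel(pixel))  # zero-centred values on the original 0-255 scale
```

In the notebook, the second variant would correspond to passing `preprocessing_function = preprocess_input` to `ImageDataGenerator` in place of `rescale`. Whether that improves accuracy on this dataset is something to test rather than assume.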

# 3.) Model Architecture: Transfer Learning with ResNet50

We leverage **transfer learning** to build a deep learning model for binary classification. Specifically, we use the **ResNet50** architecture as the base model, which is pretrained on the **ImageNet** dataset. Here's a breakdown of the steps and layers involved:

## Key Steps
1. **Loading the Pretrained ResNet50 Model**: We begin by loading the **ResNet50** model using `tensorflow.keras.applications.ResNet50`. The key parameters include:
    * `weights = 'imagenet'`: This means that we are using the weights that were pre-trained on the ImageNet dataset.
    * `include_top = False`: We exclude the top classification layer because we will be adding our custom classification layers.
    * `input_shape = (img_height, img_width, 3)`: We define the input shape for the images, where `img_height` and `img_width` are the height and width of our input images, and `3` corresponds to the RGB channels.
2. **Adding Custom Layers**: After obtaining the output of the base ResNet50 model, we add our own layers:
    * `GlobalAveragePooling2D()`: This layer performs global average pooling, reducing the spatial dimensions of the output from the ResNet50 model while retaining important features.
    * `Dense(128, activation = 'relu')`: We add a fully connected dense layer with 128 units and ReLU activation to introduce non-linearity and learn higher-level patterns.
    * `Dense(1, activation = 'sigmoid')`: Finally, we add a dense layer with a single output neuron and a sigmoid activation function for binary classification.
3. **Freezing the Pretrained Layers**: Since ResNet50 was pre-trained on a large dataset (ImageNet), we **freeze its layers** to prevent them from being updated during training. This allows us to focus training on the newly added custom layers. Freezing is achieved by setting `layer.trainable = False` for each layer in the base model.
4. **Compiling the Model**: The model is compiled with the following parameters:
    * **Optimizer**: We use the Adam optimizer, which is well-suited for deep learning tasks and adapts the learning rate during training.
    * **Loss Function**: `binary_crossentropy` is used as the loss function because this is a binary classification problem.
    * **Metrics**: We monitor accuracy during training to evaluate the model's performance.

In [None]:
import tensorflow as tf
from tensorflow.keras.applications import ResNet50
from tensorflow.keras.layers import Dense, GlobalAveragePooling2D
from tensorflow.keras.models import Model

base_model = ResNet50(weights = 'imagenet', include_top = False, input_shape = (img_height, img_width, 3))

x = base_model.output
x = GlobalAveragePooling2D()(x)
x = Dense(128, activation ='relu')(x)
predictions = Dense(1, activation ='sigmoid')(x)

model = Model(inputs = base_model.input, outputs = predictions)

for layer in base_model.layers:
    layer.trainable = False

model.compile(optimizer = 'adam', loss = 'binary_crossentropy', metrics = ['accuracy'])

## Result (Generated Model Architecture):
* Pretrained ResNet50 (with ImageNet weights)
* Global Average Pooling layer
* Dense layer with 128 units (ReLU activation)
* Dense output layer with 1 unit (Sigmoid activation) for binary classification

This setup allows us to benefit from ResNet50's ability to extract powerful features from images, while training only a small custom head on top of it for our specific binary classification task.
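
Freezing the base leaves only the new head trainable, and its size is easy to work out by hand. The sketch below does that arithmetic, assuming the ResNet50 base without its top produces 2048 feature channels, which `GlobalAveragePooling2D` reduces to a 2048-dimensional vector. The helper `dense_params` is hypothetical and only counts weights and biases.

```python
def dense_params(n_inputs, n_units):
    """Weights plus biases of a fully connected layer."""
    return n_inputs * n_units + n_units


features = 2048  # channels produced by the ResNet50 base (assumed, see above)
hidden = dense_params(features, 128)  # Dense(128, activation='relu')
output = dense_params(128, 1)         # Dense(1, activation='sigmoid')

print(hidden, output, hidden + output)  # 262272 129 262401
```

If the assumption holds, `model.summary()` should report the same total under trainable parameters.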

# 4.) Model Training with `fit()`

We train our binary classification model using the `.fit()` method from **TensorFlow**. The training process involves feeding batches of images from the training set into the model and using the validation set to evaluate the model's performance after each epoch.

## Key Components
1. **Training Data** (`train_generator`): The `train_generator` supplies the batches of preprocessed images (from the training directory) to the model during training. These batches are loaded and processed by the `ImageDataGenerator`, ensuring that images are fed into the model in a normalized format and shuffled for better generalization.
2. **Validation Data** (`val_generator`): The `validation_data` parameter is set to the `val_generator`, which provides validation images after each epoch. This helps in monitoring how well the model is performing on unseen data, allowing us to track overfitting or underfitting during training.
3. **Epochs** (`epochs = 10`): The `epochs` parameter defines how many times the model will see the entire training set. In this case, the model will go through 10 full passes. More epochs generally help the model learn better, but too many can lead to overfitting, where the model becomes too specific to the training data.

## Training Process
* **Forward Pass**: For each batch of images in `train_generator`, the model makes predictions and calculates the loss.
* **Backward Pass (Backpropagation)**: Based on the calculated loss, the model adjusts its weights using the optimizer (in this case, Adam) to reduce the error in future predictions.
* **Validation Check**: At the end of each epoch, the model evaluates itself on the validation set using `val_generator`. This helps us monitor how well the model generalizes to unseen data.

The `.fit()` method returns a `History` object, stored here as `history`, which includes the following:
* Training loss and accuracy for each epoch.
* Validation loss and accuracy, which give us insights into how the model performs on unseen data.

This setup allows us to visualize the model's learning process, such as observing whether the model is improving over time, converging to a solution, or overfitting.

In [None]:
history = model.fit(
    train_generator,
    validation_data = val_generator,
    epochs = 10
)
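
For intuition about the loss value that `fit()` reports each epoch, binary cross-entropy can be computed by hand on a few values. This is a minimal sketch in plain Python, not Keras' implementation; the labels and probabilities are made up for illustration, and the `eps` clipping is an assumption added to avoid `log(0)`.

```python
import math


def binary_cross_entropy(y_true, y_prob, eps=1e-7):
    """Mean of -[y*log(p) + (1-y)*log(1-p)] over all samples."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1 - eps)  # keep p away from exactly 0 or 1
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(y_true)


labels = [1, 0, 1, 0]              # hypothetical ground truth
confident = [0.9, 0.1, 0.8, 0.2]   # correct and confident
hesitant = [0.6, 0.4, 0.5, 0.5]    # predictions close to 0.5

print(binary_cross_entropy(labels, confident))  # lower loss
print(binary_cross_entropy(labels, hesitant))   # higher loss
```

Both sets of predictions can yield similar accuracy at a 0.5 threshold, yet the loss separates them, which is why the loss and accuracy curves in `history` are worth reading together.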

# 5.) Model Evaluation with `evaluate()`
We evaluate the model's performance on the test set to see how well it generalizes to completely unseen data. This is done using the `model.evaluate()` function, which computes the loss and accuracy on the test data.

## Key Components
1. **Test Data (`test_generator`)**: The `test_generator` provides batches of test images from the test directory, which the model has not seen during training or validation. These images are normalized (rescaled) and resized similarly to the training and validation sets.
2. **Evaluation Metrics**: The `evaluate()` method returns two key metrics:
    * `test_loss`: This represents how well the model performs on the test data in terms of loss (binary cross-entropy in this case). A lower loss indicates better performance.
    * `test_acc`: This is the accuracy of the model on the test set, showing how many predictions were correct compared to the actual labels. A higher accuracy score indicates that the model generalizes well to unseen data.
3. **Printing the Results**: After evaluating the model, we print the test loss and test accuracy to assess the model's overall performance.

## Key Points
* **Test Loss**: This value measures how well the model fits the test data. If the test loss is significantly higher than the training or validation loss, it may indicate overfitting.
* **Test Accuracy**: This provides a clear metric to understand the model's predictive power. A high test accuracy (close to 100%) indicates that the model can generalize well, while a low test accuracy might suggest the need for further tuning or that the dataset is challenging.

Evaluating the model on the test set gives an estimate of how well it performs on unseen images drawn from the same dataset, making it an essential step in the model development process. Performance on images from other sources or other generators may differ.

In [None]:
test_loss, test_acc = model.evaluate(test_generator)
print(f'Test Loss: {test_loss}')
print(f'Test Accuracy: {test_acc}')
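
Accuracy alone can hide which class the errors fall on. As a sketch of what could be computed alongside `test_acc`, the plain-Python helper below (hypothetical, not part of the notebook) derives confusion counts, precision, and recall from true labels and thresholded predictions, treating label 1 as the positive class. The label lists are made up for illustration.

```python
def binary_metrics(y_true, y_pred):
    """Confusion counts plus accuracy, precision and recall for labels in {0, 1}."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return {
        "tp": tp, "tn": tn, "fp": fp, "fn": fn,
        "accuracy": (tp + tn) / len(pairs),
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
    }


y_true = [1, 1, 1, 0, 0, 0, 0, 1]  # hypothetical true labels
y_pred = [1, 1, 0, 0, 0, 1, 0, 1]  # hypothetical thresholded predictions
print(binary_metrics(y_true, y_pred))
```

With the notebook's generators, collecting labels and predictions in matching order would require a generator created with `shuffle = False`, since `flow_from_directory` shuffles by default.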

# 6.) Displaying Model Predictions with Correct Labels

We develop a function intended to provide a clear visualization of the model's predictions for a batch of images, allowing us to see how well it performs and where it may be making errors. Here's a detailed breakdown of the steps and logic involved:

## Key Components 
1. **Loading a Batch of Images**: The function begins by calling `next(generator)` to load a batch of images and their corresponding true labels from the specified generator (in this case, the validation generator).
    * `images`: A batch of images (converted into a NumPy array) that will be used for prediction.
    * `true_labels`: The true labels for these images, indicating whether they are "Real" or "AI-Generated."
    
2. **Model Predictions**: Using the model, we generate predictions for each image in the batch. These predictions are probabilities, and by applying a threshold (0.5 by default), we classify them into either the "Real" or "AI-Generated" categories.
    * `predicted_labels`: Binary labels created by thresholding the predicted probabilities (i.e., if the predicted probability is ≥ 0.5, the image is classified as "Real"; otherwise, it is classified as "AI-Generated").
    
3. **Handling Labels**: The labels are processed for display:
    * **True labels and predicted labels** are converted into human-readable strings ("Real" or "AI Generated") to make the output more intuitive.
    * **Correct predictions** are identified by comparing the true labels with the predicted labels.
    
4. **Creating a Table**: The data for each image (index, true label, predicted label, and whether the prediction was correct) is stored in a pandas `DataFrame`. This table is returned at the end of the function, making it easy to review and analyze the model's performance.

5. **Plotting Images with Predictions**: The function generates a plot that displays the first 20 images from the batch, each with:
    * The true label.
    * The predicted label.
    * Whether the prediction was correct (True/False).
    
    The images are displayed in a grid layout using matplotlib to give a visual summary of the model's performance.
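
The mapping from probability to class name depends on how the generator numbers the classes. `flow_from_directory` assigns indices to subdirectory names in alphanumeric order, so with this folder layout `AI-Generated Images` should map to 0 and `Real Images` to 1; the actual mapping is available as `generator.class_indices`. The sketch below reproduces that ordering and the thresholding step in plain Python. The probabilities are made-up values and `label_from_probability` is a hypothetical helper.

```python
class_names = ["Real Images", "AI-Generated Images"]

# Sorting mimics the alphanumeric ordering described above.
class_indices = {name: idx for idx, name in enumerate(sorted(class_names))}
print(class_indices)  # {'AI-Generated Images': 0, 'Real Images': 1}


def label_from_probability(prob, threshold=0.5):
    """Read the sigmoid output as P(class 1), i.e. P(Real) under this mapping."""
    return "Real" if prob >= threshold else "AI Generated"


probabilities = [0.93, 0.50, 0.12]  # hypothetical model outputs
print([label_from_probability(p) for p in probabilities])
# ['Real', 'Real', 'AI Generated']
```

If the folders were named differently, the indices could flip, so it is worth printing `class_indices` once before trusting the label strings.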


## Benefits
* **Visual Feedback**: Seeing both the images and the predictions allows you to better understand how the model is performing. You can easily spot where it makes mistakes and identify patterns in its errors.
* **Comprehensive Table**: The accompanying table gives a quick overview of which predictions were correct and incorrect, helping in error analysis.
* **Customizable**: The function allows you to specify how many images to display (via the `num_images` parameter) and adjust the `threshold` for classification.

In [None]:
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

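def show_predictions_table_with_correct_labels(model, generator, num_images = 20, threshold = 0.5):
    # Take one batch and keep at most num_images samples (a batch may hold fewer).
    images, true_labels = next(generator)
    images, true_labels = images[:num_images], true_labels[:num_images]
    num_images = len(images)

    # Sigmoid outputs have shape (n, 1); flatten before thresholding.
    predictions = model.predict(images).flatten()
    predicted_labels = (predictions >= threshold).astype(int)

    # One-hot labels (class_mode = 'categorical') are reduced to class indices.
    if true_labels.ndim > 1:
        true_labels = np.argmax(true_labels, axis=1)
    true_labels = true_labels.astype(int)

    # Label 1 is "Real": flow_from_directory numbers the class folders alphabetically.
    true_label_names = ["Real" if label == 1 else "AI Generated" for label in true_labels]
    predicted_label_names = ["Real" if label == 1 else "AI Generated" for label in predicted_labels]
    correct_predictions = predicted_labels == true_labels

    df = pd.DataFrame({
        "Image Index": list(range(num_images)),
        "True Label": true_label_names,
        "Predicted Label": predicted_label_names,
        "Correct": correct_predictions
    })

    # Size the grid to the number of images instead of hard-coding 5 x 4.
    n_cols = 4
    n_rows = int(np.ceil(num_images / n_cols))
    fig, axes = plt.subplots(n_rows, n_cols, figsize = (15, 2.4 * n_rows), squeeze = False)

    for i, ax in enumerate(axes.flat):
        ax.axis('off')
        if i < num_images:
            ax.imshow(images[i])
            ax.set_title(f"True: {df['True Label'][i]}\nPred: {df['Predicted Label'][i]}\nCorrect: {df['Correct'][i]}")

    plt.tight_layout()
    plt.show()

    return df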

show_predictions_table_with_correct_labels(model, val_generator, num_images = 20, threshold = 0.5)

## Conclusion

In this notebook, we successfully developed a deep learning model capable of distinguishing between **real human faces** and **AI-generated images** using a transfer learning approach with **ResNet50**. By leveraging the pre-trained model, we were able to efficiently extract features from images and fine-tune the model for our binary classification task.

### Key Takeaways:

1. **Data Preparation and Splitting**:  
   We organized the dataset and split it 70/15/15 into training, validation, and test sets, so that training, monitoring during training, and final evaluation each use separate images.

2. **Transfer Learning with ResNet50**:  
   Utilizing the ResNet50 architecture enabled us to capitalize on its feature extraction capabilities, allowing us to focus on training the classifier without having to build the entire model from scratch. By freezing the convolutional layers and training custom dense layers, we tailored the model to the specific task of face classification.

3. **Model Training and Performance**:  
   The model was trained for 10 epochs with a validation check after each one, and its loss and accuracy on the held-out test set provide the quantitative measure of how well it separates real from AI-generated images. We complemented these metrics by visually inspecting the predictions for a subset of images.

4. **Visualization of Predictions**:  
   The visualization of model predictions, along with correct labels, allowed us to gain insights into the areas where the model excelled and where it struggled. This visual approach complements the accuracy metrics and helps in understanding the strengths and limitations of the model.

### Future Work:

- **Data Augmentation**: To further improve the model's robustness, we could explore more advanced data augmentation techniques, such as rotation, flipping, and contrast adjustment, to simulate various real-world conditions.
  
- **Fine-tuning the Entire Model**: In this notebook, we froze the ResNet50 base model layers. Unfreezing some or all of the layers and fine-tuning the entire network could lead to even better performance.
  
- **Exploring Other Architectures**: While ResNet50 performed well, experimenting with other architectures such as **EfficientNet** or **Inception** could provide insights into whether these models perform better in this specific task.
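
As a rough illustration of the augmentations mentioned under Data Augmentation above, the sketch below applies a horizontal flip and a simple contrast change to a tiny array with NumPy. It is not how `ImageDataGenerator` implements them (there, options such as `horizontal_flip` and `rotation_range` are passed to the constructor); the helper names and the contrast formula are assumptions chosen for clarity.

```python
import numpy as np


def horizontal_flip(image):
    """Mirror an (H, W, C) image left to right."""
    return image[:, ::-1, :]


def adjust_contrast(image, factor):
    """Scale distances from the mean intensity; assumes values in [0, 1]."""
    mean = image.mean()
    return np.clip((image - mean) * factor + mean, 0.0, 1.0)


# Tiny 1x3 RGB "image" with values already rescaled to [0, 1].
image = np.array([[[0.0, 0.0, 0.0], [0.5, 0.5, 0.5], [1.0, 1.0, 1.0]]])

flipped = horizontal_flip(image)
print(flipped[0, 0])  # the pixel that used to be right-most

stronger = adjust_contrast(image, 1.5)
print(stronger.min(), stronger.max())
```

Augmentations like these are normally applied to the training generator only, so that validation and test images stay unmodified.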