# GANFingerprint Deepfake Detection

This notebook provides a complete walkthrough of training, evaluating, and using the GANFingerprint deepfake detection model. The model is designed to detect GAN-generated fake images by analyzing subtle fingerprint patterns in both spatial and frequency domains.

## What are GAN Fingerprints?
GAN fingerprints are distinctive patterns or traces that Generative Adversarial Networks (GANs) unintentionally embed in the images they generate. The name comes from an analogy with human fingerprints: people unintentionally leave fingerprints on the objects they touch, and those prints can be used to trace their identity. Like human fingerprints, GAN fingerprints are specific to the model that generated the image, for two reasons:

1. Each GAN architecture has its own unique way of generating images based on its specific design, loss functions, and optimization methods.

2. Even GANs with identical architectures but different training datasets, random initializations, or hyperparameters will produce images with subtly different characteristics.
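
To make the frequency-domain idea more concrete, here is a minimal sketch, assuming NumPy is available. The function names and the `radius_fraction` value are illustrative choices and are not part of the GANFingerprint code. The sketch only shows one simple way to inspect how an image's energy is spread across frequencies, which is the kind of signal a frequency-domain branch can work with.

```python
import numpy as np


def log_magnitude_spectrum(gray: np.ndarray) -> np.ndarray:
    """Centered log-magnitude spectrum of a 2D grayscale image."""
    spectrum = np.fft.fftshift(np.fft.fft2(gray))
    return np.log1p(np.abs(spectrum))


def high_frequency_ratio(gray: np.ndarray, radius_fraction: float = 0.25) -> float:
    """Fraction of spectral energy outside a centered low-frequency disc."""
    h, w = gray.shape
    power = np.abs(np.fft.fftshift(np.fft.fft2(gray))) ** 2
    yy, xx = np.ogrid[:h, :w]
    # After fftshift the zero-frequency (DC) term sits at (h // 2, w // 2).
    dist = np.hypot(yy - h // 2, xx - w // 2)
    low = dist <= radius_fraction * min(h, w)
    total = power.sum()
    return float(power[~low].sum() / total) if total > 0 else 0.0


# A flat image has all of its energy at DC; a checkerboard splits it
# evenly between DC and the highest representable frequency.
flat = np.full((8, 8), 0.5)
rows, cols = np.indices((8, 8))
checker = ((rows + cols) % 2).astype(float)
print(high_frequency_ratio(flat))     # close to 0
print(high_frequency_ratio(checker))  # about 0.5
```

A real detector would learn which spectral patterns matter rather than rely on a single hand-picked ratio; the ratio here is only a way to see that two images can differ sharply in the frequency domain.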

## Objective of the project

As GAN image generation becomes more advanced, existing detection methods, such as looking for distortions in facial features and image details, may struggle to identify deepfake images. This project aims to build a deepfake detection model that identifies deepfake images reliably, no matter how realistic they appear to the human eye. By discriminating deepfake images from real ones through their GAN fingerprint profiles, the model is intended to capture details that are invisible to the human eye.

## 1. Setup and Dependencies

#### First, let's create a virtual environment to hold the dependencies the model needs.

In [None]:
# Create Virtual Environment for model
!python3 -m venv myenv

#### Next: Activate the Virtual Environment (manually in terminal)

Open a terminal in the notebook's folder, then run:

- **On macOS/Linux/WSL**:
  ```bash
  source myenv/bin/activate
  ```

- **On Windows**:
  ```cmd
  myenv\Scripts\activate.bat
  ```

Once activated, your terminal prompt will show `(myenv)`, meaning you're using the virtual environment.
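
If you want to confirm from Python that a virtual environment is in use, the following minimal sketch compares `sys.prefix` with `sys.base_prefix`. The helper name `in_virtualenv` is illustrative. Note that in a notebook the result reflects the kernel's interpreter, which is only the one from `myenv` if the kernel was started from that environment.

```python
import sys


def in_virtualenv() -> bool:
    """Return True when the interpreter is running inside a venv."""
    # In a venv, sys.prefix points at the environment while
    # sys.base_prefix points at the base Python installation.
    return sys.prefix != sys.base_prefix


print("Virtual environment active:", in_virtualenv())
print("Interpreter:", sys.executable)
```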


#### Next, install all necessary dependencies using requirements.txt

In [None]:
# Installation cell - Run this to install all necessary dependencies
%pip install -r requirements.txt

# Install PyTorch with CUDA support (needed for this model to run efficiently):
%pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118

Then, let's verify that we're running from the correct directory and all required files are present.

In [None]:
import os
import sys

# Add the current directory to the Python path
sys.path.append('.')

# Check if all required files exist
files_needed = [
    "config.py",
    "data_loader.py",
    "models/__init__.py",
    "models/fingerprint_net.py", 
    "models/layers.py",
    "train.py",
    "evaluate.py",
    "inference.py",
    "utils/metrics.py",
    "utils/experiment.py",
    "utils/visualization.py",
    "utils/reproducibility.py",
    "utils/augmentations.py"
]

missing = [f for f in files_needed if not os.path.exists(f)]
if missing:
    print("❌ Missing required files:")
    for f in missing:
        print(f"  - {f}")
    print("\nPlease run this notebook from the project root directory")
else:
    print("✅ All required files found!")

### Project directory layout

If configured properly, the project should have the following directory layout:

```
deepfake_detector/
├── config.py                 # Configuration parameters
├── data_loader.py            # Dataset and dataloader implementation
├── models/
│   ├── __init__.py           # Module initialization
│   ├── fingerprint_net.py    # GANFingerprint model architecture
│   └── layers.py             # Custom layers and blocks
├── train.py                  # Training script
├── evaluate.py               # Evaluation script
├── inference.py              # Inference on new images
├── utils/
│   ├── __init__.py           # Utilities module initialization
│   ├── reproducibility.py    # Random seed and reproducibility utilities
│   ├── visualization.py      # Plotting and visualization tools
│   ├── metrics.py            # Performance metrics calculation
│   ├── augmentations.py      # Advanced augmentation techniques
│   └── experiment.py         # Logging of information when training model
├── checkpoints/              # Directory for saved model checkpoints
└── logs/                     # TensorBoard logs and training records
```

### Dataset needed and dataset configuration

The dataset used to train the model is the 'deepfake and real images' dataset by Manjil Karki.

Link: https://www.kaggle.com/datasets/manjilkarki/deepfake-and-real-images

This dataset provides a large collection of real and deepfake images, already split into train, validation, and test sets, which makes it well suited to training deepfake classification models.

To configure the dataset, create a folder named `data` in the __root folder__ and follow this directory layout:



```
deepfake_detector/
├── config.py
├── data_loader.py
├── data/
│   ├── train/
│   │   ├── real/   # Real images
│   │   └── fake/   # Fake/deepfake images
│   ├── validation/
│   │   ├── real/
│   │   └── fake/
│   └── test/
│       ├── real/
│       └── fake/
├── models/
│   ├── __init__.py
│   ├── fingerprint_net.py
│   └── layers.py
├── train.py
├── evaluate.py
├── inference.py
├── utils/
│   ├── __init__.py
│   ├── reproducibility.py
│   ├── visualization.py
│   ├── metrics.py
│   ├── augmentations.py
│   └── experiment.py
├── checkpoints/
└── logs/
```
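
The empty folder skeleton for the dataset can also be created from Python before copying the images in. This is a minimal sketch; `create_dataset_skeleton` is a hypothetical helper and not part of the project code. The split and label names follow the layout described in this section.

```python
import os


def create_dataset_skeleton(data_root: str = "data") -> list:
    """Create the train/validation/test x real/fake folders and return their paths."""
    created = []
    for split in ("train", "validation", "test"):
        for label in ("real", "fake"):
            path = os.path.join(data_root, split, label)
            os.makedirs(path, exist_ok=True)  # no error if it already exists
            created.append(path)
    return created
```

Calling `create_dataset_skeleton("data")` from the project root creates the six folders if they are missing and leaves existing ones untouched.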

## 2. Check PyTorch and CUDA status

In [None]:
# Check PyTorch version and CUDA availability
import torch
print(f"PyTorch version: {torch.__version__}")
print(f"CUDA available: {torch.cuda.is_available()}")
if torch.cuda.is_available():
    print(f"CUDA device: {torch.cuda.get_device_name(0)}")

## 3. Import the necessary packages

In [None]:
import matplotlib.pyplot as plt
from PIL import Image
import torchvision.transforms as transforms

# Import custom modules
import config
from data_loader import get_dataset_stats
from models import FingerprintNet

## 4. Display Current Configuration

The model allows for configuration of hyperparameters. The hyperparameters that led to the best results are listed below.

In [None]:
# Hyperparameters
config.LEARNING_RATE = 5e-5
config.WEIGHT_DECAY = 1e-5
config.DROPOUT_RATE = 0.40

config.DATA_ROOT = "data"  # Path to your dataset directory
config.BATCH_SIZE = 16     # Adjust based on GPU memory
config.NUM_WORKERS = 4     # Number of data loading workers
config.NUM_EPOCHS = 20     # Number of training epochs
config.EARLY_STOPPING_PATIENCE = 5 # Number of consecutive epochs without metric improvement before training stops
config.DEVICE = "cuda" if torch.cuda.is_available() else "cpu"

# Display current configuration
print("Current Configuration:")
print(f"DATA_ROOT: {config.DATA_ROOT}")
print(f"INPUT_SIZE: {config.INPUT_SIZE}")
print(f"BACKBONE: {config.BACKBONE}")
print(f"BATCH_SIZE: {config.BATCH_SIZE}")
print(f"EARLY_STOPPING_PATIENCE: {config.EARLY_STOPPING_PATIENCE}")
print(f"LEARNING_RATE: {config.LEARNING_RATE}")
print(f"WEIGHT_DECAY: {config.WEIGHT_DECAY}")
print(f"NUM_EPOCHS: {config.NUM_EPOCHS}")
print(f"DROPOUT_RATE: {config.DROPOUT_RATE}")
print(f"DEVICE: {config.DEVICE}")
print(f"USE_AMP: {config.USE_AMP}")
print(f"CHECKPOINT_DIR: {config.CHECKPOINT_DIR}")
print(f"LOG_DIR: {config.LOG_DIR}")
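
To illustrate what `EARLY_STOPPING_PATIENCE` controls, here is a minimal sketch of patience-based early stopping. It assumes the monitored metric is a validation loss where lower is better; the `EarlyStopping` class and its `min_delta` parameter are illustrative and may differ from what `train.py` actually does.

```python
class EarlyStopping:
    """Stop training after `patience` consecutive epochs without improvement."""

    def __init__(self, patience: int = 5, min_delta: float = 0.0):
        self.patience = patience
        self.min_delta = min_delta
        self.best = None       # best validation loss seen so far
        self.bad_epochs = 0    # consecutive epochs without improvement

    def step(self, val_loss: float) -> bool:
        """Record one epoch's validation loss; return True when training should stop."""
        if self.best is None or val_loss < self.best - self.min_delta:
            self.best = val_loss
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience


stopper = EarlyStopping(patience=2)
for epoch, loss in enumerate([0.9, 0.7, 0.71, 0.72, 0.6], start=1):
    if stopper.step(loss):
        print(f"Stopping after epoch {epoch}")  # prints "Stopping after epoch 4"
        break
```

With a patience of 2, the two non-improving epochs (0.71 and 0.72) trigger the stop, so the later improvement to 0.6 is never reached. A larger patience trades longer training for a lower risk of stopping too early.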

## 6. Check Dataset Structure

In [None]:
# Check dataset structure
def check_dataset_structure():
    paths = [
        config.TRAIN_REAL_DIR,
        config.TRAIN_FAKE_DIR,
        config.VAL_REAL_DIR,
        config.VAL_FAKE_DIR,
        config.TEST_REAL_DIR,
        config.TEST_FAKE_DIR
    ]
    
    for path in paths:
        if not os.path.exists(path):
            print(f"❌ {path} does not exist!")
        else:
            print(f"✅ {path} exists with {len(os.listdir(path))} images")

check_dataset_structure()
get_dataset_stats()
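
If the check reports missing directories even though `data/` exists, one possible cause is worth ruling out. This is an assumption, since `config.py` is not shown here: if it builds `TRAIN_REAL_DIR` and the other split paths from `DATA_ROOT` at import time, then reassigning `config.DATA_ROOT` in the notebook does not update them. The sketch below reproduces that behavior with a stand-in object and shows a hypothetical `refresh_dataset_paths` helper that recomputes the six paths.

```python
import os
from types import SimpleNamespace

# Hypothetical stand-in for config.py: derived paths are computed once.
cfg = SimpleNamespace(DATA_ROOT="old_data")
cfg.TRAIN_REAL_DIR = os.path.join(cfg.DATA_ROOT, "train", "real")

# Reassigning DATA_ROOT later does not touch the already-built string.
cfg.DATA_ROOT = "data"
print(cfg.TRAIN_REAL_DIR)  # still points inside old_data


def refresh_dataset_paths(cfg) -> None:
    """Recompute the six split directories from the current DATA_ROOT."""
    splits = (("train", "TRAIN"), ("validation", "VAL"), ("test", "TEST"))
    for split, prefix in splits:
        for label in ("real", "fake"):
            setattr(cfg, f"{prefix}_{label.upper()}_DIR",
                    os.path.join(cfg.DATA_ROOT, split, label))


refresh_dataset_paths(cfg)
print(cfg.TRAIN_REAL_DIR)  # now points inside data
```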

## 7. Display Sample Images

In [None]:
# Display some sample images from the dataset
def show_samples(real_dir, fake_dir, n=5):
    transform = transforms.Compose([
        transforms.Resize(config.INPUT_SIZE),
        transforms.CenterCrop(config.INPUT_SIZE),
        transforms.ToTensor()
    ])
    
    # Check if directories exist
    if not os.path.exists(real_dir) or not os.path.exists(fake_dir):
        print(f"Error: One or more directories do not exist:\n{real_dir}\n{fake_dir}")
        return
    
    # Get image lists
    real_files = os.listdir(real_dir)
    fake_files = os.listdir(fake_dir)
    
    if not real_files or not fake_files:
        print("Error: One or more directories are empty")
        return
    
    real_images = [os.path.join(real_dir, f) for f in real_files[:n]]
    fake_images = [os.path.join(fake_dir, f) for f in fake_files[:n]]
    
    plt.figure(figsize=(15, 6))
    for i, img_path in enumerate(real_images + fake_images):
        img = Image.open(img_path).convert('RGB')
        img_tensor = transform(img)
        
        plt.subplot(2, n, i + 1)
        plt.imshow(img_tensor.permute(1, 2, 0))
        plt.title("Real" if i < n else "Fake")
        plt.axis('off')
    plt.tight_layout()
    plt.show()

# Show samples from training set
try:
    show_samples(config.TRAIN_REAL_DIR, config.TRAIN_FAKE_DIR)
except Exception as e:
    print(f"Error displaying samples: {e}")

## 8. Initialize the model with overview of trainable parameters

In [None]:
# Initialize the model
model = FingerprintNet(backbone=config.BACKBONE)
model = model.to(config.DEVICE)

# Count model parameters
def count_parameters(model):
    return sum(p.numel() for p in model.parameters() if p.requires_grad)

# Print model architecture summary
print(model)
print(f"Total trainable parameters: {count_parameters(model):,}")

## 9. Training the model

In [None]:
from train import train as train_function

# Wrapper for the training function
class Args:
    def __init__(self, **kwargs):
        for key, value in kwargs.items():
            setattr(self, key, value)

def train_model(data_root=config.DATA_ROOT, 
                batch_size=config.BATCH_SIZE, 
                lr=config.LEARNING_RATE, 
                epochs=config.NUM_EPOCHS, 
                backbone=config.BACKBONE,
                no_amp=not config.USE_AMP, 
                resume_checkpoint=None):
    
    # Override config values if needed
    config.DATA_ROOT = data_root
    config.BATCH_SIZE = batch_size
    config.LEARNING_RATE = lr
    config.NUM_EPOCHS = epochs
    config.BACKBONE = backbone
    config.USE_AMP = not no_amp
    
    # Create args object
    args = Args(
        data_root=data_root,
        batch_size=batch_size,
        lr=lr,
        epochs=epochs,
        backbone=backbone,
        no_amp=no_amp,
        resume_checkpoint=resume_checkpoint
    )
    
    # Call the training function
    train_function(args)
    
    # Return the path to the best checkpoint
    return os.path.join(config.CHECKPOINT_DIR, "ganfingerprint_best.pth")

# Train the model 
best_checkpoint = train_model()

# To resume training from a checkpoint (uncomment to run):
# best_checkpoint = train_model(resume_checkpoint="checkpoints/ganfingerprint_epoch10.pth")
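
The checkpoint paths used later in this notebook include a run folder and a timestamp, so the exact file name changes from run to run. As a convenience, the following minimal sketch looks up the most recently modified best checkpoint. It assumes best checkpoints end in `_best.pth`, as the paths in this notebook do; `find_latest_best_checkpoint` is a hypothetical helper and not part of `train.py`.

```python
from pathlib import Path
from typing import Optional


def find_latest_best_checkpoint(checkpoint_dir: str = "checkpoints") -> Optional[str]:
    """Return the most recently modified '*_best.pth' under checkpoint_dir, or None."""
    root = Path(checkpoint_dir)
    if not root.is_dir():
        return None
    # Search run subfolders as well as the top-level directory.
    candidates = list(root.rglob("*_best.pth"))
    if not candidates:
        return None
    newest = max(candidates, key=lambda p: p.stat().st_mtime)
    return newest.as_posix()  # forward slashes work on every platform
```

Calling `find_latest_best_checkpoint(config.CHECKPOINT_DIR)` after training returns a path that can be passed to the evaluation and inference cells, or `None` if no matching file exists.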

## 10. Evaluating the trained model

In [None]:
from evaluate import evaluate as evaluate_function

# Wrapper for the evaluation function
def evaluate_model(checkpoint_path, output_dir="eval_results"):
    """
    Evaluate the model on the test set and display the saved plots.
    
    Args:
        checkpoint_path: Path to the model checkpoint
        output_dir: Directory to save evaluation results
    """
    # Create output directory if it doesn't exist
    os.makedirs(output_dir, exist_ok=True)
    
    # Call the evaluation function
    evaluate_function(checkpoint_path, output_dir)
        
    # Display the generated images
    image_paths = [
        os.path.join(output_dir, "confusion_matrix.png"),
        os.path.join(output_dir, "roc_curve.png"),
        os.path.join(output_dir, "precision_recall_curve.png")
    ]
    for path in image_paths:
        # Skip any plot that the evaluation script did not produce
        if not os.path.exists(path):
            print(f"Plot not found, skipping: {path}")
            continue
        plt.figure(figsize=(8, 6))
        plt.imshow(Image.open(path))
        plt.axis('off')
        plt.show()
    

# Evaluate the model (replace the path below with the relative path to your trained checkpoint)
# Forward slashes work on every platform and avoid invalid escape sequences such as "\R"
evaluate_model("checkpoints/Run_19_learning_rate_0.00005/ganfingerprint_20250411_125832_best.pth")
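
The confusion matrix produced by the evaluation boils down to four counts, from which accuracy, precision, recall, and F1 follow. Here is a minimal, dependency-free sketch of that arithmetic. It is not the implementation in `utils/metrics.py`; the function name and the choice of 1 as the positive class are illustrative assumptions.

```python
from typing import Dict, Sequence


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Compute confusion-matrix counts and summary metrics for 0/1 labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)

    # Guard every division so empty or one-sided inputs return 0.0.
    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"tp": tp, "tn": tn, "fp": fp, "fn": fn,
            "accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Toy labels: 2 true positives, 2 true negatives, 1 false positive, 1 false negative.
print(binary_metrics([1, 1, 1, 0, 0, 0], [1, 1, 0, 0, 0, 1]))
```

For the toy labels, accuracy is 4/6 while precision, recall, and F1 are each 2/3.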

## 11. Testing the model with actual images

After training and evaluating the model, we can try it out by having it classify images from outside the dataset. Let's see if it is able to classify them correctly!

### Single image inference

In [None]:
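import os

import torch.serialization
from inference import run_inference

# Allow-list NumPy's scalar reconstructor for checkpoints loaded with
# weights_only=True (the default in PyTorch 2.6+). add_safe_globals expects
# the object itself, not its dotted name as a string.
try:
    from numpy._core.multiarray import scalar as numpy_scalar  # NumPy 2.x
except ImportError:
    from numpy.core.multiarray import scalar as numpy_scalar   # NumPy 1.x
torch.serialization.add_safe_globals([numpy_scalar])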

# Function to run single image inference
def run_single_inference(checkpoint_path, image_path, output_dir=None):
    """
    Run inference on a single image using functions from inference.py
    """
    # Fix path separators if needed
    checkpoint_path = checkpoint_path.replace('\\', '/')
    image_path = image_path.replace('\\', '/')
    if output_dir:
        output_dir = output_dir.replace('\\', '/')
        os.makedirs(output_dir, exist_ok=True)
    
    # Run inference
    print(f"Running inference on: {image_path}")
    print(f"Using checkpoint: {checkpoint_path}")
    run_inference(checkpoint_path, image_path, output_dir, batch_mode=False)

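# Forward slashes and the lowercase file name match the checkpoint used elsewhere in this notebook
model_checkpoint = "checkpoints/Run_19_learning_rate_0.00005/ganfingerprint_20250411_125832_best.pth"
test_image_path = "data/test/fake/fake_23.jpg"

run_single_inference(model_checkpoint, test_image_path)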

### Batch Inference

In [None]:
def run_batch_inference(checkpoint_path, image_dir, output_dir="inference_results"):
    """
    Run inference on a directory of images using functions from inference.py
    """
    # Fix path separators if needed
    checkpoint_path = checkpoint_path.replace('\\', '/')
    image_dir = image_dir.replace('\\', '/')
    output_dir = output_dir.replace('\\', '/')
    os.makedirs(output_dir, exist_ok=True)
    
    # Run inference
    print(f"Running batch inference on directory: {image_dir}")
    print(f"Using checkpoint: {checkpoint_path}")
    run_inference(checkpoint_path, image_dir, output_dir, batch_mode=True)

# Run inference on a single image from outside the dataset and save the results
test_image_path = "test_images/fake_21.jpg"
checkpoint_path = "checkpoints/Run_19_learning_rate_0.00005/ganfingerprint_20250411_125832_best.pth"
run_single_inference(checkpoint_path, test_image_path, "inference_results")

# Run inference on a directory of images (uncomment to run)
# test_dir = "test_images"
# run_batch_inference(checkpoint_path, test_dir, "inference_results")
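
For reference, turning a raw model output into a label is a small calculation. The sketch below assumes the model emits a single logit, that a sigmoid maps it to the probability of the "Real" class, and that 0.5 is the decision threshold, which is the convention used by the commented-out interactive demo in this notebook. `inference.py` may use a different convention, and `logit_to_prediction` is a hypothetical helper.

```python
import math
from typing import Tuple


def logit_to_prediction(logit: float, threshold: float = 0.5) -> Tuple[str, float]:
    """Map a raw model output (logit) to a (label, probability) pair."""
    # Numerically stable sigmoid: never exponentiates a large positive number.
    if logit >= 0:
        prob = 1.0 / (1.0 + math.exp(-logit))
    else:
        e = math.exp(logit)
        prob = e / (1.0 + e)
    label = "Real" if prob >= threshold else "Fake"
    return label, prob


for raw in (2.0, 0.0, -2.0):
    label, prob = logit_to_prediction(raw)
    print(f"logit={raw:+.1f} -> {label} (p_real={prob:.4f})")
```

Raising the threshold makes the classifier more conservative about calling an image real, at the cost of flagging more real images as fake.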

## 12. Interactive Demo

In [None]:
# # Interactive image upload and prediction (using ipywidgets)
# from ipywidgets import FileUpload, Button, Output, HBox, VBox, HTML
# from IPython.display import display, clear_output
# import io

# def create_interactive_demo(checkpoint_path):
#     # Load model
#     model = FingerprintNet(backbone=config.BACKBONE)
#     checkpoint = torch.load(checkpoint_path, map_location=config.DEVICE, weights_only=False)
#     model.load_state_dict(checkpoint['model_state_dict'])
#     model = model.to(config.DEVICE)
#     model.eval()
    
#     # Transform for inference
#     transform = transforms.Compose([
#         transforms.Resize(config.INPUT_SIZE),
#         transforms.CenterCrop(config.INPUT_SIZE),
#         transforms.ToTensor(),
#         transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
#     ])
    
#     # Create widgets
#     file_upload = FileUpload(accept='image/*', multiple=False)
#     predict_button = Button(description='Predict')
#     output = Output()
    
#     # Define button click handler
#     def on_predict_button_clicked(b):
#         with output:
#             clear_output()
            
#             if not file_upload.value:
#                 print("Please upload an image first.")
#                 return
            
#             # Get the uploaded file
#             uploaded_file = list(file_upload.value.values())[0]
#             content = uploaded_file['content']
            
#             # Convert to PIL Image
#             image = Image.open(io.BytesIO(content)).convert('RGB')
            
#             # Preprocess and make prediction
#             image_tensor = transform(image).unsqueeze(0).to(config.DEVICE)
            
#             # Store the result in `logits` so the `output` widget is not shadowed
#             with torch.no_grad():
#                 if config.USE_AMP:
#                     with torch.cuda.amp.autocast():
#                         logits = model(image_tensor)
#                 else:
#                     logits = model(image_tensor)
            
#             # Get probability and class
#             prob = torch.sigmoid(logits).item()
#             pred_class = "Real" if prob >= 0.5 else "Fake"
            
#             # Display results
#             color = 'green' if pred_class == 'Real' else 'red'
#             display(HTML(f"<h3 style='color:{color}'>Prediction: {pred_class} (Confidence: {prob:.4f})</h3>"))
            
#             # Display image
#             plt.figure(figsize=(8, 8))
#             plt.imshow(image)
#             plt.axis('off')
#             plt.title(f"Prediction: {pred_class} ({prob:.4f})", color=color, fontsize=16)
#             plt.show()
    
#     # Connect the button click event
#     predict_button.on_click(on_predict_button_clicked)
    
#     # Create UI layout
#     display(HTML("<h2>Deepfake Detection Demo</h2>"))
#     display(HTML("<p>Upload an image and click 'Predict' to detect if it's real or a GAN-generated fake.</p>"))
#     display(HBox([file_upload, predict_button]))
#     display(output)

# # Run the interactive demo (uncomment to run)
# create_interactive_demo("checkpoints/Run_19_learning_rate_0.00005/ganfingerprint_20250411_125832_best.pth")