# HAI Deepfake Detection - Colab Training Pipeline

This notebook trains models on **Google Colab** using a GPU.
It fetches the code (developed locally and pushed to GitHub) via Git and runs training.

## 1. Mount Google Drive
Mount Google Drive to store trained models (checkpoints) and datasets.

In [None]:
from google.colab import drive
drive.mount('/content/drive')
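Later cells read the Kaggle key from `/content/drive/MyDrive` and write backups there, so it can help to fail fast if the mount did not succeed. A minimal sketch, assuming the default Colab layout in which the mounted drive exposes a `MyDrive` folder; `drive_ready` is a hypothetical helper, not part of the project.

```python
import os


def drive_ready(mount_point: str = "/content/drive") -> bool:
    """Return True if the mounted Drive exposes its 'MyDrive' folder.

    Hypothetical helper: assumes the default Colab layout where
    drive.mount('/content/drive') makes files visible under
    /content/drive/MyDrive.
    """
    return os.path.isdir(os.path.join(mount_point, "MyDrive"))


if __name__ == "__main__":
    if drive_ready():
        print("Google Drive is mounted.")
    else:
        print("Google Drive is not mounted; reading the key or backing up to Drive will fail.")
```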

## 2. Project Setup (Git Clone/Pull)
Clone the repository from GitHub on the first run, or pull the latest code if it is already present.

In [None]:
import os

# TODO: Change to your GitHub repository URL!
REPO_URL = "https://github.com/CBottle/HAI_Deepfake.git"
PROJECT_DIR = "/content/HAI_Deepfake"

if os.path.exists(PROJECT_DIR):
    print("Project already exists. Pulling latest code...")
    %cd {PROJECT_DIR}
    !git pull
else:
    print("Cloning project...")
    !git clone {REPO_URL} {PROJECT_DIR}
    %cd {PROJECT_DIR}

print("Current working directory:", os.getcwd())
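The cell above relies on IPython magics (`%cd`, `!`). If the same clone-or-pull step is needed from a plain Python script, a minimal sketch with `subprocess` could look like the following. The helper names are hypothetical, and the check looks for a `.git` folder rather than the bare directory, so that a leftover folder that is not a Git checkout is never `git pull`ed.

```python
import os
import subprocess


def git_sync_command(repo_url: str, project_dir: str) -> list:
    """Build the git command that clones the repo or updates an existing checkout."""
    if os.path.isdir(os.path.join(project_dir, ".git")):
        # Existing checkout: pull inside it without changing the working directory
        return ["git", "-C", project_dir, "pull"]
    # Fresh runtime: clone into the explicit target directory
    return ["git", "clone", repo_url, project_dir]


def git_sync(repo_url: str, project_dir: str) -> None:
    """Run the clone/pull command; raises CalledProcessError if git fails."""
    subprocess.run(git_sync_command(repo_url, project_dir), check=True)
```

Usage: `git_sync(REPO_URL, PROJECT_DIR)` followed by `os.chdir(PROJECT_DIR)` reproduces the cell above without magics.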

## 3. Environment Setup (Install Libraries)
Install the Python dependencies listed in `env/requirements.txt`.

In [None]:
!pip install -r env/requirements.txt
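After installation, one way to confirm that every package listed in `env/requirements.txt` is actually present in the runtime is to compare the file against the installed distributions. A minimal sketch with a deliberately simplified parser (not a full PEP 508 implementation); `requirement_names` and `missing_packages` are hypothetical helpers.

```python
import re
from importlib import metadata


def requirement_names(path: str) -> list:
    """Extract distribution names from a pip requirements file.

    Simplified parser: skips blank lines, comments, and pip options
    such as -r or -e; it does not handle every PEP 508 form.
    """
    names = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.split("#", 1)[0].strip()
            if not line or line.startswith("-"):
                continue
            # Cut at the first version specifier, extra, marker, URL, or space
            name = re.split(r"[<>=!~;\[\s@]", line, maxsplit=1)[0]
            if name:
                names.append(name)
    return names


def missing_packages(path: str) -> list:
    """Return requirement names that have no installed distribution."""
    missing = []
    for name in requirement_names(path):
        try:
            metadata.version(name)
        except metadata.PackageNotFoundError:
            missing.append(name)
    return missing
```

Usage: `print(missing_packages("env/requirements.txt"))` should print an empty list when the install succeeded.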

## 4. Prepare Dataset (Kaggle API)
Download the dataset from Kaggle. This requires a Kaggle API token (`kaggle.json`).

**Usage:**
1. Place `kaggle.json` in your Google Drive at `/content/drive/MyDrive/Kaggle/kaggle.json`.
2. Run the cell below to set it up automatically.

In [None]:
import os

# Kaggle config directory
KAGGLE_DIR = '/root/.kaggle'
os.makedirs(KAGGLE_DIR, exist_ok=True)

# Copy key from Drive (Adjust path if needed)
DRIVE_KAGGLE_KEY = '/content/drive/MyDrive/Kaggle/kaggle.json'

if os.path.exists(DRIVE_KAGGLE_KEY):
    !cp {DRIVE_KAGGLE_KEY} {KAGGLE_DIR}/
    !chmod 600 {KAGGLE_DIR}/kaggle.json
    print("Kaggle API Key configured!")
    
    # Example: download a dataset such as FaceForensics++ (replace [dataset-name] with the Kaggle dataset slug)
    # !kaggle datasets download -d [dataset-name] -p ./train_data --unzip
else:
    print("kaggle.json not found in Drive. Skipping download.")
    print("Generating dummy data for testing.")
    !python create_dummy_data.py
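The shell commands in the cell above (`cp` and `chmod 600`) can also be expressed in plain Python, which makes the step reusable outside a notebook. A minimal sketch; `install_kaggle_key` is a hypothetical helper, and its default target `~/.kaggle` resolves to `/root/.kaggle` on Colab, where the notebook runs as root.

```python
import os
import shutil


def install_kaggle_key(src: str, kaggle_dir: str = os.path.expanduser("~/.kaggle")) -> bool:
    """Copy kaggle.json into kaggle_dir with owner-only permissions.

    Returns False (and changes nothing) if the source file does not exist.
    """
    if not os.path.isfile(src):
        return False
    os.makedirs(kaggle_dir, exist_ok=True)
    dst = os.path.join(kaggle_dir, "kaggle.json")
    shutil.copyfile(src, dst)
    # Equivalent of `chmod 600`: the Kaggle CLI warns if the key is readable by other users
    os.chmod(dst, 0o600)
    return True
```

Usage: `install_kaggle_key('/content/drive/MyDrive/Kaggle/kaggle.json')` returns `True` once the key is in place.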

## 5. Run Training
Start training on the GPU.

In [None]:
# Make sure config/config.yaml sets `device: cuda` before running
!python train.py --config config/config.yaml
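If `config/config.yaml` requests `device: cuda` but no GPU is attached to the runtime (in Colab: *Runtime > Change runtime type*), training will likely error out. A minimal pre-flight sketch, **assuming** the training code is built on PyTorch (the notebook does not say so); `cuda_available` and `resolve_device` are hypothetical helpers, not part of `train.py`.

```python
import importlib.util


def cuda_available() -> bool:
    """Return True if PyTorch is installed and can see a CUDA device.

    Assumption: the training code uses PyTorch.
    """
    if importlib.util.find_spec("torch") is None:
        return False
    import torch
    return torch.cuda.is_available()


def resolve_device(requested: str, has_cuda: bool) -> str:
    """Fall back to CPU when a CUDA device is requested but unavailable."""
    if requested.startswith("cuda") and not has_cuda:
        return "cpu"
    return requested


if __name__ == "__main__":
    device = resolve_device("cuda", cuda_available())
    print(f"Training device: {device}")
```

Falling back to CPU keeps the pipeline testable on a CPU runtime, but real training on CPU will be far slower, so a `cpu` result here is usually a sign to switch the runtime type.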

## 6. Backup Model
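Copy the trained model checkpoints to Google Drive. Files under `/content` are lost when the Colab runtime disconnects, so anything not copied to Drive is gone.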

In [None]:
import os
import shutil
from datetime import datetime

# Backup path on Google Drive (one timestamped folder per run)
BACKUP_DIR = f"/content/drive/MyDrive/HAI_Deepfake/models/{datetime.now().strftime('%Y%m%d_%H%M')}"

# Copy checkpoints from the local output directory (creates BACKUP_DIR only when there is something to copy)
if os.path.exists('output'):
    shutil.copytree('output', BACKUP_DIR, dirs_exist_ok=True)
    print(f"Model backup completed: {BACKUP_DIR}")
else:
    print("No saved model found.")
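Because each backup folder is named with a `%Y%m%d_%H%M` timestamp, the most recent run can be located by name alone, for example to restore checkpoints in a fresh runtime. A minimal sketch; `latest_backup` is a hypothetical helper. Two runs started in the same minute share one folder, and the later copy overwrites matching files.

```python
import os
import re
from typing import Optional

# Matches the timestamp format used for backup folders: %Y%m%d_%H%M
_STAMP = re.compile(r"^\d{8}_\d{4}$")


def latest_backup(models_dir: str) -> Optional[str]:
    """Return the path of the newest timestamped backup folder, or None."""
    if not os.path.isdir(models_dir):
        return None
    stamps = [
        name
        for name in os.listdir(models_dir)
        if _STAMP.match(name) and os.path.isdir(os.path.join(models_dir, name))
    ]
    if not stamps:
        return None
    # Zero-padded YYYYMMDD_HHMM names sort chronologically as plain strings
    return os.path.join(models_dir, max(stamps))
```

Usage: `latest_backup("/content/drive/MyDrive/HAI_Deepfake/models")` returns the newest run folder, or `None` if no backup exists yet.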