# FFHQ Dataset Downloader for Google Colab

This notebook provides a streamlined process for downloading the **Flickr-Faces-HQ (FFHQ)** dataset directly within a Google Colab environment and transferring it to your Google Drive.

---
### ⚠️ **Important Considerations**

1.  **Large Dataset Size**: The FFHQ 1024x1024 image set is approximately **89 GB**, which is well beyond the 15 GB of free Google Drive storage. Ensure your Drive has enough space before you start.
2.  **Long Duration**: The entire download and transfer process can take **several hours**. A **Colab Pro** subscription is highly recommended to reduce the risk of the session timing out.
3.  **Colab Disk Space**: A standard, free Colab instance may not have enough temporary disk space to hold the entire dataset before transferring. Colab Pro provides more disk space.
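
As a rough pre-flight check for the disk space concerns above, free space on the Colab instance can be queried with Python's standard `shutil.disk_usage`. This is only a sketch: the helper names `free_gb` and `has_room` are made up for this example, and it measures the local disk of the machine it runs on, not your Google Drive quota.

```python
import shutil


def free_gb(path: str) -> float:
    """Return the free space on the filesystem containing `path`, in GB (10**9 bytes)."""
    return shutil.disk_usage(path).free / 1e9


def has_room(path: str, required_gb: float) -> bool:
    """Return True if at least `required_gb` GB are free at `path`."""
    return free_gb(path) >= required_gb
```

In Colab, you could call `has_room("/content", required_gb)` before Step 3, passing the dataset size quoted above plus some headroom, and stop early if it returns `False`.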

---

## Step 1: Mount Google Drive

First, connect to your Google Drive. This is where the downloaded dataset will be stored. You will be asked to authorize the connection.

In [None]:
from google.colab import drive
drive.mount('/content/drive')

## Step 2: Create the Target Directory in Google Drive

Let's create the folder structure in your Google Drive where the dataset will live. We will use `CodeFormer_Dataset/ffhq` as the destination.

In [None]:
import os

# This is the path that your training script will expect.
target_drive_folder = "/content/drive/My Drive/CodeFormer_Dataset/ffhq"

# os.makedirs raises an error on failure, so the message below only prints on success.
os.makedirs(target_drive_folder, exist_ok=True)

print(f"Directory is ready: {target_drive_folder}")

## Step 3: Download and Transfer the Dataset

This is the main step. The following script will:
1. Clone the official FFHQ dataset repository.
2. Run the download script to fetch the 70,000 images (1024x1024 resolution) to Colab's temporary local storage.
3. Copy all the downloaded images to the Google Drive folder you created.

**Note:** This is a very long process. Please be patient.

In [None]:
# Navigate out to the root /content directory for a clean workspace
%cd /content/

print("Cloning the official FFHQ dataset repository from NVIDIA...")
!git clone https://github.com/NVlabs/ffhq-dataset.git
%cd ffhq-dataset

# Run the official Python script to download the images.
# The '--images' flag downloads the 1024x1024 PNG images.
# (The '--wilds' flag would instead fetch the much larger in-the-wild originals.)
# The images will be saved locally inside '/content/ffhq-dataset/images1024x1024'.
print("\nStarting FFHQ dataset download. This will take several hours...")
!python download_ffhq.py --images

# After the download finishes, copy the images to your Google Drive.
# target_drive_folder was defined in Step 2.
print("\nDownload step finished. Starting transfer to Google Drive. This will also take a long time...")
!cp -r /content/ffhq-dataset/images1024x1024/* "{target_drive_folder}/"

print("\n--- All Done! ---")
print("Transfer step finished. Check the output above for errors, then verify the files in Step 4.")
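
Because the transfer can take hours, a session timeout may interrupt it partway through. One possible workaround, sketched below, is a copy routine that skips files already present at the destination, so it can be re-run after an interruption. The function name `copy_missing` is hypothetical, and comparing file sizes is an assumed, simplistic way to detect finished files; it does not verify file contents.

```python
import os
import shutil


def copy_missing(src_root: str, dst_root: str) -> int:
    """Copy every file under src_root to the same relative path under dst_root.

    Files that already exist at the destination with the same size are skipped.
    Returns the number of files actually copied.
    """
    copied = 0
    for dirpath, _dirnames, filenames in os.walk(src_root):
        # Recreate the source subfolder layout under the destination.
        rel = os.path.relpath(dirpath, src_root)
        dst_dir = os.path.normpath(os.path.join(dst_root, rel))
        os.makedirs(dst_dir, exist_ok=True)
        for name in filenames:
            src = os.path.join(dirpath, name)
            dst = os.path.join(dst_dir, name)
            # Skip files that appear to have been copied in an earlier run.
            if os.path.exists(dst) and os.path.getsize(dst) == os.path.getsize(src):
                continue
            shutil.copy2(src, dst)
            copied += 1
    return copied
```

In this notebook, it could stand in for the `cp -r` line, for example `copy_missing("/content/ffhq-dataset/images1024x1024", target_drive_folder)`.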

## Step 4: Verify the Transfer

Let's run a quick check to ensure the files are now in your Google Drive.

In [None]:
# List the first few entries in the target directory.
# The images are grouped into numbered subfolders, so expect folder names here.
!ls -l "/content/drive/My Drive/CodeFormer_Dataset/ffhq" | head -n 10
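
A listing only shows that something arrived. For a slightly stronger check, the PNG files can be counted and compared with the 70,000 images mentioned in Step 3. The sketch below is one way to do that; the names `count_pngs` and `EXPECTED_IMAGES` are invented for this example, and walking that many files on a mounted Drive may itself take a while.

```python
from pathlib import Path

EXPECTED_IMAGES = 70_000  # image count stated in Step 3


def count_pngs(folder: str) -> int:
    """Count .png files in `folder`, including all of its subfolders."""
    return sum(1 for _ in Path(folder).rglob("*.png"))
```

If `count_pngs(target_drive_folder)` does not equal `EXPECTED_IMAGES`, the download or the transfer was probably incomplete.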