<a href="https://colab.research.google.com/github/jeffheaton/present/blob/master/youtube/gan/colab_gan_train.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

![Jeff Heaton](https://raw.githubusercontent.com/jeffheaton/present/master/images/github.jpg)

Copyright 2021 by [Jeff Heaton](https://www.youtube.com/channel/UCR1-GEpyOPzT2AO4D_eifdw), [released under Apache 2.0 license](https://github.com/jeffheaton/present/blob/master/LICENSE)
# Training StyleGAN2 in Google CoLab

GANs can be trained with either Google Colab Free or Pro.  The Pro version is recommended because it offers better GPU instances, longer runtimes, and more lenient idle timeouts.  Make sure that you are running this notebook with a GPU runtime.

Your training data and trained neural networks will be stored on GDRIVE.  For GANs, I lay out my GDRIVE like this:

* ./data/gan/images - RAW images I wish to train on.
* ./data/gan/datasets - Actual training datasets that I convert from the raw images.
* ./data/gan/experiments - The output from StyleGAN2, my image previews and saved network snapshots.

The drive is mounted to the following location.

```
/content/drive/MyDrive/data
```
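
As a minimal sketch, the layout above could be created in one step once the drive is mounted. The helper name `make_gan_dirs` is hypothetical; the folder names are the ones listed above.

```python
import os

def make_gan_dirs(root="/content/drive/MyDrive/data"):
    """Create the images/datasets/experiments folders under <root>/gan."""
    paths = [os.path.join(root, "gan", name)
             for name in ("images", "datasets", "experiments")]
    for path in paths:
        os.makedirs(path, exist_ok=True)  # no error if the folder already exists
    return paths
```

Calling `make_gan_dirs()` after mounting GDRIVE returns the three paths, which is handy for checking them by eye.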


# What Sort of GPU do you Have?

The type of GPU assigned to you by Colab will greatly affect your training time. Some sample times that I achieved with Colab are given here.  I've found that Colab Pro generally starts you with a V100; however, if you run scripts non-stop around the clock for a few days in a row, you will generally be throttled back to a P100.

* 1024x1024 - V100 - 566 sec/tick (CoLab Pro)
* 1024x1024 - P100 - 1819 sec/tick (CoLab Pro)
* 1024x1024 - T4 - 2188 sec/tick (CoLab Free)

If you use Google CoLab Pro, generally, it will not disconnect before 24 hours, even if you (but not your script) are inactive.  Free CoLab WILL disconnect a perfectly good running script if you do not interact for a few hours.  The following describes how to circumvent this issue.

* [How to prevent Google Colab from disconnecting?](https://stackoverflow.com/questions/57113226/how-to-prevent-google-colab-from-disconnecting)


In [1]:
!nvidia-smi

Sat Dec 25 00:50:13 2021       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 495.44       Driver Version: 460.32.03    CUDA Version: 11.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|   0  Tesla P100-PCIE...  Off  | 00000000:00:04.0 Off |                    0 |
| N/A   36C    P0    29W / 250W |      0MiB / 16280MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Proces
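
If you would rather read the GPU type from Python than from the table, the following is a rough sketch. It assumes `nvidia-smi` is on the path and uses its `--query-gpu` CSV output; the helper names `parse_gpu_line` and `query_gpus` are hypothetical.

```python
import shutil
import subprocess

def parse_gpu_line(line):
    """Split one 'name, memory' CSV row from nvidia-smi into a tuple."""
    name, memory = (part.strip() for part in line.rsplit(",", 1))
    return name, memory

def query_gpus():
    """Return [(name, total_memory), ...], or [] when no NVIDIA GPU is visible."""
    if shutil.which("nvidia-smi") is None:
        return []
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=name,memory.total", "--format=csv,noheader"],
        capture_output=True, text=True,
    )
    if result.returncode != 0:
        return []
    return [parse_gpu_line(row) for row in result.stdout.splitlines() if row.strip()]

print(query_gpus() or "No GPU found: switch the runtime type to GPU")
```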

# Set Up New Environment

You will likely need to train for more than 24 hours, and Colab will disconnect you before then, so you must be prepared to restart training when this happens.  Training is divided into ticks.  Every so many ticks (50 by default), your neural network is evaluated and a snapshot is saved.  When CoLab shuts down, all training after the last snapshot is lost.  It might seem desirable to snapshot after each tick; however, the snapshotting process itself takes nearly an hour.  It is important to find a good snapshot interval for your resolution and training data.
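
To make the trade-off concrete, here is a back-of-the-envelope sketch using the 1024x1024 sec/tick figures quoted earlier. It assumes a 24-hour session and ignores the time spent writing each snapshot, so the counts are upper bounds. The function names are hypothetical.

```python
SESSION_HOURS = 24  # assumed session length

def snapshots_per_session(sec_per_tick, snap, hours=SESSION_HOURS):
    """Upper bound on snapshots completed in one session (snapshot overhead ignored)."""
    ticks = int(hours * 3600 // sec_per_tick)
    return ticks // snap

def hours_between_snapshots(sec_per_tick, snap):
    """Worst-case training time lost if the session dies just before a snapshot."""
    return sec_per_tick * snap / 3600

for gpu, sec in [("V100", 566), ("P100", 1819), ("T4", 2188)]:
    for snap in (50, 10):
        print(f"{gpu} snap={snap}: {snapshots_per_session(sec, snap)} snapshots, "
              f"{hours_between_snapshots(sec, snap):.1f} h apart")
```

With these figures, the default interval of 50 ticks would not reach a single snapshot within 24 hours on the P100 or T4, which is one reason to pass a smaller `--snap` value.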

We will mount GDRIVE so that your snapshots are saved there.  You must also place your training images in GDRIVE.

In [2]:
try:
    from google.colab import drive
    drive.mount('/content/drive', force_remount=True)
    COLAB = True
    print("Note: using Google CoLab")
except ImportError:
    print("Note: not using Google CoLab")
    COLAB = False

Mounted at /content/drive
Note: using Google CoLab


You must also install NVIDIA StyleGAN2 ADA PyTorch.  We also need to downgrade PyTorch to a version that supports StyleGAN.

In [3]:
!pip install torch==1.8.1 torchvision==0.9.1
!git clone https://github.com/NVlabs/stylegan2-ada-pytorch.git
!pip install ninja

Collecting torch==1.8.1
  Downloading torch-1.8.1-cp37-cp37m-manylinux1_x86_64.whl (804.1 MB)
     |████████████████████████████████| 804.1 MB 2.6 kB/s 
Collecting torchvision==0.9.1
  Downloading torchvision-0.9.1-cp37-cp37m-manylinux1_x86_64.whl (17.4 MB)
     |████████████████████████████████| 17.4 MB 555 kB/s 
Installing collected packages: torch, torchvision
  Attempting uninstall: torch
    Found existing installation: torch 1.10.0+cu111
    Uninstalling torch-1.10.0+cu111:
      Successfully uninstalled torch-1.10.0+cu111
  Attempting uninstall: torchvision
    Found existing installation: torchvision 0.11.1+cu111
    Uninstalling torchvision-0.11.1+cu111:
      Successfully uninstalled torchvision-0.11.1+cu111
ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
torchtext 0.11.0 requires torch==1.10.0, but you have torch 1.8.1 which is incom
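
Because pip reports dependency conflicts here, it can be worth confirming which versions actually ended up installed. The sketch below uses `importlib.metadata`, which is in the standard library from Python 3.8 onward (the log above shows a Python 3.7 wheel, so on that runtime this is an assumption that will not hold). `installed_version` is a hypothetical helper.

```python
from importlib import metadata  # standard library in Python 3.8+

def installed_version(package):
    """Return the installed version string, or None if the package is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None

EXPECTED = {"torch": "1.8.1", "torchvision": "0.9.1"}  # pins from the install cell
for package, wanted in EXPECTED.items():
    found = installed_version(package)
    status = "ok" if found is not None and found.startswith(wanted) else "MISMATCH"
    print(f"{package}: wanted {wanted}, found {found} ({status})")
```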

# Find Your Files

The drive is mounted to the following location.

```
/content/drive/MyDrive/data
```

It might be helpful to use an `ls` command to establish the exact path for your images.

In [4]:
!ls /content/drive/MyDrive/data/gan/images

ls: cannot access '/content/drive/MyDrive/data/gan/images': No such file or directory


In [5]:
!ls

drive  sample_data  stylegan2-ada-pytorch
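
When `ls` reports that a path does not exist, as it did above, it helps to know how much of the path is actually there. The following sketch walks upward to the deepest folder that exists; `nearest_existing` is a hypothetical helper, not part of Colab.

```python
import os

def nearest_existing(path):
    """Walk up from `path` and return the deepest directory that exists."""
    current = os.path.abspath(path)
    while not os.path.isdir(current):
        parent = os.path.dirname(current)
        if parent == current:  # reached the filesystem root
            break
        current = parent
    return current

wanted = "/content/drive/MyDrive/data/gan/images"
found = nearest_existing(wanted)
if found == os.path.abspath(wanted):
    print("Found:", sorted(os.listdir(found))[:10])
else:
    print(f"Missing. Deepest existing folder: {found}")
    print("It contains:", sorted(os.listdir(found))[:10])
```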


In [6]:
from google.colab import files
files.upload()  # Upload your kaggle.json here.

!pip install -q kaggle
!mkdir -p ~/.kaggle
!cp kaggle.json ~/.kaggle/
!chmod 600 /root/.kaggle/kaggle.json

!mkdir cat-human-faces
%cd cat-human-faces/
!kaggle datasets download -d vincenttu/cat-human-faces
!unzip -q cat-human-faces.zip
!rm cat-human-faces.zip
%cd ../

%cd cat-human-faces/
!unzip -q cat_human_faces.zip
!rm cat_human_faces.zip
%cd ../

Saving kaggle.json to kaggle.json
/content/cat-human-faces
Downloading cat-human-faces.zip to /content/cat-human-faces
100% 2.47G/2.48G [01:15<00:00, 19.2MB/s]
100% 2.48G/2.48G [01:15<00:00, 35.0MB/s]
/content
/content/cat-human-faces
/content


# Convert Your Images

In [8]:
import os
os.makedirs("/content/cat-human-faces/dataset", exist_ok=True)

In [9]:
source_dir = r"/content/cat-human-faces/input/cat_human_faces"
dest_dir = r"/content/cat-human-faces/dataset"

In [10]:
!python /content/stylegan2-ada-pytorch/dataset_tool.py --source {source_dir} --dest {dest_dir}

100% 11653/11653 [04:54<00:00, 39.58it/s]


The following command can be used to clear out the newly created dataset.  If something goes wrong and you need to clean up your images and rerun the above command, you should delete your partially created dataset directory.

In [11]:
#!rm -R /content/drive/MyDrive/data/gan/dataset/circuit/*

# Clean Up Your Images

It is important that all images have the same dimensions and color depth.  This code can identify images that have issues.

In [12]:
import os
from PIL import Image
from tqdm.notebook import tqdm

IMAGE_PATH = '/content/cat-human-faces/input/cat_human_faces'
# A distinct name avoids shadowing the `files` module imported from google.colab.
image_files = [f for f in os.listdir(IMAGE_PATH)
               if os.path.isfile(os.path.join(IMAGE_PATH, f))]

base_size = None  # size of the first valid RGB image; all others are compared to it
for file in tqdm(image_files):
  file2 = os.path.join(IMAGE_PATH, file)
  with Image.open(file2) as img:  # close each file handle after inspection
    sz = img.size
    mode = img.mode
  if base_size and sz != base_size:
    print(f"Inconsistent size: {file2}")
  elif mode != 'RGB':
    print(f"Inconsistent color format: {file2}")
  else:
    base_size = sz


  0%|          | 0/11653 [00:00<?, ?it/s]
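
The check above treats the first valid image as the reference, so if that image happens to be the odd one out, every other file gets reported. A possible alternative, sketched here, takes the most common size as the reference instead. The `find_outliers` helper and the sample records are hypothetical; in the notebook the tuples would be built from PIL's `img.size` and `img.mode`.

```python
from collections import Counter

def find_outliers(records):
    """records: iterable of (filename, (width, height), mode) tuples.

    Returns (reference_size, [filenames that differ in size or are not RGB]).
    """
    records = list(records)
    sizes = Counter(size for _, size, mode in records if mode == "RGB")
    if not sizes:  # no RGB image at all: everything needs attention
        return None, [name for name, _, _ in records]
    reference = sizes.most_common(1)[0][0]
    bad = [name for name, size, mode in records
           if size != reference or mode != "RGB"]
    return reference, bad

# Hypothetical records standing in for real image metadata.
sample = [("a.png", (512, 512), "RGB"), ("b.png", (256, 256), "RGB"),
          ("c.png", (512, 512), "RGB"), ("d.png", (512, 512), "L")]
print(find_outliers(sample))  # -> ((512, 512), ['b.png', 'd.png'])
```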

# Perform Initial Training

In [None]:
import os

# Modify these to suit your needs
EXPERIMENTS = "/content/drive/MyDrive/data/StyleGAN2-ADA-cat-human-faces"
DATA = "/content/cat-human-faces/dataset"
SNAP = 10

# Build the command and run it
cmd = f"/usr/bin/python3 /content/stylegan2-ada-pytorch/train.py --snap {SNAP} --outdir {EXPERIMENTS} --data {DATA}"

!{cmd}


Training options:
{
  "num_gpus": 1,
  "image_snapshot_ticks": 10,
  "network_snapshot_ticks": 10,
  "metrics": [
    "fid50k_full"
  ],
  "random_seed": 0,
  "training_set_kwargs": {
    "class_name": "training.dataset.ImageFolderDataset",
    "path": "/content/cat-human-faces/dataset",
    "use_labels": false,
    "max_size": 11653,
    "xflip": false,
    "resolution": 512
  },
  "data_loader_kwargs": {
    "pin_memory": true,
    "num_workers": 3,
    "prefetch_factor": 2
  },
  "G_kwargs": {
    "class_name": "training.networks.Generator",
    "z_dim": 512,
    "w_dim": 512,
    "mapping_kwargs": {
      "num_layers": 2
    },
    "synthesis_kwargs": {
      "channel_base": 32768,
      "channel_max": 512,
      "num_fp16_res": 4,
      "conv_clamp": 256
    }
  },
  "D_kwargs": {
    "class_name": "training.networks.Discriminator",
    "block_kwargs": {},
    "mapping_kwargs": {},
    "epilogue_kwargs": {
      "mbstd_group_size": 4
    },
    "channel_base": 32768,
    "channel

In [None]:
!/usr/bin/python3 /content/stylegan2-ada-pytorch/train.py --snap 25 --resume /content/drive/MyDrive/data/gan/experiments/00007-circuit-auto1/network-snapshot-000500.pkl --outdir /content/drive/MyDrive/data/gan/experiments --data /content/drive/MyDrive/data/gan/dataset/circuit

# Resume Training

In [None]:
import os

# Modify these to suit your needs
EXPERIMENTS = "/content/drive/MyDrive/data/gan/experiments"
NETWORK = "network-snapshot-000100.pkl"
RESUME = os.path.join(EXPERIMENTS, "00008-circuit-auto1-resumecustom", NETWORK)
DATA = "/content/drive/MyDrive/data/gan/dataset/circuit"
SNAP = 10

# Build the command and run it
cmd = f"/usr/bin/python3 /content/stylegan2-ada-pytorch/train.py --snap {SNAP} --resume {RESUME} --outdir {EXPERIMENTS} --data {DATA}"
!{cmd}
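
Rather than typing the run folder and snapshot name by hand after each disconnect, the newest snapshot could be located automatically. This is a sketch that assumes the naming seen in the paths above: zero-padded run folders such as `00008-...` containing `network-snapshot-NNNNNN.pkl` files. `latest_snapshot` is a hypothetical helper.

```python
from pathlib import Path

def latest_snapshot(experiments_dir):
    """Return the newest network-snapshot-*.pkl under experiments_dir, or None."""
    candidates = Path(experiments_dir).glob("*/network-snapshot-*.pkl")
    # Run folders and snapshot files are zero-padded, so plain string
    # ordering matches numeric ordering: newest run first, then newest file.
    return max(candidates, key=lambda p: (p.parent.name, p.name), default=None)

print(latest_snapshot("/content/drive/MyDrive/data/gan/experiments"))
```

The result can be passed as `RESUME`. It is still worth checking that the chosen file is complete, since a disconnect during a save could leave a partial pickle.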

# Our Own Exploration into the train.py File

One file matters most right now: `train.py`. It is the central script in this notebook.

*So what does it do?*

The training process itself is complicated, but the command-line interface it exposes is easy to follow:

- `--snap`: the snapshot interval, in ticks; the network is evaluated and saved every `--snap` ticks so that training can be resumed later
- `--resume`: the saved network snapshot (a `.pkl` file) to resume training from
- `--outdir`: the directory that receives generated sample images, logs, and network snapshots
- `--data`: the location of the training dataset produced by `dataset_tool.py`

This is only a brief, high-level view of the interface. `train.py` accepts many more arguments, but for training in Colab these four are the most relevant.
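
The four options can be assembled programmatically, which keeps the initial and resumed runs consistent. The sketch below only uses the flags that appear in this notebook. `build_train_cmd` is a hypothetical helper, and the quoting via `shlex` is an added precaution for paths that contain spaces.

```python
import shlex

def build_train_cmd(outdir, data, snap=10, resume=None,
                    script="/content/stylegan2-ada-pytorch/train.py"):
    """Assemble the train.py command from the four options discussed above."""
    args = ["/usr/bin/python3", script,
            "--snap", str(snap), "--outdir", outdir, "--data", data]
    if resume:  # omit the flag entirely for a fresh run
        args += ["--resume", resume]
    return " ".join(shlex.quote(a) for a in args)

print(build_train_cmd("/content/drive/MyDrive/data/gan/experiments",
                      "/content/drive/MyDrive/data/gan/dataset/circuit"))
```

In a notebook cell the returned string can be run with `!{cmd}`, exactly as in the training cells above.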