# Pix2PixHD

This notebook was created by Doug Rosman and uses code from [Doug's forked pix2pixHD repository](https://github.com/dougrosman/pix2pixHD). For a video tutorial on how to use this notebook, visit [https://dougrosman.github.io/cvml-sp21/resources/pix2pixHD/](https://dougrosman.github.io/cvml-sp21/resources/pix2pixHD/).

**Last updated: May 9, 2021, 2:00pm CST**



## 1. Connect to a GPU Instance (required)

**Executing this cell will connect you to a GPU, and your 8-10 hours of free GPU time will begin.**

This will show you what GPU you've been randomly given for this instance. With a Google Colab Pro account ($9.99/mo, really worth it!), you're almost always guaranteed a **P100**, with a chance at getting a **V100**.
* **V100:** Best (not available for free accounts)
* **P100:** Great
* **T4:** (untested) this likely will *not* work for training, but is great for generating.
* **K80:** (untested) this might work for training, but it will likely be very slow. Should be just fine for generating (though slow).

If you get a T4 or K80, I encourage you to terminate your session, wait 5-10 minutes, then try connecting to a GPU again. To terminate your session, go to **Runtime** --> **Manage Sessions** --> **Terminate** at the top of your screen.


In [None]:
!nvidia-smi -L
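If you'd rather have the notebook read the GPU name for you, here is a minimal sketch that maps one line of `nvidia-smi -L` output to the advice in the list above. `classify_gpu` and `GPU_ADVICE` are hypothetical helpers (not part of the pix2pixHD repo), and the sketch assumes the usual `GPU 0: <model> (UUID: ...)` line format.

```python
import re

# Advice from the GPU list above; keys are substrings of the model name.
GPU_ADVICE = {
    "V100": "best for training",
    "P100": "great for training",
    "T4": "untested for training; fine for generating",
    "K80": "untested for training and likely very slow; fine (but slow) for generating",
}

def classify_gpu(smi_line: str) -> str:
    """Turn one line of `nvidia-smi -L` output into the notebook's recommendation."""
    match = re.match(r"GPU \d+: (.+?) \(UUID", smi_line)
    model = match.group(1) if match else smi_line.strip()
    for key, advice in GPU_ADVICE.items():
        if key in model:
            return f"{model}: {advice}"
    return f"{model}: not covered by the list above"
```

In Colab you could capture the output with `lines = !nvidia-smi -L` and then run `print(classify_gpu(lines[0]))`.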

## 2. Mount your Google Drive (required)

**Executing this cell will prompt you to mount your Google Drive.**

After executing, a link will show up. Click the link and follow the directions. Select the Google Drive you wish to use, then copy and paste the authorization key into the box below, and press 'Enter' or 'Return' on your keyboard.

If you have multiple Google Accounts, I recommend mounting whichever one has the most storage, since working with pix2pixHD requires a lot of storage for all your images and trained models. If you're generating images, make sure that the .pth files you want to generate from are in the Google Drive that you mount.



In [2]:
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


## 3. Install pix2pixHD repository OR change directory into repo (required)

**Executing this cell will either install the pix2pixHD repo into your Google Drive, or move you into the pix2pixHD repo if it already exists. This cell also installs the Python dependency _dominate_ and the packages listed in util/requirements.txt.**

##### **Case 1: Installing the repo**
If this is your first time using this notebook (or if you deleted a previously installed copy of the pix2pixHD repo from your Google Drive), this cell will clone Doug Rosman's forked pix2pixHD repo into a folder called 'colab-pix2pixHD' in your Google Drive. After cloning, it will move you into the pix2pixHD folder.

##### **Case 2: Moving into the repo**
If this repo already exists in your Google Drive, this cell will move you into the pix2pixHD folder so that you can execute the other cells in this notebook.


In [None]:
import os
if os.path.isdir("/content/drive/MyDrive/colab-pix2pixHD"):
    %cd "/content/drive/MyDrive/colab-pix2pixHD/pix2pixHD"
    !pip install dominate
    !pip install -r util/requirements.txt
elif os.path.isdir("/content/drive/"):
    #install script
    %cd "/content/drive/MyDrive/"
    !mkdir colab-pix2pixHD
    %cd colab-pix2pixHD
    !git clone https://github.com/dougrosman/pix2pixHD
    %cd pix2pixHD
    !mkdir generated_videos
    !pip install dominate
    #install python requirements for Derrick Schultz' dataset-tools.py
    #more info: https://github.com/dvschultz/dataset-tools
    !pip install -r util/requirements.txt
else:
    !git clone https://github.com/dougrosman/pix2pixHD
    %cd pix2pixHD
    !mkdir generated_videos
    !pip install dominate
    !pip install -r util/requirements.txt
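The three branches in the cell above boil down to one decision about where the repo lives. This sketch restates that decision as a plain function so it can be checked without touching Drive. `pick_repo_dir` is a hypothetical helper, and the `/content/pix2pixHD` fallback assumes Colab's default working directory of `/content`.

```python
from pathlib import Path

REPO_SUBDIR = Path("MyDrive/colab-pix2pixHD")

def pick_repo_dir(drive_root: Path = Path("/content/drive")) -> tuple[str, Path]:
    """Return (case, repo_dir) using the same checks, in the same order, as the cell above."""
    parent = drive_root / REPO_SUBDIR
    if parent.is_dir():
        # Case 2: the repo was cloned in an earlier session; just cd into it
        return "existing", parent / "pix2pixHD"
    if drive_root.is_dir():
        # Case 1: Drive is mounted but the repo isn't there yet; clone it into Drive
        return "clone_to_drive", parent / "pix2pixHD"
    # Drive isn't mounted: clone into the VM's working directory (assumed /content),
    # which is wiped when the session ends
    return "clone_local", Path("/content/pix2pixHD")
```

The third case is the one to avoid for real work, since anything trained there disappears with the session.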


## 4. Data Processing

This notebook includes the following commands to help you create your data set:

1. Create the required folders for organizing your training data
1. Extract frames from a video file using FFmpeg
1. Create Canny edge versions of your images for your input (train_A) images

### **Notes on preparing your dataset**

1. **Identical image quantities:** The number of input images (train_A) should be identical to the number of output images (train_B). Otherwise, your training data will be mismatched, and won't train well.
1. **Same order:** The images in your train_A folder should be in the same order as the images in your train_B folder. They are fed into pix2pixHD in alphabetical/numerical order. Your images should automatically be in the same order, but keep this in mind just in case you notice mismatches in your results.
1. **Consistent image dimensions:** All your input images should have the same dimensions, and all your output images should have the same dimensions.
1. **Consistent input/output dimensions:** Technically, pix2pixHD can train datasets where the input dimensions are different from the output (like if you were training an image upscaler), but for most situations, your input and output dimensions should match.
1. **Dataset size:** Depending on your goals and the type of data you're working with, the number of images you need for a "successful" model will vary. As a general tip, I would aim for **_500 image pairs minimum, but more is better. Shoot for the 2,000-5,000 image pair range._** Note that pairs are what count: a dataset of 1,000 image pairs contains 2,000 images in total, 1,000 input images (train_A) and 1,000 output images (train_B).
1. **Efficient dataset management:** Uploading and downloading thousands of images using Google Drive can be painfully slow. I recommend downloading and installing [Google Backup and Sync](https://www.google.com/drive/download/) (Google Drive's desktop application for automatic file syncing). If you set it up, I recommend only syncing the specific dataset folders in your Google Drive that you want to work with, otherwise you might end up downloading too much to your computer. Also, during the first step of the setup process, make sure to click 'Sign in with your browser.'
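Several of the notes above can be checked before you spend any GPU time. This is a minimal sketch, assuming your images are PNGs (as produced by the ffmpeg step in 4c) and that train_A/train_B pairs are matched by sorted filename. It reads each PNG's width and height straight from the file header using only the standard library, then reports count, size, and dimension problems. `png_size` and `check_dataset` are hypothetical helpers, not part of pix2pixHD.

```python
import struct
from pathlib import Path

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"
MIN_PAIRS = 500  # the suggested minimum from the notes above

def png_size(path: Path) -> tuple[int, int]:
    """Read (width, height) from a PNG's IHDR chunk without decoding any pixels."""
    with open(path, "rb") as f:
        header = f.read(24)  # 8-byte signature + 4-byte length + b"IHDR" + width + height
    if len(header) < 24 or header[:8] != PNG_SIGNATURE or header[12:16] != b"IHDR":
        raise ValueError(f"{path} does not look like a PNG file")
    width, height = struct.unpack(">II", header[16:24])
    return width, height

def check_dataset(dataset_dir: str) -> list[str]:
    """List problems in train_A/train_B; pairs are matched by sorted filename."""
    root = Path(dataset_dir)
    inputs = sorted((root / "train_A").glob("*.png"))
    outputs = sorted((root / "train_B").glob("*.png"))
    problems = []
    if len(inputs) != len(outputs):
        problems.append(f"count mismatch: {len(inputs)} train_A vs {len(outputs)} train_B")
    if len(outputs) < MIN_PAIRS:
        problems.append(f"only {len(outputs)} pairs; the notes suggest at least {MIN_PAIRS}")
    for folder, files in (("train_A", inputs), ("train_B", outputs)):
        sizes = {png_size(p) for p in files}
        if len(sizes) > 1:
            problems.append(f"{folder} has mixed dimensions: {sorted(sizes)}")
    if inputs and outputs and png_size(inputs[0]) != png_size(outputs[0]):
        problems.append(f"input {png_size(inputs[0])} vs output {png_size(outputs[0])} dimensions differ")
    return problems
```

For example, once your images are in place, `print(check_dataset(f"./datasets/{dataset_name}"))` prints an empty list if nothing was flagged.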

### 4a. Create necessary dataset folders

#### **Set your Dataset Name**

The dataset_name variable you set here will be used throughout the rest of the notebook. Any time you work with a new dataset, change the name here. **_Any instance of $dataset_name you encounter will refer to whatever you set in this code cell._**

In [4]:
dataset_name = 'my_dataset' # change 'my_dataset' to whatever you want to call your dataset

#### **Create your dataset folders**
Inside your datasets folder, you'll need a folder for each dataset you work with. The following command creates these folders:
  1. **_train_A_**, for your input images. Place your input images inside that folder.
  2. **_train_B_**, for your output images. Place your output images inside that folder.
  3. **_test_A_**, for your test input images. We won't use this folder until we get to the generation steps after training.


In [None]:
# Run this cell after setting your dataset_name above
# (-p also creates the dataset folder itself, and skips folders that already exist)

!mkdir -p ./datasets/$dataset_name/train_A
!mkdir -p ./datasets/$dataset_name/train_B
!mkdir -p ./datasets/$dataset_name/test_A
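The same folders can also be created from Python with `pathlib`, which lets you re-run the step safely after the folders already exist. `make_dataset_folders` is a hypothetical helper that builds the same `./datasets/<dataset_name>/{train_A,train_B,test_A}` layout as the cell above.

```python
from pathlib import Path

def make_dataset_folders(dataset_name: str, datasets_root: str = "./datasets") -> list[Path]:
    """Create <datasets_root>/<dataset_name>/{train_A,train_B,test_A}; safe to re-run."""
    base = Path(datasets_root) / dataset_name
    folders = [base / sub for sub in ("train_A", "train_B", "test_A")]
    for folder in folders:
        # parents=True also creates the dataset folder; exist_ok=True means a
        # re-run doesn't fail on folders that already exist
        folder.mkdir(parents=True, exist_ok=True)
    return folders
```

Usage in the notebook would be `make_dataset_folders(dataset_name)`.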

### 4b. Upload your images

If you prepared your dataset images outside this notebook, upload them to the correct folders. Otherwise, skip this step.

Upload your input images (if you have them) to the train_A folder (e.g., your canny edges), and your output images to the train_B folder (e.g., the real images).

I recommend doing this in a separate tab that is opened to your Google Drive folder, since uploading large numbers of files inside Colab doesn't work very well. Alternatively, use Backup and Sync (details above in the 'Notes on preparing your dataset' section).

### 4c. Extract frames from video – create train_B (output) images
If you plan to create a dataset by extracting images from a video file, follow these steps.
First, upload the video file to the **dataset folder** that contains your train_A, train_B and test_A folders.

Make sure the name of your video file is **input.mp4**.

In [None]:
# change scale to your desired resolution to resize your images. (1280:-1 scales
  ## images to 1280 for the width, and the height is scaled to maintain the aspect ratio)
  ## change the width to whatever makes sense for your images (I haven't tested
  ## anything larger than 1280x720, but you may be able to go larger. If you experience
  ## an out of memory error, it might be because your images are too large.)
# change the fps (number of frames per second to extract);
  ## higher fps = more images to extract, 6-12 is a good range. You likely don't need every
  ## single frame from your video to make a good data set.

!ffmpeg \
 -i ./datasets/$dataset_name/input.mp4 \
 -vf scale=1280:-1,fps=8 \
 ./datasets/$dataset_name/train_B/output%5d.png
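Because `fps=8` keeps 8 frames for every second of footage, your dataset size is roughly video length × fps. A 3-minute (180 s) video at `fps=8` gives about 180 × 8 = 1,440 frames, comfortably above the 500-pair minimum; getting 2,000 pairs from the same clip would need about 2000 / 180 ≈ 11 fps, still inside the suggested 6-12 range. The sketch below does that arithmetic and rebuilds the same ffmpeg call as an argument list you could pass to `subprocess.run`. The helper names are hypothetical, and the frame count is an estimate, since ffmpeg's exact count depends on how the fps filter rounds timestamps.

```python
def expected_frames(duration_s: float, fps: float) -> int:
    """Rough number of PNGs the fps filter writes for a clip of duration_s seconds."""
    return round(duration_s * fps)

def fps_for_target(duration_s: float, target_pairs: int) -> float:
    """fps value needed to get roughly target_pairs frames from the clip."""
    return target_pairs / duration_s

def extract_cmd(dataset_name: str, width: int = 1280, fps: int = 8) -> list[str]:
    """The ffmpeg call from the cell above, as an argument list for subprocess.run."""
    return [
        "ffmpeg",
        "-i", f"./datasets/{dataset_name}/input.mp4",
        "-vf", f"scale={width}:-1,fps={fps}",
        f"./datasets/{dataset_name}/train_B/output%5d.png",
    ]

print(expected_frames(180, 8))              # → 1440
print(round(fps_for_target(180, 2000), 1))  # → 11.1
```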

### 4d. Apply Canny Edge Detection - create train_A (input) images
This step uses [Canny Edge Detection](https://en.wikipedia.org/wiki/Canny_edge_detector) to find the edges in your train_B (output) images. The resulting edge images are saved to the train_A folder as your inputs.

**Whether you uploaded your train_B images, or got your train_B images by extracting them from a video, you should run this step if you're creating canny edge images.**


In [None]:
# change blur amount if there are too many lines in your resulting Canny Edge
# images (odd numbers only; start with 5, then try 3 if 5 doesn't give you enough lines)

# change max_size to be the max dimension of your input images (e.g., if your
# input images are 1280x720, set max_size to '1280')

!python util/dataset-tools.py \
--input_folder ./datasets/$dataset_name/train_B/ \
--output_folder ./datasets/$dataset_name/train_A/ \
--process_type canny \
--blur_type gaussian \
--blur_amount 3 \
--max_size 1280 \
--verbose
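The comments in the cell above amount to two rules: the blur amount must be odd (OpenCV's Gaussian blur, for instance, only accepts odd kernel sizes, which is likely the reason), and `max_size` should equal the longest side of your train_B frames so that train_A and train_B keep matching dimensions. A hypothetical checker for those two rules, where `frame_size` is the (width, height) of any train_B frame:

```python
def check_canny_settings(frame_size: tuple[int, int], max_size: int, blur_amount: int) -> list[str]:
    """Check the two rules from the comments in the Canny cell; an empty list means OK."""
    width, height = frame_size
    issues = []
    if blur_amount < 1 or blur_amount % 2 == 0:
        issues.append("blur_amount should be a positive odd number (start with 5, then try 3)")
    longest = max(width, height)
    if max_size != longest:
        issues.append(f"max_size is {max_size}, but the longest side of your frames is {longest}")
    return issues
```

For frames extracted at `scale=1280:-1` from a 16:9 video, `check_canny_settings((1280, 720), 1280, 3)` returns an empty list.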

## 5. Training

### **Some notes on training:** 

* **To stop your training manually**, click the stop button on the cell that's running your training.
* **You should only train using a P100 or a V100** (step 1 in the notebook tells you which GPU you have).
* **There's no set amount of training your model needs to get the results you want**, but at least 60 epochs is ideal (more is likely better).
* **Watch your checkpoints folder as you train.** You can track the quality of your training by viewing the sample images that are automatically generated during training. These are saved in checkpoints/[name]/web/images, where [name] is the training name you set in step 5a. If it looks like your training is getting __*worse*__, stop your training.
* **When you start your training, stick around for the first 10-15 minutes.** Google Colab sometimes checks whether you're a robot around that time, so make sure you're there to confirm your humanity.
* **Don't close this tab!** You can do other things on your computer, and browse other tabs, but just don't close the tab!
* **Don't close your laptop!**
* **Don't let your computer fall asleep.** Go into your system settings to make sure your computer won't fall asleep.
* **On a free account, you'll get around 7-10 hours of continuous training.**
* **On a Pro account, you'll get around 18-24 hours of continuous training.**
* **For free accounts, if you train for around 40 hours in a single week, Google may "shadowban" you for a bit**, meaning you might not be able to connect to a GPU until after waiting a few hours (or sometimes an entire day). If you're running into these issues, I recommend Google Colab Pro (it's only $9.99 a month, and totally worth it).
* **You can't do anything else in this notebook while training.** If you want to generate images while training, I recommend opening a second Colab notebook in another Google account. Note: Google might be on to you if it finds you're using, like, 5 Colab notebooks simultaneously, so proceed at your own risk. Just make sure you don't mount the same Drive folder in step 2 from multiple Colab notebooks.

### 5a. Setting your training variables (required for both training from scratch AND resuming training)

Set the following variables, whether you're training from scratch or resuming training. After changing the variables, click the play button in this cell to save your values.

In [13]:
##### REQUIRED: edit these each time you train a new model from scratch!
name = 'training_name'    # can be whatever you want; name this based on your dataset (e.g. edges2cats)
loadSize = 1280     # The desired width of your outputs (note: images will be cropped to this) Default=1024 
fineSize = 720      # The desired height of your outputs
                    ## I haven't trained a model with images larger than 1280x720, though it may be possible!
which_epoch = 'latest' # The epoch you wish to resume training from. Keep this set to 'latest' if you want to
                        ## pick up from where you left off. Otherwise, put the number of the .pth file you want
                        ## to resume from (e.g. 10, 20, 30, etc.)

##### OPTIONAL: change these if needed
resize_or_crop = 'scale_width'  # keeping this unchanged will automatically resize
                                  ## your images to the loadSize and fineSize, then crop to
                                  ## those dimensions. Set to 'none' if your images are
                                  ## already the correct dimensions

display_freq = 200    # frequency of showing training results on screen
print_freq = 100      # frequency of showing training results on console
save_latest_freq = 1000     # frequency of saving the latest results
                              ##(lower = more frequent saving, 1000 ~ saves every 10 minutes)
save_epoch_freq = 10    # frequency of saving checkpoints at the end of epochs
                        # (1 epoch is completed after going through every image in your data set 1 time)
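To pick a value for `which_epoch`, it helps to see which checkpoints actually exist. This sketch assumes the fork keeps upstream pix2pixHD's file naming, where each save writes files like `10_net_G.pth` and `latest_net_G.pth` into `checkpoints/<name>/`; with `save_epoch_freq = 10`, numbered checkpoints should appear at epochs 10, 20, 30, and so on. `available_epochs` is a hypothetical helper.

```python
from pathlib import Path

def available_epochs(checkpoints_dir: str, name: str) -> list[str]:
    """Epoch labels that have a saved generator, numeric ones first, then 'latest'."""
    run_dir = Path(checkpoints_dir) / name
    if not run_dir.is_dir():
        return []
    suffix = "_net_G.pth"  # assumed upstream naming: <epoch>_net_G.pth
    labels = [p.name[: -len(suffix)] for p in run_dir.glob(f"*{suffix}")]
    numbered = sorted((label for label in labels if label.isdigit()), key=int)
    return numbered + (["latest"] if "latest" in labels else [])
```

For example, `print(available_epochs("checkpoints", name))` lists the values you can use for `which_epoch`.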

### 5b. Running the training command

#### **Option 1: Training a new model from scratch**

Unlike StyleGAN, where you typically use transfer learning by resuming training from a pretrained model like FFHQ, with pix2pixHD it's much more common to train a model from scratch.

In [None]:
# ONLY RUN THIS IF YOU ARE STARTING A NEW TRAINING
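!python train.py --name=$name --dataroot=./datasets/$dataset_name --checkpoints_dir checkpoints --no_instance --label_nc 0 --resize_or_crop=$resize_or_crop --loadSize=$loadSize --fineSize=$fineSize --display_freq=$display_freq --print_freq=$print_freq --save_latest_freq=$save_latest_freq --save_epoch_freq=$save_epoch_freq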

#### **Option 2: Resuming a training**

Run this command if you are resuming training. You should still set your variables above before running this command.

In [None]:
# ONLY RUN THIS IF YOU ARE RESUMING A TRAINING
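!python train.py --name=$name --dataroot=./datasets/$dataset_name --checkpoints_dir checkpoints --no_instance --label_nc 0 --resize_or_crop=$resize_or_crop --continue_train --which_epoch=$which_epoch --loadSize=$loadSize --fineSize=$fineSize --display_freq=$display_freq --print_freq=$print_freq --save_latest_freq=$save_latest_freq --save_epoch_freq=$save_epoch_freq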
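Options 1 and 2 differ only in `--continue_train --which_epoch=...`. The sketch below makes that explicit by building either command as an argument list. `train_args` is a hypothetical helper that assumes the fork keeps upstream pix2pixHD's `train.py` flags, including `--display_freq`, `--print_freq`, `--save_latest_freq`, and `--save_epoch_freq` for passing through the frequency settings from 5a.

```python
import shlex

def train_args(name, dataset_name, load_size, fine_size,
               resize_or_crop="scale_width", resume=False, which_epoch="latest",
               display_freq=200, print_freq=100, save_latest_freq=1000, save_epoch_freq=10):
    """Build the train.py command from step 5b as a list of arguments (hypothetical helper)."""
    args = [
        "python", "train.py",
        f"--name={name}",
        f"--dataroot=./datasets/{dataset_name}",
        "--checkpoints_dir", "checkpoints",
        "--no_instance", "--label_nc", "0",
        f"--resize_or_crop={resize_or_crop}",
    ]
    if resume:
        # Option 2: pick up from a saved checkpoint instead of starting fresh
        args += ["--continue_train", f"--which_epoch={which_epoch}"]
    args += [
        f"--loadSize={load_size}", f"--fineSize={fine_size}",
        # assumed upstream pix2pixHD flags for the 5a frequency settings
        f"--display_freq={display_freq}", f"--print_freq={print_freq}",
        f"--save_latest_freq={save_latest_freq}", f"--save_epoch_freq={save_epoch_freq}",
    ]
    return args

print(shlex.join(train_args("edges2cats", "cats", 1280, 720, resume=True)))
```

Printing the joined command is a quick way to double-check it; from the repo folder you could then run it with `subprocess.run(args, check=True)`.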

## 6. Generating Images

In order to generate images with pix2pixHD, you need to feed the model "test" images. Your test images should look like your input (train_A) images. For example, if your train_A images are Canny edge images, then your test images should also be Canny edge images.

pix2pixHD takes the images from your test_A folder and feeds them into your trained model. Any time you want to test new input images with your model, you'll need to replace the images in the test_A folder with your new images.

Inevitably, pix2pixHD will have you working with a large number of images spread across many different folders, and your file organization can get out of hand if you don't plan ahead a bit.

The following code cells provide tools to prepare your test images. None are required, so read the description before each cell to see if that's something you want to do.

### 6a. Preparing your test images

#### **Set your experiment name**

The experiment_name variable you set here will be used throughout the rest of the notebook. Any time you work with a new experiment, change the name here. **_Any instance of $experiment_name you encounter will refer to whatever you set in this code cell._**

In [19]:
experiment_name = 'my_experiment_name' # change 'my_experiment_name' to whatever you want to call your experiment

#### **Set your Dataset Name**

(Note: if you're doing this step in the same session as your training, your dataset_name will have already been set. Feel free to set it again anytime you're generating images using a different dataset/trained model).

The dataset_name variable you set here will be used throughout the rest of the notebook. Any time you work with a new dataset, change the name here. **_Any instance of $dataset_name you encounter will refer to whatever you set in this code cell._**

In [None]:
dataset_name = 'my_dataset' # change 'my_dataset' to whatever you want to call your dataset

#### **Option 1:** Generating images from your original training data (the images from your train_A folder)

1. This cell removes any images from your test_A folder (if there are any), and...
2. Copies all the images from your train_A folder to your test_A folder.

Testing your trained model with the original dataset can be useful to see how accurately the model can recreate the training data.

In [None]:
# remove any images currently in test_A
!rm -v ./datasets/$dataset_name/test_A/*.png

# copy images from train_A to test_A
!cp -v ./datasets/$dataset_name/train_A/*.png ./datasets/$dataset_name/test_A
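Option 1 and Option 2 both end the same way: empty test_A, then fill it with a fresh set of PNGs. Here is a minimal `shutil` version of that step; `refill_test_a` is a hypothetical helper. Pass train_A as the source for Option 1, or your experiment's canny_edges folder for Option 2. Unlike `rm` on an empty folder, it doesn't print an error when test_A is already empty.

```python
import shutil
from pathlib import Path

def refill_test_a(source_dir: str, test_a_dir: str) -> int:
    """Delete the PNGs in test_A, then copy every PNG from source_dir into it."""
    test_a = Path(test_a_dir)
    test_a.mkdir(parents=True, exist_ok=True)
    for old in test_a.glob("*.png"):
        old.unlink()  # only PNGs are removed, matching `rm test_A/*.png`
    copied = 0
    for src in sorted(Path(source_dir).glob("*.png")):
        shutil.copy2(src, test_a / src.name)  # keep the original filenames (and their order)
        copied += 1
    return copied
```

For Option 1 this would be `refill_test_a(f"./datasets/{dataset_name}/train_A", f"./datasets/{dataset_name}/test_A")`.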

#### **Option 2: Testing with Canny Edge images from a new input source.**


This is where it starts to get important to stay organized. This command creates a folder inside of "input_test_images" (which lives in your datasets folder). **Do this for each new experiment you create.**

In [None]:
# create a folder + relevant subfolders for your new experiment
# (-p also creates any missing parent folders and skips ones that already exist)
!mkdir -p ./datasets/input_test_images/$experiment_name/extracted_frames
!mkdir -p ./datasets/input_test_images/$experiment_name/canny_edges

#### **Upload your source video**
After running the cell above, upload your video to the [experiment_name] folder it created inside input_test_images.

Make sure the video file is called **input.mp4**.

#### **Extract frames from source video**

In [None]:
# change scale to your desired resolution to resize your images. (1280:-1 scales
  ## images to 1280 for the width; the height is scaled to maintain the aspect ratio)
  ## change the width to whatever makes sense for your images (I haven't tested
  ## anything larger than 1280, but you might be able to go up to 1440 or 1600)
# change the fps (number of frames per second to extract);
  ## this time, you probably want all the frames from the video, so set
  ## the fps to the fps of your input video.
# change output%5d.png to include a reference to your experiment_name

!ffmpeg \
 -i ./datasets/input_test_images/$experiment_name/input.mp4 \
 -vf scale=1280:-1,fps=30 \
 ./datasets/input_test_images/$experiment_name/extracted_frames/output%5d.png

#### **Apply Canny Edge Detection - create test_A images**
If your test images are already prepared and don't need to be converted to Canny edges, skip this step. Otherwise, this step uses [Canny Edge Detection](https://en.wikipedia.org/wiki/Canny_edge_detector) to find the edges in your extracted frames, and saves the resulting edge images to the canny_edges folder.


In [None]:
# change blur amount if there are too many lines in your resulting Canny Edge
# images (odd numbers only; start with 5, then try 3 if 5 doesn't give you enough lines)

# change max_size to be the max dimension of your input images (e.g., if your
# input images are 1280x720, set max_size to '1280')

!python util/dataset-tools.py \
--input_folder ./datasets/input_test_images/$experiment_name/extracted_frames/ \
--output_folder ./datasets/input_test_images/$experiment_name/canny_edges \
--process_type canny \
--blur_type gaussian \
--blur_amount 3 \
--max_size 1280 \
--verbose

#### **Put your test images in the correct folders**
Run this cell to remove any images currently in test_A, then copy your new Canny Edge Images to the test_A folder.

In [None]:
# remove any images currently in test_A (change dataset_name to your dataset_name)
!rm -v ./datasets/$dataset_name/test_A/*.png

# copy your canny edge images into test_A (change experiment_name and dataset_name)
!cp -v ./datasets/input_test_images/$experiment_name/canny_edges/*.png ./datasets/$dataset_name/test_A

### 6b. Running the generate command

Change your variables as needed below, then run the cell to save the changes.

In [17]:
##### REQUIRED: edit these each time you generate images
loadSize = 1280 # set this to the width of your input images
fineSize = 720  # set this to the height of your input images
how_many = 400  # The number of images you wish to generate. Make sure this is
                ## greater than or equal to the number of images you're trying to generate

##### OPTIONAL: change these if needed
which_epoch = 'latest' # The epoch you wish to generate images from (e.g. '20') (Default: latest. I recommend this)
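`how_many` only needs to be at least the number of images in test_A, since upstream pix2pixHD's `test.py` stops after `how_many` images. A hypothetical helper that counts the PNGs in test_A and raises `how_many` if it's too small to cover them all:

```python
from pathlib import Path

def count_test_images(dataset_name: str, datasets_root: str = "./datasets") -> int:
    """Number of PNGs currently waiting in <datasets_root>/<dataset_name>/test_A."""
    return sum(1 for _ in (Path(datasets_root) / dataset_name / "test_A").glob("*.png"))

def safe_how_many(dataset_name: str, how_many: int, datasets_root: str = "./datasets") -> int:
    """Raise how_many to the test_A image count if it is too small to cover every image."""
    return max(how_many, count_test_images(dataset_name, datasets_root))
```

In the notebook this would be `how_many = safe_how_many(dataset_name, how_many)`, run after the variables cell.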


In [None]:
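# Run this cell to generate your images. This may take a few minutes.
# Note: --name must match the training name your checkpoints were saved under
# (the `name` variable from step 5a). This command uses $dataset_name, so it
# assumes you named your training after your dataset.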
!python test.py --name=$dataset_name --dataroot=./datasets/$dataset_name --checkpoints_dir checkpoints --results_dir=./results/$experiment_name --which_epoch=$which_epoch --how_many=$how_many --no_instance --label_nc 0 --loadSize=$loadSize --fineSize=$fineSize

## 7. Creating videos from your generated images

### 7a. Create video of your test input images
_Thread your test_A images together into a video. If your input images used Canny Edge, then this will be a video of your Canny Edge inputs._

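1. **-i**: the input images _(the file path to your input label images)_
1. **-framerate**: the playback frame rate _(any value between 1-60; use the fps you extracted your test frames at to keep the original speed)_. It must come before **-i**; an **-r** placed after **-i** only converts from ffmpeg's default image-sequence rate of 25 fps by duplicating or dropping frames, so it doesn't change playback speed.
1. **-crf**: the compression quality of the output video _(lower means higher quality and larger files; 17-25 is a good range)_
1. **_the output filename_**: the last argument; make sure to set this each time you create a new video

In [None]:
# Change 'input_images_sequence.mp4' to something more descriptive (e.g. edges2cats_input_edges.mp4).
# MAKE SURE TO CHANGE THIS EACH TIME TO AVOID OVERWRITING OTHER GENERATED VIDEOS
# ('test_latest' assumes which_epoch = 'latest'; for another epoch the folder is test_<epoch>, e.g. test_20)
!ffmpeg \
   -framerate 30 \
   -pattern_type glob \
   -i "./results/$experiment_name/$dataset_name/test_latest/images/*_input_label.png" \
   -vcodec libx264 \
   -crf 23 \
   -pix_fmt yuv420p \
   ./generated_videos/input_images_sequence.mp4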

### 7b. Create video of your synthesized images
_Thread your synthesized images together into a video._

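1. **-i**: the input images _(the file path to your synthesized images)_
1. **-framerate**: the playback frame rate _(any value between 1-60; use the fps you extracted your test frames at to keep the original speed)_. It must come before **-i**; an **-r** placed after **-i** only converts from ffmpeg's default image-sequence rate of 25 fps by duplicating or dropping frames, so it doesn't change playback speed.
1. **-crf**: the compression quality of the output video _(lower means higher quality and larger files; 17-25 is a good range)_
1. **_the output filename_**: the last argument; make sure to set this each time you create a new video

In [None]:
# Change 'synthesized_output' to something more related to this experiment (e.g. edges2cats_synthesized.mp4).
# MAKE SURE TO CHANGE THIS EACH TIME TO AVOID OVERWRITING OTHER GENERATED VIDEOS
# ('test_latest' assumes which_epoch = 'latest'; for another epoch the folder is test_<epoch>, e.g. test_20)
!ffmpeg \
   -framerate 30 \
   -pattern_type glob \
   -i "./results/$experiment_name/$dataset_name/test_latest/images/*_synthesized_image.png" \
   -vcodec libx264 \
   -crf 23 \
   -pix_fmt yuv420p \
   ./generated_videos/synthesized_output.mp4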
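A video's length is frame count ÷ frame rate, so 900 generated frames played at 30 fps last 900 / 30 = 30 seconds. This sketch counts the frames that match the same pattern the ffmpeg commands use and reports the resulting duration, a quick way to confirm every frame was generated before you encode. `video_plan` is a hypothetical helper; point it at the same results images folder used in the cells above, e.g. `video_plan(f"./results/{experiment_name}/{dataset_name}/test_latest/images")`.

```python
from pathlib import Path

def video_plan(images_dir: str, suffix: str = "_synthesized_image.png", framerate: float = 30):
    """Return (frame_count, duration_in_seconds) for the frames whose names end in suffix."""
    frames = sorted(Path(images_dir).glob(f"*{suffix}"))
    return len(frames), len(frames) / framerate
```

Use `suffix="_input_label.png"` to check the input sequence from 7a instead.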