# Pix2PixHD

**Last updated: March 31, 2023**

This notebook was created by Doug Rosman and uses code from [Doug's forked pix2pixHD repository](https://github.com/dougrosman/pix2pixHD). For a video tutorial showing how to use this notebook, see [https://dougrosman.github.io/cvml-sp21/resources/pix2pixHD/](https://dougrosman.github.io/cvml-sp21/resources/pix2pixHD/). (Note: the video was created for a prior version of this notebook. Most of the functionality in the video matches this notebook, but there might be some differences in how files and folders are named and used.)

## Running this notebook locally

This notebook can be run locally if you have an Nvidia GPU and are using Linux. It works basically the same whether you're running it locally or in Google Colab, though some of the setup steps differ (locally, things are a bit easier) and the first-time setup takes a few extra steps. **These setup instructions assume you have your Nvidia and CUDA drivers installed, as well as Anaconda and VS Code with the necessary Python and Jupyter Notebook extensions. Setting up Ubuntu to run this code locally requires extra setup that, unfortunately, is beyond the scope of this notebook.**

### First-time setup for local use
1. If you haven't already, create a folder somewhere on your computer and put this notebook in it. Call the notebook whatever you want, maybe something like "pix2pixHD"
2. In VS Code, go to File-->Open Folder and open the folder where you placed this notebook
3. Run any of the code cells below (Step 1 is fine)
4. Select **Python Environments** for your kernel
5. Select **+ Create Python Environment**
6. Select **Conda**
7. Select **Python 3.9**

This Python environment will be saved in this folder, so you won't have to do these setup steps again.

# Part 1: Setup

## 1a. Connect to a GPU Instance
Google Colab: Required if training or generating images

Local: Not required (but can be a useful way to initiate your Python environment for the first time you run this notebook.)

**In Google Colab, only run this step when you're ready to start training or generating images. Executing this cell will connect you to a GPU and officially start the clock on your instance, so you don't want to waste limited GPU time on tasks that don't require a GPU (like processing images, setting variable names, etc.)**


In [None]:
!nvidia-smi -L

## 1b. Mount your Google Drive

Google Colab: Required

Local: do not run

In [None]:
from google.colab import drive
drive.mount('/content/drive')

## 1c. Cloning and/or changing directory into the pix2pixHD repo

Google Colab: Required

Local: Required

**Executing this cell clones the pix2pixHD repo (into your Google Drive on Colab, or onto your computer if you're running locally), or simply changes directory into the repo if it already exists.**


In [None]:
import os

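# Colab, repeat run: the repo already exists in your Google Drive
if os.path.isdir("/content/drive/MyDrive/colab-pix2pixHD"):
    %cd "/content/drive/MyDrive/colab-pix2pixHD/pix2pixHD"
# Local, repeat run: the repo already exists next to this notebook
elif os.path.isdir("pix2pixHD"):
    %cd "pix2pixHD"
# Colab, first run: Google Drive is mounted, but the repo hasn't been cloned yet
elif os.path.isdir("/content/drive/"):
    %cd "/content/drive/MyDrive/"
    !mkdir colab-pix2pixHD
    %cd colab-pix2pixHD
    !git clone https://github.com/dougrosman/pix2pixHD
    %cd pix2pixHD
# Local, first run: clone the repo next to this notebook
else:
    !git clone https://github.com/dougrosman/pix2pixHD
    %cd pix2pixHD
    !mkdir generated_videos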

## 1d. Installing dependencies

Google Colab: Required

Local: Required only once, the first time you ever run this notebook.

In [None]:
!pip install dominate

# install the Python requirements for Derrick Schultz's dataset-tools.py
# more info: https://github.com/dvschultz/dataset-tools
!pip install -r util/requirements.txt

# Part 2: Data Set Preparation

This notebook includes the following commands to help you create your data set:

1. Create the required folders for organizing your training data
1. Extract frames from a video file using FFMPEG
1. Unzip a folder full of images you uploaded to Google Drive
1. Create Canny edge versions of your images for your input (train_A) images

#### **Notes on preparing your dataset**

1. **Identical image quantities:** The number of input images (train_A) should be identical to the number of output images (train_B) (e.g. 500 of each). Otherwise, your training data will be mismatched, and won't train well.
1. **Same order:** The images in your train_A folder should be in the same order as the images in your train_B folder. They are fed into pix2pixHD in alphabetical/numerical order. Your images should automatically be in the same order, but keep this in mind just in case you notice mismatches in your results.
1. **Consistent image dimensions:** All your input images should have the same dimensions, and all your output images should have the same dimensions.
1. **Image dimensions that are divisible by 16:** pix2pixHD is flexible with your image dimensions, but stick with a width and height where both values are divisible by 16 (e.g. 1024 x 512, 1280 x 720, 512 x 512)
1. **Consistent input/output dimensions:** Technically, pix2pixHD can train datasets where the input dimensions are different from the output (like if you were training an image upscaler), but for most situations, your input and output dimensions should match.
1. **Dataset size:** It totally depends. Depending on your goals, and the type of data you're working with, the number of images you need for a "successful" model will vary. Some general tips: I would aim for **_500 image pairs minimum, but more is better (shoot for the 1,000-3,000 image pair range)._** For example, if your dataset has 1,000 image pairs, that means it has 2 x 1,000 images: 1,000 input images (train_A) and 1,000 output images (train_B).
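
If you want to double-check the first two notes before training, here is a minimal sketch that compares the two folders. The helper names (`list_images`, `check_pairs`) are made up for this example and are not part of pix2pixHD, and it assumes that matching images share the same filename apart from the extension (e.g. `output00001.png` and `output00001.jpg`).

```python
from pathlib import Path

# Assumption: these are the image types in your dataset folders
IMAGE_EXTENSIONS = {".png", ".jpg", ".jpeg"}


def list_images(folder):
    """Return the image filenames in `folder`, sorted alphabetically."""
    return sorted(
        p.name
        for p in Path(folder).iterdir()
        if p.is_file() and p.suffix.lower() in IMAGE_EXTENSIONS
    )


def check_pairs(train_a, train_b):
    """Return a list of problems found; an empty list means none were found."""
    a_names = list_images(train_a)
    b_names = list_images(train_b)
    problems = []
    if len(a_names) != len(b_names):
        problems.append(
            f"count mismatch: {len(a_names)} in train_A, {len(b_names)} in train_B"
        )
    # Compare names (without extensions) position by position,
    # and report only the first pair that doesn't line up.
    for a, b in zip(a_names, b_names):
        if Path(a).stem != Path(b).stem:
            problems.append(f"order mismatch: {a} is paired with {b}")
            break
    return problems
```

You could call it with something like `check_pairs('./datasets/my_dataset/train_A', './datasets/my_dataset/train_B')` once both folders have images in them.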

## 2a. Create your folders

Required any time you're going to train a new model

Inside your datasets folder, you'll need a folder for each data set, and each data set folder has a few necessary folders. 

1. **train_A**, for your input images (e.g. canny edges)
1. **train_B**, for your output images
1. **test_A**, for the images you'll use to generate images after training your model
1. **raw**, for unprocessed images or videos that you intend to process for training

The following command creates these folders inside the **datasets** folder.


In [None]:
dataset_name = 'my_dataset' # change 'my_dataset' to something specific to your dataset

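# -p creates any missing parent folders and doesn't complain if a folder already exists
!mkdir -p ./datasets/$dataset_name
!mkdir -p ./datasets/$dataset_name/train_A
!mkdir -p ./datasets/$dataset_name/train_B
!mkdir -p ./datasets/$dataset_name/test_A
!mkdir -p ./datasets/$dataset_name/raw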
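
If you'd rather create these folders from Python than from the shell, here is a rough equivalent. The function name `create_dataset_folders` and its `root` parameter are made up for this example.

```python
import os


def create_dataset_folders(dataset_name, root="./datasets"):
    """Create train_A, train_B, test_A and raw inside root/dataset_name."""
    created = []
    for sub in ("train_A", "train_B", "test_A", "raw"):
        path = os.path.join(root, dataset_name, sub)
        # exist_ok=True means re-running this won't raise an error
        os.makedirs(path, exist_ok=True)
        created.append(path)
    return created
```

Calling `create_dataset_folders(dataset_name)` returns the list of folder paths, which can be handy for checking that everything ended up where you expect.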

### 2b. Getting the right images into the right folders

With pix2pixHD, we're training a model to translate one type of image into another. Your "input" images go in the train_A folder, and the "output" images into train_B. There are a number of ways to prepare a data set, like scraping images from the web or extracting frames from video. Depending on how you're sourcing your images, these steps will differ. The important things are:

1. Your train_A and train_B folders should each have the same number of images
2. The images in train_A and train_B should be in the same order, so that the correct images are matched together during training.
3. The width and height of each image should be consistent, and divisible by **16** (e.g. 1024 x 768, 1024 x 512, 1280 x 720, 512 x 512, etc.)
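
To check the third point for a size you have in mind, the arithmetic can be sketched like this. Both helper names (`is_divisible`, `nearest_multiple`) are made up for this example, and the sketch only does the math; it doesn't resize anything.

```python
def is_divisible(width, height, base=16):
    """True if both dimensions are exact multiples of `base`."""
    return width % base == 0 and height % base == 0


def nearest_multiple(value, base=16):
    """Round `value` to the nearest multiple of `base` (never below `base`)."""
    return max(base, int(value / base + 0.5) * base)


print(is_divisible(1280, 720))  # → True
print(nearest_multiple(854))    # → 848
```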



This section has a couple useful commands for processing your images.

Because pix2pixHD has multiple folders for different components of your data set, there are different ways your images can end up in the right folders. The following steps provide different tools for processing your images. Moving data around can be one of the most annoying parts of the machine learning process, so practice finding ways to make these steps more efficient. This is where it can be really useful to be comfortable with the command line.

#### Some example scenarios

##### **Your input and output images were processed locally on your computer** 
Name your input folder **train_A** and your output folder **train_B**. Zip up your folders individually (use Keka if you're on a Mac), and upload train_A.zip and train_B.zip to your Google Drive in the 'raw' folder for this dataset. Use the unzip and mv commands below to unzip your images and put them in the proper folders.

##### **You want to extract your input images from a video using FFMPEG**
Upload the video file to the 'raw' folder for this dataset, and run the FFMPEG command below to extract your frames.

### 2c. Unzip your input (train_A) folder
If you created your dataset outside of Google Drive and uploaded your train_A.zip and train_B.zip files to the 'raw' folder for this dataset, this command will **unzip your train_A folder**

In [None]:
#!unzip path_to_file.zip -d path_to_directory
!unzip ./datasets/$dataset_name/raw/train_A.zip -d ./datasets/$dataset_name/raw

### 2d. Unzip your output (train_B) folder
If you created your dataset outside of Google Drive and uploaded your train_A.zip and train_B.zip files to the 'raw' folder for this dataset, this command will **unzip your train_B folder**

In [None]:
#!unzip path_to_file.zip -d path_to_directory
!unzip ./datasets/$dataset_name/raw/train_B.zip -d ./datasets/$dataset_name/raw

### 2e. Extract frames from video – create output (train_B) images (if needed)
If you plan to create a dataset by extracting images from a video file, follow these steps.
First, upload the video file to the **raw** folder for this dataset (it lives in the dataset folder, next to your train_A, train_B and test_A folders).

Make sure the name of your video file matches the filename after `-i` in the command below (the example uses 'forest.mp4'), or edit the command to match your file.

In [None]:
# Change scale to your desired resolution to resize your images. (1280:-1 scales
# images to 1280 for the width, and the height is scaled to maintain the aspect ratio)
# change the width to whatever makes sense for your images (I haven't tested
# anything larger than 1280x720, but you may be able to go larger. If you experience
# an out of memory error, it might be because your images are too large.)
# The resolution you can train with depends on the GPU you get.

# change the fps (number of frames per second to extract);
  ## higher fps = more images to extract, 3-6 is a good range. You likely don't need every
  ## single frame from your video to make a good data set. You can also use decimals
  ## or fractions (e.g. 1/10 extracts one frame every 10 seconds)

width = 854 # (854x480) the height will be scaled proportionally to maintain the aspect ratio
fps = 1/10 # how many frames per second to extract from the video (1/10 = one frame every 10 seconds)


!ffmpeg \
 -i ./datasets/$dataset_name/raw/forest.mp4 \
 -q:v 2 \
 -vf scale=$width:-1,fps=$fps \
 ./datasets/$dataset_name/train_B/output%5d.jpg
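
To get a feel for how the fps value relates to the size of your data set, here is a quick back-of-the-envelope sketch. The function names are made up for this example, and the counts are estimates: the number of frames FFMPEG actually writes may differ by a frame or so.

```python
def expected_frame_count(duration_seconds, fps):
    """Estimate how many images you'd get from a video of this length."""
    return round(duration_seconds * fps)


def fps_for_target(target_images, duration_seconds):
    """Estimate the fps value needed to end up with about `target_images` images."""
    return target_images / duration_seconds


# A 10-minute (600 second) video at fps = 1/10:
print(expected_frame_count(600, 1/10))  # → 60

# The fps you'd need to get about 1,200 images from the same video:
print(fps_for_target(1200, 600))  # → 2.0
```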

### 2f. Resize your train_B images (if needed)
If you uploaded images in a zip folder that aren't the correct size, you can resize them here.


In [None]:
# change max_size to be the max dimension of your input images (e.g., if your
# input images are 1280x720, set max_size to '1280')

!python util/dataset-tools.py \
--input_folder ./datasets/$dataset_name/train_B/ \
--output_folder ./datasets/$dataset_name/train_B/resized \
--max_size 1280 \
--verbose

### 2g. Apply Canny Edge Detection - create train_A (input) images
This step uses [Canny Edge Detection](https://en.wikipedia.org/wiki/Canny_edge_detector) to find edges in your input images. The outlined images are rendered and stored in the train_A folder.

**Whether you uploaded your train_B images, or got your train_B images by extracting them from a video, you should run this step if you're creating canny edge images.**


In [None]:
# change blur amount if there are too many lines in your resulting Canny Edge
# images (odd numbers only; start with 5, then try 3 if 5 doesn't give you enough lines)

# change max_size to be the max dimension of your input images (e.g., if your
# input images are 1280x720, set max_size to '1280')

!python util/dataset-tools.py \
--input_folder ./datasets/$dataset_name/train_B/ \
--output_folder ./datasets/$dataset_name/train_A/ \
--process_type canny \
--blur_type gaussian \
--blur_amount 3 \
--max_size 854 \
--verbose

## 5. Training

### **Some notes on training:** 

* **To stop your training manually**, click the stop button on the cell that's running your training.
* **There's no set time for how much training your model needs to get the results you want**, but shoot for at least 60 epochs (more is likely better).
* **Watch your checkpoints folder as you train.** You can track the quality of your training by viewing the sample images that are automatically generated during training. These live in the checkpoints folder, in the folder for your current training. Inside of there, go into web-->images to see your sample images. If it looks like your training is getting __*worse*__, then stop your training.
* **When you start your training, stick around for the first 10-15 minutes.** Google Colab sometimes checks to see if you're a robot around that time, so make sure you're there to confirm your humanity.
* **Don't close this tab!** You can do other things on your computer, and browse other tabs, but just don't close the tab!
* **Don't close your laptop!**
* **Don't let your computer fall asleep.** Go into your system settings to make sure your computer won't fall asleep.
* **On a free account, you'll get around 7-10 hours of continuous training.**
* **On a pro account, you'll get around 18-24 hours of continuous training.**
* **For free accounts, if you train for around 30 hours in a single week, Google may "shadowban" you for a bit**, meaning you might not be able to connect to a GPU until after waiting a few hours (or sometimes an entire day). If you're running into these issues, I recommend Google Colab Pro (it's only $9.99 for the month, and totally worth it).
* **You can't do anything else in this notebook while training.** If you want to generate images while training, I recommend opening up a second Colab notebook in another Google account. Note, Google might be on to you if it finds you're using like, 5 Colab notebooks simultaneously. Proceed with this at your own risk. Just make sure you don't mount the same Drive folder (step 1b) from multiple Colab notebooks.

### 5a. Setting your training variables (required for both training from scratch AND resuming training)

Set the following variables, whether you're training from scratch or resuming training. After changing the variables, click the play button in this cell to save your values.

In [None]:
##### REQUIRED: edit these each time you train a new model from scratch!
dataset_name = 'edges2forest' # this name needs to match the name of your folder
name = 'edges2forest'    # can be whatever you want; name this based on your dataset (e.g. edges2cats)
loadSize = 854     # The desired width of your outputs (note: images will be cropped to this) Default=1024 
fineSize = 480      # The desired height of your outputs
                    ## I haven't trained a model with images larger than 1280x720, though it may be possible!
which_epoch = 'latest' # The epoch you wish to resume training from. Keep this set to 'latest' if you want to
                        ## pick up from where you left off. Otherwise, put the number of the .pth file you want
                        ## to resume from (e.g. 10, 20, 30, etc.)

##### OPTIONAL: change these if needed
resize_or_crop = 'none'  # 'none' leaves your images at their current size; use it when
                                  ## your images are already the correct dimensions.
                                  ## Other values (e.g. 'scale_width') tell pix2pixHD to
                                  ## resize and/or crop your images based on loadSize and fineSize.

display_freq = 200    # frequency of showing training results on screen
print_freq = 100      # frequency of showing training results on console
save_latest_freq = 500     # frequency of saving the latest results
                              ##(lower = more frequent saving, 1000 ~ saves every 10 minutes)
save_epoch_freq = 4    # frequency of saving checkpoints at the end of epochs
                        # (1 epoch is completed after going through every image in your data set 1 time)
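
To see what these frequency settings mean for a particular data set, here is a small arithmetic sketch. The function names are made up for this example. It assumes a checkpoint is saved whenever the epoch number is a multiple of save_epoch_freq, and that the save_latest_freq counter goes up by one for each image pair; treat the numbers as rough estimates.

```python
def epoch_checkpoints(total_epochs, save_epoch_freq):
    """Epoch numbers at which a numbered checkpoint would be saved."""
    return [e for e in range(1, total_epochs + 1) if e % save_epoch_freq == 0]


def latest_saves_per_epoch(num_image_pairs, save_latest_freq):
    """Roughly how many times 'latest' would be saved during one epoch."""
    return num_image_pairs // save_latest_freq


# 60 epochs with save_epoch_freq = 4:
print(len(epoch_checkpoints(60, 4)))  # → 15

# 1,000 image pairs with save_latest_freq = 500:
print(latest_saves_per_epoch(1000, 500))  # → 2
```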

### 5b. Running the training command

#### **Option 1: Training a new model from scratch**

With StyleGAN2, we can train our models more quickly using transfer learning, where we resume from a pretrained model. With pix2pix, it's much more common to train a model from scratch.

In [None]:
# ONLY RUN THIS IF YOU ARE STARTING A NEW TRAINING
!python train.py --name=$name --dataroot=./datasets/$dataset_name --checkpoints_dir checkpoints --no_instance --label_nc 0 --resize_or_crop=$resize_or_crop --loadSize=$loadSize --fineSize=$fineSize

#### **Option 2: Resuming a training**

Run this command if you are resuming training. You should still set your variables above before running this command.

In [None]:
# ONLY RUN THIS IF YOU ARE RESUMING A TRAINING
!python train.py --name=$name --dataroot=./datasets/$dataset_name --checkpoints_dir checkpoints --no_instance --label_nc 0 --resize_or_crop=$resize_or_crop --continue_train --which_epoch=$which_epoch --loadSize=$loadSize --fineSize=$fineSize
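
If you're not sure which epochs you can resume from, here is a minimal sketch that lists them. The function name `available_epochs` is made up for this example, and it assumes the saved generator files are named like `10_net_G.pth` and `latest_net_G.pth` inside `checkpoints/<name>`; check your own checkpoints folder to confirm the pattern.

```python
from pathlib import Path


def available_epochs(checkpoints_dir, name):
    """List the epoch labels that have a saved generator file."""
    folder = Path(checkpoints_dir) / name
    suffix = "_net_G.pth"  # assumed naming pattern, see the note above
    labels = [p.name[: -len(suffix)] for p in folder.glob("*" + suffix)]
    # Numbered epochs first (in numeric order), then labels like 'latest'
    numeric = sorted((label for label in labels if label.isdigit()), key=int)
    other = sorted(label for label in labels if not label.isdigit())
    return numeric + other
```

For example, `available_epochs('checkpoints', name)` returns a list of labels you could use for which_epoch.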

## 6. Generating Images

In order to generate images with pix2pixHD, you need to feed the model with "test" images. Your test images should look like your input (train_A) images. For example, if your train_A images used Canny Edge images, then your test images should also be Canny Edge images.

pix2pixHD takes the images from your test_A folder and feeds them into your trained model. Any time you want to test new input images with your model, you'll need to replace the images in the test_A folder with your new images.

Inevitably, pix2pixHD will have you working with a large amount of images spread across a number of different folders, and your file organization can get out of hand if you don't plan ahead a bit.

The following code cells provide tools to prepare your test images. None are required, so read the description before each cell to see if that's something you want to do.

### 6a. Preparing your test images

#### **Set your experiment name**

The experiment_name variable you set here will be used throughout the rest of the notebook. Any time you work with a new experiment, change the name here. **_Any instance of $experiment_name you encounter will refer to whatever you set in this code cell._**

In [None]:
experiment_name = 'edges2forest-doug_face' # change to whatever you want to call your experiment.

#### **Set your Dataset Name**

(Note: if you're doing this step in the same session as your training, your dataset_name will have already been set. Feel free to set it again anytime you're generating images using a different dataset/trained model).

The dataset_name variable you set here will be used throughout the rest of the notebook. Any time you work with a new dataset, change the name here. **_Any instance of $dataset_name you encounter will refer to whatever you set in this code cell._**

In [None]:
dataset_name = 'edges2forest' # change to the name of the dataset you're working with

#### **Option 1:** Generating images from your original training data (the images from your train_A folder)

1. This cell removes any images from your test_A folder (if there are any), and...
2. Copies all the images from your train_A folder to your test_A folder.

Testing your trained model with the original data set can be useful to see how accurately the model can recreate the training data.

In [None]:
# remove any images currently in test_A
!rm -v ./datasets/$dataset_name/test_A/*.png

# copy images from train_A to test_A
!cp -v ./datasets/$dataset_name/train_A/*.png ./datasets/$dataset_name/test_A
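
The same "clear test_A, then copy in new images" step can also be done from Python. This is only a sketch; the function name `refresh_test_folder` is made up for this example, and like the commands above it only touches .png files unless you change `pattern`.

```python
import shutil
from pathlib import Path


def refresh_test_folder(source_folder, test_folder, pattern="*.png"):
    """Remove matching images from test_folder, then copy them in from source_folder."""
    test_folder = Path(test_folder)
    test_folder.mkdir(parents=True, exist_ok=True)
    # Remove whatever matching images are currently in the test folder
    for old in test_folder.glob(pattern):
        old.unlink()
    # Copy the new images in, and count how many were copied
    copied = 0
    for src in sorted(Path(source_folder).glob(pattern)):
        shutil.copy2(src, test_folder / src.name)
        copied += 1
    return copied
```

The return value is the number of images copied, which is a quick way to confirm that the folder you pointed at actually had images in it.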

#### **Option 2: Testing with Canny Edge images from a new input source.**


This is where it starts to get important to stay organized. This command creates a folder for your experiment, plus a few subfolders, inside the **experiments** folder. **Do this for each new experiment you create.**

In [None]:
# create a folder + relevant subfolders for your new experiment
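# -p creates any missing parent folders and doesn't complain if a folder already exists
!mkdir -p ./experiments/$experiment_name
!mkdir -p ./experiments/$experiment_name/extracted_frames
!mkdir -p ./experiments/$experiment_name/canny_edges
!mkdir -p ./experiments/$experiment_name/raw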

#### **Upload your source video**
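After running the above cell, upload the video to the **raw** folder inside the [experiment_name] folder you created above.

Make sure the name of the video file matches the filename after `-i` in the FFMPEG command below (the example uses **doug-face.mp4**), or edit the command to match your file.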

#### **Extract frames from source video**

In [None]:
!pwd

In [None]:
# change scale to your desired resolution to resize your images. (1280:-1 scales
  ## images to 1280 for the width; the height is scaled to maintain the aspect ratio)
  ## change the width to whatever makes sense for your images (I haven't tested
  ## anything larger than 1280, but you might be able to go up to 1440 or 1600)
# change the fps (number of frames per second to extract);
  ## this time, you probably want all the frames from the video, so set
  ## the fps to the fps of your input video.
# change output%5d.png to include a reference to your experiment_name

!ffmpeg \
 -i ./experiments/$experiment_name/raw/doug-face.mp4 \
 -vf scale=854:-1,fps=30 \
 ./experiments/$experiment_name/extracted_frames/output%5d.png

#### **Apply Canny Edge Detection - create test_A images**
If your images are already prepared and don't need to be converted to Canny edges, you don't need to do this step. This step uses [Canny Edge Detection](https://en.wikipedia.org/wiki/Canny_edge_detector) to find edges in your input images. The outlined images are rendered and stored in the canny_edges folder.


In [None]:
# change blur amount if there are too many lines in your resulting Canny Edge
# images (odd numbers only; start with 5, then try 3 if 5 doesn't give you enough lines)

# change max_size to be the max dimension of your input images (e.g., if your
# input images are 1280x720, set max_size to '1280')

!python util/dataset-tools.py \
--input_folder ./experiments/$experiment_name/extracted_frames/ \
--output_folder ./experiments/$experiment_name/canny_edges \
--process_type canny \
--blur_type gaussian \
--blur_amount 1 \
--max_size 854 \
--verbose

#### **Put your test images in the correct folders**
Run this cell to remove any images currently in test_A, then move your new Canny Edge images to the test_A folder.

In [None]:
# remove any images currently in test_A (change dataset_name to your dataset_name)
!rm -v ./datasets/$dataset_name/test_A/*.png

# move your canny edge images into test_A (change experiment_name and dataset_name)
!mv -v ./experiments/$experiment_name/canny_edges/*.png ./datasets/$dataset_name/test_A

### 6b. Running the generate command

Change your variables as needed below, then run the cell to save the changes.

In [None]:
##### REQUIRED: edit these each time you generate images
loadSize = 854 # set this to the width of your input images
fineSize = 480  # set this to the height of your input images
how_many = 1400  # The number of images you wish to generate. Make sure this is
                ## greater than or equal to the number of images in your test_A folder

##### OPTIONAL: change these if needed
which_epoch = 'latest' # The epoch you wish to generate images from (e.g. '20') (Default: latest. I recommend this)
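
If you don't want to count your test images by hand, here is a small sketch for getting a number to use for how_many. The function name `count_test_images` and the list of extensions are assumptions made for this example.

```python
from pathlib import Path


def count_test_images(test_folder, extensions=(".png", ".jpg", ".jpeg")):
    """Count the image files sitting directly inside test_folder."""
    return sum(
        1
        for p in Path(test_folder).iterdir()
        if p.is_file() and p.suffix.lower() in extensions
    )
```

You could then set `how_many = count_test_images(f'./datasets/{dataset_name}/test_A')` instead of typing a number.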


In [None]:
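# Run this cell to generate your images. This may take a few minutes.
# Note: --name must match the name you trained with; this cell assumes it is the same as your dataset_name.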
!python test.py --name=$dataset_name --dataroot=./datasets/$dataset_name --checkpoints_dir checkpoints --results_dir=./results/$experiment_name --which_epoch=$which_epoch --how_many=$how_many --no_instance --label_nc 0 --loadSize=$loadSize --fineSize=$fineSize

## 7. Creating videos from your generated images

### 7a. Create video of your test input images
_Thread your test_A images together into a video. If your input images used Canny Edge, then this will be a video of your Canny Edge inputs._

1. **-framerate**: how many images per second are read from the image sequence _(keep this the same as -r)_
1. **-i**: the input images _(the filepath to your test input images)_
1. **-r**: the framerate of the output video _(any value between 1-60)_
1. **-crf**: the compression quality of the output video _(lower is better, 17-25 is a good range)_
1. **_the output filename_**: the last argument; make sure to set this each time you create a new video

In [None]:
# Change 'input_images_sequence.mp4' to something more descriptive (e.g. edges2cats_input_edges.mp4).
# MAKE SURE TO CHANGE THIS EACH TIME TO AVOID OVERWRITING OTHER GENERATED VIDEOS
# -framerate goes before -i so the images are read at the same rate as the output (-r);
# without it, ffmpeg reads image sequences at 25 fps by default.
!ffmpeg \
   -framerate 30 \
   -pattern_type glob \
   -i "./results/$experiment_name/$dataset_name/test_latest/images/*_input_label.png" \
   -r 30 \
   -vcodec libx264 \
   -crf 24 \
   -pix_fmt yuv420p \
   ./generated_videos/input_images_sequence.mp4

### 7b. Create video of your synthesized images
_Thread your synthesized images together into a video._

1. **-framerate**: how many images per second are read from the image sequence _(keep this the same as -r)_
1. **-i**: the input images _(the filepath to your synthesized images)_
1. **-r**: the framerate of the output video _(any value between 1-60)_
1. **-crf**: the compression quality of the output video _(lower is better, 17-25 is a good range)_
1. **_the output filename_**: the last argument; make sure to set this each time you create a new video

In [None]:
# Change 'doug2forest.mp4' to something more related to this experiment (e.g. edges2cats_synthesized.mp4).
# MAKE SURE TO CHANGE THIS EACH TIME TO AVOID OVERWRITING OTHER GENERATED VIDEOS
# -framerate goes before -i so the images are read at the same rate as the output (-r);
# without it, ffmpeg reads image sequences at 25 fps by default.
!ffmpeg \
   -framerate 30 \
   -pattern_type glob \
   -i "./results/$experiment_name/$dataset_name/test_latest/images/*_synthesized_image.png" \
   -r 30 \
   -vcodec libx264 \
   -crf 23 \
   -pix_fmt yuv420p \
   ./generated_videos/doug2forest.mp4
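
Since it's easy to forget to change the output filename, here is one possible way to avoid overwriting an earlier video. This is only a sketch; the function name `unique_video_path` and the `_01`, `_02` suffix scheme are made up for this example.

```python
from pathlib import Path


def unique_video_path(folder, base_name, extension=".mp4"):
    """Return a path in `folder` that doesn't exist yet, adding _01, _02, ... if needed."""
    folder = Path(folder)
    candidate = folder / f"{base_name}{extension}"
    counter = 1
    while candidate.exists():
        candidate = folder / f"{base_name}_{counter:02d}{extension}"
        counter += 1
    return candidate
```

For example, `output_path = str(unique_video_path('./generated_videos', experiment_name + '_synthesized'))` gives you a filename you could use as the last argument of the ffmpeg command (as `$output_path`).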