<a href="https://colab.research.google.com/github/bkkaggle/pytorch-CycleGAN-and-pix2pix/blob/master/CycleGAN.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

Take a look at the [repository](https://github.com/junyanz/pytorch-CycleGAN-and-pix2pix) for more information.

# Install

In [None]:
!git clone https://github.com/junyanz/pytorch-CycleGAN-and-pix2pix

In [None]:
import os
os.chdir('pytorch-CycleGAN-and-pix2pix/')

In [None]:
!pip install -r requirements.txt

# Datasets

Download one of the official datasets with:

-   `bash ./datasets/download_cyclegan_dataset.sh [apple2orange, summer2winter_yosemite, horse2zebra, monet2photo, cezanne2photo, ukiyoe2photo, vangogh2photo, maps, cityscapes, facades, iphone2dslr_flower, ae_photos]`

Or use your own dataset by creating the appropriate folders and adding in the images.

-   Create a folder for your dataset under `./datasets`.
-   Create subfolders `testA`, `testB`, `trainA`, and `trainB` in that folder. Put the images you want to transform from A to B (e.g. cat to dog) in `testA` and the images you want to transform from B to A (dog to cat) in `testB`, then fill `trainA` and `trainB` the same way with the training images.
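
The folder layout described above can be created in one step. This is a minimal sketch: `make_dataset_folders` is a hypothetical helper, not part of the repository, and `cat2dog` is only a placeholder dataset name.

```python
from pathlib import Path


def make_dataset_folders(datasets_root, name):
    """Create trainA, trainB, testA and testB under <datasets_root>/<name>."""
    dataset_dir = Path(datasets_root) / name
    for sub in ("trainA", "trainB", "testA", "testB"):
        # parents=True also creates the dataset folder itself if it is missing
        (dataset_dir / sub).mkdir(parents=True, exist_ok=True)
    return dataset_dir
```

Calling `make_dataset_folders("./datasets", "cat2dog")` from the repository root would create the four empty subfolders, ready for images to be copied in.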

In [None]:
# !bash ./datasets/download_cyclegan_dataset.sh horse2zebra

### Code to copy all HE files in the HE directory and all US files in the US directory to trainA and trainB for CycleGAN training (Registered tiles, using the composition Excel sheet):


In [None]:
import os
import shutil
from tqdm import tqdm

# Network share that holds the HE tiles (searched recursively).
source_directory = r'\\shelter\Kyu\unstain2stain\tiles\unregistered_tiles\HE'
# Local folder that receives the copies.
destination_directory = r'C:\Users\Kevin\PycharmProjects\pix2pix\pytorch-CycleGAN-and-pix2pix\datasets\trainfold_B'

for subdir, dirs, files in os.walk(source_directory):
    for file in tqdm(files, colour='red'):
        if file.endswith(".png"):
            file_path = os.path.join(subdir, file)
            # First path component below the HE folder; used as the WSI name.
            wsi_name = file_path.split("HE")[1].split("\\")[1]
            file_name, file_extension = os.path.splitext(file)
            # Append the WSI name so same-named tiles from different subfolders get distinct file names.
            destination_file_path = os.path.join(destination_directory, file_name + "_" + wsi_name + file_extension)
            shutil.copyfile(file_path, destination_file_path)

In [None]:
from tqdm import tqdm
import os
import shutil

# Network share that holds the US tiles (searched recursively).
source_directory = r'\\shelter\Kyu\unstain2stain\tiles\unregistered_tiles\US'
# Local folder that receives the copies.
destination_directory = r'C:\Users\Kevin\PycharmProjects\pix2pix\pytorch-CycleGAN-and-pix2pix\datasets\trainfold_A'

for subdir, dirs, files in os.walk(source_directory):
    for file in tqdm(files, colour='red'):
        if file.endswith(".png"):
            file_path = os.path.join(subdir, file)
            # First path component below the US folder; used as the WSI name.
            wsi_name = file_path.split("US")[1].split("\\")[1]
            file_name, file_extension = os.path.splitext(file)
            # Append the WSI name so same-named tiles from different subfolders get distinct file names.
            destination_file_path = os.path.join(destination_directory, file_name + "_" + wsi_name + file_extension)
            shutil.copyfile(file_path, destination_file_path)


### Code to copy all HE files in the HE directory and all US files in the US directory to trainA and trainB for CycleGAN training (Unregistered tiles):

In [None]:
import os
import shutil
from tqdm import tqdm

# Network share that holds the HE tiles (searched recursively).
source_directory = r'\\shelter\Kyu\unstain2stain\tiles\unregistered_tiles\HE'
# Local folder that receives the copies.
destination_directory = r'C:\Users\Kevin\PycharmProjects\pix2pix\pytorch-CycleGAN-and-pix2pix\datasets\trainfold_B'

for subdir, dirs, files in os.walk(source_directory):
    for file in tqdm(files, colour='red'):
        if file.endswith(".png"):
            file_path = os.path.join(subdir, file)
            # First path component below the HE folder; used as the WSI name.
            wsi_name = file_path.split("HE")[1].split("\\")[1]
            file_name, file_extension = os.path.splitext(file)
            # Append the WSI name so same-named tiles from different subfolders get distinct file names.
            destination_file_path = os.path.join(destination_directory, file_name + "_" + wsi_name + file_extension)
            shutil.copyfile(file_path, destination_file_path)

In [None]:
from tqdm import tqdm
import os
import shutil

# Network share that holds the US tiles (searched recursively).
source_directory = r'\\shelter\Kyu\unstain2stain\tiles\unregistered_tiles\US'
# Local folder that receives the copies.
destination_directory = r'C:\Users\Kevin\PycharmProjects\pix2pix\pytorch-CycleGAN-and-pix2pix\datasets\trainfold_A'

for subdir, dirs, files in os.walk(source_directory):
    for file in tqdm(files, colour='red'):
        if file.endswith(".png"):
            file_path = os.path.join(subdir, file)
            # First path component below the US folder; used as the WSI name.
            wsi_name = file_path.split("US")[1].split("\\")[1]
            file_name, file_extension = os.path.splitext(file)
            # Append the WSI name so same-named tiles from different subfolders get distinct file names.
            destination_file_path = os.path.join(destination_directory, file_name + "_" + wsi_name + file_extension)
            shutil.copyfile(file_path, destination_file_path)


# Pretrained models

Download one of the official pretrained models with:

-   `bash ./scripts/download_cyclegan_model.sh [apple2orange, orange2apple, summer2winter_yosemite, winter2summer_yosemite, horse2zebra, zebra2horse, monet2photo, style_monet, style_cezanne, style_ukiyoe, style_vangogh, sat2map, map2sat, cityscapes_photo2label, cityscapes_label2photo, facades_photo2label, facades_label2photo, iphone2dslr_flower]`

Or add your own pretrained model to `./checkpoints/{NAME}_pretrained/latest_net_G.pth`

In [None]:
# !bash ./scripts/download_cyclegan_model.sh horse2zebra

# Training

-   `python train.py --dataroot ./datasets/horse2zebra --name horse2zebra --model cycle_gan`

Change `--dataroot` and `--name` to your own dataset's path and model's name. Use `--gpu_ids 0,1,..` to train on multiple GPUs and `--batch_size` to change the batch size. I've found that a batch size of 16 fits onto 4 V100s and finishes an epoch in ~90s.

Once your model has trained, copy the last checkpoint to a file name that the test model detects automatically:

Use `cp ./checkpoints/horse2zebra/latest_net_G_A.pth ./checkpoints/horse2zebra/latest_net_G.pth` if you want to transform images from class A to class B and `cp ./checkpoints/horse2zebra/latest_net_G_B.pth ./checkpoints/horse2zebra/latest_net_G.pth` if you want to transform images from class B to class A.
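
On Windows, where `cp` may not be available, the same copy can be done from Python. This is a minimal sketch: `export_generator` is a hypothetical helper, not part of the repository, and it assumes the checkpoint file names given above.

```python
import shutil
from pathlib import Path


def export_generator(checkpoints_dir, name, direction="AtoB"):
    """Copy the direction-specific generator to latest_net_G.pth."""
    # G_A translates A -> B and G_B translates B -> A, as described above
    suffix = {"AtoB": "A", "BtoA": "B"}[direction]
    run_dir = Path(checkpoints_dir) / name
    src = run_dir / f"latest_net_G_{suffix}.pth"
    dst = run_dir / "latest_net_G.pth"
    shutil.copyfile(src, dst)
    return dst
```

For example, `export_generator("./checkpoints", "horse2zebra", "AtoB")` mirrors the first `cp` command. An unsupported direction raises a `KeyError` and a missing checkpoint raises a `FileNotFoundError`, so mistakes surface before testing starts.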


In [None]:
# !python train.py --dataroot ./datasets/horse2zebra --name horse2zebra --model cycle_gan --display_id -1

In [6]:
!python train.py --dataroot C:\Users\Kevin\PycharmProjects\pix2pix\pytorch-CycleGAN-and-pix2pix\datasets --name unstain2stain_cyclegan --model cycle_gan --direction AtoB --display_id -1 --display_freq 400 --display_ncols 4 --update_html_freq 1000 --print_freq 100 --checkpoints_dir C:\Users\Kevin\PycharmProjects\pix2pix\pytorch-CycleGAN-and-pix2pix\checkpoints --save_epoch_freq 5 --n_epochs 100 --preprocess scale_width_and_crop --load_size 1024 --crop_size 360 --num_threads 0 --batch_size 4

----------------- Options ---------------
               batch_size: 4                             	[default: 1]
                    beta1: 0.5                           
          checkpoints_dir: C:\Users\Kevin\PycharmProjects\pix2pix\pytorch-CycleGAN-and-pix2pix\checkpoints	[default: ./checkpoints]
           continue_train: False                         
                crop_size: 360                           	[default: 256]
                 dataroot: C:\Users\Kevin\PycharmProjects\pix2pix\pytorch-CycleGAN-and-pix2pix\datasets	[default: None]
             dataset_mode: unaligned                     
                direction: AtoB                          
              display_env: main                          
             display_freq: 400                           
               display_id: -1                            	[default: 1]
            display_ncols: 4                             
             display_port: 8097                          
           display_server: 

Traceback (most recent call last):
  File "C:\Users\Kevin\.conda\envs\wsi_analysis1\lib\site-packages\PIL\ImageFile.py", line 242, in load
    s = read(self.decodermaxblock)
  File "C:\Users\Kevin\.conda\envs\wsi_analysis1\lib\site-packages\PIL\PngImagePlugin.py", line 936, in load_read
    cid, pos, length = self.png.read()
  File "C:\Users\Kevin\.conda\envs\wsi_analysis1\lib\site-packages\PIL\PngImagePlugin.py", line 177, in read
    length = i32(s)
  File "C:\Users\Kevin\.conda\envs\wsi_analysis1\lib\site-packages\PIL\_binary.py", line 85, in i32be
    return unpack_from(">I", c, o)[0]
struct.error: unpack_from requires a buffer of at least 4 bytes for unpacking 4 bytes at offset 0 (actual buffer size is 0)

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "C:\Users\Kevin\PycharmProjects\pix2pix\pytorch-CycleGAN-and-pix2pix\train.py", line 44, in <module>
    for i, data in enumerate(dataset):  # inner loop within one ep
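
The first traceback above ends in a `struct.error` raised while Pillow reads a PNG chunk length from an empty buffer, which is consistent with a truncated tile (for example, an interrupted copy from the network share). The following is a minimal sketch for finding such files before training. It is a standard-library check of the PNG chunk structure, not Pillow's own validation, and `png_is_complete` and `find_broken_pngs` are hypothetical helper names.

```python
import struct
import zlib
from pathlib import Path

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def png_is_complete(path):
    """Return True if the file has a PNG signature and intact chunks up to IEND."""
    with open(path, "rb") as f:
        if f.read(8) != PNG_SIGNATURE:
            return False
        while True:
            header = f.read(8)  # 4-byte big-endian length + 4-byte chunk type
            if len(header) < 8:
                return False  # file ends before an IEND chunk was seen
            length, chunk_type = struct.unpack(">I4s", header)
            data = f.read(length)
            crc = f.read(4)
            if len(data) < length or len(crc) < 4:
                return False  # chunk body or checksum is cut off
            # The CRC covers the chunk type and the chunk data
            if zlib.crc32(chunk_type + data) != struct.unpack(">I", crc)[0]:
                return False
            if chunk_type == b"IEND":
                return True


def find_broken_pngs(folder):
    """List .png files under folder (recursively) that fail the check."""
    return [p for p in sorted(Path(folder).rglob("*.png")) if not png_is_complete(p)]
```

Running `find_broken_pngs` on the training folders would list any tiles that need to be copied again. A file that passes this check can still contain invalid image data, so it narrows the search rather than guaranteeing a clean dataset.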

# Testing

-   `python test.py --dataroot datasets/horse2zebra/testA --name horse2zebra_pretrained --model test --no_dropout`

Change the `--dataroot` and `--name` to be consistent with your trained model's configuration.

> from https://github.com/junyanz/pytorch-CycleGAN-and-pix2pix:
> The option --model test is used for generating results of CycleGAN only for one side. This option will automatically set --dataset_mode single, which only loads the images from one set. On the contrary, using --model cycle_gan requires loading and generating results in both directions, which is sometimes unnecessary. The results will be saved at ./results/. Use --results_dir {directory_path_to_save_result} to specify the results directory.

> For your own experiments, you might want to specify --netG, --norm, --no_dropout to match the generator architecture of the trained model.

### Code to copy the registered tiles to be inferred to a local directory for inference:

In [4]:
import os
import shutil
from tqdm import tqdm
registrated_ns_tiles_src = r'\\shelter\Kyu\unstain2stain\tiles\registrated_tiles\Unstained\OTS_14832_5'
dst_src = r'C:\Users\Kevin\PycharmProjects\pix2pix\pytorch-CycleGAN-and-pix2pix\datasets\test_cyclegan'
imlist = [x for x in os.listdir(registrated_ns_tiles_src) if x.endswith('.png')]
imlist1 = [os.path.join(registrated_ns_tiles_src,x) for x in imlist]

for filename in tqdm(imlist1, desc="Number of images processed", colour = 'red'):
    shutil.copy(filename,dst_src)
print("Images successfully copied!")

Number of images processed: 100%|██████████| 9890/9890 [42:58<00:00,  3.84it/s]

Images successfully copied!





In [None]:
# # test if this multiprocessing function makes copying faster. Original one takes ~40 minutes
# import os
# import shutil
# from tqdm import tqdm
# from pathos.multiprocessing import ProcessingPool as Pool
#
# registrated_ns_tiles_src = r'\\shelter\Kyu\unstain2stain\tiles\registrated_tiles\Unstained\OTS_14832_4'
# dst_src = r'C:\Users\Kevin\PycharmProjects\pix2pix\pytorch-CycleGAN-and-pix2pix\datasets\test_cyclegan'
# imlist = [x for x in os.listdir(registrated_ns_tiles_src) if x.endswith('.png')]
# imlist1 = [os.path.join(registrated_ns_tiles_src,x) for x in imlist]
#
# def copy_file(filename):
#     shutil.copy(filename,dst_src)
#
# if __name__ == '__main__':
#     with Pool() as pool:
#         pool.map(copy_file, imlist1)
#
# print("Images successfully copied!")
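
An alternative to the process pool in the commented-out cell is a thread pool from the standard library, since copying files is I/O-bound. This is a minimal sketch: `copy_pngs_parallel` is a hypothetical helper, `max_workers=8` is an arbitrary starting value, and whether it actually beats the ~40 minute sequential copy on the network share has not been measured here.

```python
import shutil
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def copy_pngs_parallel(src_dir, dst_dir, max_workers=8):
    """Copy every .png directly inside src_dir into dst_dir using threads."""
    dst = Path(dst_dir)
    dst.mkdir(parents=True, exist_ok=True)
    files = sorted(
        p for p in Path(src_dir).iterdir() if p.is_file() and p.suffix == ".png"
    )
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # list() waits for all copies and re-raises the first error, if any
        list(pool.map(lambda p: shutil.copy(p, dst / p.name), files))
    return len(files)
```

The function returns the number of files it copied, which can be compared with `len(os.listdir(dst_src))` in the same way as the check in the next cell.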

In [5]:
#check if all files copied:
print(len(imlist1) == len(os.listdir(dst_src)))
print(len(imlist1))

True
9890


In [6]:
print(len(os.listdir(dst_src)))

9890


In [7]:
import time
start = time.time()

In [8]:
# run inference:
!python test.py --dataroot C:\Users\Kevin\PycharmProjects\pix2pix\pytorch-CycleGAN-and-pix2pix\datasets\test_cyclegan --name unstain2stain_cyclegan --model test --no_dropout --direction AtoB --load_size 1024 --crop_size 1024 --num_test 9890

----------------- Options ---------------
             aspect_ratio: 1.0                           
               batch_size: 1                             
          checkpoints_dir: ./checkpoints                 
                crop_size: 1024                          	[default: 256]
                 dataroot: C:\Users\Kevin\PycharmProjects\pix2pix\pytorch-CycleGAN-and-pix2pix\datasets\test_cyclegan	[default: None]
             dataset_mode: single                        
                direction: AtoB                          
          display_winsize: 256                           
                    epoch: latest                        
                     eval: False                         
                  gpu_ids: 0                             
                init_gain: 0.02                          
                init_type: normal                        
                 input_nc: 3                             
                  isTrain: False                       

In [9]:
end = time.time()
infer_sec = round(end-start)
infer_min = round((end-start)/60)
print("time it took for inferring {} images is {} minutes, or {} seconds.".format(len(imlist),infer_min,infer_sec))
# images divided by seconds is a throughput (images per second), not seconds per image
infer_rate = round((len(imlist)/infer_sec),3)
print("inference throughput for images of 1024 x 1024 is: {} images per second".format(infer_rate))

time it took for inferring 9890 images is 152 minutes, or 9091 seconds.
inference throughput for images of 1024 x 1024 is: 1.088 images per second
