<a href="https://colab.research.google.com/github/TobiasSunderdiek/cartoon-gan/blob/master/CartoonGAN.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# CartoonGAN

This notebook implements the CartoonGAN model in PyTorch. See the README [here](https://github.com/TobiasSunderdiek/cartoon-gan/blob/master/README.md) for more details.

## Generate dataset

## Transfer data via google drive
- all image data used in this notebook is expected to be zipped into files on your local computer, as described in the README of this project [here](https://github.com/TobiasSunderdiek/cartoon-gan/blob/master/README.md)
- create the folder `cartoonGAN` in `My Drive` in Google Drive
- copy the .zip files `coco.zip`, `safebooru.zip` and `safebooru_smoothed.zip` to Google Drive `My Drive`/`cartoonGAN`
- mount Google Drive in this notebook by executing the cell below
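
Once the mount cell below has run, it can help to confirm that the three archives are where the notebook expects them before extracting anything. A minimal sketch, assuming the mount point `/content/data` used in this notebook; the helper name `missing_archives` is hypothetical:

```python
import os

# Archive names as listed in the steps above.
EXPECTED_ARCHIVES = ["coco.zip", "safebooru.zip", "safebooru_smoothed.zip"]

def missing_archives(folder, names=EXPECTED_ARCHIVES):
    """Return the archive names that are not present in `folder`."""
    return [name for name in names if not os.path.isfile(os.path.join(folder, name))]

# Assumed location after mounting Google Drive at /content/data.
drive_folder = "/content/data/My Drive/cartoonGAN"
missing = missing_archives(drive_folder)
if missing:
    print("Missing archives:", ", ".join(missing))
else:
    print("All archives found.")
```

If the list is not empty, the `unzip` cells further down would fail for the listed files.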

In [0]:
from google.colab import drive
drive.mount('/content/data')

### cartoon images

- cartoon images are located in the file `/content/data/My Drive/cartoonGAN/safebooru.zip` of this notebook
- extract the images and place them in the folder `cartoons` by executing the cell below

In [0]:
!mkdir -p cartoons/1
!unzip -n -q /content/data/My\ Drive/cartoonGAN/safebooru.zip -d cartoons/1/ # extract to a subfolder because ImageFolder expects one subdirectory per class

##### data-loader

As mentioned in the paper, the used image size is 256x256 pixel.

The generator applies a sigmoid to its last layer, which produces values in [0.0, 1.0]. As the ToTensor() method changes the range of the input image from RGB [0, 255] to [0.0, 1.0], we get the same range for all images.
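
For 8-bit images, the scaling done by ToTensor() is a division by 255. A minimal pure-Python sketch of the mapping and its inverse; the helper names `to_unit_range` and `to_uint8` are hypothetical and not part of torchvision:

```python
def to_unit_range(pixel):
    """Map an 8-bit channel value from [0, 255] to [0.0, 1.0]."""
    return pixel / 255.0

def to_uint8(value):
    """Map a value from [0.0, 1.0] back to an 8-bit channel value."""
    return int(round(value * 255.0))

print(to_unit_range(0), to_unit_range(128), to_unit_range(255))
# every 8-bit value survives the round trip
print(all(to_uint8(to_unit_range(p)) == p for p in range(256)))
```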

In [0]:
import torch
from torch.utils.data import DataLoader
from torchvision.datasets import ImageFolder
from torchvision import transforms
from torch.utils.data import random_split
import math

image_size = 256
batch_size = 16

transformer = transforms.Compose([
    transforms.CenterCrop(image_size),
    transforms.ToTensor() # ToTensor() changes the range of the values from [0, 255] to [0.0, 1.0]
])

cartoon_dataset = ImageFolder('cartoons/', transformer)
len_training_set = math.floor(len(cartoon_dataset) * 0.9)
len_valid_set = len(cartoon_dataset) - len_training_set

training_set, _ = random_split(cartoon_dataset, (len_training_set, len_valid_set))
cartoon_image_dataloader_train = DataLoader(training_set, batch_size, shuffle=True, num_workers=0)

#### show examples

In [0]:
import matplotlib.pyplot as plt
import numpy as np

def show_sample_image(dataloader):
  iterator = iter(dataloader)
  sample_batch, _ = next(iterator) # use the built-in next(); DataLoader iterators have no .next() method in current PyTorch
  first_sample_image_of_batch = sample_batch[0]
  print(first_sample_image_of_batch.size())
  print("Current range: {} to {}".format(first_sample_image_of_batch.min(), first_sample_image_of_batch.max()))
  plt.imshow(np.transpose(first_sample_image_of_batch.numpy(), (1, 2, 0)))

show_sample_image(cartoon_image_dataloader_train)

### edge-smoothed cartoon images

- edge-smoothed cartoon images are located in the file `/content/data/My Drive/cartoonGAN/safebooru_smoothed.zip` of this notebook
- extract the images and place them in the folder `cartoons_smoothed` by executing the cell below

In [0]:
!mkdir -p cartoons_smoothed/1
!unzip -n -q /content/data/My\ Drive/cartoonGAN/safebooru_smoothed.zip -d cartoons_smoothed/1/ # extract to a subfolder because ImageFolder expects one subdirectory per class

##### data-loader

same configuration as cartoon data loader above

In [0]:
smoothed_cartoon_dataset = ImageFolder('cartoons_smoothed/', transformer)
len_training_set = math.floor(len(smoothed_cartoon_dataset) * 0.9)
len_valid_set = len(smoothed_cartoon_dataset) - len_training_set
training_set, _ = random_split(smoothed_cartoon_dataset, (len_training_set, len_valid_set))
smoothed_cartoon_image_dataloader_train = DataLoader(training_set, batch_size, shuffle=True, num_workers=0)

#### show examples

In [0]:
show_sample_image(smoothed_cartoon_image_dataloader_train)

### photos

- photos are located in the file `/content/data/My Drive/cartoonGAN/coco.zip` of this notebook
- extract the images and place them in the folder `photos` by executing the cell below

In [0]:
!mkdir -p photos/1
!unzip -n -q /content/data/My\ Drive/cartoonGAN/coco.zip -d photos/1 # extract to a subfolder because ImageFolder expects one subdirectory per class

##### data-loader
same configuration as cartoon data loader above

In [0]:
photo_dataset = ImageFolder('photos/', transformer)
len_training_set = math.floor(len(photo_dataset) * 0.9)
len_valid_set = len(photo_dataset) - len_training_set
training_set, validation_set = random_split(photo_dataset, (len_training_set, len_valid_set))
photo_dataloader_train = DataLoader(training_set, batch_size, shuffle=True, num_workers=0)
photo_dataloader_valid = DataLoader(validation_set, batch_size, shuffle=True, num_workers=0)

#### show examples

In [0]:
show_sample_image(photo_dataloader_train)

## Setup tensorboard

Use TensorBoard to keep an eye on weights and losses.

In [0]:
!mkdir /content/data/My\ Drive/cartoonGAN/tensorboard/

In [0]:
%tensorflow_version 1.x

from torch.utils.tensorboard import SummaryWriter

tensorboard_logdir = '/content/data/My Drive/cartoonGAN/tensorboard'
writer = SummaryWriter(tensorboard_logdir)

## Define model

The information about the model structure is given in the paper.

For the `zero padding` of the convolutional layers, I use the following formula:

$$Height x Width_{output} = \frac{HeightxWidth_{input} - kernel size + 2 padding}{stride} + 1$$

e.g:

- `conv_1` layer of generator: $HxW$ should stay the same as input size, which is 256x256 and `stride = 1`

$$256 = \frac{256-7+2padding}{1}+1$$

$$padding = 3$$

If the result is a fraction, I round up:

- `conv_2` layer of generator: $\frac{H}{2} x \frac{W}{2}$ is output with `stride=2`

$$128 = \frac{256-3+2padding}{2}+1$$

$$padding= \frac{1}{2} \Rightarrow padding=1$$
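
The two padding calculations above can be reproduced with a few lines of Python. This is a sketch, assuming a dilation of 1; the helper names `conv_output_size` and `solve_padding` are hypothetical:

```python
import math

def conv_output_size(size_in, kernel_size, stride, padding):
    """Output height (or width) of a convolution; the division is floored."""
    return (size_in + 2 * padding - kernel_size) // stride + 1

def solve_padding(size_in, size_out, kernel_size, stride):
    """Solve the formula above for the padding and round a fraction up."""
    exact = ((size_out - 1) * stride - size_in + kernel_size) / 2
    return math.ceil(exact)

# conv_1 of the generator: 256 -> 256, kernel size 7, stride 1
p1 = solve_padding(256, 256, kernel_size=7, stride=1)
# conv_2 of the generator: 256 -> 128, kernel size 3, stride 2
p2 = solve_padding(256, 128, kernel_size=3, stride=2)

print(p1, conv_output_size(256, 7, 1, p1))
print(p2, conv_output_size(256, 3, 2, p2))
```

Rounding the padding of 1/2 up to 1 still gives an output of 128, because the division in the output formula is floored.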

### Generator

In the up-convolutional part of the paper, two layers (conv_6 and conv_8 in my implementation) have a stride of 0.5. As PyTorch's `Conv2d` does not allow a floating-point stride, I initially used a stride of 1 in both cases, and I also used a stride of 1 for the padding calculation. The learnings below describe why I changed this.

##### Learnings

After implementing the generator and getting some results out of the training, the generated images had a wide grey margin, like in this example image:

![failure image](https://github.com/TobiasSunderdiek/cartoon-gan/raw/master/assets/grey_margin.jpg)

I re-checked my implementation of the generator and stumbled across my interpretation of the stride for _conv_6_ and _conv_8_, which is _1/2_ in the paper.
Maybe I got the stride wrong, and it is not _0.5_ but a tuple of (1,2)?
In that case, my padding calculation was wrong as well. These are the only layers for which I calculated a very large padding of 33 and 65, which looks suspicious now.
Testing the tuple interpretation, I also ended up with very high values for the padding.

The next problem was that I used _Conv2d_ for up-sampling, although _Conv2d_ is meant for down-sampling. _ConvTranspose2d_ is meant for up-sampling, see [1]. I corrected my implementation accordingly.

By using _ConvTranspose2d_ with the values for _stride_ (1 or (1,2)) and _kernelsize_ and playing with _padding_, the resulting image keeps nearly the same dimension, shrinks or gets uneven dimensions. 

$$ HeightxWidth_{Output} = stride (HeightxWidth_{Input} - 1) + kernelsize - 2*padding$$

$$ HeightxWidth_{Output} = 1 * (64 - 1) + 3 - 2 * padding = 66 - 2 * padding \leq 66 \quad (padding \geq 0)$$

Uneven dimensions: I tested stride (1,2) with padding (3,2) and got 60x125.

But as mentioned in the paper, I need to scale from _H/4_ up to _H/2_, which is from 64 to 128, and then up to 256 in _conv_8_ and _conv_9_.

Therefore I decided to use _stride=2_ and _padding=1_ in _conv_6_, and _stride=2_ and _padding=1_ in _conv_8_. To add the last pixel, I add an _output_padding_ of 1.

_conv_6_: $2*(64-1)+3-(2*1)=127$, plus 1 _(output_padding)_ gives 128

_conv_8_: $2*(128-1)+3-(2*1)=255$, plus 1 _(output_padding)_ gives 256

My implementation differs from the paper in the mentioned layers.
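
The sizes discussed in these learnings can be checked with the transposed-convolution formula. A sketch, assuming a dilation of 1; the helper name `conv_transpose_output_size` is hypothetical:

```python
def conv_transpose_output_size(size_in, kernel_size, stride, padding, output_padding=0):
    """Output height (or width) of a transposed convolution (dilation 1)."""
    return (size_in - 1) * stride - 2 * padding + kernel_size + output_padding

# stride 1 cannot up-sample: the output stays close to the input size
print(conv_transpose_output_size(64, kernel_size=3, stride=1, padding=0))  # 66
print(conv_transpose_output_size(64, kernel_size=3, stride=1, padding=1))  # 64

# stride (1, 2) with padding (3, 2) gives uneven dimensions
height = conv_transpose_output_size(64, kernel_size=3, stride=1, padding=3)
width = conv_transpose_output_size(64, kernel_size=3, stride=2, padding=2)
print(height, width)  # 60 125

# conv_6 and conv_8 with stride 2, padding 1 and output_padding 1
print(conv_transpose_output_size(64, 3, stride=2, padding=1, output_padding=1))   # 128
print(conv_transpose_output_size(128, 3, stride=2, padding=1, output_padding=1))  # 256
```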

[1] https://medium.com/activating-robotic-minds/up-sampling-with-transposed-convolution-9ae4f2df52d0

[2] https://towardsdatascience.com/is-the-transposed-convolution-layer-and-convolution-layer-the-same-thing-8655b751c3a1

In [0]:
import torch.nn as nn
import torch.nn.functional as F
from torch import sigmoid

class ResidualBlock(nn.Module):
  def __init__(self):
    super(ResidualBlock, self).__init__()
    self.conv_1 = nn.Conv2d(in_channels=256, out_channels=256, kernel_size=3, stride=1, padding=1)
    self.conv_2 = nn.Conv2d(in_channels=256, out_channels=256, kernel_size=3, stride=1, padding=1)
    self.norm_1 = nn.BatchNorm2d(256)
    self.norm_2 = nn.BatchNorm2d(256)

  def forward(self, x):
    output = self.norm_2(self.conv_2(F.relu(self.norm_1(self.conv_1(x)))))
    return output + x # element-wise sum (ES) of block output and block input

class Generator(nn.Module):
    def __init__(self):
      super(Generator, self).__init__()
      self.conv_1 = nn.Conv2d(in_channels=3, out_channels=64, kernel_size=7, stride=1, padding=3)
      self.norm_1 = nn.BatchNorm2d(64)
      
      # down-convolution #
      self.conv_2 = nn.Conv2d(in_channels=64, out_channels=128, kernel_size=3, stride=2, padding=1)
      self.conv_3 = nn.Conv2d(in_channels=128, out_channels=128, kernel_size=3, stride=1, padding=1)
      self.norm_2 = nn.BatchNorm2d(128)
      
      self.conv_4 = nn.Conv2d(in_channels=128, out_channels=256, kernel_size=3, stride=2, padding=1)
      self.conv_5 = nn.Conv2d(in_channels=256, out_channels=256, kernel_size=3, stride=1, padding=1)
      self.norm_3 = nn.BatchNorm2d(256)
      
      # residual blocks #
      residualBlocks = []
      for l in range(8):
        residualBlocks.append(ResidualBlock())
      self.res = nn.Sequential(*residualBlocks)
      
      # up-convolution #
      self.conv_6 = nn.ConvTranspose2d(in_channels=256, out_channels=128, kernel_size=3, stride=2, padding=1, output_padding=1)
      self.conv_7 = nn.ConvTranspose2d(in_channels=128, out_channels=128, kernel_size=3, stride=1, padding=1)
      self.norm_4 = nn.BatchNorm2d(128)

      self.conv_8 = nn.ConvTranspose2d(in_channels=128, out_channels=64, kernel_size=3, stride=2, padding=1, output_padding=1)
      self.conv_9 = nn.ConvTranspose2d(in_channels=64, out_channels=64, kernel_size=3, stride=1, padding=1)
      self.norm_5 = nn.BatchNorm2d(64)
      
      self.conv_10 = nn.Conv2d(in_channels=64, out_channels=3, kernel_size=7, stride=1, padding=3)

    def forward(self, x):
      x = F.relu(self.norm_1(self.conv_1(x)))
      
      x = F.relu(self.norm_2(self.conv_3(self.conv_2(x))))
      x = F.relu(self.norm_3(self.conv_5(self.conv_4(x))))
      
      x = self.res(x)
      x = F.relu(self.norm_4(self.conv_7(self.conv_6(x))))
      x = F.relu(self.norm_5(self.conv_9(self.conv_8(x))))

      x = self.conv_10(x)

      x = sigmoid(x)

      return x

In [0]:
G = Generator()
print(G)

### Discriminator

In [0]:
class Discriminator(nn.Module):
  def __init__(self):
     super(Discriminator, self).__init__()
     self.conv_1 = nn.Conv2d(in_channels=3, out_channels=32, kernel_size=3, stride=1, padding=1)
      
     self.conv_2 = nn.Conv2d(in_channels=32, out_channels=64, kernel_size=3, stride=2, padding=1)
     self.conv_3 = nn.Conv2d(in_channels=64, out_channels=128, kernel_size=3, stride=1, padding=1)
     self.norm_1 = nn.BatchNorm2d(128)
      
     self.conv_4 = nn.Conv2d(in_channels=128, out_channels=128, kernel_size=3, stride=2, padding=1)
     self.conv_5 = nn.Conv2d(in_channels=128, out_channels=256, kernel_size=3, stride=1, padding=1)
     self.norm_2 = nn.BatchNorm2d(256)
    
     self.conv_6 = nn.Conv2d(in_channels=256, out_channels=256, kernel_size=3, stride=1, padding=1)
     self.norm_3 = nn.BatchNorm2d(256)
    
     self.conv_7 = nn.Conv2d(in_channels=256, out_channels=1, kernel_size=3, stride=1, padding=1)

  def forward(self, x):
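    # use the same negative slope of 0.2 for every leaky ReLU (the default would be 0.01)
    x = F.leaky_relu(self.conv_1(x), negative_slope=0.2)
    x = F.leaky_relu(self.norm_1(self.conv_3(F.leaky_relu(self.conv_2(x), negative_slope=0.2))), negative_slope=0.2)
    x = F.leaky_relu(self.norm_2(self.conv_5(F.leaky_relu(self.conv_4(x), negative_slope=0.2))), negative_slope=0.2)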
    x = F.leaky_relu(self.norm_3(self.conv_6(x)), negative_slope=0.2)
    x = self.conv_7(x)
    x = sigmoid(x)
    
    return x

In [0]:
D = Discriminator()
print(D)

### use device CPU or GPU

In [0]:
import torch

device = torch.device('cpu')

if torch.cuda.is_available():
  device = torch.device('cuda')
  print("Train on GPU.")
else:
  print("No cuda available")

G.to(device)
D.to(device)

## Loss function
$\mathcal{L}(G, D) = \mathcal{L}_{adv}(G, D) + \omega\mathcal{L}_{con}(G, D)$ with $\omega = 10$

This loss is used to train the discriminator and the generator. In the adversarial part, the discriminator tries to classify the generated images as fakes.

During generator training, the generator tries to minimize the cases in which the discriminator classifies the generated image as fake. The generator only affects the parts of the formula where $G()$ is involved, so it tries to minimize this part. Additionally, the loss is not calculated directly from the generator output, but from the discriminator output. Because the generator output is the input of the discriminator during generator training, the generator is part of the backpropagation chain: the loss from the discriminator output is backpropagated through the discriminator model and the generator model, all the way back to the photo input [1], [2].

[1] https://developers.google.com/machine-learning/gan/generator

[2] https://towardsdatascience.com/only-numpy-implementing-gan-general-adversarial-networks-and-adam-optimizer-using-numpy-with-2a7e4e032021
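
To illustrate how the gradient reaches the generator through the discriminator, here is a toy sketch with scalar stand-ins for both networks. The functions `generator` and `discriminator` and the weights `w_g` and `w_d` are hypothetical and far simpler than the models in this notebook; the point is only the chain rule:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def generator(p, w_g):
    # toy "generator": scales the input photo value
    return w_g * p

def discriminator(x, w_d):
    # toy "discriminator": probability that x is a real cartoon
    return sigmoid(w_d * x)

def generator_loss(p, w_g, w_d):
    # the part of the adversarial loss that the generator can influence
    return math.log(1.0 - discriminator(generator(p, w_g), w_d))

def generator_gradient(p, w_g, w_d):
    # chain rule: dL/dD * dD/dx * dx/dw_g
    x = generator(p, w_g)
    d = discriminator(x, w_d)
    dloss_dd = -1.0 / (1.0 - d)
    dd_dx = d * (1.0 - d) * w_d
    dx_dwg = p
    return dloss_dd * dd_dx * dx_dwg

p, w_g, w_d = 0.8, 0.5, 1.5
analytic = generator_gradient(p, w_g, w_d)

# compare with a central finite difference on the generator weight
eps = 1e-6
numeric = (generator_loss(p, w_g + eps, w_d) - generator_loss(p, w_g - eps, w_d)) / (2 * eps)
print(analytic, numeric)
```

The loss is evaluated on the discriminator output, but its derivative with respect to `w_g` is non-zero, so a generator update is possible while the discriminator weight stays fixed.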

### Adversarial loss

The adversarial loss $\mathcal{L}_{adv}(G, D)$ drives the generator to transform a photo into the comic style. Its value indicates whether the output looks like a cartoon image or not. The paper highlights that clear edges are a characteristic part of cartoon images. Although they are only a small detail of the image, they must be preserved to generate clear edges in the result. In the paper, this is solved by training not only with cartoon images but additionally with the same cartoon images with smoothed edges, so that the discriminator can distinguish between clear and smooth edges. To achieve this, the authors define the edge-promoting adversarial loss function:

$\mathcal{L}_{adv}(G, D) = \mathbb{E}_{c_i \sim S_{data}(c)}[\log D(c_i)]
+ \mathbb{E}_{e_j \sim S_{data}(e)}[\log(1 - D(e_j))]
+ \mathbb{E}_{p_k \sim S_{data}(p)}[\log(1 - D(G(p_k)))]$

- for the discriminator, this is the complete loss function, because the output of the discriminator plays no role in the content loss part of the loss function.

- for the initialization phase of the generator, this part of the formula is not used, as described in the paper.

- for the training phase of the generator, only the part of the formula that the generator can affect is used in the generator loss function: $\mathbb{E}_{p_k \sim S_{data}(p)}[\log(1 - D(G(p_k)))]$

### Content loss
The content loss $\omega\mathcal{L}_{con}(G, D)$ preserves the semantic content during the transformation. To calculate it, the paper uses the high-level feature maps of the VGG network, in particular the layer ($l$) `conv4_4`. The output of layer $l$ for the original photo is subtracted from the output of layer $l$ for the generated image. The result is regularized using the $\mathcal{L_1}$ sparse regularization ($||...||_1$):

$\mathcal{L}_{con}(G, D)= \mathbb{E}_{p_i \sim S_{data}(p)}[||VGG_l(G(p_i))-VGG_l(p_i)||_1]$

This part of the formula plays a role in the loss function for the generator, not for the discriminator, because only the generator is used within this formula.

More info about $\mathcal{L_1}$ regularization:

https://medium.com/mlreview/l1-norm-regularization-and-sparsity-explained-for-dummies-5b0e4be3938a

https://medium.com/@montjoile/l0-norm-l1-norm-l2-norm-l-infinity-norm-7a7d18a4f40c

### VGG-16
Load already downloaded vgg-16 weights from drive, or download and save to drive.

In [0]:
from torchvision import models

path_to_pretrained_vgg16 = '/content/data/My Drive/cartoonGAN/vgg16-397923af.pth'

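try:
  # load the cached state dict from drive
  pretrained = torch.load(path_to_pretrained_vgg16)
  vgg16 = models.vgg16(pretrained=False)
  vgg16.load_state_dict(pretrained)
except FileNotFoundError:
  # download the weights and cache the state dict, so that the branch above can load it next time
  vgg16 = models.vgg16(pretrained=True)
  torch.save(vgg16.state_dict(), path_to_pretrained_vgg16)

# move to the device in both cases
vgg16 = vgg16.to(device)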

print(vgg16)

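# VGG16 has no layer named conv4_4 (block 4 of VGG16 has only three convolutions);
# as VGG16 has 5 pooling layers, I assume the 4th pooling layer is the closest equivalent:
# (23): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
feature_extractor = vgg16.features[:24]
for param in feature_extractor.parameters():
  param.requires_grad = False # freeze the VGG weights (the attribute is requires_grad, not require_grad)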

print(feature_extractor)

### Two loss functions

- discriminator loss
- generator initialization phase loss and generator loss

##### Learnings

In this section I describe my learnings from implementing and testing the loss functions.

In my first implementation, I took the output of the discriminator with size _batch_size x 1 x 64 x 64_ as input to the discriminator loss (to be precise, all three outputs of D, from D(cartoon_image), D(smoothed_cartoon_image) and D(G(photo))).
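
The spatial size of 64 x 64 follows from the two stride-2 convolutions in the discriminator. A short sketch that walks a 256 x 256 input through the layers, using the kernel size, stride and padding values from the `Discriminator` class above; the helper `conv_output_size` is hypothetical:

```python
def conv_output_size(size_in, kernel_size, stride, padding):
    """Output height (or width) of a convolution; the division is floored."""
    return (size_in + 2 * padding - kernel_size) // stride + 1

# (name, kernel_size, stride, padding) as defined in the Discriminator class
discriminator_layers = [
    ("conv_1", 3, 1, 1),
    ("conv_2", 3, 2, 1),
    ("conv_3", 3, 1, 1),
    ("conv_4", 3, 2, 1),
    ("conv_5", 3, 1, 1),
    ("conv_6", 3, 1, 1),
    ("conv_7", 3, 1, 1),
]

size = 256
for name, kernel_size, stride, padding in discriminator_layers:
    size = conv_output_size(size, kernel_size, stride, padding)
    print(name, size)
```

Each output value therefore rates a region of the input image instead of the image as a whole.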

As the adversarial loss outputs a probability which indicates whether the input is detected as fake or not, it returns a single value. To achieve this, I took the input tensor with shape _batch_size x 1 x 64 x 64_ and implemented the loss function

$\mathcal{L}_{adv}(G, D) = \mathbb{E}_{c_i \sim S_{data}(c)}[\log D(c_i)]
+ \mathbb{E}_{e_j \sim S_{data}(e)}[\log(1 - D(e_j))]
+ \mathbb{E}_{p_k \sim S_{data}(p)}[\log(1 - D(G(p_k)))]$

manually as

`torch.log(torch.abs(D(...))) + torch.log(torch.abs(1 - D(...))) + torch.log(torch.abs(1 - D(...)))`

As the discriminator output sometimes contained negative values, calling _log()_ directly with such a value causes an error. Therefore I wrapped _abs()_ around the input of _log()_.

As my training results weren't as expected, I came back to the loss functions. An adversarial loss outputs a probability, thus a single value, but my discriminator outputs a tensor with shape _batch_size x 1 x 64 x 64_.

So to get a probability out of this tensor, either an activation function is needed, or I made a mistake in implementing the discriminator and it should output a probability. After thinking about this, I went to use an activation function. To reach this, I can either activate the output of the last layer of the discriminator, or I can use a loss function like BCEWithLogitsLoss, which combines activation function and loss.

But which activation function to use?

As the discriminator should give a probability and only has two classes as outputs, _real_ or _fake_, using sigmoid or softmax is a good choice. Softmax can be used for binary classification as well as classification of _n_-classes.

First, I decided to use a loss function, which combines activation and loss function, and this gave me the choice between:

- _BCEWithLogitsLoss_: Sigmoid and binary cross entropy loss

- _CrossEntropyLoss_: Softmax and negative log likelihood loss

For solving a minimax-problem, which loss to choose?

"_If [minimax] implemented directly, this would require changes be made to model weights using stochastic ascent rather than stochastic descent.
It is more commonly implemented as a traditional binary classification problem with labels 0 and 1 for generated and real images respectively._" see [1].

Therefore I chose _BCEWithLogitsLoss_.

As _BCEWithLogitsLoss_ has two parameters, one for the input and one for the target, I used _BCEWithLogitsLoss_ three times, one for every different input, and added the values up.

But after trying this solution, the generator produced values lower than zero. This led to problems when trying to map these values to RGB. Therefore I decided not to combine activation and loss function, but to use sigmoid directly in the generator as well as in the discriminator, with _BCELoss_ as loss function.

[1] https://machinelearningmastery.com/generative-adversarial-network-loss-functions/
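
The link between the label-based formulation and the edge-promoting loss can be checked for single values: with target 1, binary cross entropy is $-\log D$, and with target 0 it is $-\log(1 - D)$. Minimizing the sum of the three BCE terms therefore corresponds to maximizing $\mathcal{L}_{adv}$. A sketch with made-up discriminator outputs; the helper `bce` is a hypothetical per-value version of what _BCELoss_ averages over all elements:

```python
import math

def bce(prediction, target):
    """Binary cross entropy for a single probability and a 0/1 target."""
    return -(target * math.log(prediction) + (1 - target) * math.log(1 - prediction))

# made-up discriminator outputs for a cartoon, a smoothed cartoon and a generated image
d_cartoon, d_smoothed, d_generated = 0.9, 0.2, 0.1

adversarial = math.log(d_cartoon) + math.log(1 - d_smoothed) + math.log(1 - d_generated)
d_loss = bce(d_cartoon, 1) + bce(d_smoothed, 0) + bce(d_generated, 0)

# d_loss is the negated adversarial term
print(adversarial, d_loss)
```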

In [0]:
from torchvision import models
from torch.nn import BCELoss

class DiscriminatorLoss(torch.nn.Module):
  def __init__(self):
      super(DiscriminatorLoss, self).__init__()
      self.bce_loss = BCELoss()

  def forward(self, discriminator_output_of_cartoon_input,
              discriminator_output_of_cartoon_smoothed_input,
              discriminator_output_of_generated_image_input,
              epoch,
              write_to_tensorboard=False):

    return self._adversarial_loss(discriminator_output_of_cartoon_input,
                     discriminator_output_of_cartoon_smoothed_input,
                     discriminator_output_of_generated_image_input,
                     epoch,
                     write_to_tensorboard)

  def _adversarial_loss(self, discriminator_output_of_cartoon_input,
                     discriminator_output_of_cartoon_smoothed_input,
                     discriminator_output_of_generated_image_input,
                     epoch,
                     write_to_tensorboard):

    # create ones and zeros here instead of in __init__, so that they match the batch size of the input;
    # when testing different batch sizes, the last batch sometimes has fewer than batch_size elements
    actual_batch_size = discriminator_output_of_cartoon_input.size()[0]
    zeros = torch.zeros([actual_batch_size, 1, 64, 64]).to(device)
    ones = torch.ones([actual_batch_size, 1, 64, 64]).to(device)

    d_loss_cartoon = self.bce_loss(discriminator_output_of_cartoon_input, ones)
    d_loss_cartoon_smoothed = self.bce_loss(discriminator_output_of_cartoon_smoothed_input, zeros)
    d_loss_generated_input = self.bce_loss(discriminator_output_of_generated_image_input, zeros)

    d_loss = d_loss_cartoon + d_loss_cartoon_smoothed + d_loss_generated_input

    if write_to_tensorboard:
      writer.add_scalar('d_loss_cartoon', d_loss_cartoon,epoch)
      writer.add_scalar('d_loss_cartoon_smoothed', d_loss_cartoon_smoothed, epoch)
      writer.add_scalar('d_loss_generated_input', d_loss_generated_input, epoch)
      writer.add_scalar('d_loss', d_loss, epoch)

    return d_loss

#### Hyperparameter omega ($\omega$)

Initially, I set $\omega$, a weight to balance style and content preservation, to the value given in the paper, which is 10. After running 210 epochs, the content preservation was very good, but the generated images did not have a cartoon style. Maybe this is a problem with my input data, where I use different cartoon styles from different artists instead of a single artist, as used in the paper.

Looking at the parts of the generator loss over time, the following values are observable:

<table>
<tr>
<td>
<img src="https://github.com/TobiasSunderdiek/cartoon-gan/raw/master/assets/g_content_loss.png" alt="g_content_loss" width="400">
</td><td>
<img src="https://github.com/TobiasSunderdiek/cartoon-gan/raw/master/assets/g_adversarial_loss.png" alt="g_adversarial_loss" width="400">
</td>
</tr>
</table>

So the content loss is orders of magnitude higher than the adversarial loss.
Maybe my calculation of the content loss is wrong? Should it be much lower? As the generated images preserve the content very well, I concentrate on the adversarial loss.

As the adversarial loss is responsible for the comic effect, I try a much lower $\omega$ to bring the values of g_content_loss and g_adversarial_loss to a similar level.

As g_content_loss has values of about $4e+5$, I choose $\omega=0.00001$.
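
One possible contributor to the large content loss, stated here as an assumption that was not verified against the training run: an L1 norm sums the absolute differences over all feature values, so it grows with the number of elements, whereas a mean absolute difference does not. The sketch below uses made-up feature values and assumes a batch of 16 images with a 512 x 16 x 16 feature map each; `l1_sum` and `l1_mean` are hypothetical helpers:

```python
def l1_sum(a, b):
    """L1 norm of the difference: sum of absolute differences."""
    return sum(abs(x - y) for x, y in zip(a, b))

def l1_mean(a, b):
    """Mean absolute difference, independent of the number of elements."""
    return l1_sum(a, b) / len(a)

# assumed number of feature values per batch: 16 images, 512 channels, 16 x 16 pixels
elements = 16 * 512 * 16 * 16

# made-up features with a constant difference of 0.2 per element
generated = [0.7] * elements
original = [0.5] * elements

print(l1_sum(generated, original))
print(l1_mean(generated, original))
```

With a summed norm, even a small per-element difference adds up to a large value, which is worth keeping in mind when choosing $\omega$.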

In [0]:
class GeneratorLoss(torch.nn.Module):
  def __init__(self):
      super(GeneratorLoss, self).__init__()
      self.w = 0.00001
      self.bce_loss = BCELoss()
      self.feature_extractor = vgg16.features[:24]
      for param in self.feature_extractor.parameters():
        param.requires_grad = False # freeze the VGG weights (the attribute is requires_grad, not require_grad)

  def forward(self, discriminator_output_of_generated_image_input,
              generator_input,
              generator_output,
              epoch,
              is_init_phase=False,
              write_to_tensorboard=False):
    if is_init_phase:
      g_content_loss = self._content_loss(generator_input, generator_output)
      g_adversarial_loss = 0.0
      g_loss = g_content_loss
    else:
      g_adversarial_loss = self._adversarial_loss_generator_part_only(discriminator_output_of_generated_image_input)
      g_content_loss = self._content_loss(generator_input, generator_output)
      g_loss = g_adversarial_loss + self.w * g_content_loss

    if write_to_tensorboard:
      writer.add_scalar('g_adversarial_loss', g_adversarial_loss, epoch)
      writer.add_scalar('g_content_loss', g_content_loss, epoch)
      writer.add_scalar('g_loss', g_loss, epoch)

    return g_loss

  def _adversarial_loss_generator_part_only(self, discriminator_output_of_generated_image_input):
    actual_batch_size = discriminator_output_of_generated_image_input.size()[0]
    ones = torch.ones([actual_batch_size, 1, 64, 64]).to(device)
    return self.bce_loss(discriminator_output_of_generated_image_input, ones)

  def _content_loss(self, generator_input, generator_output):
    return (self.feature_extractor(generator_output) - self.feature_extractor(generator_input)).norm(p=1)

In [0]:
discriminatorLoss = DiscriminatorLoss()
generatorLoss = GeneratorLoss()

## Optimizer
The paper does not mention which optimizer is used, so I decided to use Adam.

For the hyperparameters, I decided to go with the values mentioned in the DCGAN paper [1].

[1] https://arxiv.org/pdf/1511.06434.pdf
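
For reference, a scalar sketch of the Adam update with the learning rate and beta values used in the cell below. The helper `adam_step` is hypothetical and only illustrates the update rule; it is not PyTorch's implementation, and the gradients are made up:

```python
import math

def adam_step(param, grad, m, v, t, lr=0.0002, beta1=0.5, beta2=0.999, eps=1e-8):
    """One Adam update for a single scalar parameter; returns (param, m, v)."""
    m = beta1 * m + (1 - beta1) * grad           # running mean of the gradient
    v = beta2 * v + (1 - beta2) * grad * grad    # running mean of the squared gradient
    m_hat = m / (1 - beta1 ** t)                 # bias correction for step t
    v_hat = v / (1 - beta2 ** t)
    param = param - lr * m_hat / (math.sqrt(v_hat) + eps)
    return param, m, v

param, m, v = 1.0, 0.0, 0.0
for t, grad in enumerate([0.3, 0.25, -0.1], start=1):  # made-up gradients
    param, m, v = adam_step(param, grad, m, v, t)
    print(t, param)
```

In the first step the update is close to `lr` times the sign of the gradient. A lower `beta1` such as 0.5 makes the running mean follow recent gradients more closely.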

In [0]:
import torch.optim as optim

lr = 0.0002
beta1 = 0.5
beta2 = 0.999

d_optimizer = optim.Adam(D.parameters(), lr, [beta1, beta2])
g_optimizer = optim.Adam(G.parameters(), lr, [beta1, beta2])

## Saving
To make training resumable, I save checkpoints to Google Drive and load them, if they exist, before running the training.

I also save the weights and biases of the generator and the discriminator to TensorBoard.

To check intermediate images of the generator, I save them to Google Drive.
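
Resuming picks the checkpoint whose file name sorts last. This works because the training loop further down writes the epoch number zero-padded to three digits, and because `best_checkpoint.pth` sorts before the per-epoch files. A small sketch of that behaviour; the helper `latest_checkpoint` is hypothetical:

```python
def latest_checkpoint(filenames):
    """Pick the file name that sorts last; returns None for an empty list."""
    return sorted(filenames)[-1] if filenames else None

# file names in the format written by the training loop (zero-padded epoch number)
names = ["checkpoint_epoch_{:03d}.pth".format(e) for e in (1, 2, 10, 100)]
names.append("best_checkpoint.pth")
print(latest_checkpoint(names))  # checkpoint_epoch_100.pth

# without zero padding, string sorting would pick the wrong file
unpadded = ["checkpoint_epoch_{}.pth".format(e) for e in (1, 2, 10, 100)]
print(latest_checkpoint(unpadded))  # checkpoint_epoch_2.pth
```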

In [0]:
!mkdir -p /content/data/My\ Drive/cartoonGAN/checkpoints/
!mkdir -p /content/data/My\ Drive/cartoonGAN/intermediate_results/training/
intermediate_results_training_path = "/content/data/My Drive/cartoonGAN/intermediate_results/training/"

In [0]:
import time

def save_training_result(input, output):
  # input/output has batch-size number of images, get first one and detach from tensor
  image_input = input[0].detach().cpu().numpy()
  image_output = output[0].detach().cpu().numpy()
  # transpose image from torch.Size([3, 256, 256]) to (256, 256, 3)
  image_input = np.transpose(image_input, (1, 2, 0))
  image_output = np.transpose(image_output, (1, 2, 0))

  # generate filenames as timestamp, this orders the output by time
  filename = str(int(time.time()))
  path_input = intermediate_results_training_path + filename + "_input.jpg"
  path_output = intermediate_results_training_path + filename + ".jpg"
  plt.imsave(path_input, image_input)
  plt.imsave(path_output, image_output)

In [0]:
def write_model_weights_and_bias_to_tensorboard(prefix, state_dict, epoch):
  for param in state_dict:
      writer.add_histogram(f"{prefix}_{param}", state_dict[param], epoch)

## Training

In [0]:
import time

def train(_num_epochs, checkpoint_dir, best_valid_loss, epochs_already_done, losses, validation_losses):
  init_epochs = 10
  print_every = 100
  start_time = time.time()

  for epoch in range(_num_epochs - epochs_already_done):
    epoch = epoch + epochs_already_done

    for index, ((photo_images, _), (smoothed_cartoon_images, _), (cartoon_images, _)) in enumerate(zip(photo_dataloader_train, smoothed_cartoon_image_dataloader_train, cartoon_image_dataloader_train)):
      batch_size = photo_images.size(0)
      photo_images = photo_images.to(device)
      smoothed_cartoon_images = smoothed_cartoon_images.to(device)
      cartoon_images = cartoon_images.to(device)

      # train the discriminator
      d_optimizer.zero_grad()
      
      d_of_cartoon_input = D(cartoon_images)
      d_of_cartoon_smoothed_input = D(smoothed_cartoon_images)
      d_of_generated_image_input = D(G(photo_images).detach()) # detach: this step only updates the discriminator

      write_only_one_loss_from_epoch_not_every_batch_loss = (index == 0)

      d_loss = discriminatorLoss(d_of_cartoon_input,
                                 d_of_cartoon_smoothed_input,
                                 d_of_generated_image_input,
                                 epoch,
                                 write_to_tensorboard=write_only_one_loss_from_epoch_not_every_batch_loss)

      d_loss.backward()
      d_optimizer.step()

      # train the generator
      g_optimizer.zero_grad()

      g_output = G(photo_images)
      #save some intermediate results during training
      if (index % 10) == 0:
        save_training_result(photo_images, g_output)

      d_of_generated_image_input = D(g_output)

      if epoch < init_epochs:
        # init
        init_phase = True
      else:
        # train
        init_phase = False

      g_loss = generatorLoss(d_of_generated_image_input,
                              photo_images,
                              g_output,
                              epoch,
                              is_init_phase=init_phase,
                              write_to_tensorboard=write_only_one_loss_from_epoch_not_every_batch_loss)

      g_loss.backward()
      g_optimizer.step()

      if (index % print_every) == 0:
        losses.append((d_loss.item(), g_loss.item()))
        now = time.time()
        current_run_time = now - start_time
        start_time = now
        print("Epoch {}/{} | d_loss {:6.4f} | g_loss {:6.4f} | time {:2.0f}s | total no. of losses {}".format(epoch+1, _num_epochs, d_loss.item(), g_loss.item(), current_run_time, len(losses)))
    
    # write weights and biases to tensorboard once per epoch, not once per batch
    write_model_weights_and_bias_to_tensorboard('D', D.state_dict(), epoch)
    write_model_weights_and_bias_to_tensorboard('G', G.state_dict(), epoch)

    # validate
    with torch.no_grad():
      D.eval()
      G.eval()

      for batch_index, (photo_images, _) in enumerate(photo_dataloader_valid):
        photo_images = photo_images.to(device)

        g_output = G(photo_images)
        d_of_generated_image_input = D(g_output)
        g_valid_loss = generatorLoss(d_of_generated_image_input,
                                      photo_images,
                                      g_output,
                                      epoch,
                                      is_init_phase=init_phase,
                                      write_to_tensorboard=write_only_one_loss_from_epoch_not_every_batch_loss)

        if batch_index % print_every == 0:
          validation_losses.append(g_valid_loss.item())
          now = time.time()
          current_run_time = now - start_time
          start_time = now
          print("Epoch {}/{} | validation loss {:6.4f} | time {:2.0f}s | total no. of losses {}".format(epoch+1, _num_epochs, g_valid_loss.item(), current_run_time, len(validation_losses)))

    D.train()
    G.train()
    
    if(g_valid_loss.item() < best_valid_loss):
      print("Generator loss improved from {} to {}".format(best_valid_loss, g_valid_loss.item()))
      best_valid_loss = g_valid_loss.item()
  
    # save checkpoint
    checkpoint = {'g_valid_loss': g_valid_loss.item(),
                  'best_valid_loss': best_valid_loss,
                  'losses': losses,
                  'validation_losses': validation_losses,
                  'last_epoch': epoch+1,
                  'd_state_dict': D.state_dict(),
                  'g_state_dict': G.state_dict(),
                  'd_optimizer_state_dict': d_optimizer.state_dict(),
                  'g_optimizer_state_dict': g_optimizer.state_dict()
                }
    print("Save checkpoint for validation loss of {}".format(g_valid_loss.item()))
    torch.save(checkpoint, checkpoint_dir + '/checkpoint_epoch_{:03d}.pth'.format(epoch+1))
    if(best_valid_loss == g_valid_loss.item()):
      print("Overwrite best checkpoint")
      torch.save(checkpoint, checkpoint_dir + '/best_checkpoint.pth')
    
  return losses, validation_losses
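
The validation loop above compares only the loss of the last validation batch (`g_valid_loss`) against `best_valid_loss`, so a single easy or hard batch can decide which checkpoint counts as best. One possible alternative is to average the loss over all validation batches. The following is a minimal sketch of that idea; `RunningMean` is a hypothetical helper that is not part of this notebook, and the batch losses are stand-in values.

```python
class RunningMean:
    """Accumulates scalar values and reports their mean (hypothetical helper)."""

    def __init__(self):
        self.total = 0.0
        self.count = 0

    def update(self, value):
        self.total += float(value)
        self.count += 1

    @property
    def mean(self):
        # Return infinity when nothing was recorded, so that a comparison
        # against best_valid_loss never counts as an improvement.
        return self.total / self.count if self.count else float("inf")


# Stand-in values for g_valid_loss.item() of three validation batches.
mean_valid_loss = RunningMean()
for batch_loss in [4.0, 2.0, 3.0]:
    mean_valid_loss.update(batch_loss)

print(mean_valid_loss.mean)  # 3.0
```

In `train`, `update` would be called once per validation batch and `mean` would replace `g_valid_loss.item()` in the comparison with `best_valid_loss`.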

In [0]:
from os import listdir

checkpoint_dir = '/content/data/My Drive/cartoonGAN/checkpoints'
checkpoints = listdir(checkpoint_dir)
num_epochs = 200 + 10 # training + init phase
epochs_already_done = 0
best_valid_loss = math.inf
losses = []
validation_losses = []

if(len(checkpoints) > 0):
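  # 'best_checkpoint.pth' sorts before 'checkpoint_epoch_XXX.pth', so the last entry is the most recent epoch checkpoint
  last_checkpoint = sorted(checkpoints)[-1]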
  checkpoint = torch.load(checkpoint_dir + '/' + last_checkpoint, map_location=torch.device(device))
  best_valid_loss = checkpoint['best_valid_loss']
  epochs_already_done = checkpoint['last_epoch']
  losses = checkpoint['losses']
  validation_losses = checkpoint['validation_losses']
  
  D.load_state_dict(checkpoint['d_state_dict'])
  G.load_state_dict(checkpoint['g_state_dict'])
  d_optimizer.load_state_dict(checkpoint['d_optimizer_state_dict'])
  g_optimizer.load_state_dict(checkpoint['g_optimizer_state_dict'])
  print('Load checkpoint {} with g_valid_loss {}, best_valid_loss {}, {} epochs and total no of losses {}'.format(last_checkpoint, checkpoint['g_valid_loss'], best_valid_loss, epochs_already_done, len(losses)))
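
Picking the newest checkpoint with `sorted(checkpoints)[-1]` works as long as the folder contains only `best_checkpoint.pth` and the zero-padded `checkpoint_epoch_XXX.pth` files. It would pick the wrong file if another file ends up in the folder or if the epoch number grows beyond three digits. A minimal sketch of a more defensive selection, assuming a hypothetical helper `latest_epoch_checkpoint` that parses the epoch number from the file name:

```python
import re

# Matches the file names written by torch.save(...) in train(),
# e.g. 'checkpoint_epoch_010.pth'.
_EPOCH_PATTERN = re.compile(r"^checkpoint_epoch_(\d+)\.pth$")


def latest_epoch_checkpoint(file_names):
    """Return the epoch checkpoint with the highest epoch number, or None."""
    epoch_files = []
    for name in file_names:
        match = _EPOCH_PATTERN.match(name)
        if match:
            epoch_files.append((int(match.group(1)), name))
    if not epoch_files:
        return None
    # Tuples compare by epoch number first, so max() picks the newest epoch.
    return max(epoch_files)[1]


names = ["best_checkpoint.pth", "checkpoint_epoch_002.pth",
         "checkpoint_epoch_1000.pth", "checkpoint_epoch_999.pth", "notes.txt"]
print(latest_epoch_checkpoint(names))  # checkpoint_epoch_1000.pth
```

With this helper, the resume cell would check for `None` instead of `len(checkpoints) > 0`.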


In [0]:
losses, validation_losses = train(num_epochs, checkpoint_dir, best_valid_loss, epochs_already_done, losses, validation_losses)

In [0]:
%matplotlib inline
%config InlineBackend.figure_format = 'retina'

import matplotlib.pyplot as plt

d_losses = [x[0] for x in losses]
g_losses = [x[1] for x in losses]
plt.plot(d_losses, label='Discriminator training loss')
plt.plot(g_losses, label='Generator training loss')
plt.plot(validation_losses, label='Generator validation loss')
plt.legend(frameon=False)
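
Training and validation losses are recorded at different rates (the logs further down report 648 training entries and 216 validation entries), so plotting both lists against their own index compresses the validation curve into the left part of the figure. A minimal sketch of one way to align them, assuming both lists cover the same epochs roughly uniformly; `spread_over` is a hypothetical helper and the loss values are stand-ins:

```python
def spread_over(num_points, target_length):
    """Return x positions that stretch num_points samples over [0, target_length - 1]."""
    if num_points == 1:
        return [0.0]
    step = (target_length - 1) / (num_points - 1)
    return [i * step for i in range(num_points)]


# Stand-in lists; in the notebook these would be g_losses and validation_losses.
train_losses = [5.0, 4.0, 3.5, 3.0, 2.8, 2.5, 2.4]
valid_losses = [4.5, 3.2, 2.6]

valid_x = spread_over(len(valid_losses), len(train_losses))
print(valid_x)  # [0.0, 3.0, 6.0]
# plt.plot(valid_x, valid_losses, label='Generator validation loss')
```

The linear stretch is only an approximation, since the number of logged entries per epoch differs between the training and validation loops.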

### Show results in tensorboard

In [0]:
%load_ext tensorboard
%tensorboard --logdir='/content/data/My Drive/cartoonGAN/tensorboard'

### Plot losses

Losses after 210 epochs ![losses_after_210_epochs](https://github.com/TobiasSunderdiek/cartoon-gan/raw/master/assets/losses_after_210_epochs.png)

Epoch 210/210 | d_loss 0.0000 | g_loss 615573.0625 | time 194s | total no. of losses 648

Epoch 210/210 | validation loss 552576.4375 | time 45s | total no. of losses 216

### some generated results

#### directly after start, one of the first epochs

![some of the first epochs](https://github.com/TobiasSunderdiek/cartoon-gan/raw/master/assets/no_margin.jpg)

#### directly after the init phase is completed
**Epoch 10/210 | d_loss 0.0124 | g_loss 226864.062**

**Epoch 10/210 | validation loss 182869.5312**

Photo input ![epoch_10_photo_input](https://github.com/TobiasSunderdiek/cartoon-gan/raw/master/assets/epoch_10_photo_input.jpg) Generated image ![epoch_10_generated_image](https://github.com/TobiasSunderdiek/cartoon-gan/raw/master/assets/epoch_10_generated_image.jpg)

#### directly at the beginning of epoch 11, using the full generator loss instead of the init loss

These results seem to be outliers at this stage of training, because the subsequent outputs look more similar to the inputs.

Photo input ![epoch_11_photo_input](https://github.com/TobiasSunderdiek/cartoon-gan/raw/master/assets/epoch_11_photo_input.jpg) Generated image ![epoch_11_generated_image](https://github.com/TobiasSunderdiek/cartoon-gan/raw/master/assets/epoch_11_generated_image.jpg)

Photo input ![epoch_11_photo_input](https://github.com/TobiasSunderdiek/cartoon-gan/raw/master/assets/epoch_11_photo_input_2.jpg) Generated image ![epoch_11_generated_image](https://github.com/TobiasSunderdiek/cartoon-gan/raw/master/assets/epoch_11_generated_image_2.jpg)

#### Epoch 210

After 210 epochs of training, the output looks like this:

Photo input ![input_210_epochs](https://github.com/TobiasSunderdiek/cartoon-gan/raw/master/assets/input_210_epochs.jpg) Generated image ![generated_after_210_epochs.jpg](https://github.com/TobiasSunderdiek/cartoon-gan/raw/master/assets/generated_after_210_epochs.jpg)

## Inference

In [0]:
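# load the weights onto the CPU, since the inference cell below runs on the CPU
checkpoint = torch.load(checkpoint_dir + '/best_checkpoint.pth', map_location=torch.device('cpu'))
G_inference = Generator()
G_inference.load_state_dict(checkpoint['g_state_dict'])
G_inference.eval() # switch layers such as batch norm to inference mode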

In [0]:
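import numpy as np

test_images = next(iter(photo_dataloader_valid))[0] # first batch of validation photos
with torch.no_grad(): # no gradients needed for inference
  result_images = G_inference(test_images)
print(result_images[0])
plt.imshow(np.transpose(result_images[0].numpy(), (1, 2, 0))) # CHW -> HWC for matplotlib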
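
`plt.imshow` expects float RGB data with the channel axis last and values in [0, 1]. If the generator's outputs fall outside that range, matplotlib clips them and prints a warning. A minimal sketch of doing the conversion explicitly, assuming a hypothetical helper `to_displayable` and a stand-in array in place of the generator output:

```python
import numpy as np


def to_displayable(chw_image):
    """Convert a (channels, height, width) float array to (height, width, channels) in [0, 1]."""
    hwc_image = np.transpose(chw_image, (1, 2, 0))
    return np.clip(hwc_image, 0.0, 1.0)


# Stand-in for result_images[0].detach().numpy(): 3 channels, 2x4 pixels,
# with some values deliberately outside [0, 1].
fake_output = np.linspace(-0.5, 1.5, num=3 * 2 * 4, dtype=np.float32).reshape(3, 2, 4)
image = to_displayable(fake_output)
print(image.shape)  # (2, 4, 3)
```

Clipping hides out-of-range values rather than fixing them, so it is worth checking the raw minimum and maximum of the generator output as well.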

## Notes/next steps
- alternative lib for image processing: https://github.com/albu/albumentations
- figure out which variant of VGG to use (VGG-16?), and if the pre-training in the referenced paper is the same as the pre-trained pytorch version
- do I use the correct normalization method in the content loss?
- in which order is the discriminator trained regarding photos, cartoons with smoothed edges and then generated images?
- evaluate result with existing model http://cg.cs.tsinghua.edu.cn/people/~Yongjin/CartoonGAN-Models.rar ?
- did I split the loss function correctly for the D and G model, and content loss only for G?
- plot results directly from vars via method
- in the paper, 6,000 photo images and 2,000 to 4,000 cartoon images are used for training; how is this done with unbalanced datasets?
- is batch_size of 16 correct? Tried 32 before, but got CUDA OOM
- for image downloader: catch exception if image is truncated/check if zipping adds additional folder within .zip in create_smoothed_images.py
- upgrade to tensorboard 2
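
Regarding the open question above about unbalanced datasets: one common option, which is not necessarily what the paper does, is to iterate over the larger dataset once per epoch and restart the smaller one whenever it runs out. A minimal sketch, assuming a hypothetical helper `paired_batches` and plain lists standing in for the photo and cartoon data loaders:

```python
def paired_batches(long_loader, short_loader):
    """Yield (long_batch, short_batch) pairs, restarting short_loader when it is exhausted.

    Assumes short_loader is non-empty and can be iterated more than once.
    """
    short_iter = iter(short_loader)
    for long_batch in long_loader:
        try:
            short_batch = next(short_iter)
        except StopIteration:
            # Start a new pass over the shorter loader; a shuffling
            # DataLoader would produce a new order at this point.
            short_iter = iter(short_loader)
            short_batch = next(short_iter)
        yield long_batch, short_batch


# Stand-ins: five photo batches and two cartoon batches.
photo_batches = [1, 2, 3, 4, 5]
cartoon_batches = ["a", "b"]
pairs = list(paired_batches(photo_batches, cartoon_batches))
print(pairs)  # [(1, 'a'), (2, 'b'), (3, 'a'), (4, 'b'), (5, 'a')]
```

Restarting the iterator instead of using `itertools.cycle` avoids caching every batch of the smaller loader in memory.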