# CycleGAN for Faces Dataset

**Objective:** Implement CycleGAN on a faces dataset to translate between young and old faces.

Dataset location: https://susanqq.github.io/UTKFace/

The original CycleGAN implementation in PyTorch is available at https://github.com/junyanz/pytorch-CycleGAN-and-pix2pix

## Import Dataset

In [5]:
from google.colab import drive
drive.mount('/content/drive')

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).


In [0]:
!cp '/content/drive/My Drive/App/CycleGAN/Faces/young.npz' /content
!cp '/content/drive/My Drive/App/CycleGAN/Faces/old.npz'   /content

## Imports

In [0]:
import numpy as np
import os
from PIL import Image

import matplotlib.pyplot as plt

%matplotlib inline
import datetime

## Load the dataset

In [8]:
# load the compressed data
young = np.load('/content/young.npz')
young_images = young['arr_0']
print(young_images.shape)

(1056, 256, 256, 3)


In [9]:
# load the old faces data
old = np.load('/content/old.npz')
old_images = old['arr_0']
print(old_images.shape)

(1056, 256, 256, 3)
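
Both archives are read through the key `arr_0`, which is the name NumPy gives to an array that was saved positionally. The notebook does not show how the `.npz` files were produced, so the following is only a sketch, assuming they were written with `np.savez_compressed`. The random array and the file name `faces_demo.npz` are stand-ins, not the real data.

```python
import os
import tempfile

import numpy as np

# Stand-in for a stack of face images (assumption: uint8, channels last).
faces = np.random.randint(0, 256, size=(4, 256, 256, 3), dtype=np.uint8)

# Hypothetical file name inside a throwaway directory.
path = os.path.join(tempfile.mkdtemp(), 'faces_demo.npz')

# An array passed positionally is stored under the key 'arr_0'.
np.savez_compressed(path, faces)

# Read it back the same way the notebook does.
with np.load(path) as archive:
    keys = archive.files
    restored = archive['arr_0']

print(keys, restored.shape)
```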


## Rescale the pixel values to [-1, 1]

In [0]:
def rescale_input(data_arr):
  # map pixel values from [0, 255] to [-1, 1]
  res_arr = (data_arr - 127.5) / 127.5
  return res_arr
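
As a quick check of the rescaling, the sketch below applies the same formula to a few pixel values and maps them back. It assumes the images hold values in [0, 255]; `rescale_output` is a hypothetical inverse helper that is not part of the notebook.

```python
import numpy as np

def rescale_input(data_arr):
    # Same mapping as the notebook cell: [0, 255] -> [-1, 1].
    return (data_arr - 127.5) / 127.5

def rescale_output(res_arr):
    # Hypothetical inverse helper: [-1, 1] -> [0, 255] as uint8.
    return np.clip(np.rint(res_arr * 127.5 + 127.5), 0, 255).astype(np.uint8)

pixels = np.array([0, 127, 128, 255], dtype=np.uint8)
scaled = rescale_input(pixels)      # float array in [-1, 1]
restored = rescale_output(scaled)   # back to the original uint8 values

print(scaled.min(), scaled.max())
print(restored)
```

The [-1, 1] range matches the `tanh` activation on the generator output, so real and generated images share the same scale.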

## Install external packages

In [11]:
!pip install git+https://www.github.com/keras-team/keras-contrib.git

Collecting git+https://www.github.com/keras-team/keras-contrib.git
  Cloning https://www.github.com/keras-team/keras-contrib.git to /tmp/pip-req-build-qjgea1ao
  Running command git clone -q https://www.github.com/keras-team/keras-contrib.git /tmp/pip-req-build-qjgea1ao
Building wheels for collected packages: keras-contrib
  Building wheel for keras-contrib (setup.py) ... done
  Created wheel for keras-contrib: filename=keras_contrib-2.0.8-cp36-none-any.whl size=101066 sha256=139eb35b977bedd695b79fc2e4e3fe666792f65a19b6d838ba4daa05844d145a
  Stored in directory: /tmp/pip-ephem-wheel-cache-71_309hw/wheels/11/27/c8/4ed56de7b55f4f61244e2dc6ef3cdbaff2692527a2ce6502ba
Successfully built keras-contrib
Installing collected packages: keras-contrib
Successfully installed keras-contrib-2.0.8


## Build CycleGAN

In [12]:
# keras layers
from keras.layers import Input, Dense, Reshape, Flatten, Dropout, Concatenate, Lambda, add
from keras.layers import BatchNormalization, Activation, ZeroPadding2D
from keras.layers.advanced_activations import LeakyReLU
from keras.layers.convolutional import UpSampling2D, Conv2D, Conv2DTranspose
from keras.initializers import RandomNormal

# from keras_contrib
from keras_contrib.layers.normalization.instancenormalization import InstanceNormalization

from keras.models import Sequential, Model

from keras.optimizers import Adam

Using TensorFlow backend.


### Discriminator

The discriminator is a PatchGAN:
* A PatchGAN penalizes structure only at the scale of image patches.
* It classifies each NxN patch of the image as real or fake.
* It has fewer parameters than a full-image discriminator.
* PatchGANs are used in [Image to Image translation](https://arxiv.org/pdf/1611.07004.pdf)
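
To make the patch idea concrete, the sketch below works out how many patch scores the discriminator emits for a 256x256 input and how large a region of the input each score nominally covers. The layer list mirrors the discriminator defined in the following cells (four 4x4 stride-2 convolutions, then one 4x4 stride-1 convolution, all with `same` padding). The helper names are illustrative only.

```python
import math

# (kernel_size, stride) for each convolution in this notebook's discriminator:
# four stride-2 layers followed by the final stride-1 layer.
layers = [(4, 2), (4, 2), (4, 2), (4, 2), (4, 1)]

def output_size(input_size, layers):
    # With padding='same', each layer yields ceil(size / stride).
    size = input_size
    for _, stride in layers:
        size = math.ceil(size / stride)
    return size

def receptive_field(layers):
    # Walk backwards from one output unit: rf_in = (rf_out - 1) * stride + kernel.
    rf = 1
    for kernel, stride in reversed(layers):
        rf = (rf - 1) * stride + kernel
    return rf

patch_grid = output_size(256, layers)
patch_rf = receptive_field(layers)
print(patch_grid)  # 16
print(patch_rf)    # 94
```

So each of the 16x16 outputs judges a nominal 94x94 region of the input, which is why the real/fake labels used during training have shape `(n_samples, 16, 16, 1)` rather than a single scalar per image.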

In [0]:
# Discriminator layer has the following
#  * Conv2D - filter size: 4x4, strides:2
#  * LeakyReLU
#  * InstanceNormalization
#
def d_layer(layer_input, filters, f_size=4, normalization=True):
  d = Conv2D(filters, kernel_size=f_size, strides=2, padding='same')(layer_input)
  d = LeakyReLU(alpha=0.2)(d)
  if normalization:
      d = InstanceNormalization()(d)
  return d

In [0]:
# Build the discriminator as a PatchGAN, which classifies
# each patch of the image as real or fake.
# The PatchGAN uses
#  * kernel size 4x4
#  * a number of filters that doubles at each stage
def build_discriminator(image_shape, num_start_filters=64):
  img = Input(image_shape)
  
  d1 = d_layer(img, num_start_filters, normalization=False)
  d2 = d_layer(d1, num_start_filters*2)
  d3 = d_layer(d2, num_start_filters*4)
  d4 = d_layer(d3, num_start_filters*8)

  validity = Conv2D(1, kernel_size=4, strides=1, padding='same')(d4)
  
  model = Model(img, validity)
  
  # compile model
  model.compile(loss='mse', optimizer=Adam(lr=0.0002, beta_1=0.5), loss_weights=[0.5])
  
  return model
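
The discriminator is compiled with an `mse` loss against all-ones (real) or all-zeros (fake) patch labels, which is a least-squares GAN objective, and `loss_weights=[0.5]` halves it. A small NumPy sketch of that arithmetic follows. The prediction values are made up for illustration, and the 16x16 patch grid assumes a 256x256 input.

```python
import numpy as np

def patch_mse(pred, target):
    # Mean squared error over every patch position.
    return float(np.mean((pred - target) ** 2))

patch_shape = (1, 16, 16, 1)  # one image, 16x16 grid of patch scores

real_labels = np.ones(patch_shape)
fake_labels = np.zeros(patch_shape)

# Hypothetical discriminator outputs, for illustration only.
pred_on_real = np.full(patch_shape, 0.9)
pred_on_fake = np.full(patch_shape, 0.2)

loss_weight = 0.5  # matches loss_weights=[0.5] in model.compile
loss_real = loss_weight * patch_mse(pred_on_real, real_labels)
loss_fake = loss_weight * patch_mse(pred_on_fake, fake_labels)

print(loss_real, loss_fake)
```

Halving the discriminator loss slows the discriminator's updates relative to the generators.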

### Generator

The generator can take one of two forms:

* Encoder-decoder (U-Net, which uses skip connections), or
* Encoder-transformer-decoder (which uses residual blocks)

This notebook builds the second form.

The encoder shrinks the input image using Conv layers with stride 2.

The transformer is a stack of residual blocks.

The decoder expands the image back to its original size with transposed Conv layers.

Note: the generator layers use InstanceNormalization with ReLU activations and tanh on the output; LeakyReLU is used only in the discriminator.

#### Resnet block

The original paper uses **reflection padding**. For simplicity, this notebook uses **same** (zero) padding instead.
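
The difference between the two padding modes can be seen on a tiny array. This sketch uses NumPy's `np.pad` rather than Keras layers: `mode='constant'` corresponds to the zero padding applied by `padding='same'`, and `mode='reflect'` mirrors the interior values without repeating the edge.

```python
import numpy as np

x = np.arange(1, 10).reshape(3, 3)

# Zero padding: what padding='same' adds around the border.
zero_padded = np.pad(x, 1, mode='constant', constant_values=0)

# Reflection padding: border values are mirrored from the interior.
reflect_padded = np.pad(x, 1, mode='reflect')

print(zero_padded)
print(reflect_padded)
```

Zero padding introduces an artificial dark border at every convolution, whereas reflection padding keeps border statistics close to those of the image.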

In [0]:
'''
def resnet_block(r_i, layer_output, ks=3, s=1):
    r = Lambda(lambda x: tf.pad(x, [[0,0],[1,1],[1,1],[0,0]],'REFLECT'))(r_i)
    #r = ReflectionPadding2D(padding=(1,1))(r_i)
    r = conv2d(r,layer_output,ks,s,padding= 'VALID')
    r = InstanceNormalization()(r)
    
    r = Lambda(lambda x: tf.pad(x, [[0,0],[1,1],[1,1],[0,0]],'REFLECT'))(r)
    #r = ReflectionPadding2D(padding=(1,1))(r)
    r = conv2d(r,layer_output,ks,s,padding= 'VALID')
    r = InstanceNormalization()(r)
    
    return add([r_i , r])
'''
  
def resnet_block(n_filters, input_layer):
  # first convolutional layer
  g = Conv2D(n_filters, (3,3), padding='same')(input_layer)
  g = InstanceNormalization(axis=-1)(g)
  g = Activation('relu')(g)

  # second convolutional layer
  g = Conv2D(n_filters, (3,3), padding='same')(g)
  g = InstanceNormalization(axis=-1)(g)

  # merge channel-wise with the input layer
  # note: Concatenate stacks channels, so the channel count grows by
  # n_filters per block; a standard residual block would instead
  # sum the tensors with add([g, input_layer])
  g = Concatenate()([g, input_layer])
  return g
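
Because this block merges with `Concatenate` rather than an element-wise sum, the number of channels grows with every block. The sketch below only does the channel bookkeeping, with no Keras involved, for the nine blocks used by the generator. `channels_after_blocks` is an illustrative helper, and the `'add'` branch shows what a summing residual block would give for comparison.

```python
def channels_after_blocks(in_channels, n_filters, n_blocks, merge):
    # Track only the channel dimension through a stack of blocks.
    channels = in_channels
    for _ in range(n_blocks):
        if merge == 'concat':
            # Concatenate stacks the conv output on top of the block input.
            channels = n_filters + channels
        elif merge == 'add':
            # Element-wise addition requires matching channel counts.
            if channels != n_filters:
                raise ValueError('add needs equal channel counts')
            channels = n_filters
        else:
            raise ValueError('unknown merge mode: %s' % merge)
    return channels

concat_channels = channels_after_blocks(256, 256, 9, 'concat')
add_channels = channels_after_blocks(256, 256, 9, 'add')
print(concat_channels, add_channels)  # 2560 256
```

With concatenation, the first upsampling layer therefore receives 2560 input channels instead of 256, which increases parameter count and memory use.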


In [0]:
# define generator model
def build_generator(image_shape, n_resnet=9):
  # image input
  in_image = Input(shape=image_shape)

  # c7s1-64
  g = Conv2D(64, (7,7), padding='same')(in_image)
  g = InstanceNormalization(axis=-1)(g)
  g = Activation('relu')(g)

  ## downsample
  # d128
  g = Conv2D(128, (3,3), strides=(2,2), padding='same')(g)
  g = InstanceNormalization(axis=-1)(g)
  g = Activation('relu')(g)
  # d256
  g = Conv2D(256, (3,3), strides=(2,2), padding='same')(g)
  g = InstanceNormalization(axis=-1)(g)
  g = Activation('relu')(g)

  # R256 - resnet blocks
  for _ in range(n_resnet):
    g = resnet_block(256, g)

  ## upsample
  # u128
  g = Conv2DTranspose(128, (3,3), strides=(2,2), padding='same')(g)
  g = InstanceNormalization(axis=-1)(g)
  g = Activation('relu')(g)
  # u64
  g = Conv2DTranspose(64, (3,3), strides=(2,2), padding='same')(g)
  g = InstanceNormalization(axis=-1)(g)
  g = Activation('relu')(g)
  # c7s1-3
  g = Conv2D(3, (7,7), padding='same')(g)
  g = InstanceNormalization(axis=-1)(g)
  out_image = Activation('tanh')(g)

  # define model
  model = Model(in_image, out_image)
  return model

### Build combined model

In [0]:
## Build the combined model from generator 1, discriminator 1 and generator 2
##
## 1. The cycle loss is weighted 10 times more than
##    the adversarial loss.
##
## 2. The identity loss gets half the weight of the cycle loss,
##    so its weight is 5.

def build_combined_model(image_shape, g_model1, d_model, g_model2):
  # update the trainable flag
  g_model1.trainable = True
  d_model.trainable = False
  g_model2.trainable = False
  
  # discriminator element
  input_gen = Input(shape=image_shape)
  
  gen1_out = g_model1(input_gen)
  output_d = d_model(gen1_out)
  
  # identity element
  input_id = Input(shape=image_shape)
  output_id = g_model1(input_id)
  
  # forward cycle
  output_f = g_model2(gen1_out)
  
  # backward cycle
  gen2_out = g_model2(input_id)
  output_b = g_model1(gen2_out)
  
  #define combined model
  model = Model([input_gen, input_id], [output_d, output_id, output_f, output_b])
  
  #define the optimizer
  opt = Adam(lr=0.0002, beta_1=0.5)
  
  # compile model with weighting of least squares loss and L1 loss
  ## cycle loss is weighted 10 times more than the adversarial loss
  ## identity loss gets half the weight of the cycle loss
  model.compile(loss=['mse', 'mae', 'mae', 'mae'], loss_weights=[1, 5, 10, 10], optimizer=opt)
  return model
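
The combined model has four outputs, and Keras reports their weighted sum as the first element returned by `train_on_batch`. The sketch below reproduces that sum with NumPy using the weights from `model.compile`. The per-output loss values are invented for illustration and are not results from this notebook.

```python
import numpy as np

# Order matches the model outputs: adversarial, identity, forward cycle, backward cycle.
loss_weights = np.array([1.0, 5.0, 10.0, 10.0])

# Hypothetical per-output losses for one batch (illustrative values only).
losses = np.array([0.30, 0.08, 0.05, 0.06])

# Weighted sum: 1*0.30 + 5*0.08 + 10*0.05 + 10*0.06
total = float(np.dot(loss_weights, losses))
print(total)
```

With these weights, the two cycle terms dominate the total even when their raw values are small, which pushes the generators to preserve image content.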

## Training

In [0]:
# generate real samples
def gen_real_samples(dataset, n_samples, patch_size):
  # get random image
  idx = np.random.randint(0, dataset.shape[0], n_samples)
  
  X = dataset[idx]
  
  # generate labels as well.
  y = np.ones((n_samples, patch_size, patch_size, 1))
  return X, y

  

In [0]:
# generate fake images from generator
def gen_fake_samples(g_model, images, patch_size):
  # use model predictor
  fake_images = g_model.predict(images)
  
  # mark the labels as zero for fake.
  y = np.zeros((len(fake_images), patch_size, patch_size, 1))
  return fake_images, y

In [0]:
# save models to file
def save_models(epoch, directory, gen_AB, gen_BA):
  file_name1 = 'gen_modelAB_e%03d.hdf5' % (epoch)
  path1 = os.path.join(directory, file_name1)
  
  file_name2 = 'gen_modelBA_e%03d.hdf5' % (epoch)
  path2 = os.path.join(directory, file_name2)
  
  gen_AB.save(path1)
  gen_BA.save(path2)
  return

In [0]:
# sample images at regular intervals to see the progress
def sample_images(epoch, step, gen_model, trainX, prefix, directory, n_samples=5):
  # get the real samples
  X_in, _ = gen_real_samples(trainX, n_samples, 0)
  
  # get the generated / translated images
  X_out, _ = gen_fake_samples(gen_model, X_in, 0)
  
  ## rescale [-1,1] to [0,1]
  X_in = (X_in + 1) / 2.0
  X_out = (X_out + 1) / 2.0
  
  ## generate the plot fig to save.
  ## plt.subplot(nrows, ncols, index)
  n_rows = 2
  n_cols = n_samples
  
  ## original images
  for i in range(n_samples):
    plt.subplot(n_rows, n_cols, i+1)
    plt.axis('off')
    plt.imshow(X_in[i])
    
  ## translated images in the second row
  for i in range(n_samples):
    plt.subplot(n_rows, n_cols, n_samples+i+1)
    plt.axis('off')
    plt.imshow(X_out[i])
    
  ## save the plot
  filename = '%s_plot_e%02d_%04d.png' % (prefix, epoch, step+1)
  path = os.path.join(directory, filename)
  
  plt.savefig(path)
  plt.close()
  return

In [0]:
# define training
def train(n_epochs, n_batch_size, sample_interval, dis_A, dis_B, gen_AB, gen_BA, comb_AB, comb_BA, trainA, trainB):
  # get the patch size
  n_patch = dis_A.output_shape[1]
  
  # number of batches per epoch
  n_bat_per_epoch = int(len(trainA) / n_batch_size)
  
  # total number of steps to go thru
  #n_steps = n_bat_per_epoch * n_epochs
  
  # get the start time
  start_time = datetime.datetime.now()

  for epoch in range(1, n_epochs+1):
    #
    # go thru each step for training
    for i in range(n_bat_per_epoch):
      # get the real samples from both domains
      X_realA, y_realA = gen_real_samples(trainA, n_batch_size, n_patch)
      X_realB, y_realB = gen_real_samples(trainB, n_batch_size, n_patch)

      # get the translated images for both domains
      X_fakeA, y_fakeA = gen_fake_samples(gen_BA, X_realB, n_patch)
      X_fakeB, y_fakeB = gen_fake_samples(gen_AB, X_realA, n_patch)

      # the original paper feeds the discriminators from a pool of
      # previously generated images; not implemented here (TODO)

      # Train generator_BA
      gA_loss = comb_BA.train_on_batch([X_realB, X_realA], [y_realA, X_realA, X_realB, X_realA])

      #--------------------
      # Train Discriminator A
      #--------------------
      dA_loss_real = dis_A.train_on_batch(X_realA, y_realA)
      dA_loss_fake = dis_A.train_on_batch(X_fakeA, y_fakeA)

      dA_loss = 0.5 * np.add(dA_loss_real, dA_loss_fake)

      # Train generator_AB
      gB_loss = comb_AB.train_on_batch([X_realA, X_realB], [y_realB, X_realB, X_realA, X_realB])

      #--------------------
      # Train Discriminator B
      #--------------------    
      dB_loss_real = dis_B.train_on_batch(X_realB, y_realB)
      dB_loss_fake = dis_B.train_on_batch(X_fakeB, y_fakeB)
      dB_loss = 0.5 * np.add(dB_loss_real, dB_loss_fake)

      # Total discriminator loss
      d_loss = 0.5 * np.add(dA_loss, dB_loss)

      # get the time
      elapsed_time = datetime.datetime.now() - start_time

      if i % sample_interval == 0:
        print("[Epoch %d/%d] [Batch %d] [D loss: %f] [GA loss: %05f, GB loss: %05f ] time: %s "
              % (epoch, n_epochs, i, d_loss, gA_loss[0], gB_loss[0], elapsed_time))
        
        sample_images(epoch, i, gen_AB, trainA, 'AtoB', '/content/gen_images')
        sample_images(epoch, i, gen_BA, trainB, 'BtoA', '/content/gen_images')
        
    # each epoch takes more than an hour on a GPU,
    # so save the models after every epoch
    #if ( epoch % 5 == 0):
    save_models(epoch, '/content/drive/My Drive/App/CycleGAN/Faces/models', gen_AB, gen_BA)
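
The training loop leaves the image pool as a TODO. The CycleGAN paper trains the discriminators on a history of generated images, using a buffer of 50, rather than only on the latest ones. A minimal sketch of such a pool is shown below. `update_image_pool` and its parameters are hypothetical names, and the 50/50 replacement rule is one common way to implement the buffer, not code from this notebook.

```python
import numpy as np

def update_image_pool(pool, images, max_size=50, rng=np.random):
    # pool: a Python list that persists across training steps.
    # images: batch of freshly generated images, shape (n, h, w, c).
    selected = []
    for image in images:
        if len(pool) < max_size:
            # Pool not full yet: store the image and use it as-is.
            pool.append(image)
            selected.append(image)
        elif rng.random() < 0.5:
            # Use the new image without storing it.
            selected.append(image)
        else:
            # Use an older image and replace it with the new one.
            idx = rng.randint(0, len(pool))
            selected.append(pool[idx])
            pool[idx] = image
    return np.asarray(selected)

# Example with dummy data: a pool capped at 2 images, fed a batch of 3.
pool_demo = []
batch = np.zeros((3, 4, 4, 3))
out = update_image_pool(pool_demo, batch, max_size=2)
print(out.shape, len(pool_demo))
```

In the loop above, this would be applied to `X_fakeA` and `X_fakeB` (each with its own pool list) just before the corresponding `dis_A.train_on_batch` and `dis_B.train_on_batch` calls on fake images.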
        

In [0]:
# rescale both datasets to [-1, 1]
trainA = rescale_input(young_images)
trainB = rescale_input(old_images)

In [0]:
#trainA.shape
#trainA[1]
image_shape = trainA.shape[1:]

In [25]:
# define generators
gen_AB = build_generator(image_shape)
gen_BA = build_generator(image_shape)

W0807 02:18:59.987335 139849894594432 deprecation_wrapper.py:119] From /usr/local/lib/python3.6/dist-packages/keras/backend/tensorflow_backend.py:74: The name tf.get_default_graph is deprecated. Please use tf.compat.v1.get_default_graph instead.

W0807 02:19:00.035341 139849894594432 deprecation_wrapper.py:119] From /usr/local/lib/python3.6/dist-packages/keras/backend/tensorflow_backend.py:517: The name tf.placeholder is deprecated. Please use tf.compat.v1.placeholder instead.

W0807 02:19:00.043441 139849894594432 deprecation_wrapper.py:119] From /usr/local/lib/python3.6/dist-packages/keras/backend/tensorflow_backend.py:4138: The name tf.random_uniform is deprecated. Please use tf.random.uniform instead.



In [26]:
# define discriminators
dis_A = build_discriminator(image_shape)
dis_B = build_discriminator(image_shape)

W0807 02:19:02.131406 139849894594432 deprecation_wrapper.py:119] From /usr/local/lib/python3.6/dist-packages/keras/optimizers.py:790: The name tf.train.Optimizer is deprecated. Please use tf.compat.v1.train.Optimizer instead.



In [0]:
# define the combined model
combined_model_AB = build_combined_model(image_shape, gen_AB, dis_B, gen_BA)
combined_model_BA = build_combined_model(image_shape, gen_BA, dis_A, gen_AB)

In [0]:
!mkdir /content/gen_images
!mkdir /content/models

In [0]:
# perform training
n_epochs = 10
n_batch_size = 1
#n_patch_size = dis_A.output_shape[1]
sample_interval = 200

In [34]:
train(n_epochs, n_batch_size, sample_interval,
     dis_A, dis_B,
     gen_AB, gen_BA,
     combined_model_AB, combined_model_BA,
     trainA, 
     trainB)

  'Discrepancy between trainable weights and collected trainable'


[Epoch 1/25] [Batch 0] [D loss: 29.779428] [GA loss: 56.909424, GB loss: 43.933846 ] time: 0:00:04.822698 
[Epoch 1/25] [Batch 200] [D loss: 0.221264] [GA loss: 7.319561, GB loss: 8.447161 ] time: 0:15:50.743221 
[Epoch 1/25] [Batch 400] [D loss: 0.181200] [GA loss: 6.389479, GB loss: 4.979431 ] time: 0:31:36.329928 
[Epoch 1/25] [Batch 600] [D loss: 0.077233] [GA loss: 5.844939, GB loss: 5.453511 ] time: 0:47:21.937351 
[Epoch 1/25] [Batch 800] [D loss: 0.115275] [GA loss: 6.541596, GB loss: 7.084942 ] time: 1:03:12.232367 
[Epoch 1/25] [Batch 1000] [D loss: 0.158471] [GA loss: 5.558513, GB loss: 6.144868 ] time: 1:19:01.240060 
[Epoch 2/25] [Batch 0] [D loss: 0.163456] [GA loss: 4.463553, GB loss: 4.766647 ] time: 1:24:35.127520 
[Epoch 2/25] [Batch 200] [D loss: 0.162577] [GA loss: 5.211803, GB loss: 4.741393 ] time: 1:40:23.742493 
[Epoch 2/25] [Batch 400] [D loss: 0.083734] [GA loss: 3.419703, GB loss: 3.092314 ] time: 1:56:10.838608 
[Epoch 2/25] [Batch 600] [D loss: 0.042899] [G

KeyboardInterrupt: ignored