**Mario Fiorino 1871233 - Project NN 2020**

Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks

Paper : https://arxiv.org/abs/1703.10593

Original implementations : https://github.com/junyanz/pytorch-CycleGAN-and-pix2pix

Bibliographical references for TensorFlow : Nishant Shukla - Machine Learning with TensorFlow

# Introduction to Generative Adversarial Networks (GAN)

Within machine learning, the GAN is arguably one of the most interesting ideas of the last 10 years; it has shown remarkable results and wide versatility. A Generative Adversarial Network consists of two networks, called the generator and the discriminator: the generator tries to reproduce samples from the true data distribution, while the discriminator tries to classify whether a sample comes from the true data distribution or was produced by the generator. The optimal outcome is that the generator learns to approximate the true distribution completely and the discriminator can no longer distinguish between the two distributions, so it is left guessing randomly.
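
As a small numerical illustration of this equilibrium (a sketch that is not part of the project code): for a fixed generator, the optimal discriminator derived in the original GAN paper is D*(x) = p_data(x) / (p_data(x) + p_g(x)). The discrete distributions below are made-up values, chosen only for illustration.

```python
import numpy as np

def optimal_discriminator(p_data, p_gen):
    """D*(x) = p_data(x) / (p_data(x) + p_gen(x)) for a fixed generator."""
    p_data = np.asarray(p_data, dtype=float)
    p_gen = np.asarray(p_gen, dtype=float)
    return p_data / (p_data + p_gen)

# Hypothetical distributions over 4 possible samples
p_data = np.array([0.1, 0.2, 0.3, 0.4])
p_gen_bad = np.array([0.4, 0.3, 0.2, 0.1])   # generator far from the data
p_gen_good = p_data.copy()                    # generator matches the data

d_bad = optimal_discriminator(p_data, p_gen_bad)
d_good = optimal_discriminator(p_data, p_gen_good)
print(d_bad)    # depends on x: the discriminator can still tell samples apart
print(d_good)   # the same value for every x: the discriminator can only guess
```

When the generator matches the data distribution, every entry of `d_good` equals 0.5, which is the "guessing randomly" situation.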

 



# Image-to-image translation and the problem of paired data

Image-to-image translation is a class of computer vision problems whose goal is to learn how to transform an image from one representation into another while preserving the semantics of the input. Many problems in computer vision can be posed as translation problems: style transfer, colorization, semantic segmentation, photo enhancement, and so on.

So far, the most interesting and promising results in this field have come from applying generative adversarial networks. One of the first works to use this approach was Pix2Pix (presented by Phillip Isola et al., 2016, https://arxiv.org/abs/1611.07004). The model works as follows: the generator receives an input image A and translates it into domain B, producing "gen_B". The discriminator is fed the pair (A, gen_B) and yields the probability that, given A, "gen_B" is the real mapping of A into domain B.

Pix2Pix produces very good results, but each image from the first domain must have a corresponding image in the second domain: the pair (A, B). The challenge therefore lies in the training dataset: a paired training dataset is often more difficult and expensive to obtain than an unpaired one, and for many tasks it is simply not feasible. This is where CycleGAN comes in.

# Purpose and key idea of CycleGAN


This paper, exploiting adversarial training, presents a method that takes the characteristics of one image domain and translates them into another image domain (preserving the semantics of the input), all in the absence of any paired training examples.

The model consists of:

– two generators, G_AtoB and G_BtoA, which translate images between the two domains, A → B and vice versa.

– two discriminators, D_A and D_B, which learn to differentiate between real and generated images.

The key idea is: given an input image A, the generator G_AtoB produces the image "G_AtoB(A)". This is passed to the second generator G_BtoA, which yields the image "G_BtoA(G_AtoB(A))", expected to be similar to the input A. Schematically:

A -> G_AtoB(A) -> G_BtoA (G_AtoB(A)) ≈ A

B -> G_BtoA (B) -> G_AtoB(G_BtoA (B)) ≈ B

It resembles language translation: if a sentence is translated from Italian to French and then translated back, the result is expected to be very close to the original sentence.
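
The cycle can be illustrated with toy functions in place of the networks. This is only a sketch: `g_a_to_b`, `g_b_to_a` and `g_b_to_a_bad` are hypothetical pixel-wise maps, not the generators defined later in this notebook.

```python
import numpy as np

def g_a_to_b(a):
    """Toy stand-in for G_AtoB: an invertible pixel-wise map."""
    return 2.0 * a + 1.0

def g_b_to_a(b):
    """Toy stand-in for G_BtoA: the exact inverse of g_a_to_b."""
    return (b - 1.0) / 2.0

def g_b_to_a_bad(b):
    """A second toy map that is NOT the inverse of g_a_to_b."""
    return b / 2.0

def cycle_error(a, forward, backward):
    """Mean absolute difference between a and backward(forward(a))."""
    return float(np.mean(np.abs(backward(forward(a)) - a)))

rng = np.random.default_rng(0)
image_a = rng.uniform(-1.0, 1.0, size=(4, 4, 3))   # random "image" in [-1, 1]

err_consistent = cycle_error(image_a, g_a_to_b, g_b_to_a)
err_inconsistent = cycle_error(image_a, g_a_to_b, g_b_to_a_bad)
print(err_consistent, err_inconsistent)
```

The consistent pair reconstructs the input (error close to zero), while the inconsistent pair leaves a constant offset; this reconstruction error is the quantity the cycle-consistency constraint penalizes.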

In [0]:
# Import TensorFlow and other libraries

import numpy as np
import tensorflow as tf
import random
import collections
from glob import glob
import matplotlib.pyplot as plt
from imageio import imread
from PIL import Image
import os 
from functools import partial
import imgaug as ia
import imgaug.augmenters as iaa



In [0]:
from google.colab import drive
drive.mount('/content/drive')

In [0]:
# Activation functions
def relu(x):
    return tf.nn.relu(features=x) # rectifier: x -> max(0, x)
def leakyRelu(x):
    return tf.nn.leaky_relu(features=x) # leaky rectifier: x -> max(alpha*x, x), with TensorFlow's default alpha = 0.2
def tanh(x):
    return tf.nn.tanh(x=x)  # tanh: R -> (-1, 1)
 
    
# Add
def add(x,y):
    return tf.add(x,y)


# Convolution and transposed convolution (de-convolution)
def conv2d(x, filter, kernel, stride, padding):
    return tf.layers.conv2d(inputs=x, filters=filter, kernel_size=kernel, strides=stride, padding=padding) 
def conv2dTranspose(x, filter, kernel, stride, padding):
    return tf.layers.conv2d_transpose(inputs=x, filters=filter, kernel_size=kernel, strides=stride, padding=padding, use_bias=False)


# Normalization
# The basic idea behind normalization is to limit internal covariate shift.
# This allows each layer to learn on a more stable distribution of inputs and accelerates the training of the network.
# A recent paper claims that it is effective not because it reduces internal covariate shift, but because it makes the error function smoother.
# NOTE: `training` is left at its default (False), so this layer normalizes with its moving statistics rather than with the statistics of the current batch.
def batchNormalization(x):
    return tf.layers.batch_normalization(inputs=x, axis=3, momentum=0.9, epsilon=1e-5)
# Instance normalization: works similarly to batch normalization, but the statistics (mean and variance) are computed per channel for each single image, rather than over the entire batch.
def instanceNormalization(x):
    return tf.contrib.layers.instance_norm(inputs=x, center=True, scale=True)
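
The difference between the two normalizations described in the comments above comes down to the axes over which the statistics are computed. The NumPy sketch below is an illustration only: it leaves out the learnable scale and offset, and the input tensor is random data in NHWC layout.

```python
import numpy as np

def batch_norm_stats(x):
    """Mean/variance per channel, shared by all images in the batch (axes N, H, W)."""
    return x.mean(axis=(0, 1, 2), keepdims=True), x.var(axis=(0, 1, 2), keepdims=True)

def instance_norm_stats(x):
    """Mean/variance per channel AND per image (axes H, W only)."""
    return x.mean(axis=(1, 2), keepdims=True), x.var(axis=(1, 2), keepdims=True)

def normalize(x, mean, var, eps=1e-5):
    return (x - mean) / np.sqrt(var + eps)

rng = np.random.default_rng(0)
# Two hypothetical images with very different brightness, NHWC layout
x = np.stack([rng.normal(0.0, 1.0, (8, 8, 3)), rng.normal(5.0, 1.0, (8, 8, 3))])

x_bn = normalize(x, *batch_norm_stats(x))
x_in = normalize(x, *instance_norm_stats(x))

print(batch_norm_stats(x)[0].shape)     # (1, 1, 1, 3): one mean per channel
print(instance_norm_stats(x)[0].shape)  # (2, 1, 1, 3): one mean per image and channel
```

After instance normalization each image has zero mean on its own, so the brightness difference between the two images disappears; after batch normalization it is still visible.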

#  Discriminator’s network 
---

Input: image  (256,256, 3) -> output: label image real/generated (32,32, 1)

----
           

Convolutional Neural Networks 

The discriminator is a classical convolutional network that classifies the images fed to it.
A convolutional layer uses small (compared to the input) filters, also called kernel matrices, which are convolved with the image. At each convolution step, the dot product of the kernel with the overlapping part of the image is computed. The value obtained goes into the feature map of the filter. Performing multiple convolutions on an input, each with a different filter, produces multiple different feature maps. The weights of the filters are parameters that are learned during training.

 

A mathematical formulation of 2D convolution is given below:

input image: x[i, j]

kernel: w[i, j]

output of the convolution: y[i, j] = ∑_k1 ∑_k2 ( w[k1, k2] · x[i - k1, j - k2] )

Note:
The area of the input on which a convolution operation acts is called the receptive field.
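
The formula can be written out directly with nested loops. The sketch below is a plain NumPy illustration that treats positions outside the image as zero; it is not how TensorFlow implements its layers (deep learning libraries typically slide the kernel without flipping it, which is a cross-correlation). The input values are arbitrary.

```python
import numpy as np

def conv2d_direct(x, w):
    """Full 2D convolution: y[i, j] = sum_k1 sum_k2 w[k1, k2] * x[i - k1, j - k2].

    Entries of x outside the image are treated as zero.
    """
    h, wd = x.shape
    kh, kw = w.shape
    y = np.zeros((h + kh - 1, wd + kw - 1))
    for i in range(y.shape[0]):
        for j in range(y.shape[1]):
            for k1 in range(kh):
                for k2 in range(kw):
                    r, c = i - k1, j - k2
                    if 0 <= r < h and 0 <= c < wd:   # skip positions outside the image
                        y[i, j] += w[k1, k2] * x[r, c]
    return y

x = np.array([[1.0, 2.0], [3.0, 4.0]])     # tiny example image
w = np.array([[1.0, 0.0], [0.0, -1.0]])    # tiny example kernel
y = conv2d_direct(x, w)
print(y)
```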

_

PatchGAN, the model used

Instead of classifying the entire image, PatchGAN
classifies whether each N × N portion of the input image is real or not. This is
done by stacking convolutional layers, producing a prediction matrix in which every value has a receptive field of N × N. Each of these values maps to a different N × N patch of the input image.

Note: the PatchGAN discriminator is fully convolutional, so it can process images of different sizes.

*

Basically the network works in this way:

Extract features from the image: 


*   1 block: Convolution -> LeakyReLU 
*   3 hidden blocks: Convolution -> Instance normalization -> LeakyReLU 

Decision output: 

*   last block: Convolution 

-

Layers : [Number of filters x Kernel size] stride

* Convol:   [64x4x4]s2 -> [128x4x4]s2 -> [256x4x4]s2 -> [512x4x4]s1 -> [1x4x4]s1

Total Trainable params  ≅  6.9 · 10^6
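
The patch size N and the size of the prediction matrix follow from the kernel sizes and strides listed above. The sketch below applies the usual receptive-field recurrence and the "same"-padding size rule; the helper names are hypothetical.

```python
import math

# (kernel size, stride) for each convolution listed above
LAYERS = [(4, 2), (4, 2), (4, 2), (4, 1), (4, 1)]

def receptive_field(layers):
    """Side length of the input patch seen by one output value."""
    field, jump = 1, 1          # jump = distance in input pixels between adjacent outputs
    for kernel, stride in layers:
        field += (kernel - 1) * jump
        jump *= stride
    return field

def output_sizes_same(size, layers):
    """Spatial size after each layer with "same" padding: ceil(size / stride)."""
    sizes = []
    for _, stride in layers:
        size = math.ceil(size / stride)
        sizes.append(size)
    return sizes

patch = receptive_field(LAYERS)
sizes = output_sizes_same(256, LAYERS)
print(patch)   # N: side of the patch judged by each output value
print(sizes)   # spatial size after each of the five convolutions
```

For a 256 × 256 input this gives N = 70 and a final 32 × 32 map, which agrees with the (32, 32, 1) output stated at the top of this section.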

-











In [0]:
def build_discriminator(image,name): 
       
    with tf.variable_scope(name+ '_discriminator', reuse=tf.AUTO_REUSE ):
        # 1st Convolutional block
        l_conv1 = leakyRelu(conv2d(image, 64, 4, 2, "same")) # input = image 
                                                             # number of filters = 64
                                                             # kernel size = 4x4
                                                             # stride = 2: the number of pixels by which the filter shifts across the image at every step
                                                             # "same" padding: the input is zero-padded so that the output size is ceil(input size / stride); the filter slides partly outside the image into this padding area
              
        # 3 Hidden Convolution blocks
        hidd_conv1 = leakyRelu(instanceNormalization(conv2d(l_conv1, 128, 4, 2, "same")))
        hidd_conv2 = leakyRelu(instanceNormalization(conv2d(hidd_conv1, 256, 4, 2, "same"))) 
        hidd_conv3 = leakyRelu(instanceNormalization(conv2d(hidd_conv2, 512, 4, 1, "same")))
               
        # Last layer
        output = conv2d(hidd_conv3, 1, 4, 1, "same")
        
        return output

# Generator's network 
-----

Input: image in domain (256,256, 3) -> output: image in opposite domain (256,256, 3)

----
The generator is trained to generate data that follows the true data distribution of the training set. In the field of artificial neural networks there are several models capable of such tasks, e.g. restricted Boltzmann machines, variational autoencoders, auto-regressive networks, ...

Given the particular goal of the paper, the authors use an autoencoder with a special internal structure: it maps an image to itself via an intermediate representation that is a translation of the image into another domain. The authors write: " ... can also be seen as a special case of adversarial autoencoders, which use an adversarial loss to train the bottleneck layer of an autoencoder to match an arbitrary target distribution. In our case, the target distribution for the X → X autoencoder is that of the domain Y ".

To be clear, the model uses a sequence of downsampling convolutional blocks to encode the input image, a number of residual convolutional blocks to transform the image (translating it into the other domain), and a number of upsampling convolutional blocks to generate the output image.


*


Basically the network works in this way:

*Encoder*

*  3 blocks: Convolution -> Instance normalization -> ReLU

*Residual blocks*

*   9 residual blocks (for 256x256 pixel training images)


*Decoder*

*  2 blocks: Transposed convolution -> Instance normalization -> ReLU 
*  last block: Convolution -> tanh


-

Layers : [Number of filters x Kernel size] stride





* Convol: [64x7x7]s1 -> [128x3x3]s2 -> [256x3x3]s2

* 9 ( Res-Block : [256x3x3]s1-> [256x3x3]s1 )

* De-convol: [128x3x3]s2 -> [64x3x3]s2 -> Convolution: [3x7x7]s1

Total Trainable params  ≅ 35 · 10^6 
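
The spatial size of the feature maps through this layer sequence can be traced with a few lines of arithmetic. This is a sketch: the helper names are hypothetical, and the padding choices (3-pixel padding before the first 7x7 "valid" convolution, 1-pixel padding before each 3x3 "valid" convolution in the residual blocks, "same" padding elsewhere) are taken from this notebook's `build_generator` and `residual_block` code.

```python
import math

def conv_valid(size, kernel, stride):
    """Output size of a convolution with "valid" padding."""
    return (size - kernel) // stride + 1

def conv_same(size, stride):
    """Output size of a convolution with "same" padding."""
    return math.ceil(size / stride)

def conv_transpose_same(size, stride):
    """Output size of a transposed convolution with "same" padding."""
    return size * stride

def generator_shapes(size=256):
    """List (stage, spatial size, channels) for the layer sequence above."""
    trace = [("input", size, 3)]
    size = conv_valid(size + 2 * 3, 7, 1)      # 3-pixel padding, then 7x7 "valid" conv
    trace.append(("conv 7x7 s1", size, 64))
    for channels in (128, 256):                # two stride-2 downsampling convolutions
        size = conv_same(size, 2)
        trace.append(("conv 3x3 s2", size, channels))
    for _ in range(9):                         # per block: (1-pixel pad + 3x3 "valid" conv) twice
        size = conv_valid(conv_valid(size + 2, 3, 1) + 2, 3, 1)
    trace.append(("9 residual blocks", size, 256))
    for channels in (128, 64):                 # two stride-2 transposed convolutions
        size = conv_transpose_same(size, 2)
        trace.append(("deconv 3x3 s2", size, channels))
    size = conv_same(size, 1)                  # final 7x7 conv, "same" padding
    trace.append(("conv 7x7 s1 + tanh", size, 3))
    return trace

shapes = generator_shapes()
for stage, size, channels in shapes:
    print(f"{stage:20s} {size} x {size} x {channels}")
```

The trace shows that the residual blocks operate on 64 × 64 × 256 feature maps and that the output has the same 256 × 256 × 3 shape as the input.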














# Residual network


----

Input: feature vector/encoding (64,64,256) -> output: feature vector/encoding (64,64,256)

-----


The basic concept of a residual block:

An input x is processed by the sequence "Convol -> ReLU -> Convol", yielding a certain F(x), and this result is then added to the same x. So the final output is H(x) = F(x) + x (in this way the output does "not change much" from the original input).

NOTE: a traditional feed-forward CNN would instead have H(x) = F(x).
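
A minimal numeric sketch of H(x) = F(x) + x, with a hypothetical element-wise function standing in for the convolutional branch F:

```python
import numpy as np

def residual_block_sketch(x, f):
    """H(x) = F(x) + x, where f plays the role of the conv -> ReLU -> conv branch."""
    return f(x) + x

def small_branch(x):
    """Hypothetical branch F with small weights: a scaled ReLU."""
    return 0.01 * np.maximum(x, 0.0)

x = np.array([-1.0, 0.5, 2.0])
h_zero = residual_block_sketch(x, lambda t: np.zeros_like(t))  # F(x) = 0 gives the identity
h_small = residual_block_sketch(x, small_branch)               # output stays close to x
print(h_zero, h_small)
```

When the branch outputs zero the block reduces to the identity, which is why the block only has to learn the change that must be applied to its input.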



In [0]:
def residual_block(x): 
    # Residual block: pad -> conv -> norm -> ReLU -> pad -> conv -> norm, then add the input (skip connection).
    # Reflection padding is used to reduce artifacts at the image borders.
    res = tf.pad(x, [ [0, 0], [1, 1], [1, 1], [0, 0] ], "REFLECT")
    res = relu( batchNormalization( conv2d(res, 256, 3, 1, "valid") ) )
    res = tf.pad(res, [ [0, 0], [1, 1], [1, 1], [0, 0] ], "REFLECT" )
    res = batchNormalization( conv2d(res, 256, 3, 1, "valid") )
    return add(res, x)  # H(x) = F(x) + x



def build_generator(image, name): 
  
    with tf.variable_scope(name+'_generator', reuse=tf.AUTO_REUSE):
        input = tf.pad(image, [[0, 0], [3, 3], [3, 3], [0, 0]], "CONSTANT")  
    
        # Downsampling
        h_conv1 = relu( instanceNormalization( conv2d(input, 64, 7, 1, 'valid') ) ) 
        h_conv2 = relu( instanceNormalization( conv2d(h_conv1, 128, 3, 2, 'same') ) )
        h_conv3 = relu( instanceNormalization( conv2d(h_conv2, 256, 3, 2,'same') ) )
        
        # Residual blocks
        residual1 = residual_block(h_conv3)
        residual2 = residual_block(residual1)
        residual3 = residual_block(residual2)
        residual4 = residual_block(residual3)
        residual5 = residual_block(residual4)
        residual6 = residual_block(residual5)
        residual7 = residual_block(residual6)
        residual8 = residual_block(residual7)
        residual9 = residual_block(residual8)
           
        # Upsampling blocks
        h_deconv4 = relu( instanceNormalization( conv2dTranspose(residual9, 128, 3, 2, 'same') ) ) 
                                               # conv2dTranspose: transposed convolution (often called deconvolution)
        h_deconv5 = relu( instanceNormalization( conv2dTranspose(h_deconv4, 64, 3, 2, 'same') ) )
       

        # Last Convolution layer
        output = tanh( conv2d(h_deconv5, 3, 7, 1, 'same') )  # Reconstructed image in the opposite domain
        
        
        return output 

# Load data and image augmentation procedure

The images are drawn at random (with uniform probability) from the respective dataset folders. An image augmentation procedure is then applied:

random cropping (applied with 50% probability)

left/right flipping (applied with 50% probability).

 


 



# Dataset

•	The dataset used for the experimental phase is «vangogh2photo».
 Downloaded from: people.eecs.berkeley.edu/~taesung_park/CycleGAN/datasets.

•	It contains RGB images of Van Gogh paintings and photos of various landscapes with different angles, lighting and contexts,
 at a resolution of 256 x 256 pixels.

•	Training set size for each class: 400 (Van Gogh) and 6287 (landscape).

•	Test set size: 400 (VG) and 750 (land)


In [0]:
def Proc_image(data_dir,imagesA,imagesB):

    A = np.random.choice(imagesA)  # randomly from dataset A
    B = np.random.choice(imagesB)  # randomly from dataset B
  
    A = np.array(Image.fromarray(imread(A, pilmode='RGB')).resize((256, 256)))   
    B = np.array(Image.fromarray(imread(B, pilmode='RGB')).resize((256, 256)))

    
    # Cropped  A
    if np.random.random() > 0.5:
         dc =  random.uniform(0.0, 0.2) 
         crop = iaa.Crop(percent=(0, dc)) 
         A = crop.augment_image(A)
    
    # Flip in the left/right direction 
    if np.random.random() > 0.5:
        A = np.fliplr(A) 

    #Cropped B
    if np.random.random() > 0.5:
         dc =  random.uniform(0.0, 0.1)
         crop = iaa.Crop(percent=(0, dc)) 
         B = crop.augment_image(B)
    
    # Flip in the left/right direction 
    if np.random.random() > 0.5:
        B = np.fliplr(B)     

    # Normalize pixel values from [0, 255] to [-1, 1]
    A =  (A/ 127.5) - 1.0
    B =  (B/ 127.5) - 1.0
     
    return A , B 


# Save training summary (Tensorboard)
def Train_summary(writer,T_name, value, global_step): 
    summary = tf.Summary(value=[tf.Summary.Value(tag=T_name, simple_value=value)])
    writer.add_summary(summary, global_step=global_step)



# Loss functions


Discriminator adversarial loss 

The discriminator must be trained to recognize the original images and reject the generated ones. 
To be clearer, take the case of the discriminator D_A (the same procedure applies to D_B as well) and denote its output by "D_A(input image)"; the operations used to train the network are:

1.	minimize   *E* [(D_A (A) – 1)^2] ; namely: given the original image A as input, the discriminator should recognize it, returning a value close to 1 (meaning "true"). 

2.	minimize   *E* [(D_A (G_BtoA (B)))^2] ; namely: the discriminator should predict 0 ("false") for the "generated images A" produced by the generator G_BtoA.





Generator adversarial loss 

The generator should eventually be able to fool the discriminator about the authenticity of its generated images. If the generator is performing well, the discriminator will classify the fake images as real (or 1). So:


1. minimize  *E* [(D_A (G_BtoA (B)) - 1)^2] ; namely: the discriminator's evaluation of the generated image "G_BtoA (B)" should be as close as possible to 1 
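
The two adversarial objectives can be checked on small arrays. The NumPy sketch below mirrors the squared-error formulas above; the discriminator outputs are made-up constant maps, used only for illustration.

```python
import numpy as np

def d_loss(d_real, d_fake):
    """Discriminator loss: E[(D(real) - 1)^2] + E[D(fake)^2]."""
    return float(np.mean((d_real - 1.0) ** 2) + np.mean(d_fake ** 2))

def g_adv_loss(d_fake):
    """Generator adversarial loss: E[(D(fake) - 1)^2]."""
    return float(np.mean((d_fake - 1.0) ** 2))

# Hypothetical 32x32 discriminator outputs: confident and correct on both inputs
d_real = np.full((32, 32), 0.9)
d_fake = np.full((32, 32), 0.1)

loss_d = d_loss(d_real, d_fake)
loss_g = g_adv_loss(d_fake)
print(loss_d, loss_g)
```

With these values the discriminator loss is small while the generator loss is large: the two networks pull the same output D(fake) in opposite directions.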



The two procedures above are typical of GANs; the one that follows, instead, characterizes CycleGAN: ***to define a meaningful mapping in the absence of a paired dataset, the authors introduce the constraint of cycle consistency.***

Cyclic loss:
1.	minimize  *E* [ | G_AtoB(G_BtoA (B)) – B | ] + *E* [ | G_BtoA (G_AtoB(A)) – A | ] ; namely: the difference between the original image and the cyclically reconstructed image should be as small as possible


.



Identity loss:

The identity loss says that if you feed image B to the generator G_AtoB, it should yield the real image B or something close to it. 

1.   minimize  *E* [ | G_AtoB(B) – B | ] + *E* [ | G_BtoA(A) – A | ]
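
Both the cyclic loss and the identity loss are mean absolute errors between images. The sketch below evaluates them on toy arrays; the image values and offsets are made up for illustration.

```python
import numpy as np

def l1(a, b):
    """Mean absolute error between two images."""
    return float(np.mean(np.abs(a - b)))

def cycle_loss(real_a, rec_a, real_b, rec_b):
    """E[|G_BtoA(G_AtoB(A)) - A|] + E[|G_AtoB(G_BtoA(B)) - B|]."""
    return l1(rec_a, real_a) + l1(rec_b, real_b)

def identity_loss(real_a, same_a, real_b, same_b):
    """E[|G_BtoA(A) - A|] + E[|G_AtoB(B) - B|]."""
    return l1(same_a, real_a) + l1(same_b, real_b)

# Hypothetical tiny "images"
real_a = np.zeros((2, 2, 3))
real_b = np.ones((2, 2, 3))
rec_a, rec_b = real_a + 0.1, real_b - 0.2      # imperfect reconstructions
same_a, same_b = real_a, real_b + 0.05         # identity-mapped outputs

cyc = cycle_loss(real_a, rec_a, real_b, rec_b)
ide = identity_loss(real_a, same_a, real_b, same_b)
print(cyc, ide)
```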

.


Note: the following error measures have been used:

Mean squared error for the discriminator outputs (real/generated scores)

Mean absolute error for images (reconstructed and identity-mapped)




# Optimization algorithm




The method of gradient descent is used to change the weights in the direction that minimizes the loss function. To apply gradient descent we need the gradient of the loss function with respect to the weights of every layer: dE/dWn (in simple words: we want to know how much a change in Wn affects the total error). The backpropagation algorithm uses the chain rule to compute this gradient. The chain rule is a formula for computing the derivative of a composite function; simplifying: dE/dWn = (derivative of the error wrt the activation) * (derivative of the activation wrt the net input) * (derivative of the net input wrt the weight Wn). In this way, the weights of each layer can be updated, gradually reducing the total error. 
 Simple update equation: Wn = Wn - learning rate * dE/dWn .
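
The three factors of the chain rule and the update equation can be seen on a single sigmoid neuron with a squared error. This is a sketch under assumptions: the input, target and learning rate are arbitrary values, and the one-weight model is purely illustrative.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def error(w, x, target):
    """E = 0.5 * (a - target)^2 with activation a = sigmoid(w * x)."""
    return 0.5 * (sigmoid(w * x) - target) ** 2

def gradient(w, x, target):
    """dE/dw as the product of the three chain-rule factors."""
    a = sigmoid(w * x)
    dE_da = a - target          # derivative of the error wrt the activation
    da_dz = a * (1.0 - a)       # derivative of the activation wrt the net input
    dz_dw = x                   # derivative of the net input wrt the weight
    return dE_da * da_dz * dz_dw

x, target, lr = 1.5, 0.8, 0.5   # hypothetical sample, target and learning rate
w = 0.0
errors = []
for _ in range(200):
    errors.append(error(w, x, target))
    w = w - lr * gradient(w, x, target)   # W = W - learning rate * dE/dW

print(errors[0], errors[-1])
```

The error recorded at the last step is much smaller than at the first, which is the gradual reduction of the total error described above.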

*


Adam (Adaptive moment estimation)

" Empirical results demonstrate that Adam works well in practice and compares favorably to other stochastic optimization methods". Kingma, Ba

In my code, the neural network is optimized with the Adam algorithm, which combines the advantages of two variants of stochastic gradient descent: AdaGrad and RMSProp.

AdaGrad adapts the learning rate at each time step for every parameter, based on the gradients of the previous steps (it is well suited to sparse data):

Wn = Wn - ( lr / (√G + ε) ) * g

For readability, g denotes the gradient dE/dWn at the current time step. 


G is the sum of the squares of the gradients wrt Wn up to the present time step. Note: this factor attenuates the learning rate of parameters that receive large or frequent updates, and gives greater weight to the learning rate of parameters with infrequent updates.

ε is a smoothing term that avoids division by zero, usually on the order of 10^−8

In AdaGrad, since every term added to G is positive, G keeps growing with each iteration and the effective learning rate decreases towards 0, eventually stalling learning. The solution proposed in RMSProp is to redefine G as an exponential moving average at time t:

 E[g^2]_t =  γ E[g^2]_(t-1)  +  (1-γ) g^2.


γ determines the importance of past iterations in the moving average: a value close to zero means that past iterations are ignored and only the gradient of the current iteration is used
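
The different behaviour of the two accumulators can be seen by feeding both the same gradient sequence. The sketch below assumes a constant gradient of 1, γ = 0.9, lr = 0.01 and ε = 10^−8; all of these are illustrative values.

```python
import numpy as np

def adagrad_accumulator(grads):
    """G_t: running sum of squared gradients."""
    return np.cumsum(np.square(grads))

def rmsprop_accumulator(grads, gamma=0.9):
    """E[g^2]_t = gamma * E[g^2]_(t-1) + (1 - gamma) * g_t^2, starting from 0."""
    avg, history = 0.0, []
    for g in grads:
        avg = gamma * avg + (1.0 - gamma) * g ** 2
        history.append(avg)
    return np.array(history)

grads = np.ones(1000)            # hypothetical constant gradient of 1
lr, eps = 0.01, 1e-8
G = adagrad_accumulator(grads)
E = rmsprop_accumulator(grads)

adagrad_step = lr / (np.sqrt(G) + eps)   # effective learning rate, AdaGrad
rmsprop_step = lr / (np.sqrt(E) + eps)   # effective learning rate, RMSProp
print(adagrad_step[-1], rmsprop_step[-1])
```

AdaGrad's effective learning rate keeps shrinking as G grows, while the moving average used by RMSProp settles and the effective learning rate stays near lr.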


Adam, in addition to storing an exponentially decaying average of past squared gradients E[g^2], also keeps an exponentially decaying average of past gradients E[g]. To be precise:

m_t = E[g]_t   = β1 * m_(t-1) + (1-β1) g

v_t = E[g^2]_t =  β2 * v_(t-1) + (1-β2) g^2

m_t and v_t are estimates of the first moment and the second moment of the gradients. Since both are initialized to 0, they are biased towards zero, especially in the early stages of training; therefore, the bias-corrected estimators are:

m̂_t = m_t / (1 - β1^t)

v̂_t = v_t / (1 - β2^t)

Adam update rule:  
Wn = Wn - ( lr / (√v̂_t + ε) ) * m̂_t
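
The Adam update can be sketched in a few lines of NumPy. This is an illustration of the textbook rule with bias correction, not TensorFlow's internal implementation; `adam_minimize` is a hypothetical helper, lr = 0.0002 and β1 = 0.5 are the values used in this notebook's code, β2 = 0.999 and ε = 10^−8 are assumed defaults, and the quadratic test function is arbitrary.

```python
import numpy as np

def adam_minimize(grad_fn, w, lr=0.0002, beta1=0.5, beta2=0.999, eps=1e-8, steps=1):
    """Run `steps` Adam updates on w, given a function returning the gradient."""
    m, v = np.zeros_like(w), np.zeros_like(w)
    for t in range(1, steps + 1):
        g = grad_fn(w)
        m = beta1 * m + (1.0 - beta1) * g           # first-moment estimate
        v = beta2 * v + (1.0 - beta2) * g ** 2      # second-moment estimate
        m_hat = m / (1.0 - beta1 ** t)              # bias correction
        v_hat = v / (1.0 - beta2 ** t)
        w = w - lr * m_hat / (np.sqrt(v_hat) + eps)
    return w

def quadratic_grad(w):
    """Gradient of E(w) = 0.5 * sum(w^2)."""
    return w

w0 = np.array([1.0, -3.0])
w_one = adam_minimize(quadratic_grad, w0)                       # a single step
w_many = adam_minimize(quadratic_grad, w0, lr=0.05, steps=200)  # larger lr, more steps
print(w_one, w_many)
```

Thanks to the bias correction, the very first step has a magnitude of about lr for every weight, regardless of the scale of its gradient.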

*

In TensorFlow, the training op that minimizes the loss function can be built with a predefined optimizer: tf.train.AdamOptimizer().minimize(loss)




In [0]:
def CycleGAN(): 

    
    generatorAToB = partial(build_generator, name='AToB')
    generatorBToA = partial(build_generator, name='BToA')
        
    discriminatorA = partial(build_discriminator, name='A')
    discriminatorB = partial(build_discriminator, name='B')


    # Define input in flow graph    
    real_imageA = tf.placeholder("float", shape=[None, 256, 256, 3], name="Image_real_A")    
    real_imageB = tf.placeholder("float", shape=[None, 256, 256, 3], name="Image_real_B")
    
    # Generated images using both of the generator networks
    simulate_imageA = generatorBToA(real_imageB)
    simulate_imageB = generatorAToB(real_imageA)
    
    # Reconstruct images back to original images
    reconstructedA = generatorBToA(simulate_imageB)
    reconstructedB = generatorAToB(simulate_imageA)
    
    # Same generated for identity loss
    sameB_gen = generatorAToB(real_imageB)
    sameA_gen = generatorBToA(real_imageA)
    
     # Discriminator
    decision_r_A = discriminatorA(real_imageA)  
    decision_s_A = discriminatorA(simulate_imageA)

    decision_r_B = discriminatorB(real_imageB)
    decision_s_B = discriminatorB(simulate_imageB)

 

    with tf.variable_scope("LossDiscriminatorA"):
        dA_loss_real = tf.losses.mean_squared_error(labels=tf.ones_like(decision_r_A), predictions=decision_r_A) 
        # Note: the target is an array of ones, since these are the real images
        dA_loss_gBtoA = tf.losses.mean_squared_error(labels=tf.zeros_like(decision_s_A), predictions=decision_s_A)
        # Note: the target is an array of zeros, since these are the fake images
        dA_loss =  (dA_loss_real + dA_loss_gBtoA)   # Total loss of discriminator A
       
    with tf.variable_scope("LossDiscriminatorB"):
        dB_loss_real = tf.losses.mean_squared_error(labels=tf.ones_like(decision_r_B), predictions=decision_r_B)
        dB_loss_gAtoB = tf.losses.mean_squared_error(labels=tf.zeros_like(decision_s_B), predictions=decision_s_B)
        dB_loss = (dB_loss_real + dB_loss_gAtoB)    # Total loss of discriminator B

    with tf.variable_scope('LossCyclic'):    
        # Cyclic loss
        cyc_loss_A = tf.losses.absolute_difference(real_imageA, reconstructedA)
        cyc_loss_B = tf.losses.absolute_difference(real_imageB, reconstructedB) 
        cyc_loss = cyc_loss_A + cyc_loss_B 

    with tf.variable_scope('IdentityLoss'):    
        # Identity loss
        ide_loss_A = tf.losses.absolute_difference(real_imageA, sameA_gen)
        ide_loss_B = tf.losses.absolute_difference(real_imageB, sameB_gen)  
       
             
    with tf.variable_scope('LossGeneratorAtoB'):
        g_loss_AtoB = tf.losses.mean_squared_error(labels=tf.ones_like(decision_s_B), predictions=decision_s_B)
        # Total generator loss = adversarial loss + cycle loss + identity loss
        gAtoB_loss = g_loss_AtoB + cyc_loss * 10.0 + ide_loss_B * 4.0
        # Note: the multiplicative factor of 10 gives more importance to the cyclic loss;
        #       the factor for the identity loss is 4
          
    with tf.variable_scope('LossGeneratorBtoA'):  
        g_loss_BtoA = tf.losses.mean_squared_error(labels=tf.ones_like(decision_s_A), predictions=decision_s_A)
        gBtoA_loss  = g_loss_BtoA + cyc_loss * 10.0 + ide_loss_A * 4.0 

    with tf.variable_scope("Train"):    
        dA_vars = [var for var in tf.trainable_variables() if 'A_discriminator' in var.name]
        dB_vars = [var for var in tf.trainable_variables() if 'B_discriminator' in var.name]
        gAtoB_vars = [var for var in tf.trainable_variables() if 'AToB_generator' in var.name]
        gBtoA_vars = [var for var in tf.trainable_variables() if 'BToA_generator' in var.name]
        
        adam = tf.train.AdamOptimizer(learning_rate=0.0002, beta1=0.5)  
        
        train_dA = adam.minimize(dA_loss, var_list=dA_vars)  
        train_dB = adam.minimize(dB_loss, var_list=dB_vars)
        train_gAtoB = adam.minimize(gAtoB_loss, var_list=gAtoB_vars) 
        train_gBtoA = adam.minimize(gBtoA_loss, var_list=gBtoA_vars)
    
    return real_imageA, real_imageB, simulate_imageA, simulate_imageB, reconstructedA, reconstructedB, dA_loss, dB_loss, gAtoB_loss, gBtoA_loss, train_dA, train_dB, train_gAtoB, train_gBtoA



# CycleGAN  training algorithm 


0: Initialisation phase

for number of training iterations do:

for batch size do:

1:  Sample an image from distribution p_data_A

2:  Sample an image from distribution p_data_B 

3: Generate  prediction  :  A →  G_AtoB(A)
 
4: Generate  prediction  :  B → G_BtoA(B)

5: Generate reconstructed sample A :  G_AtoB(A) →  G_BtoA ( G_AtoB(A) )

6: Generate reconstructed sample B : G_BtoA(B)  →  G_AtoB ( G_BtoA(B) )

7: Optimize the G_AtoB network with Adam on 'LossGeneratorAtoB': adversarial generator loss + 10.0 · cycle loss + 4.0 · identity loss


8: Optimize the D_B network with Adam on 'LossDiscriminatorB': adversarial discriminator loss


9: Optimize the G_BtoA network with Adam on 'LossGeneratorBtoA': adversarial generator loss + 10.0 · cycle loss + 4.0 · identity loss

10: Optimize the D_A network with Adam on 'LossDiscriminatorA': adversarial discriminator loss

end for

end for. 



NOTE: steps 3 to 6 are recomputed for the loss functions in each optimization phase, and not just once as the algorithm scheme may suggest.


In [0]:
epochs = 701

data_dir = "/content/drive/My Drive/vangogh2photo"
log_dir =  "/content/drive/My Drive/logs"
model_path = "/content/drive/My Drive/model/model.ckpt"
meta_graph_path = "/content/drive/My Drive/meta/model.ckpt.meta"
checkpoint_save_path = "/content/drive/My Drive/meta/model.ckpt"


#Load dataset training
imagesA =  glob(data_dir + '/trainA/*.*')
imagesB =  glob(data_dir + '/trainB/*.*')
TA =len(imagesA)
TB =len(imagesB)
print('Number of files in dataset training A and B ',TA,' and ', TB)
print('')

tf.reset_default_graph()
g = tf.Graph()

with g.as_default():  
    real_imageA, real_imageB, simulate_imageA, simulate_imageB, reconstructedA, reconstructedB, dA_loss, dB_loss, gAtoB_loss, gBtoA_loss, train_dA, train_dB, train_gAtoB, train_gBtoA = CycleGAN()
    saver = tf.train.Saver() 
    
with tf.Session(graph=g) as sess:
     # Restoring process
     if tf.train.latest_checkpoint('/content/drive/My Drive/meta/'):
        print("Restoring model")
        saver.restore(sess, tf.train.latest_checkpoint('/content/drive/My Drive/meta/'))
     else :   
        print('Initializing')
        sess.run(tf.local_variables_initializer())
        sess.run(tf.global_variables_initializer()) 
    
     train_writer = tf.summary.FileWriter(log_dir, sess.graph)    
     print('') 

     # Number of images processed per epoch
     bs = int(TA/4)    # Dataset A: len(imagesA) = 400 paintings, so bs = 100 image pairs are processed in each epoch
      
     for epoch in range(600,epochs):
        print("Epoch:{}".format(epoch))

        for nb in range(bs):   

            # Images A and B are drawn at random from their respective datasets and then processed with the image augmentation techniques
            imageA, imageB = Proc_image(data_dir,imagesA,imagesB)
                                               

            # Optimizing:  
            
            #G_AtoB network 
            gAtoB_loss_val, _ = sess.run([gAtoB_loss, train_gAtoB], feed_dict={real_imageA:imageA.reshape(1,256,256,3), real_imageB:imageB.reshape(1,256,256,3)})
            
            #Discriminator B 
            d_B_loss_val, _ = sess.run([dB_loss, train_dB], feed_dict={real_imageA:imageA.reshape(1,256,256,3), real_imageB:imageB.reshape(1,256,256,3)})
                       
            # G_BtoA network
            gBtoA_loss_val, _ = sess.run([gBtoA_loss, train_gBtoA], feed_dict={real_imageA:imageA.reshape(1,256,256,3), real_imageB:imageB.reshape(1,256,256,3)})
            
            #Discriminator A 
            d_A_loss_val, _ = sess.run([dA_loss, train_dA], feed_dict={real_imageA:imageA.reshape(1,256,256,3), real_imageB:imageB.reshape(1,256,256,3)})
            
            
           
        # Save losses to Tensorboard after each epoch
        Train_summary(train_writer, "gAtoB_loss", gAtoB_loss_val, epoch)
        Train_summary(train_writer, "gBtoA_loss", gBtoA_loss_val, epoch)
        Train_summary(train_writer, "dA_loss", d_A_loss_val, epoch)
        Train_summary(train_writer, "dB_loss", d_B_loss_val, epoch)
        
        print('gAtoB_loss  ',round(gAtoB_loss_val,2),'      gBtoA_loss  ',round(gBtoA_loss_val,2),'  |   dB_loss  ',round(d_B_loss_val,2),  '      dA_loss   ',round(d_A_loss_val,2))
        print('')
                      
            
        # Save after every ... epochs
        if epoch % 100 == 0: 
            save_path = saver.save(sess, checkpoint_save_path)
                              
     save_path = saver.save(sess, model_path)
     train_writer.close() 
     print()
     print('End of training')


Number of files in dataset training A and B  400  and  6287

Instructions for updating:
Use `tf.keras.layers.Conv2D` instead.
Instructions for updating:
Please use `layer.__call__` method instead.
The TensorFlow contrib module will not be included in TensorFlow 2.0.
For more information, please see:
  * https://github.com/tensorflow/community/blob/master/rfcs/20180907-contrib-sunset.md
  * https://github.com/tensorflow/addons
  * https://github.com/tensorflow/io (for I/O related ops)
If you depend on functionality not listed there, please file an issue.

Instructions for updating:
Use keras.layers.BatchNormalization instead.  In particular, `tf.control_dependencies(tf.GraphKeys.UPDATE_OPS)` should not be used (consult the `tf.keras.layers.batch_normalization` documentation).
Instructions for updating:
Use `tf.keras.layers.Conv2DTranspose` instead.
Instructions for updating:
Use tf.where in 2.0, which has the same broadcast rule as np.where
Restoring model
INFO:tensorflow:Restoring para

In [0]:
%load_ext tensorboard 
%tensorboard --logdir "/content/drive/My Drive/logs"