# Final Submission - Gender Swap
## Adam Fábry, Dominik Feješ

### Motivation
The motivation behind this project is that neural networks can achieve a lot of things with photos. One particular thing that can be easily achieved is a gender swap. The idea is that it might be fun to experiment with a neural network that changes the appearance of people's faces, and to see how people would look if they were the opposite gender.

### Related work
A very similar task is discussed in the paper Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks by Jun-Yan Zhu, Taesung Park, Phillip Isola and Alexei A. Efros of the Berkeley AI Research (BAIR) laboratory, UC Berkeley. The authors present a way to translate a source image into a desired output without paired examples. This is very close to the problem we are trying to solve: in their paper, source images are translated into similar images with different details, which in our case is the gender swap.

### Datasets
The internet is full of images and datasets of male and female faces. Most datasets that label faces as male or female were built for gender classification. We are considering using the following datasets:

CelebA: http://mmlab.ie.cuhk.edu.hk/projects/CelebA.html
Men/Women Classification Dataset: https://www.kaggle.com/playlist/men-women-classification
IMDB-WIKI: https://data.vision.ee.ethz.ch/cvl/rrothe/imdb-wiki/

### High-level solution proposal
We are not yet sure exactly how we are going to solve this project, but we have a general idea: using GANs (generative adversarial networks).

# Architecture
We used CycleGAN for GenderSwap. The architecture consists of four models: two discriminators and two generators.

![Diagram](../img/GanDiagram.png)

![Diagram](../img/GanDiagram2.png)

## Implementation

The <b>discriminator</b> is a deep convolutional neural network that performs image classification. It predicts how likely the input image is to be real rather than fake. We use 2 discriminator models, one for domainA (male photos) and one for domainB (female photos).

In [None]:
from keras.initializers import RandomNormal
from keras.models import Input
from keras.models import Model
from keras.layers import Conv2D
from keras.layers import LeakyReLU
from keras.layers import Activation
from keras.layers import Concatenate
from keras.layers import Conv2DTranspose
from keras.optimizers import Adam
from keras_contrib.layers.normalization.instancenormalization import InstanceNormalization

def discriminator(image_shape):
    init = RandomNormal(stddev=0.02)
    in_image = Input(shape=image_shape)
    
    layer = Conv2D(64, (4,4), strides=(2,2),padding='same',kernel_initializer=init)(in_image)
    layer = LeakyReLU(alpha=0.2)(layer)

    layer = Conv2D(128, (4,4), strides=(2,2),padding='same',kernel_initializer=init)(layer)
    layer = InstanceNormalization(axis=-1)(layer)
    layer = LeakyReLU(alpha=0.2)(layer)

    layer = Conv2D(256, (4,4), strides=(2,2),padding='same',kernel_initializer=init)(layer)
    layer = InstanceNormalization(axis=-1)(layer)
    layer = LeakyReLU(alpha=0.2)(layer)

    layer = Conv2D(512, (4,4), strides=(2,2),padding='same',kernel_initializer=init)(layer)
    layer = InstanceNormalization(axis=-1)(layer)
    layer = LeakyReLU(alpha=0.2)(layer)

    layer = Conv2D(512, (4,4), padding='same',kernel_initializer=init)(layer)
    layer = InstanceNormalization(axis=-1)(layer)
    layer = LeakyReLU(alpha=0.2)(layer)

    patch_out = Conv2D(1,(4,4), padding='same', kernel_initializer=init)(layer)

    model = Model(in_image, patch_out)

    model.compile(loss='mse', optimizer=Adam(lr=0.0002,beta_1=0.5),loss_weights=[0.5])
    
    return model
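
The discriminator does not output a single score but a square grid of scores, one per image patch; the side of this grid is what train() later reads as `n_patch`. The sketch below works out that size without building the model, assuming the usual rule that a `padding='same'` convolution produces `ceil(size / stride)` outputs per spatial dimension. The helper names are ours and are not part of the project code.

```python
import math


def conv_same_output_size(size, stride):
    """Spatial size after a Conv2D with padding='same' and the given stride."""
    return math.ceil(size / stride)


def patch_output_size(image_size, strides=(2, 2, 2, 2, 1, 1)):
    """Apply the strides of the six Conv2D layers of the discriminator in order."""
    size = image_size
    for stride in strides:
        size = conv_same_output_size(size, stride)
    return size


# For the 256x256 inputs used in this project the label grid is 16x16.
print(patch_output_size(256))  # 16
```

Under these assumptions, the real/fake label arrays built by generate_real_samples() and generate_fake_samples() have shape (n_samples, 16, 16, 1).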

The <b>generator</b> takes care of generating the target image (for example, generating a female photo from a male photo). The generator produces new fake images, which are then fed to the discriminator described above.

In [None]:
def resnet_block(n_filters, input_layer):
    init = RandomNormal(stddev=0.02)

    layer = Conv2D(n_filters, (3,3), padding='same', kernel_initializer=init)(input_layer)
    layer = InstanceNormalization(axis=-1)(layer)
    layer = Activation('relu')(layer)

    layer = Conv2D(n_filters, (3,3), padding='same', kernel_initializer=init)(layer)
    layer = InstanceNormalization(axis=-1)(layer)

    layer = Concatenate()([layer, input_layer])

    return layer

def generator(image_shape, n_resnet=9):
    init = RandomNormal(stddev=0.02)
    in_image = Input(shape=image_shape)

    layer = Conv2D(64, (7,7), padding='same', kernel_initializer=init)(in_image)
    layer = InstanceNormalization(axis=-1)(layer)
    layer = Activation('relu')(layer)

    layer = Conv2D(128, (3,3), strides=(2,2), padding='same', kernel_initializer=init)(layer)
    layer = InstanceNormalization(axis=-1)(layer)
    layer = Activation('relu')(layer)

    layer = Conv2D(256, (3,3), strides=(2,2), padding='same', kernel_initializer=init)(layer)
    layer = InstanceNormalization(axis=-1)(layer)
    layer = Activation('relu')(layer)

    for _ in range(n_resnet):
        layer = resnet_block(256,layer)
    
    layer = Conv2DTranspose(128, (3,3), strides=(2,2), padding='same', kernel_initializer=init)(layer)
    layer = InstanceNormalization(axis=-1)(layer)
    layer = Activation('relu')(layer)

    layer = Conv2DTranspose(64, (3,3), strides=(2,2), padding='same', kernel_initializer=init)(layer)
    layer = InstanceNormalization(axis=-1)(layer)
    layer = Activation('relu')(layer)

    layer = Conv2D(3, (7,7), padding='same', kernel_initializer=init)(layer)
    layer = InstanceNormalization(axis=-1)(layer)
    out_image = Activation('tanh')(layer)

    model = Model(in_image, out_image)
    return model
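
One detail of resnet_block() is worth noting: it merges the block output with its input using Concatenate rather than element-wise addition, so the number of channels grows with every block. The sketch below only counts channels, assuming each block outputs `n_filters` channels and appends them to whatever came in; the function name is ours.

```python
def channels_after_resnet_blocks(in_channels, n_filters=256, n_resnet=9):
    """Count the channels after stacking concatenating residual blocks."""
    channels = in_channels
    for _ in range(n_resnet):
        # The block's second convolution yields n_filters channels,
        # which are concatenated with the block input along the channel axis.
        channels = n_filters + channels
    return channels


# Nine blocks on top of the 256-channel encoder output.
print(channels_after_resnet_blocks(256))  # 2560
```

With element-wise addition, as in a classic residual block, the count would stay at 256. The growing channel count is probably one reason the generators are large and memory-hungry, though we did not measure this.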


The generator models are trained together with the associated discriminator. Each generator tries to produce an image that the discriminator will predict as real.

In [None]:
def composite_model(g_model_1,d_model,g_model_2,image_shape):
    g_model_1.trainable = True
    d_model.trainable = False
    g_model_2.trainable = False

    input_gen = Input(shape=image_shape)
    gen1_out = g_model_1(input_gen)
    output_d = d_model(gen1_out)

    input_id = Input(shape=image_shape)
    output_id = g_model_1(input_id)

    output_f = g_model_2(gen1_out)

    gen2_out = g_model_2(input_id)
    output_b = g_model_1(gen2_out)

    model = Model([input_gen, input_id], [output_d, output_id, output_f, output_b])

    opt = Adam(lr=0.0002, beta_1=0.5)

    model.compile(loss=['mse', 'mae', 'mae', 'mae'], loss_weights=[1, 5, 10, 10], optimizer=opt)
    return model
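
The composite model has four outputs, and the loss that is optimized is their weighted sum: the adversarial loss (mse, weight 1), the identity loss (mae, weight 5) and the forward and backward cycle losses (mae, weight 10 each). The NumPy sketch below reproduces that arithmetic on toy arrays; the helper names and the toy values are ours and only illustrate how the weights combine.

```python
import numpy as np


def mse(y_true, y_pred):
    """Mean squared error, as used for the adversarial output."""
    return float(np.mean((y_true - y_pred) ** 2))


def mae(y_true, y_pred):
    """Mean absolute error, as used for identity and cycle outputs."""
    return float(np.mean(np.abs(y_true - y_pred)))


def composite_loss(adv, identity, forward, backward, weights=(1, 5, 10, 10)):
    """Each argument is a (target, prediction) pair of arrays."""
    losses = [mse(*adv), mae(*identity), mae(*forward), mae(*backward)]
    total = sum(w * l for w, l in zip(weights, losses))
    return total, losses


# Toy data: targets and predictions that differ by a constant.
patch = np.ones((1, 16, 16, 1))
image = np.zeros((1, 256, 256, 3))
total, parts = composite_loss(
    adv=(patch, patch * 0.5),       # mse = 0.25
    identity=(image, image + 0.1),  # mae = 0.1
    forward=(image, image + 0.2),   # mae = 0.2
    backward=(image, image + 0.3),  # mae = 0.3
)
print(total, parts)
```

The cycle terms carry the largest weights, so the generator is pushed hardest to make the translated image recoverable, which is what keeps the identity of the face intact.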

Working with data:

In [None]:
from numpy import load
from numpy import ones
from numpy import zeros
from numpy import asarray
from numpy.random import randint
from matplotlib import pyplot
from os import listdir
from keras.preprocessing.image import img_to_array
from keras.preprocessing.image import load_img


def load_dataset(path):
    entriesA = listdir(path+'male/')
    entriesB = listdir(path+'female/')
    domainA = list()
    domainB = list()
    print('Loading male images')
    lenA = len(entriesA)
    lenB = len(entriesB)
    for e in entriesA:
        img = load_img(path+'male/'+e)
        img = img_to_array(img)
        domainA.append(img)
    print('Loading female images')
    for e in entriesB:
        img = load_img(path+'female/'+e)
        img = img_to_array(img)
        domainB.append(img)
    return asarray(domainA), asarray(domainB)

def load_dataset_test(path):
    entriesA = listdir(path+'testMale/')
    entriesB = listdir(path+'testFemale/')
    domainA = list()
    domainB = list()
    print('Loading male test images')
    lenA = len(entriesA)
    lenB = len(entriesB)
    for e in entriesA:
        img = load_img(path+'testMale/'+e)
        img = img_to_array(img)
        domainA.append(img)
    print('Loading female test images')
    for e in entriesB:
        img = load_img(path+'testFemale/'+e)
        img = img_to_array(img)
        domainB.append(img)
    return asarray(domainA), asarray(domainB)

def generate_real_samples(dataset,n_samples, patch_shape):
    ix = randint(0, dataset.shape[0],n_samples)
    X = dataset[ix]
    Y = ones((n_samples,patch_shape,patch_shape,1))
    return X, Y

def generate_fake_samples(g_model, dataset, patch_shape):
    X = g_model.predict(dataset)
    Y = zeros((len(X),patch_shape,patch_shape,1))
    return X, Y

def save_models(epoch, g_model_AtoB, g_model_BtoA):
    path = '../../models/'
    AtoB = 'g_model_AtoB_%03d.h5' % (epoch+1)
    BtoA = 'g_model_BtoA_%03d.h5' % (epoch+1)
    g_model_AtoB.save(path+AtoB)
    g_model_BtoA.save(path+BtoA)

def summarize_performance(epoch, g_model, trainX, name, n_samples=5):
    path = '../../performance/'
    X_in, _ = generate_real_samples(trainX, n_samples, 0)
    X_out, _ = generate_fake_samples(g_model, X_in, 0)
    X_in = (X_in + 1) / 2.0
    X_out = (X_out + 1) / 2.0
    for i in range(n_samples):
        pyplot.subplot(2, n_samples, 1 + i)
        pyplot.axis('off')
        pyplot.imshow(X_in[i])
    for i in range(n_samples):
        pyplot.subplot(2, n_samples, 1 + n_samples + i)
        pyplot.axis('off')
        pyplot.imshow(X_out[i])
    filename1 = '%s_generated_plot_%03d.png' % (name, (epoch+1))
    pyplot.savefig(path+filename1)
    pyplot.close()
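
The generators end in a tanh activation, and summarize_performance() maps images back with `(X + 1) / 2.0`, so the training images are expected to be in the range [-1, 1]. load_dataset() as listed returns the raw arrays from img_to_array(), so the scaling has to happen somewhere before train() is called; that step is not shown in this notebook. A minimal sketch of the scaling and its inverse, with function names that are ours:

```python
import numpy as np


def scale_to_tanh_range(images):
    """Map pixel values in [0, 255] to floats in [-1, 1]."""
    return (np.asarray(images, dtype='float32') - 127.5) / 127.5


def scale_to_display_range(images):
    """Map values in [-1, 1] back to [0, 1] so pyplot.imshow can show them."""
    return (images + 1) / 2.0


pixels = np.array([0.0, 127.5, 255.0])
scaled = scale_to_tanh_range(pixels)
print(scaled)                          # [-1.  0.  1.]
print(scale_to_display_range(scaled))  # [0.  0.5 1. ]
```

This is the same formula that the test script applies to a single image in image_to_translate().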

In [None]:
from random import random

def update_image_pool(pool, images, max_size=50):
    selected = list()
    for image in images:
        if len(pool) < max_size:
            pool.append(image)
            selected.append(image)
        elif random() < 0.5:
            selected.append(image)
        else:
            ix = randint(0, len(pool))
            selected.append(pool[ix])
            pool[ix] = image
    return asarray(selected)
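
The image pool keeps up to 50 previously generated images. While the pool is filling up, every new fake image is stored and used directly. Once it is full, each new image is either used directly or swapped for a random older one, each with probability 0.5, so the discriminator also sees fakes from earlier generator states. The sketch below is a self-contained copy of the same logic that uses strings as stand-ins for images and only the standard library.

```python
from random import random, randrange


def update_image_pool(pool, images, max_size=50):
    """Return the images to train on, updating the pool in place."""
    selected = []
    for image in images:
        if len(pool) < max_size:
            # Pool not full yet: store the image and use it.
            pool.append(image)
            selected.append(image)
        elif random() < 0.5:
            # Use the new image without storing it.
            selected.append(image)
        else:
            # Use an older image and put the new one in its place.
            ix = randrange(len(pool))
            selected.append(pool[ix])
            pool[ix] = image
    return selected


pool = []
first = update_image_pool(pool, ['a', 'b'], max_size=2)
print(first, pool)   # ['a', 'b'] ['a', 'b']

# The pool is now full, so the result depends on the random draw.
second = update_image_pool(pool, ['c'], max_size=2)
print(second, pool)
```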

The train() function takes all six models (two discriminators, two generators, and two composite models) as arguments along with the dataset and trains the models.

train() repeats the training step (where the training itself is done) on all six models for number of epochs * number of steps in an epoch iterations. After each epoch, the performance of the generators is summarized as a .png file (5 examples of the MtoF transformation and vice versa) and the generator models are saved.

In [None]:
def train(d_model_A, d_model_B, g_model_AtoB, g_model_BtoA, c_model_AtoB, c_model_BtoA, domainA, domainB):
	n_epochs, n_batch, = 35, 1
	n_patch = d_model_A.output_shape[1]
	#trainA, trainB = dataset
	poolA, poolB = list(), list()
	n_steps = int(len(domainA) / n_batch)

	for i in range(n_epochs):
		print('Epoch:',i)
		for j in range(n_steps):
			X_realA, y_realA = generate_real_samples(domainA, n_batch, n_patch)
			X_realB, y_realB = generate_real_samples(domainB, n_batch, n_patch)

			X_fakeA, y_fakeA = generate_fake_samples(g_model_BtoA, X_realB, n_patch)
			X_fakeB, y_fakeB = generate_fake_samples(g_model_AtoB, X_realA, n_patch)

			X_fakeA = update_image_pool(poolA, X_fakeA)
			X_fakeB = update_image_pool(poolB, X_fakeB)

			g_loss2, _, _, _, _  = c_model_BtoA.train_on_batch([X_realB, X_realA], [y_realA, X_realA, X_realB, X_realA])

			dA_loss1 = d_model_A.train_on_batch(X_realA, y_realA)
			dA_loss2 = d_model_A.train_on_batch(X_fakeA, y_fakeA)

			g_loss1, _, _, _, _ = c_model_AtoB.train_on_batch([X_realA, X_realB], [y_realB, X_realB, X_realA, X_realB])

			dB_loss1 = d_model_B.train_on_batch(X_realB, y_realB)
			dB_loss2 = d_model_B.train_on_batch(X_fakeB, y_fakeB)
			print('Step: ',j+1, '\ndA[',dA_loss1,dA_loss2,']\ndB[',dB_loss1,dB_loss2,']\ng[',g_loss1,g_loss2,']\n-------------------------')

		summarize_performance(i, g_model_AtoB, domainA, 'AtoB')
		summarize_performance(i, g_model_BtoA, domainB, 'BtoA')  
		save_models(i, g_model_AtoB, g_model_BtoA)

### Testing
We can test the saved models with this code. We only need an input image of size 256x256. The syntax is: `python3 generate.py <filename_of_AtoB_model> <filename_of_BtoA_model> <filename_of_picture> <m/f>`.

In [None]:
import sys
from keras.preprocessing.image import img_to_array
from keras.preprocessing.image import load_img
from keras.models import load_model
from numpy import vstack, expand_dims
from loadingdata import load_dataset
from matplotlib import pyplot
from keras_contrib.layers.normalization.instancenormalization import InstanceNormalization


def image_to_translate(path):
    img = load_img(path)
    img = img_to_array(img)
    img = expand_dims(img, 0)
    img = (img - 127.5) / 127.5
    return img

def translate(domain, img, AtoB, BtoA):
    if domain == 'm':
        generated = AtoB.predict(img)
        reconstructed = BtoA.predict(generated)
    else:
        generated = BtoA.predict(img)
        reconstructed = AtoB.predict(generated)
    return generated, reconstructed

def save_img(img, generated, reconstructed,path):
    images = vstack((img,generated,reconstructed))
    titles = ['Input','Generated','Reconstructed']
    images = (images + 1) / 2.0
    for i in range(len(images)):    
        pyplot.subplot(1,len(images),i+1)
        pyplot.axis('off')
        pyplot.imshow(images[i])
        pyplot.title(titles[i])
    pyplot.savefig(path+'translation.png')

# Generate translation with: python3 generate.py [name of AtoB model] [name of BtoA model] [picture to translate] [m/f]
# Models should be placed in models folder in the repo and image to translate should be placed in translate folder in repo.

path_models = '../../models/'
path_image = '../../translate/'

cust = {'InstanceNormalization': InstanceNormalization}
AtoB = load_model(path_models+sys.argv[1],cust)
BtoA = load_model(path_models+sys.argv[2],cust)

img = image_to_translate(path_image+sys.argv[3])

generated, reconstructed = translate(sys.argv[4],img,AtoB,BtoA)
save_img(img,generated,reconstructed,path_image)
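
The script above reads its four arguments straight from sys.argv, so a missing argument or a domain other than m or f only fails later, or silently falls into the else branch of translate(). As a possible hardening, not part of the original generate.py, the arguments could be checked with argparse. The argument names and the example file names below are our own choice.

```python
import argparse


def parse_args(argv=None):
    """Parse the four positional arguments of the translation script."""
    parser = argparse.ArgumentParser(
        description='Translate a face picture between the male and female domains.')
    parser.add_argument('model_AtoB', help='file name of the AtoB generator model')
    parser.add_argument('model_BtoA', help='file name of the BtoA generator model')
    parser.add_argument('picture', help='file name of the picture to translate')
    parser.add_argument('domain', choices=['m', 'f'],
                        help='domain of the input picture')
    return parser.parse_args(argv)


# Example call with hypothetical file names.
args = parse_args(['g_model_AtoB_018.h5', 'g_model_BtoA_018.h5', 'face.jpg', 'm'])
print(args.model_AtoB, args.model_BtoA, args.picture, args.domain)
```

With this in place, an invalid domain such as x makes the script exit with a usage message before any model is loaded.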

### Training 

We first tried training this model on the whole dataset, but that did not work, so we tried lowering the number of pictures. The first workable number was around 3500 pictures for each domain. We chose to keep the number of pictures at 3000.

The training was supposed to take 35 epochs with a batch size of 1 (anything larger did not work because there was not enough memory). However, one epoch took around 2.5 hours, so in the end we managed to train the model for 18 epochs. We saved the generator models and a performance plot after each epoch, but the generator models were too big to push to the repo.
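
To put these numbers into perspective, the short calculation below estimates the time per training step and the total training time. It assumes 3000 images in domain A (train() derives the number of steps from that domain) and takes the roughly 2.5 hours per epoch at face value, so the results are approximations; the function name is ours.

```python
def training_budget(n_images, n_batch, hours_per_epoch, n_epochs):
    """Return (steps per epoch, seconds per step, total hours)."""
    steps_per_epoch = n_images // n_batch
    seconds_per_step = hours_per_epoch * 3600 / steps_per_epoch
    total_hours = hours_per_epoch * n_epochs
    return steps_per_epoch, seconds_per_step, total_hours


# The 18 epochs we actually trained.
print(training_budget(3000, 1, 2.5, 18))  # (3000, 3.0, 45.0)

# The 35 epochs we originally planned.
print(training_budget(3000, 1, 2.5, 35))  # (3000, 3.0, 87.5)
```

Under these assumptions one step takes about 3 seconds, the 18 epochs correspond to about 45 hours of training, and the planned 35 epochs would have needed about 87.5 hours.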

To get a quick sneak peek of how well the generator was trained (as we cannot reliably determine it any other way), we chose to have 5 pictures from each domain translated at the end of each epoch.

### Results
Results after first epoch:
![generated_plot](../performance/AtoB_generated_plot_001.png)
![generated_plot](../performance/BtoA_generated_plot_001.png)

Results after 5th epoch:
![generated_plot](../performance/AtoB_generated_plot_005.png)
![generated_plot](../performance/BtoA_generated_plot_005.png)

We can see the first features of the gender swap developing (applying make-up to male faces and removing make-up from female faces).

Results after 10th epoch:
![generated_plot](../performance/AtoB_generated_plot_010.png)
![generated_plot](../performance/BtoA_generated_plot_010.png)

We were satisfied with the results after this epoch, because we can also see the network trying to create artificial facial hair on female faces (first easily recognizable transformation on female faces).

Results after 15th epoch:
![generated_plot](../performance/AtoB_generated_plot_015.png)
![generated_plot](../performance/BtoA_generated_plot_015.png)

We can see that the male faces are still being transformed by applying make-up. For female faces, the application of facial hair decreased, but the network still applied it to some faces.

Results after 18th and the last epoch:
![generated_plot](../performance/AtoB_generated_plot_018.png)
![generated_plot](../performance/BtoA_generated_plot_018.png)

The results of this epoch show that the network tries to remove facial hair from male faces and apply make-up at the same time. However, the facial hair is only removed around the mouth. For female images it does the complete opposite: it removes make-up and (surprisingly) adds facial hair. The facial hair is applied more convincingly in these examples.

### Evaluation
The results are better than we expected on such a small dataset. To get the best results, we would need to use the whole dataset or at least half of it, but even a single epoch would then take much longer to train. The training took so long that we did not even have time to experiment.

We did not even manage to save logs, as we were struggling with getting it to work at all.