# Real Time Style Transfer with TensorFlow and Keras
<table class="tfo-notebook-buttons" align="left">
  <td>
    <a target="_blank" href="https://colab.research.google.com/github/abrampers/real-time-style-transfer-tensorflow/blob/master/Real%20Time%20Style%20Transfer%20-%20TensorFlow.ipynb"><img src="https://www.tensorflow.org/images/colab_logo_32px.png" />Run in Google Colab</a>
  </td>
  <td>
    <a target="_blank" href="https://github.com/abrampers/real-time-style-transfer-tensorflow/blob/master/Real%20Time%20Style%20Transfer%20-%20TensorFlow.ipynb"><img src="https://www.tensorflow.org/images/GitHub-Mark-32px.png" />View source on GitHub</a>
  </td>
</table>
<br>

In this notebook, we'll implement a network that performs __neural style transfer__, based on the paper by [Justin Johnson et al.](https://cs.stanford.edu/people/jcjohns/eccv16/).
>[Johnson et al.](https://cs.stanford.edu/people/jcjohns/eccv16/) report that this method gives qualitatively similar results but is three orders of magnitude faster than the optimization-based technique outlined in [Leon A. Gatys' paper, A Neural Algorithm of Artistic Style](https://arxiv.org/abs/1508.06576).

## Overview
Neural style transfer is an optimization technique that takes three images, a __content__ image, a __style reference__ image (such as an artwork by a famous painter), and the input image you want to style, and blends them so that the input image is transformed to look like the content image, but “painted” in the style of the style image.

In the paper, style transfer is done by training one deep convolutional neural network (the image transformation network) against losses computed by a second, pretrained deep convolutional neural network (the loss network). Here, the loss network is [VGG16](https://arxiv.org/abs/1409.1556) pretrained on the ImageNet dataset.

# TODO: insert the network architecture diagram

## TODOS:
1. Create keras.Layers class instead of functions
2. Search for style images

### List of style images
1. starry night
2. hockney
3. monet
4. rain princess
5. the scream
6. udnie


In [None]:
# import resources
import tensorflow as tf

from PIL import Image

import matplotlib.pyplot as plt
import os
import numpy as np

tf.enable_eager_execution()

## Load in the Content and Style images
Here, we create a function that loads an image and preprocesses it with `tf.keras.applications.vgg16.preprocess_input` in `tf` mode, which scales pixel values to the range [-1, 1].

In [None]:
def load_image(image_path, target_size=None):
    """ Load image from path and do VGG16 standard preprocessing with tf mode.
        Returns a tensor representation of the image.
    """
    if target_size is None:
        image = tf.keras.preprocessing.image.load_img(image_path)
    else:
        image = tf.keras.preprocessing.image.load_img(image_path, target_size=target_size)
    image = tf.keras.preprocessing.image.img_to_array(image)
    image = image.reshape((1, image.shape[0], image.shape[1], image.shape[2]))
    image = tf.keras.applications.vgg16.preprocess_input(image, mode='tf')
    image = tf.convert_to_tensor(image)
    return image

Next, we implement a function that loads an image from the MS COCO dataset given its path.

In [None]:
def load_train_image(image_path):
    """ Mapping function to load train image from path
    """
    img = tf.read_file(image_path)
    img = tf.image.decode_jpeg(img, channels=3)
    img = tf.image.resize_images(img, (480, 640))
    image_shape = tf.shape(img)
    img = tf.reshape(img, [image_shape[0], image_shape[1], image_shape[2]])
    img = tf.keras.applications.vgg16.preprocess_input(img, mode='tf')
    return img

Here are a few helpers to show and save an image from a `tf.Tensor`.

In [None]:
def imshow(image, title=None, denormalize=True):
    """ Showing image tensor
    """
    image = image.numpy()
    if denormalize:
        image = (image + 1.) * 127.5
    # Remove the batch dimension
    image = np.squeeze(image)
    if title is not None:
        plt.title(title)
    plt.imshow(image/255.)
    plt.show()

In [None]:
def save_image(image, image_path, denormalize=True):
    """ Save image from tensor
    """
    if denormalize:
        image = (image + 1) * 127.5
    image = tf.cast(image, tf.uint8)
    image = tf.squeeze(image, axis=0)
    image = tf.image.encode_jpeg(image)
    tf.io.write_file(image_path, image)
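
Both helpers undo the input scaling with `(image + 1) * 127.5`. As a quick sanity check, the NumPy-only sketch below shows that this expression inverts `x / 127.5 - 1`, the mapping used by the `tf` preprocessing mode. The names `normalize` and `denormalize` are illustrative and do not appear in the notebook.

```python
import numpy as np

def normalize(pixels):
    """Map pixel values from [0, 255] to [-1, 1] (the 'tf' preprocessing mode)."""
    return pixels / 127.5 - 1.0

def denormalize(scaled):
    """Map values from [-1, 1] back to [0, 255], as imshow and save_image do."""
    return (scaled + 1.0) * 127.5

# Darkest, middle, and brightest pixel values
pixels = np.array([0.0, 127.5, 255.0])
scaled = normalize(pixels)
restored = denormalize(scaled)

print(scaled)
print(restored)
```

If the two mappings ever drift apart (for example, by switching preprocessing modes), displayed and saved images will look washed out or clipped, so it is worth keeping them in sync.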

Next, we load the content image by file name and display it.

In [None]:
content_image = load_image("images/content/mug.jpg")
imshow(content_image)
content_shape = tf.shape(content_image)
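# Read the static shape as Python ints so the values can be passed as `target_size`
_, img_height, img_width, _ = content_image.shape.as_list()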

We then load and show the style image, resizing it to match the size of the content image.

In [None]:
style_image = load_image("images/styles/wave_crop.jpg", target_size=(img_height, img_width))
imshow(style_image)

## Load in the Training images
Here, we load [Microsoft's COCO dataset](https://arxiv.org/pdf/1405.0312.pdf) for training the network.

In [None]:
name_of_zip = 'train2014.zip'
name_of_folder = 'train2014'
if not os.path.exists(os.path.abspath('.') + '/' + name_of_folder):
    image_zip = tf.keras.utils.get_file(name_of_zip, 
                                      cache_subdir=os.path.abspath('.'),
                                      origin = 'http://images.cocodataset.org/zips/train2014.zip',
                                      extract = True)
    mscoco_path = os.path.dirname(image_zip)+'/train2014/'
else:
    mscoco_path = os.path.abspath('.')+'/train2014/'

In [None]:
train_batchsize = 4
mscoco_path

In [None]:
# Get filenames of the training images
mscoco_filenames = tf.constant(mscoco_path) + os.listdir(mscoco_path)

In [None]:
# Create a `tf.data.Dataset` from the filenames
dataset = tf.data.Dataset.from_tensor_slices(mscoco_filenames)
# Load all the data from filenames
dataset = dataset.map(load_train_image)
# Create dataset with train_batchsize
dataset = dataset.batch(train_batchsize)
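
`Dataset.batch` groups consecutive elements, and by default the final batch may hold fewer elements when the dataset size is not a multiple of the batch size. The pure-Python sketch below mimics that grouping on a list of made-up filenames. The `batch` helper and the file names are hypothetical and only illustrate the behaviour.

```python
def batch(items, batch_size):
    """Group consecutive items into lists of at most `batch_size` elements."""
    return [items[i:i + batch_size] for i in range(0, len(items), batch_size)]

# Hypothetical filenames standing in for the COCO images
filenames = ["img_%d.jpg" % i for i in range(10)]
batches = batch(filenames, 4)

for b in batches:
    print(len(b), b)
```

Code that assumes a fixed batch dimension should account for the shorter last batch.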

Make sure that the data loads correctly by displaying the first batch.

In [None]:
# Test and show the first batch of the dataset
batch_image = next(iter(dataset))
for image in batch_image:
    imshow(image)

## Define the network with TensorFlow
Below, we define the network following the paper by [Justin Johnson et al.](https://cs.stanford.edu/people/jcjohns/eccv16/).

<img src="images/assets/model_architecture.png" width=1000px>

Next, we'll use TensorFlow to define the architecture of the network. We start by defining the layers and operations we want, then define a method for the forward pass.

In [None]:
def reflection_padding():
    """Reflection padding layer for output size to match the input size"""
    def f(inputs):
        return tf.pad(inputs, [[0, 0], [40, 40], [40, 40], [0, 0]], "REFLECT")
    return f

def conv_layer(n_channels, kernel_size, strides, padding="same", relu=True):
    """Convolutional layer wrapper"""
    def f(inputs):
        conv = tf.keras.layers.Conv2D(filters=n_channels, 
                                      kernel_size=kernel_size, 
                                      strides=strides, 
                                      padding=padding)(inputs)
        bn = tf.keras.layers.BatchNormalization()(conv)
        if relu:
            return tf.keras.layers.Activation("relu")(bn)
        else:
            return bn
        
    return f

def conv_transpose_layer(n_channels, kernel_size, strides, padding="same", relu=True):
    """Convolutional transpose layer to upsample the image"""
    def f(inputs):
        conv = tf.keras.layers.Conv2DTranspose(n_channels, 
                                               kernel_size=kernel_size, 
                                               strides=strides, 
                                               padding=padding)(inputs)
        bn = tf.keras.layers.BatchNormalization()(conv)
        if relu:
            return tf.keras.layers.Activation("relu")(bn)
        else:
            return bn
        
    return f

def residual_block(n_channels, kernel_size=3, strides=1, padding='valid'):
    """Residual block. Center-crops the input so the skip connection matches the output size"""
    def f(inputs):
        inputs_shape = inputs.get_shape().as_list()
        residual = tf.image.resize_image_with_crop_or_pad(inputs, inputs_shape[1] - 4, inputs_shape[2] - 4)
        conv_1 = tf.keras.layers.Conv2D(filters=n_channels, 
                                        kernel_size=kernel_size, 
                                        strides=strides, 
                                        padding=padding)(inputs)
        bn_1 = tf.keras.layers.BatchNormalization()(conv_1)
        relu_1 = tf.keras.layers.Activation("relu")(bn_1)
        conv_2 = tf.keras.layers.Conv2D(filters=n_channels, 
                                        kernel_size=kernel_size, 
                                        strides=strides, 
                                        padding=padding)(relu_1)
        bn_2 = tf.keras.layers.BatchNormalization()(conv_2)
        return tf.keras.layers.add([bn_2, residual])

    return f
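
Two size-changing operations in these helpers are easy to get wrong: reflection padding, which grows the input before the first convolution, and the center crop that lets the skip connection match the output of two unpadded 3x3 convolutions. The NumPy sketch below illustrates both on a tiny array. It assumes that `np.pad(..., mode="reflect")` mirrors pixels the same way as the `"REFLECT"` mode of `tf.pad` (without repeating the edge pixel), and the helper names are illustrative only.

```python
import numpy as np

def reflect_pad(image, pad):
    """Reflect-pad the two spatial axes of an (H, W, C) array."""
    return np.pad(image, [(pad, pad), (pad, pad), (0, 0)], mode="reflect")

def center_crop(image, border):
    """Drop `border` pixels from every side of the spatial axes."""
    return image[border:-border, border:-border, :]

# A 4 x 4 single-channel "image" with distinct values
image = np.arange(16, dtype=np.float32).reshape(4, 4, 1)

padded = reflect_pad(image, 2)      # 4 x 4 -> 8 x 8
cropped = center_crop(padded, 2)    # 8 x 8 -> 4 x 4

print(padded.shape, cropped.shape)
```

Cropping the padded array by the same border recovers the original image, which is why the padding added up front can be consumed by the residual blocks without losing content at the edges.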

In [None]:
class StyleTransferModel(tf.keras.Model):
    """Style Transfer Model class"""
    def __init__(self):
        super(StyleTransferModel, self).__init__(name='style_transfer_model')
        
        # Layers
        self.pad = reflection_padding()
        self.conv_1 = conv_layer(32, 9, 1)
        self.conv_2 = conv_layer(64, 3, 2)
        self.conv_3 = conv_layer(128, 3, 2)
        self.res_1 = residual_block(128, 3, 1)
        self.res_2 = residual_block(128, 3, 1)
        self.res_3 = residual_block(128, 3, 1)
        self.res_4 = residual_block(128, 3, 1)
        self.res_5 = residual_block(128, 3, 1)
        self.conv_4 = conv_transpose_layer(64, 3, 2)
        self.conv_5 = conv_transpose_layer(32, 3, 2)
        self.conv_6 = conv_layer(3, 9, 1, relu=False)


        
    def call(self, inputs):
        # (width x height x channels)
        # 256 x 256 x 3

        # 336 x 336 x 3
        padded = self.pad(inputs)
        # 336 x 336 x 32
        conv_1_out = self.conv_1(padded)
        # 168 x 168 x 64
        conv_2_out = self.conv_2(conv_1_out)
        # 84 x 84 x 128
        conv_3_out = self.conv_3(conv_2_out)
        # 80 x 80 x 128
        res_1_out = self.res_1(conv_3_out)
        # 76 x 76 x 128
        res_2_out = self.res_2(res_1_out)
        # 72 x 72 x 128
        res_3_out = self.res_3(res_2_out)
        # 68 x 68 x 128
        res_4_out = self.res_4(res_3_out)
        # 64 x 64 x 128
        res_5_out = self.res_5(res_4_out)
        # 128 x 128 x 64
        conv_4_out = self.conv_4(res_5_out)
        # 256 x 256 x 32
        conv_5_out = self.conv_5(conv_4_out)
        # 256 x 256 x 3
        conv_6_out = self.conv_6(conv_5_out)
        
        # Scaled tanh maps the output to the pixel range [0, 255]
        tanh_out = (tf.nn.tanh(conv_6_out) + 1) * 255. / 2
        return tanh_out
    
    @staticmethod
    def preprocess_input(x):
        """ Normalizes input from [0, 255] to [-1, 1]
        """
        return (x / 127.5) - 1
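
The size comments in `call` can be checked with a little arithmetic. The sketch below traces the spatial size of a square input through the stages, assuming the usual output-size rules: `ceil(size / stride)` for `"same"` convolutions, `size - kernel + 1` for stride-1 `"valid"` convolutions, and `size * stride` for `"same"` transposed convolutions. All function names are illustrative.

```python
import math

def same_conv(size, stride):
    """Output size of a convolution with 'same' padding."""
    return math.ceil(size / stride)

def valid_conv(size, kernel, stride=1):
    """Output size of a convolution with 'valid' padding."""
    return (size - kernel) // stride + 1

def same_conv_transpose(size, stride):
    """Output size of a transposed convolution with 'same' padding."""
    return size * stride

def trace_sizes(size, pad=40, n_res_blocks=5):
    """Return the spatial size after each stage of the network."""
    sizes = [size]
    size += 2 * pad                      # reflection padding
    sizes.append(size)
    for stride in (1, 2, 2):             # conv_1, conv_2, conv_3
        size = same_conv(size, stride)
        sizes.append(size)
    for _ in range(n_res_blocks):        # two 3x3 'valid' convs per block
        size = valid_conv(valid_conv(size, 3), 3)
        sizes.append(size)
    for stride in (2, 2):                # conv_4, conv_5 (transposed)
        size = same_conv_transpose(size, stride)
        sizes.append(size)
    sizes.append(same_conv(size, 1))     # conv_6
    return sizes

sizes = trace_sizes(256)
print(sizes)
```

The 40-pixel padding is exactly what five residual blocks remove after two stride-2 downsamplings (5 blocks x 4 pixels x 4 = 80 pixels), so the output matches the input size.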

In [None]:
net = StyleTransferModel()

Try a forward pass on `content_image`.

In [None]:
y_hat = net(content_image)

In [None]:
y_hat.dtype

In [None]:
def get_style_features(model, image):
    """ Run an image forward through a model and get the features for 
        a set of style layers.
        Returns a dictionary of the layer name and the activations.
    """
    style_layers = ['block1_conv2', 'block2_conv2', 'block3_conv3', 'block4_conv3']
    
    features = {}
    x = image
    # Iterate over the model's layers in order, feeding each output to the next layer
    for layer in model.layers:
        x = layer(x)
        if layer.name in style_layers:
            features[layer.name] = x
            if layer.name == 'block4_conv3':
                break
            
    return features

def get_content_feature(model, image):
    """ Run an image forward through a model and get the features for 
        the content layer.
        Returns the activation of the content layer
    """
    content_layers = ['block3_conv3']
    
    x = image
    # Iterate over the model's layers in order, feeding each output to the next layer
    for layer in model.layers:
        x = layer(x)
        if layer.name in content_layers:
            features = x
            break
            
    return features
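
Both functions follow the same pattern: push the input through the layers one at a time, keep the outputs of the named layers, and stop once the deepest one has been reached. The framework-free sketch below shows that pattern with plain Python callables standing in for layers. The `collect_features` helper and the toy layer names are hypothetical.

```python
def collect_features(layers, x, wanted):
    """Apply (name, fn) pairs in order and keep outputs whose name is in `wanted`."""
    features = {}
    remaining = set(wanted)
    for name, fn in layers:
        x = fn(x)
        if name in remaining:
            features[name] = x
            remaining.discard(name)
            if not remaining:
                break  # every requested output is collected; skip deeper layers
    return features

# Toy "layers": each one is just a function of the previous output
toy_layers = [
    ("double", lambda v: v * 2),
    ("add_one", lambda v: v + 1),
    ("square", lambda v: v * v),
    ("negate", lambda v: -v),
]

features = collect_features(toy_layers, 3, ["add_one", "square"])
print(features)
```

Stopping early matters for a network like VGG16, where the layers after the last requested activation would otherwise be evaluated for nothing.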

In [None]:
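def gram_matrix(x):
    """ Compute the Gram matrix of a batch of feature maps with shape
        (batch, height, width, channels).
        Returns a tensor of shape (batch, channels, channels).
    """
    b, h, w, c = tf.shape(x)
    # Flatten the spatial dimensions; channels stay on the last axis
    x = tf.reshape(x, [b, h * w, c])
    size = tf.cast(c * h * w, tf.float32)
    return tf.matmul(x, x, transpose_a=True) / size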
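
For channels-last activations, entry `(c, d)` of the Gram matrix is the sum over all spatial positions of channel `c` times channel `d`, divided by `C * H * W`. The NumPy sketch below spells this out so the result can be checked against a direct computation. `gram_matrix_np` is an illustrative name, and the random array stands in for real VGG16 activations.

```python
import numpy as np

def gram_matrix_np(features):
    """Gram matrix of a channels-last (B, H, W, C) activation array.

    Returns an array of shape (B, C, C), normalized by C * H * W.
    """
    b, h, w, c = features.shape
    flat = features.reshape(b, h * w, c)          # one row per spatial position
    gram = np.einsum("bnc,bnd->bcd", flat, flat)  # sum over spatial positions
    return gram / (c * h * w)

rng = np.random.default_rng(0)
features = rng.standard_normal((2, 4, 5, 3))      # stand-in activations
gram = gram_matrix_np(features)

# Direct computation of a single entry for comparison
direct = np.sum(features[0, :, :, 0] * features[0, :, :, 1]) / (3 * 4 * 5)
print(gram.shape, gram[0, 0, 1], direct)
```

The result is symmetric and has one row and column per channel, regardless of the spatial size of the input.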

In [None]:
# Pretrained VGG16 on ImageNet, used to extract features for the loss
model = tf.keras.applications.vgg16.VGG16()

In [None]:
CONTENT_WEIGHT = 7.5e0
STYLE_WEIGHT = 1e2

In [None]:
def loss(y, content_image, style_image, content_weight, style_weight):
    """ Compute loss of output with respect to content and style image
    """
    # Pretrained VGG16 on imagenet
    model = tf.keras.applications.vgg16.VGG16()
    
    # Style features of output
    output_style_features = get_style_features(model, y)
    # Content features of output
    output_content_feature = get_content_feature(model, y)
    
    # Style features of style image
    style_features = get_style_features(model, style_image)
    # Content features of content image
    content_feature = get_content_feature(model, content_image)
    
    # Compute content loss
    # Mean of the squared difference between output and content features
    content_loss = content_weight * tf.reduce_mean(tf.math.square(output_content_feature - content_feature))
    
    # Compute style loss
    # Gram matrix of output features
    output_grams = [gram_matrix(x) for _, x in output_style_features.items()]
                                                  
    # Gram matrix of style features
    style_grams = [gram_matrix(x) for _, x in style_features.items()]
    
    style_losses = [tf.square(tf.norm(output_gram - style_gram)) for output_gram, style_gram in zip(output_grams, style_grams)]
    style_loss = style_weight * tf.reduce_sum(tf.convert_to_tensor(style_losses)) / 4.
                                                  
    # TODO: Add total variation regularization
    
    total_loss = content_loss + style_loss
    return total_loss, content_loss, style_loss
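
To see how the two terms combine, here is a NumPy-only sketch of the same structure: a mean squared error on one feature map for content, plus the squared Frobenius distance between Gram matrices averaged over the style layers. It is a simplified stand-in rather than the notebook's implementation: random arrays replace VGG16 activations, the helper names are illustrative, and the content and style terms are returned unweighted.

```python
import numpy as np

def gram(features):
    """Gram matrix of a (B, H, W, C) array, normalized by C * H * W."""
    b, h, w, c = features.shape
    flat = features.reshape(b, h * w, c)
    return np.einsum("bnc,bnd->bcd", flat, flat) / (c * h * w)

def content_loss(output_feat, content_feat):
    """Mean squared difference between two feature maps."""
    return float(np.mean(np.square(output_feat - content_feat)))

def style_loss(output_feats, style_feats):
    """Squared Frobenius distance between Gram matrices, averaged over layers."""
    per_layer = [np.sum(np.square(gram(o) - gram(s)))
                 for o, s in zip(output_feats, style_feats)]
    return float(sum(per_layer) / len(per_layer))

def total_loss(output_feats, content_feat, style_feats,
               content_index, content_weight, style_weight):
    """Weighted sum of the content and style terms."""
    c = content_loss(output_feats[content_index], content_feat)
    s = style_loss(output_feats, style_feats)
    return content_weight * c + style_weight * s, c, s

# Random arrays stand in for VGG16 activations (illustrative only)
rng = np.random.default_rng(0)
style_feats = [rng.standard_normal((1, 8, 8, 4)),
               rng.standard_normal((1, 4, 4, 6))]
output_feats = [f + 0.1 for f in style_feats]
content_feat = output_feats[1].copy()

total, c_loss, s_loss = total_loss(output_feats, content_feat, style_feats,
                                   content_index=1,
                                   content_weight=7.5, style_weight=100.0)
print(total, c_loss, s_loss)
```

Because the two terms can differ by many orders of magnitude, inspecting them separately, as the notebook's `loss` does by returning all three values, makes it easier to choose `CONTENT_WEIGHT` and `STYLE_WEIGHT`.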

In [None]:
<tf.Tensor: id=18761, shape=(), dtype=float32, numpy=4519431000000.0>

In [None]:
<tf.Tensor: id=12538, shape=(), dtype=float32, numpy=2355534000000.0>

In [None]:
loss(y_hat, content_image, style_image, 1, 0.001)