# Image Style Transfer

### What is Image Style Transfer ?

Remember using applications like Prisma, Snapseed, Lucid etc.?
 Ever wondered how these things works? 

We give a photo from our camera roll, then we select a design to combine both the images and we get a resultant new image which has the **_content_** of our input image and **_style_** of the design image. In the world of deep learning this is called **“Style transfer”**

The **_style_** of a painting is: the way the painter used brush strokes; how these strokes form objects; texture of objects; color palette used.

The **_content_** of the image is what objects are present in this image (person, face, dog, eyes, etc.) and their relationships in space.


#### Some examples:
<img src="Collage.png" alt="Drawing" style="width: 1000px;"> 

### What papers are the study based on?
- Leon A. Gatys : A Neural Algorithm of Artistic Style
- Justin Johnson : Perceptual Losses for Real-Time Style Transfer and Super-Resolution


### What are the requirements?
- Pre-trained VGG19 network 
- TensorFlow (v 1.9.0)
- Numpy & Scipy
- Pillow or PIL (Python Imaging Library) – library for image manipulation

### How do we separate style and content of an image?
#### By using convolutional neural networks(CNNs)
_Wiki defn. a convolutional neural network (CNN) is a class of deep, feed-forward artificial neural networks, most commonly applied to analyzing visual imagery. CNNs use a variation of multilayer perceptrons designed to require minimal preprocessing. They were inspired by biological process, namely the connectivity pattern between neurons resembles the organization of the animal visual cortex._

- filters in the first layers in CNNs recognize simple patterns, brush strokes, textures, etc.
- filters in the intermediate layers happen to locate and recognize major objects in the image, such as a dog, a building or a mountain.

The reason why a pre-trained network is being used is that, when we take a convolutional neural network that has already been trained to recognize objects within images then that network will have developed some internal independent representations of the content and style contained within a given image. Here in a VGG net, shallow layers learns low level features and as we go deeper into the network these convolutional layers are able to represent much larger scale features 

### Import necessary libraries and dependencies:

In [1]:
import tensorflow as tf
import numpy as np
import scipy.misc
import tensorflow as tf
import math
import os
from argparse import ArgumentParser
from PIL import Image
from sys import stderr
import scipy.io
from functools import reduce
import warnings
warnings.filterwarnings('ignore')
import imageio
from skimage import transform
import skimage 

  from ._conv import register_converters as _register_converters


### Initialize known hyperparameters and other parameters necessary for the style transfer:

In [2]:
# Content and style ratio 5e0:5e2
# Amout of content image in the result image reconstruction. Default = 5e0.
CONTENT_WEIGHT = 5e0

# Content and style ratio. Amout of style image in the result image reconstruction. Default = 1e2.
STYLE_WEIGHT = 5e2 

# Weight of total-variation (TV) regularization; this helps to smooth the image. Default is 1e-3.
TV_WEIGHT = 1e2 

# Used to tweak how "abstract" the style transfer should be. Lower values mean that 
# style transfer of a finer features will be favored over style transfer of a more coarse features,and vice versa.
# Default value is 1
STYLE_LAYER_WEIGHT_EXP = 1 
                           
# Specifies the coefficient of content transfer layers. Default value = 1. 
CONTENT_WEIGHT_BLEND = 1 

STYLE_BLEND_WEIGHTS = (0.2,0.2) #''''The weight for blending the style of multiple style images, 
                                #as a comma-separated list, such as -style_blend_weights 3,7. 
                                #By default all style images are equally weighted.''''
# Doubling style has better effect than halving content
# Halving the content weight will result in lower absolute values for the loss function. 


#Optimize Parameters
LEARNING_RATE = 1e1 # Learning rate to use with ADAM optimizer. Default is 1e1. 
BETA1 = 0.9
BETA2 = 0.999
EPSILON = 1e-08
STYLE_SCALE = 1 # Scale at which to extract features from the style image. Default is 1.0.
ITERATIONS = 5 # Controls no of iterations, more the iteration, better the resultant image
VGG_PATH = './imagenet-vgg-verydeep-19.mat'
STYLE_PATH = './rain-princess-aframov.jpg' # The styling image 
CONTENT_PATH = './content2.jpg' # The content image 
OUTPUT = './result.jpg' # Result after style transfer 
POOLING = 'max' #''''Allows to select which pooling layers to use (specify either max or avg). 
                 #   The outputs are perceptually differnt, max pool in 
                  #  general tends to have finer detail style transfer, but could have troubles at lower-freqency detail level''''
config = tf.ConfigProto()
config.gpu_options.allow_growth = True
config.gpu_options.per_process_gpu_memory_fraction = 0.4

#Other Setting
PRESERVE_COLORS = False

logs_path = 'C:/Users/andre/Documents/Jupyter Notebooks Summer Semester/tensor_logs/'

### The VGG19 layers:

In [3]:
# the VGG19 latyers
#''''vgg19 returns a pretrained VGG-19 model. This model is trained on a subset of the ImageNet database [1], 
 #   which is used in the ImageNet Large-Scale Visual Recognition Challenge (ILSVRC) [2]. 
  #  VGG-19 is trained on more than a million images and can classify images into 1000 object categories. 
   # For example, keyboard, mouse, pencil, and many animals. As a result, the model has learned rich 
    #feature representations for a wide range of images.''''

VGG19_LAYERS = (
    'conv1_1', 'relu1_1', 'conv1_2', 'relu1_2', 'pool1',
    'conv2_1', 'relu2_1', 'conv2_2', 'relu2_2', 'pool2',
    'conv3_1', 'relu3_1', 'conv3_2', 'relu3_2', 
    'conv3_3', 'relu3_3', 'conv3_4', 'relu3_4', 'pool3',
    'conv4_1', 'relu4_1', 'conv4_2', 'relu4_2', 
    'conv4_3', 'relu4_3', 'conv4_4', 'relu4_4', 'pool4',
    'conv5_1', 'relu5_1', 'conv5_2', 'relu5_2', 
    'conv5_3', 'relu5_3', 'conv5_4', 'relu5_4'
)

### Selection of required Relu layers for Content and Style layers from the VGG19 network:

In [4]:
# Using the default layers as described in the paper
CONTENT_LAYERS = ('relu4_2', 'relu5_2')
STYLE_LAYERS   = ('relu1_1', 'relu2_1', 'relu3_1', 'relu4_1', 'relu5_1') 

In [5]:
if not os.path.isfile(VGG_PATH):
        print("Network %s does not exist. (VGG19 File not found)" % VGG_PATH)


### Loading the pre-trained VGG19 network:

In [6]:
def load_net(data_path):
    data = scipy.io.loadmat(data_path)
    if not all(i in data for i in ('layers', 'classes', 'normalization')):
        raise ValueError("Wrong VGG19 data. Please download the correct data.")
    mean = data['normalization'][0][0][0]
    mean_pixel = np.mean(mean, axis=(0, 1))
    weights = data['layers'][0]
    return weights, mean_pixel

def preprocess(image, mean_pixel):
    return image - mean_pixel

def unprocess(image, mean_pixel):
    return image + mean_pixel

### Methods to define the network layers according to the layer type.

In [7]:
# In progress - The code yet to be fully Modified. 
def net_preloaded(weights, input_image, pooling):
    net = {}
    current = input_image
    for i, name in enumerate(VGG19_LAYERS):
        kind = name[:4]
        if kind == 'conv':
            kernels, bias = weights[i][0][0][0][0]
            # matconvnet: weights are [width, height, in_channels, out_channels]
            # tensorflow: weights are [height, width, in_channels, out_channels]
            kernels = np.transpose(kernels, (1, 0, 2, 3))
            bias = bias.reshape(-1)
            current = _conv_layer(current, kernels, bias)
        elif kind == 'relu':
            current = tf.nn.relu(current)
        elif kind == 'pool':
            current = _pool_layer(current, pooling)
        net[name] = current

    assert len(net) == len(VGG19_LAYERS)
    return net

# Convolution layer values
def _conv_layer(input, weights, bias):
    conv = tf.nn.conv2d(input, tf.constant(weights), strides=(1, 1, 1, 1),
            padding='SAME')
    return tf.nn.bias_add(conv, bias)

# Pool layer values
def _pool_layer(input, pooling):
    if pooling == 'avg':
        return tf.nn.avg_pool(input, ksize=(1, 2, 2, 1), strides=(1, 2, 2, 1),
                padding='SAME')
    else:
        return tf.nn.max_pool(input, ksize=(1, 2, 2, 1), strides=(1, 2, 2, 1),
                padding='SAME')

### Method to perform style transfer:

In [8]:
# In Progress - fine tuning required

# this function yields tuples (iteration, image);
def stylize(network, initial, initial_noiseblend, content, styles, preserve_colors, iterations,
        content_weight, content_weight_blend, style_weight, style_layer_weight_exp, style_blend_weights, tv_weight,
        learning_rate, beta1, beta2, epsilon, pooling,
        print_iterations=None, checkpoint_iterations=None):
    
    shape = (1,) + content.shape
    style_shapes = [(1,) + style.shape for style in styles]
    content_features = {}
    style_features = [{} for _ in styles]

    vgg_weights, vgg_mean_pixel = load_net(network)

    layer_weight = 1.0
    style_layers_weights = {}
    for style_layer in STYLE_LAYERS:
        style_layers_weights[style_layer] = layer_weight
        layer_weight *= style_layer_weight_exp

    # normalize style layer weights
    layer_weights_sum = 0
    for style_layer in STYLE_LAYERS:
        layer_weights_sum += style_layers_weights[style_layer]
    for style_layer in STYLE_LAYERS:
        style_layers_weights[style_layer] /= layer_weights_sum

    # to compute content features in feedforward mode
    g = tf.Graph()
    with g.as_default(), g.device('/cpu:0'), tf.Session() as sess:
        image = tf.placeholder('float', shape=shape)
        net = net_preloaded(vgg_weights, image, pooling)
        content_pre = np.array([preprocess(content, vgg_mean_pixel)])
        for layer in CONTENT_LAYERS:
            content_features[layer] = net[layer].eval(feed_dict={image: content_pre})
            
    # to compute style features in feedforward mode
    for i in range(len(styles)):
        g = tf.Graph()
        with g.as_default(), g.device('/cpu:0'), tf.Session() as sess:
            image = tf.placeholder('float', shape=style_shapes[i])
            net = net_preloaded(vgg_weights, image, pooling)
            style_pre = np.array([preprocess(styles[i], vgg_mean_pixel)])
            for layer in STYLE_LAYERS:
                features = net[layer].eval(feed_dict={image: style_pre})
                features = np.reshape(features, (-1, features.shape[3]))
                gram = np.matmul(features.T, features) / features.size
                style_features[i][layer] = gram
                
                
    initial_content_noise_coeff = 1.0 - initial_noiseblend

    # to make stylized image using backpropogation
    with tf.Graph().as_default():
        if initial is None:
            noise = np.random.normal(size=shape, scale=np.std(content) * 0.1)
            initial = tf.random_normal(shape) * 0.256
            
            
            
        else:
            initial = np.array([preprocess(initial, vgg_mean_pixel)])
            initial = initial.astype('float32')
            noise = np.random.normal(size=shape, scale=np.std(content) * 0.1)
            initial = (initial) * initial_content_noise_coeff + (tf.random_normal(shape) * 0.256) * (1.0 - initial_content_noise_coeff)
            
            
            
        image = tf.Variable(initial)
        net = net_preloaded(vgg_weights, image, pooling)

        # to compute the content loss
        content_layers_weights = {}
        content_layers_weights['relu4_2'] = content_weight_blend
        content_layers_weights['relu5_2'] = 1.0 - content_weight_blend

        content_loss = 0
        content_losses = []
        for content_layer in CONTENT_LAYERS:
            content_losses.append(content_layers_weights[content_layer] * content_weight * (2 * tf.nn.l2_loss(
                    net[content_layer] - content_features[content_layer]) /
                    content_features[content_layer].size))   
        content_loss += reduce(tf.add, content_losses)
        contentlosssumm = tf.summary.scalar("Content_Loss", content_loss)
            

        # to compute the style loss
        style_loss = 0
        for i in range(len(styles)):
            style_losses = []
            for style_layer in STYLE_LAYERS:
                layer = net[style_layer]
                _, height, width, number = map(lambda i: i.value, layer.get_shape())#
                size = height * width * number
                feats = tf.reshape(layer, (-1, number))
                gram = tf.matmul(tf.transpose(feats), feats) / size
                style_gram = style_features[i][style_layer]
                style_losses.append(style_layers_weights[style_layer] * 2 * tf.nn.l2_loss(gram - style_gram) / style_gram.size)
            style_loss += style_weight * style_blend_weights[i] * reduce(tf.add, style_losses)
            stylelosssumm = tf.summary.scalar("Style_Loss", style_loss)
                
        
        # to compute the overall loss
        loss = content_loss + style_loss #+ tv_loss    
        
        # optimizer setup, add it to the graph and return an tf.Operation
        train_step = tf.train.AdamOptimizer(learning_rate, beta1, beta2, epsilon).minimize(loss)
                       
        # to print loss figures after the final iteration
        def print_progress():
            stderr.write('  Content Loss: %g\n' % content_loss.eval())
            stderr.write('    Style Loss: %g\n' % style_loss.eval())
            stderr.write('    TOTAL Loss: %g\n' % loss.eval())

        # Optimization
        best_loss = float('inf')
        best = None
        with tf.Session() as sess:
            sess.run(tf.global_variables_initializer())
            stderr.write('Optimization started...\n')
            if (print_iterations and print_iterations != 0):
                print_progress()
            for i in range(iterations):
                stderr.write('Iteration %4d/%4d\n' % (i + 1, iterations))
                train_step.run()

                last_step = (i == iterations - 1)
                if last_step or (print_iterations and i % print_iterations == 0):
                    print_progress()

                if (checkpoint_iterations and i % checkpoint_iterations == 0) or last_step:
                    this_loss = loss.eval()
                    if this_loss < best_loss:
                        best_loss = this_loss
                        best = image.eval()

                    img_out = unprocess(best.reshape(shape[1:]), vgg_mean_pixel)

                    if preserve_colors and preserve_colors == True:
                        original_image = np.clip(content, 0, 255)
                        styled_image = np.clip(img_out, 0, 255)

                        # Luminosity transfer steps:
                        # 1. Convert stylized RGB->grayscale accoriding to Rec.601 luma (0.299, 0.587, 0.114)
                        # 2. Convert stylized grayscale into YUV (YCbCr)
                        # 3. Convert original image into YUV (YCbCr)
                        # 4. Recombine (stylizedYUV.Y, originalYUV.U, originalYUV.V)
                        # 5. Convert recombined image from YUV back to RGB

                        # 1
                        styled_grayscale = rgb2gray(styled_image)
                        styled_grayscale_rgb = gray2rgb(styled_grayscale)

                        # 2
                        styled_grayscale_yuv = np.array(Image.fromarray(styled_grayscale_rgb.astype(np.uint8)).convert('YCbCr'))

                        # 3
                        original_yuv = np.array(Image.fromarray(original_image.astype(np.uint8)).convert('YCbCr'))

                        # 4
                        w, h, _ = original_image.shape
                        combined_yuv = np.empty((w, h, 3), dtype=np.uint8)
                        combined_yuv[..., 0] = styled_grayscale_yuv[..., 0]
                        combined_yuv[..., 1] = original_yuv[..., 1]
                        combined_yuv[..., 2] = original_yuv[..., 2]

                        # 5
                        img_out = np.array(Image.fromarray(combined_yuv, 'YCbCr').convert('RGB'))


                    yield (
                        (None if last_step else i),
                        img_out
                    )
                                         
def _tensor_size(tensor):
    from operator import mul
    return reduce(mul, (d.value for d in tensor.get_shape()), 1)


### Methods to preserve the color of content image:

In [9]:
def rgb2gray(rgb):
    return np.dot(rgb[...,:3], [0.299, 0.587, 0.114])

def gray2rgb(gray):
    w, h = gray.shape
    rgb = np.empty((w, h, 3), dtype=np.float32)
    rgb[:, :, 2] = rgb[:, :, 1] = rgb[:, :, 0] = gray
    return rgb

In [10]:
# Method to load source image from disk
def imread(path):
    #read an image from a file as an array
    img = scipy.misc.imread(path).astype(np.float)
    if len(img.shape) == 2:
        # grayscale
        img = np.dstack((img,img,img))
    elif img.shape[2] == 4:
        # JPG with alpha channel
        img = img[:,:,:3]
    return img

# Method to save stylized image
def imsave(path, img):
    img = np.clip(img, 0, 255).astype(np.uint8)
    Image.fromarray(img).save(path, quality=95)

#if __name__ == '__main__':
   # main()

In [11]:
content_image = imageio.imread(CONTENT_PATH)
style_images = [imageio.imread(STYLE_PATH)]
target_shape = content_image.shape

### Adjusting size and shape of style image to fit the output shape

In [12]:
style_scale = STYLE_SCALE
for i in range(len(style_images)):
    #print(style_images[i].shape[1])
    #style_scale * target_shape[1] / style_images[i].shape[1] - use this as input
    #print("the output shape is {}".format(style_scale * target_shape[1]/ style_images[i].shape[1]))
    style_images[i] = skimage.transform.resize(image=style_images[i], output_shape=target_shape,mode='constant')

#In default, for every style image, the style weight is the same. Possibility to set weights of different style images
style_blend_weights = [1.0/len(style_images) for _ in style_images]

In [13]:
#In default, for every style image, the style weight is the same. 
initial_noiseblend = 1.0
initial = content_image

### Code to begin iteration based on the stylize method

In [14]:
# Iterations and optimization and calculating the content loss, style loss, total loss. 
#merge = tf.summary.merge([contentlosssumm, stylelosssumm, totallosssum])
for iteration, image in stylize(
        network                =VGG_PATH,
        initial                =initial,
        initial_noiseblend     =initial_noiseblend,
        content                =content_image,
        styles                 =style_images,
        preserve_colors        =PRESERVE_COLORS,
        iterations             =ITERATIONS,
        content_weight         =CONTENT_WEIGHT,
        content_weight_blend   =CONTENT_WEIGHT_BLEND,
        style_weight           =STYLE_WEIGHT,
        style_layer_weight_exp =STYLE_LAYER_WEIGHT_EXP,
        style_blend_weights    =STYLE_BLEND_WEIGHTS,
        tv_weight              =TV_WEIGHT,
        learning_rate          =LEARNING_RATE,
        beta1=BETA1,
        beta2=BETA2,
        epsilon=EPSILON,
        pooling                =POOLING,
        print_iterations       =None,
        checkpoint_iterations  =None
    ):
       
        
        output_file = None
        combined_rgb = image
        output_file = OUTPUT
        
        if output_file:
            imsave(output_file, combined_rgb)

Optimization started...
Iteration    1/   5
Iteration    2/   5
Iteration    3/   5
Iteration    4/   5
Iteration    5/   5
  Content Loss: 605030
    Style Loss: 3.3767e+06
    TOTAL Loss: 3.98173e+06


#### Result of the above run:

<img src="result.jpg" alt="Drawing" style="width: 500px;"> 

### Further results:
<img src="result_collage.png" alt="Drawing" style="width: 1000px;"> 

### Results from pre-trained models:

<img src="conv 4.png" alt="Drawing" style="width: 1000px;"> 

<img src="conv 1.png" alt="Drawing" style="width: 1000px;"> 

### Limitations:
- Lack of a GPU enabled system 
    training taking a longer time (1000 iterations took 3 hours on a CPU driven iteration) 
    Lower iterations (say around less than 1000) does not produce an identifiable resultant image.
-    Memory dumps while running larger iterations
-    TensorFlow compatibility with python version on Jupyter notebooks


### References:
Anish Athalye [Implementation of Neural Style in TensorFlow](https://github.com/anishathalye/neural-style)

Leon A. Gatys, Alexander S. Ecker, Matthias Bethge [A Neural Algorithm of Artistic Style](https://github.com/leongatys/fast-neural-style)

Justin Johnson, Alexandre Alahi, Li Fei-Fei [Perceptual losses for real-time style transfer and super-resolution](https://github.com/jcjohnson/fast-neural-style)

Logan Engstrom [Fast Style Transfer](https://github.com/lengstrom/fast-style-transfer/)

