<a href="https://colab.research.google.com/github/IamAVB/Style_transfer/blob/master/Style_Transfer.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

This is a style transfer program that transfers the artistic style of one image (the style image) to another image (the content image).

This implementation is based on the paper "A Neural Algorithm of Artistic Style" (Gatys et al., 2016).
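
As a compact summary, the objective that the code further down in this notebook minimizes can be written as follows. The notation is an assumption chosen for this summary: P and F are the content-layer features of the content image and the generated image, A^l and G^l are the Gram matrices of the style image and the generated image at style layer l, N_l is the number of filters, M_l is the height times the width of the feature map, and S is the number of elements in P. The normalization constants follow the code here rather than the paper.

```latex
\begin{align}
\mathcal{L}_{\text{content}} &= \frac{1}{4S} \sum_{i} \left(F_{i} - P_{i}\right)^2 \\
E_l &= \frac{1}{4 N_l^2 M_l^2} \sum_{i,j} \left(G^l_{ij} - A^l_{ij}\right)^2 \\
\mathcal{L}_{\text{style}} &= \sum_{l} w_l E_l \\
\mathcal{L}_{\text{total}} &= \alpha \mathcal{L}_{\text{content}} + \beta \mathcal{L}_{\text{style}}
\end{align}
```

In the code, alpha is 0.01 (content_w), beta is 1 (style_w), and the layer weights w_l are [0.5, 1.0, 1.5, 3.0, 4.0] for conv1_1 through conv5_1.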

We need to upload two images: a style image, which supplies the artistic style, and a content image, which is the image to be transformed.

The results are stored in the outputs folder, which is visible in the left panel.

I have saved output images only at selected steps, because the changes between successive images are minimal.
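
The saving schedule can be sketched in plain Python. This is a minimal sketch that mirrors the skip_step logic of the train method further down; the function name logged_steps is hypothetical and is not part of the notebook.

```python
def logged_steps(n_iters):
    """Return the 1-based steps at which an image is saved (hypothetical helper)."""
    skip_step = 1
    steps = []
    for index in range(n_iters):
        # Same schedule as in train(): every step at first, then every 10, then every 20.
        if 5 <= index < 20:
            skip_step = 10
        elif index >= 20:
            skip_step = 20
        if (index + 1) % skip_step == 0:
            steps.append(index + 1)
    return steps

print(logged_steps(100))
# [1, 2, 3, 4, 5, 10, 20, 40, 60, 80, 100]
```

The output file for a logged step is named with the zero-based index, so step 10 is written to outputs/9.png.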

I wrote this program as part of Stanford coursework that I was taking as a self-study course.

In [1]:
from google.colab import files

uploaded = files.upload()

for fn in uploaded.keys():
  print('User uploaded file "{name}" with length {length} bytes'.format(
      name=fn, length=len(uploaded[fn])))

Saving Varagina Banakar Aravind_DACS.jpg to Varagina Banakar Aravind_DACS (1).jpg
User uploaded file "Varagina Banakar Aravind_DACS.jpg" with length 5314 bytes


We use the weights of the pretrained VGG-19 image classification model to extract image features. The code below loads them as constants, so they stay fixed and only the generated image is optimized. Pretrained weights give better results than weights initialized with random values.

In [0]:
# Import necessary packages that we are going to make use of.

import os, time, scipy.misc
import numpy as np # for data manipulation
import scipy.io    # used to read the VGG-19 .mat weights file
import tensorflow as tf # TensorFlow is the backend (this notebook uses the TF 1.x graph API).
from PIL import Image, ImageOps # used for resizing the original images
from six.moves import urllib    # Used to download VGG-19 model layer weights from URL. 

# TF_CPP_MIN_LOG_LEVEL controls TensorFlow's C++ logging:
#   0 shows all logs (default),
#   1 filters out INFO logs,
#   2 additionally filters out WARNING logs,
#   3 additionally filters out ERROR logs.
os.environ['TF_CPP_MIN_LOG_LEVEL']='2'


In [0]:
# Function to download the pretrained VGG-19 model.
# After downloading, we compare the file size with the expected byte count to verify that the download is complete.
def download(download_link, file_name, expected_bytes):
    """ Download the pretrained VGG-19 model if it's not already downloaded """
    if os.path.exists(file_name):
        print("Pre-trained Object detection VGG-19 model exists already.")
        return
    print("Downloading the Pre-trained Object detection VGG-19 model. Wait till it's done")
    file_name, _ = urllib.request.urlretrieve(download_link, file_name)
    file_stat = os.stat(file_name)
    if file_stat.st_size == expected_bytes:
        print('Successfully downloaded VGG-19 model', file_name)
    else:
        raise Exception('File ' + file_name +
                        ' byte mismatch. File might be corrupted. Try downloading it from a browser.')

# Resize the content image and the style image to the dimensions we want (the output image dimensions).
def get_resized_image(img_path, width, height, save=True):
    image = Image.open(img_path)
    # PIL sizes are given as (width, height); the resulting array has shape (height, width, 3) for an RGB image.
    # Image.LANCZOS is the same filter as the older Image.ANTIALIAS alias, which newer Pillow releases removed.
    image = ImageOps.fit(image, (width, height), Image.LANCZOS)
    if save:
        image_dirs = img_path.split('/')
        image_dirs[-1] = 'resized_' + image_dirs[-1]
        out_path = '/'.join(image_dirs)
        if not os.path.exists(out_path):
            image.save(out_path)
    image = np.asarray(image, np.float32)
    return np.expand_dims(image, 0)

def generate_noise_image(content_image, width, height, noise_ratio=0.6):
    """ Blend uniform noise in [-20, 20) with the content image to get the starting image. """
    noise_image = np.random.uniform(-20, 20, (1, height, width, 3)).astype(np.float32)
    return noise_image * noise_ratio + content_image * (1 - noise_ratio)

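def save_image(path, image):
    image = image[0]  # drop the batch dimension
    image = np.clip(image, 0, 255).astype('uint8')
    # scipy.misc.imsave is deprecated since SciPy 1.0.0 and removed in 1.2.0, so save with PIL instead.
    Image.fromarray(image).save(path)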

def safe_mkdir(path):
    """ Create a directory if there isn't one already. """
    try:
        os.mkdir(path)
    except OSError:
        pass
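
As a rough illustration of how the starting image is built and later written to disk, the sketch below repeats the blend from generate_noise_image on a tiny constant image and applies the same clipping as save_image. The helper name blend_with_noise, the seeded random generator, and the 4 x 6 test image are assumptions made for the example.

```python
import numpy as np

def blend_with_noise(content_image, noise_ratio=0.6, seed=0):
    """Mix uniform noise in [-20, 20) with the content image (hypothetical helper)."""
    rng = np.random.default_rng(seed)  # seeded only to make the example repeatable
    noise = rng.uniform(-20, 20, content_image.shape).astype(np.float32)
    return noise * noise_ratio + content_image * (1 - noise_ratio)

# A tiny constant "content image" in NHWC layout: batch of 1, 4 rows, 6 columns, RGB.
content = np.full((1, 4, 6, 3), 200.0, dtype=np.float32)
start = blend_with_noise(content)

# With noise_ratio=0.6 every pixel is 0.4 * 200 = 80 plus noise scaled into [-12, 12).
print(start.shape, float(start.min()), float(start.max()))

# Same post-processing as save_image: drop the batch axis, clip to [0, 255], cast to uint8.
pixels = np.clip(start[0], 0, 255).astype('uint8')
print(pixels.shape, pixels.dtype)
```

A higher noise_ratio starts the optimization further from the content image; 0.6 is the default used in this notebook.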

In [0]:
# VGG-19 parameters file
VGG_DOWNLOAD_LINK = 'http://www.vlfeat.org/matconvnet/models/imagenet-vgg-verydeep-19.mat'
VGG_FILENAME = 'imagenet-vgg-verydeep-19.mat'
EXPECTED_BYTES = 534904783

class VGG(object):
    def __init__(self, input_img):
        download(VGG_DOWNLOAD_LINK, VGG_FILENAME, EXPECTED_BYTES)
        self.vgg_layers = scipy.io.loadmat(VGG_FILENAME)['layers']
        self.input_img = input_img
        self.mean_pixels = np.array([123.68, 116.779, 103.939]).reshape((1,1,1,3)) # mean RGB pixel values used by VGG-19 for mean-centering.

    def _weights(self, layer_idx, expected_layer_name):
        """ Return the weights and biases at layer_idx already trained by VGG
        """
        W = self.vgg_layers[0][layer_idx][0][0][2][0][0]
        b = self.vgg_layers[0][layer_idx][0][0][2][0][1]
        layer_name = self.vgg_layers[0][layer_idx][0][0][0][0]
        assert layer_name == expected_layer_name
        return W, b.reshape(b.size)

    def conv2d_relu(self, prev_layer, layer_idx, layer_name):
        """ Return the Conv2D layer with RELU using the weights, biases from the VGG model at 'layer_idx'.
        Inputs:
            prev_layer: the output tensor from the previous layer
            layer_idx: the index to current layer in vgg_layers
            layer_name: the string that is the name of the current layer. It's used for variable_scope.
        Note that you first need to obtain W and b from the corresponding VGG layer
        using the function _weights() defined above.
        W and b returned from _weights() are numpy arrays, so you have
        to convert them to TF tensors. One way to do it is with tf.constant.
        We use SAME padding with a stride of 1 since the image is small. This can be modified to observe different outcomes.
        """
        ###############################
        with tf.variable_scope(layer_name) as scope:
            W, b = self._weights(layer_idx, layer_name)
            W = tf.constant(W, name='weights')
            b = tf.constant(b, name='bias')
            conv2d = tf.nn.conv2d(prev_layer, 
                                filter=W, 
                                strides=[1, 1, 1, 1], 
                                padding='SAME')
            out = tf.nn.relu(conv2d + b)
        ###############################
        setattr(self, layer_name, out)

    def avgpool(self, prev_layer, layer_name):
        """ Return the average pooling layer. The paper suggests that 
        average pooling works better than max pooling.
        Input:
            prev_layer: the output tensor from the previous layer
            layer_name: the string that you want to name the layer.
                        It's used to specify variable_scope.
        Kernel size and strides were chosen after two or three trials with different values.
        """
        ###############################
        with tf.variable_scope(layer_name):
            out = tf.nn.avg_pool(prev_layer, 
                                ksize=[1, 2, 2, 1], 
                                strides=[1, 2, 2, 1],
                                padding='SAME')
        ###############################
        setattr(self, layer_name, out)

    def load(self):
        self.conv2d_relu(self.input_img, 0, 'conv1_1')
        self.conv2d_relu(self.conv1_1, 2, 'conv1_2')
        self.avgpool(self.conv1_2, 'avgpool1')
        
        self.conv2d_relu(self.avgpool1, 5, 'conv2_1')
        self.conv2d_relu(self.conv2_1, 7, 'conv2_2')
        self.avgpool(self.conv2_2, 'avgpool2')
        
        self.conv2d_relu(self.avgpool2, 10, 'conv3_1')
        self.conv2d_relu(self.conv3_1, 12, 'conv3_2')
        self.conv2d_relu(self.conv3_2, 14, 'conv3_3')
        self.conv2d_relu(self.conv3_3, 16, 'conv3_4')
        self.avgpool(self.conv3_4, 'avgpool3')
        
        self.conv2d_relu(self.avgpool3, 19, 'conv4_1')
        self.conv2d_relu(self.conv4_1, 21, 'conv4_2')
        self.conv2d_relu(self.conv4_2, 23, 'conv4_3')
        self.conv2d_relu(self.conv4_3, 25, 'conv4_4')
        self.avgpool(self.conv4_4, 'avgpool4')
        
        self.conv2d_relu(self.avgpool4, 28, 'conv5_1')
        self.conv2d_relu(self.conv5_1, 30, 'conv5_2')
        self.conv2d_relu(self.conv5_2, 32, 'conv5_3')
        self.conv2d_relu(self.conv5_3, 34, 'conv5_4')
        self.avgpool(self.conv5_4, 'avgpool5')
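
To see what the five pooling layers do to the spatial resolution, the sketch below traces feature-map sizes for the 250 x 333 (height x width) image used later in this notebook. It relies on the fact that a stride-2 pooling layer with SAME padding outputs ceil(size / 2), while a stride-1 SAME convolution keeps the size unchanged. The function names are hypothetical and are not part of the VGG class.

```python
import math

def same_pool_output(size, stride=2):
    """Output length of a SAME-padded pooling layer along one dimension."""
    return math.ceil(size / stride)

def feature_map_sizes(height, width, n_pools=5):
    """Return the (height, width) after each of the n_pools pooling layers."""
    sizes = []
    for _ in range(n_pools):
        height, width = same_pool_output(height), same_pool_output(width)
        sizes.append((height, width))
    return sizes

print(feature_map_sizes(250, 333))
# [(125, 167), (63, 84), (32, 42), (16, 21), (8, 11)]
```

Under these assumptions, the content layer conv4_2, which follows avgpool3, works on 32 x 42 feature maps.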

In [5]:
def setup():
    safe_mkdir('checkpoints') # check points will be saved in this folder 
    safe_mkdir('outputs')     # outputs folder.

class StyleTransfer(object):
    def __init__(self, content_img, style_img, img_width, img_height):
        '''
        img_width and img_height are the dimensions we expect from the generated image.
        We will resize the input content image and the input style image to match these dimensions using get_resized_image.
        '''
        self.img_width   = img_width
        self.img_height  = img_height
        self.content_img = get_resized_image(content_img, img_width, img_height)
        self.style_img   = get_resized_image(style_img, img_width, img_height)
        self.initial_img = generate_noise_image(self.content_img, img_width, img_height)

        ###############################
        ## created global step (gstep) and hyperparameters for the model
        self.content_layer = 'conv4_2'
        self.style_layers = ['conv1_1', 'conv2_1', 'conv3_1', 'conv4_1', 'conv5_1']
        self.content_w = 0.01
        self.style_w = 1
        self.style_layer_w = [0.5, 1.0, 1.5, 3.0, 4.0] 
        self.gstep = tf.Variable(0, dtype=tf.int32, 
                                trainable=False, name='global_step')
        self.learning_rate = 2.0
        ###############################

    def create_input(self):
        '''
        We will use one input_img as a placeholder for the content image, 
        style image, and generated image, because:
            1. they have the same dimension
            2. we have to extract the same set of features from them
        We use a variable instead of a placeholder because we're, at the same time, 
        training the generated image to get the desirable result.
        Note: image height corresponds to number of rows, not columns.
        '''
        with tf.variable_scope('input') as scope:
            self.input_img = tf.get_variable('in_img', 
                                        shape=([1, self.img_height, self.img_width, 3]),
                                        dtype=tf.float32,
                                        initializer=tf.zeros_initializer())
    def load_vgg(self):
        '''
        Load the saved model parameters of VGG-19, using the input_img
        as the input to compute the output at each layer of vgg.
        During training, VGG-19 mean-centered all images and found the mean pixels
        to be [123.68, 116.779, 103.939] along RGB dimensions. We have to subtract
        this mean from our images.
        '''
        self.vgg = VGG(self.input_img)
        self.vgg.load()
        self.content_img -= self.vgg.mean_pixels
        self.style_img -= self.vgg.mean_pixels

    def _content_loss(self, P, F):
        ''' Calculate the loss between the feature representation of the
        content image and the generated image.
        
        Inputs: 
            P: content representation of the content image
            F: content representation of the generated image
            Read the assignment handout for more details
            Note: Don't use the coefficient 0.5 as defined in the paper.
            Use the coefficient defined in the assignment handout.
        '''
        # There are two losses here: the content loss and the style loss. We need to minimize both.
        # The content loss follows the formula in the paper, with the coefficient 1 / (4 * P.size) instead of 0.5.
        ###############################
        self.content_loss = tf.reduce_sum((F - P) ** 2) / (4.0 * P.size)
        ###############################
    
    def _gram_matrix(self, F, N, M):
        """ Create and return the gram matrix for tensor F
        """
        ###############################
        F = tf.reshape(F, (M, N))
        return tf.matmul(tf.transpose(F), F)
        ###############################

    def _single_style_loss(self, a, g):
        """ Calculate the style loss at a certain layer
        Inputs:
            a is the feature representation of the style image at that layer
            g is the feature representation of the generated image at that layer
        Output:
            the style loss at a certain layer (which is E_l in the paper)
        """
        ###############################
        N = a.shape[3] # number of filters
        M = a.shape[1] * a.shape[2] # height times width of the feature map
        A = self._gram_matrix(a, N, M)
        G = self._gram_matrix(g, N, M)
        return tf.reduce_sum((G - A) ** 2 / ((2 * N * M) ** 2))
        ###############################

    def _style_loss(self, A):
        """ The total style loss is a weighted sum of style losses at all style layers.
        """
        n_layers = len(A)
        #         _single_style_loss(  a ,     g)  
        E = [self._single_style_loss(A[i], getattr(self.vgg, self.style_layers[i])) for i in range(n_layers)]
        
        ###############################
        self.style_loss = sum([self.style_layer_w[i] * E[i] for i in range(n_layers)])
        ###############################

    def losses(self):
        with tf.variable_scope('losses') as scope:
            with tf.Session() as sess:
                sess.run(self.input_img.assign(self.content_img)) 
                gen_img_content = getattr(self.vgg, self.content_layer)
                content_img_content = sess.run(gen_img_content)
            self._content_loss(content_img_content, gen_img_content)

            with tf.Session() as sess:
                sess.run(self.input_img.assign(self.style_img))
                style_layers = sess.run([getattr(self.vgg, layer) for layer in self.style_layers])                              
            self._style_loss(style_layers)

            ##########################################
            self.total_loss = self.content_w * self.content_loss + self.style_w * self.style_loss
            ##########################################

    def optimize(self):
        ###############################
        self.opt = tf.train.AdamOptimizer(self.learning_rate).minimize(self.total_loss, global_step=self.gstep)
        ###############################

    def create_summary(self):
        ###############################
        # Create summary of all losses, so we can visualize them in tensorboard.
        with tf.name_scope('summaries'):
            tf.summary.scalar('content loss', self.content_loss)
            tf.summary.scalar('style loss', self.style_loss)
            tf.summary.scalar('total loss', self.total_loss)
            self.summary_op = tf.summary.merge_all()
        ###############################


    def build(self):
        self.create_input()
        self.load_vgg()
        self.losses()
        self.optimize()
        self.create_summary()

    def train(self, n_iters):
        skip_step = 1
        with tf.Session() as sess:
            
            ###############################
            sess.run(tf.global_variables_initializer())
            writer = tf.summary.FileWriter('graphs/style_transfer', sess.graph)
            ###############################
            sess.run(self.input_img.assign(self.initial_img))

            ###############################
            ## created a saver object.
            ## restore the variables if a checkpoint exists.
            saver = tf.train.Saver()
            ckpt = tf.train.get_checkpoint_state(os.path.dirname('checkpoints/style_transfer/checkpoint'))
            if ckpt and ckpt.model_checkpoint_path:
                saver.restore(sess, ckpt.model_checkpoint_path)
            ##############################

            initial_step = self.gstep.eval()
            
            start_time = time.time()
            for index in range(initial_step, n_iters):
                if index >= 5 and index < 20:
                    skip_step = 10
                elif index >= 20:
                    skip_step = 20
                
                sess.run(self.opt) 
                if (index + 1) % skip_step == 0:
                    ###############################
                    gen_image, total_loss, summary = sess.run([self.input_img,
                                                                self.total_loss,
                                                                self.summary_op])

                    ###############################
                    
                    # Need to add back the mean pixels we subtracted before
                    gen_image = gen_image + self.vgg.mean_pixels 
                    writer.add_summary(summary, global_step=index)
                    print('Step {}\n   Sum: {:5.1f}'.format(index + 1, np.sum(gen_image)))
                    print('   Loss: {:5.1f}'.format(total_loss))
                    print('   Took: {} seconds'.format(time.time() - start_time))
                    start_time = time.time()

                    filename = 'outputs/%d.png' % (index)
                    save_image(filename, gen_image)

                    if (index + 1) % 20 == 0:
                        ###############################
                        # Save under the same directory that the restore step above reads from.
                        saver.save(sess, 'checkpoints/style_transfer/style_transfer', index)
                        ###############################

if __name__ == '__main__':
    setup()
    machine = StyleTransfer('Varagina Banakar Aravind_DACS.jpg', 'the_scream_by_edvard_munch.jpg', 333, 250)
    machine.build()
    machine.train(300)
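
The loss terms defined in the StyleTransfer class can be checked in isolation with NumPy. The sketch below mirrors _gram_matrix, _single_style_loss, and _content_loss on small random arrays instead of VGG features; the function names, the array sizes, and the shuffling experiment are assumptions made for illustration. It shows that shuffling the spatial positions of a feature map leaves the Gram matrix, and hence the style loss, essentially unchanged, while the content loss becomes non-zero.

```python
import numpy as np

def gram_matrix(features, n_filters, n_positions):
    """Flatten a (1, H, W, N) feature map to (M, N) and return the N x N matrix F^T F."""
    flat = features.reshape(n_positions, n_filters)
    return flat.T @ flat

def single_style_loss(style_feat, gen_feat):
    """Squared Gram-matrix difference, scaled by 1 / (2 * N * M)^2 as in the notebook."""
    n = style_feat.shape[3]                        # number of filters
    m = style_feat.shape[1] * style_feat.shape[2]  # height times width
    diff = gram_matrix(gen_feat, n, m) - gram_matrix(style_feat, n, m)
    return np.sum(diff ** 2) / ((2 * n * m) ** 2)

def content_loss(content_feat, gen_feat):
    """Squared feature difference, scaled by 1 / (4 * number of elements)."""
    return np.sum((gen_feat - content_feat) ** 2) / (4.0 * content_feat.size)

rng = np.random.default_rng(0)
style = rng.normal(size=(1, 4, 5, 3))  # stand-in for one layer's activations

# Shuffle the 20 spatial positions while keeping each position's 3 filter responses together.
order = rng.permutation(4 * 5)
shuffled = style.reshape(20, 3)[order].reshape(1, 4, 5, 3)

print(single_style_loss(style, shuffled))  # approximately 0: the Gram matrix ignores layout
print(content_loss(style, shuffled))       # greater than 0: the content loss does not
```

This is why the style term captures texture-like statistics of the style image, while the content term keeps the spatial arrangement of the content image.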

Pre-trained Object detection VGG-19 model exists already.
INFO:tensorflow:Summary name content loss is illegal; using content_loss instead.
INFO:tensorflow:Summary name style loss is illegal; using style_loss instead.
INFO:tensorflow:Summary name total loss is illegal; using total_loss instead.
Step 1
   Sum: 48500489.3
   Loss: 1650841600.0
   Took: 5.005255222320557 seconds
Step 2
   Sum: 48500152.9
   Loss: 1456552832.0
   Took: 0.17206311225891113 seconds


`imsave` is deprecated in SciPy 1.0.0, and will be removed in 1.2.0.
Use ``imageio.imwrite`` instead.


Step 3
   Sum: 48500820.1
   Loss: 1301022720.0
   Took: 0.16085600852966309 seconds
Step 4
   Sum: 48501166.9
   Loss: 1177465344.0
   Took: 0.15407800674438477 seconds
Step 5
   Sum: 48500080.9
   Loss: 1081616256.0
   Took: 0.15372467041015625 seconds
Step 10
   Sum: 48452387.5
   Loss: 793877568.0
   Took: 0.5451688766479492 seconds
Step 20
   Sum: 48181148.5
   Loss: 527253376.0
   Took: 1.02260160446167 seconds
Step 40
   Sum: 47489079.8
   Loss: 314573440.0
   Took: 2.4152352809906006 seconds
Step 60
   Sum: 46791165.3
   Loss: 224358688.0
   Took: 2.853074312210083 seconds
Step 80
   Sum: 46124431.6
   Loss: 175399696.0
   Took: 2.417937755584717 seconds
Step 100
   Sum: 45489640.2
   Loss: 145006096.0
   Took: 2.8651487827301025 seconds
Step 120
   Sum: 44888028.5
   Loss: 124051272.0
   Took: 2.489882469177246 seconds
Step 140
   Sum: 44314851.7
   Loss: 108262488.0
   Took: 2.3960471153259277 seconds
Step 160
   Sum: 43766735.7
   Loss: 95741416.0
   Took: 2.5008392333984375