***Copyright 2019 Pätzold, Menzel, Zacharias.***


In [None]:
#Licensed under the Apache License, Version 2.0 (the "License");
#you may not use this file except in compliance with the License.
#You may obtain a copy of the License at
#
#    http://www.apache.org/licenses/LICENSE-2.0
#
#Unless required by applicable law or agreed to in writing, software
#distributed under the License is distributed on an "AS IS" BASIS,
#WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
#See the License for the specific language governing permissions and
#limitations under the License.

# **Neural Algorithm of Artistic Style**

Here we implement a network that combines two separate images:  
one image provides the content and the other adds its style to that content.  
For this purpose, we use a pretrained convolutional network, VGG19, and build a new feature space on top of it to reconstruct the style of the image.
<br />
The code partially reuses code and structure from the TensorFlow tutorial Neural Style Transfer (2018), and the network follows the study by Gatys et al. (2015).

***Basic idea:***

Two images are input to the neural network: a content-image and a style-image. We wish to generate a mixed-image which has the contours of the content-image and the colours and texture of the style-image. We do this by creating several loss-functions that can be optimized.

The loss-function for the content-image tries to minimize the difference between the features that are activated for the content-image and for the mixed-image, at one or more layers in the network. This causes the contours of the mixed-image to resemble those of the content-image.

The loss-function for the style-image is slightly more complicated, because it instead tries to minimize the difference between the so-called Gram-matrices for the style-image and the mixed-image. This is done at one or more layers in the network. The Gram-matrix measures which features are activated simultaneously in a given layer. Changing the mixed-image so that it mimics the activation patterns of the style-image causes the colour and texture to be transferred.

We use TensorFlow to automatically derive the gradient for these loss-functions. The gradient is then used to update the mixed-image. This procedure is repeated a number of times until we are satisfied with the resulting image.
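
The update loop described above can be illustrated with a small NumPy sketch. It is only a toy under stated assumptions: the loss is a plain mean squared error against a made-up `target` image, the gradient is written out by hand instead of being derived by TensorFlow, and plain gradient descent stands in for the Adam optimizer used later in this notebook. The names `target`, `loss_fn`, `grad_fn` and `learning_rate` are hypothetical.

```python
import numpy as np

# Toy stand-in for the real losses: the "loss" is simply the mean squared
# difference between the mixed image and a fixed, made-up target image.
rng = np.random.default_rng(0)
target = rng.random((4, 4, 3))       # hypothetical target image in [0, 1]
mixed = np.full((4, 4, 3), 0.5)      # initial mixed image

def loss_fn(image):
    return np.mean((image - target) ** 2)

def grad_fn(image):
    # Analytic gradient of the mean squared error with respect to the image.
    return 2.0 * (image - target) / image.size

learning_rate = 10.0
initial_loss = loss_fn(mixed)
for _ in range(100):
    mixed = mixed - learning_rate * grad_fn(mixed)   # gradient-descent update
    mixed = np.clip(mixed, 0.0, 1.0)                 # keep valid pixel values
final_loss = loss_fn(mixed)

print(final_loss < initial_loss)
```

Each pass moves the image a little further in the direction that lowers the loss, which is the same pattern the training step follows with the style and content losses.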

*Import all necessary packages.*

In [None]:
%tensorflow_version 2.x
import tensorflow as tf
%matplotlib inline
import matplotlib.pyplot as plt
import PIL.Image
import numpy as np

TensorFlow 2.x selected.


### *Image manipulation*
In order to load new input images, save mixed (output) images with transferred style and to plot the results during the transformation process, we define some important convenience functions.

*Loading an image.*

In [None]:
def load_image(filename, filepath):

  """
  filename: The local name under which the downloaded image is cached.
  filepath: The URL of the image.
  """

  # Read in the image.
  path = tf.keras.utils.get_file(filename, filepath)
  img = tf.io.read_file(path)
  # Preprocess image: decode and convert to float.
  img = tf.image.decode_image(img, channels=3)
  img = tf.image.convert_image_dtype(img, tf.float32)
  # Scale the image so that its longer side is 512 pixels.
  max_dim = 512
  shape = tf.cast(tf.shape(img)[:-1], tf.float32)
  long_dim = tf.reduce_max(shape)
  scale = max_dim / long_dim
  
  new_shape = tf.cast(shape * scale, tf.int32)
  img = tf.image.resize(img, new_shape)
  img = img[tf.newaxis, :]
  return img
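
The rescaling arithmetic in `load_image` can be checked in isolation with plain Python. The helper `scaled_shape` below is hypothetical and only mirrors the computation of `new_shape`. It is a sketch of the arithmetic, not part of the notebook's pipeline.

```python
def scaled_shape(height, width, max_dim=512):
    """Hypothetical helper: target size whose longer side equals max_dim."""
    scale = max_dim / max(height, width)
    # Truncate to integers, as a cast to an integer type does for positive values.
    return int(height * scale), int(width * scale)

print(scaled_shape(1024, 768))   # (512, 384)
print(scaled_shape(256, 128))    # (512, 256)
```

Because the scale factor is applied unconditionally, images smaller than 512 pixels are enlarged as well, and the aspect ratio is preserved in both cases.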

*Transform tensor into image.*

In [None]:
def tensor_to_image(tensor):

  """
  tensor: Tensor input to be converted to an image.
  """

  # To get the image we turn the tensor into a uint8 array in [0,255].
  tensor = tensor*255
  tensor = np.array(tensor, dtype=np.uint8)
  # Drop the leading batch dimension, if there is one.
  if np.ndim(tensor)>3:
    assert tensor.shape[0] == 1
    tensor = tensor[0]
  return PIL.Image.fromarray(tensor)

*Plot content and style image.*

In [None]:
def plot_images(content_image, style_image, mixed_image):

  """
  content_image: The image that is providing the contours for the mixed image.
  style_image: The image that is providing colour and structure for the mixed image.
  mixed_image: The resulting mixed image.
  """

  # Create figure with sub-plots.
  fig, ax = plt.subplots(nrows=1, ncols=3, figsize=(15, 15))
  images = [content_image, style_image, mixed_image]
  labels = ["content", "style", "mixed"]

  # Label and plot every image.
  for i in range(3):
    img = images[i]
    lbl = labels[i]
    # Remove the batch dimension, since imshow expects a single image.
    if len(img.shape) > 3:
      img = tf.squeeze(img, axis=0)
    img = tf.image.resize(img, (512,512))
    ax[i].imshow(img, interpolation='sinc')
    ax[i].set_title(lbl)
    ax[i].axis("off")
  plt.show()

*Save an image.*

In [None]:
def save_image(image, filename='mixed_image.jpeg'):

  """
  image: The image to be saved.
  filename: The name you want to save your image with.
  """

  # Save the image under the given filename (in Colab: in the session's files).
  image.save(filename)

*Load the data*

In [None]:
# Load a content and a style image.
content_image = load_image('YellowLabradorLooking_new','https://upload.wikimedia.org/wikipedia/commons/2/26/YellowLabradorLooking_new.jpg')
style_image = load_image('04-Pablo-Picasso-Head-of-Woman-Fernande-1909-56a03c7f5f9b58eba4af7614', 'https://www.thoughtco.com/thmb/DUsvOYMGSDG1LjjME0TfV3XgJO4=/2178x1800/filters:no_upscale():max_bytes(150000):strip_icc()/04-Pablo-Picasso-Head-of-Woman-Fernande-1909-56a03c7f5f9b58eba4af7614.jpg')

### *Build the model*
Here we implement the main model which is used for the optimization process of the mixed image.

In [None]:
class StyleContentModel(tf.keras.models.Model):
  def __init__(self, style_layers, content_layers):
    super(StyleContentModel, self).__init__()
    self.vgg = vgg_layers(style_layers + content_layers)
    self.style_layers = style_layers
    self.content_layers = content_layers
    self.num_style_layers = len(style_layers)
    # Set the model as non-trainable.
    self.vgg.trainable = False

  # Compute forward step.
  def call(self, inputs):

    """ Expects float input in [0,1]. """

    inputs = inputs*255.0

    # Get the outputs of a forward step and save in output lists.
    preprocessed_input = tf.keras.applications.vgg19.preprocess_input(inputs)
    outputs = self.vgg(preprocessed_input)
    style_outputs, content_outputs = (outputs[:self.num_style_layers], 
                                      outputs[self.num_style_layers:])
    
    # For the style outputs we transform each element into a gram matrix.
    style_outputs = [gram_matrix(style_output)
                     for style_output in style_outputs]

    # Map the layer names to their content and style outputs.
    content_dict = {content_name:value 
                    for content_name, value 
                    in zip(self.content_layers, content_outputs)}
    style_dict = {style_name:value
                  for style_name, value
                  in zip(self.style_layers, style_outputs)}
    
    return {'content':content_dict, 'style':style_dict}

The style of an image can be described by the means and correlations across the different feature maps. We calculate a Gram matrix that includes this information by taking the outer product of the feature vector with itself at each location and summing that outer product over all locations. Note that the implementation below does not normalize the sum by the number of locations.

*Transform tensors for the style layers into gram matrices.*

In [None]:
def gram_matrix(tensor):

  """
  tensor: Tensor input to be transformed into a gram matrix.
  """
  
  # Get the number of feature channels from the static shape of the
  # tensor, which is assumed to be from a convolutional layer with 4-dim.
  num_channels = tensor.shape[-1]
  # Reshape the tensor so it is a 2-dim matrix. This essentially
  # flattens the contents of each feature-channel.
  matrix = tf.reshape(tensor, shape=[-1, num_channels])  
  # Calculate the Gram-matrix as the matrix-product of
  # the 2-dim matrix with itself. This calculates the
  # dot-products of all combinations of the feature-channels.
  gram = tf.matmul(tf.transpose(matrix), matrix)

  return gram
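
As a sanity check, the following NumPy sketch shows that the matrix product used in `gram_matrix` equals the sum of the outer products of the feature vectors at all locations. Dividing by the number of locations would turn that sum into an average. The feature map is random, made-up data, and the variable names are chosen for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical feature map: batch of 1, 2x3 spatial locations, 4 channels.
features = rng.standard_normal((1, 2, 3, 4))

# Flatten all locations into rows, one feature vector per row.
matrix = features.reshape(-1, features.shape[-1])   # shape (6, 4)

# Gram matrix as a single matrix product.
gram = matrix.T @ matrix                             # shape (4, 4)

# Same result as summing the outer product of every feature vector with itself.
gram_outer = sum(np.outer(v, v) for v in matrix)

print(np.allclose(gram, gram_outer))   # True
print(np.allclose(gram, gram.T))       # True
```

Entry (i, j) of the result is large when channels i and j tend to be active at the same locations, which is the co-activation statistic the style loss compares.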

*Calculate intermediate layer outputs.*

In [None]:
def vgg_layers(layer_names):

  """ 
  Creates a vgg model that returns a list of intermediate output values.
  layer_names: The names of the layer for which we compute the outputs.
  """
  
  # Load our model. Load pretrained VGG, trained on imagenet data.
  vgg = tf.keras.applications.VGG19(include_top=False, weights='imagenet')
  vgg.trainable = False
  # Compute the output for each layer.
  outputs = [vgg.get_layer(name).output for name in layer_names]
  # Define a model using functional API.
  model = tf.keras.Model([vgg.input], outputs)

  return model

To capture the contours of the content image and the colour and texture of the style image, we compute a separate loss for each chosen layer. The function below calculates the mean squared error between the feature activations of the content image and the mixed image at the content layers, and between the Gram matrices of the style image and the mixed image at the style layers. If the content loss were zero, the mixed image would reproduce the features of the content image exactly at the chosen content layers.

*Content and style loss.*

In [None]:
# Define content and style layers from which we pull feature maps.
content_layers = ['block5_conv2']
style_layers = ['block1_conv1',
                'block2_conv1',
                'block3_conv1', 
                'block4_conv1', 
                'block5_conv1']
num_content_layers = len(content_layers)
num_style_layers = len(style_layers)

def style_content_loss(outputs, style_weight=1e-2, content_weight=1e4):

  """
  Calculates the loss between targets and layer outputs.
  Relies on the global style_targets and content_targets defined below.
  outputs: The dictionary with the 'style' and 'content' outputs of the model.
  style_weight: The weight of the style loss.
  content_weight: The weight of the content loss.
  """

  # Get style and content outputs.
  style_outputs = outputs['style']
  content_outputs = outputs['content']
  # Calculate weighted style loss by averaging over all style losses.
  style_loss = tf.add_n([tf.reduce_mean((style_outputs[name]-style_targets[name])**2) 
                         for name in style_outputs.keys()])
  style_loss *= style_weight / num_style_layers
  # Calculate weighted content loss by averaging over all content losses.
  content_loss = tf.add_n([tf.reduce_mean((content_outputs[name]-content_targets[name])**2) 
                           for name in content_outputs.keys()])
  content_loss *= content_weight / num_content_layers
  # Calculate total loss.
  loss = style_loss + content_loss
  
  return loss
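
To make the weighting concrete, here is a small NumPy sketch of the same arithmetic on made-up numbers. The helper `weighted_layer_loss`, the layer names `"a"`, `"b"`, `"c"` and all values are hypothetical. Only the structure (mean squared error per layer, summed, divided by the number of layers, multiplied by a weight) follows `style_content_loss`.

```python
import numpy as np

def weighted_layer_loss(outputs, targets, weight):
    """Hypothetical helper: per-layer mean squared error, averaged and scaled."""
    per_layer = [np.mean((outputs[name] - targets[name]) ** 2) for name in outputs]
    return weight * sum(per_layer) / len(per_layer)

# Made-up values standing in for Gram matrices and feature maps.
style_out = {"a": np.array([[1.0, 2.0]]), "b": np.array([[0.0, 0.0]])}
style_tgt = {"a": np.array([[1.0, 0.0]]), "b": np.array([[2.0, 2.0]])}
content_out = {"c": np.array([1.0, 1.0])}
content_tgt = {"c": np.array([0.0, 1.0])}

style_loss = weighted_layer_loss(style_out, style_tgt, weight=1e-2)
content_loss = weighted_layer_loss(content_out, content_tgt, weight=1e4)
total_loss = style_loss + content_loss
print(style_loss, content_loss, total_loss)
```

The two weights set the trade-off: raising `style_weight` relative to `content_weight` pushes the result towards the style image, and vice versa.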

### *Optimization*

*Initialization and instantiation.*

In [None]:
# Initialize Adam optimizer.
opt = tf.keras.optimizers.Adam(learning_rate=0.02, beta_1=0.99, epsilon=1e-1)

# Instantiate the model/extractor.
extractor = StyleContentModel(style_layers, content_layers)

# Targets.
style_targets = extractor(style_image)['style']
content_targets = extractor(content_image)['content']

# Input of same shape as content_image.
mixed_image = tf.Variable(content_image)

*Update function.*

In [None]:
@tf.function()
def train_step(image, total_variation_weight=30):

  """
  Computes one training step.
  image: The mixed image to be optimized (a tf.Variable).
  total_variation_weight: Weight of the total variation loss to reduce high frequency artifacts.
  """

  # Start a gradient tape to record the operations for the gradient computation.
  with tf.GradientTape() as tape:
    # Get outputs and losses.
    outputs = extractor(image)
    loss = style_content_loss(outputs)
    # Total variation denoising:
    # Sum of absolute differences between neighbouring pixels along the x- and y-axis.
    loss += total_variation_weight*tf.image.total_variation(image)

  # Apply gradients and optimize the image.
  grad = tape.gradient(loss, image)
  opt.apply_gradients([(grad, image)])
  # Make sure the image values stay in [0,1].
  image.assign(tf.clip_by_value(image, clip_value_min=0.0, clip_value_max=1.0))
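
The total variation term penalizes differences between neighbouring pixels. A minimal NumPy sketch for a single image, assuming a height x width x channels layout, is shown below. The function name and the two test images are made up for illustration, and the sketch is not a replacement for `tf.image.total_variation`.

```python
import numpy as np

def total_variation(image):
    """Sum of absolute differences between neighbouring pixels of an HxWxC image."""
    vertical = np.abs(image[1:, :, :] - image[:-1, :, :]).sum()
    horizontal = np.abs(image[:, 1:, :] - image[:, :-1, :]).sum()
    return vertical + horizontal

flat = np.full((4, 4, 3), 0.5)                   # uniform image
checker = np.indices((4, 4)).sum(axis=0) % 2     # 0/1 checkerboard, shape (4, 4)
checker = np.repeat(checker[:, :, None], 3, axis=2).astype(float)

print(total_variation(flat))      # 0.0
print(total_variation(checker))   # 72.0
```

A smooth image scores low and a noisy one scores high, so adding this term to the loss discourages high-frequency artifacts in the mixed image.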

*Main function to transfer style of an image.*

In [None]:
def transfer_style(image=mixed_image, steps=500):

  """
  The main function to transfer the style onto the content image.
  image: The resulting mixed image which gets optimized.
  steps: The number of training steps. At least 500 steps are recommended.
  """

  # Initialize step counter.
  i = 0
  # Perform 'steps' training steps and plot intermediate results.
  while i < steps:
    train_step(image)
    i += 1
    if i%10 == 0 or i==1:
      if i%50 == 0 or i<=50:
        print()
        print()
        print("Iterations: ", i)
        plot_images(content_image, style_image, image)

  print()
  print()
  print("Resulting mixed image after ", i, " iteration(s):")

  # Save the resulting mixed image to colab files.
  save_image(tensor_to_image(image))

  # Plot the resulting mixed image.
  plt.figure(figsize=(7,7))
  if len(image.shape) > 3:
    image = tf.squeeze(image, axis=0)
  plt.imshow(image)
  plt.axis("off")
  plt.title("mixed")

### *Example*

In [None]:
# Hint: Run on GPU (Runtime > Change Runtime Type > Hardware Accelerator > GPU) for faster style transfer!
transfer_style(steps=1000)

# You can find a downloaded version of the 
# resulting mixed image in your colab files folder.


### References
Gatys, L. A., Ecker, A. S., & Bethge, M. (2015). A neural algorithm of artistic style. arXiv preprint arXiv:1508.06576. 
<br />
Martín Abadi, Ashish Agarwal, Paul Barham, Eugene Brevdo, Zhifeng Chen, Craig Citro, Greg S. Corrado, Andy Davis, Jeffrey Dean, Matthieu Devin, Sanjay Ghemawat, Ian Goodfellow, Andrew Harp, Geoffrey Irving, Michael Isard, Rafal Jozefowicz, Yangqing Jia, Lukasz Kaiser, Manjunath Kudlur, Josh Levenberg, Dan Mané, Mike Schuster, Rajat Monga, Sherry Moore, Derek Murray, Chris Olah, Jonathon Shlens, Benoit Steiner, Ilya Sutskever, Kunal Talwar, Paul Tucker, Vincent Vanhoucke, Vijay Vasudevan, Fernanda Viégas, Oriol Vinyals, Pete Warden, Martin Wattenberg, Martin Wicke, Yuan Yu, and Xiaoqiang Zheng. Tensorflow: Neural Style Transfer, (2018). Software available from tensorflow.org.