neural style transfer consists of applying the style of a reference image to a target image while conserving the content of the target image

![A Style Transfer Example](./style_trans.png)

the key notion behind implementing style transfer is the same idea that's central to all deep-learning algorithms: define a loss function to specify what to achieve, and minimize this loss; here, the goal is to conserve the content of the original image while adopting the style of the reference image

mathematically, if content and style can each be defined as a function of an image, an appropriate loss function to minimize is:

loss = distance(style(reference_image) - style(generated_image)) + distance(content(original_image) - content(generated_image))

here distance is a norm function such as the L2 norm

### The Content Loss

activations from earlier layers in a network contain local information about the image, whereas activations from higher layers contain increasingly global, abstract information; the activations of the different layers of a convnet thus provide a decomposition of the contents of an image over different spatial scales; the content of an image, which is more global and abstract, is therefore expected to be captured by the representations of the upper layers in a convnet

a good candidate for content loss is thus the L2 norm between the activations of an upper layer in a pretrained convnet, computed over the target image, and the activations of the same layer computed over the generated image; this ensures that, as seen from the upper layer, the generated image will look similar to the original target image; assuming that what the upper layers of a convnet see is really the content of their input images, this works as a way to preserve image content
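
as a small illustration of the idea above, here is a minimal NumPy sketch of a content distance; the function name `content_distance` and the random arrays standing in for upper-layer activations are assumptions made for illustration, not real convnet outputs; it computes the squared L2 distance (sum of squared differences), which is the form the Keras code later in these notes uses

```python
import numpy as np

def content_distance(base_activations, generated_activations):
    """Squared L2 distance between two activation tensors of equal shape."""
    diff = generated_activations - base_activations
    return float(np.sum(np.square(diff)))

# hypothetical upper-layer activations, shape (height, width, channels)
rng = np.random.default_rng(0)
base = rng.normal(size=(4, 4, 8))
identical = base.copy()   # same activations as the base image
perturbed = base + 0.1    # every activation shifted by 0.1

print(content_distance(base, identical))
print(content_distance(base, perturbed))
```

identical activations give a distance of zero, and the distance grows as the generated image's activations drift away from those of the target image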

### The Style Loss

the style loss uses multiple layers of a convnet: the aim is to capture the appearance of the style-reference image at all spatial scales extracted by the convnet, not just a single scale; it is built on the Gram matrix of a layer's activations, that is, the inner product of the feature maps of a given layer; this inner product can be understood as representing a map of the correlations between the layer's features; these feature correlations capture the statistics of the patterns of a particular spatial scale, which empirically correspond to the appearance of the textures found at this scale

the style loss aims to preserve similar internal correlations within the activations of different layers, across the style-reference image and the generated image; this ensures that the textures found at different spatial scales look similar across the style-reference image and the generated image
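
to make the "map of correlations" concrete, here is a minimal NumPy sketch of a Gram matrix; the random array is a hypothetical stand-in for one layer's feature maps, and the shuffling step is added only for illustration

```python
import numpy as np

def gram_matrix(feature_maps):
    """Inner products between the channels of a (height, width, channels) tensor."""
    channels = feature_maps.shape[-1]
    flat = feature_maps.reshape(-1, channels)  # (height * width, channels)
    return flat.T @ flat                       # (channels, channels)

# hypothetical feature maps of one layer: 5x5 spatial positions, 3 channels
rng = np.random.default_rng(0)
features = rng.normal(size=(5, 5, 3))
gram = gram_matrix(features)

# shuffle the spatial positions while keeping each position's channel vector
perm = rng.permutation(25)
shuffled = features.reshape(25, 3)[perm].reshape(5, 5, 3)

print(gram.shape)
print(np.allclose(gram, gram_matrix(shuffled)))
```

the Gram matrix sums over all spatial positions, so it is unchanged when positions are shuffled; this is one way to see why it describes texture statistics rather than the spatial layout of the image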

in short, use a pretrained convnet to define a loss that will do the following:
- preserve content by maintaining similar high-level layer activations between the target content image and the generated image; the convnet should "see" both images as containing the same things
- preserve style by maintaining similar correlations within activations for both low-level layers and high-level layers; feature correlations capture textures: the generated image and the style-reference image should share the same textures at different spatial scales

### Neural Style Transfer in Keras

neural style transfer can be implemented using any pretrained convnet; this example uses VGG19, a simple variant of the VGG16 network with three more convolutional layers; the general process:
- set up a network that computes VGG19 layer activations for the style-reference image, the target image, and the generated image at the same time
- use the layer activations computed over these three images to define the loss function described earlier, which is minimized in order to achieve style transfer
- set up a gradient-descent process to minimize the loss function
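
the last step differs from ordinary training in that the variable being updated is the image itself, not the network weights; the toy NumPy sketch below shows only that pattern, assuming a made-up quadratic loss (squared distance to a fixed `target` array) in place of the real style-transfer loss

```python
import numpy as np

# toy stand-in for the real loss: squared distance to a fixed target "image"
target = np.array([[0.0, 1.0], [2.0, 3.0]])
generated = np.zeros_like(target)  # the "image" being optimized
learning_rate = 0.1

def loss_and_grad(x):
    """Return the toy loss and its gradient with respect to the pixels x."""
    diff = x - target
    return float(np.sum(diff ** 2)), 2.0 * diff

initial_loss, _ = loss_and_grad(generated)
for step in range(100):
    loss, grad = loss_and_grad(generated)
    generated = generated - learning_rate * grad  # update pixels, not weights
final_loss, _ = loss_and_grad(generated)

print(initial_loss, final_loss)
```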

In [1]:
# getting the style and content images
from tensorflow import keras

base_image_path = keras.utils.get_file(
    "sf.jpg", origin="https://img-datasets.s3.amazonaws.com/sf.jpg")
style_reference_image_path = keras.utils.get_file(
    "starry_night.jpg", origin="https://img-datasets.s3.amazonaws.com/starry_night.jpg")

original_width, original_height = keras.utils.load_img(base_image_path).size
img_height = 400
img_width = round(original_width * img_height / original_height)

Downloading data from https://img-datasets.s3.amazonaws.com/sf.jpg
Downloading data from https://img-datasets.s3.amazonaws.com/starry_night.jpg


In [2]:
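# auxiliary functions
import numpy as np

def preprocess_image(image_path):
    # load and resize an image, then convert it to a batched array in VGG19 input format
    img = keras.utils.load_img(
        image_path, target_size=(img_height, img_width))
    img = keras.utils.img_to_array(img)
    img = np.expand_dims(img, axis=0)
    img = keras.applications.vgg19.preprocess_input(img)
    return img

def deprocess_image(img):
    # convert a processed array back into a valid uint8 RGB image
    img = img.reshape((img_height, img_width, 3))
    # undo the zero-centering by adding back the mean ImageNet pixel (BGR order)
    img[:, :, 0] += 103.939
    img[:, :, 1] += 116.779
    img[:, :, 2] += 123.68
    # convert from BGR back to RGB
    img = img[:, :, ::-1]
    img = np.clip(img, 0, 255).astype("uint8")
    return img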
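
the two helpers above are inverses of each other up to clipping; the NumPy-only sketch below mimics them without Keras, assuming that VGG19 preprocessing is the Caffe-style one (flip RGB to BGR, subtract the per-channel ImageNet means, no scaling); the names `caffe_style_preprocess` and `caffe_style_deprocess` are hypothetical, and unlike `deprocess_image` this sketch rounds before casting so that the round trip is exact

```python
import numpy as np

# per-channel ImageNet means in BGR order, as used in deprocess_image
IMAGENET_BGR_MEANS = np.array([103.939, 116.779, 123.68])

def caffe_style_preprocess(rgb):
    """Flip RGB to BGR and zero-center each channel."""
    bgr = rgb[..., ::-1].astype("float64")
    return bgr - IMAGENET_BGR_MEANS

def caffe_style_deprocess(x):
    """Add the means back, flip BGR to RGB, and return a uint8 image."""
    bgr = x + IMAGENET_BGR_MEANS
    rgb = bgr[..., ::-1]
    return np.clip(np.rint(rgb), 0, 255).astype("uint8")

# hypothetical 4x6 RGB image with random pixel values
rng = np.random.default_rng(0)
image = rng.integers(0, 256, size=(4, 6, 3), dtype=np.uint8)
restored = caffe_style_deprocess(caffe_style_preprocess(image))

print(np.array_equal(image, restored))
```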

In [3]:
# using a pretrained VGG19 model to create a feature extractor
model = keras.applications.vgg19.VGG19(weights="imagenet", include_top=False)

outputs_dict = dict([(layer.name, layer.output) for layer in model.layers])
feature_extractor = keras.Model(inputs=model.inputs, outputs=outputs_dict)

Downloading data from https://storage.googleapis.com/tensorflow/keras-applications/vgg19/vgg19_weights_tf_dim_ordering_tf_kernels_notop.h5


In [4]:
# content loss
import tensorflow as tf

def content_loss(base_img, combination_img):
    # sum of squared differences between the two sets of activations
    return tf.reduce_sum(tf.square(combination_img - base_img))

In [5]:
# style loss
def gram_matrix(x):
    x = tf.transpose(x, (2, 0, 1))
    features = tf.reshape(x, (tf.shape(x)[0], -1))
    gram = tf.matmul(features, tf.transpose(features))
    return gram

def style_loss(style_img, combination_img):
    S = gram_matrix(style_img)
    C = gram_matrix(combination_img)
    channels = 3
    size = img_height * img_width
    return tf.reduce_sum(tf.square(S - C)) / (4.0 * (channels ** 2) * (size ** 2))

In [6]:
# total variation loss
def total_variation_loss(x):
    a = tf.square(
        x[:, : img_height - 1, : img_width - 1, :] - x[:, 1:, : img_width - 1, :]
    )
    b = tf.square(
        x[:, : img_height - 1, : img_width - 1, :] - x[:, : img_height - 1, 1:, :]
    )
    return tf.reduce_sum(tf.pow(a + b, 1.25))
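
the total variation loss acts as a regularizer that penalizes differences between neighboring pixels, which encourages spatial continuity in the generated image; the sketch below is a NumPy re-implementation of the same formula for a single unbatched image, with a hypothetical flat image and a noisy copy of it as inputs

```python
import numpy as np

def total_variation(img):
    """Total variation of a single (height, width, channels) image."""
    # squared differences between vertically adjacent pixels
    a = np.square(img[:-1, :-1, :] - img[1:, :-1, :])
    # squared differences between horizontally adjacent pixels
    b = np.square(img[:-1, :-1, :] - img[:-1, 1:, :])
    return float(np.sum(np.power(a + b, 1.25)))

# a perfectly flat image and a noisy version of it
flat = np.full((8, 8, 3), 0.5)
rng = np.random.default_rng(0)
noisy = flat + rng.normal(scale=0.2, size=flat.shape)

print(total_variation(flat))
print(total_variation(noisy))
```

the flat image has zero total variation, while the noisy one is penalized, so adding this term to the loss pushes the optimizer away from overly pixelated results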

In [7]:
# defining the final loss minimized

style_layer_names = [
    "block1_conv1",
    "block2_conv1",
    "block3_conv1",
    "block4_conv1",
    "block5_conv1",
]
content_layer_name = "block5_conv2"
total_variation_weight = 1e-6
style_weight = 1e-6
content_weight = 2.5e-8

def compute_loss(combination_image, base_image, style_reference_image):
    input_tensor = tf.concat(
        [base_image, style_reference_image, combination_image], axis=0
    )
    features = feature_extractor(input_tensor)
    loss = tf.zeros(shape=())
    layer_features = features[content_layer_name]
    base_image_features = layer_features[0, :, :, :]
    combination_features = layer_features[2, :, :, :]
    loss = loss + content_weight * content_loss(
        base_image_features, combination_features
    )
    for layer_name in style_layer_names:
        layer_features = features[layer_name]
        style_reference_features = layer_features[1, :, :, :]
        combination_features = layer_features[2, :, :, :]
        style_loss_value = style_loss(
            style_reference_features, combination_features
        )
        loss += (style_weight / len(style_layer_names)) * style_loss_value

    loss += total_variation_weight * total_variation_loss(combination_image)
    return loss

In [None]:
# setting up the gradient-descent process
import tensorflow as tf

@tf.function
def compute_loss_and_grads(combination_image, base_image, style_reference_image):
    with tf.GradientTape() as tape:
        loss = compute_loss(combination_image, base_image, style_reference_image)
    grads = tape.gradient(loss, combination_image)
    return loss, grads

optimizer = keras.optimizers.SGD(
    keras.optimizers.schedules.ExponentialDecay(
        initial_learning_rate=100.0, decay_steps=100, decay_rate=0.96
    )
)

base_image = preprocess_image(base_image_path)
style_reference_image = preprocess_image(style_reference_image_path)
combination_image = tf.Variable(preprocess_image(base_image_path))

iterations = 4000
for i in range(1, iterations + 1):
    loss, grads = compute_loss_and_grads(
        combination_image, base_image, style_reference_image
    )
    optimizer.apply_gradients([(grads, combination_image)])
    if i % 100 == 0:
        print(f"Iteration {i}: loss={loss:.2f}")
        img = deprocess_image(combination_image.numpy())
        fname = f"combination_image_at_iteration_{i}.png"
        keras.utils.save_img(fname, img)
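
the optimizer above starts with a learning rate of 100 and shrinks it gradually; the pure-Python sketch below reproduces the schedule under the assumption that `ExponentialDecay` with its default non-staircase behavior computes `initial_learning_rate * decay_rate ** (step / decay_steps)`; the helper name `exponential_decay` is hypothetical

```python
def exponential_decay(step, initial_learning_rate=100.0,
                      decay_steps=100, decay_rate=0.96):
    """Learning rate at a given step under continuous exponential decay."""
    return initial_learning_rate * decay_rate ** (step / decay_steps)

# learning rate at a few points of the 4000-iteration run
for step in (0, 100, 1000, 4000):
    print(step, round(exponential_decay(step), 2))
```

with these settings the learning rate drops by 4% every 100 steps, so the updates become progressively smaller as the generated image settles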