# Machine Vision
## Lab 1 - Neural Style Transfer
*Author: Paweł Mendroch* - [Github](https://github.com/FrozenTear7/computer-vision-lab/tree/master/lab1)

The script below is based on the [Keras neural style transfer tutorial](https://keras.io/examples/generative/neural_style_transfer/).

In [None]:
import numpy as np
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras.applications import vgg19

I prepare 5 sample images for style transfer, one from each class folder. Their relative paths go into lists, which I then iterate over, transferring the style of a real image (from the real_train folder) onto a synthetic image.
I also initialize the weights of the individual loss components.

In [None]:
start_index = 0
max_index = 5
base_image_paths = [
    "./hiragana/synthetic_8_frames_RGBA/01_a/01_a.000_Camera_0000.png",
    "./hiragana/synthetic_8_frames_RGBA/02_i/02_i.000_Camera_0000.png",
    "./hiragana/synthetic_8_frames_RGBA/03_u/03_u.000_Camera_0000.png",
    "./hiragana/synthetic_8_frames_RGBA/04_e/04_e.000_Camera_0000.png",
    "./hiragana/synthetic_8_frames_RGBA/05_o/05_o.000_Camera_0000.png",
]
style_reference_image_paths = [
    "./hiragana/real_train/01_a/real_train_01_a_0.png",
    "./hiragana/real_train/02_i/real_train_02_i_0.png",
    "./hiragana/real_train/03_u/real_train_03_u_0.png",
    "./hiragana/real_train/04_e/real_train_04_e_0.png",
    "./hiragana/real_train/05_o/real_train_05_o_0.png",
]
result_prefixes = ["01_a", "02_i", "03_u", "04_e", "05_o"]

# Weights of the different loss components
total_variation_weight = 1e-6
style_weight = 1e-6
content_weight = 2.5e-8
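The five path pairs above follow a fixed naming pattern, so the lists could also be generated from the class prefixes instead of being typed out by hand. A minimal sketch, assuming every class folder uses exactly the file names shown above; `synthetic_path` and `real_path` are hypothetical helpers, not part of the notebook:

```python
# Class prefixes, identical to result_prefixes in the notebook
result_prefixes = ["01_a", "02_i", "03_u", "04_e", "05_o"]


def synthetic_path(prefix):
    # Hypothetical helper: the synthetic file ending in ".000_Camera_0000.png" for this class
    return f"./hiragana/synthetic_8_frames_RGBA/{prefix}/{prefix}.000_Camera_0000.png"


def real_path(prefix):
    # Hypothetical helper: the real_train file ending in "_0.png" for this class
    return f"./hiragana/real_train/{prefix}/real_train_{prefix}_0.png"


base_image_paths = [synthetic_path(p) for p in result_prefixes]
style_reference_image_paths = [real_path(p) for p in result_prefixes]

for base, style in zip(base_image_paths, style_reference_image_paths):
    print(base, "<-", style)
```

Adding a new class then only requires extending `result_prefixes`, which keeps the three lists aligned by index.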

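We define an image preprocessing function based on vgg19, which loads the images, resizes them and formats them into suitable tensors. The target size is read from the global variables img_nrows and img_ncols, which are set in the training loop for each image.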

In [None]:
def preprocess_image(image_path):
    # Util function to open, resize and format pictures into appropriate tensors
    img = keras.preprocessing.image.load_img(
        image_path, target_size=(img_nrows, img_ncols)
    )
    img = keras.preprocessing.image.img_to_array(img)
    img = np.expand_dims(img, axis=0)
    img = vgg19.preprocess_input(img)
    return tf.convert_to_tensor(img)
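`vgg19.preprocess_input` uses the "caffe" convention: it converts RGB to BGR and subtracts the ImageNet mean of each channel, without scaling. The NumPy sketch below reproduces that arithmetic and the height-400 resizing rule from the training loop, so it can be inspected without TensorFlow; `caffe_preprocess` and `target_size` are illustrative names, not Keras functions:

```python
import numpy as np

# ImageNet channel means in BGR order (the values deprocess_image adds back)
IMAGENET_BGR_MEAN = np.array([103.939, 116.779, 123.68], dtype=np.float32)


def target_size(width, height, nrows=400):
    # Same rule as the training loop: fixed height, width scaled to keep the aspect ratio
    return nrows, int(width * nrows / height)


def caffe_preprocess(rgb):
    # rgb: array of shape (H, W, 3) with values in [0, 255]
    bgr = rgb[..., ::-1].astype(np.float32)  # RGB -> BGR
    return bgr - IMAGENET_BGR_MEAN  # zero-center each channel, no scaling to [0, 1]


nrows, ncols = target_size(width=640, height=480)
print(nrows, ncols)
print(caffe_preprocess(np.array([[[255.0, 0.0, 0.0]]])))  # a single pure-red pixel
```

A pure red pixel ends up mostly in the last (R) channel after the flip, with the other two channels becoming negative mean offsets.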

We also define an image deprocessing function that converts tensors back into actual images: it adds back the ImageNet channel means and converts BGR to RGB.

In [None]:
def deprocess_image(x):
    # Util function to convert a tensor into a valid image
    x = x.reshape((img_nrows, img_ncols, 3))
    # Remove zero-center by mean pixel
    x[:, :, 0] += 103.939
    x[:, :, 1] += 116.779
    x[:, :, 2] += 123.68
    # 'BGR'->'RGB'
    x = x[:, :, ::-1]
    x = np.clip(x, 0, 255).astype("uint8")
    return x
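`deprocess_image` reads the image size from the globals `img_nrows` and `img_ncols`. A sketch of a hypothetical variant that takes the size as arguments and can be round-trip tested against the preprocessing arithmetic; unlike the notebook version it copies its input and rounds before casting, because `astype("uint8")` truncates, so a value like 254.99998 would otherwise become 254:

```python
import numpy as np

IMAGENET_BGR_MEAN = np.array([103.939, 116.779, 123.68], dtype=np.float32)


def deprocess_image_sized(x, nrows, ncols):
    # Hypothetical variant: explicit size instead of the global img_nrows / img_ncols
    x = np.array(x, dtype=np.float32).reshape((nrows, ncols, 3))  # copy, input stays untouched
    x += IMAGENET_BGR_MEAN  # undo zero-centering (channels are still in BGR order)
    x = x[:, :, ::-1]  # BGR -> RGB
    return np.rint(np.clip(x, 0, 255)).astype("uint8")  # round, then cast


# Round trip: a preprocessed batch of shape (1, H, W, 3) maps back to the original pixels
rgb = np.random.default_rng(0).integers(0, 256, size=(4, 6, 3)).astype(np.float32)
preprocessed = (rgb[..., ::-1] - IMAGENET_BGR_MEAN)[np.newaxis]
restored = deprocess_image_sized(preprocessed, 4, 6)
print(restored.shape, restored.dtype)
```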

Gram matrices are used to compute the style loss: the *style_loss* function keeps the style of the reference image in the generated image by comparing the Gram matrices of their feature maps.
The *content_loss* function preserves the content of the base image, and the *total_variation_loss* function keeps the generated image locally coherent.

In [None]:
# The gram matrix of an image tensor (feature-wise outer product)
def gram_matrix(x):
    x = tf.transpose(x, (2, 0, 1))
    features = tf.reshape(x, (tf.shape(x)[0], -1))
    gram = tf.matmul(features, tf.transpose(features))
    return gram

# The "style loss" is designed to maintain
# the style of the reference image in the generated image.
# It is based on the gram matrices (which capture style) of
# feature maps from the style reference image
# and from the generated image

def style_loss(style, combination):
    S = gram_matrix(style)
    C = gram_matrix(combination)
    channels = 3
    size = img_nrows * img_ncols
    return tf.reduce_sum(tf.square(S - C)) / (4.0 * (channels ** 2) * (size ** 2))

# An auxiliary loss function
# designed to maintain the "content" of the
# base image in the generated image

def content_loss(base, combination):
    return tf.reduce_sum(tf.square(combination - base))

# The 3rd loss function, total variation loss,
# designed to keep the generated image locally coherent

def total_variation_loss(x):
    a = tf.square(
        x[:, : img_nrows - 1, : img_ncols - 1, :] - x[:, 1:, : img_ncols - 1, :]
    )
    b = tf.square(
        x[:, : img_nrows - 1, : img_ncols - 1, :] - x[:, : img_nrows - 1, 1:, :]
    )
    return tf.reduce_sum(tf.pow(a + b, 1.25))
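To make the three losses concrete, here is a NumPy re-implementation of the functions above applied to tiny arrays whose results can be checked by hand. It is a sketch for illustration only; it keeps the notebook's normalization (fixed `channels=3`, `size = nrows * ncols`) but passes the size explicitly instead of using globals:

```python
import numpy as np


def gram_matrix(x):
    # x: feature map (H, W, C) -> F of shape (C, H*W); Gram = F @ F.T, shape (C, C)
    features = x.reshape(-1, x.shape[-1]).T
    return features @ features.T


def style_loss(style, combination, nrows, ncols, channels=3):
    # Squared Gram difference with the same normalization as the notebook
    S, C = gram_matrix(style), gram_matrix(combination)
    size = nrows * ncols
    return np.sum((S - C) ** 2) / (4.0 * channels**2 * size**2)


def content_loss(base, combination):
    return np.sum((combination - base) ** 2)


def total_variation_loss(x):
    # x: batch (1, H, W, C); compare each pixel with its lower and right neighbours
    h, w = x.shape[1], x.shape[2]
    a = (x[:, : h - 1, : w - 1, :] - x[:, 1:, : w - 1, :]) ** 2
    b = (x[:, : h - 1, : w - 1, :] - x[:, : h - 1, 1:, :]) ** 2
    return np.sum((a + b) ** 1.25)


# 2x2 feature map with 2 channels: pixels (0, 1), (2, 3), (4, 5), (6, 7)
feature_map = np.arange(8, dtype=np.float64).reshape(2, 2, 2)
print(gram_matrix(feature_map))  # expected: [[56, 68], [68, 84]]
print(style_loss(feature_map, np.zeros_like(feature_map), 2, 2))
print(total_variation_loss(np.array([[0.0, 1.0], [2.0, 3.0]]).reshape(1, 2, 2, 1)))
```

In the last example the top-left pixel differs by 2 from its lower neighbour and by 1 from its right neighbour, so the total variation term is (4 + 1)^1.25.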

Next, we build the model and extract the activations of successive VGG19 layers.

In [None]:
# Build a VGG19 model loaded with pre-trained ImageNet weights
model = vgg19.VGG19(weights="imagenet", include_top=False)

# Get the symbolic outputs of each "key" layer (we gave them unique names).
outputs_dict = dict([(layer.name, layer.output) for layer in model.layers])

# Set up a model that returns the activation values for every layer in
# VGG19 (as a dict).
feature_extractor = keras.Model(inputs=model.inputs, outputs=outputs_dict)

Below we compute the style transfer loss, using several layers for the style loss and a single layer for the content loss.

In [None]:
# List of layers to use for the style loss.
style_layer_names = [
    "block1_conv1",
    "block2_conv1",
    "block3_conv1",
    "block4_conv1",
    "block5_conv1",
]
# The layer to use for the content loss.
content_layer_name = "block5_conv2"

def compute_loss(combination_image, base_image, style_reference_image):
    input_tensor = tf.concat(
        [base_image, style_reference_image, combination_image], 0
    )
    features = feature_extractor(input_tensor)

    # Initialize the loss
    loss = tf.zeros(shape=())

    # Add content loss
    layer_features = features[content_layer_name]
    base_image_features = layer_features[0, :, :, :]
    combination_features = layer_features[2, :, :, :]
    loss = loss + content_weight * content_loss(
        base_image_features, combination_features
    )
    # Add style loss
    for layer_name in style_layer_names:
        layer_features = features[layer_name]
        style_reference_features = layer_features[1, :, :, :]
        combination_features = layer_features[2, :, :, :]
        sl = style_loss(style_reference_features, combination_features)
        loss += (style_weight / len(style_layer_names)) * sl

    # Add total variation loss
    loss += total_variation_weight * total_variation_loss(combination_image)
    return loss
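`compute_loss` stacks the three images into one batch (index 0: base, 1: style reference, 2: combination) and reads each layer's features for the relevant index. The feature maps of the chosen layers are much smaller and deeper than the input image, while `style_loss` normalizes with the fixed values `channels = 3` and `size = img_nrows * img_ncols`, so every style layer gets the same constant normalization regardless of its real dimensions. A pure-Python sketch of the expected feature shapes, assuming the standard VGG19 layout ("same"-padded convolutions, a 2x2 stride-2 max-pooling before blocks 2-5); the 400x600 input is only an example, since the real width depends on each image's aspect ratio:

```python
# Number of filters in the conv layers of each VGG19 block (block1 ... block5)
VGG19_BLOCK_CHANNELS = [64, 128, 256, 512, 512]


def vgg19_feature_shape(layer_name, nrows, ncols):
    # layer_name like "block3_conv1"; convolutions keep the spatial size, and each
    # pooling before blocks 2-5 halves it ("valid" padding -> floor division)
    block = int(layer_name.split("_")[0].replace("block", ""))
    h, w = nrows, ncols
    for _ in range(block - 1):
        h, w = h // 2, w // 2
    return h, w, VGG19_BLOCK_CHANNELS[block - 1]


style_layer_names = ["block1_conv1", "block2_conv1", "block3_conv1", "block4_conv1", "block5_conv1"]
for name in style_layer_names + ["block5_conv2"]:
    print(name, vgg19_feature_shape(name, 400, 600))
```

With a 400x600 input, for example, `block5_conv1` yields a 25x37 map with 512 channels, far from the 3 channels and 400x600 positions assumed by the normalization.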

We wrap the loss and gradient computation in a `tf.function` so that TensorFlow compiles it into a graph and runs it faster. We then run the optimization for 2000 steps per image, saving the result every 100 iterations and observing how the loss changes.

In [None]:
@tf.function
def compute_loss_and_grads(combination_image, base_image, style_reference_image):
    with tf.GradientTape() as tape:
        loss = compute_loss(combination_image, base_image, style_reference_image)
    grads = tape.gradient(loss, combination_image)
    return loss, grads

for i in range(start_index, max_index):
    base_image_path = base_image_paths[i]
    style_reference_image_path = style_reference_image_paths[i]
    result_prefix = result_prefixes[i]
    print(i)
    print(base_image_path)
    print(style_reference_image_path)
    print(result_prefix)

    # Dimensions of the generated picture.
    width, height = keras.preprocessing.image.load_img(base_image_path).size
    img_nrows = 400
    img_ncols = int(width * img_nrows / height)

    base_image = preprocess_image(base_image_path)
    style_reference_image = preprocess_image(style_reference_image_path)
    combination_image = tf.Variable(preprocess_image(base_image_path))

    # Fresh optimizer per image, so the ExponentialDecay schedule restarts at 100.0
    # (the original notebook created one optimizer before this loop, so its step
    # counter and learning rate carried over from one image to the next).
    optimizer = keras.optimizers.SGD(
        keras.optimizers.schedules.ExponentialDecay(
            initial_learning_rate=100.0, decay_steps=100, decay_rate=0.96
        )
    )

    iterations = 2000
    # `step` instead of `i`, so the inner loop does not shadow the image index
    for step in range(1, iterations + 1):
        loss, grads = compute_loss_and_grads(
            combination_image, base_image, style_reference_image
        )
        optimizer.apply_gradients([(grads, combination_image)])
        if step % 100 == 0:
            print("Iteration %d: loss=%.2f" % (step, loss))
            img = deprocess_image(combination_image.numpy())
            fname = result_prefix + "_at_iteration_%d.png" % step
            keras.preprocessing.image.save_img(fname, img)
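The optimizer uses `keras.optimizers.schedules.ExponentialDecay` with `initial_learning_rate=100.0`, `decay_steps=100` and `decay_rate=0.96`; with the default `staircase=False` the learning rate is `initial_learning_rate * decay_rate ** (step / decay_steps)`, where `step` is the optimizer's own iteration counter. The plain-Python sketch below re-implements only that formula (it is not the Keras object) to show how much the rate drops within one run, and how low it would start for later images if a single optimizer instance, whose counter is never reset, were reused for all five 2000-step runs:

```python
def exponential_decay(step, initial_learning_rate=100.0, decay_steps=100, decay_rate=0.96):
    # Continuous (staircase=False) ExponentialDecay formula
    return initial_learning_rate * decay_rate ** (step / decay_steps)


iterations_per_image = 2000
print("learning rate after 2000 steps:", round(exponential_decay(2000), 2))

for image_index in range(5):
    # Step counter at the start of each image if one optimizer is shared by all runs
    first_step = image_index * iterations_per_image
    print(image_index, "fresh:", exponential_decay(0),
          "shared:", round(exponential_decay(first_step), 2))
```

This is worth keeping in mind when comparing the loss curves of the five runs, since the effective step size differs a lot between a fresh and a reused optimizer.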

## Conclusions
The result images below, obtained after 2000 iterations for each example, differ from the examples presented in the publication. The style was nevertheless transferred correctly; the remaining problem is image quality, or more precisely the graphical artifacts visible in the images.
This may be caused by the size and quality of the base images or by the chosen parameters.

![01 result](./results/01_result.png "01 result")
![02 result](./results/02_result.png "02 result")
![03 result](./results/03_result.png "03 result")
![04 result](./results/04_result.png "04 result")
![05 result](./results/05_result.png "05 result")

Unfortunately, searching for better-suited parameters would take considerable time, so in the end I kept the parameters chosen in the Keras tutorial.
Below is the progression of the loss values over successive iterations for the 5 images.

```
01_a
Iteration 100: loss=3221.62
Iteration 200: loss=1280.79
Iteration 300: loss=3838.59
Iteration 400: loss=1145.22
Iteration 500: loss=2216.15
Iteration 600: loss=1193.34
Iteration 700: loss=803.61
Iteration 800: loss=838.52
Iteration 900: loss=1271.66
Iteration 1000: loss=731.16
Iteration 1100: loss=584.26
Iteration 1200: loss=735.74
Iteration 1300: loss=443.89
Iteration 1400: loss=320.22
Iteration 1500: loss=297.92
Iteration 1600: loss=284.94
Iteration 1700: loss=276.19
Iteration 1800: loss=269.70
Iteration 1900: loss=264.57
Iteration 2000: loss=260.37
```

```
02_i
Iteration 100: loss=594.70
Iteration 200: loss=397.21
Iteration 300: loss=318.43
Iteration 400: loss=277.26
Iteration 500: loss=251.60
Iteration 600: loss=233.86
Iteration 700: loss=220.97
Iteration 800: loss=210.98
Iteration 900: loss=203.14
Iteration 1000: loss=196.88
Iteration 1100: loss=191.67
Iteration 1200: loss=187.29
Iteration 1300: loss=183.52
Iteration 1400: loss=180.28
Iteration 1500: loss=177.44
Iteration 1600: loss=174.95
Iteration 1700: loss=172.72
Iteration 1800: loss=170.70
Iteration 1900: loss=168.88
Iteration 2000: loss=167.25
```

```
03_u
Iteration 100: loss=1472.73
Iteration 200: loss=1070.47
Iteration 300: loss=884.30
Iteration 400: loss=774.16
Iteration 500: loss=700.18
Iteration 600: loss=648.66
Iteration 700: loss=610.49
Iteration 800: loss=580.96
Iteration 900: loss=557.37
Iteration 1000: loss=538.10
Iteration 1100: loss=521.87
Iteration 1200: loss=507.90
Iteration 1300: loss=495.84
Iteration 1400: loss=485.35
Iteration 1500: loss=475.98
Iteration 1600: loss=467.68
Iteration 1700: loss=460.16
Iteration 1800: loss=453.39
Iteration 1900: loss=447.36
Iteration 2000: loss=441.90
```

```
04_e
Iteration 100: loss=1645.54
Iteration 200: loss=1174.97
Iteration 300: loss=943.71
Iteration 400: loss=801.27
Iteration 500: loss=705.13
Iteration 600: loss=636.09
Iteration 700: loss=583.90
Iteration 800: loss=543.57
Iteration 900: loss=511.61
Iteration 1000: loss=485.61
Iteration 1100: loss=464.11
Iteration 1200: loss=446.05
Iteration 1300: loss=430.73
Iteration 1400: loss=417.58
Iteration 1500: loss=406.10
Iteration 1600: loss=396.08
Iteration 1700: loss=387.22
Iteration 1800: loss=379.33
Iteration 1900: loss=372.28
Iteration 2000: loss=365.94
```

```
05_o
Iteration 100: loss=3021.39
Iteration 200: loss=2339.65
Iteration 300: loss=1964.92
Iteration 400: loss=1717.22
Iteration 500: loss=1538.35
Iteration 600: loss=1402.04
Iteration 700: loss=1293.87
Iteration 800: loss=1206.12
Iteration 900: loss=1133.90
Iteration 1000: loss=1073.58
Iteration 1100: loss=1022.29
Iteration 1200: loss=978.60
Iteration 1300: loss=940.67
Iteration 1400: loss=907.26
Iteration 1500: loss=877.57
Iteration 1600: loss=851.00
Iteration 1700: loss=827.21
Iteration 1800: loss=805.83
Iteration 1900: loss=786.52
Iteration 2000: loss=769.02
```
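The logs above are easier to compare numerically than by eye. A small sketch (helper names are hypothetical) that parses lines in the format printed by the training loop, counts checkpoints where the loss went up, and computes the relative reduction between the first and last checkpoint; it is shown on the first six checkpoints of 01_a and 02_i copied from the logs:

```python
import re

# Matches lines printed by the loop: "Iteration 100: loss=3221.62"
LOG_LINE = re.compile(r"Iteration (\d+): loss=([\d.]+)")


def parse_log(text):
    # List of (iteration, loss) pairs; other lines (e.g. the "01_a" header) are ignored
    return [(int(it), float(loss)) for it, loss in LOG_LINE.findall(text)]


def count_increases(history):
    # Checkpoints where the loss is higher than at the previous checkpoint
    return sum(1 for (_, prev), (_, cur) in zip(history, history[1:]) if cur > prev)


def relative_reduction(history):
    # Fraction by which the loss dropped between the first and last checkpoint
    first, last = history[0][1], history[-1][1]
    return 1.0 - last / first


excerpt_01_a = """01_a
Iteration 100: loss=3221.62
Iteration 200: loss=1280.79
Iteration 300: loss=3838.59
Iteration 400: loss=1145.22
Iteration 500: loss=2216.15
Iteration 600: loss=1193.34
"""
excerpt_02_i = """02_i
Iteration 100: loss=594.70
Iteration 200: loss=397.21
Iteration 300: loss=318.43
Iteration 400: loss=277.26
Iteration 500: loss=251.60
Iteration 600: loss=233.86
"""

history_01_a = parse_log(excerpt_01_a)
history_02_i = parse_log(excerpt_02_i)
for name, history in [("01_a", history_01_a), ("02_i", history_02_i)]:
    print(name, count_increases(history), round(relative_reduction(history), 3))
```

On these excerpts the 01_a run rises twice (at iterations 300 and 500) while 02_i decreases monotonically, which matches the oscillating start of the first run visible in the full logs.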

Although the first image reaches a clearly lower loss than the third (260.37 versus 441.90 after 2000 iterations), the third image looks better in terms of visual artifacts, so the loss value does not necessarily reflect this aspect of quality.
Apart from minor artifacts, the style transfer mostly turned out correct, as intended.