### Intro ###

Neural style transfer is an algorithm that, given a reference style image, makes another image take on the visual appearance of that reference. Ever wondered what your selfie would look like if painted by Van Gogh or Picasso?

### Modules ###

I need the `sys` module to get the executable path and install some additional modules, the `time` module to time different events, `tensorflow` because this is deep learning and we need tensors, `cv2` to process images, and `numpy` to operate on matrices.

In [None]:
import sys
import time

import tensorflow as tf
import cv2
import numpy as np

Two more required modules are `tensorflow-hub`, which allows us to load already trained (pretrained) machine learning models, and `tflite-runtime`, which serves the same purpose for models that are optimized to run on computers with limited resources, like the Raspberry Pi.

The next command calls the `python` interpreter associated with the current environment, runs the pip module with `-m pip`, and asks it to install the `tensorflow-hub` and `tflite-runtime` modules.

In [None]:
!{sys.executable} -m pip install tensorflow-hub tflite-runtime
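
Outside a notebook the `!` shell escape isn't available, so the same installation could be done with the standard `subprocess` module. This is only a minimal sketch: the helper names `pip_install_command` and `pip_install` are my own and not part of any library.

```python
import subprocess
import sys


def pip_install_command(*packages):
    """Build the argument list that runs pip with the current interpreter."""
    return [sys.executable, "-m", "pip", "install", *packages]


def pip_install(*packages):
    """Run pip in a child process; raises CalledProcessError if pip fails."""
    subprocess.check_call(pip_install_command(*packages))


# Example call, commented out so that nothing gets installed by accident:
# pip_install("tensorflow-hub", "tflite-runtime")
```

Passing the arguments as a list avoids any shell quoting issues, and using `sys.executable` keeps the installation tied to the same environment the code runs in.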

Once the installation succeeds, let me load the `tensorflow_hub` module under the shorter name `hub`.

In [None]:
import tensorflow_hub as hub

### Models ###

The neural style transfer model that I'll use is the `arbitrary-image-stylization` model available on [tensorflow hub](https://www.tensorflow.org/tutorials/generative/style_transfer). Because fetching the model over the network is slow and depends on the hosting staying available, I suggest you download the model once and load it from the Raspberry Pi's storage. The following box will download the model into the current working directory. Line by line, the commands do the following:
- `wget URL -O path` downloads the file from `URL` and saves it to `path` because of the `-O` switch;
- `mkdir -p path` creates a folder at `path`, including any missing parent folders because of the `-p` switch;
- `tar -xzf archive_path -C path` extracts the content of the `tar.gz` archive named `archive_path` (if the archive type is `tar.bz2`, the switch changes to `-xjf`) into the `path` folder because of the `-C` switch.

In [None]:
!wget "https://tfhub.dev/google/magenta/arbitrary-image-stylization-v1-256/2?tf-hub-format=compressed" -O  magenta-arbitrary-image-stylization-v1-256.tar.gz
!mkdir -p magenta-arbitrary-image-stylization-v1-256
!tar -xzf magenta-arbitrary-image-stylization-v1-256.tar.gz -C magenta-arbitrary-image-stylization-v1-256
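
For reference, the same three steps can be written with Python's standard library instead of shell commands. This is a sketch under my own naming (`download_and_extract` and `extract_archive` are hypothetical helpers), and it should only be used with archives you trust, since `extractall` writes whatever paths the archive contains.

```python
import os
import tarfile
import urllib.request


def extract_archive(archive_path, target_dir):
    """Unpack a .tar.gz archive into target_dir, creating the folder if needed."""
    os.makedirs(target_dir, exist_ok=True)            # like: mkdir -p target_dir
    with tarfile.open(archive_path, "r:gz") as tar:   # like: tar -xzf ... -C target_dir
        tar.extractall(target_dir)


def download_and_extract(url, archive_path, target_dir):
    """Download a .tar.gz archive to archive_path and unpack it into target_dir."""
    urllib.request.urlretrieve(url, archive_path)     # like: wget URL -O archive_path
    extract_archive(archive_path, target_dir)
```

Calling `download_and_extract` with the URL, archive name and folder name from the box above should leave the same folder on disk, assuming the server accepts plain HTTP downloads from `urllib`.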

Next, I just load the model from the newly created folder:

In [None]:
model = hub.load('magenta-arbitrary-image-stylization-v1-256')

Now that the model is taken care of, I'll download two more models that do the same thing. I'll skip the explanations for now and return to them later in this notebook.

In [None]:
!wget "https://tfhub.dev/google/lite-model/magenta/arbitrary-image-stylization-v1-256/int8/prediction/1?lite-format=tflite" -O arbitrary-image-stylization-v1-256-prediction-int8.tflite
!wget "https://tfhub.dev/google/lite-model/magenta/arbitrary-image-stylization-v1-256/int8/transfer/1?lite-format=tflite" -O arbitrary-image-stylization-v1-256-transfer-int8.tflite

### Style Image ###

Let me download a style image.
How does _Starry Night_ by Van Gogh sound? Here's the Wikipedia [url](https://upload.wikimedia.org/wikipedia/commons/e/ea/Van_Gogh_-_Starry_Night_-_Google_Art_Project.jpg).

Or maybe _Guitariste, La mandoliniste_ by Picasso? Here's the Wikipedia [url](https://upload.wikimedia.org/wikipedia/en/c/ca/Pablo_Picasso%2C_1910-11%2C_Guitariste%2C_La_mandoliniste%2C_Woman_playing_guitar%2C_oil_on_canvas.jpg).

Or maybe _The Kiss_ by Klimt? Here's the Wikipedia [url](https://upload.wikimedia.org/wikipedia/commons/4/40/The_Kiss_-_Gustav_Klimt_-_Google_Cultural_Institute.jpg).

As I'm undecided, I'll download all of them:

In [None]:
!wget https://upload.wikimedia.org/wikipedia/commons/thumb/e/ea/Van_Gogh_-_Starry_Night_-_Google_Art_Project.jpg/1280px-Van_Gogh_-_Starry_Night_-_Google_Art_Project.jpg -O starry_night_van_gogh.jpg
!wget https://upload.wikimedia.org/wikipedia/commons/thumb/4/40/The_Kiss_-_Gustav_Klimt_-_Google_Cultural_Institute.jpg/1024px-The_Kiss_-_Gustav_Klimt_-_Google_Cultural_Institute.jpg -O the_kiss_klimt.jpg
!wget https://upload.wikimedia.org/wikipedia/en/c/ca/Pablo_Picasso%2C_1910-11%2C_Guitariste%2C_La_mandoliniste%2C_Woman_playing_guitar%2C_oil_on_canvas.jpg -O guitariste_picasso.jpg

The next step is to prepare the style image as a floating-point image with pixel values between 0 and 1. By default, `cv2.imread` reads the image with 8-bit integer pixels, with values between 0 and 255, so I convert it to float with `astype(np.float32)` and then divide by `255.0`.

There's a trick here, as the model needs a tensor with 4 dimensions and an image has only 3: using `np.newaxis` at the desired position adds a new dimension to the array. So `A[np.newaxis, :]` creates an array of shape `(1, ?)` from an original array `A` of shape `(?,)`. Another operator used here is the `Ellipsis`, written as the triple dots `...`. In this situation, it means _take everything from all the other positions_. So `A[np.newaxis, ...]` is the same as `A[np.newaxis, :, :, :]` if `A` is of shape `(?, ?, ?)`. `tf.image.resize` is very similar to `cv2.resize`, but works directly on the 4-dimensional tensor holding the image.
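
To see the scaling and the extra dimension in isolation, here is a small sketch that needs only `numpy`. The array `img` is a made-up stand-in for what `cv2.imread` returns, not a real picture.

```python
import numpy as np

# A stand-in for an image read by cv2.imread: height 4, width 6, 3 channels, 8-bit pixels.
img = np.full((4, 6, 3), 255, dtype=np.uint8)

# Convert to float32 and scale from [0, 255] to [0, 1].
scaled = img.astype(np.float32) / 255.0

# Add a leading batch dimension; both spellings give the same result.
batch_a = scaled[np.newaxis, ...]
batch_b = scaled[np.newaxis, :, :, :]

print(img.shape, batch_a.shape)          # (4, 6, 3) (1, 4, 6, 3)
print(np.array_equal(batch_a, batch_b))  # True
print(batch_a.dtype, batch_a.max())      # float32 1.0
```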

The style image needs to be resized to 256 by 256 pixels, as this is the image size used to train the neural style transfer model.

In [None]:
style_img = cv2.imread('starry_night_van_gogh.jpg')
style_img = style_img.astype(np.float32)[np.newaxis, ...] / 255.0
style_img = tf.image.resize(style_img, (256, 256))

Let me take a look at the resulting image and for that I'll use `matplotlib.pyplot`.

In [None]:
import matplotlib.pyplot as plt

New things that I use in the following box are:
- `plt.subplots`, used here just to set the size of the output graph; the method returns a figure and an axis object; when several subplots are requested, the axis object becomes subscriptable (so each subplot can be easily referenced by number), and it offers plotting methods much like those of the `plt` object;
- `::-1` on the last dimension, which represents the color channels; as I've previously mentioned, OpenCV stores them in BGR order, and `::-1` reverses that order from BGR to RGB;
- again the `...` ellipsis, which this time stands for the two middle dimensions (height and width).

In [None]:
fig, ax = plt.subplots(figsize = (6, 6))
ax.imshow(style_img[0, ..., ::-1])
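
The channel reversal used above is easy to check on a single pixel. A minimal sketch with a made-up one-pixel image, using only `numpy`:

```python
import numpy as np

# One pure-blue pixel as OpenCV would store it: (B, G, R) = (255, 0, 0).
bgr = np.array([[[255, 0, 0]]], dtype=np.uint8)   # shape (1, 1, 3)

# Reverse only the last axis; height and width are left untouched.
rgb = bgr[..., ::-1]

print(rgb[0, 0].tolist())   # [0, 0, 255]
```

In RGB order the blue value sits in the last position, which is what `imshow` expects.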

### Styling ###

First, I'll capture an image from the attached camera using `cv2.VideoCapture` to create a camera object.

In [None]:
camera = cv2.VideoCapture(0)

Because I want to time everything, I use `time.time` to record the start and end times. The current camera image is retrieved using the `read` method of the previously defined camera object. The image is then converted to `float` and resized. Resizing is not strictly needed, as the model works with arbitrarily sized images, but it makes the model run around 4 times faster. A tensorflow hub model is invoked by calling it on the input data, which in this case is the image that needs to be stylized and the image that provides the style. The output image is the first output of the model.

In [None]:
begin_time = time.time()
_, current_img = camera.read()
current_img = current_img.astype(np.float32)[np.newaxis, ...] / 255
current_img = tf.image.resize(current_img, (256, 256))
outputs = model(tf.constant(current_img), tf.constant(style_img))
stylized_image = outputs[0]
end_time = time.time()
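
If many steps need timing, the begin/end pattern above could be wrapped in a small context manager. This is just a sketch: `timed` is a hypothetical helper of my own, and it uses `time.perf_counter`, a clock intended for measuring short intervals, instead of `time.time`.

```python
import time
from contextlib import contextmanager


@contextmanager
def timed(results, label):
    """Store the elapsed seconds of the with-block in results[label]."""
    start = time.perf_counter()
    try:
        yield
    finally:
        results[label] = time.perf_counter() - start


timings = {}
with timed(timings, "sleep"):
    time.sleep(0.05)   # placeholder for camera.read() plus the model call

print(f"{timings['sleep']:.3f} s")
```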

Remember to release the camera so you can use it in other scripts as well.

In [None]:
camera.release()
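
To make sure the camera is released even when something fails in between, the capture could be wrapped in a context manager. A minimal sketch: `releasing` is a hypothetical helper, and `FakeCamera` is a made-up stand-in for `cv2.VideoCapture` so that the example runs without a camera attached.

```python
from contextlib import contextmanager


@contextmanager
def releasing(resource):
    """Yield the resource and always call its release() method afterwards."""
    try:
        yield resource
    finally:
        resource.release()


class FakeCamera:
    """Stand-in for cv2.VideoCapture so the sketch runs without a camera."""

    def __init__(self):
        self.released = False

    def read(self):
        return True, "frame"

    def release(self):
        self.released = True


fake_camera = FakeCamera()
with releasing(fake_camera) as cam:
    ok, frame = cam.read()

print(fake_camera.released)   # True
```

With a real camera the same pattern would read `with releasing(cv2.VideoCapture(0)) as cam:`.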

### Results ###

Checking the time it took for the inference:

In [None]:
print(end_time - begin_time)

That's a pretty long time, as the algorithm is quite complex, as discussed previously. Now, to see the results, I use the full power of `plt.subplots` by providing its first and second arguments, which are the number of rows and the number of columns, respectively, in which the subplots are organized. In this situation, the axis object becomes indexable, so I can access each subplot independently.

As my camera is upside down, I use `::-1` on the second and third positions of the tensor slice for the stylized output image, which reverses the order of the pixels along both the vertical and the horizontal axes.
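
Reversing both axes amounts to a 180 degree rotation, which can be checked on a tiny made-up array with `numpy` alone:

```python
import numpy as np

# A tiny 2x3 single-channel "image" inside a batch of one: shape (1, 2, 3, 1).
batch = np.arange(6).reshape(1, 2, 3, 1)

# Reverse the rows (axis 1) and the columns (axis 2), keep batch and channels.
flipped = batch[:, ::-1, ::-1, :]

print(batch[0, :, :, 0].tolist())    # [[0, 1, 2], [3, 4, 5]]
print(flipped[0, :, :, 0].tolist())  # [[5, 4, 3], [2, 1, 0]]
```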

In [None]:
fig, ax = plt.subplots(1, 2, figsize = (12, 6))
_ = ax[0].imshow(style_img[0, ..., ::-1])
_ = ax[0].set_title('Style Image')
_ = ax[1].imshow(stylized_image[0, ::-1, ::-1, ::-1])
_ = ax[1].set_title('Processed Image')

Looks pretty good, right?

### Improvements ###

But still, even on pretty small images, the algorithm is very slow. To speed it up, I can divide it into two parts: one that extracts the style data and one that applies the extracted style data onto a new image. Tensorflow Hub provides two versions of each of the two parts of the neural style transfer model. I'll use the int8 version, as it's better suited for CPU usage, while the float16 version is built with graphics cards in mind.

The prediction model is the one that converts the style image into a style data embedding, like the face embeddings that I discussed in the previous notebook. Tensorflow Lite is a runtime that takes pretrained models and runs inference with them. After loading the model, the memory for its tensors needs to be allocated. The `get_input_details` and `get_output_details` methods then return information about each input and output tensor, including the index (under the `index` key) used to address the tensor's memory.

In [None]:
pred_model = tf.lite.Interpreter(model_path="arbitrary-image-stylization-v1-256-prediction-int8.tflite")
pred_model.allocate_tensors()

pred_input_details = pred_model.get_input_details()
pred_output_details = pred_model.get_output_details()

Let me check what the input and the output look like:

In [None]:
pred_input_details

In [None]:
pred_output_details
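
The raw dictionaries are quite verbose, so a short summary can be easier to read. The sketch below assumes only the `name`, `index`, `shape` and `dtype` keys of each entry; `summarize_details` is a hypothetical helper, and the example entry is made up rather than copied from the model.

```python
def summarize_details(details):
    """Return (name, index, shape, dtype name) for each tensor description."""
    summary = []
    for d in details:
        shape = tuple(int(n) for n in d["shape"])
        dtype = getattr(d["dtype"], "__name__", str(d["dtype"]))
        summary.append((d["name"], d["index"], shape, dtype))
    return summary


# Made-up entry in the same layout as the interpreter's output.
example = [{"name": "style_image", "index": 0, "shape": [1, 256, 256, 3], "dtype": float}]
print(summarize_details(example))   # [('style_image', 0, (1, 256, 256, 3), 'float')]
```

With the real interpreter, `summarize_details(pred_input_details)` should list the expected input size at a glance.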

For the style image, the processing is similar: load the image, convert it to `np.float32`, and scale the values to the interval between 0 and 1. The required size is specified in `pred_input_details` as 256 by 256, the same as for the previous model.

In [None]:
style_img = cv2.imread('guitariste_picasso.jpg')
style_img = style_img.astype(np.float32)[np.newaxis, ...] / 255
style_img = tf.image.resize(style_img, (256, 256))

To run inference, the previously allocated input tensor first needs to be filled with the image data.

In [None]:
pred_model.set_tensor(pred_input_details[0]['index'], style_img)

Then the model needs to be invoked.

In [None]:
pred_model.invoke()

After that, the tensor data corresponding to the output index can be extracted.

In [None]:
style_data = pred_model.get_tensor(pred_output_details[0]['index'])

### Reusing Style Data ###

As the style data usually needs to be loaded only once, Tensorflow Lite allows reusing the blocks of memory allocated to tensors. I'll load the transfer model next and extract the input and output details, as previously done:

In [None]:
tran_model = tf.lite.Interpreter(model_path="arbitrary-image-stylization-v1-256-transfer-int8.tflite")
tran_model.allocate_tensors()

tran_input_details = tran_model.get_input_details()
tran_output_details = tran_model.get_output_details()

Let me check the inputs and the outputs:

In [None]:
tran_input_details

In [None]:
tran_output_details

As one of the inputs of the transfer model is the style data, it can be set only once and reused for converting multiple images afterwards:

In [None]:
tran_model.set_tensor(tran_input_details[1]['index'], style_data)

Then, I can capture some camera data by initializing a new `cv2.VideoCapture` object.

In [None]:
camera = cv2.VideoCapture(0)

The only difference from the previously used code is that I resize the image to 384 by 384, as the input details suggest. The rest combines the code used for the first full model with the invocation pattern from the prediction model.

In [None]:
begin_time = time.time()
_, current_img = camera.read()
current_img = current_img.astype(np.float32)[np.newaxis, ...] / 255
current_img = tf.image.resize(current_img, (384, 384))
tran_model.set_tensor(tran_input_details[0]['index'], current_img)
tran_model.invoke()
stylized_image = tran_model.get_tensor(tran_output_details[0]['index'])
end_time = time.time()

I should release the camera so it can be used by others as well.

In [None]:
camera.release()
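
The benefit of setting the style only once shows when several frames are processed in a row. Below is a sketch of such a loop, assuming, as described above, that the style tensor keeps its value between invocations. `stylize_frames` is a hypothetical helper, and `FakeInterpreter` is a made-up stand-in with the same three methods so the example runs without a model file.

```python
def stylize_frames(interpreter, content_index, style_index, output_index, style_data, frames):
    """Set the style tensor once, then run the transfer model on every frame."""
    interpreter.set_tensor(style_index, style_data)      # done a single time
    results = []
    for frame in frames:
        interpreter.set_tensor(content_index, frame)     # only the content changes
        interpreter.invoke()
        results.append(interpreter.get_tensor(output_index))
    return results


class FakeInterpreter:
    """Stand-in with the same three methods, so the sketch runs without a model."""

    def __init__(self):
        self.tensors = {}
        self.set_calls = []

    def set_tensor(self, index, value):
        self.set_calls.append(index)
        self.tensors[index] = value

    def invoke(self):
        # Pretend "stylizing" is just pairing the content with the style.
        self.tensors[2] = (self.tensors[0], self.tensors[1])

    def get_tensor(self, index):
        return self.tensors[index]


fake_model = FakeInterpreter()
outputs = stylize_frames(fake_model, 0, 1, 2, "style", ["frame_a", "frame_b"])
print(outputs)   # [('frame_a', 'style'), ('frame_b', 'style')]
```

With the real objects, the call would take `tran_model`, the indices from `tran_input_details` and `tran_output_details`, `style_data`, and a list of frames already converted and resized to 384 by 384.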

### Faster Results ###

The inference time has decreased almost 3 times.

In [None]:
print(end_time - begin_time)

And the results look kind of the same.

In [None]:
fig, ax = plt.subplots(1, 2, figsize = (12, 6))
_ = ax[0].imshow(style_img[0, ..., ::-1])
_ = ax[0].set_title('Style Image')
_ = ax[1].imshow(stylized_image[0, ::-1, ::-1, ::-1])
_ = ax[1].set_title('Processed Image')