# Intro

CNNs are great for image classification, and they are also useful for image reconstruction.

![w](cnn_img/p1.png)

There are two elements:

1. Content 
2. Style (Fusing Element)

These two elements are merged to form a single image.

![w](cnn_img/p2.png)


When a CNN is trained to classify images, it learns to extract many complex features from an image. Max-pooling layers discard detailed spatial information that is irrelevant to classification. The idea is that as we go deeper into the CNN, the feature maps come to represent the content of the image rather than details about the texture and color of pixels. A CNN can therefore represent the content of an image clearly, but what about the style?

Style can be thought of as the brushstrokes found in a painting: texture, colors, curvature and so on. For style transfer we must combine the content of one image with the style of another, so how do we combine these elements?

A feature space designed to capture texture and color information is used. This space looks at the correlations between the feature maps within a layer of the network. Correlation is a measure of the relationship between two or more variables.

### Style Representation

The similarities and differences between the features in a specific layer give information about the texture and color of an image, while leaving out information about its content.

During Style Transfer:

1. Extract Content from One Image
2. Extract Style from Another Image
3. Merge Content and Style

## VGG-19 Architecture

![w](cnn_img/p3.png)

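- The network has five stacks of convolutional layers separated by max-pooling layers; each stack contains 2 to 4 convolutional layers.
- Each convolutional layer is named after its stack and its position in that stack, e.g. conv1_2.
- The deepest convolutional layer is conv5_4.
- The network ends with a few fully connected layers.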
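
The naming scheme can be reproduced with a short sketch. The per-stack counts (2, 2, 4, 4, 4) are the standard VGG-19 configuration; the helper name `vgg19_conv_names` is made up for illustration and is not part of any library.

```python
# Number of convolutional layers in each of the five VGG-19 stacks
# (standard VGG-19 configuration).
CONVS_PER_STACK = [2, 2, 4, 4, 4]


def vgg19_conv_names(convs_per_stack=CONVS_PER_STACK):
    """Return names such as 'conv1_1', in the order an image passes through them.

    Hypothetical helper, written only to illustrate the naming convention.
    """
    names = []
    for stack, count in enumerate(convs_per_stack, start=1):
        for position in range(1, count + 1):
            names.append(f"conv{stack}_{position}")
    return names


names = vgg19_conv_names()
print(len(names))  # 16 convolutional layers
print(names[0])    # conv1_1
print(names[-1])   # conv5_4
```

The 16 convolutional layers plus the 3 fully connected layers give the 19 weight layers that VGG-19 is named after.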


The output of a deep convolutional layer represents the content of the image. The last convolutional layer is conv5_4, but the content representation used here is taken from conv4_2.

![w](cnn_img/p4.png)

The style representation is extracted from the outputs of several convolutional layers throughout the network.

![w](cnn_img/p5.png)

## Content Representation

The content representation is taken from conv4_2. As we form our target image, we constantly compare the content representation of the content image with the content representation of the target image. The two should stay close even as the style of the target image changes.

# Content Loss

The content loss measures the difference between the content representations of the content image and the target image. We use the mean squared difference between the two as the content loss.

![w](cnn_img/CONTENT_LOSS.png)
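
As a rough illustration, the mean squared difference can be written in a few lines of NumPy. This is a minimal sketch, not the notebook's implementation: the function name `content_loss` and the small random arrays standing in for conv4_2 outputs are assumptions made for the example.

```python
import numpy as np


def content_loss(target_features, content_features):
    """Mean squared difference between two feature arrays of the same shape."""
    target_features = np.asarray(target_features, dtype=float)
    content_features = np.asarray(content_features, dtype=float)
    return float(np.mean((target_features - content_features) ** 2))


# Toy stand-ins for conv4_2 outputs with shape (depth, height, width).
rng = np.random.default_rng(0)
content_features = rng.normal(size=(4, 3, 3))
target_features = content_features + 0.5  # every value shifted by 0.5

# Identical representations give zero loss.
print(content_loss(content_features, content_features))
# A uniform shift of 0.5 gives a loss of about 0.5 ** 2 = 0.25.
print(content_loss(target_features, content_features))
```

The loss is zero only when the two representations match exactly, and it grows as the target's features drift away from those of the content image.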

# Gram Matrix

Now we want to measure the error between the style representation of the style image and that of the target image. This is done by looking at the similarities between the feature maps within each of several layers, which capture general colors and textures. By including layers at different depths we obtain a ***multi-scale*** style representation of the input image.

The correlations at each layer are given by a Gram matrix, which is computed as follows.

For each layer with its given depth:

1. Vectorize each feature map into a vector (flatten)

![w](cnn_img/p7.png)

If we do this for every feature map in the layer and stack the resulting vectors, we convert the 3-D block of feature maps into a 2-D matrix.

![w](cnn_img/p6.png)

The next step is to multiply the resulting matrix by its own transpose. The resultant Gram matrix contains non-localized information about the layer: each pixel position is treated as an individual sample, independent of where it sits in the image. This allows us to recognize the style regardless of the content. The Gram matrix has dimensions (# of feature maps) x (# of feature maps), and each layer has its own Gram matrix.


![w](cnn_img/p8.png)
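
The flatten-and-multiply steps can be sketched in NumPy as follows. The function name `gram_matrix` and the tiny 3 x 2 x 2 input are assumptions made for illustration; with PyTorch tensors the same two steps would apply.

```python
import numpy as np


def gram_matrix(features):
    """Gram matrix of a (depth, height, width) array of feature maps."""
    features = np.asarray(features, dtype=float)
    depth, height, width = features.shape
    # Step 1: flatten each feature map into a row -> (depth, height * width).
    flat = features.reshape(depth, height * width)
    # Step 2: multiply by the transpose -> (depth, depth).
    return flat @ flat.T


# Toy layer: 3 feature maps, each 2 x 2.
features = np.arange(12).reshape(3, 2, 2)
gram = gram_matrix(features)
print(gram.shape)  # (3, 3): one row and one column per feature map

# Non-localized: shuffling the pixel positions (the same way in every
# feature map) leaves the Gram matrix unchanged.
perm = np.random.default_rng(0).permutation(4)
shuffled = features.reshape(3, 4)[:, perm].reshape(3, 2, 2)
print(np.allclose(gram, gram_matrix(shuffled)))  # True
```

The shuffle check is the non-localized property in action: the Gram matrix records which feature maps respond together, not where in the image they respond.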


# Style Loss

The Gram matrices computed from the style image, one for each chosen layer from conv1 to conv5, are compared with the corresponding Gram matrices of our target image. A weight is applied to each layer so that the more important Gram matrices contribute more to the loss.


![w](cnn_img/style_loss.png)
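
A minimal NumPy sketch of this weighted comparison is shown below. The layer names, the weight values and the use of a plain mean squared difference as the per-layer error are assumptions for illustration; the normalization in the formula above may differ.

```python
import numpy as np


def gram_matrix(features):
    """Gram matrix of a (depth, height, width) array of feature maps."""
    features = np.asarray(features, dtype=float)
    flat = features.reshape(features.shape[0], -1)
    return flat @ flat.T


def style_loss(target_features, style_features, layer_weights):
    """Weighted sum over layers of the mean squared Gram-matrix difference."""
    loss = 0.0
    for layer, weight in layer_weights.items():
        target_gram = gram_matrix(target_features[layer])
        style_gram = gram_matrix(style_features[layer])
        loss += weight * np.mean((target_gram - style_gram) ** 2)
    return float(loss)


# Hypothetical per-layer weights; earlier layers are weighted more heavily here.
layer_weights = {"conv1_1": 1.0, "conv2_1": 0.75}

# Random arrays standing in for the feature maps of two layers.
rng = np.random.default_rng(0)
style_features = {
    "conv1_1": rng.normal(size=(4, 8, 8)),
    "conv2_1": rng.normal(size=(6, 4, 4)),
}
target_features = {
    name: maps + rng.normal(scale=0.1, size=maps.shape)
    for name, maps in style_features.items()
}

print(style_loss(style_features, style_features, layer_weights))  # 0.0
print(style_loss(target_features, style_features, layer_weights) > 0)  # True
```

Setting a layer's weight to zero removes that layer from the comparison, which is one way to see how much each scale contributes to the final style.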


# Total Loss

The total loss is the sum of the style loss and the content loss. Backpropagation and optimization then reduce this loss by changing the target image so that it matches the desired content and style. Each loss is multiplied by a constant so that the two are scaled relative to each other.

![w](cnn_img/total_loss.png)

Depending on these weights, the resultant target image will favor one component over the other, so finding a good balance between the two is key.

![w](cnn_img/p10.png)
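
The weighted sum itself is a one-liner. In the sketch below, the function name `total_loss`, the parameter names `content_weight` and `style_weight`, and the example loss values are all hypothetical; suitable weights depend on the images and are usually found by experiment.

```python
def total_loss(content_loss, style_loss, content_weight, style_weight):
    """Weighted sum of the content loss and the style loss."""
    return content_weight * content_loss + style_weight * style_loss


# Made-up loss values, used only to show how the weights shift the balance.
content, style = 2.0, 0.5

print(total_loss(content, style, content_weight=1.0, style_weight=1.0))   # 2.5
print(total_loss(content, style, content_weight=1.0, style_weight=10.0))  # 7.0
```

With the larger style weight the style term (10.0 * 0.5 = 5.0) outweighs the content term (2.0), so an optimizer would gain more by matching the style than by preserving the content.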

---
#### Implementation

# Style Transfer with Deep Neural Networks


In this notebook, we’ll *recreate* a style transfer method that is outlined in the paper, [Image Style Transfer Using Convolutional Neural Networks, by Gatys](https://www.cv-foundation.org/openaccess/content_cvpr_2016/papers/Gatys_Image_Style_Transfer_CVPR_2016_paper.pdf) in PyTorch.

In this paper, style transfer uses the features found in the 19-layer VGG Network, which is comprised of a series of convolutional and pooling layers, and a few fully-connected layers. In the image below, the convolutional layers are named by stack and their order in the stack. Conv_1_1 is the first convolutional layer that an image is passed through, in the first stack. Conv_2_1 is the first convolutional layer in the *second* stack. The deepest convolutional layer in the network is conv_5_4.

<img src='cnn_img/vgg19_convlayers.png' width=80% />

### Separating Style and Content

Style transfer relies on separating the content and style of an image. Given one content image and one style image, we aim to create a new, _target_ image which should contain our desired content and style components:
* objects and their arrangement are similar to that of the **content image**
* style, colors, and textures are similar to that of the **style image**

An example is shown below, where the content image is of a cat, and the style image is of [Hokusai's Great Wave](https://en.wikipedia.org/wiki/The_Great_Wave_off_Kanagawa). The generated target image still contains the cat but is stylized with the waves, blue and beige colors, and block print textures of the style image!

<img src='cnn_img/style_tx_cat.png' width=80% />

In this notebook, we'll use a pre-trained VGG19 Net to extract content or style features from a passed in image. We'll then formalize the idea of content and style _losses_ and use those to iteratively update our target image until we get a result that we want. You are encouraged to use a style and content image of your own and share your work on Twitter with @udacity; we'd love to see what you come up with!

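```python
import torch

# Use the GPU when one is available, otherwise fall back to the CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
device  # display the selected device
```

Output:

```
device(type='cpu')
```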