Run the code by typing:
python3 generate_art.py -c [CONTENT FILEPATH] -s [STYLE FILEPATH]
Here the content file is the original image, and the style file contains the artwork image. The script generates an image similar to the content file, rendered in the style of the style file. Please allow some time for the model to train. You can locate the generated models in:
/content
Thank you to Coursera and Andrew Ng for the helpful lectures that taught this content!
Neural Style Transfer (NST) is a technique that blends a content image with the style of an artwork.
The key idea behind this is that the shallow hidden layers of a CNN detect low-level features, such as horizontal contrast patterns, while the deep hidden layers detect high-level features, such as face shapes and object classes.
So, given a generated image $G$, the total cost is:

$J(G) = \alpha J_{content}(h_C, h_G) + \beta J_{style}(h_S, h_G)$

- $\alpha J_{content}(h_C, h_G)$ represents the content loss function, where $h_C$ denotes the hidden-layer activations of the content image and $h_G$ those of the generated image.
- $\beta J_{style}(h_S, h_G)$ represents the style loss function, where $h_S$ denotes the hidden-layer activations of the style image and $h_G$ those of the generated image.
- $\alpha$ and $\beta$ are arbitrary constants that weight each component.
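As a rough illustration, the total cost is just a weighted sum of the two losses. The sketch below assumes the per-layer losses have already been computed; the values of `alpha` and `beta` are placeholders, not necessarily the ones used in `generate_art.py`.

```python
# Minimal sketch of the total cost J(G); alpha and beta are placeholder weights.
def total_cost(j_content, j_style, alpha=10.0, beta=40.0):
    """Weighted sum of the content cost and the style cost."""
    return alpha * j_content + beta * j_style
```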
We will use a pre-trained model, the VGG network, to apply transfer learning. Specifically, we will be using VGG-19, a 19-layer VGG network.
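For reference, here is a minimal sketch of how a pre-trained VGG-19 can be wrapped as a fixed feature extractor with `tf.keras`. The layer names chosen here are purely illustrative; the layers actually used by `generate_art.py` may differ.

```python
import tensorflow as tf

# Load VGG-19 pre-trained on ImageNet, without the classification head.
vgg = tf.keras.applications.VGG19(include_top=False, weights="imagenet")
vgg.trainable = False  # the network is only used as a fixed feature extractor

# Pick hidden layers whose activations will be compared (names are illustrative).
layer_names = ["block1_conv1", "block3_conv1", "block5_conv4"]
feature_extractor = tf.keras.Model(
    inputs=vgg.input,
    outputs=[vgg.get_layer(name).output for name in layer_names],
)
```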
We will first talk about the content cost function $J_{content}(h_C, h_G)$, which can be computed as follows:
- $J_{content}(h_C, h_G) = \frac{1}{4HWD} \sum (h_C - h_G)^2$
- $H$, $W$, and $D$ represent the height, width, and depth of the hidden layer $h_C$.
- The dimensions of $h_C$ are the same as those of $h_G$, since both images are fed through the same neural network.
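A minimal sketch of the content cost in TensorFlow, assuming the activations have shape `(1, H, W, D)`:

```python
import tensorflow as tf

def content_cost(h_C, h_G):
    """J_content = 1/(4*H*W*D) * sum((h_C - h_G)^2) for one hidden layer."""
    _, H, W, D = h_G.shape  # activations assumed to have shape (1, H, W, D)
    return tf.reduce_sum(tf.square(h_C - h_G)) / (4.0 * H * W * D)
```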
Since we want the generated image to look like the content image, we will select the hidden layer at the very end of the neural network for comparison. This allows the model to learn that high-level features need to be preserved.
We will now talk about the style cost function $J_{style}(h_S, h_G)$, which is built on the Gram matrix $G(\cdot)$, a matrix of correlations between the feature channels of a hidden layer.

- $J_{style}(h_S, h_G) = \frac{1}{4D^2(HW)^2} \sum (G(h_S) - G(h_G))^2$
- $H$, $W$, and $D$ represent the height, width, and depth of the hidden layer $h_S$.
- The dimensions of $h_S$ are the same as those of $h_G$, since both images are fed through the same neural network. We reshape the activations to $(H * W, D)$ before computing the Gram matrix.
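A minimal sketch of the Gram matrix and the per-layer style cost in TensorFlow, again assuming activations of shape `(1, H, W, D)`:

```python
import tensorflow as tf

def gram_matrix(h):
    """Reshape (1, H, W, D) activations to (H*W, D) and return the D x D Gram matrix."""
    _, H, W, D = h.shape
    flat = tf.reshape(h, (H * W, D))
    return tf.matmul(flat, flat, transpose_a=True)

def style_cost(h_S, h_G):
    """J_style for a single hidden layer, following the formula above."""
    _, H, W, D = h_G.shape
    diff = gram_matrix(h_S) - gram_matrix(h_G)
    return tf.reduce_sum(tf.square(diff)) / (4.0 * D**2 * (H * W) ** 2)
```

The Gram matrix captures which feature channels tend to activate together, which is what lets the style cost compare textures rather than exact pixel positions.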
To generate better results, we will merge the style cost scores from multiple hidden layers. We will weight the layers closest to the middle of the network more heavily, since they capture a good mix of low- and high-level features.
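A sketch of how the per-layer style costs could be combined, building on `style_cost` above; the layer names and weights below are hypothetical, not the ones used in the repository.

```python
# Hypothetical per-layer weights; layers near the middle of VGG-19 get more weight.
STYLE_LAYER_WEIGHTS = {
    "block1_conv1": 0.1,
    "block2_conv1": 0.2,
    "block3_conv1": 0.4,
    "block4_conv1": 0.2,
    "block5_conv1": 0.1,
}

def total_style_cost(style_features, generated_features, weights=STYLE_LAYER_WEIGHTS):
    """Weighted sum of per-layer style costs (features are dicts keyed by layer name)."""
    return sum(
        w * style_cost(style_features[name], generated_features[name])
        for name, w in weights.items()
    )
```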
Now we can apply gradient descent and train our model!
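For illustration, a bare-bones gradient-descent loop might look like the following. Here `content_image` and `compute_total_cost` are hypothetical placeholders standing in for the image loading and the combined cost built from the sketches above, and the optimizer and step count are assumptions, not the settings used by `generate_art.py`.

```python
import tensorflow as tf

# Start the generated image from the content image and optimize its pixels directly.
generated = tf.Variable(content_image, dtype=tf.float32)  # content_image: hypothetical input tensor
optimizer = tf.keras.optimizers.Adam(learning_rate=0.02)  # optimizer choice is an assumption

for step in range(2000):
    with tf.GradientTape() as tape:
        # compute_total_cost: hypothetical wrapper combining the cost sketches above
        loss = compute_total_cost(generated)
    grads = tape.gradient(loss, generated)
    optimizer.apply_gradients([(grads, generated)])  # update the pixels of the generated image
```

Note that the optimization updates the pixels of the generated image, not the VGG weights, which stay frozen throughout training.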