## Neural Style Transfer Implementation using PyTorch

In this notebook we implement Neural Style Transfer and showcase it on the Victoria Memorial Hall, one of the most famous landmarks in my city, Kolkata, by painting it in the style of Van Gogh.

We use VGG19 as the base model.

So, let's get started with the imports.

In [1]:
## Importing necessary packages ##

import torch
import torch.nn
import torchvision
from torchvision.models import vgg19
import torchvision.transforms as transforms

import PIL.Image
import numpy as np
import matplotlib.pyplot as plt

With the packages imported, let's look at the vgg19 model.

We don't need the entire model, only the outputs of a few of its convolutional layers.

So we will inspect the model first and then decide which layers to keep.

In [2]:
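## Checking vgg19 model ##

# `.features` is the convolutional part of VGG19 (no classifier head).
# Note: recent torchvision releases deprecate `pretrained=True` in favor of the `weights` argument.
model_layers = vgg19(pretrained = True).features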

model_layers

Sequential(
  (0): Conv2d(3, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (1): ReLU(inplace=True)
  (2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (3): ReLU(inplace=True)
  (4): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  (5): Conv2d(64, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (6): ReLU(inplace=True)
  (7): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (8): ReLU(inplace=True)
  (9): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  (10): Conv2d(128, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (11): ReLU(inplace=True)
  (12): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (13): ReLU(inplace=True)
  (14): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (15): ReLU(inplace=True)
  (16): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (17): ReLU(inplace=True)
  (18): MaxPoo

The printout above shows the layers of the model.

The layers that we need are 0, 5, 10, 19 and 28, which are the first convolutional layer of each of the five VGG19 blocks.

So, how are we going to use them?

We pass the input through the network layer by layer and collect the outputs at those indices.

Let's set up our model to do that.

In [3]:
## Setting our custom model ##

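class NeuralStyleNet(torch.nn.Module):
    
    def __init__(self):
        super().__init__()
        # Keep layers 0..28 only; nothing after the last layer we read is needed
        self.model = vgg19(pretrained = True).features[:29]
        # Indices of the convolutional layers whose outputs we collect
        self.feature_layers = [0,5,10,19,28]
        
    def forward(self , x):
        outputs = []
        
        # Run the input through the layers in order, saving the selected outputs.
        # Note: VGG19's ReLU layers are in-place, so a saved tensor that is
        # followed by a ReLU ends up holding the post-ReLU values.
        for layer_num , layer in enumerate(self.model):
            
            x = layer(x)
            
            if layer_num in self.feature_layers:
                outputs.append(x)
        
        return outputs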

And with that we have sorted out our model.
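
To see the "collect the outputs at chosen indices" idea in isolation, here is a minimal sketch on a tiny stand-in network. The names `toy_layers`, `wanted` and `collect_features` are hypothetical and exist only for this illustration; the stand-in uses random weights and is not VGG19, so only the shapes are meaningful.

```python
import torch

# Hypothetical stand-in for the VGG19 feature stack: random weights, far fewer layers.
toy_layers = torch.nn.Sequential(
    torch.nn.Conv2d(3, 8, kernel_size=3, padding=1),   # index 0
    torch.nn.ReLU(),                                   # index 1
    torch.nn.MaxPool2d(kernel_size=2, stride=2),       # index 2
    torch.nn.Conv2d(8, 16, kernel_size=3, padding=1),  # index 3
)
wanted = [0, 3]  # indices whose outputs we want to keep

def collect_features(layers, x, wanted):
    """Pass x through the layers in order and return the outputs at the wanted indices."""
    outputs = []
    for idx, layer in enumerate(layers):
        x = layer(x)
        if idx in wanted:
            outputs.append(x)
    return outputs

with torch.no_grad():
    feats = collect_features(toy_layers, torch.rand(1, 3, 32, 32), wanted)

feature_shapes = [tuple(f.shape) for f in feats]
print(feature_shapes)
```

The first saved output keeps the input resolution, while the second comes after the pooling layer and so has half the height and width.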

Next up, let's instantiate it and move it to the GPU if one is available.

In [4]:
## Setting device ##

def set_device():
    if torch.cuda.is_available():
        return torch.device('cuda')
    return torch.device('cpu')

device = set_device()


## Instantiating the model ##

neural_style_net = NeuralStyleNet().to(device).eval()

And the model is instantiated.

Next we need to load our content and style images and convert them to tensors.

In [5]:
## Image loading utility function ##

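def image_loader(img_path):
    
    # Force 3 channels so grayscale or RGBA files also work with VGG19
    img = PIL.Image.open(img_path).convert('RGB')
    
    # Resize to a fixed size and scale pixel values to [0, 1]
    trans = transforms.Compose([
        transforms.Resize((256 , 256)),
        transforms.ToTensor()
    ])
    
    # Add a batch dimension: (3, 256, 256) -> (1, 3, 256, 256)
    img_tensor = trans(img).unsqueeze(0).to(device)
    
    return img_tensor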

## Getting the style and content images ##

content = image_loader('victoria.jpg')
style = image_loader('van_gogh.jpg')

And done!!

Now we need to initialize our generated image.

In the original paper the generated image is initialized to random noise. For ease of training, we start from a copy of our content image instead.

In [6]:
## Generated image ##

gen_img = content.clone().requires_grad_(True)
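
For comparison, the random-noise initialization could look roughly like the sketch below. `content_stand_in` is a hypothetical placeholder tensor with the same shape as our content image, and uniform noise in [0, 1) is just one possible choice of noise.

```python
import torch

torch.manual_seed(0)

# Hypothetical placeholder with the same shape as the loaded content tensor.
content_stand_in = torch.rand(1, 3, 256, 256)

# Uniform noise in [0, 1) with the same shape, dtype and device, marked as trainable.
noise_img = torch.rand_like(content_stand_in).requires_grad_(True)

print(noise_img.shape, noise_img.requires_grad, noise_img.is_leaf)
```

Either way, the generated image has to be a leaf tensor with `requires_grad=True` so that the optimizer can update its pixels.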

Next we need to set up our loss and optimizer, along with the alpha and beta hyperparameters that weight the content and style terms in the final loss function.
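
Going by the training code later in this notebook, the quantity we minimize can be written as below. The symbols $F_l(\cdot)$ (the features at the $l$-th selected layer) and $G(\cdot)$ (the Gram matrix) are notation introduced here for illustration only; $g$, $c$ and $s$ stand for the generated, content and style images.

```latex
\mathcal{L}_{\text{total}} = \alpha \, \mathcal{L}_{\text{content}} + \beta \, \mathcal{L}_{\text{style}},
\qquad
\mathcal{L}_{\text{content}} = \sum_{l} \operatorname{mean}\big[(F_l(g) - F_l(c))^2\big],
\qquad
\mathcal{L}_{\text{style}} = \sum_{l} \operatorname{mean}\big[(G(F_l(g)) - G(F_l(s)))^2\big]
```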

To define the loss we first need a couple of utility functions, so let's write those.

In [7]:
## Mean Square loss ##

def mse(tensor1 , tensor2):
    
    # Element-wise difference, squared and averaged over all entries
    diff = tensor1 - tensor2
    
    mean_sq_loss = torch.mean(diff ** 2)
    
    return mean_sq_loss

## Gram Matrix ##

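def gram_matrix(tensor):
    
    # Expects a 2-D tensor of shape (channels, height * width) and returns
    # the (channels, channels) matrix of inner products between channels
    tensor_transpose = tensor.t()
    
    g_m = torch.mm(tensor , tensor_transpose)
    
    return g_m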
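
To make the Gram matrix concrete, here is a tiny worked example. The 2 x 2 `features` tensor is made up for illustration: think of it as 2 channels with 2 spatial positions each. The function is repeated so that the snippet runs on its own.

```python
import torch

def gram_matrix(tensor):
    # Inner product of every channel (row) with every other channel
    return torch.mm(tensor, tensor.t())

# Made-up features: 2 channels (rows), 2 spatial positions (columns).
features = torch.tensor([[1.0, 2.0],
                         [3.0, 4.0]])

gram = gram_matrix(features)
print(gram)
# Entry (0, 0) = 1*1 + 2*2 = 5, entry (0, 1) = 1*3 + 2*4 = 11, entry (1, 1) = 3*3 + 4*4 = 25
```

The result is symmetric, and its size depends only on the number of channels, not on the spatial size of the feature map.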

Now let's define our optimizer and some basic hyperparameters.

In [8]:
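## Optimizer ##

# Only gen_img is passed to the optimizer, so the pixels are updated and the VGG19 weights stay fixed
optim = torch.optim.Adam([gen_img] , lr = 0.001)

## Alpha Beta values ##

# alpha weights the content loss, beta weights the style loss
alpha = 1
beta = 0.01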

## Repeat steps ##

repeat_steps = 4000
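
Handing an image tensor to the optimizer, rather than a model's weights, may look unusual, so here is a minimal sketch of the idea on a toy problem. Everything in it (`target`, `img`, the learning rate and the step count) is a made-up assumption for illustration and not the setup used in this notebook.

```python
import torch

torch.manual_seed(0)

# Hypothetical target the image should move towards.
target = torch.rand(1, 3, 8, 8)

# The "image" being optimized: a leaf tensor that requires gradients.
img = torch.zeros(1, 3, 8, 8, requires_grad=True)

# The optimizer is given the image itself as its only parameter.
opt = torch.optim.Adam([img], lr=0.05)

initial_loss = torch.mean((img - target) ** 2).item()

for _ in range(200):
    loss = torch.mean((img - target) ** 2)
    opt.zero_grad()
    loss.backward()
    opt.step()

final_loss = torch.mean((img - target) ** 2).item()
print(initial_loss, final_loss)
```

The loss drops because the pixels themselves move towards the target, which is the same mechanism we rely on when training the generated image.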

Now let's get our hands dirty and train.

In [9]:
## Training ##

# The style and content targets do not change during optimization,
# so compute them once, without tracking gradients.
with torch.no_grad():
    style_features = neural_style_net(style)
    content_features = neural_style_net(content)

for i in range(repeat_steps):
    
    gen_features = neural_style_net(gen_img)
    
    content_loss = style_loss = 0
    
    for gen_feature , style_feature , content_feature in zip(gen_features , style_features , content_features):
        
        batch , channel , height , width = gen_feature.shape
        
        # Content loss: MSE between the raw feature maps
        content_loss += mse(gen_feature , content_feature)
        
        # Style loss: MSE between Gram matrices of the flattened feature maps (assumes batch size 1)
        gram_gen = gram_matrix(gen_feature.view(channel , height * width))
        
        gram_style = gram_matrix(style_feature.view(channel , height * width))
        
        style_loss += mse(gram_gen , gram_style)
        
    total_loss = alpha * content_loss + beta * style_loss
    
    optim.zero_grad()
    
    total_loss.backward()
    
    optim.step()
    
    if (i + 1) % 500 == 0:
        
        print('Step {} / {} --> The total loss of the generated image is : {:.3f}'.format(i + 1 , repeat_steps , total_loss.item()))
        
        # The 'generated_images' directory must already exist
        torchvision.utils.save_image(gen_img , 'generated_images/generated_{}.png'.format(i + 1))

Step 500 / 4000 --> The total loss of the generated image is : 4734.976
Step 1000 / 4000 --> The total loss of the generated image is : 2461.896
Step 1500 / 4000 --> The total loss of the generated image is : 1793.940
Step 2000 / 4000 --> The total loss of the generated image is : 1426.871
Step 2500 / 4000 --> The total loss of the generated image is : 1192.800
Step 3000 / 4000 --> The total loss of the generated image is : 1030.000
Step 3500 / 4000 --> The total loss of the generated image is : 908.405
Step 4000 / 4000 --> The total loss of the generated image is : 809.167
