## Table Of Contents

0. [References](#References)
1. [Getting Started](#GettingStarted)
2. [Architecture](#Architecture)
3. [Masked Convolution](#MaskedConvolution)
4. [First Masked Convolution](#FirstMaskedConvolution)
5. [Residual Blocks](#ResidualBlocks)
    1. [PReLU details](#PRelu)
6. [Stacking Residual Blocks](#StackingResidualBlocks)
7. [Wrapping Up For Output](#WrappingUpForOutput)

<a id="References"></a>
## References

This code follows the steps and instructions in the blog post [Grayscale PixelCNN with Keras](https://israelg99.github.io/2017-02-27-Grayscale-PixelCNN-with-Keras/), which in turn is based on the paper [Pixel Recurrent Neural Networks](https://arxiv.org/pdf/1601.06759.pdf).

<a id="GettingStarted"></a>
## Getting Started

Keras has two ways of defining models: the Sequential API, which is the easiest but most limiting, and the Functional API, which is more complex but more flexible.

We will use the Functional API because we need that additional flexibility. For example, the Sequential model limits the model to a single output, but to model RGB channels we will need 3 outputs, one for each channel. As the model gets more complex (e.g. Gated PixelCNN), it will become clearer why the Functional API is the natural choice for projects like this.

Our input shape (excluding the batch dimension) should be (height, width, channels).
More specifically, the MNIST (grayscale) input shape is (28, 28, 1) and the CIFAR input shape is (32, 32, 3).
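
As a small illustration of these shapes, the sketch below uses NumPy with a placeholder array of zeros (not real MNIST data; the batch size of 5 is arbitrary) to show how a batch of grayscale images gains the trailing channel axis:

```python
import numpy as np

# Placeholder batch standing in for 5 grayscale 28x28 images (not real MNIST data).
gray_batch = np.zeros((5, 28, 28))

# Add the trailing channel axis so each image is (height, width, channels).
gray_batch = gray_batch[..., np.newaxis]

print(gray_batch.shape)      # (5, 28, 28, 1): batch, height, width, channels
print(gray_batch.shape[1:])  # (28, 28, 1): the per-image input shape
```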

Let’s start simple, we’ll do a PixelCNN for grayscale MNIST first.

<a id="Architecture"></a>
## Architecture

Since the paper focuses on PixelRNN, it does not clearly explain what the PixelCNN architecture should look like. It describes the big picture well, but not in enough detail to actually implement PixelCNN.

Here’s the architecture I came up with for grayscale MNIST (with only 1 residual block for simplicity):

<img src=https://israelg99.github.io/images/2017-02-27-Grayscale-PixelCNN-with-Keras/model.png>

Note that PixelCNN has to preserve the spatial dimension of the input, which is not shown in the graph above.

<a id="MaskedConvolution"></a>
## Masked Convolution

We already defined our input, and as you can see in the architecture graph, the next layer is a masked convolution, which is the next thing we are going to implement.

### How to implement grayscale masks?

Here’s a picture for reference:

<img src=https://israelg99.github.io/images/2017-02-27-Grayscale-PixelCNN-with-Keras/grayscale_mask_typeA.png>

The difference between type A and B masks in grayscale images is that type A also masks the center pixel.
Keep in mind that masks for grayscale images are simpler than RGB masks, but we’ll get to RGB masks too.

Here’s how we are going to implement masks:

1. Create a numpy array of ones in the shape of our convolution weights: (height, width, input_channels, output_channels)
2. Zero out all weights to the right of and below the center weights (so that no information from future pixels flows in, as stated in the paper).
3. If the mask type is A, we’ll zero out the center weights too (so that the current pixel is hidden as well).
4. Multiply the mask with the weights before calculating convolutions.
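
The steps above can be sketched in plain NumPy before wiring them into Keras. The helper name `build_mask` is hypothetical (it is not part of Keras or the paper), and the sketch assumes an odd, square filter size:

```python
import numpy as np

def build_mask(filter_size, in_channels, out_channels, mask_type):
    """Illustrative helper (not part of Keras): build a grayscale PixelCNN mask."""
    # Step 1: ones in the shape of the convolution weights.
    mask = np.ones((filter_size, filter_size, in_channels, out_channels))
    center = filter_size // 2  # assumes an odd filter size

    # Step 2: zero out the rows below the center, then the weights to the
    # right of the center in the center row.
    mask[center + 1:] = 0
    mask[center, center + 1:] = 0

    # Step 3: type A also hides the center weight itself.
    if mask_type == 'A':
        mask[center, center] = 0
    return mask

mask_a = build_mask(3, 1, 1, 'A')[:, :, 0, 0]
mask_b = build_mask(3, 1, 1, 'B')[:, :, 0, 0]

print(mask_a)  # rows: [1 1 1], [1 0 0], [0 0 0]
print(mask_b)  # rows: [1 1 1], [1 1 0], [0 0 0]

# Step 4: multiply the mask with the weights before convolving.
weights = np.full((3, 3), 0.5)
masked_weights = weights * mask_a
```

The only difference between the two printed masks is the center entry, which is 0 for type A and 1 for type B.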

Let’s use the steps above to go ahead and implement a new Keras layer for masked convolutions:

In [1]:
import math

import numpy as np

from keras import backend as K
from keras.layers import Convolution2D

Using TensorFlow backend.


In [2]:
class MaskedConvolution2D(Convolution2D):
    # *args picks up any number of positional (non-keyword) arguments.
    # **kwargs picks up any number of keyword arguments, collected into a dictionary.
    def __init__(self, *args, mask='B' , n_channels=3, mono=False, **kwargs):
        super().__init__(*args, **kwargs)
        self.mask_type = mask

        self.mask = None
        
    def build(self, input_shape):
        super().build(input_shape)

        # Create a numpy array of ones in the shape of our convolution weights.
        self.mask = np.ones(self.W_shape)

        # We assert the height and width of our convolution to be equal as they should.
        assert self.mask.shape[0] == self.mask.shape[1]

        # Since the height and width are equal, we can use either to represent the size of our convolution.
        filter_size = self.mask.shape[0]
        filter_center = filter_size / 2

        # Zero out all weights below the center.
        self.mask[math.ceil(filter_center):] = 0

        # Zero out all weights to the right of the center.
        self.mask[math.floor(filter_center):, math.ceil(filter_center):] = 0

        # If the mask type is 'A', zero out the center weights too.
        if self.mask_type == 'A':
            self.mask[math.floor(filter_center), math.floor(filter_center)] = 0

        # Convert the numpy mask into a tensor mask.
        self.mask = K.variable(self.mask)
    
    def call(self, x, mask=None):
        ''' I just copied the Keras Convolution2D call function so don't worry about all this code.
            The only important piece is: self.W * self.mask.
            Which multiplies the mask with the weights before calculating convolutions. '''
        output = K.conv2d(x, self.W * self.mask, strides=self.subsample,
                          border_mode=self.border_mode,
                          dim_ordering=self.dim_ordering,
                          filter_shape=self.W_shape)
        if self.bias:
            #Dimension ordering th means the channel dimension (the depth) is at index 1.
            #nb_filter is the number of convolutional filters to use.
            if self.dim_ordering == 'th':
                output += K.reshape(self.b, (1, self.nb_filter, 1, 1))
            #Dimension ordering tf means the channel dimension (the depth) is at index 3.
            elif self.dim_ordering == 'tf':
                output += K.reshape(self.b, (1, 1, 1, self.nb_filter))
            else:
                #There are no other kinds of dimension ordering, so any other case is invalid.
                raise ValueError('Invalid dim_ordering:', self.dim_ordering)

        output = self.activation(output)
        return output

    def get_config(self):
        # Add the mask type property to the config.
        return dict(list(super().get_config().items()) + list({'mask': self.mask_type}.items()))
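
To see what the mask buys us, here is a minimal NumPy check, independent of Keras, that a type A masked filter never looks at the current or future pixels. The function `masked_conv2d_same` is a hypothetical, naive single-channel stand-in for the layer above, and the image and kernel values are arbitrary:

```python
import numpy as np

def masked_conv2d_same(image, kernel, mask):
    """Naive single-channel 'same' cross-correlation with a masked kernel."""
    k = kernel.shape[0]
    pad = k // 2
    padded = np.pad(image, pad, mode='constant')  # zero padding on all sides
    masked_kernel = kernel * mask
    out = np.zeros_like(image, dtype=float)
    for i in range(image.shape[0]):
        for j in range(image.shape[1]):
            out[i, j] = np.sum(padded[i:i + k, j:j + k] * masked_kernel)
    return out

image = np.arange(25.0).reshape(5, 5)
kernel = np.arange(1.0, 10.0).reshape(3, 3)
mask_a = np.array([[1, 1, 1],
                   [1, 0, 0],
                   [0, 0, 0]], dtype=float)

before = masked_conv2d_same(image, kernel, mask_a)

# Change pixel (2, 2) and every pixel after it in raster-scan order.
changed = image.copy()
changed[2, 2:] += 1.0
changed[3:] += 1.0
after = masked_conv2d_same(changed, kernel, mask_a)

# Outputs up to and including position (2, 2) are unaffected by the change.
print(np.allclose(before[:2], after[:2]), np.allclose(before[2, :3], after[2, :3]))
# The output at (2, 3) does see pixel (2, 2), so it differs.
print(before[2, 3] != after[2, 3])
```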

<a id="FirstMaskedConvolution"></a>
## First Masked Convolution Layer

Now that we have masked convolutions implemented, let’s add the first masked convolution to our model (which is practically just an input layer at the moment).

According to the paper, the layer after the input is a masked convolution of type A with a filter size of (7,7). It has to preserve the spatial dimensions of the input, so we’ll use border_mode='same' for that.
Note that this layer is the only masked convolution of type A the model will have.
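
To make the effect of the border mode concrete, the following sketch computes the spatial output size along one dimension. The helper `conv_output_size` is hypothetical and written only for illustration; it applies the usual output-length arithmetic for 'same' and 'valid' padding:

```python
def conv_output_size(input_size, filter_size, border_mode, stride=1):
    """Spatial output size of a convolution along one dimension."""
    if border_mode == 'same':
        # The input is zero-padded, so only the stride can shrink the output.
        return (input_size + stride - 1) // stride
    if border_mode == 'valid':
        # No padding: the filter must fit entirely inside the input.
        return (input_size - filter_size) // stride + 1
    raise ValueError('Unknown border_mode: ' + border_mode)

print(conv_output_size(28, 7, 'same'))   # 28
print(conv_output_size(28, 7, 'valid'))  # 22
```

With 'valid', a 28x28 MNIST image would shrink to 22x22 after the (7,7) filter, which is why 'same' is needed here.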

Now we should have a simple graph like this: input -> masked_convolution.

In [None]:
from keras.layers import Input

shape = (28, 28, 1)
filters = 128
depth = 6

input_img = Input(shape)

model = MaskedConvolution2D(filters, 7, 7, mask='A', border_mode='same')(input_img)

<a id="ResidualBlocks"></a>
## Residual Blocks

After the first masked convolution the model has a series of residual blocks (The architecture picture above has only 1 residual block).

To implement a residual block:

1. Take input of shape (height, width, filters).
2. Halve the filters with a (1,1) convolution.
3. Apply a (3,3) masked convolution of type B.
4. Scale the filters back to original with (1,1) convolution.
5. Merge the original input with the convolutions.

The reason for halving the filters and then scaling back to the original number is that it gives a computational boost without significantly reducing model performance.
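
A rough way to see the saving is to count kernel weights. The comparison below is an assumption made for illustration: it sets the bottleneck described above against a single (3,3) convolution that keeps all 128 filters, and ignores biases. The helper `conv_weights` is hypothetical:

```python
def conv_weights(height, width, in_channels, out_channels):
    """Number of kernel weights in a convolution (biases ignored)."""
    return height * width * in_channels * out_channels

filters = 128
half = filters // 2

# A single 3x3 convolution that keeps all 128 filters.
direct = conv_weights(3, 3, filters, filters)

# The bottleneck: 1x1 down to 64, 3x3 at 64, 1x1 back up to 128.
bottleneck = (conv_weights(1, 1, filters, half)
              + conv_weights(3, 3, half, half)
              + conv_weights(1, 1, half, filters))

print(direct)      # 147456
print(bottleneck)  # 53248
```

Under these assumptions the bottleneck stores a little over a third of the weights of the direct convolution.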

Let’s implement a residual block in Keras:

In [None]:
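# Imports assumed for the Keras 1.x API used throughout this notebook.
from keras.layers import Merge
from keras.layers.advanced_activations import PReLU

class ResidualBlock(object):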
    def __init__(self, filters):
        self.filters = filters

    def __call__(self, model):
        # filters -> filters/2
        block = PReLU()(model)
        block = Convolution2D(self.filters//2, 1, 1)(block)

        # filters/2 3x3 -> filters/2
        block = PReLU()(block)
        block = MaskedConvolution2D(self.filters//2, 3, 3, border_mode='same')(block)

        # filters/2 -> filters
        block = PReLU()(block)
        block = Convolution2D(self.filters, 1, 1)(block)

        # Merge the original input with the convolutions.
        return Merge(mode='sum')([model, block])

<a id="PRelu"></a>
### PReLU Layers

The parametric rectified linear unit (PReLU) activation layer applies the transform f(x) = max(0, x) + w * min(0, x) to the input data. The backward PReLU layer computes the values z = y * f'(x), where y is the input gradient computed on the preceding layer, w is the weight of the input argument, and f'(x) is given by:

<img src=https://software.intel.com/sites/products/documentation/doclib/daal/daal-user-and-reference-guides/daal_prog_guide/equations/GUID-ADC54AE0-43B8-40CA-BA41-245D3240Bee1.png>
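
As a scalar sketch of the transform above: the function names and the slope value 0.25 are arbitrary choices for illustration (in Keras the slope is a learned parameter), and treating the derivative at x = 0 as w is a convention assumed here:

```python
def prelu(x, w):
    """Forward pass: f(x) = max(0, x) + w * min(0, x)."""
    return max(0.0, x) + w * min(0.0, x)

def prelu_grad(x, w):
    """Derivative of f with respect to x: 1 for x > 0, w otherwise."""
    return 1.0 if x > 0 else w

w = 0.25  # arbitrary illustrative slope

print(prelu(2.0, w))    # 2.0
print(prelu(-2.0, w))   # -0.5
print(prelu_grad(2.0, w), prelu_grad(-2.0, w))  # 1.0 0.25
```

Positive inputs pass through unchanged, while negative inputs are scaled by w instead of being zeroed out as in a plain ReLU.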

We will want to stack those residual blocks in our model, so let’s create a simple layer for that:

In [4]:
class ResidualBlockList(object):
    def __init__(self, filters, depth):
        self.filters = filters
        self.depth = depth

    def __call__(self, model):
        for _ in range(self.depth):
            model = ResidualBlock(self.filters)(model)

        return model

<a id="StackingResidualBlocks"></a>
## Stacking Residual Blocks

Now let’s stack those residual blocks on our model.
We also need to add an activation after the stack, because the residual block ends with a convolution, not an activation.

In [None]:
#shape = (28, 28, 1)
#filters = 128
#depth = 6

#input_img = Input(shape)

#model = MaskedConvolution2D(filters, 7, 7, mask='A', border_mode='same')(input_img)

model = ResidualBlockList(filters, depth)(model)
model = PReLU()(model)

<a id="WrappingUpForOutput"></a>
## Wrapping Up For Output

As shown in the architecture picture above, the model has two additional masked convolutions before the output. According to the paper, those two masked convolutions are of size (1,1) and of type B.

Let’s add those to our model:

In [None]:
#shape = (28, 28, 1)
#filters = 128
#depth = 6

#input_img = Input(shape)

#model = MaskedConvolution2D(filters, 7, 7, mask='A', border_mode='same')(input_img)

#model = ResidualBlockList(filters, depth)(model)
#model = PReLU()(model)

for _ in range(2):
    model = MaskedConvolution2D(filters, 1, 1, border_mode='valid')(model)
    model = PReLU()(model)