In [1]:
import torch
import torch.nn as nn

# Discriminator
According to the paper, the discriminator has the following architecture.  
![Discriminator](Images/Discriminator.jpg)  
I designed the above picture using [alexlenail.me](http://alexlenail.me/NN-SVG/AlexNet.html).  
This architecture, however, differs slightly from the original diagram shown in the paper. To begin with, it uses 4x4 filters instead of 5x5 filters. The reason follows from the formula for the output size of a convolution:
$$n_{out} = \lfloor\frac{n_{in} + 2p - k}{s}\rfloor + 1$$
where  
$n_{in}$: **input size (height or width of the input feature map)**  
$n_{out}$: **output size (height or width of the output feature map)**  
$k$: **convolution kernel size**  
$p$: **convolution padding size**  
$s$: **convolution stride size**  
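As we see from the formula, if $n_{in} = 64$, $p = 1$, $k = 4$ and $s = 2$, then $n_{out} = \lfloor 62 / 2 \rfloor + 1 = 32$, so each layer exactly halves the spatial size. With $k = 5$ and no padding, the same stride gives $n_{out} = \lfloor 59 / 2 \rfloor + 1 = 30$, which does not halve the input.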
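
The formula is easy to check numerically. The following is a minimal sketch in plain Python; the helper name `conv_out_size` is hypothetical and not part of the notebook. It evaluates the formula for the 4x4 and 5x5 cases with stride 2, then chains it through the layer settings used in this notebook (four layers with $p = 1$, followed by one with $p = 0$).

```python
def conv_out_size(n_in, k, s, p):
    """Output size of a convolution along one spatial dimension.

    Implements n_out = floor((n_in + 2p - k) / s) + 1.
    """
    return (n_in + 2 * p - k) // s + 1


# 4x4 kernel, stride 2, padding 1: the spatial size is halved.
halved = conv_out_size(64, k=4, s=2, p=1)

# 5x5 kernel, stride 2, no padding: the result is not half of the input.
unpadded_5x5 = conv_out_size(64, k=5, s=2, p=0)

# Chain the formula through four halving layers, starting from 64x64.
sizes = [64]
for _ in range(4):
    sizes.append(conv_out_size(sizes[-1], k=4, s=2, p=1))

# Last layer: 4x4 kernel on a 4x4 map with no padding collapses it to 1x1.
final = conv_out_size(sizes[-1], k=4, s=2, p=0)

print(halved, unpadded_5x5, sizes, final)
```

The chained sizes show why a 4x4 kernel with padding 1 is convenient: every layer divides the spatial size by exactly two until a 4x4 map remains.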

The order of the layers is also inverted relative to the original diagram, which is a noteworthy change: the architecture illustrated in the paper is that of the generator. Last but not least, the network ends in a single output value, since it has to decide whether the provided input picture is real or fake.

There are other points that are important to be mentioned which cannot be inferred from the picture above:
* Batchnorm is used in every layer except the input layer.
* LeakyReLU activation is used in all layers, with a negative slope of 0.2.
* Since applying batchnorm to all layers causes sample oscillation and model instability, the paper suggests not applying batchnorm to the discriminator input layer.
* The output of the last convolution layer is flattened and then fed into a single sigmoid output.
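
Two of the points above can be illustrated in isolation. The snippet below is a small sketch, not the notebook's model: the tensor values and the batch size of 5 are arbitrary assumptions chosen for illustration. It shows what a LeakyReLU with slope 0.2 does to negative inputs, and how a `1x1x1` output per image is flattened and squashed by a sigmoid into one score per image.

```python
import torch
import torch.nn as nn

# LeakyReLU with negative slope 0.2: negative inputs are scaled by 0.2,
# non-negative inputs pass through unchanged.
leaky = nn.LeakyReLU(0.2)
activated = leaky(torch.tensor([-1.0, 0.0, 2.0]))

# Stand-in for the last convolution's output: one 1x1 map per image
# (random values, batch size 5 chosen arbitrarily).
last_conv_out = torch.randn(5, 1, 1, 1)

# Flatten to one value per image, then apply the sigmoid.
scores = torch.sigmoid(last_conv_out.flatten())

print(activated)
print(scores.shape)
```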

In [2]:
class Discriminator(nn.Module):
    def __init__(self, input_depth, feature_depth):
        super(Discriminator, self).__init__()
        
        self.leaky_relu_negative_slope = 0.2
        self.discriminator = nn.Sequential(
            # Shape comments assume input_depth=3, feature_depth=128 and 64x64 input images.
            self.input_conv_block(in_channels=input_depth, out_channels=feature_depth),  # 3x64x64 -> 128x32x32
            self.conv_block(in_channels=feature_depth, out_channels=feature_depth * 2),  # 128x32x32 -> 256x16x16
            self.conv_block(in_channels=feature_depth * 2, out_channels=feature_depth * 4),  # 256x16x16 -> 512x8x8
            self.conv_block(in_channels=feature_depth * 4, out_channels=feature_depth * 8),  # 512x8x8 -> 1024x4x4
            # No padding here, so the 4x4 kernel collapses the 4x4 map to a single value.
            self.conv_block(in_channels=feature_depth * 8, out_channels=1, padding=0),  # 1024x4x4 -> 1x1x1
            nn.Sigmoid(),
        )
    
    def input_conv_block(self, in_channels, out_channels, kernel_size=4, stride=2, padding=1):
        # Input layer: no batchnorm, so the convolution keeps its bias term.
        return nn.Sequential(
            nn.Conv2d(in_channels,
                      out_channels,
                      kernel_size=kernel_size,
                      stride=stride,
                      padding=padding),
            nn.LeakyReLU(self.leaky_relu_negative_slope)
        )
    
    def conv_block(self, in_channels, out_channels, kernel_size=4, stride=2, padding=1):
        return nn.Sequential(
            nn.Conv2d(in_channels,
                      out_channels,
                      kernel_size,
                      stride,
                      padding,
                      bias=False,  # Since we use batchnorm, it is not necessary to use bias
                     ),
            nn.BatchNorm2d(out_channels),
            nn.LeakyReLU(self.leaky_relu_negative_slope)
        )
    
    def forward(self, x):
        return self.discriminator(x)
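
A quick way to confirm the shape comments is to push a random batch through the layers and record the shape after each one. The sketch below is self-contained and therefore rebuilds an equivalent stack with a hypothetical helper `block` instead of reusing the `Discriminator` class. The small `feature_depth` of 8 and the batch size of 2 are assumptions made only to keep the check fast. A batch of at least two images is used because batchnorm in training mode cannot compute statistics from a single `1x1` value.

```python
import torch
import torch.nn as nn


def block(in_ch, out_ch, padding=1, batchnorm=True):
    """4x4, stride-2 convolution, optional batchnorm, then LeakyReLU(0.2)."""
    layers = [nn.Conv2d(in_ch, out_ch, kernel_size=4, stride=2,
                        padding=padding, bias=not batchnorm)]
    if batchnorm:
        layers.append(nn.BatchNorm2d(out_ch))
    layers.append(nn.LeakyReLU(0.2))
    return nn.Sequential(*layers)


feature_depth = 8  # hypothetical small width, chosen only for a fast check
stack = nn.Sequential(
    block(3, feature_depth, batchnorm=False),      # input layer: no batchnorm
    block(feature_depth, feature_depth * 2),
    block(feature_depth * 2, feature_depth * 4),
    block(feature_depth * 4, feature_depth * 8),
    block(feature_depth * 8, 1, padding=0),        # 4x4 map -> 1x1
    nn.Sigmoid(),
)

x = torch.randn(2, 3, 64, 64)  # two random 64x64 RGB "images"
shapes = []
for layer in stack:
    x = layer(x)
    shapes.append(tuple(x.shape))

scores = x.flatten()  # one score in (0, 1) per image
print(shapes)
print(scores.shape)
```

With the notebook's own class, the equivalent check would be to call `Discriminator(input_depth=3, feature_depth=128)` on a batch of shape `(N, 3, 64, 64)` and inspect the output shape.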