# COMP8650 Assignment 6
u5976992, Longfei Zhao

## 1. Conv2d and ConvTranspose2d

### a)

$H_{out} = \left\lfloor \frac{H_{in} + 2 \times \mathrm{padding} - (\mathrm{kernel\_size} - 1) - 1}{\mathrm{stride}} + 1 \right\rfloor = 64$

$W_{out} = \left\lfloor \frac{W_{in} + 2 \times \mathrm{padding} - (\mathrm{kernel\_size} - 1) - 1}{\mathrm{stride}} + 1 \right\rfloor = 64$

$C_{out}$ = 16

output size = (32, 16, 64, 64)
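The formula above can be checked with a few lines of plain Python. This is a minimal sketch that assumes dilation = 1, as in the formula. The question's exact settings are not restated here, so the call below uses assumed values: a 64 × 64 input, a 3 × 3 kernel, padding 1 and stride 1.

```python
def conv2d_out_size(size_in, kernel_size, stride=1, padding=0):
    """Spatial output size of nn.Conv2d along one axis (dilation = 1)."""
    # Integer floor division implements the floor in the formula:
    # floor(a / s + 1) == a // s + 1 for integers a and s > 0.
    return (size_in + 2 * padding - (kernel_size - 1) - 1) // stride + 1


# Assumed (hypothetical) settings: 64x64 input, 3x3 kernel, padding 1, stride 1.
h_out = conv2d_out_size(64, kernel_size=3, stride=1, padding=1)
w_out = conv2d_out_size(64, kernel_size=3, stride=1, padding=1)
print((32, 16, h_out, w_out))  # prints (32, 16, 64, 64)
```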

### b)
$H_{out} = (H_{in} - 1) \times \mathrm{stride} - 2 \times \mathrm{padding} + \mathrm{kernel\_size} + \mathrm{output\_padding} = 3$

$W_{out} = (W_{in} - 1) \times \mathrm{stride} - 2 \times \mathrm{padding} + \mathrm{kernel\_size} + \mathrm{output\_padding} = 3$

$C_{out}$ = 128

output size = (32, 128, 3, 3)
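The transposed-convolution formula can be sketched the same way. PyTorch documents a general form that includes dilation. With dilation = 1, the term $\mathrm{dilation} \times (\mathrm{kernel\_size} - 1) + 1$ collapses to kernel_size, which gives the formula used here. The two example calls use hypothetical settings. They are only one way to obtain the sizes in b) and c), not necessarily the settings given in the question.

```python
def conv_transpose2d_out_size(size_in, kernel_size, stride=1, padding=0,
                              output_padding=0, dilation=1):
    """Spatial output size of nn.ConvTranspose2d along one axis.

    General form with dilation; for dilation = 1 it equals
    (size_in - 1) * stride - 2 * padding + kernel_size + output_padding.
    """
    return ((size_in - 1) * stride - 2 * padding
            + dilation * (kernel_size - 1) + output_padding + 1)


# Hypothetical settings that happen to produce the sizes in b) and c).
print(conv_transpose2d_out_size(1, kernel_size=3))             # prints 3
print(conv_transpose2d_out_size(32, kernel_size=3, stride=2))  # prints 65
```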

### c)
$H_{out} = (H_{in} - 1) \times \mathrm{stride} - 2 \times \mathrm{padding} + \mathrm{kernel\_size} + \mathrm{output\_padding} = 65$

$W_{out} = (W_{in} - 1) \times \mathrm{stride} - 2 \times \mathrm{padding} + \mathrm{kernel\_size} + \mathrm{output\_padding} = 65$

$C_{out}$ = 32

output size = (32, 32, 65, 65)

### d)

The shape of the weight = ($C_{out}$, $C_{in}$, kernel_size, kernel_size) = (16, 3, 3, 3)

The shape of the bias = ($C_{out}$) = (16)

The number of learnable parameters = the number of weights + the number of biases = 16 × 3 × 3 × 3 + 16 = 448
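The same count can be written as a small helper. This is a sketch for a square kernel with groups = 1, the default for nn.Conv2d. The weight contributes $C_{out} \times C_{in} \times k \times k$ values and the bias adds one value per output channel.

```python
def conv2d_num_params(c_in, c_out, kernel_size, bias=True):
    """Learnable parameters of a square-kernel nn.Conv2d with groups = 1."""
    weight = c_out * c_in * kernel_size * kernel_size  # weight shape (C_out, C_in, k, k)
    return weight + (c_out if bias else 0)             # bias shape (C_out,)


print(conv2d_num_params(3, 16, 3))  # prints 448
```

With PyTorch installed, `sum(p.numel() for p in torch.nn.Conv2d(3, 16, 3).parameters())` should give the same 448.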


## 2. Implementation

```python
# COMP4680/8650: ADVANCED TOPICS IN STATISTICAL MACHINE LEARNING
# ASSIGNMENT 6
#
# ONLY MODIFY CODE IN THIS FILE
#

import torch
import torch.nn as nn

class Encoder(nn.Module):
    """Encoder network to map from an RGB image to a latent feature vector."""

    def __init__(self, z_dim=64, img_size=64):
        super(Encoder, self).__init__()

        self.z_dim = z_dim
        # Fully connected layer: flattened 3 * img_size * img_size image -> z_dim
        self.hidden_layer = nn.Sequential(
            nn.Linear(img_size * img_size * 3, z_dim),
            nn.BatchNorm1d(z_dim),
            nn.ReLU()
        )
        # Second fully connected layer; Tanh keeps the latent code in (-1, 1)
        self.output_layer = nn.Sequential(
            nn.Linear(z_dim, z_dim),
            nn.Tanh()
        )

    def forward(self, x):
        # Flatten (batch, 3, H, W) -> (batch, 3 * H * W)
        x = x.view(x.size()[0], -1)
        x = self.output_layer(self.hidden_layer(x))
        return x


class Decoder(nn.Module):
    """Decoder network to map from a latent feature vector to an RGB image."""

    def __init__(self, z_dim=64, img_size=64):
        super(Decoder, self).__init__()

        assert img_size==64
        self.z_dim = z_dim
        # 1x1 -> 7x7; the 3x3 conv with padding=1 keeps the size unchanged
        self.layer1 = nn.Sequential(
            nn.ConvTranspose2d(z_dim, 128, 7, stride=1),
            nn.BatchNorm2d(128),
            nn.ReLU(),
            nn.Conv2d(128, 128, 3, padding=1),
            nn.BatchNorm2d(128),
            nn.ReLU()
        )
        # 7x7 -> 15x15
        self.layer2 = nn.Sequential(
            nn.ConvTranspose2d(128, 64, 3, stride=2),
            nn.BatchNorm2d(64),
            nn.ReLU(),
            nn.Conv2d(64, 64, 3, padding=1),
            nn.BatchNorm2d(64),
            nn.ReLU()
        )
        # 15x15 -> 31x31
        self.layer3 = nn.Sequential(
            nn.ConvTranspose2d(64, 64, 3, stride=2),
            nn.BatchNorm2d(64),
            nn.ReLU(),
            nn.Conv2d(64, 64, 3, padding=1),
            nn.BatchNorm2d(64),
            nn.ReLU()
        )
        # 31x31 -> 63x63
        self.layer4 = nn.Sequential(
            nn.ConvTranspose2d(64, 32, 3, stride=2),
            nn.BatchNorm2d(32),
            nn.ReLU(),
            nn.Conv2d(32, 32, 3, padding=1),
            nn.BatchNorm2d(32),
            nn.ReLU()
        )
        # 63x63 -> 64x64, then a 1x1 conv maps 32 channels to RGB; Tanh gives (-1, 1)
        self.layer5 = nn.Sequential(
            nn.ConvTranspose2d(32, 32, 2, stride=1),
            nn.BatchNorm2d(32),
            nn.ReLU(),
            nn.Conv2d(32, 3, 1),
            nn.Tanh()
        )

    def forward(self, x):
        # Treat the latent vector as a z_dim-channel 1x1 feature map
        x = x.view(x.size()[0], self.z_dim, 1, 1)
        x = self.layer1(x)
        x = self.layer2(x)
        x = self.layer3(x)
        x = self.layer4(x)
        x = self.layer5(x)
        return x
```
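As a sanity check on the Decoder, the spatial size can be traced layer by layer with the ConvTranspose2d formula from Question 1, using dilation 1, padding 0 and output_padding 0. The 3 × 3 Conv2d layers with padding=1 and the final 1 × 1 Conv2d do not change the size, so only the transposed convolutions matter. This is a plain-Python sketch and is not part of the submitted code.

```python
def conv_transpose_out(size_in, kernel_size, stride=1, padding=0, output_padding=0):
    # (H_in - 1) * stride - 2 * padding + kernel_size + output_padding
    return (size_in - 1) * stride - 2 * padding + kernel_size + output_padding


# (kernel_size, stride) of the ConvTranspose2d in layer1 ... layer5 of the Decoder.
transposed_layers = [(7, 1), (3, 2), (3, 2), (3, 2), (2, 1)]

size = 1  # the latent vector is reshaped to (batch, z_dim, 1, 1)
sizes = [size]
for kernel_size, stride in transposed_layers:
    size = conv_transpose_out(size, kernel_size, stride)
    sizes.append(size)

print(sizes)  # prints [1, 7, 15, 31, 63, 64]
```

The final 64 matches the `assert img_size==64` in `Decoder.__init__`.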

## 3. Experiments
### a)

```
%run test_models.py
```

```
....
----------------------------------------------------------------------
Ran 4 tests in 1.370s

OK
```

### b)

```
%run asgn6.py
```

```
Keeping models on CPU.
Started training at Sun Oct 28 00:48:37 2018
Iteration [    10/  3500] | loss: 0.5035
Iteration [    20/  3500] | loss: 0.3876
Iteration [    30/  3500] | loss: 0.3255
Iteration [    40/  3500] | loss: 0.2740
Iteration [    50/  3500] | loss: 0.2358
Iteration [    60/  3500] | loss: 0.2132
Iteration [    70/  3500] | loss: 0.2151
Iteration [    80/  3500] | loss: 0.1872
Iteration [    90/  3500] | loss: 0.1771
Iteration [   100/  3500] | loss: 0.1747
Saved samples/sample-000100.png
Saved samples/novel-000100.png
Iteration [   110/  3500] | loss: 0.1560
Iteration [   120/  3500] | loss: 0.1569
Iteration [   130/  3500] | loss: 0.1611
Iteration [   140/  3500] | loss: 0.1618
Iteration [   150/  3500] | loss: 0.1511
Iteration [   160/  3500] | loss: 0.1427
Iteration [   170/  3500] | loss: 0.1373
Iteration [   180/  3500] | loss: 0.1322
Iteration [   190/  3500] | loss: 0.1318
Iteration [   200/  3500] | loss: 0.1447
Saved samples/sample-000200.png
Saved samples/nov
```

### c) Plot the loss function

<img src="loss_fig.png">
<img src="log_loss_fig.png">

Judging from the trend of the log-loss curve, I think training has not converged yet.
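This reading of the plot can also be checked numerically. The idea is to parse the training log from b) and compare the mean log-loss over the last two windows of logged values: if it is still dropping noticeably, the run has probably not converged. This is only a sketch. `still_improving` is a hypothetical helper, not part of asgn6.py. The window size and the 0.01 threshold are arbitrary assumptions. The excerpt below covers only iterations 110 to 200, so it demonstrates the method rather than the conclusion for the full run.

```python
import math
import re

# Matches log lines such as "Iteration [   110/  3500] | loss: 0.1560".
LOG_LINE = re.compile(r"Iteration \[\s*(\d+)/\s*\d+\] \| loss: ([0-9.]+)")


def parse_losses(log_text):
    """Return (iteration, loss) pairs; other lines (e.g. 'Saved ...') are skipped."""
    return [(int(it), float(loss)) for it, loss in LOG_LINE.findall(log_text)]


def still_improving(losses, window=5, threshold=0.01):
    """Hypothetical heuristic: has the mean log-loss dropped by more than
    `threshold` between the last two windows of `window` logged values?"""
    if len(losses) < 2 * window:
        return True  # too few points to call the run converged
    logs = [math.log(loss) for _, loss in losses]
    previous = sum(logs[-2 * window:-window]) / window
    latest = sum(logs[-window:]) / window
    return previous - latest > threshold


# Excerpt of the training log from b) (iterations 110 to 200).
log_text = """\
Iteration [   110/  3500] | loss: 0.1560
Iteration [   120/  3500] | loss: 0.1569
Iteration [   130/  3500] | loss: 0.1611
Iteration [   140/  3500] | loss: 0.1618
Iteration [   150/  3500] | loss: 0.1511
Iteration [   160/  3500] | loss: 0.1427
Iteration [   170/  3500] | loss: 0.1373
Iteration [   180/  3500] | loss: 0.1322
Iteration [   190/  3500] | loss: 0.1318
Iteration [   200/  3500] | loss: 0.1447
Saved samples/sample-000200.png
"""

losses = parse_losses(log_text)
print(len(losses), still_improving(losses))  # prints 10 True
```

Comparing log-losses rather than raw losses mirrors the log-loss plot: a fixed threshold then corresponds to a relative improvement, independent of the loss scale.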

### d)
<img src='sample-000100.png' align="center">

***sample-000100***: After 100 training iterations, the denoised images show only a blurry outline and are essentially useless. Their quality is much worse than that of the noisy input images.

<img src='sample-000700.png'>

***sample-000700***: After 700 training iterations, color appears in the denoised images, but they are still very blurry. For some images we can roughly locate the head or the legs, but most of the details are missing. Compared with the noisy input images, the denoised images are still of lower quality.

<img src='sample-001400.png'>

***sample-001400***: After 1400 training iterations, I would say that the quality of the denoised images is roughly the same as that of the noisy input images. Many details have been recovered.

After about 1000 training iterations, more and more images are recovered quite well. However, I think the denoised images are still worse than the noisy input images, since we can see …
After about 3000 training iterations, I think the quality of the denoised images is better than that of the noisy input images.

<img src='sample-003500.png'>

***sample-003500***: In the end, I would say that the quality of the denoised images is better than that of the noisy input images. Some morphological details and colors are recovered quite well, and for many images I can easily tell which Pokemon it is. However, there is still a large gap between the denoised images and the ground-truth images. The second image is an example: it is very complex, and its noisy input image is already hard to make out.

### e)

At the beginning, the novel images are grey-scale and show only simple outlines. As training progresses, they become more and more colorful and their shapes become more and more complex. I would say that after about 700 training iterations the novel images start to look like something: we can easily imagine where the head, body, wings, tail and legs are.

### f)