<a href="https://colab.research.google.com/github/LEBoltzmann/comp4211_pa2/blob/master/autoregressive_model_ipynb.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [1]:
# Clone the repository and install its requirements
!git clone https://github.com/LEBoltzmann/comp4211_pa2.git
!pip install -r comp4211_pa2/pa2/requirements.txt


Cloning into 'comp4211_pa2'...
remote: Enumerating objects: 52506, done.
remote: Counting objects: 100% (8/8), done.
remote: Compressing objects: 100% (5/5), done.
remote: Total 52506 (delta 1), reused 3 (delta 0), pack-reused 52498
Receiving objects: 100% (52506/52506), 96.61 MiB | 23.81 MiB/s, done.
Resolving deltas: 100% (231/231), done.
Updating files: 100% (52503/52503), done.
Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting tensorflow-addons
  Downloading tensorflow_addons-0.19.0-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (1.1 MB)
Collecting typeguard>=2.7
  Downloading typeguard-3.0.2-py3-none-any.whl (30 kB)
Installing collected packages: typeguard, tensorflow-addons
Successfully installed tensorflow-addons-0.19.0 typeguard-3.0.2


# AutoregressiveModel

**Description:** An autoregressive model implemented in Keras to generate images.

**Objective:** The objective of this assignment is to practise using the TensorFlow machine learning framework
through implementing custom training modules and data reader modules for image generation on
the Chinese Calligraphy dataset using a convolutional neural network (CNN) based architecture.
Throughout the assignment, students will be guided to develop the CNN-based model step by
step and study how to build custom modules on TensorFlow and the effects of different model
configurations.

## Introduction

Image generation is one of the fundamental computer vision tasks, referring to the process of generating new images that are visually realistic and similar to real-world images. It is widely used in many applications, such as super resolution, photograph editing and 3D modelling. 

One approach to image generation is to use models that learn to predict the probability distribution of pixel values, given the values of all the previous pixels. These models generate images one pixel at a time, using the previously generated pixels to condition the generation of the next pixel.
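
Written out, this is the chain-rule factorization of the joint distribution over pixels. As a sketch, assuming the $n^2$ pixels of an $n \times n$ image are visited in raster-scan order (row by row, left to right; the symbols $x_i$ and $n$ are introduced here only for illustration):

$$
p(\mathbf{x}) = \prod_{i=1}^{n^2} p(x_i \mid x_1, \dots, x_{i-1})
$$

Sampling follows the same order: draw $x_1$, then $x_2$ given $x_1$, and so on until the image is complete.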

### Setting environment

Note: You may only use the packages listed below.

In [2]:
import numpy as np
import math
import os
from PIL import Image
import time
from tqdm import tqdm

import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers
# Enable NumPy-style behavior (slicing, methods) on TensorFlow tensors
from tensorflow.python.ops.numpy_ops import np_config
np_config.enable_numpy_behavior()


## Getting the data


### Download dataset



In [None]:
# Download dataset from google drive
! wget --load-cookies /tmp/cookies.txt "https://docs.google.com/uc?export=download&confirm=$(wget --quiet --save-cookies /tmp/cookies.txt --keep-session-cookies --no-check-certificate 'https://docs.google.com/uc?export=download&id=18ogOIVtYFkcCyNN6AHLCrTI95zMrYAZt' -O- | sed -rn 's/.*confirm=([0-9A-Za-z_]+).*/\1\n/p')&id=18ogOIVtYFkcCyNN6AHLCrTI95zMrYAZt" -O calligraphy.zip && rm -rf /tmp/cookies.txt
! mkdir ./data && unzip -q calligraphy.zip -d ./data/ && rm calligraphy.zip
! wget --load-cookies /tmp/cookies.txt "https://docs.google.com/uc?export=download&confirm=$(wget --quiet --save-cookies /tmp/cookies.txt --keep-session-cookies --no-check-certificate 'https://docs.google.com/uc?export=download&id=1w7JVXz6U-NVDZxBf1oSAVjKdR4BJs1zI' -O- | sed -rn 's/.*confirm=([0-9A-Za-z_]+).*/\1\n/p')&id=1w7JVXz6U-NVDZxBf1oSAVjKdR4BJs1zI" -O calligraphy.zip && rm -rf /tmp/cookies.txt
! unzip -q calligraphy.zip -d ./data/ && rm calligraphy.zip
! ls -l ./data

### Make dataset

In [3]:
# Model / data parameters
input_shape = (32, 32, 1)
batch_size = 32
data_dir = "comp4211_pa2/pa2"
data_name = "calligraphy"

In [5]:
# dataset class
class CalligraphySequence(tf.keras.utils.Sequence):

    def __init__(self, image_dir, batch_size):
        ### [C1: Build init and len functions]
        # Your code here
        self.batch_size = batch_size
        self.image_dir = image_dir + "/train"
        self.files = os.listdir(self.image_dir)
        self.file_num = len(self.files)



    def __len__(self):
        ### [C1: Build init and len functions]
        # Your code here
        
        return math.ceil(self.file_num / self.batch_size)

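    def __getitem__(self, idx):
        # Select the file names that belong to this batch.
        low = idx * self.batch_size
        high = min(low + self.batch_size, self.file_num)
        ### [C2: Build getitem function]
        # Round all pixel values less than 33% of the max 256 value to 0
        # anything above this value gets rounded up to 1 so that all values are either
        # 0 or 1
        # Your code here
        batch = []
        for name in self.files[low:high]:
            # Load as single-channel grayscale. Assumption: resizing to the global
            # input_shape is a no-op when the images are already 32x32.
            img = Image.open(os.path.join(self.image_dir, name)).convert("L")
            batch.append(np.asarray(img.resize(input_shape[:2]), dtype=np.float32))
        batch_x = np.stack(batch)[..., np.newaxis]
        # Binarize: below 33% of 256 becomes 0, everything else becomes 1.
        batch_x = np.where(batch_x < 0.33 * 256, 0.0, 1.0).astype(np.float32)
        # The model reconstructs its own input, so the target equals the input.
        return (batch_x, batch_x)

# final shape should be 1313 (32, 32, 32, 1) (32, 32, 32, 1)
train_ds = CalligraphySequence(data_dir, batch_size)
print(len(train_ds), train_ds[0][0].shape, train_ds[0][1].shape)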
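
The binarization rule described in the C2 comments can be checked on a tiny array. This is a minimal sketch, assuming 8-bit grayscale input and a threshold of 33% of 256 (about 84.48); the `patch` values are made up for illustration and do not come from the dataset.

```python
import numpy as np

# Hypothetical 2x3 grayscale patch with 8-bit intensities (0-255).
patch = np.array([[0, 50, 84],
                  [85, 128, 255]], dtype=np.uint8)

# Threshold at 33% of 256: values below it become 0, the rest become 1.
threshold = 0.33 * 256
binary = (patch >= threshold).astype(np.float32)

print(binary)
```

Every value in the result is exactly 0 or 1, which is what lets the final sigmoid layer be read as a per-pixel probability of the pixel being "on".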

In [12]:
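# Convert the training images to grayscale
image_dir = data_dir + "/train"
grey_dir = os.path.join(image_dir, "grey")
# Skip sub-directories (such as an existing "grey" folder) when listing images.
files = [f for f in os.listdir(image_dir) if os.path.isfile(os.path.join(image_dir, f))]
os.makedirs(grey_dir, exist_ok=True)

for image in files:
    img = Image.open(os.path.join(image_dir, image)).convert('L')
    img.save(os.path.join(grey_dir, image))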

## Create the requisite layers for the model


### Given functions: Conv2d / down_move / right_move / concat_elu
1. Conv2d: a 2D convolution layer built on layers.Conv2D

2. down_move: shifts the feature map down along the height dimension (by padding zeros at the top and dropping the bottom rows)

3. right_move: shifts the feature map right along the width dimension (by padding zeros on the left and dropping the rightmost columns)

4. concat_elu: a nonlinearity layer, like concatenated ReLU (http://arxiv.org/abs/1603.05201) but using ELU

The down_move and right_move functions are used to avoid information leaks in a causal network, so that the prediction for a pixel depends only on pixels generated before it.
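
To see what the two shift helpers do, here is a minimal NumPy sketch that mirrors the TensorFlow `down_move` and `right_move` functions given in the next cell. The names `down_move_np` and `right_move_np` are hypothetical and exist only for this illustration; a `(batch, height, width, channel)` layout is assumed.

```python
import numpy as np

def down_move_np(x, step=1):
    # Pad `step` zero rows at the top and drop the same number at the bottom.
    pad = np.zeros((x.shape[0], step, x.shape[2], x.shape[3]), dtype=x.dtype)
    return np.concatenate([pad, x[:, :x.shape[1] - step, :, :]], axis=1)

def right_move_np(x, step=1):
    # Pad `step` zero columns on the left and drop the same number on the right.
    pad = np.zeros((x.shape[0], x.shape[1], step, x.shape[3]), dtype=x.dtype)
    return np.concatenate([pad, x[:, :, :x.shape[2] - step, :]], axis=2)

# One 3x3 single-channel image holding the values 1..9.
x = np.arange(1, 10, dtype=np.float32).reshape(1, 3, 3, 1)

down = down_move_np(x)[0, :, :, 0]
right = right_move_np(x)[0, :, :, 0]
print(down)
print(right)
```

After a down shift, position (i, j) holds the value that was at (i-1, j); after a right shift it holds the value from (i, j-1). In both cases the feature at a pixel no longer contains that pixel's own input value.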


In [None]:
class Conv2d(layers.Layer):
    def __init__(self, num_filters, filter_size=[3, 3], stride=[1, 1], pad='SAME', nonlinearity=None, **kwargs):
        super().__init__()
        self.conv = layers.Conv2D(num_filters, filter_size, padding = pad, strides = stride, activation = nonlinearity, 
                         kernel_initializer=tf.keras.initializers.RandomNormal(mean=0.0, stddev=0.05))

    def call(self, x):
        return self.conv(x)

def down_move(x, step=1):
    input_shape = tf.shape(x)
    return tf.concat([tf.zeros((input_shape[0], step, input_shape[2], input_shape[3])), x[:, :input_shape[1] - step, :, :]], 1)

def right_move(x, step=1):
    input_shape = tf.shape(x)
    return tf.concat([tf.zeros((input_shape[0], input_shape[1], step, input_shape[3])), x[:, :, :input_shape[2] - step, :]], 2)

def concat_elu(x):
    """ like concatenated ReLU (http://arxiv.org/abs/1603.05201), but then with ELU """
    axis = len(x.get_shape()) - 1
    out = tf.nn.elu(tf.concat([x, -x], axis))
    return out
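
One consequence of `concat_elu` that is easy to miss is that it doubles the channel dimension. The following NumPy sketch shows this; `elu_np` and `concat_elu_np` are hypothetical stand-ins for `tf.nn.elu` and the `concat_elu` function above, written only for illustration.

```python
import numpy as np

def elu_np(x):
    # ELU: identity for positive inputs, exp(x) - 1 otherwise.
    return np.where(x > 0, x, np.exp(np.minimum(x, 0.0)) - 1.0)

def concat_elu_np(x):
    # Concatenate x and -x along the last (channel) axis, then apply ELU.
    return elu_np(np.concatenate([x, -x], axis=-1))

x = np.array([[[[1.0, -2.0]]]])  # shape (1, 1, 1, 2)
out = concat_elu_np(x)
print(out.shape)  # (1, 1, 1, 4)
```

Any layer applied directly after `concat_elu` therefore receives twice as many input channels as the tensor that went in.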

### Gated Residual Block
The GatedResnet class applies gated residual connections to input tensors for feature extraction.

Please follow Section 4.2.3 to implement the coding questions below.
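
Section 4.2.3 of the assignment handout is not reproduced in this notebook, so the following is only a sketch of the gating idea. It assumes the formulation used in PixelCNN++-style models, where a tensor with twice the channel count is split into a candidate `a` and a gate `b`, and the output is `x + a * sigmoid(b)`. The helper `gated_residual` and the tensor `h` are hypothetical; in the real layer `h` would come from convolutions applied to `x`.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gated_residual(x, h):
    # h is assumed to have twice as many channels as x: split it into a
    # candidate `a` and a gate `b`, then add the gated candidate onto the input.
    a, b = np.split(h, 2, axis=-1)
    return x + a * sigmoid(b)

x = np.ones((1, 2, 2, 3), dtype=np.float32)   # input features, 3 channels
h = np.zeros((1, 2, 2, 6), dtype=np.float32)  # hypothetical conv output, 6 channels
h[..., :3] = 2.0                              # candidate a = 2, gate b = 0

out = gated_residual(x, h)
print(out.shape, out[0, 0, 0])
```

With a gate of 0 the sigmoid is 0.5, so half of the candidate is added to the input. The gate lets the block decide, per position and channel, how much new information to mix into the residual path.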


In [None]:
class DownMovedConv2d(layers.Layer):
    def __init__(self, num_filters, filter_size=[2, 3], stride=[1, 1], pad='VALID', nonlinearity=None, **kwargs):
        super().__init__()
        ### [C4: Build DownMovedConv2d.]
        # Your code here
        

    def call(self, x):
        ### [C4: Build DownMovedConv2d.]
        # Your code here


class DownRightMovedConv2d(layers.Layer):
    def __init__(self, num_filters, filter_size=[2, 2], stride=[1, 1], pad='VALID', nonlinearity=None, **kwargs):
        super().__init__()
        ### [C3: Build DownRightMovedConv2d.]
        # Your code here

    def call(self, x):
        ### [C3: Build DownRightMovedConv2d.]
        # Your code here


class TensorDense(layers.Layer):
    def __init__(self, num_units, nonlinearity=None, **kwargs):
        super().__init__()
        ### [C5: Build TensorDense.]
        # Your code here

    def call(self, x):
        ### [C5: Build TensorDense.]
        # Your code here


class GatedResnet(layers.Layer):
    def __init__(self, num_filters, nonlinearity=concat_elu, **kwargs):
        super().__init__()
        ### [C6: Build GatedResnet.]
        # Your code here

    def call(self, x):
        ### [C6: Build GatedResnet.]
        # Your code here

### Main AutoregressiveModel

In [None]:
class AutoregressiveModel(layers.Layer):
    def __init__(self, n_resnet=5, n_filters=256, n_block=12, n_output=10, **kwargs):
        super().__init__()
        self.n_resnet = n_resnet
        self.n_filters = n_filters
        self.n_block = n_block
        self.n_output = n_output
        # init all network layers
        self.down_moved_conv2d = DownMovedConv2d(num_filters=self.n_filters, filter_size=[1, 3])
        self.down_right_moved_conv2d = DownRightMovedConv2d(num_filters=self.n_filters, filter_size=[2, 1])
        self.ul_list_gated_resnet = []
        self.ul_list_dense_layer = []
        ### [C7: Build AutoregressiveModel.]
        # Your code here

    def call(self, inputs):
        input_shape = tf.shape(inputs)
        x = down_move(self.down_moved_conv2d(inputs)) + right_move(self.down_right_moved_conv2d(inputs))
        ### [C7: Build AutoregressiveModel.]
        # Your code here
        return x_out

## Build the model based on the original paper


In [None]:
## Build the model based on the original paper
inputs = keras.Input(shape=input_shape, dtype=tf.float32)
x = AutoregressiveModel(n_resnet=6, n_filters=64, n_block=6, n_output=10)(inputs)
out = keras.layers.Conv2D(
    filters=1, kernel_size=1, strides=1, activation="sigmoid", padding="valid"
)(x)

pixel_cnn = keras.Model(inputs, out)

### [C11: Model training and log reporting]
# you can use keras.optimizers.Adam here to define "adam"
# compile your model and make a summary on its architecture
# Your code here

In [None]:
### [C8: Load the pretrained weights]
# Your code here

In [None]:
### [C11: Model training and log reporting]
# you can use model.fit here
# Your code here

In [None]:
# save weights 
# pixel_cnn.save_weights('pixel_cnn_e15.h5')

## Demonstration

The AutoregressiveModel cannot generate the full image at once. Instead, it must generate each pixel in
order, append the last generated pixel to the current image, and feed the image back into the
model to repeat the process.
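
The loop structure can be sketched without a trained network. In the example below, `toy_model` is a hypothetical stand-in for `pixel_cnn.predict`: it ignores its input and returns a fixed checkerboard of probabilities, so only the control flow and the 0.5 threshold are illustrated, not the behavior of the real model.

```python
import numpy as np

def toy_model(images):
    # Hypothetical stand-in for pixel_cnn.predict: returns a per-pixel probability
    # of "on" that depends only on pixel position (a checkerboard).
    _, rows, cols, _ = images.shape
    r, c = np.meshgrid(np.arange(rows), np.arange(cols), indexing="ij")
    probs = np.where((r + c) % 2 == 0, 0.9, 0.1)
    return np.broadcast_to(probs[None, :, :, None], images.shape)

pixels = np.zeros((2, 4, 4, 1), dtype=np.float32)
batch, rows, cols, channels = pixels.shape

for row in range(rows):
    for col in range(cols):
        for channel in range(channels):
            # Re-run the model on the partially generated images.
            probs = toy_model(pixels)[:, row, col, channel]
            # Threshold at 0.5: ceil(p - 0.5) is 1 when p > 0.5 and 0 otherwise.
            pixels[:, row, col, channel] = np.ceil(probs - 0.5)

print(pixels[0, :, :, 0].astype(int))
```

With a real autoregressive model, the prediction at (row, col) changes as earlier pixels are filled in, which is why the model has to be re-run at every step rather than once for the whole image.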

In [None]:
# Note: this rebinds the name `Image`, shadowing PIL's Image imported earlier.
from IPython.display import Image, display

# Create an empty array of pixels.
batch = 4 # you may want to change this parameter 
pixels = np.zeros(shape=(batch,) + (pixel_cnn.input_shape)[1:])
batch, rows, cols, channels = pixels.shape

# Iterate over the pixels because generation has to be done sequentially pixel by pixel.
for row in tqdm(range(rows)):
    for col in range(cols):
        for channel in range(channels):
            ### [C9: Qualitative Evaluation]
            # Your code here
            # 1. Feed the whole array to the model and retrieve the pixel value probabilities
            # for the next pixel. You can use the model.predict function to get the predicted
            # value for each pixel.

            # 2. Use the probabilities to pick pixel values and write them into the image
            # frame. You can use tf.math.ceil to apply the 0.5 threshold.


def deprocess_image(x):
    # Stack the single channeled black and white image to RGB values.
    x = np.stack((x, x, x), 2)
    # Undo preprocessing
    x *= 255.0
    # Convert to uint8 and clip to the valid range [0, 255]
    x = np.clip(x, 0, 255).astype("uint8")
    return x


# Iterate over the generated images and save them as PNG files.
for i, pic in enumerate(pixels):
    keras.preprocessing.image.save_img(
        "generated_image_{}.png".format(i), deprocess_image(np.squeeze(pic, -1))
    )

# Display every generated image, whatever value `batch` was set to.
for i in range(batch):
    display(Image("generated_image_{}.png".format(i)))

## Quantitative Evaluation



In [None]:
### [C10: Quantitative Evaluation]
# Your code here