<a href="https://colab.research.google.com/github/SpirinEgor/HSE.Deep_Unsupervised_Learning/blob/hw1/Homework/hw1/Homework1_MADE.ipynb" target="_parent"><img alt="Open In Colab" src="https://colab.research.google.com/assets/colab-badge.svg"/></a>

In [None]:
! nvidia-smi

# Homework 1

In [None]:
! if [ -d HSE.Deep_Unsupervised_Learning ]; then rm -Rf HSE.Deep_Unsupervised_Learning; fi
! git clone https://github.com/SpirinEgor/HSE.Deep_Unsupervised_Learning.git
%cd HSE.Deep_Unsupervised_Learning
! git checkout hw1

In [None]:
! pip install -r requirements.txt

In [None]:
! git pull

In [None]:
! unzip -qq Homework/hw1/data/hw1_data.zip -d Homework/hw1/data/
! mv -v Homework/hw1/data/hw1_data/* Homework/hw1/data/
! rm -rf Homework/hw1/data/hw1_data/

In [None]:
import matplotlib.pyplot as plt
import numpy as np  # used explicitly in q1_b; do not rely on the star import below
from torch.utils.data import DataLoader

from made.trainer import ImageMADETrainer
from utils.hw1_utils import *

%matplotlib inline
%load_ext autoreload
%autoreload 2

In [None]:
plt.rcParams["axes.labelsize"] = 25.0
plt.rcParams["xtick.labelsize"] = 20.0
plt.rcParams["ytick.labelsize"] = 20.0
plt.rcParams["legend.fontsize"] = 18.0

plt.rcParams["figure.figsize"] = [8.0, 6.0]

# MADE

In this question, you will implement [MADE](https://arxiv.org/abs/1502.03509). In the first part, you will use MADE to model a simple 2D joint distribution, and in the second part, you will train MADE on image datasets.

## Part (a) Fitting 2D Data

First, you will work with bivariate data of the form $x = (x_0,x_1)$, where $x_0, x_1 \in \{0, \dots, d-1\}$. We can easily visualize a 2D dataset by plotting a 2D histogram. Run the cell below to visualize our datasets.

In [None]:
visualize_q1a_data(dataset_type=1)
visualize_q1a_data(dataset_type=2)
# you can access data with get_data_q1_a(dset_type=1)

Implement and train a MADE model through maximum likelihood to represent $p(x_0, x_1)$ on the given datasets, with any autoregressive ordering of your choosing. 
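
As a rough illustration of what "autoregressive ordering" means for the network weights, the sketch below builds MADE-style binary masks with NumPy, following the degree rule from the MADE paper. The function name `made_masks`, the natural ordering, and the random degree assignment are assumptions made for this example; it is not the implementation used by `ImageMADETrainer`.

```python
import numpy as np


def made_masks(n_dims, hidden_sizes, seed=0):
    """Build binary masks of shape (out_features, in_features) for a MADE.

    Hypothetical helper: assumes n_dims >= 2 and the natural ordering,
    so input i (0-based) gets degree i + 1 and hidden units get degrees
    in {1, ..., n_dims - 1}.
    """
    rng = np.random.default_rng(seed)
    degrees = [np.arange(1, n_dims + 1)]
    for size in hidden_sizes:
        # Do not sample below the previous layer's minimum degree,
        # otherwise some hidden units would have no incoming connections.
        low = min(int(degrees[-1].min()), n_dims - 1)
        degrees.append(rng.integers(low, n_dims, size=size))

    # Hidden layers: a unit sees previous-layer units with degree <= its own.
    masks = [
        (d_out[:, None] >= d_in[None, :]).astype(np.float32)
        for d_in, d_out in zip(degrees[:-1], degrees[1:])
    ]
    # Output layer: output i sees only hidden units with strictly smaller degree.
    masks.append((degrees[0][:, None] > degrees[-1][None, :]).astype(np.float32))
    return masks


masks = made_masks(n_dims=2, hidden_sizes=[128, 128])
```

Multiplying the masks together gives the input-to-output connectivity. With the natural ordering it should be strictly lower triangular, meaning the output for $x_1$ depends only on $x_0$ and the output for $x_0$ depends on no input at all.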

A few notes:
* You do not need to train with multiple masks.
* You may find it useful to one-hot encode your inputs.
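
A minimal sketch of the one-hot encoding mentioned in the notes, assuming the data is an `(n, 2)` integer array with values in `{0, ..., d-1}`. The helper name `one_hot` is hypothetical and not part of `utils.hw1_utils`.

```python
import numpy as np


def one_hot(x, d):
    """Map an (n, 2) integer array to an (n, 2 * d) float array.

    The first d columns encode x_0 and the last d columns encode x_1.
    """
    x = np.asarray(x)
    eye = np.eye(d, dtype=np.float32)
    # eye[x] has shape (n, 2, d); flatten the last two axes.
    return eye[x].reshape(x.shape[0], -1)


encoded = one_hot([[0, 2], [1, 1]], d=3)
```

With this encoding, every variable occupies `d` input units, so all `d` units of a variable would share the same position in the autoregressive ordering.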

**You will provide these deliverables**


1. Over the course of training, record the average negative log-likelihood (nats / dim) of the training data (per minibatch) and test data (for your entire test set). Code is provided that automatically plots the training curves.
2. Report the final test set performance of your final model.
3. Visualize the learned 2D distribution by plotting a 2D heatmap.
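
To make the "nats / dim" unit and the heatmap deliverable concrete, here is a small NumPy sketch. It assumes the model produces one vector of logits for $x_0$ and a `(d, d)` table of logits for $x_1$ given each value of $x_0$. The function names are illustrative only, and the actual `get_distribution` method used in the solution may work differently.

```python
import numpy as np


def log_softmax(logits, axis=-1):
    # Subtract the maximum before exponentiating for numerical stability.
    shifted = logits - logits.max(axis=axis, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=axis, keepdims=True))


def joint_from_logits(logits_x0, logits_x1_given_x0):
    """Return a (d, d) array with entry [i, j] = p(x_0 = i) * p(x_1 = j | x_0 = i)."""
    log_p0 = log_softmax(logits_x0)            # shape (d,)
    log_p1 = log_softmax(logits_x1_given_x0)   # shape (d, d), row i conditions on x_0 = i
    return np.exp(log_p0[:, None] + log_p1)


def nll_nats_per_dim(joint, data):
    """Average negative log-likelihood over samples, divided by the number of dimensions."""
    log_probs = np.log(joint[data[:, 0], data[:, 1]])
    return -log_probs.mean() / data.shape[1]


d = 4
joint = joint_from_logits(np.zeros(d), np.zeros((d, d)))  # untrained, uniform model
data = np.array([[0, 1], [3, 2], [2, 2]])
nll = nll_nats_per_dim(joint, data)
```

For the uniform model above, every pair has probability $1/d^2$, so the loss is $\log(d^2)/2 = \log d$ nats per dimension. This is a useful reference value for the test loss at initialization. The `joint` array can be passed directly to a heatmap function such as `plt.imshow`.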


### Solution
Fill out the function below and return the necessary arguments. Feel free to create more cells if need be.

In [None]:
def q1_a(train_data, test_data, d, dset_id):
    """
    train_data: An (n_train, 2) numpy array of integers in {0, ..., d-1}
    test_data: An (n_test, 2) numpy array of integers in {0, ..., d-1}
    d: The number of possible discrete values for each random variable x0 and x1
    dset_id: An identifying number of which dataset is given (1 or 2). Most likely
            used to set different hyperparameters for different datasets

    Returns
    - a (# of training iterations,) numpy array of train_losses evaluated every minibatch
    - a (# of epochs + 1,) numpy array of test_losses evaluated once at initialization and after each epoch
    - a numpy array of size (d, d) of probabilities (the learned joint distribution)
    """
    config = {
        1: {
            "hidden_layers": [128, 128],
            "batch_size": 10,
            "test_batch_size": 20,
            "hyperparameters": {"lr": 2e-3, "n_epochs": 20, "weight_decay": 1e-4, "clip_norm": 10},
        },
        2: {
            "hidden_layers": [128, 128],
            "batch_size": 10,
            "test_batch_size": 20,
            "hyperparameters": {"lr": 2e-3, "n_epochs": 20, "weight_decay": 1e-4, "clip_norm": 10},
        },
    }

    made_trainer = ImageMADETrainer((2,), d, config[dset_id]["hidden_layers"], True)

    train_dataloader = DataLoader(train_data, batch_size=config[dset_id]["batch_size"], shuffle=True)
    test_dataloader = DataLoader(test_data, batch_size=config[dset_id]["test_batch_size"])

    train_losses, test_losses = made_trainer.train(
        train_dataloader, seed=7, test_dataloader=test_dataloader, **config[dset_id]["hyperparameters"]
    )

    return train_losses, test_losses, made_trainer.made.get_distribution()

### Results

Once you've implemented `q1_a`, execute the cells below to visualize and save your results.



In [None]:
q1_save_results(1, "a", q1_a)

In [None]:
q1_save_results(2, "a", q1_a)

## Part (b) Shapes and MNIST
Now, we will work with higher-dimensional datasets, namely a shape dataset and MNIST. Run the cell below to visualize the two datasets.

In [None]:
visualize_q1b_data(1)
visualize_q1b_data(2)
# you can access data with get_data_q1_b(dset_type=1)

Implement and train a MADE model on the given binary image datasets. Given some binary image of height $H$ and width $W$, we can represent image $x\in \{0, 1\}^{H\times W}$ as a flattened binary vector $x\in \{0, 1\}^{HW}$ to input into MADE to model $p_\theta(x) = \prod_{i=1}^{HW} p_\theta(x_i|x_{<i})$. Your model should output logits, after which you could apply a sigmoid over 1 logit, or a softmax over two logits (either is fine).
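
The remark that a sigmoid over one logit and a softmax over two logits are both fine can be checked numerically. The sketch below, with hypothetical function names, computes the per-pixel negative log-likelihood both ways and shows that they agree when the softmax logits are `[0, logit]`.

```python
import numpy as np


def bernoulli_nll_from_logit(logits, x):
    """Per-pixel -log p(x) with p(x = 1) = sigmoid(logit).

    Uses the identity -log p(x) = log(1 + exp(logit)) - x * logit,
    with log(1 + exp(.)) computed stably via logaddexp.
    """
    return np.logaddexp(0.0, logits) - x * logits


def softmax_nll_from_logits(logits2, x):
    """Per-pixel -log p(x) where logits2[..., k] is the logit of pixel value k."""
    log_norm = np.logaddexp(logits2[..., 0], logits2[..., 1])
    chosen = np.take_along_axis(logits2, x[..., None], axis=-1)[..., 0]
    return log_norm - chosen


rng = np.random.default_rng(0)
logits = rng.normal(size=(8, 20 * 20))           # one logit per pixel of a flattened image
x = rng.integers(0, 2, size=logits.shape)        # binary pixels in {0, 1}
two_logits = np.stack([np.zeros_like(logits), logits], axis=-1)

nll_sigmoid = bernoulli_nll_from_logit(logits, x).mean()    # nats / dim
nll_softmax = softmax_nll_from_logits(two_logits, x).mean()  # nats / dim
```

Taking the mean over both the batch and the pixel axis gives the loss in nats per dimension, which is the unit requested in the deliverables.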

**You will provide these deliverables**


1. Over the course of training, record the average negative log-likelihood (nats / dim) of the training data (per minibatch) and test data (for your entire test set). Code is provided that automatically plots the training curves.
2. Report the final test set performance of your final model.
3. Provide 100 samples from the final trained model.
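
Sampling from an autoregressive model proceeds one pixel at a time, because each conditional $p_\theta(x_i|x_{<i})$ needs the previously sampled pixels. The sketch below shows that loop in NumPy. `sample_autoregressive` and the toy `toy_logits` model (a single strictly lower-triangular linear layer) are stand-ins invented for this example and do not reflect how `made.sample` is implemented.

```python
import numpy as np


def sample_autoregressive(logits_fn, n_samples, n_dims, seed=0):
    """Draw binary samples pixel by pixel from a model that returns per-pixel logits."""
    rng = np.random.default_rng(seed)
    x = np.zeros((n_samples, n_dims), dtype=np.int64)
    for i in range(n_dims):
        # Logit i depends only on x[:, :i], which has already been sampled.
        logits = logits_fn(x)
        p_one = 1.0 / (1.0 + np.exp(-logits[:, i]))
        x[:, i] = rng.random(n_samples) < p_one
    return x


# Toy autoregressive model (assumption for the demo): strictly lower-triangular
# weights guarantee that output i never sees inputs i, i + 1, ...
H, W = 4, 4
rng = np.random.default_rng(0)
weights = np.tril(rng.normal(size=(H * W, H * W)), k=-1)
bias = rng.normal(size=H * W)


def toy_logits(x):
    return x @ weights.T + bias


samples = sample_autoregressive(toy_logits, 100, H * W).reshape(100, H, W, 1)
```

The loop calls the model `H * W` times, once per pixel, so sampling is much slower than evaluating the likelihood, which needs a single forward pass.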

### Solution
Fill out the function below and return the necessary arguments. Feel free to create more cells if need be.

In [None]:
def q1_b(train_data, test_data, image_shape, dset_id, n_samples: int = 100):
    """
    train_data: A (n_train, H, W, 1) uint8 numpy array of binary images with values in {0, 1}
    test_data: An (n_test, H, W, 1) uint8 numpy array of binary images with values in {0, 1}
    image_shape: (H, W), height and width of the image
    dset_id: An identifying number of which dataset is given (1 or 2). Most likely
            used to set different hyperparameters for different datasets

    Returns
    - a (# of training iterations,) numpy array of train_losses evaluated every minibatch
    - a (# of epochs + 1,) numpy array of test_losses evaluated once at initialization and after each epoch
    - a numpy array of size (n_samples, H, W, 1) of samples with values in {0, 1}
    """
    config = {
        1: {
            "hidden_layers": [512, 512],
            "batch_size": 10,
            "test_batch_size": 20,
            "hyperparameters": {"lr": 1e-3, "n_epochs": 20, "weight_decay": 1e-4, "clip_norm": 10},
        },
        2: {
            "hidden_layers": [512, 512],
            "batch_size": 10,
            "test_batch_size": 20,
            "hyperparameters": {"lr": 1e-3, "n_epochs": 20, "weight_decay": 1e-4, "clip_norm": 10},
        },
    }

    train_data = np.transpose(train_data, (0, 3, 1, 2))
    test_data = np.transpose(test_data, (0, 3, 1, 2))
    H, W = image_shape

    made_trainer = ImageMADETrainer((1, H, W), 2, hidden_sizes=config[dset_id]["hidden_layers"])

    train_dataloader = DataLoader(train_data.astype(int), batch_size=config[dset_id]["batch_size"], shuffle=True)
    test_dataloader = DataLoader(test_data.astype(int), batch_size=config[dset_id]["test_batch_size"])

    train_losses, test_losses = made_trainer.train(
        train_dataloader, seed=7, test_dataloader=test_dataloader, **config[dset_id]["hyperparameters"]
    )

    samples = made_trainer.made.sample(n_samples, (1, H, W))
    samples = np.transpose(samples, (0, 2, 3, 1))

    return train_losses, test_losses, samples

### Results

Once you've implemented `q1_b`, execute the cells below to visualize and save your results.



In [None]:
q1_save_results(1, "b", q1_b)

In [None]:
q1_save_results(2, "b", q1_b)