# Assignment 2 (Practical)

**COMP9418 - Advanced Topics in Statistical Machine Learning**

**Louis Tiao** (TA), **Edwin V. Bonilla** (Instructor)

*School of Computer Science and Engineering, UNSW Sydney*

---

In the practical component of this assignment you will build a *class-conditional classifier* using the mixture model described in the theory section of this assignment.

The basic idea behind a class conditional classifier is to train a separate model for each class $p(\mathbf{x} \mid y)$, and use Bayes' rule to classify a novel data-point $\mathbf{x}^*$ with:

$$
p(y^* \mid \mathbf{x}^*) = \frac{p(\mathbf{x}^* \mid y^*) p(y^*)}{\sum_{y'=1}^C p(\mathbf{x}^* \mid y') p(y')}
$$

(cf. Barber textbook BRML, 2012, $\S$23.3.4 or Murphy textbook MLaPP, 2012, $\S$17.5.4).

In this assignment, you will use the prescribed mixture model for each of the conditional densities $p(\mathbf{x} \mid y)$ and a Categorical distribution for $p(y)$.
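
Bayes' rule above is usually evaluated in log space, because products of many small densities underflow. The following is a minimal sketch, assuming the per-class log-likelihoods $\log p(\mathbf{x}_n \mid y = c)$ have already been computed by some model; the function name `log_posterior` and the toy numbers are illustrative and not part of the assignment.

```python
import numpy as np
from scipy.special import logsumexp


def log_posterior(log_lik, log_prior):
    """
    log_lik:   (N, C) array of log p(x_n | y = c)
    log_prior: (C,) array of log p(y = c)
    returns:   (N, C) array of log p(y = c | x_n)
    """
    # Numerator of Bayes' rule in log space (broadcasts over rows).
    log_joint = log_lik + log_prior
    # Denominator: log of the sum over classes, computed stably.
    log_evidence = logsumexp(log_joint, axis=1, keepdims=True)
    return log_joint - log_evidence


# Toy example with N = 2 points and C = 2 classes (made-up numbers).
toy_log_lik = np.log(np.array([[0.2, 0.1], [0.05, 0.3]]))
toy_log_prior = np.log(np.array([0.5, 0.5]))
toy_posterior = np.exp(log_posterior(toy_log_lik, toy_log_prior))
```

Each row of `toy_posterior` sums to one, as a posterior over classes should.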

### Prerequisites

You will require the following packages for this assignment:

- `numpy`
- `scipy`
- `scikit-learn`
- `matplotlib`
- `observations`

Most of these may be installed with `pip`:

    pip install numpy scipy scikit-learn matplotlib observations

### Guidelines

1. Unless otherwise indicated, you may not use any ML libraries or frameworks, such as scikit-learn or TensorFlow, to implement any training-related code. Your solution should be implemented purely in NumPy/SciPy.
2. Do not delete any of the existing code blocks in this notebook. They will be used to assess the performance of your algorithm.

### Assessment

Your work will be assessed based on:
- **[50%]** the application of the concepts for doing model selection, which allows you to learn a single model for prediction (Section 1);  
- **[30%]** the code you write for making predictions in your model (Section 2); and
- **[20%]** the predictive performance of your model (Section 3). 

## Dataset

You will be building a class-conditional classifier to classify images from the [Fashion MNIST dataset](https://github.com/zalandoresearch/fashion-mnist), which contains grayscale images of clothing items such as coats, shirts, sneakers and dresses.

This can be obtained with [observations](https://github.com/edwardlib/observations), a convenient tool for loading standard ML datasets.

In [None]:
from observations import fashion_mnist
from sklearn.preprocessing import LabelBinarizer

In [None]:
(x_train, y_train_), _ = fashion_mnist('.')

There are 60k training examples, each a 784-dimensional feature vector corresponding to 28 x 28 pixel intensities.

In [None]:
x_train.shape

The pixel intensities are originally unsigned 8-bit integers (`uint8`) and should be normalized to floating-point values in the range $[0,1]$.

In [None]:
x_train = x_train / 255.

The targets contain the class label corresponding to each example. For this assignment, you should represent this using the "one-hot" encoding. 

In [None]:
y_train = LabelBinarizer().fit_transform(y_train_)
y_train.shape
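
To make the encoding concrete, here is a small NumPy-only sketch of what a one-hot representation looks like, using toy labels rather than the real data. The helper `one_hot` is a hypothetical name, not something the notebook provides. The last line also shows that, with one-hot labels, the maximum-likelihood estimate of a Categorical $p(y)$ is simply the vector of column means (the class frequencies).

```python
import numpy as np


def one_hot(labels, num_classes):
    """Map integer labels of shape (N,) to an (N, num_classes) 0/1 array."""
    # Row i of the identity matrix is the one-hot vector for class i.
    return np.eye(num_classes, dtype=int)[labels]


toy_labels = np.array([0, 2, 1, 2])  # toy integer class labels
toy_encoded = one_hot(toy_labels, num_classes=3)

# Maximum-likelihood estimate of a Categorical p(y): class frequencies.
toy_class_prior = toy_encoded.mean(axis=0)
```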

Note that you are only to use the training data contained in `x_train`, `y_train` as we have defined it. In order to learn and test your model, you may consider splitting these data into training, validation and test sets. You may not use any other data for training.

In particular, if you want to assess the performance of your model in section 2, you must create a test set `test.npz`. You are not required to submit this test file as we will evaluate the performance of your model using our own test data.
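
One possible way to carve out validation and test subsets is a random permutation of the example indices. The sketch below is only an illustration: the function name `split_indices`, the 10% fractions and the fixed seed are all assumptions, not requirements of the assignment.

```python
import numpy as np


def split_indices(n, val_frac=0.1, test_frac=0.1, seed=0):
    """Return disjoint (train, val, test) index arrays covering range(n)."""
    rng = np.random.default_rng(seed)
    perm = rng.permutation(n)  # shuffled copy of 0..n-1
    n_test = int(round(test_frac * n))
    n_val = int(round(val_frac * n))
    test_idx = perm[:n_test]
    val_idx = perm[n_test:n_test + n_val]
    train_idx = perm[n_test + n_val:]
    return train_idx, val_idx, test_idx


# 60000 matches the number of training examples in this dataset.
train_idx, val_idx, test_idx = split_indices(60000)
```

A held-out test file compatible with the loading code in Section 2 could then be written with `np.savez('test.npz', x_test=x_train[test_idx], y_test=y_train_[test_idx])`, since that code reads the keys `x_test` and `y_test` and binarizes the labels afterwards.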

## Preamble 

In [None]:
%matplotlib inline

In [None]:
import numpy as np
import matplotlib.pyplot as plt

#### Constants

You can use the function below to plot images from the dataset.

In [None]:
def plot_image_grid(ax, images, n=20, m=None, img_rows=28, img_cols=28):
    """
    Plot the first `n * m` vectors in the array as 
    a `n`-by-`m` grid of `img_rows`-by-`img_cols` images.
    """
    if m is None:
        m = n
 
    grid = images[:n*m].reshape(n, m, img_rows, img_cols)

    return ax.imshow(np.vstack(np.dstack(grid)), cmap='gray')

Here we have the first 400 images in the training set.

In [None]:
fig, ax = plt.subplots(figsize=(8, 8))

plot_image_grid(ax, x_train, n=20)

plt.show()

Here we have the first 400 images labeled "t-shirts" in the training set.

In [None]:
fig, ax = plt.subplots(figsize=(8, 8))

plot_image_grid(ax, x_train[y_train_ == 0])

plt.show()

## Section 1 `[50%]`: Model Training

Place all the code for training your model using the function `model_train` below. 

- We should be able to run your notebook (by clicking 'Cell->Run All') without errors. You must also save the trained model in the file `model.npz`. This file will be loaded to make predictions in Section 2 and to assess the performance of your model in Section 3. Note that, in addition to this notebook file, <span style="color:red">**you must provide the file `model.npz`**</span>.

- You should comment your code as much as possible so we understand your reasoning about training, model selection and avoiding overfitting. 

- You can process the data as you wish, e.g. by applying additional transformations or reducing dimensionality. However, all such preprocessing code must also be placed here.

- Wrap all your training using the function `model_train` below. You can call all other custom functions within it.

In [None]:
def model_train(x_train, y_train):
    """
    Write your code here.
    """
    model = None

    # You can modify this to save other variables, etc.,
    # but make sure the name of the file is 'model.npz'.
    np.savez_compressed('model.npz', model=model)
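
As a hedged illustration of saving "other variables", the sketch below stores several named NumPy arrays in one `.npz` archive and reads them back. The parameter names (`log_prior`, `means`) and their shapes are hypothetical placeholders, since the actual parameters depend on the mixture model from the theory section. An in-memory buffer stands in for the file so the example does not touch `model.npz`. Saving plain numeric arrays, rather than `None` or arbitrary Python objects, also avoids the need for `allow_pickle=True` when loading.

```python
import io

import numpy as np

# Hypothetical parameter arrays; replace with your model's real parameters.
toy_params = {
    "log_prior": np.log(np.full(10, 0.1)),  # (C,) log class probabilities
    "means": np.zeros((10, 3, 784)),        # (C, K, D) placeholder values
}

# Each keyword argument becomes a named array inside the archive.
buffer = io.BytesIO()  # stands in for the file 'model.npz'
np.savez_compressed(buffer, **toy_params)

# Loading: rewind the buffer, then index the archive by name.
buffer.seek(0)
archive = np.load(buffer)
restored = {name: archive[name] for name in archive.files}
```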

## Section 2 `[30%]`: Predictions

Here we will assume that there is a file `test.npz` from which we will load the test data. As this file is not given to you, you will need to create one yourself to test your code (you do not need to submit it). <span style="color:red">Note that if you do not create this file the cell below will not run</span>.

Your task is to fill in the `model_predict` function below. Note that this function should load your `model.npz` file, which must contain all the data structures necessary for making predictions.

In [None]:
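# Create these yourself for your own testing, but delete this cell before submission.
# Random placeholders only: replace them with a held-out portion of the training data.
x_test = np.random.randn(10000, 784)
y_test = np.random.randint(low=0, high=10, size=(10000, 1))  # `high` is exclusive: labels 0-9
y_test.shape
np.savez('test.npz', x_test=x_test, y_test=y_test)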

In [None]:
test = np.load('test.npz')
x_test = test.get('x_test')
y_test = test.get('y_test')

In [None]:
x_test.shape

In [None]:
y_test_ = LabelBinarizer().fit_transform(y_test)
y_test_.shape

In [None]:
fig, ax = plt.subplots()

plot_image_grid(ax, x_test, n=8, m=3)

plt.show()

In [None]:
def model_predict(x_test):
    """
    @param x_test: (N,D)-array with test data
    @return y_pred: (N,C)-array with predicted classes using one-hot encoding
    @return y_log_prob: (N,C)-array with predicted log probability of the classes
    """

    # Add your code here: you should load your trained model here
    # and write the corresponding code for making predictions.
    model = np.load('model.npz')

    return y_pred, y_log_prob
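
The two return values are closely related: given an `(N,C)` array of class log probabilities, the one-hot prediction marks the most probable class in each row. A minimal sketch of that step follows; the helper name `one_hot_argmax` and the toy numbers are illustrative assumptions, and how the log probabilities themselves are computed is left to your model.

```python
import numpy as np


def one_hot_argmax(log_prob):
    """Turn (N, C) class log probabilities into (N, C) one-hot predictions."""
    n, c = log_prob.shape
    pred = np.zeros((n, c), dtype=int)
    # Set a single 1 per row, at the column of the largest log probability.
    pred[np.arange(n), log_prob.argmax(axis=1)] = 1
    return pred


toy_log_prob = np.log(np.array([[0.7, 0.2, 0.1],
                                [0.1, 0.3, 0.6]]))
toy_pred = one_hot_argmax(toy_log_prob)
```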

## Section 3 `[20%]`: Performance 

You do not need to do anything in this section, but you can use it to test the generalisation performance of your code. We will use it to evaluate the performance of your algorithm on a new test set.

In [None]:
def model_performance(x_test, y_test, y_pred, y_log_prob):
    """
    @param x_test: (N,D)-array of features 
    @param y_test: (N,C)-array of one-hot-encoded true classes
    @param y_pred: (N,C)-array of one-hot-encoded predicted classes
    @param y_log_prob: (N,C)-array of predicted class log probabilities 
    """

    acc = np.all(y_test == y_pred, axis=1).mean()
    llh = y_log_prob[y_test == 1].mean()

    return acc, llh
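
To see what the two metrics measure, here is a tiny worked example with made-up values for two test points and two classes. It repeats the two expressions from `model_performance` on toy arrays; the variable names are chosen so as not to overwrite anything in the notebook.

```python
import numpy as np

toy_true = np.array([[1, 0], [0, 1]])  # one-hot true classes
toy_hat = np.array([[1, 0], [1, 0]])   # one-hot predictions (second is wrong)
toy_lp = np.log(np.array([[0.8, 0.2], [0.6, 0.4]]))  # class log probabilities

# Accuracy: fraction of rows where the whole one-hot vector matches.
toy_acc = np.all(toy_true == toy_hat, axis=1).mean()

# Average log probability assigned to the true class of each example.
toy_llh = toy_lp[toy_true == 1].mean()
```

Here one of the two predictions is correct, and the second metric averages $\log 0.8$ and $\log 0.4$, the log probabilities assigned to the true classes.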

In [None]:
y_pred, y_log_prob = model_predict(x_test)
# Pass the one-hot labels `y_test_`, as `model_performance` expects (N,C)-arrays.
acc, llh = model_performance(x_test, y_test_, y_pred, y_log_prob)

In [None]:
'Average test accuracy=' + str(acc)

In [None]:
'Average test log-likelihood=' + str(llh)