# Machine Learning with Python - Classification

## Learning goals

In this exercise, you will learn how to formulate and solve a classification problem. A classification problem amounts to finding a good classifier which maps a given data point via its features to a particular label. The label indicates to which class or category the data point belongs. 
We will implement **logistic regression for binary classification** and use **gradient descent** to find the optimal classifier. We will also consider a simple approach to extend binary classifiers to multiclass problems which involve more than two categories.

## Exercise Contents

1. [Introduction](#1-Introduction) - Here we formulate a classification problem.
2. [Data](#2-Data) - A description of the dataset.
3. [Exercise](#3-Exercise) - A total of 5 tasks. Read the task descriptions and answer accordingly.
    * 3.1 [**Getting Hands on the Data**](#3.1-Getting-Hands-on-the-Data)
    * 3.2 [**Logistic Regression**](#3.2-Logistic-Regression)
    * 3.3 [**Gradient Descent Step Size**](#3.3-Gradient-Descent-Step-Size)
    * 3.4 [**Accuracy - How well did we do?**](#3.4-Accuracy---How-well-did-we-do?)
    * 3.5 [**Multiclass Classification (One vs All)**](#3.5-Multiclass-Classification)

## Keywords

`Classification`, `Logistic Regression`, `Sigmoid Function`, `Gradient Descent (GD)`

## Relevant Sections in [Course Book](https://arxiv.org/abs/1805.05052)  

Section 2; Section 3.4


## 1 Introduction

Suppose you are an intern at the (fictive) company `Hunda`, whose brand-new lawn mower robot uses its on-board camera to find out which surface it is currently moving on. Your job is to develop a firmware module which allows the mower robot to classify images generated by the camera into the categories "grass", "soil" and "tiles". To do so, we will use a classification method called logistic regression. We will first apply logistic regression to distinguish between "grass" and "no grass". Then we will use another application of logistic regression to further divide the non-grass images into soil and tiles images. To develop the image classifier, you are provided with a set of snapshots that have been labeled by the previous summer intern, so we can use this labeled data to train the image classifier.

![](./images/banner.jpg)

## 2 Data

The dataset consists of $m=55$ images, stored in the folder named `images`:

* 20 images of grass (stored in the files `image_1.jpg` to `image_20.jpg`)
* 20 images of soil (stored in the files `image_21.jpg` to `image_40.jpg`)
* 15 images of tiles (stored in the files `image_41.jpg` to `image_55.jpg`)

You can use the Python package `PIL` (the Python Imaging Library) to determine typical image characteristics (size, color, etc.). Some basic functions of PIL are demonstrated below; you can use them in the exercise.

In [None]:
# Import most of the required libraries for this exercise

from PIL import Image
import numpy as np
import matplotlib.pyplot as plt

# These imports are for testing purposes only

from unittest.mock  import patch
from plotchecker import ScatterPlotChecker

# Read in an image from a jpg-file and store it in the variable "im"

im = Image.open("images/image_1.jpg")

# Determine the width and height of the image

width, height = im.size
print('width: %d, height: %d' % (width, height))

# Convert the image to RGB bitmap

rgb_im = im.convert('RGB')

# Determine RGB values of the pixel at location (2,3) 

pixel = rgb_im.getpixel((2,3))
print('Pixel (R, G, B): (%d, %d, %d)' % (pixel[0], pixel[1], pixel[2]))

## 3 Exercise

The actual exercise starts here and is divided into 5 parts:

* 3.1 [**Getting Hands on the Data**](#3.1-Getting-Hands-on-the-Data)
* 3.2 [**Logistic Regression**](#3.2-Logistic-Regression)
* 3.3 [**Gradient Descent Step Size**](#3.3-Gradient-Descent-Step-Size)
* 3.4 [**Accuracy - How well did we do?**](#3.4-Accuracy---How-well-did-we-do?)
* 3.5 [**Multiclass Classification**](#3.5-Multiclass-Classification)

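Your task is to complete the code under `### STUDENT TASK ###` in each step, replacing `raise NotImplementedError()` with your own implementation (commented lines containing `...` are hints).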

## 3.1 Getting Hands on the Data

Although the images are not particularly large (around $3000 \times 3000$ pixels), we cannot easily process an image by just stacking its pixels into a vector, since this would result in vectors of length around $3000^2=9\cdot 10^6$. Processing such long vectors is challenging both computationally, as it requires a lot of computation time, and statistically, since the resulting method is likely to overfit the training data (see Section 7 of the course book). Therefore, we will represent each image by only $n=3$ features, given by the average red, green and blue components ("redness", "greenness" and "blueness"), denoted $x_{r}$, $x_{g}$ and $x_{b}$, respectively.

In what follows, we represent the $i$th image in the dataset using the feature vector $\mathbf{x}^{(i)} = \big(x_{r}^{(i)},x_{g}^{(i)},x_{b}^{(i)} \big)^{T} \in \mathbb{R}^{3}$. The redness of the $i$th image is defined as 

\begin{equation*}
x^{(i)}_{r}  = (1/J) \sum_{j=1}^{J} r^{(i)}_{j}
\end{equation*}

where $r^{(i)}_{j}$ denotes the redness (on the scale $0,\ldots,255$) of the $j$th pixel in the $i$th image, and $J$ denotes the total number of pixels in that image. In particular, $x^{(1)}_{r}$ denotes the average red component of the first image in the dataset. The greenness $x_{\rm g}$ and blueness $x_{\rm b}$ are defined similarly.
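As an alternative to looping over the pixels one by one, the averages $x^{(i)}_{r}$, $x^{(i)}_{g}$, $x^{(i)}_{b}$ can be computed with NumPy. The sketch below is one possible approach, not the required solution: it converts a PIL image into an array of shape (height, width, 3) and averages over all $J$ pixels per channel. The helper `rgb_features` and the two synthetic test images are hypothetical, chosen so the snippet runs without the dataset; the second image has a different size to show that $J$ may vary between images.

```python
import numpy as np
from PIL import Image

def rgb_features(img):
    """Return (x_r, x_g, x_b): the average R, G and B values over all J pixels of img."""
    pixels = np.asarray(img.convert('RGB')).astype(float)  # shape (height, width, 3)
    return pixels.reshape(-1, 3).mean(axis=0)              # average over the J = height * width pixels

# Hypothetical 4x2 test image: left half pure green, right half pure red.
img = Image.new('RGB', (4, 2), (0, 255, 0))
for px in range(2, 4):
    for py in range(2):
        img.putpixel((px, py), (255, 0, 0))

# A second image of a different size (J = 9 pixels), filled with one colour.
other = Image.new('RGB', (3, 3), (10, 20, 30))

# Stack one feature vector per image as the rows of a feature matrix, as in (1).
X_demo = np.vstack([rgb_features(img), rgb_features(other)])
print(X_demo)  # rows: [127.5, 127.5, 0.0] and [10.0, 20.0, 30.0]
```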

It will be convenient to stack the feature vectors $\mathbf{x}^{(i)} \in \mathbb{R}^{3}$, for $i=1,\dots,m$, obtained for all images in the dataset into the feature matrix 

<a id='xm'></a>
\begin{equation*}
    \mathbf{X} = \big(\mathbf{x}^{(1)},\dots,\mathbf{x}^{(55)}\big)^T=\begin{bmatrix}
    x^{(1)}_{r}  & x^{(1)}_{g}  & x^{(1)}_{b} \\
    \vdots & \ddots & \vdots\\
    x^{(55)}_{r} & x^{(55)}_{g} & x^{(55)}_{b}
    \end{bmatrix},\ \mathbf{X} \in \mathbb{R}^{m \times n},\ \text{where } m=55, n=3.
    \tag{1}
\end{equation*}

Besides its features $\mathbf{x}^{(i)}$, the $i$th image in our dataset is characterized by the label $y^{(i)}$, which is $y^{(i)}=1$ if the image shows grass and $y^{(i)}=0$ otherwise (i.e., it shows either soil or tiles). It is notationally convenient to collect the labels of all images in our dataset into the label vector

<a id='vy'></a>
\begin{equation*}
    \mathbf{y}=\big(y^{(1)},y^{(2)},\ldots,y^{(m)} \big)^{T} = \begin{bmatrix}
    y^{(1)}\\
    y^{(2)}\\
    \vdots\\
    y^{(m)}
    \end{bmatrix} \in \mathbb{R}^{m}.
    \tag{2}
\end{equation*}
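The file numbering in Section 2 determines the label vector ([2](#vy)) directly. A minimal sketch, assuming the rows of $\mathbf{X}$ follow the file order `image_1.jpg` to `image_55.jpg` (so Python index `i` corresponds to `image_{i+1}.jpg`); the variable names are illustrative.

```python
import numpy as np

m = 55          # total number of images (Section 2)
num_grass = 20  # image_1.jpg to image_20.jpg show grass

# Column vector of shape (m, 1): 1 for grass, 0 for soil or tiles.
y_demo = np.zeros((m, 1))
y_demo[:num_grass] = 1   # Python index i corresponds to image_{i+1}.jpg

print(int(y_demo.sum()), y_demo.shape)  # → 20 (55, 1)
```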

### Student Tasks:

- 3.1.1: [Feature Matrix](#featurefunction)
- 3.1.2: [Label Vector](#labelfunction)
- 3.1.3: [Visualize Data](#visualizedata)

<a id='featurefunction'></a>
    <div class=" alert alert-warning">
<b>Student Task.</b> Feature Matrix. 

- Implement a Python function `feature_matrix()` which returns the feature matrix ([1](#xm)) of size $55 \times 3$.
    - For each image, loop over all pixels of the image to compute the average redness $x_{\rm r}$, greenness $x_{\rm g}$ and blueness $x_{\rm b}$ of the image. Stack the features of all images into the feature matrix ([1](#xm)).
    - Remember to divide each R, G and B sum by the total pixel count $J$ (which might be different for different images) to get the average value for each image.
    - The $i$th image corresponds to the $i$th row (containing $x^{(i)}_{\rm r}$, $x^{(i)}_{\rm g}$ and $x^{(i)}_{\rm b}$) in the feature matrix. 
    - Most of the commands required for this task are in the [2.Data-section](#2-Data).

</div>

In [None]:
def feature_matrix(m = 55):
    """
    Generate a feature matrix representing the images in our dataset.
    
    :param m: scalar-like, type=int, number of images, default m=55. 
    
    :return: array-like, shape=(m, n), feature-matrix with n=3 features. One feature for each color and each image.
    """
    #initialize the feature matrix with zeros. 
    X = np.zeros((m,3))

    ### STUDENT TASK ###
    # YOUR CODE HERE
    raise NotImplementedError()
    return X

In [None]:
test_matrix = feature_matrix(1)
assert test_matrix.shape == (1,3), f'feature_matrix returns wrong matrix for m=1. It should be shape (1,3), but you gave {test_matrix.shape}'
test_matrix = feature_matrix(2)
assert test_matrix.shape == (2,3), f'feature_matrix returns wrong matrix for m=2. It should be shape (2,3), but you gave {test_matrix.shape}'
np.testing.assert_allclose(test_matrix[0], [137, 164, 76],atol=1, err_msg='The features of image_1.jpg are off. This is an approximate check that you compute the per-image average correctly.')
print('All tests passed!')

<a id='labelfunction'></a>
    <div class=" alert alert-warning">
<b>Student Task.</b> Label Vector. 

- Implement a Python function `labels()` which returns the label vector ([2](#vy)).
    - The vector should contain the **label** $y^{(i)}$ for each image. The label is $y^{(i)}=1$ if the $i$th image shows grass, $y^{(i)}=0$ otherwise.
    - Hint: See the [2.Data-section](#2-Data).
</div>

In [None]:
def labels(m=55):
    """
    Generate the label vector, where 1 is a Grass image and 0 is Non-Grass.
    
    :param m: scalar-like, type=int, number of images, default m=55
    
    :return: array-like, shape=(m, 1), label-vector
    """
    y = np.zeros((m,1))
    ### STUDENT TASK ###
    # YOUR CODE HERE
    raise NotImplementedError()
    return y

In [None]:
test_labels = labels()
assert test_labels.shape == (55,1), f'Your label vector is incorrect shape. It should be (55,1), but you gave {test_labels.shape}'
for i in [1,3,6,19]:
    assert test_labels[i] == 1, f'image_{i+1}.jpg should be a grass picture, but you labeled it as non-grass'
for i in [20,25,40,49,54]:
    assert test_labels[i] == 0, f'image_{i+1}.jpg should be a non-grass picture, but you labeled it as grass'


print('All tests passed!')

<a id='visualizedata'></a>
    <div class=" alert alert-warning">
<b>Student Task.</b> Data Visualization. 

Scatter plots can be helpful in order to reveal relations between the features and labels of data points.

- Implement a Python function `Visualize_data(X,y)` which uses as input the feature matrix $\mathbf{X} \in \mathbb{R}^{m \times n}$ (see ([1](#xm))) and the label vector $\mathbf{y}=\big(y^{(1)},\ldots,y^{(m)}\big)^{T}$ (see ([2](#vy))) and generates three **scatter plots**.
    - One scatter plot for each color combination:
        - greenness $x_{\rm g}$ vs. redness $x_{\rm r}$
        - greenness $x_{\rm g}$ vs. blueness $x_{\rm b}$ 
        - redness $x_{\rm r}$ vs. blueness $x_{\rm b}$
    - In these plots, mark each data point with a cross ("x") if it represents a grass image ($y^{(i)}=1$) and with a dot ("$\cdot$") otherwise.
    
- The style of the plots is preconfigured. Your task is to pass the correct input parameters (the coordinates of the data points) to the `scatter()` calls.
 
</div>

In [None]:
def Visualize_data(X,y):
    """
    Fill the blanks to generate correct visual demonstration of features

    :param X: array-like, shape=(m, n), feature matrix where m is the number of images and n the number of features
    :param y: array-like, shape=(m, 1), label-vector
    
    :return: plot-axes, python plot library axes which includes all 3 plots
    """
    indx_1 = np.where(y == 1)[0] # index of each grass picture.
    indx_2 = np.where(y == 0)[0] # index of each non-grass picture.
    
    # Set figure size (width, height)
    fig, axes = plt.subplots(1, 3,figsize=(15, 5))

    '''
    PLOT GREENNESS AGAINST REDNESS
    - Make a scatterplot of the average greenness (x-axis) vs redness (y-axis). 
    - Indicate Grass images by a cross, and others by a dot.
    '''
    ### STUDENT TASK ###
    #axes[0].scatter(...,..., c='g', marker ='x', label='Grass')
    #axes[0].scatter(...,..., c='r', marker ='o', label='Soil+Tiles')
    # YOUR CODE HERE
    raise NotImplementedError()
    axes[0].set_xlabel('Greenness of Images')
    axes[0].set_ylabel('Redness of Images')
    axes[0].legend()
    axes[0].set_title(r'$\bf{Figure\ 1.}$Green vs Red')

    '''
    PLOT GREENNESS AGAINST BLUENESS
    - The same as above but now greenness (x-axis) vs blueness (y-axis).
    '''
    ### STUDENT TASK ###
    #axes[1].scatter(..., ..., c='g', marker ='x', label='Grass')
    #axes[1].scatter(..., ..., c='b', marker ='o', label='Soil+Tiles')
    # YOUR CODE HERE
    raise NotImplementedError()
    axes[1].set_xlabel('Greenness of Images')
    axes[1].set_ylabel('Blueness of Images')
    axes[1].legend()
    axes[1].set_title(r'$\bf{Figure\ 2.}$Green vs Blue')

    '''
    PLOT REDNESS AGAINST BLUENESS
    - The same as above but now redness (x-axis) vs blueness (y-axis).
    '''
    ### STUDENT TASK ###
    #axes[2].scatter(..., ..., c='r', marker ='x', label='Grass')
    #axes[2].scatter(..., ..., c='b', marker ='o', label='Soil+Tiles')
    # YOUR CODE HERE
    raise NotImplementedError()
    axes[2].set_xlabel('Redness of Images')
    axes[2].set_ylabel('Blueness of Images')
    axes[2].legend()
    axes[2].set_title(r'$\bf{Figure\ 3.}$Red vs Blue')
    plt.tight_layout()
    plt.show()
    return axes

In [None]:
y = labels()
X = feature_matrix()

# Full Vector
# Let's label : Grass = 1 , Soil = 0, Tiles = 0
assert X.shape == (55,3), f'Expected feature matrix to be shape (55,3), but it was {X.shape}'
axes = Visualize_data(X,y)
for i in range(len(axes)):
    pc = ScatterPlotChecker(axes[i])
    color1 = pc.colors[0]
    color2 = pc.colors[-1]
    for c in range(len(pc.colors)):
        if c < 20:
            np.testing.assert_array_equal(pc.colors[c],color1,f"In Figure {i+1}. You assigned image_{c+1}.jpg color incorrectly")
        else:
            np.testing.assert_array_equal(pc.colors[c],color2,f"In Figure {i+1}. You assigned image_{c+1}.jpg color incorrectly")

print('All tests passed!')

## 3.2 Logistic Regression
Our goal is to predict the label $y$ of an image, with $y=1$ if the image shows grass and $y=0$ otherwise. This prediction has to be based solely on the image features $\mathbf{x} = (x_{\rm r},x_{\rm g},x_{\rm b})^{T} \in \mathbb{R}^{3}$, i.e., its redness $x_{\rm r}$, greenness $x_{\rm g}$ and blueness $x_{\rm b}$. Similar to linear regression, logistic regression uses a linear function of the form $h^{(\mathbf{w})}(\mathbf{x})= \mathbf{w}^{T} \mathbf{x}$ to predict the label $y$ from the features $\mathbf{x}$.

Given a linear predictor $h^{(\mathbf{w})}(\mathbf{x})$, with some weight vector $\mathbf{w} \in \mathbb{R}^{n}$, we classify an image (with feature vector $\mathbf{x}$) as $\hat{y} = 1$ if $h^{(\mathbf{w})}(\mathbf{x})=\mathbf{x}^{T} \mathbf{w} \geq 0$ and $\hat{y}=0$ otherwise. In order to measure the quality of a particular classifier $h^{(\mathbf{w})}$ we use the **logistic loss** defined as:

\begin{equation*}
    \mathcal{L}\big((\mathbf{x},y),h^{(\mathbf{w})}\big) = -y\log\big(\sigma(h^{(\mathbf{w})}(\mathbf{x}))\big)-(1-y)\log\big(1-\sigma(h^{(\mathbf{w})}(\mathbf{x}))\big)
    \label{loss}
    \tag{3}
\end{equation*}
with the sigmoid function,
<a id='sigmoid'></a>
\begin{equation*}
    \sigma(z)= \frac{1}{1+{\rm exp}(-z)}.
    \label{sigmoid}
    \tag{4}
\end{equation*}

**Note that the expression \eqref{loss} for the logistic loss applies only if the classes are encoded as $y=1$ and $y=0$. If the two classes are encoded as $y=1$ and $y=-1$, we obtain a different formula for the logistic loss.** 
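To make this note concrete: write $z = h^{(\mathbf{w})}(\mathbf{x})$. If the labels $y \in \{0,1\}$ are re-encoded as $\tilde{y} = 2y - 1 \in \{-1, 1\}$, the loss values of (3) are reproduced by the different formula $\log\big(1+\exp(-\tilde{y} z)\big)$. The sketch below checks this numerically on a grid of $z$ values; the function names are illustrative.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def loss_01(y, z):
    """Logistic loss (3) for a label y in {0, 1} and predictor value z = w^T x."""
    return -y * np.log(sigmoid(z)) - (1 - y) * np.log(1 - sigmoid(z))

def loss_pm1(y_pm, z):
    """Logistic loss for a label y_pm in {-1, +1}."""
    return np.log1p(np.exp(-y_pm * z))

z = np.linspace(-4, 4, 9)
for y in (0, 1):
    y_pm = 2 * y - 1   # 0 -> -1, 1 -> +1
    assert np.allclose(loss_01(y, z), loss_pm1(y_pm, z))
print("Both encodings give the same loss values.")
```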

Since we have $m=55$ labeled images, each of them characterized by the features $\mathbf{x}^{(i)}$ and the true label $y^{(i)}$, we can evaluate the logistic loss for all those images to obtain the empirical risk

<a id='er'></a>
\begin{align}
\mathcal{E}(\mathbf{w}) & = (1/m) \sum_{i=1}^{m} \mathcal{L}((\mathbf{x}^{(i)},y^{(i)}),\ h^{(\mathbf{w})}) \nonumber \\ 
&  = (1/m) \sum_{i=1}^{m} -y^{(i)}\log\big(\sigma(\mathbf{w}^{T}\mathbf{x}^{(i)})\big)-(1-y^{(i)})\log\big(1-\sigma(\mathbf{w}^{T}\mathbf{x}^{(i)})\big)
   \label{erm}
    \tag{5}
\end{align}
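Since the values $\sigma(\mathbf{w}^{T}\mathbf{x}^{(i)})$ for all $i$ are just $\sigma$ applied entrywise to $\mathbf{X}\mathbf{w}$, the sum in (5) can be evaluated without a Python loop. A minimal sketch, assuming `X` has shape $(m, n)$ and `y`, `w` are 1-D arrays; the name `risk` is hypothetical, and clipping the probabilities by a small `eps` is an added safeguard against `log(0)`, not part of (5).

```python
import numpy as np

def risk(X, y, w, eps=1e-12):
    """Empirical risk (5): average logistic loss over the m rows of X."""
    p = 1.0 / (1.0 + np.exp(-(X @ w)))   # sigma(w^T x^(i)) for every i, shape (m,)
    p = np.clip(p, eps, 1 - eps)          # guard against log(0) for very confident predictions
    return np.mean(-y * np.log(p) - (1 - y) * np.log(1 - p))

# One data point x = (1, 0) with label 1 and w = (1, 0): risk = -log(sigma(1)) = log(1 + e^-1)
print(round(risk(np.array([[1.0, 0.0]]), np.array([1.0]), np.array([1.0, 0.0])), 4))  # → 0.3133
```

If `y` is stored as an $(m,1)$ column (as `labels()` returns), flatten it first, e.g. with `y.ravel()`: mixing shapes $(m,1)$ and $(m,)$ silently broadcasts to an $(m,m)$ array and gives a wrong average.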

Note that the empirical risk $\mathcal{E}( \mathbf{w})$ is a differentiable convex function of the weight vector $\mathbf{w}$. Therefore, we can use **gradient descent (GD)** to find the weight vector $\mathbf{w}_{\rm opt}$ which minimizes the empirical risk $\mathcal{E}(\mathbf{w})$. In particular, GD constructs a sequence of weight vectors $\mathbf{w}^{(k)}$ by iterating (=repeating) the GD update

<a id='gd'></a>
\begin{equation*}
    \mathbf{w}^{(k+1)} = \mathbf{w}^{(k)} - \alpha\nabla \mathcal{E}(\mathbf{w}^{(k)})
    \label{gd}
    \tag{6}
\end{equation*}

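With an appropriate step size $\alpha$, the iterate $\mathbf{w}^{(k)}$ becomes an increasingly accurate approximation of the optimal weight vector, i.e., $\mathbf{w}^{(k)}$ converges to $\mathbf{w}_{\rm opt}$ (provided such a minimizer exists, which is not guaranteed, e.g., when the two classes are linearly separable),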
\begin{equation*}
    \lim_{k \rightarrow \infty} \mathbf{w}^{(k)} = \mathbf{w}_{\rm opt}
\end{equation*}
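For the empirical risk (5), the gradient has the closed form $\nabla \mathcal{E}(\mathbf{w}) = (1/m)\,\mathbf{X}^{T}\big(\sigma(\mathbf{X}\mathbf{w}) - \mathbf{y}\big)$, with $\sigma$ applied entrywise; it follows from $\sigma'(z) = \sigma(z)\big(1-\sigma(z)\big)$. The sketch below plugs this formula into the update ([6](#gd)), starting from $\mathbf{w}^{(0)} = \mathbf{0}$. It is one possible implementation under stated assumptions, not the reference solution: it uses 1-D arrays for `y` and `w` (the exercise code stores `w` as a $1 \times n$ row vector), and the names `risk_gradient` and `run_gd` are hypothetical.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def risk_gradient(X, y, w):
    """Gradient of the empirical risk (5): (1/m) X^T (sigma(Xw) - y)."""
    m = X.shape[0]
    return X.T @ (sigmoid(X @ w) - y) / m

def run_gd(X, y, alpha, K):
    """Apply the GD update (6) K times, starting from w = 0."""
    w = np.zeros(X.shape[1])
    for _ in range(K):
        w = w - alpha * risk_gradient(X, y, w)
    return w

# Single data point x = (1, 0) with label 1, step size 0.1, 10 iterations
w_10 = run_gd(np.array([[1.0, 0.0]]), np.array([1.0]), alpha=0.1, K=10)
print(w_10)  # first weight ≈ 0.45, second weight stays 0
```

The second weight never moves because the second feature is zero, so the corresponding gradient entry is always zero.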

Assume we run the GD updates for $k$ iterations which results in the weight vector $\mathbf{w}^{(k)}$ and corresponding classifier map $h^{(\mathbf{w}^{(k)})}(\mathbf{x}) = \big(\mathbf{w}^{(k)} \big)^{T} \mathbf{x}$. Using this classifier map, we can compute the predicted label for a new image with features $\mathbf{x}=\big(x_{r},x_{g},x_{b}\big)^{T}\in \mathbb{R}^{3}$ via simple thresholding

<a id='classify'></a>
\begin{equation*} 
    \hat{y} = \begin{cases} 
        1 &\text{if}\ \mathbf{x}^{T} \mathbf{w}^{(k)} \geq 0\\
        0 &\text{if}\ \mathbf{x}^{T} \mathbf{w}^{(k)} < 0.
    \end{cases}
    \label{eq_classify}
    \tag{7}
\end{equation*}
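Since $\sigma$ is strictly increasing with $\sigma(0) = 1/2$, the rule (7) is equivalent to predicting $\hat{y}=1$ whenever $\sigma(\mathbf{x}^{T}\mathbf{w}^{(k)}) \geq 1/2$, i.e., whenever the predicted probability of grass is at least one half. The sketch below checks that both forms of the rule agree; the feature values (scaled to $[0,1]$) and the weight vector are made up for illustration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Made-up features (redness, greenness, blueness) scaled to [0, 1], and a made-up weight vector.
X_demo = np.array([[0.55, 0.64, 0.30],
                   [0.60, 0.45, 0.35],
                   [0.40, 0.70, 0.20]])
w_demo = np.array([-2.0, 3.0, -1.0])

z = X_demo @ w_demo
y_hat_linear = (z >= 0).astype(int)            # rule (7): threshold x^T w at 0
y_hat_prob = (sigmoid(z) >= 0.5).astype(int)   # same rule, thresholding sigma(x^T w) at 1/2
print(y_hat_linear, y_hat_prob)                # → [1 0 1] [1 0 1]
```

In floating point the two rules can disagree for $|z|$ below roughly $10^{-16}$, where $\exp(-z)$ rounds to exactly $1$ and $\sigma(z)$ to exactly $0.5$; for practical purposes they coincide.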

### Student Tasks
- 3.2.1: [Sigmoid Function](#sigmoidfunction)
- 3.2.2: [Empirical Risk](#erfunction)
- 3.2.3: [The Gradient](#gradientfunction)
- 3.2.4: [Gradient Descent](#gradientdescentfunction)
- 3.2.5: [Classification](#predictfunction)

<a id='sigmoidfunction'></a>
    <div class=" alert alert-warning">
<b>Student Task.</b> Sigmoid Function. 

Implement a Python function `sigmoid_func(X,w)` which
- reads in the feature matrix $\mathbf{X} \in \mathbb{R}^{m \times 3}$ and weight vector $\mathbf{w} \in \mathbb{R}^{3}$.
- The function should return a vector of length $m$ whose $i$th entry is the sigmoid function value $\sigma(\mathbf{w}^{T}\mathbf{x}^{(i)})$ (see ([4](#sigmoid))).
    
</div>

In [None]:
def sigmoid_func(X,w):
    """
    Calculate the sigmoid of the linear predictor w^T x for each row x of X
    
    :param X: array-like, shape=(m, n), feature matrix where n is the number of features and m is the number of images
    :param w: array-like, shape=(1, n) or (n,), weight vector with one entry per feature

    :return: array-like, one sigmoid value in the interval [0,1] per image; shape=(m, 1) if w has shape (1, n), shape=(m,) if w has shape (n,)
    """
    ### STUDENT TASK ###
    # sigmoid = ...
    # YOUR CODE HERE
    raise NotImplementedError()
    return sigmoid

In [None]:
"""
These are just simple tests to check that your function outputs
in correct format and gives a correct result with simple input
"""
test_input = np.zeros((10,2))
test_output = np.array([0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5])
sig_res = sigmoid_func(test_input, np.array([1,1]))
assert sig_res.shape == test_output.shape, f'Your sigmoid function outputs an array of incorrect shape. It should be {test_output.shape}, but it was {sig_res.shape}'
np.testing.assert_array_equal(sig_res, test_output)
print('All tests passed!')

<a id='erfunction'></a>
    <div class=" alert alert-warning">
<b>Student Task.</b> Empirical Risk. 


Implement a Python function `empirical_risk(X,y,w)` which
- reads in the feature matrix $\mathbf{X} \in \mathbb{R}^{m \times 3}$, the label vector $\mathbf{y} \in \mathbb{R}^{m}$ and the weight vector $\mathbf{w} \in \mathbb{R}^{3}$ of a classifier $h^{(\mathbf{w})}$.
- The function should return a scalar number which is the empirical risk ([5](#er)). 

</div>

In [None]:
def empirical_risk(X,y,w):
    """
    Calculate the empirical risk of logistic regression with the current weight vector

    :param X: array-like, shape=(m, n), feature matrix where n is the number of features
    :param y: array-like, shape=(m, 1), label-vector
    :param w: array-like, shape=(1, n), weight vector with one entry per feature
    
    :return: scalar-like, type=float, empirical risk (average logistic loss) between the labels and the predictions
    """
    loss = float('inf')
    ### STUDENT TASK ###
    # YOUR CODE HERE
    raise NotImplementedError()
    
    return loss

In [None]:
"""
These are just simple tests to check that your function outputs
in correct format and gives a correct result with simple input
"""
test_input_X =  np.array([[1,0]])
test_input_y = np.array([1])
test_input_w =  np.array([1,0])
test_output = 0.3132
er_res = empirical_risk(test_input_X,test_input_y,test_input_w)
np.testing.assert_almost_equal(er_res, test_output,  decimal=3,err_msg="Your empirical_risk function outputs incorrectly in a test case")
print('All tests passed!')

<a id='gradientfunction'></a>
    <div class=" alert alert-warning">
<b>Student Task.</b> The Gradient.


In order to use gradient descent for finding a good weight vector $\mathbf{w}$, we need to be able to compute the gradient of the function $\mathcal{E}(\mathbf{w})$. 

Implement a Python function `gradient(X,y,w)` which
- reads in the feature matrix $\mathbf{X} \in \mathbb{R}^{m \times n}$, the label vector $\mathbf{y} \in \mathbb{R}^{m}$ and a weight vector $\mathbf{w} \in \mathbb{R}^{n}$.
- The function should return the gradient $\nabla \mathcal{E}(\mathbf{w})$ of the empirical risk ([5](#er)), i.e., the vector of partial derivatives of $\mathcal{E}(\mathbf{w})$ with respect to the entries of $\mathbf{w}$.
    - Deriving the gradient is a somewhat lengthy and non-trivial job. If you are struggling with it, feel free to look it up online. We don't judge, but we expect you to understand the derivation.
</div>
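Whichever way you derive the gradient, a finite-difference check is a cheap way to test it: each partial derivative $\partial \mathcal{E}/\partial w_j$ is approximated by $\big(\mathcal{E}(\mathbf{w} + h\mathbf{e}_j) - \mathcal{E}(\mathbf{w} - h\mathbf{e}_j)\big)/(2h)$ for a small $h$, where $\mathbf{e}_j$ is the $j$th unit vector. The sketch below compares a candidate closed-form gradient against this central-difference approximation on random toy data; the helper names, the data and $h = 10^{-6}$ are assumptions for illustration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def risk(X, y, w):
    """Empirical risk (5) for 1-D arrays y (labels in {0,1}) and w."""
    p = sigmoid(X @ w)
    return np.mean(-y * np.log(p) - (1 - y) * np.log(1 - p))

def candidate_gradient(X, y, w):
    """Closed-form gradient to be checked: (1/m) X^T (sigma(Xw) - y)."""
    return X.T @ (sigmoid(X @ w) - y) / X.shape[0]

def numerical_gradient(f, w, h=1e-6):
    """Central-difference approximation of the gradient of f at w."""
    grad = np.zeros_like(w)
    for j in range(w.size):
        e_j = np.zeros_like(w)
        e_j[j] = h
        grad[j] = (f(w + e_j) - f(w - e_j)) / (2 * h)
    return grad

rng = np.random.default_rng(0)
X_toy = rng.uniform(0, 1, size=(20, 3))              # 20 made-up data points with 3 features
y_toy = (rng.uniform(size=20) > 0.5).astype(float)   # made-up labels in {0, 1}
w_toy = rng.normal(size=3)

g_exact = candidate_gradient(X_toy, y_toy, w_toy)
g_approx = numerical_gradient(lambda w: risk(X_toy, y_toy, w), w_toy)
print(np.max(np.abs(g_exact - g_approx)))  # should be tiny, far below 1e-6
```

A large discrepancy usually points to a missing $1/m$, a sign error, or a transposed matrix in the candidate gradient.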

In [None]:
def gradient(X,y,w):
    """
    Calculate the gradient of the empirical risk with respect to w

    :param X: array-like, shape=(m, n), feature matrix where n is the number of features
    :param y: array-like, shape=(m, 1), label-vector
    :param w: array-like, shape=(1, n), weight vector with one entry per feature
    
    :return: array-like, same shape as w, gradient of the empirical risk
    """
    ### STUDENT TASK ###
    # grad = ...
    # YOUR CODE HERE
    raise NotImplementedError()
    return grad

In [None]:
"""
These are just simple tests to check that your function outputs
in correct format and gives a correct result with simple input
"""
test_input_X = np.ones((10,3))
test_input_y = np.ones(10)
test_input_w = np.zeros(3)
test_output = np.array([-0.5,-0.5,-0.5])
grad_res = gradient(test_input_X, test_input_y, test_input_w)
assert grad_res.shape == test_output.shape == (3,), f'Your Gradient function should output output a result with same shape as a weight vector. Yours was {grad_res.shape}'
np.testing.assert_array_equal(grad_res, test_output,'You probably have incorrect calculations in your gradient function since result was not the same with test inputs')
print('All tests passed!')

<a id='gradientdescentfunction'></a>
    <div class=" alert alert-warning">
<b>Student Task.</b> Gradient Descent. 

We now have all the components needed to build the gradient descent function.
Implement a Python function `gradient_descent(X,y,step_size, K)` which
- reads in the feature matrix $\mathbf{X} \in \mathbb{R}^{m \times n}$, the label vector $\mathbf{y} \in \mathbb{R}^{m}$, the GD step size `step_size` and the number `K` of GD steps.
- The function should return the pair `(er_list, w)`: a vector `er_list` of length $K$ whose $k$th entry is the empirical risk $\mathcal{E}(\mathbf{w}^{(k)})$ achieved by the weight vector $\mathbf{w}^{(k)}$ generated after $k$ GD steps, and the weight vector $\mathbf{w}^{(K)}$ obtained after $K$ GD steps.
</div>

In [None]:
def gradient_descent(X,y,step_size, K=3000):
    """
    Gradient Descent with Logistic Regression
    
    :param X: array-like, shape=(m, n), feature matrix where n is the amount of features
    :param y: array-like, shape=(m, 1), label-vector
    :param step_size: scalar-like, type=float, step size (learning rate) of each gradient descent iteration
    :param K: scalar-like, type=int, how many steps we should take.
              Defaults to 3000. You can change this for personal testing to make gradient descent finish faster
    
    :return er_list: array-like, shape=(K,), vector containing the empirical risk after each step of gradient descent
    :return w: array-like, shape=(1, n), final weight vector after K iterations.
    """
    n = X.shape[1]
    # Initialize w as 1xn array.
    w = np.zeros((1,n))
    er_list = np.zeros(K)
    for i in range(K):
        ### STUDENT TASK ###
        # YOUR CODE HERE
        raise NotImplementedError()
    return er_list, w

In [None]:
"""
These are just simple tests to check that your function outputs
in correct format and gives a correct result with simple input
"""
test_input_X = np.array([[1,0]])
test_input_y = np.array([1])
test_output = np.array([[0.44,0]])
grad_res = gradient_descent(test_input_X,test_input_y, 0.1, 10)

assert len(grad_res[0]) == 10, f'Ten iterations should lead into 10 empirical risk results. You had {len(grad_res[0])}'
assert grad_res[1].shape == test_output.shape == (1,2), f'Weight vector should be shape (1,2), but you got {grad_res[1].shape}'
np.testing.assert_allclose(grad_res[1], test_output,atol=0.05)
assert grad_res[1][0][1] == 0.0, 'In this test case, second value should be exactly zero. You are probably doing something wrong'

print('All tests passed!')

<a id='predictfunction'></a>
    <div class=" alert alert-warning">
<b>Student Task.</b> Classification.

After we have computed (approximately) the optimal weight vector using GD, we apply it to data points to predict their labels (see ([7](#classify))). Your task is to implement a Python function `predict_output(X,w)` which
- takes as input a feature matrix $\mathbf{X}=\big(\mathbf{x}^{(1)},\ldots,\mathbf{x}^{(m)}\big)^{T} \in \mathbb{R}^{m \times n}$, containing feature vectors $\mathbf{x}^{(i)}$ in its rows, and a weight vector $\mathbf{w}$ (which might have been obtained from GD).
- The function should return a vector $\hat{\mathbf{y}}=\big(\hat{y}^{(1)},\ldots,\hat{y}^{(m)}\big)^{T}$ of length $m$ containing the predicted labels $\hat{y}^{(i)}$ according to Eq. ([7](#classify)). Note that the tests below expect `predict_output` to call `sigmoid_func`; since $\sigma(z) \geq 1/2$ exactly when $z \geq 0$, thresholding $\sigma(\mathbf{x}^{T}\mathbf{w})$ at $1/2$ is equivalent to Eq. ([7](#classify)).
</div>

In [None]:
def predict_output(X,w):
    """
    Calculate the prediction with original feature matrix and final weight vector
    
    :param X: array-like, shape=(m, n), feature matrix where n is the amount of features
    :param w: array-like, shape=(1, n), weight vector with one entry per feature
    
    :return: array-like, shape=(m, 1), prediction with given weight vector and feature matrix.
    """
    ### STUDENT TASK ###
    # YOUR CODE HERE
    raise NotImplementedError()
    return y

In [None]:
"""
These are just simple tests to check that your function outputs
in correct format and gives a correct result with simple input.
"""
with patch("__main__.sigmoid_func",side_effect=sigmoid_func) as mock:
    y_predict_test = predict_output(np.array([[-1,1],[0,1],[-1,1],[1,1]]),np.array([[1,0]]))
    assert y_predict_test.shape == (4,1) , f'Your predict should be shape (4,1), but it was {y_predict_test.shape}'
    np.testing.assert_array_equal(y_predict_test, np.array([[0],[1],[0],[1]]))
    assert mock.called, 'You should call sigmoid_func during predict_output execution'

for i in np.linspace(-1,1,100,endpoint=False):
    if i < 0:
        np.testing.assert_array_equal(
            predict_output(np.array([[i,1]]),np.array([[1,0]])), # Input: shape (1,2) feature matrix and shape (1,2) weight vector
            np.array([[0]]), # Result should be 0
            "Incorrect output. Check that you defined and interpreted sigmoid/prediction correctly.")
    else:
        np.testing.assert_array_equal(
            predict_output(np.array([[i,1]]),np.array([[1,0]])), # Input: shape (1,2) feature matrix and shape (1,2) weight vector
            np.array([[1]]), # Result should be 1
            "Incorrect output. Check that you defined and interpreted sigmoid/prediction correctly.")

print('All tests passed! Good job!')

In [None]:
"""
This is the final testing ground where we go through the execution chain with real inputs.
This also checks that you called each function required for gradient descent.
"""
step_size = 1e-5
num_iter = 3000
e_list, w_opt = gradient_descent(X,y,step_size,num_iter)
print('The optimal weight vector is:', w_opt)
y_hat = predict_output(X,w_opt)

assert np.sum(e_list)/num_iter < 40
assert y_hat.shape == (55,1)

# Test that each function is called
with patch("__main__.sigmoid_func",side_effect=sigmoid_func) as mock:
    gradient_descent(np.array([[0,1]]),np.array([0]),step_size,200)
    assert mock.called, 'You should call sigmoid_func function during the execution of gradient_descent()'
with patch("__main__.gradient",side_effect=gradient) as mock:
    gradient_descent(np.array([[0,1]]),np.array([0]),step_size,200)
    assert mock.called, 'You should call gradient function during the execution of gradient_descent()'
with patch("__main__.empirical_risk",side_effect=empirical_risk) as mock:
    gradient_descent(np.array([[0,1]]),np.array([0]),step_size,200)
    assert mock.called, 'You should call empirical_risk function during the execution of gradient_descent()'
print('All tests passed!')

## 3.3 Gradient Descent Step Size

The performance of GD depends crucially on the step size or "learning rate" $\alpha$ (see ([6](#gd))). This task requires you to investigate how different choices for the step size influence the behavior of GD. 
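A one-dimensional example shows why the step size matters. For the quadratic $f(w) = \tfrac{a}{2} w^{2}$ with $a > 0$, the update ([6](#gd)) reads $w^{(k+1)} = (1 - \alpha a)\, w^{(k)}$, so GD converges to the minimizer $w=0$ exactly when $|1 - \alpha a| < 1$, i.e., $0 < \alpha < 2/a$. The empirical risk (5) is not quadratic, so this toy example (with the assumed values $a = 4$, $w^{(0)} = 1$, $K = 20$) only illustrates the qualitative behaviour you may see in the plots; it is not a rule for choosing among the given step sizes.

```python
def gd_on_quadratic(alpha, a=4.0, w0=1.0, K=20):
    """Run K GD steps on the toy function f(w) = a/2 * w^2, whose gradient is a * w."""
    w = w0
    for _ in range(K):
        w = w - alpha * a * w
    return w

# 2/a = 0.5 is the stability limit for a = 4
final = {alpha: gd_on_quadratic(alpha) for alpha in [0.05, 0.25, 0.45, 0.55]}
for alpha, w in final.items():
    print(f"alpha={alpha}: |w^(K)| = {abs(w):.3g}")
```

With $a=4$, the step $\alpha = 0.25 = 1/a$ reaches the minimizer in a single step, $\alpha = 0.05$ and $\alpha = 0.45$ both shrink $|w|$ by a factor $0.8$ per step (the latter while flipping the sign of $w$), and $\alpha = 0.55 > 2/a$ makes the iterates blow up.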
<a id='stepsize'></a>
    <div class=" alert alert-warning">
<b>Student Task.</b> Choosing Step-Size. 

Implement a Python function `visualize_error(X, y)` which
- reads in the feature matrix $\mathbf{X} \in \mathbb{R}^{m \times n}$, the label vector $\mathbf{y} \in \mathbb{R}^{m}$
- The function should run GD using the values `step_sizes=[0.1,0.5,1,5,10,16]` for the step size/learning rate. For each step size, determine the empirical risk $\mathcal{E}(\mathbf{w}^{(k)})$ obtained by the weight vector $\mathbf{w}^{(k)}$ delivered by GD after $k$ iterations. (The test cell below calls `visualize_error` with the features scaled to $[0,1]$, i.e., with `X/255`.)
    - The function should generate two plots, both containing the same curves but using different colours. In particular, the second plot should contain 5 curves coloured blue and 1 curve coloured red.
        - Each curve corresponds to one choice of the step size, resulting in 6 curves. Each curve shows the empirical risk $\mathcal{E}(\mathbf{w}^{(k)})$ as a function of the GD iteration $k$.
    - Determine which of the step size values results in the fastest decrease of the empirical risk and mark the corresponding curve in red (i.e., by changing `best=None` to the right step size; for example, if you think the best step size is 1, set `best=1`). You should then obtain a plot where one line is red and the rest are blue.
</div>

In [None]:
def visualize_error(X, y ):
    """
    Generate 2 plots which visualize the error over each gradient descent step
    
    :param X: array-like, shape=(m, n), feature matrix where n is the amount of features
    :param y: array-like, shape=(m, 1), label-vector
   
    :return axes: plot-axes, python plot library axes which include both plots
    """
    
    # how many GD steps we should take.
    # Defaults to 2000. You can change this for personal testing to make gradient finish faster
    
    num_iter = 2000
    
    #  here we store the best learning rate/step-size
    
    ### STUDENT TASK ###
    # Change best=None into the step size from the list that converges fastest, e.g. best=1
    best = None
    # YOUR CODE HERE
    raise NotImplementedError()
    
    #  different values to be used for the GD step size 
    
    step_sizes=[0.1,0.5,1,5,10,16]
    
    fig, axes = plt.subplots(1, 2,figsize=(12, 4))
    for step in step_sizes:
        ### STUDENT TASK ###
        # Run GD with this step size and record the empirical risk after each iteration
        # loss_list, _ = 
        # YOUR CODE HERE
        raise NotImplementedError()
        n = len(loss_list) # Size of list remains the same.
        x_axes = np.linspace(0,n,n,endpoint=False)
        axes[0].plot(x_axes, loss_list, label=step)
        
        ### STUDENT TASK ###
        # Plot Error against Step Size.
        # Now mark the best converge in red. Use value from best as a correct step size.
        if step == best:
            axes[1].plot(x_axes, loss_list, label=step, color="red")
        else:
            axes[1].plot(x_axes, loss_list, label=step, color="blue")
    axes[0].set_xlabel('Number of Iterations')
    axes[0].set_ylabel('Loss Function')
    axes[0].legend()
    axes[0].set_title(r'$\bf{Figure\ 4.}$Converge of GD')
    axes[1].set_xlabel('Number of Iterations')
    axes[1].set_ylabel('Loss Function')
    axes[1].legend()
    axes[1].set_title(r'$\bf{Figure\ 5.}$Converge of GD')
    plt.tight_layout()
    plt.show()
    return best, axes

In [None]:
res0_1, axes = visualize_error(X/255, y)
from plotchecker import LinePlotChecker

for i in range(len(axes)):
    pc = LinePlotChecker(axes[i])
    pc.assert_num_lines(6)
    
assert res0_1 in [0.1,0.5,1,5,10,16], "best should be the step size from the given list with the fastest convergence"

## 3.4 Accuracy - How well did we do?
In order to assess how well our model works, we calculate the accuracy achieved by the classifier $h^{(\mathbf{w})}$ obtained from task 3.3, i.e., the fraction of correctly labeled images, whose predicted label $\hat{y}^{(i)}$ equals the true label $y^{(i)}$:

\begin{equation*}
    \text{Accuracy} =\dfrac{1}{m} \sum_{i=1}^{m} \mathcal{I}(\hat{y}^{(i)} = y^{(i)})
    \label{acc}
    \tag{8}
\end{equation*}

Here $\mathcal{I}(\hat{y}^{(i)} = y^{(i)})$ denotes the indicator function, which is equal to one if its argument is a true statement, i.e., if $\hat{y}^{(i)} = y^{(i)}$, and equal to zero otherwise.
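Eq. (8) translates almost directly into NumPy: the element-wise comparison `y_true == y_pred` is the indicator, and its mean is the accuracy. The sketch below uses a hypothetical helper `accuracy_percent` and toy arrays. It also shows a shape pitfall worth keeping in mind for this notebook, where label vectors have shape (m, 1) but predictions may have shape (m,): comparing the two directly broadcasts to an (m, m) matrix, so both are flattened first.

```python
import numpy as np

def accuracy_percent(y_true, y_pred):
    # Flatten both arrays so that (m, 1) and (m,) compare element-wise
    # instead of broadcasting to an (m, m) matrix
    y_true = np.asarray(y_true).ravel()
    y_pred = np.asarray(y_pred).ravel()
    # The boolean array is the indicator I(y_hat == y); its mean is Eq. (8), scaled to percent
    return 100.0 * np.mean(y_true == y_pred)

y_true = np.array([[1], [1], [0], [0]])   # column vector, shape (4, 1)
y_pred = np.array([0, 0, 1, 0])           # flat vector, shape (4,)

print(accuracy_percent(y_true, y_pred))   # 25.0
print((y_true == y_pred).shape)           # (4, 4): the broadcasting pitfall
```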

<a id='accuracy'></a>
    <div class=" alert alert-warning">
<b>Student Task.</b> Compute Accuracy. 
 
Implement a Python function `calculate_accuracy(y, y_hat)` according to (Eq. \ref{acc}) which
- takes as inputs a vector $\mathbf{y}=\big(y^{(1)},\ldots,y^{(m)}\big)^{T}$ of true labels and a vector $\mathbf{\hat{y}}=\big(\hat{y}^{(1)},\ldots,\hat{y}^{(m)}\big)^{T}$ of predicted labels, and
- returns the accuracy (Eq. \ref{acc}) as a percentage, i.e., multiplied by 100.
</div>

In [None]:
def calculate_accuracy(y, y_hat):
    """
    Calculate accuracy of your prediction
    
    :param y: array-like, shape=(m, 1), correct label vector
    :param y_hat: array-like, shape=(m, 1), label-vector prediction
    
    :return: scalar-like, accuracy of your prediction in percent
    """
    ### STUDENT TASK ###
    # YOUR CODE HERE
    raise NotImplementedError()
    return accuracy
print ('Accuracy of the result is: %f%%' % calculate_accuracy(y,y_hat))


In [None]:
np.testing.assert_equal(100,calculate_accuracy(np.array([0]),np.array([0])))
np.testing.assert_equal(0,calculate_accuracy(np.array([1]),np.array([0])))
np.testing.assert_equal(50,calculate_accuracy(np.array([1,0]),np.array([0,0])))
np.testing.assert_equal(25,calculate_accuracy(np.array([1,1,0,0]),np.array([0,0,1,0])))

test_acc = calculate_accuracy(y,y_hat)

assert 70 < test_acc < 100, "Your accuracy should be above 70%"
assert 75 < test_acc, "Your accuracy was too low"
assert test_acc < 92, "Your accuracy was too good. You are probably not using correct methods."
print('All tests passed!')

## 3.5 Multiclass Classification

We will now extend logistic regression, which we used for binary classification above, to classify images according to the three categories "grass", "soil" or "tiles". So is all of our previous work on classifying images into "grass" vs. "no grass" for nothing? Nope! Adapting a binary classification method (using two different label values such as $y=1$ and $y=0$) to this multiclass task is straightforward using the **“one vs rest” technique**.

The idea is quite simple: split the multiclass problem into three subproblems, each of which is a binary classification problem as in [`3.2 Logistic Regression`](#3.2-Logistic-Regression). More specifically, we can solve the problem of classifying images into the three classes "grass", "soil" or "tiles" by instead solving three subproblems (which are binary classification problems!):

1. subproblem: classify images into "grass" $(y=1)$ vs. "no grass" $(y=0)$
2. subproblem: classify images into "soil" $(y=1)$ vs. "no soil" $(y=0)$ 
3. subproblem: classify images into "tiles" $(y=1)$ vs. "no tiles" $(y=0)$

We can reuse the work done in [`3.2 Logistic Regression`](#3.2-Logistic-Regression) to solve each of these subproblems. The first subproblem has already been solved in [Sec. 3.2](#3.2-Logistic-Regression). For the other two subproblems, we only need to modify the label vector ([2](#vy)).

As an example, the first image (which shows grass) has the label $y^{(1)}=1$ in subproblem 1 but a different label $y^{(1)}=0$ in subproblems 2 and 3, since it is neither soil nor tile. For each subproblem we get a different optimal weight vector ($\mathbf{w}^{(\rm grass)}$, $\mathbf{w}^{(\rm soil)}$ or $\mathbf{w}^{(\rm tiles)}$) by solving the empirical risk minimization problem (5) using GD. 
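Modifying the label vector for each subproblem amounts to comparing a vector of class indices with $k$. The sketch below assumes a hypothetical vector `y_class_toy` of class indices (0 = grass, 1 = soil, 2 = tiles); in the notebook itself the labels follow from the order of the pictures, as described in the Data section. Just as in the example above, the first (grass) picture gets label 1 in subproblem 1 and label 0 in subproblems 2 and 3.

```python
import numpy as np

# Hypothetical class indices: 0 = grass, 1 = soil, 2 = tiles
y_class_toy = np.array([0, 0, 1, 2, 2])

# One binary label vector of shape (m, 1) per subproblem k:
# 1 where the picture belongs to class k, 0 otherwise
binary_labels = [(y_class_toy == k).astype(float).reshape(-1, 1) for k in range(3)]

for k, y_k in enumerate(binary_labels):
    print(k, y_k.ravel())
# 0 [1. 1. 0. 0. 0.]
# 1 [0. 0. 1. 0. 0.]
# 2 [0. 0. 0. 1. 1.]
```

Every picture has label 1 in exactly one subproblem, which is what makes the three binary problems together equivalent to the original three-class problem.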

### Example

Assume we want to classify a new image. We compute its feature vector $\mathbf{x}=(x_{r},x_{g},x_{b})^{T}$ and apply our predictor three times, once with each weight vector, which yields the following prediction values:

1. subproblem: $h^{(\mathbf{w}^{(\rm grass)})}(\mathbf{x}) = 0.1$ ("grass vs. no grass")
2. subproblem: $h^{(\mathbf{w}^{(\rm soil)})}(\mathbf{x}) = 0.4$ ("soil vs. no soil") 
3. subproblem: $h^{(\mathbf{w}^{(\rm tiles)})}(\mathbf{x}) = 0.8$ ("tiles vs. no tiles")

From these results, we can see that the predictor $h^{(\mathbf{w}^{(\rm tiles)})}(\mathbf{x})$ for subproblem 3 (`tiles` vs. `no tiles`) yields the highest confidence. Hence, we classify this image as `tiles`.
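The decision rule of this example is simply an argmax over the three prediction values. A minimal sketch using the numbers above (the list `classes` is an illustrative name):

```python
import numpy as np

classes = ['grass', 'soil', 'tiles']
# Prediction values of the three subproblems for the new image (from the example above)
scores = np.array([0.1, 0.4, 0.8])

# Pick the class whose binary predictor is most confident
predicted = classes[int(np.argmax(scores))]
print(predicted)  # tiles
```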

<img src="./images/MulticlassHunda.jpg" style="width: 600px"/>

### Student Tasks
- 3.5.1: [Multiclass Labels](#sublabels)
- 3.5.2: [Multiclass Gradient](#multgrad)
- 3.5.3: [Multiclass Predict](#multpredict)
- 3.5.4: [Multiclass Accuracy](#multacc)
- 3.5.5: [Confusion Matrix](#multvis)

<a id='sublabels'></a>
    <div class=" alert alert-warning">
<b>Student Task.</b> Multiclass Labels. 

The label vector depends on the subproblem.
    
- Implement a Python function `sub_labels()` which takes the number of data points $m$ and the subproblem index $k$ as inputs:
 - $k=0$ means subproblem 1, where we try to predict which pictures show grass
     - i.e. grass pictures are labeled `1`, whereas all soil and tiles pictures are labeled `0`
 - $k=1$ means subproblem 2, where we try to predict which pictures show soil
     - i.e. soil pictures are labeled `1`, whereas all grass and tiles pictures are labeled `0`
 - $k=2$ means subproblem 3, where we try to predict which pictures show tiles
     - i.e. tiles pictures are labeled `1`, whereas all grass and soil pictures are labeled `0`
 - The function should return the label vector $\mathbf{y}$ of the subproblem indicated by $k$.
 
 **NOTE:** You can use the order of the pictures to define the label vector. See [2. Data-section](#2-Data) for reference.
</div>

In [None]:
def sub_labels(m=55,k=0):
    """
    Generate label vector for subproblem k
    
    :param m: scalar-like, number of pictures
    :param k: scalar-like, subproblem number indication
    
    :return: array-like, shape=(m,1)
    """
    y = np.zeros((m,1))
    ### STUDENT TASK ###
    ## Generate the label vector which has value 1 for the pictures of the class of the current subproblem (indicated by k)
    ## and 0 for the pictures of the other two classes.
    # YOUR CODE HERE
    raise NotImplementedError()
    return y

In [None]:
test_sub_labels = sub_labels(m=55,k=0)
assert test_sub_labels.shape == (55,1), f'Your label vector should be shape (55,1), but it was {test_sub_labels.shape}'
assert test_sub_labels[0] == 1, f'In subproblem 1, grass pictures should be 1, you have assigned picture in index 0 to {test_sub_labels[0]}'
assert test_sub_labels[54] == 0, f'In subproblem 1, tile pictures should be 0, you have assigned picture in index 54 to {test_sub_labels[54]}'
test_sub_labels = sub_labels(m=55,k=1)
assert test_sub_labels[0] == 0, f'In subproblem 2, grass pictures should be 0, you have assigned picture in index 0 to {test_sub_labels[0]}'
assert test_sub_labels[54] == 0, f'In subproblem 2, tile pictures should be 0, you have assigned picture in index 54 to {test_sub_labels[54]}'
test_sub_labels = sub_labels(m=55,k=2)
assert test_sub_labels[0] == 0, f'In subproblem 3, grass pictures should be 0, you have assigned picture in index 0 to {test_sub_labels[0]}'
assert test_sub_labels[54] == 1, f'In subproblem 3, tile pictures should be 1, you have assigned picture in index 54 to {test_sub_labels[54]}'

test_sub_labels = sub_labels(25,2)
assert test_sub_labels.shape == (25,1), f'Incorrect label shape, it should be (25,1), but got {test_sub_labels.shape}'
test_sub_labels = sub_labels(20,2)
assert test_sub_labels.shape == (20,1), f'Incorrect label shape, it should be (20,1), but got {test_sub_labels.shape}'

print('All tests passed!')

<a id='multgrad'></a>
    <div class=" alert alert-warning">
<b>Student Task.</b> Multiclass Learning.
    
Implement a Python function `multiclass_gradient_descent(X, step_size = 1e-5, steps = 3000)` which
- reads in the feature matrix $\mathbf{X}$ (see ([1](#xm))), the step size (`step_size`, default 1e-5) and the number of GD iterations (`steps`, default 3000).
- implements GD for each of the three subproblems and outputs the matrix $\mathbf{W} = \big( \mathbf{w}^{(\rm grass)},\mathbf{w}^{(\rm soil)},\mathbf{w}^{(\rm tiles)}\big)$, whose columns are the weight vectors, one for each subproblem.
    - Use the functions `gradient_descent()` and `sigmoid_func()` from [section 3.2](#3.2-Logistic-Regression) to obtain the optimal weight vector of each subproblem.
    
**NOTE:** You should be able to complete this part with a couple of lines of code.
</div>
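The overall structure of one-vs-rest training is a loop over the subproblems that stores one weight vector per column. The sketch below is self-contained and therefore uses a hypothetical GD helper `logistic_gd` on the average logistic loss instead of the notebook's `gradient_descent()`, whose exact signature is defined in section 3.2; the tiny dataset `X_toy` (bias column plus two features, one picture per class) is likewise made up for illustration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def logistic_gd(X, y, step_size, steps):
    # Hypothetical helper: plain GD on the average logistic loss, labels y in {0, 1}, shape (m,)
    w = np.zeros(X.shape[1])
    for _ in range(steps):
        w -= step_size * X.T @ (sigmoid(X @ w) - y) / X.shape[0]
    return w

def one_vs_rest_weights(X, y_class, step_size=1.0, steps=500):
    # Column k of W holds the weight vector of subproblem k ("class k" vs. "not class k")
    W = np.zeros((X.shape[1], 3))
    for k in range(3):
        y_k = (y_class == k).astype(float)   # binary labels for subproblem k
        W[:, k] = logistic_gd(X, y_k, step_size, steps)
    return W

# Hypothetical toy data: bias column plus two features
X_toy = np.array([[1.0, 1.0, 0.0],
                  [1.0, 0.0, 1.0],
                  [1.0, -1.0, -1.0]])
y_class_toy = np.array([0, 1, 2])

W_toy = one_vs_rest_weights(X_toy, y_class_toy)
y_hat_toy = np.argmax(sigmoid(X_toy @ W_toy), axis=1)
print(W_toy.shape, y_hat_toy)  # (3, 3) [0 1 2]
```

The three subproblems are completely independent of each other, so they could also be trained in any order or in parallel; only the label vector changes from one iteration of the loop to the next.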

In [None]:
def multiclass_gradient_descent(X, step_size = 1e-5, steps = 3000):
    """
    :param X: array-like, shape=(m,n), feature matrix with m pictures and n features
    :param step_size: scalar-like, the step size of gradient descent
    :param steps: scalar-like, number of gradient descent iterations

    :return: array-like, shape=(n,3), 3 weight vectors (columns), one for each subproblem, where n is the number of features
    """
    sub_weights = np.zeros((X.shape[1],3))
    for i in range(0,3):
        ### STUDENT TASK ###
        # YOUR CODE HERE
        raise NotImplementedError()
    return sub_weights

In [None]:
test_mult_X = np.array([[1,0]])
test_mult_grad = multiclass_gradient_descent(test_mult_X)
assert test_mult_grad.shape == (2,3), f'A feature matrix with 2 features and 3 subproblems should result in a weight matrix with 3 column vectors of length 2, i.e. shape (2,3), but you gave {test_mult_grad.shape}'
print('All tests passed!')

<a id='multpredict'></a>
    <div class=" alert alert-warning">
<b>Student Task.</b> Multiclass Classification. 
  
Now that we have a matrix of weight vectors, it's time to make predictions!

Implement the Python function `multiclass_predict()` where you:
1. Iterate over each weight vector ($\mathbf{w}^{(\rm grass)}$, $\mathbf{w}^{(\rm soil)}$ and $\mathbf{w}^{(\rm tiles)}$)
2. Calculate a prediction using the feature matrix and the chosen weight vector
3. Store this prediction in the result matrix
4. After all predictions have been calculated, choose for each picture the class with the highest confidence
    * Hint: if you store the results in order in the result matrix, $\hat{y}$ can be generated with a single command: `np.argmax`
5. Return the prediction $\hat{y}$
</div>
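The steps above can be illustrated on made-up numbers. In this sketch, `sigmoid` stands in for the notebook's `sigmoid_func()`, and `X_toy` and `W_toy` are hypothetical toy arrays with 2 features; column $c$ of the result matrix holds the confidences of subproblem $c$, and `np.argmax(..., axis=1)` picks the most confident class in every row.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

X_toy = np.array([[1.0, 0.0],
                  [0.0, 1.0],
                  [1.0, 1.0]])          # 3 pictures, 2 features
W_toy = np.array([[2.0, 1.0, -2.0],
                  [-2.0, 1.0, 2.0]])    # one weight vector (column) per subproblem

results_toy = np.zeros((X_toy.shape[0], 3))
for c in range(3):
    # Confidence of subproblem c for every picture
    results_toy[:, c] = sigmoid(X_toy @ W_toy[:, c])

# Row-wise argmax gives one class index per picture, shape (m,)
y_hat_toy = np.argmax(results_toy, axis=1)
print(y_hat_toy)  # [0 2 1]
```

Because the sigmoid is monotonically increasing, the argmax of the sigmoid outputs equals the argmax of the raw scores `X_toy @ W_toy`; applying the sigmoid mainly makes the entries interpretable as confidences.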

In [None]:
def multiclass_predict(X, weight_vectors):
    """
    Calculate a prediction for each subproblem and
    choose which class (0=grass,1=soil or 2=tiles) each picture fits the best.
    i.e. which one of the 3 predictions (column) has the highest confidence?
    
    :param X: array-like, shape=(m, n), feature matrix where n is the number of features and m the number of pictures
    :param weight_vectors: array-like, shape=(n, 3), weight vectors for data with n features, one weight vector per subproblem
    
    :return: array-like, shape=(m,), final prediction (class index of each picture)
    """
    results = np.zeros((X.shape[0],3))
    for c in range(0,3):
        ### STUDENT TASK ###
        # YOUR CODE HERE
        raise NotImplementedError()
    return y_hat

In [None]:
test_predict_X = np.array([[1,0]])
test_predict_w = np.array([[1,0,0],[1,0,0]])
test_mul_predict = multiclass_predict(test_predict_X, test_predict_w)
assert test_mul_predict.shape == (1,),f'Prediction with single row in feature matrix should result only one result, but you gave {test_mul_predict.shape}'
assert test_mul_predict == 0, f'Result should be 0, but you gave {test_mul_predict}'
print('All tests passed!')

<a id='multacc'></a>
    <div class=" alert alert-warning">
<b>Student Task.</b> Multiclass Accuracy.

Let's use the function that we defined [previously](#3.4-Accuracy---How-well-did-we-do?) to compute the overall accuracy of the multiclass prediction.
- Use the function `calculate_accuracy` of section 3.4 to calculate the accuracy.
- We only need to define the correct labels `y_mult` based on [2. Data-section](#2-Data)
    - `y_mult` should be a label vector of shape (55,1) whose entries are the class numbers 0, 1 or 2 (representing grass, soil and tiles, respectively)
        - i.e. this label vector is **not** an output of `sub_labels()`
</div>

In [None]:
def multiclass_accuracy(y_hat):
    """
    Calculate the accuracy of the multiclass prediction
    :param y_hat: array-like, shape=(m,), predicted class indices
    
    :return: scalar-like, accuracy of your prediction in percent
    """
    ### STUDENT TASK ###
    #y_mult = 
    # YOUR CODE HERE
    raise NotImplementedError()
    
    return calculate_accuracy(y_mult, y_hat)

In [None]:
w_opts = multiclass_gradient_descent(X)
m_y_hat = multiclass_predict(X, w_opts)
m_acc = multiclass_accuracy(m_y_hat)

print(f'Weight Vectors:{w_opts}\nPrediction: {m_y_hat}')
assert m_y_hat.shape == (55,), f"Incorrect shapes, {m_y_hat.shape} instead of (55,)"
assert m_acc > 75, f"Your accuracy should be over 75%. Yours was {m_acc}"
assert m_acc < 87, "Your accuracy was too good. You are probably using incorrect methods."

from unittest.mock import patch
with patch("__main__.gradient_descent",side_effect=gradient_descent) as mock:
    multiclass_gradient_descent(X)
    assert mock.called, "Remember to reuse functions that you have already defined."
with patch("__main__.sigmoid_func",side_effect=sigmoid_func) as mock:
    multiclass_gradient_descent(X)
    assert mock.called, "Remember to reuse functions that you have already defined."
with patch("__main__.calculate_accuracy",side_effect=calculate_accuracy) as mock:
    multiclass_accuracy(m_y_hat)
    assert mock.called, "Remember to reuse functions that you have already defined."
    
print('All tests passed!')

<a id='multvis'></a>
    <div class=" alert alert-warning">
<b>Demo.</b> Confusion Matrix. 

Computing the accuracy, as the fraction of correctly classified images, is one way to check how well you did. However, in some cases the accuracy can be misleading, particularly for applications where the different classes occur with significantly different probabilities ("imbalanced data"). A more fine-grained assessment of a classification method is provided by the confusion matrix. 

- Assuming that you have completed the previous steps, executing the cell below displays the confusion matrix.
</div>
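A confusion matrix is just a table of counts: entry $(i, j)$ is the number of pictures whose true class is $i$ and whose predicted class is $j$ (the same convention as scikit-learn's `confusion_matrix`). The sketch below builds one by hand for hypothetical labels, normalizes each row as in the right-hand plot, and recovers the accuracy from the diagonal. `confusion_counts` is an illustrative helper, not part of the notebook.

```python
import numpy as np

def confusion_counts(y_true, y_pred, num_classes=3):
    # Entry (i, j) counts pictures with true class i and predicted class j
    cm = np.zeros((num_classes, num_classes), dtype=int)
    np.add.at(cm, (np.ravel(y_true), np.ravel(y_pred)), 1)
    return cm

# Hypothetical labels: 0 = grass, 1 = soil, 2 = tiles
y_true = np.array([0, 0, 1, 1, 2, 2])
y_pred = np.array([0, 1, 1, 1, 2, 0])

cm = confusion_counts(y_true, y_pred)
print(cm)
# [[1 1 0]
#  [0 2 0]
#  [1 0 1]]

# Row normalization: each row shows how the pictures of one true class were classified
# (a class with no pictures would give a row sum of 0 and therefore a division by zero)
cm_norm = cm / cm.sum(axis=1, keepdims=True)

# Correct predictions lie on the diagonal, so accuracy = trace / total
acc_toy = 100.0 * np.trace(cm) / cm.sum()
```

The accuracy here is about 66.7%, but the rows show more than that single number: soil is always recognized, while half of the grass and half of the tiles pictures are misclassified.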


In [None]:
import itertools
from sklearn.metrics import confusion_matrix
def visualize_cm(cm):
    """
    Function visualizes a confusion matrix with and without normalization
    """
    fig, axes = plt.subplots(1, 2,figsize=(12, 4))

    im1 = axes[0].imshow(cm, interpolation='nearest', cmap=plt.cm.Blues)
    fig.colorbar(im1, ax=axes[0])
    classes = ['grass','soil','tiles']
    tick_marks = np.arange(len(classes))
    axes[0].set_xticks(tick_marks)
    axes[0].set_xticklabels(classes,rotation=45)
    axes[0].set_yticks(tick_marks)
    axes[0].set_yticklabels(classes)

    thresh = cm.max() / 2.
    for i, j in itertools.product(range(cm.shape[0]), range(cm.shape[1])):
        axes[0].text(j, i, format(cm[i, j], 'd'),
                 horizontalalignment="center",
                 color="white" if cm[i, j] > thresh else "black")

    axes[0].set_xlabel('Predicted label')
    axes[0].set_ylabel('True label')
    axes[0].set_title(r'$\bf{Figure\ 6.}$Without normalization')
    
    cm = cm.astype('float') / cm.sum(axis=1)[:, np.newaxis]
    im2 = axes[1].imshow(cm, interpolation='nearest', cmap=plt.cm.Blues)
    fig.colorbar(im2, ax=axes[1])
    
    axes[1].set_xticks(tick_marks)
    axes[1].set_xticklabels(classes,rotation=45)
    axes[1].set_yticks(tick_marks)
    axes[1].set_yticklabels(classes)

    thresh = cm.max() / 2.
    for i, j in itertools.product(range(cm.shape[0]), range(cm.shape[1])):
        axes[1].text(j, i, format(cm[i, j], '.2f'),
                 horizontalalignment="center",
                 color="white" if cm[i, j] > thresh else "black")

    axes[1].set_xlabel('Predicted label')
    axes[1].set_ylabel('True label')
    axes[1].set_title(r'$\bf{Figure\ 7.}$Normalized')
    plt.tight_layout()
    plt.show()

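# Compute the confusion matrix.
# `y` holds the binary grass/no-grass labels, so the multiclass labels
# (0 = grass, 1 = soil, 2 = tiles) are rebuilt from the one-vs-rest label vectors of sub_labels()
y_true_mult = np.argmax(np.hstack([sub_labels(X.shape[0], k) for k in range(3)]), axis=1)
cnf_matrix = confusion_matrix(y_true_mult, m_y_hat, labels=[0, 1, 2])
np.set_printoptions(precision=2)

# display the confusion matrix using an intensity plot

visualize_cm(cnf_matrix)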
