<a href="https://colab.research.google.com/github/asia281/dnn2022/blob/main/Asia_of_Bootcamp_ML%2C_Lab_4_softmax_regression_student_version.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

<center><img src='https://drive.google.com/uc?id=1_utx_ZGclmCwNttSe40kYA6VHzNocdET' height="60"></center>

AI TECH - Akademia Innowacyjnych Zastosowań Technologii Cyfrowych. Programu Operacyjnego Polska Cyfrowa na lata 2014-2020
<hr>

<center><img src='https://drive.google.com/uc?id=1BXZ0u3562N_MqCLcekI-Ens77Kk4LpPm'></center>

<center>
Projekt współfinansowany ze środków Unii Europejskiej w ramach Europejskiego Funduszu Rozwoju Regionalnego
Program Operacyjny Polska Cyfrowa na lata 2014-2020,
Oś Priorytetowa nr 3 "Cyfrowe kompetencje społeczeństwa" Działanie  nr 3.2 "Innowacyjne rozwiązania na rzecz aktywizacji cyfrowej"
Tytuł projektu:  „Akademia Innowacyjnych Zastosowań Technologii Cyfrowych (AI Tech)”
    </center>

Let's start with importing the MNIST dataset.

In [1]:
!wget -O mnist.npz https://s3.amazonaws.com/img-datasets/mnist.npz
#!pip install plotly==5.3.1

--2023-10-24 07:52:56--  https://s3.amazonaws.com/img-datasets/mnist.npz
Resolving s3.amazonaws.com (s3.amazonaws.com)... 16.182.74.168, 52.217.234.168, 52.216.57.88, ...
Connecting to s3.amazonaws.com (s3.amazonaws.com)|16.182.74.168|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 11490434 (11M) [application/octet-stream]
Saving to: ‘mnist.npz’


2023-10-24 07:52:57 (18.0 MB/s) - ‘mnist.npz’ saved [11490434/11490434]



In [2]:
import numpy as np

def load_mnist(path='mnist.npz'):
    with np.load(path) as f:
        x_train, _y_train = f['x_train'], f['y_train']
        x_test, _y_test = f['x_test'], f['y_test']

    x_train = x_train.reshape(-1, 28 * 28) / 255.
    x_test = x_test.reshape(-1, 28 * 28) / 255.

    y_train = np.zeros((_y_train.shape[0], 10))
    y_train[np.arange(_y_train.shape[0]), _y_train] = 1

    y_test = np.zeros((_y_test.shape[0], 10))
    y_test[np.arange(_y_test.shape[0]), _y_test] = 1

    return (x_train, y_train), (x_test, y_test)

(x_train, y_train), (x_test, y_test) = load_mnist()

Let's take a look at the data. In the "x" arrays you'll find the images (encoded as pixel intensities) and in the "y" ones you'll find the labels (one-hot encoded).

In [3]:
print(x_train.shape)
print(y_train.shape)

print(x_train[:10])
print(y_train[:10])

(60000, 784)
(60000, 10)
[[0. 0. 0. ... 0. 0. 0.]
 [0. 0. 0. ... 0. 0. 0.]
 [0. 0. 0. ... 0. 0. 0.]
 ...
 [0. 0. 0. ... 0. 0. 0.]
 [0. 0. 0. ... 0. 0. 0.]
 [0. 0. 0. ... 0. 0. 0.]]
[[0. 0. 0. 0. 0. 1. 0. 0. 0. 0.]
 [1. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 1. 0. 0. 0. 0. 0.]
 [0. 1. 0. 0. 0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0. 0. 0. 0. 1.]
 [0. 0. 1. 0. 0. 0. 0. 0. 0. 0.]
 [0. 1. 0. 0. 0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 1. 0. 0. 0. 0. 0. 0.]
 [0. 1. 0. 0. 0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 1. 0. 0. 0. 0. 0.]]


Now let us see the data in a more human way.

In [4]:
from plotly.subplots import make_subplots
import plotly.graph_objects as go
import plotly.express as px

num_samples = 10
plots = make_subplots(rows=1, cols=num_samples)

for i in range(num_samples):
  a = x_train[i, :].reshape(28,28)
  img = go.Heatmap(z=a, colorscale='gray')
  plots.add_trace(img, row=1, col=i+1)

plots.update_yaxes(autorange='reversed', scaleanchor='x', constrain='domain')
plots.update_xaxes(constrain='domain')
plots.update_traces(showscale=False)
plots.show()


Next, we prepare $X$ and $y$ variables

In [5]:
X = x_train[:4000]
y = y_train[:4000]

print(X.shape)
print(y.shape)

(4000, 784)
(4000, 10)


To train the model we will (obviously) use gradient descent. Inside the loop we need a method to compute the gradients. Let's start with implementing it, together with some helper functions.

# Softmax regression

In this exercise you will train a softmax regression model to recognize handwritten digits.
  
The general setup is as follows:
* we are given a set of pairs $(x, y)$, where $x \in R^D$ is a vector of real numbers representing the features, and $y \in \{1,...,c\}$ is the target (in our case we have ten classes, so $c=10$),
* for a given $x$ we model the probability of $y=j$ by $$h(x)_j=p_j = \frac{e^{w_j^Tx}}{\sum_{i=1}^c e^{w_i^Tx}},$$
* to find the right $w$ we will optimize the so called multiclass log loss:
$$L(y,p) = \log{p_y},$$
$$J(w) = -\frac{1}{n}\sum_{i=1}^n L(y_i,h(x)),$$
* with the loss function in hand we can improve our guesses iteratively:
    * $w_{ij}^{t+1} = w_{ij}^t - \text{step_size} \cdot \frac{\partial J(w)}{\partial w_{ij}}$,
* we can end the process after some predefined number of epochs (or when the changes are no longer meaningful).

In [10]:
# We will store the weights in a D x c matrix, where D is the number of features, and c is the number of classes
#weights = (...) # TODO: Fill in, be sure to have the right shape!
weights = np.random.random((X.shape[1], 10)) # D, c
print(weights)

def softmax(z):
    ########################################
    # TODO: implement the softmax function #
    ########################################
    sum = np.sum([np.exp(x) for x in z])
    return np.array([x / sum for x in z])


def predict(weights, X):
    ###################################
    # TODO: compute the probabilities #
    ###################################
    return softmax(X @ weights)

def l2_reg_term(weights, l2_term):
    return 0.5 * l2_reg * np.sum(weights ** 2)

def softmax_loss_vectorized(W, X, y):
    # Initialize the loss and gradient to zero.
    num_train = X.shape[0]
    num_classes = W.shape[1]

    # Step 1: compute score vector for each class
    scores = predict(W, X)

    #Step 3: obtain the correct class score
    # rang =
    # print(rang, y)
    classes = np.argmax(y, axis=1)
    correct_score = scores[list(range(num_train)), classes]
    #compute the sum of exp of all scores for all classes
    scores_sums = np.sum(scores, axis=1)

    #Step 4: compute softmax function
    softmax_loss = correct_score / scores_sums
    #compute cross-entropy function
    cross_entropy_loss = - np.log(softmax_loss)
    #compute loss function
    loss = np.sum(cross_entropy_loss) / num_train
    return loss


def compute_loss_and_gradients(weights, X, y, l2_reg):
    #############################################################################
    # TODO: compute loss and gradients, don't forget to include regularization! #
    #############################################################################
    y_pred = predict(weights, X)

    loss = softmax_loss_vectorized(weights, X, y) + l2_reg_term(weights, l2_reg)

    grad = np.dot(X.T, (y_pred - softmax(y))) / len(y) + l2_reg * weights

    return loss, grad

[[0.49928902 0.00705234 0.20296262 ... 0.12824213 0.32992646 0.02815963]
 [0.03083664 0.33462827 0.30373588 ... 0.92930963 0.18926109 0.73918306]
 [0.24255505 0.95424217 0.57750871 ... 0.48082187 0.53295404 0.91831151]
 ...
 [0.10128397 0.82884916 0.19819068 ... 0.30718117 0.58825529 0.31063171]
 [0.84636433 0.03577151 0.36065607 ... 0.43716416 0.3222104  0.21471666]
 [0.26361234 0.82685009 0.87885723 ... 0.9450609  0.68673258 0.63556966]]


We are now in position to complete the training pipeline.

If you have problems with convergence, be sure to check the gradients numerically.

In [11]:
l2_reg = 0.5
n_epochs = 2000
lr = 0.1
t = 0.99

losses = []
for i in range(n_epochs):
    loss, grad = compute_loss_and_gradients(weights, X, y, l2_reg)
    losses.append(loss)

    weights -= lr * grad
    lr *= t
print(losses)
fig = px.line(x=range(1,n_epochs+1), y=losses)
layout = go.Layout(xaxis_title="Epoch", yaxis_title='Loss')
fig.update_layout(layout)

fig.show()

[655.976456152561, 592.24320618272, 535.2845370811949, 484.3268753022263, 438.6905941651034, 397.77796817857137, 361.06274872470505, 328.0811330415186, 298.4239318416599, 271.7297692264962, 247.67917259932716, 225.9894307172031, 206.41011540869619, 188.71917729576094, 172.71953848717436, 158.2361159918414, 145.11321881286867, 133.2122695641651, 122.40980819998518, 112.59574123293864, 103.67180478020363, 95.55021404181926, 88.15247548141112, 81.4083411354369, 75.2548871958604, 69.63570135585687, 64.50016543216587, 59.802821526580026, 55.50281150163011, 51.563380854963945, 47.951439211558124, 44.637170636994, 41.593687829330996, 38.79672498941563, 36.224364815036445, 33.856795626282675, 31.67609511905218, 29.666037670574564, 27.811922493414034, 26.10042025988519, 24.51943610335826, 23.05798715190502, 21.706092967771937, 20.454677457265934, 19.295480983267534, 18.220981559747322, 17.224324136959634, 16.299257099681146, 15.44007520091439, 14.641568241593529, 13.89897488449606, 13.207941059

Now compute your accuracy on the training and test sets.

In [26]:
acc = np.mean(predict(weights, x_train).argmax(axis=1) == y_train.argmax(axis=1))
print("Train accuracy: ", acc)
acc = np.mean(predict(weights, x_test).argmax(axis=1) == y_test.argmax(axis=1))
print("Test accuracy: ", acc)

Train accuracy:  0.7200166666666666
Test accuracy:  0.7211


We can also visualize the weights learned by our algorithm. Try to anticipate the result before executing the cell below.

In [20]:
num_samples = 10
plots = make_subplots(rows=1, cols=num_samples)

for i in range(num_samples):
  a = weights[:, i].reshape(28,28)
  img = go.Heatmap(z=a, colorscale='gray')
  plots.add_trace(img, row=1, col=i+1)

plots.update_yaxes(autorange='reversed', scaleanchor='x', constrain='domain')
plots.update_xaxes(constrain='domain')
plots.update_traces(showscale=False)
plots.show()

Note that we only used a small portion of the data to develop the model. Now, implement the training on full data. Also, validate your model properly and find a good value for `l2_reg` hyperparameter. Try to experiment with `batch_size`.


In [25]:
################################################
# TODO: implement the proper training pipeline #
################################################
l2_reg = 0.5
n_epochs = 5000
lr = 0.1
t = 0.99

losses = []
for i in range(n_epochs):
    loss, grad = compute_loss_and_gradients(weights, X, y, l2_reg)
    losses.append(loss)

    weights -= lr * grad
    lr *= t

fig = px.line(x=range(1,n_epochs+1), y=losses)
layout = go.Layout(xaxis_title="Epoch", yaxis_title='Loss')
fig.update_layout(layout)

fig.show()

<center><img src='https://drive.google.com/uc?id=1BXZ0u3562N_MqCLcekI-Ens77Kk4LpPm'></center>