## Importing dependencies

In [57]:
import numpy as np
from linearclassifier import *

## 1) Loss Functions and Output activations: classification


**When doing classification, it's natural to think of the output values as being discrete: +1 and -1. But it is generally difficult to use optimization-based methods without somehow thinking of the outputs as being continuous (even though you will have to discretize when it's time to make a prediction).**
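To make the last point concrete, here is a minimal sketch of a linear model whose output stays continuous during optimization and is thresholded only when a prediction is needed. The parameters `th` and `th0` and the two function names are hypothetical, and `cv` is a local stand-in for the notebook's helper (assumed to build a column vector).

```python
import numpy as np

def cv(values):
    # Local stand-in for the notebook's cv helper (assumed: list -> column vector).
    return np.array([values], dtype=float).T

def linear_output(x, th, th0):
    # Continuous output: this is the quantity a loss function is applied to.
    return (th.T @ x + th0).item()

def predict_label(x, th, th0):
    # Discretize to +1 / -1 only at prediction time.
    return 1 if linear_output(x, th, th0) > 0 else -1

th, th0 = cv([1.0, -1.0]), 0.5          # hypothetical parameters
x = cv([2.0, 1.0])
a = linear_output(x, th, th0)           # 2 - 1 + 0.5 = 1.5
label = predict_label(x, th, th0)       # positive output, so the label is +1
```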

### 1.1) Hinge loss, linear activation

**A first way to make a classifier's output continuous is to let it take any real value.**

#### ``Gradient of hinge loss function with respect to th``

In [58]:
def hinge_loss_grad(x, y, a):
    # Hinge loss is max(0, 1 - y*a), so the gradient vanishes only once the margin y*a reaches 1
    return np.where(y*a >= 1, 0, -y*x)

#### `Test cases`

In [59]:
hinge_loss_grad(cv([1,-2]), -1, 1.1).tolist()

[[1], [-2]]

In [60]:
hinge_loss_grad(cv([1,-2]), 1, 1.1).tolist()

[[0], [0]]

In [61]:
hinge_loss_grad(cv([-2, 1]), 1, -1.1).tolist()

[[2], [-1]]
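A gradient like this can be sanity-checked against finite differences. The sketch below assumes the standard hinge loss with margin 1, `max(0, 1 - y*a)`, and a linear activation `a = th.T @ x` with no offset. The helper `numeric_grad`, the parameter vector `th`, and the local `cv` stand-in are illustrative and not part of the notebook.

```python
import numpy as np

def cv(values):
    # Local stand-in for the notebook's cv helper (assumed: list -> column vector).
    return np.array([values], dtype=float).T

def hinge_loss(th, x, y):
    # Standard hinge loss with margin 1, linear activation, no offset.
    a = (th.T @ x).item()
    return max(0.0, 1.0 - y * a)

def hinge_loss_grad(x, y, a):
    # Margin-1 form of the (sub)gradient with respect to th.
    return np.where(y * a >= 1, 0.0, -y * x)

def numeric_grad(f, th, eps=1e-6):
    # Central finite differences, one coordinate at a time.
    g = np.zeros_like(th)
    for i in range(th.shape[0]):
        step = np.zeros_like(th)
        step[i, 0] = eps
        g[i, 0] = (f(th + step) - f(th - step)) / (2 * eps)
    return g

th = cv([0.2, -0.1])                 # hypothetical parameter vector
x, y = cv([1, -2]), 1
a = (th.T @ x).item()                # 0.2 + 0.2 = 0.4, inside the margin
analytic = hinge_loss_grad(x, y, a)
numeric = numeric_grad(lambda t: hinge_loss(t, x, y), th)
agree = np.allclose(analytic, numeric, atol=1e-5)
```

The point is chosen so that `0 < y*a < 1`: the prediction has the right sign but is still inside the margin, so the loss is positive and the gradient should be `-y*x` rather than zero.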

### 1.2) Log loss, sigmoidal activation

**Another way to make a classifier's output continuous is to constrain it to the range (0, 1), which lets us interpret it as the predicted `probability` that the example is positive.**

`In this model, we will consider positive points to have label +1, and negative points to have label 0`

#### `Gradient of negative log likelihood (NLL) loss function with respect to th`

In [62]:
def nll_grad(x, y, a):
    return (a - y) * x
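The compact form `(a - y) * x` comes from combining the sigmoid activation with the NLL loss. As a rough check, the sketch below writes both out explicitly and compares the formula with a finite-difference estimate. It assumes `a = sigmoid(th.T @ x)` with no offset and labels in {0, 1}. The names `sigmoid`, `nll`, `th`, and the local `cv` stand-in are illustrative rather than taken from the notebook.

```python
import numpy as np

def cv(values):
    # Local stand-in for the notebook's cv helper (assumed: list -> column vector).
    return np.array([values], dtype=float).T

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def nll(th, x, y):
    # Negative log likelihood for a label y in {0, 1}.
    a = sigmoid((th.T @ x).item())
    return -(y * np.log(a) + (1 - y) * np.log(1 - a))

def nll_grad(x, y, a):
    return (a - y) * x

th = cv([0.5, -0.25])                # hypothetical parameter vector
x, y = cv([1, -2]), 1
a = sigmoid((th.T @ x).item())       # sigmoid(1.0)
analytic = nll_grad(x, y, a)

# Central finite differences on the loss, one coordinate at a time.
eps = 1e-6
numeric = np.zeros_like(th)
for i in range(th.shape[0]):
    step = np.zeros_like(th)
    step[i, 0] = eps
    numeric[i, 0] = (nll(th + step, x, y) - nll(th - step, x, y)) / (2 * eps)

agree = np.allclose(analytic, numeric, atol=1e-6)
```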

## 2) Multiclass classification

##### We use softmax as the activation function for multiclass classification. It is not a typical activation module, since it takes in all n_l pre-activation values and returns n_l values in (0, 1) that sum to 1. These can be interpreted as a probability distribution over the possible categories.

#### `Computing softmax of a vector output `

In [63]:
def softmax(z):
    den = np.sum(np.exp(z))
    return np.exp(z)/den
    

#### `Test cases`

In [64]:
softmax(cv([-1, 0, 1]))

array([[0.09003057],
       [0.24472847],
       [0.66524096]])
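For large pre-activation values, `np.exp` can overflow. A common variant, sketched here as an optional alternative rather than as the notebook's implementation, subtracts the maximum before exponentiating, which leaves the result mathematically unchanged. The name `softmax_stable` is hypothetical.

```python
import numpy as np

def softmax_stable(z):
    # Subtracting the column-wise max does not change the softmax value
    # but keeps np.exp from overflowing for large inputs.
    shifted = z - np.max(z, axis=0, keepdims=True)
    e = np.exp(shifted)
    return e / np.sum(e, axis=0, keepdims=True)

z = np.array([[-1.0], [0.0], [1.0]])
p = softmax_stable(z)                                   # same input as the test case above
big = softmax_stable(np.array([[1000.0], [1001.0]]))    # stays finite
```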

#### `Gradient of multiclass nll function with respect to W_L`

In [65]:
def mnll_grad(x, y, w):
    z = w.T @ x
    a = softmax(z)
    return x @ (a - y).T


#### `Data set`

In [66]:
X = cv([1, 1])
Y = cv([0, 1, 0])
W = np.array([[1, -1, -2], [-1, 2, 1]])

#### `Test case`

In [67]:
mnll_grad(X, Y, W)

array([[ 0.24472847, -0.33475904,  0.09003057],
       [ 0.24472847, -0.33475904,  0.09003057]])
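One way to gain confidence in this gradient is to compare it with a finite-difference estimate of the multiclass NLL loss. The sketch below restates `softmax`, `mnll_grad`, and the loss `mnll` so that it runs on its own, and builds the data with plain `np.array` calls instead of `cv`.

```python
import numpy as np

def softmax(z):
    return np.exp(z) / np.sum(np.exp(z))

def mnll(x, y, w):
    # Multiclass NLL for a one-hot label y.
    return -np.sum(y * np.log(softmax(w.T @ x)))

def mnll_grad(x, y, w):
    return x @ (softmax(w.T @ x) - y).T

X = np.array([[1.0], [1.0]])
Y = np.array([[0.0], [1.0], [0.0]])
W = np.array([[1.0, -1.0, -2.0], [-1.0, 2.0, 1.0]])

# Perturb one weight at a time and measure the change in loss.
eps = 1e-6
numeric = np.zeros_like(W)
for i in range(W.shape[0]):
    for j in range(W.shape[1]):
        step = np.zeros_like(W)
        step[i, j] = eps
        numeric[i, j] = (mnll(X, Y, W + step) - mnll(X, Y, W - step)) / (2 * eps)

agree = np.allclose(numeric, mnll_grad(X, Y, W), atol=1e-6)
```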

#### `Predicted probability that x is in a class`

In [68]:
def predited_proba(x, w):
    z = w.T @ x
    return softmax(z)

In [69]:
predited_proba(X, W)

array([[0.24472847],
       [0.66524096],
       [0.09003057]])

##### So the predicted probability that x is in class 1 (the second entry, counting classes from 0) is about 0.665
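To turn the probability vector into a single predicted class, a typical choice is to take the index of the largest entry. A minimal sketch, assuming classes are numbered from 0 and reusing the probabilities printed above:

```python
import numpy as np

# Probabilities copied from the output above.
probs = np.array([[0.24472847], [0.66524096], [0.09003057]])

# Index of the largest probability along the class axis.
predicted_class = int(np.argmax(probs, axis=0)[0])
```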

#### `MNLL Loss function`

In [70]:
def mnll(x, y, w):
    z = w.T @ x
    a = softmax(z)
    return -np.sum(y*np.log(a))

In [71]:
mnll(X, Y, W)

0.4076059644443803

#### `SGD update for W_L`

In [72]:
def sgd_update_wl(x, y, w, eta=0.5):
    grad_wl = mnll_grad(x, y, w)
    return w - eta*grad_wl

In [73]:
sgd_update_wl(X, Y, W).tolist()

[[0.8776357644726012, -0.8326204778874109, -2.04501528658519],
 [-1.1223642355273988, 2.167379522112589, 0.9549847134148097]]

#### `New predicted probability that x is in class 1 `

In [74]:
W_upd = sgd_update_wl(X, Y, W)
predited_proba(X, W_upd)

array([[0.15918761],
       [0.77245284],
       [0.06835955]])
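Repeating the update should keep lowering the loss on this single example. The sketch below restates the functions so that it runs on its own and records the loss before each of five steps. The step count is an arbitrary choice for illustration.

```python
import numpy as np

def softmax(z):
    return np.exp(z) / np.sum(np.exp(z))

def mnll(x, y, w):
    return -np.sum(y * np.log(softmax(w.T @ x)))

def mnll_grad(x, y, w):
    return x @ (softmax(w.T @ x) - y).T

def sgd_update_wl(x, y, w, eta=0.5):
    return w - eta * mnll_grad(x, y, w)

X = np.array([[1.0], [1.0]])
Y = np.array([[0.0], [1.0], [0.0]])
W = np.array([[1.0, -1.0, -2.0], [-1.0, 2.0, 1.0]])

w = W.copy()
losses = []
for _ in range(5):                   # arbitrary number of steps
    losses.append(mnll(X, Y, w))     # loss before this update
    w = sgd_update_wl(X, Y, w)

p_class1 = softmax(w.T @ X)[1, 0]    # probability of class 1 after the updates
```

The first recorded loss matches the value computed by `mnll(X, Y, W)` above, and each later entry is smaller.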

## 3) Neural Networks

##### In this problem, we will analyze a simple neural network to understand its classification properties. We will consider a neural network with the `ReLU` activation function on all hidden neurons and a `softmax` activation for the output layer.

### `Data sets`

In [75]:
W1 = np.array([[1, 0, -1, 0], [0, 1, 0, -1]])
W0_1 = cv([-1, -1, -1, -1])
W2 = np.array([[1, 1, 1, 1], [-1, -1, -1, -1]]).T
W0_2 = cv([0, 2])
X = cv([3, 14])

### `3.1) Output`

In [76]:
def ReLU(x, w, w0):
    z = w.T @ x + w0 
    return np.where(z>= 0, z, 0)

#### `Layer 1 (4 neurons) outputs`

In [77]:
ReLU(X, W1, W0_1)

array([[ 2],
       [13],
       [ 0],
       [ 0]])

#### `Layer 2 (2 neurons) outputs`

In [78]:
a1 = ReLU(X, W1, W0_1)
z2 = W2.T @ a1 + W0_2
a2 = softmax(z2)

In [79]:
a1

array([[ 2],
       [13],
       [ 0],
       [ 0]])

In [81]:
a2.tolist()

[[0.9999999999993086], [6.914400106935422e-13]]
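The two layers can also be wrapped into a single forward pass. The sketch below is one possible arrangement, not the notebook's own code: `relu` and `forward` are hypothetical names, the affine step is separated from the activation, and the weights are rebuilt with plain `np.array` calls.

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0)

def softmax(z):
    return np.exp(z) / np.sum(np.exp(z))

def forward(x, W1, W0_1, W2, W0_2):
    # Hidden layer: affine map followed by ReLU.
    a1 = relu(W1.T @ x + W0_1)
    # Output layer: affine map followed by softmax.
    z2 = W2.T @ a1 + W0_2
    return a1, softmax(z2)

W1 = np.array([[1, 0, -1, 0], [0, 1, 0, -1]])
W0_1 = np.array([[-1], [-1], [-1], [-1]])
W2 = np.array([[1, 1, 1, 1], [-1, -1, -1, -1]]).T
W0_2 = np.array([[0], [2]])
x = np.array([[3], [14]])

a1, a2 = forward(x, W1, W0_1, W2, W0_2)
```

Here the output pre-activations are 15 and -13, which is why the first class receives almost all of the probability mass.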

#### `Hidden outputs for 3 data input points`

In [82]:
X = np.array([[0.5, 0, -3], [0.5, 2, 0.5]])
a1 = ReLU(X, W1, W0_1)
a1.tolist()

[[0.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 2.0], [0.0, 0.0, 0.0]]
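When several points are stacked as columns, the output layer needs a softmax that normalizes each column separately. A softmax that sums over every entry of the array would mix the points together. The sketch below assumes a column-wise variant, with the hypothetical name `softmax_columns`, and carries the three points through both layers.

```python
import numpy as np

def softmax_columns(z):
    # Normalize each column (one data point per column) independently.
    e = np.exp(z - np.max(z, axis=0, keepdims=True))
    return e / np.sum(e, axis=0, keepdims=True)

W1 = np.array([[1, 0, -1, 0], [0, 1, 0, -1]])
W0_1 = np.array([[-1], [-1], [-1], [-1]])
W2 = np.array([[1, 1, 1, 1], [-1, -1, -1, -1]]).T
W0_2 = np.array([[0], [2]])
X = np.array([[0.5, 0, -3], [0.5, 2, 0.5]])

a1 = np.maximum(W1.T @ X + W0_1, 0)   # hidden outputs, shape (4, 3)
z2 = W2.T @ a1 + W0_2                 # output pre-activations, shape (2, 3)
a2 = softmax_columns(z2)              # one probability vector per column
```

For the second point the two pre-activations are equal, so the network assigns probability 0.5 to each class.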

In [83]:
1 + a1

array([[1., 1., 1.],
       [1., 2., 1.],
       [1., 1., 3.],
       [1., 1., 1.]])

In [84]:
X

array([[ 0.5,  0. , -3. ],
       [ 0.5,  2. ,  0.5]])