

# PSYCH239 Lecture 2: Sigmoid Units and the Delta Learning Rule

In [0]:
import numpy as np

First we create the data, which is linearly separable. We use a trick that folds the bias into the weight vector: every sample gets an extra input that is always 1, and the weight on that input plays the role of the bias.
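The bias trick can be checked numerically. The sketch below uses made-up inputs, weights, and bias (`X_raw`, `w`, `b` are hypothetical names, not part of the notebook) and shows that prepending a column of ones gives the same pre-activation as adding the bias explicitly.

```python
import numpy as np

# Hypothetical raw inputs (two features per sample), weights, and bias
X_raw = np.array([[0.5, -1.0],
                  [2.0,  0.25]])
w = np.array([0.3, -0.7])
b = 0.1

# Explicit bias: a = X_raw @ w + b
a_explicit = X_raw @ w + b

# Bias folded in: prepend a column of ones and put b first in the weight vector
X_aug = np.hstack([np.ones((X_raw.shape[0], 1)), X_raw])
w_aug = np.concatenate([[b], w])
a_folded = X_aug @ w_aug

print(np.allclose(a_explicit, a_folded))
```

Both forms give the same values, which is why `X` in this notebook carries a leading column of ones.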

In [0]:
X = np.array([[ 1, 0.04187914, -0.16537444],
        [ 1, 0.13746733,  0.60397647],
        [ 1, 0.68489807,  0.78038369],
        [ 1, 0.64652619, -0.05249682],
        [1, -0.80712692,  0.68521631]])
Y = np.array([-1,  1,  1, -1,  1])

**Transform into one-hot form**

In [0]:
Y1h = np.zeros([Y.shape[0],2])
Y1h[Y==1 ,1] = 1
Y1h[Y==-1,0] = 1

In [0]:
Y1h

array([[1., 0.],
       [0., 1.],
       [0., 1.],
       [1., 0.],
       [0., 1.]])
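As an alternative sketch, the same one-hot matrix can be built by indexing an identity matrix. The names `class_index` and `Y1h_alt` are introduced here for illustration only.

```python
import numpy as np

Y = np.array([-1, 1, 1, -1, 1])

# Map the labels to column indices: -1 -> 0, +1 -> 1
class_index = (Y == 1).astype(int)

# Row i of the identity matrix is the one-hot vector for class i
Y1h_alt = np.eye(2)[class_index]
print(Y1h_alt)
```

This should reproduce the array shown above, with class -1 in column 0 and class +1 in column 1.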

Our network has 3 inputs (the constant bias input plus two features) and 2 outputs. We have two outputs because there are two different classes, one output unit per class.

In [0]:
N_out = 2
N_in = len(X[0])
W = np.random.rand(N_out, N_in)

**Modify activation function**. $\sigma(a) = \frac{1}{1+\exp(-a)}$

In [0]:
def sigmoid(a):
    return 1./(1+np.exp(-a))

hatY = sigmoid(X@W.T)  # "@" is the matrix-product operator (same result as np.dot for 2-D arrays)

In [0]:
hatY.shape

(5, 2)
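A few basic properties of the sigmoid can be verified directly. This is a minimal sketch on an arbitrary grid of inputs chosen for illustration; it redefines `sigmoid` so that it runs on its own.

```python
import numpy as np

def sigmoid(a):
    return 1. / (1 + np.exp(-a))

a = np.linspace(-5, 5, 11)   # arbitrary test inputs
s = sigmoid(a)

print(sigmoid(0.0))                      # midpoint of the curve
print(np.all((s > 0) & (s < 1)))         # outputs stay strictly between 0 and 1
print(np.allclose(sigmoid(-a), 1 - s))   # symmetry: sigma(-a) = 1 - sigma(a)
```

Because the outputs lie between 0 and 1, they can be compared directly with the one-hot targets.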

**Change the cost function to MSE**

In [0]:
def mse(y,t):
    return np.mean((y-t)**2)

In [0]:
loss = mse(hatY,Y1h)
loss

0.27593992913892396
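Note that `np.mean` averages over every entry of the array, so with 5 samples and 2 outputs the sum of squared errors is divided by 10. A small sketch with made-up predictions and targets (`y_demo`, `t_demo` are hypothetical) makes the arithmetic visible.

```python
import numpy as np

def mse(y, t):
    return np.mean((y - t) ** 2)

# Made-up predictions and one-hot targets for two samples
y_demo = np.array([[1.0, 0.0],
                   [0.5, 0.5]])
t_demo = np.array([[1.0, 0.0],
                   [0.0, 1.0]])

# Squared errors are 0, 0, 0.25, 0.25, so the mean over 4 entries is 0.125
print(mse(y_demo, t_demo))
```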

Select one data sample at random. **Here we also select samples that are already correctly classified.**

In [0]:
sample_index = np.random.randint(0,len(X))
x = X[sample_index]
y1h = Y1h[sample_index]

Let's make an update. We will use a learning rate of 0.1. ***Modify this rule to match the delta rule! You will need $\sigma'(a) = \sigma(a)(1-\sigma(a))$.***
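To see where the rule comes from, here is a short derivation sketch. It assumes the per-sample cost $E = \tfrac{1}{2}\sum_k (\hat{y}_k - t_k)^2$, which differs from the `mse` above at most by a constant factor. The symbols $t_k$ (target), $a_k$ (pre-activation), $\delta_k$ (error term) and $\eta$ (learning rate) are notation introduced here.

$$a_k = \sum_j W_{kj}\, x_j, \qquad \hat{y}_k = \sigma(a_k)$$

$$\frac{\partial E}{\partial W_{kj}} = (\hat{y}_k - t_k)\,\sigma'(a_k)\,x_j = \underbrace{(\hat{y}_k - t_k)\,\hat{y}_k\,(1-\hat{y}_k)}_{\delta_k}\; x_j$$

$$\Delta W_{kj} = -\eta\,\delta_k\,x_j$$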

In [0]:
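# Compute DW with the delta rule.
# Batch form: the per-sample updates are summed over all rows of X.
A = X@W.T
hatY = sigmoid(A)
# Error times the sigmoid derivative, sigma'(a) = sigma(a)(1 - sigma(a))
delta = (hatY-Y1h)*hatY*(1-hatY)
DW = delta.T@X
DW.shape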

(2, 3)
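The cell above sums the update over the whole batch. For comparison, here is a minimal sketch of the same rule applied to a single sample, as the text describes. The starting weights are hypothetical values fixed for repeatability, and `lr`, `y_hat`, and `W_new` are names introduced here.

```python
import numpy as np

def sigmoid(a):
    return 1. / (1 + np.exp(-a))

# One sample (bias input first) and its one-hot target, taken from the data above
x = np.array([1, 0.13746733, 0.60397647])
y1h = np.array([0., 1.])

# Hypothetical starting weights, fixed here so the example is repeatable
W = np.array([[0.2, 0.5, 0.9],
              [0.7, 0.1, 0.4]])
lr = 0.1  # learning rate

y_hat = sigmoid(W @ x)
delta = (y_hat - y1h) * y_hat * (1 - y_hat)  # error times sigmoid derivative
DW = np.outer(delta, x)                      # one row of updates per output unit
W_new = W - lr * DW

err_before = np.sum((y_hat - y1h) ** 2)
err_after = np.sum((sigmoid(W_new @ x) - y1h) ** 2)
print(DW.shape, err_after < err_before)
```

With a small learning rate, the squared error on the selected sample should go down after the step.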

In [0]:
W = W - 0.1*DW  # gradient step with the learning rate of 0.1 stated above

**Modify the following function accordingly: return the value of the cost instead of the indices of the misclassified samples.**

In [0]:
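def train(X,Y1h,W):
    # Forward pass with the current weights (not the stale global hatY)
    A = X@W.T
    hatY = sigmoid(A)
    # Delta rule: error times sigmoid derivative, summed over all samples
    delta = (hatY-Y1h)*hatY*(1-hatY)
    DW = delta.T@X
    W -= DW  # in-place update with step size 1, so the caller's W changes too
    return mse(sigmoid(X@W.T),Y1h)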
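One way to gain confidence in an update like this is a finite-difference gradient check. The sketch below assumes the cost is half the sum of squared errors (the cost whose gradient the delta rule computes) and uses randomly generated data rather than the notebook's; `half_sse`, `delta_rule_grad`, and `num_grad` are hypothetical helper names.

```python
import numpy as np

def sigmoid(a):
    return 1. / (1 + np.exp(-a))

def half_sse(W, X, T):
    # Half the sum of squared errors over all samples and outputs
    return 0.5 * np.sum((sigmoid(X @ W.T) - T) ** 2)

def delta_rule_grad(W, X, T):
    # Analytic gradient: (error * sigmoid derivative).T @ inputs
    Yhat = sigmoid(X @ W.T)
    delta = (Yhat - T) * Yhat * (1 - Yhat)
    return delta.T @ X

# Random test problem: 5 samples, bias column plus 2 features, 2 classes
rng = np.random.default_rng(0)
X = np.hstack([np.ones((5, 1)), rng.uniform(-1, 1, size=(5, 2))])
T = np.eye(2)[rng.integers(0, 2, size=5)]
W = rng.random((2, 3))

# Central differences, one weight at a time
eps = 1e-6
num_grad = np.zeros_like(W)
for i in range(W.shape[0]):
    for j in range(W.shape[1]):
        Wp = W.copy()
        Wp[i, j] += eps
        Wm = W.copy()
        Wm[i, j] -= eps
        num_grad[i, j] = (half_sse(Wp, X, T) - half_sse(Wm, X, T)) / (2 * eps)

print(np.allclose(num_grad, delta_rule_grad(W, X, T), atol=1e-6))
```

If the analytic and numerical gradients agree, the expression for `DW` matches the gradient of the assumed cost.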



In [0]:
for n in range(100):
    cost = train(X,Y1h,W)
    if (n%20)==0: print(cost)

0.28005197746446475
0.05796398040761652
0.014071330653444708
0.0056403996665860155
0.002956164466124707


In [0]:
A = X@W.T
hatY1h = sigmoid(A)
print(hatY1h)
print(Y1h)

[[0.94278134 0.03689423]
 [0.03621967 0.92922971]
 [0.02060054 0.94315678]
 [0.94400294 0.02540901]
 [0.00448166 0.99450043]]
[[1. 0.]
 [0. 1.]
 [0. 1.]
 [1. 0.]
 [0. 1.]]
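To turn the sigmoid outputs back into class labels, one option is to pick the output unit with the largest activation. The sketch below copies the printed outputs above into an array; `pred_index`, `pred_label`, and `accuracy` are names introduced here.

```python
import numpy as np

# Network outputs copied from the printout above
hatY1h = np.array([[0.94278134, 0.03689423],
                   [0.03621967, 0.92922971],
                   [0.02060054, 0.94315678],
                   [0.94400294, 0.02540901],
                   [0.00448166, 0.99450043]])
Y = np.array([-1, 1, 1, -1, 1])

# Column 0 stands for class -1 and column 1 for class +1
pred_index = np.argmax(hatY1h, axis=1)
pred_label = np.where(pred_index == 1, 1, -1)

accuracy = np.mean(pred_label == Y)
print(pred_label, accuracy)
```

For these outputs every predicted label matches `Y`, so all five training samples are classified correctly.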
