# Implementing Stochastic Gradient Descent on LTU

A Linear Threshold Unit (LTU) is a simple artificial neuron whose output is the thresholded weighted sum of its inputs. What does that mean?<br><br>
Let x1, x2, x3 be the inputs to the LTU.<br>
For some arbitrary weights (w1, w2, w3), the weighted sum of these inputs is <strong>w1x1 + w2x2 + w3x3</strong>.
If <strong>T</strong> is the threshold of the LTU, then<br><br>
$$
h(x) =
\begin{cases}
+1, & \text{if } w_1 x_1 + w_2 x_2 + w_3 x_3 \geq T \\
-1, & \text{otherwise}
\end{cases}
$$
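
The definition above can be written as a few lines of Python. This is only a minimal sketch: the function name `ltu` and the example weights and threshold are illustrative assumptions, not values taken from the rest of this notebook.

```python
import numpy as np

def ltu(x, w, T):
    """Return +1 if the weighted sum w . x reaches the threshold T, else -1."""
    return 1 if np.dot(w, x) >= T else -1

# Illustrative weights and threshold (assumed values for this sketch)
w = np.array([1.0, 1.0, 1.0])
T = 2.0

print(ltu(np.array([1, 1, 0]), w, T))  # weighted sum is 2.0, which reaches T
print(ltu(np.array([1, 0, 0]), w, T))  # weighted sum is 1.0, which is below T
```

The first call returns +1 and the second returns -1, since only the first input reaches the threshold.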

If a set of points can be separated by a linear decision boundary (a line in two dimensions, or a hyperplane in higher dimensions), then they are said to be <strong>linearly separable</strong>. For example, the 0 and 1 outputs of the AND, OR, and NAND gates can each be separated by a line. On the other hand, XOR is not linearly separable.<br>

LTUs can be used to represent such linearly separable classes.
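
To make the separability claim concrete, the sketch below checks hand-picked weights and thresholds for two-input AND, OR, and NAND, and then searches a coarse grid for any LTU that reproduces XOR. The helper name `ltu01`, the chosen weights, and the grid are assumptions made for illustration. The grid search is not a proof, it only shows that no candidate on that grid works.

```python
import itertools
import numpy as np

def ltu01(x, w, T):
    """LTU with 0/1 outputs: 1 if w . x >= T, else 0."""
    return int(np.dot(w, x) >= T)

# All two-bit inputs in the order (0,0), (0,1), (1,0), (1,1)
inputs = list(itertools.product([0, 1], repeat=2))

# Hand-picked (weights, threshold) pairs; other choices work as well
gates = {
    "AND":  (np.array([1.0, 1.0]), 2.0),
    "OR":   (np.array([1.0, 1.0]), 1.0),
    "NAND": (np.array([-1.0, -1.0]), -1.0),
}
truth = {
    "AND":  [0, 0, 0, 1],
    "OR":   [0, 1, 1, 1],
    "NAND": [1, 1, 1, 0],
}

results = {}
for name, (w, T) in gates.items():
    outputs = [ltu01(np.array(x), w, T) for x in inputs]
    results[name] = (outputs == truth[name])
    print(name, "represented by an LTU:", results[name])

# XOR: try every (w1, w2, T) on a coarse grid and look for a match
xor = [0, 1, 1, 0]
grid = np.arange(-2.0, 2.25, 0.25)
found = any(
    [ltu01(np.array(x), np.array([w1, w2]), T) for x in inputs] == xor
    for w1 in grid for w2 in grid for T in grid
)
print("XOR represented by some LTU on this grid:", found)
```

The reason no setting works for XOR: the input (0,0) must give 0, so T > 0. The inputs (0,1) and (1,0) must give 1, so w1 >= T and w2 >= T. Then w1 + w2 >= 2T > T, which forces the input (1,1) to give 1 as well.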

### Training to learn the weights for AND Gate

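Consider four input features (x1, x2, x3, x4). The AND gate is represented as x1 ∧ x2 ∧ x3 ∧ x4 <=> y, i.e., y is true iff all the inputs are true.<br>
Gradient descent can be used to try to learn the weights of an LTU that represents this function. The code below hard-codes the threshold as T=3 and encodes the two output classes as 1 and 0 rather than +1 and -1.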

In [277]:
import numpy as np
import pandas as pd

In [278]:
'''
 hx is the thresholded weighted sum. The threshold is hard-coded to 3.
 The method takes two parameters:
    x: one input example of shape (4,) [since we have 4 features]
    w: weight vector of shape (4,1)  [since each feature requires a corresponding weight]
 It returns 1 if the weighted sum reaches the threshold, and 0 otherwise.
'''
def hx(x,w): 
    u = np.dot(w.T, x)   # w.T is (1,4) and x is (4,), so u has shape (1,)
    if u>=3:
        return 1
    else:
        return 0

"""
 SGD_mse performs stochastic gradient descent using the squared error of each example as the loss function. 
     J = ( y[i] - y_pred[i] )^2
 The gradient (dJ/dw) for the ith example and jth feature is 
             dJ/dw[j] = -2(y[i] - y_pred[i])*x[i][j]
 The method takes the following input parameters:
     X:             input matrix of shape (16,4) since we have 16 training examples, and each input has 4 features
     y:             output vector of shape (16,) holding 0/1 for each training example
     w:             weight vector of shape (4,1)
     alpha:         learning rate
     num_examples:  number of training examples
     num_features:  number of features in the input
     iterations:    number of passes over the training examples
     
 This method updates the weight vector w in place and prints it after every pass

"""
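def SGD_mse(X, y, w, alpha, num_examples, num_features, iterations):
    for it in range(iterations):   # named `it` to avoid shadowing the built-in iter()
        g = np.zeros((num_features, 1))
        for i in range(num_examples):
            # the prediction uses the weights as they were before this example's update
            y_hat = hx(X[i],w)
            for j in range(num_features):
                g[j] = -2 * (y[i] - y_hat) * X[i][j]   # gradient for feature j
                w[j] -= alpha*g[j]                     # in-place update, so the caller's w changes
        print("Iteration ",it,", w: \n",w)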
        
        

#Helper function to compare true and predicted values
def validate(X,Y_true,w):
    for i in range(X.shape[0]):
        y_hat = hx(X[i],w)
        # use the Y_true argument rather than the global Y
        print("True: ",Y_true[i], "\tPredicted: ",y_hat)
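
The inner loop over features in `SGD_mse` applies the same error term to every weight, so the update can also be written as one vector operation per example. The sketch below is one possible rewrite, not part of the original notebook: the names `hx_vec` and `sgd_mse_vectorized` and the `threshold` parameter are hypothetical, and the truth table is rebuilt inline so the block runs on its own.

```python
import itertools
import numpy as np

def hx_vec(x, w, threshold=3):
    """Thresholded weighted sum with 0/1 output, same rule as hx."""
    return 1 if np.dot(w.ravel(), x) >= threshold else 0

def sgd_mse_vectorized(X, y, w, alpha, epochs):
    """Same per-example update as SGD_mse, applied to all weights at once."""
    for _ in range(epochs):
        for x_i, y_i in zip(X, y):
            error = y_i - hx_vec(x_i, w)
            # w <- w - alpha * dJ/dw, where dJ/dw = -2 * error * x_i
            w += 2 * alpha * error * x_i.reshape(-1, 1)
    return w

# Usage on the 4-input AND truth table
X_demo = np.array(list(itertools.product([0, 1], repeat=4)))
y_demo = X_demo.all(axis=1).astype(int)
w_demo = sgd_mse_vectorized(X_demo, y_demo, np.zeros((4, 1)), alpha=0.001, epochs=1000)

preds = np.array([hx_vec(x, w_demo) for x in X_demo])
print("learned weights:", w_demo.ravel())
print("all predictions match:", bool((preds == y_demo).all()))
```

Because the prediction is computed once per example before any weight changes, this version performs the same arithmetic as the nested loop.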

In [287]:
#import the data using Pandas
data_df = pd.read_csv("data.csv")
print(data_df.head())

   x1  x2  x3  x4  y
0   0   0   0   0  0
1   0   0   0   1  0
2   0   0   1   0  0
3   0   0   1   1  0
4   0   1   0   0  0
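
The contents of `data.csv` are not shown beyond these first rows. Assuming the file holds the full 16-row truth table of a four-input AND gate, with the column names printed above, a table of that shape could be generated as follows. This is a sketch under that assumption, not the author's original file.

```python
import itertools
import pandas as pd

# Every combination of four bits, followed by the AND of those bits
rows = [bits + (int(all(bits)),) for bits in itertools.product([0, 1], repeat=4)]
generated_df = pd.DataFrame(rows, columns=["x1", "x2", "x3", "x4", "y"])

print(generated_df.head())

# Uncomment to write the table to disk under the filename used in this notebook
# generated_df.to_csv("data.csv", index=False)
```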


In [280]:
X = np.array(data_df[['x1','x2','x3','x4']])
Y = np.array(data_df['y'])

print("X: ",X)
print("Y: ",Y)
X.shape

X:  [[0 0 0 0]
 [0 0 0 1]
 [0 0 1 0]
 [0 0 1 1]
 [0 1 0 0]
 [0 1 0 1]
 [0 1 1 0]
 [0 1 1 1]
 [1 0 0 0]
 [1 0 0 1]
 [1 0 1 0]
 [1 0 1 1]
 [1 1 0 0]
 [1 1 0 1]
 [1 1 1 0]
 [1 1 1 1]]
Y:  [0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1]


(16, 4)

In [283]:
num_examples = X.shape[0]
num_features = X.shape[1]

w = np.zeros((num_features,1))
print("Initially, weight vector w: \n",w)

Initially, weight vector w: 
 [[0.]
 [0.]
 [0.]
 [0.]]


In [284]:
SGD_mse(X,Y,w,0.001,num_examples, num_features, 1000)

Iteration  0 , w: 
 [[0.002]
 [0.002]
 [0.002]
 [0.002]]
Iteration  1 , w: 
 [[0.004]
 [0.004]
 [0.004]
 [0.004]]
Iteration  2 , w: 
 [[0.006]
 [0.006]
 [0.006]
 [0.006]]
Iteration  3 , w: 
 [[0.008]
 [0.008]
 [0.008]
 [0.008]]
Iteration  4 , w: 
 [[0.01]
 [0.01]
 [0.01]
 [0.01]]
Iteration  5 , w: 
 [[0.012]
 [0.012]
 [0.012]
 [0.012]]
Iteration  6 , w: 
 [[0.014]
 [0.014]
 [0.014]
 [0.014]]
Iteration  7 , w: 
 [[0.016]
 [0.016]
 [0.016]
 [0.016]]
Iteration  8 , w: 
 [[0.018]
 [0.018]
 [0.018]
 [0.018]]
Iteration  9 , w: 
 [[0.02]
 [0.02]
 [0.02]
 [0.02]]
Iteration  10 , w: 
 [[0.022]
 [0.022]
 [0.022]
 [0.022]]
Iteration  11 , w: 
 [[0.024]
 [0.024]
 [0.024]
 [0.024]]
Iteration  12 , w: 
 [[0.026]
 [0.026]
 [0.026]
 [0.026]]
Iteration  13 , w: 
 [[0.028]
 [0.028]
 [0.028]
 [0.028]]
Iteration  14 , w: 
 [[0.03]
 [0.03]
 [0.03]
 [0.03]]
Iteration  15 , w: 
 [[0.032]
 [0.032]
 [0.032]
 [0.032]]
Iteration  16 , w: 
 [[0.034]
 [0.034]
 [0.034]
 [0.034]]
Iteration  17 , w: 
 [[0.036]
 [0.03

Iteration  741 , w: 
 [[0.75]
 [0.75]
 [0.75]
 [0.75]]
Iteration  742 , w: 
 [[0.75]
 [0.75]
 [0.75]
 [0.75]]
Iteration  743 , w: 
 [[0.75]
 [0.75]
 [0.75]
 [0.75]]
Iteration  744 , w: 
 [[0.75]
 [0.75]
 [0.75]
 [0.75]]
Iteration  745 , w: 
 [[0.75]
 [0.75]
 [0.75]
 [0.75]]
Iteration  746 , w: 
 [[0.75]
 [0.75]
 [0.75]
 [0.75]]
Iteration  747 , w: 
 [[0.75]
 [0.75]
 [0.75]
 [0.75]]
Iteration  748 , w: 
 [[0.75]
 [0.75]
 [0.75]
 [0.75]]
Iteration  749 , w: 
 [[0.75]
 [0.75]
 [0.75]
 [0.75]]
Iteration  750 , w: 
 [[0.75]
 [0.75]
 [0.75]
 [0.75]]
Iteration  751 , w: 
 [[0.75]
 [0.75]
 [0.75]
 [0.75]]
Iteration  752 , w: 
 [[0.75]
 [0.75]
 [0.75]
 [0.75]]
Iteration  753 , w: 
 [[0.75]
 [0.75]
 [0.75]
 [0.75]]
Iteration  754 , w: 
 [[0.75]
 [0.75]
 [0.75]
 [0.75]]
Iteration  755 , w: 
 [[0.75]
 [0.75]
 [0.75]
 [0.75]]
Iteration  756 , w: 
 [[0.75]
 [0.75]
 [0.75]
 [0.75]]
Iteration  757 , w: 
 [[0.75]
 [0.75]
 [0.75]
 [0.75]]
Iteration  758 , w: 
 [[0.75]
 [0.75]
 [0.75]
 [0.75]]
Iteration 

In [285]:
print("\nAfter learning, w: \n",w)


After learning, w: 
 [[0.75]
 [0.75]
 [0.75]
 [0.75]]


In [286]:
validate(X,Y,w)

True:  0 	Predicted:  0
True:  0 	Predicted:  0
True:  0 	Predicted:  0
True:  0 	Predicted:  0
True:  0 	Predicted:  0
True:  0 	Predicted:  0
True:  0 	Predicted:  0
True:  0 	Predicted:  0
True:  0 	Predicted:  0
True:  0 	Predicted:  0
True:  0 	Predicted:  0
True:  0 	Predicted:  0
True:  0 	Predicted:  0
True:  0 	Predicted:  0
True:  0 	Predicted:  0
True:  1 	Predicted:  1
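
The comparison printed above can be summarised as a single accuracy number. The sketch below assumes the learned weights are exactly 0.75 each (the printed values may be rounded) and uses the same threshold of 3 as `hx`. The helper names `predict` and `accuracy` are hypothetical.

```python
import itertools
import numpy as np

def predict(X, w, threshold=3):
    """Apply the 0/1 threshold rule to every row of X at once."""
    return (X @ w.ravel() >= threshold).astype(int)

def accuracy(X, y_true, w, threshold=3):
    """Fraction of examples whose prediction equals the true label."""
    return float(np.mean(predict(X, w, threshold) == y_true))

# 4-input AND truth table, rebuilt here so the sketch is self-contained
X_demo = np.array(list(itertools.product([0, 1], repeat=4)))
y_demo = X_demo.all(axis=1).astype(int)

w_demo = np.full((4, 1), 0.75)   # assumed to equal the learned weights
print(accuracy(X_demo, y_demo, w_demo))
```

With four weights of 0.75, the weighted sum reaches 3 only when all four inputs are 1 (4 × 0.75 = 3), and any input with three or fewer ones gives at most 2.25, so every row is classified correctly.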
