OpenCV notebook - ©Arkaprabha Majumdar

# #1. Neural Networks

Neural networks often outperform classical ML algorithms, such as linear classifiers, on data that is not linearly separable.

The input data into NNs is almost always numerical.

They are capable of learning complex, non-linear mappings from data to produce better classifications.

A network consists of three main kinds of layers:
    
    Input Layer
    Hidden Layers
    Output Layer

Consider a neural network of 2 input nodes, a single hidden layer with 2 nodes, and 2 output nodes:

In [2]:
import numpy as np

In [51]:
#input nodes
inp_nodes=[]
for i in range(int(input("Enter no. of input nodes : "))):
    inp_nodes.append(np.random.uniform(0,1))
inp_nodes=np.array(inp_nodes)
#append a constant 1 so the bias can be folded into the weight matrix later
inp_nodes=np.append(inp_nodes,1).reshape((-1,1))
print(inp_nodes)

Enter no. of input nodes : 2
[[0.45255876]
 [0.72553979]
 [1.        ]]


In [80]:
#Considering 2 hidden nodes in the single hidden layer
#n counts the weight layers (input->hidden and hidden->output), so enter 2 here
n=int(input("Enter no. of layers : "))
k=len(inp_nodes)-1 #nodes per layer, excluding the constant 1 used for the bias
#one k x k weight matrix per layer, drawn uniformly from [0,1)
weights=np.random.uniform(0,1,size=(n,k,k))
print(weights)

Enter no. of layers : 2
[[[0.21160951 0.75384909]
  [0.49526241 0.50229725]]

 [[0.71904786 0.23012301]
  [0.03196981 0.7720544 ]]]


In [148]:
weights[1]

array([[0.71904786, 0.23012301],
       [0.03196981, 0.7720544 ]])

In [82]:
#one bias column vector per layer, drawn uniformly from [0,1)
bias=[]
for layer in range(n):
    bias.append(np.random.uniform(0,1,size=(len(inp_nodes)-1,1)))
print(bias[1])

[[0.57731464]
 [0.91991064]]


In [60]:
#for the activation function, we can consider sigmoid
def sigmoid(x):
    return 1/(1+np.exp(-x))
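
Backpropagation needs the slope of this activation as well as its value. A small sketch, using only the standard library, that numerically checks the well-known identity sigmoid'(x) = sigmoid(x) * (1 - sigmoid(x)). The helper name `sigmoid_prime` and the test point x = 0.5 are illustrative choices, not part of the notebook.

```python
import math

def sigmoid(x):
    """Logistic sigmoid, same formula as the notebook's sigmoid (scalar version)."""
    return 1.0 / (1.0 + math.exp(-x))

def sigmoid_prime(x):
    """Analytic slope of the sigmoid: sigmoid(x) * (1 - sigmoid(x)). Illustrative helper."""
    s = sigmoid(x)
    return s * (1.0 - s)

# Central finite difference as an independent estimate of the slope.
x, eps = 0.5, 1e-6
numeric = (sigmoid(x + eps) - sigmoid(x - eps)) / (2 * eps)

# True when the analytic and numeric slopes agree to within 1e-8.
print(abs(numeric - sigmoid_prime(x)) < 1e-8)
```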

### Forward Propagation :-
1. Take the input values x_i and feed them forward through the network

2. Multiply them with the randomly initialised weights w_i

3. Add the biases b_i to the above

4. Apply an activation function - sigmoid/ReLU

5. Repeat this through the hidden layers (here just 1) with nodes h_i, to obtain the outputs O_i
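
Written compactly for the 2-2-2 network considered here, the steps above amount to the equations below. The matrix symbols $W^{(1)}, W^{(2)}$ and bias vectors $b^{(1)}, b^{(2)}$ are notation introduced for this sketch, and sigmoid is assumed as the activation.

```latex
\begin{aligned}
\mathrm{net}_h &= W^{(1)} x + b^{(1)}, & h &= \sigma(\mathrm{net}_h),\\
\mathrm{net}_O &= W^{(2)} h + b^{(2)}, & O &= \sigma(\mathrm{net}_O),
\end{aligned}
\qquad \sigma(z) = \frac{1}{1 + e^{-z}}
```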


In [83]:
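#bias trick: append the bias as an extra weight column; since inp_nodes already ends in a constant 1,
#weights and bias are then applied in a single matrix multiplication
#note: this demo uses the weights and bias at index 1; calculate_values below starts from index 0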
augment_mat = np.column_stack((weights[1],bias[1]))
print("Augmented matrix : \n",augment_mat)
H=np.matmul(augment_mat,inp_nodes)
print("Net H : \n",H) #net value of hidden nodes
H_i = sigmoid(H)
print("hidden node values : \n",H_i)

Augmented matrix : 
 [[0.71904786 0.23012301 0.57731464]
 [0.03196981 0.7720544  0.91991064]]
Net H : 
 [[1.06968945]
 [1.49453505]]
hidden node values : 
 [[0.74453785]
 [0.81675798]]
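
To see why the augmentation is safe, here is a small check, using the values printed above, that multiplying the augmented matrix by the input-with-trailing-1 gives the same result as computing W·x + b explicitly. The variable names `W`, `b`, `x` and `x_aug` are chosen for this sketch only.

```python
import numpy as np

# Values copied from the notebook output above (rounded to 8 decimals).
W = np.array([[0.71904786, 0.23012301],
              [0.03196981, 0.7720544]])
b = np.array([[0.57731464],
              [0.91991064]])
x = np.array([[0.45255876],
              [0.72553979]])

# Explicit form: weights times input, plus bias.
explicit = W @ x + b

# Bias trick: append b as an extra column and a constant 1 to the input.
augmented = np.column_stack((W, b))   # shape (2, 3)
x_aug = np.vstack((x, [[1.0]]))       # shape (3, 1)
tricked = augmented @ x_aug

# Both routes should give the same net values for the two nodes.
print(np.allclose(explicit, tricked))
print(tricked.ravel())
```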


In [109]:
#Let's make this into a function:
def calculate_values(weights,bias):
    nodes = inp_nodes #input column vector, already ending in the constant 1
    augment_mat=[]
    Net_H=[]
    H=[]
    for i in range(len(weights)):
        print(i)
        #fold the bias into the weights as an extra column
        augment_mat.append(np.column_stack((weights[i],bias[i])))
        print("Augmented matrix : \n",augment_mat[i])
        #net input to layer i, before the activation
        Net_H.append(np.matmul(augment_mat[i],nodes))
        print("Net H : \n",Net_H[i])
        H.append(sigmoid(Net_H[i]))
        print("hidden node values : \n",H[i])
        #this layer's output, with a 1 appended, is the next layer's input
        nodes = np.append(H[i],1).reshape((-1,1))
    return(augment_mat,Net_H,H)

In [110]:
augment_mat,Net_H,H=calculate_values(weights,bias)

0
Augmented matrix : 
 [[0.21160951 0.75384909 0.23122393]
 [0.49526241 0.50229725 0.08797374]]
Net H : 
 [[0.87393718]
 [0.67654573]]
hidden node values : 
 [[0.70556428]
 [0.6629673 ]]
1
Augmented matrix : 
 [[0.71904786 0.23012301 0.57731464]
 [0.03196981 0.7720544  0.91991064]]
Net H : 
 [[1.23721316]
 [1.45431422]]
hidden node values : 
 [[0.77507855]
 [0.81066151]]


In [113]:
target = np.array([1.2 , 0.3]).reshape((2,1))
target

array([[1.2],
       [0.3]])

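So our first set of "hidden node values" are H_1 and H_2

The second set are the outputs O_1 and O_2

Let our target values be:
    T1 = 1.2
    T2 = 0.3 (just arbitrary assumptions, nothing ideal; note that a sigmoid output always lies between 0 and 1, so O_1 can approach but never reach 1.2)

Next, we move towards 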
### Gradient Descent Optimization using BackPropagation.

We shall use Mini-Batch GD, which combines the stable updates of batch ("naive") GD with the cheap, frequent updates of Stochastic GD.
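
The notebook itself works with a single input vector, so as a rough illustration of what mini-batching means, here is a sketch on a toy one-weight problem (fitting y = 3x). The data, batch size, learning rate and the function name `minibatch_gd` are all assumptions made for this example. A batch size of 1 would give stochastic GD and a batch size of len(xs) would give full-batch GD.

```python
import random

def minibatch_gd(xs, ys, batch_size=5, lr=0.5, epochs=50, seed=0):
    """Fit y = w * x by mini-batch gradient descent on the loss 0.5 * (w*x - y)^2."""
    rng = random.Random(seed)
    w = 0.0
    indices = list(range(len(xs)))
    for _ in range(epochs):
        rng.shuffle(indices)                      # new sample order each epoch
        for start in range(0, len(indices), batch_size):
            batch = indices[start:start + batch_size]
            # Average gradient of the loss over the current mini-batch only.
            grad = sum((w * xs[i] - ys[i]) * xs[i] for i in batch) / len(batch)
            w -= lr * grad                        # step against the gradient
    return w

xs = [i / 10 for i in range(1, 11)]   # toy inputs 0.1 ... 1.0
ys = [3.0 * x for x in xs]            # toy targets generated with true weight 3
w = minibatch_gd(xs, ys)
print(round(w, 4))
```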

We shall also use the MSE (Mean Squared Error) loss function, which is a common, safe default.

### BackPropagation :-
1. Calculate the loss at each output node using the loss function E_i = 0.5 * (T_i - O_i)^2

    The (1/2) is there to cancel the exponent on differentiation (it only rescales the gradient and does not move the minimum)
    
2. Next, we calculate E_total = sum(E_i)

3. Select a weight w_k coming from the previous layer into output node k.

    We need to figure out how changing w_k changes the total loss E_total
    
4. Using the chain rule:
        (dE_total / dw_k) = (dE_total / dO_k) * (dO_k / dNetO_k) * (dNetO_k / dw_k)
        
        dE_total / dO_k = (O_k - T_k)
        
        dO_k / dNetO_k = O_k * (1 - O_k)
        #NetO_k is the input to the output node, before applying sigmoid
        
        dNetO_k / dw_k = (h_j) #the hidden node that w_k connects to output node k
        
5. Then we select a learning rate "η", which is to scale DOWN the value change in w_k.
    Taking too large a learning rate can overshoot the minimum.
    
6. new w_k = w_k - η * (dE_total / dw_k)

7. Do this for all the weights.
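
As a worked sketch of steps 1 to 6 for the hidden-to-output weights, the block below uses the values printed earlier in this notebook. Differentiating E_i = 0.5 * (T_i - O_i)^2 gives dE_total/dO_k = (O_k - T_k), and the update moves each weight against its gradient. The chain-rule result is cross-checked against a finite-difference estimate. The names `W2`, `b2`, `h`, `T`, `total_loss` and the learning rate `eta = 0.5` are choices made for this example only.

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def total_loss(W2, b2, h, T):
    """E_total = sum_i 0.5 * (T_i - O_i)^2 for the output layer."""
    O = sigmoid(W2 @ h + b2)
    return float(np.sum(0.5 * (T - O) ** 2))

# Values copied from the notebook output (rounded to 8 decimals).
h = np.array([[0.70556428], [0.6629673]])            # hidden node values
W2 = np.array([[0.71904786, 0.23012301],
               [0.03196981, 0.7720544]])             # hidden -> output weights
b2 = np.array([[0.57731464], [0.91991064]])          # output biases
T = np.array([[1.2], [0.3]])                         # targets

# Chain rule: dE/dW2[k, j] = (O_k - T_k) * O_k * (1 - O_k) * h_j
O = sigmoid(W2 @ h + b2)
delta = (O - T) * O * (1 - O)        # shape (2, 1), one entry per output node
grad = delta @ h.T                   # shape (2, 2), one entry per weight

# Finite-difference estimate of the same gradient, as an independent check.
eps = 1e-6
numeric = np.zeros_like(W2)
for k in range(2):
    for j in range(2):
        Wp, Wm = W2.copy(), W2.copy()
        Wp[k, j] += eps
        Wm[k, j] -= eps
        diff = total_loss(Wp, b2, h, T) - total_loss(Wm, b2, h, T)
        numeric[k, j] = diff / (2 * eps)
print(np.allclose(grad, numeric, atol=1e-7))

# One gradient-descent step: move each weight against its gradient.
eta = 0.5
W2_new = W2 - eta * grad
print(total_loss(W2, b2, h, T), total_loss(W2_new, b2, h, T))
```

The second print shows the total loss before and after the single step; with a small enough `eta` the second number should be the smaller one.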

In [185]:
#calculate the squared error at each output node (halved, as in step 1 above)
def mse(output,target):
    return(0.5*((target - output)**2))

#gradient of E_total w.r.t. each hidden->output weight, via the chain rule
def derivative(hidden, output, target):
    component1 = (output - target)        #dE_total / dO_k
    component2 = output * (1 - output)    #dO_k / dNetO_k (slope of the sigmoid)
    component3 = hidden                   #dNetO_k / dw = h_j
    #row k, column j holds the derivative for the weight from h_j to O_k
    return np.matmul(component1*component2, component3.T)

In [186]:
updations=derivative(H[0],H[1],target) #these are updations for w5,w6,w7,w8
for name,value in zip(["w5","w6","w7","w8"],updations.ravel()):
    print(f"derivative for {name} : {value:.4f}")

derivative for w5 : -0.0523
derivative for w6 : -0.0491
derivative for w7 : 0.0553
derivative for w8 : 0.0520
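
Step 7 asks for the same update on every weight. For the weights feeding the hidden layer, the error first has to be passed back through the hidden-to-output weights. Below is a rough sketch of that, under the same loss and sigmoid activation, repeated for a number of iterations. The function names `forward`, `loss` and `gradients`, the learning rate and the iteration count are assumptions for this example, and the biases are updated alongside the weights.

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def forward(params, x):
    """Return hidden activations h and outputs O for a two-layer network."""
    W1, b1, W2, b2 = params
    h = sigmoid(W1 @ x + b1)
    O = sigmoid(W2 @ h + b2)
    return h, O

def loss(params, x, T):
    _, O = forward(params, x)
    return float(np.sum(0.5 * (T - O) ** 2))

def gradients(params, x, T):
    """Backpropagate E_total = sum 0.5 * (T - O)^2 through both layers."""
    W1, b1, W2, b2 = params
    h, O = forward(params, x)
    delta_o = (O - T) * O * (1 - O)            # error signal at the output nodes
    delta_h = (W2.T @ delta_o) * h * (1 - h)   # error signal passed back to the hidden nodes
    # Same order as params: dE/dW1, dE/db1, dE/dW2, dE/db2
    return [delta_h @ x.T, delta_h, delta_o @ h.T, delta_o]

# Starting values copied from the notebook output (rounded to 8 decimals).
x = np.array([[0.45255876], [0.72553979]])
T = np.array([[1.2], [0.3]])
params = [
    np.array([[0.21160951, 0.75384909], [0.49526241, 0.50229725]]),  # W1
    np.array([[0.23122393], [0.08797374]]),                          # b1
    np.array([[0.71904786, 0.23012301], [0.03196981, 0.7720544]]),   # W2
    np.array([[0.57731464], [0.91991064]]),                          # b2
]

eta = 0.5                      # learning rate, an arbitrary choice for this sketch
initial = loss(params, x, T)
for _ in range(200):
    grads = gradients(params, x, T)
    params = [p - eta * g for p, g in zip(params, grads)]
final = loss(params, x, T)
print(initial, final)
```

Because T1 = 1.2 lies outside the range of a sigmoid, the loss can shrink but cannot reach zero here.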
