In [1]:
import numpy as np 

**Problem 0**

```Write a function that takes dimensions as inputs and creates an initial set of weights in a matrix of those
dimensions. (For the example above, you will need to create an 8×7 matrix for 𝑊𝑖𝑛𝑝𝑢𝑡 as well as another 8×3 matrix for 𝑊ℎ𝑖𝑑𝑑𝑒𝑛.) Each entry should be a randomly selected nonzero but small value, say in the range (−0.5 , 0.5). There is one caveat. The final row of the matrix will hold the weights for that layer’s bias weights, and these need to be the same value for each entry. After each pass (or batch, if we choose a batch update approach), we will be updating this set of weights, so the initialization will only need to be done once for each matrix. Hence, problem zero!```

In [2]:
def w_input(rows,cols):
    W = np.random.sample((rows-1, cols)) - 0.5
    random_bias_weight = np.random.sample()
    bias_row = np.repeat(random_bias_weight,cols)
    W_input = np.insert(W, rows-1, bias_row, axis=0)
    return W_input

**Problem 1**

```Write a function that takes a vector of length 𝑚 as input and returns a vector of length 𝑚 + 1, where
the output vector is the original vector plus an extra 1 appended as the final entry.```

In [3]:
def append(v):
    v = np.array(v)
    return np.append(v,[1])

**Problem 2**

```Given the vector 𝑓 of inputs, we need to use weights in 𝑊𝑖𝑛𝑝𝑢𝑡 to obtain the raw, incoming values for
the hidden layer nodes. In other words, we need to calculate:
ℎ𝑟𝑎𝑤 = 𝑎𝑝𝑝𝑒𝑛𝑑(𝑓)𝑇*𝑊𝑖𝑛𝑝𝑢𝑡
Write a function that takes the properly appended feature vector 𝑓 and multiplies its transpose by the
weight matrix 𝑊𝑖𝑛𝑝𝑢𝑡 to get ℎ𝑟𝑎𝑤. (What are the dimensions of this resulting vector?)
Note that you should incorporate your function from problem 1 into the function for problem 2.```

In [4]:
def hraw(input_layer,W_input):
    hraw = np.dot(append(input_layer),W_input)
    return hraw

**Problem 3**

``We next need to activate the neurons in our hidden layer. This means, loosely, we need:
ℎ𝑎𝑐𝑡𝑖𝑣𝑎𝑡𝑒𝑑 = 𝑠𝑖𝑔𝑚𝑜𝑖𝑑(ℎ𝑟𝑎𝑤)
Write a function that takes a vector of raw values and outputs a vector of transformed values, where the
transformation is performed using the sigmoid function on each element of the vector individually.
What are the dimensions of the output vector with respect to the input vector?``

In [5]:
def sigmoid(x):
    return (1 / (1 + np.exp(-x)))

def h_activated_(hidden_layer):
    hactivated = sigmoid(hidden_layer)
    return hactivated

**Problem 4**

```We must begin the backward propagation process by calculating our error. Write a function that takes
the final, activated output vector and calculates its error with respect to the true vector of one-hot
encodings for this observation. Use the squared error loss function we used before:
𝐸 = 0.5 ∗ ∑(𝑡𝑎𝑟𝑔𝑘 − 𝑇𝑜𝑢𝑡𝑘)^2
In vector terms we can express this as:
𝐸 = 0.5 ∙ (𝑜𝑎𝑐𝑡𝑖𝑣𝑎𝑡𝑒𝑑 − 𝑡)𝑇(𝑜𝑎𝑐𝑡𝑖𝑣𝑎𝑡𝑒𝑑 − 𝑡)
Concept check: Why do I calculate 𝑜𝑎𝑐𝑡𝑖𝑣𝑎𝑡𝑒𝑑 − 𝑡 instead of 𝑡 − 𝑜𝑎𝑐𝑡𝑖𝑣𝑎𝑡𝑒𝑑? Does it matter?
```

In [6]:
def w_output(rows,length_of_output_layer):
    W = np.random.sample((rows-1, length_of_output_layer)) - 0.5
    random_weight = np.random.sample([1]) - 0.5
    weight_vector = np.repeat(random_weight,length_of_output_layer)
    W_output = np.insert(W, rows-1, weight_vector, axis=0)
    return W_output

def o_raw(hactivated,W_output):
    h_activated_with_bias = append(hactivated)
    oraw = np.dot(h_activated_with_bias,W_output)
    return oraw

def o_activated(oraw):
    o_activated = sigmoid(oraw)
    return o_activated

'''Concept check: the order does not matter as the abs(b-a) = abs(a-b) '''

'Concept check: the order does not matter as the abs(b-a) = abs(a-b) '

In [7]:
def error(o_activated, target_layer):
    error = .5 * ((target_layer - o_activated)**2).sum()
    return error

**Problem 5**

``Write a function that calculates the gradient of the weight from a hidden node to an output node. In
Bramer’s notation, this is 𝑔(𝐸, 𝑊𝑗𝑘). Note that you will need to make use of the various quantities that
are defined above. Make sure you include all necessary inputs to your function.``

In [8]:
def grad_hidden_to_output(hactivated, o_activated, target_layer): 
    E = (o_activated - target_layer) * o_activated * (1 - o_activated)
    h_activated_transpose = np.array([[i] for i in hactivated])
    grad_hidden_to_output = h_activated_transpose * E 
    return grad_hidden_to_output

**Problem 6**

``Write a function that calculates the gradient of the weight from the bias term to the output nodes. In
Bramer’s notation, this is 𝑔(𝐸, 𝑏𝑖𝑎𝑠𝑂).``

In [9]:
def gradient_hidden_bias_to_output(o_activated,target_layer):
    bias_gradient = ((o_activated - target_layer) * o_activated * (1 - o_activated)).sum()
    return np.array([bias_gradient])

In [10]:
def hidden_to_output_gradient_matrix(o_activated,target_layer,hactivated):
    grad_hid_2_output = grad_hidden_to_output(hactivated, o_activated, target_layer)
    grad_hid_bias_2_out =  gradient_hidden_bias_to_output(o_activated,target_layer)
    len_bias_row = grad_hid_2_output.shape[1]
    BG_H_O = np.repeat(grad_hid_bias_2_out[0],len_bias_row)
    hidden_to_output_gradient_matrix = np.vstack([grad_hid_2_output, BG_H_O])
    return hidden_to_output_gradient_matrix

**Problem 7**

`` Write a function that calculates the gradient of the weight from an input node to a hidden node. In
Bramer’s notation, this is 𝑔(𝐸, 𝑤𝑖𝑗). 𝑔(𝐸,𝑤) = ∑ [(𝑇𝑜𝑢𝑡𝑘−𝑡𝑎𝑟𝑔𝑘)∙𝑇𝑜𝑢𝑡𝑘∙(1−𝑇𝑜𝑢𝑡𝑘)∙𝑊 ]∙𝑇h𝑖𝑑𝑗∙(1−𝑇h𝑖𝑑𝑗)∙𝑖𝑛𝑝𝑖 ``

In [11]:
def grad_input_to_hidden(hactivated, o_activated, target_layer, input_layer, W_output):
    W_output_wo_bias = W_output[:-1,:]
    E = (o_activated - target_layer) * o_activated * (1 - o_activated) 
    grad_input_to_hidden = np.zeros((7,7))
    for j in range(len(hactivated)):
        pt1 = []  
        pt2 = []
        pt1.append( np.dot(E,W_output_wo_bias[j]) )
        pt2.append( hactivated[j] * (1 - hactivated[j])) 
        grad_input_to_hidden[:,j] = np.array(pt1) * np.array(pt2) * input_layer
    return grad_input_to_hidden

**Problem 8**

``Write a function that calculates the gradient of the weight from the bias term to the hidden nodes. In
Bramer’s notation, this is 𝑔(𝐸,𝑏𝑖𝑎𝑠𝐻) = ∑[(𝑇𝑜𝑢𝑡𝑘−𝑡𝑎𝑟𝑔𝑘)∙𝑇𝑜𝑢𝑡𝑘∙(1−𝑇𝑜𝑢𝑡𝑘)∙𝑊 ]∙𝑇h𝑖𝑑𝑗∙(1−𝑇h𝑖𝑑𝑗)``

In [12]:
def gradient_input_bias_to_hidden(o_activated,target_layer,hactivated,W_output):
    W_output_wo_bias = W_output[:-1,:]
    E = (o_activated - target_layer) * o_activated * (1-o_activated)
    pt1 = np.dot(E,W_output_wo_bias.T)
    pt2 = hactivated *(1-hactivated)
    bias_gradient_il = np.dot(pt1,pt2) 
    return bias_gradient_il

In [13]:
def input_to_hidden_gradient_matrix(input_layer,o_activated,target_layer,hactivated,W_output):
    W_output_wo_bias = W_output[:-1,:]
    input_to_hidden_gradient_bias =  gradient_input_bias_to_hidden(o_activated,target_layer,hactivated,W_output)
    len_bias_row = hactivated.shape[0]
    BG_I_H = np.repeat(input_to_hidden_gradient_bias,len_bias_row)
    G_I_H = grad_input_to_hidden(hactivated,o_activated,target_layer,input_layer,W_output)
    input_to_hidden_gradient_matrix = np.vstack((G_I_H,BG_I_H))
    return input_to_hidden_gradient_matrix

**Problem 9**

``Write a function that takes as inputs a learning rate (𝛼) and updates the weights for the input-tohidden-node step. Note that you need to update the bias weights so that it is the same weight for each
node. You will probably want to incorporate your functions from problems 7-8 here.``

In [14]:
def update_input_hidden_weights(lr, input_to_hidden_gradient_matrix):
    updated_weights_input_hidden = input_to_hidden_gradient_matrix - lr * input_to_hidden_gradient_matrix
    return updated_weights_input_hidden

**Problem 10**

``Write a function that takes as inputs a learning rate (𝛼) and updates the weights for the hidden-to-output
node step. Note that you need to update the bias weights so that it is the same weight for each
node. You will probably want to incorporate your functions from problems 5-6 here.``

In [15]:
def update_hidden_output_weights(lr, hidden_to_output_gradient_matrix):
    updated_weights_hidden_output = hidden_to_output_gradient_matrix - lr * hidden_to_output_gradient_matrix
    return updated_weights_hidden_output

**Problem 11**

```We now should have all the pieces that we need to train our model. Write a function that performs one
entire pass based on a single training instance, using your functions above. What should the output(s) of
this function be?
Hint: What would we need from here to run our model on a new instance to make a prediction?```

In [16]:
def epoch(input_layer,target_layer,lr):
    
        #weights from input to hidden layer
        W_input = w_input(len(input_layer)+1,len(input_layer))
        
        #hidden layer 
        h_raw = hraw(input_layer,W_input)
        hactivated = h_activated_(h_raw)
     
        #weights from hidden to output layer
        W_output = w_output(len(hactivated)+1,len(target_layer))
        
        #output layer
        oraw = o_raw(hactivated,W_output)
        oactivated = o_activated(oraw)
        
        #Error
        error1 = error(oactivated, target_layer)
        
        #Hidden to Output Gradient Matrix
        ho_gm = hidden_to_output_gradient_matrix(oactivated,target_layer,hactivated)
        
        #Input to Hidden Gradient Matrix
        ih_gm = input_to_hidden_gradient_matrix(input_layer,oactivated,target_layer,hactivated,W_output) 
        
        #Updated Weights Input to Hidden
        updated_weights_input_hidden = update_input_hidden_weights(lr,ih_gm)
        
        #Updated Weights Hidden to Output
        updated_weights_hidden_output = update_hidden_output_weights(lr,ho_gm)
        
        #Round 2 
        hraw2 = hraw(input_layer, updated_weights_input_hidden)
        hactivated2 = h_activated_(hraw2) 
        oraw2 = o_raw(hactivated2, updated_weights_hidden_output)
        oactivated2 = o_activated(oraw2)
        error2 = error(oactivated2, target_layer)
        
        #Print Results
        print('The Input Layer:\n\n{}\n\n{}'.format(input_layer,input_layer.shape))
        print('\nThe Input Layer with Bias:\n\n{}\n\n{}'.format(append(input_layer),append(input_layer).shape))
        print('\nThe Input to Hidden Layer Weights:\n\n{}\n\n{}'.format(W_input,W_input.shape))
        print('\nThe Updated Input to Hidden Layer Weights:\n\n{}\n\n{}'.format(updated_weights_input_hidden,updated_weights_input_hidden.shape))
        print('\nActivated Hidden Layer:\n\n{}\n\n{}'.format(hactivated,hactivated.shape))
        print('\nActivated Hidden Layer with Bias:\n\n{}\n\n{}'.format(append(hactivated),append(hactivated).shape))
        print('\n2nd Acitivated Hidden Layer:\n\n{}\n\n{}'.format(hactivated2,hactivated2.shape))
        print('\n2nd Activated Hidden Layer with Bias:\n\n{}\n\n{}'.format(append(hactivated2,),append(hactivated2).shape))
        print('\nThe Hidden to Output Layer Weights:\n\n{}\n\n{}'.format(W_output,W_output.shape))
        print('\nThe Hidden to Output Layer Weights:\n\n{}\n\n{}'.format(updated_weights_hidden_output,updated_weights_hidden_output.shape))
        print('\nTarget Layer:\n\n{}\n\n{}'.format(target_layer,target_layer.shape))
        print('\nOutput Activated:\n\n{}\n\n{}'.format(oactivated,oactivated.shape))
        print('\nError:\n\n{}'.format(error1))
        print('\nOutput Activated after first Epoch:\n\n{}\n\n{}'.format(oactivated2,oactivated2.shape))
        print('\nNew Error:\n\n{}'.format(error2))
        print('\nDifference in Error:\n\n{}'.format((error2-error1)))
        
        return

**EXAMPLE**

In [17]:
il = np.array([.8,.5,.2,.4,.1,.9,.05])
t = np.array([1,0,1])
epoch(il,t,0.2)

The Input Layer:

[0.8  0.5  0.2  0.4  0.1  0.9  0.05]

(7,)

The Input Layer with Bias:

[0.8  0.5  0.2  0.4  0.1  0.9  0.05 1.  ]

(8,)

The Input to Hidden Layer Weights:

[[ 0.26117775  0.24955745  0.05239375  0.03716968 -0.30364393 -0.28100587
  -0.11204826]
 [ 0.45685888 -0.37722857 -0.35178791  0.09734848  0.06872325  0.31335523
   0.33949041]
 [ 0.27691861 -0.38827277  0.11226109 -0.48072102  0.38510449 -0.13817043
   0.13122025]
 [ 0.0570362  -0.20135073  0.39340942 -0.26880682 -0.18955664 -0.07893268
  -0.09680542]
 [-0.48693779 -0.26485398 -0.37113303  0.39351482 -0.40851947  0.25747275
   0.22550512]
 [-0.33213158  0.42732106 -0.32121571  0.32397543  0.24169247  0.41458053
   0.12409748]
 [ 0.47870973 -0.03751717  0.18317469 -0.29711467 -0.1139993  -0.02398624
   0.26655153]
 [ 0.94522838  0.94522838  0.94522838  0.94522838  0.94522838  0.94522838
   0.94522838]]

(8, 7)

The Updated Input to Hidden Layer Weights:

[[-1.28803030e-03  1.15688518e-02  1.77514733e-02  9.087590