In [1]:
import numpy as np
import math

Problem Statement
You are presented with a problem of classifying cakes as chocolate or not
chocolate based on certain features of the cake. You are given 3 features,
- X1: Sugar Content (Values ranging from 0-1)
- X2: Cocoa Content (Values ranging from 0-1)
- X3: Flour Content (Values ranging from 0-1)


Exercise 1:Questions
Answer the following questions in short sentences:
- How well did the following MLP perform?
- Why might the MLP perform as it did, despite the lack of weight updates?
- How would the performance change if backpropagation were introduced?


1. The high mean squared error indicated poor performance from the MLP, meaning that the predicted outputs were very different from the true values.
2. Without weight updating, our multilayer perceptron might as well be a linear model because it is not intelligently learning from its iterations at all.
3. With back propogation, our MLP would work as intended and adjust/learn by adjusting weights based on output. This will definitely reduce our MSE and improve prediction performance.


Exercise 1: Coding
Using only the numpy library, create the same multi-layered perceptron in python
and initialize using the same parameters and print the MSE. Make sure to
structure/generalize your code (either using functions and or classes).


In [2]:
#input weights
w11, w12, w13 = 0.14, 0.78, 0.33
w21, w22, w23 = 0.91, 0.47, 0.56

#output weights
wh1, wh2 = 0.65, 0.38

#biases
b1, b2, b10 = 1, 1, 1

#inputs
X = np.array([
    [0.4, 0.6, 0.8],
    [0.8, 0.1, 0.7],
    [0.55, 0.65, 0.7]
])

#true output
y_true = np.array([1, 0, 1])

n = len(X)

# activation formula
def neuron_weighting(x1, x2, x3):
    z_h1 = w11 * x1 + w12 * x2 + w13 * x3 + b1 #first layer
    z_h2 = w21 * x1 + w22 * x2 + w23 * x3 + b2 #second layer
    return z_h1, z_h2

def sigmoid(x):
  postsig = 1 / (1 + math.exp(-x))
  return postsig

# Output neuron activation
def second_weight(z_h1, z_h2):
    z_out = wh1 * z_h1 + wh2 * z_h2 + b10
    return z_out


predictions = np.zeros(n)
for i in range(n):
    z_h1, z_h2 = neuron_weighting(X[i][0], X[i][1], X[i][2])
    print("first weighting:", z_h1, z_h2)
    z_h1 = sigmoid(z_h1)
    z_h2 = sigmoid(z_h2)
    print("first sigmoid:",z_h1,z_h2)
    secondweight = second_weight(z_h1,z_h2)
    print("second weighting:",secondweight )
    prediction = sigmoid(secondweight)
    print("final prediction:",prediction,"\n")
    predictions[i] = prediction

# Calculate MSE
mse = np.mean((y_true - predictions) ** 2)
print(f" predicted y values: {predictions[0],predictions[1]}")
print(f" mean squared error: {mse}")

first weighting: 1.788 2.0940000000000003
first sigmoid: 0.8566818955968716 0.8903186413438859
second weighting: 1.895164315848643
final prediction: 0.8693432420162845 

first weighting: 1.421 2.1670000000000003
first sigmoid: 0.8054951369940119 0.8972467108359328
second weighting: 1.8645255891637622
final prediction: 0.8658235707990888 

first weighting: 1.815 2.198
first sigmoid: 0.8599650843722273 0.9000697663968664
second weighting: 1.901003816072757
final prediction: 0.8700050956219487 

 predicted y values: (0.8693432420162845, 0.8658235707990888)
 mean squared error: 0.2612067731074529


Exercise 2: Hands-On
Let’s say we’re given the same cake
problem and only want to classify
using sugar and cocoa content and
instead, we will be using another
neural network, with 3 hidden neurons
and 2 output neurons all using sigmoid
as the activation function. If y1 and y2
output (0.3,0.7) then it’s classified as
chocolate cake, but if the output is
(0.7,0.3) then it is classified as not
chocolate cake. Given a set of
weights, biases, and inputs, manually
calculate their outputs and the MSE.
Additionally, update the weights and
biases twice using backpropagation.


Exercise 2: Coding Portion
Using only the numpy library, create the same simple neural network in python
and initialize using the same parameters. Update its weights and biases 100 times
and print out its new weights and MSE. Make sure to structure/generalize your
code (either using functions and or classes).


In [3]:
w11, w12, w21 = 0.21, 0.61, 0.48
w22, w31, w32 = 0.13,0.74,0.35
wh11,wh12,wh13 = .89,.52,.06
wh21,wh22,wh23 = .97,.41,.09
b1,b2,b3 = 1,1,1
b10,b20 = 1,1

X = np.array([
    [0.3,0.6,0.3,0.7],
    [0.1,0.1,0.7,0.3],
    [0.5,0.7,0.3,0.7],
    [0.4,0.5,0.3,0.7],
    [0.2,0.5,0.7,0.3]
])

y_true = np.array([1,0,1,1,0])

n = len(X)

def neuron_weighting(x1, x2):
    z_h1 = w11*x1 + w21 * x2 + b1 #first layer
    z_h2 = x1*w21 + x2*w22 + b2  #second layer
    z_h3 = w31*x1 + w32 *x2 + b3 #third layer
    return z_h1, z_h2, z_h3

def output_weighting(h1,h2,h3):
    output1 = wh11*h1 + wh12 * h2 + wh13 * h3 + b10
    output2 = wh21*h1 + wh22*h2  + wh23*h3 + b20
    return output1,output2

def sigmoid(x):
    postsig = 1 / (1 + math.exp(-x))
    return postsig

predictions = np.zeros(n)

def forward(x1,x2):
    z_h1, z_h2, z_h3 = neuron_weighting(x1, x2)
    print("first weighting:", z_h1, z_h2,z_h3)
    z_h1 = sigmoid(z_h1)
    z_h2 = sigmoid(z_h2)
    z_h3 = sigmoid(z_h3)
    print("first sigmoid:",z_h1,z_h2,z_h3)
    o1,o2 = output_weighting(z_h1,z_h2,z_h3)
    print("second weighting:", o1,o2)
    notchocolate = (sigmoid(o1) + sigmoid(o2))/2
    print("not chocolate:", notchocolate,"\n")
    return z_h1, z_h2, z_h3, notchocolate

def sigmoid_derivative(x):
    return x * (1 - x)

epochs = 100

for epoch in range(epochs):
    total_loss = 0
    for i in range(n):
        x1, x2 = X[i][0], X[i][1]
        y = y_true[i]
        
        # Forward pass
        h1, h2, h3, output_not_chocolate = forward(x1, x2)
        
        # Compute MSE Loss
        loss = (output_not_chocolate - y) ** 2
        total_loss += loss
        
        # Compute output layer error
        delta_output = output_not_chocolate - y
        delta_o1 = delta_output * sigmoid_derivative(output_not_chocolate)
        
        # Backpropagate to hidden layer
        delta_h1 = delta_o1 * wh11 * sigmoid_derivative(h1)
        delta_h2 = delta_o1 * wh12 * sigmoid_derivative(h2)
        delta_h3 = delta_o1 * wh13 * sigmoid_derivative(h3)
        
        # Update output layer weights and biases
        wh11 -= 1 * delta_o1 * h1
        wh12 -= 1 * delta_o1 * h2
        wh13 -= 1 * delta_o1 * h3
        wh21 -= 1 * delta_o1 * h1
        wh22 -= 1 * delta_o1 * h2
        wh23 -= 1 * delta_o1 * h3
        b10 -= 1 * delta_o1
        
        # Update hidden layer weights and biases
        w11 -= 1 * delta_h1 * x1
        w21 -= 1 * delta_h1 * x2
        w12 -= 1 * delta_h2 * x1
        w22 -= 1 * delta_h2 * x2
        w31 -= 1 * delta_h3 * x1
        w32 -= 1 * delta_h3 * x2
        b1 -= 1 * delta_h1
        b2 -= 1 * delta_h2
        b3 -= 1 * delta_h3

    print(f"Epoch {epoch+1}/{epochs}, Loss: {total_loss/n}")


first weighting: 1.351 1.222 1.432
first sigmoid: 0.7942930678747973 0.772415320751363 0.8072127476700125
second weighting: 2.1570095620594794 2.1598037046369134
not chocolate: 0.8964516642303231 

first weighting: 1.070523559180833 1.0620152227547401 1.1090978266537064
first sigmoid: 0.7446964695553937 0.7430754686025084 0.7519608801854211
second weighting: 2.12094562507708 2.111729881615223
not chocolate: 0.8924802103779981 

first weighting: 1.4268248347297665 1.3227090457891855 1.6139286749427524
first sigmoid: 0.8064061060945483 0.7896320680539253 0.8339561170438743
second weighting: 1.9654203261812784 2.0441217471927002
not chocolate: 0.8812353324738278 

first weighting: 1.3125616892583851 1.2505842107514018 1.469956646174628
first sigmoid: 0.7879415034650841 0.7774009744689839 0.8130507963746639
second weighting: 1.980666701758886 2.046179350110248
not chocolate: 0.882156604316193 

first weighting: 1.2729057407153053 1.155910009506671 1.322006322296834
first sigmoid: 0.7812397