
# **Homework Assignment #5**

Assigned: March 24, 2021

Due: April 12, 2021



---

This assignment consists of questions that require a short answer and one Python programming task. You can enter your answers and your code directly in a Colaboratory notebook and upload the **shareable** link for your notebook as your homework submission.

---

#1.

 (12 points) In most real-world scenarios, data contain outliers. When using a support vector machine, outliers can be handled with a soft margin, which is specified by the slightly different optimization problem shown in Equation 7.38 in the text. The resulting classifier is called a soft-margin SVM.

Intuitively, where does a data point lie relative to the margin when $\zeta_i = 0$? Is this data point classified correctly?

Intuitively, where does a data point lie relative to the margin when $0 < \zeta_i \leq 1$? Is this data point classified correctly?

Intuitively, where does a data point lie relative to the margin when $\zeta_i > 1$? Is this data point classified correctly?


Answer:

When $\zeta_i = 0$, the data point lies on the margin boundary or beyond it, on the correct side of the decision boundary. It needs no slack and is classified correctly.

When $0 < \zeta_i \leq 1$, the data point lies inside the margin but still on the correct side of the decision boundary. It violates the margin but is classified correctly. In the limiting case $\zeta_i = 1$ the point sits exactly on the decision boundary.

When $\zeta_i > 1$, the data point lies on the wrong side of the decision boundary and is classified incorrectly.
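
The three cases can be checked numerically. The sketch below assumes the usual soft-margin definition of slack, $\zeta_i = \max(0, 1 - y_i(w \cdot x_i + b))$ with labels in $\{-1, +1\}$. The separator `w`, `b` and the three points are hypothetical values chosen only for illustration, and the helper names `slack` and `describe` are not from the text.

```python
def slack(w, b, x, y):
    """Slack zeta = max(0, 1 - y * (w . x + b)) for a label y in {-1, +1}."""
    score = sum(wi * xi for wi, xi in zip(w, x)) + b
    return max(0.0, 1.0 - y * score)


def describe(zeta):
    """Map a slack value to the region it corresponds to."""
    if zeta == 0:
        return "on or outside the margin, classified correctly"
    if zeta < 1:
        return "inside the margin, classified correctly"
    if zeta == 1:
        return "exactly on the decision boundary"
    return "wrong side of the decision boundary, misclassified"


# Hypothetical separator: w = (1, 0), b = 0, so the decision boundary is
# x1 = 0 and the margin edges are x1 = -1 and x1 = +1.
w, b = (1.0, 0.0), 0.0
points = [((2.0, 0.0), 1), ((0.5, 0.0), 1), ((-0.5, 0.0), 1)]
slacks = [slack(w, b, x, y) for x, y in points]
for (x, y), zeta in zip(points, slacks):
    print(x, y, zeta, describe(zeta))
```

All three points carry the label +1, so moving left along the first coordinate walks through the three regions in order.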

---

#2.

(12 points) Suppose the two-layer neural network shown below processes the input (0, 1, 1, 0). If the actual output should be 0.2, show step-by-step how the vector of weights *v* will be updated using backpropagation and $\eta = 0.2$.

![](https://drive.google.com/uc?id=1mLkFgXA0drWp6nYL50n0BZv2Z13EA9CN)

Answer:
![](https://drive.google.com/uc?id=1YzEmQGrTPVcakPRG1X1Db2srdpEagQ3D)
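
As a rough cross-check of the hand calculation, the output-layer update can be sketched in code. This assumes a single sigmoid output unit, a squared-error loss, and no bias term. The hidden-unit outputs `h` and the weights `v` below are hypothetical placeholders, not the values from the figure. Only the target 0.2 and $\eta = 0.2$ come from the question.

```python
from math import exp


def sigmoid(a):
    return 1.0 / (1.0 + exp(-a))


def update_output_weights(v, h, target, eta):
    """One gradient step on squared error for a single sigmoid output unit."""
    output = sigmoid(sum(vj * hj for vj, hj in zip(v, h)))
    # error term: (target - output) times the sigmoid derivative
    delta = (target - output) * output * (1.0 - output)
    new_v = [vj + eta * delta * hj for vj, hj in zip(v, h)]
    return output, delta, new_v


# Hypothetical values: these are NOT the weights from the figure.
h = [0.5, 0.5]    # assumed hidden-unit outputs for the input (0, 1, 1, 0)
v = [1.0, -1.0]   # assumed hidden-to-output weights
output, delta, new_v = update_output_weights(v, h, target=0.2, eta=0.2)
print(output, delta, new_v)
```

Substituting the hidden outputs and weights read off the figure for `h` and `v` gives the update for the actual network, provided the figure's output unit is also a sigmoid.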

---


#3. 

(8 points) Under which of these conditions does an ensemble classifier perform best? There can be more than one right answer; explain all of your responses.

- Low prediction correlation between base classifiers.
- High prediction correlation between base classifiers.
- Base classifiers have low variance.
- Base classifiers have high bias.
- Base classifiers have high variance.


Answer:

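An ensemble classifier performs best when the predictions of the base classifiers have low correlation, because uncorrelated errors tend to cancel in the vote, which increases the error-correcting capability of the ensemble. High correlation gives little benefit, since the base classifiers make the same mistakes together. Base classifiers with high variance are good candidates because averaging many of them, as in bagging, reduces variance, whereas base classifiers that already have low variance leave little for averaging to improve. Base classifiers with high bias, such as the weak learners typically used in boosting, can also work well because boosting combines them sequentially to reduce bias. Simple averaging alone does not remove bias.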
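
The effect of correlation can be illustrated with a small simulation. This is a simplified sketch, not a real ensemble. It assumes 11 simulated base classifiers that are each correct with probability 0.7, and compares a majority vote when their outcomes are independent against the extreme case where they are perfectly correlated. The function name and all parameter values are illustrative choices.

```python
import random


def majority_vote_accuracy(n_classifiers, p_correct, correlated, trials, rng):
    """Estimate the accuracy of a majority vote over simulated classifiers."""
    wins = 0
    for _ in range(trials):
        if correlated:
            # Perfectly correlated: every classifier repeats one shared outcome.
            votes = [rng.random() < p_correct] * n_classifiers
        else:
            # Independent: each classifier is right or wrong on its own.
            votes = [rng.random() < p_correct for _ in range(n_classifiers)]
        if sum(votes) > n_classifiers / 2:
            wins += 1
    return wins / trials


rng = random.Random(0)
independent = majority_vote_accuracy(11, 0.7, False, 5000, rng)
correlated = majority_vote_accuracy(11, 0.7, True, 5000, rng)
print(independent, correlated)
```

With independent errors the vote is correct far more often than any single classifier, while the perfectly correlated ensemble is no better than one of its members.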

---

#4.

(80 points) The goal of this problem is for you to implement backpropagation from scratch. You can make use of Python libraries for handling the data and computation, but implement the actual activation and weight change calculations yourself.

Test your neural network using the MNIST dataset. Information on loading and storing this handwritten-digit dataset can be found at https://scikit-learn.org/stable/auto_examples/linear_model/plot_sparse_logistic_regression_mnist.html. Only consider digit classes '0' (which you can map onto value -1) and '1' (which you can map onto value 1). Train the network on a randomly-selected 2/3 of the data points and test on the remaining 1/3. You can report mean squared error or accuracy for the test data for a minimum of 10 epochs.
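
The data preparation described above can be sketched without any library support. The helper `prepare_binary_split` is a hypothetical name, and the tiny `features` and `targets` lists are stand-in data rather than MNIST. The sketch keeps only the '0' and '1' classes, maps them to -1 and +1, and takes a random 2/3 of the remaining points for training.

```python
import random


def prepare_binary_split(features, targets, train_fraction=2/3, seed=0):
    """Keep digits '0' and '1', map labels to -1/+1, and split at random."""
    label_map = {'0': -1, '1': 1}
    pairs = [(x, label_map[t]) for x, t in zip(features, targets) if t in label_map]
    random.Random(seed).shuffle(pairs)
    n_train = round(len(pairs) * train_fraction)
    return pairs[:n_train], pairs[n_train:]


# Tiny stand-in data (hypothetical): each "image" is a two-value list.
features = [[i, i + 1] for i in range(9)]
targets = ['0', '1', '7', '1', '0', '3', '1', '0', '1']
train, test = prepare_binary_split(features, targets)
print(len(train), len(test))
```

A sigmoid output lies in (0, 1), so -1/+1 targets would need an output unit with a matching range. Encoding the two classes as two 0/1 output nodes is an alternative that works with a sigmoid.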


In [None]:
from math import exp
from random import random
from sklearn.datasets import fetch_openml
from sklearn.model_selection import train_test_split
import numpy as np

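def initialize_network(n_inputs, n_hidden, n_outputs):
  # Each node stores one weight per input plus a trailing bias weight.
  # Small zero-centered values in [-0.05, 0.05) keep the sigmoid units out of
  # saturation at the start of training.
  network = list()
  hidden_layer = [{'weights':[(random() - 0.5) * 0.1 for _ in range(n_inputs+1)]} for _ in range(n_hidden)]
  network.append(hidden_layer)
  output_layer = [{'weights':[(random() - 0.5) * 0.1 for _ in range(n_hidden+1)]} for _ in range(n_outputs)]
  network.append(output_layer)
  return network


def activate(weights, inputs):
  # weighted sum of the inputs plus the bias (stored as the last weight)
  activation = weights[-1]
  for i in range(len(weights)-1):
    activation += weights[i] * inputs[i]
  return activation


def transfer(activation): # sigmoid function
  # Branch on the sign so exp() is never called on a large positive argument.
  if activation >= 0:
    return 1.0 / (1.0 + exp(-activation))
  z = exp(activation)
  return z / (1.0 + z)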


def forward_propagate(network, X, y):
  # y is unused; it is kept so existing callers do not need to change
  inputs = X
  for layer in network:
    new_inputs = []
    for node in layer:
      # feed this layer the previous layer's outputs, not the raw input X
      activation = activate(node['weights'], inputs)
      node['output'] = transfer(activation)
      new_inputs.append(node['output']) # output of one node input to another
    inputs = new_inputs
  return inputs   # return output from last layer

  
def transfer_derivative(output): # derivative of sigmoid function
  return output * (1.0 - output)


def backward_propagate_error(network, expected):
  for i in reversed(range(len(network))): # from output back to input layers
    layer = network[i]
    errors = list()
    if i != len(network)-1:  # not the output layer
      for j in range(len(layer)):
        error = 0.0
        for node in network[i+1]:
          error += (node['weights'][j] * node['delta'])
        errors.append(error)
    else:   # output layer
      for j in range(len(layer)):
        node = layer[j]
        errors.append(expected[j] - node['output'])
    for j in range(len(layer)):
      node = layer[j]
      node['delta'] = errors[j] * transfer_derivative(node['output'])


def update_weights(network, x, y, eta):
  for i in range(len(network)):
    inputs = x
    if i != 0:
      inputs = [node['output'] for node in network[i-1]]
    for node in network[i]:
      for j in range(len(inputs)):
        node['weights'][j] += eta * node['delta'] * inputs[j]
      node['weights'][-1] += eta * node['delta']


def train_network(network, X, y, eta, num_epochs, num_outputs):
  for epoch in range(num_epochs):
    sum_error = 0
    for i in range(len(y)):
      outputs = forward_propagate(network, X[i], y[i])
      # One-hot target: the output node whose index matches the label should
      # output 1 and every other node should output 0.
      expected = [1 if k == y[i] else 0 for k in range(num_outputs)]
      sum_error += sum([(expected[k] - outputs[k])**2 for k in range(num_outputs)])
      backward_propagate_error(network, expected)
      update_weights(network, X[i], y[i], eta)
    print('>epoch=%d, lrate=%.3f, error=%.3f' % (epoch, eta, sum_error))


def test_network(network, X, y, num_outputs):
  sum_error = 0
  for i in range(len(y)):
    outputs = forward_propagate(network, X[i], y[i])
    # One-hot target: the output node whose index matches the label should
    # output 1 and every other node should output 0.
    expected = [1 if k == y[i] else 0 for k in range(num_outputs)]
    sum_error += sum([(expected[k] - outputs[k])**2 for k in range(num_outputs)])
  print('mse of test data is', sum_error / float(len(y)))


if __name__ == "__main__":
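  # Load data from https://www.openml.org/d/554
  # as_frame=False returns NumPy arrays so rows can be indexed as features[i]
  features, targets = fetch_openml('mnist_784', version=1, return_X_y=True, as_frame=False)
  X = []
  y = []
  for i in range(len(targets)):
    if targets[i] == '1' or targets[i] == '0':
      # scale pixel values from [0, 255] to [0, 1] to avoid saturating the sigmoid
      X.append(features[i] / 255.0)
      if targets[i] == '0':
        y.append(0)
      else:
        y.append(1)
  n_inputs = len(X[0])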
  n_outputs = 2  # possible class values are '0' and '1'
  # Create a network with 1 hidden layer containing 2 nodes
  network = initialize_network(n_inputs, 2, n_outputs)
  X_train, X_test, y_train, y_test = train_test_split(X, y, train_size=0.67, test_size=0.33)
  # train network for 10 epochs using learning rate of 0.1 
  train_network(network, X_train, y_train, 0.1, 10, n_outputs)
  for layer in network:
    print('layer \n', layer)
  test_network(network, X_test, y_test, n_outputs)

>epoch=0, lrate=0.100, error=4938.336
>epoch=1, lrate=0.100, error=4932.779
>epoch=2, lrate=0.100, error=4932.779
>epoch=3, lrate=0.100, error=4932.779
>epoch=4, lrate=0.100, error=4932.779
>epoch=5, lrate=0.100, error=4932.779
>epoch=6, lrate=0.100, error=4932.779
>epoch=7, lrate=0.100, error=4932.779
>epoch=8, lrate=0.100, error=4932.779
>epoch=9, lrate=0.100, error=4932.779
layer 
 [{'weights': [0.15315925438105815, 0.22121148977270944, 0.9230053149417814, 0.8841761509349702, 0.22544996190714628, 0.3213160634263741, 0.5709395849605104, 0.22949009122542618, 0.9764813265925268, 0.3623313423181016, 0.4980515574181761, 0.2073275160372403, 0.4546156471676992, 0.03807880062892677, 0.24086225454323074, 0.9750374957815325, 0.03824921695479422, 0.5585564502795652, 0.8205124093091326, 0.4294469364546788, 0.44755250034443195, 0.28754041343228254, 0.8168983185738857, 0.36183146261044674, 0.9873338755099496, 0.2038514250388539, 0.569750572558605, 0.779634269779047, 0.0364197319301377, 0.98871831