
#1.

In most real-world scenarios, data contain outliers. A support vector machine can handle outliers with a soft margin, which is specified by the slightly different optimization problem shown in Equation 7.38 in the text. The resulting model is called a soft-margin SVM.

Intuitively, where does a data point lie relative to the margin when $\zeta_i = 0$? Is this data point classified correctly?

Intuitively, where does a data point lie relative to the margin when $0 < \zeta_i \leq 1$? Is this data point classified correctly?

Intuitively, where does a data point lie relative to the margin when $\zeta_i > 1$? Is this data point classified correctly?

---

Answer: This answer focuses on positive examples (the same argument applies to negative examples where $y_i = -1$). Here, the slack $\zeta_i$ for example $i$ represents the distance from the example to the positive (upper-right) margin.

![](https://drive.google.com/uc?id=1YtpoLHCCxWlL7BpUR13Oc0R0ZmZdcmlv)

For $\zeta_i < 0$, the example is well outside the margin and correctly classified as a positive example. The SVM constrains all $\zeta_i \geq 0$, so this $\zeta_i$ would be set to zero in the minimization equation (7.38).

For $\zeta_i = 0$, the example is on the margin and correctly classified as a positive example. 

For $0 < \zeta_i < 1$, the example is inside the margin, but still correctly classified as a positive example.

For $\zeta_i = 1$, the example is right on the hyperplane. Typically points on the hyperplane are classified as positive, but it could go either way.

For $\zeta_i > 1$, the example is on the wrong side of the hyperplane, so it will be incorrectly classified as a negative example.
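
The cases above can be checked numerically. The sketch below assumes a linear SVM with the usual hinge-form slack, $\zeta_i = \max(0, 1 - y_i(w \cdot x_i + b))$. The weight vector `w`, bias `b`, and sample points are made up for illustration and do not come from the figure.

```python
import numpy as np

def slack(w, b, x, y):
    """Hinge-form slack for one example: max(0, 1 - y * (w.x + b))."""
    return float(max(0.0, 1.0 - y * (np.dot(w, x) + b)))

def describe(zeta):
    """Map a slack value onto the cases discussed above."""
    if zeta == 0.0:
        return "on or outside the margin, correctly classified"
    if zeta < 1.0:
        return "inside the margin, correctly classified"
    if zeta == 1.0:
        return "on the hyperplane"
    return "wrong side of the hyperplane, misclassified"

# Hypothetical separator and positive examples (y = +1), for illustration only.
w = np.array([1.0, 0.0])
b = 0.0
points = [np.array([2.0, 0.0]), np.array([0.5, 0.0]),
          np.array([0.0, 0.0]), np.array([-1.0, 0.0])]

slacks = [slack(w, b, x, 1) for x in points]
for x, z in zip(points, slacks):
    print(x, z, describe(z))
```

The first point would have a negative value before clipping, which is why its slack comes out as zero.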

---

#2.

Suppose the two-layer neural network shown below processes the input (0, 1, 1, 0). If the actual output should be 0.2, show step-by-step how the vector of weights *v* will be updated using backpropagation and $\eta = 0.2$.

![](https://drive.google.com/uc?id=1mLkFgXA0drWp6nYL50n0BZv2Z13EA9CN)

---

Answer: The activation of each hidden unit is calculated using the formula $a_i = w_i \cdot x$, and its output is $h_i = \tanh(a_i)$. The output of hidden units 1, 2, and 3 (numbered top to bottom) is therefore $\tanh(1.0 \times 0 + 1.0 \times 1 + 1.0 \times 1 + 1.0 \times 0) = \tanh(2.0) = 0.964$.

The output of the network is calculated next using the formula $v \cdot h$, which equals $-0.4 \times 0.964 + 1.0 \times 0.964 + 0.2 \times 0.964 = 0.7712$. The error is $e = 0.2 - 0.7712 = -0.5712$. The vector of weights *v* is updated using the formula $v = v - \eta g$, where the gradient is accumulated as $g = g - eh$ starting from $g = 0$. Because all three hidden outputs are equal, every component of $g$ for the output layer is $0.0 + 0.5712 \times 0.964 = 0.5506$. This gives $v_1 = -0.4 - 0.2 \times 0.5506 = -0.5101$, $v_2 = 1.0 - 0.2 \times 0.5506 = 0.8899$, and $v_3 = 0.2 - 0.2 \times 0.5506 = 0.0899$.

To completely update the network we would next need to update the vector of weights $W$ using information that includes the error of the network, the previous $W$, the learning rate, and the output of the hidden node to which the edge points.
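
The arithmetic for the update of *v* can be reproduced with a few lines of NumPy. This is a sketch under the assumptions used in the answer: all input-to-hidden weights are 1.0, the hidden units use tanh, and the output node is linear. The variable names are chosen here for illustration.

```python
import numpy as np

x = np.array([0.0, 1.0, 1.0, 0.0])   # network input
W = np.ones((3, 4))                   # input-to-hidden weights (assumed all 1.0)
v = np.array([-0.4, 1.0, 0.2])        # hidden-to-output weights
target = 0.2                          # desired output
eta = 0.2                             # learning rate

h = np.tanh(W @ x)                    # hidden unit outputs
y_hat = v @ h                         # linear output node
e = target - y_hat                    # error
g = -e * h                            # gradient for v, i.e. g = 0 - e * h
v_new = v - eta * g                   # updated output weights

print(np.round(h, 3), round(float(y_hat), 4), np.round(v_new, 4))
```

Small differences in the last digit relative to the hand calculation can appear because the code does not round intermediate values.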

---

#3. 

Under which of these conditions does an ensemble classifier perform best? There can be more than one right answer. Explain all of your responses.

- Low prediction correlation between base classifiers.
- High prediction correlation between base classifiers.
- Base classifiers have low variance.
- Base classifiers have high bias.
- Base classifiers have high variance.

---

Answer: A lower correlation among base classifiers will increase the error-correcting capability of the ensemble, so this situation is preferred.

Weak learners are typically used in ensemble methods. They have low variance and do not typically overfit the training data. Their bias, however, is typically high rather than low; combining many of them (as in boosting) is what reduces it.
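
The effect of correlation on error correction can be illustrated with a small simulation. The sketch below is a toy model, not an experiment on real classifiers: it assumes 11 base classifiers that are each correct with probability 0.7 and compares majority voting when their hits are independent against the extreme case where they all make the same prediction.

```python
import numpy as np

def majority_vote_accuracy(correct):
    """correct: (n_trials, n_classifiers) boolean array of per-classifier hits."""
    votes = correct.sum(axis=1)
    return float((votes > correct.shape[1] / 2).mean())

rng = np.random.default_rng(0)
n_trials, n_clf, p = 10000, 11, 0.7   # illustrative values, chosen arbitrarily

# Uncorrelated case: each base classifier is right independently with probability p.
independent = rng.random((n_trials, n_clf)) < p

# Perfectly correlated case: all base classifiers share the same hit or miss.
shared = rng.random((n_trials, 1)) < p
correlated = np.repeat(shared, n_clf, axis=1)

acc_independent = majority_vote_accuracy(independent)
acc_correlated = majority_vote_accuracy(correlated)
print(acc_independent, acc_correlated)
```

With perfectly correlated members the vote can never do better than a single classifier, while independent members let the majority outvote individual mistakes.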

---

#4.

The goal of this problem is for you to implement backpropagation from scratch. You can make use of Python libraries for handling the data and computation, but implement the actual activation and weight change calculations yourself.

Test your neural network using the MNIST dataset. Information on loading and storing this handwritten-digit dataset can be found at https://scikit-learn.org/stable/auto_examples/linear_model/plot_sparse_logistic_regression_mnist.html. Only consider digit classes '0' (which you can map onto value -1) and '1' (which you can map onto value 1). Train the network on a randomly-selected 2/3 of the data points and test on the remaining 1/3. You can report mean squared error or accuracy for the test data for a minimum of 10 epochs.

Below this code I provide a separate backprop implementation, slightly simpler with just one output node, to classify breast cancer data. The second set of code more closely mirrors the Daumé book.


In [None]:
from math import exp
from random import random
from sklearn.datasets import fetch_openml
from sklearn.model_selection import train_test_split
from sklearn.datasets import load_breast_cancer
import numpy as np


def initialize_network(n_inputs, n_hidden, n_outputs):
  network = list()  # initialize weights to random number in [0..1]
  hidden_layer = [{'weights':[random() for i in range(n_inputs+1)]} for i in range(n_hidden)]
  network.append(hidden_layer)
  output_layer = [{'weights':[random() for i in range(n_hidden+1)]} for i in range(n_outputs)]
  network.append(output_layer)
  return network


def activate(weights, inputs):
  activation = weights[-1]   # bias
  for i in range(len(weights)-1):
    activation += weights[i] * inputs[i]
  return activation


def transfer(activation): # sigmoid function
  return 1.0 / (1.0 + exp(-activation))

"""
def transfer(activation): # tanh function
  return np.tanh(activation)
"""


def forward_propagate(network, X, y):
  inputs = X
  for layer in network:
    new_inputs = []
    for node in layer:
      activation = activate(node['weights'], inputs)  # use the previous layer's outputs, not the raw X
      node['output'] = transfer(activation)
      new_inputs.append(node['output']) # output of one node input to another
    inputs = new_inputs
  return inputs   # return output from last layer


def transfer_derivative(output): # derivative of sigmoid function
  return output * (1.0 - output)

"""
def transfer_derivative(output): # derivative of tanh, expressed via the node output
  return 1.0 - output**2
"""


def backward_propagate_error(network, expected):
  for i in reversed(range(len(network))): # from output back to input layers
    layer = network[i]
    errors = list()
    if i != len(network)-1:  # not the output layer
      for j in range(len(layer)):
        error = 0.0
        for node in network[i+1]:
          error += (node['weights'][j] * node['delta'])
        errors.append(error)
    else:   # output layer
      for j in range(len(layer)):
        node = layer[j]
        errors.append(expected[j] - node['output'])
    for j in range(len(layer)):
      node = layer[j]  # use this node's own output, not a stale loop variable
      node['delta'] = errors[j] * transfer_derivative(node['output'])
  return network


def update_weights(network, x, eta):
  for i in range(len(network)):
    inputs = x
    if i != 0:
      inputs = [node['output'] for node in network[i-1]]
    for n in range(len(network[i])):
      node = network[i][n]
      for j in range(len(inputs)):
        network[i][n]['weights'][j] += eta * node['delta'] * inputs[j]
      network[i][n]['weights'][-1] += eta * node['delta']
  return network


def train_network(network, X, y, eta, num_epochs, num_outputs):
  expected = np.full((2), 0)
  for epoch in range(num_epochs):
    sum_error = 0
    # There are two output nodes. The one corresponding to the correct label
    # should output 1, the other should output 0.
    for i in range(len(y)):
      outputs = forward_propagate(network, X[i], y[i])
      if y[i] == 0:
        expected[0] = 1
        expected[1] = 0
      else:
        expected[0] = 0
        expected[1] = 1
      sum_error += sum([(expected[k] - outputs[k])**2 for k in range(len(expected))])
      network = backward_propagate_error(network, expected)
      network = update_weights(network, X[i], eta)
    print('>epoch=%d, lrate=%.3f, error=%.3f' % (epoch, eta, sum_error))
  return network


def test_network(network, X, y, num_outputs):
  expected = np.full((2), 0)
  sum_error = 0
  # There are two output nodes. The one corresponding to the correct label
  # should output 1, the other should output 0.
  for i in range(len(y)):
    outputs = forward_propagate(network, X[i], y[i])
    if y[i] == 0:
      expected[0] = 1
      expected[1] = 0
    else:
      expected[0] = 0
      expected[1] = 1
    sum_error += sum([(expected[k] - outputs[k])**2 for k in range(len(expected))])
  print('mse of test data is', sum_error / float(len(y)))


def main_mnist():
  # Load data from https://www.openml.org/d/554
  # as_frame=False returns NumPy arrays so that features[i] indexes a row
  features, targets = fetch_openml('mnist_784', version=1, return_X_y=True, as_frame=False)
  X = []
  y = []
  for i in range(len(targets)):
    if targets[i] == '1' or targets[i] == '0':
      X.append(features[i])
      if targets[i] == '0':
        y.append(0)
      else:
        y.append(1)
  n_inputs = len(X[0])
  n_outputs = 2  # possible class values are '0' and '1'
  # Create a network with 1 hidden layer containing 2 nodes
  network = initialize_network(n_inputs, 2, n_outputs)
  X_train, X_test, y_train, y_test = train_test_split(X, y, train_size=0.67, test_size=0.33)
  # train network for 10 epochs using learning rate of 0.1 
  network = train_network(network, X_train, y_train, 0.1, 10, n_outputs)
  for layer in network:
    print('layer \n', layer)
  test_network(network, X_test, y_test, n_outputs)

def main_bc():
  X, y = load_breast_cancer(return_X_y=True)
  n_inputs = len(X[0])
  n_outputs = 2  # possible class values are '0' and '1'
  # Create a network with 1 hidden layer containing 2 nodes
  network = initialize_network(n_inputs, 2, n_outputs)
  X_train, X_test, y_train, y_test = train_test_split(X, y, train_size=0.67, test_size=0.33)
  # train network for 10 epochs using learning rate of 0.1
  network = train_network(network, X_train, y_train, 0.1, 10, n_outputs)
  for layer in network:
    print('layer \n', layer)
  test_network(network, X_test, y_test, n_outputs)

if __name__ == "__main__":
  #main_mnist()
  main_bc()


>epoch=0, lrate=0.100, error=380.953
>epoch=1, lrate=0.100, error=380.943
>epoch=2, lrate=0.100, error=380.927
>epoch=3, lrate=0.100, error=380.896
>epoch=4, lrate=0.100, error=380.837
>epoch=5, lrate=0.100, error=380.714
>epoch=6, lrate=0.100, error=380.436
>epoch=7, lrate=0.100, error=379.758
>epoch=8, lrate=0.100, error=377.935
>epoch=9, lrate=0.100, error=372.557
layer 
 [{'weights': [0.6839030751490057, 0.4041748063424126, 0.9799659374311156, 0.6212718178274422, 0.10697068445709385, 0.31492660451447835, 0.20796602164147868, 0.03563923837928916, 0.31862346701535127, 0.37057017938335296, 0.709151182215734, 0.25169059795853543, 0.2830923083183624, 0.019467026791268043, 0.383262415915383, 0.13567089574731434, 0.7018507676733173, 0.3762958668792869, 0.587046885143515, 0.5906844704242051, 0.07962836212649402, 0.3714021341288188, 0.8725636654877679, 0.752166600686019, 0.5308187935440984, 0.9938019318895014, 0.43206430139074653, 0.06747437522328514, 0.061426909342788626, 0.597383031433212

In [None]:
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn import preprocessing
import math

def compute_activation(weights, inputs):  # weighted sum of the inputs (no bias term)
  activation = 0
  for i in range(len(weights)):
    activation += weights[i] * inputs[i]
  return activation

def transfer(activation): # sigmoid function
  return 1.0 / (1.0 + np.exp(-activation))

def deriv_transfer(activation): # derivative of sigmoid with respect to its input
  s = transfer(activation)
  return s * (1.0 - s)

def forward_propagate(W, v, X):
  a = np.zeros((len(W)))  # hidden node activations (before the transfer function)
  h = np.zeros((len(W)))  # hidden node outputs (after the transfer function)
  for i in range(len(W)):  # compute hidden node activation
    a[i] = compute_activation(W[i], X)
    h[i] = transfer(a[i])
  y_predict = compute_activation(v, h)
  return a, h, y_predict

def train_network(W, v, X, y, eta, num_epochs):
  for epoch in range(num_epochs):
    sum_error = 0
    for d in range(len(y)):
      G = np.zeros((len(W), len(X[0])))
      g = np.zeros((len(W)))
      a, h, y_predict = forward_propagate(W, v, X[d])
      error = y[d] - y_predict
      sum_error = sum_error + (error * error)
      for i in range(len(W)):
        g[i] = g[i] - error * h[i]
      for i in range(len(W)):   # hidden
        for j in range(len(X[0])):   # input
          G[i][j] = G[i][j] - error * v[i] * deriv_transfer(a[i]) * X[d][j]
      for i in range(len(W)):
        v[i] = v[i] - eta * g[i]
      for i in range(len(W)):
        for j in range(len(W[0])):
          W[i][j] = W[i][j] - eta * G[i][j]
    print('epoch', epoch, 'error', sum_error / float(len(y)))
  return W, v

# This implementation is designed for one output node
def test_network(W, v, X, y):
  sum_error = 0
  for i in range(len(y)):
    _, _, y_predict = forward_propagate(W, v, X[i])
    sum_error += (y[i] - y_predict)**2
  print('mse =', sum_error / float(len(y)))

def main_bc():
  X, y = load_breast_cancer(return_X_y=True)
  X = preprocessing.normalize(X, axis=0)
  num_input = len(X[0])
  num_hidden = 2
  # Define a network by two layers of random weights in range [-0.5, 0.5)
  W = np.random.uniform(low=-0.5, high=0.5, size=(num_hidden, num_input))
  v = np.random.uniform(low=-0.5, high=0.5, size=(num_hidden))
  X_train, X_test, y_train, y_test = train_test_split(X, y, train_size=0.67, test_size=0.33)
  # train network for 50 epochs using learning rate of 0.01
  W, v = train_network(W, v, X_train, y_train, 0.01, 50)
  test_network(W, v, X_test, y_test)

if __name__ == "__main__":
  main_bc()
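
A common way to sanity-check hand-written gradients like the ones in `train_network` is to compare them against finite differences. The sketch below is a standalone illustration, not part of the solution above. It assumes the same architecture (sigmoid hidden units, one linear output node, no bias terms) and a half squared error loss $\frac{1}{2}(y - \hat{y})^2$, which is the loss whose gradient matches the update $g = g - eh$. The function names and random test values are made up for this example.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def loss(W, v, x, y):
    """Half squared error of a one-hidden-layer net with a linear output."""
    h = sigmoid(W @ x)
    return 0.5 * (y - v @ h) ** 2

def analytic_gradients(W, v, x, y):
    """Backprop gradients of the loss with respect to v and W."""
    h = sigmoid(W @ x)
    error = y - v @ h
    g = -error * h                                # dL/dv
    G = -error * np.outer(v * h * (1.0 - h), x)   # dL/dW
    return g, G

def numeric_gradient(f, theta, eps=1e-6):
    """Central finite differences of scalar f() with respect to array theta."""
    grad = np.zeros_like(theta)
    for idx in np.ndindex(theta.shape):
        old = theta[idx]
        theta[idx] = old + eps
        up = f()
        theta[idx] = old - eps
        down = f()
        theta[idx] = old                          # restore the original weight
        grad[idx] = (up - down) / (2.0 * eps)
    return grad

rng = np.random.default_rng(0)
W = rng.uniform(-0.5, 0.5, size=(2, 4))   # 2 hidden nodes, 4 inputs
v = rng.uniform(-0.5, 0.5, size=2)
x = rng.uniform(0.0, 1.0, size=4)
y = 1.0

g, G = analytic_gradients(W, v, x, y)
g_num = numeric_gradient(lambda: loss(W, v, x, y), v)
G_num = numeric_gradient(lambda: loss(W, v, x, y), W)
max_diff = max(np.abs(g - g_num).max(), np.abs(G - G_num).max())
print(max_diff)
```

If the analytic and numeric gradients disagree by more than a tiny amount, the usual suspect is the derivative of the transfer function being evaluated at the wrong quantity (activation versus output).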