# Machine learning project  

## Introduction

This project implements the backpropagation algorithm for neural networks and applies it to the task of [`micro_PCB`](https://www.kaggle.com/frettapper/micropcb-images) recognition. It was made for the [`Machine learning`](https://uhintra03.uhasselt.be/studiegidswww/opleidingsonderdeel.aspx?a=2021&i=4483&n=4&t=04) course taught in the joint programme of KU Leuven and UHasselt. The authors of this project are Molenaers Arno and Purnal Lennert.


The following libraries are imported for this project:
- [`os`](https://docs.python.org/3/library/os.html) for manipulating directory paths.
- [`numpy`](http://www.numpy.org/) for all array and matrix operations.
- [`matplotlib`](https://matplotlib.org/) for plotting.
- [`scipy`](https://docs.scipy.org/doc/scipy/reference/) for scientific and numerical computation functions and tools.
- [`csv`](https://docs.python.org/3/library/csv.html) for importing the image data.
- [`utils`]() for the utilities from exercise 4 of the Coursera machine learning course.
- [`math`](https://docs.python.org/3/library/math.html) for functions such as the square root.

In [3]:
# used for manipulating directory paths
import os

# Scientific and vector computation for python
import numpy as np

# math library 
import math

# Plotting library
from matplotlib import pyplot

# Optimization module in scipy
from scipy import optimize

# Used for importing csv data
import csv

# utilities library from the exercises of the machine learning course
import utils

# tells matplotlib to embed plots within the notebook
%matplotlib inline

## Importing the data
The training data is imported from a csv file. The X matrix contains the input features as a `6500x7500` matrix. y contains the labels. For the neural network, each label is encoded as a 13-dimensional vector with a 1 at the position of the correct label and 0 everywhere else, which makes y a `6500x13` matrix.
> Do not forget to specify the correct path to the csv file.

In [4]:
X = np.zeros((6500,7500))
y = np.zeros((6500,13))
alpha = ['A','B','C','D','E','F','G','H','I','J','K','L','M']

with open('../Channeldata.csv', newline='') as csvfile:
    reader = csv.reader(csvfile, delimiter=',')
    i = 0
    for row in reader:
        X[i,:] = row[1:]
        # find the position of the label in the alphabet
        j = alpha.index(row[0][0])
        # set the label to 1 at that position
        y[i,j] = 1
        i += 1
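
The label encoding described above can also be written without an explicit loop over the matrix entries. The following is a minimal sketch, not the authors' import code: the `labels` list is made up for illustration, whereas in the notebook the labels come from the first column of the csv file.

```python
import numpy as np

alpha = ['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J', 'K', 'L', 'M']

# Hypothetical labels for illustration only
labels = ['A', 'C', 'M', 'C']

# Position of each label in the alphabet
idx = np.array([alpha.index(label) for label in labels])

# Row i of the identity matrix is the one-hot vector for class i
y_demo = np.eye(len(alpha))[idx]

print(y_demo.shape)                    # (4, 13)
print(y_demo.argmax(axis=1).tolist())  # [0, 2, 12, 2]
```

Each row contains exactly one 1, and `argmax` along the rows recovers the original class index.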

## Cost function
The next blocks define the cost function for this dataset. It is the regularized cost function for neural networks:

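$$ J(\theta) = \frac{1}{m} \sum_{i=1}^{m}\sum_{k=1}^{K} \left[ - y_k^{(i)} \log \left( \left( h_\theta \left( x^{(i)} \right) \right)_k \right) - \left( 1 - y_k^{(i)} \right) \log \left( 1 - \left( h_\theta \left( x^{(i)} \right) \right)_k \right) \right] + \frac{\lambda}{2 m} \left[ \sum_{j=1}^{s_2} \sum_{k=1}^{s_1} \left( \Theta_{j,k}^{(1)} \right)^2 + \sum_{j=1}^{s_3} \sum_{k=1}^{s_2} \left( \Theta_{j,k}^{(2)} \right)^2 \right] $$

Here $m$ is the number of training examples, $K$ is the number of labels, and $s_l$ is the number of units in layer $l$ without the bias unit. The sums in the regularization term start at $k=1$, so the bias weights (column $k=0$ of each $\Theta$) are not regularized.

The gradient of the sigmoid function is needed for backpropagation, so it is defined first.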

In [5]:
def sigmoidGradient(z):
    """
    Computes the gradient of the sigmoid function evaluated at z. 
    This should work regardless if z is a matrix or a vector. 
    In particular, if z is a vector or matrix, you should return
    the gradient for each element.
    
    Parameters
    ----------
    z : array_like
        A vector or matrix as input to the sigmoid function. 
    
    Returns
    --------
    g : array_like
        Gradient of the sigmoid function. Has the same shape as z. 
    
    
    """

    # Evaluate the sigmoid once and reuse it: g'(z) = g(z) * (1 - g(z))
    s = utils.sigmoid(z)
    g = s * (1 - s)

    return g
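
A quick way to gain confidence in the sigmoid gradient is to compare it with a finite-difference approximation. The sketch below is self-contained and therefore defines its own `sigmoid`, on the assumption that `utils.sigmoid` computes `1 / (1 + exp(-z))`. The names `sigmoid_gradient`, `analytic` and `numeric` are chosen for this example only.

```python
import numpy as np

def sigmoid(z):
    # Assumed to match utils.sigmoid
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_gradient(z):
    s = sigmoid(z)
    return s * (1 - s)

z = np.array([-1.0, 0.0, 1.0])
analytic = sigmoid_gradient(z)

# Central finite difference as an independent estimate of the derivative
h = 1e-5
numeric = (sigmoid(z + h) - sigmoid(z - h)) / (2 * h)

print(analytic[1])                         # 0.25, the maximum of the gradient
print(np.max(np.abs(analytic - numeric)))  # should be very small
```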

In [6]:
def nnCostFunction(nn_params,
                   input_layer_size,
                   hidden_layer_size,
                   num_labels,
                   X, y, lambda_=0.0):
    """
    Implements the neural network cost function and gradient for a two layer neural 
    network which performs classification. 
    
    Parameters
    ----------
    nn_params : array_like
        The parameters for the neural network which are "unrolled" into 
        a vector. This needs to be converted back into the weight matrices Theta1
        and Theta2.
    
    input_layer_size : int
        Number of features for the input layer. 
    
    hidden_layer_size : int
        Number of hidden units in the second layer.
    
    num_labels : int
        Total number of labels, or equivalently number of units in output layer. 
    
    X : array_like
        Input dataset. A matrix of shape (m x input_layer_size).
    
    y : array_like
        Dataset labels. A vector of shape (m,num_labels).
    
    lambda_ : float, optional
        Regularization parameter.
 
    Returns
    -------
    J : float
        The computed value for the cost function at the current weight values.
    
    grad : array_like
        An "unrolled" vector of the partial derivatives of the concatenation of
        neural network weights Theta1 and Theta2.
    
    Note 
    ----
    The sigmoid function is implemented in the file `utils.py` from the
    machine learning course exercises.
    """
    # Reshape nn_params back into the parameters Theta1 and Theta2, the weight matrices
    # for our 2 layer neural network
    Theta1 = np.reshape(nn_params[:hidden_layer_size * (input_layer_size + 1)],
                        (hidden_layer_size, (input_layer_size + 1)))

    Theta2 = np.reshape(nn_params[(hidden_layer_size * (input_layer_size + 1)):],
                        (num_labels, (hidden_layer_size + 1)))

    # Setup some useful variables
    m = y.shape[0]
         
    # Initialize the return values
    J = 0
    Theta1_grad = np.zeros(Theta1.shape)
    Theta2_grad = np.zeros(Theta2.shape)

    # part 1: feedforward   
    # adding column of 1's to X (bias terms)
    a1 = np.concatenate([np.ones((m,1)),X], axis=1)
    
    z2 = np.dot(a1, Theta1.T)
    a2 = utils.sigmoid(z2)
    a2 = np.concatenate([np.ones((a2.shape[0], 1)), a2], axis=1)
    
    z3 = np.dot(a2, Theta2.T)
    h = utils.sigmoid(z3)
                         
    J = 1/m * np.sum(-y*np.log(h)-(1-y)*np.log(1-h))
    # add regularization
    J = J + (lambda_ / (2*m)) * (np.sum(Theta1[:, 1:]**2) +np.sum(Theta2[:, 1:]**2))
    
    # part 2: backpropagation
    delta3 = h - y
    delta2 = np.dot(delta3, Theta2)[:, 1:] * sigmoidGradient(z2)
    
    DELTA1 = np.dot(delta2.T, a1)
    DELTA2 = np.dot(delta3.T, a2)
    
    
    Theta1_grad = 1/m * DELTA1
    Theta1_grad[:, 1:] = Theta1_grad[:, 1:] + (lambda_ / m) * Theta1[:, 1:]
    Theta2_grad = 1/m * DELTA2
    Theta2_grad[:, 1:] = Theta2_grad[:, 1:] + (lambda_ / m) * Theta2[:, 1:]
    
    # Unroll gradients
    grad = np.concatenate([Theta1_grad.ravel(), Theta2_grad.ravel()])

    return J, grad

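## Initializing the theta parameters
The weights must be initialized randomly to break symmetry. If all weights started at the same value, every hidden unit would compute the same output and receive the same gradient update.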

In [7]:
def randInitializeWeights(L_in, L_out, epsilon_init=0.12):
    """
    Randomly initialize the weights of a layer in a neural network.
    
    Parameters
    ----------
    L_in : int
        Number of incoming connections.
    
    L_out : int
        Number of outgoing connections. 
    
    epsilon_init : float, optional
        Range of values which the weight can take from a uniform 
        distribution.
    
    Returns
    -------
    W : array_like
        The weights initialized to random values. Note that W should
        be set to a matrix of size(L_out, 1 + L_in) as
        the first column of W handles the "bias" terms.
    """

    # Draw each weight uniformly from [-epsilon_init, epsilon_init)
    W = np.random.rand(L_out, 1 + L_in) * 2 * epsilon_init - epsilon_init

    return W
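
The effect of the scaling can be checked on a small layer. This is a sketch with made-up layer sizes (`L_in = 4`, `L_out = 3`). The function is repeated under the hypothetical name `rand_initialize_weights` so that the block runs on its own.

```python
import math
import numpy as np

def rand_initialize_weights(L_in, L_out, epsilon_init):
    # np.random.rand draws from [0, 1); scale and shift to [-eps, eps)
    return np.random.rand(L_out, 1 + L_in) * 2 * epsilon_init - epsilon_init

# Hypothetical layer sizes, far smaller than the ones used in this project
L_in, L_out = 4, 3
eps = math.sqrt(6) / math.sqrt(L_in + L_out)

W = rand_initialize_weights(L_in, L_out, eps)

print(W.shape)                            # (3, 5): one extra column for the bias
print(W.min() >= -eps, W.max() <= eps)    # True True
```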

The function above takes the layer sizes as arguments. These are set below, and the function is then called once for each layer. The resulting weight matrices are unrolled and concatenated into a single parameter vector.
The value of $\epsilon_{init}$ for each layer is computed with the following formula:

$$\epsilon_{init} = \frac{\sqrt{6}}{\sqrt{L_{in} + L_{out}}}$$

In [8]:
input_layer_size = X.shape[1]
hidden_layer_size = 7500
num_labels = y.shape[1]

eps1 = math.sqrt(6)/math.sqrt(input_layer_size + hidden_layer_size)
eps2 = math.sqrt(6)/math.sqrt(hidden_layer_size + num_labels)
print('epsilon init 1 = ' + str(eps1))
print('epsilon init 2 = ' + str(eps2))

initial_Theta1 = randInitializeWeights(input_layer_size, hidden_layer_size, eps1)
initial_Theta2 = randInitializeWeights(hidden_layer_size, num_labels, eps2)

initial_nn_params = np.concatenate([initial_Theta1.ravel(), initial_Theta2.ravel()], axis=0)

epsilon init 1 = 0.019999999999999997
epsilon init 2 = 0.02825979003336604
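
With a hidden layer of 7500 units, the unrolled parameter vector becomes large. The short calculation below counts the weights for the layer sizes used above and estimates the memory for one copy of the vector, assuming 8-byte `float64` values (NumPy's default floating-point type).

```python
input_layer_size = 7500
hidden_layer_size = 7500
num_labels = 13

# Each weight matrix has one extra column for the bias term
n_theta1 = hidden_layer_size * (input_layer_size + 1)   # 56,257,500
n_theta2 = num_labels * (hidden_layer_size + 1)         # 97,513
n_params = n_theta1 + n_theta2                          # 56,355,013

# Assumption: 8 bytes per value (float64)
megabytes = n_params * 8 / 1e6

print(n_params)    # 56355013
print(megabytes)   # about 450.8 MB per copy
```

The gradient vector has the same size, so every call to the cost function works with several arrays of this magnitude.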


## Learning the parameters with `scipy.optimize.minimize`
Next, we use `scipy.optimize.minimize` to minimize the cost function, starting from the randomly initialized parameters.

In [9]:
# we initialize the minimize function options
options = {'maxiter':100}

#choose a lambda
lambda_ = 1

#creating a lambda function for the cost function
costFunction = lambda p: nnCostFunction(p, input_layer_size,
                                        hidden_layer_size,
                                        num_labels, X, y, lambda_)

# execute the optimization function
res = optimize.minimize(costFunction,
                        initial_nn_params,
                        jac=True,
                        method='TNC',
                        options=options)

# get the solution parameters
nn_params = res.x

# reshape nn_params to retrieve the separate theta matrices
Theta1 = np.reshape(nn_params[:hidden_layer_size * (input_layer_size + 1)],
                    (hidden_layer_size, (input_layer_size + 1)))

Theta2 = np.reshape(nn_params[(hidden_layer_size * (input_layer_size + 1)):],
                    (num_labels, (hidden_layer_size + 1)))
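
The calling convention used here can be tried out on a toy problem. With `jac=True`, `optimize.minimize` expects the objective to return a tuple of the cost and its gradient, which is the shape of `nnCostFunction`'s return value. The quadratic below, its `target` vector and the name `cost_and_grad` are made up for this sketch, and the example limits the work with TNC's `maxfun` option rather than `maxiter`.

```python
import numpy as np
from scipy import optimize

# Hypothetical minimum of the toy cost function
target = np.array([1.0, -2.0, 0.5])

def cost_and_grad(p):
    # Returns (cost, gradient) in one call, as required by jac=True
    diff = p - target
    return np.sum(diff ** 2), 2 * diff

res_demo = optimize.minimize(cost_and_grad,
                             np.zeros(3),
                             jac=True,
                             method='TNC',
                             options={'maxfun': 100})

print(res_demo.x)    # close to target
print(res_demo.fun)  # close to 0
```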



## Checking the cost function and the gradients
Next, we evaluate the cost function at the learned parameters.

In [10]:
lambda_ = 1
J, grads = nnCostFunction(nn_params, input_layer_size, hidden_layer_size,
                      num_labels, X, y, lambda_)

print('Cost at parameters: %.6f' % J)

Cost at parameters: 1.946889
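
The gradients from backpropagation can be checked against finite differences. Doing this on the full network would be far too slow, so the sketch below uses a tiny random network. It is an illustration under assumptions, not part of the original notebook: `cost_and_grad` is a compact re-implementation of the same computation as `nnCostFunction`, and the sizes, data and step size are made up.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cost_and_grad(params, n_in, n_hid, n_out, X, y, lambda_):
    # Same computation as nnCostFunction, written compactly
    T1 = params[:n_hid * (n_in + 1)].reshape(n_hid, n_in + 1)
    T2 = params[n_hid * (n_in + 1):].reshape(n_out, n_hid + 1)
    m = X.shape[0]
    a1 = np.hstack([np.ones((m, 1)), X])
    z2 = a1 @ T1.T
    s2 = sigmoid(z2)
    a2 = np.hstack([np.ones((m, 1)), s2])
    h = sigmoid(a2 @ T2.T)
    J = np.sum(-y * np.log(h) - (1 - y) * np.log(1 - h)) / m
    J += lambda_ / (2 * m) * (np.sum(T1[:, 1:] ** 2) + np.sum(T2[:, 1:] ** 2))
    d3 = h - y
    d2 = (d3 @ T2)[:, 1:] * s2 * (1 - s2)
    G1 = d2.T @ a1 / m
    G2 = d3.T @ a2 / m
    G1[:, 1:] += lambda_ / m * T1[:, 1:]
    G2[:, 1:] += lambda_ / m * T2[:, 1:]
    return J, np.concatenate([G1.ravel(), G2.ravel()])

# Tiny hypothetical problem: 3 inputs, 5 hidden units, 3 labels, 5 examples
rng = np.random.default_rng(0)
n_in, n_hid, n_out, m = 3, 5, 3, 5
X_demo = rng.normal(size=(m, n_in))
y_demo = np.eye(n_out)[rng.integers(0, n_out, size=m)]
params = rng.uniform(-0.5, 0.5, size=n_hid * (n_in + 1) + n_out * (n_hid + 1))
args = (n_in, n_hid, n_out, X_demo, y_demo, 1.0)

_, grad = cost_and_grad(params, *args)

# Central differences, one parameter at a time
eps = 1e-4
numgrad = np.zeros_like(params)
for i in range(params.size):
    step = np.zeros_like(params)
    step[i] = eps
    numgrad[i] = (cost_and_grad(params + step, *args)[0]
                  - cost_and_grad(params - step, *args)[0]) / (2 * eps)

rel_diff = np.linalg.norm(numgrad - grad) / np.linalg.norm(numgrad + grad)
print(rel_diff)  # a very small value indicates matching gradients
```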


## Saving the theta values to a csv file
After minimizing the cost, it can be useful to save the learned theta values to a csv file. The output files are named `nnParametersxLy.csv`, where x is the number of layers and y is the number of nodes in the hidden layer(s).
> Do not forget to set the name and path of the output file as desired.

In [None]:
with open('../nnParameters3L7500.csv', 'w', newline='') as csvfile:
    writer = csv.writer(csvfile, delimiter=',')
    writer.writerow(nn_params)
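
A saved parameter file can be read back in the same way. The sketch below shows a round trip on a short made-up vector, written to a temporary directory so that no real output file is touched. The file name `nnParameters_demo.csv` and the variable names are hypothetical.

```python
import csv
import os
import tempfile

import numpy as np

# Hypothetical short parameter vector for illustration
nn_params_demo = np.array([0.0127, 0.0151, -0.0062, 0.0198])

# Temporary location; the notebook itself writes next to the data file
path = os.path.join(tempfile.mkdtemp(), 'nnParameters_demo.csv')

# Write all parameters as a single csv row, as in the cell above
with open(path, 'w', newline='') as csvfile:
    csv.writer(csvfile, delimiter=',').writerow(nn_params_demo)

# Read the single row back and convert the strings to floats
with open(path, newline='') as csvfile:
    row = next(csv.reader(csvfile, delimiter=','))
loaded = np.array([float(value) for value in row])

print(np.allclose(loaded, nn_params_demo))  # True
```

After loading, the vector can be reshaped into `Theta1` and `Theta2` with the same `np.reshape` calls used after the optimization.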

In [11]:
print(nn_params)

[ 0.01269617  0.01514906 -0.00621697 ...  0.01981407 -0.02612224
 -0.00453004]
