# Machine learning project  

## Introduction

In this project, we implement the backpropagation algorithm for neural networks and apply it to the recognition of [`micro_PCB`](https://www.kaggle.com/frettapper/micropcb-images) images. The project was made for the course [`Machine learning`](https://uhintra03.uhasselt.be/studiegidswww/opleidingsonderdeel.aspx?a=2021&i=4483&n=4&t=04), taught in the joint programme of KU Leuven and UHasselt. The authors of this project are Molenaers Arno and Purnal Lennert.


The following libraries are used in this project:
- [`os`](https://docs.python.org/3/library/os.html) for manipulating directory paths.
- [`numpy`](http://www.numpy.org/) for all array and matrix operations.
- [`matplotlib`](https://matplotlib.org/) for plotting.
- [`scipy`](https://docs.scipy.org/doc/scipy/reference/) for scientific and numerical computation functions and tools.
- [`csv`](https://docs.python.org/3/library/csv.html) for importing the image data.
- `utils`: utility functions from exercise 4 of the Coursera Machine Learning course.
- [`math`](https://docs.python.org/3/library/math.html) for math functions such as the square root.
- `customUtils` (imported as `cu`): custom-made functions for this dataset.

In [2]:
```python
# used for manipulating directory paths
import os

# Scientific and vector computation for python
import numpy as np

# math library
import math

# Plotting library
from matplotlib import pyplot

# Optimization module in scipy
from scipy import optimize

# Used for importing csv data
import csv

# utilities library from the exercises of the machine learning course
import utils

# custom made functions for this Machine Learning dataset
import customUtils as cu

# tells matplotlib to embed plots within the notebook
%matplotlib inline
```

## Importing the training data
The training data is imported from a CSV file. The matrix `X` contains the input features as a `6500 x 7500` matrix. `y` contains the labels: for the neural network, each label is encoded as a 13-dimensional vector with a 1 at the index of the correct label and 0 elsewhere, which makes `y` a `6500 x 13` matrix.
> Do not forget to specify the correct path to the CSV file, and to set the size of the imported data as well as the resolution of the images.
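
The one-hot encoding of the labels described above can be sketched in NumPy as follows. This is a minimal illustration that assumes integer labels in `0..12`; it is not necessarily how `cu.importImageDataFromCSV` builds `y`, and `one_hot` is a hypothetical helper name.

```python
import numpy as np

def one_hot(labels, num_labels=13):
    """Turn integer class labels (0..num_labels-1) into one-hot rows."""
    labels = np.asarray(labels, dtype=int)
    # Row i of the identity matrix has a 1 at column i and 0 elsewhere
    return np.eye(num_labels)[labels]

# Hypothetical labels for four images
y_small = one_hot([0, 7, 12, 3])
print(y_small.shape)           # (4, 13)
print(y_small.argmax(axis=1))  # [ 0  7 12  3]
```

Indexing an identity matrix avoids an explicit loop, and `argmax(axis=1)` recovers the original labels.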

In [3]:
```python
X, y = cu.importImageDataFromCSV('../channeldata/channeldata50x50train.csv', data_size=6500)
```

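## Cost function
The cost function used is the regularized cross-entropy cost:

$$ J(\theta) = \frac{1}{m} \sum_{i=1}^{m}\sum_{k=1}^{K} \left[ - y_k^{(i)} \log \left( \left( h_\theta \left( x^{(i)} \right) \right)_k \right) - \left( 1 - y_k^{(i)} \right) \log \left( 1 - \left( h_\theta \left( x^{(i)} \right) \right)_k \right) \right] + \frac{\lambda}{2 m} \left[ \sum_{j=1}^{s_2} \sum_{k=1}^{s_1} \left( \Theta_{j,k}^{(1)} \right)^2 + \sum_{j=1}^{s_3} \sum_{k=1}^{s_2} \left( \Theta_{j,k}^{(2)} \right)^2 \right] $$

Here $m$ is the number of training examples, $K$ the number of labels, and $s_l$ the number of units in layer $l$ without the bias unit, so $s_3 = K$. Since $\Theta^{(l)}$ has $s_{l+1}$ rows and $s_l + 1$ columns, starting $k$ at 1 leaves the bias weights in column 0 out of the regularization term.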

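A vectorized NumPy sketch of this cost for a three-layer network is shown below. It assumes sigmoid activations in both layers and bias weights in column 0 of each $\Theta$ (the convention of the Coursera exercise this project builds on). It computes only $J$, not the gradient, and `nn_cost` is a hypothetical name, not the project's `cu.nnCostFunction`.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def nn_cost(Theta1, Theta2, X, Y, lambda_):
    """Regularized cross-entropy cost J(theta) of a 3-layer network.

    Theta1: (s2, s1 + 1), Theta2: (K, s2 + 1), X: (m, s1), Y: (m, K) one-hot.
    """
    m = X.shape[0]
    # Forward propagation, prepending a column of ones as the bias unit
    a1 = np.hstack([np.ones((m, 1)), X])
    a2 = np.hstack([np.ones((m, 1)), sigmoid(a1 @ Theta1.T)])
    h = sigmoid(a2 @ Theta2.T)                      # (m, K)
    # Unregularized cross-entropy summed over all examples and classes
    J = np.sum(-Y * np.log(h) - (1 - Y) * np.log(1 - h)) / m
    # Regularization skips the bias column (k = 0)
    reg = np.sum(Theta1[:, 1:] ** 2) + np.sum(Theta2[:, 1:] ** 2)
    return J + lambda_ / (2 * m) * reg

# With all-zero weights every output is sigmoid(0) = 0.5,
# so each of the K terms contributes log(2) and J = K * log(2).
X_demo = np.random.default_rng(0).random((5, 4))
Y_demo = np.eye(3)[[0, 1, 2, 1, 0]]
J_demo = nn_cost(np.zeros((6, 5)), np.zeros((3, 7)), X_demo, Y_demo, lambda_=1.0)
print(J_demo)  # approximately 3 * log(2) = 2.0794
```

The demo variables use a `_demo` suffix so that running the sketch inside the notebook does not overwrite `X` or `y`.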

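## Initializing the theta parameters and hyperparameters
The theta parameters must be initialized randomly to break symmetry: if all weights started with the same value, every hidden unit would compute the same function and receive the same update during training.

In the random initialization function, the following value for epsilon is used, where $L_{in}$ and $L_{out}$ are the numbers of units in the layers on either side of the weight matrix:

$$\epsilon_{init} = \frac{\sqrt{6}}{\sqrt{L_{in} + L_{out}}}$$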
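
A minimal sketch of such an initialization, assuming the Coursera exercise convention: each weight is drawn uniformly from $[-\epsilon_{init}, \epsilon_{init}]$ for a matrix of shape `(L_out, 1 + L_in)`, where the extra column holds the bias weights. `rand_initialize_weights` is a hypothetical stand-in; the body of `cu.randInitializeWeights` is not shown in this notebook.

```python
import math
import numpy as np

def rand_initialize_weights(L_in, L_out, epsilon_init=None, rng=None):
    """Hypothetical helper: uniform weights in [-epsilon_init, epsilon_init).

    Returns an (L_out, 1 + L_in) matrix; column 0 holds the bias weights.
    """
    if epsilon_init is None:
        epsilon_init = math.sqrt(6) / math.sqrt(L_in + L_out)
    rng = np.random.default_rng() if rng is None else rng
    return rng.uniform(-epsilon_init, epsilon_init, size=(L_out, 1 + L_in))

W_demo = rand_initialize_weights(4, 3, rng=np.random.default_rng(42))
print(W_demo.shape)  # (3, 5)
```

The argument order `(L_in, L_out, epsilon)` mirrors the calls to `cu.randInitializeWeights` in the cell below.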

In [4]:
```python
num_layers = 3
input_layer_size = X.shape[1]
hidden_layer_size = 5000
num_labels = y.shape[1]

# choose the regularization parameter lambda
lambda_ = 1
```

In [5]:
```python
eps1 = math.sqrt(6)/math.sqrt(input_layer_size + hidden_layer_size)
eps2 = math.sqrt(6)/math.sqrt(hidden_layer_size + num_labels)
print('epsilon init 1 = ' + str(eps1))
print('epsilon init 2 = ' + str(eps2))

initial_Theta1 = cu.randInitializeWeights(input_layer_size, hidden_layer_size, eps1)
initial_Theta2 = cu.randInitializeWeights(hidden_layer_size, num_labels, eps2)

initial_nn_params = np.concatenate([initial_Theta1.ravel(), initial_Theta2.ravel()], axis=0)
```

```
epsilon init 1 = 0.02190890230020664
epsilon init 2 = 0.03459607045552276
```

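As a sanity check on the sizes involved, the length of the unrolled parameter vector follows from the layer sizes: with bias columns, $\Theta^{(1)}$ is `5000 x 7501` and $\Theta^{(2)}$ is `13 x 5001`. The short calculation below uses the sizes stated in this notebook; the result matches the shape `(37570013,)` printed when the parameters are saved to CSV.

```python
input_size, hidden_size, n_labels = 7500, 5000, 13

# Each Theta has one extra column for the bias unit of the previous layer
theta1_shape = (hidden_size, input_size + 1)
theta2_shape = (n_labels, hidden_size + 1)
n_params = theta1_shape[0] * theta1_shape[1] + theta2_shape[0] * theta2_shape[1]
print(n_params)            # 37570013
print(n_params * 8 / 1e6)  # about 300.6, the size in MB of the vector as float64
```

Almost all of these parameters sit in $\Theta^{(1)}$, because of the 5000-unit hidden layer.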

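## Learning the parameters with `scipy.optimize.minimize`
Next, we use `scipy.optimize.minimize` to minimize the cost function $J(\theta)$, starting from the randomly initialized parameters. Because `cu.nnCostFunction` returns both the cost and its gradient, `jac=True` is passed, and the truncated Newton method `TNC` is used with `maxiter` set to 50.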
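
With `jac=True`, `optimize.minimize` expects the objective to return a tuple `(cost, gradient)`. The toy example below shows the same calling pattern on a simple quadratic instead of the network cost; `quadratic_cost` is purely illustrative. Note that, according to recent SciPy documentation, for `method='TNC'` the `maxiter` option counts function evaluations and is deprecated in favour of `maxfun`.

```python
import numpy as np
from scipy import optimize

def quadratic_cost(p):
    """Toy objective returning (cost, gradient), as jac=True requires."""
    target = np.array([3.0, -1.0])
    diff = p - target
    return np.sum(diff ** 2), 2 * diff

res_demo = optimize.minimize(quadratic_cost, np.zeros(2), jac=True, method='TNC')
print(res_demo.x)  # close to [ 3. -1.]
```

Returning the gradient together with the cost lets backpropagation reuse the forward pass instead of computing it twice.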

In [10]:
```python
# initialize the options of the minimize function
options = {'maxiter':50}

# create a lambda function for the cost function
costFunction = lambda p: cu.nnCostFunction(p, input_layer_size,
                                           hidden_layer_size,
                                           num_labels, X, y, lambda_)

# execute the optimization function
res = optimize.minimize(costFunction,
                        initial_nn_params,
                        jac=True,
                        method='TNC',
                        options=options)

# get the solution parameters
nn_params = res.x

# reshape nn_params to retrieve the separate theta matrices
Theta1, Theta2 = cu.retrieveThetas(nn_params, input_layer_size, hidden_layer_size, num_labels)
```
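
Retrieving the thetas undoes the `np.concatenate([... .ravel(), ... .ravel()])` from the initialization step. A minimal sketch, assuming the default row-major (C-order) layout that `ravel()` produces; `retrieve_thetas` is a hypothetical name and the actual `cu.retrieveThetas` may be written differently.

```python
import numpy as np

def retrieve_thetas(params, input_size, hidden_size, n_labels):
    """Split an unrolled parameter vector back into Theta1 and Theta2."""
    split = hidden_size * (input_size + 1)
    theta1 = params[:split].reshape(hidden_size, input_size + 1)
    theta2 = params[split:].reshape(n_labels, hidden_size + 1)
    return theta1, theta2

# Round trip on small random matrices
rng = np.random.default_rng(0)
t1, t2 = rng.random((4, 3)), rng.random((2, 5))
flat = np.concatenate([t1.ravel(), t2.ravel()])
r1, r2 = retrieve_thetas(flat, input_size=2, hidden_size=4, n_labels=2)
print(np.array_equal(r1, t1) and np.array_equal(r2, t2))  # True
```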

## Checking the cost function and the gradients
Next, we evaluate the cost function at the trained parameters.
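
Gradients returned by a backpropagation cost function are commonly checked against central finite differences, $\left(J(\theta + \varepsilon e_i) - J(\theta - \varepsilon e_i)\right) / 2\varepsilon$, on a small problem. A minimal sketch with a hypothetical `numerical_gradient` helper and a toy cost standing in for `cu.nnCostFunction` (any function returning `(cost, gradient)` could be plugged in):

```python
import numpy as np

def numerical_gradient(cost_fn, params, eps=1e-4):
    """Central-difference approximation of the gradient of cost_fn at params.

    cost_fn must return (cost, gradient); only the cost is used here.
    """
    num_grad = np.zeros_like(params)
    for i in range(params.size):
        step = np.zeros_like(params)
        step[i] = eps
        loss_plus, _ = cost_fn(params + step)
        loss_minus, _ = cost_fn(params - step)
        num_grad[i] = (loss_plus - loss_minus) / (2 * eps)
    return num_grad

def toy_cost(p):
    """Stand-in cost sum(p * sin(p)) with its analytic gradient."""
    return np.sum(np.sin(p) * p), np.cos(p) * p + np.sin(p)

p0 = np.array([0.5, -1.2, 2.0])
_, analytic = toy_cost(p0)
numeric = numerical_gradient(toy_cost, p0)
# A very small relative difference suggests the analytic gradient is correct
rel_diff = np.linalg.norm(numeric - analytic) / np.linalg.norm(numeric + analytic)
print(rel_diff)
```

Each parameter needs two cost evaluations, so with 37,570,013 parameters this check is only practical on a small network with the same code path.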

In [9]:
```python
J, grads = cu.nnCostFunction(nn_params, input_layer_size, hidden_layer_size,
                             num_labels, X, y, lambda_)

print('Cost at parameters: %.6f' % J)
```


## Trying the trained neural network on the training data
The trained parameters are used to recognize the `micro PCB` in the images of the training data itself. This is a first check of whether the trained parameters are plausible; since the network has already seen these images during training, this accuracy is optimistic. For a realistic estimate of the accuracy, see the next section, where the same predictions are made on the test data.
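
`utils.predict` comes from the course utilities. A sketch of what such a prediction step typically does for this three-layer network, assuming sigmoid activations and bias weights in column 0: forward-propagate the inputs and return the index of the most active output unit. `predict_sketch` is a hypothetical name.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def predict_sketch(theta1, theta2, features):
    """Return the 0-based index of the most active output unit per example."""
    m = features.shape[0]
    a1 = np.hstack([np.ones((m, 1)), features])               # add bias unit
    a2 = np.hstack([np.ones((m, 1)), sigmoid(a1 @ theta1.T)])
    h = sigmoid(a2 @ theta2.T)                                 # (m, num_labels)
    return np.argmax(h, axis=1)

# Tiny hand-built network: the predicted class follows the sign of the input
theta1_demo = np.array([[0.0, 10.0]])   # 1 hidden unit, 1 input
theta2_demo = np.array([[5.0, -10.0],   # class 0 wins when the hidden unit is off
                        [-5.0, 10.0]])  # class 1 wins when the hidden unit is on
print(predict_sketch(theta1_demo, theta2_demo, np.array([[-1.0], [1.0]])))  # [0 1]
```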

In [None]:
```python
# retrieve the thetas from nn_params
Theta1, Theta2 = cu.retrieveThetas(nn_params, input_layer_size, hidden_layer_size, num_labels)
```

In [15]:
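```python
# predicted class index for every training image
pred = utils.predict(Theta1, Theta2, X)
y_vec = np.zeros(y.shape[0])

# convert each one-hot label back to its class index and print it next to the prediction
for j in range(0, y.shape[0]):
    y_vec[j] = np.where(y[j,:] == 1)[0][0]
    print("y = " + str(y_vec[j]) + " <=> pred = " + str(pred[j]))

print('Training Set Accuracy: %f' % (np.mean(pred == y_vec) * 100))
```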

```
y = 0.0 <=> pred = 2
y = 0.0 <=> pred = 2
y = 0.0 <=> pred = 2
y = 0.0 <=> pred = 2
y = 0.0 <=> pred = 2
y = 0.0 <=> pred = 2
y = 0.0 <=> pred = 2
y = 0.0 <=> pred = 2
y = 0.0 <=> pred = 0
y = 0.0 <=> pred = 3
y = 0.0 <=> pred = 0
y = 0.0 <=> pred = 3
y = 0.0 <=> pred = 6
y = 0.0 <=> pred = 6
y = 0.0 <=> pred = 6
y = 0.0 <=> pred = 6
y = 0.0 <=> pred = 6
y = 0.0 <=> pred = 6
y = 0.0 <=> pred = 6
y = 0.0 <=> pred = 7
y = 0.0 <=> pred = 0
y = 0.0 <=> pred = 0
y = 0.0 <=> pred = 0
y = 0.0 <=> pred = 0
y = 0.0 <=> pred = 0
y = 0.0 <=> pred = 0
y = 0.0 <=> pred = 0
y = 0.0 <=> pred = 0
y = 0.0 <=> pred = 0
y = 0.0 <=> pred = 0
y = 0.0 <=> pred = 0
y = 0.0 <=> pred = 0
y = 0.0 <=> pred = 0
y = 0.0 <=> pred = 0
y = 0.0 <=> pred = 1
y = 0.0 <=> pred = 2
y = 0.0 <=> pred = 0
y = 0.0 <=> pred = 9
y = 0.0 <=> pred = 9
y = 0.0 <=> pred = 9
y = 0.0 <=> pred = 1
y = 0.0 <=> pred = 1
y = 0.0 <=> pred = 1
y = 0.0 <=> pred = 1
y = 0.0 <=> pred = 2
y = 0.0 <=> pred = 1
y = 0.0 <=> pred = 0
y = 0.0 <=> p

y = 7.0 <=> pred = 7
y = 7.0 <=> pred = 2
y = 7.0 <=> pred = 2
y = 7.0 <=> pred = 7
y = 7.0 <=> pred = 12
y = 7.0 <=> pred = 12
y = 7.0 <=> pred = 12
y = 7.0 <=> pred = 12
y = 7.0 <=> pred = 7
y = 7.0 <=> pred = 7
y = 7.0 <=> pred = 7
y = 7.0 <=> pred = 7
y = 7.0 <=> pred = 7
y = 7.0 <=> pred = 7
y = 7.0 <=> pred = 7
y = 7.0 <=> pred = 7
y = 7.0 <=> pred = 7
y = 7.0 <=> pred = 7
y = 7.0 <=> pred = 7
y = 7.0 <=> pred = 7
y = 7.0 <=> pred = 3
y = 7.0 <=> pred = 7
y = 7.0 <=> pred = 0
y = 7.0 <=> pred = 4
y = 7.0 <=> pred = 4
y = 7.0 <=> pred = 4
y = 7.0 <=> pred = 7
y = 7.0 <=> pred = 4
y = 7.0 <=> pred = 4
y = 7.0 <=> pred = 4
y = 7.0 <=> pred = 4
y = 7.0 <=> pred = 4
y = 7.0 <=> pred = 7
y = 7.0 <=> pred = 7
y = 7.0 <=> pred = 7
y = 7.0 <=> pred = 7
y = 7.0 <=> pred = 4
y = 7.0 <=> pred = 4
y = 7.0 <=> pred = 4
y = 7.0 <=> pred = 4
y = 7.0 <=> pred = 4
y = 7.0 <=> pred = 4
y = 7.0 <=> pred = 4
y = 7.0 <=> pred = 4
y = 7.0 <=> pred = 7
y = 7.0 <=> pred = 6
y = 7.0 <=> pred = 6
y = 7.0 <
```

## Trying the trained neural network on the test data
First, the test data is imported. Then the accuracy of the neural network is measured on the test data.
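
The accuracy cells recover each label with a per-example `np.where` loop. Because every row of the label matrix is one-hot, `np.argmax(..., axis=1)` gives the same indices in a single vectorized call. A small sketch with made-up labels and predictions (not results from this project); `accuracy_percent` is a hypothetical helper.

```python
import numpy as np

def accuracy_percent(one_hot_labels, predictions):
    """Accuracy in percent, given one-hot labels and 0-based class predictions."""
    label_idx = np.argmax(one_hot_labels, axis=1)  # index of the 1 in each row
    return np.mean(label_idx == np.asarray(predictions)) * 100

y_demo = np.eye(13)[[0, 0, 7, 7]]   # made-up labels
pred_demo = np.array([0, 2, 7, 4])  # made-up predictions
print(accuracy_percent(y_demo, pred_demo))  # 50.0
```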

In [5]:
```python
X_test, y_test = cu.importImageDataFromCSV('../channeldata/channeldata50x50test.csv', data_size=1625)
```

In [18]:
```python
# predicted class index for every test image
pred_test = utils.predict(Theta1, Theta2, X_test)
y_test_vec = np.zeros(y_test.shape[0])

print(y_test_vec.size)
print(y_test.shape[0])

# convert each one-hot label back to its class index and print it next to the prediction
for j in range(y_test_vec.shape[0]):
    y_test_vec[j] = np.where(y_test[j,:] == 1)[0][0]
    print("y = " + str(y_test_vec[j]) + " <=> pred = " + str(pred_test[j]))

print('Test Set Accuracy: %f' % (np.mean(pred_test == y_test_vec) * 100))
```

```
1625
1625
y = 0.0 <=> pred = 6
y = 0.0 <=> pred = 2
y = 0.0 <=> pred = 6
y = 0.0 <=> pred = 6
y = 0.0 <=> pred = 5
y = 0.0 <=> pred = 6
y = 0.0 <=> pred = 7
y = 0.0 <=> pred = 0
y = 0.0 <=> pred = 2
y = 0.0 <=> pred = 2
y = 0.0 <=> pred = 6
y = 0.0 <=> pred = 6
y = 0.0 <=> pred = 2
y = 0.0 <=> pred = 2
y = 0.0 <=> pred = 2
y = 0.0 <=> pred = 1
y = 0.0 <=> pred = 1
y = 0.0 <=> pred = 3
y = 0.0 <=> pred = 6
y = 0.0 <=> pred = 2
y = 0.0 <=> pred = 1
y = 0.0 <=> pred = 1
y = 0.0 <=> pred = 6
y = 0.0 <=> pred = 6
y = 0.0 <=> pred = 2
y = 0.0 <=> pred = 2
y = 0.0 <=> pred = 6
y = 0.0 <=> pred = 6
y = 0.0 <=> pred = 1
y = 0.0 <=> pred = 12
y = 0.0 <=> pred = 0
y = 0.0 <=> pred = 0
y = 0.0 <=> pred = 2
y = 0.0 <=> pred = 2
y = 0.0 <=> pred = 9
y = 0.0 <=> pred = 6
y = 0.0 <=> pred = 6
y = 0.0 <=> pred = 2
y = 0.0 <=> pred = 0
y = 0.0 <=> pred = 1
y = 0.0 <=> pred = 1
y = 0.0 <=> pred = 1
y = 0.0 <=> pred = 0
y = 0.0 <=> pred = 3
y = 0.0 <=> pred = 1
y = 0.0 <=> pred = 1
y = 0.0 <=> pred = 3
y
```

## Saving/retrieving the theta values to/from a CSV file
After training, it can be useful to save the parameter values to a CSV file. The output files are named `nnParameters_xL_y_lmZ.csv`, where x is the number of layers, y is the number of nodes in the hidden layer(s) and Z is the chosen lambda.
> Do not forget to set the name/path of the output file as desired.
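
The file layout used by `cu.ThetasToCSV` and `cu.ThetasFromCSV` is not shown in this notebook. A minimal round-trip sketch of the same idea with `np.savetxt` and `np.loadtxt`, assuming the thetas are stored unrolled as a single column of values; `thetas_to_csv`, `thetas_from_csv` and the temporary file are hypothetical, and the real helpers may use a different layout.

```python
import os
import tempfile
import numpy as np

def thetas_to_csv(path, theta1, theta2):
    """Write both matrices, unrolled, as a single column of values."""
    params = np.concatenate([theta1.ravel(), theta2.ravel()])
    np.savetxt(path, params, delimiter=',')

def thetas_from_csv(path, shape1, shape2):
    """Read the unrolled values back and reshape them into two matrices."""
    params = np.loadtxt(path, delimiter=',')
    split = shape1[0] * shape1[1]
    return params[:split].reshape(shape1), params[split:].reshape(shape2)

rng = np.random.default_rng(1)
t1, t2 = rng.random((3, 5)), rng.random((2, 4))
path = os.path.join(tempfile.mkdtemp(), 'nnParameters_3L_3_lm1.csv')
thetas_to_csv(path, t1, t2)
r1, r2 = thetas_from_csv(path, t1.shape, t2.shape)
print(np.allclose(r1, t1) and np.allclose(r2, t2))  # True
```

Passing the shapes explicitly, as the notebook does with `initial_Theta1.shape` and `initial_Theta2.shape`, keeps the CSV file itself free of any header.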

In [38]:
```python
filePath = '../nnParameters/nnParameters_' + str(num_layers) +'L_' + str(hidden_layer_size) + '_lm' + str(lambda_) +'.csv'
cu.ThetasToCSV(filePath, Theta1, Theta2)
```

```
(37570013,)
```


In [6]:
```python
filePath = '../nnParameters/nnParameters_' + str(num_layers) +'L_' + str(hidden_layer_size) + '_lm' + str(lambda_) +'.csv'
Theta1, Theta2 = cu.ThetasFromCSV(filePath, initial_Theta1.shape, initial_Theta2.shape)
```