# Final Exam Second Semester 2566 - Neural Network (Dry Bean Problem)

This exam problem has an objective to develop a neural network model to classify the dry bean as belonging to one out of seven types (classes) from 16 features where

type 0 = Seker,

type 1 = Barbunya,

type 2 = Bombay,

type 3 = Cali,

type 4 = Horoz,

type 5 = Sira, and

type 6 = Dermosan

Cr: This dataset is adapted from KOKLU, M. and OZKAN, I.A., (2020), “Multiclass Classification of Dry Beans Using Computer Vision and Machine Learning Techniques.” Computers and Electronics in Agriculture, 174, 105507. DOI: https://doi.org/10.1016/j.compag.2020.105507

In [1]:
# used for manipulating directory paths
import os

# Scientific and vector computation for python
import numpy as np

# Plotting library
from matplotlib import pyplot

# Optimization module in scipy
from scipy import optimize

# library written for this exam
import utilsNN as utils

# tells matplotlib to embed plots within the notebook
%matplotlib inline

import random 
random.seed(10)

## We start the exam by first loading the dataset. 

In [2]:
# Load training dataset and test dataset

# Read tab separated data
data = np.loadtxt(os.path.join('Data', 'NN_BeanData_Train.txt'))

# First 16 columns of data are features and the last column is the label.
# Matrix X contains 16 features while vector y contains the label.

X, y = data[:, :16], data[:, 16].astype(int)

m = y.size  # number of training examples

# Read tab separated testing data
data_test = np.loadtxt(os.path.join('Data', 'NN_BeanData_Test.txt'))

X_test, y_test = data_test[:, :16], data_test[:, 16].astype(int)

In [3]:
X.shape, X_test.shape

((6000, 16), (1500, 16))

You have been provided with a set of initialized network parameters ($\Theta^{(1)}, \Theta^{(2)}$). These are stored in `InitBeanWeight1.txt` and `InitBeanWeight2.txt` which will be loaded in the next cell of this notebook into `Theta1` and `Theta2`. The parameters have dimensions that are sized for a neural network with 40 units in the second layer (hidden layer) and 7 output units (corresponding to 7 dry bean types).

In [4]:
# Load initiallized network parameters

Theta1 = np.loadtxt(os.path.join('Data', 'InitBeanWeight1.txt'))
Theta2 = np.loadtxt(os.path.join('Data', 'InitBeanWeight2.txt'))

# Unroll parameters 
# To unroll the matrix into vector (1-D array), we use `np.ravel()` 
nn_params = np.concatenate([np.ravel(Theta1), np.ravel(Theta2)])
initial_nn_params = nn_params

In [8]:
Theta1.shape, Theta2.shape

((40, 17), (7, 41))

### Initial parameters to be used in optimize.minimize

#### *** Do not initialize parameters by yourself in this exam problem. ***

In [9]:
initial_nn_params = nn_params

### Model representation

This neural network has 3 layers - an input layer, a hidden layer and an output layer. 

The inputs are **16** features of the dry beans.

The hidden layer has **40** neurons.

The outputs are **7** dry bean types (0 to 6).

The training data was loaded into the variables `X` and `y` above.

In [10]:
# Setup the parameters you will use for this exam by yourself!
input_layer_size  = 16
hidden_layer_size = 40
num_labels = 7

In [11]:
def sigmoid(z):
    """
    Compute sigmoid function given the input z.
    
    Parameters
    ----------
    z : array_like
        The input to the sigmoid function. This can be a 1-D vector 
        or a 2-D matrix. 
    
    Returns
    -------
    g : array_like
        The computed sigmoid function. g has the same shape as z, since
        the sigmoid is computed element-wise on z.
        
    Instructions
    ------------
    Compute the sigmoid of each value of z (z can be a matrix, vector or scalar).
    """
    # convert input to a numpy array
    z = np.array(z)
    
    # You need to return the following variables correctly 
    g = np.zeros(z.shape)

    # ====================== YOUR CODE HERE ======================
    g = 1 / (1 + np.exp(-z))

    # =============================================================
    return g

In [51]:
def nnCostFunction(nn_params, #weight
                   input_layer_size,
                   hidden_layer_size,
                   num_labels, #จำนวน output
                   X, y, lambda_=1):
    """
    Implements the neural network cost function and gradient for a one-hidden-layer neural 
    network which performs classification. 
    
    Parameters
    ----------
    nn_params : array_like
        The parameters for the neural network which are "unrolled" into 
        a vector. This needs to be converted back into the weight matrices Theta1
        and Theta2.
    
    input_layer_size : int
        Number of features for the input layer. 
    
    hidden_layer_size : int
        Number of hidden units in the second layer.
    
    num_labels : int
        Total number of labels, or equivalently number of units in output layer. 
    
    X : array_like
        Input dataset. A matrix of shape (m x input_layer_size).
    
    y : array_like
        Dataset labels. A vector of shape (m,).
    
    lambda_ : float, optional
        Regularization parameter.
 
    Returns
    -------
    J : float
        The computed value for the cost function at the current weight values.
        
    grad : array_like
        An "unrolled" vector of the partial derivatives of the concatenatation of
        neural network weights Theta1 and Theta2.
        
    Instructions
    ------------
    You should complete the code by working through the following parts.
    1) Section 1.3 Feedforward and cost function
    2) Section 1.4 Regularized cost function
    3) Section 2.3 Backpropagation
    4) Section 2.5 Regularized Neural Network
    
    Complete each part after you finish reading that part.
    
    """        

    # Reshape nn_params back into the parameters Theta1 and Theta2, the weight matrices
    # for our one-hidden-layer neural network
    Theta1 = np.reshape(nn_params[:hidden_layer_size * (input_layer_size + 1)],
                        (hidden_layer_size, (input_layer_size + 1)))  # dimension = 25 X 401

    Theta2 = np.reshape(nn_params[(hidden_layer_size * (input_layer_size + 1)):],
                        (num_labels, (hidden_layer_size + 1)))  # dimension = 10 X 26

    # Setup some useful variables
    m = y.size
         
    # You need to return the following variables correctly 
    J = 0
    grad = []
    Theta1_grad = np.zeros(Theta1.shape)
    Theta2_grad = np.zeros(Theta2.shape)

    
    # ====================== Code for Section 1.3 Feedforward and cost function ======================
    
    #Forward propagation
    a1 = np.concatenate([np.ones((m, 1)), X], axis=1)
    z2 = np.dot(a1, Theta1.T) #dimension = 5000 x 25
    a2 = sigmoid(z2)
    a2 = np.concatenate([np.ones((m, 1)), a2], axis=1)  
    z3 = np.dot(a2, Theta2.T)
    h = sigmoid(z3) 
    
    #Cost Function
    y_matrix = np.eye(num_labels)[y]  #dimension = 3 x 10 #np.eye คือฟังก์ชัน identity matrix
    J = (-1 / m) * np.sum(y_matrix * np.log(h) + (1 - y_matrix) * np.log(1 - h))
    
    # ====================== End of Code for Section 1.3 Feedforward and cost function ======================
    #
    #
    # ====================== Code for Section 1.4 Regularized cost function ======================
    # Add regularization term to the cost function from Section 1.3
    # reg_term does not include the square of theta of bias units, 
    # For the matrices Theta1 and Theta2, this corresponds to the first column of each matrix.
    # Therefore we use [:, 1:] to square every theta except the first column of the matrix (the theta of bias units)
    
    reg_term = (lambda_ / (2*m)) * (np.sum(np.square(Theta1[:,1:])) + np.sum(np.square(Theta2[:,1:])))
    J += reg_term
    
    # ====================== End of Code for Section 1.4 Regularized cost function ======================
    #
    #
    # ====================== Code for Section 2.3 Backpropagation ======================
    delta_3 = h - y_matrix #dimension = 5000 x 10
    
    delta_2 = np.dot(delta_3,Theta2[:,1:]) * sigmoidGradient(z2) #dimension = 5000 x 25
    
    Delta1 = np.dot(delta_2.T,a1) #dimension = 25 x 401
    Delta2 = np.dot(delta_3.T,a2) #dimension = 10 x 26
    
    Theta1_grad = (1/m) * Delta1 #dimension = 25 x 401
    Theta2_grad = (1/m) * Delta2 #dimension = 10 x 26
    
    # To unroll the matrix into vector (1-D array), we use "np.roll()"
    grad = np.concatenate([np.ravel(Theta1_grad), np.ravel(Theta2_grad)])
    
    # ====================== End of Code for Section 2.3 Backpropagation ======================
    #
    #
    # ====================== Code for Section 2.5 Regularized Neural Network ======================

    Theta1_grad[1:] += (lambda_/m)*Theta1[1:]
    Theta2_grad[1:] += (lambda_/m)*Theta2[1:]
    
    grad = np.concatenate([np.ravel(Theta1_grad), np.ravel(Theta2_grad)])
    
    
    # ====================== End of Code for Section 2.5 Regularized Neural Network ======================
    
    return J, grad


    return J, grad

In [52]:
lambda_ = 0
J, _ = nnCostFunction(nn_params, input_layer_size, hidden_layer_size,
                   num_labels, X, y, lambda_)
J

4.898535771545263

In [53]:
lambda_ = 1
J, _ = nnCostFunction(nn_params, input_layer_size, hidden_layer_size,
                   num_labels, X, y, lambda_)
J

4.898583813225331

In [54]:
def sigmoidGradient(z):
    """
    Computes the gradient of the sigmoid function evaluated at z. 
    This should work regardless if z is a matrix or a vector. 
    In particular, if z is a vector or matrix, you should return
    the gradient for each element.
    
    Parameters
    ----------
    z : array_like
        A vector or matrix as input to the sigmoid function. 
    
    Returns
    --------
    g : array_like
        Gradient of the sigmoid function. Has the same shape as z. 
    
    Instructions
    ------------
    Compute the gradient of the sigmoid function evaluated at
    each value of z (z can be a matrix, vector or scalar).    

    """

    g = np.zeros(z.shape)

    # ====================== YOUR CODE HERE ======================
    g = sigmoid(z)*(1-sigmoid(z))


    # =============================================================
    return g

In [55]:
utils.predict

<function utilsNN.predict(Theta1, Theta2, X)>

In [56]:
train = utils.predict(Theta1, Theta2, X)
print('Training Set Accuracy: %f' % (np.mean(train == y) * 100))
test = utils.predict(Theta1, Theta2, X_test)
print('Training Set Accuracy: %f' % (np.mean(test == y_test) * 100))

Training Set Accuracy: 9.500000
Training Set Accuracy: 10.400000


In [64]:
#  After you have completed the assignment, change the maxfun to a larger
#  value to see how more training helps.
options= {'maxfun': 2000}

#  You should also try different values of lambda
lambda_ = 1

# Create "short hand" for the cost function to be minimized
costFunction = lambda p: nnCostFunction(p, input_layer_size,
                                        hidden_layer_size,
                                        num_labels, X, y, lambda_)

# Now, costFunction is a function that takes in only one argument
# (the neural network parameters)
res = optimize.minimize(costFunction,
                        initial_nn_params,
                        jac=True,
                        method='TNC',
                        options=options)

# get the solution of the optimization
nn_params = res.x
        
# Obtain the optimal Theta1 and Theta2 back from nn_params
Theta1 = np.reshape(nn_params[:hidden_layer_size * (input_layer_size + 1)],
                    (hidden_layer_size, (input_layer_size + 1)))

Theta2 = np.reshape(nn_params[(hidden_layer_size * (input_layer_size + 1)):],
                    (num_labels, (hidden_layer_size + 1)))

In [66]:
debug_J, _  = nnCostFunction(nn_params, input_layer_size, hidden_layer_size, num_labels, X, y, lambda_ = 1)
debug_J

0.3982737247519125

In [65]:
train = utils.predict(Theta1, Theta2, X)
print('Training Set Accuracy: %f' % (np.mean(train == y) * 100))
test = utils.predict(Theta1, Theta2, X_test)
print('Training Set Accuracy: %f' % (np.mean(test == y_test) * 100))

Training Set Accuracy: 93.700000
Training Set Accuracy: 93.333333


In [67]:
#  After you have completed the assignment, change the maxfun to a larger
#  value to see how more training helps.
options= {'maxfun': 2000}

#  You should also try different values of lambda
lambda_ = 2

# Create "short hand" for the cost function to be minimized
costFunction = lambda p: nnCostFunction(p, input_layer_size,
                                        hidden_layer_size,
                                        num_labels, X, y, lambda_)

# Now, costFunction is a function that takes in only one argument
# (the neural network parameters)
res = optimize.minimize(costFunction,
                        initial_nn_params,
                        jac=True,
                        method='TNC',
                        options=options)

# get the solution of the optimization
nn_params = res.x
        
# Obtain the optimal Theta1 and Theta2 back from nn_params
Theta1 = np.reshape(nn_params[:hidden_layer_size * (input_layer_size + 1)],
                    (hidden_layer_size, (input_layer_size + 1)))

Theta2 = np.reshape(nn_params[(hidden_layer_size * (input_layer_size + 1)):],
                    (num_labels, (hidden_layer_size + 1)))

In [68]:
debug_J, _  = nnCostFunction(nn_params, input_layer_size, hidden_layer_size, num_labels, X, y, lambda_ = 2)
debug_J

0.45100847716976517

In [69]:
train = utils.predict(Theta1, Theta2, X)
print('Training Set Accuracy: %f' % (np.mean(train == y) * 100))
test = utils.predict(Theta1, Theta2, X_test)
print('Training Set Accuracy: %f' % (np.mean(test == y_test) * 100))

Training Set Accuracy: 93.416667
Training Set Accuracy: 93.000000


### End of Neural Network Problem