# Using different optimizers for Neural Networks
In this part, we replace the optimizer used to train our neural network to see whether a different weight-update rule improves accuracy.

To compare performance, we use two datasets from our earlier assignments: the digits data from sklearn and the Bank Marketing data from the UCI repository. The baseline network is based on the code from Assignment 8.

## 1. Review the Neural Network implementation from Assignment 8

First, we ran the neural network from Assignment 8 to see what accuracy it achieves on the digits data and the Bank Marketing data.

### Load digits data

We used digits data from sklearn.datasets.

In [1]:
from sklearn.datasets import load_digits 
from sklearn.preprocessing import StandardScaler  
from sklearn.model_selection import train_test_split  
from sklearn.metrics import accuracy_score
import keras

import numpy as np
import numpy.random as r
import random

digits = load_digits()
X = digits.data
y = digits.target

Using TensorFlow backend.


### Preprocess data set

The features are pixel intensities ranging from 0 to 16. To help the algorithm converge, we scale each feature to have zero mean and unit variance.

In [2]:
X_scale = StandardScaler()
X = X_scale.fit_transform(digits.data)
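`StandardScaler` subtracts each column's mean and divides by its standard deviation. The NumPy sketch below performs the same computation; `standardize` is a hypothetical helper for illustration, not part of sklearn. Like `StandardScaler`, it leaves zero-variance columns unscaled, which matters for the digits data because some pixels are 0 in every image.

```python
import numpy as np

def standardize(X):
    """Scale each column to zero mean and unit variance (population std, ddof=0)."""
    X = np.asarray(X, dtype=float)
    mean = X.mean(axis=0)
    std = X.std(axis=0)
    # Constant columns (std == 0) would cause division by zero; like StandardScaler,
    # leave their scale at 1 so they simply become all zeros.
    std[std == 0] = 1.0
    return (X - mean) / std

X_toy = np.array([[0., 15., 3.], [5., 0., 3.], [10., 5., 3.]])
X_std = standardize(X_toy)
print(X_std.mean(axis=0))
print(X_std.std(axis=0))
```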

Then we used sklearn to split the data into training and test sets.

In [3]:
# We will use sklearn's method for separating the data
# This part of code is based on assignment 3
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Looking at the train/test split
print("The number of training examples: ", X_train.shape[0])
print("The number of test examples: ", X_test.shape[0])

The number of training examples:  1347
The number of test examples:  450


### One hot encoding

Our target is an integer in the range [0, ..., 9], so the network has 10 output neurons. We convert each label into a one-hot vector with the code below.

In [4]:
num_classes = 10
y_v_train = keras.utils.to_categorical(y_train, num_classes)
y_v_test = keras.utils.to_categorical(y_test, num_classes)

# A quick check to see that our code performs as we expect
print(y_train[0:4])
print(y_v_train[0:4])

[2 8 9 7]
[[0. 0. 1. 0. 0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0. 0. 0. 1. 0.]
 [0. 0. 0. 0. 0. 0. 0. 0. 0. 1.]
 [0. 0. 0. 0. 0. 0. 0. 1. 0. 0.]]
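`keras.utils.to_categorical` turns each integer label into a row with a single 1 in the label's position. The same result can be sketched in NumPy by indexing an identity matrix; `one_hot` is a hypothetical helper for illustration.

```python
import numpy as np

def one_hot(labels, num_classes):
    # Row i of the identity matrix is the one-hot vector for class i,
    # so indexing np.eye with the label array builds all vectors at once.
    return np.eye(num_classes)[np.asarray(labels, dtype=int)]

encoded = one_hot([2, 8, 9, 7], 10)
print(encoded)
```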


### Build Neural Network

This code is almost the same as in Assignment 8. We use the sigmoid activation function and initialize the weights and biases with uniform random values in [0, 1).

In [5]:
# The activation function and its derivative
def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def sigmoid_d(z):
    return sigmoid(z) * (1 - sigmoid(z))


# Create and initialize W and b
def setup_and_init_weights(nn_structure):
    W = {} # creating a dictionary i.e. a set of key: value pairs
    b = {}
    for l in range(1, len(nn_structure)):
        W[l] = r.random_sample((nn_structure[l], nn_structure[l-1])) # Return “continuous uniform” random floats in the half-open interval [0.0, 1.0). 
        b[l] = r.random_sample((nn_structure[l],))
    return W, b
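`sigmoid_d` relies on the identity $f'(z) = f(z)\,(1 - f(z))$ for the sigmoid $f$. A quick standalone sanity check (not part of the Assignment 8 code) compares it with a central finite difference:

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def sigmoid_d(z):
    # Closed-form derivative: sigma'(z) = sigma(z) * (1 - sigma(z))
    return sigmoid(z) * (1 - sigmoid(z))

z = np.linspace(-5, 5, 11)
h = 1e-5
# Central finite difference approximates the derivative numerically
numeric = (sigmoid(z + h) - sigmoid(z - h)) / (2 * h)
print(np.max(np.abs(numeric - sigmoid_d(z))))
```

The derivative reaches its maximum of 0.25 at $z = 0$.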


Initialize the gradient accumulators $\Delta W$ and $\Delta b$ (`tri_W` and `tri_b`) with zeros.

In [6]:
def init_tri_values(nn_structure):
    tri_W = {}
    tri_b = {}
    for l in range(1, len(nn_structure)):
        tri_W[l] = np.zeros((nn_structure[l], nn_structure[l-1]))
        tri_b[l] = np.zeros((nn_structure[l],))
    return tri_W, tri_b

Perform a forward pass through the network. The function returns the activations $a$ and the weighted inputs $z$ of every layer.

In [7]:
def feed_forward(x, W, b):
    a = {1: x} # create a dictionary for holding the a values for all levels
    z = { } # create a dictionary for holding the z values for all the layers
    for l in range(1, len(W) + 1): # for each layer
        node_in = a[l]
        z[l+1] = W[l].dot(node_in) + b[l]  # z^(l+1) = W^(l)*a^(l) + b^(l)
        a[l+1] = sigmoid(z[l+1]) # a^(l+1) = f(z^(l+1))
        
    return a, z
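In equation form, with $f$ the sigmoid, $a^{(1)} = x$ the input sample and $n_l$ the number of layers, each pass of the loop in `feed_forward` computes, for $l = 1, \dots, n_l - 1$:

$$z^{(l+1)} = W^{(l)} a^{(l)} + b^{(l)}, \qquad a^{(l+1)} = f\left(z^{(l+1)}\right)$$

Here $W^{(l)}$ has shape $(s_{l+1}, s_l)$, where $s_l$ is the number of neurons in layer $l$; this matches the shape `(nn_structure[l], nn_structure[l-1])` used in `setup_and_init_weights`.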

Compute the error terms $\delta$ for the output layer and for the hidden layers.

In [8]:
def calculate_out_layer_delta(y, a_out, z_out):
    return -(y-a_out) * sigmoid_d(z_out) 

def calculate_hidden_delta(delta_plus_1, w_l, z_l):
    return np.dot(np.transpose(w_l), delta_plus_1) * sigmoid_d(z_l)
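These two functions implement the standard backpropagation equations for the squared-error cost $J = \frac{1}{2}\lVert y - a^{(n_l)} \rVert^2$, with $\odot$ the element-wise product. The outer products in the training loop below then give the per-sample gradients accumulated in `tri_W` and `tri_b`:

$$\delta^{(n_l)} = -\left(y - a^{(n_l)}\right) \odot f'\left(z^{(n_l)}\right)$$

$$\delta^{(l)} = \left(\left(W^{(l)}\right)^{T} \delta^{(l+1)}\right) \odot f'\left(z^{(l)}\right), \qquad 1 < l < n_l$$

$$\Delta W^{(l)} = \delta^{(l+1)} \left(a^{(l)}\right)^{T}, \qquad \Delta b^{(l)} = \delta^{(l+1)}$$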

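The backpropagation algorithm. Here we use stochastic gradient descent (SGD), which updates the weights after one randomly chosen training sample, instead of batch gradient descent (BGD), so that each training step is faster.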

In [9]:
def train_nn(nn_structure, X, y, iter_num=3000, alpha=0.25, lamb = 0.01):
    W, b = setup_and_init_weights(nn_structure)
    cnt = 0
    N = len(y)
    avg_cost_func = []
    print('Starting gradient descent for {} iterations'.format(iter_num))
    while cnt < iter_num:
        if cnt%1000 == 0:
            print('Iteration {} of {}'.format(cnt, iter_num))
        tri_W, tri_b = init_tri_values(nn_structure)
        avg_cost = 0
        i = random.randint(0, N-1)
        
        delta = {}
        # perform the feed forward pass and return the stored a and z values, to be used in the
        # gradient descent step
        a, z = feed_forward(X[i, :], W, b)
        # loop from nl-1 to 1 backpropagating the errors
        for l in range(len(nn_structure), 0, -1):
            if l == len(nn_structure):
                delta[l] = calculate_out_layer_delta(y[i,:], a[l], z[l])
                avg_cost += np.linalg.norm((y[i,:]-a[l]))
            else:
                if l > 1:
                    delta[l] = calculate_hidden_delta(delta[l+1], W[l], z[l])
                tri_W[l] += np.dot(delta[l+1][:,np.newaxis], np.transpose(a[l][:,np.newaxis]))# np.newaxis increase the number of dimensions
                tri_b[l] += delta[l+1]
        # perform the gradient descent step for the weights in each layer
        for l in range(len(nn_structure) - 1, 0, -1):
            # add regularization
            W[l] += -alpha * (1.0 * tri_W[l] + lamb/2 * W[l])
            b[l] += -alpha * (1.0 * tri_b[l] + lamb/2 * b[l])
        # complete the average cost calculation
        avg_cost = 1.0 * avg_cost
        avg_cost_func.append(avg_cost)
        cnt += 1
    return W, b, avg_cost_func


def predict_y(W, b, X, n_layers):
    N = X.shape[0]
    y = np.zeros((N,))
    for i in range(N):
        a, z = feed_forward(X[i, :], W, b)
        y[i] = np.argmax(a[n_layers])
    return y
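The gradient-descent step at the end of `train_nn` applies the update below to every layer, where $\alpha$ is `alpha` and $\lambda$ is `lamb`. Note that, as written, the regularization term is applied to the biases as well as the weights:

$$W^{(l)} \leftarrow W^{(l)} - \alpha\left(\Delta W^{(l)} + \frac{\lambda}{2} W^{(l)}\right), \qquad b^{(l)} \leftarrow b^{(l)} - \alpha\left(\Delta b^{(l)} + \frac{\lambda}{2} b^{(l)}\right)$$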

### Run the Neural Network

The architecture is the same as the Neural Network in Assignment 8. The input layer will have 64 neurons (one for each pixel in our 8 by 8 pixelated digit).  Our hidden layer has 30 neurons (you can change this value).  The output layer has 10 neurons.
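Each connection from a layer of $s_l$ neurons to one of $s_{l+1}$ neurons has an $s_{l+1} \times s_l$ weight matrix plus $s_{l+1}$ biases. The hypothetical helper `count_parameters` below adds these up for a given `nn_structure`; the same function applies to the `[17, 30, 2]` network used later for the Bank Marketing data.

```python
def count_parameters(nn_structure):
    # Layer l -> l+1 contributes a weight matrix of shape (n_out, n_in) plus n_out biases
    return sum(n_out * n_in + n_out
               for n_in, n_out in zip(nn_structure[:-1], nn_structure[1:]))

print(count_parameters([64, 30, 10]))  # 2260
print(count_parameters([17, 30, 2]))   # 602
```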

In [35]:
nn_structure = [64, 30, 10]
    
# train the NN
W, b, avg_cost_func = train_nn(nn_structure, X_train, y_v_train, 30000, 0.25, 0.01)

Starting gradient descent for 30000 iterations
Iteration 0 of 30000
Iteration 1000 of 30000
Iteration 2000 of 30000
Iteration 3000 of 30000
Iteration 4000 of 30000
Iteration 5000 of 30000
Iteration 6000 of 30000
Iteration 7000 of 30000
Iteration 8000 of 30000
Iteration 9000 of 30000
Iteration 10000 of 30000
Iteration 11000 of 30000
Iteration 12000 of 30000
Iteration 13000 of 30000
Iteration 14000 of 30000
Iteration 15000 of 30000
Iteration 16000 of 30000
Iteration 17000 of 30000
Iteration 18000 of 30000
Iteration 19000 of 30000
Iteration 20000 of 30000
Iteration 21000 of 30000
Iteration 22000 of 30000
Iteration 23000 of 30000
Iteration 24000 of 30000
Iteration 25000 of 30000
Iteration 26000 of 30000
Iteration 27000 of 30000
Iteration 28000 of 30000
Iteration 29000 of 30000


### Check the accuracy

In [36]:
y_pred = predict_y(W, b, X_test, 3)
print('Prediction accuracy (digits data) is {0:.5}%'.format(accuracy_score(y_test, y_pred) * 100))

Prediction accuracy (digits data) is 85.556%


### Try on Bank Marketing data

First, we loaded this dataset and converted its categorical columns to integer codes; `default`, `month` and `day_of_week` were dropped instead of encoded. The idea came from https://becominghuman.ai/multi-layer-perceptron-mlp-models-on-real-world-banking-data-f6dd3d7e998f

In [12]:
import pandas as pd
from sklearn.preprocessing import LabelEncoder
from sklearn.preprocessing import StandardScaler

df = pd.read_csv('bank-additional-full.csv',sep=';')

LE = LabelEncoder()
# Work on the Categorical Values 
df['job_code'] = LE.fit_transform(df['job'])
df['marital_code'] = LE.fit_transform(df['marital'])
df['education_code'] = LE.fit_transform(df['education'])
df['housing_code'] = LE.fit_transform(df['housing'])
df['loan_code'] = LE.fit_transform(df['loan'])
df['contact_code'] = LE.fit_transform(df['contact'])
df['poutcome_code'] = LE.fit_transform(df['poutcome'])
df['subscribed'] = LE.fit_transform(df['y'])
# Drop categorical columns 
df = df.drop(['job','marital','education','housing','loan','contact','poutcome','y','day_of_week','month','default'] ,axis=1)
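`LabelEncoder` maps each category to an integer according to the sorted order of the distinct values (its `classes_` attribute). The sketch below reproduces that mapping with `np.unique`; `label_encode` is a hypothetical helper. One consequence to keep in mind is that the integers impose an arbitrary order (for example on job titles) that the network may treat as meaningful; one-hot encoding, for instance with `pandas.get_dummies`, avoids this at the cost of more input features.

```python
import numpy as np

def label_encode(values):
    # Sorted unique categories; each value is replaced by the index of its category.
    classes, codes = np.unique(np.asarray(values), return_inverse=True)
    return classes, codes

classes, codes = label_encode(['married', 'single', 'divorced', 'married'])
print(classes)  # ['divorced' 'married' 'single']
print(codes)    # [1 2 0 1]
```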


Then we standardized the features, split the data into training and test sets, and one-hot encoded the labels.

In [13]:
# Get data
X_2 = df.drop('subscribed',axis=1)
y_2 = df['subscribed']


scaler = StandardScaler()
scaler.fit(X_2)
X_2 = scaler.transform(X_2)

X_train_2, X_test_2, y_train_2, y_test_2 = train_test_split(X_2, y_2, random_state=0)

# Looking at the train/test split
print("The number of training examples: ", X_train_2.shape[0])
print("The number of test examples: ", X_test_2.shape[0])

# one-hot encoding
num_classes = 2
y_v_train_2 = keras.utils.to_categorical(y_train_2, num_classes)
y_v_test_2 = keras.utils.to_categorical(y_test_2, num_classes)

# A quick check to see that our code performs as we expect
print(y_train_2[0:4])
print(y_v_train_2[0:4])

The number of training examples:  30891
The number of test examples:  10297
10685    1
224      0
29638    0
4804     0
Name: subscribed, dtype: int64
[[0. 1.]
 [1. 0.]
 [1. 0.]
 [1. 0.]]


### Run the Neural Network

The input layer has 17 neurons (one for each feature remaining after preprocessing). The hidden layer has 30 neurons, and the output layer has 2 neurons.

In [14]:
nn_structure = [17, 30, 2]
    
# train the NN
W, b, avg_cost_func = train_nn(nn_structure, X_train_2, y_v_train_2, 30000, 0.25, 0.01)

Starting gradient descent for 30000 iterations
Iteration 0 of 30000
Iteration 1000 of 30000
Iteration 2000 of 30000
Iteration 3000 of 30000
Iteration 4000 of 30000
Iteration 5000 of 30000
Iteration 6000 of 30000
Iteration 7000 of 30000
Iteration 8000 of 30000
Iteration 9000 of 30000
Iteration 10000 of 30000
Iteration 11000 of 30000
Iteration 12000 of 30000
Iteration 13000 of 30000
Iteration 14000 of 30000
Iteration 15000 of 30000
Iteration 16000 of 30000
Iteration 17000 of 30000
Iteration 18000 of 30000
Iteration 19000 of 30000
Iteration 20000 of 30000
Iteration 21000 of 30000
Iteration 22000 of 30000
Iteration 23000 of 30000
Iteration 24000 of 30000
Iteration 25000 of 30000
Iteration 26000 of 30000
Iteration 27000 of 30000
Iteration 28000 of 30000
Iteration 29000 of 30000


### Check the accuracy

In [15]:
y_pred = predict_y(W, b, X_test_2, 3)
print('Prediction accuracy (Bank Marketing data) is {0:.5}%'.format(accuracy_score(y_test_2, y_pred) * 100))

Prediction accuracy (Bank Marketing data) is 90.007%
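An accuracy of about 90% should be read against the class balance: in the Bank Marketing data most clients did not subscribe (label 0), so a classifier that always predicts 0 already scores highly. The sketch below computes that majority-class baseline; `majority_baseline_accuracy` is a hypothetical helper, shown here on toy labels rather than on the actual split.

```python
import numpy as np

def majority_baseline_accuracy(y_train, y_test):
    """Accuracy of a 'model' that always predicts the most frequent training label."""
    values, counts = np.unique(np.asarray(y_train), return_counts=True)
    majority = values[np.argmax(counts)]
    return float(np.mean(np.asarray(y_test) == majority))

# Toy labels for illustration; in the notebook one would call
# majority_baseline_accuracy(y_train_2, y_test_2)
print(majority_baseline_accuracy([0, 0, 0, 1, 0], [0, 1, 0, 0]))
```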


## 2. Try the RMSprop optimizer in Keras

We used Keras to build the same network trained with the RMSprop optimizer, to see whether accuracy improves.

### Build Neural Network

First, we tried the digits data.

In [16]:
import keras
from keras.models import Sequential
from keras.layers import Dense


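# Hyperparameters: batch_size=1 mirrors the one-sample SGD updates above; the
# learning rate is left at the Keras default rather than the 0.25 used in our own code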
batch_size = 1
num_classes = 10
epochs = 5
input_shape = X_train.shape[1]

model = Sequential()
model.add(Dense(30, activation='sigmoid', input_dim=input_shape))
model.add(Dense(num_classes, activation='sigmoid'))
model.compile(loss=keras.losses.categorical_crossentropy,
              optimizer=keras.optimizers.RMSprop(rho=0.9),
              metrics=['accuracy'])
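The models in this section are trained with categorical cross-entropy, $L = -\sum_k y_k \log p_k$ averaged over samples. Because the output layer uses `sigmoid` rather than `softmax`, the outputs for a sample need not sum to 1, so the NumPy sketch below renormalizes each row before taking the logarithm. We believe this mirrors what the Keras TensorFlow backend does for non-logit outputs, but treat that as an assumption; the function name and the clipping constant are our own choices.

```python
import numpy as np

def categorical_crossentropy(y_true, y_pred, eps=1e-7):
    """Mean cross-entropy between one-hot targets and predicted scores."""
    y_pred = np.asarray(y_pred, dtype=float)
    # Rescale each row to sum to 1 (assumed to mirror the Keras backend for
    # outputs that are not already a probability distribution, e.g. sigmoid outputs)
    p = y_pred / y_pred.sum(axis=1, keepdims=True)
    p = np.clip(p, eps, 1 - eps)   # avoid log(0)
    return float(np.mean(-np.sum(y_true * np.log(p), axis=1)))

y_true = np.array([[0., 1.], [1., 0.]])
y_pred = np.array([[0.2, 0.8], [0.6, 0.6]])   # e.g. two sigmoid outputs per row
loss = categorical_crossentropy(y_true, y_pred)
print(loss)
```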

### Train Neural Network

In [17]:
model.fit(X_train, y_v_train, batch_size=batch_size, epochs=epochs, verbose=1)

Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5



### Check the accuracy on test data

In [18]:
score = model.evaluate(X_test, y_v_test, verbose=0)
print('Prediction accuracy (digits data) is {0:.5}%'.format(score[1]*100))

Prediction accuracy (digits data) is 95.111%


### Try on Bank Marketing data
Because this dataset has only 2 classes, we changed the output layer to 2 neurons.

In [19]:
batch_size = 1
num_classes = 2
epochs = 3
input_shape = X_train_2.shape[1]

model = Sequential()
model.add(Dense(30, activation='sigmoid', input_dim=input_shape))
model.add(Dense(num_classes, activation='sigmoid'))
model.compile(loss=keras.losses.categorical_crossentropy,
              optimizer=keras.optimizers.RMSprop(rho=0.9),
              metrics=['accuracy'])

Train the network and evaluate it on the test set.

In [20]:
model.fit(X_train_2, y_v_train_2, batch_size=batch_size, epochs=epochs, verbose=1)

score = model.evaluate(X_test_2, y_v_test_2, verbose=0)
print('Prediction accuracy (Bank Marketing data) is {0:.5}%'.format(score[1]*100))

Epoch 1/3
Epoch 2/3
Epoch 3/3
Prediction accuracy (Bank Marketing data) is 91.201%


## 3. Implement our own RMSprop optimizer

### Creating and initializing W, b

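Here we added two more dictionaries, W_v and b_v, initialized to zero. For RMSprop they hold a running average of the squared gradients of each weight and bias, which is used to scale the step size of each parameter individually (it does not change the direction of the update, as momentum would).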

In [21]:
def setup_and_init_weights(nn_structure):
    W = {} # creating a dictionary i.e. a set of key: value pairs
    b = {}
    W_v = {}
    b_v = {}
    
    for l in range(1, len(nn_structure)):
        W[l] = r.random_sample((nn_structure[l], nn_structure[l-1])) # Return “continuous uniform” random floats in the half-open interval [0.0, 1.0). 
        b[l] = r.random_sample((nn_structure[l],))
        # initalize as zero
        W_v[l] = np.zeros((nn_structure[l], nn_structure[l-1]))
        b_v[l] = np.zeros((nn_structure[l],))
    return W, b, W_v, b_v

### The Back Propagation Algorithm

As before, we use SGD instead of BGD so that each training step is fast; the change is that the gradient-descent step now uses an RMSprop update.
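Written out, the update that the code below applies to each weight matrix (and in the same way to each bias vector) is the following, with all operations element-wise. Here $v_W$ is `W_v[l]`, and the decay rate 0.99 and the constant 0.5 in the denominator are the values hard-coded in the notebook:

$$v_W \leftarrow 0.99\, v_W + (1 - 0.99)\left(\Delta W^{(l)}\right)^{2}$$

$$W^{(l)} \leftarrow W^{(l)} - \alpha \, \frac{\Delta W^{(l)}}{\sqrt{v_W} + 0.5}$$

Common descriptions of RMSprop use a much smaller constant in the denominator (on the order of $10^{-7}$ or $10^{-8}$).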

In [22]:
def train_nn(nn_structure, X, y, iter_num=3000, alpha=0.001, lamb = 0.01, momentum=0.9):
    W, b, W_v, b_v = setup_and_init_weights(nn_structure)
    cnt = 0
    N = len(y)
    avg_cost_func = []
    print('Starting gradient descent for {} iterations'.format(iter_num))
    while cnt < iter_num:
        if cnt%1000 == 0:
            print('Iteration {} of {}'.format(cnt, iter_num))
            
        tri_W, tri_b = init_tri_values(nn_structure)
        avg_cost = 0
        # SGD
        i = random.randint(0, N-1)
        delta = {}
        # perform the feed forward pass and return the stored a and z values, to be used in the
        # gradient descent step
        a, z = feed_forward(X[i, :], W, b)
        # loop from nl-1 to 1 backpropagating the errors
        for l in range(len(nn_structure), 0, -1):
            if l == len(nn_structure):
                delta[l] = calculate_out_layer_delta(y[i,:], a[l], z[l])
                avg_cost += np.linalg.norm((y[i,:]-a[l]))
            else:
                if l > 1:
                    delta[l] = calculate_hidden_delta(delta[l+1], W[l], z[l])
                tri_W[l] += np.dot(delta[l+1][:,np.newaxis], np.transpose(a[l][:,np.newaxis]))# np.newaxis increase the number of dimensions
                tri_b[l] += delta[l+1]
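        # RMSprop update for each layer.
        # NOTE: `momentum` and `lamb` are not used; the decay rate (0.99) and the constant
        # added to the denominator (0.5) are hard-coded, and no L2 term is applied.
        for l in range(len(nn_structure) - 1, 0, -1):
            # running average of squared gradients
            W_v[l] *= 0.99
            W_v[l] += (1 - 0.99) * tri_W[l] * tri_W[l]

            b_v[l] *= 0.99
            b_v[l] += (1 - 0.99) * tri_b[l] * tri_b[l]

            # scale each parameter's step by the root of its running average
            W[l] -= alpha * 1.0 * tri_W[l] / (np.sqrt(W_v[l]) + 0.5)
            b[l] -= alpha * 1.0 * tri_b[l] / (np.sqrt(b_v[l]) + 0.5)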

        # complete the average cost calculation
        avg_cost = 1.0 * avg_cost
        avg_cost_func.append(avg_cost)
        cnt += 1
    return W, b, avg_cost_func


def predict_y(W, b, X, n_layers):
    N = X.shape[0]
    y = np.zeros((N,))
    for i in range(N):
        a, z = feed_forward(X[i, :], W, b)
        y[i] = np.argmax(a[n_layers])
    return y
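To see what the per-parameter scaling buys, the standalone sketch below applies RMSprop updates to two parameters whose gradients differ by a factor of 10,000 and records the size of each step. `rmsprop_step` is a hypothetical helper; it uses the notebook's decay rate of 0.99 but an arbitrary $\alpha = 0.01$ and a small $\epsilon = 10^{-8}$ instead of the 0.5 used in `train_nn`. The gradient is held constant so that the result has a closed form.

```python
import numpy as np

def rmsprop_step(w, grad, v, alpha=0.01, rho=0.99, eps=1e-8):
    """One RMSprop update; v is the running average of squared gradients."""
    v = rho * v + (1 - rho) * grad ** 2           # exponential moving average of g^2
    w = w - alpha * grad / (np.sqrt(v) + eps)     # per-parameter scaled step
    return w, v

# Two parameters whose (constant) gradients differ by a factor of 10,000
grad = np.array([0.01, 100.0])
w = np.zeros(2)
v = np.zeros(2)
steps = []
for t in range(50):
    w_new, v = rmsprop_step(w, grad, v)
    steps.append(w - w_new)   # size of the update applied to each parameter
    w = w_new
steps = np.array(steps)
print(steps[0], steps[-1])
```

Both parameters take steps of the same size, $\alpha / \sqrt{1 - \rho^{t}}$ after $t$ updates, regardless of their gradient magnitude; the early steps exceed $\alpha$ because $v$ starts at zero. With the notebook's constant of 0.5, the denominator is dominated by that constant whenever $\sqrt{v}$ is much smaller than 0.5, and the update then behaves more like plain SGD with learning rate $2\alpha$.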

### Run the Neural Network

The architecture, number of iterations and learning rate (0.25) are the same as in Section 1. First, we tried the digits data.

In [23]:
nn_structure = [64, 30, 10]
    
# train the NN
W, b, avg_cost_func = train_nn(nn_structure, X_train, y_v_train, 30000, 0.25, 0.01, 0.9)

Starting gradient descent for 30000 iterations
Iteration 0 of 30000
Iteration 1000 of 30000
Iteration 2000 of 30000
Iteration 3000 of 30000
Iteration 4000 of 30000
Iteration 5000 of 30000
Iteration 6000 of 30000
Iteration 7000 of 30000
Iteration 8000 of 30000
Iteration 9000 of 30000
Iteration 10000 of 30000
Iteration 11000 of 30000
Iteration 12000 of 30000
Iteration 13000 of 30000
Iteration 14000 of 30000
Iteration 15000 of 30000
Iteration 16000 of 30000
Iteration 17000 of 30000
Iteration 18000 of 30000
Iteration 19000 of 30000
Iteration 20000 of 30000
Iteration 21000 of 30000
Iteration 22000 of 30000
Iteration 23000 of 30000
Iteration 24000 of 30000
Iteration 25000 of 30000
Iteration 26000 of 30000
Iteration 27000 of 30000
Iteration 28000 of 30000
Iteration 29000 of 30000


### Check the accuracy

In [24]:
y_pred = predict_y(W, b, X_test, 3)
print('Prediction accuracy (digits data) is {0:.5}%'.format(accuracy_score(y_test, y_pred) * 100))

Prediction accuracy (digits data) is 96.0%


### Try on Bank Marketing data

In [25]:
nn_structure = [17, 30, 2]
    
# train the NN
W, b, avg_cost_func = train_nn(nn_structure, X_train_2, y_v_train_2, 30000, 0.25, 0.01)

y_pred = predict_y(W, b, X_test_2, 3)
print('Prediction accuracy (Bank Marketing data) is {0:.5}%'.format(accuracy_score(y_test_2, y_pred) * 100))

Starting gradient descent for 30000 iterations
Iteration 0 of 30000
Iteration 1000 of 30000
Iteration 2000 of 30000
Iteration 3000 of 30000
Iteration 4000 of 30000
Iteration 5000 of 30000
Iteration 6000 of 30000
Iteration 7000 of 30000
Iteration 8000 of 30000
Iteration 9000 of 30000
Iteration 10000 of 30000
Iteration 11000 of 30000
Iteration 12000 of 30000
Iteration 13000 of 30000
Iteration 14000 of 30000
Iteration 15000 of 30000
Iteration 16000 of 30000
Iteration 17000 of 30000
Iteration 18000 of 30000
Iteration 19000 of 30000
Iteration 20000 of 30000
Iteration 21000 of 30000
Iteration 22000 of 30000
Iteration 23000 of 30000
Iteration 24000 of 30000
Iteration 25000 of 30000
Iteration 26000 of 30000
Iteration 27000 of 30000
Iteration 28000 of 30000
Iteration 29000 of 30000
Prediction accuracy (Bank Marketing data) is 90.968%


## 4. Try the AdaGrad optimizer in Keras

We then tried another optimizer, AdaGrad. First, we used Keras to check whether it improves accuracy.

### Build Neural Network

In [41]:
import keras
from keras.models import Sequential
from keras.layers import Dense


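# Hyperparameters: batch_size=1 mirrors the one-sample SGD updates above; the
# learning rate is left at the Keras default rather than the 0.25 used in our own code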
batch_size = 1
num_classes = 10
epochs = 5
input_shape = X_train.shape[1]

model = Sequential()
model.add(Dense(30, activation='sigmoid', input_dim=input_shape))
model.add(Dense(num_classes, activation='sigmoid'))
model.compile(loss=keras.losses.categorical_crossentropy,
              optimizer=keras.optimizers.Adagrad(),
              metrics=['accuracy'])

### Train Neural Network

In [42]:
model.fit(X_train, y_v_train, batch_size=batch_size, epochs=epochs, verbose=1)

Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5



### Check the accuracy on test data

In [43]:
score = model.evaluate(X_test, y_v_test, verbose=0)
print('Prediction accuracy (digits data) is {0:.5}%'.format(score[1]*100))

Prediction accuracy (digits data) is 93.111%


### Try on Bank Marketing data

Because this dataset has only 2 classes, we changed the output layer to 2 neurons.

In [29]:
batch_size = 1
num_classes = 2
epochs = 5
input_shape = X_train_2.shape[1]

model = Sequential()
model.add(Dense(30, activation='sigmoid', input_dim=input_shape))
model.add(Dense(num_classes, activation='sigmoid'))
model.compile(loss=keras.losses.categorical_crossentropy,
              optimizer=keras.optimizers.Adagrad(),
              metrics=['accuracy'])

Train the network and evaluate it on the test set.

In [30]:
model.fit(X_train_2, y_v_train_2, batch_size=batch_size, epochs=epochs, verbose=1)

score = model.evaluate(X_test_2, y_v_test_2, verbose=0)
print('Prediction accuracy (Bank Marketing data) is {0:.5}%'.format(score[1]*100))

Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5
Prediction accuracy (Bank Marketing data) is 91.658%


## 5. Implement our own AdaGrad optimizer


### Change the Back Propagation Algorithm

Here we replaced the gradient-descent step with an AdaGrad update.
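Written out, the update in the code below is the following, element-wise, with $G_W$ stored in `W_v[l]` (biases are handled the same way with `b_v[l]`):

$$G_W \leftarrow G_W + \left(\Delta W^{(l)}\right)^{2}, \qquad W^{(l)} \leftarrow W^{(l)} - \alpha \, \frac{\Delta W^{(l)}}{\sqrt{G_W} + 10^{-7}}$$

Unlike RMSprop, the accumulator $G_W$ is a plain sum that never decays, so each parameter's effective step size can only shrink during training.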

In [31]:
def train_nn(nn_structure, X, y, iter_num=3000, alpha=0.25, lamb = 0.01):
    W, b, W_v, b_v = setup_and_init_weights(nn_structure)
    cnt = 0
    N = len(y)
    avg_cost_func = []
    print('Starting gradient descent for {} iterations'.format(iter_num))
    while cnt < iter_num:
        if cnt%1000 == 0:
            print('Iteration {} of {}'.format(cnt, iter_num))
        tri_W, tri_b = init_tri_values(nn_structure)
        avg_cost = 0
        i = random.randint(0, N-1)
        delta = {}
        # perform the feed forward pass and return the stored a and z values, to be used in the
        # gradient descent step
        a, z = feed_forward(X[i, :], W, b)
        # loop from nl-1 to 1 backpropagating the errors
        for l in range(len(nn_structure), 0, -1):
            if l == len(nn_structure):
                delta[l] = calculate_out_layer_delta(y[i,:], a[l], z[l])
                avg_cost += np.linalg.norm((y[i,:]-a[l]))
            else:
                if l > 1:
                    delta[l] = calculate_hidden_delta(delta[l+1], W[l], z[l])
                tri_W[l] += np.dot(delta[l+1][:,np.newaxis], np.transpose(a[l][:,np.newaxis]))# np.newaxis increase the number of dimensions
                tri_b[l] += delta[l+1]
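        # AdaGrad update for each layer.
        # NOTE: `lamb` is not used here, so no L2 regularization is applied.
        for l in range(len(nn_structure) - 1, 0, -1):
            # accumulate the sum of squared gradients (never decays)
            W_v[l] += tri_W[l]*tri_W[l]
            b_v[l] += tri_b[l]*tri_b[l]

            # divide each parameter's step by the root of its accumulated sum
            W[l] -= alpha * tri_W[l] / (np.sqrt(W_v[l]) + 1e-7)
            b[l] -= alpha * tri_b[l] / (np.sqrt(b_v[l]) + 1e-7)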
        # complete the average cost calculation
        avg_cost = 1.0 * avg_cost
        avg_cost_func.append(avg_cost)
        cnt += 1
    return W, b, avg_cost_func


def predict_y(W, b, X, n_layers):
    N = X.shape[0]
    y = np.zeros((N,))
    for i in range(N):
        a, z = feed_forward(X[i, :], W, b)
        y[i] = np.argmax(a[n_layers])
    return y
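A standalone experiment illustrates AdaGrad's behaviour. `adagrad_step` is a hypothetical helper that uses the notebook's $\alpha = 0.25$ and $\epsilon = 10^{-7}$. With a constant gradient $g$, the accumulator after $t$ updates is $t g^2$, so every parameter's step is $\alpha / \sqrt{t}$, whatever the size of $g$.

```python
import numpy as np

def adagrad_step(w, grad, G, alpha=0.25, eps=1e-7):
    """One AdaGrad update; G is the running *sum* of squared gradients."""
    G = G + grad ** 2                             # accumulates forever, never decays
    w = w - alpha * grad / (np.sqrt(G) + eps)     # per-parameter scaled step
    return w, G

# Two parameters whose (constant) gradients differ by a factor of 10,000
grad = np.array([0.01, 100.0])
w = np.zeros(2)
G = np.zeros(2)
steps = []
for t in range(50):
    w_new, G = adagrad_step(w, grad, G)
    steps.append(w - w_new)   # size of the update applied to each parameter
    w = w_new
steps = np.array(steps)
print(steps[0], steps[-1])
```

The steps keep shrinking because the accumulator never decays; an exponential average such as RMSprop's instead lets the step size settle near $\alpha$ when the gradients stay constant.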

### Run the Neural Network

The architecture, number of iterations and learning rate are the same as before.

In [32]:
nn_structure = [64, 30, 10]
    
# train the NN
W, b, avg_cost_func = train_nn(nn_structure, X_train, y_v_train, 30000, 0.25, 0.01)

Starting gradient descent for 30000 iterations
Iteration 0 of 30000
Iteration 1000 of 30000
Iteration 2000 of 30000
Iteration 3000 of 30000
Iteration 4000 of 30000
Iteration 5000 of 30000
Iteration 6000 of 30000
Iteration 7000 of 30000
Iteration 8000 of 30000
Iteration 9000 of 30000
Iteration 10000 of 30000
Iteration 11000 of 30000
Iteration 12000 of 30000
Iteration 13000 of 30000
Iteration 14000 of 30000
Iteration 15000 of 30000
Iteration 16000 of 30000
Iteration 17000 of 30000
Iteration 18000 of 30000
Iteration 19000 of 30000
Iteration 20000 of 30000
Iteration 21000 of 30000
Iteration 22000 of 30000
Iteration 23000 of 30000
Iteration 24000 of 30000
Iteration 25000 of 30000
Iteration 26000 of 30000
Iteration 27000 of 30000
Iteration 28000 of 30000
Iteration 29000 of 30000


### Check the accuracy on test data

In [33]:
y_pred = predict_y(W, b, X_test, 3)
print('Prediction accuracy (digits data) is {0:.5}%'.format(accuracy_score(y_test, y_pred) * 100))

Prediction accuracy (digits data) is 96.889%


### Try on Bank Marketing data

In [34]:
nn_structure = [17, 30, 2]
    
# train the NN
W, b, avg_cost_func = train_nn(nn_structure, X_train_2, y_v_train_2, 30000, 0.25, 0.01)

y_pred = predict_y(W, b, X_test_2, 3)
print('Prediction accuracy (Bank Marketing data) is {0:.5}%'.format(accuracy_score(y_test_2, y_pred) * 100))

Starting gradient descent for 30000 iterations
Iteration 0 of 30000
Iteration 1000 of 30000
Iteration 2000 of 30000
Iteration 3000 of 30000
Iteration 4000 of 30000
Iteration 5000 of 30000
Iteration 6000 of 30000
Iteration 7000 of 30000
Iteration 8000 of 30000
Iteration 9000 of 30000
Iteration 10000 of 30000
Iteration 11000 of 30000
Iteration 12000 of 30000
Iteration 13000 of 30000
Iteration 14000 of 30000
Iteration 15000 of 30000
Iteration 16000 of 30000
Iteration 17000 of 30000
Iteration 18000 of 30000
Iteration 19000 of 30000
Iteration 20000 of 30000
Iteration 21000 of 30000
Iteration 22000 of 30000
Iteration 23000 of 30000
Iteration 24000 of 30000
Iteration 25000 of 30000
Iteration 26000 of 30000
Iteration 27000 of 30000
Iteration 28000 of 30000
Iteration 29000 of 30000
Prediction accuracy (Bank Marketing data) is 91.541%
