# **Tutorial for A Fully Modular Numpy-Restricted Artificial Neural Network with Adam/AdamW Optimization, Element-wise, and Input-wise Dropouts**
Author: Patrick Erickson

---



The following notebook details the Artificial Neural Network I had made from scratch, and details how to use it. We will primarily be using a breast cancer screening dataset for the majority of our model demonstrations. Note that this data is almost completely linearly seperable, and neural networks might be too complex to gain any more predictive power. This is merely for demonstrations. The model can be found on the hyperlink of the README. If wou want more information about this dataset, look at my EDA article under my logistic regression github.

### For Classification

**Step 1**: Load the following imports:

In [1]:
import sys
scripts_path = r"../"

if scripts_path not in sys.path:
    sys.path.append(scripts_path)
    
from NeuralNetworkScripts.NeuralNetwork import NeuralNetwork
from NeuralNetworkScripts.Layer import FullyConnectedLayer
from NeuralNetworkScripts.DataHandler import DataHelper
import numpy as np
import pandas as pd

**Step 2**: We will load in the dataset and do some preliminary cleaning.

In [2]:
np.random.seed(100)
df = (pd.read_csv("../datasets/breast_cancer.csv")).dropna()

columns = df.columns.to_list()
df.replace({'Class': {2: 0, 4: 1}}, inplace=True)
df = df.to_numpy()
df = df.astype(int)
split = np.split(df, [9], axis=1)
dataframe = split[0]
labels = split[1]

**Step 3:** We need to split, standardize, and one-hot encode our labels for model testing. We can do that with the following Data Helper functions:

In [3]:
labels,_ = DataHelper.toOneHot(labels)

trainSet,trainLabels,testSet,testLabels = DataHelper.trainTestSplit(dataframe,labels,testSize=.2,trainSize=.8, randomState=None)

Standardizer = DataHelper.standardizer(trainSet)
trainSet = DataHelper.standardizeCompute(trainSet,Standardizer)
testSet = DataHelper.standardizeCompute(testSet,Standardizer)

Note that if specified, the toOneHot will also give us the order of the classfication heads.

**Step 4**: Construct the model. Notice that this is the same thing as logistic regression, with the only difference that softmax splits logistic binary outcomes into 2 seperate categories.

**Therefore, this is essentially logistic regression, represented with 2 categorical variables instead of a binary variable with a decision threshold:**

In [4]:
model = NeuralNetwork(
                output = FullyConnectedLayer(numNodes=2,activation='softmax')
                )

We can look at how our model does initially:

In [5]:
loss, guesses = model.evaluate(testSet,testLabels)

predictedTrain = np.argmax(guesses, axis=1)
trueTrain = np.argmax(testLabels, axis=1)
trainAccuracy = np.mean(predictedTrain == trueTrain) * 100

print("\n Initial Loss:", loss)
print("Accuracy: {:.2f}%".format(trainAccuracy))
print(predictedTrain)
print(trueTrain)


 Initial Loss: 2.004762491639277
Accuracy: 3.65%
[1 1 1 1 1 0 1 0 1 0 0 1 1 0 0 1 0 1 1 1 0 1 0 1 1 1 0 0 0 1 1 0 1 0 0 1 1
 1 1 1 1 1 0 1 0 1 1 1 1 0 0 0 1 1 1 1 0 0 1 1 0 0 0 0 0 1 1 1 1 1 0 1 0 1
 1 1 0 1 0 1 1 1 1 0 1 1 1 0 1 1 1 1 1 1 1 1 0 1 1 0 1 0 1 1 0 0 0 1 0 0 1
 1 1 1 1 1 1 1 1 1 1 0 1 1 1 0 1 0 1 1 1 1 1 1 1 1 1]
[0 0 0 0 0 1 0 1 0 1 1 0 0 1 1 0 1 0 0 0 1 0 1 0 0 0 1 1 1 0 0 1 0 1 1 0 0
 0 0 0 0 0 1 0 1 0 0 0 0 1 1 1 0 0 0 0 0 1 0 0 0 1 1 1 1 0 0 0 0 0 1 0 1 0
 0 0 1 0 1 0 0 0 0 1 0 0 0 1 0 0 0 0 0 0 0 0 1 0 0 1 0 1 0 0 1 1 1 0 1 1 0
 0 0 0 1 0 0 0 0 0 0 1 0 0 0 1 0 1 0 0 1 0 0 1 0 0 0]


We can see that without any sort of training the model can't discern which value is which. Something interesting to note, however, is that the data is so well seperated that the model already has some very high seperation ability. If we were to switch classification heads in this example, we would get a really high accuracy. Let us save this model so we can come back to the original weights, after doing a little bit of training:

In [6]:
model.saveModel('tutorialModel')

We can reload our model as such:

In [7]:
model1 = NeuralNetwork(model='tutorialModel')

We will train our model and report our predictions as such:

In [8]:
model1.train(trainSet, trainLabels) 
loss1, guesses1 = model1.evaluate(testSet,testLabels)

predictedTest = np.argmax(guesses1, axis=1)
trueTest = np.argmax(testLabels, axis=1)
testAccuracy = np.mean(predictedTest == trueTest) * 100

print("\n Testing Loss:", loss1)
print("Accuracy: {:.2f}%".format(testAccuracy))
print("Predicted values:\n" , predictedTest)
print("True test labels:\n ", trueTrain)

loss for epoch:  100 : 0.07723121923789908
loss for epoch:  200 : 0.08064279365232271
loss for epoch:  300 : 0.08066605935007758
loss for epoch:  400 : 0.07786792053886489
loss for epoch:  500 : 0.07859150750233064
loss for epoch:  600 : 0.08050427394562472
loss for epoch:  700 : 0.08089962734148001
loss for epoch:  800 : 0.0809128803583
loss for epoch:  900 : 0.08144508963522622
loss for epoch:  1000 : 0.0787870045531967

 Testing Loss: 0.07165134527756539
Accuracy: 97.08%
Predicted values:
 [0 0 0 0 0 1 0 1 0 1 1 0 0 1 1 0 1 0 0 0 1 0 1 0 0 0 1 1 1 0 0 0 0 1 1 0 0
 0 0 0 0 0 1 0 1 0 0 0 0 1 1 1 0 0 0 0 1 1 0 0 1 1 1 1 1 0 0 0 0 0 1 0 1 0
 0 0 1 0 1 0 0 0 0 1 0 0 0 1 0 0 0 0 0 0 0 0 1 0 0 1 0 1 0 0 1 1 1 0 1 1 0
 0 0 0 0 0 0 0 0 0 0 1 0 0 0 1 0 1 0 0 1 0 0 1 0 0 0]
True test labels:
  [0 0 0 0 0 1 0 1 0 1 1 0 0 1 1 0 1 0 0 0 1 0 1 0 0 0 1 1 1 0 0 1 0 1 1 0 0
 0 0 0 0 0 1 0 1 0 0 0 0 1 1 1 0 0 0 0 0 1 0 0 0 1 1 1 1 0 0 0 0 0 1 0 1 0
 0 0 1 0 1 0 0 0 0 1 0 0 0 1 0 0 0 0 0 0 0 0 1 0 0 1 

We can see that the model now correctly classifies our labels with a fairly good probability. However, notice that the model stayed at roughly the same loss the entire time. This signals that we are doing a lot more training than we need to. We can specify the hyperparameters we want to change to reduce training time with similar values. We can grab from our model of original weights:

In [9]:
model2 = NeuralNetwork(model='tutorialModel')
model2.train(trainSet, trainLabels , epochs = 10, learningRate=.01) 
loss2, guesses2 = model2.evaluate(testSet,testLabels)

predicted2 = np.argmax(guesses2, axis=1)
trueTest = np.argmax(testLabels, axis=1)
accuracy2 = np.mean(predicted2 == trueTest) * 100

print("\n Model 2's Loss:", loss2)
print("Accuracy: {:.2f}%".format(accuracy2))
print("Predicted values:\n" , predicted2)
print("True test labels:\n ", trueTest)

loss for epoch:  1 : 0.5605352918596782
loss for epoch:  2 : 0.11589345490598545
loss for epoch:  3 : 0.09786853191487037
loss for epoch:  4 : 0.0906754686300652
loss for epoch:  5 : 0.08718543804011794
loss for epoch:  6 : 0.08139638979058879
loss for epoch:  7 : 0.08068651366497223
loss for epoch:  8 : 0.08291385213193285
loss for epoch:  9 : 0.08237392201745547
loss for epoch:  10 : 0.08103371731942934

 Model 2's Loss: 0.06565157594302315
Accuracy: 97.81%
Predicted values:
 [0 0 0 0 0 1 0 1 0 1 1 0 0 1 1 0 1 0 0 0 1 0 1 0 0 0 1 1 1 0 0 1 0 1 1 0 0
 0 0 0 0 0 1 0 1 0 0 0 0 1 1 1 0 0 0 0 1 1 0 0 1 1 1 1 1 0 0 0 0 0 1 0 1 0
 0 0 1 0 1 0 0 0 0 1 0 0 0 1 0 0 0 0 0 0 0 0 1 0 0 1 0 1 0 0 1 1 1 0 1 1 0
 0 0 0 0 0 0 0 0 0 0 1 0 0 0 1 0 1 0 0 1 0 0 1 0 0 0]
True test labels:
  [0 0 0 0 0 1 0 1 0 1 1 0 0 1 1 0 1 0 0 0 1 0 1 0 0 0 1 1 1 0 0 1 0 1 1 0 0
 0 0 0 0 0 1 0 1 0 0 0 0 1 1 1 0 0 0 0 0 1 0 0 0 1 1 1 1 0 0 0 0 0 1 0 1 0
 0 0 1 0 1 0 0 0 0 1 0 0 0 1 0 0 0 0 0 0 0 0 1 0 0 1 0 1 0 0 1 1 1 0

As you can see, we get roughly the same testing accuracy with only 20 epochs and even a lower learning rate (.001). But what about Using a neural network instead of simple logistic regression? Well, we can test it out by doing the following:

In [10]:
multiLayerModel = NeuralNetwork(
                hidden2 = FullyConnectedLayer(numNodes=10,activation='ReLU'),
                output = FullyConnectedLayer(numNodes=2,activation='softmax')
                )

Here we create a model with 2 hidden layers of 10 nodes each. We specify our activation as ReLU. Let's look at the initial test:

In [11]:
lossNN, guessesNN = multiLayerModel.evaluate(testSet,testLabels)

predictedNN = np.argmax(guesses, axis=1)
trueNN = np.argmax(testLabels, axis=1)
accuracyNN = np.mean(predictedNN == trueNN) * 100

print("\n Initial Loss:", loss)
print("Accuracy: {:.2f}%".format(accuracyNN))
print(predictedNN)
print(trueNN)


 Initial Loss: 2.004762491639277
Accuracy: 3.65%
[1 1 1 1 1 0 1 0 1 0 0 1 1 0 0 1 0 1 1 1 0 1 0 1 1 1 0 0 0 1 1 0 1 0 0 1 1
 1 1 1 1 1 0 1 0 1 1 1 1 0 0 0 1 1 1 1 0 0 1 1 0 0 0 0 0 1 1 1 1 1 0 1 0 1
 1 1 0 1 0 1 1 1 1 0 1 1 1 0 1 1 1 1 1 1 1 1 0 1 1 0 1 0 1 1 0 0 0 1 0 0 1
 1 1 1 1 1 1 1 1 1 1 0 1 1 1 0 1 0 1 1 1 1 1 1 1 1 1]
[0 0 0 0 0 1 0 1 0 1 1 0 0 1 1 0 1 0 0 0 1 0 1 0 0 0 1 1 1 0 0 1 0 1 1 0 0
 0 0 0 0 0 1 0 1 0 0 0 0 1 1 1 0 0 0 0 0 1 0 0 0 1 1 1 1 0 0 0 0 0 1 0 1 0
 0 0 1 0 1 0 0 0 0 1 0 0 0 1 0 0 0 0 0 0 0 0 1 0 0 1 0 1 0 0 1 1 1 0 1 1 0
 0 0 0 1 0 0 0 0 0 0 1 0 0 0 1 0 1 0 0 1 0 0 1 0 0 0]


We can see that the extra layer added more variability. Let's see if we can get a better accuracy:

In [12]:
multiLayerModel.train(trainSet, trainLabels , epochs = 100, learningRate=.01) 
loss2, guesses2 = multiLayerModel.evaluate(testSet,testLabels)

predicted2 = np.argmax(guesses2, axis=1)
trueTest = np.argmax(testLabels, axis=1)
accuracy2 = np.mean(predicted2 == trueTest) * 100

print("\n Model 2's Loss:", loss2)
print("Accuracy: {:.2f}%".format(accuracy2))
print("Predicted values:\n" , predicted2)
print("True test labels:\n ", trueTest)

loss for epoch:  10 : 0.07967712560314037
loss for epoch:  20 : 0.06443768278229951
loss for epoch:  30 : 0.057603562713047646
loss for epoch:  40 : 0.043352506452312554
loss for epoch:  50 : 0.03306733031952119
loss for epoch:  60 : 0.03382662055128359
loss for epoch:  70 : 0.02576288279154249
loss for epoch:  80 : 0.02553480721125658
loss for epoch:  90 : 0.017964572001106075
loss for epoch:  100 : 0.012802245228397867

 Model 2's Loss: 0.19848948866276955
Accuracy: 97.81%
Predicted values:
 [0 0 0 0 0 1 0 1 0 1 1 0 0 1 1 0 1 0 0 0 1 0 1 0 0 0 1 1 1 0 0 1 0 1 1 0 0
 0 0 0 0 0 1 0 1 0 0 0 0 1 1 1 0 0 0 0 1 1 0 0 0 1 1 1 1 0 0 0 0 0 0 0 1 0
 0 0 1 0 1 0 0 0 0 1 0 0 0 1 0 0 0 0 0 0 0 0 1 0 0 1 0 1 0 0 1 1 1 0 1 0 0
 0 0 0 1 0 0 0 0 0 0 1 0 0 0 1 0 1 0 0 1 0 0 1 0 0 0]
True test labels:
  [0 0 0 0 0 1 0 1 0 1 1 0 0 1 1 0 1 0 0 0 1 0 1 0 0 0 1 1 1 0 0 1 0 1 1 0 0
 0 0 0 0 0 1 0 1 0 0 0 0 1 1 1 0 0 0 0 0 1 0 0 0 1 1 1 1 0 0 0 0 0 1 0 1 0
 0 0 1 0 1 0 0 0 0 1 0 0 0 1 0 0 0 0 0 0 0 0 1 0 0 1

We can play around with some regularization techniques:

In [13]:
multiLayerModel2 = NeuralNetwork(
                inputDropout=.01,
                hidden2 = FullyConnectedLayer(numNodes=10,activation='ReLU',dropout=.1),
                output = FullyConnectedLayer(numNodes=2,activation='softmax')
                )
multiLayerModel2.train(trainSet, trainLabels , epochs = 1000, learningRate=.01, loss='AdamW', weightDecay=.001) 
loss2, guesses2 = multiLayerModel2.evaluate(testSet,testLabels)

predicted2 = np.argmax(guesses2, axis=1)
trueTest = np.argmax(testLabels, axis=1)
accuracy2 = np.mean(predicted2 == trueTest) * 100

print("\n Model 2's Loss:", loss2)
print("Accuracy: {:.2f}%".format(accuracy2))
print("Predicted values:\n" , predicted2)
print("True test labels:\n ", trueTest)

loss for epoch:  100 : 0.07118403531023085
loss for epoch:  200 : 0.03209212796737963
loss for epoch:  300 : 0.07158275560650544
loss for epoch:  400 : 0.023710853600024555
loss for epoch:  500 : 0.07100867813611793
loss for epoch:  600 : 0.06156461314423168
loss for epoch:  700 : 0.07743677374999897
loss for epoch:  800 : 0.03992193661393943
loss for epoch:  900 : 0.039324094557690065
loss for epoch:  1000 : 0.04819933432216001

 Model 2's Loss: 0.24210974031518742
Accuracy: 97.08%
Predicted values:
 [0 0 0 0 0 1 0 1 0 1 1 0 0 1 1 0 1 0 0 0 1 0 1 0 0 0 1 1 1 0 0 1 0 1 1 0 0
 0 0 0 0 0 1 0 1 0 0 0 0 1 1 1 0 0 0 0 1 1 0 0 1 1 1 1 1 0 0 0 0 0 1 0 1 0
 0 0 1 0 1 0 0 0 0 1 0 0 0 1 0 0 0 0 0 0 0 0 1 0 0 1 0 1 0 0 1 1 1 0 1 1 0
 0 0 0 1 0 0 0 0 0 0 1 0 0 0 1 0 1 0 0 0 0 0 0 0 0 0]
True test labels:
  [0 0 0 0 0 1 0 1 0 1 1 0 0 1 1 0 1 0 0 0 1 0 1 0 0 0 1 1 1 0 0 1 0 1 1 0 0
 0 0 0 0 0 1 0 1 0 0 0 0 1 1 1 0 0 0 0 0 1 0 0 0 1 1 1 1 0 0 0 0 0 1 0 1 0
 0 0 1 0 1 0 0 0 0 1 0 0 0 1 0 0 0 0 0 0 0 0

or a different activation, with more layers of differing nodes:

In [14]:
multiLayerModel3 = NeuralNetwork(
                inputDropout=.01,
                hidden1 = FullyConnectedLayer(numNodes=10,activation='tanh',dropout=.1),
                hidden2 = FullyConnectedLayer(numNodes=10,activation='tanh',dropout=.1),
                hidden3 = FullyConnectedLayer(numNodes=7,activation='tanh',dropout=.1),
                hidden4 = FullyConnectedLayer(numNodes=5,activation='tanh',dropout=.1),
                output = FullyConnectedLayer(numNodes=2,activation='softmax')
                )
multiLayerModel3.train(trainSet, trainLabels , epochs = 10090, learningRate=.0001, loss='AdamW', weightDecay=.1) 
loss2, guesses2 = multiLayerModel3.evaluate(testSet,testLabels)

predicted2 = np.argmax(guesses2, axis=1)
trueTest = np.argmax(testLabels, axis=1)
accuracy2 = np.mean(predicted2 == trueTest) * 100

print("\n Model 2's Loss:", loss2)
print("Accuracy: {:.2f}%".format(accuracy2))
print("Predicted values:\n" , predicted2)
print("True test labels:\n ", trueTest)

loss for epoch:  1009 : 0.09176907703504801
loss for epoch:  2018 : 0.061719116182119074
loss for epoch:  3027 : 0.07641179263352049
loss for epoch:  4036 : 0.05720140620517416
loss for epoch:  5045 : 0.08638890339198194
loss for epoch:  6054 : 0.0916573373993677
loss for epoch:  7063 : 0.04093082005058776
loss for epoch:  8072 : 0.042597118133121996
loss for epoch:  9081 : 0.023214693636770867
loss for epoch:  10090 : 0.0592108828641166

 Model 2's Loss: 0.0809297302755145
Accuracy: 96.35%
Predicted values:
 [0 0 0 0 0 1 0 1 0 1 1 0 0 1 1 0 1 0 0 0 1 0 1 0 0 0 1 1 1 0 0 1 0 1 1 0 0
 0 0 0 0 0 1 0 1 0 0 0 0 1 1 1 0 0 0 0 1 1 0 0 1 1 1 1 1 0 0 0 0 0 0 0 1 0
 0 0 1 0 1 0 0 0 0 1 0 0 0 1 0 0 0 0 0 0 0 0 1 0 0 1 0 1 0 0 1 1 1 0 1 1 0
 0 0 0 1 0 0 0 0 0 0 1 0 0 0 1 0 1 0 0 0 0 0 0 0 0 0]
True test labels:
  [0 0 0 0 0 1 0 1 0 1 1 0 0 1 1 0 1 0 0 0 1 0 1 0 0 0 1 1 1 0 0 1 0 1 1 0 0
 0 0 0 0 0 1 0 1 0 0 0 0 1 1 1 0 0 0 0 0 1 0 0 0 1 1 1 1 0 0 0 0 0 1 0 1 0
 0 0 1 0 1 0 0 0 0 1 0 0 0 1 0 0 0 0

### Regression

Let's take one of the features of our dataset, and see if we can predict its value:

In [15]:
newTrainSet = trainSet[:,:-1]
newTrainLabels = trainSet[:,-1:]
newTestSet = testSet[:,:-1]
newTestLabels = testSet[:,-1:]


Let's also use the entire set as the batch size for full gradient descent. We can specify a regression model as such:

In [21]:
multiLayerModel4 = NeuralNetwork(
                batchSize=len(newTrainSet),
                hidden1 = FullyConnectedLayer(numNodes=10,activation='sigmoid'),
                hidden2 = FullyConnectedLayer(numNodes=10,activation='sigmoid'),
                output = FullyConnectedLayer(numNodes=1,activation='mse')
                )

Let's see what our model predicts initially for the first value in our dataset:

In [22]:
guesses = multiLayerModel4.predict(newTestSet)
print("Test true Label:", newTestLabels[0])
print("Guess:",guesses[0])

Test true Label: [-0.35472957]
Guess: [-1.5362973]


We will now see what our model predicts after training:

In [23]:
multiLayerModel4.train(newTrainSet, newTrainLabels,epochs=100,learningRate=.01) 
guesses2 = multiLayerModel4.predict(newTestSet)
print("Test true Label:", newTestLabels[0])
print("Guess:",guesses2[0])

loss for epoch:  10 : 1.0558547563860399
loss for epoch:  20 : 0.7766654785231176
loss for epoch:  30 : 0.7469716364359363
loss for epoch:  40 : 0.8384351748352712
loss for epoch:  50 : 0.658487025089794
loss for epoch:  60 : 0.7493767981582926
loss for epoch:  70 : 0.8367609143865186
loss for epoch:  80 : 0.7444770863316438
loss for epoch:  90 : 0.6834451971105168
loss for epoch:  100 : 0.6225768136872939
Test true Label: [-0.35472957]
Guess: [-0.33379226]


**Note:** You can use both predict and evaluate for both classification and regression. For example:

In [24]:
loss2,guesses2 = multiLayerModel4.evaluate(newTestSet,newTestLabels)
print("Total Loss: ", loss)
print("Test true Label:", newTestLabels[0])
print("Guess:",guesses2[0])

Total Loss:  2.004762491639277
Test true Label: [-0.35472957]
Guess: [-0.33379226]


Much Closer!
**Note That these are the standardized values**. If you want to see the actual vs predicted, we can do the following with the Data handler class:

In [25]:
recombinedTest = np.c_[newTestSet,newTestLabels]
recombinedGuesses = np.c_[newTestSet,guesses2]

actualTest = DataHelper.unNormalize(recombinedTest, Standardizer)
actualGuess = DataHelper.unNormalize(recombinedGuesses, Standardizer)

print("Actual true value: " , actualTest[0,-1])
print("Guess Value: " , actualGuess[0,-1])

Actual true value:  1.0
Guess Value:  1.0364301399213287


# Conclusion
---

This was a short tutorial on how to use The Neural Network. When doing classification, do not forget to take the max of a row in order to get the correct prediction. Happy training!

# Other Features

---

### Custom Trained Handwritten Digit identifier: 

Try out drawer.py and see how well the model does that was trained in this neural network architecture with your own drawn numbers!