In [1]:
import numpy as np
from random import randint
from sklearn.utils import shuffle
from sklearn.preprocessing import MinMaxScaler

In [2]:
train_labels = []
train_samples = []

-As motivation for this data, let’s suppose that an experimental drug was tested on individuals ranging from age -13 to 100 in a clinical trial. The trial had 2100 participants. Half of the participants were under 65 years -old, and the other half was 65 years of age or older.

The trial showed that around 95% of patients 65 or older experienced side effects from the drug, and around 95% of patients under 65 experienced no side effects, generally showing that elderly individuals were more likely to experience side effects.

Ultimately, we want to build a model to tell us whether or not a patient will experience side effects solely based on the patient's age. The judgement of the model will be based on the training data.

Note that with the simplicity of the data along with the conclusions drawn from it, a neural network may be overkill, but understand this is just to first get introduced to working with data for deep learning, and later, we'll be making use of more advanced data sets.

The block of code below shows how to generate this dummy data.



In [3]:
   for i in range(60):
    # The ~5% of younger individuals who did experience side effects
    random_younger = randint(13,64)
    train_samples.append(random_younger)
    train_labels.append(1)
#     print(train_samples)

    # The ~5% of older individuals who did not experience side effects
    random_older = randint(65,100)
    train_samples.append(random_older)
    train_labels.append(0)

for i in range(1000):
    # The ~95% of younger individuals who did not experience side effects
    random_younger = randint(13,64)
    train_samples.append(random_younger)
    train_labels.append(0)

    # The ~95% of older individuals who did experience side effects
    random_older = randint(65,100)
    train_samples.append(random_older)
    train_labels.append(1)
    

Data Processing
We now convert both lists into numpy arrays due to what we discussed the fit() function expects, and we then shuffle the arrays to remove any order that was imposed on the data during the creation process.

In [4]:
train_labels = np.array(train_labels)
train_samples = np.array(train_samples)
train_labels, train_samples = shuffle(train_labels, train_samples)

In this form, we now have the ability to pass the data to the model because it is now in the required format, however, before doing that, we'll first scale the data down to a range from 0 to 1.

We'll use scikit-learn’s MinMaxScaler class to scale all of the data down from a scale ranging from 13 to 100 to be on a scale from 0 to 1.

In [5]:
scaler = MinMaxScaler(feature_range=(0,1))
scaled_train_samples = scaler.fit_transform(train_samples.reshape(-1,1))

In [6]:
type(scaled_train_samples)

numpy.ndarray

In [7]:
scaled_train_samples.shape

(2120, 1)

In [8]:
train_labels.shape

(2120,)

# Create An Artificial Neural Network With TensorFlow's Keras API

In [9]:
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Activation, Dense
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.optimizers import Adadelta
from tensorflow.keras.metrics import categorical_crossentropy

In [10]:
model = Sequential([
    Dense(units=32, input_shape=(1,), activation='relu'),
    Dense(units=64, activation='relu'),
#     Dense(units=2, activation='sigmoid')
    Dense(units=4, activation='softmax')
])    

model is an instance of a Sequential object. A tf.keras.Sequential model is a linear stack of layers. It accepts a list, and each element in the list should be a layer.

As you can see, we have passed a list of layers to the Sequential constructor. Let's go through each of the layers in this list now.
Note, if you don’t explicitly set an activation function, then Keras will use the linear activation function.

First Hidden Layer Our first layer is a Dense layer. This type of layer is our standard fully-connected or densely-connected neural network layer. The first required parameter that the Dense layer expects is the number of neurons or units the layer has, and we’re arbitrarily setting this to 32.

Additionally, the model needs to know the shape of the input data. For this reason, we specify the shape of the input data in the first hidden layer in the model (and only this layer). The parameter called input_shape is how we specify this.

As discussed, we’ll be training our network on the data that we generated and processed in the previous episode, and recall, this data is one-dimensional. The input_shape parameter expects a tuple of integers that matches the shape of the input data, so we correspondingly specify (1,) as the input_shape of our one-dimensional data.

You can think of the way we specify the input_shape here as acting as an implicit input layer. The input layer of a neural network is the underlying raw data itself, therefore we don't create an explicit input layer. This first Dense layer that we're working with now is actually the first hidden layer.

Lastly, an optional parameter that we’ll set for the Dense layer is the activation function to use after this layer. We’ll use the popular choice of relu.

2nd hidden layer also the same as the above menioned Output Layer Lastly, we specify the output layer. This layer is also a Dense layer, and it will have 2 neurons. This is because we have two possible outputs: either a patient experienced side effects, or the patient did not experience side effects.

This time, the activation function we’ll use is softmax, which will give us a probability distribution among the possible outputs.

In [11]:
model.summary()

Model: "sequential"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 dense (Dense)               (None, 32)                64        
                                                                 
 dense_1 (Dense)             (None, 64)                2112      
                                                                 
 dense_2 (Dense)             (None, 4)                 260       
                                                                 
Total params: 2,436
Trainable params: 2,436
Non-trainable params: 0
_________________________________________________________________


In [12]:
model.compile(optimizer=Adam(learning_rate=0000.1),loss='sparse_categorical_crossentropy', metrics=['accuracy'])

This function configures the model for training and expects a number of parameters. First, we specify the optimizer Adam. Adam accepts an optional parameter learning_rate, which we’ll set to 0.0001.

The next parameter we specify is loss. We’ll be using sparse_categorical_crossentropy, given that our labels are in integer format.

Note that when we have only two classes, we could instead configure our output layer to have only one output, rather than two, and use binary_crossentropy as our loss, rather than categorical_crossentropy. Both options work equally well and achieve the exact same result.

With binary_crossentropy, however, the last layer would need to use sigmoid, rather than softmax, as its activation function.

Moving on, the last parameter we specify in compile() is metrics. This parameter expects a list of metrics that we’d like to be evaluated by the model during training and testing. We’ll set this to a list that contains the string ‘accuracy’.

In [13]:
model.fit(x=scaled_train_samples, y=train_labels, batch_size=10, epochs=40, verbose=2)

Epoch 1/40
212/212 - 1s - loss: 0.3603 - accuracy: 0.8575 - 586ms/epoch - 3ms/step
Epoch 2/40
212/212 - 0s - loss: 0.2978 - accuracy: 0.9071 - 179ms/epoch - 842us/step
Epoch 3/40
212/212 - 0s - loss: 0.3022 - accuracy: 0.9052 - 184ms/epoch - 866us/step
Epoch 4/40
212/212 - 0s - loss: 0.2947 - accuracy: 0.9019 - 183ms/epoch - 861us/step
Epoch 5/40
212/212 - 0s - loss: 0.3037 - accuracy: 0.9099 - 186ms/epoch - 875us/step
Epoch 6/40
212/212 - 0s - loss: 0.2920 - accuracy: 0.9075 - 179ms/epoch - 842us/step
Epoch 7/40
212/212 - 0s - loss: 0.2735 - accuracy: 0.9160 - 182ms/epoch - 856us/step
Epoch 8/40
212/212 - 0s - loss: 0.2677 - accuracy: 0.9231 - 176ms/epoch - 828us/step
Epoch 9/40
212/212 - 0s - loss: 0.2561 - accuracy: 0.9236 - 183ms/epoch - 861us/step
Epoch 10/40
212/212 - 0s - loss: 0.2595 - accuracy: 0.9179 - 178ms/epoch - 837us/step
Epoch 11/40
212/212 - 0s - loss: 0.2574 - accuracy: 0.9241 - 181ms/epoch - 851us/step
Epoch 12/40
212/212 - 0s - loss: 0.2803 - accuracy: 0.9123 - 187m

<keras.callbacks.History at 0x124bb9d8fa0>

we specify verbose=2. This just specifies how much output to the console we want to see during each epoch of training. The verbosity levels range from 0 to 2, so we’re getting the most verbose output.

When we call fit() on the model, the model trains, and we get this output.

What Is A Validation Set?

Recall that we previously built a training set on which we trained our model. With each epoch that our model is trained, the model will continue to learn the features and characteristics of the data in this training set.

The hope is that later we can take this model, apply it to new data, and have the model accurately predict on data that it hasn’t seen before based solely on what it learned from the training set.

Now, let’s discuss where the addition of a validation set comes into play.

Before training begins, we can choose to remove a portion of the training set and place it in a validation set. Then, during training, the model will train only on the training set, and it will validate by evaluating the data in the validation set.

Essentially, the model is learning the features of the data in the training set, taking what it's learned from this data, and then predicting on the validation set. During each epoch, we will see not only the loss and accuracy results for the training set, but also for the validation set.

This allows us to see how well the model is generalizing on data it wasn’t trained on because, recall, the validation data should not be part of the training data.

This also helps us see whether or not the model is overfitting. Overfitting occurs when the model only learns the specifics of the training data and is unable to generalize well on data that it wasn’t trained on.

In [14]:
model.fit(x=scaled_train_samples, y=train_labels,validation_split = 0.2, batch_size=10, epochs=40, verbose=2)

Epoch 1/40
170/170 - 1s - loss: 0.2523 - accuracy: 0.9316 - val_loss: 0.2494 - val_accuracy: 0.9269 - 623ms/epoch - 4ms/step
Epoch 2/40
170/170 - 0s - loss: 0.2383 - accuracy: 0.9328 - val_loss: 0.3970 - val_accuracy: 0.8821 - 265ms/epoch - 2ms/step
Epoch 3/40
170/170 - 0s - loss: 0.2516 - accuracy: 0.9334 - val_loss: 0.3472 - val_accuracy: 0.9033 - 272ms/epoch - 2ms/step
Epoch 4/40
170/170 - 0s - loss: 0.2393 - accuracy: 0.9340 - val_loss: 0.2525 - val_accuracy: 0.9151 - 266ms/epoch - 2ms/step
Epoch 5/40
170/170 - 0s - loss: 0.2413 - accuracy: 0.9328 - val_loss: 0.2630 - val_accuracy: 0.9363 - 264ms/epoch - 2ms/step
Epoch 6/40
170/170 - 0s - loss: 0.2472 - accuracy: 0.9322 - val_loss: 0.2636 - val_accuracy: 0.9269 - 266ms/epoch - 2ms/step
Epoch 7/40
170/170 - 0s - loss: 0.2311 - accuracy: 0.9346 - val_loss: 0.2823 - val_accuracy: 0.9057 - 346ms/epoch - 2ms/step
Epoch 8/40
170/170 - 0s - loss: 0.2375 - accuracy: 0.9357 - val_loss: 0.2691 - val_accuracy: 0.9127 - 266ms/epoch - 2ms/step


<keras.callbacks.History at 0x124bbc96b20>

# Neural Network Predictions With TensorFlow's Keras AP

We’ll create a test set in the same fashion for which we created the training set. In general, the test set should always be processed in the same way as the training set.

We won’t go step-by-step over the code that generates and processes the test data below, as it has already been covered in detail where we generated the training data,

In [15]:
test_labels =  []
test_samples = []

for i in range(10):
    # The 5% of younger individuals who did experience side effects
    random_younger = randint(13,64)
    test_samples.append(random_younger)
    test_labels.append(1)
    
    # The 5% of older individuals who did not experience side effects
    random_older = randint(65,100)
    test_samples.append(random_older)
    test_labels.append(0)

for i in range(200):
    # The 95% of younger individuals who did not experience side effects
    random_younger = randint(13,64)
    test_samples.append(random_younger)
    test_labels.append(0)
    
    # The 95% of older individuals who did experience side effects
    random_older = randint(65,100)
    test_samples.append(random_older)
    test_labels.append(1)

In [16]:
test_labales =np.array(test_labels)
test_samples =np.array(test_samples)
test_labales,test_samples=shuffle(test_labales, test_samples)

In [17]:
scaled_test_samples = scaler.fit_transform(train_samples.reshape(-1,1))

To this function, we pass in the test samples x, specify a batch_size, and specify which level of verbosity we want from log messages during prediction generation. The output from the predictions won't be relevant for us, so we're setting verbose=0 for no output.

Note that, unlike with training and validation sets, we do not pass the labels of the test set to the model during the inference stage.

To see what the model's predictions look like, we can iterate over them and print them out.

In [24]:
type(predictions)

numpy.ndarray

In [20]:
predictions = model.predict(x=scaled_test_samples, batch_size=10,verbose=0)

In [21]:
for i in predictions:
    print(i)

[7.7140898e-02 9.2285907e-01 5.7489494e-08 5.4387307e-08]
[7.7140898e-02 9.2285907e-01 5.7489494e-08 5.4387307e-08]
[7.7140898e-02 9.2285907e-01 5.7489494e-08 5.4387307e-08]
[9.6879399e-01 3.1206056e-02 3.6207981e-22 6.5617480e-22]
[9.6879399e-01 3.1206056e-02 3.6207981e-22 6.5617480e-22]
[7.7140898e-02 9.2285907e-01 5.7489494e-08 5.4387307e-08]
[9.6879399e-01 3.1206056e-02 3.6207981e-22 6.5617480e-22]
[7.7140898e-02 9.2285907e-01 5.7489494e-08 5.4387307e-08]
[7.7140898e-02 9.2285907e-01 5.7489494e-08 5.4387307e-08]
[9.6879399e-01 3.1206056e-02 3.6207981e-22 6.5617480e-22]
[9.6879399e-01 3.1206056e-02 3.6207981e-22 6.5617480e-22]
[7.7140898e-02 9.2285907e-01 5.7489494e-08 5.4387307e-08]
[7.7140898e-02 9.2285907e-01 5.7489494e-08 5.4387307e-08]
[7.7140898e-02 9.2285907e-01 5.7489494e-08 5.4387307e-08]
[7.7140898e-02 9.2285907e-01 5.7489494e-08 5.4387307e-08]
[7.7140898e-02 9.2285907e-01 5.7489494e-08 5.4387307e-08]
[9.6879399e-01 3.1206056e-02 3.6207981e-22 6.5617480e-22]
[9.6879399e-01

In [22]:
rounded_predictions= np.argmax(predictions,axis=1)

In [23]:
for i in rounded_predictions:
    print(i)

1
1
1
0
0
1
0
1
1
0
0
1
1
1
1
1
0
0
1
0
1
1
0
1
0
1
0
1
0
0
0
0
0
1
0
0
0
1
0
1
0
0
0
0
1
0
0
1
1
0
1
0
1
1
1
1
0
0
1
1
0
1
1
0
0
0
0
1
1
1
0
1
0
0
0
1
1
1
1
0
1
0
0
0
0
0
1
1
1
1
1
1
0
0
0
1
0
0
0
1
0
1
0
0
0
1
1
1
0
0
1
1
0
0
0
1
1
1
0
1
0
1
1
0
1
1
1
0
0
0
0
0
1
0
0
0
1
1
0
0
0
0
1
0
1
0
1
1
1
0
1
1
0
1
1
1
1
0
1
1
0
0
0
1
1
1
0
1
0
1
0
1
1
0
1
1
1
0
0
0
1
0
0
0
1
1
1
0
1
0
1
0
0
0
1
0
1
1
1
0
1
1
1
0
1
0
0
1
1
1
1
1
1
0
0
1
0
0
0
1
0
0
0
0
0
0
1
1
1
1
0
1
0
1
0
1
0
1
0
1
1
0
0
1
1
0
1
0
1
0
0
0
0
0
0
0
0
0
1
0
0
1
0
0
0
1
0
1
0
0
0
0
1
1
0
0
1
1
0
1
0
0
1
0
1
1
0
0
0
0
0
1
1
0
1
0
1
1
0
1
1
0
1
1
1
1
0
0
1
1
1
1
1
1
1
1
1
1
0
1
0
0
1
1
0
1
0
1
0
0
0
1
0
0
0
0
0
0
0
0
1
1
0
1
0
0
1
0
1
1
0
0
0
1
1
1
1
0
1
0
0
0
0
0
1
1
1
0
1
1
0
0
0
0
0
0
1
1
1
0
1
1
0
1
1
1
1
0
1
1
0
1
1
1
0
1
0
1
1
1
0
1
0
1
0
0
0
0
1
1
0
0
0
0
1
0
0
1
1
0
0
1
0
1
1
1
1
1
0
1
0
1
1
1
0
1
1
1
0
1
0
1
0
1
1
0
0
0
0
1
1
1
0
1
0
1
1
0
1
0
1
0
0
0
0
0
0
0
0
0
1
1
1
1
1
1
1
1
1
1
0
1
0
1
0
1
0
0
1
1
1
1
0
1
0
0
1
0
1
1


From the printed prediction results, we can observe the underlying predictions from the model, however, we cannot judge how accurate these predictions are just by looking at the predicted output.

If we have corresponding labels for the test set, (for which, in this case, we do), then we can compare these true labels to the predicted labels to judge the accuracy of the model's evaluations. We'll see how to visualize this using a tool called a confusion matrix

Create A Confusion Matrix For Neural Network Predictions
We’ll now demonstrate how to create a confusion matrix, which will aid us in being able to visually observe how well a neural network is predicting during inference.

We’ll continue working with the predictions we obtained from the tf.keras.Sequential model

As we showed how to use a trained model for inference on new data in a test set it hasn’t seen before. Also mentioned before, we had the labels for the test set, but we didn’t provide these labels to the network.

Additionally, we were able to see the values that the model was predicting for each of the samples in the test set by just observing the predictions themselves.