DATA PREPROCESSING

We will create our own example data set for use with Keras. The data simulates a clinical trial conducted on 2100 patients
aged 13 to 100, with half the patients under 65 and the other half 65 or older.
We want to predict how likely a patient is to experience side effects given their age.

In [40]:
import numpy as np
from random import randint 
from sklearn.utils import shuffle
from sklearn.preprocessing import MinMaxScaler

In [41]:
train_labels = []
train_samples =[]

CREATING RANDOM DATA

In [42]:
for i in range(50):
    #the 5% of younger individuals who did experience side effects
    random_younger = randint(13,64)
    train_samples.append(random_younger)
    train_labels.append(1)

    #the 5% of older individuals who did not experience side effects
    random_older = randint(65,100)
    train_samples.append(random_older)
    train_labels.append(0)

for i in range(1000):
    #the 95% of younger individuals who did not experience side effects
    random_younger = randint(13,64)
    train_samples.append(random_younger)
    train_labels.append(0)

    #the 95% of older individuals who did experience side effects
    random_older = randint(65,100)
    train_samples.append(random_older)
    train_labels.append(1)
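
The two loops above can be folded into a single helper, which also makes the group sizes easy to check. This is only a sketch: the name `make_samples` and its `seed` argument are assumptions added for illustration and are not part of the original notebook.

```python
from random import Random

def make_samples(n_minority, n_majority, seed=None):
    """Build (samples, labels) with the same scheme as the two loops above.

    make_samples and seed are hypothetical names used only in this sketch.
    """
    rng = Random(seed)
    samples, labels = [], []
    for _ in range(n_minority):
        # younger patient WITH side effects, older patient WITHOUT
        samples.append(rng.randint(13, 64))
        labels.append(1)
        samples.append(rng.randint(65, 100))
        labels.append(0)
    for _ in range(n_majority):
        # younger patient WITHOUT side effects, older patient WITH
        samples.append(rng.randint(13, 64))
        labels.append(0)
        samples.append(rng.randint(65, 100))
        labels.append(1)
    return samples, labels

samples, labels = make_samples(50, 1000, seed=0)
younger_with = sum(1 for age, label in zip(samples, labels) if age < 65 and label == 1)
older_with = sum(1 for age, label in zip(samples, labels) if age >= 65 and label == 1)
print(len(samples), younger_with, older_with)  # 2100 50 1000
```

The counts do not depend on the seed, because only the ages are random; the labels are fixed by which loop produced the sample.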

In [43]:
#printing random samples that we created above(only first 5)
print(train_samples[:5])
#print total train_samples
print(len(train_samples))

[34, 75, 49, 85, 30]
2100


In [44]:
#printing train labels that we created above(only first 5)
print(train_labels[:5])
#print total train_labels
print(len(train_labels))

[1, 0, 1, 0, 1]
2100


In [45]:
#convert train_labels and train_samples into NumPy arrays
train_labels = np.array(train_labels)
train_samples = np.array(train_samples)
#shuffle both arrays together to remove the order imposed by the loops above
train_labels, train_samples = shuffle(train_labels, train_samples)

In [46]:
#rescale the ages to the range 0 to 1; reshape(-1,1) is needed because the scaler expects a 2D array
scaler = MinMaxScaler(feature_range=(0,1))
scaled_train_samples= scaler.fit_transform(train_samples.reshape(-1,1))
scaled_train_samples[:5]

array([[0.44827586],
       [0.91954023],
       [0.24137931],
       [0.44827586],
       [0.45977011]])
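
Min-max scaling maps each age x to (x - min) / (max - min). A minimal sketch of that formula by hand is given here; the four ages are made up for illustration, and it assumes the smallest and largest ages in the data are 13 and 100, which is very likely with 2100 random samples but not guaranteed.

```python
import numpy as np

# Illustrative ages only; they are not taken from the notebook's random data.
ages = np.array([13, 52, 93, 100], dtype=float)

# Assumption: the data's minimum is 13 and its maximum is 100.
age_min, age_max = ages.min(), ages.max()
scaled = (ages - age_min) / (age_max - age_min)

for age, value in zip(ages, scaled):
    print(int(age), round(float(value), 8))
```

Under that assumption an age of 52 maps to 39/87, about 0.44827586, and an age of 93 maps to 80/87, about 0.91954023, which are values of the kind shown in the output above.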

CREATING AN ARTIFICIAL NEURAL NETWORK

In [47]:
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras.models import Sequential 
from tensorflow.keras.layers import Activation, Dense
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.metrics import categorical_crossentropy

In [48]:
model = Sequential([
    Dense(units = 16, input_shape=(1,), activation = 'relu'),
    Dense(units = 32 , activation = 'relu'),
    Dense(units = 2, activation = 'softmax')
])

In [49]:
model.summary()

Model: "sequential_1"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 dense_3 (Dense)             (None, 16)                32        
                                                                 
 dense_4 (Dense)             (None, 32)                544       
                                                                 
 dense_5 (Dense)             (None, 2)                 66        
                                                                 
Total params: 642
Trainable params: 642
Non-trainable params: 0
_________________________________________________________________
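
The parameter counts in the summary can be checked by hand: a Dense layer has one weight per input-unit pair plus one bias per unit. The helper name `dense_params` is an assumption used only for this sketch.

```python
def dense_params(n_inputs, n_units):
    """Weights (n_inputs * n_units) plus one bias per unit."""
    return n_inputs * n_units + n_units

# (inputs, units) for the three Dense layers of the model above
layer_shapes = [(1, 16), (16, 32), (32, 2)]
counts = [dense_params(n_in, n_out) for n_in, n_out in layer_shapes]

print(counts)       # [32, 544, 66]
print(sum(counts))  # 642
```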


MODEL TRAINING

In [50]:
model.compile(optimizer= Adam(learning_rate = 0.001), loss = 'sparse_categorical_crossentropy', metrics=['accuracy'])
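
The loss 'sparse_categorical_crossentropy' takes integer labels (0 or 1 here) rather than one-hot vectors, and averages the negative log of the probability the model assigned to the true class. A rough NumPy sketch of that idea follows; the function name `sparse_cce` and the example probabilities are made up, and the numerical safeguards Keras applies (such as clipping probabilities away from 0) are left out.

```python
import numpy as np

def sparse_cce(probs, labels):
    """Mean of -log(p[true class]); a simplified sketch, not the Keras code."""
    probs = np.asarray(probs, dtype=float)
    labels = np.asarray(labels, dtype=int)
    # pick, for every row, the probability at the index given by its label
    picked = probs[np.arange(len(labels)), labels]
    return float(-np.log(picked).mean())

# Made-up softmax outputs for two samples and their integer labels
example_probs = [[0.9, 0.1],
                 [0.2, 0.8]]
example_labels = [0, 1]

loss = sparse_cce(example_probs, example_labels)
print(loss)
```

A confident correct prediction contributes a value near 0, while a confident wrong one contributes a large value.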

In [51]:
model.fit(x=scaled_train_samples, y=train_labels, validation_split = 0.1, batch_size=10, shuffle=True, epochs=30, verbose=2)

Epoch 1/30
189/189 - 2s - loss: 0.5870 - accuracy: 0.7122 - val_loss: 0.4737 - val_accuracy: 0.8333 - 2s/epoch - 9ms/step
Epoch 2/30
189/189 - 0s - loss: 0.3490 - accuracy: 0.9021 - val_loss: 0.3151 - val_accuracy: 0.9143 - 375ms/epoch - 2ms/step
Epoch 3/30
189/189 - 0s - loss: 0.2690 - accuracy: 0.9270 - val_loss: 0.3078 - val_accuracy: 0.8952 - 366ms/epoch - 2ms/step
Epoch 4/30
189/189 - 0s - loss: 0.2540 - accuracy: 0.9354 - val_loss: 0.2984 - val_accuracy: 0.9286 - 288ms/epoch - 2ms/step
Epoch 5/30
189/189 - 0s - loss: 0.2454 - accuracy: 0.9370 - val_loss: 0.2985 - val_accuracy: 0.9143 - 318ms/epoch - 2ms/step
Epoch 6/30
189/189 - 0s - loss: 0.2422 - accuracy: 0.9450 - val_loss: 0.2910 - val_accuracy: 0.9381 - 301ms/epoch - 2ms/step
Epoch 7/30
189/189 - 0s - loss: 0.2390 - accuracy: 0.9460 - val_loss: 0.2920 - val_accuracy: 0.9143 - 299ms/epoch - 2ms/step
Epoch 8/30
189/189 - 0s - loss: 0.2371 - accuracy: 0.9392 - val_loss: 0.2863 - val_accuracy: 0.9286 - 300ms/epoch - 2ms/step
Epo

<keras.callbacks.History at 0x2bfb3f0dc70>
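
The 189/189 in the log is the number of batches per epoch. With validation_split = 0.1, Keras holds out the last 10% of the supplied samples (taken before shuffling) for validation and trains on the rest. The arithmetic can be sketched as below; the variable names are chosen for this sketch.

```python
import math

n_samples = 2100
validation_split = 0.1
batch_size = 10

n_val = round(n_samples * validation_split)       # samples held out for validation
n_train = n_samples - n_val                       # samples used for weight updates
steps_per_epoch = math.ceil(n_train / batch_size)

print(n_val, n_train, steps_per_epoch)  # 210 1890 189
```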

BUILDING A TEST SET AND PREDICTING

In [52]:
test_labels = []
test_samples = []

In [53]:
for i in range(50):
    #the 20% of younger individuals who did experience side effects (50 of 250)
    random_younger = randint(13,64)
    test_samples.append(random_younger)
    test_labels.append(1)

    #the 20% of older individuals who did not experience side effects (50 of 250)
    random_older = randint(65,100)
    test_samples.append(random_older)
    test_labels.append(0)

for i in range(200):
    #the 80% of younger individuals who did not experience side effects (200 of 250)
    random_younger = randint(13,64)
    test_samples.append(random_younger)
    test_labels.append(0)

    #the 80% of older individuals who did experience side effects (200 of 250)
    random_older = randint(65,100)
    test_samples.append(random_older)
    test_labels.append(1)
    
test_labels = np.array(test_labels)
test_samples = np.array(test_samples)
test_labels, test_samples = shuffle(test_labels, test_samples)
#reuse the scaler fitted on the training data instead of refitting it on the test set
scaled_test_samples= scaler.transform(test_samples.reshape(-1,1))
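
Whether test ages are scaled with the range learned from the training data or with a range refitted on the test data can change the numbers the model sees. A small NumPy sketch of the difference is given here; both age arrays are made up for illustration and the scaling is done by hand rather than with MinMaxScaler.

```python
import numpy as np

# Made-up ages, chosen so the test range is narrower than the training range
train_ages = np.array([13, 40, 65, 100], dtype=float)
test_ages = np.array([20, 70, 95], dtype=float)

# Range learned from the training data only
lo, hi = train_ages.min(), train_ages.max()
scaled_with_train_range = (test_ages - lo) / (hi - lo)

# Range refitted on the test data
t_lo, t_hi = test_ages.min(), test_ages.max()
scaled_with_test_range = (test_ages - t_lo) / (t_hi - t_lo)

print(scaled_with_train_range)
print(scaled_with_test_range)
```

With the training range, age 20 maps to 7/87; with the refitted range it maps to 0, so the same patient would look different to the model. When the test set happens to span the same minimum and maximum as the training set, the two approaches give the same result.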
    

In [54]:
predictions = model.predict( x=scaled_test_samples, batch_size=10 , verbose=0)

In [55]:
print(predictions[:5])

[[0.06581298 0.934187  ]
 [0.9341335  0.06586648]
 [0.9665808  0.03341918]
 [0.9673518  0.03264814]
 [0.9487827  0.05121731]]


In [56]:
#pick the index of the most probable class (0 or 1) for each prediction
rounded_predictions = np.argmax(predictions, axis= -1)
rounded_predictions[:5]

array([1, 0, 0, 0, 0], dtype=int64)
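
Each row of predictions holds the softmax probabilities for class 0 and class 1, and argmax returns the column with the larger value. The sketch below reuses the five rows printed above; the `confidence` variable is an addition for illustration.

```python
import numpy as np

# The first five prediction rows printed above
predictions = np.array([
    [0.06581298, 0.934187],
    [0.9341335, 0.06586648],
    [0.9665808, 0.03341918],
    [0.9673518, 0.03264814],
    [0.9487827, 0.05121731],
])

predicted_class = np.argmax(predictions, axis=-1)  # column index of the larger probability
confidence = predictions.max(axis=-1)              # probability of the chosen class

print(predicted_class)  # [1 0 0 0 0]
print(confidence)
```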

CONFUSION MATRIX FOR ACCURACY

In [57]:
%matplotlib inline
from sklearn.metrics import confusion_matrix
import itertools
import matplotlib.pyplot as plt

In [64]:
cm = confusion_matrix(y_true = test_labels, y_pred = rounded_predictions)
print(cm)

[[200  50]
 [ 50 200]]
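
In scikit-learn's confusion matrix the rows are the true labels and the columns are the predicted labels, so the diagonal holds the correct predictions. A short sketch of reading accuracy, precision and recall off the matrix printed above:

```python
import numpy as np

# The confusion matrix printed above: rows = true label, columns = predicted label
cm = np.array([[200, 50],
               [50, 200]])

tn, fp, fn, tp = cm.ravel()

accuracy = (tp + tn) / cm.sum()
precision = tp / (tp + fp)  # of the predicted side-effect cases, how many were real
recall = tp / (tp + fn)     # of the real side-effect cases, how many were found

print(accuracy, precision, recall)  # 0.8 0.8 0.8
```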


In [59]:
def plot_confusion_matrix(cm, classes,
                          normalize = False,
                          title = 'Confusion Matrix',
                          cmap = plt.cm.Blues):
    """ THIS FUNCTION PRINTS AND PLOTS THE CONFUSION MATRIX,
        NORMALIZATION CAN BE APPLIED BY SETTING normalize = True """
    plt.imshow(cm, interpolation='nearest', cmap=cmap)
    plt.title(title)
    plt.colorbar()
    tick_marks = np.arange(len(classes))
    plt.xticks(tick_marks, classes, rotation=45)
    plt.yticks(tick_marks, classes)
    
    if normalize:
        cm=cm.astype('float')/cm.sum(axis=1)[:, np.newaxis]
        print("Normalized Confusion Matrix")
    else:
        print('Confusion matrix, without normalization')
    print(cm)
    
    thresh = cm.max() / 2.
    for i, j in itertools.product(range(cm.shape[0]), range(cm.shape[1])):
        plt.text(j, i, cm[i, j],
        horizontalalignment="center",
        color= "white" if cm[i, j] > thresh else "black" )
    plt.tight_layout()
    plt.ylabel('True label')
    plt.xlabel('Predicted label')
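
The normalize branch of the function divides every row by its row total, so each row then shows the fraction of that true class assigned to each predicted class. A sketch of that step on its own, using the matrix printed earlier:

```python
import numpy as np

# The confusion matrix printed earlier: rows = true label, columns = predicted label
cm = np.array([[200, 50],
               [50, 200]])

# Same expression as in the function: divide each row by its own sum
row_totals = cm.sum(axis=1)[:, np.newaxis]
cm_normalized = cm.astype('float') / row_totals

print(cm_normalized)
```

Each row of the result sums to 1; here both rows become 0.8 and 0.2, mirrored across the diagonal.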