### Data Preparation and Processing

- Before training, the data must be converted into a format that the **fit function** accepts. The accepted formats are listed in the API docs for `fit`: https://www.tensorflow.org/api_docs/python/tf/keras/Sequential#fit
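
As a minimal sketch of one format `fit` accepts, the snippet below turns plain Python lists into NumPy arrays: a 2-D feature array with one column and a 1-D label array. The values in `ages` and `labels` are hypothetical and only illustrate the shapes.

```python
import numpy as np

# Hypothetical raw data: one age per participant and a 0/1 label.
ages = [25, 70, 41, 88]
labels = [0, 1, 0, 1]

# Features as a column vector of shape (n_samples, 1), labels of shape (n_samples,).
x = np.array(ages, dtype=np.float32).reshape(-1, 1)
y = np.array(labels, dtype=np.int64)

print(x.shape, y.shape)  # (4, 1) (4,)
```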




In [2]:
import numpy as np
from random import randint
from sklearn.utils import shuffle
from sklearn.preprocessing import MinMaxScaler

In [3]:
train_labels = []
train_samples = []


Example data:  
    - An experimental drug was tested on individuals aged 13 to 100 in a clinical trial.  
    - The trial had 2100 participants. Half were under 65 years old, half were 65 years or older.  
    - 95% of patients 65 years or older experienced side effects.  
    - 95% of patients under 65 years experienced no side effects.  
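
The arithmetic behind these numbers can be checked with a short sketch. It assumes the two age groups are "under 65" and "65 or older", and it uses the loop counts from the generation code in this notebook (50 and 1000 per group), which give roughly 5% and 95% rather than the exact percentages.

```python
total_participants = 2100
per_group = total_participants // 2   # 1050 participants in each age group

# Loop counts used by the dataset generation code in this notebook.
outliers_per_group = 50               # the "unexpected" outcome in each group
typical_per_group = 1000              # the "expected" outcome in each group

outlier_share = outliers_per_group / per_group
typical_share = typical_per_group / per_group

print(per_group)                      # 1050
print(round(outlier_share, 4))        # 0.0476
print(round(typical_share, 4))        # 0.9524
```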

# Dataset generation



In [4]:
# Labels: 1 = experienced side effects, 0 = no side effects.
# randint includes both endpoints, so "under 65" is drawn from 13 to 64.
for _ in range(50):
    # The ~5% of patients under 65 who experienced side effects.
    random_age = randint(13, 64)
    train_samples.append(random_age)
    train_labels.append(1)

    # The ~5% of patients 65 or older who experienced no side effects.
    random_age = randint(65, 100)
    train_samples.append(random_age)
    train_labels.append(0)

for _ in range(1000):
    # The ~95% of patients under 65 who experienced no side effects.
    random_age = randint(13, 64)
    train_samples.append(random_age)
    train_labels.append(0)

    # The ~95% of patients 65 or older who experienced side effects.
    random_age = randint(65, 100)
    train_samples.append(random_age)
    train_labels.append(1)
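
The same generation logic is needed again later for the test set, so it could be wrapped in a helper. The sketch below is one possible way to do that and is not part of the original notebook: the names `make_trial_data`, `n_outliers`, `n_typical` and `seed` are hypothetical, and it assumes "under 65" means ages 13 to 64 inclusive.

```python
import random

def make_trial_data(n_outliers, n_typical, seed=None):
    """Return (samples, labels) lists for the simulated drug trial."""
    rng = random.Random(seed)  # a private generator keeps runs reproducible
    samples, labels = [], []
    # Each tuple: (lowest age, highest age, label, number of samples).
    groups = [
        (13, 64, 1, n_outliers),   # under 65 with side effects
        (65, 100, 0, n_outliers),  # 65 or older without side effects
        (13, 64, 0, n_typical),    # under 65 without side effects
        (65, 100, 1, n_typical),   # 65 or older with side effects
    ]
    for low, high, label, count in groups:
        for _ in range(count):
            samples.append(rng.randint(low, high))
            labels.append(label)
    return samples, labels

samples, labels = make_trial_data(n_outliers=50, n_typical=1000, seed=0)
print(len(samples), sum(labels))  # 2100 1050
```

Unlike the loops above, this version appends the groups one after another instead of interleaving them, so shuffling before training matters even more.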

In [5]:
# Convert the list into np array
train_labels = np.array(train_labels)
train_samples = np.array(train_samples)

# shuffle train_labels and train_samples
(train_labels, train_samples) = shuffle(train_labels, train_samples)

In [6]:
scaler = MinMaxScaler(feature_range=(0,1))
scaled_train_samples = scaler.fit_transform(train_samples.reshape(-1,1))
scaled_train_samples

array([[0.27586207],
       [0.95402299],
       [0.12643678],
       ...,
       [0.13793103],
       [0.93103448],
       [0.72413793]])
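
For a feature range of (0, 1), min-max scaling computes `(x - min) / (max - min)` per feature. The NumPy sketch below reproduces that by hand on a few hypothetical ages. It assumes the data spans the full range of 13 to 100, in which case the first two values printed above (0.27586207 and 0.95402299) would correspond to ages 37 and 96.

```python
import numpy as np

# Hypothetical ages that include the assumed minimum (13) and maximum (100).
ages = np.array([13, 37, 96, 100], dtype=np.float64)

age_min, age_max = ages.min(), ages.max()
scaled = (ages - age_min) / (age_max - age_min)  # maps 13 -> 0.0 and 100 -> 1.0

print(np.round(scaled, 8))
```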

## tf.keras Sequential Model

In [7]:
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Activation
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.metrics import categorical_crossentropy

In [8]:
model = Sequential([
    # Each sample is a single scaled age, so input_shape is (1,). The number of units (16)
    # is an arbitrary choice; the activation function is ReLU (rectified linear unit).
    Dense(units=16, input_shape=(1,), activation='relu'),
    Dense(units=32, activation='relu'),
    Dense(units=2, activation='softmax')  # softmax converts a vector of numbers into a vector of probabilities
])
model.summary()

Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
dense (Dense)                (None, 16)                32        
_________________________________________________________________
dense_1 (Dense)              (None, 32)                544       
_________________________________________________________________
dense_2 (Dense)              (None, 2)                 66        
Total params: 642
Trainable params: 642
Non-trainable params: 0
_________________________________________________________________
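
The parameter counts in this summary can be verified by hand: a dense layer has one weight per input-output pair plus one bias per output unit. A short sketch of that arithmetic, where `layer_sizes` is just a local name for the sizes used in the model above:

```python
# Input size followed by the number of units in each Dense layer.
layer_sizes = [1, 16, 32, 2]

# weights (n_in * n_out) + biases (n_out) for each consecutive pair of sizes.
params = [n_in * n_out + n_out
          for n_in, n_out in zip(layer_sizes, layer_sizes[1:])]

print(params)       # [32, 544, 66]
print(sum(params))  # 642
```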


In [11]:
model.compile(optimizer=Adam(learning_rate=0.0001), loss='sparse_categorical_crossentropy', metrics=['accuracy'])
model.fit(x=scaled_train_samples, y=train_labels, validation_split=0.1,batch_size=10, epochs=30, shuffle=True, verbose=2)

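# epochs: the model makes 30 passes over the full training data.
# batch_size: 10 samples are passed to the model and processed at a time.
# validation_split: a fraction x with 0 < x < 1; here 0.1 holds out 10% of the data for validation.
# The validation set is the last 10% of the samples as passed in, taken before any shuffling,
# which is why the data was shuffled beforehand. shuffle=True then reshuffles only the
# remaining training data before each epoch.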

Epoch 1/30
189/189 - 1s - loss: 0.2545 - accuracy: 0.9323 - val_loss: 0.2386 - val_accuracy: 0.9381
Epoch 2/30
189/189 - 0s - loss: 0.2541 - accuracy: 0.9354 - val_loss: 0.2385 - val_accuracy: 0.9381
Epoch 3/30
189/189 - 0s - loss: 0.2537 - accuracy: 0.9349 - val_loss: 0.2386 - val_accuracy: 0.9381
Epoch 4/30
189/189 - 0s - loss: 0.2533 - accuracy: 0.9381 - val_loss: 0.2383 - val_accuracy: 0.9381
Epoch 5/30
189/189 - 0s - loss: 0.2530 - accuracy: 0.9323 - val_loss: 0.2378 - val_accuracy: 0.9381
Epoch 6/30
189/189 - 0s - loss: 0.2524 - accuracy: 0.9344 - val_loss: 0.2373 - val_accuracy: 0.9476
Epoch 7/30
189/189 - 0s - loss: 0.2523 - accuracy: 0.9392 - val_loss: 0.2375 - val_accuracy: 0.9381
Epoch 8/30
189/189 - 0s - loss: 0.2518 - accuracy: 0.9370 - val_loss: 0.2372 - val_accuracy: 0.9381
Epoch 9/30
189/189 - 0s - loss: 0.2514 - accuracy: 0.9376 - val_loss: 0.2373 - val_accuracy: 0.9381
Epoch 10/30
189/189 - 0s - loss: 0.2511 - accuracy: 0.9402 - val_loss: 0.2367 - val_accuracy: 0.9476

<keras.callbacks.History at 0x1cbe05ee940>
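
The `189/189` in each log line is the number of batches per epoch. A rough sketch of where it comes from, using the sample count, validation split and batch size from this notebook (the rounding here is a simplification and may not match Keras's internal split exactly in every case):

```python
import math

n_samples = 2100
validation_split = 0.1
batch_size = 10

n_val = round(n_samples * validation_split)       # samples held out for validation
n_train = n_samples - n_val                       # samples used for weight updates
steps_per_epoch = math.ceil(n_train / batch_size) # batches processed in one epoch

print(n_train, n_val, steps_per_epoch)  # 1890 210 189
```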

In [12]:
test_samples = []
test_labels = []

In [14]:
# Same generation scheme as the training set; "under 65" is drawn from 13 to 64
# because randint includes both endpoints.
for _ in range(50):
    # Patients under 65 who experienced side effects.
    random_age = randint(13, 64)
    test_samples.append(random_age)
    test_labels.append(1)

    # Patients 65 or older who experienced no side effects.
    random_age = randint(65, 100)
    test_samples.append(random_age)
    test_labels.append(0)

for _ in range(200):
    # Patients under 65 who experienced no side effects.
    random_age = randint(13, 64)
    test_samples.append(random_age)
    test_labels.append(0)

    # Patients 65 or older who experienced side effects.
    random_age = randint(65, 100)
    test_samples.append(random_age)
    test_labels.append(1)

In [15]:
test_samples = np.array(test_samples)
test_labels = np.array(test_labels)

In [21]:
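# Reuse the scaler fitted on the training data in In [6] (transform, not fit_transform)
# so that test ages are scaled exactly as the training ages were.
scaled_test_samples = scaler.transform(test_samples.reshape(-1,1))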
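
One point worth illustrating here: scaling parameters learned from the test set differ from those learned from the training set whenever the two sets have different minimum or maximum ages, and the model then sees the same age as a different input value. The sketch below shows the effect with plain NumPy on small hypothetical arrays.

```python
import numpy as np

# Hypothetical data: the test set happens to lack the extreme ages 13 and 100.
train_ages = np.array([13, 40, 72, 100], dtype=np.float64)
test_ages = np.array([20, 65, 90], dtype=np.float64)

# Scaling with the training minimum and maximum (what the model was trained on).
train_min, train_max = train_ages.min(), train_ages.max()
scaled_with_train = (test_ages - train_min) / (train_max - train_min)

# Scaling with the test set's own minimum and maximum gives different inputs.
scaled_with_test = (test_ages - test_ages.min()) / (test_ages.max() - test_ages.min())

print(np.round(scaled_with_train, 4))
print(np.round(scaled_with_test, 4))
```

With 500 test samples drawn from the full age range the two results will usually coincide, but nothing guarantees it.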

### Prediction
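
The prediction cell in this section turns each row of softmax probabilities into a class label by taking the index of the largest value. A minimal sketch of that step, using three probability rows copied from the output shown in this section; the `true_labels` array is hypothetical and only demonstrates how an accuracy figure could be computed.

```python
import numpy as np

# Rows are [P(class 0), P(class 1)], copied from the notebook's prediction output.
probs = np.array([
    [0.9646796, 0.03532033],
    [0.21405077, 0.78594923],
    [0.4570404, 0.5429596],
])

predicted = np.argmax(probs, axis=-1)  # index of the most probable class per row
print(predicted)  # [0 1 1]

# Hypothetical ground truth, for illustration only.
true_labels = np.array([0, 1, 0])
accuracy = np.mean(predicted == true_labels)
print(round(float(accuracy), 4))  # 0.6667
```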

In [30]:
predictions = model.predict(x=scaled_test_samples, batch_size=10, verbose=0)
predictions  # one row per sample: [P(no side effects), P(side effects)]
rounded_predictions = np.argmax(predictions, axis=-1)  # index of the more probable class
rounded_predictions

array([[0.9646796 , 0.03532033],
       [0.21405077, 0.78594923],
       [0.94512767, 0.05487234],
       [0.02219243, 0.9778075 ],
       [0.96502995, 0.03497002],
       [0.05324061, 0.94675934],
       [0.8651669 , 0.13483314],
       [0.12229387, 0.87770617],
       [0.9644835 , 0.03551642],
       [0.11063191, 0.8893681 ],
       [0.9652232 , 0.03477684],
       [0.4570404 , 0.5429596 ],
       [0.96475786, 0.0352422 ],
       [0.11063191, 0.8893681 ],
       [0.9514339 , 0.04856611],
       [0.05925589, 0.94074416],
       [0.8651669 , 0.13483314],
       [0.40180564, 0.5981943 ],
       [0.67490816, 0.32509187],
       [0.01986017, 0.9801398 ],
       [0.83660764, 0.16339234],
       [0.01776856, 0.98223144],
       [0.9650687 , 0.03493129],
       [0.21405077, 0.78594923],
       [0.90147924, 0.09852072],
       [0.09995546, 0.90004456],
       [0.9650687 , 0.03493129],
       [0.25445408, 0.745546  ],
       [0.7652775 , 0.23472247],
       [0.01986017, 0.9801398 ],
       [0.