<a href="https://colab.research.google.com/github/ShaunakSen/Deep-Learning/blob/master/FreecodeCamp_DeepLizard_Keras_with_Tensorflow_Course.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## FreecodeCamp DeepLizard Keras with TensorFlow Course

> Written notes on the tutorial by [DeepLizard](https://youtube.com/deeplizard) and [FreeCodeCamp](https://www.youtube.com/channel/UC8butISFwT-Wl7EV0hUK0BQ): https://www.youtube.com/watch?v=qFJeN9V1ZsI

---

### Data preparation and processing

In [1]:
import numpy as np
from random import randint
from sklearn.utils import shuffle
from sklearn.preprocessing import MinMaxScaler

In [2]:
train_samples, train_labels = [], []

For this simple task, we'll be creating our own example data set.

#### Data Creation

As motivation for this data, let’s suppose that an experimental drug was tested in a clinical trial on individuals aged 13 to 100. The trial had 2100 participants. Half of the participants were under 65 years old, and the other half were 65 years of age or older.

The trial showed that around 95% of patients 65 or older experienced side effects from the drug, and around 95% of patients under 65 experienced no side effects, generally showing that elderly individuals were more likely to experience side effects.

Ultimately, we want to build a model that tells us whether or not a patient will experience side effects, based solely on the patient's age. The model learns this judgement from the training data.

Note that, given how simple the data and the conclusions drawn from it are, a neural network may be overkill here. The purpose is only to get introduced to working with data for deep learning. Later, we'll use more advanced data sets.



In [3]:
young_population = old_population = int(2100/2)

minority = int(0.05*old_population)*2

print (minority)

104


So the minority population is about 100 people. The calculation gives 104 (52 per age group), which the code below rounds down to 50 young and 50 old participants.

In [4]:
for i in range(50):
    # The ~5% of younger individuals who did experience side effects
    random_younger = randint(13,64)
    train_samples.append(random_younger)
    train_labels.append(1)

    # The ~5% of older individuals who did not experience side effects
    random_older = randint(65,100)
    train_samples.append(random_older)
    train_labels.append(0)

    ### we have added 100 of the minority to the data

for i in range(1000):
    # The ~95% of younger individuals who did not experience side effects
    random_younger = randint(13,64)
    train_samples.append(random_younger)
    train_labels.append(0)

    # The ~95% of older individuals who did experience side effects
    random_older = randint(65,100)
    train_samples.append(random_older)
    train_labels.append(1)

In [5]:
print (len(train_samples), len(train_labels))

2100 2100


This code creates 2100 samples. The `train_samples` list stores each individual's age, and the `train_labels` list stores whether that individual experienced side effects (1) or not (0).
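The loops above can also be written without Python loops by using NumPy's random `Generator`. The following is a sketch rather than the course's code. The helper name `make_trial_data` and the seed are hypothetical. `Generator.integers` excludes the upper bound unless `endpoint=True` is passed, whereas `random.randint` includes both ends. The samples here are grouped by class instead of alternating, but that makes no difference because the data is shuffled next anyway.

```python
import numpy as np

def make_trial_data(n_minority=50, n_majority=1000, seed=0):
    """Hypothetical vectorized version of the loops above.

    Label 1 = experienced side effects, 0 = no side effects.
    """
    rng = np.random.default_rng(seed)
    # endpoint=True makes the upper bound inclusive, matching random.randint
    young_minority = rng.integers(13, 64, size=n_minority, endpoint=True)
    old_minority = rng.integers(65, 100, size=n_minority, endpoint=True)
    young_majority = rng.integers(13, 64, size=n_majority, endpoint=True)
    old_majority = rng.integers(65, 100, size=n_majority, endpoint=True)

    samples = np.concatenate([young_minority, old_minority,
                              young_majority, old_majority])
    labels = np.concatenate([
        np.ones(n_minority, dtype=int),   # young, with side effects
        np.zeros(n_minority, dtype=int),  # old, without side effects
        np.zeros(n_majority, dtype=int),  # young, without side effects
        np.ones(n_majority, dtype=int),   # old, with side effects
    ])
    return samples, labels

samples, labels = make_trial_data()
print(samples.shape, labels.shape)  # (2100,) (2100,)
```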

Convert the data to numpy arrays:

In [6]:
train_labels = np.array(train_labels)
train_samples = np.array(train_samples)
### before shuffling
print (train_labels[:10], train_samples[:10])

train_labels, train_samples = shuffle(train_labels, train_samples) ### same permutation for both arrays, so each label stays paired with its sample
print (train_labels[:10], train_samples[:10])


[1 0 1 0 1 0 1 0 1 0] [36 92 57 93 27 70 21 71 34 71]
[1 1 1 0 1 1 1 0 1 1] [78 66 72 14 85 67 77 37 67 79]


In [7]:
### test for shuffle
t1 = np.array([23,34,12,11,34,65,32])
t2 = np.array([1,1,0,0,0,1,1])

t1_new,t2_new = shuffle(t1,t2)

print(t1_new, t2_new)

[11 34 32 65 12 34 23] [0 1 1 1 0 0 1]


> Ok, so shuffle does keep track of the correspondence
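You can reproduce this behaviour by hand. Draw one random permutation of the indices and apply it to both arrays. The NumPy sketch below illustrates the idea and is not scikit-learn's internal implementation. It reuses the toy arrays from the test cell, and the seed is arbitrary.

```python
import numpy as np

t1 = np.array([23, 34, 12, 11, 34, 65, 32])
t2 = np.array([1, 1, 0, 0, 0, 1, 1])

rng = np.random.default_rng(42)
perm = rng.permutation(len(t1))  # a single permutation shared by both arrays

t1_new, t2_new = t1[perm], t2[perm]

# Every (age, label) pair from the original survives the shuffle
pairs_before = sorted(zip(t1.tolist(), t2.tolist()))
pairs_after = sorted(zip(t1_new.tolist(), t2_new.tolist()))
print(pairs_before == pairs_after)  # True
```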

The data is now in a format that we could pass to the model. Before doing that, however, we'll scale it down to a range from 0 to 1.

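We’ll use scikit-learn’s `MinMaxScaler` class to rescale the ages from their original range (13 to 100) to the range 0 to 1. It maps each value x to `(x - min) / (max - min)`, where min and max are the smallest and largest values in the data passed to `fit_transform()`.

We reshape the data because `fit_transform()` expects a 2D array of shape `(n_samples, n_features)` and rejects 1D input. `reshape(-1, 1)` turns the 1D array of 2100 ages into a single column of shape `(2100, 1)`.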



In [8]:
scaler = MinMaxScaler(feature_range=(0,1))
scaled_train_samples = scaler.fit_transform(X=train_samples.reshape(-1, 1))

print (scaled_train_samples[:5])

[[0.74712644]
 [0.6091954 ]
 [0.67816092]
 [0.01149425]
 [0.82758621]]
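To check the scaler's arithmetic, the NumPy sketch below applies the same min-max formula by hand to a few made-up ages. It includes the `reshape(-1, 1)` step and assumes the data's extremes are 13 and 100, as they appear to be in this run. For example, the age 78 maps to (78 - 13) / (100 - 13) ≈ 0.7471, which matches the first scaled value printed above.

```python
import numpy as np

ages = np.array([78, 66, 13, 14, 100])  # hypothetical ages, including the extremes 13 and 100
column = ages.reshape(-1, 1)            # (5,) -> (5, 1): 5 samples, 1 feature

col_min = column.min(axis=0)            # per-feature minimum, as the scaler computes it
col_max = column.max(axis=0)            # per-feature maximum
scaled = (column - col_min) / (col_max - col_min)

print(column.shape)  # (5, 1)
print(scaled.ravel())
```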


### Create An Artificial Neural Network With TensorFlow's Keras API

---



In [9]:
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Activation
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.metrics import categorical_crossentropy

In [12]:
physical_devices = tf.config.experimental.list_physical_devices("GPU")
print (physical_devices)

[PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU')]


`set_memory_growth()` makes TensorFlow allocate only as much GPU memory as it needs at a given time and grow the allocation when more is needed. Without it, TensorFlow reserves nearly all GPU memory up front. If memory growth is not enabled, we may run into the following error when we train the model later:

`Blas GEMM launch failed`

In [13]:
if physical_devices:  # the list is empty on CPU-only machines, so skip the call there
    tf.config.experimental.set_memory_growth(device=physical_devices[0], enable=True)

#### Build A Sequential Model


In [14]:
model = Sequential([
    Dense(units=16, input_shape=(1,), activation='relu'),
    Dense(units=32, activation='relu'),
    Dense(units=2, activation='softmax')
])

As discussed, we’ll train our network on the data that we generated and processed in the previous section. Recall that this data is one-dimensional. The `input_shape` parameter expects a tuple of integers that matches the shape of a single input sample. Keras adds the batch dimension itself, which appears as `None` in the model summary. We therefore specify `(1,)` as the `input_shape` of our one-dimensional data.

You can think of the way we specify the input_shape here as acting as an **implicit input layer**. The input layer of a neural network is the underlying raw data itself, therefore we don't create an explicit input layer. **This first Dense layer that we're working with now is actually the first hidden layer**.

We also set an optional parameter on this Dense layer: the activation function applied to the layer's output. We’ll use the popular choice, `relu`. If you don’t explicitly set an activation function, Keras uses the linear activation, which passes the layer's output through unchanged.
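To compare the two activations, note that ReLU returns max(0, z) element-wise, whereas the default linear activation returns z unchanged. Below is a minimal NumPy sketch of both applied to invented pre-activation values. These are illustrations, not Keras' own implementations.

```python
import numpy as np

def relu(z):
    # negative pre-activations are clipped to 0; positive ones pass through
    return np.maximum(z, 0.0)

def linear(z):
    # identity: what a Dense layer computes when no activation is given
    return z

z = np.array([-2.0, -0.5, 0.0, 1.5, 3.0])
print(relu(z))
print(linear(z))
```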

Our next layer is also a Dense layer, this time with 32 nodes. The number of neurons in this layer is also an arbitrary choice, because the idea is to start with a simple model and then test and experiment with it. If the model turns out to be insufficient, we can troubleshoot at that point and experiment with parameters such as the number of layers and nodes.

Lastly, we specify the output layer. This layer is also a Dense layer, and it will have 2 neurons. This is because we have two possible outputs: either a patient experienced side effects, or the patient did not experience side effects.

This time, the activation function we’ll use is softmax, which will give us a probability distribution among the possible outputs.
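Softmax turns the output layer's two raw scores (logits) into non-negative values that sum to 1, using softmax(z)_i = exp(z_i) / Σ_j exp(z_j). The following is a minimal NumPy sketch with invented logits. Subtracting the row maximum first is a common numerical-stability trick and does not change the result.

```python
import numpy as np

def softmax(logits):
    # shift by the row max so exp() cannot overflow; the ratios are unchanged
    shifted = logits - np.max(logits, axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / np.sum(exp, axis=-1, keepdims=True)

logits = np.array([[2.0, 0.5],    # hypothetical scores for patient A
                   [-1.0, 1.0]])  # hypothetical scores for patient B
probs = softmax(logits)
print(probs.sum(axis=1))     # each row sums to 1
print(probs.argmax(axis=1))  # predicted class: 0 = no side effects, 1 = side effects
```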

In [15]:
model.summary()

Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
dense (Dense)                (None, 16)                32        
_________________________________________________________________
dense_1 (Dense)              (None, 32)                544       
_________________________________________________________________
dense_2 (Dense)              (None, 2)                 66        
Total params: 642
Trainable params: 642
Non-trainable params: 0
_________________________________________________________________


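#### How we arrive at 642 trainable params

1. The input has 1 feature, and the first hidden layer has 16 nodes. That gives `1*16` weights plus 16 biases: `(1*16)+16 = 32`

2. The next layer has 32 nodes, each connected to all 16 nodes of the previous layer. That gives `16*32` weights plus 32 biases: `(16*32)+32 = 544`

3. The output layer has 2 nodes, each connected to all 32 nodes of the previous layer. That gives `32*2` weights plus 2 biases: `(32*2)+2 = 66`

`32 + 544 + 66 = 642`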
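The same bookkeeping generalizes. A Dense layer with `n_in` inputs and `n_out` units has an `n_in × n_out` weight matrix plus `n_out` biases. The short Python sketch below reproduces the numbers in the model summary. The helper name `dense_params` is hypothetical.

```python
def dense_params(n_in, n_out):
    """Weights (n_in * n_out) plus one bias per unit (n_out)."""
    return n_in * n_out + n_out

layer_sizes = [1, 16, 32, 2]  # input feature count, then the three Dense layers
per_layer = [dense_params(a, b) for a, b in zip(layer_sizes, layer_sizes[1:])]
print(per_layer)       # [32, 544, 66]
print(sum(per_layer))  # 642
```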

### Train An Artificial Neural Network With Keras

---

The first thing we need to do to get the model ready for training is call the compile() function on it.



In [21]:
model.compile(optimizer=Adam(learning_rate=0.0001), loss='sparse_categorical_crossentropy', metrics=['accuracy'])
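The loss is `sparse_categorical_crossentropy` because `train_labels` holds integer class indices (0 or 1) rather than one-hot vectors. For each sample, the loss is the negative log of the probability the model assigns to the true class. This is the same value ordinary categorical cross-entropy gives with a one-hot label. The NumPy sketch below shows that equivalence using invented softmax outputs. Keras reduces the per-sample losses to their batch mean by default.

```python
import numpy as np

probs = np.array([[0.9, 0.1],   # hypothetical softmax outputs for 3 samples
                  [0.2, 0.8],
                  [0.6, 0.4]])
labels = np.array([0, 1, 1])    # integer labels, like train_labels

# Sparse form: index the probability of the true class directly
sparse_loss = -np.log(probs[np.arange(len(labels)), labels])

# Dense form: one-hot encode the labels first, then sum over classes
one_hot = np.eye(2)[labels]
dense_loss = -np.sum(one_hot * np.log(probs), axis=1)

print(np.allclose(sparse_loss, dense_loss))  # True
print(sparse_loss.mean())  # batch-level loss as a mean over samples
```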

In [22]:
model.fit(x=scaled_train_samples, y=train_labels, batch_size=10, epochs=30, verbose=2, shuffle=True)

Epoch 1/30
210/210 - 0s - loss: 0.6549 - accuracy: 0.5810
Epoch 2/30
210/210 - 0s - loss: 0.6068 - accuracy: 0.6795
Epoch 3/30
210/210 - 0s - loss: 0.5653 - accuracy: 0.7414
Epoch 4/30
210/210 - 0s - loss: 0.5210 - accuracy: 0.7914
Epoch 5/30
210/210 - 0s - loss: 0.4807 - accuracy: 0.8333
Epoch 6/30
210/210 - 0s - loss: 0.4456 - accuracy: 0.8467
Epoch 7/30
210/210 - 0s - loss: 0.4138 - accuracy: 0.8671
Epoch 8/30
210/210 - 0s - loss: 0.3850 - accuracy: 0.8819
Epoch 9/30
210/210 - 0s - loss: 0.3610 - accuracy: 0.8943
Epoch 10/30
210/210 - 0s - loss: 0.3410 - accuracy: 0.9038
Epoch 11/30
210/210 - 0s - loss: 0.3252 - accuracy: 0.9100
Epoch 12/30
210/210 - 0s - loss: 0.3125 - accuracy: 0.9138
Epoch 13/30
210/210 - 0s - loss: 0.3025 - accuracy: 0.9200
Epoch 14/30
210/210 - 0s - loss: 0.2944 - accuracy: 0.9238
Epoch 15/30
210/210 - 0s - loss: 0.2882 - accuracy: 0.9248
Epoch 16/30
210/210 - 0s - loss: 0.2832 - accuracy: 0.9319
Epoch 17/30
210/210 - 0s - loss: 0.2788 - accuracy: 0.9314
Epoch 

<tensorflow.python.keras.callbacks.History at 0x7f5d20282358>

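We set `shuffle=True` because we don't want the model to learn any implicit order in which it sees the training samples. The training data is reshuffled before each epoch. `True` is already the default for `fit()`, so setting it here simply makes the intent explicit.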
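With `batch_size=10`, each epoch passes over the 2100 samples in 2100 / 10 = 210 mini-batches, which explains the `210/210` in the log. The plain NumPy sketch below imitates that loop, drawing a fresh random order of indices at the start of every epoch as `shuffle=True` does. No actual training happens, and the function name `iterate_minibatches` is hypothetical.

```python
import numpy as np

def iterate_minibatches(n_samples, batch_size, rng):
    """Yield index arrays covering one epoch, in a fresh random order."""
    order = rng.permutation(n_samples)
    for start in range(0, n_samples, batch_size):
        yield order[start:start + batch_size]

rng = np.random.default_rng(0)
n_samples, batch_size, epochs = 2100, 10, 2
for epoch in range(epochs):
    batches = list(iterate_minibatches(n_samples, batch_size, rng))
    seen = np.concatenate(batches)
    # every sample should appear exactly once per epoch
    all_seen = np.array_equal(np.sort(seen), np.arange(n_samples))
    print(f"Epoch {epoch + 1}: {len(batches)} batches, all samples seen: {all_seen}")
```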