
## FreeCodeCamp DeepLizard Keras with TensorFlow Course

> Written notes on the tutorial by [DeepLizard](https://youtube.com/deeplizard) and [FreeCodeCamp](https://www.youtube.com/channel/UC8butISFwT-Wl7EV0hUK0BQ): https://www.youtube.com/watch?v=qFJeN9V1ZsI

---

### Data preparation and processing

In [1]:
import numpy as np
from random import randint
from sklearn.utils import shuffle
from sklearn.preprocessing import MinMaxScaler

In [2]:
train_samples, train_labels = [], []

For this simple task, we'll be creating our own example data set.

#### Data Creation

As motivation for this data, let’s suppose that an experimental drug was tested in a clinical trial on individuals ranging in age from 13 to 100. The trial had 2100 participants. Half of the participants were under 65 years old, and the other half were 65 or older.

The trial showed that around 95% of patients 65 or older experienced side effects from the drug, and around 95% of patients under 65 experienced no side effects, generally showing that elderly individuals were more likely to experience side effects.

Ultimately, we want to build a model to tell us whether or not a patient will experience side effects solely based on the patient's age. The judgement of the model will be based on the training data.

Given how simple the data and the conclusions drawn from it are, a neural network may be overkill. The goal here is only to get a first introduction to preparing data for deep learning; later, we'll make use of more advanced data sets.



In [10]:
young_population = old_population = int(2100/2)

minority = int(0.05*old_population)*2

print (minority)

104


So the minority population is around 100 people. The code below uses a round figure of 50 older and 50 younger individuals.

In [11]:
for i in range(50):
    # The ~5% of younger individuals who did experience side effects
    random_younger = randint(13,64)
    train_samples.append(random_younger)
    train_labels.append(1)

    # The ~5% of older individuals who did not experience side effects
    random_older = randint(65,100)
    train_samples.append(random_older)
    train_labels.append(0)

# At this point, 100 minority samples (50 younger, 50 older) have been added to the data

for i in range(1000):
    # The ~95% of younger individuals who did not experience side effects
    random_younger = randint(13,64)
    train_samples.append(random_younger)
    train_labels.append(0)

    # The ~95% of older individuals who did experience side effects
    random_older = randint(65,100)
    train_samples.append(random_older)
    train_labels.append(1)

In [12]:
print (len(train_samples), len(train_labels))

2100 2100


This code creates 2100 samples. It stores each individual's age in the `train_samples` list and whether that individual experienced side effects (1) or not (0) in the `train_labels` list.
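
As a quick sanity check on the generated data, the sketch below rebuilds the same kind of lists using only the standard library and measures how often each age group is labeled 1. The helper name `build_samples` and the fixed seed are assumptions made for illustration; they are not part of the original notebook.

```python
from random import Random

def build_samples(seed=0):
    """Recreate the age/label lists (hypothetical helper, for illustration only)."""
    rng = Random(seed)  # seeded only so the sketch is repeatable
    samples, labels = [], []
    for _ in range(50):
        samples.append(rng.randint(13, 64))   # younger, side effects
        labels.append(1)
        samples.append(rng.randint(65, 100))  # older, no side effects
        labels.append(0)
    for _ in range(1000):
        samples.append(rng.randint(13, 64))   # younger, no side effects
        labels.append(0)
        samples.append(rng.randint(65, 100))  # older, side effects
        labels.append(1)
    return samples, labels

samples, labels = build_samples()

# Split the labels by age group and compute the share labeled 1 in each group.
older = [label for age, label in zip(samples, labels) if age >= 65]
younger = [label for age, label in zip(samples, labels) if age < 65]
older_rate = sum(older) / len(older)
younger_rate = sum(younger) / len(younger)

print(len(samples), round(older_rate, 3), round(younger_rate, 3))  # 2100 0.952 0.048
```

The rates come out at 1000/1050 and 50/1050, which is the "around 95%" and "around 5%" split described in the data creation section.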

Convert the data to NumPy arrays and shuffle them:

In [13]:
train_labels = np.array(train_labels)
train_samples = np.array(train_samples)
### before shuffling
print (train_labels[:10], train_samples[:10])

# shuffle applies the same random order to both arrays, so each label stays paired with its sample
train_labels, train_samples = shuffle(train_labels, train_samples)
print (train_labels[:10], train_samples[:10])


[1 0 1 0 1 0 1 0 1 0] [ 33  88  23 100  45  70  43  93  42  77]
[1 0 1 0 1 0 1 1 1 1] [74 53 65 38 78 49 65 68 76 24]


In [14]:
### test for shuffle
t1 = np.array([23,34,12,11,34,65,32])
t2 = np.array([1,1,0,0,0,1,1])

t1_new,t2_new = shuffle(t1,t2)

print(t1_new, t2_new)

[65 34 23 12 11 34 32] [1 0 1 0 0 1 1]


> OK, so `shuffle` does preserve the correspondence between the two arrays.
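
One way to see why the correspondence survives is to shuffle a single list of index positions and then apply that order to both sequences. The sketch below does this with the standard library only. It illustrates the idea and is not scikit-learn's actual implementation; the function name `shuffle_together` is hypothetical.

```python
from random import Random

def shuffle_together(first, second, seed=None):
    """Reorder two equal-length sequences with one shared random permutation."""
    if len(first) != len(second):
        raise ValueError("both sequences must have the same length")
    order = list(range(len(first)))
    Random(seed).shuffle(order)  # one permutation, reused for both sequences
    return [first[i] for i in order], [second[i] for i in order]

ages = [23, 34, 12, 11, 34, 65, 32]
side_effects = [1, 1, 0, 0, 0, 1, 1]
new_ages, new_side_effects = shuffle_together(ages, side_effects, seed=42)

# Every (age, label) pair from the input is still present after shuffling.
pairs_before = sorted(zip(ages, side_effects))
pairs_after = sorted(zip(new_ages, new_side_effects))
print(pairs_before == pairs_after)  # True
```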

The data is now in a format that can be passed to the model. Before doing that, however, we'll first scale the data down to a range from 0 to 1.

We'll use scikit-learn’s `MinMaxScaler` class to rescale the ages from their original range of 13 to 100 to a range of 0 to 1.
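
With a target range of 0 to 1, min-max scaling maps each value `x` to `(x - min) / (max - min)`, where `min` and `max` are taken from the data. Here is a minimal pure-Python sketch of that arithmetic, assuming the observed ages span the full 13 to 100 range. The function name `min_max_scale` is illustrative, and the sketch simply rejects constant input rather than mirroring how a library would handle it.

```python
def min_max_scale(values):
    """Scale values linearly so the smallest maps to 0 and the largest to 1."""
    low, high = min(values), max(values)
    if high == low:
        raise ValueError("cannot scale values that are all identical")
    return [(v - low) / (high - low) for v in values]

ages = [13, 38, 74, 100]
scaled = min_max_scale(ages)
print([round(s, 4) for s in scaled])  # [0.0, 0.2874, 0.7011, 1.0]
```

For example, an age of 74 becomes (74 - 13) / (100 - 13) = 61 / 87, or about 0.7011.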

We reshape the data purely as a technical requirement: `fit_transform()` expects 2D input, so it doesn't accept our 1D array as is.
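
scikit-learn estimators expect input of shape `(n_samples, n_features)`, so a flat array of ages has to become a single column. A small sketch of what `reshape(-1, 1)` does, using a handful of made-up ages rather than the real training data:

```python
import numpy as np

ages = np.array([74, 53, 65, 38, 78])  # illustrative values only

# -1 lets NumPy infer the number of rows; 1 fixes a single feature column.
column = ages.reshape(-1, 1)

print(ages.shape, column.shape)  # (5,) (5, 1)
```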



In [18]:
scaler = MinMaxScaler(feature_range=(0,1))
scaled_train_samples = scaler.fit_transform(X=train_samples.reshape(-1, 1))

print (scaled_train_samples[:5])

[[0.70114943]
 [0.45977011]
 [0.59770115]
 [0.28735632]
 [0.74712644]]
