## Preprocessing Data

In [1]:
import numpy as np
from random import randint
from sklearn.preprocessing import MinMaxScaler
import pandas as pd

In [2]:
train_labels =  []
train_samples = []

Example data:
- An experimental drug was tested on individuals aged 13 to 100.
- The trial had 2100 participants. Half were under 65 years old, half were 65 or older.
- 95% of patients 65 or older experienced side effects.
- 95% of patients under 65 experienced no side effects.

In [3]:
for i in range(50):
    # The 5% of younger individuals who did experience side effects
    random_younger = randint(13,64)
    train_samples.append(random_younger)
    train_labels.append(1)
    
    # The 5% of older individuals who did not experience side effects
    random_older = randint(65,100)
    train_samples.append(random_older)
    train_labels.append(0)

for i in range(1000):
    # The 95% of younger individuals who did not experience side effects
    random_younger = randint(13,64)
    train_samples.append(random_younger)
    train_labels.append(0)
    
    # The 95% of older individuals who did experience side effects
    random_older = randint(65,100)
    train_samples.append(random_older)
    train_labels.append(1)
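
The two loops above can also be written without Python loops. The sketch below is an alternative, not the notebook's method: it assumes NumPy's `default_rng` with an arbitrary seed (42). Note that `random.randint(a, b)` includes `b`, whereas `Generator.integers(low, high)` excludes `high`, hence the upper bounds `65` and `101`.

```python
import numpy as np

rng = np.random.default_rng(42)  # arbitrary seed, chosen only for reproducibility

def make_trial_data(n_common=1000, n_rare=50):
    """Build the synthetic drug-trial data with vectorized sampling.

    Younger patients are aged 13-64 and older patients 65-100 (both inclusive).
    Label 1 = side effect, 0 = no side effect.
    """
    # integers(low, high) excludes `high`, so 65 and 101 give inclusive maxima of 64 and 100
    young_rare = rng.integers(13, 65, size=n_rare)      # younger, side effect (label 1)
    old_rare = rng.integers(65, 101, size=n_rare)       # older, no side effect (label 0)
    young_common = rng.integers(13, 65, size=n_common)  # younger, no side effect (label 0)
    old_common = rng.integers(65, 101, size=n_common)   # older, side effect (label 1)

    samples = np.concatenate([young_rare, old_rare, young_common, old_common])
    labels = np.concatenate([
        np.ones(n_rare, dtype=int),
        np.zeros(n_rare, dtype=int),
        np.zeros(n_common, dtype=int),
        np.ones(n_common, dtype=int),
    ])
    return samples, labels

samples, labels = make_trial_data()
print(len(samples), labels.mean())  # 2100 0.5
```

Unlike the loops, the samples here are grouped rather than interleaved; since training later uses `shuffle=True`, the order does not matter.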

## Storing the data in a pandas DataFrame

In [4]:
df = pd.DataFrame(data = {"Age": train_samples, "Side Effect?": train_labels})

In [5]:
df.head(10) # First 10 rows; in 'Side Effect?', 1 means yes and 0 means no

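|   | Age | Side Effect? |
|---|-----|--------------|
| 0 | 39 | 1 |
| 1 | 94 | 0 |
| 2 | 16 | 1 |
| 3 | 67 | 0 |
| 4 | 30 | 1 |
| 5 | 90 | 0 |
| 6 | 27 | 1 |
| 7 | 98 | 0 |
| 8 | 27 | 1 |
| 9 | 85 | 0 |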


In [6]:
df.describe()

|       | Age | Side Effect? |
|-------|-----|--------------|
| count | 2100.0 | 2100.0 |
| mean | 60.29381 | 0.5 |
| std | 25.665675 | 0.500119 |
| min | 13.0 | 0.0 |
| 25% | 37.0 | 0.0 |
| 50% | 64.5 | 0.5 |
| 75% | 83.0 | 1.0 |
| max | 100.0 | 1.0 |


## Scaling input to the range 0 to 1 (normalization)

In [7]:
scaler = MinMaxScaler(feature_range=(0,1))
df['Age'] = scaler.fit_transform((df['Age']).values.reshape(-1,1))



In [8]:
df.head(10)

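|   | Age | Side Effect? |
|---|-----|--------------|
| 0 | 0.298851 | 1 |
| 1 | 0.931034 | 0 |
| 2 | 0.034483 | 1 |
| 3 | 0.62069 | 0 |
| 4 | 0.195402 | 1 |
| 5 | 0.885057 | 0 |
| 6 | 0.16092 | 1 |
| 7 | 0.977011 | 0 |
| 8 | 0.16092 | 1 |
| 9 | 0.827586 | 0 |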
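
`MinMaxScaler` with `feature_range=(0, 1)` applies x' = (x - min) / (max - min) to each column. Here min = 13 and max = 100 (see `describe()` above), so a 39-year-old maps to 26 / 87 ≈ 0.298851, matching row 0. A NumPy sketch of the same formula (an illustration, not scikit-learn's code):

```python
import numpy as np

def min_max_scale(x, feature_range=(0.0, 1.0)):
    """Rescale a 1-D array linearly so its min and max map onto feature_range."""
    x = np.asarray(x, dtype=float)
    lo, hi = feature_range
    x_std = (x - x.min()) / (x.max() - x.min())  # now in [0, 1]
    return x_std * (hi - lo) + lo

# 13 and 100 are included so min and max match the full dataset
ages = np.array([13, 39, 94, 16, 100])
print(np.round(min_max_scale(ages), 6))  # 39, 94 and 16 give the values in rows 0-2 above
```

In practice, keep the fitted `scaler` and call `scaler.transform` on new data, so it is scaled with the same min and max as the training data.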


## Simple Sequential Model


In [9]:
import keras
from keras.models import Sequential
from keras.layers import Dense
from keras.optimizers import Adam

Using TensorFlow backend.


The Sequential model is a linear stack of layers.

We can pass the constructor a list in which each element represents one layer.

In [10]:
model = Sequential([
    Dense(16, input_shape=(1,), activation='relu'),
    Dense(32, activation='relu'),
    Dense(2, activation='softmax')
])

Instead of passing the layers to the constructor, we can also add them one at a time with the `add()` method:

```python
from keras.models import Sequential
from keras.layers import Dense

model = Sequential()
model.add(Dense(16, input_shape=(1,), activation='relu'))
model.add(Dense(32, activation='relu'))
model.add(Dense(2, activation='softmax'))
```

***

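Here each **Dense()** call creates a fully connected layer, in which every neuron receives every output of the previous layer. The first one is the first hidden layer of the network.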

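Example adapted from [here](http://keras.dhpit.com/) (the original uses the Keras 1 argument `init`, which Keras 2 renamed to `kernel_initializer`):
```
model.add(Dense(12, input_dim=8, kernel_initializer='uniform', activation='relu'))
```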

This means 8 input features and 12 neurons in the first hidden layer.

![](http://keras.dhpit.com/img/nn.png)

So in our case we have 1 input feature (age), 16 neurons in the first hidden layer, 32 neurons in the second hidden layer, and 2 neurons in the output layer (one per class: no side effect, side effect).
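
Each `Dense` layer computes `activation(inputs @ W + b)`, where `W` has shape (inputs, units). The NumPy sketch below pushes a few scaled ages through the same 1 → 16 → 32 → 2 layout. The weights are random placeholders (not trained values, and not how Keras initializes them), so the outputs mean nothing beyond their shape and the fact that each softmax row sums to 1.

```python
import numpy as np

rng = np.random.default_rng(0)  # arbitrary seed for the placeholder weights

def relu(z):
    return np.maximum(z, 0.0)

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)  # subtract the row max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

# Placeholder (untrained) parameters; W has shape (inputs, units)
layer_sizes = [1, 16, 32, 2]
weights = [rng.normal(size=(n_in, n_out))
           for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:])]
biases = [np.zeros(n_out) for n_out in layer_sizes[1:]]

def forward(x):
    """x has shape (batch, 1): one scaled age per row."""
    h = relu(x @ weights[0] + biases[0])        # (batch, 16)
    h = relu(h @ weights[1] + biases[1])        # (batch, 32)
    return softmax(h @ weights[2] + biases[2])  # (batch, 2), one probability per class

x = np.array([[0.298851], [0.931034], [0.034483]])
probs = forward(x)
print(probs.shape)  # (3, 2)
```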

***

NOTE: *The model needs to know what input shape it should expect. For this reason, the first layer in a Sequential model (and only the first, because following layers can do automatic shape inference) needs to receive information about its input shape.*

***

Besides the number of neurons, we also pass the activation function each layer should apply. An activation function is applied to the weighted sum of a neuron's inputs (plus its bias) and turns it into the neuron's output signal, which determines how strongly, if at all, the signal is passed on to the next layer.

There are quite a few activation functions; some common ones are:

**Threshold**:

![](threshold.png)

**Sigmoid**:

![](sigmoid.png)

**Rectifier (ReLU)**:

![](rectifier.png)

**Hyperbolic Tangent**:

![](hyperbolicTangent.png)

There are many more.
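
The four functions pictured above can be written in a few lines of NumPy. Keras accepts the strings `'sigmoid'`, `'relu'` and `'tanh'` for the `activation` argument; the threshold (step) function is written by hand here, and treating z = 0 as "on" is just a convention assumed for this sketch.

```python
import numpy as np

def threshold(z):
    # Step function: 1 where z >= 0, else 0 (the z >= 0 cut-off is a convention)
    return (z >= 0).astype(float)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))  # squashes into (0, 1)

def relu(z):  # rectifier
    return np.maximum(z, 0.0)  # zeroes out negatives, passes positives unchanged

def tanh(z):  # hyperbolic tangent
    return np.tanh(z)  # squashes into (-1, 1)

z = np.array([-2.0, 0.0, 2.0])
for f in (threshold, sigmoid, relu, tanh):
    print(f.__name__, np.round(f(z), 3))
```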

***

This is the structure of a simple neural network:

![](structure.png)

We can print a summary of the network:

In [11]:
model.summary()

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
dense_1 (Dense)              (None, 16)                32        
_________________________________________________________________
dense_2 (Dense)              (None, 32)                544       
_________________________________________________________________
dense_3 (Dense)              (None, 2)                 66        
Total params: 642
Trainable params: 642
Non-trainable params: 0
_________________________________________________________________
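
The `Param #` column can be checked by hand: a Dense layer with n inputs and u units has an n × u weight matrix plus u biases, i.e. (n + 1) · u parameters.

```python
def dense_params(n_inputs, units):
    """Weights (n_inputs x units) plus one bias per unit."""
    return (n_inputs + 1) * units

layer_sizes = [1, 16, 32, 2]  # input features, then units per Dense layer
counts = [dense_params(n_in, u) for n_in, u in zip(layer_sizes[:-1], layer_sizes[1:])]
print(counts, sum(counts))  # [32, 544, 66] 642
```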


In [12]:
model.compile(optimizer = Adam(lr=.0001), loss='sparse_categorical_crossentropy', metrics=['accuracy'])

Here we use the Adam optimizer. An optimizer is the method used to update the weights and biases of the network; it does so by minimizing (or maximizing) an **objective function**, sometimes called the **error function**. Different optimizers can affect the result: on a given problem, one may reach a slightly better solution or get there faster than another.

[There are many optimizers to choose from](https://keras.io/optimizers/):

* Gradient Descent
* Adagrad
* AdaDelta
* Adam

and many more.
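
To make the idea concrete, the sketch below minimizes a toy objective f(w) = (w - 3)² with plain gradient descent and with the Adam update rule, using the commonly cited defaults β1 = 0.9, β2 = 0.999 and ε = 1e-8. The objective, learning rates and step counts are arbitrary choices for illustration; this is not Keras's implementation.

```python
import numpy as np

def grad(w):
    # Gradient of the toy objective f(w) = (w - 3)**2, which is minimized at w = 3
    return 2.0 * (w - 3.0)

def gradient_descent(w=0.0, lr=0.1, steps=100):
    for _ in range(steps):
        w -= lr * grad(w)  # step against the gradient
    return w

def adam(w=0.0, lr=0.05, steps=1000, beta1=0.9, beta2=0.999, eps=1e-8):
    m = v = 0.0
    for t in range(1, steps + 1):
        g = grad(w)
        m = beta1 * m + (1 - beta1) * g       # running mean of gradients (momentum)
        v = beta2 * v + (1 - beta2) * g ** 2  # running mean of squared gradients
        m_hat = m / (1 - beta1 ** t)          # bias correction for the zero start
        v_hat = v / (1 - beta2 ** t)
        w -= lr * m_hat / (np.sqrt(v_hat) + eps)  # step scaled by gradient history
    return w

print(round(gradient_descent(), 4), round(adam(), 4))
```

Both end up near w = 3; Adam's step size adapts to the history of gradients, which is why it is a common default for neural networks.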

***

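A loss function measures the discrepancy between the predicted value (y') and the actual label (y). This notebook uses `sparse_categorical_crossentropy` because the labels in `Side Effect?` are integers (0 or 1); plain `categorical_crossentropy` expects one-hot encoded labels such as `[1, 0]` and `[0, 1]`.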

[There are many loss functions to choose from](https://keras.io/losses/):

* Mean Squared Error
* Mean Absolute Error
* Mean Squared Logarithmic Error
* Categorical Cross Entropy

and many more.
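
Below are NumPy versions of mean squared error, mean absolute error and cross-entropy, written for illustration rather than copied from Keras. The last two show that, for integer labels, the sparse variant gives the same value as categorical cross-entropy applied to the one-hot encoding. The clipping constant `eps` is an assumption of this sketch, used to avoid log(0).

```python
import numpy as np

def mse(y_true, y_pred):
    return np.mean((np.asarray(y_true) - np.asarray(y_pred)) ** 2)

def mae(y_true, y_pred):
    return np.mean(np.abs(np.asarray(y_true) - np.asarray(y_pred)))

def categorical_crossentropy(y_onehot, probs, eps=1e-7):
    probs = np.clip(probs, eps, 1 - eps)  # avoid log(0)
    return np.mean(-np.sum(y_onehot * np.log(probs), axis=1))

def sparse_categorical_crossentropy(y_int, probs, eps=1e-7):
    probs = np.clip(probs, eps, 1 - eps)
    # Pick the predicted probability of the true class in each row
    return np.mean(-np.log(probs[np.arange(len(y_int)), y_int]))

probs = np.array([[0.9, 0.1], [0.2, 0.8]])  # softmax outputs for two samples
labels = np.array([0, 1])                   # integer labels, as in 'Side Effect?'
onehot = np.eye(2)[labels]                  # [[1, 0], [0, 1]]

print(sparse_categorical_crossentropy(labels, probs))
print(categorical_crossentropy(onehot, probs))  # same value
```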

***

A **[metric](https://keras.io/metrics/)** is a function that is used to judge the performance of your model. Metric functions are to be supplied in the metrics parameter when a model is compiled.

A metric function is similar to a loss function, except that the results from evaluating a metric are not used when training the model.
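
For example, the `'accuracy'` metric used here reports the fraction of samples whose predicted class matches the label; with a softmax output, the predicted class is the index of the largest probability. Because `argmax` is piecewise constant, accuracy provides no useful gradient, which is one reason it is monitored rather than optimized directly. A NumPy sketch (an illustration, not Keras's implementation):

```python
import numpy as np

def accuracy(y_int, probs):
    predicted = np.argmax(probs, axis=1)  # class with the highest probability per row
    return np.mean(predicted == np.asarray(y_int))

probs = np.array([[0.9, 0.1], [0.4, 0.6], [0.7, 0.3], [0.2, 0.8]])
labels = np.array([0, 1, 1, 1])
print(accuracy(labels, probs))  # 0.75
```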

***

All we are doing here is compiling the model. This is necessary for *training* (and for evaluating with `model.evaluate()`), but not for *predicting* with an already trained model.

This is because training requires both a *forward pass* and a *backward pass*, so we must specify which loss function to minimize and which optimizer to use to update the weights and biases. Prediction needs only a forward pass, so no compilation is required.
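
To illustrate the difference, the sketch below swaps the notebook's network for a single sigmoid neuron (logistic regression) trained with full-batch gradient descent on binary cross-entropy in NumPy. Each training step needs a forward pass, the gradient of the loss (the backward pass) and an optimizer update; prediction afterwards uses only the forward pass. The data is regenerated to mirror the notebook's set-up, and the seed, learning rate and step count are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(1)  # arbitrary seed

# Recreate data shaped like the notebook's: 1050 younger and 1050 older patients
young = rng.integers(13, 65, size=1050)   # ages 13-64
old = rng.integers(65, 101, size=1050)    # ages 65-100
ages = np.concatenate([young, old]).astype(float)
# Younger: 50 with side effects, 1000 without; older: 50 without, 1000 with
y = np.concatenate([np.ones(50), np.zeros(1000), np.zeros(50), np.ones(1000)])
x = (ages - 13) / (100 - 13)  # min-max scaling, as with MinMaxScaler

def forward(x, w, b):
    return 1.0 / (1.0 + np.exp(-(w * x + b)))  # predicted P(side effect)

def bce(p, y, eps=1e-7):
    p = np.clip(p, eps, 1 - eps)  # avoid log(0)
    return -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))

w, b, lr = 0.0, 0.0, 1.0
initial_loss = bce(forward(x, w, b), y)
for _ in range(2000):
    p = forward(x, w, b)           # forward pass
    grad_w = np.mean((p - y) * x)  # backward pass: gradients of the loss
    grad_b = np.mean(p - y)
    w -= lr * grad_w               # optimizer step (plain gradient descent)
    b -= lr * grad_b

final_loss = bce(forward(x, w, b), y)
# Prediction: forward pass only, no loss or optimizer involved
pred = (forward(x, w, b) >= 0.5).astype(float)
print(initial_loss, final_loss, np.mean(pred == y))
```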

## Training

In [13]:
model.fit(x = df['Age'], y = df['Side Effect?'], batch_size=10, epochs=20, shuffle=True, verbose=2)

Epoch 1/20
 - 0s - loss: 0.6716 - acc: 0.5129
Epoch 2/20
 - 0s - loss: 0.6504 - acc: 0.5867
Epoch 3/20
 - 0s - loss: 0.6288 - acc: 0.6338
Epoch 4/20
 - 0s - loss: 0.6074 - acc: 0.6724
Epoch 5/20
 - 0s - loss: 0.5862 - acc: 0.7024
Epoch 6/20
 - 0s - loss: 0.5648 - acc: 0.7357
Epoch 7/20
 - 0s - loss: 0.5433 - acc: 0.7624
Epoch 8/20
 - 0s - loss: 0.5213 - acc: 0.7867
Epoch 9/20
 - 0s - loss: 0.4970 - acc: 0.8152
Epoch 10/20
 - 0s - loss: 0.4737 - acc: 0.8305
Epoch 11/20
 - 0s - loss: 0.4516 - acc: 0.8505
Epoch 12/20
 - 0s - loss: 0.4309 - acc: 0.8643
Epoch 13/20
 - 0s - loss: 0.4104 - acc: 0.8790
Epoch 14/20
 - 0s - loss: 0.3915 - acc: 0.8876
Epoch 15/20
 - 0s - loss: 0.3739 - acc: 0.8933
Epoch 16/20
 - 0s - loss: 0.3582 - acc: 0.9038
Epoch 17/20
 - 0s - loss: 0.3448 - acc: 0.9048
Epoch 18/20
 - 0s - loss: 0.3335 - acc: 0.9133
Epoch 19/20
 - 0s - loss: 0.3236 - acc: 0.9176
Epoch 20/20
 - 0s - loss: 0.3154 - acc: 0.9186


<keras.callbacks.History at 0x7f287fdeb588>

Here we pass the inputs (the independent variable, `x`) and the labels (the dependent variable, `y`), followed by the batch size.

The batch size is the **number of samples** passed through the network at one time.

Why not pass one by one?

If the machine can handle them, it is better to pass several samples at a time: the computation for a whole batch is done at once, which makes training faster. But there is a trade-off.

Larger batch sizes => faster progress in training but don't always converge as fast. 

Smaller batch sizes => train slower, but can converge faster.

So the right choice depends on the problem, and batch size is one of the most important hyperparameters.

***

An epoch is one full pass over the entire dataset.

***

Suppose we have a **dataset of 1000 samples** and train our model with a **batch size of 10 samples**. Then one epoch is complete after 100 batches have passed through the network.

i.e. `1000 samples / 10 samples per batch = 100 batches per epoch`
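
When the dataset size is not a multiple of the batch size, the last batch is simply smaller, so the number of batches (weight updates) per epoch is the ceiling of the division. For the 2100 samples in this notebook with `batch_size=10`, that is 210 updates per epoch, or 4200 over the 20 epochs.

```python
import math

def batches_per_epoch(n_samples, batch_size):
    # The last batch holds the remainder when n_samples isn't divisible by batch_size
    return math.ceil(n_samples / batch_size)

print(batches_per_epoch(1000, 10))  # 100
print(batches_per_epoch(2100, 10))  # 210
print(batches_per_epoch(1005, 10))  # 101 (the last batch has 5 samples)
```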

***

`shuffle=True` tells Keras to shuffle the training data before each epoch, so the samples are presented in a different order every epoch.
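
A NumPy sketch of what per-epoch shuffling and batching amount to: draw a fresh permutation of the indices each epoch, slice it into batches, and index `x` and `y` with the same indices so each sample keeps its label. The helper name `iterate_minibatches`, the toy data and the seed are hypothetical, for illustration only; this is not Keras's internal code.

```python
import numpy as np

def iterate_minibatches(x, y, batch_size, rng):
    """Yield (x_batch, y_batch) pairs covering every sample once, in random order."""
    indices = rng.permutation(len(x))  # a new order on every call, i.e. every epoch
    for start in range(0, len(x), batch_size):
        batch_idx = indices[start:start + batch_size]
        yield x[batch_idx], y[batch_idx]  # same indices keep samples and labels paired

rng = np.random.default_rng(7)  # hypothetical seed
x = np.arange(25) / 100.0       # toy inputs
y = np.arange(25) % 2           # toy labels, derivable from x so pairing can be checked
for epoch in range(2):
    sizes = [len(xb) for xb, _ in iterate_minibatches(x, y, 10, rng)]
    print(f"epoch {epoch + 1}: batch sizes {sizes}")  # [10, 10, 5] each epoch
```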

***

`verbose` controls how training progress is displayed:

0 = silent, 1 = progress bar, 2 = one line per epoch.