## Deep Learning Platforms in Python

1- Keras

2- TensorFlow

3- PyTorch

4- Caffe

5- Theano

6- CNTK

7- MXNet


## Why do we use Keras in DS 2.2?

- A focus on user experience: deep learning models are easy to build and train

- Easy to learn and easy to use

- Large adoption in the industry and research community

- Multi-backend, multi-platform

- Easy productization of models


<img src = 'https://github.com/Make-School-Courses/DS-2.2-Deep-Learning/raw/master/Notebooks/Images/why_keras.png' width='500' height='500'>

Review of pseudocode for a neural network:

1. Have the training data (x_train, y_train)
2. Define the hyper-parameters of the model, such as the number of iterations (epoch), and initialize the model parameters:
    - weights (wi, wh) and biases (bi, bh)
3. Define the error function (cost function):
    - MSE (mean squared error, used in DS 2.1)
    - Classification problems can use:
        - binary cross-entropy
        - categorical cross-entropy
4. Train for `epoch` iterations (`for i in range(epoch):`):
    - pass the data through the network (forward pass)
    - update the weights and biases (e.g., with gradient descent) to decrease the error function
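The four steps above can be sketched in plain NumPy. This is a minimal illustration under assumptions, not the course implementation: a made-up toy dataset (the logical OR of two inputs), a 2-4-1 network with sigmoid activations, MSE as the error function, and full-batch gradient descent with an assumed learning rate `lr`. The names `wi`, `bi` (input to hidden) and `wh`, `bh` (hidden to output) are borrowed from the pseudocode; that mapping is itself an assumption.

```python
import numpy as np

# Step 1: toy training data (hypothetical): logical OR of two binary inputs
x_train = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y_train = np.array([[0], [1], [1], [1]], dtype=float)

# Step 2: hyper-parameters and parameter initialization
rng = np.random.default_rng(0)
epoch = 5000          # number of iterations
lr = 0.5              # learning rate (assumed value)
n_hidden = 4
wi = rng.normal(size=(2, n_hidden))   # input -> hidden weights (assumed meaning of wi)
bi = np.zeros(n_hidden)               # hidden-layer biases
wh = rng.normal(size=(n_hidden, 1))   # hidden -> output weights (assumed meaning of wh)
bh = np.zeros(1)                      # output-layer bias


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


# Step 3: error function (mean squared error)
def mse(y_true, y_pred):
    return np.mean((y_true - y_pred) ** 2)


losses = []
n = x_train.shape[0]
# Step 4: training loop
for i in range(epoch):
    # forward pass: pass the data through the network
    h = sigmoid(x_train @ wi + bi)
    y_pred = sigmoid(h @ wh + bh)
    losses.append(mse(y_train, y_pred))

    # backward pass: gradient of the MSE with respect to each pre-activation
    d_out = 2 * (y_pred - y_train) / n * y_pred * (1 - y_pred)
    d_hid = (d_out @ wh.T) * h * (1 - h)

    # gradient-descent update of weights and biases
    wh -= lr * h.T @ d_out
    bh -= lr * d_out.sum(axis=0)
    wi -= lr * x_train.T @ d_hid
    bi -= lr * d_hid.sum(axis=0)

print(f"initial MSE = {losses[0]:.4f}, final MSE = {losses[-1]:.4f}")
```

Keras automates steps 2 to 4: the layers hold the weights and biases, `compile` picks the loss and optimizer, and `fit` runs the loop.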

## Keras has two API Styles

### The Sequential API

- Dead simple

- Only for single-input, single-output, sequential layer stacks

- Good for 70+% of use cases

<img src = 'https://github.com/Make-School-Courses/DS-2.2-Deep-Learning/raw/master/Notebooks/Images/keras_sequential_api_2.png' width='800' height='800' >
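For comparison with the `.add()` style used in the iris activity below, here is a minimal sketch (assuming Keras 2.x or 3.x is installed) that declares the same 4-16-3 stack by passing a list of layers to `keras.Sequential`. `count_params()` confirms the arithmetic: (4 × 16 + 16) + (16 × 3 + 3) = 80 + 51 = 131 parameters.

```python
import keras
from keras import layers

# a plain stack of layers: one input, one output, no branching
model = keras.Sequential([
    keras.Input(shape=(4,)),                 # 4 input features
    layers.Dense(16, activation="sigmoid"),  # hidden layer: 4*16 weights + 16 biases = 80
    layers.Dense(3, activation="softmax"),   # output layer: 16*3 weights + 3 biases = 51
])
model.summary()
print(model.count_params())  # 131
```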

### The Functional API

- Like playing with Lego bricks

- Multi-input, multi-output, arbitrary static graph topologies

- Good for 95% of use cases

- Great if we want access to the outputs of hidden layers or want to build branching architectures

<img src = 'https://github.com/Make-School-Courses/DS-2.2-Deep-Learning/raw/master/Notebooks/Images/keras_functional_api_2.png' width='700' height='700' >
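A minimal sketch of the "access to hidden layers" point, assuming Keras 2.x or 3.x: because a Functional model is just a graph from input tensors to output tensors, a second `keras.Model` can be built that stops at the hidden layer and shares its weights with the full classifier. The random input data and the layer names `hidden` and `output` are illustrative choices.

```python
import numpy as np
import keras
from keras import layers

inp = keras.Input(shape=(4,))
hidden = layers.Dense(16, activation="sigmoid", name="hidden")(inp)
out = layers.Dense(3, activation="softmax", name="output")(hidden)

# the full classifier: input -> hidden -> output
model = keras.Model(inputs=inp, outputs=out)

# a second model reusing the same layers (and weights) but ending at the hidden layer
feature_extractor = keras.Model(inputs=inp, outputs=hidden)

x = np.random.default_rng(0).normal(size=(5, 4)).astype("float32")  # 5 made-up samples
print(model.predict(x, verbose=0).shape)              # (5, 3)
print(feature_extractor.predict(x, verbose=0).shape)  # (5, 16)
```

Since both models share the `hidden` layer object, training `model` also changes what `feature_extractor` returns.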

## Activity: Apply a NN with Keras to the iris data

- Use the Keras Sequential API

- Use 70% of the data for training (and the remaining 30% for testing)

- One-hot encode the labels with `to_categorical` (imported from `keras.utils`; older Keras versions also expose it as `keras.utils.np_utils.to_categorical`)

- Define a two-layer fully connected network with 16 neurons in the hidden layer

- Use `categorical_crossentropy` as the loss (cost) function
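To see what one-hot encoding does to the iris labels (the integers 0, 1 and 2), here is a minimal NumPy equivalent, assuming the labels are consecutive integers starting at 0. Keras' `to_categorical` produces the same kind of matrix (as float32).

```python
import numpy as np

y = np.array([0, 2, 1, 2])          # example integer class labels
num_classes = 3
one_hot = np.eye(num_classes)[y]    # row k of the identity matrix encodes class k
print(one_hot)
# [[1. 0. 0.]
#  [0. 0. 1.]
#  [0. 1. 0.]
#  [0. 0. 1.]]
```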

```python
from sklearn import datasets
from sklearn.model_selection import train_test_split
from keras.utils import to_categorical
from keras.models import Sequential
from keras.layers import Dense, Activation

# load the iris dataset: 150 samples, 4 features, 3 classes
iris = datasets.load_iris()
X, y = iris.data, iris.target

# split the data: train 70%, test 30%
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

# one-hot encode the integer labels 0, 1, 2
y_train_one_hot = to_categorical(y_train)
y_test_one_hot = to_categorical(y_test)

# instantiate a Sequential model
model = Sequential()

# one hidden layer with 16 neurons; input_shape=(4,) for the 4 features
model.add(Dense(16, input_shape=(4,)))
model.add(Activation('sigmoid'))

# 3 output neurons, one per iris class; softmax turns them into class probabilities
model.add(Dense(3))
model.add(Activation('softmax'))

model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=["accuracy"])
model.fit(X_train, y_train_one_hot, epochs=100, batch_size=1, verbose=0)
loss, accuracy = model.evaluate(X_test, y_test_one_hot, verbose=0)
print("Accuracy = {:.2f}".format(accuracy))
```

```
Accuracy = 0.98
```


```python
# one-hot label of the second training sample (class 2)
y_train_one_hot[1]
```

```
array([0., 0., 1.], dtype=float32)
```

```python
# the 4 feature values of the first test sample
X_test[0]
```

```
array([5.8, 2.8, 5.1, 2.4])
```

```python
import numpy as np

# predicted probability of each class for the first test sample
# (softmax outputs, so they sum to 1 up to floating-point rounding)
model.predict(np.array([X_test[0]]))

# [7.4588655e-07, 1.8208060e-02, 9.8179126e-01] --> [probability of class 0, probability of class 1, probability of class 2]
```

```
array([[7.4588655e-07, 1.8208060e-02, 9.8179126e-01]], dtype=float32)
```

```python
# probability of class 2 as a percentage: about 98%
9.8179126e-01*100
```

```
98.17912600000001
```

```python
# the class probabilities add up to 1 (the tiny excess is floating-point rounding)
sum(model.predict(np.array([X_test[0]]))[0])
```

```
1.000000064158428
```
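The sum is 1 because the last layer is a softmax: it exponentiates the raw output scores (logits) and divides each by their total, so the result is a probability distribution; the small excess above 1 comes from float32 rounding. A minimal NumPy sketch with made-up logits (the `softmax` helper here is illustrative, not the Keras implementation):

```python
import numpy as np


def softmax(z):
    # subtracting the row maximum avoids overflow in exp and does not change the result
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)


logits = np.array([[-7.0, 2.0, 6.0]])   # made-up scores for 3 classes
probs = softmax(logits)
print(probs, probs.sum())
```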

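```python
# one-hot label of the first *training* sample (class 1);
# this is not the label of X_test[0], whose true label is y_test[0]
y_train_one_hot[0]
```

```
array([0., 1., 0.], dtype=float32)
```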

```python
y_test[0]    # true label of X_test[0]: class 2, the class predicted with ~98% probability
```

```
2
```

```python
y_train[0]   # integer label of the first training sample, matching y_train_one_hot[0]
```

```
1
```

```python
model.predict(X_test)
```

```
array([[7.45888542e-07, 1.82080641e-02, 9.81791139e-01],
       [5.33477589e-03, 9.83437657e-01, 1.12275472e-02],
       [9.97275293e-01, 2.72471714e-03, 9.82724213e-09],
       [5.49272499e-06, 6.68600947e-02, 9.33134437e-01],
       [9.94595945e-01, 5.40410075e-03, 2.40600038e-08],
       [3.63892525e-07, 1.12386895e-02, 9.88760889e-01],
       [9.95333850e-01, 4.66620596e-03, 2.10877769e-08],
       [2.69492529e-03, 9.57964063e-01, 3.93409804e-02],
       [1.69107993e-03, 9.30826128e-01, 6.74827769e-02],
       [7.79246865e-03, 9.83786881e-01, 8.42071045e-03],
       [4.52484637e-05, 2.42636964e-01, 7.57317722e-01],
       [4.12997650e-03, 9.72947657e-01, 2.29223333e-02],
       [2.64735543e-03, 9.60251808e-01, 3.71008888e-02],
       [1.33323646e-03, 9.00562763e-01, 9.81039852e-02],
       [1.49383361e-03, 9.10458744e-01, 8.80474299e-02],
       [9.96236384e-01, 3.76361120e-03, 1.39056731e-08],
       [1.39593449e-03, 8.98373187e-01, 1.00230798e-01],
       [1.87641697e-03, 9.35549
```
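Each row of `model.predict(X_test)` is a probability distribution over the classes 0, 1 and 2, so the predicted class is the column with the largest probability. A minimal sketch applying `np.argmax` to the first five rows printed above:

```python
import numpy as np

# first five rows of model.predict(X_test), copied from the output above
probs = np.array([
    [7.45888542e-07, 1.82080641e-02, 9.81791139e-01],
    [5.33477589e-03, 9.83437657e-01, 1.12275472e-02],
    [9.97275293e-01, 2.72471714e-03, 9.82724213e-09],
    [5.49272499e-06, 6.68600947e-02, 9.33134437e-01],
    [9.94595945e-01, 5.40410075e-03, 2.40600038e-08],
])
predicted_classes = probs.argmax(axis=1)   # index of the most probable class in each row
print(predicted_classes)                   # [2 1 0 2 0]
```

The same call turns one-hot labels back into integers (`y_test_one_hot.argmax(axis=1)` gives back `y_test`), so `(model.predict(X_test).argmax(axis=1) == y_test).mean()` computes the fraction of correctly classified test samples, which should match the `accuracy` reported by `model.evaluate` for this model.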

## Choosing the Appropriate Loss Function (very important!)

- For a binary (two-class) classification problem:

    - The loss function should be `binary_crossentropy`
    - We need one output neuron
    - The activation function of the last layer should be `sigmoid`

- For a multi-class classification problem:

    - The loss function should be `categorical_crossentropy` (with one-hot encoded labels)
    - We need N output neurons, where N is the number of classes
    - The activation function of the last layer should be `softmax`

- For a regression problem:

    - The loss function should be `mse` or `mae`
    - We need one output neuron (per predicted value)
    - The activation function of the last layer is usually `linear` (i.e., no activation)
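To make the three cases concrete, here is a simplified NumPy sketch of each loss, averaged over samples. It is an illustration under assumptions (the clipping constant `eps` and the mean reduction are illustrative choices), not Keras' exact implementation; each example input matches the output layer listed above for that problem type.

```python
import numpy as np

eps = 1e-7  # clip probabilities away from 0 and 1 so log() never sees 0


def binary_crossentropy(y_true, p):
    p = np.clip(p, eps, 1 - eps)
    return -np.mean(y_true * np.log(p) + (1 - y_true) * np.log(1 - p))


def categorical_crossentropy(y_true_one_hot, p):
    p = np.clip(p, eps, 1.0)
    return -np.mean(np.sum(y_true_one_hot * np.log(p), axis=1))


def mse(y_true, y_pred):
    return np.mean((y_true - y_pred) ** 2)


def mae(y_true, y_pred):
    return np.mean(np.abs(y_true - y_pred))


# binary: one sigmoid output per sample, labels 0 or 1
bce = binary_crossentropy(np.array([1.0, 0.0]), np.array([0.9, 0.2]))
# multi-class: one softmax row per sample, labels one-hot encoded
cce = categorical_crossentropy(np.array([[0, 0, 1]]), np.array([[0.1, 0.2, 0.7]]))
# regression: one linear output per sample
reg_mse = mse(np.array([1.0, 2.0, 3.0]), np.array([1.0, 2.0, 5.0]))
reg_mae = mae(np.array([1.0, 2.0, 3.0]), np.array([1.0, 2.0, 5.0]))
print(bce, cce, reg_mse, reg_mae)
```

Note that categorical cross-entropy only looks at the probability assigned to the true class (here `-log(0.7)`), which is why the softmax output must be a proper distribution.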

## Activity: Apply a NN with Keras to the iris data using the Functional API

```python
from keras.layers import Input, Dense
from keras.models import Model
from keras.utils import to_categorical
from sklearn import datasets
from sklearn.model_selection import train_test_split

iris = datasets.load_iris()
X, y = iris.data, iris.target

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

y_train_one_hot = to_categorical(y_train)
y_test_one_hot = to_categorical(y_test)

# 4 features
inp = Input(shape=(4,))

# 16 neurons in the hidden layer, 3 softmax outputs (one per iris class)
x = Dense(16, activation='sigmoid')(inp)
out = Dense(3, activation='softmax')(x)

# build the model with inputs at inp and outputs at out
# (assign it to `model` so the calls below use this Functional model,
# not the Sequential model from the previous activity)
model = Model(inputs=inp, outputs=out)

model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=["accuracy"])
model.fit(X_train, y_train_one_hot, epochs=100, batch_size=1, verbose=0)
loss, accuracy = model.evaluate(X_test, y_test_one_hot, verbose=0)
print("Accuracy = {:.2f}".format(accuracy))
```

```
Accuracy = 0.98
```
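The iris network above is still a straight stack of layers, so it does not use anything the Sequential API could not express. As a hypothetical illustration of the branching the Functional API allows, this sketch (assuming Keras 2.x or 3.x; the branch sizes and activations are arbitrary) feeds the same input through two parallel `Dense` layers and concatenates them before the softmax output:

```python
import keras
from keras import layers

inp = keras.Input(shape=(4,))

# two parallel branches over the same 4 input features (sizes chosen arbitrarily)
branch_a = layers.Dense(8, activation="relu")(inp)
branch_b = layers.Dense(8, activation="sigmoid")(inp)

# join the branches: 8 + 8 = 16 features
merged = layers.Concatenate()([branch_a, branch_b])
out = layers.Dense(3, activation="softmax")(merged)

model = keras.Model(inputs=inp, outputs=out)
model.compile(optimizer="adam", loss="categorical_crossentropy", metrics=["accuracy"])
print(model.count_params())  # 131
```

The parameter count is (4 × 8 + 8) × 2 + (16 × 3 + 3) = 131, and the model can be trained with the same `fit` call as above.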
