## Deep Learning Glossary

This glossary introduces the core components and terminology of deep learning.

### Activation Function

- To allow a neural network to learn complex (nonlinear) decision boundaries, we apply a nonlinear activation function to the outputs of some of its layers.

- Commonly used functions include:

    - sigmoid, tanh, ReLU (Rectified Linear Unit), and variants of these.
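As a rough illustration of the three functions named above, here is a minimal sketch in plain Python (no Keras), applied to a few made-up pre-activation values. The function names and the sample values are assumptions for illustration only.

```python
import math

def sigmoid(z):
    """Squash any real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def tanh(z):
    """Squash any real number into the range (-1, 1)."""
    return math.tanh(z)

def relu(z):
    """Pass positive values through unchanged and clip negatives to 0."""
    return max(0.0, z)

# Hypothetical pre-activation values (weighted sums) for three neurons
pre_activations = [-2.0, 0.0, 3.0]

for z in pre_activations:
    print(z, sigmoid(z), tanh(z), relu(z))
```

All three are nonlinear, which is what lets stacked layers represent more than a single linear map.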

Imagine `x` is our feature matrix. Then:

What is `x.shape[0]`? - the number of samples (rows)

What is `x.shape[1]`? - the number of features (columns) we have

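In [2]:
from keras.models import Sequential
from keras.layers import Dense

# x is assumed to be a 2-D feature matrix (samples x features) defined earlier
model = Sequential()
model.add(Dense(1, activation='sigmoid', input_dim=x.shape[1]))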

### Loss Function (Cost Function)

- When we train a neural network, it tries to make its predictions as close as possible to the actual values. The loss function measures how far the predictions are from those values, and training minimizes it.


- For regression problems (predicting a continuous value), common loss functions are:

    - Mean Squared Error (MSE), Mean Absolute Error (MAE), ...
    - `model.compile(optimizer='rmsprop', loss='mse')`   --> Keras minimizes the MSE
    
    
- For classification problems, common loss functions are:

    - Categorical Cross-Entropy (more than two classes), Binary Cross-Entropy (two classes)
    - `model.compile(optimizer='rmsprop', loss='binary_crossentropy')`
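To make the loss names above concrete, the following sketch computes MSE, MAE, and binary cross-entropy by hand in plain Python. The helper names, the clipping constant `eps`, and the sample values are assumptions for illustration. This is not how Keras implements its losses internally.

```python
import math

def mse(y_true, y_pred):
    """Mean of the squared differences."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def mae(y_true, y_pred):
    """Mean of the absolute differences."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def binary_crossentropy(y_true, y_pred, eps=1e-7):
    """Average negative log-likelihood of 0/1 labels given predicted probabilities."""
    total = 0.0
    for t, p in zip(y_true, y_pred):
        p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
        total += -(t * math.log(p) + (1 - t) * math.log(1 - p))
    return total / len(y_true)

# Made-up regression targets and predictions
y_true = [3.0, 5.0, 2.0]
y_pred = [2.5, 5.0, 4.0]
print(mse(y_true, y_pred))  # (0.25 + 0 + 4) / 3
print(mae(y_true, y_pred))  # (0.5 + 0 + 2) / 3

# Made-up binary labels and predicted probabilities of class 1
labels = [1, 0, 1]
probs = [0.9, 0.2, 0.6]
print(binary_crossentropy(labels, probs))
```

MSE penalizes the one large error (2.0) much more heavily than MAE does, which is the main practical difference between the two.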


### Algorithms (or Optimization Methods) to Minimize Error

- Gradient Descent (GD): Intuitively, when climbing down a hill you take small steps instead of jumping down all at once. Starting from a point x, we move a small step delta h in the downhill direction (opposite to the gradient), update our position to x - delta h, and repeat until we reach the bottom (a minimum of the cost function).


- Stochastic Gradient Descent (SGD): the same update rule, but the gradient is estimated from a single sample or a small batch of samples instead of the entire dataset, so each update is cheaper but noisier.


- Learning rate: Both GD and SGD use a learning rate to control how large each weight update is

    - w1_new = w1 - (learning rate) * (derivative of cost function wrt w1)
    
    - Keras also provides optimizers that build on this rule: RMSprop, Adagrad, Adadelta, Adam, Adamax, Nadam
    
    - Check out book pdf Ch. 16 Optimizers
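The update rule above can be sketched on a toy problem. The cost function `(w - 3)^2`, the starting point, the learning rate, and the step count below are all assumed values chosen only to show the mechanics of gradient descent on a single weight.

```python
def cost(w):
    """A toy cost function with its minimum at w = 3."""
    return (w - 3.0) ** 2

def cost_gradient(w):
    """Derivative of the toy cost with respect to w."""
    return 2.0 * (w - 3.0)

learning_rate = 0.1  # step size (assumed value for this sketch)
w = 0.0              # arbitrary starting point

for step in range(50):
    # w_new = w - (learning rate) * (derivative of cost wrt w)
    w = w - learning_rate * cost_gradient(w)

print(w, cost(w))  # w ends up very close to 3
```

On this toy cost, a learning rate above 1.0 makes each step overshoot by more than the previous error, so `w` diverges instead of settling. This is why the learning rate usually needs tuning.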


### Dropout

- Dropout is a regularization technique for neural networks that helps prevent overfitting

- `model.add(Dropout(0.25))` --> during training, randomly sets 25% of the previous layer's outputs to zero, so they are not passed on to the next layer

<img src = 'https://github.com/Make-School-Courses/DS-2.2-Deep-Learning/raw/master/Notebooks/Images/dropout.png' >
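A minimal plain-Python sketch of what a dropout layer does to a vector of activations during training. The helper name `dropout`, the seed, and the all-ones input are assumptions for illustration. The rescaling of the kept values by `1 / (1 - rate)` keeps the expected sum unchanged, which is also what the Keras `Dropout` layer does at training time.

```python
import random

def dropout(activations, rate, rng):
    """Zero each activation with probability `rate` and rescale the rest."""
    keep_prob = 1.0 - rate
    return [a / keep_prob if rng.random() >= rate else 0.0 for a in activations]

rng = random.Random(0)       # fixed seed so the sketch is repeatable
activations = [1.0] * 1000   # hypothetical outputs of the previous layer
dropped = dropout(activations, rate=0.25, rng=rng)

zero_fraction = dropped.count(0.0) / len(dropped)
mean_after = sum(dropped) / len(dropped)
print(zero_fraction)  # roughly 0.25
print(mean_after)     # roughly 1.0, thanks to the rescaling
```

At prediction time dropout is switched off and all activations pass through unchanged.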

### Epoch and Batch

- An epoch is one complete pass of the ENTIRE training dataset forward and backward through the neural network

    - `model.fit(x, y, epochs=10, validation_data=(x_val, y_val))`
    
- Batch size is the number of samples used per gradient update

    - `model.fit(x, y, batch_size=2, epochs=10)`
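The relationship between epochs, batch size, and the number of weight updates can be worked out with simple arithmetic. The sketch below assumes a hypothetical dataset of 100 samples. The helper name `gradient_updates` is made up for illustration.

```python
import math

def gradient_updates(n_samples, batch_size, epochs):
    """Total number of weight updates performed during training."""
    # The last batch of an epoch may be smaller, hence the ceiling.
    updates_per_epoch = math.ceil(n_samples / batch_size)
    return updates_per_epoch * epochs

print(gradient_updates(100, 2, 10))   # 50 updates per epoch * 10 epochs = 500
print(gradient_updates(100, 32, 10))  # 4 updates per epoch * 10 epochs = 40
```

A smaller batch size means more updates per epoch, which is why training with `batch_size=1` is noticeably slower than with larger batches.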

## Activity: Apply NN with Keras on iris data

- Use 100 samples for training and 50 samples for validation

- Set `epochs` to 5

- Vary `batch_size` from 1 to 100 and plot the accuracy versus `batch_size`

- Try `verbose` values of 0, 1, and 2

### Observations:
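1. A lower batch size tends to give better accuracy, at the price of slower computation

2. The entire model should be created inside the for loop. If it is not, each new run starts from the weights and biases already trained in the previous run rather than from a fresh initialization, which makes the comparison across batch sizes unfair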

In [58]:
from sklearn import datasets
from sklearn.model_selection import train_test_split
from keras.utils import to_categorical
from keras.models import Sequential
from keras.layers import Dense, Activation

# importing iris - 4 features in this dataset
iris = datasets.load_iris()
X, y = iris.data, iris.target

# split the data: train 70%, test 30%
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

# one-hot encode the integer class labels (0, 1, 2)
y_train_one_hot = to_categorical(y_train)
y_test_one_hot = to_categorical(y_test)


# instantiate Sequential model
model = Sequential()

# define one hidden layer with 16 neurons, input_shape = 4 for 4 features
model.add(Dense(16, input_shape=(4,)))
model.add(Activation('sigmoid'))

# 3 outputs for the 3 types of iris classes --> there should be 3 output neurons
model.add(Dense(3))
model.add(Activation('softmax'))
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=["accuracy"])
model.fit(X_train, y_train_one_hot, epochs=100, batch_size=1, verbose=0)

# evaluate on the held-out test set
loss, accuracy = model.evaluate(X_test, y_test_one_hot, verbose=0)
print("Test accuracy:", accuracy)




## Activity: Apply Lambda Layer in Keras and test how it works

- Write code that takes an array of size 3 and applies a Lambda layer in Keras to double the array's elements

In [59]:
import warnings
warnings.simplefilter(action='ignore', category=FutureWarning)

from keras.layers import Lambda, Input
from keras.models import Model

import numpy as np

inputs = Input(shape=(3,))
double = Lambda(lambda x: 2 * x)(inputs)

model = Model(inputs=inputs, outputs=double)
model.compile(optimizer='sgd', loss='mse')

data = np.array([[5, 12, 1]])    # 10, 24, 2
print(model.predict(data))



[[10. 24.  2.]]


### Batch Normalization

- Batch Normalization is a technique that normalizes the inputs to a layer over each mini-batch, so that normalization is applied at hidden layers as well as to the input data

- `model.add(BatchNormalization())`

<img src = 'https://github.com/Make-School-Courses/DS-2.2-Deep-Learning/raw/master/Notebooks/Images/batch_normalization.png' width='600' height ='600'>
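The normalization step can be sketched for a single feature across one mini-batch. This is a simplified training-time view in plain Python. The helper name, the sample batch, and the default values of `gamma`, `beta`, and `eps` are assumptions for illustration. In a real layer, `gamma` and `beta` are learned, and running statistics are kept for use at prediction time.

```python
import math

def batch_norm(values, gamma=1.0, beta=0.0, eps=1e-3):
    """Normalize one feature over a mini-batch, then scale and shift it."""
    mean = sum(values) / len(values)
    variance = sum((v - mean) ** 2 for v in values) / len(values)
    return [gamma * (v - mean) / math.sqrt(variance + eps) + beta for v in values]

# Hypothetical activations of one hidden unit for a batch of 4 samples
batch = [10.0, 12.0, 14.0, 20.0]
normalized = batch_norm(batch)

norm_mean = sum(normalized) / len(normalized)
norm_var = sum((v - norm_mean) ** 2 for v in normalized) / len(normalized)
print(normalized)
print(norm_mean, norm_var)  # approximately 0 and 1
```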

1. What is an activation function? - A nonlinear function applied to a layer's output that allows the NN to learn complex decision boundaries.


2. What is the loss function for two-class and three-class classification problems? - Binary Cross-Entropy for two classes and Categorical Cross-Entropy for three classes.


3. What is an optimizer in Keras? - It is the weight-updating rule.


4. What is dropout? - It randomly drops a fraction of a layer's outputs during training to prevent overfitting.


5. What is Batch Normalization? - It normalizes the inputs to a layer, including hidden layers, over each mini-batch.