<img src="./images/cads-logo.png" style="height: 100px;" align=left> 
<img src="./images/tf-logo-2.png" style="height: 70px;" align=right>


# Introduction to Deep Learning

**Pre-requisites:**

1. Supervised and Unsupervised Machine Learning
2. Linear Algebra, Calculus
3. ```Keras``` Basics

## Table of Contents

- [1. Solving a Single Layer Neural Network](#1.-Solving-a-Single-Layer-Neural-Network)
- [2. Parametric Approach to Solving the Single Layer Neural Network](#2.-Parametric-Approach-to-Solving-the-Single-Layer-Neural-Network)
    - [2.1. Weights](#2.1.-Weights)
    - [2.2. Bias](#2.2.-Bias)
    - [2.3. Activation Function](#2.3.-Activation-Function)
    - [2.4. Loss Function](#2.4.-Loss-Function)
    - [2.5. Cost Function](#2.5.-Cost-Function)
    - [2.6. Gradient Descent](#2.6.-Gradient-Descent)
- [3. Single Layer Neural Network from Scratch](#3.-Single-Layer-Neural-Network-from-Scratch)
    
- [4. Summary of Steps in Training a Single Layer Neural Network for Binary Classification](#4.-Summary-of-Steps-in-Training-a-Single-Layer-Neural-Network-for-Binary-Classification)
- [Important Notes](#Important-Notes)
    - [Why Need a non-Linear Activation Function?](#Why-Need-a-non-Linear-Activation-Function?)
    - [What really are Neural Networks?](#What-really-are-Neural-Networks?)
- [5. Building Neural Networks in ```tf.keras```](#5.-Building-Neural-Networks-in-tf.keras)
    - [Solve MNIST data using Neural Networks with Hidden Layer](#Solve-MNIST-data-using-Neural-Networks-with-Hidden-Layer)
    - [Vary neural networks hyperparameter](#Vary-neural-networks-hyperparameter)

In [None]:
%matplotlib inline

In [None]:
# import helper packages
import matplotlib.pyplot as plt
from pylab import rcParams
rcParams['figure.figsize'] = 9, 5

import seaborn as sns
sns.set(style='whitegrid', palette='muted', font_scale=1.5)

import numpy as np
import pandas as pd
import os
os.environ['KMP_DUPLICATE_LIB_OK']='True'


np.random.seed(42) # fix the seed so that the random initializations below are reproducible

## 1. Solving a Single Layer Neural Network


Let's start with a simple (**but provocative**) problem: predicting whether a wellbore will collapse based on whether it has surpassed its physical limits (stress, pressure, sand production).

<img src="./images/wellbore.png" style="height: 200px;" align=left>
<img src="./images/no_bias.png" style="height: 200px;" align=center>

#### Create feature sets (sand production, pore-pressure, thermal stress) and corresponding label (collapse)


In [None]:

# SC

X =  '''Your Code Here'''
y = '''Your Code Here'''

# print dimensions of both X and y

print("Feature Set shape:", X.shape)
print("Feature Set size: ", X.size)

print("")

print("Labels Set shape:", y.shape)
print("Labels Set size: ", y.size)

## 2. Parametric Approach to Solving the Single Layer Neural Network

Let's write down a function $z(X,W)$ that takes in both the input data $X$ and a set of parameters (otherwise known as weights) $W$, and outputs a score $z$ that we will use to decide whether the wellbore will collapse or not.

What would this $z(X,W)$  look like?

$$z  = X.W$$   

$$z  = W^{T}X$$

$$z= x_{1}w_{1} + x_{2}w_{2} + x_{3}w_{3}$$
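As a minimal sketch of this weighted sum, the snippet below compares the matrix form with the explicit sum. The feature matrix `X_demo` and the weights `W_demo` are hypothetical values chosen for illustration, not the data from the exercise above.

```python
import numpy as np

# Hypothetical data for illustration only: 4 examples, 3 binary features
# (sand production, pore pressure, thermal stress).
X_demo = np.array([[0, 0, 1],
                   [1, 1, 1],
                   [1, 0, 1],
                   [0, 1, 1]])

# Hypothetical hand-picked weights, one per feature, as a column vector.
W_demo = np.array([[0.5], [-0.25], [0.1]])

# Matrix form: every row of X_demo is dotted with W_demo.
z_matrix = np.dot(X_demo, W_demo)  # shape (4, 1)

# Explicit form: x1*w1 + x2*w2 + x3*w3 for every example.
z_explicit = (X_demo[:, 0] * W_demo[0, 0]
              + X_demo[:, 1] * W_demo[1, 0]
              + X_demo[:, 2] * W_demo[2, 0])

print(z_matrix.ravel())
print(np.allclose(z_matrix.ravel(), z_explicit))
```

Both forms give one value of $z$ per training example, which is why the matrix product is the usual way to write the computation.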

### 2.1. Weights

Imagine these $w$'s as knobs that we can tweak to correctly predict whether the wellbore will collapse or not.

#### Initialize the weights


In [None]:
# we assign initial weights

weights = np.random.rand(3,1)

print("Weights shape: ", weights.shape)
print("Weights size: ", weights.size)
weights

### 2.2. Bias

The bias is a constant term that does not interact with the input data.

Instead, it gives us a data-independent preference for one class over the other.

<img src="./images/bias_neuron.png" style="height: 300px;" align=center>

### $$z = X.W + b$$

### $$z = W^{T}X + b$$

### $$z= x_{1}w_{1} + x_{2}w_{2} + x_{3}w_{3} + b $$ 
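One way to see the data-independent role of the bias is to feed in an input whose features are all zero. In the sketch below the weights `w_demo` and the bias values `b_demo` are hypothetical. Whatever the weights are, the weighted sum vanishes and $z$ equals the bias.

```python
import numpy as np

x_zero = np.zeros(3)                   # hypothetical input: every feature is 0
w_demo = np.array([0.5, -0.25, 0.1])   # hypothetical weights

z_by_bias = {}
for b_demo in (-2.0, 0.0, 2.0):        # hypothetical bias values
    # The weighted sum is zero here, so z is set by the bias alone.
    z_by_bias[b_demo] = float(np.dot(x_zero, w_demo) + b_demo)

print(z_by_bias)
```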


#### Initialize the bias

In [None]:
# SC

bias = '''Your Input Here'''
bias

#### Define the parametric equation

In [None]:
# SC

z = '''Your Code Here'''

print("z shape: ", z.shape)
print("z size: ", z.size)


Do these numbers make sense? We need a number from the output that is either 1 or 0. 

### 2.3. Activation Function

We need another function to squash the results of $X.W + b$ between 0 and 1.

One such function is called the **sigmoid function**.

### $$a = \sigma(z) = \frac{1} {{1+ e^{-z}}}$$
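A minimal sketch of how this formula behaves, using a hypothetical helper named `sigmoid_sketch` so that it does not clash with the function you define in the exercise below:

```python
import numpy as np

def sigmoid_sketch(z):
    """Logistic sigmoid 1 / (1 + exp(-z)), applied element-wise."""
    return 1.0 / (1.0 + np.exp(-z))

z_values = np.array([-10.0, -1.0, 0.0, 1.0, 10.0])
a_values = sigmoid_sketch(z_values)

# Large negative z maps close to 0, z = 0 maps to 0.5,
# and large positive z maps close to 1.
print(a_values)
```

Every output lies strictly between 0 and 1, so it can be read as the predicted probability of the positive class.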

<img src="./images/fprop.png" style="height: 300px;" align=center>

#### Define the Sigmoid Function

In [None]:
# MC

# define the sigmoid function
def sigmoid(z):
    return '''Your Code Here'''

In [None]:
input_ = np.linspace(-10, 10, 100)
plt.plot(input_, sigmoid(input_), c="r")
plt.title("Sigmoid Function")

#### Apply activation function

In [None]:
# SC

z = '''Your Code Here'''

# let the predicted output be y_hat

y_hat = sigmoid(z)
print("y_hat shape: ", y_hat.shape)
print("y_hat size: ", y_hat.size)

y_hat

In [None]:
plt.plot(z,  y_hat, 'ko')
plt.plot(input_, sigmoid(input_), c="r")
plt.title("Sigmoid Function and Predicted Y_hat")

After letting the neural net predict the output, we compare this output to the actual value. 




### 2.4. Loss Function

The loss function measures how good our predicted label $a$ is with respect to the true label $y$ for a **single training example**, so we want to minimize it. A familiar form is half of the squared difference between the predicted value $a$ and the true value $y$.

$$L(a,y) = \frac{1}{2} (a - y)^2 $$

<img src="./images/DL_12.png" style="height: 300px;" align=center>
<img src="./images/DL_13.png" style="height: 300px;" align=center>

However, this form of loss function **doesn't work well** in binary classification problems. 

A more appropriate form of the loss function, writing the prediction $a$ as $\hat{y}$, is defined by the following equation:

### $$L(\hat{y},y) =  - ( y\log{\hat{y}} + (1 - y)\log(1 -\hat{y} ))$$

This form of loss function is known as **Binary Cross-Entropy**.

#### To minimize $L(\hat{y},y)$:
#### If $y = 1$: $L(\hat{y},y) = -\log{\hat{y}}$ ; we want $\hat{y}$ to be large (close to 1) so that $L$ decreases
#### If $y = 0$: $L(\hat{y},y) = -\log{(1-\hat{y})}$ ; we want $\hat{y}$ to be small (close to 0) so that $L$ decreases
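The two cases can be checked numerically with the single-formula form of the loss. This is a sketch with a hypothetical helper `binary_cross_entropy_sketch`. It assumes $0 < \hat{y} < 1$ so that both logarithms are defined.

```python
import numpy as np

def binary_cross_entropy_sketch(y_hat, y):
    """Binary cross-entropy for one example; y is 0 or 1 and 0 < y_hat < 1."""
    return -(y * np.log(y_hat) + (1 - y) * np.log(1 - y_hat))

# The same confident prediction, scored against opposite labels.
loss_when_right = binary_cross_entropy_sketch(0.9, 1)  # reduces to -log(0.9)
loss_when_wrong = binary_cross_entropy_sketch(0.9, 0)  # reduces to -log(1 - 0.9)

print(loss_when_right, loss_when_wrong)
```

A confident prediction that matches the label gives a small loss, while the same prediction against the opposite label is penalized heavily.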

In [None]:
# define binary cross-entropy/Loss function L
def Cross_Entropy(y_hat, y):
    if y == 1:
        return -np.log(y_hat)
    else:
        return -np.log(1 - y_hat)

### 2.5. Cost Function

The Cost Function measures how well we are doing on the **ENTIRE TRAINING SET**. This is given by the following equation. 

$$J(W,b) =  - \frac{1}{m} \sum_{i=1}^{m} \left[ y^{(i)}\log{a^{(i)}} + (1 - y^{(i)})\log(1 -a^{(i)} ) \right]$$

where $m$ denotes the number of training examples. 
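As a sketch of this average, the hypothetical helper `cost_sketch` below computes $J$ for a whole vector of predictions. The clipping constant `eps` is an assumption added here to avoid `log(0)`. It is not part of the formula above.

```python
import numpy as np

def cost_sketch(a, y, eps=1e-12):
    """Mean binary cross-entropy over m examples."""
    a = np.clip(a, eps, 1 - eps)  # assumption: keep predictions away from 0 and 1
    return -np.mean(y * np.log(a) + (1 - y) * np.log(1 - a))

y_demo = np.array([1, 0, 1, 0])          # hypothetical true labels
a_good = np.array([0.9, 0.1, 0.8, 0.2])  # predictions close to the labels
a_bad = np.array([0.4, 0.6, 0.3, 0.7])   # predictions on the wrong side of 0.5

cost_good = cost_sketch(a_good, y_demo)
cost_bad = cost_sketch(a_bad, y_demo)
print(cost_good, cost_bad)
```

Predictions that agree with the labels give a lower cost than predictions that disagree with them.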

### 2.6. Gradient Descent

By minimizing the **Cost Function** $J(W,b)$, we fine-tune the values of the weights $W$ and bias $b$ so that the loss is as low as possible. This is what is meant by **training the neural network**.

It turns out that we are solving an optimization problem. An algorithm that solves this problem is called **Gradient Descent**.

It finds the **changes** in parameters $W$ and $b$ that minimize the cost function $J$.



```Repeat until convergence:```

$$ w_{i}:=  w_{i} - \alpha \frac{\partial J}{\partial w_{i}}$$

$\alpha$ here is the learning rate. 
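The update rule can be tried on a toy problem. This sketch assumes a hypothetical one-parameter cost $J(w) = (w - 3)^2$, whose minimum is at $w = 3$. It is not the cross-entropy cost of the network.

```python
def dJ_dw(w):
    """Derivative of the hypothetical cost J(w) = (w - 3)**2."""
    return 2.0 * (w - 3.0)

w_demo = 0.0   # initial guess
alpha = 0.1    # learning rate

for step in range(100):
    # Move against the gradient, scaled by the learning rate.
    w_demo = w_demo - alpha * dJ_dw(w_demo)

print(w_demo)
```

Each step shrinks the distance to the minimum by a constant factor, so `w_demo` ends up very close to 3. A learning rate that is too large would make the iterates overshoot instead.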


To compute these derivatives, the error at the output is propagated back toward the input, hence the term **Backpropagation**. This is the **central algorithm** in deep learning.
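For the single neuron in this notebook, the derivative needed by Gradient Descent can be worked out with the chain rule. The following is a sketch for one training example, assuming the sigmoid activation $a = \sigma(z)$, the binary cross-entropy loss $L(a,y)$ and $z = x_{1}w_{1} + x_{2}w_{2} + x_{3}w_{3} + b$:

$$\frac{\partial L}{\partial a} = -\frac{y}{a} + \frac{1-y}{1-a}, \qquad \frac{\partial a}{\partial z} = a(1-a), \qquad \frac{\partial z}{\partial w_{i}} = x_{i}$$

$$\frac{\partial L}{\partial w_{i}} = \frac{\partial L}{\partial a}\,\frac{\partial a}{\partial z}\,\frac{\partial z}{\partial w_{i}} = (a - y)\,x_{i}, \qquad \frac{\partial L}{\partial b} = a - y$$

The factors $a$ and $1-a$ cancel, so the gradient is simply the prediction error $a - y$ multiplied by the input.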

<img src="./images/gdesc2.png" style="height: 300px;" align=center>

In [None]:
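# derivative of the sigmoid, for later use
# note: x is the sigmoid OUTPUT a = sigmoid(z), because d(sigmoid)/dz = a * (1 - a)
def derivative_sigmoid(x):
    return x*(1-x)

# derivative of the binary cross-entropy with respect to y_hat, for later use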

def derivative_Cross_Entropy(y_hat, y):
    if y == 1:
        return -1/y_hat
    else:
        return 1 / (1 - y_hat)

## 3. Single Layer Neural Network from Scratch

In [None]:
epochs = 5
lr = 0.001 

In [None]:
L = []
for epoch in range(epochs):
    random_index = np.arange(X.shape[0])
    np.random.shuffle(random_index)
    
    loss = []
    for i in random_index:
        Z = np.dot(X[i], weights) + bias  # weighted sum of the features plus the bias
        y_hat = sigmoid(Z) 

        loss.append(Cross_Entropy(y_hat, y[i]))
        
        dEdW = '''Your code here'''
        dEdb = '''Your code here'''
        
        weights = '''Your code here'''
        bias = '''Your code here'''
        
    L.append(np.mean(loss))

In [None]:
# Create a new figure
plt.figure()
plt.xlabel("Epoch")
plt.ylabel("The Binary Cross-Entropy Values")
plt.plot(L)
plt.grid()
plt.show()

## 4. Summary of Steps in Training a Single Layer Neural Network for Binary Classification

**Feedforward** 
1. Draw a batch of training examples $X$.
2. Compute the dot product of the feature matrix $X$ and the weight matrix $W$, then add the bias $b$.
3. Pass the result to an activation function, which in this case is a sigmoid function.
4. The output from the sigmoid function is the predicted output.

**Backpropagation**
1. Calculate the cost using the cost function $J$, which measures the mismatch between $y_{pred}$ and $y_{obs}$.
2. Find the changes to the $W$s and $b$ that reduce the cost by doing Gradient Descent.
3. Simultaneously update the $W$s and $b$.
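The steps above can be sketched end to end as follows. This is one possible implementation, not the solution to the exercise in Section 3. It uses the whole dataset as a single batch, a hypothetical toy dataset `X_demo`/`y_demo` in which the label equals the first feature, and assumed values for the learning rate and the number of epochs.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy data: the label equals the first feature.
X_demo = np.array([[0, 0, 1],
                   [1, 1, 1],
                   [1, 0, 1],
                   [0, 1, 1]], dtype=float)
y_demo = np.array([[0], [1], [1], [0]], dtype=float)

W_demo = rng.random((3, 1))  # random initial weights
b_demo = 0.0                 # initial bias
alpha = 0.5                  # learning rate (assumed value)
m = X_demo.shape[0]          # number of training examples
costs = []

for epoch in range(2000):
    # Feedforward: weighted sum plus bias, then the sigmoid.
    z_demo = X_demo @ W_demo + b_demo
    a_demo = 1.0 / (1.0 + np.exp(-z_demo))

    # Cost: mean binary cross-entropy over the batch.
    cost = -np.mean(y_demo * np.log(a_demo)
                    + (1 - y_demo) * np.log(1 - a_demo))
    costs.append(float(cost))

    # Backpropagation: for sigmoid + cross-entropy, dJ/dz = (a - y) / m.
    dz = (a_demo - y_demo) / m
    dW = X_demo.T @ dz
    db = float(np.sum(dz))

    # Simultaneous update of the weights and the bias.
    W_demo = W_demo - alpha * dW
    b_demo = b_demo - alpha * db

# Threshold the last set of predictions at 0.5.
predictions = (a_demo > 0.5).astype(int)
print(costs[0], costs[-1])
print(predictions.ravel())
```

The cost recorded at the last epoch is lower than at the first, and the thresholded predictions match the toy labels.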

## Important Notes

### Why Need a non-Linear Activation Function?

The purpose of the activation function is to capture non-linearity.

Our neural network will collapse to a linear function if we don't use non-linear activation functions. 
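The collapse can be verified numerically. In this sketch the inputs and the two weight matrices `W1` and `W2` are random, hypothetical values. Stacking two layers without an activation gives the same result as one layer with weights `W1 @ W2`, while putting a sigmoid in between breaks the equivalence.

```python
import numpy as np

rng = np.random.default_rng(0)

x_demo = rng.normal(size=(5, 3))  # 5 hypothetical inputs with 3 features
W1 = rng.normal(size=(3, 4))      # weights of a first layer
W2 = rng.normal(size=(4, 2))      # weights of a second layer

# Two stacked layers with no activation in between...
two_linear_layers = (x_demo @ W1) @ W2
# ...are the same as one linear layer with weight matrix W1 @ W2.
one_linear_layer = x_demo @ (W1 @ W2)
print(np.allclose(two_linear_layers, one_linear_layer))

# With a sigmoid between the layers the network is no longer that linear map.
hidden = 1.0 / (1.0 + np.exp(-(x_demo @ W1)))
with_activation = hidden @ W2
print(np.allclose(with_activation, one_linear_layer))
```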

<img src="./images/DL_6.png" style="height: 300px;" align=center>


### What really are Neural Networks?

#### They are a class of functions in which simpler functions are stacked on top of each other in a hierarchical manner (with non-linear functions in between) in order to build a more complex non-linear function.



<img src="./images/function_pile.png" style="height: 400px;" align=center>

## 5. Building Neural Networks in ```tf.keras```

```tf.keras``` ships as part of TensorFlow, so install it by running the following:

```!pip3 install --upgrade tensorflow```

### Solve MNIST data using Neural Networks with Hidden Layer

The problem we’re trying to solve here is to classify grayscale images of handwritten digits (28 × 28 pixels) into their 10 categories (0 through 9). You can think of “solving” MNIST as the “Hello World” of deep learning.

In [None]:
import tensorflow as tf

# load the MNIST data. See how simple it is to implement in tf.keras
mnist = tf.keras.datasets.mnist

(X_train, y_train),(X_test, y_test) = mnist.load_data() # how easy to split the train and test sets

In [None]:

# let's see what an MNIST image looks like
digit = X_test[0]
plt.imshow(digit, cmap=plt.cm.binary)

In [None]:
# dividing by 255 rescales the pixel values so that they fall within [0, 1]
X_train, X_test = X_train / 255.0, X_test / 255.0  

In [None]:
X_train.shape

In [None]:
y_train

In [None]:
X_test.shape

In [None]:
y_test

In [None]:
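# Let's build our neural network!
model = tf.keras.models.Sequential([
  tf.keras.layers.Flatten(input_shape=(28, 28)),   # unroll each 28x28 image into a 784-vector
  tf.keras.layers.Dense(128, activation='relu'),   # hidden layer with 128 units
  tf.keras.layers.Dense(10, activation='softmax')  # one output probability per digit class
])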

<img src='images/relu_equation.png' style="height: 100px;" align=center>
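The two activations used in the model can be sketched in NumPy as follows. The helper names `relu_sketch` and `softmax_sketch` and the `scores` vector are hypothetical, and subtracting the maximum inside the softmax is a common numerical-stability choice assumed here rather than something required by the definition.

```python
import numpy as np

def relu_sketch(x):
    """ReLU: keep positive values, replace negative values with 0."""
    return np.maximum(0, x)

def softmax_sketch(x):
    """Softmax over a 1-D vector of scores."""
    shifted = x - np.max(x)  # assumption: shift by the max to avoid overflow
    exps = np.exp(shifted)
    return exps / np.sum(exps)

scores = np.array([2.0, 1.0, 0.1, -1.0])  # hypothetical layer outputs

relu_out = relu_sketch(scores)
softmax_out = softmax_sketch(scores)

print(relu_out)
print(softmax_out, softmax_out.sum())
```

The softmax outputs are positive and sum to 1, which is why the 10-unit output layer can be read as a probability for each digit class.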

In [None]:
# SC

# define the relu function
def relu(x):
    return '''Your Code Here'''

# define softmax function
def softmax(x):
    return '''Your Code Here'''

In [None]:
input_ = np.linspace(-10, 10, 100)
plt.plot(input_, relu(input_), c="r")
plt.title("ReLU Function")

In [None]:
input_ = np.linspace(-10, 10, 100)
plt.plot(input_, softmax(input_), c="r")
plt.title("Softmax Function")

In [None]:
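# the learning rate is a property of the optimizer, not an argument of compile()
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=0.01),
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])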

In [None]:
# batch_size=1 updates the weights after every single training example (slow)
model.fit(X_train, y_train, epochs=5, batch_size=1)

In [None]:
model.evaluate(X_test, y_test)

###  Vary neural networks hyperparameter

In [None]:
# SC

'''Your Code Here'''

In [None]:
model.fit(X_train, y_train, epochs=5)

In [None]:
model.evaluate(X_test, y_test)