<img src="https://firebasestorage.googleapis.com/v0/b/deep-learning-crash-course.appspot.com/o/Logo.png?alt=media&token=06318ee3-d7a0-44a0-97ae-2c95f110e3ac" width="100" height="100" align="right"/>

## 4 Neural Networks in TensorFlow - Advanced Techniques

<img src="https://firebasestorage.googleapis.com/v0/b/deep-learning-crash-course.appspot.com/o/3Sup%2C%20Unsup%2C%20Rein.png?alt=media&token=4baee322-267b-4aab-b7b9-101b2c88685e" width="800" align="center"/>

## 4.1 Sequential Model

<img src="https://firebasestorage.googleapis.com/v0/b/deep-learning-crash-course.appspot.com/o/1Keras.png?alt=media&token=9f4add09-14d3-49ed-bc11-f0497f6e96f1" width="200" height="200" align="right"/>

<font size="3">**Keras is a simple tool for constructing a neural network. It is a high-level API of TensorFlow 2:**</font> 

> <font size="3">**an approachable, highly-productive interface for solving machine learning problems, with a focus on modern deep learning.**</font>

<font size="3">**The core data structures of Keras are layers and models.**</font>

> <font size="3">**The simplest type of model is the <span style="color:#4285F4">Sequential model</span>, a linear stack of layers.**</font>

> <font size="3">**For more complex architectures, use the Keras <span style="color:#4285F4">Functional API</span>, which can build arbitrary graphs of layers, or write models entirely from scratch via subclassing.**</font> 

### <font color='Orange'>*Sequential model - When to use*</font>

<font size="3">**A Sequential model is appropriate for**</font> 
> <font size="3">**<span style="color:#4285F4">a plain stack of layers</span> where each layer has <span style="color:#4285F4">exactly one input tensor and one output tensor</span>.**</font> 

<font size="3">**This is not appropriate when:**</font> 

> <font size="3">**Your model has <span style="color:#4285F4">multiple inputs or multiple outputs</span>**</font> <br>
> <font size="3">**Any of your layers has <span style="color:#4285F4">multiple inputs or multiple outputs</span>**</font> <br>
> <font size="3">**You need to do <span style="color:#4285F4">layer sharing</span>**</font><br>
> <font size="3">**You want <span style="color:#4285F4">non-linear topology</span> (e.g. a residual connection, a multi-branch model)**</font>

Reference: https://keras.io/guides/sequential_model/

### <font color='Orange'>*Sequential model - How to use*</font>

<font size="3">**You can create a <span style="color:#4285F4">Sequential model</span> by**</font> 
> <font size="3">**Passing a list of layers to a Sequential constructor**</font> 

> <font size="3">**Calling the <span style="background-color: #ECECEC; color:#0047bb">.add()</span> method to add layers incrementally**</font> 

### <font color='#176BEF'> Examples </font>
<hr style="border:2px solid #E1F6FF"> </hr>

<img src="https://firebasestorage.googleapis.com/v0/b/deep-learning-crash-course.appspot.com/o/3NN9.png?alt=media&token=664be587-f0fe-43ec-8217-5ca7779ca0dd" width="350" align="center"/>

In [None]:
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense

In [None]:
model = Sequential(
    [
        Dense(2, activation='relu', input_shape=(3,)),
        Dense(1, activation='sigmoid'),
    ]
)
model.summary()

In [None]:
model = Sequential()
model.add(Dense(2, activation='relu', input_shape=(3,)))
model.add(Dense(1, activation ='sigmoid'))
model.summary()

<hr style="border:2px solid #E1F6FF"> </hr>

### <font color='Orange'>*Sequential model - Separated Activation Layer*</font>

<font size="3">**Keras also allows users to add <span style="color:#4285F4">Activation layer</span> separately.**</font>

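<font size="3">**Both approaches build the same model and compute the same function. The only difference is that each activation is written as a separate layer instead of an argument of <span style="background-color: #ECECEC; color:#0047bb">Dense</span>.**</font>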

### <font color='#176BEF'> Examples </font>
<hr style="border:2px solid #E1F6FF"> </hr>
<img src="https://firebasestorage.googleapis.com/v0/b/deep-learning-crash-course.appspot.com/o/3NN9.png?alt=media&token=664be587-f0fe-43ec-8217-5ca7779ca0dd" width="100" align="right"/>

In [None]:
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
from tensorflow.keras.layers import Activation

In [None]:
model = Sequential(
    [
        Dense(2, input_shape=(3,)),
        Activation('relu'),
        Dense(1), 
        Activation('sigmoid'),
    ]
)
model.summary()

In [None]:
model = Sequential()
model.add(Dense(2, input_shape=(3,)))
model.add(Activation('relu'))
model.add(Dense(1))
model.add(Activation('sigmoid'))
model.summary()

<hr style="border:2px solid #E1F6FF"> </hr>

<font size="3">**Activation function**</font>

><font size="3">**An activation function decides whether, and how strongly, a neuron is activated. In other words, it applies a simple mathematical operation to determine how much the neuron’s input contributes to the prediction.**</font>

><font size="3">**Therefore, the key role of an activation function is to <span style="color:#4285F4">derive output from a set of input values</span> fed to a node (or a layer).**</font>
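
<font size="3">**The idea can be sketched in plain Python, without Keras. The helper names and the weight values below are made up for illustration; they are not what a trained model would contain. The sketch pushes one sample through a 3-2-1 network like the one in the examples above, using ReLU in the hidden layer and sigmoid in the output layer.**</font>

```python
import math


def relu(z):
    """Rectified linear unit: keeps positive values, zeroes out the rest."""
    return max(0.0, z)


def sigmoid(z):
    """Squashes any real number into the open interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def dense(inputs, weights, biases, activation):
    """One fully connected layer: activation(w . x + b) for every unit."""
    return [
        activation(sum(w * x for w, x in zip(unit_weights, inputs)) + b)
        for unit_weights, b in zip(weights, biases)
    ]


# Hypothetical weights for a 3 -> 2 -> 1 network (illustration only).
x = [1.0, 2.0, 3.0]
hidden = dense(x, [[0.5, 0.25, 0.0], [-1.0, 0.0, 0.0]], [0.0, 0.5], relu)
output = dense(hidden, [[2.0, 1.0]], [0.0], sigmoid)

print("hidden:", hidden)  # the second unit is negative before ReLU, so it becomes 0.0
print("output:", output)  # a single value between 0 and 1
```

<font size="3">**The second hidden unit receives a negative pre-activation and is switched off by ReLU, while the sigmoid turns the final value into a number that can be read as a probability.**</font>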

## 4.2 Sequential Model - Attributes

### <font color='Orange'> Attributes of Layers</font>

<font size="3">**Layers are the basic building blocks of neural networks. A layer consists of**</font>
> <font size="3">**<span style="color:#4285F4">Tensor-in tensor-out computation function</span> - which performs a logic defined in the <span style="background-color: #ECECEC; color:#0047bb">call()</span> of applying the layer to the input tensors and returns output tensors**</font>

> <font size="3">**<span style="color:#4285F4">State</span> - which represents the weights of the layers and is updated when the layer receives data during training, and stored in <span style="background-color: #ECECEC; color:#0047bb">layer.weights</span>**</font>
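
<font size="3">**As a rough illustration of "computation plus state", the class below is a hypothetical stand-in for a dense layer written in plain Python. It is not the Keras implementation: the class name, the constant initial weights and the missing activation are simplifying assumptions. It also shows where the parameter counts reported by a model summary come from.**</font>

```python
class TinyDense:
    """Hypothetical stand-in for a dense layer: state plus a call."""

    def __init__(self, n_inputs, units):
        # State: one weight per (input, unit) pair plus one bias per unit.
        self.kernel = [[0.1] * units for _ in range(n_inputs)]
        self.bias = [0.0] * units

    @property
    def weights(self):
        """The layer's state, gathered in one list."""
        return [self.kernel, self.bias]

    def __call__(self, inputs):
        # Computation: tensor in, tensor out (no activation in this sketch).
        return [
            sum(inputs[i] * self.kernel[i][j] for i in range(len(inputs)))
            + self.bias[j]
            for j in range(len(self.bias))
        ]

    def count_params(self):
        return len(self.kernel) * len(self.bias) + len(self.bias)


first = TinyDense(n_inputs=3, units=2)
second = TinyDense(n_inputs=2, units=1)
out = second(first([1.0, 2.0, 3.0]))

print(first.count_params(), second.count_params())  # 8 3
```

<font size="3">**A dense layer with 3 inputs and 2 units holds 3 x 2 weights and 2 biases, i.e. 8 parameters; the following layer with 1 unit holds 2 weights and 1 bias, i.e. 3 parameters.**</font>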

### <font color='#176BEF'> Examples </font>
<hr style="border:2px solid #E1F6FF"> </hr>
<img src="https://firebasestorage.googleapis.com/v0/b/deep-learning-crash-course.appspot.com/o/3NN9.png?alt=media&token=664be587-f0fe-43ec-8217-5ca7779ca0dd" width="100" align="right"/>

In [None]:
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
from tensorflow.keras.layers import Activation

In [None]:
model = Sequential()
model.add(Dense(2, activation='relu', input_shape=(3,)))
model.add(Dense(1, activation ='sigmoid'))
model.summary()

<font size="3">**The <span style="background-color: #ECECEC; color:#0047bb">layers</span> attribute of a model is a list of its layer objects. It is an attribute, not a function, so it is accessed without parentheses. Each entry carries the information of**</font>
><font size="3">**Layer's name**</font>

><font size="3">**Layer's address**</font>

In [None]:
model.layers

<img src="https://firebasestorage.googleapis.com/v0/b/deep-learning-crash-course.appspot.com/o/4NNTechniques1.png?alt=media&token=41b1d525-1299-4634-a8ac-8eb128a4936c" width="550" align="center"/>

<font size="3">**Since it is a list, it can be indexed.**</font>

In [None]:
print("This is the first layer's address:", model.layers[0])
print("This is the second layer's address:", model.layers[1])

print("This is the first layer's name:", model.layers[0].name)
print("This is the second layer's name:", model.layers[1].name)

<font size="3">**Layers can be given custom names by passing the <span style="color:#4285F4">name</span> argument when they are created.**</font>

In [None]:
model = Sequential()
model.add(Dense(2, activation='relu', input_shape=(3,), name='First_Layer'))
model.add(Dense(1, activation ='sigmoid',  name='Second_Layer'))
model.summary()

In [None]:
print("This is the first layer's name:", model.layers[0].name)
print("This is the second layer's name:", model.layers[1].name)

<font size="3">**Layers can be retrieved by index or by name with the <span style="background-color: #ECECEC; color:#0047bb">get_layer()</span> method.**</font>

In [None]:
print("Using INDEX to retrieve layer", model.get_layer(index=0))

print("Using NAME to retrieve layer", model.get_layer(name='First_Layer'))
print("Using NAME to retrieve layer", model.get_layer('First_Layer'))

<hr style="border:2px solid #E1F6FF"> </hr>

### <font color='Orange'> Attributes of Inputs and Outputs</font>

<font size="3">**A <span style="color:#4285F4">Model</span> groups layers into an object with training and inference features. Besides the hidden layers, there are two specific layers:**</font>
> <font size="3">**<span style="color:#4285F4">Input Layer</span> - which serves as an entry point into a neural network and is accessible through the <span style="background-color: #ECECEC; color:#0047bb">.inputs</span> attribute**</font>

> <font size="3">**<span style="color:#4285F4">Output Layer</span> - which serves as an exit point of a neural network and is accessible through the <span style="background-color: #ECECEC; color:#0047bb">.outputs</span> attribute**</font>

### <font color='#176BEF'> Examples </font>
<hr style="border:2px solid #E1F6FF"> </hr>
<img src="https://firebasestorage.googleapis.com/v0/b/deep-learning-crash-course.appspot.com/o/3NN9.png?alt=media&token=664be587-f0fe-43ec-8217-5ca7779ca0dd" width="100" align="right"/>

In [None]:
print(model.inputs)

In [None]:
print(model.outputs)

<img src="https://firebasestorage.googleapis.com/v0/b/deep-learning-crash-course.appspot.com/o/4NNTechniques2.png?alt=media&token=b1191dbc-411c-46c3-a16f-f3b0db892108" width="600" align="center"/>

<hr style="border:2px solid #E1F6FF"> </hr>

<font size="5"><span style="background-color:#EA4335; color:white">&nbsp;!&nbsp;</span></font>
<font size="3">**The <span style="background-color: #ECECEC; color:#0047bb">.add()</span> method can be used to add layers incrementally, starting from the first layer. There is also a corresponding <span style="background-color: #ECECEC; color:#0047bb">.pop()</span> method to remove layers, starting from the last layer.**</font> 

### <font color='#176BEF'> Examples </font>
<hr style="border:2px solid #E1F6FF"> </hr>
<img src="https://firebasestorage.googleapis.com/v0/b/deep-learning-crash-course.appspot.com/o/3NN9.png?alt=media&token=664be587-f0fe-43ec-8217-5ca7779ca0dd" width="100" align="right"/>

In [None]:
print("No. of layers:", len(model.layers))

In [None]:
model.pop()
print("No. of layers:", len(model.layers))

In [None]:
model.add(Dense(1, activation ='sigmoid',  name='Second_Layer'))
print("No. of layers:", len(model.layers))

<hr style="border:2px solid #E1F6FF"> </hr>

## 4.3 Sequential Model - Save and load models

### <font color='Orange'>Whole-model saving & loading</font>
<font size="3">**A model can be saved completely to a single artifact (a file or a folder). It will include:**</font>
> <font size="3">**The model's architecture/config**</font><br>

> <font size="3">**The model's weight values (which were learned during training)**</font><br>

> <font size="3">**The model's compilation information, if <span style="background-color: #ECECEC; color:#0047bb">.compile()</span> is called**</font><br>

> <font size="3">**The optimizer and its state (this enables users to restart training)**</font><br>

### <font color='Orange'>APIs</font>
<font size="3">**There are two formats you can use to save an entire model to disk:**</font><br>
> <font size="3">**the <span style="color:#4285F4">TensorFlow SavedModel format</span>**</font>

> <font size="3">**the older <span style="color:#4285F4">Keras H5 format</span>**</font>

<font size="3">**<span style="color:#4285F4">SavedModel</span> is the recommended format. It is the more comprehensive save format that saves the model architecture, weights, and the traced TensorFlow subgraphs of the call functions. This enables Keras to restore both built-in layers and custom objects.**</font><br>

<font size="3">**There are two APIs that can be used to save the models:**</font><br>
> <font size="3">**<span style="background-color: #ECECEC; color:#0047bb">model.save()</span>**</font>
    
> <font size="3">**<span style="background-color: #ECECEC; color:#0047bb">tf.keras.models.save_model()</span>**</font>

<font size="3">**By default, <span style="background-color: #ECECEC; color:#0047bb">model.save()</span> saves the model in the <span style="color:#4285F4">SavedModel format</span>. To switch to the <span style="color:#4285F4">Keras H5 format</span>, either:**</font> <br>

> <font size="3">**Pass <span style="background-color: #ECECEC; color:#0047bb">save_format='h5'</span> to <span style="background-color: #ECECEC; color:#0047bb">.save()</span>; or**</font>
    
> <font size="3">**Pass a filename that ends in <span style="background-color: #ECECEC; color:#0047bb">.h5</span> to <span style="background-color: #ECECEC; color:#0047bb">.save()</span>. Older TensorFlow 2 releases also treated a <span style="background-color: #ECECEC; color:#0047bb">.keras</span> suffix as H5, whereas newer releases use <span style="background-color: #ECECEC; color:#0047bb">.keras</span> for the native Keras format, so check the behavior of your installed version.**</font>

Reference: https://www.tensorflow.org/guide/keras/save_and_serialize#:~:text=There%20are%20two%20formats%20you,you%20use%20model.save()%20.

### <font color='#176BEF'> Examples </font>
<hr style="border:2px solid #E1F6FF"> </hr>
<img src="https://firebasestorage.googleapis.com/v0/b/deep-learning-crash-course.appspot.com/o/3NN9.png?alt=media&token=664be587-f0fe-43ec-8217-5ca7779ca0dd" width="100" align="right"/>

In [None]:
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
import numpy as np

In [None]:
X = np.random.random((100, 3))
y = np.random.random((100, 1))

In [None]:
model = Sequential()
model.add(Dense(2, activation='relu', input_shape=(3,)))
model.add(Dense(1, activation ='sigmoid'))
model.compile(optimizer="adam", loss="mse")

In [None]:
model.fit(X, y)

<font size="3">**Calling <span style="background-color: #ECECEC; color:#0047bb">model.save()</span> creates a folder named my_model and saves the model in the <span style="color:#4285F4">SavedModel format</span>.**</font>

In [None]:
model.save("my_model")

<font size="3">**It creates two folders and one file:**</font>

> <font size="3">**<span style="color:#4285F4">assets</span> - which stores arbitrary files, called assets, that are needed for TensorFlow graph**</font>

> <font size="3">**<span style="color:#4285F4">variables</span> - which stores weights**</font>

> <font size="3">**<span style="color:#4285F4">saved_model.pb</span> - which stores the model architecture, and training configuration (including the optimizer, losses, and metrics)**</font>

In [None]:
import os
import glob

# os.path.join keeps the pattern portable across operating systems
for f in glob.glob(os.path.join(os.getcwd(), 'my_model', '*')):
    print(f)

<font size="3">**Once the model is saved, <span style="background-color: #ECECEC; color:#0047bb">load_model()</span> can be used to reconstruct the model identically.**</font>

In [None]:
from tensorflow.keras.models import load_model

reconstructed_model = load_model("my_model")

<font size="3">**The reconstructed model is already compiled and has retained the weights, model architecture and training configuration, so training can resume:**</font>

In [None]:
reconstructed_model.fit(X, y)

### <font color='#176BEF'> Examples </font>
<hr style="border:2px solid #E1F6FF"> </hr>
<img src="https://firebasestorage.googleapis.com/v0/b/deep-learning-crash-course.appspot.com/o/3NN9.png?alt=media&token=664be587-f0fe-43ec-8217-5ca7779ca0dd" width="100" align="right"/>


In [None]:
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
import numpy as np

In [None]:
X = np.random.random((100, 3))
y = np.random.random((100, 1))

In [None]:
model = Sequential()
model.add(Dense(2, activation='relu', input_shape=(3,)))
model.add(Dense(1, activation ='sigmoid'))
model.compile(optimizer="adam", loss="mse")

In [None]:
model.fit(X, y)

<font size="3">**Keras also supports saving a single <span style="color:#4285F4">HDF5</span> file which is a light-weight alternative to <span style="color:#4285F4">SavedModel format</span>.**</font>

In [None]:
model.save("my_model_hdf5.h5")

<font size="3">**It creates only one file:**</font>

> <font size="3">**<span style="color:#4285F4">h5</span> - which contains the model's architecture, weight values, and <span style="background-color: #ECECEC; color:#0047bb">compile()</span> information**</font>

In [None]:
import os
import glob

# os.path.join keeps the pattern portable across operating systems
for f in glob.glob(os.path.join(os.getcwd(), '*.h5')):
    print(f)

<font size="3">**Similar to <span style="color:#4285F4">SavedModel format</span>, once the model is saved, <span style="background-color: #ECECEC; color:#0047bb">load_model()</span> can be used to reconstruct the model identically.**</font>

In [None]:
from tensorflow.keras.models import load_model

reconstructed_model_h5 = load_model("my_model_hdf5.h5")

<font size="3">**The reconstructed model is already compiled and has retained the weights, model architecture and training configuration, so training can resume:**</font>

In [None]:
reconstructed_model_h5.fit(X, y)

<font size="3">**Limitations**</font>
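><font size="3">**Compared to the <span style="color:#4285F4">SavedModel format</span>, there are two things that don't get included in the <span style="color:#4285F4">H5</span> file: 1) <span style="color:#4285F4">External losses & metrics</span> added via <span style="background-color: #ECECEC; color:#0047bb">model.add_loss()</span> and <span style="background-color: #ECECEC; color:#0047bb">model.add_metric()</span>, and 2) the <span style="color:#4285F4">Computation graph of custom objects</span>, such as custom layers**</font>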

Reference: https://www.tensorflow.org/guide/keras/save_and_serialize

## 4.4 Sequential Model - Compile()

<font size="3">**<span style="background-color: #ECECEC; color:#0047bb">.compile()</span> allows for different arguments. It is used to configure the model for training. The most important arguments are:**</font>

>> <font size="3">**1<sup>st</sup> argument: <span style="color:#4285F4">Optimizer</span>**</font><br>
<br>
>> <font size="3">**2<sup>nd</sup> argument: <span style="color:#4285F4">Loss function</span>**</font><br>
<br>
>> <font size="3">**3<sup>rd</sup> argument: <span style="color:#4285F4">Metrics</span>**</font>

<img src="https://firebasestorage.googleapis.com/v0/b/deep-learning-crash-course.appspot.com/o/3NN10.png?alt=media&token=9223446e-8108-4082-b9c9-225018f9f54e" width="550" align="center"/>

<font size="3">**<span style="color:#4285F4">Loss Function</span>**</font>
> <font size="3">**Once the neural network architecture is set up and added to the Sequential model object, samples are <span style="color:#4285F4">forward propagated</span> and the corresponding estimates, $\hat{y}$, are calculated.**</font><br>

> <font size="3">**The <span style="color:#4285F4">loss function</span> is then applied to compute the <span style="color:#4285F4">loss values</span> between the true values (i.e. Labels, y) and predicted values (i.e. Estimates, $\hat{y}$).**</font>

<font size="3">**<span style="color:#4285F4">Optimizer</span>**</font>
> <font size="3">**Based on the <span style="color:#4285F4">loss values</span>, the <span style="color:#4285F4">gradients</span> w.r.t. the weights, W, and biases, b, are calculated by <span style="color:#4285F4">backward propagation</span>, and the <span style="color:#4285F4">optimizer</span> uses these gradients to update W and b.**</font>

><font size="3">**The training will be stopped either when:**</font>
>> <font size="3">**The maximum number of epochs in <span style="background-color: #ECECEC; color:#0047bb">.fit()</span> function is reached; OR**</font><br>
>> <font size="3">**A quantity monitored by the <span style="background-color: #ECECEC; color:#0047bb">EarlyStopping</span> callback has stopped improving.**</font>
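
<font size="3">**The cycle described above (forward pass, loss, gradients, update, stop) can be sketched in plain Python for a one-parameter model. This is not how Keras implements training; the toy data, the names <span style="background-color: #ECECEC; color:#0047bb">patience</span> and <span style="background-color: #ECECEC; color:#0047bb">min_delta</span>, and the stopping logic are assumptions chosen for illustration.**</font>

```python
# Toy data generated by y = 2x, so the ideal weight is 2.0.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]

w = 0.0                 # single trainable parameter
learning_rate = 0.01
max_epochs = 500        # first stopping rule: epoch limit
patience = 3            # second stopping rule: epochs without improvement
min_delta = 1e-6        # smallest loss decrease that counts as improvement

best_loss = float("inf")
epochs_without_improvement = 0
epochs_run = 0

for epoch in range(max_epochs):
    # Forward propagation and loss (mean squared error).
    preds = [w * x for x in xs]
    loss = sum((p - y) ** 2 for p, y in zip(preds, ys)) / len(xs)

    # Backward propagation: gradient of the loss w.r.t. w, then the update.
    grad = sum(2.0 * (p - y) * x for p, y, x in zip(preds, ys, xs)) / len(xs)
    w -= learning_rate * grad
    epochs_run = epoch + 1

    # Early stopping on the monitored quantity (here: the training loss).
    if best_loss - loss > min_delta:
        best_loss = loss
        epochs_without_improvement = 0
    else:
        epochs_without_improvement += 1
        if epochs_without_improvement >= patience:
            break

print("epochs run:", epochs_run, "learned w:", round(w, 3))
```

<font size="3">**In this sketch the loop ends well before the epoch limit because the loss stops improving by more than <span style="background-color: #ECECEC; color:#0047bb">min_delta</span>.**</font>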

<font size="3">**<span style="color:#4285F4">Metrics</span>**</font>
> <font size="3">**A metric is an additional evaluation function that is used to judge the performance of the model**</font>

> <font size="3">**Metric functions are similar to loss functions, except that the results from evaluating a metric are not used when training the model. Therefore, any loss function can also be used as a metric**</font>

> <font size="3">**Metrics are needed because it is often difficult to judge performance from loss values alone, such as mean squared error (MSE) or root mean squared error (RMSE). Therefore, an extra metric, such as accuracy or mean absolute error (MAE), is often used for additional evaluation.**</font>

<hr style="border:2px solid #34A853"> </hr>

### <font color='#34A853'> Frequently Used Optimizers </font>

<font size="3">**Almost all popular optimizers in deep learning are based on <span style="color:#4285F4">gradient descent</span>, which estimates the slope of a given <span style="color:#4285F4">loss function</span> and updates the parameters in the downhill direction, towards a (hopefully global) minimum.**</font>

<font size="3">**There are three different types of <span style="color:#4285F4">gradient descent</span>:**</font>
><font size="3">**Batch Gradient Descent or Vanilla Gradient Descent** - The <span style="color:#4285F4">entire dataset</span> is used to compute the gradient of the cost function for each iteration of the <span style="color:#4285F4">gradient descent</span> and then update the parameters.</font>

><font size="3">**Stochastic Gradient Descent** - A <span style="color:#4285F4">single sample</span> is randomly picked and used to compute the gradient of the cost function for each iteration of the <span style="color:#4285F4">gradient descent</span> and then update the parameters.</font>

><font size="3">**Mini batch Gradient Descent** - This is a variation of stochastic gradient descent. A <span style="color:#4285F4">mini batch of samples</span> is randomly picked and used to compute the gradient of the cost function for each iteration of the <span style="color:#4285F4">gradient descent</span> and then update the parameters.</font>
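
<font size="3">**The three variants differ only in how many samples feed each parameter update. The sketch below counts the updates made in one pass over a hypothetical dataset of 100 samples. The helper <span style="background-color: #ECECEC; color:#0047bb">iterate_batches</span> is made up for illustration, and shuffling once per pass is an assumption about how the random picking is done.**</font>

```python
import random


def iterate_batches(n_samples, batch_size, rng):
    """Yield lists of sample indices, shuffled once per pass over the data."""
    indices = list(range(n_samples))
    rng.shuffle(indices)
    for start in range(0, n_samples, batch_size):
        yield indices[start:start + batch_size]


rng = random.Random(0)
n_samples = 100

# One gradient computation and one parameter update happen per batch.
updates_per_epoch = {
    name: sum(1 for _ in iterate_batches(n_samples, size, rng))
    for name, size in [
        ("batch", n_samples),   # entire dataset per update
        ("stochastic", 1),      # one sample per update
        ("mini-batch", 32),     # a small group of samples per update
    ]
}

print(updates_per_epoch)  # {'batch': 1, 'stochastic': 100, 'mini-batch': 4}
```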

<font size="3">**Let's denote by**</font>
><font size="3">**<span style="color:#4285F4">w</span> the parameters**</font><br>
><font size="3">**<span style="color:#4285F4">g</span> the gradients**</font><br>
><font size="3">**<span style="color:#4285F4">α</span> the global learning rate**</font><br>
><font size="3">**<span style="color:#4285F4">t</span> the time step**</font>

### <font color='Orange'>*Stochastic Gradient Descent (SGD)*</font>

<font size="3">**In Keras, <span style="background-color: #ECECEC; color:#0047bb">.SGD()</span> applies <span style="color:#4285F4">mini batch gradient descent</span>. The optimizer estimates the direction of steepest descent from a batch of samples, whose size is set by <span style="color:#4285F4">batch_size</span> in <span style="background-color: #ECECEC; color:#0047bb">.fit()</span>, and takes a step in this direction. Since the step size is fixed, SGD can quickly get stuck on plateaus or in local minima.**</font>

<font size="3">***Update Rule:***</font> <img src="https://firebasestorage.googleapis.com/v0/b/deep-learning-crash-course.appspot.com/o/4NNEquation1.png?alt=media&token=7f9515ce-1817-48ac-9354-3a088ff06ad4" width="550" align="center" style="float: middle"/>
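
<font size="3">**As a minimal sketch, assuming the standard form w ← w − α·g of the update rule, plain SGD can be written in a few lines. The quadratic loss (w − 3)², the starting point and the learning rate are arbitrary choices for illustration.**</font>

```python
def sgd_step(w, g, learning_rate):
    """Plain SGD: move against the gradient by a fixed step size."""
    return w - learning_rate * g


# Minimize the toy loss (w - 3)^2, whose gradient is 2 * (w - 3).
w = 0.0
for _ in range(50):
    g = 2.0 * (w - 3.0)
    w = sgd_step(w, g, learning_rate=0.1)

print("w after 50 steps:", round(w, 3))
```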

### <font color='Orange'>*SGD with Momentum*</font>

<font size="3">**With <span style="color:#4285F4">momentum</span>, SGD accelerates in directions of constant descent. The acceleration is controlled by the <span style="color:#4285F4">momentum term β $<$ 1</span>, which helps the model escape plateaus and makes it less susceptible to getting stuck in local minima.**</font>
    
<font size="3">***Update Rule:***</font> <img src="https://firebasestorage.googleapis.com/v0/b/deep-learning-crash-course.appspot.com/o/4NNEquation2.png?alt=media&token=cd5b2bbc-987d-4a02-8dbb-be4b35f9443b" width="550" align="center" style="float: middle"/>

<font size="5"><span style="background-color:#EA4335; color:white">&nbsp;!&nbsp;</span></font>
<font size="3">**In Keras, <span style="background-color: #ECECEC; color:#0047bb">.SGD()</span> covers SGD both with and without momentum. When <span style="color:#4285F4">momentum</span> is larger than 0, <span style="background-color: #ECECEC; color:#0047bb">.SGD()</span> updates the parameters using the velocity equation.**</font> 
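
<font size="3">**A minimal sketch of the velocity form, assuming the update v ← β·v − α·g followed by w ← w + v. Setting the momentum to 0 reduces it to plain SGD. The toy loss and the hyperparameter values are illustrative assumptions.**</font>

```python
def sgd_momentum_step(w, g, velocity, learning_rate=0.1, momentum=0.9):
    """SGD with momentum: the velocity accumulates past gradients."""
    velocity = momentum * velocity - learning_rate * g
    return w + velocity, velocity


# With momentum=0.0 the step is identical to plain SGD: w - learning_rate * g.
plain_w, _ = sgd_momentum_step(1.0, 2.0, velocity=0.0, momentum=0.0)

# Minimize the toy loss (w - 3)^2 with momentum switched on.
w, velocity = 0.0, 0.0
for _ in range(200):
    g = 2.0 * (w - 3.0)
    w, velocity = sgd_momentum_step(w, g, velocity)

print("plain step:", plain_w, "w with momentum:", round(w, 3))
```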

### <font color='Orange'>*AdaGrad*</font>

<font size="3">**The challenge of using a global <span style="color:#4285F4">learning rate α</span> is that it is a hyperparameter that has to be defined in advance, and a good value depends heavily on the type of model and problem. Another problem is that the same learning rate is applied to all parameter updates. If the data is sparse, it is better to update the parameters at different rates.**</font>
    
<font size="3">**AdaGrad makes use of <span style="color:#4285F4">adaptive learning</span> rates to address the problem. It scales the <span style="color:#4285F4">learning rate α</span> for each parameter by the inverse of the square root of the sum of that parameter's squared gradients. This scales up the steps along directions with sparse gradients, which results in faster convergence in problems with sparse features.**</font>

<font size="3">***Update Rule:***</font> <img src="https://firebasestorage.googleapis.com/v0/b/deep-learning-crash-course.appspot.com/o/4NNEquation3.png?alt=media&token=8766e128-1da2-4499-b91f-2b183c7e9b06" width="550" align="center" style="float: middle"/>
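
<font size="3">**A minimal sketch, assuming the textbook form of AdaGrad in which the squared gradients are summed in an accumulator and the step is divided by its square root. The small constant <span style="background-color: #ECECEC; color:#0047bb">epsilon</span> and the constant gradient are assumptions for illustration; library implementations may initialize the accumulator differently.**</font>

```python
import math


def adagrad_step(w, g, accumulator, learning_rate=0.1, epsilon=1e-7):
    """AdaGrad: divide the step by the root of the summed squared gradients."""
    accumulator += g * g
    w -= learning_rate * g / (math.sqrt(accumulator) + epsilon)
    return w, accumulator


# Feed the same gradient repeatedly and record how far each step moves w.
w, accumulator = 0.0, 0.0
step_sizes = []
for _ in range(4):
    new_w, accumulator = adagrad_step(w, g=1.0, accumulator=accumulator)
    step_sizes.append(abs(new_w - w))
    w = new_w

print([round(s, 4) for s in step_sizes])  # each step is smaller than the last
```

<font size="3">**Even though the gradient never changes, the steps shrink as the accumulator grows, which is the diminishing learning rate that motivates RMSprop.**</font>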

### <font color='Orange'>*RMSprop*</font>

<font size="3">**AdaGrad's main problem is its accumulation of the squared gradients in the denominator. The accumulated sum keeps growing during training. This in turn causes the learning rate to shrink and eventually become infinitesimally small, at which point the algorithm can no longer make progress.**</font>
    
<font size="3">**To solve the radically diminishing learning rates, RMSprop scales the gradient in a less aggressive way. Instead of taking the <span style="color:#4285F4">sum of squared gradients</span>, it takes a <span style="color:#4285F4">moving average of the squared gradients</span>.**</font>

<font size="3">**RMSprop is often combined with <span style="color:#4285F4">momentum</span> which helps the model escape plateaus and makes it less susceptible to getting stuck in local minima.**</font>

<font size="3">***Update Rule:***</font> <img src="https://firebasestorage.googleapis.com/v0/b/deep-learning-crash-course.appspot.com/o/4NNEquation4.png?alt=media&token=609c4197-3dae-447d-a6c9-aea9d754ff24" width="550" align="center" style="float: middle"/>
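
<font size="3">**A minimal sketch, assuming the common form of RMSprop in which an exponential moving average of the squared gradients replaces AdaGrad's running sum. The decay rate <span style="background-color: #ECECEC; color:#0047bb">rho</span>, the constant <span style="background-color: #ECECEC; color:#0047bb">epsilon</span> and the constant gradient are illustrative assumptions, and the optional momentum term is left out.**</font>

```python
import math


def rmsprop_step(w, g, avg_sq_grad, learning_rate=0.1, rho=0.9, epsilon=1e-7):
    """RMSprop: divide the step by the root of a moving average of g^2."""
    avg_sq_grad = rho * avg_sq_grad + (1.0 - rho) * g * g
    w -= learning_rate * g / (math.sqrt(avg_sq_grad) + epsilon)
    return w, avg_sq_grad


# Feed the same gradient 100 times and look at the size of the last step.
w, avg_sq_grad = 0.0, 0.0
last_step = 0.0
for _ in range(100):
    new_w, avg_sq_grad = rmsprop_step(w, g=1.0, avg_sq_grad=avg_sq_grad)
    last_step = abs(new_w - w)
    w = new_w

print("step size after 100 updates:", round(last_step, 4))
```

<font size="3">**Under a constant gradient the step size settles close to the learning rate instead of shrinking towards zero, because old squared gradients fade out of the moving average.**</font>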

### <font color='Orange'>*Adam*</font>

<font size="3">**Adaptive Moment Estimation (Adam) combines AdaGrad, RMSprop and momentum methods into one.**</font>
    
<font size="3">**The direction of the step is determined by a <span style="color:#4285F4">moving average of the gradients</span> and the step size is approximately upper bounded by the global step size. Furthermore, each dimension of the gradient is rescaled similar to RMSprop.**</font>
    
<font size="3">**One key difference between Adam and RMSprop/AdaGrad is that the moment estimates m and v are corrected for their bias towards zero. Adam is well-known for achieving good performance with little hyper-parameter tuning.**</font>

<font size="3">***Update Rule:***</font> <img src="https://firebasestorage.googleapis.com/v0/b/deep-learning-crash-course.appspot.com/o/4NNEquation5.png?alt=media&token=62cfa064-f228-4d7d-8692-fa2385a712c0" width="550" align="center" style="float: middle"/>
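
<font size="3">**A minimal sketch, assuming the commonly published form of Adam with bias-corrected moment estimates m and v. The default values for <span style="background-color: #ECECEC; color:#0047bb">beta_1</span>, <span style="background-color: #ECECEC; color:#0047bb">beta_2</span> and <span style="background-color: #ECECEC; color:#0047bb">epsilon</span>, and the toy loss (w − 3)², are assumptions for illustration.**</font>

```python
import math


def adam_step(w, g, m, v, t, learning_rate=0.1,
              beta_1=0.9, beta_2=0.999, epsilon=1e-7):
    """Adam: momentum-style average m, RMSprop-style average v, bias-corrected."""
    m = beta_1 * m + (1.0 - beta_1) * g        # moving average of gradients
    v = beta_2 * v + (1.0 - beta_2) * g * g    # moving average of squared gradients
    m_hat = m / (1.0 - beta_1 ** t)            # correct the bias towards zero
    v_hat = v / (1.0 - beta_2 ** t)
    w -= learning_rate * m_hat / (math.sqrt(v_hat) + epsilon)
    return w, m, v


# Minimize the toy loss (w - 3)^2, whose gradient is 2 * (w - 3).
w, m, v = 0.0, 0.0, 0.0
first_step_w = None
for t in range(1, 301):
    g = 2.0 * (w - 3.0)
    w, m, v = adam_step(w, g, m, v, t)
    if t == 1:
        first_step_w = w

print("w after first step:", round(first_step_w, 4), "w after 300 steps:", round(w, 2))
```

<font size="3">**Thanks to the bias correction, the very first step already has a size close to the learning rate, regardless of how large the gradient is.**</font>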

### <font color='#34A853'> Which optimizer to use? </font>

><font size="3">**As a rule of thumb, if input data is <span style="color:#4285F4">sparse</span>, then using one of the <span style="color:#4285F4">adaptive learning-rate</span> methods is likely to provide a better result.**</font>

><font size="3">**However, if you have the <span style="color:#4285F4">resources</span> to find a good learning rate schedule, <span style="color:#4285F4">SGD with momentum</span> is a solid choice.**</font>

><font size="3">**<span style="color:#4285F4">RMSprop</span> is an extension of <span style="color:#4285F4">AdaGrad</span> that deals with its <span style="color:#4285F4">radically diminishing learning rates</span>. Therefore, in general, RMSprop is a better choice.**</font>
    
><font size="3">**<span style="color:#4285F4">Adam</span> adds <span style="color:#4285F4">bias-correction and momentum</span> to RMSprop. Its bias-correction helps Adam slightly outperform RMSprop towards the end of optimization as gradients become sparser. As such, Adam might be the best overall choice.**</font>

References:<br>
https://www.lightly.ai/post/which-optimizer-should-i-use-for-my-machine-learning-project<br>
https://towardsdatascience.com/learning-rate-schedules-and-adaptive-learning-rate-methods-for-deep-learning-2c8f433990d1<br>
https://ruder.io/optimizing-gradient-descent/index.html#adagrad



<hr style="border:2px solid #34A853"> </hr>

### <font color='#34A853'> Frequently Used Loss Functions </font>

<font size="3">**The purpose of loss functions is to compute the quantity that a model should seek to minimize during training.**</font>

<font size="3">**There are three major categories of loss functions:**</font>

><font size="3">**<span style="color:#4285F4">Probabilistic Losses</span>**</font>

><font size="3">**<span style="color:#4285F4">Regression Losses</span>**</font>

><font size="3">**<span style="color:#4285F4">Hinge Losses for "maximum-margin" classification</span>**</font>

### <font color='Orange'>*Probabilistic Losses - Binary Crossentropy*</font>

<font size="3">**<span style="color:#4285F4">Binary Crossentropy</span> computes the crossentropy loss between true labels and predicted labels.**</font>

<font size="3">**It is recommended to use this crossentropy function for <span style="color:#4285F4">binary (0 or 1) classification problem</span>. The loss function requires the following inputs:**</font>

><font size="3">**<span style="color:#4285F4">y_true</span> (true label): This is either 0 or 1.**</font>

><font size="3">**<span style="color:#4285F4">y_pred</span> (predicted value): This is the model's prediction which either represents**</font> 
>><font size="3">**a <span style="color:#4285F4">logit</span> (i.e., a value in [-inf, inf] when <span style="background-color: #ECECEC; color:#0047bb">from_logits=True</span>), or**</font><br>
>><font size="3">**a <span style="color:#4285F4">probability</span> (i.e., a value in [0., 1.] when <span style="background-color: #ECECEC; color:#0047bb">from_logits=False</span>).**</font>

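<font size="5"><span style="background-color:#EA4335; color:white">&nbsp;!&nbsp;</span></font> <font size="3">**<span style="background-color: #ECECEC; color:#0047bb">from_logits=True</span> is recommended for numerical stability, but only when the output layer has no activation and therefore returns logits. If the output layer already applies a sigmoid, as in the examples above, keep the default <span style="background-color: #ECECEC; color:#0047bb">from_logits=False</span>.**</font>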
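
<font size="3">**The two interpretations of <span style="color:#4285F4">y_pred</span> can be compared in plain Python. This is a sketch for a single sample, not the Keras implementation; the function names are made up, and the logit version uses the numerically stable formulation max(z, 0) − z·y + log(1 + exp(−|z|)).**</font>

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def bce_from_probability(y_true, p):
    """Binary crossentropy when the prediction is a probability in (0, 1)."""
    return -(y_true * math.log(p) + (1.0 - y_true) * math.log(1.0 - p))


def bce_from_logit(y_true, z):
    """Binary crossentropy when the prediction is a raw logit."""
    return max(z, 0.0) - z * y_true + math.log(1.0 + math.exp(-abs(z)))


logit = 2.0
loss_from_probability = bce_from_probability(1.0, sigmoid(logit))
loss_from_logit = bce_from_logit(1.0, logit)

print(round(loss_from_probability, 6), round(loss_from_logit, 6))
```

<font size="3">**Both routes give the same loss as long as the setting matches what the model outputs. Passing probabilities while declaring them as logits, or the reverse, silently produces wrong loss values.**</font>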

### <font color='Orange'>*Probabilistic Losses - Categorical Crossentropy & Sparse Categorical Crossentropy*</font>

<font size="3">**<span style="color:#4285F4">Categorical Crossentropy</span> and <span style="color:#4285F4">SparseCategoricalCrossentropy</span> compute the crossentropy loss between true labels and predicted labels.**</font>

<font size="3">**It is recommended to use these crossentropy functions for <span style="color:#4285F4">classification problems with two or more classes</span>.**</font> 

<font size="3">**Both loss functions compute categorical crossentropy. The only difference is in how the labels are encoded.**</font>

> <font size="3">**For <span style="color:#4285F4">one hot</span> representation, <span style="color:#4285F4">CategoricalCrossentropy</span> can be used.**</font>

> <font size="3">**For labels as integers (i.e. 0, 1, 2), <span style="color:#4285F4">SparseCategoricalCrossentropy</span> can be used.**</font>

<img src="https://firebasestorage.googleapis.com/v0/b/deep-learning-crash-course.appspot.com/o/4NNLossFunction.png?alt=media&token=8ad3c091-9925-43de-adb4-07b87f9e572a" width="700" align="center" style="float: middle"/>
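
<font size="3">**The sketch below computes both losses for a single sample in plain Python, to show that only the label encoding differs. The function names and the predicted probabilities are made up for illustration; this is not the Keras implementation.**</font>

```python
import math


def categorical_crossentropy(one_hot, probs):
    """Label given as a one-hot vector, e.g. [0, 1, 0]."""
    return -sum(t * math.log(p) for t, p in zip(one_hot, probs))


def sparse_categorical_crossentropy(label, probs):
    """Label given as an integer class index, e.g. 1."""
    return -math.log(probs[label])


probs = [0.1, 0.7, 0.2]  # predicted class probabilities (sum to 1)
loss_one_hot = categorical_crossentropy([0, 1, 0], probs)
loss_sparse = sparse_categorical_crossentropy(1, probs)

print(round(loss_one_hot, 6), round(loss_sparse, 6))
```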

### <font color='Orange'>*Regression Losses - Mean Squared Error*</font>

<font size="3">**<span style="color:#4285F4">Mean Squared Error</span> computes the mean of squares of errors between true labels and predicted labels.**</font>

<font size="3">**It is recommended to use this function for <span style="color:#4285F4">regression problem</span>. The loss function is simply:**</font>

><font size="3">**<span style="color:#4285F4">loss = mean((y_true - y_pred)<sup>2</sup>)</span>**</font>


### <font color='Orange'>*Regression Losses - Mean Absolute Error*</font>

<font size="3">**<span style="color:#4285F4">Mean Absolute Error</span> computes the mean of absolute difference between true labels and predicted labels.**</font>

<font size="3">**<span style="color:#4285F4">MAE</span> is more often used as a <span style="color:#4285F4">metric</span> than as a <span style="color:#4285F4">loss function</span>. When used as a loss, it applies to <span style="color:#4285F4">regression problems</span>. The loss function is:**</font>

><font size="3">**<span style="color:#4285F4">loss = mean(abs(y_true - y_pred))</span>**</font>

### <font color='Orange'>*Regression Losses - Mean Absolute Percentage Error*</font>

<font size="3">**<span style="color:#4285F4">Mean Absolute Percentage Error</span> computes the mean absolute percentage error between true labels and predicted labels.**</font>

<font size="3">**<span style="color:#4285F4">MAPE</span> is also more often used as a <span style="color:#4285F4">metric</span> than as a <span style="color:#4285F4">loss function</span>. When used as a loss, it applies to <span style="color:#4285F4">regression problems</span>. The loss function is:**</font>

><font size="3">**<span style="color:#4285F4">loss = 100 * mean(abs((y_true - y_pred) / y_true))</span>**</font>
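
<font size="3">As a worked example, the three regression losses can be evaluated by hand on four illustrative values, with each error averaged over the samples. This is a plain-Python sketch of the arithmetic only.</font>

```python
y_true = [1.0, 2.0, 1.0, 1.0]
y_pred = [3.0, 2.0, 1.0, 1.0]
n = len(y_true)

# Mean Squared Error: average of squared differences.
mse = sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n

# Mean Absolute Error: average of absolute differences.
mae = sum(abs(t - p) for t, p in zip(y_true, y_pred)) / n

# Mean Absolute Percentage Error: average relative error, in percent.
mape = 100 * sum(abs((t - p) / t) for t, p in zip(y_true, y_pred)) / n

print('MSE:', mse)
print('MAE:', mae)
print('MAPE:', mape)
```

<font size="3">Only the first sample is wrong (by 2), so MSE penalizes it as 4 / 4, MAE as 2 / 4, and MAPE as 200% / 4. Note that this plain MAPE formula is undefined when a true value is zero.</font>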


### <font color='#34A853'> Which loss function to use? </font>

<font size="3">**The <span style="color:#4285F4">output layer's activation function</span> determines the <span style="color:#4285F4">output values</span>, while the <span style="color:#4285F4">loss function</span> evaluates the <span style="color:#4285F4">loss values</span> based on the difference between the output values and labels.**</font> 
    
<font size="3">**Therefore, it is always important to choose the <span style="color:#4285F4">output layer's activation function</span> and the <span style="color:#4285F4">loss function</span> together, according to the <span style="color:#4285F4">problem type</span>.**</font>

<img src="https://firebasestorage.googleapis.com/v0/b/deep-learning-crash-course.appspot.com/o/3NN11.png?alt=media&token=9d57d341-c9ad-4126-918e-526bde571a1b" width="950" align="left"/>

### <font color='#34A853'> How to create custom losses? </font>

<font size="3">**Keras allows custom losses to be created. Any callable that returns an array of losses can be passed to <span style="background-color: #ECECEC; color:#0047bb">compile()</span> as a loss.**</font>

### <font color='#176BEF'> Examples </font>
<hr style="border:2px solid #E1F6FF"> </hr>
<img src="https://firebasestorage.googleapis.com/v0/b/deep-learning-crash-course.appspot.com/o/3NN9.png?alt=media&token=664be587-f0fe-43ec-8217-5ca7779ca0dd" width="100" align="right"/>


In [None]:
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
import tensorflow as tf
import numpy as np

In [None]:
X = np.random.random((100, 3))
y = np.random.random((100, 1))

In [None]:
model=Sequential()
model.add(Dense(2, activation='relu', input_shape=(3,)))
model.add(Dense(1, activation ='sigmoid'))

<font size="3">**A custom loss function can be created by defining a function that takes the true values and predicted values as required parameters. The function should return an array of losses.**</font>

In [None]:
def custom_loss_function(y_true, y_pred):
    squared_difference = tf.square(y_true - y_pred)
    return tf.reduce_mean(squared_difference, axis=-1)

<font size="3">**The function can then be passed to <span style="background-color: #ECECEC; color:#0047bb">compile()</span> as a loss.**</font>

In [None]:
model.compile(optimizer="adam", loss=custom_loss_function)

In [None]:
model.fit(X, y)

<font size="3">**For comparison, the same model can be compiled and fitted with the <span style="color:#4285F4">built-in MSE</span> instead of the <span style="color:#4285F4">custom MSE</span>.**</font>

In [None]:
model.compile(optimizer="adam", loss="mse")
model.fit(X, y)

References:<br>
https://analyticsindiamag.com/ultimate-guide-to-loss-functions-in-tensorflow-keras-api-with-python-implementation/<br>
https://keras.io/api/losses/

<hr style="border:2px solid #34A853"> </hr>

### <font color='#34A853'> Frequently Used Metrics </font>

<font size="3">**To compile a model, a <span style="color:#4285F4">loss function</span> and an <span style="color:#4285F4">optimizer</span> must be specified. Optionally, some <span style="color:#4285F4">metrics</span> can also be specified to judge the performance of a model.**</font>

<font size="3">**<span style="color:#4285F4">Metric functions</span>  are similar to <span style="color:#4285F4">loss functions</span>, except that the results from evaluating a metric are not used when training the model.**</font>

<font size="3">**There are six categories of metrics:**</font>

><font size="3">**<span style="color:#4285F4">Accuracy metrics</span>**</font>

><font size="3">**<span style="color:#4285F4">Probabilistic metrics</span>**</font>

><font size="3">**<span style="color:#4285F4">Regression metrics</span>**</font>

><font size="3">**<span style="color:#4285F4">Classification metrics based on True/False positives & negatives</span>**</font>

><font size="3">**<span style="color:#4285F4">Image segmentation metrics</span>**</font>

><font size="3">**<span style="color:#4285F4">Hinge metrics for "maximum-margin" classification</span>**</font>

### <font color='Orange'>*Accuracy metrics - Accuracy*</font>

<font size="3">**<span style="color:#4285F4">Accuracy</span> calculates how often predictions equal <span style="color:#4285F4">labels</span>.**</font>

<font size="3">**This metric creates two local variables, <span style="color:#4285F4">total</span> and <span style="color:#4285F4">count</span> that are used to compute the <span style="color:#4285F4">frequency</span> with which <span style="color:#4285F4">y_pred</span> matches <span style="color:#4285F4">y_true</span>.**</font>

In [None]:
y_true = np.array([[0],[0],[1],[1],[1]]) 

y_pred = np.array([[0.0],[0.2],[0.4],[0.95],[1.0]]) 

metric = tf.keras.metrics.Accuracy()
metric.update_state(y_true, y_pred)

print('Accuracy:', round(metric.result().numpy()*100), '%')
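
<font size="3">The role of the two variables <span style="color:#4285F4">total</span> and <span style="color:#4285F4">count</span> can be pictured with a toy metric that accumulates over several batches. The class `StreamingAccuracy` below is a hypothetical stand-in written in plain Python, not the Keras class, and it assumes that matches are decided by exact equality.</font>

```python
class StreamingAccuracy:
    """Toy stand-in for a stateful accuracy metric (hypothetical, not the Keras class)."""

    def __init__(self):
        self.total = 0.0   # number of matching predictions seen so far
        self.count = 0.0   # number of predictions seen so far

    def update_state(self, y_true, y_pred):
        # Exact equality is used, so 0.95 does not match the label 1.
        self.total += sum(1 for t, p in zip(y_true, y_pred) if t == p)
        self.count += len(y_true)

    def result(self):
        # The accuracy is simply total divided by count.
        return self.total / self.count


metric = StreamingAccuracy()
metric.update_state([0, 0, 1], [0.0, 0.2, 0.4])   # first batch: 1 match out of 3
metric.update_state([1, 1], [0.95, 1.0])          # second batch: 1 match out of 2

print('Accuracy:', round(metric.result() * 100), '%')
```

<font size="3">Because the state is kept between calls, the result after the second batch covers all five predictions, not just the last batch.</font>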

### <font color='Orange'>*Accuracy metrics - Binary Accuracy*</font>

<font size="3">**<span style="color:#4285F4">Binary accuracy</span> calculates how often predictions match <span style="color:#4285F4">binary labels</span>.**</font>

<font size="3">**This metric creates two local variables, <span style="color:#4285F4">total</span> and <span style="color:#4285F4">count</span> that are used to compute the <span style="color:#4285F4">frequency</span> with which <span style="color:#4285F4">y_pred</span> matches <span style="color:#4285F4">y_true</span>.**</font>

In [None]:
y_true = np.array([[0],[0],[1],[1],[1]]) 

y_pred = np.array([[0.0],[0.2],[0.4],[0.95],[1.0]]) 

metric = tf.keras.metrics.BinaryAccuracy()
metric.update_state(y_true, y_pred)

print('Accuracy:', round(metric.result().numpy()*100), '%')

<font size="3">**By default, the <span style="color:#4285F4">threshold</span> of <span style="background-color: #ECECEC; color:#0047bb">BinaryAccuracy()</span> is <span style="color:#4285F4">0.5</span>, i.e.**</font>

><font size="3">**If <span style="color:#4285F4">y_pred</span> $>$ <span style="color:#4285F4">0.5</span>, set value to 1.0**</font>

><font size="3">**If <span style="color:#4285F4">y_pred</span> $\le$ <span style="color:#4285F4">0.5</span>, set value to 0.0**</font>

<font size="3">**Therefore, in the example, <span style="color:#4285F4">y_pred</span> is effectively converted to**</font>

In [None]:
y_pred = np.array([[0.0],[0.0],[0.0],[1.0],[1.0]])

metric = tf.keras.metrics.BinaryAccuracy()
metric.update_state(y_true, y_pred)

print('Accuracy:', round(metric.result().numpy()*100), '%')
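
<font size="3">The thresholding step can also be written out explicitly, which makes it easy to see the effect of a different threshold. The function `binary_accuracy` is a hypothetical plain-Python helper that assumes a strict "greater than" comparison; the Keras class takes the same idea through its `threshold` argument.</font>

```python
def binary_accuracy(y_true, y_pred, threshold=0.5):
    # A prediction counts as class 1 only when it is strictly above the threshold.
    y_hat = [1 if p > threshold else 0 for p in y_pred]
    matches = sum(1 for t, h in zip(y_true, y_hat) if t == h)
    return matches / len(y_true)

labels = [0, 0, 1, 1, 1]
predictions = [0.0, 0.2, 0.4, 0.95, 1.0]

acc_default = binary_accuracy(labels, predictions)                # threshold 0.5
acc_lower = binary_accuracy(labels, predictions, threshold=0.3)   # threshold 0.3

print('Accuracy (threshold=0.5):', round(acc_default * 100), '%')
print('Accuracy (threshold=0.3):', round(acc_lower * 100), '%')
```

<font size="3">With the lower threshold the prediction 0.4 is counted as class 1, so the third sample becomes a match as well.</font>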

### <font color='Orange'>*Accuracy metrics - Categorical Accuracy*</font>

<font size="3">**<span style="color:#4285F4">Categorical accuracy</span> calculates how often predictions match <span style="color:#4285F4">one-hot labels</span>.**</font>

<font size="3">**This metric creates two local variables, <span style="color:#4285F4">total</span> and <span style="color:#4285F4">count</span> that are used to compute the <span style="color:#4285F4">frequency</span> with which <span style="color:#4285F4">y_pred</span> matches <span style="color:#4285F4">y_true</span>.**</font>

In [None]:
y_true = np.array([[0,   0,   0,   0,   1],
                   [0,   0,   0,   1,   0],
                   [0,   0,   1,   0,   0],
                   [0,   1,   0,   0,   0],
                   [1,   0,   0,   0,   0]]) 

y_pred = np.array([[0,   0,   0,   1,   0],
                   [0,   0,   0,   1,   0],
                   [0.1, 0.2, 0.6, 0.0, 0.1],
                   [0.1, 0.9, 0,   0,   0],
                   [0.5, 0.2, 0.1, 0.1, 0.1]]) 

metric = tf.keras.metrics.CategoricalAccuracy()
metric.update_state(y_true, y_pred)

print('Accuracy:', round(metric.result().numpy()*100), '%')

<font size="3">**The calculation involves two steps:**</font>

><font size="3">**1) The index at which the <span style="color:#4285F4">maximum value</span> occurs is identified with <span style="background-color: #ECECEC; color:#0047bb">argmax()</span>.**</font>

><font size="3">**2) If the index is the same for both <span style="color:#4285F4">y_pred</span> and <span style="color:#4285F4">y_true</span>, the prediction is considered accurate.**</font>

<font size="3">**The logic is equivalent to:**</font>

In [None]:
y_true_maxpo = np.argmax(y_true, axis=1)

print(y_true_maxpo)

In [None]:
y_pred_maxpo = np.argmax(y_pred, axis=1)

print(y_pred_maxpo)

In [None]:
Accuracy = sum(y_true_maxpo==y_pred_maxpo)/len(y_pred_maxpo)*100

print('Accuracy:', Accuracy, '%')

### <font color='Orange'>*Accuracy metrics - Sparse Categorical Accuracy*</font>

<font size="3">**<span style="color:#4285F4">Sparse Categorical accuracy</span> calculates how often predictions match <span style="color:#4285F4">integer labels</span>.**</font>

<font size="3">**This metric creates two local variables, <span style="color:#4285F4">total</span> and <span style="color:#4285F4">count</span> that are used to compute the <span style="color:#4285F4">frequency</span> with which <span style="color:#4285F4">y_pred</span> matches <span style="color:#4285F4">y_true</span>.**</font>

In [None]:
y_true = np.array([[0],
                   [1],
                   [2],
                   [3]]) 

y_pred = np.array([[1,   0,   0,   0],
                   [0,   0,   0,   1],
                   [0.1, 0.2, 0.7, 0.0],
                   [0.1, 0.0, 0.0, 0.9]]) 

metric = tf.keras.metrics.SparseCategoricalAccuracy()
metric.update_state(y_true, y_pred)

print('Accuracy:', round(metric.result().numpy()*100), '%')

<font size="3">**The calculation involves two steps:**</font>

><font size="3">**1) The index at which the <span style="color:#4285F4">maximum value in y_pred</span> occurs is identified with <span style="background-color: #ECECEC; color:#0047bb">argmax()</span>.**</font>

><font size="3">**2) If the index is the same as <span style="color:#4285F4">y_true</span>, the prediction is considered accurate.**</font>

<font size="3">**The logic is equivalent to:**</font>

In [None]:
y_pred_maxpo = np.argmax(y_pred, axis=1)

print(y_pred_maxpo)

In [None]:
y_true_integer = y_true.flatten()

print(y_true_integer)

In [None]:
Accuracy = sum(y_true_integer==y_pred_maxpo)/len(y_pred_maxpo)*100

print('Accuracy:', Accuracy, '%')

### <font color='Orange'>*Accuracy metrics - TopK Categorical Accuracy*</font>

<font size="3">**<span style="color:#4285F4">TopK Categorical accuracy</span> calculates how often <span style="color:#4285F4">one-hot targets</span> are in the top <span style="color:#4285F4">K predictions</span>.**</font>

><font size="3">**<span style="color:#4285F4">y_pred</span> is first ranked in descending order of probability values.**</font>

><font size="3">**If the rank of <span style="color:#4285F4">y_pred</span> at the <span style="color:#4285F4">index of the non-zero y_true</span> is less than or equal to K, the prediction is considered accurate.**</font>

In [None]:
y_true = np.array([[0,   0,   0,   0,   1],
                   [0,   0,   0,   1,   0],
                   [0,   0,   1,   0,   0],
                   [0,   1,   0,   0,   0],
                   [1,   0,   0,   0,   0]]) 

y_pred = np.array([[0,   0,   0,   1,   0],
                   [0,   0,   0,   1,   0],
                   [0.1, 0.6, 0.3, 0.0, 0.1],
                   [0.1, 0.9, 0,   0,   0],
                   [0.2, 0.4, 0.3, 0, 0.1]]) 

In [None]:
kTop=1

metric = tf.keras.metrics.TopKCategoricalAccuracy(k=kTop)
metric.update_state(y_true,y_pred)

print('Accuracy:', round(metric.result().numpy()*100), '%')

In [None]:
kTop=2

metric = tf.keras.metrics.TopKCategoricalAccuracy(k=kTop)
metric.update_state(y_true,y_pred)

print('Accuracy:', round(metric.result().numpy()*100), '%')

In [None]:
kTop=3

metric = tf.keras.metrics.TopKCategoricalAccuracy(k=kTop)
metric.update_state(y_true,y_pred)

print('Accuracy:', round(metric.result().numpy()*100), '%')

<font size="3">**The logic is equivalent to:**</font>

><font size="3">**1) Rank the predictions**</font>

In [None]:
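from scipy.stats import rankdata

# Rank 1 = highest prediction. method='min' gives tied predictions the same
# (best) rank, mirroring the tie handling documented for tf.math.in_top_k.
rankpos = lambda x : rankdata(-np.asarray(x), method='min').astype(int)
y_pred_rank = np.array([rankpos(row) for row in y_pred])

print(y_pred_rank)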

><font size="3">**2) Identify the <span style="color:#4285F4">index of non-zero y_true</span>**</font>

In [None]:
index_y_true = y_true == 1

print(index_y_true)

><font size="3">**3) Identify the ranks according to the <span style="color:#4285F4">index of non-zero y_true</span>**</font>

In [None]:
y_true_pred_rank = y_pred_rank[index_y_true]

print(y_true_pred_rank)

><font size="3">**4) Assign the K value as threshold**</font><br>
><font size="3">**5) Count how many ranks are less than or equal to the K value and calculate the accuracy**</font>

In [None]:
kTop = 1

Accuracy = np.sum(y_true_pred_rank <= kTop)/len(y_true_pred_rank)*100

print('Accuracy:', Accuracy, '%')

In [None]:
kTop = 2

Accuracy = np.sum(y_true_pred_rank <= kTop)/len(y_true_pred_rank)*100

print('Accuracy:', Accuracy, '%')

In [None]:
kTop = 3

Accuracy = np.sum(y_true_pred_rank <= kTop)/len(y_true_pred_rank)*100

print('Accuracy:', Accuracy, '%')
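
<font size="3">The same check can be written without an explicit ranking step: a target is in the top K when fewer than K classes have a strictly higher prediction. The helper `top_k_accuracy` below is hypothetical plain Python, and its treatment of ties is an assumption modelled on the behaviour documented for `tf.math.in_top_k`; a ranking that averages tied positions can give a different answer when predictions tie.</font>

```python
def top_k_accuracy(y_true_index, y_pred, k):
    hits = 0
    for label, row in zip(y_true_index, y_pred):
        # Number of classes predicted strictly higher than the true class.
        better = sum(1 for p in row if p > row[label])
        if better < k:
            hits += 1
    return hits * 100 / len(y_true_index)

# Labels given as the index of the 1 in each one-hot row of the example.
y_true_index = [4, 3, 2, 1, 0]
y_pred_rows = [[0,   0,   0,   1,   0],
               [0,   0,   0,   1,   0],
               [0.1, 0.6, 0.3, 0.0, 0.1],
               [0.1, 0.9, 0,   0,   0],
               [0.2, 0.4, 0.3, 0,   0.1]]

for k in (1, 2, 3):
    print('k =', k, 'Accuracy:', top_k_accuracy(y_true_index, y_pred_rows, k), '%')
```

<font size="3">In the first row the true class shares the prediction 0 with three other classes, so it is exactly the kind of tie where the convention matters.</font>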

### <font color='Orange'>*Accuracy metrics - Sparse TopK Categorical Accuracy*</font>

<font size="3">**<span style="color:#4285F4">Sparse TopK Categorical accuracy</span> calculates how often <span style="color:#4285F4">integer targets</span> are in the top <span style="color:#4285F4">K predictions</span>.**</font>

><font size="3">**<span style="color:#4285F4">y_pred</span> is first ranked in descending order of probability values.**</font>

><font size="3">**If the rank of <span style="color:#4285F4">y_pred</span> at the <span style="color:#4285F4">index given by the integer y_true</span> is less than or equal to K, the prediction is considered accurate.**</font>

In [None]:
y_true = np.array([[0],
                   [1],
                   [2],
                   [3]]) 

y_pred = np.array([[1,   0,   0,   0],
                   [0.1, 0,   0.1, 0.8],
                   [0.1, 0.6, 0.3, 0.0],
                   [0.1, 0.3, 0.4, 0.2]]) 

In [None]:
kTop=1

metric = tf.keras.metrics.SparseTopKCategoricalAccuracy(k=kTop)
metric.update_state(y_true, y_pred)

print('Accuracy:', round(metric.result().numpy()*100), '%')

In [None]:
kTop=2

metric = tf.keras.metrics.SparseTopKCategoricalAccuracy(k=kTop)
metric.update_state(y_true, y_pred)

print('Accuracy:', round(metric.result().numpy()*100), '%')

In [None]:
kTop=3

metric = tf.keras.metrics.SparseTopKCategoricalAccuracy(k=kTop)
metric.update_state(y_true, y_pred)

print('Accuracy:', round(metric.result().numpy()*100), '%')

<font size="3">**The logic is equivalent to:**</font>

><font size="3">**1) Rank the predictions**</font>

In [None]:
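from scipy.stats import rankdata

# Rank 1 = highest prediction. method='min' gives tied predictions the same
# (best) rank, mirroring the tie handling documented for tf.math.in_top_k.
rankpos = lambda x : rankdata(-np.asarray(x), method='min').astype(int)
y_pred_rank = np.array([rankpos(row) for row in y_pred])

print(y_pred_rank)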

><font size="3">**2) Use the <span style="color:#4285F4">integer label in y_true</span> as the index**</font><Br>
><font size="3">**3) Identify the ranks according to that <span style="color:#4285F4">index</span>**</font>

In [None]:
# For each row, pick the rank at the column given by the integer label
y_true_pred_rank = y_pred_rank[np.arange(len(y_true)), y_true.flatten()]

print(y_true_pred_rank)

><font size="3">**4) Assign the K value as threshold**</font><br>
><font size="3">**5) Count how many ranks are less than or equal to the K value and calculate the accuracy**</font>

In [None]:
kTop = 1

Accuracy = np.sum(y_true_pred_rank <= kTop)/len(y_true_pred_rank)*100

print('Accuracy:', Accuracy, '%')

In [None]:
kTop = 2

Accuracy = np.sum(y_true_pred_rank <= kTop)/len(y_true_pred_rank)*100

print('Accuracy:', Accuracy, '%')

In [None]:
kTop = 3

Accuracy = np.sum(y_true_pred_rank <= kTop)/len(y_true_pred_rank)*100

print('Accuracy:', Accuracy, '%')

### <font color='Orange'>*Regression metrics - Mean Squared Error*</font>

<font size="3">**<span style="color:#4285F4">Mean Squared Error</span> calculates the mean squared error between <span style="color:#4285F4">y_true</span> and <span style="color:#4285F4">y_pred</span>.**</font>

In [None]:
y_true = np.array([[1,   2],
                   [1,   1]]) 

y_pred = np.array([[3,   2],
                   [1,   1]]) 

In [None]:
metric = tf.keras.metrics.MeanSquaredError()
metric.update_state(y_true, y_pred)

print("Mean Squared Error:", metric.result().numpy())

<font size="3">**By calculation:**</font>

><font size="3">**1) Calculate squared error for each sample**</font>

><font size="3">**2) Calculate the mean squared error**</font>

In [None]:
squared_error = np.square((y_true-y_pred))

mean_squared_error = np.mean(squared_error)

print("Mean Squared Error:", mean_squared_error)

### <font color='Orange'>*Regression metrics - Mean Absolute Error*</font>

<font size="3">**<span style="color:#4285F4">Mean Absolute Error</span> calculates the mean absolute error between <span style="color:#4285F4">y_true</span> and <span style="color:#4285F4">y_pred</span>.**</font>

In [None]:
y_true = np.array([[1,   2],
                   [1,   1]]) 

y_pred = np.array([[3,   2],
                   [1,   1]]) 

In [None]:
metric = tf.keras.metrics.MeanAbsoluteError()
metric.update_state(y_true, y_pred)

print("Mean Absolute Error:", metric.result().numpy())

<font size="3">**By calculation:**</font>

><font size="3">**1) Calculate absolute error for each sample**</font>

><font size="3">**2) Calculate the mean absolute error**</font>

In [None]:
absolute_error = np.abs((y_true-y_pred))

mean_absolute_error = np.mean(absolute_error)

print("Mean Absolute Error:", mean_absolute_error)

### <font color='Orange'>*Regression metrics - Mean Absolute Percentage Error*</font>

<font size="3">**<span style="color:#4285F4">Mean Absolute Percentage Error</span> calculates the mean absolute percentage error between <span style="color:#4285F4">y_true</span> and <span style="color:#4285F4">y_pred</span>.**</font>

In [None]:
y_true = np.array([[1,   2],
                   [1,   1]]) 

y_pred = np.array([[3,   2],
                   [1,   1]]) 

In [None]:
metric = tf.keras.metrics.MeanAbsolutePercentageError()
metric.update_state(y_true, y_pred)

print("Mean Absolute Percentage Error:", metric.result().numpy())

<font size="3">**By calculation:**</font>

><font size="3">**1) Calculate absolute percentage error for each sample**</font>

><font size="3">**2) Calculate the mean absolute percentage error**</font>

In [None]:
absolute_error = np.abs((y_true-y_pred))

# Multiply by 100 so the result is a percentage, as in the Keras metric
absolute_percentage_error = 100 * np.divide(absolute_error, y_true)

mean_absolute_percentage_error = np.mean(absolute_percentage_error)

print("Mean Absolute Percentage Error:", mean_absolute_percentage_error)

## 4.4 Sequential Model - Fit()

<font size="3">**Once the model has been compiled with a <span style="color:#4285F4">loss function</span>, an <span style="color:#4285F4">optimizer</span>, and optionally <span style="color:#4285F4">some metrics</span>, and the data are ready, the <span style="background-color: #ECECEC; color:#0047bb">.fit()</span> method can be used to train the model on the training data.**</font>

<font size="3">**<span style="background-color: #ECECEC; color:#0047bb">.fit()</span> has two key roles:**</font>

><font size="3">**It will train the model by slicing the data into "batches" of size <span style="color:#4285F4">batch_size</span>, and repeatedly iterating over the entire dataset for a given number of <span style="color:#4285F4">epochs</span>.**</font>

><font size="3">**It will return a <span style="color:#4285F4">history object</span> which holds a record of the <span style="color:#4285F4">loss values</span> and <span style="color:#4285F4">metric values</span> during training.**</font>

### <font color='Orange'>Arguments</font>

<font size="3">**Commonly used arguments include:**</font>
    
><font size="3">**x** - vector, matrix, or array of training data (or list if the model has multiple inputs). If all inputs in the model are named, a dictionary mapping input names to data can also be passed.</font>

><font size="3">**y** - vector, matrix, or array of target (label) data (or list if the model has multiple outputs). If all outputs in the model are named, a dictionary mapping output names to data can also be passed.</font>

><font size="3">**batch_size** - integer or None. Number of samples per gradient update. If unspecified, batch_size will default to 32.</font>

><font size="3">**epochs** - integer. Number of epochs to train the model, where an epoch is one pass over the entire x and y data provided. When used together with initial_epoch, epochs is understood as the final epoch: training runs until the epoch of index epochs is reached.</font>
    
><font size="3">**verbose** - verbosity mode (0 = silent, 1 = progress bar, 2 = one line per epoch)</font>
    
><font size="3">**callbacks** - list of callbacks to be called during training</font>
    
><font size="3">**validation_split** - float between 0 and 1. Fraction of the training data to be used as validation data. The model will set apart this fraction of the training data, will not train on it, and will evaluate the loss and any model metrics on this data at the end of each epoch. The validation data is selected from the last samples in the x and y data provided, before shuffling.</font>

><font size="3">**validation_data** - data on which to evaluate the loss and any model metrics at the end of each epoch. The model will not be trained on this data. This could be a tuple (x_val, y_val) or a tuple (x_val, y_val, val_sample_weights). validation_data will override validation_split.</font>

><font size="3">**shuffle** - boolean (whether to shuffle the training data before each epoch) or the string "batch". "batch" is a special option for dealing with the limitations of HDF5 data; it shuffles in batch-sized chunks.</font>
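
<font size="3">How <span style="color:#4285F4">validation_split</span>, <span style="color:#4285F4">batch_size</span> and <span style="color:#4285F4">epochs</span> interact can be worked out with simple arithmetic. The sketch below is plain Python and only models the bookkeeping described above; the variable names are illustrative and the numbers follow the 100-sample examples used in this section.</font>

```python
import math

n_samples = 100          # rows in X
validation_split = 0.2
batch_size = 32          # the default when batch_size is not given
epochs = 5

# The validation block is taken from the end of the data, before any shuffling.
split_at = int(math.floor(n_samples * (1.0 - validation_split)))
train_rows = range(0, split_at)
val_rows = range(split_at, n_samples)

# One gradient update per batch; the last batch may be smaller than batch_size.
steps_per_epoch = math.ceil(len(train_rows) / batch_size)
total_updates = steps_per_epoch * epochs

print('Training samples:', len(train_rows))
print('Validation samples:', len(val_rows))
print('Batches per epoch:', steps_per_epoch)
print('Gradient updates in total:', total_updates)
```

<font size="3">Because the validation rows are always the last ones, data that is sorted (for example by class) should be shuffled before it is passed to <span style="background-color: #ECECEC; color:#0047bb">.fit()</span> with a validation split.</font>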

### <font color='Orange'>Usage of Returns</font>
<font size="3">**<span style="background-color: #ECECEC; color:#0047bb">.fit()</span> returns a <span style="color:#4285F4">history object</span>. Its <span style="background-color: #ECECEC; color:#0047bb">History.history</span> attribute is a record of <span style="color:#4285F4">training loss values</span> and <span style="color:#4285F4">metrics values</span> at successive epochs, as well as validation loss values and validation metrics values (if applicable).**</font>

### <font color='#176BEF'> Examples </font>
<hr style="border:2px solid #E1F6FF"> </hr>
<img src="https://firebasestorage.googleapis.com/v0/b/deep-learning-crash-course.appspot.com/o/3NN9.png?alt=media&token=664be587-f0fe-43ec-8217-5ca7779ca0dd" width="100" align="right"/>


In [None]:
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
import numpy as np

In [None]:
X = np.random.random((100, 3))
y = np.random.random((100, 1))

model=Sequential()
model.add(Dense(2, activation='relu', input_shape=(3,)))
model.add(Dense(1, activation ='sigmoid'))

model.compile(optimizer="adam", loss="mse", metrics=["mae"])

<font size="3">**Store the records in <span style="color:#4285F4">history object</span>**</font>

In [None]:
history = model.fit(X, y,
                    epochs = 5,
                    validation_split=0.2)

<font size="3">**Display the <span style="color:#4285F4">training history</span> which is a dictionary type**</font>

In [None]:
print(history.history)

In [None]:
print(history.history.keys())

In [None]:
print(history.history['loss'])
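
<font size="3">Since the history is an ordinary dictionary of lists, it can be inspected with standard Python. The sketch below uses a hypothetical dictionary `history_dict` shaped like `history.history`; its numbers are made up for illustration and are not real training output.</font>

```python
# Hypothetical values shaped like history.history after 5 epochs
# (made up for illustration, not produced by a real training run).
history_dict = {
    'loss':     [0.120, 0.110, 0.100, 0.095, 0.090],
    'val_loss': [0.130, 0.118, 0.112, 0.115, 0.121],
}

val_loss = history_dict['val_loss']

# Epoch (1-based) with the lowest validation loss.
best_epoch = min(range(len(val_loss)), key=lambda i: val_loss[i]) + 1

# Gap between validation and training loss at the final epoch.
final_gap = history_dict['val_loss'][-1] - history_dict['loss'][-1]

print('Best epoch:', best_epoch)
print('Final val_loss - loss:', round(final_gap, 3))
```

<font size="3">A validation loss that starts rising while the training loss keeps falling, as in these made-up numbers, is the typical sign that further epochs no longer help generalization.</font>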

<hr style="border:2px solid #E1F6FF"> </hr>

<font size="5"><span style="background-color:#EA4335; color:white">&nbsp;!&nbsp;</span></font> <font size="3">**<span style="color:#4285F4">History</span> can be used to visualize the <span style="color:#4285F4">error (i.e. loss)</span> on a <span style="color:#4285F4">bias-variance graph</span>, which allows us to diagnose the model**</font>

><font size="3">**<span style="color:#4285F4">Bias</span> refers to the error caused by failing to capture the true patterns in the dataset. High bias means underfitting.**</font>
    
><font size="3">**<span style="color:#4285F4">Variance</span> refers to how much the predictions change when the model is trained on different data. High variance means overfitting.**</font>

<img src="https://firebasestorage.googleapis.com/v0/b/deep-learning-crash-course.appspot.com/o/4NNBiasVariance1.png?alt=media&token=40cf92ff-5c74-4fe4-a8cc-68763eaef7d9" width="450" align="center"/>


<font size="3">**The simplest way to understand bias-variance of a model is by looking at**</font>
    
><font size="3">**the training set error; and**</font><br>
><font size="3">**validation/test set error.**</font>

<font size="3">**However, before we can do so, we need a benchmark error against which to judge how well or how badly a model performs.**</font>
    
<font size="3">**Let's assume <span style="color:red">1%</span> error can be achieved by a benchmark algorithm. If we have the following scenarios:**</font>

><font size="3">**1. <span style="color:#4285F4"><span style="color:red">1%</span> training set error, <span style="color:red">1.5%</span> validation set error</span>**</font><br>
><font size="3"></font><br>
><font size="3">**then it represents the case of <span style="color:green">Low bias, Low variance</span>, which indicates that the model fits the training data well and generalizes well**</font>

><font size="3">**2. <span style="color:#4285F4"><span style="color:red">1%</span> training set error, <span style="color:red">10%</span> validation set error</span>**</font><br>
><font size="3"></font><br>
><font size="3">**then it represents the case of <span style="color:green">High variance</span>, which indicates that the model does not generalize well due to <span style="color:#4285F4">overfitting</span> the training data**</font>

><font size="3">**3. <span style="color:#4285F4"><span style="color:red">10%</span> training set error, <span style="color:red">11%</span> validation set error</span>**</font><br>
><font size="3"></font><br>
><font size="3">**then it represents the case of <span style="color:green">High bias</span>, which indicates that the model is not doing well on the training set due to <span style="color:#4285F4">underfitting</span> the training data**</font>

><font size="3">**4. <span style="color:#4285F4"><span style="color:red">10%</span> training set error, <span style="color:red">20%</span> validation set error</span>**</font><br>
><font size="3"></font><br>
><font size="3">**then it represents the case of <span style="color:green">High bias, High variance</span>, which indicates that the model performs poorly on both the training and validation data**</font>
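
<font size="3">The four scenarios can be turned into a small rule of thumb. The function `diagnose` and its `tolerance` parameter are hypothetical: the tolerance of 2 percentage points is an assumed cut-off for what counts as a large gap and would need to be chosen per problem.</font>

```python
def diagnose(train_error, val_error, benchmark_error, tolerance=2.0):
    # All errors are in percent. `tolerance` (percentage points) is an assumed
    # cut-off for what counts as a "large" gap.
    high_bias = (train_error - benchmark_error) > tolerance      # far from benchmark
    high_variance = (val_error - train_error) > tolerance        # poor generalization
    bias = 'High bias' if high_bias else 'Low bias'
    variance = 'High variance' if high_variance else 'Low variance'
    return bias + ', ' + variance

benchmark = 1.0
scenarios = [(1.0, 1.5), (1.0, 10.0), (10.0, 11.0), (10.0, 20.0)]
results = [diagnose(train, val, benchmark) for train, val in scenarios]

for (train, val), verdict in zip(scenarios, results):
    print('train', train, '% / validation', val, '% ->', verdict)
```

<font size="3">Bias is judged by the gap between training error and the benchmark, and variance by the gap between validation error and training error, which is why a benchmark is needed before the diagnosis can be made.</font>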

### <font color='Orange'>Usage of callbacks</font>

<font size="3">**A <span style="background-color: #ECECEC; color:#0047bb">callback</span> is an object that can perform actions at various stages of training (e.g. at the start or end of an epoch, before or after a single batch, etc).**</font>

<font size="3">**<span style="background-color: #ECECEC; color:#0047bb">Callbacks</span> can be used to:**</font>

><font size="3">**Write TensorBoard logs after every batch of training to monitor your metrics**</font>

><font size="3">**Periodically save your model to disk**</font>

><font size="3">**Do early stopping**</font>

><font size="3">**Get a view on internal states and statistics of a model during training**</font>

<font size="3">**To execute <span style="background-color: #ECECEC; color:#0047bb">callbacks</span>:**</font>

><font size="3">**A list of callbacks can be passed to <span style="background-color: #ECECEC; color:#0047bb">.fit()</span> through the <span style="color:#4285F4">callbacks</span> argument.**</font>
    
><font size="3">**The relevant methods of the callbacks will then be called at each stage of the training.**</font>
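
<font size="3">The mechanism can be illustrated with a toy training loop that calls each callback's hooks at the start and end of every epoch. Everything below is a hypothetical plain-Python sketch: `ToyCallback`, `LossRecorder` and `toy_fit` are not Keras classes, although the hook names `on_epoch_begin` and `on_epoch_end` mirror those of Keras callbacks.</font>

```python
class ToyCallback:
    """Hypothetical base class with the hooks a training loop may call."""

    def on_epoch_begin(self, epoch, logs=None):
        pass

    def on_epoch_end(self, epoch, logs=None):
        pass


class LossRecorder(ToyCallback):
    """Collects the loss reported at the end of each epoch."""

    def __init__(self):
        self.losses = []

    def on_epoch_end(self, epoch, logs=None):
        self.losses.append(logs['loss'])


def toy_fit(epoch_losses, callbacks):
    # Stand-in for training: each "epoch" just reports a precomputed loss.
    for epoch, loss in enumerate(epoch_losses):
        for cb in callbacks:
            cb.on_epoch_begin(epoch)
        logs = {'loss': loss}
        for cb in callbacks:
            cb.on_epoch_end(epoch, logs)


recorder = LossRecorder()
toy_fit([0.5, 0.4, 0.3], callbacks=[recorder])
print(recorder.losses)
```

<font size="3">A callback only overrides the hooks it cares about, which is why the same list can mix logging, checkpointing and early stopping without the training loop knowing what each one does.</font>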

<font size="3">**Commonly used callbacks include:**</font>

><font size="3">**<span style="color:#4285F4">ModelCheckpoint</span>**</font>

><font size="3">**<span style="color:#4285F4">TensorBoard</span>**</font>

><font size="3">**<span style="color:#4285F4">EarlyStopping</span>**</font>

### <font color='#34A853'>ModelCheckpoint</font>

<font size="3">**The <span style="color:#4285F4">ModelCheckpoint</span> callback is used together with <span style="background-color: #ECECEC; color:#0047bb">.fit()</span> to save a model or its weights (in a checkpoint file) at some interval, so that the model or weights can be loaded later to continue training from the saved state.**</font>

<font size="3">**A few options this callback provides include:**</font>

><font size="3">**Whether to only keep the model that has achieved the "best performance" so far, or whether to save the model at the end of every epoch regardless of performance.**</font>

><font size="3">**Definition of 'best'; which quantity to monitor and whether it should be maximized or minimized.**</font>

><font size="3">**The frequency it should save at. Currently, the callback supports saving at the end of every epoch, or after a fixed number of training batches.**</font>

><font size="3">**Whether only weights are saved, or the whole model is saved.**</font>
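
<font size="3">The "best so far" option can be modelled in a few lines. The function `epochs_that_save` is a hypothetical, simplified model of that decision in plain Python; the monitored values are made up and the real callback has more options than are shown here.</font>

```python
def epochs_that_save(monitored_values, mode='min', save_best_only=True):
    # Returns the 1-based epochs at which a checkpoint would be written
    # in this simplified model of the callback.
    best = None
    saved = []
    for epoch, value in enumerate(monitored_values, start=1):
        improved = (best is None
                    or (mode == 'min' and value < best)
                    or (mode == 'max' and value > best))
        if improved:
            best = value
        if improved or not save_best_only:
            saved.append(epoch)
    return saved

losses = [0.50, 0.40, 0.45, 0.30, 0.35]   # made-up monitored values

best_only = epochs_that_save(losses, mode='min')
every_epoch = epochs_that_save(losses, mode='min', save_best_only=False)

print('save_best_only=True :', best_only)
print('save_best_only=False:', every_epoch)
```

<font size="3">With these values, epochs 3 and 5 are skipped when only the best model is kept, because the monitored loss did not improve on the best value seen so far.</font>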

### <font color='#176BEF'> Examples </font>
<hr style="border:2px solid #E1F6FF"> </hr>
<img src="https://firebasestorage.googleapis.com/v0/b/deep-learning-crash-course.appspot.com/o/3NN9.png?alt=media&token=664be587-f0fe-43ec-8217-5ca7779ca0dd" width="100" align="right"/>

In [None]:
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
import numpy as np

In [None]:
X = np.random.random((100, 3))
y = np.random.random((100, 1))

model=Sequential()
model.add(Dense(2, activation='relu', input_shape=(3,)))
model.add(Dense(1, activation ='sigmoid'))

model.compile(optimizer="adam", loss="mse", metrics=["mae"])

<font size="3">**Define the callback's arguments. Commonly used arguments include:**</font>

><font size="3">**filepath** - it can contain named <span style="color:#4285F4">formatting options</span>, which will be filled with the value of <span style="color:#4285F4">epoch</span> and <span style="color:#4285F4">keys</span> in logs.</font><br>
><font size="3">e.g., if filepath is <span style="background-color: #ECECEC; color:#0047bb">weights.{epoch:02d}-{val_loss:.2f}.hdf5</span>, then the model checkpoints will be saved with the epoch number and the validation loss in the filename.</font><br>
><font size="3">The directory of the filepath should not be reused by any other callbacks to avoid conflicts.</font>

><font size="3">**monitor** - it is the metric name to monitor.</font>

><font size="3">**save_best_only** - if <span style="background-color: #ECECEC; color:#0047bb">save_best_only=True</span>, it only saves when the model is considered the <span style="color:#4285F4">"best"</span> and the latest best model according to the quantity monitored will not be overwritten.</font><br> 
><font size="3">If filepath doesn't contain <span style="color:#4285F4">formatting options</span> like {epoch} then filepath will be overwritten by each new better model.</font>

><font size="3">**mode** - it should be one of <span style="color:#4285F4">{'auto', 'min', 'max'}</span>. </font><br>
><font size="3">If <span style="background-color: #ECECEC; color:#0047bb">save_best_only=True</span>, the decision to overwrite the current save file is made based on either <span style="color:#4285F4">the maximization or the minimization of the monitored quantity</span>.</font><br>
><font size="3">e.g. for <span style="color:#4285F4">val_acc</span>, this should be <span style="color:#4285F4">max</span>, for <span style="color:#4285F4">val_loss</span> this should be <span style="color:#4285F4">min</span>, etc. In <span style="color:#4285F4">auto</span> mode, the mode is set to <span style="color:#4285F4">max</span> if the monitored quantity is <span style="color:#4285F4">'acc'</span> or starts with <span style="color:#4285F4">'fmeasure'</span>, and is set to <span style="color:#4285F4">min</span> for all other quantities.</font>
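
<font size="3">As a rough illustration of how these arguments interact, the sketch below mimics the bookkeeping in plain Python instead of calling Keras. It fills the filepath template with the epoch number and the logged values, and records a save only when the monitored quantity improves under mode='min'. The per-epoch log values are made up for illustration, and the loop is a simplification, not the actual ModelCheckpoint implementation.</font>

```python
# Template with named formatting options, as in the filepath description above.
filepath = "weights.{epoch:02d}-{val_loss:.2f}.hdf5"

# Hypothetical logs for four epochs (illustrative values only).
history = [
    {"loss": 0.30, "val_loss": 0.35},
    {"loss": 0.25, "val_loss": 0.31},
    {"loss": 0.22, "val_loss": 0.33},  # worse than the best so far: no save
    {"loss": 0.20, "val_loss": 0.28},
]

best = float("inf")  # mode='min': a lower monitored value is better
saved = []           # file names that would be written with save_best_only=True

for epoch, logs in enumerate(history, start=1):
    current = logs["val_loss"]         # monitor='val_loss'
    if current < best:
        best = current
        # Unused keys such as 'loss' are simply ignored by str.format.
        saved.append(filepath.format(epoch=epoch, **logs))

print(saved)
# ['weights.01-0.35.hdf5', 'weights.02-0.31.hdf5', 'weights.04-0.28.hdf5']
```

<font size="3">With these hypothetical logs, epoch 3 is skipped because its val_loss is worse than the best value seen so far.</font>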

In [None]:
from tensorflow.keras.callbacks import ModelCheckpoint

filepath = 'my_model/{epoch:02d}'

checkpoint_callback = ModelCheckpoint(
    filepath,
    monitor='loss',
    mode='min',
    save_best_only=True)

<font size="5"><span style="background-color:#EA4335; color:white">&nbsp;!&nbsp;</span></font> <font size="3">**A folder (e.g. my_model folder) must be created before training.**</font>

<font size="3">**Fit the model, passing the defined callback in a list**</font>

In [None]:
model.fit(X, y,
          epochs=5,
          validation_split=0.2,
          callbacks=[checkpoint_callback])

<font size="3">**There will be up to five folders inside the my_model folder, one for each epoch in which the monitored loss improved**</font>

In [None]:
ls my_model

<hr style="border:2px solid #E1F6FF"> </hr>

### <font color='#34A853'>TensorBoard</font>

<font size="3">**<span style="color:#4285F4">TensorBoard</span> is a visualization tool provided with TensorFlow. This callback logs events for <span style="color:#4285F4">TensorBoard</span>, including:**</font>

><font size="3">**Metrics summary plots**</font>

><font size="3">**Training graph visualization**</font>

><font size="3">**Activation histograms**</font>

><font size="3">**Sampled profiling**</font>

### <font color='#176BEF'> Examples </font>
<hr style="border:2px solid #E1F6FF"> </hr>
<img src="https://firebasestorage.googleapis.com/v0/b/deep-learning-crash-course.appspot.com/o/3NN9.png?alt=media&token=664be587-f0fe-43ec-8217-5ca7779ca0dd" width="100" align="right"/>

In [88]:
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
import numpy as np

In [89]:
X = np.random.random((100, 3))
y = np.random.random((100, 1))

model = Sequential()
model.add(Dense(2, activation='relu', input_shape=(3,)))
model.add(Dense(1, activation='sigmoid'))

# metrics is passed as a list
model.compile(optimizer="adam", loss="mse", metrics=["mae"])

<font size="3">**Define the callback's arguments. Commonly used arguments include:**</font>

><font size="3">**log_dir** - the <span style="color:#4285F4">path of the directory</span> in which to save the log files to be parsed by TensorBoard.</font><br>
><font size="3">The directory should not be reused by any other callbacks to avoid conflicts.</font>

><font size="3">**histogram_freq** - <span style="color:#4285F4">frequency (in epochs)</span> at which to compute activation and weight histograms for the layers of the model.</font>

<font size="5"><span style="background-color:#EA4335; color:white">&nbsp;!&nbsp;</span></font> <font size="3">**A good practice is to name the log folder with the current date and time, so that each training run writes to its own directory.**</font>
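
<font size="3">The small sketch below shows what such folder names look like and why they are convenient. The two start times are hypothetical and fixed only to make the result reproducible; in practice the current time would be used.</font>

```python
import datetime

# Hypothetical start times of two training runs (fixed for reproducibility).
run_times = [
    datetime.datetime(2024, 1, 31, 9, 5, 7),
    datetime.datetime(2024, 1, 31, 14, 30, 0),
]

# One sub-folder per run: logs/fit/<YYYYmmdd-HHMMSS>
log_dirs = ["logs/fit/" + t.strftime("%Y%m%d-%H%M%S") for t in run_times]

for d in log_dirs:
    print(d)
# logs/fit/20240131-090507
# logs/fit/20240131-143000

# Zero-padded timestamps sort alphabetically in chronological order.
print(sorted(log_dirs) == log_dirs)  # True
```

<font size="3">Because every run gets its own sub-folder under logs/fit, TensorBoard can display the runs side by side.</font>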

In [90]:
import datetime
from tensorflow.keras.callbacks import TensorBoard

log_dir = "logs/fit/" + datetime.datetime.now().strftime("%Y%m%d-%H%M%S")
tensorboard_callback = TensorBoard(log_dir=log_dir, histogram_freq=1)

model.fit(X, y, 
          epochs=200, 
          validation_split=0.2,
          callbacks=[tensorboard_callback])

Epoch 1/200
Epoch 2/200
Epoch 3/200
...

<tensorflow.python.keras.callbacks.History at 0x1c074dd3070>

<font size="3">**Load the TensorBoard notebook extension using <span style="color:#4285F4">magics</span>**</font>

In [91]:
%load_ext tensorboard

The tensorboard extension is already loaded. To reload it, use:
  %reload_ext tensorboard


<font size="3">**Start <span style="color:#4285F4">TensorBoard</span> within the notebook using <span style="color:#4285F4">magics</span>**</font>

In [92]:
%tensorboard --logdir logs/fit

Reusing TensorBoard on port 6006 (pid 9908), started 0:03:56 ago. (Use '!kill 9908' to kill it.)

<font size="3">**<span style="color:#4285F4">Dashboards</span> such as scalars, graphs and histograms can now be viewed.**</font>

<font size="3">**Alternatively, TensorBoard can be started before training to monitor the training while it is in progress.**</font>

In [None]:
%tensorboard --logdir logs

log_dir = "logs/fit/" + datetime.datetime.now().strftime("%Y%m%d-%H%M%S")
tensorboard_callback = TensorBoard(log_dir=log_dir, histogram_freq=1)

model.fit(X, y, 
          epochs=200, 
          validation_split=0.2,
          callbacks=[tensorboard_callback])

<hr style="border:2px solid #E1F6FF"> </hr>

### <font color='#34A853'>EarlyStopping</font>

<font size="3">**<span style="color:#4285F4">EarlyStopping</span> monitors the performance of a model at every epoch and terminates the training when a monitored metric has stopped improving.**</font>

><font size="3">**Assume the goal of training is to <span style="color:#4285F4">minimize the loss</span>. The metric to be monitored would then be <span style="color:#4285F4">'loss'</span>, and the mode would be <span style="color:#4285F4">'min'</span>. A <span style="background-color: #ECECEC; color:#0047bb">.fit()</span> training loop checks at the end of every epoch whether the loss is no longer decreasing, taking <span style="color:#4285F4">min_delta</span> and <span style="color:#4285F4">patience</span> into account if applicable. Once the loss is found to be no longer decreasing, the training terminates.**</font>

><font size="3">**The quantity to be monitored needs to be available in the <span style="color:#4285F4">logs dictionary</span>. To make it available, pass the loss or metrics to <span style="background-color: #ECECEC; color:#0047bb">.compile()</span>.**</font>

<font size="5"><span style="background-color:#EA4335; color:white">&nbsp;!&nbsp;</span></font> <font size="3">**Early stopping is a widely used regularization technique for tackling the overfitting problem.**</font>
    
<font size="3">**As the epochs go by, the training error and the validation error both go down at first. After a while, the validation error stops decreasing and starts to rise, while the training error continues to decrease. This indicates that the model has started to overfit.**</font>

<font size="3">**Therefore, in practice, it is more effective to apply early stopping to <span style="color:#4285F4">monitor the performance on the validation set</span> during the training.**</font>

<img src="https://firebasestorage.googleapis.com/v0/b/deep-learning-crash-course.appspot.com/o/4NNEarlyStopping.png?alt=media&token=80522de2-05bb-470d-b434-b2dc89875763" width="500" align="center"/>
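
<font size="3">The pattern described above can be illustrated with two made-up loss curves. The numbers below are hypothetical and not taken from a real training run; the sketch only locates the epoch with the lowest validation loss and checks that the training loss keeps falling afterwards.</font>

```python
# Hypothetical per-epoch losses (illustrative values only).
train_loss = [0.50, 0.40, 0.33, 0.28, 0.24, 0.21, 0.19, 0.17]
val_loss = [0.52, 0.43, 0.38, 0.36, 0.35, 0.37, 0.40, 0.44]

# Epoch (1-based) with the lowest validation loss.
best_epoch = min(range(len(val_loss)), key=val_loss.__getitem__) + 1

# The training loss decreases at every epoch, also after best_epoch.
train_still_falling = all(b < a for a, b in zip(train_loss, train_loss[1:]))

print(best_epoch)           # 5
print(train_still_falling)  # True
```

<font size="3">Monitoring the training loss alone would give no signal to stop here, whereas the validation loss starts rising after epoch 5.</font>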

<font size="3">**Commonly used arguments include:**</font>

><font size="3">**monitor** - Quantity to be monitored.</font><br>

><font size="3">**min_delta** - Minimum change in the monitored quantity to qualify as an improvement, <span style="color:#4285F4">i.e. an absolute change of less than min_delta will count as no improvement</span>.</font>

><font size="3">**patience** - Number of epochs with no improvement after which training will be stopped.</font><br> 

><font size="3">**mode** - it should be one of <span style="color:#4285F4">{'auto', 'min', 'max'}</span>.</font><br>
>><font size="3">In <span style="color:#4285F4">min</span> mode, training will stop when the quantity monitored has stopped decreasing.</font><br>
>><font size="3">In <span style="color:#4285F4">max</span> mode, training will stop when the quantity monitored has stopped increasing.</font><br>
>><font size="3">In <span style="color:#4285F4">auto</span> mode, the direction is automatically inferred from the name of the monitored quantity.</font><br>
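
<font size="3">To make the interplay of patience and min_delta concrete, the following is a minimal sketch of the stopping rule in 'min' mode, written in plain Python. The function name early_stopping_epoch and the loss values are hypothetical, and the rule is a simplification of the idea rather than the actual Keras implementation.</font>

```python
def early_stopping_epoch(values, patience=0, min_delta=0.0):
    """Return the 1-based epoch at which training would stop in 'min' mode,
    or None if the monitored value keeps improving until the last epoch."""
    best = float("inf")
    wait = 0  # number of consecutive epochs without improvement
    for epoch, current in enumerate(values, start=1):
        if current < best - min_delta:   # improvement larger than min_delta
            best = current
            wait = 0
        else:
            wait += 1
            if wait >= patience:
                return epoch
    return None


# Hypothetical validation losses for ten epochs.
val_loss = [0.50, 0.40, 0.35, 0.36, 0.37, 0.34, 0.35, 0.36, 0.37, 0.30]

print(early_stopping_epoch(val_loss, patience=1))                  # 4
print(early_stopping_epoch(val_loss, patience=3))                  # 9
print(early_stopping_epoch(val_loss, patience=3, min_delta=0.02))  # 6
```

<font size="3">A larger patience tolerates short plateaus, while a larger min_delta treats small gains (such as 0.35 to 0.34 here) as no improvement, so training stops earlier.</font>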

### <font color='#176BEF'> Examples </font>
<hr style="border:2px solid #E1F6FF"> </hr>
<img src="https://firebasestorage.googleapis.com/v0/b/deep-learning-crash-course.appspot.com/o/3NN9.png?alt=media&token=664be587-f0fe-43ec-8217-5ca7779ca0dd" width="100" align="right"/>

In [29]:
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
import numpy as np

from tensorflow.keras.callbacks import EarlyStopping

X = np.random.random((100, 3))
y = np.random.random((100, 1))

<font size="3">**Define the callback's arguments - EarlyStopping on <span style="color:#4285F4">Loss</span> with <span style="color:#4285F4">patience</span>**</font>

In [23]:
model_loss = Sequential()
model_loss.add(Dense(2, activation='relu', input_shape=(3,)))
model_loss.add(Dense(1, activation='sigmoid'))

# metrics is passed as a list
model_loss.compile(optimizer="adam", loss="mse", metrics=["mae"])

In [24]:
earlyStopping_callback = EarlyStopping(monitor='loss', patience=3)

In [30]:
model_loss.fit(X, y,
               epochs=200,
               validation_split=0.2,
               callbacks=[earlyStopping_callback])

Epoch 1/200
Epoch 2/200
...
Epoch 170/200
Epoch 171/200


<tensorflow.python.keras.callbacks.History at 0x24e9aff6cd0>

<font size="3">**Define the callback's arguments - EarlyStopping on <span style="color:#4285F4">Validation Loss</span> with <span style="color:#4285F4">patience</span>**</font>

In [31]:
model_val_loss = Sequential()
model_val_loss.add(Dense(2, activation='relu', input_shape=(3,)))
model_val_loss.add(Dense(1, activation='sigmoid'))

# metrics is passed as a list
model_val_loss.compile(optimizer="adam", loss="mse", metrics=["mae"])

In [32]:
earlyStopping_callback = EarlyStopping(monitor='val_loss', patience=3)

In [33]:
model_val_loss.fit(X, y,
                   epochs=500,
                   validation_split=0.2,
                   callbacks=[earlyStopping_callback])

Epoch 1/500
Epoch 2/500
...
Epoch 87/500
Epoch 88/500


<tensorflow.python.keras.callbacks.History at 0x24e9b0de160>

<font size="3">**Define the callback's arguments - EarlyStopping on <span style="color:#4285F4">Validation Loss</span> with <span style="color:#4285F4">min_delta</span>**</font>

In [34]:
model_min_delta = Sequential()
model_min_delta.add(Dense(2, activation='relu', input_shape=(3,)))
model_min_delta.add(Dense(1, activation='sigmoid'))

# metrics is passed as a list
model_min_delta.compile(optimizer="adam", loss="mse", metrics=["mae"])

In [35]:
earlyStopping_callback = EarlyStopping(monitor='val_loss', min_delta=0.00001)

In [36]:
model_min_delta.fit(X, y,
                    epochs=500,
                    validation_split=0.2,
                    callbacks=[earlyStopping_callback])

Epoch 1/500
Epoch 2/500
...
Epoch 96/500
Epoch 97/500


<tensorflow.python.keras.callbacks.History at 0x24e9b422b20>

<hr style="border:2px solid #E1F6FF"> </hr>

### <font color='Orange'>Usage of Batch Size</font>

<font size="3">**<span style="background-color: #ECECEC; color:#0047bb">.fit()</span> trains a model by slicing the data into "batches" of size <span style="color:#4285F4">batch_size</span>, and repeatedly iterating over the entire dataset for a given number of <span style="color:#4285F4">epochs</span>.**</font>

<font size="3">**There are three different types of <span style="color:#4285F4">gradient descent</span>:**</font>

><font size="3">**<span style="color:#4285F4">Stochastic Gradient Descent</span> is applied, if <span style="color:#4285F4">batch_size</span> equals <span style="color:red">1</span>**</font><br>
>><font size="3">**A single sample is randomly picked and used to compute the gradient of the cost function for each iteration of the gradient descent and then update the parameters.**</font>

><font size="3">**<span style="color:#4285F4">Batch Gradient Descent or Vanilla Gradient Descent</span> is applied, if <span style="color:#4285F4">batch_size</span> equals <span style="color:red">total number of samples</span>**</font><br>
>><font size="3">**The entire dataset is used to compute the gradient of the cost function for each iteration of the gradient descent and then update the parameters.**</font>

><font size="3">**<span style="color:#4285F4">Mini batch Gradient Descent</span> is applied, if <span style="color:#4285F4">batch_size</span> is larger than <span style="color:red">1</span> but less than <span style="color:red">total number of samples</span>**</font><br>
>><font size="3">**A mini batch of samples is randomly picked and used to compute the gradient of the cost function for each iteration of the gradient descent and then update the parameters.**</font>
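
<font size="3">The three variants differ only in how many samples feed each gradient computation. The NumPy sketch below makes that visible on a toy linear regression. The function name gradient_descent, the learning rate, the number of epochs and the noiseless target are all assumptions chosen for illustration; it is not how Keras implements training.</font>

```python
import numpy as np


def gradient_descent(X, y, batch_size, epochs=20, lr=0.1, seed=0):
    """Fit y ~ X @ w by minimising the mean squared error.
    Returns the weights and the number of parameter updates performed."""
    rng = np.random.default_rng(seed)
    n_samples, n_features = X.shape
    w = np.zeros((n_features, 1))
    updates = 0
    for _ in range(epochs):
        order = rng.permutation(n_samples)            # reshuffle every epoch
        for start in range(0, n_samples, batch_size):
            idx = order[start:start + batch_size]     # indices of one batch
            error = X[idx] @ w - y[idx]
            grad = 2.0 * X[idx].T @ error / len(idx)  # gradient of the batch MSE
            w -= lr * grad                            # one parameter update
            updates += 1
    return w, updates


rng = np.random.default_rng(123)
X = rng.random((100, 3))
y = X @ np.array([[1.0], [-2.0], [0.5]])  # noiseless linear target (illustrative)

for name, batch_size in [("stochastic", 1), ("mini batch", 32), ("batch", 100)]:
    w, updates = gradient_descent(X, y, batch_size)
    mse = float(np.mean((X @ w - y) ** 2))
    print(f"{name:>10}: batch_size={batch_size:3d}, updates={updates:4d}, mse={mse:.5f}")
```

<font size="3">Over 20 epochs on 100 samples, batch_size=1 performs 2000 updates, batch_size=32 performs 80, and batch_size=100 performs 20, so a smaller batch size trades more (and noisier) updates for cheaper individual steps.</font>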

<font size="3">**If <span style="color:#4285F4">batch_size</span> is <span style="color:#4285F4">too small</span>, the model weights can be easily affected by a small portion of the data, which results in a less accurate estimate of the gradient. If <span style="color:#4285F4">batch_size</span> is <span style="color:#4285F4">too large</span>, it can cause out-of-memory issues, especially with very large datasets.**</font>

<font size="3">**For these reasons, smaller batch sizes are often used. By default, Keras applies <span style="color:#4285F4">Mini batch Gradient Descent</span> with a <span style="color:#4285F4">batch_size</span> of <span style="color:#4285F4">32 samples</span>.**</font>

><font size="3">**Smaller batch sizes are noisy, which offers a regularizing effect and can lower the generalization error.**</font>

><font size="3">**Smaller batch sizes make it easier to fit one batch worth of training data in memory (e.g. when using a GPU).**</font>
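
<font size="3">As a quick worked example of the default, the sketch below slices 80 training samples into batches of 32. The figure of 80 is an assumption matching the examples in this chapter (100 samples with validation_split=0.2).</font>

```python
n_train = 80      # assumed number of training samples
batch_size = 32   # Keras default for .fit()

# Size of every batch in one epoch; the last batch holds the leftover samples.
batch_sizes = [min(batch_size, n_train - start)
               for start in range(0, n_train, batch_size)]

print(batch_sizes)       # [32, 32, 16]
print(len(batch_sizes))  # 3 weight updates per epoch
```

<font size="3">The last batch is smaller than batch_size whenever the number of training samples is not a multiple of it.</font>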
    


## 4.5 Sequential Model - Evaluate()

<font size="3">**To evaluate the <span style="color:#4285F4">generalisability</span> of a final model, it is important to use data that the model was not trained on, namely the <span style="color:#4285F4">testing data</span>.**</font>

<font size="3">**A simple way is to split all the samples into two datasets:**</font>

><font size="3">**Training dataset & Validation dataset** - The <span style="color:#4285F4">validation_split</span> argument of <span style="background-color: #ECECEC; color:#0047bb">.fit()</span> can be used to split these samples into <span style="color:#4285F4">training and validation datasets</span>; alternatively, a separate validation dataset can be passed through <span style="color:#4285F4">validation_data</span>.</font>

><font size="3">**Testing dataset** - Once a model is <span style="color:#4285F4">completely trained</span>, the <span style="color:#4285F4">unseen data</span> in the testing dataset can be used to evaluate how well or how badly the model performs. This provides an <span style="color:#4285F4">unbiased evaluation</span> and is considered <span style="color:#4285F4">good practice</span>.</font>

<img src="https://firebasestorage.googleapis.com/v0/b/deep-learning-crash-course.appspot.com/o/4NNEvaluate.png?alt=media&token=e4701b4b-11b2-48e9-8591-d67fd4cdcc13" width="1000" align="center"/>
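
<font size="3">The two-stage split can also be sketched with NumPy alone, which makes the resulting sizes explicit. The 10% and 20% fractions mirror the ones used in this section, and the variable names are illustrative. Taking the validation samples from the end of the remaining data is similar to what validation_split does in Keras.</font>

```python
import numpy as np

rng = np.random.default_rng(123)
X = rng.random((100, 3))
y = rng.random((100, 1))

# 1) Hold out 10% of the shuffled samples as the testing dataset.
indices = rng.permutation(len(X))
n_test = round(0.1 * len(X))
test_idx, rest_idx = indices[:n_test], indices[n_test:]

# 2) Split the remainder: the last 20% becomes the validation dataset.
n_val = round(0.2 * len(rest_idx))
train_idx, val_idx = rest_idx[:-n_val], rest_idx[-n_val:]

X_train, y_train = X[train_idx], y[train_idx]
X_val, y_val = X[val_idx], y[val_idx]
X_test, y_test = X[test_idx], y[test_idx]

print(X_train.shape, X_val.shape, X_test.shape)  # (72, 3) (18, 3) (10, 3)
```

<font size="3">With an explicit split like this, (X_val, y_val) could be passed to .fit() through validation_data, while X_test and y_test stay untouched until the final evaluation.</font>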

<font size="3">**For the final model, <span style="background-color: #ECECEC; color:#0047bb">.evaluate()</span> returns the <span style="color:#4285F4">loss value</span> and <span style="color:#4285F4">metric values</span> according to the arguments provided in <span style="background-color: #ECECEC; color:#0047bb">.compile()</span>.**</font>
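
<font size="3">For a model compiled with loss="mse" and the "mae" metric, the two returned numbers correspond to the mean squared error and the mean absolute error over the evaluated samples. The NumPy sketch below computes both by hand; the target and prediction values are hypothetical.</font>

```python
import numpy as np

y_true = np.array([[0.2], [0.5], [0.9], [0.4]])  # hypothetical targets
y_pred = np.array([[0.3], [0.5], [0.7], [0.8]])  # hypothetical predictions

mse = float(np.mean((y_true - y_pred) ** 2))   # what loss="mse" measures
mae = float(np.mean(np.abs(y_true - y_pred)))  # what the "mae" metric measures

print(round(mse, 4), round(mae, 4))  # 0.0525 0.175
```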

### <font color='#176BEF'> Examples </font>
<hr style="border:2px solid #E1F6FF"> </hr>
<img src="https://firebasestorage.googleapis.com/v0/b/deep-learning-crash-course.appspot.com/o/3NN9.png?alt=media&token=664be587-f0fe-43ec-8217-5ca7779ca0dd" width="100" align="right"/>

In [37]:
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
import numpy as np

In [38]:
X = np.random.random((100, 3))
y = np.random.random((100, 1))

<font size="3">**Split the 100 samples into two datasets**</font>

In [42]:
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.1, random_state=123)

print('Training & Validation Dataset:', X_train.shape, y_train.shape)
print('Test Dataset:', X_test.shape, y_test.shape)

Training & Validation Dataset: (90, 3) (90, 1)
Test Dataset: (10, 3) (10, 1)


In [43]:
model = Sequential()
model.add(Dense(2, activation='relu', input_shape=(3,)))
model.add(Dense(1, activation='sigmoid'))
# metrics is passed as a list
model.compile(optimizer="adam", loss="mse", metrics=["mae"])

<font size="3">**Split the 90 samples into training and validation datasets**</font>

In [44]:
model.fit(X_train, y_train,
          validation_split=0.2,
          epochs = 5)

Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5


<tensorflow.python.keras.callbacks.History at 0x24e9d2a4730>

<font size="3">**Evaluate the final model with the test dataset**</font>

In [45]:
test_loss, test_metrics = model.evaluate(X_test, y_test)

print('Test Loss - MSE:', test_loss)
print('Test Metrics - MAE:', test_metrics)

Test Loss - MSE: 0.2611391842365265
Test Metrics - MAE: 0.4374386668205261


<hr style="border:2px solid #E1F6FF"> </hr>

## 4.6 Sequential Model - Predict()

<font size="3">**Once the model is trained, <span style="background-color: #ECECEC; color:#0047bb">.predict()</span> can be used to generate predictions.**</font>

### <font color='#176BEF'> Examples </font>
<hr style="border:2px solid #E1F6FF"> </hr>
<img src="https://firebasestorage.googleapis.com/v0/b/deep-learning-crash-course.appspot.com/o/3NN9.png?alt=media&token=664be587-f0fe-43ec-8217-5ca7779ca0dd" width="100" align="right"/>

In [46]:
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
import numpy as np

In [47]:
X = np.random.random((100, 3))
y = np.random.random((100, 1))

model = Sequential()
model.add(Dense(2, activation='relu', input_shape=(3,)))
model.add(Dense(1, activation='sigmoid'))
# metrics is passed as a list
model.compile(optimizer="adam", loss="mse", metrics=["mae"])

model.fit(X, y,
          epochs = 5)

Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5


<tensorflow.python.keras.callbacks.History at 0x24e9e6a03d0>

<font size="3">**The <span style="color:#4285F4">same number of features</span> as in training is needed in order to generate a prediction.**</font>

<font size="3">**In this example, 3 features have been used for training. So 3 features are needed for prediction.**</font>
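
<font size="3">Why the feature count matters can be seen from the shapes in a forward pass. The NumPy sketch below imitates a Dense(2, relu) layer followed by a Dense(1, sigmoid) layer. The weights are random stand-ins rather than trained values, and the forward function is a hypothetical helper, not part of Keras.</font>

```python
import numpy as np

rng = np.random.default_rng(0)

# Random stand-ins for the weights of Dense(2, relu) -> Dense(1, sigmoid).
W1, b1 = rng.normal(size=(3, 2)), np.zeros(2)  # expects 3 input features
W2, b2 = rng.normal(size=(2, 1)), np.zeros(1)


def forward(x):
    hidden = np.maximum(x @ W1 + b1, 0.0)             # ReLU
    return 1.0 / (1.0 + np.exp(-(hidden @ W2 + b2)))  # sigmoid


print(forward(rng.random((1, 3))).shape)   # (1, 1): one sample, one prediction
print(forward(rng.random((10, 3))).shape)  # (10, 1): ten samples, ten predictions

try:
    forward(rng.random((1, 4)))            # 4 features instead of 3
except ValueError as err:
    print("Shape mismatch:", type(err).__name__)
```

<font size="3">The first weight matrix has one row per input feature, so an input with a different number of columns cannot be multiplied with it.</font>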

In [48]:
prediction_single = model.predict(np.random.random((1, 3)))

print("The prediction is:", prediction_single)

The prediction is: [[0.46480438]]


In [49]:
prediction_batch = model.predict(np.random.random((10, 3)))

print("The prediction is:", prediction_batch)

The prediction is: [[0.45315924]
 [0.41939664]
 [0.47458243]
 [0.4812256 ]
 [0.46150655]
 [0.47832948]
 [0.44983304]
 [0.49562877]
 [0.44853142]
 [0.4879438 ]]


<hr style="border:2px solid #E1F6FF"> </hr>