## Train and Evaluate a Deep Learning Model

- Every cell in our body takes some input and produces some output.
- Input can be food and output can be some work.
- Similarly, brain cells, which are called neurons, take some input and produce some output in the form of electrical signals.


- An artificial neural network tries to model this functionality of biological neural networks.
- An artificial neuron takes some input [x1, x2, x3, x4, ..., xn].
- It has a set of "weights" [w1, w2, w3, w4, ..., wn] that are used for processing the input.
- There is also something called a "bias", which shifts the weighted sum of the inputs by a constant amount.
- Finally, the neuron produces an output y, which is essentially a function of the inputs xi, the weights wi, and the bias.

# Deep Learning Introduction:

- Deep learning is an advanced form of machine learning that tries to emulate the way the human brain learns.
- In your brain, you have nerve cells called neurons, which are connected to one another by nerve extensions that pass electrochemical signals through the network.
- (refer slides)

- When there's more than one input value, x is considered a vector with elements named x1, x2, and so on.
- Associated with each x value is a weight (w), which is used to strengthen or weaken the effect of the x value to simulate learning.
- Additionally a bias (b) input is added to enable fine-grained control over the network.
- During the training process, the w and b values will be adjusted to tune the network so that it "learns" to produce correct outputs.
- The neuron itself encapsulates a function that calculates a **weighted sum**: each x value is multiplied by its weight w, the products are added up, and the bias b is added to the total.
- This function is in turn enclosed in an **activation function** that constrains the result (often to a value between 0 and 1) to determine whether or not the neuron passes an output onto the next layer of neurons in the network.

- In simple terms, a machine learning model is a function that calculates y (the label) from x (the features): f(x) = y.
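
To make the weighted sum and activation concrete, here is a minimal sketch of a single artificial neuron. The input, weight, and bias values are made up for illustration, and the sigmoid is assumed as the activation function because it constrains the output to a value between 0 and 1; other activation functions are equally possible.

```python
import math

def sigmoid(z):
    """Squash any real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def neuron(x, w, b):
    """Weighted sum of the inputs plus the bias, passed through the activation."""
    z = sum(xi * wi for xi, wi in zip(x, w)) + b
    return sigmoid(z)

# Illustrative (made-up) values: three inputs, three weights, one bias.
x = [0.5, -1.0, 2.0]
w = [0.4, 0.3, -0.2]
b = 0.1

y = neuron(x, w, b)
print(y)
```

During training, only `w` and `b` change; the function itself stays the same.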


## Case study: Classification problem: Learning penguin species
- Input: For example, suppose your observation consists of some measurements of a penguin.
- Specifically, the measurements are: the length of the penguin's bill (x1), the depth of the penguin's bill (x2), the length of the penguin's flipper (x3), and the penguin's weight (x4).
- So, the features (x) are a vector of four values, or mathematically, x = [x1, x2, x3, x4].
- Output: Let's suppose that the label we're trying to predict (y) is the species of the penguin, and that there are three possible species it could be: 0-Adelie, 1-Gentoo, and 2-Chinstrap.
- As you can see, this is a classification problem where the input is a set of characteristics of a penguin and the output is one of the three penguin species.

- This system uses 4 features as measurements, hence 4 neurons are needed in the input layer.
- The proposed system is intended to classify 3 penguin species, hence 3 neurons are needed at the output layer.
- The number of hidden layers is chosen based on the requirements of the problem (minimum 1).
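
The layer sizes described above can be sketched as a tiny untrained network: 4 inputs, one hidden layer, and 3 outputs. The hidden layer size (5), the ReLU hidden activation, the softmax output, and the input values are all assumptions made for this sketch; the notes only fix the 4 inputs and 3 outputs.

```python
import math
import random

def dense(inputs, weights, biases):
    """One fully connected layer: a weighted sum plus bias per neuron."""
    return [sum(x * w for x, w in zip(inputs, row)) + b
            for row, b in zip(weights, biases)]

def relu(values):
    return [max(0.0, v) for v in values]

def softmax(values):
    """Turn raw scores into probabilities that sum to 1."""
    largest = max(values)
    exps = [math.exp(v - largest) for v in values]  # shift for stability
    total = sum(exps)
    return [e / total for e in exps]

def random_layer(n_in, n_out, rng):
    weights = [[rng.uniform(-1, 1) for _ in range(n_in)] for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases

rng = random.Random(0)
N_FEATURES, N_HIDDEN, N_CLASSES = 4, 5, 3  # hidden size 5 is an arbitrary choice
w1, b1 = random_layer(N_FEATURES, N_HIDDEN, rng)
w2, b2 = random_layer(N_HIDDEN, N_CLASSES, rng)

# Hypothetical, already-scaled penguin measurements [x1, x2, x3, x4].
x = [0.2, -0.5, 1.1, 0.3]

hidden = relu(dense(x, w1, b1))
probs = softmax(dense(hidden, w2, b2))
predicted_class = probs.index(max(probs))  # 0-Adelie, 1-Gentoo, 2-Chinstrap
print(probs, predicted_class)
```

Because the weights are random, the prediction is meaningless until the network has been trained.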

## Training a deep neural network

- The deep neural network model for the classifier consists of multiple layers of artificial neurons. 
- The training process for a deep neural network consists of multiple iterations, called **epochs**.
- For the first epoch, you start by assigning random initialization values for the weight (w) and bias (b) values.
- Then the process is as follows:
    1. Features for data observations with known label values are submitted to the input layer. Generally, these observations are grouped into batches (often referred to as **mini-batches**).
    2. The neurons then apply their function, and if activated, pass the result onto the next layer until the output layer produces a prediction.
    3. The **loss** is calculated, which is the amount of variance (difference) between the predicted and true values.
    4. Revised values for the weights and biases are calculated to reduce the loss, and these adjustments are **back propagated** to the neurons in the network layers.
    5. The next epoch repeats the batch training forward pass with the revised weights and bias values, hopefully improving the accuracy of the model (by reducing the loss).
- **Remember these 5 steps**
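
The five steps can be sketched with the smallest possible "network": a single sigmoid neuron trained on made-up data. This is an illustration under stated assumptions (toy data, mean squared error as the loss, plain gradient descent as the update rule), not the code from the course notebooks.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Toy data (made up): the label is 1 when x1 + x2 > 1, otherwise 0.
rng = random.Random(42)
features = [[rng.random(), rng.random()] for _ in range(200)]
labels = [1.0 if x1 + x2 > 1.0 else 0.0 for x1, x2 in features]

# Random initial weights and a zero bias for the first epoch.
w = [rng.uniform(-0.5, 0.5), rng.uniform(-0.5, 0.5)]
b = 0.0
LEARNING_RATE, BATCH_SIZE, EPOCHS = 0.5, 20, 30
epoch_losses = []

for epoch in range(EPOCHS):
    total_loss = 0.0
    for start in range(0, len(features), BATCH_SIZE):
        # Step 1: take a mini-batch of observations with known labels.
        batch_x = features[start:start + BATCH_SIZE]
        batch_y = labels[start:start + BATCH_SIZE]

        # Step 2: the forward pass produces one prediction per observation.
        preds = [sigmoid(w[0] * x[0] + w[1] * x[1] + b) for x in batch_x]

        # Step 3: the loss is the squared difference from the true labels.
        errors = [p - y for p, y in zip(preds, batch_y)]
        total_loss += sum(e * e for e in errors)

        # Step 4: gradients flow back to w and b (chain rule), then update.
        grad_z = [2 * e * p * (1 - p) for e, p in zip(errors, preds)]
        n = len(batch_x)
        grad_w0 = sum(g * x[0] for g, x in zip(grad_z, batch_x)) / n
        grad_w1 = sum(g * x[1] for g, x in zip(grad_z, batch_x)) / n
        grad_b = sum(grad_z) / n
        w[0] -= LEARNING_RATE * grad_w0
        w[1] -= LEARNING_RATE * grad_w1
        b -= LEARNING_RATE * grad_b

    # Step 5: the next epoch repeats the passes with the revised values.
    epoch_losses.append(total_loss / len(features))

print(epoch_losses[0], epoch_losses[-1])
```

Comparing the first and last entries of `epoch_losses` shows whether training is reducing the loss.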

## A closer look at loss functions and backpropagation
- The previous description of the deep learning training process mentioned that the loss from the model is calculated and used to adjust the weight and bias values. How exactly does this work?
### Calculating loss
- Suppose one of the samples passed through the training process contains features of an Adelie specimen (class 0). 
- The correct output from the network would be [1, 0, 0].
- Now suppose that the output produced by the network is [0.4, 0.3, 0.3].
- Comparing these, we can calculate an absolute variance for each element (in other words, how far each predicted value is from what it should be) as [0.6, 0.3, 0.3].
- In reality, since we're actually dealing with multiple observations, we typically aggregate the variance - for example by squaring the individual variance values and calculating the mean, so we end up with a single average loss value like 0.18.
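
The arithmetic of this example can be checked in a few lines. In this sketch the mean is taken over the three elements of the single Adelie observation, which reproduces the 0.18 figure; with many observations the same squaring and averaging would simply run over all of them.

```python
target = [1, 0, 0]            # Adelie (class 0) as the correct output
predicted = [0.4, 0.3, 0.3]   # output produced by the network

# How far each predicted value is from what it should be.
abs_variance = [abs(t - p) for t, p in zip(target, predicted)]

# Square the individual values and take the mean.
mse_loss = sum(v ** 2 for v in abs_variance) / len(abs_variance)

print([round(v, 2) for v in abs_variance])  # [0.6, 0.3, 0.3]
print(round(mse_loss, 2))                   # 0.18
```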

## Backpropagation
### Optimizers:
- The loss is calculated using a function, which operates on the results from the final layer of the network, which is also a function.
- The final layer of the network operates on the outputs from the previous layers, which are also functions.
- So in effect, the entire model from the input layer right through to the loss calculation is just one big nested function.
- Functions have a few really useful characteristics, including:
    - You can conceptualize a function as a plotted line comparing its output with each of its variables.
    - You can use differential calculus to calculate the derivative of the function at any point with respect to its variables.
- Let's take the first of these capabilities.
- We can plot the line of the function to show how an individual weight value compares to loss, and mark on that line the point where the current weight value matches the current loss value.

- (refer slides)
- There are two ways to implement backpropagation:
    - manual way
    - systematic way using optimizers
- There are multiple commonly used optimization algorithms, including:
    - Stochastic Gradient Descent (SGD)
    - Adaptive Learning Rate (ADADELTA)
    - Adaptive Moment Estimation (Adam)
    - and others.
- All of these are designed to figure out how to adjust the weights and biases to minimize loss.
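
As a rough sketch of the idea, the snippet below treats the loss of a single sigmoid neuron as one nested function of a weight, computes its derivative with the chain rule, and applies one plain gradient descent update. The values of `x`, `b`, `y`, and the learning rate are made up, and the update rule is the basic one; Adadelta and Adam refine it and are not shown here.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def loss(w, x=1.5, b=0.0, y=1.0):
    """Nested function: squared error of a single sigmoid neuron."""
    return (sigmoid(w * x + b) - y) ** 2

def loss_gradient(w, x=1.5, b=0.0, y=1.0):
    """Derivative of the loss with respect to w, via the chain rule."""
    p = sigmoid(w * x + b)
    return 2 * (p - y) * p * (1 - p) * x

w = 0.2
learning_rate = 0.5

# Sanity check: the analytic derivative matches a finite-difference estimate.
eps = 1e-6
numeric = (loss(w + eps) - loss(w - eps)) / (2 * eps)
analytic = loss_gradient(w)

# Gradient descent update: step against the slope of the loss.
w_new = w - learning_rate * analytic
print(loss(w), loss(w_new))
```

The sign of the derivative tells the optimizer which direction to move the weight, and the learning rate controls how far.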

## Global Minima
- The best solution that learning can reach is called the global minimum.
- The global minimum is the set of values of w and b at which the loss function's output is lowest.
- Sometimes, if the speed of learning is poorly chosen, we may miss the global minimum.
- To avoid this, we must carefully select a parameter called the learning rate.
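
A small sketch shows how the learning rate decides whether the minimum is reached or missed. The loss function here is a made-up parabola with a single minimum at w = 3, so it only illustrates overshooting; it does not show the separate problem of getting stuck in a local minimum.

```python
def gradient(w):
    """Slope of the made-up loss (w - 3)^2, whose minimum is at w = 3."""
    return 2.0 * (w - 3.0)

def descend(learning_rate, steps=25, w=0.0):
    """Run plain gradient descent and return the final weight."""
    for _ in range(steps):
        w -= learning_rate * gradient(w)
    return w

w_small = descend(0.1)   # steady approach toward the minimum
w_large = descend(1.1)   # every step overshoots further than the last
print(w_small, w_large)
```

With the small rate the weight settles close to 3; with the large rate it moves away from the minimum on every step.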

## Learning rate
- (refer slides)

## Implementation of deep learning algorithm for Penguin species classification case study (Code from python notebook (PenguinDLSklearn.ipynb)):
1. Observe that the data is available in two different files: training and test.
2. As the species label is in categorical form, we need to convert it into a numeric value.
3. In order to perform training operations, we need to perform standardization/scaling operations with numpy.
4. Now we need to apply this data to a neural network model.
5. Now we need to validate and test this model.
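
Steps 2 and 3 can be sketched as follows. This is not the notebook's code: the rows are made-up measurements, the species-to-number mapping follows the 0-Adelie, 1-Gentoo, 2-Chinstrap convention used earlier, and the standard library is used as a stand-in for numpy so the sketch has no dependencies.

```python
import statistics

# Made-up rows: (bill length, bill depth, flipper length, body mass, species)
rows = [
    (39.1, 18.7, 181.0, 3750.0, "Adelie"),
    (46.1, 13.2, 211.0, 4500.0, "Gentoo"),
    (49.5, 19.0, 200.0, 3800.0, "Chinstrap"),
    (38.6, 17.2, 199.0, 3750.0, "Adelie"),
]

# Step 2: map each categorical species label to a numeric class.
species_to_class = {"Adelie": 0, "Gentoo": 1, "Chinstrap": 2}
y = [species_to_class[row[4]] for row in rows]

# Step 3: standardize each feature column to mean 0 and standard deviation 1.
columns = list(zip(*[row[:4] for row in rows]))
means = [statistics.fmean(col) for col in columns]
stds = [statistics.pstdev(col) for col in columns]
X = [[(value - m) / s for value, m, s in zip(row[:4], means, stds)]
     for row in rows]

print(y)  # [0, 1, 2, 0]
print(X[0])
```

The means and standard deviations computed from the training file should be reused to scale the test file, so both are on the same scale.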

## Convolutional Neural Network (CNN):
- Definition of Convolution in mathematics: https://en.wikipedia.org/wiki/Convolution
- Convolution is a mathematical operation that allows the merging of two sets of information. 
- In the case of CNN, convolution is applied to the input data to filter the information and produce a feature map. 
- This filter is also called a kernel, or feature detector, and its dimensions can be, for example, 3x3.

- A CNN is a form of deep learning. It converts image-related input data into feature values.
- While you can use deep learning models for any kind of machine learning, they're particularly useful for dealing with data that consists of large arrays of numeric values - such as images.
- Machine learning models that work with images are the foundation for an area of artificial intelligence called **Computer Vision**, and deep learning techniques have been responsible for driving amazing advances in this area over recent years.
- (refer slides).

## Layers in a CNN
- CNNs consist of multiple layers, each performing a specific task in extracting features or predicting labels.
- **Convolutional layers:** One of the principal layer types is a convolutional layer that extracts important features in images.
- A convolutional layer works by applying a filter to images.
- The filter is defined by a kernel that consists of a matrix of weight values.
- For example, a 3x3 filter might be defined like this:
<pre>
[
 1 -1  1
-1  0 -1
 1 -1  1
]
</pre>
- Black and white (grayscale) images are represented by one matrix.
- Color images are represented by three matrices (one for each of the R, G, and B values).
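
The following sketch applies the 3x3 filter shown above to a made-up 4x4 grayscale image, without padding and moving one pixel at a time. The image values are illustrative only. The kernel is not flipped before it is applied, which is the usual convention in CNN libraries; for this particular kernel flipping would make no difference because it is symmetric.

```python
def convolve2d(image, kernel):
    """Slide the kernel over the image (no padding, stride 1)."""
    k = len(kernel)
    out_rows = len(image) - k + 1
    out_cols = len(image[0]) - k + 1
    feature_map = []
    for r in range(out_rows):
        row = []
        for c in range(out_cols):
            total = 0
            for i in range(k):
                for j in range(k):
                    total += image[r + i][c + j] * kernel[i][j]
            row.append(total)
        feature_map.append(row)
    return feature_map

kernel = [
    [ 1, -1,  1],
    [-1,  0, -1],
    [ 1, -1,  1],
]

# Made-up 4x4 grayscale image (one matrix of pixel intensities).
image = [
    [1, 2, 3, 0],
    [0, 1, 2, 3],
    [3, 0, 1, 2],
    [2, 3, 0, 1],
]

feature_map = convolve2d(image, kernel)
print(feature_map)  # [[4, -4], [-4, 4]]
```

A 4x4 image and a 3x3 kernel give a 2x2 feature map, because the kernel only fits in four positions.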

## Convolution Layers
- Because the filter kernel cannot be centered on pixels at the edge of the image, we pad the border of the image with values of 0.
- Rectified Linear Unit (ReLU) is an activation function whose output is the same as its input for positive values and zero for negative values.
- Image -> Convolution -> ReLU -> (negative values become 0, positive values are kept as they are)
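
Both ideas are short enough to sketch directly. The helper names `zero_pad` and `relu` and the sample values are made up for this illustration.

```python
def zero_pad(image, pad=1):
    """Surround the image with a border of zeros, `pad` pixels wide."""
    width = len(image[0]) + 2 * pad
    padded = [[0] * width for _ in range(pad)]
    for row in image:
        padded.append([0] * pad + list(row) + [0] * pad)
    padded.extend([0] * width for _ in range(pad))
    return padded

def relu(feature_map):
    """Keep positive values as they are; replace negative values with 0."""
    return [[max(0, v) for v in row] for row in feature_map]

padded = zero_pad([[1, 2], [3, 4]])
activated = relu([[4, -4], [-4, 4]])

print(padded)     # [[0, 0, 0, 0], [0, 1, 2, 0], [0, 3, 4, 0], [0, 0, 0, 0]]
print(activated)  # [[4, 0], [0, 4]]
```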

## Pooling layers:
- After extracting feature values from images, pooling (or downsampling) layers are used to reduce the number of feature values while retaining the key differentiating features that have been extracted.
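
One common form of pooling is max pooling, which keeps only the largest value in each window. The sketch below assumes a 2x2 window with non-overlapping positions and a made-up feature map; other pooling types (such as average pooling) follow the same pattern.

```python
def max_pool(feature_map, size=2):
    """Keep the largest value in each non-overlapping size x size window."""
    pooled = []
    for r in range(0, len(feature_map) - size + 1, size):
        row = []
        for c in range(0, len(feature_map[0]) - size + 1, size):
            window = [feature_map[r + i][c + j]
                      for i in range(size) for j in range(size)]
            row.append(max(window))
        pooled.append(row)
    return pooled

# Made-up 4x4 feature map.
feature_map = [
    [1, 3, 2, 0],
    [4, 6, 1, 1],
    [0, 2, 9, 5],
    [1, 1, 3, 7],
]

pooled = max_pool(feature_map)
print(pooled)  # [[6, 2], [2, 9]]
```

Sixteen feature values are reduced to four, while the strongest response in each region is kept.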

## Flattening layer:
- A flattening layer is used to flatten the feature maps into a vector of values that can be used as input to a fully connected layer.
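
Flattening is only a change of shape, as this small sketch with two made-up 2x2 feature maps shows.

```python
feature_maps = [
    [[6, 2], [2, 9]],   # made-up feature map from a first filter
    [[0, 5], [7, 1]],   # made-up feature map from a second filter
]

# Read every value, map by map and row by row, into one flat vector.
flat = [value for fmap in feature_maps for row in fmap for value in row]
print(flat)  # [6, 2, 2, 9, 0, 5, 7, 1]
```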

## Fully connected CNN:
1. Images are fed into a convolutional layer. In this case, there are two filters, so each image produces two feature maps.
2. The feature maps are passed to a pooling layer, where a 2x2 pooling kernel reduces the size of the feature maps.
3. A dropout layer randomly drops some of the feature maps to help prevent overfitting.
4. A flattening layer takes the remaining feature map arrays and flattens them into a vector.
5. The vector elements are fed into a fully connected network, which generates the predictions.
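
Step 3 can be sketched as below, following the description in these notes: whole feature maps are dropped at random by replacing them with zeros. The function name, the drop probability, and the feature map values are made up. Library dropout layers often work differently, zeroing individual values and rescaling the ones that remain, so treat this only as an illustration of the idea.

```python
import random

def drop_feature_maps(feature_maps, drop_probability, rng):
    """Replace randomly chosen feature maps with zeros (training time only)."""
    result = []
    for fmap in feature_maps:
        if rng.random() < drop_probability:
            result.append([[0] * len(row) for row in fmap])   # dropped
        else:
            result.append([list(row) for row in fmap])        # kept as is
    return result

# Made-up pooled feature maps.
feature_maps = [
    [[6, 2], [2, 9]],
    [[0, 5], [7, 1]],
    [[3, 3], [4, 8]],
]

rng = random.Random(0)
after_dropout = drop_feature_maps(feature_maps, drop_probability=0.5, rng=rng)
print(after_dropout)
```

Dropout is applied only while training; when the model is used for prediction, all feature maps are passed through.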

## CNN using Keras in TensorFlow

- Refer to the Jupyter notebook from the Google Colab link.
- The API is similar to what we've seen in scikit-learn.

## CNN using PyTorch

- Whenever you are using PyTorch, it is advisable to use a GPU.
- In Google Colab, go to Runtime -> Change runtime type -> select a GPU.
- If you want access to premium GPUs, you might have to purchase Google's "Compute Units".


- Notice that PyTorch uses OOP, but the idea is still the same.
- We create a class called Net; this is an empty neural network, i.e. just a skeleton.
- We define some hyperparameters like
    - hidden layer = 20
    - number of epochs = 40
    - learning rate = 0.01
- Then we choose an optimizer and a loss function.
- We use something like:
    - torch.Tensor(xtrain).float()
    - torch.Tensor(ytrain).long()
- This is done because the data must be converted to PyTorch tensors of the right type before it can be used: float for the features and long (integer) for the class labels.
- Then there is a for loop that does the training.
- Refer to the Jupyter notebooks for more info.
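
The outline above might look roughly like the sketch below. It is not the notebook's code. Several things are assumptions: "hidden layer = 20" is read as 20 hidden units, Adam and cross-entropy are picked as the optimizer and loss function (the notes do not say which were used), the layer names `fc1` and `fc2` are invented, and the training data is random placeholder data standing in for the scaled penguin features and labels.

```python
import random

import torch
from torch import nn

torch.manual_seed(0)

HIDDEN_UNITS = 20      # assumed meaning of "hidden layer = 20"
EPOCHS = 40
LEARNING_RATE = 0.01

class Net(nn.Module):
    """Skeleton network: 4 features -> hidden layer -> 3 species scores."""

    def __init__(self):
        super().__init__()
        self.fc1 = nn.Linear(4, HIDDEN_UNITS)
        self.fc2 = nn.Linear(HIDDEN_UNITS, 3)

    def forward(self, x):
        x = torch.relu(self.fc1(x))
        return self.fc2(x)  # raw scores; the loss function handles the rest

# Placeholder data (random, for illustration only).
rng = random.Random(0)
xtrain = [[rng.gauss(0, 1) for _ in range(4)] for _ in range(30)]
ytrain = [rng.randrange(3) for _ in range(30)]

# Type conversion: float tensors for features, long tensors for labels.
x = torch.Tensor(xtrain).float()
y = torch.Tensor(ytrain).long()

model = Net()
loss_fn = nn.CrossEntropyLoss()                                     # assumed choice
optimizer = torch.optim.Adam(model.parameters(), lr=LEARNING_RATE)  # assumed choice

losses = []
for epoch in range(EPOCHS):
    optimizer.zero_grad()          # clear gradients from the previous epoch
    loss = loss_fn(model(x), y)    # forward pass and loss calculation
    loss.backward()                # backpropagate the loss
    optimizer.step()               # adjust the weights and biases
    losses.append(loss.item())

print(losses[0], losses[-1])
```

For simplicity this sketch passes all the data through in one batch per epoch; a real notebook would typically use mini-batches and move the model and tensors to the GPU when one is available.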