Summarize and describe the different concepts/methods/algorithms that you have learned in this course.

Use a Colab notebook. Make sure that you organize the material logically by using sections/subsections. Also, use code cells to include code snippets.

I suggest that you group everything into six categories:

* General concepts (for instance, what is artificial intelligence, machine learning, deep learning)

* Basic concepts (for instance, here you can talk about linear regression, logistic regression, gradients, gradient descent)

* Building a model (for instance, here you can talk about the structure of a convnet, what its components are etc.)

* Compiling a model (for instance, you can talk here about optimizers, learning rate etc.)

* Training a model (for instance, you can talk about overfitting/underfitting)

* Finetuning a pretrained model (describe how you proceed)

Take this homework very seriously. You have the opportunity to make up for lost points on previous homework assignments.

# General Concepts

### Artificial Intelligence (AI)

In this course, I examined the broad field of Artificial Intelligence (AI) and the subfields within it. AI is loosely defined, so several definitions are equally valid in describing its goals and properties. John McCarthy described it as the "Science and Engineering of making intelligent machines". Another definition is "The branch of Computer Science dealing with the simulation of intelligent behavior in computers". Based on what I've learned, I would describe the broad field of AI as the construction of algorithms to perform tasks that would normally need to be done by a human.

### Machine Learning (ML)

Machine Learning (ML) is a subset of AI. It is the branch of AI that studies how computers can learn from data rather than being explicitly programmed. A program built on Machine Learning is able to adapt and respond to the data given to it by the programmer, so it can learn dynamically without a human changing its code. The most important difference between ML and the classical, rule-based approach to AI is where the rules come from. In the rule-based approach, the programmer gives the computer the rules needed to produce the intended output. In ML, the computer derives the rules itself from the data. This is the single most distinguishing feature of ML.

Machine Learning uses specific terminology to describe a problem and the method of approaching it.

* A **label** is the value we want to predict (the target).

* A **feature** is an input variable that the model uses to make a prediction.

* A **bias** is the constant offset of a model, often described as the "y-intercept" of a linear model.

* A **weight** is the coefficient applied to a feature, essentially the "slope" when graphing a linear model.

* An **inference** is a prediction made by applying the model to an input.
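
The terms above can be tied together with a small sketch. The data and the parameter values below are hypothetical and chosen only so that the arithmetic is easy to follow; they are not taken from the course material.

```python
import numpy as np

# Hypothetical toy data: one feature per example and the label we want to predict.
features = np.array([50.0, 80.0, 120.0])   # inputs x
labels = np.array([150.0, 240.0, 360.0])   # targets y (here exactly 3 * x)

# Parameters of a linear model y' = weight * x + bias.
weight = 3.0   # the "slope"
bias = 0.0     # the "y-intercept"

def infer(x):
    """Inference: apply the model to an input to obtain a prediction."""
    return weight * x + bias

predictions = infer(features)
print(predictions)
```

In a real problem the weight and bias are not written by hand but learned during training.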

### Subsets of ML: Deep Learning (DL), Reinforcement Learning (RL), Supervised/Unsupervised Learning

Machine Learning has subsets or branches of its own, such as Supervised and Unsupervised Learning, Reinforcement Learning (RL), and Deep Learning (DL). Each has its own properties and suits different problems. Reinforcement Learning, for instance, adjusts a computer's behavior through rewards given at each step of an action.

# Basic Concepts

### Linear Regression

In Machine Learning, Linear Regression is a method of modeling the relationship between a response (the label to be predicted) and one or more inputs (features) as a linear function. A regression model relies on its features to predict a numeric value. Adding informative features can make the model more accurate, while a model with too few features may not capture the true nature of the trend.
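
As a minimal sketch of the idea, the snippet below fits a line to synthetic data with NumPy's least-squares solver. The data, the "true" slope of 2 and intercept of 1, and the noise level are all assumptions made for illustration.

```python
import numpy as np

# Synthetic data (an assumption for this sketch): y = 2x + 1 plus some noise.
rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=100)
y = 2.0 * x + 1.0 + rng.normal(0, 0.5, size=100)

# Design matrix: a column of ones for the bias and a column for the feature.
X = np.column_stack([np.ones_like(x), x])

# Least-squares solution of X @ [bias, weight] ~= y.
(bias, weight), *_ = np.linalg.lstsq(X, y, rcond=None)
print("bias:", bias, "weight:", weight)
```

The recovered bias and weight should land close to the values used to generate the data, though not exactly on them because of the noise.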

### Training and Loss

**Loss** is a number that measures how bad a prediction is, which makes it central to assessing the results of training. The larger the loss, the more severe the error; the loss equals 0 only when a prediction is perfect. **Training** the model is the process of adjusting its parameters (weights and bias) based on the features and their known labels. The main goal of training is to minimize loss across the examples, which yields a more accurate model. This process is called **Empirical Risk Minimization**.
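
To make the notion of loss concrete, here is a small sketch that uses mean squared error as one possible loss function (the text above does not commit to a particular one). The labels and predictions are made-up numbers.

```python
import numpy as np

def mean_squared_error(y_true, y_pred):
    """Average of the squared differences between labels and predictions."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean((y_true - y_pred) ** 2))

labels = [1.0, 2.0, 3.0]
perfect = mean_squared_error(labels, [1.0, 2.0, 3.0])        # every prediction is exact
slightly_off = mean_squared_error(labels, [1.5, 2.0, 2.5])   # small errors
far_off = mean_squared_error(labels, [4.0, 2.0, 0.0])        # large errors

print(perfect, slightly_off, far_off)
```

The loss is zero only for the perfect predictions and grows as the predictions move further from the labels.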

### Gradient Descent

Gradient Descent is an iterative optimization algorithm whose purpose is to find a local minimum of a differentiable function. Picture a curve with loss on the y-axis and the value of a weight on the x-axis. We pick a starting point on the curve, then repeatedly step along it and assess whether we are getting "warmer" or "colder", updating the current point after each step. The gradient tells us which way to step. It is a vector, meaning it has both direction and magnitude, and it points in the direction of steepest increase in loss, so we step in the opposite direction to reduce the loss. The size of each step is controlled by the learning rate, a scalar that is multiplied by the gradient vector. Choosing the learning rate matters: if it is too small, training takes too long, and if it is too large, the updates may overshoot the minimum of the loss.

Below is a code snippet featuring gradient descent:

In [0]:
import numpy as np

# Random data generation: 100 examples with one feature x and a target y.
x = 9 * np.random.rand(100, 1)
y = 3 * np.random.rand(100, 1)

# Parameters. Epochs, learning rate (step size), batch size, weights.
epochs = 50
lr = 0.01
batch = 5
# Design matrix: a column of ones (for the bias) next to the feature column.
# The target y is not included as a feature, since that would leak the answer.
arr = np.column_stack([np.ones((100, 1)), x])
wt = np.random.randn(2, 1)  # [bias, weight]
weightPath = [wt]

# Mini-batch gradient descent training.
for epoch in range(epochs):
  # Shuffle the examples at the start of every epoch.
  indices = np.random.permutation(100)
  xS = arr[indices]
  yS = y[indices]
  for i in range(0, 100, batch):
    x_i = xS[i: i + batch]
    y_i = yS[i: i + batch]
    # Gradient of the mean squared error over the batch with respect to wt.
    gradient = (2 / batch) * x_i.T.dot(x_i.dot(wt) - y_i)
    wt = wt - (lr * gradient)
    weightPath.append(wt)
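
The effect of the learning rate can be sketched on a one-dimensional loss. The loss function `(w - 3)**2`, the starting point, and the three learning rates below are assumptions picked for illustration only.

```python
def gradient_descent(lr, steps=50, w=0.0):
    """Minimize loss(w) = (w - 3)**2, whose gradient is 2 * (w - 3)."""
    for _ in range(steps):
        gradient = 2.0 * (w - 3.0)
        w = w - lr * gradient   # step against the gradient
    return w

too_small = gradient_descent(lr=0.001)   # barely moves toward the minimum at w = 3
reasonable = gradient_descent(lr=0.1)    # ends very close to w = 3
too_large = gradient_descent(lr=1.1)     # overshoots further on every step

print(too_small, reasonable, too_large)
```

With the same number of steps, the small rate is still far from the minimum, the moderate rate has essentially converged, and the large rate has diverged.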

# Building a Model

### Overview

The groundwork for a neural network is a collection of data-processing modules known as **layers**. A layer receives input tensors from the preceding layer and outputs tensors to the next layer. Layers often have a state known as **weights**, which are learned with **Stochastic Gradient Descent (SGD)**. It is important to choose layers that are appropriate for the format of the data being processed.

The programmer must also specify a **loss function** (objective) and an **optimizer** when designing the neural network architecture. The loss function measures how well the network is doing at the current step, while the optimizer determines how the network is updated based on the loss, typically with a variant of Stochastic Gradient Descent. The choice of loss function is important because different functions suit different problems.

Below is a code snippet featuring the construction of a model with different layers:

In [0]:
from keras import layers
from keras import models
from keras.applications import Xception

# ConvNet base: Xception pretrained on ImageNet, without its classifier head.
conv_base = Xception(
    weights='imagenet',
    include_top=False,
    input_shape=(150, 150, 3))

# Layers: stack a small binary classifier on top of the convolutional base.
model = models.Sequential()
model.add(conv_base)
model.add(layers.Flatten())  # flatten the feature maps into a vector
model.add(layers.Dense(256, activation='relu'))
model.add(layers.Dense(1, activation='sigmoid'))  # single probability output

### Convolutional Neural Networks (ConvNets)

Convolutional Neural Networks are built by introducing **convolutional layers** into the model. In Keras, a convolutional layer is configured through several arguments:

* Filters
* Kernel Size
* Strides
* Padding
* Data Format
* Dilation Rate
* Activation
* Use Bias
* Kernel Initializer
* Bias Initializer
* Kernel Regularizer
* Bias Regularizer
* Activity Regularizer
* Kernel Constraint
* Bias Constraint
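
To illustrate how a few of these settings (kernel size, strides, and padding) shape the output, here is a naive NumPy sketch of a single-channel, single-filter convolution. It is not how Keras implements the layer; the helper `conv2d_single` and the toy image are hypothetical, and a square kernel is assumed.

```python
import numpy as np

def conv2d_single(image, kernel, stride=1, padding=0):
    """Naive 2D convolution (cross-correlation) of one channel with one square filter."""
    if padding > 0:
        image = np.pad(image, padding)  # zero padding on every side
    k = kernel.shape[0]
    out_h = (image.shape[0] - k) // stride + 1
    out_w = (image.shape[1] - k) // stride + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            # Slide the kernel over the image and sum the elementwise products.
            patch = image[i * stride:i * stride + k, j * stride:j * stride + k]
            out[i, j] = np.sum(patch * kernel)
    return out

image = np.arange(36, dtype=float).reshape(6, 6)
kernel = np.ones((3, 3)) / 9.0   # 3x3 averaging filter (kernel size 3)

valid = conv2d_single(image, kernel)              # no padding: output shrinks
same = conv2d_single(image, kernel, padding=1)    # zero padding keeps the input size
strided = conv2d_single(image, kernel, stride=2)  # stride 2 skips every other position

print(valid.shape, same.shape, strided.shape)
```

A convolutional layer applies many such filters at once (the "filters" argument) and learns the kernel values during training instead of fixing them by hand.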

Below is a code snippet featuring a convolutional base:

In [0]:
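from keras.applications import Xception

# Xception pretrained on ImageNet, used without its classifier head.
conv_base = Xception(
    weights='imagenet',
    include_top=False,
    input_shape=(150, 150, 3))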

# Compiling a Model

### Optimizers and Loss functions

The optimizer is a component of the network that the programmer specifies alongside the layers. The optimizer's purpose is to update the network's weights based on the loss function (which the programmer also specifies). The optimizer often takes arguments such as the learning rate, momentum (which accelerates SGD in a consistent direction and dampens oscillations), and Nesterov momentum.
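
The role of momentum can be sketched in plain Python. The update rule below follows the form given in the Keras SGD documentation as I understand it; the helper names `sgd_step` and `minimize`, the loss `w**2`, and the starting point are assumptions made for this example.

```python
def sgd_step(w, grad, velocity, lr=0.01, momentum=0.9, nesterov=False):
    """One SGD update with (optionally Nesterov) momentum."""
    velocity = momentum * velocity - lr * grad
    if nesterov:
        w = w + momentum * velocity - lr * grad
    else:
        w = w + velocity
    return w, velocity

def minimize(momentum, nesterov=False, steps=20):
    """Run a few updates on loss(w) = w**2, whose gradient is 2 * w."""
    w, velocity = 5.0, 0.0
    for _ in range(steps):
        grad = 2.0 * w
        w, velocity = sgd_step(w, grad, velocity, momentum=momentum, nesterov=nesterov)
    return w

w_plain = minimize(momentum=0.0)                    # plain SGD
w_momentum = minimize(momentum=0.9)                 # SGD with momentum
w_nesterov = minimize(momentum=0.9, nesterov=True)  # SGD with Nesterov momentum

print(w_plain, w_momentum, w_nesterov)
```

After the same number of steps, the run with momentum ends closer to the minimum at 0 than plain SGD, because the velocity accumulates past gradients that point the same way.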

The choice of loss function is important because different functions suit different problems. For binary classification, or for multi-class, multi-label classification, the Binary Cross Entropy loss function is most appropriate. For regression to arbitrary values, the Mean Squared Error loss function is most appropriate, as in the code below.

Below is a snippet featuring the construction of a keras model with an optimizer and loss function:

In [0]:
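from keras import optimizers
from keras import models
from keras import layers

model = models.Sequential()
model.add(layers.Dense(64, kernel_initializer='uniform', input_shape=(10,)))
# Linear output unit, so the model can predict arbitrary real values
# (a sigmoid here would restrict the output to the range 0 to 1).
model.add(layers.Dense(1))

# SGD optimizer with Nesterov momentum.
sgd = optimizers.SGD(lr=0.01, decay=1e-6, momentum=0.9, nesterov=True)
# Mean squared error is the loss used for regression.
model.compile(loss='mean_squared_error', optimizer=sgd)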
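
The binary cross-entropy loss mentioned in this section can also be written out directly. The sketch below is a plain NumPy version for illustration, not the Keras implementation; the clipping constant `eps` and the example probabilities are assumptions.

```python
import numpy as np

def binary_cross_entropy(y_true, y_prob, eps=1e-7):
    """Mean binary cross-entropy between 0/1 labels and predicted probabilities."""
    y_true = np.asarray(y_true, dtype=float)
    # Clip the probabilities so that log(0) never occurs.
    y_prob = np.clip(np.asarray(y_prob, dtype=float), eps, 1.0 - eps)
    losses = y_true * np.log(y_prob) + (1.0 - y_true) * np.log(1.0 - y_prob)
    return float(-np.mean(losses))

labels = [1, 0]
confident_right = binary_cross_entropy(labels, [0.9, 0.1])
confident_wrong = binary_cross_entropy(labels, [0.1, 0.9])

print(confident_right, confident_wrong)
```

Confident predictions that are wrong are penalized far more heavily than confident predictions that are right, which is what makes this loss suitable for classification.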

# Training a Model

### Overfitting/Underfitting

When a Machine Learning model fits the peculiarities of its training data too closely, it is not prepared for new data, and it is said to **overfit**. A model that overfits has a low loss on the training data but performs poorly when new data is introduced. The usual cause is an unnecessarily complex model. The opposite problem is **underfitting**, where the model is too simple to capture the underlying pattern and has a high loss even on the training data. One of the main challenges of Machine Learning is therefore to find an appropriate balance between fitting the data accurately and keeping the model simple. The model needs to make good predictions on previously unseen data as well; otherwise its results will be wildly inaccurate or outright incorrect. To guard against overfitting, the programmer should divide the data into a training set and a test set. However, this method is only sound under the following conditions:

* Examples are drawn **independently** of each other
* The distribution is **stationary** (i.e. it does not change over time)
* Every partition is drawn from the **same** distribution

With this strategy, the model trains on the training set, is evaluated on the test set, and is then adjusted based on the test results. After each cycle of this workflow, we pick the model that performs best on the test set.
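
The train/test workflow can be sketched with NumPy alone. The synthetic linear data, the 70/30 split, and the use of polynomial fits of degree 1 and degree 9 as a "simple" and a "complex" model are all assumptions made for this illustration.

```python
import numpy as np

# Synthetic data (an assumption): a noisy straight line.
rng = np.random.default_rng(0)
x = rng.uniform(-1.0, 1.0, size=30)
y = 2.0 * x + 1.0 + rng.normal(0.0, 0.3, size=30)

# Shuffle the examples, then hold out 30% of them as the test set.
indices = rng.permutation(len(x))
split = (7 * len(x)) // 10
train_idx, test_idx = indices[:split], indices[split:]

def mse(coeffs, xs, ys):
    """Mean squared error of a polynomial model on the given examples."""
    return float(np.mean((np.polyval(coeffs, xs) - ys) ** 2))

results = {}
for degree in (1, 9):   # a simple model and a much more flexible one
    coeffs = np.polyfit(x[train_idx], y[train_idx], degree)
    results[degree] = (mse(coeffs, x[train_idx], y[train_idx]),
                       mse(coeffs, x[test_idx], y[test_idx]))
    print("degree", degree, "train/test loss:", results[degree])
```

The more flexible model always fits the training set at least as well as the simple one. Whether its test loss is worse depends on the random data, but a gap between a low training loss and a higher test loss is the typical sign of overfitting.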

# Fine-tuning

### Validation Set

As mentioned in the overfitting/underfitting section, a model that is repeatedly adjusted based on its test results can end up overfitting to the test data as well, and so may not be prepared for new, unforeseen data. Partitioning the data into training and test sets is useful because it allows the model to train on some examples and be tested on an entirely different set of examples. However, this method is not perfect: the more often the test set is used to make decisions, the less trustworthy it becomes as a measure of performance on new data. The remedy is to partition the data into three sets by introducing a validation set. The validation set is the intermediary stage between the training and test sets. Its purpose is to evaluate the results of training, which leaves the test set free to "double-check" the results obtained on the validation set. With the data partitioned into three, the workflow is similar to the previous two-partition strategy: train the model on the training set, evaluate it on the validation set, update its parameters based on the validation results, select the model that performs best on the validation set, and finally confirm that model on the test set.
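
The three-partition workflow can be sketched as follows. The synthetic data, the 60/20/20 split, and the use of polynomial degree as the setting being tuned are assumptions chosen to keep the example small.

```python
import numpy as np

# Synthetic data (an assumption): a noisy sine curve.
rng = np.random.default_rng(1)
x = rng.uniform(-1.0, 1.0, size=60)
y = np.sin(3.0 * x) + rng.normal(0.0, 0.2, size=60)

# Shuffle once, then partition into training, validation, and test sets.
idx = rng.permutation(60)
train, val, test = idx[:36], idx[36:48], idx[48:]

def mse(coeffs, xs, ys):
    """Mean squared error of a polynomial model on the given examples."""
    return float(np.mean((np.polyval(coeffs, xs) - ys) ** 2))

# Train one candidate model per degree and score each on the validation set.
candidates = {}
for degree in range(1, 7):
    coeffs = np.polyfit(x[train], y[train], degree)
    candidates[degree] = (coeffs, mse(coeffs, x[val], y[val]))

# Select the model that performs best on the validation set...
best_degree = min(candidates, key=lambda d: candidates[d][1])
best_coeffs = candidates[best_degree][0]

# ...and only then confirm it once on the untouched test set.
test_loss = mse(best_coeffs, x[test], y[test])
print("best degree:", best_degree, "test loss:", test_loss)
```

Because the test set plays no part in choosing the degree, its loss remains an honest estimate of how the selected model behaves on new data.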

# References

* [Slides by Dr. Pawel Wocjan](https://github.com/schneider128k/machine_learning_course/tree/master/slides)

* [Optimizers](https://keras.io/optimizers/)

* [Loss](https://keras.io/losses/)

* [Convolutional](https://keras.io/layers/convolutional/)