
# 1. General Concepts

### Artificial Intelligence
Artificial intelligence is "the science and engineering of making intelligent machines." In practice, this means building machines that can make decisions without human input.

### Machine Learning
Machine learning is a set of techniques that let a machine improve at a task by learning from data. Instead of always doing the same thing on the same input, the machine adjusts its behavior based on the information it takes in and returns a better result over time.

### Deep Learning
Deep learning is a subset of machine learning that relies on neural networks, which are inspired by biological neurons in animal brains. These networks aren't explicitly programmed for a specific task; instead, they learn to do the task from a set of examples.

# 2. Basic Concepts

### Linear Regression
Linear regression is a type of regression that predicts the output value as a linear combination of the input features plus a bias.

A linear regression model has a set of weights, $w=(w_1, \ldots, w_n)$, and a bias $b$. If the input is $x=(x_1, \ldots, x_n)$, then the output will be $w_1*x_1 + \ldots + w_n*x_n + b$.
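As a minimal sketch of the formula above, the prediction can be computed directly from the weights, bias, and input. The function name `linear_predict` and the numeric values are hypothetical and chosen only for illustration.

```python
def linear_predict(weights, bias, features):
    """Return w_1*x_1 + ... + w_n*x_n + b for a single input."""
    if len(weights) != len(features):
        raise ValueError("weights and features must have the same length")
    return sum(w * x for w, x in zip(weights, features)) + bias

# Hypothetical model with n = 3 features
weights = [2.0, -1.0, 0.5]
bias = 4.0
prediction = linear_predict(weights, bias, [1.0, 3.0, 2.0])
print(prediction)  # 2*1 - 1*3 + 0.5*2 + 4 = 4.0
```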

### Logistic Regression
Logistic regression is a type of regression that predicts the probability that a set of input features belongs to one of two classes. Logistic regression is very similar to linear regression in that it has a set of weights and a bias. The difference is that instead of the output being $w_1*x_1 + \ldots + w_n*x_n + b$, the output is $\sigma(w_1*x_1 + \ldots + w_n*x_n + b)$, where $\sigma(x) = \frac{1}{1+e^{-x}}$ is known as the sigmoid function.
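The following sketch shows one way to compute a logistic regression output in plain Python. The weights, bias, input, and the 0.5 decision threshold are assumptions made for the example, not values from this notebook.

```python
import math

def sigmoid(z):
    """Logistic function 1 / (1 + e^(-z))."""
    return 1.0 / (1.0 + math.exp(-z))

def logistic_predict(weights, bias, features):
    """Probability of the positive class for a single input."""
    z = sum(w * x for w, x in zip(weights, features)) + bias
    return sigmoid(z)

# Hypothetical weights, bias, and input: z = 1.5*2 - 2*1 + 0.5 = 1.5
p = logistic_predict([1.5, -2.0], 0.5, [2.0, 1.0])
label = 1 if p >= 0.5 else 0  # assumed decision threshold of 0.5
print(round(p, 4), label)
```

Because the sigmoid squeezes any real number into the range between 0 and 1, the output can be read as a probability.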

### Gradient
The gradient of a function at a specific point is the vector that points in the direction in which the function increases most rapidly; its length is the rate of increase in that direction. Formally, given a function $f(x)$, where $x=(x_1, \ldots, x_n)$, the gradient of $f$ is the vector of partial derivatives $(\frac{\partial f}{\partial x_1}, \ldots, \frac{\partial f}{\partial x_n})$.
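To make the definition concrete, the sketch below compares the exact partial derivatives of a small example function with a finite-difference approximation. The function $f(x_1, x_2) = x_1^2 + 3x_2$ and the helper names are hypothetical choices for illustration.

```python
def f(x):
    """Example function f(x1, x2) = x1^2 + 3*x2."""
    return x[0] ** 2 + 3.0 * x[1]

def analytic_gradient(x):
    """Partial derivatives of f worked out by hand: (2*x1, 3)."""
    return [2.0 * x[0], 3.0]

def numeric_gradient(func, x, h=1e-6):
    """Approximate each partial derivative with a central difference."""
    grad = []
    for i in range(len(x)):
        forward = list(x)
        backward = list(x)
        forward[i] += h
        backward[i] -= h
        grad.append((func(forward) - func(backward)) / (2.0 * h))
    return grad

point = [2.0, -1.0]
exact = analytic_gradient(point)
approx = numeric_gradient(f, point)
print(exact, approx)
```

The two results should agree up to a small numerical error, which is a common way to sanity-check a hand-derived gradient.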

### Gradient Descent
Gradient descent is a technique used to find a local minimum of a function. The input is initialized to arbitrary values. Gradient descent then repeatedly moves the input a small step in the direction opposite to the gradient, since the gradient itself points uphill. The size of each step is the gradient scaled by a small constant called the learning rate.
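A minimal sketch of this loop is shown below. The objective $f(x, y) = (x - 3)^2 + (y + 1)^2$, the starting point, the learning rate, and the number of steps are all assumptions picked so the example converges quickly.

```python
def gradient_descent(grad, start, learning_rate=0.1, steps=100):
    """Repeatedly step against the gradient, starting from `start`."""
    x = list(start)
    for _ in range(steps):
        g = grad(x)
        x = [xi - learning_rate * gi for xi, gi in zip(x, g)]
    return x

# Hypothetical objective f(x, y) = (x - 3)^2 + (y + 1)^2, minimized at (3, -1)
def grad_f(p):
    return [2.0 * (p[0] - 3.0), 2.0 * (p[1] + 1.0)]

minimum = gradient_descent(grad_f, start=[0.0, 0.0])
print([round(v, 4) for v in minimum])
```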

### Loss Function
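The loss function of a model measures how far the model's predictions are from the correct answers on a set of inputs. A lower loss means the predictions are closer to the correct answers, so training aims to minimize it.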
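Two widely used loss functions are sketched below: mean squared error, typically used for regression, and binary cross-entropy, typically used for two-class problems. The labels and predictions are made-up values for illustration.

```python
import math

def mean_squared_error(y_true, y_pred):
    """Average squared difference between targets and predictions."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def binary_cross_entropy(y_true, y_pred, eps=1e-12):
    """Average negative log-likelihood of 0/1 labels given predicted probabilities."""
    total = 0.0
    for t, p in zip(y_true, y_pred):
        p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
        total += -(t * math.log(p) + (1 - t) * math.log(1.0 - p))
    return total / len(y_true)

labels = [1, 0, 1]
good = [0.9, 0.1, 0.8]  # hypothetical predictions close to the labels
bad = [0.4, 0.7, 0.3]   # hypothetical predictions far from the labels
print(binary_cross_entropy(labels, good), binary_cross_entropy(labels, bad))
print(mean_squared_error([2.0, 4.0], [2.5, 3.0]))  # (0.25 + 1.0) / 2 = 0.625
```

In both cases the loss is smaller when the predictions are closer to the targets.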

# 3. Building a Model

A convnet consists of two parts, the convolutional base and the classifier.

### Convolutional Base
The conv base consists of multiple groups of convolutional filters and max pooling layers. The early convolutional layers detect low-level details, such as the edges in a picture, while later convolutional layers detect more abstract concepts, such as an ear or an eye.

Convolutional bases can generally be reused for different projects, or slightly modified between them.
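The output shapes and parameter counts of such a stack follow from simple arithmetic. The sketch below assumes a 150x150 RGB input, unpadded 3x3 convolutions with stride 1, 2x2 max pooling, and filter counts of 32, 64, 128, and 128, matching the example model in this section; the helper names are hypothetical.

```python
def conv_output_size(size, kernel=3):
    """Spatial size after an unpadded, stride-1 convolution."""
    return size - kernel + 1

def pool_output_size(size, pool=2):
    """Spatial size after non-overlapping max pooling (remainder dropped)."""
    return size // pool

def conv_param_count(in_channels, out_channels, kernel=3):
    """Kernel weights plus one bias per filter."""
    return (kernel * kernel * in_channels + 1) * out_channels

size, channels = 150, 3  # assumed 150x150 RGB input
shapes, params = [], []
for filters in (32, 64, 128, 128):
    size = conv_output_size(size)
    params.append(conv_param_count(channels, filters))
    shapes.append((size, size, filters))
    size = pool_output_size(size)
    shapes.append((size, size, filters))
    channels = filters

for shape in shapes:
    print(shape)
print(params)
```

Each convolution trims two pixels from the height and width, and each pooling layer halves them, so the feature maps shrink while the number of channels grows.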

An example of building a base from scratch is as follows:

In [4]:
from keras import layers
from keras import models

model = models.Sequential()
# conv layers
# 1
model.add(layers.Conv2D(32, (3, 3), activation='relu', input_shape=(150, 150, 3)))
model.add(layers.MaxPooling2D(2, 2))
# 2
model.add(layers.Conv2D(64, (3, 3), activation='relu'))
model.add(layers.MaxPooling2D(2, 2))
# 3
model.add(layers.Conv2D(128, (3, 3), activation='relu'))
model.add(layers.MaxPooling2D(2, 2))
# 4
model.add(layers.Conv2D(128, (3, 3), activation='relu'))
model.add(layers.MaxPooling2D(2, 2))

model.summary()

Model: "sequential_2"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
conv2d_5 (Conv2D)            (None, 148, 148, 32)      896       
_________________________________________________________________
max_pooling2d_5 (MaxPooling2 (None, 74, 74, 32)        0         
_________________________________________________________________
conv2d_6 (Conv2D)            (None, 72, 72, 64)        18496     
_________________________________________________________________
max_pooling2d_6 (MaxPooling2 (None, 36, 36, 64)        0         
_________________________________________________________________
conv2d_7 (Conv2D)            (None, 34, 34, 128)       73856     
_________________________________________________________________
max_pooling2d_7 (MaxPooling2 (None, 17, 17, 128)       0         
_________________________________________________________________
conv2d_8 (Conv2D)            (None, 15, 15, 128)      

Alternatively, you could use a pre-trained model, such as VGG-16:

In [3]:
%tensorflow_version 2.x

import tensorflow as tf
from tensorflow.keras.applications import VGG16

conv_base = VGG16(
    weights='imagenet', 
    include_top=False, 
    input_shape=(150, 150, 3))
conv_base.summary()

Downloading data from https://storage.googleapis.com/tensorflow/keras-applications/vgg16/vgg16_weights_tf_dim_ordering_tf_kernels_notop.h5
Model: "vgg16"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
input_2 (InputLayer)         [(None, 150, 150, 3)]     0         
_________________________________________________________________
block1_conv1 (Conv2D)        (None, 150, 150, 64)      1792      
_________________________________________________________________
block1_conv2 (Conv2D)        (None, 150, 150, 64)      36928     
_________________________________________________________________
block1_pool (MaxPooling2D)   (None, 75, 75, 64)        0         
_________________________________________________________________
block2_conv1 (Conv2D)        (None, 75, 75, 128)       73856     
_________________________________________________________________
block2_conv2 (Conv2D)        (None, 75, 75, 128)      

When using a pre-trained base, it's also important to make the base untrainable (frozen), so that its pre-trained weights are not overwritten during training.

In [0]:
conv_base.trainable = False

### Classifier
The classifier consists of a series of dense layers. It takes the output from the convolutional base and turns it into the probability of some set of classes.
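A minimal sketch of such a classifier is shown below. It assumes the convolutional base outputs 4x4 feature maps with 512 channels (the shape VGG-16 without its top produces for 150x150 inputs) and that the task has two classes, so a single sigmoid unit gives the probability of the positive class. The hidden layer size of 256 and the random input features are arbitrary choices for illustration.

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

classifier = keras.Sequential([
    keras.Input(shape=(4, 4, 512)),         # assumed conv base output shape
    layers.Flatten(),                       # 4 * 4 * 512 = 8192 values per image
    layers.Dense(256, activation='relu'),   # hidden layer (size is an assumption)
    layers.Dense(1, activation='sigmoid'),  # probability of the positive class
])

# Random stand-in for features extracted from two images
features = np.random.rand(2, 4, 4, 512).astype('float32')
probabilities = classifier.predict(features, verbose=0)
print(probabilities.shape)  # one probability per image: (2, 1)
```

The classifier here is untrained, so its probabilities are meaningless until it is compiled and fit on labeled data.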

# 4. Compiling a Model

### Optimizer
Optimizers are algorithms that update a model's parameters in response to the loss function. Gradient descent is an example of an optimizer: it changes the weights in the direction opposite to the gradient of the loss function, which reduces the loss.

An example of compiling a model with RMSprop is as follows:

In [0]:
from keras import optimizers

model.compile(
    loss='binary_crossentropy',
    # learning_rate replaces the deprecated lr argument
    optimizer=optimizers.RMSprop(learning_rate=2e-5),
    metrics=['acc'])
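For intuition about what RMSprop does with the gradient, the sketch below implements one common formulation of its update rule for a single scalar weight. It is not the Keras implementation; the toy loss, the decay rate `rho`, the constant `eps`, and the learning rate used in the loop are assumptions for illustration.

```python
import math

def rmsprop_step(weight, grad, avg_sq, learning_rate=2e-5, rho=0.9, eps=1e-7):
    """One RMSprop update; avg_sq is the running average of squared gradients."""
    avg_sq = rho * avg_sq + (1.0 - rho) * grad ** 2
    weight = weight - learning_rate * grad / (math.sqrt(avg_sq) + eps)
    return weight, avg_sq

# Hypothetical loss L(w) = (w - 1)^2 with gradient 2 * (w - 1)
w, avg_sq = 0.0, 0.0
for _ in range(2000):
    w, avg_sq = rmsprop_step(w, 2.0 * (w - 1.0), avg_sq, learning_rate=1e-3)
print(w)
```

Dividing by the running root-mean-square of the gradient keeps the step size roughly proportional to the learning rate regardless of how large the raw gradient is.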

### Learning Rate
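The learning rate is a parameter passed to an optimizer that controls how large each update to the model's parameters is, and therefore how fast the model learns. If the learning rate is too high, the model can overshoot the minimum and end up with a suboptimal result; if it is too low, training becomes very slow.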
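The effect of the learning rate can be seen on a toy problem. The sketch below runs plain gradient descent on $f(x) = x^2$ with three hypothetical learning rates; the function, the step count, and the rates are assumptions chosen to make the contrast visible.

```python
def run_descent(learning_rate, steps=20, start=1.0):
    """Gradient descent on f(x) = x^2, whose gradient is 2x and minimum is x = 0."""
    x = start
    for _ in range(steps):
        x = x - learning_rate * 2.0 * x
    return x

small = run_descent(0.01)  # too low: still far from 0 after 20 steps
good = run_descent(0.3)    # converges quickly toward 0
large = run_descent(1.1)   # too high: every step overshoots and |x| grows
print(small, good, large)
```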

# 5. Training a Model

Two problems that can occur during training are overfitting and underfitting.

### Overfitting
Overfitting occurs when the model is too complex for the problem and fits noise in the training data rather than the underlying pattern. Such a model is very accurate on the training data, but not very accurate on the validation data. One way to deal with overfitting is data augmentation: applying small random changes to the training data, such as rotating or scaling images. Another way is to add a dropout layer, which randomly disables a fraction of the nodes in the network at each training step. Removing redundant input features also helps.
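The dropout idea can be sketched in a few lines. The version below is the commonly used "inverted" variant, which rescales the surviving values so their expected sum is unchanged; the dropout rate, the seed, and the activations are hypothetical.

```python
import random

def dropout(activations, rate, rng):
    """Zero each value with probability `rate` and rescale the survivors."""
    keep = 1.0 - rate
    return [a / keep if rng.random() >= rate else 0.0 for a in activations]

rng = random.Random(0)    # fixed seed so the run is repeatable
activations = [1.0] * 10  # hypothetical layer outputs
dropped = dropout(activations, rate=0.5, rng=rng)
print(dropped)
```

Dropout is applied only during training; at prediction time the activations pass through unchanged.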

### Underfitting
Underfitting occurs when the model is not complex enough to fit the data. The model either has poor accuracy even on the training data or takes a long time to train. The usual fix is to increase the complexity of the model, for example by adding layers or units.

# 6. Finetuning a Model

Instead of creating a conv base from scratch, it may be worthwhile to reuse a pre-trained base, such as VGG-16. Because these bases are trained on a large database of images, the features they learn generalize well. You can then attach your own classifier to the base. The conv base should be frozen during the initial training, or else its weights might change dramatically while the classifier is still untrained.

After the initial training, when the classifier is more adapted to the base, the last few layers of the base (those closest to the classifier) can be unfrozen to finetune them. This generally leads to better results than training only the classifier, but it can also lead to more overfitting.

In [0]:
conv_base.trainable = True

set_trainable = False
for layer in conv_base.layers:
  if layer.name == 'block5_conv1':
    set_trainable = True
  if set_trainable:
    layer.trainable = True
  else:
    layer.trainable = False

This code makes every layer from `block5_conv1` onward trainable and keeps all earlier layers of the conv base frozen.
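To see which layers end up trainable without loading the network, the same loop can be run over a plain list of layer names. The list below contains the layer names of the VGG-16 convolutional base with the input layer omitted; the helper `trainable_flags` is a hypothetical name introduced for this sketch.

```python
# Layer names of the VGG-16 convolutional base (input layer omitted)
vgg16_layers = [
    'block1_conv1', 'block1_conv2', 'block1_pool',
    'block2_conv1', 'block2_conv2', 'block2_pool',
    'block3_conv1', 'block3_conv2', 'block3_conv3', 'block3_pool',
    'block4_conv1', 'block4_conv2', 'block4_conv3', 'block4_pool',
    'block5_conv1', 'block5_conv2', 'block5_conv3', 'block5_pool',
]

def trainable_flags(layer_names, first_trainable):
    """Map each layer name to True from `first_trainable` onward, else False."""
    flags = {}
    set_trainable = False
    for name in layer_names:
        if name == first_trainable:
            set_trainable = True
        flags[name] = set_trainable
    return flags

flags = trainable_flags(vgg16_layers, 'block5_conv1')
unfrozen = [name for name, on in flags.items() if on]
print(unfrozen)  # ['block5_conv1', 'block5_conv2', 'block5_conv3', 'block5_pool']
```

Only the final block is unfrozen, so the general low-level features learned by the earlier blocks stay intact.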