# General Concepts

**Artificial Intelligence (AI)** is a very broad term, and what is considered "AI" changes as new advancements are made, but it generally refers to machines simulating human intelligence.

A subfield of AI is **Machine Learning**, in which machines learn to perform a specific task from data, rather than from explicitly programmed rules, with minimal human intervention.

Within Machine Learning, there is the subfield **Deep Learning**, which centers on **deep neural networks**. These networks typically take in raw data and adjust their **weights** and **biases** during training to improve the accuracy of their predictions.

Learning is commonly divided into three kinds: **Supervised**, **Unsupervised**, and **Reinforcement** learning. Supervised models learn from labeled training data, unsupervised models learn from unlabeled training data, and reinforcement learning models learn from a "reward" or "punishment" signal that rates the actions they take.

A specific type of neural network is the **Convolutional Neural Network (CNN)**, which excels when dealing with images as input data.

# Building a Model

When building a neural network model, it is important to understand the structure of a neural network.

Neural networks consist of **nodes** (also called neurons). Each node takes in several inputs, multiplies each one by a **weight**, adds a **bias**, and produces an output. Nodes are arranged in connected layers, and the output of the final layer is the output of the entire network.
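As a rough illustration of what a node computes, the sketch below evaluates a tiny two-layer network by hand. The function names (`neuron_output`, `dense_layer`) and all numeric values are made up for this example, and no activation function is applied yet.

```python
def neuron_output(inputs, weights, bias):
    """Weighted sum of the inputs plus a bias (no activation applied)."""
    return sum(x * w for x, w in zip(inputs, weights)) + bias


def dense_layer(inputs, weight_rows, biases):
    """One layer: every node sees the same inputs but has its own weights and bias."""
    return [neuron_output(inputs, w, b) for w, b in zip(weight_rows, biases)]


# Hypothetical values chosen only for illustration.
inputs = [1.0, 2.0, 3.0]
hidden = dense_layer(inputs, [[0.5, -1.0, 0.25], [1.0, 0.0, -1.0]], [0.5, 1.0])
output = dense_layer(hidden, [[2.0, 1.0]], [0.0])
print(hidden)  # [-0.25, -1.0]
print(output)  # [-1.5]
```

The hidden layer has two nodes, so it produces two values; these become the inputs of the single output node.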

Nodes usually have **activation functions**, which are applied to a node's weighted sum to produce its output. For example, the **sigmoid** activation function produces an output between 0 and 1, which makes it particularly useful for **binary classification** and **multi-class, multi-label classification**. Another common activation function is **softmax**, which takes the outputs of a layer of nodes and turns them into a probability distribution over that layer. Softmax is often used for **multi-class, single-label classification**.
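The difference between the two activation functions can be seen in a minimal pure-Python sketch. The `scores` list is a hypothetical set of raw layer outputs; real libraries provide their own, more robust implementations.

```python
import math


def sigmoid(x):
    """Squash a single value into the open interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def softmax(values):
    """Turn a list of raw scores into probabilities that sum to 1."""
    largest = max(values)
    exps = [math.exp(v - largest) for v in values]  # subtract the max for numerical stability
    total = sum(exps)
    return [e / total for e in exps]


scores = [2.0, 1.0, 0.1]  # hypothetical raw outputs of a 3-node layer
print([round(sigmoid(s), 3) for s in scores])  # each value is computed independently
print([round(p, 3) for p in softmax(scores)])  # values compete and sum to 1
```

Sigmoid treats every node separately, so several nodes can be close to 1 at once (multi-label). Softmax normalizes across the layer, so raising one probability lowers the others (single-label).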

A CNN takes this concept a step further. It takes an image tensor as input (height **x** width **x** number of channels). After the input layer, several **filters**, or kernels, are slid across the entire image, each performing a **convolution**. The results of these convolutions are passed to a **pooling layer**, where each small block of pixels is condensed into a single value, shrinking the height and width of the convolutional layer's output. This result can then be passed to further convolutional and pooling layers, repeating the process as many times as needed.

Note: Convolutional and pooling layers work well for images because the filters perform their own feature extraction and can recognize patterns and features anywhere in an image (this kind of location independence is lost when images are passed directly into a basic, fully connected neural network).
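To make the convolution and pooling steps concrete, here is a small sketch on a single-channel image stored as nested lists. The image, the vertical-edge kernel, and the helper names are assumptions made for illustration. As deep learning libraries typically do, the kernel is applied without being flipped.

```python
def convolve2d(image, kernel):
    """Slide the kernel over the image (no padding, stride 1) and sum the products."""
    k = len(kernel)
    out_h = len(image) - k + 1
    out_w = len(image[0]) - k + 1
    return [
        [
            sum(image[r + i][c + j] * kernel[i][j] for i in range(k) for j in range(k))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]


def max_pool2d(feature_map, size=2):
    """Keep the largest value in each non-overlapping size x size window."""
    out_h = len(feature_map) // size
    out_w = len(feature_map[0]) // size
    return [
        [
            max(feature_map[r * size + i][c * size + j]
                for i in range(size) for j in range(size))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]


# Hypothetical 6 x 6 image: dark (0) on the left half, bright (1) on the right half.
image = [[0, 0, 0, 1, 1, 1] for _ in range(6)]
# Hypothetical 3 x 3 filter that responds to dark-to-bright vertical edges.
kernel = [[-1, 0, 1], [-1, 0, 1], [-1, 0, 1]]

feature_map = convolve2d(image, kernel)  # 4 x 4
pooled = max_pool2d(feature_map)         # 2 x 2
print(feature_map[0])  # [0, 3, 3, 0]
print(pooled)          # [[3, 3], [3, 3]]
```

The filter responds only where its window straddles the edge, and pooling halves the height and width while keeping the strongest response in each block.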

After the final convolutional or pooling layer, the output tensor is flattened into a vector, which is passed into a set of dense layers. The final dense layer produces the output of the network (what the output is depends on the task, e.g. class probabilities for **image classification**).

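Below is an example of how to construct the layers of a CNN in Python using Keras (code from Dr. Wocjan):

```python
from keras import layers
from keras import models

model = models.Sequential()
# Convolutional base: alternating convolution and max-pooling layers.
# The input is a 28 x 28 image with a single (grayscale) channel.
model.add(layers.Conv2D(32, (3, 3), activation='relu', input_shape=(28, 28, 1)))
model.add(layers.MaxPooling2D((2, 2)))
model.add(layers.Conv2D(64, (3, 3), activation='relu'))
model.add(layers.MaxPooling2D((2, 2)))
model.add(layers.Conv2D(64, (3, 3), activation='relu'))
# Classifier: flatten the feature maps into a vector and feed it to dense layers.
model.add(layers.Flatten())
model.add(layers.Dense(64, activation='relu'))
# 10 output nodes with softmax: a probability distribution over 10 classes.
model.add(layers.Dense(10, activation='softmax'))
model.summary()
```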
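The layer sizes in the model above can be checked by hand. The sketch below assumes the Keras defaults used in that code (no padding and stride 1 for convolutions, non-overlapping windows for pooling). The helper names are made up for this example.

```python
def conv_output_size(size, kernel):
    """Side length after a convolution with no padding and stride 1."""
    return size - kernel + 1


def pool_output_size(size, pool):
    """Side length after non-overlapping pooling (any remainder is dropped)."""
    return size // pool


side = 28  # the input images are 28 x 28
shapes = []
for kind, arg, channels in [
    ("conv", 3, 32),
    ("pool", 2, 32),
    ("conv", 3, 64),
    ("pool", 2, 64),
    ("conv", 3, 64),
]:
    side = conv_output_size(side, arg) if kind == "conv" else pool_output_size(side, arg)
    shapes.append((side, side, channels))

flattened = side * side * shapes[-1][2]
print(shapes)     # [(26, 26, 32), (13, 13, 32), (11, 11, 64), (5, 5, 64), (3, 3, 64)]
print(flattened)  # 576
```

The flattened vector therefore has 576 entries, which is the input size of the first dense layer.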

# Compiling a Model

When compiling a model, there are **hyperparameters** to consider, such as **learning rate**, **batch size**, **number of epochs**, and so on. These are all parameters that are explicitly defined by a human before the network begins training.

The learning rate controls the size of each step taken during **gradient descent**. Gradient descent is the process of minimizing the **loss** of a neural network by repeatedly adjusting the weights and biases in the direction opposite to the gradient of the loss, moving toward a local minimum.
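A minimal sketch of gradient descent on a hypothetical one-parameter loss shows why the learning rate matters. The loss function, starting point, and learning rates are assumptions picked for illustration. A real network has many parameters and computes its gradients by backpropagation.

```python
def loss(w):
    """Hypothetical one-parameter loss with its minimum at w = 3."""
    return (w - 3.0) ** 2


def gradient(w):
    """Derivative of the loss above with respect to w."""
    return 2.0 * (w - 3.0)


def gradient_descent(w, learning_rate, steps):
    for _ in range(steps):
        w = w - learning_rate * gradient(w)  # step against the gradient
    return w


small_steps = gradient_descent(w=0.0, learning_rate=0.1, steps=50)
large_steps = gradient_descent(w=0.0, learning_rate=1.1, steps=50)
print(small_steps, loss(small_steps))  # w ends close to 3, loss close to 0
print(large_steps, loss(large_steps))  # each step overshoots, so the loss grows
```

With a learning rate that is too large, each update jumps past the minimum by more than it started with, so training diverges instead of converging.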

The loss of a neural network measures how far its predictions are from the correct values. The ideal loss is 0. Different tasks call for different loss functions. Some examples include **Mean Squared Error**, **Binary Cross-Entropy**, and **Categorical Cross-Entropy**.
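The three loss functions named above can be sketched in plain Python as follows. These are simplified versions for illustration only: the inputs are hypothetical, and library implementations also clip probabilities to avoid taking the logarithm of 0.

```python
import math


def mean_squared_error(y_true, y_pred):
    """Average squared difference; commonly used for regression."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def binary_cross_entropy(y_true, y_pred):
    """y_true holds 0/1 labels; y_pred holds probabilities strictly between 0 and 1."""
    total = sum(
        t * math.log(p) + (1 - t) * math.log(1 - p) for t, p in zip(y_true, y_pred)
    )
    return -total / len(y_true)


def categorical_cross_entropy(y_true, y_pred):
    """y_true is a one-hot vector; y_pred is a probability distribution over classes."""
    return -sum(t * math.log(p) for t, p in zip(y_true, y_pred))


print(mean_squared_error([1.0, 2.0], [1.0, 4.0]))  # 2.0
print(binary_cross_entropy([1, 0], [0.9, 0.1]))    # small: both predictions are good
print(binary_cross_entropy([1, 0], [0.1, 0.9]))    # large: both predictions are bad
print(categorical_cross_entropy([0, 1, 0], [0.1, 0.8, 0.1]))
```

In each case the loss shrinks toward 0 as the predictions approach the true values.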

# Training a Model

When training a model, it is important to avoid underfitting or overfitting.

**Underfitting** occurs when the model is too simple to describe the data. To combat this, increase the complexity of the model, for example by increasing the number of layers or the number of neurons in a layer.

**Overfitting** occurs when the model is too complex for the data: it simply memorizes the training data and performs poorly on the testing data. To combat this, try reducing the complexity of the model or using one of many other methods, such as **dropout**. If training a CNN, methods such as **data augmentation** and **fine-tuning** can be helpful as well.
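As one example of these methods, dropout randomly switches off a fraction of a layer's outputs during training so the network cannot rely on any single node. The sketch below shows the common "inverted" variant, in which the surviving values are scaled up to keep the expected total unchanged. The function name, seed, and values are assumptions for illustration.

```python
import random


def dropout(activations, rate, rng):
    """Inverted dropout: zero each value with probability `rate`, rescale the survivors."""
    keep_probability = 1.0 - rate
    return [
        a / keep_probability if rng.random() >= rate else 0.0
        for a in activations
    ]


rng = random.Random(0)  # fixed seed so the sketch is repeatable
activations = [1.0] * 10  # hypothetical outputs of a 10-node layer
dropped = dropout(activations, rate=0.5, rng=rng)
print(dropped)  # each value is either 0.0 or 2.0
```

Dropout is applied only during training. At prediction time all nodes are kept.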

Also, as a general tip, make sure to use separate data for training and for validation: a model that has merely memorized its training data will still score well when validated on that same data, which hides overfitting.
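One simple way to keep the two sets separate is to shuffle the data once and hold out a fixed fraction for validation. The sketch below is a minimal version of that idea. The function name, the 80/20 split, and the stand-in data are assumptions, and libraries offer their own utilities for this.

```python
import random


def train_validation_split(samples, validation_fraction=0.2, seed=0):
    """Shuffle a copy of the samples and hold out a fraction for validation."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    n_validation = round(len(shuffled) * validation_fraction)
    return shuffled[n_validation:], shuffled[:n_validation]


samples = list(range(100))  # stand-in for 100 labeled examples
train, validation = train_validation_split(samples)
print(len(train), len(validation))        # 80 20
print(set(train).isdisjoint(validation))  # True: no example appears in both sets
```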



# Fine-tuning a Pretrained Model

**Fine-tuning** is the process of adapting a pretrained model to a new task by replacing its final layers. Fine-tuning is used when working with CNNs because it can be rather difficult to gather a large amount of image data and train a massive CNN from scratch, so it is more efficient to take a pre-existing CNN like VGG or ResNet and append a unique set of fully connected layers to the end of it.



The steps of fine-tuning are as follows:

1) Remove the fully connected layers at the end of the network and replace them with new ones.

2) Freeze the earlier part of the network (the convolutional and pooling layers).

3) Train the model.

4) (optional) Unfreeze some or all of the earlier layers and run through a second round of training.


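Below is an example of what freezing part of the convolutional base of a CNN might look like in Python (code from Dr. Wocjan). Every layer before `block5_conv1` is frozen, while that layer and all layers after it remain trainable:

```python
from keras.applications import VGG16

# Assumption: conv_base is not defined in the original snippet. The layer name
# 'block5_conv1' suggests VGG16, so it is loaded here without its dense classifier.
conv_base = VGG16(weights='imagenet', include_top=False)
conv_base.trainable = True

# Walk through the layers in order; flip the switch once 'block5_conv1' is reached.
set_trainable = False
for layer in conv_base.layers:
    if layer.name == 'block5_conv1':
        set_trainable = True
    layer.trainable = set_trainable
```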