<a href="https://colab.research.google.com/github/anki079/CAP4630_AI_Fall2019/blob/master/HW4/CAP4630_HW4.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

#CAP 4630: Homework 4, Fall 2019

####Name: Ankita Tripathi

The goal of this assignment is to provide a summary and description of the different concepts, methods, and algorithms that have been learned over the duration of this course. 

**Note**: Some of the code snippets included in this notebook have been taken from course materials provided by Dr Wocjan. These course materials can be found at: https://github.com/schneider128k/machine_learning_course

#**0. General Concepts**

This course made clear the distinction between the fields of Artificial Intelligence, Machine Learning, and Deep Learning.
Machine Learning is a subfield of Artificial Intelligence, and Deep Learning is in turn a subfield of Machine Learning.

Note: Some of the following concepts described in this section were beyond the scope of this course and thus were not covered in detail.

###**Artificial Intelligence (AI)**
Artificial Intelligence may be defined as a computer system that is able to perform tasks that normally require human intelligence, such as visual perception, speech recognition, decision-making, and translation between languages. It deals with the simulation of intelligent behavior in computers. The collection of all methods in AI research that are based on symbolic representations of problems, logic, and search is called symbolic AI or GOFAI ("Good Old-Fashioned AI"). It can be thought of as a system that takes in an input and a set of rules and produces an output.

Input + Rules ----------> [ System ] ----------> Output

###**Machine Learning (ML)**
Machine Learning allows computers to make predictions without being explicitly programmed with rules. ML algorithms adjust their output based on the data they are given, so no human intervention is required to make changes. Unlike symbolic AI, a machine learning model can modify itself as more data is presented to it.

Input + Output ----------> [ System ] ----------> Rules
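
The contrast between the two diagrams can be sketched with a toy example. This is only an illustration: the temperature data, the threshold of 30, and the function names `symbolic_is_hot` and `learn_threshold` are invented here and do not come from the course materials.

```python
# Symbolic AI: a human writes the rule; the program applies it to inputs.
def symbolic_is_hot(temperature_c):
    return temperature_c >= 30  # hand-written rule (illustrative threshold)


# Machine learning: the program receives inputs and expected outputs
# and derives the rule (here, a single threshold) by itself.
def learn_threshold(temperatures, labels):
    """Return the midpoint threshold that misclassifies the fewest examples."""
    values = sorted(temperatures)
    candidates = [(a + b) / 2 for a, b in zip(values, values[1:])]

    def errors(threshold):
        return sum((t >= threshold) != label
                   for t, label in zip(temperatures, labels))

    return min(candidates, key=errors)


# Made-up training data: inputs and the answers we expect.
temperatures = [12, 18, 24, 27, 31, 35, 38]
labels = [False, False, False, False, True, True, True]

threshold = learn_threshold(temperatures, labels)


def learned_is_hot(temperature_c):
    return temperature_c >= threshold
```

In the first function the rule is an input supplied by the programmer; in the second the rule (the threshold) is the output of the procedure.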


###**Deep Learning (DL)**
Deep Learning is a subset of ML wherein the algorithms roughly try to mimic the processing patterns that occur in the human brain. For instance, classical ML typically requires the features to be supplied to the program, whereas a deep learning model can discover the features required for classification by itself.

###**Supervised Learning**
Supervised learning is a branch of ML where the model is provided with labeled training data. It finds patterns between the data and labels and expresses them as mathematical functions. For a given input feature, the system is explicitly told what the expected output label is, hence the name Supervised Learning.

###**Unsupervised Learning**
The goal of unsupervised learning is to identify meaningful patterns in the data. To that end, the program needs to learn from an unlabeled dataset. The model is not given any rules and must self-infer them to categorize the data given to it.
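
As a rough illustration of learning from unlabeled data, the sketch below groups one-dimensional points with k-means. K-means is one possible technique chosen here for illustration, not necessarily one covered in the course; the data and the starting centers are made up.

```python
def kmeans_1d(points, centers, iterations=10):
    """Plain k-means on 1-D data: assign each point to its nearest center,
    then move each center to the mean of its assigned points."""
    centers = list(centers)
    clusters = [[] for _ in centers]
    for _ in range(iterations):
        clusters = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)), key=lambda i: abs(p - centers[i]))
            clusters[nearest].append(p)
        # Keep the old center if a cluster ends up empty.
        centers = [sum(c) / len(c) if c else centers[i]
                   for i, c in enumerate(clusters)]
    return centers, clusters


# Made-up unlabeled data with two visible groups.
points = [1.0, 1.5, 2.0, 9.0, 10.0, 11.0]
centers, clusters = kmeans_1d(points, centers=[0.0, 5.0])
```

No labels are passed in; the two groups emerge only from the distances between the points.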

###**Reinforcement Learning (RL)**
In RL, the model (also known as the "agent") acts in an environment that can be in one of a number of states. In each state the agent chooses an action, receives a reward, and transitions into another state. Generally the goal of RL is for the agent to learn which actions to take in order to maximize the total reward it receives.
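
The agent, state, action, and reward vocabulary can be made concrete with a small sketch. It uses tabular Q-learning, which is one standard RL algorithm picked here for illustration, on an invented five-state corridor. The environment, the constants, and the hyperparameters are all assumptions.

```python
import random

N_STATES = 5          # states 0..4 on a line; state 4 is the goal
ACTIONS = (0, 1)      # 0 = move left, 1 = move right


def step(state, action):
    """Deterministic toy environment: reward 1 only when the goal is reached."""
    next_state = max(0, state - 1) if action == 0 else state + 1
    reward = 1.0 if next_state == N_STATES - 1 else 0.0
    return next_state, reward


def q_learning(episodes=500, alpha=0.5, gamma=0.9, seed=0):
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state = 0
        while state != N_STATES - 1:
            action = rng.choice(ACTIONS)      # explore at random (off-policy)
            next_state, reward = step(state, action)
            # Move the estimate toward reward + discounted best future value.
            target = reward + gamma * max(q[next_state])
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
    return q


q_table = q_learning()
# Greedy policy for the non-terminal states: pick the higher-valued action.
policy = [max(ACTIONS, key=lambda a: q_table[s][a]) for s in range(N_STATES - 1)]
```

After training, the greedy policy moves right in every state, which is the behavior that maximizes the discounted total reward in this corridor.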

#**1. Building a model** 

The structure of a neural network is loosely analogous to the networks of neurons found in human brains where one group of neurons firing triggers activity in some other neurons. A neural network in AI is composed of a series of layers. The most basic neural network will have an input layer, an intermediate hidden layer, and an output layer. The hidden layer can be of different types depending on the purpose and architecture of the neural network. 

![](https://i.imgur.com/1xyl55E.jpg)

In a **convolutional neural network** (CNN), the hidden layers can be thought of as a set of features based on the previous layer. Successive hidden layers represent higher-level features built on the features of the previous layer.

![](https://miro.medium.com/max/2510/1*vkQ0hXDaQv57sALXAJquxA.jpeg)

The input to the CNN is a feature map in the form of a 3-D matrix, where the first 2 dimensions are the height and width of the image in pixels and the third dimension is the number of channels in the image.

The CNN comprises a stack of modules, each of which performs 3 operations, as follows:
1. _Convolution_: A convolution extracts tiles of the input feature map and applies filters to them to compute new features, producing an output feature map, or convolved feature (which may have a different size and depth than the input feature map). This operation is performed by sliding a window known as a "kernel" over the input and computing a dot product between the kernel and each tile.
2. _ReLU_: After each convolution operation, the CNN applies a Rectified Linear Unit (ReLU) transformation to the convolved feature, in order to introduce nonlinearity into the model.
3. _Pooling_: After ReLU comes a pooling step, in which the CNN downsamples the convolved feature (to save on processing time), reducing the spatial size of the feature map while still preserving the most critical feature information. A common algorithm to perform this is called max pooling.
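
The three operations above can be sketched in NumPy for a single-channel image and a single filter. This is a minimal illustration rather than how Keras implements these layers; the random image and the edge-detecting kernel are stand-ins chosen for the example.

```python
import numpy as np


def conv2d_valid(image, kernel):
    """Slide the kernel over the image (no padding, stride 1) and take a
    dot product between the kernel and each tile."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out


def relu(x):
    """Clip negative values to zero."""
    return np.maximum(x, 0)


def max_pool2d(x, size=2):
    """Non-overlapping max pooling; leftover rows/columns are dropped."""
    h, w = x.shape[0] // size, x.shape[1] // size
    trimmed = x[:h * size, :w * size]
    return trimmed.reshape(h, size, w, size).max(axis=(1, 3))


rng = np.random.default_rng(0)
image = rng.random((28, 28))                # stand-in for one grayscale image
kernel = np.array([[-1.0, 0.0, 1.0]] * 3)   # illustrative vertical-edge filter

convolved = conv2d_valid(image, kernel)     # shape (26, 26)
activated = relu(convolved)                 # negatives clipped to 0
pooled = max_pool2d(activated)              # shape (13, 13)
```

A 3x3 kernel without padding shrinks each side by 2, and 2x2 pooling halves it, which is why a 28x28 input becomes 26x26 and then 13x13.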

At the end of a CNN there are typically one or more dense layers (layers in which every node is connected to every node in the preceding layer). These layers perform classification based on the features extracted by the convolutions.

Typically, the final fully connected layer uses a softmax activation function, which outputs a probability value from 0 to 1 for each of the classification labels the model is trying to predict, with the probabilities summing to 1.
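
As a small sketch of what softmax computes, the function below turns raw scores into probabilities. The three scores are made-up values standing in for the outputs of a final dense layer.

```python
import math


def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    shift = max(logits)                 # subtract the max for numerical stability
    exps = [math.exp(z - shift) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


scores = [2.0, 1.0, 0.1]                # made-up outputs of a final dense layer
probs = softmax(scores)
predicted_class = probs.index(max(probs))
```

The class with the largest score receives the largest probability, so the predicted label is simply the index of the maximum.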

The following code snippet is an example implementation of how to build a CNN in keras.

In [0]:
from keras import models
from keras import layers

model = models.Sequential()
model.add(layers.Conv2D(32, (3, 3), activation='relu', input_shape=(28, 28, 1)))
model.add(layers.MaxPooling2D((2, 2)))
model.add(layers.Conv2D(64, (3, 3), activation='relu'))
model.add(layers.MaxPooling2D((2, 2)))
model.add(layers.Conv2D(64, (3, 3), activation='relu'))
model.add(layers.Flatten())
model.add(layers.Dense(64, activation='relu'))
model.add(layers.Dense(10, activation='softmax'))
model.summary()

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
conv2d_1 (Conv2D)            (None, 26, 26, 32)        320       
_________________________________________________________________
max_pooling2d_1 (MaxPooling2 (None, 13, 13, 32)        0         
_________________________________________________________________
conv2d_2 (Conv2D)            (None, 11, 11, 64)        18496     
_________________________________________________________________
max_pooling2d_2 (MaxPooling2 (None, 5, 5, 64)          0         
_________________________________________________________________
conv2d_3 (Conv2D)            (None, 3, 3, 64)          36928     
_________________________________________________________________
flatten_1 (Flatten)          (None, 576)               0         
_________________________________________________________________
dens
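
The output shapes and parameter counts of the convolutional layers in the summary above can be checked by hand. The sketch below assumes what the model code implies: 3x3 kernels, no padding, stride 1, and 2x2 pooling. The helper names are invented for this example.

```python
def conv_output_size(size, kernel=3):
    return size - kernel + 1          # no padding, stride 1


def pool_output_size(size, pool=2):
    return size // pool               # a leftover row/column is dropped


def conv_params(kernel, in_channels, filters):
    # Each filter has kernel*kernel*in_channels weights plus one bias.
    return (kernel * kernel * in_channels + 1) * filters


sizes = [28]
sizes.append(conv_output_size(sizes[-1]))   # conv2d_1
sizes.append(pool_output_size(sizes[-1]))   # max_pooling2d_1
sizes.append(conv_output_size(sizes[-1]))   # conv2d_2
sizes.append(pool_output_size(sizes[-1]))   # max_pooling2d_2
sizes.append(conv_output_size(sizes[-1]))   # conv2d_3
flattened = sizes[-1] * sizes[-1] * 64      # input size of the Flatten layer

params = [conv_params(3, 1, 32), conv_params(3, 32, 64), conv_params(3, 64, 64)]
```

The resulting sizes (26, 13, 11, 5, 3), the flattened length of 576, and the parameter counts 320, 18496, and 36928 match the rows of the summary.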

#**2. Compiling a model** 

In Keras, we can compile a model as follows:


In [0]:
network.compile(optimizer, loss=None, metrics=None, loss_weights=None, sample_weight_mode=None, weighted_metrics=None, target_tensors=None)

As we can see, the compile() function takes a number of arguments. Only _optimizer_ is strictly required by the signature, but in practice _loss_ and _metrics_ are almost always specified as well.

* **optimizer** : This is a string (name of optimizer) or optimizer instance.

>> Most deep learning algorithms involve optimization of some sort. Optimization refers to the task of either minimizing or maximizing some function f(x) by altering x. Examples of optimizers are Stochastic Gradient Descent (SGD) and RMSprop. Optimizers have various parameters, but one parameter common to all of them is the learning rate.

>> The learning rate is a hyperparameter that controls how much the model weights change in response to the estimated error each time they are updated. There’s a Goldilocks learning rate for every problem: if it is too small, training takes too long; if it is too large, the updates overshoot the minimum. If you know the gradient of the loss function is small then you can safely try a larger learning rate, which compensates for the small gradient and results in a larger step size.

* **loss** : This is a string (name of objective function) or objective function or Loss instance. 

>> Loss is the penalty for a bad prediction, i.e., it is a number indicating how bad the model’s prediction was on a single example. If the model’s prediction is perfect, the loss is zero; otherwise, the loss is greater. The goal of training a model is to find a set of weights and biases that have low loss, on average, across all examples. Examples of loss functions are mean squared error (MSE), categorical crossentropy, and hinge.

* **metrics** : This is a list of metrics to be evaluated by the model during training and testing. 

>> Typically you will use metrics=['accuracy']. To specify different metrics for different outputs of a multi-output model, you could also pass a dictionary, such as metrics={'output_a': 'accuracy', 'output_b': ['accuracy', 'mse']}. You can also pass a list (len = len(outputs)) of lists of metrics such as metrics=[['accuracy'], ['accuracy', 'mse']] or metrics=['accuracy', ['accuracy', 'mse']].
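
The effect of the learning rate described under **optimizer** can be seen on a one-variable toy problem. The sketch below runs plain gradient descent on f(x) = x², a function chosen only for illustration; the three learning rates are arbitrary examples of "too small", "reasonable", and "too large".

```python
def gradient_descent(learning_rate, steps=20, start=5.0):
    """Minimize f(x) = x**2, whose gradient is 2*x, from a fixed start."""
    x = start
    for _ in range(steps):
        x -= learning_rate * 2 * x
    return x


too_small = gradient_descent(0.001)   # barely moves away from 5.0
reasonable = gradient_descent(0.1)    # ends close to the minimum at 0
too_large = gradient_descent(1.1)     # overshoots further on every step
```

With the same number of steps, only the middle learning rate ends near the minimum; the smallest one has hardly moved and the largest one diverges.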

The following code snippet is an example implementation of the compilation step.


In [0]:
network.compile(optimizer='rmsprop',
                loss='categorical_crossentropy',
                metrics=['accuracy'])
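
To make the loss argument more concrete, the following sketch computes categorical crossentropy and mean squared error by hand for a few made-up predictions. It shows the formulas for a single example and is not the Keras implementation.

```python
import math


def mean_squared_error(y_true, y_pred):
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def categorical_crossentropy(one_hot, probs, eps=1e-12):
    """Loss for one example: -sum(t * log(p)) over the classes."""
    return -sum(t * math.log(max(p, eps)) for t, p in zip(one_hot, probs))


label = [0, 1, 0]                                          # true class is index 1
good = categorical_crossentropy(label, [0.1, 0.8, 0.1])    # -ln(0.8)
bad = categorical_crossentropy(label, [0.7, 0.2, 0.1])     # -ln(0.2)
perfect = categorical_crossentropy(label, [0.0, 1.0, 0.0])

mse = mean_squared_error([1.0, 2.0, 3.0], [1.0, 2.5, 2.0])
```

The loss is zero for the perfect prediction and grows as the probability assigned to the true class shrinks.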

#**3. Training a model**

Training a model simply means learning (determining) good values for all the weights and biases from labeled examples. In supervised learning, a machine learning algorithm builds a model by examining many examples and attempting to find a model that minimizes loss; this process is called empirical risk minimization. 

At this step the model is essentially looking at the input data, making a prediction, and checking whether it is right or wrong. The training step takes place after we have built and compiled our model.
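
What a training loop does internally can be sketched without Keras. The example below fits a line with mini-batch gradient descent on mean squared error, using the same notions of epochs and batch size as `fit()`. The data, the true slope and intercept, and all hyperparameters are invented for the illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(-1.0, 1.0, size=256)
y = 3.0 * x + 2.0                      # made-up labels from a known line

w, b = 0.0, 0.0                        # parameters to be learned
learning_rate, epochs, batch_size = 0.1, 50, 32
loss_history = []

for epoch in range(epochs):
    order = rng.permutation(len(x))    # reshuffle the examples every epoch
    for start in range(0, len(x), batch_size):
        idx = order[start:start + batch_size]
        error = (w * x[idx] + b) - y[idx]          # prediction minus label
        # Gradients of the mean squared error with respect to w and b.
        w -= learning_rate * 2.0 * np.mean(error * x[idx])
        b -= learning_rate * 2.0 * np.mean(error)
    # Record the loss over the whole training set after each epoch.
    loss_history.append(float(np.mean((w * x + b - y) ** 2)))
```

Each pass over the data makes a prediction, measures the error against the labels, and nudges the weight and bias in the direction that lowers the loss.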

The following code snippet is an example implementation of the training step.


In [0]:
epochs = 10
history = model.fit(train_images, 
                    train_labels, 
                    epochs=epochs, 
                    batch_size=64,
                    validation_data=(test_images, test_labels))

Train on 60000 samples, validate on 10000 samples
Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10


Generally, once training has concluded, it may be observed that the training loss decreased with every epoch and the training accuracy increased with every epoch. That is to be expected with gradient descent optimization: the quantity being minimized should decrease with every iteration.

* **Underfitting** : The model may fail to "learn" from the data and be unable to make reliable predictions, even on the training set. This typically happens when the model is too simple to capture the patterns in the data. This is called underfitting.

* **Overfitting** : In contrast, the training accuracy may keep increasing with every epoch while the test accuracy stagnates after increasing up to a certain point. This means that the model failed to generalize to data outside the training set, and instead learned representations specific to the training data. The model overoptimized on the training set. This is called overfitting. We can use techniques such as data augmentation and dropout to combat overfitting. 

>>Data augmentation is a strategy that enables significantly increasing the diversity of data available for training models, without actually collecting new data. 

>>Dropout is a process by which a single model can be used to simulate having a large number of different network architectures by randomly dropping out nodes during training. 
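
Both techniques can be sketched in a few lines of NumPy. The dropout function below uses the common "inverted dropout" formulation, and the augmentation shown is a horizontal flip, which is only one of many possible transformations. The array sizes and the rate of 0.5 are arbitrary choices for the example.

```python
import numpy as np


def dropout(activations, rate, rng, training=True):
    """Inverted dropout: zero a random fraction `rate` of the units during
    training and rescale the survivors so the expected sum is unchanged."""
    if not training:
        return activations            # nothing is dropped at inference time
    keep = rng.random(activations.shape) >= rate
    return activations * keep / (1.0 - rate)


def horizontal_flip(image):
    """One simple augmentation: mirror the image left to right."""
    return image[:, ::-1]


rng = np.random.default_rng(0)
activations = np.ones((1000,))
dropped = dropout(activations, rate=0.5, rng=rng)
fraction_zeroed = float(np.mean(dropped == 0))

image = np.arange(6).reshape(2, 3)    # tiny stand-in "image"
flipped = horizontal_flip(image)      # columns in reverse order
```

Because a different random mask is drawn on every call, each training step effectively sees a different thinned network, while the flip produces a new training image without collecting new data.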

![](https://i.imgur.com/6CigYyb.jpg)

# **4. Finetuning a pretrained model**

After a model has been trained, it may be possible to increase its performance even further. One way to do this is to "fine-tune" the weights of the top layers of the pretrained model alongside the training of the top-level classifier. 

It must be noted that fine-tuning has the following caveats:
* Fine-tuning should be attempted only after the top-level classifier has been trained with the pretrained model set to non-trainable.
* Additionally, only the top layers of the pre-trained model are fine-tuned rather than all layers of the pretrained model because in a convnet, a layer is more specialized the higher up it is.

As we go higher up, the features are increasingly specific to the dataset that the model is trained on. In order to fine-tune a model, we need to adapt these specialized features to work with the new dataset. 

We proceed to fine-tune a model by setting the top layers of the pretrained model to be trainable, then recompiling the model so that the changes to the top layers take effect, and resuming the training.
In addition to this, using a smaller learning rate is also beneficial, since we expect the pre-trained weights to be quite good already as compared to randomly initialized weights, and we do not want to distort them too quickly and too much.

The following code snippet is an example implementation of how to fine-tune a model.

In [0]:
conv_base.trainable = True

set_trainable = False
for layer in conv_base.layers:
  if layer.name == 'block5_conv1':
    set_trainable = True
  if set_trainable:
    layer.trainable = True
  else:
    layer.trainable = False
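
To see the effect of the loop above without loading a real network, the sketch below applies the same logic to stand-in layer objects. `FakeLayer` and `unfreeze_from` are hypothetical names invented for this illustration, and the list of layer names is made up apart from `'block5_conv1'`, which is taken from the snippet above.

```python
from dataclasses import dataclass


@dataclass
class FakeLayer:
    """Hypothetical stand-in for a Keras layer: just a name and a flag."""
    name: str
    trainable: bool = True


def unfreeze_from(layers, first_trainable_name):
    """Freeze every layer before `first_trainable_name`; unfreeze that layer
    and everything after it."""
    set_trainable = False
    for layer in layers:
        if layer.name == first_trainable_name:
            set_trainable = True
        layer.trainable = set_trainable


# Illustrative layer names; only 'block5_conv1' comes from the snippet above.
layers = [FakeLayer(name) for name in
          ["block4_conv3", "block4_pool", "block5_conv1", "block5_conv2", "block5_pool"]]
unfreeze_from(layers, "block5_conv1")
trainable_names = [layer.name for layer in layers if layer.trainable]
```

Only the named layer and the layers after it remain trainable, so the earlier, more generic features keep their pretrained weights.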

In [0]:
# compile model

from keras import optimizers

model.compile(
    loss='binary_crossentropy',
    # choose a smaller learning rate
    # (newer Keras versions spell this argument `learning_rate`)
    optimizer=optimizers.RMSprop(lr=1e-5), 
    metrics=['acc'])

# train

history = model.fit_generator(
    train_generator,
    steps_per_epoch=100,
    epochs=100,
    validation_data=validation_generator,
    validation_steps=50)

Epoch 1/100
Epoch 2/100
Epoch 3/100
Epoch 4/100
Epoch 5/100
Epoch 6/100
Epoch 7/100
Epoch 8/100
Epoch 9/100
Epoch 10/100
Epoch 11/100
Epoch 12/100
Epoch 13/100
Epoch 14/100
Epoch 15/100
Epoch 16/100
Epoch 17/100
Epoch 18/100
Epoch 19/100
Epoch 20/100
Epoch 21/100
Epoch 22/100
Epoch 23/100
Epoch 24/100
Epoch 25/100
Epoch 26/100
Epoch 27/100
Epoch 28/100
Epoch 29/100
Epoch 30/100
Epoch 31/100
Epoch 32/100
Epoch 33/100
Epoch 34/100
Epoch 35/100
Epoch 36/100
Epoch 37/100
Epoch 38/100
Epoch 39/100
Epoch 40/100
Epoch 41/100
Epoch 42/100
Epoch 43/100
Epoch 44/100
Epoch 45/100
Epoch 46/100
Epoch 47/100
Epoch 48/100
Epoch 49/100
Epoch 50/100
Epoch 51/100
Epoch 52/100
Epoch 53/100
Epoch 54/100
Epoch 55/100
Epoch 56/100
Epoch 57/100
Epoch 58/100
Epoch 59/100
Epoch 60/100
Epoch 61/100
Epoch 62/100
Epoch 63/100
Epoch 64/100
Epoch 65/100
Epoch 66/100
Epoch 67/100
Epoch 68/100
Epoch 69/100
Epoch 70/100
Epoch 71/100
Epoch 72/100
Epoch 73/100
Epoch 74/100
Epoch 75/100
Epoch 76/100
Epoch 77/100
Epoch 78