# Lesson 5 notes: convolutional neural networks

Convolutional neural networks (CNNs) are a different type of network, useful for:
1. Voice User Interface: Wavenet
2. Natural Language Processing
3. Computer Vision 



## Features: 
The shapes and colors that define an image. Part of the job of a CNN is to identify the features in order to detect patterns in an image, such as edges.

For example, in the case of a dog image the features can be the size, shape, legs, and so on. 

In short, the **features** of an image are what make it up.

## How Computers Interpret Data

1. **Grayscale image:** a grid of pixels, each with a numerical value.
2. **Pre-process data:** Rescaling each pixel from the range 0 to 255 to the range 0 to 1 (normalization). This helps the algorithm work better when detecting features.
3. **MLP**: Multi-Layer Perceptron, the method learned in the previous lesson. It takes a vector as input, so a 28 by 28 image simply becomes a vector of 784 values.
4. **Flattening**: The process of converting an image to a vector: row 1 is the first part, row 2 is the second part, etc.


Keep in mind that although flattening is used in MLPs, this strategy is not perfect: the network cannot learn from which pixels are next to each other, so with real-world data an MLP is not the best choice.
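
The normalization and flattening steps above can be sketched in a few lines. This is a minimal illustration, assuming a randomly generated 28 by 28 image in place of real data; the variable names are made up for the example.

```python
import torch

torch.manual_seed(0)

# Hypothetical 28x28 grayscale image with integer pixel values in [0, 255].
image = torch.randint(0, 256, (28, 28), dtype=torch.uint8)

# Normalization: rescale the pixel values to floats in [0, 1].
normalized = image.float() / 255.0

# Flattening: the rows are laid end to end in a single vector.
flat = normalized.view(-1)

print(flat.shape)  # torch.Size([784])

# Entries 28..55 of the vector are exactly row 1 of the image,
# so the vertical neighbour of a pixel ends up 28 positions away.
print(torch.equal(flat[28:56], normalized[1]))  # True
```

The second print shows why an MLP loses spatial structure: pixels that were directly above and below each other are no longer adjacent in the input vector.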

## Multi-Layer Perceptrons

Multi-Layer Perceptrons (MLPs) are the networks we have been constructing during lesson 4. These networks consist of fully connected layers, starting from an input layer, through hidden layers, and finally to an output layer.

The task of classifying images with an MLP is a multi-step process:
1. First, visualize the data set to understand the input size and the task.
2. Normalize the data, applying any required transformations such as converting to a tensor.
3. Define a model or load a pre-trained network.
4. Train the model
5. Test the model 
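
The steps above might look roughly like this in PyTorch. This is only a sketch: the data is random (a stand-in for a real data set such as the one used in the lesson), the layer sizes and learning rate are arbitrary assumptions, and the model is evaluated on the same data it was trained on purely to keep the example short.

```python
import torch
import torch.nn as nn
from torch import optim

torch.manual_seed(0)

# Steps 1-2: stand-in data. Random "images" already scaled to [0, 1]
# and random labels for 10 classes (assumed, not a real data set).
images = torch.rand(64, 1, 28, 28)
labels = torch.randint(0, 10, (64,))
print(images.shape, labels.shape)  # inspect the input size first

# Step 3: define a small MLP (sizes are arbitrary for the sketch).
model = nn.Sequential(
    nn.Flatten(),          # 1x28x28 image -> vector of 784 values
    nn.Linear(784, 64),
    nn.ReLU(),
    nn.Linear(64, 10),     # one class score per class
)
criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD(model.parameters(), lr=0.05)

# Step 4: train for a few passes over the batch.
losses = []
for epoch in range(20):
    optimizer.zero_grad()                      # clear old gradients
    loss = criterion(model(images), labels)    # forward pass + loss
    loss.backward()                            # backpropagation
    optimizer.step()                           # weight update
    losses.append(loss.item())

# Step 5: test. Real code would use a separate test set here.
model.eval()
with torch.no_grad():
    predictions = model(images).argmax(dim=1)
    accuracy = (predictions == labels).float().mean().item()
print(f"accuracy on the toy batch: {accuracy:.2f}")
```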

Next, it's time to start working with convolutional neural networks, which are another way of working with images.

**Important Definitions**:
1. **Class Score**: The output of the network. It indicates how sure the network is that a given input belongs to a specific class.
2. **Loss**: Measures the difference between the predicted class and the true class.
3. **Backpropagation**: Quantifies how much a particular weight contributed to a mistake.
4. **Optimization**: Gives us a way to calculate a better weight value.
5. **Cross-Entropy Loss**: Looks at the probability the network assigned to the true label and takes the negative log of that value. The loss is lower when the label and the prediction agree.
6. **Types of optimizers**: See the code below.

In [42]:
import torch.nn as nn
import torch.nn.functional as F
from torch import optim
class Classifier(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc1 = nn.Linear(784, 256)
        self.fc2 = nn.Linear(256, 128)
        self.fc3 = nn.Linear(128, 64)
        self.fc4 = nn.Linear(64, 10)
        
    def forward(self, x):
        # make sure input tensor is flattened
        x = x.view(x.shape[0], -1)
        
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = F.relu(self.fc3(x))
        x = F.log_softmax(self.fc4(x), dim=1)
        
        return x    
model = Classifier()

# Optimizers:
adam = optim.Adam(model.parameters(), lr=0.001)
adagrad = optim.Adagrad(model.parameters(), lr=0.001)
adadelta = optim.Adadelta(model.parameters(), lr=0.001)
sgd = optim.SGD(model.parameters(), lr=0.001)
rms_prop = optim.RMSprop(model.parameters(), lr=0.001)

# Note: SGD and Adam are the most commonly used optimizers. Adam combines
# ideas from momentum-based SGD and RMSprop (per-parameter learning rates).
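
The definitions of loss, backpropagation, and optimization above can be traced on a tiny example. The sketch below treats three made-up class scores as the trainable values (an assumption to keep things small; in a real network the weights are what get trained) and checks that one optimizer step lowers the cross-entropy loss.

```python
import torch
import torch.nn.functional as F
from torch import optim

# Hypothetical class scores for one input over three classes.
scores = torch.tensor([[0.5, 2.0, -1.0]], requires_grad=True)
label = torch.tensor([0])  # the true class is class 0

# Cross-entropy loss: negative log of the probability of the true class.
probs = F.softmax(scores, dim=1)
manual_loss = -torch.log(probs[0, label.item()])
loss = F.cross_entropy(scores, label)  # same value, computed by PyTorch

# Backpropagation: the gradient measures how each score affected the loss.
loss.backward()

# Optimization: one SGD step moves the scores against the gradient.
optimizer = optim.SGD([scores], lr=0.5)
optimizer.step()

new_loss = F.cross_entropy(scores, label)
print(loss.item(), new_loss.item())  # the second value is smaller
```

The network initially favours class 1, which disagrees with the label, so the loss starts high; after the update the score for class 0 has risen and the loss has dropped.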

## Convolutional Neural Networks 


Convolutional Neural Networks are a different type of network, much better suited to real-world data where the object of interest may be anywhere within a larger image. Some reasons for this include:

1. MLPs flatten the data, converting it into a vector. Because of this, MLPs are not aware of what the original image looks like and can't capture the relationship between a pixel and the pixel above it.
2. In comparison, CNNs accept matrices as input, so they can make use of the fact that pixels closer to each other are more related.
3. MLPs only use fully connected layers, whereas CNNs use sparsely connected layers. This means that a CNN doesn't look at the whole image at once; it looks at parts of the image in order to build up the bigger picture.
4. In a CNN, every hidden node receives only a part of the data the previous layer had.


Basically, whenever the data is laid out in a more complicated arrangement, such that the object can be anywhere in the image instead of simply in the center, a CNN has a serious advantage.
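
One way to see the difference between fully connected and sparsely connected layers is to count trainable parameters. The layer sizes below are arbitrary assumptions chosen for illustration, and `count_parameters` is a helper written for this sketch.

```python
import torch.nn as nn

def count_parameters(layer):
    """Total number of trainable values (weights and biases) in a layer."""
    return sum(p.numel() for p in layer.parameters())

# Fully connected: every one of the 784 inputs connects to every output node.
fc = nn.Linear(28 * 28, 256)

# Convolutional: each output value only depends on a 3x3 patch of the image,
# and the same 16 small filters are reused at every position.
conv = nn.Conv2d(1, 16, kernel_size=3)

print(count_parameters(fc))    # 200960  (784 * 256 weights + 256 biases)
print(count_parameters(conv))  # 160     (16 * 3 * 3 weights + 16 biases)
```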


## Filters

The job of filters in the convolutional layer is to extract information from the image, such as edges. **Intensity** is a measure of brightness; it can be used to detect shapes for tasks like distinguishing the boundaries between people and the background.

**Frequency** is the rate of change: high frequency means a rapid change in intensity or color, and low frequency means little or no change. For example, in an image of a person against a white background, the person's outline would have high frequency.

**High-Pass Filter**: emphasizes edges by turning areas where the rate of change is low to black and areas where the rate of change is high to white.

**Convolution Kernels**: A kernel is a small grid of weights that is passed over an image.

Applying a high-pass filter involves taking a kernel, multiplying it element-wise with each patch of pixels in the image, and summing the results. A high value indicates an edge. However, what happens when you get to the edges of the image?
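
As a small worked example of passing a kernel over an image, the sketch below uses a Sobel-style kernel (one common edge-detection kernel whose weights sum to zero; the lesson may have used a different one) on a made-up image that is black on the left and white on the right.

```python
import torch
import torch.nn.functional as F

# Made-up 5x6 image: left half dark (0), right half bright (1).
image = torch.zeros(5, 6)
image[:, 3:] = 1.0

# Sobel-style kernel that responds to left-to-right changes in intensity.
kernel = torch.tensor([[-1., 0., 1.],
                       [-2., 0., 2.],
                       [-1., 0., 1.]])

# conv2d expects (batch, channels, height, width) for the image
# and (out_channels, in_channels, height, width) for the kernel.
response = F.conv2d(image.view(1, 1, 5, 6), kernel.view(1, 1, 3, 3))

print(response[0, 0])
# Every row of the result is [0., 4., 4., 0.]:
# zero over flat regions, high where the patch straddles the edge.
print(response.shape)  # torch.Size([1, 1, 3, 4])
```

Note that the 5x6 image produces a 3x4 result: without padding, the kernel cannot be centered on the outermost pixels, which is the edge problem raised in the question above.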

## Convolutional Layer

* A stack of feature maps - one feature map for each filter.
* Produced by applying a series of different image filters, aka convolutional kernels.
* The number of kernels equals the number of feature maps produced.


**Hyperparameters:**

* Increasing the number of filters increases the number of nodes.
* Increasing the size of each filter increases the size of the patterns it can detect.
* Stride: The amount by which the filter slides over the image at each step.
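
The effect of these hyperparameters shows up directly in the output shape. The following sketch assumes a single 28 by 28 grayscale input and arbitrary filter counts; the output shape is `(batch, feature maps, height, width)`.

```python
import torch
import torch.nn as nn

x = torch.rand(1, 1, 28, 28)  # one hypothetical grayscale image

base = nn.Conv2d(1, 4, kernel_size=3)(x)
more_filters = nn.Conv2d(1, 8, kernel_size=3)(x)
bigger_kernel = nn.Conv2d(1, 4, kernel_size=5)(x)
bigger_stride = nn.Conv2d(1, 4, kernel_size=3, stride=2)(x)

print(base.shape)           # torch.Size([1, 4, 26, 26])
print(more_filters.shape)   # torch.Size([1, 8, 26, 26])  more feature maps
print(bigger_kernel.shape)  # torch.Size([1, 4, 24, 24])  larger patches
print(bigger_stride.shape)  # torch.Size([1, 4, 13, 13])  bigger steps
```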


One possible issue with the convolutional layer is dealing with filter positions that extend past the edge of the image. Solutions include:
1. Ignoring them - this can lose information about the edges of the image.
2. Padding the image with 0s to give the filter more space to move.
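
The two options can be compared by looking at the output size. The helper `conv_output_size` below is written for this sketch (it is not a PyTorch function) and applies the standard output-size formula for one dimension; the 28 by 28 input is an assumption.

```python
import torch
import torch.nn as nn

def conv_output_size(size, kernel, stride=1, padding=0):
    """Output width or height of a convolution along one dimension."""
    return (size + 2 * padding - kernel) // stride + 1

x = torch.rand(1, 1, 28, 28)  # hypothetical 28x28 grayscale image

# Option 1: no padding, the filter stays inside the image.
no_padding = nn.Conv2d(1, 1, kernel_size=3, padding=0)(x)

# Option 2: pad with one ring of zeros so a 3x3 filter can reach the border.
zero_padding = nn.Conv2d(1, 1, kernel_size=3, padding=1)(x)

print(no_padding.shape[-1], conv_output_size(28, 3))               # 26 26
print(zero_padding.shape[-1], conv_output_size(28, 3, padding=1))  # 28 28
```

With zero padding of 1 and a 3x3 filter, the output keeps the same width and height as the input.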


**Define a convolutional layer:**

1. Define the layers of the model in the `__init__` function.
2. Call `super(Net, self).__init__()` just like in an MLP.
3. Initialize the kernel height and width based on the shape of the filter weights.
4. Set `self.conv` to `nn.Conv2d(1, 4, kernel_size=(k_height, k_width), bias=False)` (one grayscale input channel, four filters).
5. Set `self.conv.weight` to `torch.nn.Parameter(weight)`.
6. In the `forward` function, calculate the output of the convolutional layer using `self.conv(x)`.
7. Run the output through a ReLU activation.
8. Return both the pre-activation output and the activated output.

**Example:**

In [None]:
import torch
import torch.nn as nn
import torch.nn.functional as F
    
# define a neural network with a single convolutional layer with four filters
class Net(nn.Module):
    
    def __init__(self, weight):
        super(Net, self).__init__()
        # initializes the weights of the convolutional layer to be the weights of the 4 defined filters
        k_height, k_width = weight.shape[2:]
        # assumes there are 4 grayscale filters
        self.conv = nn.Conv2d(1, 4, kernel_size=(k_height, k_width), bias=False)
        self.conv.weight = torch.nn.Parameter(weight)

    def forward(self, x):
        # calculates the output of a convolutional layer
        # pre- and post-activation
        conv_x = self.conv(x)
        activated_x = F.relu(conv_x)
        
        # returns the pre- and post-activation outputs
        return conv_x, activated_x
    
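# instantiate the model and set the weights
# (`filters` is assumed to be a NumPy array of shape (4, k_height, k_width)
# defined in an earlier cell; unsqueeze(1) adds the single input channel)
weight = torch.from_numpy(filters).unsqueeze(1).type(torch.FloatTensor)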
model = Net(weight)

# print out the layer in the network
print(model)

## Pooling Layers

* When training a CNN on a large data set, a large number of filters may be required.
* Pooling layers reduce the dimensionality of the feature maps produced by the convolutional layer.
* Max pooling layer: Given a stack of feature maps, returns the greatest value in each window of each feature map
    * Define a window size and stride (e.g. window 2x2, stride 2)
    * Start with the top-left window
    * Take the maximum value in the window
    * Continue for all windows and all feature maps
    * With a 2x2 window and stride 2, the final width and height are half those of the previous convolutional layer.


* Average pooling layer: Takes the average of the pixel values in each window.
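
Both pooling types can be checked by hand on a small feature map. The 4x4 values below are made up so that each 2x2 window is easy to read off.

```python
import torch
import torch.nn as nn

# One made-up 4x4 feature map, shaped (batch, channels, height, width).
feature_map = torch.tensor([[1., 2., 5., 6.],
                            [3., 4., 7., 8.],
                            [9., 10., 13., 14.],
                            [11., 12., 15., 16.]]).view(1, 1, 4, 4)

max_pool = nn.MaxPool2d(kernel_size=2, stride=2)
avg_pool = nn.AvgPool2d(kernel_size=2, stride=2)

pooled_max = max_pool(feature_map)
pooled_avg = avg_pool(feature_map)

print(pooled_max[0, 0])  # [[ 4.,  8.], [12., 16.]]
print(pooled_avg[0, 0])  # [[ 2.5,  6.5], [10.5, 14.5]]
print(pooled_max.shape)  # torch.Size([1, 1, 2, 2]), half the width and height
```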




## Increasing Depth

1. Convolutional layers are used to make the array deeper.
2. Max pooling layers are used to decrease the x-y dimensions.
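
Tracking the tensor shape through two convolution and pooling stages shows both effects at once. The input size (a 32 by 32 color image) and the filter counts are assumptions picked for the sketch.

```python
import torch
import torch.nn as nn

x = torch.rand(1, 3, 32, 32)  # hypothetical RGB image: depth 3, 32x32

# padding=1 with a 3x3 kernel keeps width and height unchanged,
# so only the pooling layers shrink the x-y dimensions.
conv1 = nn.Conv2d(3, 16, kernel_size=3, padding=1)
conv2 = nn.Conv2d(16, 32, kernel_size=3, padding=1)
pool = nn.MaxPool2d(2, 2)

after_block1 = pool(torch.relu(conv1(x)))
after_block2 = pool(torch.relu(conv2(after_block1)))

print(x.shape)             # torch.Size([1, 3, 32, 32])
print(after_block1.shape)  # torch.Size([1, 16, 16, 16])  deeper, smaller
print(after_block2.shape)  # torch.Size([1, 32, 8, 8])    deeper, smaller
```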

## Other Notes

1. Lesson 5.35: To show an image, it is important to first convert the image into a NumPy array.
2. When defining a network, don't forget to include the linear transformations.
* [Wavenet model](https://deepmind.com/blog/wavenet-generative-model-raw-audio/): A CNN application that takes waveforms recorded from humans and uses them to generate audio that sounds similar to a human.
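
For the note about converting an image before showing it, a minimal sketch might look like the following. It assumes the image is a PyTorch tensor laid out as (channels, height, width) and that the plotting library expects (height, width, channels), as matplotlib's `imshow` does; the plotting call itself is left as a comment.

```python
import numpy as np
import torch

# Hypothetical image tensor in PyTorch layout: (channels, height, width).
image_tensor = torch.rand(3, 32, 32)

# Convert to a NumPy array, then move the channel axis to the end.
image_np = image_tensor.numpy()
image_hwc = np.transpose(image_np, (1, 2, 0))

print(image_np.shape)   # (3, 32, 32)
print(image_hwc.shape)  # (32, 32, 3)

# With matplotlib installed, the array could then be displayed with:
# plt.imshow(image_hwc)
```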