In [4]:
import torch # used for all pytorch things 
import torch.nn as nn # for torch.nn.Module, the parent for PyTorch models 
import torch.nn.functional as F # for activation function 

![alt text](imgs/screenshot1.png)


Above is LeNet-5, one of the earliest convolutional neural networks. It was built to read small images of handwritten digits (the MNIST dataset) and classify which digit each image shows.

How it works:
- Layer C1 is a convolutional layer. It scans the input image for features it learned during training and outputs a map of where it saw each of those features in the image. This "activation map" is downsampled in layer S2.

- Layer C3 is another convolutional layer. It scans the downsampled activation map from C1 for combinations of features. It also outputs an activation map, describing the spatial locations of these feature combinations, which is downsampled in layer S4.

- Finally, the fully connected layers at the end (F5, F6, and OUTPUT) form a classifier that takes the final activation map and assigns it to one of ten bins, one for each digit.

In code, the network is represented by:

In [5]:
class LeNet(nn.Module):

    def __init__(self):
        super(LeNet, self).__init__()
        # 1 input channel (grayscale image), 6 output channels (feature maps),
        # 3x3 square convolution kernel
        self.conv1 = nn.Conv2d(1, 6, 3)
        self.conv2 = nn.Conv2d(6, 16, 3)  # 6 input channels, 16 output channels, 3x3 kernel
        # an affine operation: y = Wx + b
        self.fc1 = nn.Linear(16 * 6 * 6, 120) # 6 * 6 from image dimension 
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)

    def forward(self, x):
        # Max pooling over a (2,2) window 
        x = F.max_pool2d(F.relu(self.conv1(x)), (2,2))
        # if the window is square, you can specify just a single number
        x = F.max_pool2d(F.relu(self.conv2(x)), 2)
        x = x.view(-1, self.num_flat_features(x)) # flattens the image 
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        return x 
        
    def num_flat_features(self, x):
        size = x.size()[1:] # all dimensions except the batch dimension 
        num_features = 1
        for s in size: 
            num_features *= s 
        return num_features


The code above shows a typical PyTorch model:
- It inherits from ```torch.nn.Module```. Modules may be nested; in fact, even the ```Conv2d``` and ```Linear``` classes inherit from ```torch.nn.Module```.

- It has an ```__init__()``` function where it instantiates its layers and loads any data artifacts it might need (e.g. an NLP model might load a vocabulary).

- It has a ```forward()``` function. This is where the actual computation happens: an input is passed through the network layers and various functions to generate an output.

- Aside from these requirements, we can build the model like any other Python class, adding whatever properties and methods are needed to support the model's computation.
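The points above can be checked with a small sketch. ```TinyBlock``` is a hypothetical module invented for illustration, not part of the LeNet code; it only shows that built-in layers are modules themselves and that assigning a layer as an attribute registers it as a child.

```python
import torch.nn as nn

# Built-in layers are themselves nn.Module subclasses.
conv = nn.Conv2d(1, 6, 3)
linear = nn.Linear(84, 10)
is_conv_module = isinstance(conv, nn.Module)
is_linear_module = isinstance(linear, nn.Module)


class TinyBlock(nn.Module):  # hypothetical example module
    def __init__(self):
        super().__init__()
        # Assigning a module as an attribute registers it as a child.
        self.fc = nn.Linear(4, 2)

    def forward(self, x):
        return self.fc(x)


block = TinyBlock()
child_names = [name for name, _ in block.named_children()]
# 4 * 2 weights + 2 biases
param_count = sum(p.numel() for p in block.parameters())

print(is_conv_module, is_linear_module)
print(child_names, param_count)
```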



Understanding the code
## Structure
```__init__()```
Sets up the layers the model needs before any data flows through it.

```conv1, conv2```
Convolutional layers. They act as "feature detectors" for patterns such as edges, curves, and corners.
```conv1``` takes one input channel and creates 6 "feature maps" using a 3x3 kernel. ```conv2``` takes those 6 feature maps and creates 16 using another 3x3 kernel.

```self.fc*```
"Fully connected" (linear) layers. Once the convolutional layers have found the features, these layers act as a traditional classifier that makes sense of them and decides: "Based on these curves, this is likely the number 5."

## Data Flow 
The ```forward()``` method defines the path the image takes through the network.

step 1. <strong>Convolution</strong>: Scans the image for patterns.

step 2. <strong>ReLU</strong>: Activation function that sets negative values to zero, which adds non-linearity.

step 3. <strong>Max Pooling</strong>: Halves the height and width of each feature map to reduce computation and keep the strongest responses.

step 4. <strong>Flattening</strong>: Converts each image's 3D block of data (channels, height, width) into a 1D list of numbers so the linear layers can read it.

step 5. <strong>Output</strong>: The final layer ```fc3``` produces 10 numbers. The index of the highest number is the network's "guess".

Steps 1 to 3 run twice, once for ```conv1``` and once for ```conv2```, before the data is flattened.

## Flattening Math 
```self.fc1 = nn.Linear(16 * 6 * 6, 120)```
The first linear layer expects each image to be flattened into a single vector of 576 elements (16 * 6 * 6).
- 16 is the number of channels produced by ```conv2```.

- 6 * 6 is the height and width of the data after the convolution and pooling layers. For a 32x32 input, each 3x3 convolution (no padding) trims 2 pixels from each dimension and each 2x2 pooling halves it, rounding down: 32 → 30 → 15 → 13 → 6.

```num_flat_features``` calculates this total (16 * 6 * 6 = 576) automatically.
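To see where the 6 * 6 comes from, the sketch below pushes a dummy 32x32 input through stand-alone copies of the two convolution layers and records the shape after every step. ReLU is left out because it does not change shapes; the layer names mirror the model, but these are fresh, untrained layers created only for this check.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

conv1 = nn.Conv2d(1, 6, 3)   # same configuration as the model's conv1
conv2 = nn.Conv2d(6, 16, 3)  # same configuration as the model's conv2

x = torch.rand(1, 1, 32, 32)  # (batch, channels, height, width)
shapes = [tuple(x.shape)]

x = conv1(x)                  # 3x3 kernel, no padding: 32 -> 30
shapes.append(tuple(x.shape))
x = F.max_pool2d(x, 2)        # 2x2 pooling: 30 -> 15
shapes.append(tuple(x.shape))
x = conv2(x)                  # 3x3 kernel, no padding: 15 -> 13
shapes.append(tuple(x.shape))
x = F.max_pool2d(x, 2)        # 2x2 pooling rounds down: 13 -> 6
shapes.append(tuple(x.shape))

for s in shapes:
    print(s)

flat_size = x[0].numel()      # values per image: 16 * 6 * 6
print(flat_size)
```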





## fc Layers 
```
self.fc1 = nn.Linear(16 * 6 * 6, 120) # 6 * 6 from image dimension 
self.fc2 = nn.Linear(120, 84)
self.fc3 = nn.Linear(84, 10)
```
This is a decision making funnel. Each layer condenses the information further to reach the final answer.

- ```fc1``` (input size: 576, output size: 120): Takes all the detected patterns (edges, circles) and starts combining them into "parts" of a number.

- ```fc2``` (input size: 120, output size: 84): Further compresses these parts into abstract concepts.

- ```fc3``` (input size: 84, output size: 10): The final "vote." Each of the 10 outputs represents a digit (0–9). The highest value is the winner.
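As a rough illustration of the funnel, the following sketch runs an assumed batch of 2 already-flattened images through stand-alone copies of the three layers and counts the learnable parameters in each. The batch size of 2 is arbitrary.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

fc1 = nn.Linear(16 * 6 * 6, 120)
fc2 = nn.Linear(120, 84)
fc3 = nn.Linear(84, 10)

x = torch.rand(2, 576)   # assumed: 2 images, already flattened
h1 = F.relu(fc1(x))      # (2, 576) -> (2, 120)
h2 = F.relu(fc2(h1))     # (2, 120) -> (2, 84)
out = fc3(h2)            # (2, 84)  -> (2, 10)

# Each layer holds in_features * out_features weights plus out_features biases.
param_counts = [
    sum(p.numel() for p in layer.parameters()) for layer in (fc1, fc2, fc3)
]

print(tuple(h1.shape), tuple(h2.shape), tuple(out.shape))
print(param_counts)
```

Most of the parameters sit in ```fc1```, since it connects all 576 flattened features to 120 outputs.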

### Function 
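The operation inside each of these layers is the affine transformation
$$y = xA^T + b$$
where $x$ is the input, $A$ is the layer's weight matrix (```weight```), and $b$ is its bias vector (```bias```). This is the same operation as the ```y = Wx + b``` comment in the code, written for row-vector inputs.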
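The formula can be verified by hand. This sketch compares the output of a stand-alone ```nn.Linear(84, 10)``` with the same computation written out using its ```weight``` and ```bias``` attributes; the batch of 3 random rows is an arbitrary choice.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

fc = nn.Linear(84, 10)
x = torch.rand(3, 84)  # 3 input rows, 84 features each

# weight has shape (out_features, in_features), hence the transpose.
manual = x @ fc.weight.T + fc.bias
matches = torch.allclose(fc(x), manual, atol=1e-6)

print(tuple(fc.weight.shape), tuple(fc.bias.shape))
print(matches)
```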

## Forward Pass 
"Engine room" of the model. 
```x = F.max_pool2d(F.relu(self.conv1(x)), (2,2))``` performs convolution, ReLU activation, and max pooling all in one line.

```conv1(x)``` - looks for features.

```F.relu(...)``` - Rectified Linear Unit. If a value is negative, it becomes 0; if it is positive, it is kept.

```F.max_pool2d(...,(2,2))``` - Downsampling. It looks at each 2x2 square and keeps the largest value, discarding the other 3.
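A tiny hand-made example makes the last two steps concrete. The 4x4 values below are made up so the result is easy to follow by eye.

```python
import torch
import torch.nn.functional as F

# One image, one channel, 4x4 pixels: shape (1, 1, 4, 4).
x = torch.tensor([[[[ 1., -2.,  3., -4.],
                    [-5.,  6., -7.,  8.],
                    [ 9., -1.,  2., -3.],
                    [-6.,  4., -8., -9.]]]])

activated = F.relu(x)                     # negatives become 0
pooled = F.max_pool2d(activated, (2, 2))  # max of each 2x2 square

print(activated)
print(pooled)  # tensor([[[[6., 8.], [9., 2.]]]])
```

The 4x4 map shrinks to 2x2, and each output value is the largest entry of its 2x2 square after ReLU.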

### Flattening the Data 
Before the data can enter the linear layers, each image has to be changed from a 3D block to a 1D list.

```x = x.view(-1, self.num_flat_features(x))```

```x.view()``` - PyTorch's way of reshaping a tensor without changing its data.

```-1``` - Wildcard telling PyTorch "I don't know how many images are in the batch, so just figure that part out automatically."

The -1 is one of the most useful tricks in PyTorch. It stands for "whatever size makes the total fit," and it can be used for only one dimension at a time.

For example, take a batch of 10 MNIST images of 28x28 pixels: 10 * 28 * 28 = 7,840 numbers in total. When you run x.view(-1, 784), you are telling PyTorch to rearrange those same 7,840 numbers into a 2D table (a matrix).
- **The 784 part**: This defines the columns. You are saying "I want every image to be represented by a single horizontal line of 784 pixels."
- **The -1 part**: This is an "auto-calculate" button for the rows. PyTorch looks at the total count (7,840) and says: "If I need 784 columns, I must need 10 rows to fit all the data." ($7840 / 784 = 10$).

The new shape: (10, 784)
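The 7,840-number example can be reproduced directly. The batch below is random data standing in for 10 grayscale 28x28 images, which is an assumption made only for this demonstration.

```python
import torch

batch = torch.rand(10, 1, 28, 28)  # assumed: 10 images, 1 channel, 28x28
total = batch.numel()              # 10 * 1 * 28 * 28

flat = batch.view(-1, 784)         # PyTorch infers the row count
also_flat = batch.view(10, -1)     # -1 can stand in for the columns instead

print(total)
print(tuple(flat.shape), tuple(also_flat.shape))
```

Both calls give the same (10, 784) table; ```-1``` simply fills in whichever dimension was left open.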

```self.num_flat_features(x)``` - calculates the number of values per image (channels * height * width), which is 576 in this network.

### The Fully Connected Layers 
Now that the data is a flat list of features it passes through the brain of the network. 

```
x = F.relu(self.fc1(x))
x = F.relu(self.fc2(x))
x = self.fc3(x)
```

The data runs through the first decision layer, followed by an activation function that keeps only the positive responses. The second layer does the same. The final layer, ```fc3```, has no activation: its raw output is returned as the network's answer.


## Calculator (num_flat_features)
This helper figures out exactly how many numbers are in a single image after it has passed through the convolutional layers.

```size = x.size()[1:]```

```x.size()``` returns the size of the tensor x in the form (batch_size, channels, height, width).

```[1:]``` asks for everything except the first entry, the batch size, so size becomes [16, 6, 6].

```
num_features = 1
for s in size: 
    num_features *= s 
return num_features
```
The loop multiplies these dimensions together (16 * 6 * 6 = 576), giving the number of features for each image in the batch.
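For comparison, here is a sketch that runs a stand-alone copy of the helper next to two built-in alternatives, ```Tensor.numel()``` and ```torch.flatten```. The alternatives are shown only as equivalents; the model above does not use them. The batch of 4 random tensors is an arbitrary stand-in for the output of the convolutional layers.

```python
import torch


def num_flat_features(x):
    """Stand-alone copy of the model's helper method."""
    size = x.size()[1:]  # all dimensions except the batch dimension
    num_features = 1
    for s in size:
        num_features *= s
    return num_features


x = torch.rand(4, 16, 6, 6)  # assumed: 4 images after conv and pooling

n = num_flat_features(x)
same_as_numel = n == x[0].numel()  # values in a single image

flat_a = x.view(-1, n)                 # the approach used in forward()
flat_b = torch.flatten(x, start_dim=1) # flatten everything after the batch dim
same_result = torch.equal(flat_a, flat_b)

print(n, same_as_numel, same_result)
```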

Let's instantiate this object and run a sample input through it.

In [9]:
net = LeNet()
print(net) # what does it tell us about itself 

input = torch.rand(1, 1, 32, 32) # (batch_size, color channels, height, width): 1 image, 1 channel, 32x32
print('\nImage batch shape:')
print(input.shape)

output = net(input) # calling the model runs forward(); forward() is not called directly
print('\nRaw output:')
print(output)
print(output.shape)



LeNet(
  (conv1): Conv2d(1, 6, kernel_size=(3, 3), stride=(1, 1))
  (conv2): Conv2d(6, 16, kernel_size=(3, 3), stride=(1, 1))
  (fc1): Linear(in_features=576, out_features=120, bias=True)
  (fc2): Linear(in_features=120, out_features=84, bias=True)
  (fc3): Linear(in_features=84, out_features=10, bias=True)
)

Image batch shape:
torch.Size([1, 1, 32, 32])

Raw output:
tensor([[ 0.0524,  0.0014,  0.0689, -0.0758,  0.0945, -0.0307, -0.1527, -0.0552,
          0.0888,  0.0073]], grad_fn=<AddmmBackward0>)
torch.Size([1, 10])


A subclass of ```torch.nn.Module``` will report the layers it has created along with their shapes and parameters. This provides a handy overview of a model if we want to get the gist of its processing.

```input = torch.rand(1, 1, 32, 32)``` creates a dummy input representing a 32x32 image with 1 color channel. Normally this would be an image tile loaded and converted to a tensor of this shape.

(1 batch_size, 1 color channel, 32 height, 32 width)

The output of net(input) is a set of 10 raw scores, one per digit, representing the model's confidence that the input is that digit. Since the model hasn't learned anything yet, we shouldn't expect any signal in the output.

The batch dimension of the output should match the batch dimension of the input: here, 1 image in and 1 row of 10 scores out.
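To read the raw scores, one option is to convert them to probabilities with softmax and pick the largest with argmax. The sketch below copies the values from the raw output printed above; a fresh run would give different numbers because the weights are initialized randomly.

```python
import torch
import torch.nn.functional as F

# Raw scores copied from the printed output of the untrained network.
logits = torch.tensor([[0.0524, 0.0014, 0.0689, -0.0758, 0.0945,
                        -0.0307, -0.1527, -0.0552, 0.0888, 0.0073]])

probs = F.softmax(logits, dim=1)         # each row sums to 1
predicted = torch.argmax(logits, dim=1)  # index of the highest score

print(probs)
print(predicted)  # tensor([4])
```

Every probability lands close to 0.1, which is what an untrained 10-way classifier should look like: the "guess" of 4 carries no real signal.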

