In [1]:
import torch # used for all pytorch things 
import torch.nn as nn # for torch.nn.Module, the parent for PyTorch models 
import torch.nn.functional as F # for activation function 

![alt text](imgs/screenshot1.png)


Above is LeNet-5, one of the earliest convolutional neural networks. It was built to read small images of handwritten digits (the MNIST dataset) and classify which digit each image represents.

How it works:
- Layer C1 is a convolutional layer. It scans the input image for features it learned during training and outputs a map of where it saw each of those features in the image. This "activation map" is downsampled in layer S2.

- Layer C3 is another convolutional layer. It scans the downsampled activation map from S2 for combinations of features. It also outputs an activation map, describing the spatial locations of these feature combinations, which is downsampled in layer S4.

- Finally, the fully-connected layers at the end (F5, F6, and OUTPUT) form a classifier that takes the final activation map and sorts it into one of ten bins representing the 10 digits.

In code, the network is represented by:

In [None]:
class LeNet(nn.Module):

    def __init__(self):
        super(LeNet, self).__init__()
        # 1 input image channel (grayscale), 6 output channels (feature maps),
        # 3x3 square convolution kernel
        self.conv1 = nn.Conv2d(1, 6, 3)
        # 6 input channels, 16 output channels, 3x3 kernel
        self.conv2 = nn.Conv2d(6, 16, 3)
        # an affine operation: y = Wx + b
        self.fc1 = nn.Linear(16 * 6 * 6, 120) # 16 channels * 6 * 6 spatial size left after both conv/pool stages
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)

    # forward() must be a method of the class, not a function nested inside __init__
    def forward(self, x):
        # Max pooling over a (2,2) window
        x = F.max_pool2d(F.relu(self.conv1(x)), (2,2))
        # If the window is square, you can specify a single number
        x = F.max_pool2d(F.relu(self.conv2(x)), 2)
        # Flatten everything except the batch dimension
        x = x.view(-1, self.num_flat_features(x))
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        return x

    def num_flat_features(self, x):
        size = x.size()[1:] # all dimensions except the batch dimension
        num_features = 1
        for s in size:
            num_features *= s
        return num_features


The code above shows the structure of a typical PyTorch model:
- It inherits from ```torch.nn.Module```. Modules may be nested; in fact, even the ```Conv2d``` and ```Linear``` layer classes inherit from ```torch.nn.Module```.

- It has an ```__init__()``` function, where it instantiates its layers and loads any data artifacts it might need (e.g. an NLP model might load a vocabulary).

- It has a ```forward()``` function. This is where the actual computation happens: an input is passed through the network layers and various functions to generate an output.

- Aside from that, we can build the model like any other Python class, adding whatever properties and methods are needed to support its computation.
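
As a minimal sketch of the pattern described in the list above: the class `TinyClassifier` and its layer sizes are hypothetical, chosen only for illustration, and are not part of LeNet.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class TinyClassifier(nn.Module):
    """Hypothetical two-layer model, used only to illustrate the pattern."""

    def __init__(self, in_features=4, hidden=8, num_classes=3):
        super().__init__()
        # Layers are instantiated once, in __init__()
        self.hidden = nn.Linear(in_features, hidden)
        self.out = nn.Linear(hidden, num_classes)

    def forward(self, x):
        # The actual computation: hidden layer -> ReLU -> output layer
        return self.out(F.relu(self.hidden(x)))


model = TinyClassifier()

# Built-in layers are themselves nn.Module subclasses, which is why modules nest
print(issubclass(nn.Linear, nn.Module), issubclass(nn.Conv2d, nn.Module))

# Layers assigned as attributes are registered as named child modules
print([name for name, _ in model.named_children()])

# Calling the model runs forward(); a batch of 2 inputs gives 2 rows of 3 scores
y = model(torch.zeros(2, 4))
print(y.shape)
```

Calling `model(x)` rather than `model.forward(x)` is the usual convention, since the call goes through `nn.Module.__call__`, which also runs any registered hooks.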



Understanding the code
## Structure
```__init__()```
creates the layers the network will use, before any data flows through it.

```conv1, conv2```
Convolutional layers: "feature detectors" for patterns such as edges, curves, and corners.
```conv1``` takes one input channel and creates 6 "feature maps" using a 3x3 kernel. ```conv2``` takes those 6 maps and creates 16, again with a 3x3 kernel.

```self.fc*```
"Fully connected" (linear) layers. Once the convolutional layers have found the features, these layers act as a traditional classifier that makes sense of them and decides: "Based on these curves, this is likely the number 5."

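The `16 * 6 * 6` input size of ```fc1``` follows from the layer settings in the class. The sketch below traces the arithmetic in plain Python. It assumes a 32x32 input image, which is not stated in the text but is the size consistent with a 6x6 result, and it relies on the defaults of `nn.Conv2d` (stride 1, no padding) and `F.max_pool2d` (stride equal to the window size, sizes rounded down). The helper names `conv_out` and `pool_out` are made up for this illustration.

```python
def conv_out(size, kernel):
    """Output width/height of a convolution with stride 1 and no padding."""
    return size - kernel + 1


def pool_out(size, window):
    """Output width/height of max pooling with stride == window (rounded down)."""
    return size // window


size = 32                  # assumed input height/width
size = conv_out(size, 3)   # conv1: 32 -> 30
size = pool_out(size, 2)   # pool:  30 -> 15
size = conv_out(size, 3)   # conv2: 15 -> 13
size = pool_out(size, 2)   # pool:  13 -> 6

channels = 16              # output channels of conv2
flat_features = channels * size * size
print(size, flat_features)  # 6 576
```

If the input size or kernel size changes, the first argument of ```fc1``` has to change with it.
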
## Data Flow 
The ```forward()``` method defines the path the image takes through the network. Steps 1 to 3 run twice, once for each convolutional layer.

step 1. <strong>Convolution</strong>: scans the image for patterns.

step 2. <strong>ReLU</strong>: an activation function that sets negative values to zero, which adds non-linearity.

step 3. <strong>Max Pooling</strong>: halves the height and width of each feature map by keeping only the largest value in each 2x2 window. This reduces computation and keeps the strongest responses.

step 4. <strong>Flattening</strong>: converts the 3D block of data for each image (channels x height x width) into one long 1D list of numbers, so the linear layers can read it.

step 5. <strong>Output</strong>: the final layer ```fc3``` produces 10 numbers, one per digit. The position of the highest number is the network's "guess".
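
To make steps 2 to 5 concrete, here is a small plain-Python sketch on a made-up 4x4 feature map. It is written without PyTorch so that each operation is visible; `F.relu` and `F.max_pool2d` perform the same arithmetic on tensors. The helper functions, the input values, and the `scores` list are all invented for illustration.

```python
def relu(grid):
    """Step 2: replace every negative value with zero."""
    return [[max(0, v) for v in row] for row in grid]


def max_pool_2x2(grid):
    """Step 3: keep the largest value in each non-overlapping 2x2 window."""
    return [
        [max(grid[r][c], grid[r][c + 1], grid[r + 1][c], grid[r + 1][c + 1])
         for c in range(0, len(grid[0]) - 1, 2)]
        for r in range(0, len(grid) - 1, 2)
    ]


def flatten(grid):
    """Step 4: turn the 2D grid into one flat list."""
    return [v for row in grid for v in row]


# A made-up 4x4 feature map, standing in for one output channel of a convolution
feature_map = [
    [1, -2, 3, 0],
    [-1, 5, -3, 2],
    [0, -4, -1, -6],
    [7, -8, -2, 4],
]

activated = relu(feature_map)
pooled = max_pool_2x2(activated)   # 4x4 -> 2x2
flat = flatten(pooled)
print(pooled)  # [[5, 3], [7, 4]]
print(flat)    # [5, 3, 7, 4]

# Step 5: with 10 made-up output scores, the guess is the index of the largest one
scores = [0.1, 2.5, -1.0, 0.3, 0.0, 1.2, -0.7, 0.9, 0.4, -0.2]
guess = max(range(len(scores)), key=scores.__getitem__)
print(guess)   # 1
```
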





Let's instantiate this object and run a sample input through it.