# CNN with PyTorch
## An Inside Look into Layers

### Architecture

#### Parameter vs. Argument
- Parameter: a name used inside the function definition. Parameters are placeholders.
- Argument: a value passed in when the function is called.
- Names such as in_channels, out_channels, and kernel_size are the parameters; the values we pass in for them are the arguments.
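To make the distinction concrete, here is a minimal sketch in plain Python. The function `make_conv_config` is hypothetical, invented only for illustration; it simply reuses the keyword names that `nn.Conv2d` takes.

```python
import inspect

def make_conv_config(in_channels, out_channels, kernel_size):
    """Hypothetical helper: in_channels, out_channels, kernel_size are parameters."""
    return {
        "in_channels": in_channels,
        "out_channels": out_channels,
        "kernel_size": kernel_size,
    }

# Parameters are the names that appear in the definition.
parameter_names = list(inspect.signature(make_conv_config).parameters)
print(parameter_names)

# Arguments are the values supplied at call time.
config = make_conv_config(in_channels=1, out_channels=6, kernel_size=5)
print(config)
```

Here 1, 6, and 5 are the arguments bound to the three parameters.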
##### Two types of parameters:
- Hyperparameters: values chosen manually, usually by trial and error. For a CNN, we choose kernel_size, out_channels, and out_features manually.
    - kernel_size: sets the size of the filter. The words kernel and filter are interchangeable.
    - out_channels: sets the number of filters. Each filter produces one output channel, so choosing the number of filters also fixes the number of output channels. These output channels are also called feature maps.
    - out_features: sets how many nodes we want in a linear layer. The output of a linear layer is just a rank-one tensor, so we speak of out_features rather than out_channels.

Typically, we increase out_channels as we add convolutional layers and decrease out_features as we move through the linear layers.

- Data dependent hyperparameters: values that depend on the data.
    - in_channels: for the first convolutional layer, depends on the number of color channels in the images. For grayscale images, this is 1.
    - out_features: for the output layer, depends on the number of categories we want to predict.
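As a rough illustration of the data-dependent values, the sketch below reads them off a description of a dataset. The image shape and class names are made-up placeholders, not taken from any dataset used in this notebook.

```python
# Hypothetical dataset description (assumed values, for illustration only).
image_shape = (1, 28, 28)              # (channels, height, width) of one grayscale image
class_names = ["cat", "dog", "bird"]   # assumed categories

# in_channels of the first convolutional layer comes from the image itself.
in_channels = image_shape[0]

# out_features of the final linear layer is the number of categories.
out_features = len(class_names)

print(in_channels, out_features)
```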


### Learnable Parameters:

Learnable parameters are values learned during the training process. We typically start out with a set of arbitrary values, and these values are updated iteratively as the network learns.
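The idea of starting from an arbitrary value and updating it step by step can be sketched without PyTorch. The example below is a toy, not this network's training loop: it fits a single weight `w` with plain gradient descent, and the data, starting value, and learning rate are all assumptions chosen for illustration.

```python
# Toy data generated from y = 2 * x, so the "right" weight is 2.
xs = [1.0, 2.0, 3.0]
ys = [2.0, 4.0, 6.0]

w = 0.5                # arbitrary starting value for the learnable parameter
learning_rate = 0.05   # a hyperparameter, chosen by hand

for step in range(100):
    # Gradient of the mean squared error with respect to w.
    grad = sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)
    # Iterative update: move w a small step against the gradient.
    w -= learning_rate * grad

print(round(w, 4))   # very close to 2.0
```

The learning rate is picked by hand, while `w` is learned, which is the same split between hyperparameters and learnable parameters described here.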

In [5]:
import torch
import torch.nn as nn

In [6]:
class Network(nn.Module):
    def __init__(self):
        super(Network, self).__init__()
        self.conv1 = nn.Conv2d(in_channels=1, out_channels=6, kernel_size=5)
        self.conv2 = nn.Conv2d(in_channels=6, out_channels=12, kernel_size=5)
        
        self.fc1 = nn.Linear(in_features=12*4*4, out_features=120)
        self.fc2 = nn.Linear(in_features=120, out_features=60)
        self.out = nn.Linear(in_features=60, out_features=10)
    
    def forward(self, t):
        # implement the forward pass
        return t

In [7]:
network = Network()
print(network)

# stride tells the filter how far to move after each calculation step (the default is 1)

Network(
  (conv1): Conv2d(1, 6, kernel_size=(5, 5), stride=(1, 1))
  (conv2): Conv2d(6, 12, kernel_size=(5, 5), stride=(1, 1))
  (fc1): Linear(in_features=192, out_features=120, bias=True)
  (fc2): Linear(in_features=120, out_features=60, bias=True)
  (out): Linear(in_features=60, out_features=10, bias=True)
)
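The `in_features=12*4*4` passed to `fc1` can be reproduced with the usual output-size formula for a square convolution, floor((n + 2*padding - kernel_size) / stride) + 1. The sketch below assumes 28*28 input images and a 2*2 max-pooling step with stride 2 after each convolution. Neither is stated in this notebook, so treat both as assumptions.

```python
def conv_output_size(n, kernel_size, stride=1, padding=0):
    """Side length of the output of a square convolution or pooling window."""
    return (n + 2 * padding - kernel_size) // stride + 1

size = 28                                                # assumed input height and width
size = conv_output_size(size, kernel_size=5)             # conv1: 28 -> 24
size = conv_output_size(size, kernel_size=2, stride=2)   # assumed pooling: 24 -> 12
size = conv_output_size(size, kernel_size=5)             # conv2: 12 -> 8
size = conv_output_size(size, kernel_size=2, stride=2)   # assumed pooling: 8 -> 4

in_features = 12 * size * size   # 12 output channels from conv2
print(size, in_features)         # 4 192
```

Under those assumptions, the result matches the `in_features=192` shown for `fc1` in the printed network.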


In [10]:
# We can override what print(network) shows by defining __repr__.
# The method must be placed inside the Network class to take effect:
def __repr__(self):
    return "lizardnet"
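A minimal sketch of the `__repr__` override in plain Python follows. The classes `PlainNet` and `LizardNet` are hypothetical stand-ins that do not inherit from `nn.Module`; they only show how the method changes the printed text.

```python
class PlainNet:
    """Hypothetical class that keeps Python's default __repr__."""
    pass

class LizardNet:
    """Hypothetical class that overrides __repr__."""
    def __repr__(self):
        return "lizardnet"

default_text = repr(PlainNet())   # something like '<__main__.PlainNet object at 0x...>'
custom_text = repr(LizardNet())

print(default_text)
print(custom_text)                # lizardnet
```

`print` falls back to `__repr__` when a class defines no `__str__`, which is why overriding it changes the printed text.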

In [11]:
network.conv1 # access a single layer by its attribute name

Conv2d(1, 6, kernel_size=(5, 5), stride=(1, 1))

In [12]:
network.fc2 # linear layers are accessed the same way

Linear(in_features=120, out_features=60, bias=True)

In [13]:
network.fc2.weight # randomly generated weights

Parameter containing:
tensor([[ 0.0678,  0.0245, -0.0799,  ...,  0.0083,  0.0099,  0.0824],
        [ 0.0149, -0.0048, -0.0633,  ...,  0.0733, -0.0556,  0.0379],
        [ 0.0190,  0.0604, -0.0444,  ..., -0.0388, -0.0067, -0.0772],
        ...,
        [-0.0204,  0.0587, -0.0345,  ...,  0.0458, -0.0430, -0.0665],
        [-0.0631,  0.0422,  0.0673,  ..., -0.0908,  0.0069, -0.0244],
        [ 0.0136, -0.0552,  0.0018,  ..., -0.0795,  0.0826,  0.0712]],
       requires_grad=True)

In [14]:
network.fc2.weight.shape

torch.Size([60, 120])

In [15]:
# conv1 was defined in the Network class as:
# nn.Conv2d(in_channels=1, out_channels=6, kernel_size=5)
network.conv1.weight.shape

torch.Size([6, 1, 5, 5])

The shape tells us everything we need to know about the tensor.
From the shape, we can read off what we passed to nn.Conv2d:
out_channels = 6 (the number of filters),
in_channels = 1,
and the last two 5s stand for a filter of size 5*5.

In [16]:
network.conv2.weight.shape # shape of second layer's weights

torch.Size([12, 6, 5, 5])

12 filters, 6 input channels coming from the previous layer, the last two 5s stand for filter size 5*5

1. All filters are represented using a single tensor.
2. Filters have a depth that matches the number of input channels (the color channels, for the first layer).
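The relationship between the `nn.Conv2d` arguments and the weight shape can be sketched as a small function. The helper `conv2d_weight_shape` is hypothetical and assumes a square kernel and the default `groups=1`.

```python
import math

def conv2d_weight_shape(in_channels, out_channels, kernel_size):
    """Hypothetical helper: weight shape for a square kernel with groups=1."""
    return (out_channels, in_channels, kernel_size, kernel_size)

conv1_shape = conv2d_weight_shape(in_channels=1, out_channels=6, kernel_size=5)
conv2_shape = conv2d_weight_shape(in_channels=6, out_channels=12, kernel_size=5)

# Shape and number of individual weights in each layer.
print(conv1_shape, math.prod(conv1_shape))   # (6, 1, 5, 5) 150
print(conv2_shape, math.prod(conv2_shape))   # (12, 6, 5, 5) 1800

# One filter of conv2 is a slice along the first axis: depth 6, size 5*5.
single_filter_shape = conv2_shape[1:]
print(single_filter_shape)                   # (6, 5, 5)
```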

In [17]:
# We can get a single filter out by indexing the first axis:
network.conv2.weight[0].shape
# the shape of one filter: a depth of 6 and a size of 5*5


torch.Size([6, 5, 5])

In [19]:
# Let's take a look at the weight tensors for linear layers
print(network.fc1.weight.shape)
print(network.fc2.weight.shape)
print(network.out.weight.shape)

torch.Size([120, 192])
torch.Size([60, 120])
torch.Size([10, 60])


In [20]:
# 120, 60, and 10 are the out_features of the three linear layers.
# 192, 120, and 60 are the in_features that we defined for those layers.
# The weight tensor is a matrix used in a matrix multiplication, which is why
# the out_features come first (the height) and the in_features second (the width).
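Why the weight matrix is stored as (out_features, in_features) can be seen with a tiny matrix-vector product. This is a plain-Python sketch with made-up numbers and no bias term. The helper `matvec` is hypothetical and is not how PyTorch computes the product internally.

```python
def matvec(weight, vector):
    """Multiply a (rows x cols) matrix, given as nested lists, by a vector."""
    return [sum(w * v for w, v in zip(row, vector)) for row in weight]

# A tiny linear layer with in_features=3 and out_features=2 (bias omitted).
weight = [
    [1, 0, 2],   # row 0 produces output feature 0
    [0, 1, 1],   # row 1 produces output feature 1
]
inputs = [3, 4, 5]

outputs = matvec(weight, inputs)
print(len(weight), len(weight[0]))   # 2 3 -> (out_features, in_features)
print(outputs)                       # [13, 9]
```

Each row of the matrix holds the weights for one output node, so the number of rows equals out_features and the row length equals in_features.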


In [21]:
for param in network.parameters():
    print(param.shape)

torch.Size([6, 1, 5, 5])
torch.Size([6])
torch.Size([12, 6, 5, 5])
torch.Size([12])
torch.Size([120, 192])
torch.Size([120])
torch.Size([60, 120])
torch.Size([60])
torch.Size([10, 60])
torch.Size([10])


In [24]:
for name, param in network.named_parameters():
    print(name, '\t\t', param.shape)

conv1.weight 		 torch.Size([6, 1, 5, 5])
conv1.bias 		 torch.Size([6])
conv2.weight 		 torch.Size([12, 6, 5, 5])
conv2.bias 		 torch.Size([12])
fc1.weight 		 torch.Size([120, 192])
fc1.bias 		 torch.Size([120])
fc2.weight 		 torch.Size([60, 120])
fc2.bias 		 torch.Size([60])
out.weight 		 torch.Size([10, 60])
out.bias 		 torch.Size([10])


In [25]:
# bias is also a learnable parameter, for the convolutional layers as well as the linear layers.
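As a quick check, the total number of learnable parameters follows directly from the shapes printed by `named_parameters()`. The sketch below copies those shapes into a plain dictionary, so it runs without PyTorch; `param_shapes` is just an illustrative name.

```python
import math

# Shapes copied from the named_parameters() output.
param_shapes = {
    "conv1.weight": (6, 1, 5, 5),
    "conv1.bias": (6,),
    "conv2.weight": (12, 6, 5, 5),
    "conv2.bias": (12,),
    "fc1.weight": (120, 192),
    "fc1.bias": (120,),
    "fc2.weight": (60, 120),
    "fc2.bias": (60,),
    "out.weight": (10, 60),
    "out.bias": (10,),
}

# The number of values in a tensor is the product of its dimensions.
counts = {name: math.prod(shape) for name, shape in param_shapes.items()}
total = sum(counts.values())

print(counts["fc1.weight"])   # 23040
print(total)                  # 32998
```

With the actual network object, `sum(p.numel() for p in network.parameters())` should give the same total.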