Each layer in a neural network has two primary components:

1. A transformation (code)
2. A collection of weights (data or attributes)

So, a layer can be represented by a class.

#### Neural networks and layers in PyTorch extend the nn.Module class. This means that we must extend the nn.Module class when building a new layer or neural network in PyTorch.

### PyTorch nn.Modules have a forward() method
The tensor input is passed forward through the network.

Each layer has its own transformation (code) and the tensor passes forward through each layer. The composition of all the individual layer forward passes defines the overall forward pass transformation for the network.

The goal of the overall transformation is to map the input to the correct prediction output class. During the training process, the layer weights (data) are updated in such a way that the mapping adjusts to bring the output closer to the correct prediction.

What this all means is that every PyTorch nn.Module has a forward() method, so when we build layers and networks, we must provide an implementation of forward(). The forward method is the actual transformation.

When we implement the forward() method of our nn.Module subclass, we will typically use functions from the nn.functional package.

# Building a neural network in PyTorch

Steps :- 
1. Create a neural network class that extends the nn.Module base class.
2. In the class constructor, define the network’s layers as class attributes using pre-built layers from torch.nn.
3. Use the network’s layer attributes as well as operations from the nn.functional API to define the network’s forward pass.
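The three steps above can be sketched with a deliberately tiny network. The class name `TinyNet`, its layer sizes, and the random input batch are hypothetical, chosen only for illustration; this is a minimal sketch and not the network built later in this section.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class TinyNet(nn.Module):  # step 1: extend nn.Module
    def __init__(self):
        super().__init__()
        # step 2: pre-built layers from torch.nn stored as attributes
        self.hidden = nn.Linear(in_features=4, out_features=8)
        self.out = nn.Linear(in_features=8, out_features=3)

    def forward(self, t):
        # step 3: combine the layer attributes with nn.functional operations
        t = F.relu(self.hidden(t))
        return self.out(t)


tiny = TinyNet()
batch = torch.rand(2, 4)   # hypothetical batch: 2 samples, 4 features each
prediction = tiny(batch)   # calling the module runs forward()
print(prediction.shape)    # torch.Size([2, 3])
```

Calling `tiny(batch)` rather than `tiny.forward(batch)` is the usual pattern, since the call goes through nn.Module before reaching forward().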

In [5]:
import torch 
import torch.nn as nn

In [6]:
# PyTorch neural network
class Network(nn.Module): # class extends the nn.Module class 
    def __init__(self):
        super().__init__() #calls the init function of super class 
        self.layer = None   #single dummy layer 

    def forward(self, t):
        t = self.layer(t)#the forward() function takes in a tensor t and transforms it using the dummy layer. 
        return t #After the tensor is transformed, the new tensor is returned.

In [32]:
#At the moment, our Network class has a single dummy layer as an attribute. 
#Let’s replace this now with some real layers that come pre-built for us from PyTorch's nn library
class Network(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(in_channels=1, out_channels=6, kernel_size=5)
        self.conv2 = nn.Conv2d(in_channels=6, out_channels=12, kernel_size=5)
        
        self.fc1 = nn.Linear(in_features=12 * 4 * 4, out_features=120)
        self.fc2 = nn.Linear(in_features=120, out_features=60)  # linear, dense, and fully connected all name the same kind of layer
        self.out = nn.Linear(in_features=60, out_features=10)
    def forward(self, t):
        # implement the forward pass
        return t
    # def __repr__(self):
    #     return "AYUSH RANJAN"

# The parameters of the layers in PyTorch

Each of our layers extends PyTorch's neural network Module class. For each layer, there are two primary items encapsulated inside, a forward function definition and a weight tensor.

The weight tensor inside each layer contains the weight values that are updated as the network learns during the training process, and this is the reason we are specifying our layers as attributes inside our Network class.

PyTorch's neural network Module class keeps track of the weight tensors inside each layer. The code that does this tracking lives inside the nn.Module class, and since we are extending the neural network module class, we inherit this functionality automatically.
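A small experiment can make this tracking visible. The sketch below uses two hypothetical classes, `Tracked` and `Untracked`: the first stores its layer as an attribute, the second hides it inside a plain Python list, which nn.Module does not inspect.

```python
import torch.nn as nn


class Tracked(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(3, 2)        # attribute: registered by nn.Module


class Untracked(nn.Module):
    def __init__(self):
        super().__init__()
        self.layers = [nn.Linear(3, 2)]  # plain Python list: not registered


tracked_count = len(list(Tracked().parameters()))      # weight and bias
untracked_count = len(list(Untracked().parameters()))  # nothing is found
print(tracked_count, untracked_count)                  # 2 0
```

Parameters that are not registered would not be seen by an optimizer built from `parameters()`, which is one reason to define layers as attributes.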

### CNN Layer Parameters

Parameters are used in function definitions as place-holders, while arguments are the actual values that are passed to the function.
There are two types of parameters that we used when constructing our layers:

1. Hyperparameters
2. Data dependent hyperparameters

When we construct a layer, we pass values for each parameter to the layer’s constructor. In our network, the convolutional layers are given three parameters and the linear layers two.

Convolutional layers
1. in_channels
2. out_channels
3. kernel_size


Linear layers
1. in_features
2. out_features


Let's see how the values for the parameters are decided. We'll start by looking at hyperparameters, and then, we'll see how the dependent hyperparameters fall into place.

### Hyperparameter 
Hyperparameters are parameters whose values are chosen manually and arbitrarily. We choose hyperparameter values mainly based on trial and error, and increasingly by using values that have proven to work well in the past.

In our CNN layers, these are the parameters we choose manually.

1. kernel_size
2. out_channels
3. out_features

### Data dependent hyperparameters

Data dependent hyperparameters are parameters whose values are dependent on data. The first two data dependent hyperparameters that stick out are the in_channels of the first convolutional layer, and the out_features of the output layer.

You see, the in_channels of the first convolutional layer depend on the number of color channels present inside the images that make up the training set. Since we are dealing with grayscale images, we know that this value should be a 1.

The out_features for the output layer depend on the number of classes that are present inside our training set. Since we have 10 classes of clothing inside the Fashion-MNIST dataset, we know that we need 10 output features.

In general, the input to one layer is the output from the previous layer, and so all of the in_channels in the conv layers and in_features in the linear layers depend on the data coming from the previous layer.


When we switch from a conv layer to a linear layer, we have to flatten our tensor. This is why we have 12*4*4. The twelve comes from the number of output channels in the previous layer, but why do we have the two 4s?
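One way to see where the two 4s come from is to trace the spatial size of the tensor through the network. The sketch below assumes 28 x 28 Fashion-MNIST images and a 2 x 2 max-pooling step with stride 2 after each conv layer. The pooling step is an assumption, since the forward pass is not implemented in this section. The helper `output_size` is a hypothetical name that applies the standard no-padding formula, floor((n - kernel_size) / stride) + 1.

```python
def output_size(n, kernel_size, stride=1):
    """Spatial size after sliding a window of kernel_size over n pixels (no padding)."""
    return (n - kernel_size) // stride + 1


size = 28                                           # assumed input height and width
size = output_size(size, kernel_size=5)             # conv1: 28 -> 24
size = output_size(size, kernel_size=2, stride=2)   # assumed max pool: 24 -> 12
size = output_size(size, kernel_size=5)             # conv2: 12 -> 8
size = output_size(size, kernel_size=2, stride=2)   # assumed max pool: 8 -> 4

flattened = 12 * size * size                        # 12 output channels from conv2
print(size, flattened)                              # 4 192
```

Under these assumptions the height and width both shrink to 4, so flattening the 12 channels gives 12 * 4 * 4 = 192 input features for fc1.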

# CNN Weights - Learnable Parameters in Neural Networks
Learnable parameters are parameters whose values are learned during the training process.

In fact, when we say that a network is learning, we specifically mean that the network is learning the appropriate values for the learnable parameters. Appropriate values are values that minimize the loss function.
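As a rough illustration of what "learning" means here, the sketch below takes a single gradient-descent step on one small linear layer. The data, the layer size, the learning rate, and the choice of mean squared error are all hypothetical and only serve to show the loss moving down as the learnable parameters are updated.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)                      # make the random numbers repeatable

layer = nn.Linear(in_features=3, out_features=1)
inputs = torch.rand(8, 3)                 # hypothetical data
targets = torch.rand(8, 1)
optimizer = torch.optim.SGD(layer.parameters(), lr=0.01)

loss_before = F.mse_loss(layer(inputs), targets)
optimizer.zero_grad()
loss_before.backward()                    # gradients of the loss w.r.t. weight and bias
optimizer.step()                          # nudge weight and bias against the gradient

loss_after = F.mse_loss(layer(inputs), targets)
print(loss_before.item(), loss_after.item())
```

With a small learning rate like this one, the second printed value should be slightly lower than the first.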

In [33]:
# Getting an instance of the Network
network = Network()

In [34]:
print(network)
#The print() function prints to the console a string representation of our network. 
#With a sharp eye,
#we can notice that the printed output here is detailing our network’s architecture listing out our network’s layers, 
#and showing the values that were passed to the layer constructors.

Network(
  (conv1): Conv2d(1, 6, kernel_size=(5, 5), stride=(1, 1))
  (conv2): Conv2d(6, 12, kernel_size=(5, 5), stride=(1, 1))
  (fc1): Linear(in_features=192, out_features=120, bias=True)
  (fc2): Linear(in_features=120, out_features=60, bias=True)
  (out): Linear(in_features=60, out_features=10, bias=True)
)


In [21]:
print(network)
# if Network were a plain class that did not inherit from nn.Module, print would give us the default object representation

<__main__.Network object at 0x12533c400>


### We can override Python’s default string representation using the __repr__ function. This name is short for representation. This is what happened when we made Network a child of the nn.Module class: nn.Module overrides the default __repr__ to list the layer details.


Let's try to change the output for the above case.

In [22]:
def __repr__(self):
    return "AYUSH RANJAN"
# the code above is copied into the class whose object we made

In [26]:
print(network)
# now the output is whatever we want

AYUSH RANJAN
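The same behaviour can be reproduced without PyTorch. In this minimal sketch, `Plain` and `Described` are hypothetical classes: the first keeps Python's default representation, the second overrides __repr__.

```python
class Plain:
    pass


class Described:
    def __repr__(self):
        # print() falls back to __repr__ when __str__ is not defined
        return "Described(custom text)"


print(Plain())       # default form, e.g. <__main__.Plain object at 0x...>
print(Described())   # Described(custom text)
```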


In [35]:
#Coming back to where we were ..
network

Network(
  (conv1): Conv2d(1, 6, kernel_size=(5, 5), stride=(1, 1))
  (conv2): Conv2d(6, 12, kernel_size=(5, 5), stride=(1, 1))
  (fc1): Linear(in_features=192, out_features=120, bias=True)
  (fc2): Linear(in_features=120, out_features=60, bias=True)
  (out): Linear(in_features=60, out_features=10, bias=True)
)

For the convolutional layers, the kernel_size argument is a Python tuple (5, 5) even though we only passed the number 5 in the constructor. This is because our filters actually have a height and width, and when we pass a single number, the code inside the layer’s constructor assumes that we want a square filter.

The stride tells the conv layer how far the filter should slide after each operation in the overall convolution. The tuple (1, 1) says to slide by one unit when moving to the right and also by one unit when moving down.
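Both points can be checked directly on a layer. The sketch below assumes a single random 28 x 28 grayscale image as input, which is a hypothetical stand-in for real data, and adds a second layer with a stride of 2 purely for comparison.

```python
import torch
import torch.nn as nn

conv = nn.Conv2d(in_channels=1, out_channels=6, kernel_size=5)
print(conv.kernel_size, conv.stride)      # (5, 5) (1, 1)

image = torch.rand(1, 1, 28, 28)          # hypothetical batch: one 28 x 28 grayscale image
print(conv(image).shape)                  # torch.Size([1, 6, 24, 24])

# A stride of 2 moves the filter two units at a time, so fewer positions fit.
strided = nn.Conv2d(in_channels=1, out_channels=6, kernel_size=5, stride=2)
print(strided(image).shape)               # torch.Size([1, 6, 12, 12])
```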


### Accessing the Network's Layers

In [36]:
network.conv1

Conv2d(1, 6, kernel_size=(5, 5), stride=(1, 1))

In [37]:
network.fc1

Linear(in_features=192, out_features=120, bias=True)

In [38]:
network.out

Linear(in_features=60, out_features=10, bias=True)

### Accessing the Layer Weights

In [39]:
network.conv1.weight

Parameter containing:
tensor([[[[ 0.1020,  0.1215, -0.1171, -0.1623,  0.1411],
          [-0.0527,  0.1480,  0.0073,  0.1955,  0.0518],
          [-0.1070, -0.0038, -0.1312, -0.1623,  0.1520],
          [ 0.1162,  0.0866, -0.1269,  0.1381,  0.0608],
          [-0.1013,  0.1447,  0.1565, -0.0279, -0.1306]]],


        [[[-0.1022,  0.0904,  0.1983,  0.0466,  0.0404],
          [-0.0107, -0.1948,  0.0105, -0.1969, -0.1783],
          [-0.1969,  0.1460,  0.1940,  0.1031,  0.1218],
          [-0.1357,  0.1991, -0.1208, -0.0384,  0.1914],
          [ 0.1424,  0.1173,  0.0513,  0.1940, -0.1058]]],


        [[[-0.1823,  0.0417,  0.0566, -0.0692, -0.0958],
          [-0.1608, -0.0551, -0.0730,  0.1810,  0.0611],
          [-0.1811,  0.1298, -0.1511,  0.1434,  0.0097],
          [-0.0723, -0.0159,  0.1773,  0.1791, -0.0224],
          [ 0.0687,  0.1868, -0.1445,  0.0323,  0.1713]]],


        [[[ 0.1980,  0.1130,  0.1308,  0.1784,  0.0372],
          [-0.0367, -0.0877,  0.1149,  0.0960,  0.1227

#### The output is a tensor, but before we look at the tensor, let’s talk OOP for a moment. This is a good example that showcases how objects are nested. We first access the conv layer object that lives inside the network object. Then, we access the weight tensor object that lives inside the conv layer object, so all of these objects are chained or linked together.


One thing to notice about the weight tensor output is that it says Parameter containing at the top of the output. This is because this particular tensor is a special tensor: its values, or scalar components, are learnable parameters of our network.

To keep track of all the weight tensors inside the network, PyTorch has a special class called Parameter. The Parameter class extends the tensor class, and so the weight tensor inside every layer is an instance of this Parameter class. This is why we see the Parameter containing text at the top of the string representation output.

We can see in the PyTorch source code that the Parameter class overrides the __repr__ function by prepending the text Parameter containing to the regular tensor class representation output.
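These claims about the Parameter class can be verified on any layer. The sketch below uses a small, hypothetical linear layer rather than the network from this section.

```python
import torch
import torch.nn as nn

layer = nn.Linear(in_features=2, out_features=1)
weight = layer.weight

print(isinstance(weight, nn.Parameter))   # True
print(isinstance(weight, torch.Tensor))   # True: Parameter extends the tensor class
print(weight.requires_grad)               # True: gradients are tracked by default
print(repr(weight).splitlines()[0])       # Parameter containing:
```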

# Weight Tensor Shape

For the convolutional layers, the weight values live inside the filters, and in code, the filters are actually the weight tensors themselves.

In [40]:
network.conv1
# we have 1 color channel that should be convolved by 6 filters of size 5x5 to produce 6 output channels

Conv2d(1, 6, kernel_size=(5, 5), stride=(1, 1))

In [42]:
network.conv1.weight.shape
#The first axis has a length of 6, and this accounts for the 6 filters.
#The second axis has a length of 1 which accounts for the single input channel, 
#and the last two axes account for the height and width of the filter.
#This means we are packaging all of our filters into a single tensor.

torch.Size([6, 1, 5, 5])

In [43]:
network.conv2

Conv2d(6, 12, kernel_size=(5, 5), stride=(1, 1))

In [44]:
network.conv2.weight.shape
#there are 6 input channels coming from the previous layer.
#Think of this value of 6 here as giving each of the filters some depth. 
#Instead of having a filter that convolves all of the channels iteratively, 
#our filter has a depth that matches the number of channels.

torch.Size([12, 6, 5, 5])

2 points to take away from this :-
1. All filters are represented using a single tensor.
2. Filters have depth that accounts for the input channels.

For the weight tensor, the 4 axes represent:-
(Number of filters, Depth, Height, Width)
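Indexing into the weight tensor shows how these axes nest. This is a minimal sketch that builds a standalone layer with the same arguments as conv2, so the values are random but the shapes match.

```python
import torch.nn as nn

conv2 = nn.Conv2d(in_channels=6, out_channels=12, kernel_size=5)

print(conv2.weight.shape)        # torch.Size([12, 6, 5, 5])
print(conv2.weight[0].shape)     # torch.Size([6, 5, 5]): one filter with depth 6
print(conv2.weight[0][0].shape)  # torch.Size([5, 5]): one 5 x 5 slice of that filter
print(conv2.weight.numel())      # 1800 = 12 * 6 * 5 * 5
```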


##### With linear layers or fully connected layers, we have flattened rank-1 tensors as input and as output. The way we transform the in_features to the out_features in a linear layer is by using a rank-2 tensor that is commonly called a weight matrix.
It is called a matrix because the weight tensor is of rank 2, with height and width axes.

In [45]:
network.fc1.weight.shape

torch.Size([120, 192])

In [51]:
network.fc2.weight.shape  

torch.Size([60, 120])

In [52]:
network.out.weight.shape  

torch.Size([10, 60])

In [53]:
network

Network(
  (conv1): Conv2d(1, 6, kernel_size=(5, 5), stride=(1, 1))
  (conv2): Conv2d(6, 12, kernel_size=(5, 5), stride=(1, 1))
  (fc1): Linear(in_features=192, out_features=120, bias=True)
  (fc2): Linear(in_features=120, out_features=60, bias=True)
  (out): Linear(in_features=60, out_features=10, bias=True)
)

The height of the weight tensor equals the number of desired output features, and its width equals the number of input features.
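This layout is what lets a matrix multiplication map in_features to out_features. The sketch below uses a hypothetical 4-to-3 linear layer and compares the layer's own output with the weight matrix applied by hand.

```python
import torch
import torch.nn as nn

fc = nn.Linear(in_features=4, out_features=3)   # hypothetical small layer
x = torch.rand(4)                               # flattened rank-1 input

by_layer = fc(x)
by_hand = fc.weight.matmul(x) + fc.bias         # (3, 4) matrix times (4,) vector, plus bias

print(fc.weight.shape)                          # torch.Size([3, 4])
print(by_layer.shape)                           # torch.Size([3])
print(torch.allclose(by_layer, by_hand, atol=1e-6))
```

The last line should print True, up to floating-point rounding, since the layer computes the same product internally.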
 
# Accessing the Network's Parameters

In [54]:
for p in network.parameters():
    print(p.shape)

torch.Size([6, 1, 5, 5])
torch.Size([6])
torch.Size([12, 6, 5, 5])
torch.Size([12])
torch.Size([120, 192])
torch.Size([120])
torch.Size([60, 120])
torch.Size([60])
torch.Size([10, 60])
torch.Size([10])


In [63]:
for name, p in network.named_parameters():
    print(name,'\t \t' ,p.shape)

conv1.weight 	 	 torch.Size([6, 1, 5, 5])
conv1.bias 	 	 torch.Size([6])
conv2.weight 	 	 torch.Size([12, 6, 5, 5])
conv2.bias 	 	 torch.Size([12])
fc1.weight 	 	 torch.Size([120, 192])
fc1.bias 	 	 torch.Size([120])
fc2.weight 	 	 torch.Size([60, 120])
fc2.bias 	 	 torch.Size([60])
out.weight 	 	 torch.Size([10, 60])
out.bias 	 	 torch.Size([10])
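The shapes listed above also tell us how many learnable values the network holds. The sketch below redefines the same layers so that it runs on its own and sums the number of elements in every parameter tensor; the variable names `per_layer` and `total` are chosen for illustration.

```python
import torch.nn as nn


class Network(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(in_channels=1, out_channels=6, kernel_size=5)
        self.conv2 = nn.Conv2d(in_channels=6, out_channels=12, kernel_size=5)
        self.fc1 = nn.Linear(in_features=12 * 4 * 4, out_features=120)
        self.fc2 = nn.Linear(in_features=120, out_features=60)
        self.out = nn.Linear(in_features=60, out_features=10)

    def forward(self, t):
        return t  # the forward pass is not implemented in this section


network = Network()
per_layer = {name: p.numel() for name, p in network.named_parameters()}
total = sum(p.numel() for p in network.parameters())

print(per_layer["conv1.weight"], per_layer["fc1.weight"])  # 150 23040
print(total)                                               # 32998
```

Most of the values sit in fc1, whose weight matrix alone accounts for 120 * 192 = 23040 of them.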
