
In [1]:
import torch
from torch import nn
from torch.nn import functional as F

Layers without Parameters

In [2]:
class CenteredLayer(nn.Module):
    def __init__(self):
        super().__init__()

    def forward(self, X):
        return X - X.mean()

In [3]:
layer = CenteredLayer()
layer(torch.tensor([1.0, 2, 3, 4, 5]))

tensor([-2., -1.,  0.,  1.,  2.])
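One way to confirm that CenteredLayer is parameter-free is to count what it registers with the module. The sketch below is an illustration, not part of the original notebook. It redefines the layer without an explicit __init__, so nn.Module's own constructor is used, and this keeps the block self-contained.

```python
import torch
from torch import nn

class CenteredLayer(nn.Module):
    """Subtracts the global mean of the input; holds no learnable state."""
    def forward(self, X):
        return X - X.mean()

layer = CenteredLayer()
# No nn.Parameter is assigned, so nothing is registered for an optimizer.
num_params = sum(p.numel() for p in layer.parameters())
print(num_params)  # 0

out = layer(torch.tensor([1.0, 2, 3, 4, 5]))
print(out)  # tensor([-2., -1.,  0.,  1.,  2.])
```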

In [4]:
net = nn.Sequential(nn.LazyLinear(128), CenteredLayer())

In [5]:
Y = net(torch.rand(4, 8))
Y.mean()

tensor(2.3283e-10, grad_fn=<MeanBackward0>)
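The previous two cells rely on two behaviours that can be checked directly. First, nn.LazyLinear infers its input size from the first batch it receives. Second, Y.mean() is zero only up to float32 rounding, which is why a tiny value such as 2.3283e-10 appears instead of an exact 0. The minimal sketch below assumes the same (4, 8) input. Keep in mind that X.mean() is a global mean over all elements, so individual rows of Y are generally not centered. PyTorch may also emit a UserWarning noting that lazy modules are still under development.

```python
import torch
from torch import nn

class CenteredLayer(nn.Module):
    def forward(self, X):
        return X - X.mean()

net = nn.Sequential(nn.LazyLinear(128), CenteredLayer())
Y = net(torch.rand(4, 8))

# The first call fixed the input size to 8; nn.Linear stores weight as (out, in).
print(net[0].weight.shape)  # torch.Size([128, 8])

# The global mean is zero up to floating-point rounding.
print(torch.allclose(Y.mean(), torch.zeros(()), atol=1e-5))  # True

# Per-row means are not forced to zero, because the layer subtracts one global mean.
print(Y.mean(dim=1))
```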

Layers with Parameters

In [7]:
class MyLinear(nn.Module):
    def __init__(self, in_units, units):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(in_units, units))
        self.bias = nn.Parameter(torch.randn(units,))

    def forward(self, X):
        # Use the Parameters directly (not .data) so autograd can compute their gradients
        linear = torch.matmul(X, self.weight) + self.bias
        return F.relu(linear)
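Reading self.weight.data or self.bias.data inside forward returns tensors that are detached from autograd. The output then carries no computation graph back to the parameters, so backward() cannot produce their gradients and an optimizer would never update them. The sketch below compares two hypothetical variants, named MyLinearData and MyLinearTracked for illustration only. MyLinearData reads .data, and MyLinearTracked uses the Parameters directly.

```python
import torch
from torch import nn
from torch.nn import functional as F

class MyLinearData(nn.Module):
    """Hypothetical variant: reads .data in forward, so it is detached from autograd."""
    def __init__(self, in_units, units):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(in_units, units))
        self.bias = nn.Parameter(torch.randn(units,))

    def forward(self, X):
        return F.relu(torch.matmul(X, self.weight.data) + self.bias.data)

class MyLinearTracked(MyLinearData):
    """Hypothetical variant: same parameters, but used directly so gradients flow."""
    def forward(self, X):
        return F.relu(torch.matmul(X, self.weight) + self.bias)

X = torch.rand(2, 5)
out_data = MyLinearData(5, 3)(X)
tracked = MyLinearTracked(5, 3)
out_tracked = tracked(X)

print(out_data.requires_grad)     # False: no graph back to weight/bias
print(out_tracked.requires_grad)  # True

out_tracked.sum().backward()      # would raise an error on out_data
print(tracked.weight.grad.shape)  # torch.Size([5, 3])
```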

In [8]:
linear = MyLinear(5, 3)
linear.weight

Parameter containing:
tensor([[-0.7144, -0.0186,  0.3919],
        [ 0.0439, -0.3770, -0.4825],
        [ 1.0769,  0.6800,  0.7254],
        [ 0.6147,  1.0605,  0.2761],
        [-0.1552, -0.6807,  0.6813]], requires_grad=True)
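Assigning an nn.Parameter as a module attribute registers it automatically, and this is why linear.weight prints as "Parameter containing". The sketch below lists the registered names and shapes. It is a self-contained redefinition of MyLinear that uses the parameters directly in forward. The sketch also contrasts MyLinear with the built-in nn.Linear, which stores its weight as (out_features, in_features) rather than (in_units, units).

```python
import torch
from torch import nn
from torch.nn import functional as F

class MyLinear(nn.Module):
    def __init__(self, in_units, units):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(in_units, units))
        self.bias = nn.Parameter(torch.randn(units,))

    def forward(self, X):
        return F.relu(torch.matmul(X, self.weight) + self.bias)

linear = MyLinear(5, 3)
# Parameters are registered in assignment order.
shapes = {name: tuple(p.shape) for name, p in linear.named_parameters()}
print(shapes)  # {'weight': (5, 3), 'bias': (3,)}

# The built-in layer uses the transposed convention for its weight.
builtin_shape = tuple(nn.Linear(5, 3).weight.shape)
print(builtin_shape)  # (3, 5)
```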

The next cell calls linear(torch.rand(2, 5)), and its output has 2 rows and 3 columns. That shape follows from matrix multiplication and from how MyLinear was defined:

- Input X: torch.rand(2, 5) has 2 rows (examples) and 5 columns (features).
- Layer definition: MyLinear(5, 3) sets in_units=5 and units=3. The weight matrix therefore has shape (5, 3), and the bias vector has shape (3,).
- Matrix multiplication: the key operation in forward is the matrix product of X and the weight matrix (torch.matmul). Multiplying an (A, B) matrix by a (B, C) matrix gives an (A, C) matrix, and the inner dimensions must match. So (2, 5) @ (5, 3) produces a tensor of shape (2, 3). Note that @ (matmul) is meant here, not the element-wise *.
- Bias and ReLU: the bias of shape (3,) is broadcast across both rows, and F.relu acts element-wise. Neither operation changes the shape.

So the 2 rows come from the batch size of the input, and the 3 columns come from units=3.
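The shape arithmetic above, including the broadcast of the bias, can be checked with plain tensors. Here is a minimal sketch that uses random X, W and b as stand-ins for the layer's input, weight and bias. These names are chosen for illustration and are not the layer's attributes.

```python
import torch

X = torch.rand(2, 5)    # 2 examples, 5 features
W = torch.randn(5, 3)   # (in_units, units)
b = torch.randn(3)      # (units,)

XW = X @ W              # (2, 5) @ (5, 3) -> (2, 3)
print(XW.shape)         # torch.Size([2, 3])

# Broadcasting treats b as shape (1, 3) and repeats it over the 2 rows.
explicit = XW + b.unsqueeze(0).expand(2, 3)
print(torch.equal(XW + b, explicit))  # True
```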

In [9]:
linear(torch.rand(2, 5))

tensor([[0.0000, 0.3629, 0.3967],
        [0.0000, 0.4779, 0.8893]])

Here is how the shapes change as the input torch.rand(2, 64) passes through net = nn.Sequential(MyLinear(64, 8), MyLinear(8, 1)):

- Input: (2, 64), i.e. 2 examples with 64 features each.
- First layer, MyLinear(64, 8): in_units=64 and units=8, so its weight has shape (64, 8). The product (2, 64) @ (64, 8) has shape (2, 8), because the shared inner dimension 64 is summed over. Adding the bias and applying ReLU keep the shape (2, 8).
- Second layer, MyLinear(8, 1): it receives the (2, 8) output of the first layer. Its weight has shape (8, 1), so (2, 8) @ (8, 1) gives (2, 1). Bias and ReLU again preserve the shape.

So the final output of net(torch.rand(2, 64)) has shape (2, 1), which is one scalar per example. For the layers to chain, each layer's in_units must equal the previous layer's units (8 here). The last layer also applies ReLU, so every output entry is non-negative.
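The same layer-by-layer trace can be done programmatically. You iterate over the nn.Sequential container and record each intermediate shape. The sketch below assumes the same MyLinear definition, written here with the parameters used directly in forward.

```python
import torch
from torch import nn
from torch.nn import functional as F

class MyLinear(nn.Module):
    def __init__(self, in_units, units):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(in_units, units))
        self.bias = nn.Parameter(torch.randn(units,))

    def forward(self, X):
        return F.relu(torch.matmul(X, self.weight) + self.bias)

net = nn.Sequential(MyLinear(64, 8), MyLinear(8, 1))
X = torch.rand(2, 64)
shapes = [tuple(X.shape)]
for layer in net:       # nn.Sequential yields its layers in call order
    X = layer(X)
    shapes.append(tuple(X.shape))

print(shapes)                # [(2, 64), (2, 8), (2, 1)]
print(bool((X >= 0).all()))  # True: the final ReLU makes outputs non-negative
```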

In [10]:
net = nn.Sequential(MyLinear(64, 8), MyLinear(8, 1))
net(torch.rand(2, 64))

tensor([[1.0354],
        [3.8670]])