In [1]:
import torch
from torch import nn
from torch.nn import functional as F
from d2l import torch as d2l

### Layers without Parameters

To start, we construct a custom layer that does not have any parameters of its own. This should look familiar if you recall our introduction to modules in Section 6.1. The following CenteredLayer class subtracts the mean from its input. To build it, we only need to inherit from the base layer class and implement the forward propagation function.

In [2]:
class CenteredLayer(nn.Module):
    def __init__(self):
        super().__init__()

    def forward(self, X):
        return X - X.mean()

In [5]:
layer = CenteredLayer()
layer(torch.tensor([1.0, 2, 3, 4, 5]))

tensor([-2., -1.,  0.,  1.,  2.])

We can now incorporate our layer as a component in constructing more complex models.

In [6]:
net = nn.Sequential(nn.LazyLinear(128), CenteredLayer())



As an extra sanity check, we can send random data through the network and check that the mean is in fact 0. Because we are dealing with floating-point numbers, we may still see a very small nonzero number due to rounding error.

In [7]:
Y = net(torch.rand(4, 8))
Y.mean()

tensor(-6.7521e-09, grad_fn=<MeanBackward0>)
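
Rather than reading the printed value by eye, the same check can be made programmatic. The following is a minimal sketch, assuming an absolute tolerance of 1e-6 is acceptable for float32 inputs of this size; the tolerance and the fixed seed are choices made here for illustration, not values from the text.

```python
import torch
from torch import nn


class CenteredLayer(nn.Module):
    """Same layer as above: subtracts the global mean of the input."""
    def forward(self, X):
        return X - X.mean()


torch.manual_seed(0)  # assumption: fixed seed, only for reproducibility
net = nn.Sequential(nn.LazyLinear(128), CenteredLayer())
Y = net(torch.rand(4, 8))

# Compare against zero with a tolerance instead of testing exact equality.
mean_value = Y.mean().item()
mean_is_zero = abs(mean_value) < 1e-6  # assumed tolerance
print(mean_is_zero)
```

Testing `mean_value == 0` directly would be fragile, since the accumulated rounding error differs from run to run.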

### Layers with Parameters

Now that we know how to define simple layers, let’s move on to defining layers with parameters that can be adjusted through training. We can use built-in functions to create parameters, which provide some basic housekeeping functionality. In particular, they govern access, initialization, sharing, saving, and loading model parameters. This way, among other benefits, we will not need to write custom serialization routines for every custom layer.

In [9]:
class MyLinear(nn.Module):
    def __init__(self, in_units, units):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(in_units, units))
        self.bias = nn.Parameter(torch.randn(units,))

    def forward(self, X):
        linear = torch.matmul(X, self.weight.data) + self.bias.data
        return F.relu(linear)

In [10]:
linear = MyLinear(5, 3)
linear.weight

Parameter containing:
tensor([[-0.2647,  1.1415, -2.9482],
        [-0.2882, -0.3804, -0.9601],
        [ 0.0447,  0.1881, -0.4093],
        [-0.5894, -2.0246,  0.0415],
        [ 0.6353,  0.9011, -2.1016]], requires_grad=True)
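
To make the housekeeping mentioned above concrete, the sketch below shows that attributes wrapped in nn.Parameter are registered with the module automatically, so they appear in named_parameters() and state_dict() without extra code. The variable names `param_shapes`, `clone`, and `same_weights` are introduced here purely for illustration.

```python
import torch
from torch import nn
from torch.nn import functional as F


class MyLinear(nn.Module):
    """Same layer as above, repeated so the snippet is self-contained."""
    def __init__(self, in_units, units):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(in_units, units))
        self.bias = nn.Parameter(torch.randn(units,))

    def forward(self, X):
        linear = torch.matmul(X, self.weight.data) + self.bias.data
        return F.relu(linear)


layer = MyLinear(5, 3)

# Parameters are listed by attribute name, in registration order.
param_shapes = {name: tuple(p.shape) for name, p in layer.named_parameters()}
print(param_shapes)  # {'weight': (5, 3), 'bias': (3,)}

# The same registry backs state_dict(), so copying (or saving and loading)
# the parameters needs no custom serialization code.
clone = MyLinear(5, 3)
clone.load_state_dict(layer.state_dict())
same_weights = torch.equal(clone.weight, layer.weight)
print(same_weights)  # True
```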

In [11]:
linear(torch.rand(2, 5))

tensor([[0.6401, 0.0000, 0.0000],
        [0.6428, 0.0000, 0.0000]])
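
One caveat about the forward pass above: reading the parameters through `.data` yields tensors that are detached from autograd, which is why the printed output carries no `grad_fn`. For the parameters to be adjusted through training, gradients have to reach them. The sketch below compares the two behaviors; `TrainableLinear` is a hypothetical name introduced here and is not part of the original text.

```python
import torch
from torch import nn
from torch.nn import functional as F


class TrainableLinear(nn.Module):
    """Hypothetical variant of MyLinear that uses the parameters directly."""
    def __init__(self, in_units, units):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(in_units, units))
        self.bias = nn.Parameter(torch.randn(units,))

    def forward(self, X):
        # No .data here, so autograd records the operations.
        return F.relu(torch.matmul(X, self.weight) + self.bias)


layer = TrainableLinear(5, 3)
X = torch.rand(2, 5)

out_tracked = layer(X)
# Mimic the original forward pass, which reads weight.data and bias.data.
out_detached = F.relu(torch.matmul(X, layer.weight.data) + layer.bias.data)

print(out_tracked.requires_grad)   # True
print(out_detached.requires_grad)  # False

# Only the tracked output can propagate gradients back to the parameters.
out_tracked.sum().backward()
print(layer.weight.grad.shape)     # torch.Size([5, 3])
```

Both versions compute the same values; they differ only in whether an optimizer could later update `weight` and `bias` from the resulting gradients.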

We can also construct models using custom layers. Once defined, a custom layer can be used just like the built-in fully connected layer.

In [12]:
net = nn.Sequential(MyLinear(64, 8), MyLinear(8, 1))
net(torch.rand(2, 64))

tensor([[0.0000],
        [9.5910]])

### Exercises

1. Design a layer that takes an input and computes a tensor reduction, i.e., it returns $y_k = \sum_{i, j} W_{ijk} x_i x_j$.

2. Design a layer that returns the leading half of the Fourier coefficients of the data.