## The Multilayer Perceptron Model
We construct the MLP with the following structure:
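1. First Layer: a linear layer with 64 units, followed by ReLU.
2. Hidden Layer 1: a linear layer with 32 units, followed by Batch Normalization, ReLU, and Dropout. Batch Normalization helps stabilize training, and Dropout (PyTorch's default drop probability is 0.5) reduces overfitting.
3. Hidden Layer 2: a linear layer with 16 units, followed by the same Batch Normalization, ReLU, and Dropout sequence.
4. Hidden Layer 3: a linear layer with 8 units, followed by the same sequence.
5. Output Layer: a single linear unit with no activation, producing one real-valued prediction.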
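
To make the roles of the two regularization layers concrete, the sketch below checks what `nn.BatchNorm1d` and `nn.Dropout` compute in training mode on an arbitrary toy batch of 4 samples and 3 features. It illustrates PyTorch's standard layer behavior and is not part of the model itself.

```python
import torch
from torch import nn

torch.manual_seed(0)

# A toy batch: 4 samples, 3 features (shape chosen arbitrarily for illustration).
x = torch.randn(4, 3)

# In training mode, batch normalization normalizes each feature with the mean
# and biased variance of the current batch, then applies a learnable scale
# (initialized to 1) and shift (initialized to 0).
bn = nn.BatchNorm1d(3)
bn.train()
manual = (x - x.mean(dim=0)) / torch.sqrt(x.var(dim=0, unbiased=False) + bn.eps)
print(torch.allclose(bn(x), manual, atol=1e-6))  # True

# In training mode, dropout zeroes each element with probability p (0.5 by
# default) and scales the survivors by 1 / (1 - p) = 2, keeping the expectation unchanged.
drop = nn.Dropout()
drop.train()
y = drop(torch.ones(1000))
print(set(y.unique().tolist()) <= {0.0, 2.0})  # True

# In evaluation mode, dropout is the identity.
drop.eval()
print(torch.equal(drop(torch.ones(5)), torch.ones(5)))  # True
```

This is why the model must be switched between `train()` and `eval()`: both layers behave differently at prediction time.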

```python
import torch
from torch import nn

class MLP(nn.Module):
    def __init__(self):
        super().__init__()
        # Lazy layers infer their input sizes from the first batch they see,
        # so the number of input features does not need to be specified here.
        self.net = nn.Sequential(
            # first layer (64 units)
            nn.LazyLinear(64), nn.ReLU(),
            # first hidden layer (32 units): BatchNorm -> ReLU -> Dropout (default p=0.5)
            nn.LazyLinear(32), nn.LazyBatchNorm1d(), nn.ReLU(), nn.Dropout(),
            # second hidden layer (16 units)
            nn.LazyLinear(16), nn.LazyBatchNorm1d(), nn.ReLU(), nn.Dropout(),
            # third hidden layer (8 units)
            nn.LazyLinear(8), nn.LazyBatchNorm1d(), nn.ReLU(), nn.Dropout(),
            # final output layer: one linear unit, no activation
            nn.LazyLinear(1))

    def forward(self, x):
        return self.net(x)
```
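
Because every layer is lazy, the model has no weights until it sees its first batch. The sketch below uses a self-contained copy of the class (built with a loop that produces the same layer sequence), runs a dummy forward pass to initialize the parameters, and then switches to evaluation mode for prediction. The input width of 10 features and the batch size of 16 are assumptions for illustration only.

```python
import torch
from torch import nn

class MLP(nn.Module):
    # Same architecture as above, repeated so this snippet runs on its own.
    def __init__(self):
        super().__init__()
        layers = [nn.LazyLinear(64), nn.ReLU()]
        for width in (32, 16, 8):
            layers += [nn.LazyLinear(width), nn.LazyBatchNorm1d(), nn.ReLU(), nn.Dropout()]
        layers.append(nn.LazyLinear(1))
        self.net = nn.Sequential(*layers)

    def forward(self, x):
        return self.net(x)

torch.manual_seed(0)
model = MLP()

# A first forward pass with a dummy batch materializes every lazy layer.
# The feature count 10 is a hypothetical choice.
dummy = torch.randn(16, 10)
out = model(dummy)
print(out.shape)  # torch.Size([16, 1])

# After initialization, the first linear layer's weight is (out_features, in_features).
print(model.net[0].weight.shape)  # torch.Size([64, 10])

# Dropout and BatchNorm behave differently at inference, so switch to eval().
model.eval()
with torch.no_grad():
    preds = model(dummy)
    preds_again = model(dummy)
print(torch.equal(preds, preds_again))  # True: no dropout randomness in eval mode
```

Calling `model.train()` again re-enables dropout and per-batch normalization before training resumes.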

This design is a heuristic rather than a principled choice: the wider early layers are meant to extract as many useful features as possible, and the progressively narrower layers (64, 32, 16, 8 units) then funnel that information down to a single prediction.
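
To see how the funnel distributes capacity, the short calculation below counts the learnable parameters of each stage. It assumes, hypothetically, 10 input features (the text does not fix this number) and counts the Batch Normalization scale and shift parameters but not its running mean and variance, which are buffers rather than parameters.

```python
# Hypothetical input width; the number of features is an assumption.
IN_FEATURES = 10
WIDTHS = [64, 32, 16, 8, 1]

def linear_params(n_in: int, n_out: int) -> int:
    # weight matrix (n_out x n_in) plus one bias per output unit
    return n_in * n_out + n_out

def batchnorm_params(n: int) -> int:
    # one learnable scale and one learnable shift per feature
    return 2 * n

total = 0
n_in = IN_FEATURES
for i, n_out in enumerate(WIDTHS):
    count = linear_params(n_in, n_out)
    # Batch Normalization follows only the three hidden layers (32, 16, 8 units)
    if 1 <= i <= 3:
        count += batchnorm_params(n_out)
    print(f"{n_in:>3} -> {n_out:>3}: {count} parameters")
    total += count
    n_in = n_out

print("total:", total)  # total: 3569
```

Under this assumption, the 64-to-32 stage alone holds 2144 of the 3569 parameters, so most of the model's capacity sits in the wide early layers, consistent with the intent described above.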