This notebook is adapted from the official PyTorch tutorial on [Build Model](https://pytorch.org/tutorials/beginner/basics/buildmodel_tutorial.html).

# Build the Neural Network

Neural networks are composed of layers/modules that perform operations on data.
The [torch.nn](https://pytorch.org/docs/stable/nn.html) namespace provides all the building blocks you need to
build your own neural network. Every module in PyTorch is a subclass of [nn.Module](https://pytorch.org/docs/stable/generated/torch.nn.Module.html).
A neural network is a module itself that consists of other modules (layers). This nested structure allows for
building and managing complex architectures easily.

In the following sections, we'll build a neural network to predict median house values in the [California Housing](https://scikit-learn.org/stable/datasets/real_world.html#california-housing-dataset) dataset.


In [1]:
import torch
import torch.nn as nn
from torch.utils.data import Dataset, DataLoader

torch.set_printoptions(sci_mode=False, linewidth=300)

## Load data

Let's first load the data into a `DataLoader` as a review of previous tutorials. 



In [2]:
from sklearn.datasets import fetch_california_housing
from sklearn.model_selection import train_test_split

X, y = fetch_california_housing(return_X_y=True)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

In [3]:
from torch.utils.data import TensorDataset

X_train = torch.as_tensor(X_train, dtype=torch.float) # an alternative to torch.from_numpy
y_train = torch.as_tensor(y_train, dtype=torch.float)
X_test = torch.as_tensor(X_test, dtype=torch.float)
y_test = torch.as_tensor(y_test, dtype=torch.float)

train_dataset = TensorDataset(X_train, y_train)
test_dataset = TensorDataset(X_test, y_test)

train_dataloader = DataLoader(train_dataset, batch_size=64, shuffle=True)
test_dataloader = DataLoader(test_dataset, batch_size=64, shuffle=False)

## Define the Class
We define our neural network by subclassing ``nn.Module``, and
initialize the neural network layers in ``__init__``. Every ``nn.Module`` subclass implements
the operations on input data in the ``forward`` method. 

In other words, the ``forward`` method should describe how input data will interact with your layers. 



In [5]:
class NeuralNetwork_v1(nn.Module):
    def __init__(self):
        super(NeuralNetwork_v1, self).__init__() # initialize the parent class 

        self.fc1   = nn.Linear(in_features=8, out_features=16)  # in_features is the data dim, 8
        self.relu1 = nn.ReLU()
        self.fc2   = nn.Linear(in_features=16, out_features=16) # In middle layers, in_features must match the previous layer's out_features
        self.relu2 = nn.ReLU()
        self.fc3   = nn.Linear(in_features=16, out_features=1)  # out_features is the label dim, 1

    def forward(self, x):
        x = self.fc1(x)
        x = self.relu1(x)
        x = self.fc2(x)
        x = self.relu2(x)
        x = self.fc3(x)
        return x

We create an instance of ``NeuralNetwork_v1``, move it to ``device``, and print
its structure.



In [6]:
device = torch.device("cuda" if torch.cuda.is_available() else "mps" if torch.backends.mps.is_available() else "cpu") # prefer CUDA, then Apple-silicon MPS, else CPU
model_v1 = NeuralNetwork_v1().to(device) # .to method can also move an entire model to device
print(model_v1)

NeuralNetwork_v1(
  (fc1): Linear(in_features=8, out_features=16, bias=True)
  (relu1): ReLU()
  (fc2): Linear(in_features=16, out_features=16, bias=True)
  (relu2): ReLU()
  (fc3): Linear(in_features=16, out_features=1, bias=True)
)


To use the model, we pass it the input data. This executes the model's ``forward``,
along with some [background operations](https://github.com/pytorch/pytorch/blob/270111b7b611d174967ed204776985cefca9c144/torch/nn/modules/module.py#L866) such as registered hooks.
Do not call ``model.forward()`` directly, because doing so skips those operations!
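One of those background operations is running *forward hooks*. The sketch below, using a small stand-alone `nn.Linear` and random input (both are illustrative, not part of the housing model), registers a hook with `register_forward_hook` and shows that it fires when the module is called but is silently skipped by a direct `.forward()` call.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
layer = nn.Linear(8, 1)   # illustrative stand-alone layer
calls = []

def log_call(module, args, output):
    # forward hooks receive the module, its positional inputs (a tuple), and its output
    calls.append(tuple(output.shape))

handle = layer.register_forward_hook(log_call)
x = torch.randn(3, 8)     # random stand-in for 3 samples with 8 features

layer(x)                  # goes through nn.Module.__call__, so the hook fires
layer.forward(x)          # bypasses __call__, so the hook is skipped
print(calls)              # [(3, 1)]  (only one call was recorded)

handle.remove()           # detach the hook when it is no longer needed
```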



In [7]:
inputs, labels = next(iter(train_dataloader))
inputs = inputs.to(device) # data must reside on the same device as the model
labels = labels.to(device)

outputs = model_v1(inputs) # call your model as if it were a function
outputs.size()

torch.Size([64, 1])
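Note that the outputs have shape `[64, 1]`, while each batch of `labels` from our `TensorDataset` has shape `[64]`. A minimal sketch with random stand-in tensors (not real model outputs) shows why this matters once you compare the two: PyTorch broadcasting silently produces a `[64, 64]` result unless you first drop the trailing dimension.

```python
import torch

outputs = torch.randn(64, 1)  # stand-in for a batch of model outputs
labels = torch.randn(64)      # stand-in for a batch of targets

broadcast_diff = outputs - labels              # broadcasts to [64, 64], almost never intended
aligned_diff = outputs.squeeze(-1) - labels    # shapes now agree: [64]

print(broadcast_diff.shape)  # torch.Size([64, 64])
print(aligned_diff.shape)    # torch.Size([64])
```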

## Model Layers

Let's break down the layers in the ``NeuralNetwork_v1`` model. To illustrate it, we
will take a sample minibatch of 3 data vectors of size 8 and see what happens to it as
we pass it through the network.



In [8]:
x = inputs[:3]
print(x.size())

torch.Size([3, 8])


### nn.Linear
The [linear layer](https://pytorch.org/docs/stable/generated/torch.nn.Linear.html)
is a module that applies a linear transformation on the input using its stored weights and biases.




In [10]:
fc1_out = model_v1.fc1(x) # nn.Linear(in_features=8, out_features=16)
print(fc1_out.size())

torch.Size([3, 16])


Note that the batch size `3` doesn't change, but the size of the data dimension has changed to `16` because `out_features` is 16. 
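Concretely, `nn.Linear` computes $y = xW^\top + b$, where the stored `weight` $W$ has shape `[out_features, in_features]` and `bias` $b$ has shape `[out_features]`. The sketch below checks this on a freshly initialized layer with the same sizes as `fc1` and a random stand-in minibatch; the `atol` is loosened slightly because the two computations may round float32 values in a different order.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
fc = nn.Linear(in_features=8, out_features=16)  # same shape as fc1, fresh random weights
x = torch.randn(3, 8)                           # random stand-in for 3 samples with 8 features

manual = x @ fc.weight.T + fc.bias              # y = x W^T + b
print(fc.weight.shape, fc.bias.shape)           # torch.Size([16, 8]) torch.Size([16])
print(torch.allclose(fc(x), manual, atol=1e-6)) # True
```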

### nn.ReLU
Non-linear activations are what create the complex mappings between the model's inputs and outputs.
They are applied after linear transformations to introduce *nonlinearity*, helping neural networks
learn a wide variety of phenomena.

In this model, we use [nn.ReLU](https://pytorch.org/docs/stable/generated/torch.nn.ReLU.html) between our
linear layers, but there are [other activations](https://pytorch.org/docs/stable/nn.html#non-linear-activations-weighted-sum-nonlinearity) to introduce non-linearity in your model.

Element-wise activation functions such as ReLU do not change the **shape** of their input.



In [11]:
print(f"Before ReLU: {fc1_out}\n\n")
relu1_out = model_v1.relu1(fc1_out) # nn.ReLU()
print(f"After ReLU: {relu1_out}")

Before ReLU: tensor([[-371.4051, -145.0464, -250.5891,  353.9657, -182.2622,  147.7344,  -92.1813,  125.9600,  396.1821,  422.5164,  353.0605, -182.8116, -252.1051, -340.7018, -361.8319,  -93.8040],
        [-458.4600, -187.4800, -311.7845,  440.8052, -234.7249,  181.6513, -112.1790,  150.2557,  495.9763,  525.2939,  434.9780, -220.7425, -310.0343, -433.5242, -449.1116, -110.9794],
        [-298.5299, -109.0202, -196.1847,  284.0250, -138.7991,  117.8502,  -78.7035,  105.9789,  313.1189,  334.5900,  286.1516, -152.1156, -205.4304, -263.7532, -290.8603,  -79.3804]], device='mps:0', grad_fn=<LinearBackward0>)


After ReLU: tensor([[  0.0000,   0.0000,   0.0000, 353.9657,   0.0000, 147.7344,   0.0000, 125.9600, 396.1821, 422.5164, 353.0605,   0.0000,   0.0000,   0.0000,   0.0000,   0.0000],
        [  0.0000,   0.0000,   0.0000, 440.8052,   0.0000, 181.6513,   0.0000, 150.2557, 495.9763, 525.2939, 434.9780,   0.0000,   0.0000,   0.0000,   0.0000,   0.0000],
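        [  0.0000,   0.0000,   0.0000, 284.0250,   0.0000, 117.8502,   0.0000, 105.9789, 313.1189, 334.5900, 286.1516,   0.0000,   0.0000,   0.0000,   0.0000,   0.0000]], device='mps:0', grad_fn=<ReluBackward0>)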
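ReLU computes $\max(0, x)$ independently for every element, which is why the output above keeps the shape of `fc1_out` and simply zeroes the negative entries. The sketch below checks this on a small hand-written tensor (an illustrative stand-in) and, for contrast, shows `nn.GLU`, an activation from the same `torch.nn` family that is *not* element-wise: it splits its input in half along `dim` and therefore halves that dimension.

```python
import torch
import torch.nn as nn

x = torch.tensor([[-2.0, -0.5, 0.0, 1.5],
                  [3.0, -1.0, 0.25, -4.0]])   # illustrative input, shape [2, 4]

out = nn.ReLU()(x)
print(out.shape)                          # torch.Size([2, 4]), same as x.shape
print(torch.equal(out, x.clamp(min=0)))   # True: ReLU is max(0, x) element-wise

glu_out = nn.GLU(dim=-1)(x)               # not element-wise: halves the last dimension
print(glu_out.shape)                      # torch.Size([2, 2])
```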

### New module: nn.Sequential
[nn.Sequential](https://pytorch.org/docs/stable/generated/torch.nn.Sequential.html) is an ordered
container of modules. The data is passed through all the modules in the same order as defined. We can define a single `nn.Sequential` container in place of the five individual layers in `NeuralNetwork_v1`. 



In [12]:
class NeuralNetwork_v2(nn.Module):
    def __init__(self):
        super(NeuralNetwork_v2, self).__init__() # initialize the parent class 

        self.linear_relu_stack = nn.Sequential(
            nn.Linear(in_features=8, out_features=16),  # in_features is the data dim, 8
            nn.ReLU(),
            nn.Linear(in_features=16, out_features=16), # In middle layers, in_features must match the previous layer's out_features
            nn.ReLU(),
            nn.Linear(in_features=16, out_features=1),  # out_features is the label dim, 1
        )

    def forward(self, x):
        return self.linear_relu_stack(x) # no need to pass x around anymore; modules are called sequentially

In [13]:
model_v2 = NeuralNetwork_v2().to(device)
outputs = model_v2(inputs) # call your model as if it were a function
outputs.size() # get outputs of the same size as that of model_v1

torch.Size([64, 1])
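An `nn.Sequential` does not need a wrapper class at all; it is itself an `nn.Module` that can be called, indexed, and iterated. The sketch below builds the same five-layer stack directly (with fresh random weights and a random stand-in input) and confirms that calling the container gives exactly the same result as calling each submodule in turn.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
stack = nn.Sequential(
    nn.Linear(8, 16),
    nn.ReLU(),
    nn.Linear(16, 16),
    nn.ReLU(),
    nn.Linear(16, 1),
)
x = torch.randn(3, 8)  # random stand-in for 3 samples with 8 features

# Calling the container is the same as feeding the output of each submodule into the next
manual = x
for layer in stack:
    manual = layer(manual)

print(torch.equal(stack(x), manual))  # True
print(stack[0])                       # Linear(in_features=8, out_features=16, bias=True)
print(len(stack))                     # 5
```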

### ReLU as a function

Most activation functions do not have learnable parameters (one exception: [nn.PReLU](https://pytorch.org/docs/stable/generated/torch.nn.PReLU.html#prelu), **P**arametric **ReLU**). Because they hold no state, they can be applied as ordinary functions such as `torch.relu` instead of being registered as layers.
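The difference is easy to check: `nn.ReLU` registers no parameters, while `nn.PReLU` (with its default `num_parameters=1`) registers a single learnable slope for negative inputs, initialized to 0.25 by default.

```python
import torch
import torch.nn as nn

relu_params = list(nn.ReLU().parameters())
prelu = nn.PReLU()                               # default: one shared slope, init=0.25

print(len(relu_params))                          # 0: nothing to learn
print([p.shape for p in prelu.parameters()])     # [torch.Size([1])]
print(prelu.weight.item())                       # 0.25
```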

In [14]:
class NeuralNetwork_v3(nn.Module):
    def __init__(self):
        super(NeuralNetwork_v3, self).__init__() # initialize the parent class 

        self.fc1 = nn.Linear(in_features=8, out_features=16)  # in_features is the data dim, 8
        self.fc2 = nn.Linear(in_features=16, out_features=16) # In middle layers, in_features must match the previous layer's out_features
        self.fc3 = nn.Linear(in_features=16, out_features=1)  # out_features is the label dim, 1

        # homework: use nn.ModuleList to hold the three layers

    def forward(self, x):
        x = self.fc1(x)
        x = self.fc2(torch.relu(x)) # ReLU is no longer a "layer", just a function
        x = self.fc3(torch.relu(x))
        return x

In [16]:
model_v3 = NeuralNetwork_v3().to(device)
outputs = model_v3(inputs) # call your model as if it were a function
outputs.size() # get outputs of the same size as that of model_v1

torch.Size([64, 1])

If your activation function is not available as `torch.*`, you should be able to find it under [torch.nn.functional](https://pytorch.org/docs/stable/nn.functional.html).

In [17]:
import torch.nn.functional as F

torch.allclose(F.relu(fc1_out), torch.relu(fc1_out))

True
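The same module/function pairing holds for activations with hyperparameters: the module stores the hyperparameter at construction time, while the functional form takes it as an argument. A small sketch with `nn.LeakyReLU` and `F.leaky_relu` on an illustrative tensor:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

x = torch.tensor([-2.0, -0.5, 0.0, 1.5])                  # illustrative input

module_out = nn.LeakyReLU(negative_slope=0.1)(x)          # slope stored in the module
functional_out = F.leaky_relu(x, negative_slope=0.1)      # slope passed per call

print(torch.equal(module_out, functional_out))            # True
print(functional_out)                                     # negatives scaled by 0.1
```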

## Model Parameters
Many layers inside a neural network are *parameterized*, i.e. have associated weights
and biases that are optimized during training. Subclassing ``nn.Module`` automatically
registers every submodule and every ``nn.Parameter`` assigned as an attribute of your model object, and makes all parameters
accessible using your model's ``parameters()`` or ``named_parameters()`` methods.
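Registration only applies to attributes that are `nn.Module` or `nn.Parameter` objects; a plain tensor attribute is not tracked. The toy class below (`Demo` and its attribute names are hypothetical, chosen only for illustration) makes this visible.

```python
import torch
import torch.nn as nn

class Demo(nn.Module):  # hypothetical toy module
    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(8, 16)                # submodule: its weight and bias are registered
        self.scale = nn.Parameter(torch.ones(1))  # nn.Parameter attribute: registered
        self.offset = torch.zeros(1)              # plain tensor attribute: NOT registered

demo = Demo()
names = [name for name, _ in demo.named_parameters()]
print(names)  # ['scale', 'fc.weight', 'fc.bias']
```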

For ``model_v1``, we iterate over each parameter and print its size and a preview of its values.




In [18]:
print(f"Model structure: {model_v1}\n\n")

for name, param in model_v1.named_parameters():
    print(f"Layer: {name} | Size: {param.size()} | Values : {param[:2]} \n")

Model structure: NeuralNetwork_v1(
  (fc1): Linear(in_features=8, out_features=16, bias=True)
  (relu1): ReLU()
  (fc2): Linear(in_features=16, out_features=16, bias=True)
  (relu2): ReLU()
  (fc3): Linear(in_features=16, out_features=1, bias=True)
)


Layer: fc1.weight | Size: torch.Size([16, 8]) | Values : tensor([[-0.0577, -0.2311,  0.3286,  0.0309, -0.2904,  0.2855, -0.1345,  0.2278],
        [ 0.0154,  0.0417, -0.1681,  0.2118, -0.1450, -0.2586,  0.1913, -0.1318]], device='mps:0', grad_fn=<SliceBackward0>) 

Layer: fc1.bias | Size: torch.Size([16]) | Values : tensor([ 0.0975, -0.0100], device='mps:0', grad_fn=<SliceBackward0>) 

Layer: fc2.weight | Size: torch.Size([16, 16]) | Values : tensor([[ 0.1630,  0.2013,  0.2361, -0.2312, -0.0823, -0.0157, -0.0339,  0.2347,  0.0601,  0.1327,  0.2467, -0.2486,  0.2320,  0.0461,  0.0623,  0.2198],
        [ 0.1207,  0.0952, -0.1620,  0.1489,  0.2330,  0.2324, -0.1856, -0.0587, -0.0917, -0.1555, -0.1636,  0.0731,  0.1939,  0.2136,  0.0974,  0
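The sizes printed above also determine how many trainable values `model_v1` holds: each `nn.Linear` contributes `in_features * out_features` weights plus `out_features` biases, i.e. $8 \cdot 16 + 16 = 144$, $16 \cdot 16 + 16 = 272$, and $16 \cdot 1 + 1 = 17$, for a total of 433. The sketch below rebuilds the same architecture as an `nn.Sequential` (an equivalent stand-in, so the block runs on its own) and sums `numel()` over its parameters.

```python
import torch.nn as nn

# Same layer sizes as NeuralNetwork_v1; ReLU layers contribute no parameters
model = nn.Sequential(
    nn.Linear(8, 16),
    nn.ReLU(),
    nn.Linear(16, 16),
    nn.ReLU(),
    nn.Linear(16, 1),
)

total = sum(p.numel() for p in model.parameters())
trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
print(total, trainable)  # 433 433
```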

--------------


