## Build the Neural Network

Neural networks are composed of layers/modules that perform operations on data. The <i><b>torch.nn</b></i> namespace provides all the building blocks you need to build your own neural network. Every module in PyTorch is a subclass of <i><b>nn.Module</b></i>. A neural network is itself a module that consists of other modules (layers). This nested structure makes it easy to build and manage complex architectures.
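
As a minimal sketch of this nesting, the following defines two hypothetical classes (<i><b>TinyBlock</b></i> and <i><b>TinyNet</b></i> are illustrative names, not part of this tutorial's model) and lists the modules PyTorch registers for them.

```python
import torch.nn as nn

# Hypothetical example classes, used only to illustrate nesting.
class TinyBlock(nn.Module):
    def __init__(self):
        super().__init__()
        self.linear = nn.Linear(4, 2)
        self.activation = nn.ReLU()

    def forward(self, x):
        return self.activation(self.linear(x))

class TinyNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.block = TinyBlock()     # a module nested inside another module
        self.head = nn.Linear(2, 1)

    def forward(self, x):
        return self.head(self.block(x))

net = TinyNet()

# Direct children only
child_names = [name for name, _ in net.named_children()]
# Every module in the tree; the root module itself has the empty name ''
all_module_names = [name for name, _ in net.named_modules()]

print(child_names)        # ['block', 'head']
print(all_module_names)   # ['', 'block', 'block.linear', 'block.activation', 'head']
```

Because every field that holds an <i><b>nn.Module</b></i> is registered automatically, the whole tree can be inspected or moved between devices through the top-level object.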

In the following sections, we'll build a neural network to predict median house values in the <i><b>California Housing</b></i> dataset.

In [3]:
import torch 
import torch.nn as nn
from torch.utils.data import Dataset, DataLoader

torch.set_printoptions(sci_mode=False, linewidth=300)

### Load data

Let's first load the data into a DataLoader as a review of previous tutorials.

In [4]:
from sklearn.datasets import fetch_california_housing
from sklearn.model_selection import train_test_split

X, y = fetch_california_housing(return_X_y=True)

X_train, X_test, y_train, y_test = train_test_split(X,y, test_size=0.2, random_state=42)

In [8]:
from torch.utils.data import TensorDataset

X_train = torch.as_tensor(X_train, dtype=torch.float) # an alternative to torch.from_numpy
y_train = torch.as_tensor(y_train, dtype=torch.float)
X_test = torch.as_tensor(X_test, dtype=torch.float)
y_test = torch.as_tensor(y_test, dtype=torch.float)

train_dataset = TensorDataset(X_train, y_train)
test_dataset = TensorDataset(X_test, y_test)

train_dataloader = DataLoader(train_dataset, batch_size=64, shuffle=True)
test_dataloader = DataLoader(test_dataset, batch_size=64, shuffle=False) # evaluation does not need shuffling
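
To see what a DataLoader yields without downloading anything, here is a small sketch on synthetic data. The random tensors are stand-ins that only mimic the shape of the real features (8 columns) and targets; they are an assumption for illustration, not the actual dataset.

```python
import torch
from torch.utils.data import TensorDataset, DataLoader

# Synthetic stand-in data: 100 samples, 8 features each, one target per sample.
features = torch.randn(100, 8)
targets = torch.randn(100)

loader = DataLoader(TensorDataset(features, targets), batch_size=64, shuffle=True)

# Each iteration yields one (inputs, labels) pair of tensors.
batch_shapes = [(tuple(xb.shape), tuple(yb.shape)) for xb, yb in loader]
for x_shape, y_shape in batch_shapes:
    print(x_shape, y_shape)
# (64, 8) (64,)
# (36, 8) (36,)
```

The last batch is smaller because 100 is not a multiple of 64. Passing <i><b>drop_last=True</b></i> to the DataLoader would discard it instead.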

### Define the Class

We define our neural network by subclassing <i><b>nn.Module</b></i> and initialize its layers in <i><b>__init__</b></i>. Every <i><b>nn.Module</b></i> subclass implements the operations on input data in the <i><b>forward</b></i> method.

In other words, the <i><b>forward</b></i> method should describe how the input data will interact with our layers.

In [10]:
class NeuralNetwork_v1(nn.Module):
    def __init__(self):
        super(NeuralNetwork_v1, self).__init__() # initialize the parent class

        self.fc1 = nn.Linear(in_features=8, out_features=16) # in_features is the input data dimension, 8
        self.relu1 = nn.ReLU()
        self.fc2 = nn.Linear(in_features=16, out_features=16) # in hidden layers, in_features must match the previous layer's out_features
        self.relu2 = nn.ReLU()
        self.fc3 = nn.Linear(in_features=16, out_features=1) # out_features is the label dimension, 1

    def forward(self, x):
        x = self.fc1(x)
        x = self.relu1(x)
        x = self.fc2(x)
        x = self.relu2(x)
        x = self.fc3(x)
        return x

We create an instance of <i><b>NeuralNetwork_v1</b></i>, move it to <i><b>device</b></i>, and print its structure.

In [11]:
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model_v1 = NeuralNetwork_v1().to(device) # .to method can also move an entire model to device
print(model_v1)


To use the model, we pass it the input data. This executes the model's <i><b>forward</b></i> method, along with some background operations (such as any registered hooks). Do not call <i><b>model.forward()</b></i> directly!

In [None]:
inputs, labels = next(iter(train_dataloader))
inputs = inputs.to(device) # data must reside on the same device
labels = labels.to(device)

outputs = model_v1(inputs) # call your model as if it were a function
outputs.size()
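
One of the background operations that calling the model triggers is running registered hooks. The sketch below uses a standalone <i><b>nn.Linear</b></i> layer and a hypothetical hook function, <i><b>record_call</b></i>, to show that the hook fires when the module is called but not when <i><b>forward</b></i> is invoked directly.

```python
import torch
import torch.nn as nn

layer = nn.Linear(8, 1)
calls = []

# Hypothetical hook: records the output shape every time it is triggered.
def record_call(module, args, output):
    calls.append(tuple(output.shape))

handle = layer.register_forward_hook(record_call)
x = torch.randn(3, 8)

layer(x)          # goes through __call__, so the hook fires
layer.forward(x)  # bypasses __call__, so the hook is skipped

handle.remove()   # clean up the hook
print(calls)      # [(3, 1)]
```

Only one entry is recorded even though the layer computed its output twice, which is why calling the module itself is the recommended way to run it.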

### Model Layers

Let's break down the layers in the <i><b>NeuralNetwork_v1</b></i> model. To illustrate, we will take a sample minibatch of 3 data vectors of size 8 and see what happens to it as we pass it through the network.

In [None]:
x = inputs[:3]
print(x.size())

### nn.Linear

The linear layer is a module that applies a linear transformation to the input using its stored weights and biases.

In [None]:
fc1_out = model_v1.fc1(x) # nn.Linear(in_features=8, out_features=16)
print(fc1_out.size())

Note that the batch size <i><b>3</b></i> doesn't change, but the size of the data dimension has changed from 8 to <i><b>16</b></i> because <i><b>out_features</b></i> is 16.
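
As a rough check of what the layer computes, the sketch below reproduces its output by hand as `x @ W.T + b`, using the layer's own <i><b>weight</b></i> and <i><b>bias</b></i> attributes. It uses a fresh layer and random input rather than the tutorial's model, so the values are illustrative only.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

layer = nn.Linear(in_features=8, out_features=16)
x = torch.randn(3, 8)  # a minibatch of 3 vectors of size 8

# weight has shape (out_features, in_features); bias has shape (out_features,)
manual = x @ layer.weight.T + layer.bias
matches = torch.allclose(layer(x), manual, atol=1e-6)

print(tuple(layer.weight.shape), tuple(layer.bias.shape))  # (16, 8) (16,)
print(tuple(manual.shape))                                 # (3, 16)
print(matches)                                             # True
```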

### nn.ReLU

Non-linear activations are what create the complex mappings between the model's inputs and outputs. They are applied after linear transformations to introduce <i>nonlinearity</i>, helping neural networks learn a wide variety of phenomena.

In this model, we use <i><b>nn.ReLU</b></i> between our linear layers, but there are <i><b>other activations</b></i> to introduce non-linearity in your model.

Element-wise activation functions such as ReLU do not change the <i><b>shape</b></i> of their input.

In [None]:
print(f"Before ReLU: {fc1_out} \n\n")
relu1_out = model_v1.relu1(fc1_out) # nn.ReLU()
print(f"After ReLU: {relu1_out}")
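
For a concrete picture of what ReLU does, here is a small sketch on a hand-written tensor (the values are made up for illustration). ReLU keeps positive entries and replaces negative ones with zero, which is the same as clamping at a minimum of 0.

```python
import torch
import torch.nn as nn

relu = nn.ReLU()
t = torch.tensor([[-1.5, 0.0, 2.0],
                  [3.0, -0.5, -4.0]])

out = relu(t)
manual = torch.clamp(t, min=0)  # element-wise max(0, value)

print(out)
# tensor([[0., 0., 2.],
#         [3., 0., 0.]])
print(out.shape == t.shape)      # True: the shape is unchanged
print(torch.equal(out, manual))  # True
```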

### New module: nn.Sequential

<i><b>nn.Sequential</b></i> is an ordered container of modules. The data is passed through all the modules in the same order as defined. We can define a single <i><b>nn.Sequential</b></i> container in place of the five individual layers in <i><b>NeuralNetwork_v1</b></i>.

In [None]:
class NeuralNetwork_v2(nn.Module):
    def __init__(self):
        super(NeuralNetwork_v2, self).__init__() # initialize the parent class

        self.linear_relu_stack = nn.Sequential(
            nn.Linear(in_features=8, out_features=16),
            nn.ReLU(),
            nn.Linear(in_features=16, out_features=16),
            nn.ReLU(),
            nn.Linear(in_features=16, out_features=1),
        )

    def forward(self, x):
        return self.linear_relu_stack(x) # no need to pass x around anymore

In [None]:
model_v2 = NeuralNetwork_v2().to(device)
outputs = model_v2(inputs)
outputs.size()
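
The sketch below illustrates the "ordered container" behavior on a smaller, standalone stack (not the tutorial's model). Applying the modules by hand in order gives the same result as calling the container, and individual modules can be retrieved by position.

```python
import torch
import torch.nn as nn

stack = nn.Sequential(
    nn.Linear(8, 16),
    nn.ReLU(),
    nn.Linear(16, 1),
)
x = torch.randn(3, 8)

# Apply each module by hand, in the order it was defined.
manual = x
for module in stack:
    manual = module(manual)

same = torch.allclose(stack(x), manual)
print(same)        # True
print(len(stack))  # 3
print(stack[0])    # Linear(in_features=8, out_features=16, bias=True)
```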

### ReLU as a function

Most activation functions do not have learnable parameters (one exception is <b><i>nn.PReLU</i></b>, the Parametric <b>ReLU</b>). Therefore, they can be applied just like an ordinary <b><i>torch.*</i></b> function instead of being registered as layers.

In [None]:
class NeuralNetwork_v3(nn.Module):
    def __init__(self):
        super(NeuralNetwork_v3, self).__init__()

        self.fc1 = nn.Linear(in_features=8, out_features=16)
        self.fc2 = nn.Linear(in_features=16, out_features=16)
        self.fc3 = nn.Linear(in_features=16, out_features=1)


    def forward(self, x):
        x = self.fc1(x)
        x = self.fc2(torch.relu(x)) # ReLU now is no longer a "layer", but just a function
        x = self.fc3(torch.relu(x))

        return x
    


In [None]:
import torch.nn.functional as F

torch.allclose(F.relu(fc1_out), torch.relu(fc1_out))
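
To make the difference between parameter-free and parameterized activations concrete, this short sketch counts the learnable parameters of <i><b>nn.ReLU</b></i> and of <i><b>nn.PReLU</b></i> with its default settings.

```python
import torch.nn as nn

relu_params = sum(p.numel() for p in nn.ReLU().parameters())
prelu_params = sum(p.numel() for p in nn.PReLU().parameters())

print(relu_params)   # 0: nothing to learn, so a plain function works
print(prelu_params)  # 1: a learnable slope for negative inputs
```

Because <i><b>nn.PReLU</b></i> holds a parameter, it should be created in <i><b>__init__</b></i> like any other layer so that the model tracks and optimizes it.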

### Model Parameters

Many layers inside a neural network are parameterized, i.e., they have associated weights and biases that are optimized during training. Subclassing <b><i>nn.Module</i></b> automatically tracks all fields defined inside your model object, and makes all parameters accessible using your model's <b><i>parameters()</i></b> or <b><i>named_parameters()</i></b> methods.

In this example, we iterate over each parameter, and print its size and a preview of its values. 

In [None]:
print(f"Model structure: {model_v1} \n\n")

for name, param in model_v1.named_parameters():
    print(f"Layer: {name} | Size: {param.size()} | Values : {param[:2]} \n")
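
A common follow-up is counting how many parameters a model has. The sketch below does this for a standalone <i><b>nn.Sequential</b></i> with the same layer sizes as the networks above. It is rebuilt here so the example runs on its own, and inside a container the parameter names are prefixed with the module's position.

```python
import torch.nn as nn

model = nn.Sequential(
    nn.Linear(8, 16),
    nn.ReLU(),
    nn.Linear(16, 16),
    nn.ReLU(),
    nn.Linear(16, 1),
)

# Map each parameter name to its shape.
shapes = {name: tuple(p.shape) for name, p in model.named_parameters()}
for name, shape in shapes.items():
    print(name, shape)

total = sum(p.numel() for p in model.parameters())
trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
print(total, trainable)  # 433 433
```

The total follows from the layer sizes: (8×16 + 16) + (16×16 + 16) + (16×1 + 1) = 433. The ReLU modules contribute nothing.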