# 🔥 Introduction to `PyTorch` -- Building ML models

[Deep Learning](https://dsai.units.it/index.php/courses/deep-learning/) Course @ [UniTS](https://portale.units.it/en), Spring 2024

<a target="_blank" href="https://colab.research.google.com/github/emaballarin/deeplearning-units/blob/main/labs/01_intro_to_pytorch/02_pytorch_models.ipynb"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open in Colab"/></a>  <a target="_blank" href="https://kaggle.com/kernels/welcome?src=https://github.com/emaballarin/deeplearning-units/blob/main/labs/01_intro_to_pytorch/02_pytorch_models.ipynb"><img src="https://kaggle.com/static/images/open-in-kaggle.svg" alt="Open in Kaggle"/></a>

### Preliminary infrastructure setup

In [1]:
import os

FOLDERNAME: str = "deeplearning_units_2024"
try:
    if os.getenv("COLAB_RELEASE_TAG"):
        # noinspection PyUnresolvedReferences
        from google.colab import drive

        drive.mount(BASEPATH := "/content/drive")
        os.makedirs(FULLPATH := BASEPATH + "/MyDrive/" + FOLDERNAME, exist_ok=True)
    elif os.getenv("KAGGLE_CONTAINER_NAME"):
        os.makedirs(FULLPATH := "/kaggle/working/" + FOLDERNAME, exist_ok=True)
    else:
        os.makedirs(FULLPATH := "./" + FOLDERNAME, exist_ok=True)
    os.chdir(FULLPATH)
except (ModuleNotFoundError, FileExistsError, FileNotFoundError):
    pass

In [2]:
!python -m pip install -q icecream

In [3]:
# Pretty printouts
from icecream import ic

ic.configureOutput(outputFunction=print)
ic.configureOutput(prefix="    | ")

### Some imports

In [4]:
import torch as th
from safetensors.torch import save_file as safe_save_file
from safetensors.torch import save_model as safe_save_model
from safetensors.torch import load_model as safe_load_model
from torch import Tensor

## Example: Linear regression...

### ... with *bare tensors*
Using all the pieces we have seen so far, we can build our first *model* in PyTorch: a linear regressor, *i.e.*:

$$
y = XW + b
$$

which can also be simplified as:

$$
y = XW
$$

if we absorb the bias $b$ into $W$ (as an extra last row) and append a column of ones to the right of $X$.
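As a quick numerical sanity check of this trick, the sketch below compares $XW + b$ against the augmented form, where the bias becomes the last row of the weight vector. The small sizes `n` and `p` are arbitrary choices made only for illustration.

```python
import torch as th

n, p = 5, 3  # arbitrary small sizes, just for illustration
X = th.rand(n, p)
W = th.rand(p, 1)
b = th.rand(1)

# Explicit bias: y = XW + b (b is broadcast over the n rows)
y_explicit = X @ W + b

# Absorbed bias: append a column of ones to X, and b as an extra last row of W
X_aug = th.cat([X, th.ones(n, 1)], dim=1)    # shape (n, p + 1)
W_aug = th.cat([W, b.reshape(1, 1)], dim=0)  # shape (p + 1, 1)
y_absorbed = X_aug @ W_aug

print(th.allclose(y_explicit, y_absorbed))  # True
```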

We start by generating our data. We sample $X$ as an $N\times P$ tensor with entries drawn uniformly from $[0, 1)$, with $N = 1000$ datapoints and $P = 100$ features, and produce $y$ as:
$$
y = XM + \varepsilon, \qquad \varepsilon \sim \mathcal{N}(0, I)
$$
where $M$ is a randomly drawn projection vector (shape $P\times 1$, the same as our weights without the bias).
We add i.i.d. Gaussian noise to $y$ to avoid the interpolation regime, in which a linear model could fit our data perfectly: without noise, $y$ would be an exact linear function of $X$.

In [5]:
N: int = 1000
P: int = 100
X_orig: Tensor = th.rand(N, P)
M: Tensor = th.rand(P, 1)
y: Tensor = X_orig @ M + th.normal(
    mean=th.zeros(N, 1), std=th.ones(N, 1)
)  # Convenience functions: `th.zeros`, `th.ones`
# Also: PyTorch can sample from common probability distributions (e.g. `th.normal`)
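Two small conveniences that may come in handy here, though the original cell does not use them: `th.randn` samples directly from a standard normal, so `th.randn(N, 1)` draws noise from the same distribution as the `th.normal` call above; and fixing the seed with `th.manual_seed` makes random draws reproducible across runs. In this sketch, `sample_data` is a hypothetical helper introduced only for illustration.

```python
import torch as th

N, P = 1000, 100


def sample_data(seed: int) -> tuple[th.Tensor, th.Tensor]:
    # Hypothetical helper: seeding the global generator makes all draws below deterministic
    th.manual_seed(seed)
    X = th.rand(N, P)  # uniform in [0, 1)
    M = th.rand(P, 1)
    noise = th.randn(N, 1)  # standard normal, same distribution as th.normal(0, 1)
    return X, X @ M + noise


X_a, y_a = sample_data(seed=42)
X_b, y_b = sample_data(seed=42)
print(th.equal(X_a, X_b), th.equal(y_a, y_b))  # True True
```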

We can add a column of ones to $X$ to include the bias:

In [6]:
X: Tensor = th.cat(
    tensors=[X_orig, th.ones(N, 1)], dim=1
)  # `th.cat` concatenates tensors along a given dimension

The regressor can be fitted with classical statistical methods such as Ordinary Least Squares (OLS), whose optimal $W$ has the closed form (assuming $X^TX$ is invertible):

$$
W^*=(X^TX)^{-1}X^Ty
$$

In [7]:
W_star: Tensor = ((X.T @ X).inverse()) @ X.T @ y
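Explicitly inverting $X^TX$ mirrors the formula, but in practice it is usually preferable to solve the normal equations directly, or to call a least-squares solver on $X$, since both are numerically more robust when $X^TX$ is ill-conditioned. The sketch below compares the three routes on freshly sampled data with the same sizes as above; double precision is an assumption made here only to keep the comparison tight.

```python
import torch as th

N, P = 1000, 100
X = th.cat([th.rand(N, P, dtype=th.float64), th.ones(N, 1, dtype=th.float64)], dim=1)
y = X @ th.rand(P + 1, 1, dtype=th.float64) + th.randn(N, 1, dtype=th.float64)

# 1. Literal formula: W* = (X^T X)^{-1} X^T y
W_inv = th.linalg.inv(X.T @ X) @ X.T @ y
# 2. Solve the normal equations (X^T X) W = X^T y without forming the inverse
W_solve = th.linalg.solve(X.T @ X, X.T @ y)
# 3. Least-squares solver working on X directly
W_lstsq = th.linalg.lstsq(X, y).solution

print(th.allclose(W_inv, W_solve), th.allclose(W_solve, W_lstsq))  # True True
```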

To assess the quality of this fit we can evaluate the Mean Squared Error (MSE) between the original $y$ and the prediction:

In [8]:
loss: Tensor = th.nn.functional.mse_loss(input=X @ W_star, target=y)
_ = ic(loss)

    | loss: tensor(0.8776)
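To see why the added noise matters, the sketch below fits OLS on fresh data (same sizes, double precision) once with noise-free targets and once with unit-variance noise. Without noise, $y$ lies exactly in the column space of $X$ and the training MSE collapses to (numerically) zero. With noise of variance $\sigma^2 = 1$ and $P+1$ fitted parameters, standard OLS theory gives an expected training MSE of $\sigma^2 (N - P - 1)/N \approx 0.9$, in the same range as the value printed above. The helper `ols_train_mse` is hypothetical.

```python
import torch as th

th.manual_seed(0)
N, P = 1000, 100
X = th.cat([th.rand(N, P, dtype=th.float64), th.ones(N, 1, dtype=th.float64)], dim=1)
M = th.rand(P + 1, 1, dtype=th.float64)


def ols_train_mse(X: th.Tensor, y: th.Tensor) -> float:
    # Hypothetical helper: closed-form OLS fit, then MSE on the training data itself
    W = th.linalg.solve(X.T @ X, X.T @ y)
    return th.nn.functional.mse_loss(X @ W, y).item()


mse_clean = ols_train_mse(X, X @ M)  # exactly linear targets
mse_noisy = ols_train_mse(X, X @ M + th.randn(N, 1, dtype=th.float64))  # sigma = 1
expected_noisy = (N - P - 1) / N  # E[training MSE] = sigma^2 (N - P - 1) / N

print(f"{mse_clean:.2e}  {mse_noisy:.3f}  {expected_noisy:.3f}")
```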


Fitted model parameters can be saved (and loaded afterwards) using the `torch.save` (and `torch.load`) functions:

In [9]:
th.save(W_star, "./W_star_ols.pt")

**Note**:

The `torch.save` function is not limited to saving tensors: it can serialize almost any kind of Python object (e.g. models, optimizers, etc.). Under the hood, it relies on the (in)famous `pickle` module.

This setup offers great convenience, but also carries security risks, since unpickling data can execute arbitrary code. Be careful when loading objects from untrusted sources, or use [`safetensors`](https://github.com/huggingface/safetensors) instead!

In [10]:
safe_save_file(
    {"W_star": W_star}, "./W_star_ols_safe.safetensors"
)  # The only difference: tensors are passed as a dict, so each one must be named.
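Loading mirrors saving. Given the pickle-related risks, `torch.load` accepts `weights_only=True`, which restricts unpickling to tensors and plain data types; on the `safetensors` side, the counterpart of `save_file` is `safetensors.torch.load_file`, which returns a dict of named tensors. Below is a minimal round-trip sketch with `torch` only; the file name, the temporary directory and the random stand-in tensor are illustrative assumptions.

```python
import os
import tempfile

import torch as th

W = th.rand(101, 1)  # stand-in for a fitted weight tensor

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "W_star_ols.pt")
    th.save(W, path)
    # weights_only=True refuses to unpickle arbitrary Python objects
    W_loaded = th.load(path, weights_only=True)

print(th.equal(W, W_loaded))  # True
```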

### ... with a `torch.nn.Module`

The same linear regression model can be implemented using the `torch.nn.Module` class. This is the recommended way to build models in PyTorch: it provides a more structured and modular approach, and it automatically keeps track of the model's learnable parameters, which can then be handed to a gradient-based optimizer.

In general, a PyTorch model is a Python class that inherits from `torch.nn.Module` and implements (at least) these two methods:

1. `__init__`: the constructor, in which we **must** define all learnable parameters of the model (either directly as `torch.nn.Parameter`s, or indirectly through submodules, i.e. other `torch.nn.Module` instances assigned as attributes);
2. `forward`: the method that specifies how input data fed into the model are processed to produce the outputs.
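As an illustration of both points, here is a minimal sketch of the same affine model with its learnable parameters declared directly as `th.nn.Parameter`s. The class name `BareLinearRegressor` and its initialization are hypothetical choices for this example. Assigning a `Parameter` as an attribute in `__init__` is what registers it, so it shows up in `parameters()` and `named_parameters()`, which is what an optimizer consumes.

```python
import torch as th
from torch import Tensor


class BareLinearRegressor(th.nn.Module):  # hypothetical example class
    def __init__(self, in_features: int, out_features: int) -> None:
        super().__init__()
        # Assigning nn.Parameter attributes registers them with the module
        self.weight = th.nn.Parameter(th.randn(in_features, out_features) * 0.01)
        self.bias = th.nn.Parameter(th.zeros(out_features))
        # A plain tensor attribute is NOT registered as a parameter
        self.not_a_param = th.ones(1)

    def forward(self, x: Tensor) -> Tensor:
        # y = xW + b, matching the bare-tensor formulation
        return x @ self.weight + self.bias


model = BareLinearRegressor(in_features=100, out_features=1)
print([name for name, _ in model.named_parameters()])  # ['weight', 'bias']
print(model(th.rand(8, 100)).shape)  # torch.Size([8, 1])

# The registered parameters are what a gradient-based optimizer receives
optimizer = th.optim.SGD(model.parameters(), lr=0.01)
```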

**Note**:

In our case, the input transformation we need (an affine map) is already implemented by the `torch.nn.Linear` class, itself a subclass of `torch.nn.Module`. We can use it to build our linear regressor.
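One detail worth keeping in mind, and relevant for the `ols_fit` method below: `th.nn.Linear` computes $xW^T + b$ and stores its weight with shape `(out_features, in_features)`, i.e. transposed with respect to the $P\times 1$ vector $W$ used in the formulas above. A quick check on a randomly initialized layer:

```python
import torch as th

layer = th.nn.Linear(in_features=100, out_features=1, bias=True)
x = th.rand(8, 100)

print(layer.weight.shape, layer.bias.shape)  # torch.Size([1, 100]) torch.Size([1])

# The layer's output equals x @ weight.T + bias
with th.no_grad():
    manual = x @ layer.weight.T + layer.bias
print(th.allclose(layer(x), manual))  # True
```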

In [11]:
class LinearRegressor(th.nn.Module):
    def __init__(self, in_features: int, out_features: int, bias: bool = True) -> None:
        super().__init__()
        self.affine_transform = th.nn.Linear(
            in_features=in_features, out_features=out_features, bias=bias
        )

    def forward(self, x: Tensor) -> Tensor:
        return self.affine_transform(x)

    # Just for the fun of it, we can add a method to fit the model in closed form (OLS).
    # With gradient-based optimization this would not be necessary, but it's a good exercise.
    def ols_fit(self, xols: Tensor, yols: Tensor) -> None:
        has_bias: bool = self.affine_transform.bias is not None
        with th.no_grad():
            if has_bias:
                # Append a column of ones: the last row of `wols` will then act as the bias
                ones = th.ones(xols.shape[0], 1, dtype=xols.dtype, device=xols.device)
                xols = th.cat(tensors=[xols, ones], dim=1)
            wols: Tensor = ((xols.T @ xols).inverse()) @ xols.T @ yols
            # `th.nn.Linear` stores its weight as (out_features, in_features): transpose
            self.affine_transform.weight.copy_(
                wols[: self.affine_transform.in_features].T
            )
            if has_bias:
                self.affine_transform.bias.copy_(wols[-1])

Now, we can fit the model on the same data as before:

In [12]:
model: LinearRegressor = LinearRegressor(in_features=P, out_features=1, bias=True)

In [13]:
model.ols_fit(X_orig, y)

And we can evaluate the loss as before:

In [14]:
loss: Tensor = th.nn.functional.mse_loss(input=model(X_orig), target=y)
_ = ic(loss)

We can inspect the current parameters of our model either by accessing them directly, or via the `state_dict` method.

In [15]:
_ = ic(model.affine_transform.weight)
_ = ic(model.affine_transform.bias)

    | model.affine_transform.weight: Parameter containing:
                                     tensor([[ 0.2927,  0.2902, -0.0661,  0.3177,  0.3685,  0.1991,  0.1837,  0.1707,
                                               0.4834,  0.7170,  0.7353,  0.7408,  0.8874,  0.3020,  0.8377,  0.7758,
                                               0.0820,  0.3031,  0.4683,  0.1599,  0.4069,  0.5643,  0.5213,  0.8161,
                                               0.4891,  0.1744,  0.3105,  0.1430,  0.2537,  0.2139,  0.7634,  1.0088,
                                               0.3002,  0.8979,  0.1368,  0.7721,  1.0371,  0.8219,  0.3422,  0.6831,
                                               0.6882,  0.6924,  0.3859,  0.3705,  0.4826,  0.4146,  0.5599,  0.8001,
                                               0.4709,  0.7081, -0.0652,  0.7451,  0.3477,  0.5825,  0.6938,  0.0594,
                                               0.4942,  1.1733,  0.3685,  0.4555,  0.5313,  0.3313,  0.4171,  0.781

In [16]:
_ = ic(model.state_dict())

    | model.state_dict(): OrderedDict([('affine_transform.weight',
                                        tensor([[ 0.2927,  0.2902, -0.0661,  0.3177,  0.3685,  0.1991,  0.1837,  0.1707,
                                    0.4834,  0.7170,  0.7353,  0.7408,  0.8874,  0.3020,  0.8377,  0.7758,
                                    0.0820,  0.3031,  0.4683,  0.1599,  0.4069,  0.5643,  0.5213,  0.8161,
                                    0.4891,  0.1744,  0.3105,  0.1430,  0.2537,  0.2139,  0.7634,  1.0088,
                                    0.3002,  0.8979,  0.1368,  0.7721,  1.0371,  0.8219,  0.3422,  0.6831,
                                    0.6882,  0.6924,  0.3859,  0.3705,  0.4826,  0.4146,  0.5599,  0.8001,
                                    0.4709,  0.7081, -0.0652,  0.7451,  0.3477,  0.5825,  0.6938,  0.0594,
                                    0.4942,  1.1733,  0.3685,  0.4555,  0.5313,  0.3313,  0.4171,  0.7810,
                                    0.6458,  0.3261,  0.2075,  
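The keys of the `state_dict` follow the attribute path of each parameter inside the module (here `affine_transform.weight` and `affine_transform.bias`), and by default its values are tensors detached from autograd that share storage with the live parameters. A self-contained sketch, where `Wrapper` is a hypothetical stand-in with the same layout as `LinearRegressor`:

```python
import torch as th


class Wrapper(th.nn.Module):  # hypothetical stand-in with the same layout as LinearRegressor
    def __init__(self) -> None:
        super().__init__()
        self.affine_transform = th.nn.Linear(in_features=100, out_features=1)

    def forward(self, x: th.Tensor) -> th.Tensor:
        return self.affine_transform(x)


model = Wrapper()
sd = model.state_dict()

print(list(sd.keys()))  # ['affine_transform.weight', 'affine_transform.bias']
w = sd["affine_transform.weight"]
print(w.requires_grad)  # False: detached from autograd
print(w.data_ptr() == model.affine_transform.weight.data_ptr())  # True: same storage
```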

Model saving/loading is also straightforward:

In [17]:
# Saving with `torch.save`
th.save(
    model.state_dict(), "./model_ols.pt"
)  # Beware: we do not save `model` directly, but its `state_dict`!

# Saving with `safetensors`
safe_save_model(model, "./model_ols_safe.safetensors")

In [18]:
# Loading with `torch.load` (`weights_only=True` restricts unpickling to tensors and plain data)
model_loaded = LinearRegressor(in_features=P, out_features=1, bias=True)
_ = model_loaded.load_state_dict(th.load("./model_ols.pt", weights_only=True))

# Loading with `safetensors`
model_loaded_safe = LinearRegressor(in_features=P, out_features=1, bias=True)
_ = safe_load_model(model_loaded_safe, "./model_ols_safe.safetensors")
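A quick way to verify that a save/load round trip worked is to compare the reloaded parameters with the originals. `load_state_dict` also returns the `missing_keys` and `unexpected_keys` it found, which are empty when the architectures match (with the default `strict=True`, a mismatch raises an error instead). The sketch below uses only `torch`, with a plain `th.nn.Linear` in place of `LinearRegressor` and an illustrative temporary path.

```python
import os
import tempfile

import torch as th

model = th.nn.Linear(in_features=100, out_features=1)

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "model.pt")
    th.save(model.state_dict(), path)

    # A fresh instance with the same architecture (and a different random init)
    model_loaded = th.nn.Linear(in_features=100, out_features=1)
    result = model_loaded.load_state_dict(th.load(path, weights_only=True))

print(result.missing_keys, result.unexpected_keys)  # [] []
same = all(
    th.equal(a, b)
    for a, b in zip(model.state_dict().values(), model_loaded.state_dict().values())
)
print(same)  # True
```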