<a href="https://colab.research.google.com/github/TorchSpatiotemporal/tsl/blob/main/examples/notebooks/a_gentle_introduction_to_tsl.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# A Gentle Introduction to tsl
---

This is a tutorial notebook about __tsl (Torch Spatiotemporal)__, a Python library built upon the PyTorch ecosystem
and tailored for spatiotemporal data processing.

In this notebook, we walk through the necessary steps from data loading to model training.

## Installation
---

Let's start with the installation. If you run the notebook in Colab, you can install tsl with these commands:

In [None]:
!git clone https://github.com/TorchSpatiotemporal/tsl.git
!pip install torch==1.10.1+cu111 torchvision==0.11.2+cu111 torchaudio==0.10.1 -f https://download.pytorch.org/whl/torch_stable.html
!pip install torch-scatter torch-sparse torch-geometric -f https://data.pyg.org/whl/torch-1.10.1+cu113.html
!pip install ./tsl

In particular, the third command installs `torch-geometric` and its dependencies for the specific environment you have on Colab with a GPU runtime. Please refer to the [PyG installation guidelines](https://pytorch-geometric.readthedocs.io/en/latest/notes/installation.html) for installation in other environments.

For now, we recommend installing tsl from the GitHub repository, to be sure you have the latest version.

Let's check if everything is ok.

In [1]:
import tsl
import torch
import numpy as np

np.set_printoptions(suppress=True)
tsl.logger.disabled = True

print(f"tsl version  : {tsl.__version__}")
print(f"torch version: {torch.__version__}")

tsl version  : 0.1.0
torch version: 1.11.0+cu102


## Usage
---

tsl is more than a collection of layers. We can classify the library modules into:

* __Data loading modules__
    Manage how to store, preprocess and visualize spatio-temporal data
* __Inference modules__
    Methods and models exploiting the data to make inferences

We will look at each of them in more depth in the next sections.

## Data loading
---

`tsl` comes with several datasets used in spatiotemporal processing literature. You can find them inside the submodule `tsl.datasets`.

### Loading the dataset

As a reference example, consider the [MetrLA](https://paperswithcode.com/sota/traffic-prediction-on-metr-la) dataset, a common benchmark for traffic forecasting. The dataset contains traffic readings collected from 207 loop detectors on highways in Los Angeles County, aggregated in 5-minute intervals over 4 months, from March 2012 to June 2012. Loading a dataset takes a single call. The cell below loads an air-quality dataset through the `AirQuality` class (aliased as `AQ`) of a local `airquality` module, restricted to a 70-node subgraph:

In [29]:
from airquality import AirQuality as AQ

dataset = AQ(data_dir='../data', is_subgraph=True, sub_size=70, sub_start='6.0-79.0-8002.0')
print(dataset)

Found temporal data pickle, loading...	DONE!
Found a valid build, loading... 	DONE!
AQ(length=26309, n_nodes=70, n_channels=1)


We can see that data are organized as a 3-dimensional array of shape (time steps, nodes, channels). For MetrLA, this corresponds to:

* 34,272 temporal steps (one every 5 minutes for 4 months)
* 207 nodes (the loop detectors)
* 1 channel (detected speed)

Nice! Other than storing the data of interest, the dataset comes with useful tools.

In [3]:
print(f"Sampling period: {dataset.freq}\n"
      f"Has missing values: {dataset.has_mask}\n"
      f"Percentage of missing values: {(1 - dataset.mask.mean()) * 100:.2f}%\n"
      f"Has dataset exogenous variables: {dataset.has_exogenous}\n"
      f"Relevant attributes: {', '.join(dataset.attributes.keys())}")

Sampling period: <Hour>
Has missing values: True
Percentage of missing values: 25.67%
Has dataset exogenous variables: False
Relevant attributes: dist


Let's look at the output. We know that the dataset has missing entries, with `dataset.mask` being a binary indicator associated with each timestep, node and channel (with ones indicating valid values).
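The missing-value percentage printed above is simply the complement of the mask's mean. The snippet below is a minimal NumPy sketch of that arithmetic on a made-up mask (`toy_mask` is a hypothetical array, not part of `tsl`), assuming the `[steps, nodes, channels]` layout described here.

```python
import numpy as np

# Toy mask with layout [steps, nodes, channels]; 1 marks a valid value.
toy_mask = np.array([[[1], [1], [0]],
                     [[1], [0], [0]],
                     [[1], [1], [1]],
                     [[1], [1], [0]]], dtype=np.uint8)

# Overall percentage of missing entries, as in the cell above.
missing_pct = (1 - toy_mask.mean()) * 100
print(f"Percentage of missing values: {missing_pct:.2f}%")

# Fraction of missing values per node (averaged over steps and channels).
missing_per_node = 1 - toy_mask.mean(axis=(0, 2))
print(missing_per_node)
```

Averaging over a subset of axes, as in the last lines, is a quick way to spot nodes that are missing most of their readings.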

Also, the dataset has no exogenous variables, i.e., there are no time-varying features paired with the main signal.
Instead, it has a useful attribute: the distance matrix. We call static features *attributes*.

You can access exogenous variables and attributes by name, as `dataset.<name>`:

In [2]:
print(dataset.dist)

            256         585         586         587         588         589  \
256    0.000000  252.826983  252.536928  251.740761  231.443093  223.965083   
585  252.826983    0.000000    2.883137    5.839589   35.657821   46.024426   
586  252.536928    2.883137    0.000000    8.633242   33.308270   43.753870   
587  251.740761    5.839589    8.633242    0.000000   39.556224   49.651786   
588  231.443093   35.657821   33.308270   39.556224    0.000000   10.563369   
..          ...         ...         ...         ...         ...         ...   
661  113.036321  146.810371  145.955655  146.958344  120.800169  112.433682   
663  235.261656   32.229345   34.418405   27.038035   53.429740   60.947425   
664  273.522002   80.547749   83.430880   75.001762  111.915276  120.591217   
665  177.811511  114.900296  112.896597  117.673114   80.256299   69.720197   
666  183.993799  132.404073  130.182809  135.699818   96.943323   86.435629   

            590         594         595         596

This matrix stores the pairwise distance between sensors, with `inf` denoting two non-neighboring sensors.

Let's now check what the readings look like.

In [26]:
dataset.dataframe().head(10)

| nodes | 1.0-103.0-11.0 | 1.0-113.0-3.0 | 1.0-73.0-23.0 | 10.0-1.0-2.0 | 10.0-3.0-1007.0 | 10.0-3.0-1008.0 | 10.0-3.0-2004.0 | 10.0-5.0-1002.0 | 11.0-1.0-41.0 | 11.0-1.0-43.0 | ... | 80.0-2.0-12.0 | 80.0-2.0-14.0 | 9.0-1.0-10.0 | 9.0-1.0-1123.0 | 9.0-11.0-124.0 | 9.0-3.0-1003.0 | 9.0-3.0-25.0 | 9.0-5.0-5.0 | 9.0-9.0-2123.0 | 9.0-9.0-27.0 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| channels | PM25 | PM25 | PM25 | PM25 | PM25 | PM25 | PM25 | PM25 | PM25 | PM25 | ... | PM25 | PM25 | PM25 | PM25 | PM25 | PM25 | PM25 | PM25 | PM25 | PM25 |
| DateTime | | | | | | | | | | | | | | | | | | | | | |
| 2018-01-01 05:00:00 | 10.9 | 19.142857 | 11.44 | 5.4 | 13.0 | 10.0 | 7.2 | 5.2 | 10.0 | 12.0 | ... | 53.166668 | 116.166664 | 11.0 | 9.0 | 7.0 | 6.0 | 8.0 | 7.0 | 0.0 | 3.0 |
| 2018-01-01 06:00:00 | 9.428533 | 16.0 | 13.4 | 5.2 | 12.1 | 12.1 | 5.6 | 5.2 | 14.0 | 11.0 | ... | 61.833332 | 137.5 | 8.0 | 9.0 | 7.0 | 6.0 | 10.0 | 6.0 | 1.0 | 3.0 |
| 2018-01-01 07:00:00 | 9.472011 | 15.0 | 21.1 | 8.0 | 13.0 | 12.8 | 5.1 | 5.2 | 11.0 | 11.0 | ... | 71.5 | 141.166672 | 9.0 | 5.0 | 6.0 | 8.0 | 7.0 | 5.0 | 4.0 | 3.0 |
| 2018-01-01 08:00:00 | 9.545652 | 15.0 | 17.4 | 7.7 | 14.1 | 12.5 | 4.8 | 6.1 | 16.0 | 9.0 | ... | 344.0 | 622.0 | 8.0 | 3.0 | 3.0 | 7.0 | 3.0 | 1.0 | 3.0 | 1.0 |
| 2018-01-01 09:00:00 | 9.561413 | 13.0 | 5.4 | 7.6 | 15.7 | 7.9 | 4.2 | 6.0 | 8.0 | 10.0 | ... | 292.0 | 744.0 | 5.0 | 4.0 | 3.0 | 5.0 | 3.0 | 4.0 | 1.0 | 0.0 |
| 2018-01-01 10:00:00 | 9.494837 | 11.0 | 5.5 | 6.2 | 18.0 | 4.3 | 3.8 | 5.4 | 11.0 | 9.0 | ... | 278.0 | 678.0 | 5.0 | 4.0 | 3.0 | 5.0 | 3.0 | 6.0 | 4.0 | 3.0 |
| 2018-01-01 11:00:00 | 9.305163 | 11.0 | 15.2 | 5.5 | 19.1 | 4.7 | 4.4 | 6.6 | 9.0 | 8.0 | ... | 296.0 | 688.0 | 4.0 | 5.0 | 5.0 | 0.0 | 2.0 | 1.0 | 3.0 | 4.0 |
| 2018-01-01 12:00:00 | 9.221739 | 8.0 | 12.3 | 5.5 | 18.9 | 6.6 | 5.6 | 6.5 | 8.0 | 6.0 | ... | 286.0 | 789.0 | 4.0 | 4.0 | 5.0 | 1.0 | 2.0 | 2.0 | 1.0 | 2.0 |
| 2018-01-01 13:00:00 | 8.963043 | 5.0 | 16.0 | 3.9 | 18.700001 | 7.7 | 7.3 | 6.1 | 8.0 | 7.0 | ... | 297.0 | 809.0 | 5.0 | 4.0 | 5.0 | 5.0 | 2.0 | 1.0 | 5.0 | 2.0 |
| 2018-01-01 14:00:00 | 8.591033 | 6.0 | 17.700001 | 4.6 | 17.200001 | 5.9 | 6.5 | 4.7 | 8.0 | 6.0 | ... | 385.0 | 729.0 | 3.0 | 3.0 | 4.0 | 6.0 | 3.0 | 1.0 | 8.0 | 1.0 |


### Get connectivity

Besides the time series, to properly use graph-based models, we need to __connect__ nodes somehow.

With the method `dataset.get_similarity()` we can retrieve nodes' similarities computed with different methods. The available similarity methods for a dataset can be found at `dataset.similarity_options`, while the default one is at `dataset.similarity_score`.

In [13]:
print(f"Default similarity: {dataset.similarity_score}\n"
      f"Available similarity options: {dataset.similarity_options}\n")

sim = dataset.get_similarity("distance")  # same as dataset.get_similarity()
print(sim[:10, :10])  # just check first 10 nodes for readability

Default similarity: distance
Available similarity options: {'distance'}

[[1.         0.9564619  0.99390571 0.52242575 0.51529322 0.50860576
  0.50183616 0.53325834 0.59217674 0.5940329 ]
 [0.9564619  1.         0.97783375 0.52818064 0.5114032  0.50527559
  0.49656053 0.54387295 0.58860374 0.58955044]
 [0.99390571 0.97783375 1.         0.49541105 0.48521184 0.4787652
  0.47144431 0.50778061 0.56204142 0.56361064]
 [0.52242575 0.52818064 0.49541105 1.         0.99788113 0.99779702
  0.99644182 0.99929262 0.99250402 0.99193799]
 [0.51529322 0.5114032  0.48521184 0.99788113 1.         0.99993399
  0.99966454 0.99489148 0.99165567 0.99135145]
 [0.50860576 0.50527559 0.4787652  0.99779702 0.99993399 1.
  0.99982768 0.99464621 0.99019887 0.98985155]
 [0.50183616 0.49656053 0.47144431 0.99644182 0.99966454 0.99982768
  1.         0.99260447 0.98821162 0.98790065]
 [0.53325834 0.54387295 0.50778061 0.99929262 0.99489148 0.99464621
  0.99260447 1.         0.99271759 0.9920618 ]
 [0.59217674 0.5

With this method, we compute the weight $w^{i,j}$ of the edge connecting the $i$-th and $j$-th nodes as
$$
w^{i,j} = \left\{\begin{array}{cl}
     \exp \left(-\frac{\operatorname{dist}\left(i, j\right)^{2}}{\gamma}\right) & \operatorname{dist}\left(i, j\right) \leq \delta  \\
     0 & \text{otherwise}
\end{array}\right. ,
$$
where $\operatorname{dist}\left(i, j\right)$ is the distance between the $i$-th and $j$-th nodes, $\gamma$ controls the width of the kernel and $\delta$ is a threshold. Note that the similarity matrix is symmetric only if the underlying distance matrix is; for MetrLA, the preprocessed distance matrix is not symmetric, so neither is the resulting similarity matrix.
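To make the formula concrete, here is a minimal NumPy sketch of the thresholded Gaussian kernel. The function name `gaussian_kernel_similarity`, the toy distances, and the values of `gamma` and `delta` are assumptions chosen for illustration; this is not the code `tsl` runs internally.

```python
import numpy as np


def gaussian_kernel_similarity(dist, gamma, delta):
    """Return exp(-dist^2 / gamma) where dist <= delta, and 0 elsewhere."""
    dist = np.asarray(dist, dtype=float)
    weights = np.zeros_like(dist)
    keep = dist <= delta  # infinite distances never pass the threshold
    weights[keep] = np.exp(-dist[keep] ** 2 / gamma)
    return weights


# Toy distance matrix for three sensors (values are made up).
toy_dist = np.array([[0.0, 1.0, np.inf],
                     [1.0, 0.0, 3.0],
                     [np.inf, 3.0, 0.0]])

sim_toy = gaussian_kernel_similarity(toy_dist, gamma=4.0, delta=2.0)
print(sim_toy)
```

Pairs farther apart than `delta` get weight zero, while the diagonal (distance zero) gets weight one.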

So far so good. Now we can build an adjacency matrix out of the computed similarity.

The method `dataset.get_connectivity()` wraps `dataset.get_similarity()`, provides useful preprocessing options, and finally returns a sparse (weighted) adjacency matrix.

In [30]:
adj = dataset.get_connectivity(threshold=0.1,
                               include_self=False,
                               normalize_axis=1,
                               layout="edge_index")

With this function call, under the hood we:

1. compute the similarity matrix as before
1. set to $0$ values below $0.1$ (`threshold=0.1`)
1. remove self loops (`include_self=False`)
1. normalize edge weights by the in-degree of nodes (`normalize_axis=1`)
1. request the sparse COO layout of PyG (`layout="edge_index"`)
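The steps above can be sketched in plain NumPy. The helper `similarity_to_edge_index` below is hypothetical and only mirrors the listed steps on a toy similarity matrix; in particular, reading `normalize_axis=1` as "divide by the sum along axis 1" is an assumption of this sketch, not a statement about the library's internals.

```python
import numpy as np


def similarity_to_edge_index(sim, threshold=0.1, include_self=False, normalize_axis=1):
    """Sparsify a dense similarity matrix and return (edge_index, edge_weight)."""
    adj = np.array(sim, dtype=float)      # work on a copy
    adj[adj < threshold] = 0.0            # step 2: drop weak similarities
    if not include_self:
        np.fill_diagonal(adj, 0.0)        # step 3: remove self loops
    if normalize_axis is not None:
        # step 4 (assumed reading): divide by the sum along `normalize_axis`
        totals = adj.sum(axis=normalize_axis, keepdims=True)
        adj = adj / np.where(totals > 0, totals, 1.0)
    rows, cols = np.nonzero(adj)          # step 5: keep only the nonzero entries
    return np.stack([rows, cols]), adj[rows, cols]


toy_sim = np.array([[1.0, 0.8, 0.05],
                    [0.8, 1.0, 0.4],
                    [0.05, 0.4, 1.0]])
toy_edge_index, toy_edge_weight = similarity_to_edge_index(toy_sim)
print(toy_edge_index)
print(toy_edge_weight)
```

On this toy input, the `0.05` entries fall below the threshold and the diagonal is dropped, leaving four directed edges.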

The connectivity matrix with `edge_index` layout is provided in COO format, adopting the convention and notation used in PyTorch Geometric. The returned connectivity is a tuple (`edge_index`, `edge_weight`), where `edge_index` lists all edges as pairs of source-target nodes (dimensions `[2, E]`) and `edge_weight` (dimension `[E]`) stores the corresponding weights.
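As a small illustration of the COO layout, the NumPy sketch below converts a toy edge list to a dense matrix and back. The arrays `coo_index` and `coo_weight` are made up, and placing each weight at `[source, target]` in the dense matrix is an assumption of this sketch.

```python
import numpy as np

# Toy weighted graph with 3 nodes and 4 directed edges (made-up values).
coo_index = np.array([[0, 1, 1, 2],    # source nodes
                      [1, 0, 2, 1]])   # target nodes
coo_weight = np.array([0.5, 0.5, 0.2, 0.9])

# COO -> dense: entry [source, target] holds the edge weight.
n_nodes = 3
dense_adj = np.zeros((n_nodes, n_nodes))
dense_adj[coo_index[0], coo_index[1]] = coo_weight

# dense -> COO: the nonzero entries are exactly the edges.
src, dst = np.nonzero(dense_adj)
print(np.stack([src, dst]))
print(dense_adj[src, dst])
```

The sparse form stores `E` weights instead of `N * N` entries, which matters when most node pairs are not connected.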

In [31]:
# number of entries in a dense 70 x 70 adjacency matrix
prova = 70 ** 2
prova

4900

In [32]:
edge_index, edge_weight = adj

print(edge_index.shape)
print(edge_weight)

(2, 2316)
[0.00549751 0.0061028  0.00430149 ... 0.02189975 0.01458481 0.03844911]


In [33]:
# dense entries that do not correspond to an edge
prova - edge_index.shape[1]

2584

In [18]:
edge_weight.shape

(287010,)

In [19]:
np.count_nonzero(edge_weight)

287010

The dense layout corresponds to the weighted adjacency matrix $A \in \mathbb{R}^{N \times N}$. The module `tsl.ops.connectivity` contains useful operations for connectivities, including methods to change layout.

In [9]:
from tsl.ops.connectivity import edge_index_to_adj

dense = edge_index_to_adj(edge_index, edge_weight)
print(dense.shape)

(437, 437)


## Data processing
---

In this section, we will see how to transfer data from a dataset to an inference model (e.g., a spatiotemporal graph neural network).

### The SpatioTemporalDataset

The first class that comes to our aid is `tsl.data.SpatioTemporalDataset`. This class is a subclass of `torch.utils.data.Dataset` and can be considered a wrapper of a `tsl` dataset that provides the interface for further processing.

In particular, a `SpatioTemporalDataset` object can be used to:
* perform the transformations required to feed the data to a model (e.g., casting to `torch.Tensor`, handling different shapes)
* handle temporal slicing and windowing for training (e.g., splitting data into $\left( \text{window}, \text{horizon} \right)$ samples)
* define the layout of inputs and targets (e.g., how node attributes and exogenous variables are arranged)
* preprocess data before creating a batch

Let's see how to go from a `Dataset` to a `SpatioTemporalDataset`.

In [10]:
from tsl.data import SpatioTemporalDataset

torch_dataset = SpatioTemporalDataset(*dataset.numpy(return_idx=True),
                                      connectivity=adj,
                                      mask=dataset.mask,
                                      horizon=12,
                                      window=12)
print(torch_dataset)

SpatioTemporalDataset(n_samples=8737, n_nodes=437, n_channels=1)


As you can see, the number of samples is not the same as the number of steps in the dataset. Indeed, we divided the time series with a sliding window of 12 steps for the input (`window=12`), with a corresponding horizon of 12 steps (`horizon=12`). Thus, a sample spans a total of $24$ steps. Let's look in detail at the layout of a sample:

In [11]:
sample = torch_dataset[0]
print(sample)

Data(input:{x=[12, 437, 1], edge_index=[2, 5398], edge_weight=[5398]}, target:{y=[12, 437, 1]}, has_mask=True)
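The windowing described above can be sketched with a few lines of NumPy. The generator `sliding_window_samples` is a hypothetical helper that assumes a stride of one step and no gap between window and horizon; it only illustrates the slicing, not how `SpatioTemporalDataset` is implemented.

```python
import numpy as np


def sliding_window_samples(data, window, horizon):
    """Yield (input, target) pairs: `window` past steps, then the next `horizon` steps."""
    n_samples = data.shape[0] - window - horizon + 1
    for start in range(n_samples):
        x = data[start:start + window]
        y = data[start + window:start + window + horizon]
        yield x, y


# Toy series: 30 time steps, 4 nodes, 1 channel.
toy_series = np.arange(30 * 4, dtype=float).reshape(30, 4, 1)
pairs = list(sliding_window_samples(toy_series, window=12, horizon=12))

print(len(pairs))         # 30 - 12 - 12 + 1 = 7
print(pairs[0][0].shape)  # (12, 4, 1)
print(pairs[0][1].shape)  # (12, 4, 1)
```

Under these assumptions, a series of `T` steps yields `T - window - horizon + 1` samples, which is why the number of samples is smaller than the number of steps.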


A sample has 5 main attributes:

* `sample.input` is a mapping of data to be forwarded as input to the model.
* `sample.target` is a mapping of data to be forwarded as target for the loss function of the model.
* `sample.mask` stores the `mask`, if any. It is useful for computing the loss only on valid data.
* `sample.transform` is a mapping containing as value a transformation function (e.g., scaling, detrending) and as key the name of the tensor to be transformed.
* `sample.pattern` stores the pattern, i.e., a more informative shape representation, of each tensor in `sample`.

Let's look in more detail at how each of these attributes is composed.

#### Input and Target

A sample is a `tsl.data.Data` object which stores all that is needed to support inference.
Both `input` and `target` are `tsl.data.DataView` objects over this storage.
This means that they have the same methods, but contain different subsets of keys.
As a result, you cannot store two different tensors under the same key in `input` and `target`.

In [12]:
sample.input.to_dict()

{'x': tensor([[[138.0000],
          [ 89.0000],
          [105.0000],
          ...,
          [ 61.7920],
          [ 75.6980],
          [ 77.2573]],
 
         [[124.0000],
          [ 85.0000],
          [121.0000],
          ...,
          [ 59.1114],
          [ 70.3417],
          [ 72.0449]],
 
         [[127.0000],
          [ 88.0000],
          [130.0000],
          ...,
          [ 58.2766],
          [ 69.5855],
          [ 71.1539]],
 
         ...,
 
         [[147.0000],
          [133.0000],
          [148.0000],
          ...,
          [ 52.9467],
          [ 64.2753],
          [ 63.6968]],
 
         [[188.0000],
          [ 29.0000],
          [167.0000],
          ...,
          [ 53.2139],
          [ 63.2178],
          [ 63.1341]],
 
         [[212.0000],
          [ 33.3333],
          [178.0000],
          ...,
          [ 55.3667],
          [ 64.9315],
          [ 65.9394]]]),
 'edge_index': tensor([[  0,   0,   0,  ..., 434, 435, 436],
         [  1,   2

In [13]:
sample.target.to_dict()

{'y': tensor([[[229.0000],
          [111.0000],
          [179.0000],
          ...,
          [ 55.8811],
          [ 68.1196],
          [ 66.7615]],
 
         [[240.0000],
          [172.0000],
          [184.0000],
          ...,
          [ 53.8635],
          [ 67.5794],
          [ 67.3180]],
 
         [[240.0000],
          [173.0000],
          [188.0000],
          ...,
          [ 52.7024],
          [ 63.6870],
          [ 63.7933]],
 
         ...,
 
         [[119.0000],
          [ 51.0000],
          [ 50.0000],
          ...,
          [ 61.6110],
          [ 73.4010],
          [ 70.2052]],
 
         [[ 48.0000],
          [ 33.0000],
          [ 36.0000],
          ...,
          [ 67.3310],
          [ 78.3555],
          [ 77.4719]],
 
         [[ 37.0000],
          [ 31.0000],
          [ 34.0000],
          ...,
          [ 63.4985],
          [ 76.7035],
          [ 82.3329]]])}

#### Mask and Transform

`mask` and `transform` are just symbolic links to the corresponding object inside the storage. They also expose properties `has_mask` and `has_transform`.

In [14]:
if sample.has_mask:
    print(sample.mask)
else:
    print("Sample has no mask.")

tensor([[[1],
         [1],
         [1],
         ...,
         [0],
         [0],
         [0]],

        [[1],
         [1],
         [1],
         ...,
         [0],
         [0],
         [0]],

        [[1],
         [1],
         [1],
         ...,
         [0],
         [0],
         [0]],

        ...,

        [[1],
         [1],
         [1],
         ...,
         [0],
         [0],
         [0]],

        [[1],
         [1],
         [1],
         ...,
         [0],
         [0],
         [0]],

        [[1],
         [1],
         [1],
         ...,
         [0],
         [0],
         [0]]], dtype=torch.uint8)


In [15]:
if sample.has_transform:
    print(sample.transform)
else:
    print("Sample has no transform functions.")

Sample has no transform functions.


#### Pattern

The `pattern` mapping is useful to get a glimpse of how data are arranged.
The convention we use is the following:

* "b" stands for "batch size"
* "c" stands for "number of channels" (per node)
* "e" stands for "number of edges"
* "n" stands for "number of nodes"
* "s" stands for "number of time steps"


In [16]:
sample.pattern

{'x': 's n c',
 'edge_index': '2 e',
 'edge_weight': 'e',
 'mask': 's n c',
 'y': 's n c'}
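One way to read a pattern is to pair each token with the matching axis of the tensor. The helper `dims_from_pattern` below is hypothetical (it is not a `tsl` function) and uses NumPy arrays with the shapes printed for the sample above.

```python
import numpy as np


def dims_from_pattern(array, pattern):
    """Map each token of a pattern such as 's n c' to the matching axis size."""
    tokens = pattern.split()
    if len(tokens) != array.ndim:
        raise ValueError(f"pattern {pattern!r} has {len(tokens)} axes, array has {array.ndim}")
    dims = {}
    for token, size in zip(tokens, array.shape):
        if token.isdigit():
            # A numeric token such as '2' fixes the axis size.
            if int(token) != size:
                raise ValueError(f"expected axis of size {token}, got {size}")
        else:
            dims[token] = size
    return dims


toy_x = np.zeros((12, 437, 1))
toy_edges = np.zeros((2, 5398), dtype=int)

print(dims_from_pattern(toy_x, 's n c'))    # {'s': 12, 'n': 437, 'c': 1}
print(dims_from_pattern(toy_edges, '2 e'))  # {'e': 5398}
```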

### The SpatioTemporalDataModule

Usually, before running an experiment there are two quite common preprocessing steps:

* splitting the dataset into training/validation/test sets
* data preprocessing (scaling/normalizing data, detrending)

In `tsl`, these operations are carried out in the `tsl.data.SpatioTemporalDataModule`, which is based on `pytorch-lightning`'s data modules.

Let's see an example:

In [17]:
from tsl.data import SpatioTemporalDataModule
from tsl.data.preprocessing import StandardScaler

scalers = {'data': StandardScaler(axis=(0, 1))}

splitter = dataset.get_splitter(val_len=0.1, test_len=0.2)

dm = SpatioTemporalDataModule(
    dataset=torch_dataset,
    scalers=scalers,
    splitter=splitter,
    batch_size=64,
)

print(dm)

SpatioTemporalDataModule(train_len=None, val_len=None, test_len=None, scalers=[data], batch_size=64)


If needed, you can extend the base datamodule to add further processing.

At this point, the `DataModule` object has not actually performed any processing yet (lazy approach).

We can execute the preprocessing routines by calling the `setup` method.

In [18]:
dm.setup()
print(dm)

SpatioTemporalDataModule(train_len=5141, val_len=576, test_len=2884, scalers=[data], batch_size=64)


When `setup` is called, the datamodule performs the following operations:

1. Splits the dataset into training/validation/test sets according to the `splitter` argument.
1. Fits all the `Scalers` on the training data in `dataset` corresponding to the scalers' keys.

#### Scalers

The `tsl.data.preprocessing` package offers several of the most common data normalization techniques under the `tsl.data.preprocessing.Scaler` interface.
They adopt an API similar to `scikit-learn`'s scalers, with `fit`/`transform`/`fit_transform`/`inverse_transform` methods. Check the documentation for more details about this.
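To illustrate the `fit`/`transform`/`inverse_transform` workflow, here is a minimal NumPy sketch of a standard scaler that computes its statistics over the time and node axes (`axis=(0, 1)`) using training data only. `ToyStandardScaler` is a hypothetical stand-in written for this example, not the `tsl` class, and the simple 70/30 chronological split is likewise an assumption.

```python
import numpy as np


class ToyStandardScaler:
    """Hypothetical stand-in with a scikit-learn-like fit/transform interface."""

    def __init__(self, axis=(0, 1)):
        self.axis = axis

    def fit(self, x):
        # Statistics are reduced over `axis` and kept broadcastable.
        self.mean_ = x.mean(axis=self.axis, keepdims=True)
        self.std_ = x.std(axis=self.axis, keepdims=True)
        return self

    def transform(self, x):
        return (x - self.mean_) / (self.std_ + 1e-8)

    def inverse_transform(self, x):
        return x * (self.std_ + 1e-8) + self.mean_


rng = np.random.default_rng(0)
toy_data = rng.normal(loc=50.0, scale=10.0, size=(100, 5, 1))  # [steps, nodes, channels]
train, test = toy_data[:70], toy_data[70:]

toy_scaler = ToyStandardScaler(axis=(0, 1)).fit(train)  # statistics from training data only
scaled_test = toy_scaler.transform(test)
restored = toy_scaler.inverse_transform(scaled_test)

print(toy_scaler.mean_.shape)
print(np.allclose(restored, test))
```

Fitting on the training split only avoids leaking validation and test statistics into the model inputs.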


## Building a Model
---

In this section, we will see how to build a very simple graph neural network for spatiotemporal data.
All the neural network code inside `tsl` is under the `tsl.nn` module.


### The NN module

The `tsl.nn` module is organized as follows:

```
tsl
└── nn
    ├── base
    ├── blocks
    ├── layers
    ├── models
    ├── metrics
    ├── ops
    └── utils
```

The 3 most important submodules in it are `layers`, `blocks`, and `models`, ordered by increasing level of abstraction.

#### Layers

A _layer_ is a basic building block for our neural networks. In simple words, a layer takes an input, performs one (or a few) operations, and returns a transformation of the input. Examples of layers are `DiffConv`, which implements [diffusion convolution](https://arxiv.org/abs/1707.01926), or `LayerNorm`.

#### Blocks

_Blocks_ perform more complex transformations or combine several operations. We divide blocks into _encoders_, if they provide a representation of the input in a new space, and _decoders_, if they produce a meaningful output from a representation.

#### Models

We wrap a series of operations, represented by blocks and/or layers, in a _model_. A model is meant to take as input a batch of `SpatioTemporalDataset` items and return the desired output.

Let's create a very simple model with an RNN encoder and a nonlinear GCN readout.
To do so, we import `RNN` from the encoders and `GCNDecoder` from the decoders in the `tsl.nn.blocks` module.

In [19]:
from tsl.nn.blocks.encoders import RNN
from tsl.nn.blocks.decoders import GCNDecoder


class TimeThenSpaceModel(torch.nn.Module):
    def __init__(self,
                 input_size,
                 hidden_size,
                 rnn_layers,
                 gcn_layers,
                 horizon):
        super(TimeThenSpaceModel, self).__init__()

        self.input_encoder = torch.nn.Linear(input_size, hidden_size)

        self.encoder = RNN(input_size=hidden_size,
                           hidden_size=hidden_size,
                           n_layers=rnn_layers)

        self.decoder = GCNDecoder(
            input_size=hidden_size,
            hidden_size=hidden_size,
            output_size=input_size,
            horizon=horizon,
            n_layers=gcn_layers
        )

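    def forward(self, x, edge_index, edge_weight):
        # x: [batches steps nodes channels]
        x = self.input_encoder(x)  # project the channels to hidden_size

        # temporal encoding: keep only the last state of the RNN
        x = self.encoder(x, return_last_state=True)

        # spatial decoding: graph convolutions followed by the readout
        return self.decoder(x, edge_index, edge_weight)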
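To see the time-then-space idea without any library machinery, the NumPy sketch below first runs a plain recurrent update over time for every node and then mixes the final node states through an adjacency matrix. It is only an illustration of the shapes involved: the simple `tanh` recurrence, the random weights, and the uniform toy adjacency are all assumptions, and they do not reproduce the `RNN` and `GCNDecoder` blocks used above.

```python
import numpy as np

rng = np.random.default_rng(0)
batch, steps, nodes, channels, hidden = 2, 12, 4, 1, 8

x = rng.normal(size=(batch, steps, nodes, channels))
# Toy row-normalized adjacency over 4 nodes (made up for illustration).
toy_adj = np.full((nodes, nodes), 1.0 / nodes)

w_in = rng.normal(size=(channels, hidden)) * 0.1
w_rec = rng.normal(size=(hidden, hidden)) * 0.1
w_out = rng.normal(size=(hidden, channels)) * 0.1

# Time: a plain recurrent update, applied to every node independently.
h = np.zeros((batch, nodes, hidden))
for t in range(steps):
    h = np.tanh(x[:, t] @ w_in + h @ w_rec)     # [batch, nodes, hidden]

# Space: each node aggregates the final states of its neighbors.
h_space = np.einsum('ij,bjf->bif', toy_adj, h)  # [batch, nodes, hidden]

out = h_space @ w_out                            # [batch, nodes, channels]
print(h.shape, h_space.shape, out.shape)
```

Processing time first collapses the `steps` axis, so the spatial step runs once per sample rather than once per time step.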

Fine, we have a model and we have data, now let's train it!

## Setting up training
---

### The Predictor

In `tsl`, inference engines are implemented as a [`LightningModule`](https://pytorch-lightning.readthedocs.io/en/latest/api/pytorch_lightning.core.LightningModule.html#pytorch_lightning.core.LightningModule). `tsl.predictors.Predictor` is a base class that can be extended to build more complex forecasting approaches.
These modules are meant to wrap deep models in order to ease training and inference phases.

In [20]:
from tsl.nn.metrics.metrics import MaskedMAE, MaskedMAPE
from tsl.predictors import Predictor

loss_fn = MaskedMAE(compute_on_step=True)

metrics = {'mae': MaskedMAE(compute_on_step=False),
           'mape': MaskedMAPE(compute_on_step=False),
           'mae_at_15': MaskedMAE(compute_on_step=False, at=2),  # `at=2` selects the third time step,
                                                                 # i.e., 15 minutes ahead with 5-minute sampling
           'mae_at_30': MaskedMAE(compute_on_step=False, at=5),
           'mae_at_60': MaskedMAE(compute_on_step=False, at=11), }

model_kwargs = {
    'input_size': dm.n_channels,  # 1 channel
    'horizon': dm.horizon,  # 12, the number of steps ahead to forecast
    'hidden_size': 32,
    'rnn_layers': 1,
    'gcn_layers': 2
}

# setup predictor
predictor = Predictor(
    model_class=TimeThenSpaceModel,
    model_kwargs=model_kwargs,
    optim_class=torch.optim.Adam,
    optim_kwargs={'lr': 0.001},
    loss_fn=loss_fn,
    metrics=metrics
)
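The idea behind a masked metric can be sketched in NumPy: the error is averaged only over the entries that the mask marks as valid. The function `masked_mae` and the toy arrays below are hypothetical and written for illustration; they are not the `MaskedMAE` implementation, and selecting one step by indexing is only meant to mimic what the `at` argument suggests.

```python
import numpy as np


def masked_mae(y_hat, y, mask):
    """Mean absolute error over the entries where `mask` is 1."""
    mask = mask.astype(bool)
    if not mask.any():
        return 0.0
    return float(np.abs(y_hat - y)[mask].mean())


# Toy arrays with layout [steps, nodes, channels].
y_true = np.array([[[10.0], [20.0]],
                   [[30.0], [40.0]]])
y_pred = np.array([[[12.0], [20.0]],
                   [[33.0], [0.0]]])
valid = np.array([[[1], [1]],
                  [[1], [0]]])  # the last entry is missing and must be ignored

print(masked_mae(y_pred, y_true, valid))           # (2 + 0 + 3) / 3
print(masked_mae(y_pred[1], y_true[1], valid[1]))  # second step only: 3.0
```

Without the mask, the missing entry would contribute an error of 40 and dominate the average.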


Now let's finalize the last details. [TensorBoard](https://www.tensorflow.org/tensorboard/) can be used to log and visualize metrics; in the cells below, the TensorBoard logger is commented out and a `CSVLogger` is used instead.

In [21]:
# from pytorch_lightning.loggers import TensorBoardLogger


# logger = TensorBoardLogger(save_dir="logs", name="tsl_intro", version=0)

In [22]:
from pytorch_lightning.loggers import CSVLogger

logger = CSVLogger(save_dir='prova', name='savedata')

In [None]:

#%load_ext tensorboard
#%tensorboard --logdir logs

We let `pytorch_lightning.Trainer` handle the dirty work for us. We can directly pass the datamodule to the trainer for fitting.

In that case, the trainer will call the `setup` method and then load the training and validation sets.

In [24]:
import pytorch_lightning as pl
from pytorch_lightning.callbacks import ModelCheckpoint

checkpoint_callback = ModelCheckpoint(
    dirpath='logs',
    save_top_k=1,
    monitor='val_mae',
    mode='min',
)

trainer = pl.Trainer(max_epochs=3,
                     logger=logger,
                     gpus=1 if torch.cuda.is_available() else None,
                     limit_train_batches=100,
                     callbacks=[checkpoint_callback])

trainer.fit(predictor, datamodule=dm)

GPU available: False, used: False
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
HPU available: False, using: 0 HPUs
  rank_zero_warn(f"Checkpoint directory {dirpath} exists and is not empty.")

  | Name          | Type               | Params
-----------------------------------------------------
0 | loss_fn       | MaskedMAE          | 0     
1 | train_metrics | MetricCollection   | 0     
2 | val_metrics   | MetricCollection   | 0     
3 | test_metrics  | MetricCollection   | 0     
4 | model         | TimeThenSpaceModel | 12.0 K
-----------------------------------------------------
12.0 K    Trainable params
0         Non-trainable params
12.0 K    Total params
0.048     Total estimated model params size (MB)


Sanity Checking: 0it [00:00, ?it/s]

Epoch 2: 100%|██████████| 89/89 [10:27<00:00,  7.05s/it, loss=26.5, v_num=1, val_mae=25.10, val_mae_at_15=18.90, val_mae_at_30=25.80, val_mae_at_60=33.30, val_mape=0.620, train_mae=26.30, train_mae_at_15=20.60, train_mae_at_30=27.30, train_mae_at_60=33.40, train_mape=0.616]   


## Testing
---


Now let's see how the trained model behaves on new unseen data.

In [25]:
predictor.load_model(checkpoint_callback.best_model_path)
predictor.freeze()

performance = trainer.test(predictor, datamodule=dm)


Testing DataLoader 0: 100%|██████████| 46/46 [00:46<00:00,  1.01s/it]
────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
       Test metric             DataLoader 0
────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
        test_loss           24.076353073120117
        test_mae            24.635517120361328
     test_mae_at_15         19.228456497192383
     test_mae_at_30         25.976608276367188
     test_mae_at_60          31.25102424621582
        test_mape           0.6429881453514099
────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────


Cool! We succeeded in creating our first simple, yet effective, spatiotemporal model!

We are now __tsl ninjas__. We learned how to:

* Load benchmark datasets
* Organize data for processing
* Preprocess the data
* Build a Spatiotemporal GNN
* Train and evaluate the model

We hope you enjoyed this introduction to `tsl`. Do not hesitate to contact us if you have any questions or problems while using it.

The tsl team.

🧡