# Quickstart

<img src="../img/colab.svg" alt="Colab icon" width="30">  <span> &thinsp;</span> [Run in Google Colab](https://colab.research.google.com/drive/1wo4iZVCchqMDqUqpYK3ogtzb-ebMAS7g)
<span> &emsp;</span>
<img src="../img/github.svg" alt="GitHub icon" width="30">  <span> &thinsp;</span> [View on GitHub](https://github.com/ADAPT-uiuc/tglite/blob/main/docs/source/tutorial/quickstart.ipynb)

This section walks through the TGLite API for a typical temporal graph learning workflow. As an example, we train [TGAT](https://arxiv.org/abs/2002.07962) on the [Wikipedia](https://snap.stanford.edu/jodie/) dataset.

## Basic settings

TGLite uses [PyTorch](https://pytorch.org/) as the backend to perform tensor operations. Helper functions for tasks such as dataset handling are collected in [support.py](https://github.com/ADAPT-uiuc/tglite/blob/main/examples/support.py).

In [1]:
import torch
import tglite as tg

import support

Next we set the runtime parameters, including hyper-parameters for TGAT training and system-level optimization configurations. TGLite provides several semantics-preserving system optimizations for CTDG-based models like TGAT, including deduplication, memoization, and time-precomputation. Here we enable all of them by setting `OPT_DEDUP`, `OPT_CACHE` and `OPT_TIME` to `True`, and set the related cache size (`CACHE_LIMIT`) and time window (`TIME_WINDOW`). Setting `MOVE = True` keeps all feature data in GPU device memory to reduce data movement.

In [2]:
DATA: str = 'wiki'  # 'wiki', 'reddit', 'mooc', 'mag', 'lastfm', 'gdelt', 'wiki-talk'
DATA_PATH: str = '/shared'
EPOCHS: int = 3
BATCH_SIZE: int = 200
LEARN_RATE: float = 0.0001
DROPOUT: float = 0.1
N_LAYERS: int = 2
N_HEADS: int = 2
N_NBRS: int = 20
DIM_TIME: int = 100
DIM_EMBED: int = 100
N_THREADS: int = 32
SAMPLING: str = 'recent'  # 'recent' or 'uniform'
OPT_DEDUP = True
OPT_CACHE = True
OPT_TIME = True
OPT_ALL = True
OPT_DEDUP: bool = OPT_DEDUP or OPT_ALL
OPT_CACHE: bool = OPT_CACHE or OPT_ALL
OPT_TIME: bool = OPT_TIME or OPT_ALL
CACHE_LIMIT: int = int(2e6)
TIME_WINDOW: int = int(1e4)

MOVE = True
GPU = 0
SEED = 1
PREFIX = ''
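
To give a feel for what time-precomputation can buy, here is a minimal sketch in plain Python. It assumes a cosine time encoding of the form `cos(delta * w + b)` and integer time deltas; the class `PrecomputedTimeTable`, the weights, and the window size are hypothetical and are not taken from TGLite's implementation.

```python
import math
from typing import List, Sequence


def time_encode(delta: float, freqs: Sequence[float], phases: Sequence[float]) -> List[float]:
    """Cosine time encoding: one cos(delta * w + b) term per dimension."""
    return [math.cos(delta * w + b) for w, b in zip(freqs, phases)]


class PrecomputedTimeTable:
    """Hypothetical helper: caches encodings for integer deltas in [0, window)."""

    def __init__(self, window: int, freqs: Sequence[float], phases: Sequence[float]):
        self.window = window
        self.freqs = freqs
        self.phases = phases
        # Encode every delta inside the window once, up front.
        self.table = [time_encode(d, freqs, phases) for d in range(window)]
        self.hits = 0
        self.misses = 0

    def lookup(self, delta: int) -> List[float]:
        if 0 <= delta < self.window:
            self.hits += 1
            return self.table[delta]  # reuse the stored encoding
        self.misses += 1
        return time_encode(delta, self.freqs, self.phases)  # compute on demand


# Illustrative (made-up) encoder weights.
DIM = 4
freqs = [1.0 / (10 ** i) for i in range(DIM)]
phases = [0.0] * DIM

table = PrecomputedTimeTable(window=100, freqs=freqs, phases=phases)
near = table.lookup(5)    # served from the table
far = table.lookup(250)   # outside the window, computed directly
```

Both paths return the same values as calling the encoder directly, which is what makes an optimization of this kind semantics-preserving.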

Then, select the training device, build the path for the model checkpoint, and set the random seed.

In [3]:
device = support.make_device(GPU)
model_path = support.make_model_path('tgat', PREFIX, DATA)
if SEED >= 0:
    support.set_seed(SEED)

<p>&nbsp;</p>

## Loading temporal graph data

A [TGraph](../api/python/tglite.graph.rst) object serves as the container for node and edge tensor data. We first load the graph data to create a `TGraph` object `g`, and then load the features. `TGraph` also provides functions for managing graph data. Here, we set the computation device to GPU 0 using `g.set_compute(device)`. Because `MOVE` is enabled, `g.move_data(device)` moves the graph features to GPU 0 as well.

In [4]:
import os

g = support.load_graph(os.path.join(DATA_PATH, f'data/{DATA}/edges.csv'))
support.load_feats(g, DATA, DATA_PATH)
dim_efeat = 0 if g.efeat is None else g.efeat.shape[1]
dim_nfeat = g.nfeat.shape[1]

g.set_compute(device)
if MOVE:
    g.move_data(device)

num edges: 157474
num nodes: 9228
edge feat: torch.Size([157474, 172])
node feat: torch.Size([9228, 172])


<p>&nbsp;</p>

## Runtime setup

TGLite uses [TContext](../api/python/tglite.context.rst) to hold runtime settings and scratch space. Here, a `TContext` object `ctx` is initialized with the `TGraph` object `g`. Then, `ctx.need_sampling(True)` creates a TCSR structure inside `g` for more efficient sampling. Next, we call several methods of `ctx` to configure the optimizations.

In [5]:
ctx = tg.TContext(g)
ctx.need_sampling(True)
ctx.enable_embed_caching(OPT_CACHE, DIM_EMBED)
ctx.enable_time_precompute(OPT_TIME)
ctx.set_cache_limit(CACHE_LIMIT)
ctx.set_time_window(TIME_WINDOW)
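
To illustrate why a temporal CSR layout helps sampling, the sketch below builds one from a toy edge list in plain Python. The helpers `build_tcsr` and `neighbors_before` are hypothetical, and storing each interaction in both directions is an assumption; TGLite's internal TCSR may differ.

```python
from bisect import bisect_left


def build_tcsr(num_nodes, edges):
    """Build a temporal CSR from (src, dst, ts) edges.

    Returns (indptr, nbrs, eids, times); each node's slice is sorted by time.
    """
    buckets = [[] for _ in range(num_nodes)]
    for eid, (src, dst, ts) in enumerate(edges):
        buckets[src].append((ts, dst, eid))
        buckets[dst].append((ts, src, eid))  # assumption: undirected interactions
    indptr, nbrs, eids, times = [0], [], [], []
    for bucket in buckets:
        bucket.sort()  # order each node's history by timestamp
        for ts, nbr, eid in bucket:
            times.append(ts)
            nbrs.append(nbr)
            eids.append(eid)
        indptr.append(len(nbrs))
    return indptr, nbrs, eids, times


def neighbors_before(tcsr, node, t):
    """Return (neighbor, edge id, time) for interactions of `node` strictly before t."""
    indptr, nbrs, eids, times = tcsr
    lo, hi = indptr[node], indptr[node + 1]
    end = bisect_left(times, t, lo, hi)  # binary search inside the node's slice
    return list(zip(nbrs[lo:end], eids[lo:end], times[lo:end]))


edges = [(0, 1, 1.0), (0, 2, 2.0), (1, 2, 3.0), (0, 1, 4.0)]
tcsr = build_tcsr(3, edges)
history = neighbors_before(tcsr, 0, 3.0)
```

Because each node's interactions are contiguous and time-sorted, finding the valid history for a query time is a binary search rather than a scan over all edges.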

<p>&nbsp;</p>

## Creating temporal sampler

TGLite provides a [TSampler](../api/python/tglite.sampler.rst) module that exposes 1-hop temporal sampling. The `num_threads` argument controls how many threads are used to perform parallel sampling. The sampler evenly distributes the target nodes in the mini-batch across the threads.

In [6]:
sampler = tg.TSampler(N_NBRS, strategy=SAMPLING, num_threads=N_THREADS)
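
The two strategy names can be pictured with a small single-threaded sketch. The function `sample_neighbors` below is hypothetical and only mirrors the idea of `'recent'` versus `'uniform'` sampling over a time-sorted history; it is not the `TSampler` implementation.

```python
import random


def sample_neighbors(history, t, k, strategy="recent", rng=None):
    """Pick up to k (neighbor, time) entries that occurred strictly before t.

    `history` must be sorted by time in ascending order.
    """
    valid = [(nbr, ts) for nbr, ts in history if ts < t]
    if len(valid) <= k:
        return valid
    if strategy == "recent":
        return valid[-k:]  # the k most recent interactions
    if strategy == "uniform":
        rng = rng or random.Random()
        picked = sorted(rng.sample(range(len(valid)), k))  # keep time order
        return [valid[i] for i in picked]
    raise ValueError(f"unknown strategy: {strategy}")


history = [(1, 1.0), (2, 2.0), (3, 3.0), (4, 4.0), (5, 5.0)]
recent = sample_neighbors(history, t=4.5, k=2, strategy="recent")
uniform = sample_neighbors(history, t=4.5, k=2, strategy="uniform",
                           rng=random.Random(0))
```

In both cases only interactions that precede the query time are eligible, so the sampled neighborhood never leaks information from the future.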

<p>&nbsp;</p>

## Creating models

A [TBatch](../api/python/tglite.batch.rst) object represents a batch of temporal edges to process and is passed to `TGAT.forward()` as the input.
From a batch, a head `TBlock` is created. [TBlock](../api/python/tglite.block.rst) is the centerpiece of TGLite: a block captures the 1-hop message-flow dependencies between target node-time pairs (i.e. destination nodes) and their temporally sampled neighbors (i.e. source nodes), along with their respective edges.
TGLite links the blocks in a doubly-linked list, with each block representing one GNN layer.
In `forward()`, we iteratively sample neighbors and generate the next `TBlock` for each layer.
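
The chain of blocks can be pictured with a toy doubly-linked structure. The `Block` class and `fake_sample` function below are hypothetical stand-ins written for illustration only; they are not `tglite.TBlock` or `TSampler`, and carrying the destinations over to the next hop is our reading of `include_dst=True`.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(eq=False)
class Block:
    """Toy 1-hop block: destinations, their sampled sources, and list links."""
    layer: int
    dst_nodes: List[int]
    src_nodes: List[int] = field(default_factory=list)
    prev: Optional["Block"] = None
    next: Optional["Block"] = None

    def next_block(self) -> "Block":
        # Assumption: destinations are carried over together with the sources.
        nxt = Block(layer=self.layer + 1,
                    dst_nodes=self.dst_nodes + self.src_nodes,
                    prev=self)
        self.next = nxt
        return nxt


def fake_sample(block: Block, fanout: int = 2) -> Block:
    """Toy sampler: node n gets the made-up neighbors n*10, n*10+1, ..."""
    block.src_nodes = [n * 10 + j for n in block.dst_nodes for j in range(fanout)]
    return block


head = Block(layer=0, dst_nodes=[1, 2])
tail = head
for i in range(2):
    if i > 0:
        tail = tail.next_block()
    tail = fake_sample(tail)

# Aggregation walks from the tail back to the head.
order = []
blk = tail
while blk is not None:
    order.append(blk.layer)
    blk = blk.prev
```

Building proceeds from head to tail, while the backward walk visits the deepest hop first, which is the order in which embeddings have to be computed.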

TGLite also lets users apply optimizations to a `TBlock` before sampling its neighbors, which shrinks the subsequent subgraphs and therefore reduces computation.
Inside the loop, we invoke `dedup()` and `cache()` from the `tglite.op` module to perform these optimizations, and then sample with the `TSampler` passed to the model.

Once the full linked list of TBlocks is created, we can load features and perform aggregation to compute node embeddings with functions provided by `tglite.op`.
Here we directly use `tglite.nn.TemporalAttnLayer` to construct the TGAT model.

In [7]:
from torch import nn, Tensor
from tglite.nn import TemporalAttnLayer

class TGAT(nn.Module):
    def __init__(self, ctx: tg.TContext,
                dim_node: int, dim_edge: int, dim_time: int, dim_embed: int,
                sampler: tg.TSampler, num_layers=2, num_heads=2, dropout=0.1,
                dedup: bool = True):
        super().__init__()
        self.ctx = ctx
        self.num_layers = num_layers
        self.attn = nn.ModuleList([
            TemporalAttnLayer(ctx,
                num_heads=num_heads,
                dim_node=dim_node if i == 0 else dim_embed,
                dim_edge=dim_edge,
                dim_time=dim_time,
                dim_out=dim_embed,
                dropout=dropout)
            for i in range(num_layers)])
        self.sampler = sampler
        self.edge_predictor = support.EdgePredictor(dim=dim_embed)
        self.dedup = dedup

    def forward(self, batch: tg.TBatch) -> Tensor:
        head = batch.block(self.ctx)
        for i in range(self.num_layers):
            tail = head if i == 0 \
                else tail.next_block(include_dst=True)
            tail = tg.op.dedup(tail) if self.dedup else tail
            tail = tg.op.cache(self.ctx, tail.layer, tail)
            tail = self.sampler.sample(tail)

        tg.op.preload(head, use_pin=True)
        if tail.num_dst() > 0:
            tail.dstdata['h'] = tail.dstfeat()
            tail.srcdata['h'] = tail.srcfeat()
        embeds = tg.op.aggregate(head, list(reversed(self.attn)), key='h')
        del head
        del tail

        src, dst, neg = batch.split_data(embeds)
        scores = self.edge_predictor(src, dst)
        if batch.neg_nodes is not None:
            scores = (scores, self.edge_predictor(src, neg))

        return scores
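
The idea behind the deduplication step in `forward()` can be sketched without TGLite: collapse repeated node-time pairs, compute once per unique pair, then scatter the results back. The functions `dedup_pairs` and `expensive_embed` below are hypothetical illustrations, not the `tg.op.dedup` implementation.

```python
def dedup_pairs(nodes, times):
    """Return the unique (node, time) pairs plus an inverse index."""
    index_of = {}
    uniq_nodes, uniq_times, inverse = [], [], []
    for pair in zip(nodes, times):
        if pair not in index_of:
            index_of[pair] = len(uniq_nodes)
            uniq_nodes.append(pair[0])
            uniq_times.append(pair[1])
        inverse.append(index_of[pair])  # where this entry's result will live
    return uniq_nodes, uniq_times, inverse


def expensive_embed(node, time):
    """Stand-in for a costly embedding computation (made-up formula)."""
    return (node * 1.0 + time * 0.5,)


nodes = [3, 5, 3, 7, 5]
times = [10.0, 10.0, 10.0, 12.0, 11.0]

u_nodes, u_times, inverse = dedup_pairs(nodes, times)
unique_out = [expensive_embed(n, t) for n, t in zip(u_nodes, u_times)]
restored = [unique_out[i] for i in inverse]  # same length and order as the input
```

Here five target pairs shrink to four unique ones; in deeper layers, where neighborhoods overlap heavily, the saving is typically larger.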

With the TGAT model defined, we instantiate it with the parameters set earlier and transfer it to GPU 0.

In [8]:
model = TGAT(ctx,
    dim_node=dim_nfeat,
    dim_edge=dim_efeat,
    dim_time=DIM_TIME,
    dim_embed=DIM_EMBED,
    sampler=sampler,
    num_layers=N_LAYERS,
    num_heads=N_HEADS,
    dropout=DROPOUT,
    dedup=OPT_DEDUP,)
model = model.to(device)

<p>&nbsp;</p>

## Training models

Here we use [BCEWithLogitsLoss](https://pytorch.org/docs/stable/generated/torch.nn.BCEWithLogitsLoss.html) as the loss function and [Adam](https://pytorch.org/docs/stable/generated/torch.optim.Adam.html) as the optimizer.

In [9]:
criterion = torch.nn.BCEWithLogitsLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=LEARN_RATE)
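
For reference, the quantity that `BCEWithLogitsLoss` computes with its default mean reduction can be written out in plain Python using the numerically stable form `max(x, 0) - x*y + log(1 + exp(-|x|))`. The scores below are made up, and labeling positive edges 1 and negative edges 0 is an assumption about how the trainer uses the loss.

```python
import math


def bce_with_logits(logits, targets):
    """Mean binary cross-entropy on raw scores (logits), stable for large |x|."""
    total = 0.0
    for x, y in zip(logits, targets):
        total += max(x, 0.0) - x * y + math.log1p(math.exp(-abs(x)))
    return total / len(logits)


def naive_bce(logits, targets):
    """Textbook form via the sigmoid; fine for moderate logits only."""
    total = 0.0
    for x, y in zip(logits, targets):
        p = 1.0 / (1.0 + math.exp(-x))
        total += -(y * math.log(p) + (1.0 - y) * math.log(1.0 - p))
    return total / len(logits)


pos_scores = [2.0, 0.5]    # scores for observed edges (made-up values)
neg_scores = [-1.5, 0.3]   # scores for negative samples (made-up values)
logits = pos_scores + neg_scores
labels = [1.0] * len(pos_scores) + [0.0] * len(neg_scores)

loss = bce_with_logits(logits, labels)
```

Working on logits rather than probabilities avoids taking `log(0)` when the sigmoid saturates, which is why the model returns raw scores.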

The data is split into a training set (70%), a validation set (15%), and a test set (15%).
`neg_sampler` randomly picks target nodes as negative samples.
Then, we launch a `support.LinkPredTrainer` to train and test the model.

In [10]:
import numpy as np

train_end, val_end = support.data_split(g.num_edges(), 0.7, 0.15)
neg_sampler = lambda size: np.random.randint(0, g.num_nodes(), size)
trainer = support.LinkPredTrainer(
    ctx, model, criterion, optimizer, neg_sampler,
    EPOCHS, BATCH_SIZE, train_end, val_end,
    model_path, None)

trainer.train()
trainer.test()

epoch 0:
  loss:293.4295 val ap:0.9739 val auc:0.9782
  epoch | total:13.45s loop:11.98s eval:1.47s
   loop | forward:6.84s backward:5.08s sample:0.68s prep_batch:0.06s prep_input:0.46s post_update:0.00s
   comp | mem_update:0.00s time_zero:0.99s time_nbrs:0.65s self_attn:3.34s
epoch 1:
  loss:170.0556 val ap:0.9819 val auc:0.9843
  epoch | total:17.91s loop:16.31s eval:1.58s
   loop | forward:7.48s backward:8.73s sample:0.86s prep_batch:0.09s prep_input:0.55s post_update:0.00s
   comp | mem_update:0.00s time_zero:0.36s time_nbrs:1.19s self_attn:3.63s
epoch 2:
  loss:142.4712 val ap:0.9833 val auc:0.9861
  epoch | total:19.56s loop:17.82s eval:1.72s
   loop | forward:8.03s backward:9.68s sample:0.88s prep_batch:0.09s prep_input:0.58s post_update:0.00s
   comp | mem_update:0.00s time_zero:0.46s time_nbrs:1.43s self_attn:3.76s
best model at epoch 2
loading saved checkpoint and testing model...
  test time:1.64s AP:0.9798 AUC:0.9827
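
The split boundaries used in the run above can be reproduced with a short sketch. The function `split_by_fraction` is a hypothetical stand-in for `support.data_split`; it assumes that edges are stored in time order, that the boundaries are truncated to integers, and that training iterates over consecutive batches of `BATCH_SIZE` edges.

```python
import math


def split_by_fraction(num_edges, train_frac, val_frac):
    """Return the end indices of the training and validation ranges."""
    train_end = int(num_edges * train_frac)
    val_end = int(num_edges * (train_frac + val_frac))
    return train_end, val_end


NUM_EDGES = 157474   # edge count printed when loading the Wikipedia graph
BATCH_SIZE = 200

train_end, val_end = split_by_fraction(NUM_EDGES, 0.7, 0.15)
num_train = train_end
num_val = val_end - train_end
num_test = NUM_EDGES - val_end
num_train_batches = math.ceil(num_train / BATCH_SIZE)
```

Under these assumptions, edges `[0, train_end)` are used for training, `[train_end, val_end)` for validation, and the remainder for testing, so the model is always evaluated on interactions that come after the ones it was trained on.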


To explore and run more TGNN models with `tglite`, see [Running Examples](../install/index.rst#running-examples).