# BGRL Tutorial
#### This tutorial illustrates the use of the BGRL (Bootstrapped Graph Latents) algorithm from [Large-Scale Representation Learning on Graphs via Bootstrapping](https://arxiv.org/abs/2102.06514), a graph representation learning method that learns by predicting alternative augmentations of the input.
#### The tutorial is organized as follows:
#### 1. Preprocessing Data and Loading Configuration
#### 2. Training the model
#### 3. Evaluating the model

## 1. Preprocessing Data and Loading Configuration 
#### First, we load the configuration from a YAML file and load the dataset.
#### For ease of use, we ran a parameter search on three datasets and selected values for which this BGRL implementation performs close to the results reported in the paper. The cell below also defines the random augmentations used in BGRL.
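#### As a rough illustration of what the two augmentations do, the sketch below drops edges and masks feature channels in plain PyTorch. The helpers `drop_edges` and `mask_channels` are hypothetical names, not the `RandomDropEdge` and `RandomMaskChannel` implementations from `src.augment`. The sketch assumes that each edge is dropped, and each feature channel is zeroed, independently with the given probability.

```python
import torch


def drop_edges(edge_index: torch.Tensor, p: float) -> torch.Tensor:
    """Remove each edge (a column of edge_index) independently with probability p."""
    keep = torch.rand(edge_index.shape[1]) >= p
    return edge_index[:, keep]


def mask_channels(x: torch.Tensor, p: float) -> torch.Tensor:
    """Zero out each feature channel (a column of x) for all nodes with probability p."""
    keep = (torch.rand(x.shape[1]) >= p).to(x.dtype)
    return x * keep  # the (F,) mask broadcasts over the (N, F) feature matrix


torch.manual_seed(0)
x = torch.ones(4, 6)                                     # 4 nodes, 6 features
edge_index = torch.tensor([[0, 1, 2, 3], [1, 2, 3, 0]])  # 4 directed edges

# Two independently sampled views of the same graph (ratios are illustrative only)
view_1 = (mask_channels(x, 0.3), drop_edges(edge_index, 0.3))
view_2 = (mask_channels(x, 0.2), drop_edges(edge_index, 0.4))
```

#### Each view keeps the node set fixed and changes only the features and the edge list, so embeddings of the two views can be compared node by node.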

In [1]:
from src.augment import RandomMask, RandomDropEdge, RandomDropNode, AugmentSubgraph, AugmentorList, AugmentorDict, RandomMaskChannel
from src.methods import BGRL, BGRLEncoder
from src.trainer import SimpleTrainer
from torch_geometric.loader import DataLoader
from src.transforms import NormalizeFeatures, GCNNorm, Edge2Adj, Compose
from src.evaluation import LogisticRegression
from src.data.data_non_contrast import Dataset
import torch, copy
import numpy as np
from src.config import load_yaml


torch.manual_seed(0)
np.random.seed(0)
torch.cuda.manual_seed_all(0)
config = load_yaml('./configuration/bgrl_cs.yml')
device = torch.device("cuda:{}".format(config.gpu_idx) if torch.cuda.is_available() and config.use_cuda else "cpu")
# WikiCS, cora, citeseer, pubmed, photo, computers, cs, and physics
data_name = config.dataset.name
root = config.dataset.root

dataset = Dataset(root=root, name=data_name)
if not hasattr(dataset, "adj_t"):
    data = dataset.data
    num_nodes = data.x.shape[0]
    # Build a sparse float adjacency matrix with one entry per edge
    dataset.data.adj_t = torch.sparse_coo_tensor(data.edge_index, torch.ones(data.edge_index.shape[1]), (num_nodes, num_nodes))
data_loader = DataLoader(dataset)
# dataset.data.x[7028] = torch.zeros((300))

# Augmentation
augment_1 = AugmentorList([RandomDropEdge(config.model.aug_edge_1), RandomMaskChannel(config.model.aug_mask_1)])
augment_2 = AugmentorList([RandomDropEdge(config.model.aug_edge_2), RandomMaskChannel(config.model.aug_mask_2)])
augment = AugmentorDict({"augment_1":augment_1, "augment_2":augment_2})



## 2. Training the Model
#### In the second step, we first initialize the BGRL model. The encoder backbone is a GCN.
#### The model-specific hyper-parameters are the ratios used by the edge-drop and feature-mask augmentations.
#### You may replace the encoder with a user-defined one. Please refer to the structure of the encoder in methods/afgrl.py. Keep in mind that an encoder consists of a class initializer, a forward function, and a get_embs() function.
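#### A user-defined encoder with these three parts could look like the sketch below. The class name `MyEncoder`, the dense-adjacency propagation, and the `forward(x, adj)` / `get_embs(x, adj)` signatures are assumptions made for illustration; check the reference encoder for the exact arguments that `BGRL` passes in.

```python
import torch
import torch.nn as nn


class MyEncoder(nn.Module):
    """Hypothetical encoder: a stack of dense GCN-style layers."""

    def __init__(self, in_channel: int, hidden_channels: list):
        super().__init__()
        dims = [in_channel] + list(hidden_channels)
        self.layers = nn.ModuleList(
            [nn.Linear(d_in, d_out) for d_in, d_out in zip(dims[:-1], dims[1:])]
        )
        self.act = nn.PReLU()

    def forward(self, x: torch.Tensor, adj: torch.Tensor) -> torch.Tensor:
        # adj is assumed to be a dense, already normalized (N, N) adjacency matrix
        for layer in self.layers:
            x = self.act(layer(adj @ x))  # propagate over neighbours, then transform
        return x

    @torch.no_grad()
    def get_embs(self, x: torch.Tensor, adj: torch.Tensor) -> torch.Tensor:
        # Same computation as forward(), but without tracking gradients
        return self.forward(x, adj)


x = torch.randn(5, 8)   # 5 nodes, 8 input features
adj = torch.eye(5)      # identity adjacency, only to keep the example small
encoder = MyEncoder(in_channel=8, hidden_channels=[16, 4])
embs = encoder.get_embs(x, adj)
```

#### The last entry of `hidden_channels` sets the embedding size, mirroring how `BGRLEncoder` is configured in the next cell.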

In [3]:
if data_name=="cora":
    student_encoder = BGRLEncoder(in_channel=dataset.x.shape[1], hidden_channels=[2048])
elif data_name=="photo":
    student_encoder = BGRLEncoder(in_channel=dataset.x.shape[1], hidden_channels=[512, 256])
elif data_name=="wikics":
    student_encoder = BGRLEncoder(in_channel=dataset.x.shape[1], hidden_channels=[512, 256])
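elif data_name=="cs":
    student_encoder = BGRLEncoder(in_channel=dataset.x.shape[1], hidden_channels=[256])
else:
    # Fail early instead of hitting a NameError on student_encoder below
    raise ValueError("No encoder configuration for dataset: {}".format(data_name))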

teacher_encoder = copy.deepcopy(student_encoder)

method = BGRL(student_encoder=student_encoder, teacher_encoder = teacher_encoder, data_augment=augment)


#### We train the model by calling the trainer.train() function. Please run the code in examples for a full demonstration.

In [5]:
# ------------------ Trainer --------------------
trainer = SimpleTrainer(method=method, data_loader=data_loader, device=device, use_ema=True,
                        moving_average_decay=config.optim.moving_average_decay, lr=config.optim.base_lr,
                        weight_decay=config.optim.weight_decay, dataset=dataset, n_epochs=config.optim.max_epoch,
                        patience=config.optim.patience)
trainer.train()
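
#### The `use_ema` and `moving_average_decay` arguments relate to how BGRL maintains its teacher: the teacher is not trained by gradient descent but tracks an exponential moving average of the student weights. The sketch below shows that update rule in isolation. The function `ema_update` is a hypothetical helper, and how `SimpleTrainer` applies the update internally is an assumption.

```python
import copy

import torch
import torch.nn as nn


@torch.no_grad()
def ema_update(student: nn.Module, teacher: nn.Module, decay: float) -> None:
    """In place: teacher <- decay * teacher + (1 - decay) * student."""
    for p_s, p_t in zip(student.parameters(), teacher.parameters()):
        p_t.mul_(decay).add_(p_s, alpha=1.0 - decay)


student = nn.Linear(3, 2)
teacher = copy.deepcopy(student)
for p in teacher.parameters():
    p.requires_grad_(False)  # the teacher receives no gradients

# Stand-in for an optimizer step: shift every student weight by 1
with torch.no_grad():
    for p in student.parameters():
        p.add_(1.0)

before = teacher.weight.clone()
ema_update(student, teacher, decay=0.99)
# Each teacher weight has moved 1% of the way toward the student
```

#### A decay close to 1 makes the teacher change slowly, which gives the student a stable prediction target.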

## 3. Evaluating the performance of BGRL
#### In the last step, we evaluate the performance of BGRL. We first obtain the embeddings by calling the method.get_embs() function and then evaluate them with logistic regression.
#### More classifiers, including SVM and RandomForest, can be found in [classifier.py](https://github.com/IDEA-ISAIL/ssl/edit/molecure/src/evaluation/classifier.py).
#### Other evaluation methods for the unsupervised setting, including K-means clustering and similarity search, can be found in [cluster.py](https://github.com/IDEA-ISAIL/ssl/edit/molecure/src/evaluation/cluster.py) and [sim_search.py](https://github.com/IDEA-ISAIL/ssl/edit/molecure/src/evaluation/sim_search.py).
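#### To give a feel for similarity-search evaluation, the sketch below scores embeddings by checking whether each node's nearest neighbours in cosine similarity share its label. The function `sim_search_accuracy` is a hypothetical helper written for this tutorial and is not the implementation in sim_search.py.

```python
import torch
import torch.nn.functional as F


def sim_search_accuracy(embs: torch.Tensor, labels: torch.Tensor, k: int = 1) -> float:
    """Fraction of top-k cosine neighbours (excluding the node itself) with the same label."""
    z = F.normalize(embs, dim=1)
    sim = z @ z.t()                                # (N, N) cosine similarities
    sim.fill_diagonal_(float("-inf"))              # never retrieve the query itself
    nn_idx = sim.topk(k, dim=1).indices            # (N, k) neighbour indices
    hits = labels[nn_idx] == labels.unsqueeze(1)   # (N, k) label agreement
    return hits.float().mean().item()


# Toy embeddings: nodes 0 and 1 point one way, nodes 2 and 3 another
embs = torch.tensor([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]])
labels = torch.tensor([0, 0, 1, 1])
score = sim_search_accuracy(embs, labels, k=1)
```

#### With real data, `embs` would be the output of method.get_embs() and `labels` the node labels of the dataset.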

In [6]:
# ------------------ Evaluator -------------------
method.eval()
data_pyg = dataset.data.to(method.device)
embs = method.get_embs(data_pyg, data_pyg.edge_index).detach()

lg = LogisticRegression(lr=0.01, weight_decay=0, max_iter=100, n_run=20, device=device)
lg(embs=embs, dataset=data_pyg)