# AFGRL Tutorial
#### This tutorial illustrates the use of the AFGRL algorithm from [Augmentation-Free Self-Supervised Learning on Graphs](https://arxiv.org/abs/2112.02472). AFGRL is an augmentation-free self-supervised learning framework that generates an alternative view of a graph by discovering nodes that share the local structural information and the global semantics with the graph.
#### The tutorial is organized as follows:
#### 1. Preprocessing Data and Loading Configuration
#### 2. Training the Model
#### 3. Evaluating the performance of AFGRL

## 1. Preprocessing Data and Loading Configuration 
#### First, we load the configuration from a yml file and then load the dataset.
#### For ease of use, we searched for the best hyperparameters on three datasets and chose values for which this implementation of AFGRL performs close to the results reported in the paper. Note that this cell also creates the graph augmentor used by AFGRL.
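#### The cells in this tutorial read settings as nested attributes, for example config.dataset.name or config.optim.base_lr. The sketch below shows one way such attribute-style access can be built from a nested dictionary. It is only an illustration: to_namespace and example_config are hypothetical names, every value is a placeholder rather than a tuned setting, and the object actually returned by load_yaml may be implemented differently.

```python
from types import SimpleNamespace


def to_namespace(obj):
    """Recursively convert nested dicts into attribute-style namespaces."""
    if isinstance(obj, dict):
        return SimpleNamespace(**{k: to_namespace(v) for k, v in obj.items()})
    return obj


# Placeholder values for illustration only; the real ones live in the yml file.
raw = {
    "gpu_idx": 0,
    "use_cuda": True,
    "dataset": {"name": "cs", "root": "./datasets"},
    "model": {"topk": 4, "num_centroids": 100, "num_kmeans": 5, "clus_num_iters": 20},
    "optim": {
        "base_lr": 0.001,
        "weight_decay": 1e-5,
        "max_epoch": 100,
        "patience": 20,
        "moving_average_decay": 0.99,
    },
}

example_config = to_namespace(raw)
print(example_config.dataset.name, example_config.model.topk, example_config.optim.base_lr)
```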

In [1]:
from src.augment import RandomMask, RandomDropEdge, RandomDropNode, AugmentSubgraph, AugmentorList, AugmentorDict, NeighborSearch_AFGRL
from src.methods import AFGRLEncoder, AFGRL
from src.trainer import SimpleTrainer
from torch_geometric.loader import DataLoader
from src.transforms import NormalizeFeatures, GCNNorm, Edge2Adj, Compose
from src.datasets import Planetoid, Amazon, WikiCS
from src.evaluation import LogisticRegression
import torch, copy
import torch_geometric
from src.data.utils import sparse_mx_to_torch_sparse_tensor
from src.data.data_non_contrast import Dataset
import numpy as np
from src.config import load_yaml


torch.manual_seed(0)
np.random.seed(0)
torch.cuda.manual_seed_all(0)
config = load_yaml('./configuration/afgrl_cs.yml')
# config = load_yaml('./configuration/afgrl_wikics.yml')
device = torch.device("cuda:{}".format(config.gpu_idx) if torch.cuda.is_available() and config.use_cuda else "cpu")

# WikiCS, cora, citeseer, pubmed, photo, computers, cs, and physics
data_name = config.dataset.name
root = config.dataset.root

dataset = Dataset(root=root, name=data_name)
if not hasattr(dataset, "adj_t"):
    data = dataset.data
    dataset.data.adj_t = torch.sparse.FloatTensor(data.edge_index, torch.ones_like(data.edge_index[0]), [data.x.shape[0], data.x.shape[0]])
data_loader = DataLoader(dataset)
data = dataset.data
# data.x[7028] = torch.zeros((300))
adj_ori_sparse = torch.sparse.FloatTensor(data.edge_index, torch.ones_like(data.edge_index[0]), [data.x.shape[0], data.x.shape[0]]).to(device)
# Augmentation
augment = NeighborSearch_AFGRL(device=device, num_centroids=config.model.num_centroids, num_kmeans=config.model.num_kmeans, clus_num_iters=config.model.clus_num_iters)
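
#### The cell above turns edge_index into an N x N sparse adjacency matrix with a one at every (source, target) position. The dependency-free sketch below shows the same layout on a toy graph, assuming the PyG convention that row 0 of edge_index holds source nodes and row 1 holds target nodes. The toy graph and the names adj and neighbors are illustrative only.

```python
# Toy graph with 4 nodes; edge_index follows the PyG convention:
# row 0 holds source nodes, row 1 holds target nodes.
edge_index = [
    [0, 1, 1, 2, 2, 3],
    [1, 0, 2, 1, 3, 2],
]
num_nodes = 4

# Dense view of the same information: adj[i][j] = 1.0 if edge i -> j exists.
adj = [[0.0] * num_nodes for _ in range(num_nodes)]
for src, dst in zip(edge_index[0], edge_index[1]):
    adj[src][dst] = 1.0

# Per-node neighbor sets, the form a neighbor lookup would typically use.
neighbors = {
    i: {j for j in range(num_nodes) if adj[i][j] > 0}
    for i in range(num_nodes)
}
print(neighbors)
```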


## 2. Training the Model
#### In the second step, we first initialize AFGRL. The backbone of the encoder is a GCN.
#### Model-specific hyperparameters include topk, the number of neighbors used in the nearest-neighbor search; num_centroids, the number of centroids in the augmentor's K-means clustering; and num_kmeans, the number of K-means clustering runs (the iteration count per run is set separately by clus_num_iters in the first cell).
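#### To make the role of topk and the clustering concrete, here is a simplified sketch of how the paper describes positive selection: take the topk nearest neighbors of a node in embedding space, then keep only those that are also adjacent in the graph (local structure) or share a cluster with the node (global semantics). It makes several assumptions: cluster labels are given as input instead of being computed by K-means, a single embedding table is used for both query and candidates, and select_positives is a hypothetical helper. NeighborSearch_AFGRL in this repository may differ in its details.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def select_positives(embs, neighbors, cluster_runs, topk):
    """Keep k-NN candidates that pass the local or the global filter.

    embs:         list of embedding vectors, one per node
    neighbors:    dict node -> set of adjacent nodes (local structure)
    cluster_runs: list of label lists, one per clustering run (global semantics)
    topk:         number of nearest neighbors considered per node
    """
    n = len(embs)
    positives = {}
    for i in range(n):
        # topk nearest neighbors of node i in embedding space, excluding i.
        ranked = sorted((j for j in range(n) if j != i),
                        key=lambda j: cosine(embs[i], embs[j]), reverse=True)
        knn = set(ranked[:topk])
        # Nodes that share a cluster with i in at least one clustering run.
        same_cluster = {j for labels in cluster_runs
                        for j in range(n) if j != i and labels[j] == labels[i]}
        local_hits = knn & neighbors[i]
        global_hits = knn & same_cluster
        positives[i] = local_hits | global_hits
    return positives


# Toy data: 5 nodes on a path graph, 2-D embeddings, two clustering runs.
embs = [[1.0, 0.0], [0.9, 0.1], [0.8, 0.3], [0.0, 1.0], [0.1, 0.9]]
neighbors = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2, 4}, 4: {3}}
cluster_runs = [[0, 0, 0, 1, 1], [0, 0, 1, 1, 1]]

positives = select_positives(embs, neighbors, cluster_runs, topk=3)
print(positives)
```

#### In this toy example, node 4 is among the three nearest neighbors of node 0 but is neither adjacent to it nor in the same cluster in any run, so it is filtered out of node 0's positives.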
#### You may replace the encoder with a user-defined one. Refer to the encoder in methods/afgrl.py for the expected structure: a custom encoder needs a class initializer, a forward function, and a get_embs() function.

In [2]:
# ------------------- Method -----------------
if data_name=="cora":
    student_encoder = AFGRLEncoder(in_channel=dataset.x.shape[1], hidden_channels=[2048])
elif data_name=="photo":
    student_encoder = AFGRLEncoder(in_channel=dataset.x.shape[1], hidden_channels=[512, 512])
elif data_name=="wikics":
    student_encoder = AFGRLEncoder(in_channel=dataset.x.shape[1], hidden_channels=[512, 256])
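elif data_name=="cs":
    student_encoder = AFGRLEncoder(in_channel=dataset.x.shape[1], hidden_channels=[512, 256])
else:
    # Fail early with a clear message instead of a NameError on the next line.
    raise ValueError(f"No encoder configuration defined for dataset '{data_name}'")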
teacher_encoder = copy.deepcopy(student_encoder)

method = AFGRL(student_encoder=student_encoder, teacher_encoder = teacher_encoder, data_augment=augment, adj_ori = adj_ori_sparse, topk=config.model.topk)


#### We train the model by calling trainer.train(). For a full demonstration, run the code in examples.

In [5]:
# ------------------ Trainer --------------------
trainer = SimpleTrainer(method=method, data_loader=data_loader, device=device, use_ema=True, 
                        moving_average_decay=config.optim.moving_average_decay, lr=config.optim.base_lr, 
                        weight_decay=config.optim.weight_decay, n_epochs=config.optim.max_epoch, dataset=dataset,
                        patience=config.optim.patience)
trainer.train()
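
#### The trainer above is created with use_ema=True and a moving_average_decay. In teacher-student methods of this kind, the teacher is usually updated as an exponential moving average (EMA) of the student after each optimization step. The sketch below shows that standard update rule on plain Python floats. Whether SimpleTrainer applies exactly this rule is an assumption, ema_update is a hypothetical helper, and the decay of 0.5 is chosen only to make the effect visible (values close to 1 are typical in practice).

```python
def ema_update(teacher_params, student_params, decay):
    """Move each teacher parameter a fraction (1 - decay) toward the student."""
    return [decay * t + (1.0 - decay) * s
            for t, s in zip(teacher_params, student_params)]


teacher = [0.0, 1.0]
student = [1.0, 3.0]  # held fixed here; in training it changes every step

for step in range(3):
    teacher = ema_update(teacher, student, decay=0.5)
    print(step, teacher)
```

#### With a fixed student, the teacher moves halfway toward it at every step, so it lags behind the student rather than copying it.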

## 3. Evaluating the performance of AFGRL
#### In the last step, we evaluate AFGRL. We first obtain the embeddings by calling method.get_embs() and then train a logistic regression classifier on them.
#### More classifiers, including SVM and RandomForest, are available in [classifier.py](https://github.com/IDEA-ISAIL/ssl/edit/molecure/src/evaluation/classifier.py).
#### Other evaluation methods for the unsupervised setting, such as K-means clustering and similarity search, are available in [cluster.py](https://github.com/IDEA-ISAIL/ssl/edit/molecure/src/evaluation/cluster.py) and [sim_search.py](https://github.com/IDEA-ISAIL/ssl/edit/molecure/src/evaluation/sim_search.py).

In [7]:
# ------------------ Evaluator -------------------
method.eval()
data_pyg = dataset.data.to(method.device)
embs = method.get_embs(data_pyg, data_pyg.edge_index).detach()

lg = LogisticRegression(lr=0.01, weight_decay=0, max_iter=100, n_run=20, device=device)
lg(embs=embs, dataset=data_pyg)
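
#### As a complement to the logistic regression probe, similarity search can be used to evaluate embeddings without training a classifier. The sketch below computes a Sim@k style score: for each node, it retrieves the k most similar nodes by cosine similarity and measures how many share the query node's label. It is a minimal illustration on toy data, sim_at_k is a hypothetical helper, and the implementation in sim_search.py may differ.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def sim_at_k(embs, labels, k):
    """Fraction of the k nearest neighbors that share the query node's label."""
    n = len(embs)
    hits = 0
    for i in range(n):
        # Rank all other nodes by similarity to node i, most similar first.
        ranked = sorted((j for j in range(n) if j != i),
                        key=lambda j: cosine(embs[i], embs[j]), reverse=True)
        hits += sum(labels[j] == labels[i] for j in ranked[:k])
    return hits / (n * k)


# Toy embeddings and labels standing in for the output of method.get_embs().
toy_embs = [[1.0, 0.0], [0.9, 0.1], [0.8, 0.3], [0.0, 1.0], [0.1, 0.9]]
toy_labels = [0, 0, 0, 1, 1]

print(sim_at_k(toy_embs, toy_labels, k=1))
print(sim_at_k(toy_embs, toy_labels, k=2))
```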