# MVGRL Tutorial
#### This tutorial illustrates the use of the MVGRL algorithm from [Contrastive Multi-View Representation Learning on Graphs](https://proceedings.mlr.press/v119/hassani20a/hassani20a.pdf), a self-supervised method for node- and graph-level representation learning that maximizes the mutual information between representations of the original graph and of its diffused counterpart.
#### The tutorial is organized as follows:
#### 1. [Preprocessing Data and Loading Configuration](mvgrl.ipynb#L48)
#### 2. [Training the model](mvgrl.ipynb#L100)
#### 3. [Evaluating the model](mvgrl.ipynb#L206)

## 1. Preprocessing Data and Loading Configuration 
#### First, we load the configuration from a YAML file and load the dataset. Then we set up the data augmentation, graph diffusion with either Personalized PageRank (`ppr`) or the heat kernel (`heat`), to be applied to the loaded dataset.
#### For ease of use, we ran a parameter search on three datasets and found values for which the performance of our MVGRL implementation is similar to that reported in the paper.
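#### As a rough illustration of the two diffusion options, the sketch below computes the closed-form PPR and heat kernels given in the MVGRL paper on a small dense adjacency matrix with NumPy. The helper names `ppr_diffusion` and `heat_diffusion` and the values of `alpha` and `t` are illustrative assumptions; the repository's `ComputePPR` and `ComputeHeat` may differ in details such as self-loops or sparsification.

```python
import numpy as np


def ppr_diffusion(adj, alpha=0.2):
    """PPR kernel: alpha * (I - (1 - alpha) * D^-1/2 A D^-1/2)^-1.

    Assumes a dense, symmetric adjacency matrix without isolated nodes.
    """
    deg = adj.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(deg)
    a_sym = d_inv_sqrt[:, None] * adj * d_inv_sqrt[None, :]
    n = adj.shape[0]
    return alpha * np.linalg.inv(np.eye(n) - (1.0 - alpha) * a_sym)


def heat_diffusion(adj, t=5.0):
    """Heat kernel: exp(t * A D^-1 - t * I).

    A D^-1 is similar to the symmetric matrix D^-1/2 A D^-1/2, so the matrix
    exponential is computed through a symmetric eigendecomposition.
    """
    d_sqrt = np.sqrt(adj.sum(axis=1))
    a_sym = adj / d_sqrt[:, None] / d_sqrt[None, :]
    evals, evecs = np.linalg.eigh(a_sym)
    exp_sym = (evecs * np.exp(t * (evals - 1.0))) @ evecs.T
    # Undo the similarity transform: D^1/2 exp(t * (A_sym - I)) D^-1/2
    return d_sqrt[:, None] * exp_sym / d_sqrt[None, :]


# Toy example: a path graph with 4 nodes (0 - 1 - 2 - 3).
adj = np.array(
    [[0, 1, 0, 0],
     [1, 0, 1, 0],
     [0, 1, 0, 1],
     [0, 0, 1, 0]],
    dtype=float,
)
ppr = ppr_diffusion(adj, alpha=0.2)
heat = heat_diffusion(adj, t=5.0)
```

#### Both kernels are generally dense even when the input graph is sparse, and each column of the heat kernel sums to 1.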

In [None]:
from src.augment import ComputePPR, ComputeHeat
from src.methods import MVGRL, MVGRLEncoder
from src.trainer import SimpleTrainer
from torch_geometric.loader import DataLoader
from src.transforms import NormalizeFeatures, GCNNorm, Edge2Adj, Compose
from src.datasets import Planetoid, Entities, Amazon, WikiCS, Coauthor
from src.evaluation import LogisticRegression
import torch
from src.config import load_yaml
from src.utils.create_data import create_masks
from src.utils.add_adj import add_adj_t

# load the configuration file
# config = load_yaml('./configuration/mvgrl_amazon.yml')
# config = load_yaml('./configuration/mvgrl_coauthor.yml')
# config = load_yaml('./configuration/mvgrl_wikics.yml')
config = load_yaml('./configuration/mvgrl_cora.yml')
torch.manual_seed(config.torch_seed)
device = torch.device("cuda:{}".format(config.gpu_idx) if torch.cuda.is_available() and config.use_cuda else "cpu")

# data
if config.dataset.name == 'cora':
    pre_transforms = Compose([NormalizeFeatures(ord=1), Edge2Adj(norm=GCNNorm(add_self_loops=1))])
    dataset = Planetoid(root='pyg_data', name='cora', pre_transform=pre_transforms)
elif config.dataset.name == 'Amazon':
    pre_transforms = NormalizeFeatures(ord=1)
    dataset = Amazon(root='pyg_data', name='Photo', pre_transform=pre_transforms)
elif config.dataset.name == 'WikiCS':
    pre_transforms = NormalizeFeatures(ord=1)
    dataset = WikiCS(root='pyg_data', pre_transform=pre_transforms)
elif config.dataset.name == 'coauthor':
    pre_transforms = NormalizeFeatures(ord=1)
    dataset = Coauthor(root='pyg_data', name='CS', pre_transform=pre_transforms)
else:
    raise ValueError('please specify the correct dataset name')
if config.dataset.name in ['Amazon', 'WikiCS', 'coauthor']:
    dataset.data = create_masks(dataset.data, config.dataset.name)
dataset = add_adj_t(dataset)
data_loader = DataLoader(dataset, batch_size=config.model.batch_size)

# Augmentation
aug_type = config.model.aug_type
if aug_type == 'ppr':
    augment_neg = ComputePPR(alpha=config.model.alpha)
elif aug_type == 'heat':
    augment_neg = ComputeHeat(t=config.model.t)
else:
    raise ValueError("config.model.aug_type must be 'ppr' or 'heat'")

## 2. Training the Model
#### In the second step, we first initialize MVGRL. The base encoder is a single-layer GCN followed by an MLP.
#### You may replace it with a user-defined encoder. For the expected structure, please refer to the encoder in [mvgrl.py](https://github.com/IDEA-ISAIL/ssl/blob/MVGRL/src/methods/mvgrl.py#L157).
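#### A user-defined encoder could look like the minimal sketch below, which follows the description above (one GCN layer followed by an MLP) and uses a dense adjacency matrix. The class name `DenseGCNEncoder` and its `forward(x, adj)` signature are hypothetical; check the encoder in mvgrl.py for the interface that `MVGRL` actually expects before substituting your own.

```python
import torch
from torch import nn


class DenseGCNEncoder(nn.Module):
    """Hypothetical encoder: one GCN layer followed by a two-layer MLP."""

    def __init__(self, in_channels, hidden_channels):
        super().__init__()
        # GCN layer: act(adj @ (x W)), with adj assumed to be already normalized
        self.linear = nn.Linear(in_channels, hidden_channels, bias=False)
        self.act = nn.PReLU()
        # MLP projection head applied to the node embeddings
        self.mlp = nn.Sequential(
            nn.Linear(hidden_channels, hidden_channels),
            nn.PReLU(),
            nn.Linear(hidden_channels, hidden_channels),
        )

    def forward(self, x, adj):
        h = self.act(adj @ self.linear(x))
        return self.mlp(h)


# Toy check: 5 nodes with 4 features each, identity matrix as adjacency.
torch.manual_seed(0)
toy_encoder = DenseGCNEncoder(in_channels=4, hidden_channels=8)
out = toy_encoder(torch.randn(5, 4), torch.eye(5))
```

#### The same module would be applied to both views, once with the original adjacency and once with the diffusion matrix, so the two views share the layer sizes.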

In [None]:
# ------------------- Method -----------------
encoder = MVGRLEncoder(in_channels=config.model.in_channels, hidden_channels=config.model.hidden_channels)
method = MVGRL(encoder=encoder, diff=augment_neg, hidden_channels=config.model.hidden_channels)
method.augment_type = aug_type

#### We then train the model by calling the trainer.train() method.

In [None]:
# ------------------ Trainer --------------------
trainer = SimpleTrainer(method=method, data_loader=data_loader, device=device, n_epochs=config.optim.max_epoch, patience=config.optim.patience)
trainer.train()
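
#### The `patience` argument passed to the trainer suggests patience-based early stopping. The sketch below shows that general technique in plain Python; the function `train_with_patience` and the choice of the training loss as the monitored quantity are assumptions made for illustration, not a description of how `SimpleTrainer` is implemented.

```python
def train_with_patience(step_fn, n_epochs, patience):
    """Call step_fn(epoch) -> loss until n_epochs is reached or the loss
    has not improved for `patience` consecutive epochs."""
    best_loss = float("inf")
    best_epoch = -1
    bad_epochs = 0
    epochs_run = 0
    for epoch in range(n_epochs):
        loss = step_fn(epoch)
        epochs_run += 1
        if loss < best_loss:
            # Improvement: remember it and reset the patience counter
            best_loss, best_epoch, bad_epochs = loss, epoch, 0
        else:
            bad_epochs += 1
            if bad_epochs >= patience:
                break
    return best_epoch, best_loss, epochs_run


# Toy loss curve: the best value is at epoch 1, followed by three worse epochs.
losses = [1.0, 0.8, 0.9, 0.85, 0.95, 0.7]
result = train_with_patience(lambda epoch: losses[epoch], n_epochs=6, patience=3)
```

#### With this toy curve the loop stops after five epochs and never reaches the final, lower loss, which shows the trade-off of choosing a small patience.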

## 3. Evaluating the performance of MVGRL
#### In the last step, we evaluate the performance of MVGRL. We first obtain the embeddings by calling method.get_embs() and then evaluate them with logistic regression.
#### More classifiers, including SVM and RandomForest, can be found in [classifier.py](https://github.com/IDEA-ISAIL/ssl/edit/molecure/src/evaluation/classifier.py).
#### Other evaluation methods for the unsupervised setting, including K-means clustering and similarity search, can be found in [cluster.py](https://github.com/IDEA-ISAIL/ssl/edit/molecure/src/evaluation/cluster.py) and [sim_search.py](https://github.com/IDEA-ISAIL/ssl/edit/molecure/src/evaluation/sim_search.py).

In [None]:
# ------------------ Evaluator -------------------
data_pyg = dataset.data.to(method.device)
data_neg = augment_neg(data_pyg).to(method.device)
_, _, h_1, h_2, _, _ = method.get_embs(data_pyg.x, data_neg.x, data_pyg.adj_t, data_neg.adj_t, False)
embs = (h_1 + h_2).detach()

lg = LogisticRegression(lr=0.01, weight_decay=0, max_iter=100, n_run=50, device=device)
lg(embs=embs, dataset=data_pyg)
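
#### As an illustration of the similarity-search style of evaluation mentioned in this section, the sketch below measures how often a node's nearest neighbors in embedding space share its label. The function `similarity_search_hit_rate` and the synthetic embeddings are assumptions made for this example; the repository's sim_search.py may define the metric differently.

```python
import numpy as np


def similarity_search_hit_rate(embs, labels, k=5):
    """Fraction of the k most cosine-similar neighbors that share the query's label."""
    normed = embs / np.linalg.norm(embs, axis=1, keepdims=True)
    sim = normed @ normed.T
    np.fill_diagonal(sim, -np.inf)  # exclude each node from its own neighbor list
    topk = np.argsort(-sim, axis=1)[:, :k]
    hits = labels[topk] == labels[:, None]
    return hits.mean()


# Synthetic embeddings: two well-separated classes with 10 nodes each.
rng = np.random.default_rng(0)
centers = np.array([[1.0, 0.0], [0.0, 1.0]])
toy_labels = np.repeat([0, 1], 10)
toy_embs = centers[toy_labels] + 0.05 * rng.standard_normal((20, 2))
hit_rate = similarity_search_hit_rate(toy_embs, toy_labels, k=5)
```

#### To apply it to the tutorial's output, one could pass `embs.cpu().numpy()` and the label vector as NumPy arrays. A hit rate close to 1 means that nodes of the same class lie close together in embedding space.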