# GraphMAE Tutorial
#### This tutorial illustrates the use of GraphMAE ([GraphMAE: Self-Supervised Masked Graph Autoencoders](https://arxiv.org/pdf/2205.10803.pdf)), a masked graph autoencoder method for self-supervised graph representation learning. GraphMAE focuses on feature reconstruction and combines a masking strategy with a scaled cosine error loss, both of which make training more robust.
#### The tutorial is organized as follows:
#### 1. [Preprocessing Data and Loading Configuration](GraphMAE.ipynb#L6)
#### 2. [Training the model](GraphMAE.ipynb#L7)
#### 3. [Evaluating the model](GraphMAE.ipynb#L8)
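
#### To make the scaled cosine error mentioned above concrete, here is a minimal sketch in PyTorch. It follows the definition in the paper: the mean of (1 - cosine similarity) raised to a power over the masked nodes. The function name scaled_cosine_error, the toy feature matrix, and the exponent name alpha are illustrative assumptions (alpha presumably plays the role of alpha_l in the configuration); this is not the repository's setup_loss_fn.

```python
import torch
import torch.nn.functional as F


def scaled_cosine_error(x_rec, x_orig, alpha=2.0):
    """Mean of (1 - cosine_similarity)^alpha over the rows of the inputs."""
    x_rec = F.normalize(x_rec, p=2, dim=-1)
    x_orig = F.normalize(x_orig, p=2, dim=-1)
    cos = (x_rec * x_orig).sum(dim=-1)  # row-wise cosine similarity
    return (1.0 - cos).pow(alpha).mean()


# Toy data: 4 nodes with 3 features each; nodes 1 and 3 play the role of masked nodes.
x = torch.tensor([[1.0, 0.0, 0.0],
                  [0.0, 2.0, 0.0],
                  [0.0, 0.0, 3.0],
                  [1.0, 1.0, 0.0]])
masked = torch.tensor([1, 3])

# A reconstruction identical to the input gives a loss close to zero.
perfect = scaled_cosine_error(x[masked], x[masked])
# A reconstruction pointing in the opposite direction gives the largest loss.
opposite = scaled_cosine_error(-x[masked], x[masked])
print(perfect.item(), opposite.item())
```

#### Because the inputs are normalized, the loss depends only on the direction of the reconstructed features, not on their magnitude. An exponent larger than 1 shrinks the contribution of nodes that are already reconstructed well.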

## 1. Preprocessing Data and Loading Configuration 
#### First, we load the configuration from a YAML file and then load the dataset.
#### For ease of use, we ran a hyperparameter search on three datasets and provide the resulting settings in the configuration files, so that this implementation of GraphMAE performs similarly to the results reported in the paper.
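
#### The cells in this tutorial read a number of fields from the configuration object. The sketch below lists them and checks a configuration for missing entries before any training starts. The field names are taken from the code in this tutorial; every value is a placeholder chosen for illustration (not a tuned value from the repository), and the helper missing_fields is hypothetical.

```python
from types import SimpleNamespace

# Placeholder values only; the tuned values live in the repository's YAML files.
example_config = SimpleNamespace(
    torch_seed=0,
    use_cuda=False,
    gpu_idx=0,
    dataset=SimpleNamespace(root="datasets", name="MUTAG", batch_size=32),
    model=SimpleNamespace(
        pooling="sum", encoder_type="gat", decoder_type="gat",
        hidden_channels=32, encoder_layers=2, decoder_layers=1,
        loss_fn="sce",  # placeholder string, not confirmed by the repository
        alpha_l=2,
    ),
    optim=SimpleNamespace(max_epoch=20, base_lr=0.001),
    classifier=SimpleNamespace(base_lr=0.01, weight_decay=0.0, max_epoch=100),
)

# Dotted paths of every field accessed in this tutorial.
REQUIRED_FIELDS = [
    "torch_seed", "use_cuda", "gpu_idx",
    "dataset.root", "dataset.name", "dataset.batch_size",
    "model.pooling", "model.encoder_type", "model.decoder_type",
    "model.hidden_channels", "model.encoder_layers", "model.decoder_layers",
    "model.loss_fn", "model.alpha_l",
    "optim.max_epoch", "optim.base_lr",
    "classifier.base_lr", "classifier.weight_decay", "classifier.max_epoch",
]


def missing_fields(config, required=REQUIRED_FIELDS):
    """Return the dotted paths in `required` that `config` does not define."""
    missing = []
    for dotted in required:
        node = config
        for part in dotted.split("."):
            if not hasattr(node, part):
                missing.append(dotted)
                break
            node = getattr(node, part)
    return missing


print(missing_fields(example_config))
```

#### Assuming load_yaml returns an object with attribute access, as the code below suggests, the same check could be run on the loaded configuration.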

In [None]:
from src.methods.graphmae import GraphMAE, EncoderDecoder, load_graph_classification_dataset, setup_loss_fn, collate_fn
from src.trainer import SimpleTrainer
from dgl.dataloading import GraphDataLoader
from torch.utils.data.sampler import SubsetRandomSampler
from dgl.nn.pytorch.glob import SumPooling, AvgPooling, MaxPooling
import torch
import numpy as np
from src.config import load_yaml
import os
from src.evaluation import LogisticRegression

config = load_yaml('./configuration/graphmae_mutag.yml')
# config = load_yaml('./configuration/graphmae_imdb_b.yml')
# config = load_yaml('./configuration/graphmae_imdb_m.yml')
torch.manual_seed(config.torch_seed)
np.random.seed(config.torch_seed)
device = torch.device("cuda:{}".format(config.gpu_idx) if torch.cuda.is_available() and config.use_cuda else "cpu")

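# Notebooks do not define __file__, so paths are resolved from the current working directory.
# In a script, the equivalent would be:
# path = os.path.join(os.path.dirname(os.path.realpath(__file__)), config.dataset.root, config.dataset.name)
current_folder = os.path.abspath('')
path = os.path.join(os.path.dirname(current_folder), config.dataset.root, config.dataset.name)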

# -------------------- Data --------------------
dataset, num_features = load_graph_classification_dataset(config.dataset.name, raw_dir=path)
train_idx = torch.arange(len(dataset))
train_sampler = SubsetRandomSampler(train_idx)
eval_loader = GraphDataLoader(dataset, collate_fn=collate_fn, batch_size=config.dataset.batch_size, shuffle=False)
in_channels = max(num_features, 1)


## 2. Training the Model
#### In the second step, we initialize GraphMAE. The encoder backbone is a Graph Isomorphism Network (GIN), but you can change the encoder type to another architecture, such as 'gat', 'dotgat', 'GCN', or 'MLP' (a 2-layer MLP).
#### You can also replace the encoder with a user-defined one. See the encoder implementation in ./src/methods/graphmae.py#L16 for the expected structure. Keep in mind that a custom encoder must define both the class initialization and a forward function.
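
#### As a rough illustration of a user-defined encoder, the sketch below defines a class with the two required parts: initialization and a forward function. The class name CustomEncoder, the constructor arguments, and the forward(graph, feat) calling convention are assumptions made for this example; check them against ./src/methods/graphmae.py before plugging a custom encoder into GraphMAE.

```python
import torch
import torch.nn as nn


class CustomEncoder(nn.Module):
    """Hypothetical user-defined encoder: a 2-layer MLP applied to each node's features."""

    def __init__(self, in_channels, hidden_channels):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_channels, hidden_channels),
            nn.ReLU(),
            nn.Linear(hidden_channels, hidden_channels),
        )

    def forward(self, graph, feat):
        # `graph` is accepted only to match the assumed (graph, feat) calling
        # convention; an MLP ignores the graph structure.
        return self.net(feat)


# Toy check: 5 nodes with 7 input features are mapped to 16-dimensional embeddings.
custom_encoder = CustomEncoder(in_channels=7, hidden_channels=16)
node_features = torch.randn(5, 7)
node_embeddings = custom_encoder(None, node_features)
print(node_embeddings.shape)
```

#### A graph-aware encoder would use the graph argument for message passing; the MLP is used here only because it keeps the example independent of any graph library.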

In [None]:
# ------------------- Method -----------------
pooling = config.model.pooling
if pooling == "mean":
    pooler = AvgPooling()
elif pooling == "max":
    pooler = MaxPooling()
elif pooling == "sum":
    pooler = SumPooling()
else:
    raise NotImplementedError
encoder = EncoderDecoder(GNN=config.model.encoder_type, enc_dec="encoding", in_channels=in_channels,
                         hidden_channels=config.model.hidden_channels, num_layers=config.model.encoder_layers)
decoder = EncoderDecoder(GNN=config.model.decoder_type, enc_dec="decoding", in_channels=config.model.hidden_channels,
                         hidden_channels=in_channels, num_layers=config.model.decoder_layers)
loss_function = setup_loss_fn(config.model.loss_fn, alpha_l=config.model.alpha_l)
method = GraphMAE(encoder=encoder, decoder=decoder, hidden_channels=config.model.hidden_channels, argument=config, loss_function=loss_function)
method.device = device


#### We train the model by calling the trainer.train() function.

In [None]:
# ------------------ Trainer --------------------
trainer = SimpleTrainer(method=method, data_loader=dataset, device=device, n_epochs=config.optim.max_epoch,
                        lr=config.optim.base_lr)
trainer.train()


## 3. Evaluating the performance of GraphMAE
#### In the last step, we evaluate the performance of GraphMAE. We first obtain the embeddings by calling method.get_embeddings() and then train a logistic regression classifier on them. Other classifiers, including SVM and random forest, are available in ./src/evaluation/classifier.py. Evaluation methods for the unsupervised setting, including k-means clustering and similarity search, are available in ./src/evaluation/cluster.py and ./src/evaluation/sim_search.py.

In [None]:
# ------------------ Evaluation -------------------
x, y = method.get_embeddings(pooler, eval_loader)
y = y.reshape(-1, )
eval_loader.y = torch.tensor(y).long()
eval_loader.x = torch.tensor(x)
lg = LogisticRegression(lr=config.classifier.base_lr, weight_decay=config.classifier.weight_decay,
                        max_iter=config.classifier.max_epoch, n_run=1, device=device)
lg(embs=torch.tensor(x), dataset=eval_loader)
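
#### The evaluation above passes the pooler chosen in step 2 to method.get_embeddings(). We have not verified how get_embeddings applies it; the sketch below assumes it reduces the node embeddings of each graph to a single vector, which is what DGL's SumPooling, AvgPooling, and MaxPooling do. It is written in plain Python so the arithmetic is easy to follow, and the function readout is hypothetical.

```python
def readout(node_embeddings, graph_ids, mode="sum"):
    """Pool node embeddings into one vector per graph.

    node_embeddings: list of equal-length lists, one per node.
    graph_ids: list giving the graph each node belongs to.
    """
    groups = {}
    for emb, gid in zip(node_embeddings, graph_ids):
        groups.setdefault(gid, []).append(emb)

    pooled = {}
    for gid, rows in groups.items():
        columns = list(zip(*rows))  # one tuple per embedding dimension
        if mode == "sum":
            pooled[gid] = [sum(c) for c in columns]
        elif mode == "mean":
            pooled[gid] = [sum(c) / len(c) for c in columns]
        elif mode == "max":
            pooled[gid] = [max(c) for c in columns]
        else:
            raise ValueError("unknown pooling mode: {}".format(mode))
    return pooled


# Two graphs in one batch: nodes 0 and 1 belong to graph 0, node 2 to graph 1.
embeddings = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
graph_ids = [0, 0, 1]
for mode in ("sum", "mean", "max"):
    print(mode, readout(embeddings, graph_ids, mode))
```

#### Sum pooling keeps information about graph size, while mean and max pooling do not, so the pooling choice in the configuration can affect the downstream classifier.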
