# SUGRL Tutorial
#### This tutorial illustrates the use of SUGRL, the algorithm from [Simple Unsupervised Graph Representation Learning](https://ojs.aaai.org/index.php/AAAI/article/view/20748), an unsupervised graph representation learning method. SUGRL uses a multiplet loss that exploits the complementary nature of structural and neighbor information to enlarge inter-class variation, and an upper bound loss that keeps the distance between positive and anchor embeddings finite to reduce intra-class variation. The approach needs neither data augmentation nor discriminators, and it produces low-dimensional embeddings efficiently.
#### The tutorial is organized as follows:
#### 1. [Preprocessing Data and Loading Configuration](InfoGraph.ipynb#L48)
#### 2. [Training the model](InfoGraph.ipynb#L100)
#### 3. [Evaluating the model](InfoGraph.ipynb#L206)
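
#### To make the two losses described above concrete, here is a minimal, dependency-free sketch for a single anchor. It assumes squared Euclidean distances, one shared margin, and an upper bound applied to the neighbor positive. The function names (`sq_dist`, `margin_loss`, `upper_bound_loss`, `combined_loss`) are hypothetical, and the repository's own implementation may differ in its details.

```python
# Illustrative sketch only: plain-Python versions of a margin (triplet-style)
# loss and an upper-bound loss. Names and exact forms are assumptions,
# not the code used by src.methods.SUGRL.

def sq_dist(u, v):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def margin_loss(anchor, positive, negative, margin):
    """Zero once the negative is at least `margin` farther from the anchor than the positive."""
    return max(sq_dist(anchor, positive) - sq_dist(anchor, negative) + margin, 0.0)


def upper_bound_loss(anchor, positive, negative, margin, bound):
    """Non-zero once the negative is more than `margin + bound` farther than the positive."""
    gap = sq_dist(anchor, negative) - sq_dist(anchor, positive)
    return max(gap - margin - bound, 0.0)


def combined_loss(anchor, pos_struct, pos_neigh, negative, margin, bound,
                  weights=(1.0, 1.0, 1.0)):
    """Weighted sum: structural margin term + neighbor margin term + upper-bound term."""
    w_s, w_n, w_u = weights
    return (w_s * margin_loss(anchor, pos_struct, negative, margin)
            + w_n * margin_loss(anchor, pos_neigh, negative, margin)
            + w_u * upper_bound_loss(anchor, pos_neigh, negative, margin, bound))


anchor = [0.0, 0.0]
pos_struct = [1.0, 0.0]   # squared distance to anchor: 1
pos_neigh = [0.0, 2.0]    # squared distance to anchor: 4

# A negative that is too close triggers both margin terms.
print(combined_loss(anchor, pos_struct, pos_neigh, [1.0, 1.0], 3.0, 1.0,
                    weights=(100.0, 100.0, 1.0)))  # 700.0

# A negative that is very far away triggers only the upper-bound term.
print(combined_loss(anchor, pos_struct, pos_neigh, [5.0, 0.0], 3.0, 1.0,
                    weights=(100.0, 100.0, 1.0)))  # 17.0
```

#### Together, the margin terms and the upper-bound term keep the gap between negative and positive distances inside a finite band. The weights `(100, 100, 1)` only mirror the `w_loss1`, `w_loss2`, and `w_loss3` values printed with the configuration in Section 1; how the repository maps those keys onto the loss terms is an assumption here.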

## 1. Preprocessing Data and Loading Configuration 
#### First, we load the configuration from a YAML file and then load the dataset.
#### For ease of use, we searched for the best hyperparameters on three datasets and chose values for which the performance of the implemented SUGRL is close to the results reported in the paper.

In [1]:
import torch
print(torch.__version__)

1.12.1


In [6]:
import torch 
from src.augment import RandomMask, RandomDropEdge, RandomDropNode, AugmentSubgraph, AugmentorList
from src.methods import SugrlMLP, SugrlGCN
from src.methods import SUGRL
from src.trainer import SimpleTrainer
from torch_geometric.loader import DataLoader
import torch_geometric.transforms as T
from src.transforms import NormalizeFeatures, GCNNorm, Edge2Adj, Compose
from src.datasets import Planetoid, Amazon, WikiCS,Coauthor
from src.utils.create_data import create_masks
from src.evaluation import LogisticRegression
import yaml
from src.utils.add_adj import add_adj_t
from sklearn.impute import SimpleImputer
import os

In [7]:
# load the configuration file (uncomment the line for the dataset you want)
# config_path = './configuration/sugrl_wikics.yml'
config_path = "./configuration/sugrl_amazon.yml"
# config_path = "./configuration/sugrl_coauthor.yml"
# config_path = "./configuration/sugrl_cora.yml"
# use a context manager so the file handle is closed after parsing
with open(config_path, 'r', encoding='utf-8') as f:
    config = yaml.safe_load(f)
print(config)
data_name = config['dataset']
torch.manual_seed(0)
# np.random.seed(config.torch_seed)
# device = torch.device("cuda:{}".format(config.gpu_idx) if torch.cuda.is_available() and config.use_cuda else "cpu")

# -------------------- Data --------------------
pre_transforms = Compose([NormalizeFeatures(ord=1), Edge2Adj(norm=GCNNorm(add_self_loops=1))])
data_name = config['dataset']

current_folder = os.path.abspath('')
# path = os.path.join(current_folder, config.dataset.root, config.dataset.name)

if data_name=="cora":
    dataset = Planetoid(root="pyg_data", name="cora", pre_transform=pre_transforms)
elif data_name=="photo": #92.9267
    dataset = Amazon(root="pyg_data", name="photo", pre_transform=pre_transforms) 
elif data_name=="coauthor": # 92.0973
    dataset = Coauthor(root="pyg_data", name='cs', transform=pre_transforms)
elif data_name=="wikics": #82.0109
    dataset = WikiCS(root="pyg_data", transform=T.NormalizeFeatures())
    dataset = add_adj_t(dataset)
    nan_mask = torch.isnan(dataset[0].x)
    imputer = SimpleImputer()
    dataset[0].x = torch.tensor(imputer.fit_transform(dataset[0].x))


# dataset = Amazon(root="pyg_data", name="photo", pre_transform=pre_transforms)
data_loader = DataLoader(dataset)
data = dataset.data

{'dataset': 'photo', 'black_list': [4, 5, 6], 'lr': 0.01, 'out_heads': 1, 'task_type': 'Node_Transductive', 'val_interval': 1, 'num_hidden_features': 8, 'epochs': 1000, 'to_undirected_at_neg': True, 'w_loss1': 100, 'w_loss2': 100, 'w_loss3': 1, 'margin1': 0.9, 'margin2': 0.9, 'dim': 128, 'cfg': [512, 128], 'NewATop': 0, 'dropout': 0.1, 'NN': 1, 'num1': 200, 'wd': 0.0, 'weight_decay': 0.0001}
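
#### The `pre_transforms` in the cell above chain `NormalizeFeatures(ord=1)` with `Edge2Adj(norm=GCNNorm(add_self_loops=1))`. Assuming these follow the usual definitions (row-wise L1 feature normalization, and symmetric normalization of the adjacency matrix after adding self-loops), the arithmetic can be sketched on a tiny dense example. The helper names `l1_normalize_rows` and `gcn_normalize` are hypothetical, and the classes in `src.transforms` may handle edge cases differently.

```python
import math

# Illustrative sketch only: dense, pure-Python versions of the two
# normalizations. The repository works on sparse tensors instead.

def l1_normalize_rows(x):
    """Scale each row so its absolute values sum to 1; all-zero rows stay unchanged."""
    out = []
    for row in x:
        total = sum(abs(v) for v in row)
        out.append([v / total for v in row] if total > 0 else list(row))
    return out


def gcn_normalize(adj):
    """Return D^{-1/2} (A + I) D^{-1/2} for a dense, symmetric 0/1 adjacency matrix."""
    n = len(adj)
    # Add self-loops, so every node has degree >= 1 and no division by zero occurs.
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a_hat]
    return [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]


features = [[1.0, 3.0], [0.0, 0.0]]
print(l1_normalize_rows(features))  # [[0.25, 0.75], [0.0, 0.0]]

# Path graph 0 - 1 - 2: degrees with self-loops are 2, 3, 2.
path = [[0, 1, 0],
        [1, 0, 1],
        [0, 1, 0]]
norm = gcn_normalize(path)
print(norm[0][0])  # 0.5, i.e. 1 / sqrt(2 * 2)
```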


## 2. Training the Model
#### In the second step, we first initialize SUGRL. It takes two encoders, an MLP (`SugrlMLP`) and a GCN (`SugrlGCN`), together with the data and the loaded configuration.
#### You may replace the encoders with user-defined ones. For the general structure of an encoder, please refer to [infograph.py](https://github.com/IDEA-ISAIL/ssl/edit/molecure/src/methods/infograph.py#L117). Keep in mind that an encoder consists of class initialization, a forward function, and a get_embs() function.

In [8]:
# ------------------- Method -----------------
encoder_1 = SugrlMLP(in_channels=data.x.shape[1])
encoder_2 = SugrlGCN(in_channels=data.x.shape[1])
method = SUGRL(encoder=[encoder_1,encoder_2],data = data, config=config,device="cuda:0")



#### We train the model by calling the trainer.train() function.

In [9]:
trainer = SimpleTrainer(method=method, data_loader=data_loader, device="cuda:0")
trainer.train()



Epoch 0: loss: 199.5582, time: 0.8599s
Epoch 1: loss: 196.9003, time: 0.2951s
Epoch 2: loss: 194.3605, time: 0.2820s
Epoch 3: loss: 191.9739, time: 0.3660s
Epoch 4: loss: 189.7443, time: 0.2194s
Epoch 5: loss: 187.6700, time: 0.3041s
Epoch 6: loss: 185.7771, time: 0.2770s
Epoch 7: loss: 184.0526, time: 0.2905s
Epoch 8: loss: 182.5391, time: 0.2973s
Epoch 9: loss: 181.2900, time: 0.2904s
Epoch 10: loss: 180.4335, time: 0.3705s
Epoch 11: loss: 179.9933, time: 0.3128s
Epoch 12: loss: 179.9054, time: 0.3098s
Epoch 13: loss: 180.0424, time: 0.3034s
Epoch 14: loss: 180.2354, time: 0.2955s
Epoch 15: loss: 180.3994, time: 0.3111s
Epoch 16: loss: 180.4191, time: 0.3103s
Epoch 17: loss: 180.3546, time: 0.2991s
Epoch 18: loss: 180.1464, time: 0.3196s
Epoch 19: loss: 179.8667, time: 0.2801s
Epoch 20: loss: 179.4800, time: 0.3580s
Epoch 21: loss: 179.0532, time: 0.2956s
Epoch 22: loss: 178.6348, time: 0.3859s
Epoch 23: loss: 178.1995, time: 0.3008s
Epoch 24: loss: 177.8271, time: 0.2835s
Epoch 25: 

## 3. Evaluating the performance of SUGRL
#### In the last step, we evaluate the performance of SUGRL. We first get the embeddings by calling the method.get_embs() function and then use logistic regression to evaluate them.
#### More classifiers can be found in [classifier.py](https://github.com/IDEA-ISAIL/ssl/edit/molecure/src/evaluation/classifier.py), including SVM, RandomForest, etc.
#### Besides, other evaluation methods for the unsupervised setting can be found in [cluster.py](https://github.com/IDEA-ISAIL/ssl/edit/molecure/src/evaluation/cluster.py) or [sim_search.py](https://github.com/IDEA-ISAIL/ssl/edit/molecure/src/evaluation/sim_search.py), including K-means clustering and similarity search.

In [None]:
# ------------------ Evaluator -------------------
data_pyg = dataset.data.to(method.device)
embs = method.get_embs(data_pyg.x, data_pyg.adj_t).detach()
lg = LogisticRegression(lr=0.001, weight_decay=0, max_iter=3000, n_run=50, device="cuda")
create_masks(data=data_pyg.cpu())
lg(embs=embs, dataset=data_pyg)
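
#### As a rough illustration of the similarity-search style of evaluation mentioned above, the sketch below scores frozen embeddings by checking whether each item's most cosine-similar other item shares its label. This is a generic nearest-neighbor check written in plain Python; the names `cosine` and `sim_at_1` are hypothetical and are not taken from `sim_search.py`, whose interface may differ.

```python
import math

# Illustrative sketch only: a label-agreement score for nearest neighbors
# under cosine similarity. Requires at least two embeddings.

def cosine(u, v):
    """Cosine similarity of two vectors; 0.0 if either has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u > 0 and norm_v > 0 else 0.0


def sim_at_1(embs, labels):
    """Fraction of items whose most similar other item has the same label."""
    hits = 0
    for i, emb in enumerate(embs):
        others = (j for j in range(len(embs)) if j != i)
        nearest = max(others, key=lambda j: cosine(emb, embs[j]))
        hits += labels[nearest] == labels[i]
    return hits / len(embs)


# Two well-separated groups of toy embeddings.
toy_embs = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
print(sim_at_1(toy_embs, [0, 0, 1, 1]))  # 1.0
print(sim_at_1(toy_embs, [0, 1, 1, 1]))  # 0.5
```

#### On real data the embeddings would come from `method.get_embs(...)`, moved to the CPU and converted to lists or arrays. The quadratic loop is only meant for small examples.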