# HeCo Tutorial
#### This tutorial illustrates the use of the HeCo algorithm from [Self-supervised Heterogeneous Graph Neural Network with Co-contrastive Learning](https://arxiv.org/abs/2105.09111), a self-supervised cross-view contrastive learning method on heterogeneous graphs. HeCo lets two views of the graph, the network schema view and the meta-path view, supervise each other in order to learn high-level node embeddings.
#### The tutorial is organized as follows:
#### 1. Preprocessing Data and Loading Configuration
#### 2. Training the model
#### 3. Evaluating the model
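#### To make the cross-view idea concrete, here is a minimal sketch of an InfoNCE-style cross-view contrastive loss on toy embeddings, written in plain Python. It is not the implementation in this repository: the names `cosine` and `cross_view_loss`, the toy embeddings, the positive sets, and the weight `lam` are assumptions made for illustration, and HeCo's own positive selection is not reproduced here. `tau` plays the role of the temperature.

```python
import math


def cosine(u, v):
    """Cosine similarity between two plain-Python vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def cross_view_loss(z_anchor, z_other, positives, tau=0.5):
    """Average over nodes i of -log(sum_{j in positives[i]} e_ij / sum_k e_ik),
    where e_ij = exp(cosine(z_anchor[i], z_other[j]) / tau)."""
    total = 0.0
    for i, anchor in enumerate(z_anchor):
        scores = [math.exp(cosine(anchor, other) / tau) for other in z_other]
        pos = sum(scores[j] for j in positives[i])
        total += -math.log(pos / sum(scores))
    return total / len(z_anchor)


# Hypothetical embeddings of three nodes under the two views.
z_sc = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # network schema view
z_mp = [[0.9, 0.1], [0.1, 0.9], [0.7, 0.7]]  # meta-path view
positives = [[0], [1], [2]]                  # each node is its own positive

lam = 0.5  # hypothetical weight balancing the two directions
loss = (lam * cross_view_loss(z_sc, z_mp, positives)
        + (1 - lam) * cross_view_loss(z_mp, z_sc, positives))
print(loss)
```

#### Each view acts as the anchor once and is contrasted against the other view, which is what allows the two views to supervise each other.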

## 1. Preprocessing Data and Loading Configuration 
#### First, we load the configuration from a YAML file and then load the dataset.
#### For ease of use, we searched the hyperparameters on all four datasets and provide configurations with which this implementation of HeCo performs close to the results reported in the original paper.
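#### The sketch below shows the shape of the configuration that the rest of this tutorial reads, assuming `load_yaml` returns an object with attribute-style access (as usages such as `config.dataset.name` suggest). The field names are the ones used in this tutorial; every value is a hypothetical placeholder rather than the tuned setting shipped in the YAML files, and `to_namespace` is an illustrative helper, not part of the repository.

```python
from types import SimpleNamespace


def to_namespace(obj):
    """Recursively turn nested dicts into objects with attribute access."""
    if isinstance(obj, dict):
        return SimpleNamespace(**{k: to_namespace(v) for k, v in obj.items()})
    return obj


# Placeholder values only; the real ones live in ./configuration/heco_*.yml.
raw = {
    "device": "cpu",
    "dataset": {"name": "acm", "root": "datasets", "target_type": "paper",
                "P": 2, "nei_num": 2},
    "model": {"hidden_dim": 64, "attn_drop": 0.5, "sample_rate": [7, 1],
              "feat_drop": 0.3, "tau": 0.8},
    "optim": {"max_epoch": 100, "lr": 0.001, "patience": 5},
    "classifier": {"base_lr": 0.01, "weight_decay": 0.0,
                   "max_epoch": 50, "n_run": 10},
}

config = to_namespace(raw)
print(config.dataset.name, config.model.hidden_dim, config.optim.patience)
```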

In [None]:
from torch_geometric.loader import DataLoader
import os
from src.methods import HeCo, Sc_encoder, Mp_encoder, HeCoDBLPTransform
from src.trainer import SimpleTrainer
from src.evaluation import LogisticRegression
from src.datasets import DBLP, aminer, ACM, FreebaseMovies
from src.utils import create_data
from src.config import load_yaml


config = load_yaml('./configuration/heco_acm.yml')
# config = load_yaml('./configuration/heco_dblp.yml')
# config = load_yaml('./configuration/heco_freebase_movies.yml')
# config = load_yaml('./configuration/heco_aminer.yml')
device = config.device

# -------------------- Data --------------------
current_folder = os.path.abspath('')
path = os.path.join(current_folder, config.dataset.root, config.dataset.name)
if config.dataset.name == 'acm':
    dataset = ACM(root=path)
elif config.dataset.name == 'dblp':
    dataset = DBLP(root=path, pre_transform=HeCoDBLPTransform())
elif config.dataset.name == 'freebase_movies':
    dataset = FreebaseMovies(root=path)
elif config.dataset.name == 'aminer':
    dataset = aminer(root=path)
else:
    raise NotImplementedError

data_loader = DataLoader(dataset)

## 2. Training the Model
#### In the second step, we first initialize HeCo. HeCo has two encoders: one for the meta-path view and one for the network schema view.
#### You may replace either of the two encoders with a user-defined encoder. Please refer to the structure of the encoders in [heco.py](https://github.com/IDEA-ISAIL/ssl/edit/heco/src/methods/heco.py). Keep in mind that each encoder consists of a class initializer and a forward function. If you modify HeCo itself, make sure your implementation still includes the get_embs() function.

In [None]:
# ------------------- Method -----------------
encoder1 = Mp_encoder(P=config.dataset.P, hidden_dim=config.model.hidden_dim, attn_drop=config.model.attn_drop)
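# Use the same hidden size as encoder1 so that both views produce embeddings of equal dimension.
encoder2 = Sc_encoder(hidden_dim=config.model.hidden_dim, sample_rate=config.model.sample_rate, nei_num=config.dataset.nei_num, attn_drop=config.model.attn_drop)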

feats = data_loader.dataset._data['feats']
feats_dim_list = [i.shape[1] for i in feats]
method = HeCo(encoder1=encoder1, encoder2=encoder2, feats_dim_list = feats_dim_list, feat_drop=config.model.feat_drop, tau=config.model.tau)
# Move the model to the device set in the configuration instead of assuming a GPU.
method.to(device)

#### We train the model by calling the trainer.train() function.
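#### The trainer takes a `patience` argument. As a rough illustration of what patience-based early stopping usually means, here is a minimal plain-Python sketch. It does not reproduce the internals of `SimpleTrainer`; the function name `run_with_patience` and the list of losses are hypothetical.

```python
def run_with_patience(losses, patience):
    """Consume per-epoch losses and stop once the loss has failed to improve
    for `patience` consecutive epochs.

    Returns (best_epoch, best_loss, last_epoch_run).
    """
    best_loss = float("inf")
    best_epoch = -1
    last_epoch = -1
    wait = 0
    for epoch, loss in enumerate(losses):
        last_epoch = epoch
        if loss < best_loss:
            # Improvement: remember it and reset the counter.
            best_loss, best_epoch, wait = loss, epoch, 0
        else:
            wait += 1
            if wait >= patience:
                break
    return best_epoch, best_loss, last_epoch


# Hypothetical training losses, one per epoch.
losses = [1.0, 0.8, 0.9, 0.85, 0.7, 0.75, 0.72, 0.71, 0.6]
result = run_with_patience(losses, patience=3)
print(result)
```

#### With `patience=3`, the run stops after three epochs without improvement over the best loss, so the final value in the list is never reached.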

In [None]:
trainer = SimpleTrainer(method=method, data_loader=data_loader, device=device, n_epochs=config.optim.max_epoch, lr=config.optim.lr, patience=config.optim.patience)
trainer.train()

## 3. Evaluating the performance of HeCo
#### In the last step, we evaluate HeCo. We first obtain the embeddings by calling the method.get_embs() function and then use logistic regression to evaluate them.
#### More classifiers, including SVM and RandomForest, can be found in [classifier.py](https://github.com/IDEA-ISAIL/ssl/edit/molecure/src/evaluation/classifier.py).
#### In addition, evaluation methods for the unsupervised setting, such as K-means clustering and similarity search, can be found in [cluster.py](https://github.com/IDEA-ISAIL/ssl/edit/molecure/src/evaluation/cluster.py) and [sim_search.py](https://github.com/IDEA-ISAIL/ssl/edit/molecure/src/evaluation/sim_search.py).
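#### As an illustration of the similarity-search idea, the sketch below scores embeddings by checking how often a node's nearest neighbours in embedding space share its label. This is one common form of the metric, written in plain Python on toy data; it is not the code in sim_search.py, and the names `cosine` and `sim_search_at_k` as well as the toy embeddings and labels are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two plain-Python vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def sim_search_at_k(embs, labels, k):
    """Fraction of the top-k cosine neighbours (excluding the query node itself)
    that share the query's label, averaged over all nodes."""
    hits = 0
    for i, query in enumerate(embs):
        others = [j for j in range(len(embs)) if j != i]
        ranked = sorted(others, key=lambda j: cosine(query, embs[j]), reverse=True)
        hits += sum(labels[j] == labels[i] for j in ranked[:k])
    return hits / (len(embs) * k)


# Toy embeddings: two well-separated groups of two nodes each.
embs = [[1.0, 0.1], [0.9, 0.2], [0.1, 1.0], [0.2, 0.9]]
labels = [0, 0, 1, 1]
score = sim_search_at_k(embs, labels, k=1)
print(score)
```

#### A score close to 1 means that nodes with the same label sit close together in the embedding space, without training any classifier.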

In [None]:
# ------------------ Evaluator -------------------
method.eval()
data_pyg = dataset._data[config.dataset.target_type].to(method.device)
embs = method.get_embs(data_loader.dataset._data['feats'], data_loader.dataset._data['mps']).detach()

lg = LogisticRegression(lr=config.classifier.base_lr, weight_decay=config.classifier.weight_decay,
                        max_iter=config.classifier.max_epoch, n_run=config.classifier.n_run, device=device)

# Add the train/validation/test masks and evaluate on the resulting data object.
data_pyg = create_data.create_masks(data_pyg.cpu())
lg(embs=embs, dataset=data_pyg)

In [None]:
# ------------------ Optionally Save the Embedding Result -------------------
import pickle as pkl

# The context manager closes the file even if pickling fails.
with open(os.path.join(current_folder, config.dataset.name + "_embeddings.pkl"), "wb") as f:
    pkl.dump(embs.cpu().numpy(), f)