## Example of GraphCL

In [1]:
from dig.sslgraph.utils import Encoder
from dig.sslgraph.evaluation import GraphSemisupervised, GraphUnsupervised
from dig.sslgraph.dataset import get_dataset
from dig.sslgraph.method import GraphCL

### 1. Semi-supervised learning on NCI1

#### Load dataset

In this example, we evaluate the model on the NCI1 dataset in the semi-supervised setting.

In [2]:
dataset, dataset_pretrain = get_dataset('NCI1', task='semisupervised')
feat_dim = dataset[0].x.shape[1]
embed_dim = 128

#### Define your encoder and contrastive model (GraphCL)

For the semi-supervised setting, GraphCL uses a ResGCN encoder.

Available augmentations include: dropN, maskN, permE, subgraph, random[2-4].
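To make the augmentation names more concrete, here is a minimal pure-Python sketch of the idea behind a node-dropping augmentation such as `dropN`: remove a random subset of nodes together with every edge that touches them. The helper `drop_nodes`, its `ratio` and `seed` parameters, and the edge-list representation are assumptions made for illustration. This is not DIG's implementation, which works on PyTorch Geometric data objects.

```python
import random

def drop_nodes(num_nodes, edges, ratio=0.2, seed=None):
    """Hypothetical helper: drop a random fraction of nodes from a graph.

    Returns the surviving node ids and the edges whose endpoints both survive.
    """
    rng = random.Random(seed)
    n_drop = int(num_nodes * ratio)
    dropped = set(rng.sample(range(num_nodes), n_drop))
    kept_nodes = [v for v in range(num_nodes) if v not in dropped]
    # An edge is kept only if neither endpoint was dropped.
    kept_edges = [(u, v) for u, v in edges if u not in dropped and v not in dropped]
    return kept_nodes, kept_edges

# A 5-node path graph: 0-1-2-3-4
edges = [(0, 1), (1, 2), (2, 3), (3, 4)]
kept_nodes, kept_edges = drop_nodes(5, edges, ratio=0.2, seed=0)
print(kept_nodes, kept_edges)
```

In a contrastive setup, two such randomly augmented views of the same graph are encoded and pulled together, which is what the `aug_1` and `aug_2` arguments select.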

In [3]:
encoder = Encoder(feat_dim, embed_dim, n_layers=3, gnn='resgcn')
graphcl = GraphCL(embed_dim, aug_1='subgraph', aug_2='subgraph')

#### Define evaluator instance

In this example, we use a label rate of 1%.
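As a rough illustration of what a label rate means, the sketch below keeps labels for only a fraction of a training split. The helper `sample_labeled_indices`, the split size of 1000, and the plain random sampling are assumptions for illustration. The evaluator may select the labeled subset differently (for example, in a stratified way).

```python
import random

def sample_labeled_indices(train_indices, label_rate, seed=0):
    """Hypothetical helper: pick the subset of training graphs whose labels are used."""
    rng = random.Random(seed)
    n_labeled = max(1, round(label_rate * len(train_indices)))
    return sorted(rng.sample(train_indices, n_labeled))

train_indices = list(range(1000))  # made-up training split of 1000 graphs
labeled = sample_labeled_indices(train_indices, label_rate=0.01)
print(len(labeled))  # 10
```

With so few labeled graphs per fold, the pretrained encoder does most of the work, which is why the pretraining settings matter in this setup.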

To set up the training configuration (number of epochs, learning rates, etc. for pretraining and finetuning), call the following on the evaluator after creating it:


`evaluator.setup_train_config(batch_size = 128,
    p_optim = 'Adam', p_lr = 0.0001, p_weight_decay = 0, p_epoch = 100,
    f_optim = 'Adam', f_lr = 0.001, f_weight_decay = 0, f_epoch = 100)`


In [4]:
evaluator = GraphSemisupervised(dataset, dataset_pretrain, label_rate=0.01)

#### Perform evaluation

You can also run the evaluation with a grid search over the number of
pretraining epochs and the pretraining learning rate by running
``
evaluator.grid_search(learning_model=graphcl, encoder=encoder, 
    p_lr_lst=[0.1,0.01,0.001,0.0001], p_epoch_lst=[20,40,60,80,100])
``
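Conceptually, a grid search tries every combination of the listed values and keeps the best one. The sketch below shows that loop in plain Python. The functions `grid_search` and `toy_score` are hypothetical stand-ins: `toy_score` returns a made-up number in place of an actual pretrain-and-finetune run, and the selection criterion used by `evaluator.grid_search` is not shown in this notebook.

```python
from itertools import product

def grid_search(score_fn, p_lr_lst, p_epoch_lst):
    """Hypothetical helper: return (best_score, best_lr, best_epoch) over the grid."""
    best = None
    for p_lr, p_epoch in product(p_lr_lst, p_epoch_lst):
        score = score_fn(p_lr, p_epoch)
        if best is None or score > best[0]:
            best = (score, p_lr, p_epoch)
    return best

def toy_score(p_lr, p_epoch):
    # Made-up score that peaks at p_lr=0.001 and p_epoch=60; not a real accuracy.
    return 1.0 - abs(p_lr - 0.001) - abs(p_epoch - 60) / 1000

best_score, best_lr, best_epoch = grid_search(
    toy_score,
    p_lr_lst=[0.1, 0.01, 0.001, 0.0001],
    p_epoch_lst=[20, 40, 60, 80, 100],
)
print(best_lr, best_epoch)
```

The grid above has 4 x 5 = 20 combinations, so a full search costs roughly 20 times a single pretraining run.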

In [5]:
evaluator.evaluate(learning_model=graphcl, encoder=encoder)

Pretraining: epoch 100: 100%|██████████| 100/100 [18:28<00:00, 11.08s/it, loss=2.447482]
Fold 1, finetuning: 100%|██████████| 100/100 [00:12<00:00,  8.16it/s, acc=0.6399, val_loss=2.5831]
Fold 2, finetuning: 100%|██████████| 100/100 [00:12<00:00,  7.90it/s, acc=0.6326, val_loss=12.9722]
Fold 3, finetuning: 100%|██████████| 100/100 [00:12<00:00,  8.01it/s, acc=0.5718, val_loss=2.3225]
Fold 4, finetuning: 100%|██████████| 100/100 [00:12<00:00,  8.08it/s, acc=0.6277, val_loss=2.9193]
Fold 5, finetuning: 100%|██████████| 100/100 [00:12<00:00,  8.19it/s, acc=0.6229, val_loss=14.4159]
Fold 6, finetuning: 100%|██████████| 100/100 [00:12<00:00,  7.81it/s, acc=0.6594, val_loss=1.9039]
Fold 7, finetuning: 100%|██████████| 100/100 [00:12<00:00,  8.01it/s, acc=0.5937, val_loss=2.8002]
Fold 8, finetuning: 100%|██████████| 100/100 [00:11<00:00,  8.37it/s, acc=0.6034, val_loss=3.4422]
Fold 9, finetuning: 100%|██████████| 100/100 [00:12<00:00,  7.89it/s, acc=0.6180, val_loss=2.2449]
Fold 10, finetunin

(0.625547468662262, 0.04200868681073189)

To reproduce the results in the paper, you may want to perform a grid search, run the evaluation 5 times, and take the average.
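Averaging over repeated runs can be sketched as follows. The helper `average_over_runs` is hypothetical, and the numbers in `fake_results` are placeholders rather than measured accuracies. Each call to `run_once` is assumed to return an `(accuracy, std)` tuple like the one printed above.

```python
from statistics import mean

def average_over_runs(run_once, n_runs=5):
    """Hypothetical helper: call run_once() n_runs times and average the accuracies."""
    accs = [run_once()[0] for _ in range(n_runs)]
    return mean(accs), accs

# Placeholder (accuracy, std) tuples standing in for five evaluation runs.
# In the notebook, run_once would instead build a fresh encoder and GraphCL
# model and return evaluator.evaluate(learning_model=graphcl, encoder=encoder).
fake_results = iter([(0.62, 0.04), (0.63, 0.03), (0.61, 0.05), (0.64, 0.04), (0.60, 0.04)])
avg, accs = average_over_runs(lambda: next(fake_results))
print(avg, accs)
```

Re-creating the encoder and model inside `run_once` matters: reusing already-trained objects would make the runs depend on each other.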

#### Another example with a label rate of 10%

In [6]:
encoder = Encoder(feat_dim, embed_dim, n_layers=3, gnn='resgcn')
graphcl = GraphCL(embed_dim, aug_1='random2', aug_2='random2')
evaluator = GraphSemisupervised(dataset, dataset_pretrain, label_rate=0.1)
evaluator.evaluate(learning_model=graphcl, encoder=encoder)

Pretraining: epoch 100: 100%|██████████| 100/100 [13:31<00:00,  8.12s/it, loss=2.185739]
Fold 1, finetuning: 100%|██████████| 100/100 [00:15<00:00,  6.47it/s, acc=0.7859, val_loss=0.9314]
Fold 2, finetuning: 100%|██████████| 100/100 [00:15<00:00,  6.54it/s, acc=0.7348, val_loss=1.5867]
Fold 3, finetuning: 100%|██████████| 100/100 [00:15<00:00,  6.58it/s, acc=0.7226, val_loss=1.3225]
Fold 4, finetuning: 100%|██████████| 100/100 [00:15<00:00,  6.27it/s, acc=0.7178, val_loss=1.3762]
Fold 5, finetuning: 100%|██████████| 100/100 [00:15<00:00,  6.47it/s, acc=0.7445, val_loss=1.2206]
Fold 6, finetuning: 100%|██████████| 100/100 [00:15<00:00,  6.36it/s, acc=0.7299, val_loss=1.3135]
Fold 7, finetuning: 100%|██████████| 100/100 [00:16<00:00,  6.09it/s, acc=0.7056, val_loss=1.6646]
Fold 8, finetuning: 100%|██████████| 100/100 [00:15<00:00,  6.63it/s, acc=0.6521, val_loss=1.3334]
Fold 9, finetuning: 100%|██████████| 100/100 [00:15<00:00,  6.36it/s, acc=0.7591, val_loss=1.3038]
Fold 10, finetuning:

(0.7469586133956909, 0.02797759510576725)

### 2. Unsupervised representation learning

#### Load dataset

In this example, we evaluate the model on the MUTAG dataset in the unsupervised setting.

In [7]:
dataset = get_dataset('MUTAG', task='unsupervised')

#### Define your encoder and contrastive model (GraphCL)

For the unsupervised setting, GraphCL uses a GIN encoder with jumping knowledge, so the encoder output has dimension output_dim = hidden_dim * n_layers.
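The dimension arithmetic can be checked with a small sketch. It assumes that jumping knowledge here means concatenating the per-layer graph embeddings, which is what output_dim = hidden_dim * n_layers implies. The zero vectors are stand-ins for real layer outputs.

```python
hidden_dim = 32
n_layers = 3

# Stand-in graph embeddings: one vector of size hidden_dim per GNN layer.
layer_outputs = [[0.0] * hidden_dim for _ in range(n_layers)]

# Jumping knowledge by concatenation: join the per-layer vectors end to end.
graph_embedding = [x for layer in layer_outputs for x in layer]

output_dim = len(graph_embedding)
print(output_dim)  # 96
```

This is why the contrastive model in the cells below is constructed with `embed_dim*3` rather than `embed_dim`: its input dimension has to match the concatenated encoder output.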

Available augmentations include: dropN, maskN, permE, subgraph, random[2-4].

In [8]:
embed_dim = 32
encoder = Encoder(feat_dim=dataset[0].x.shape[1], hidden_dim=embed_dim, n_layers=3, gnn='gin', bn=True)
graphcl = GraphCL(embed_dim*3, aug_1=None, aug_2='random2', tau=0.2)
evaluator = GraphUnsupervised(dataset, log_interval=10)
evaluator.evaluate(learning_model=graphcl, encoder=encoder)

Pretraining: epoch 20: 100%|██████████| 20/20 [00:08<00:00,  2.40it/s, loss=4.617538]

Best epoch 10: acc 0.8886 +/-(0.0685)

(0.8885964912280702, 0.06845478250921638)

#### NCI1 dataset

In [9]:
dataset = get_dataset('NCI1', task='unsupervised', feat_str='')
embed_dim = 32
encoder = Encoder(feat_dim=dataset[0].x.shape[1], hidden_dim=embed_dim, n_layers=3, gnn='gin', bn=True)
graphcl = GraphCL(embed_dim*3, aug_1=None, aug_2='random2', tau=0.2)

evaluator = GraphUnsupervised(dataset, log_interval=10)
evaluator.evaluate(learning_model=graphcl, encoder=encoder)

Pretraining: epoch 20: 100%|██████████| 20/20 [21:38<00:00, 64.94s/it, loss=1.007827] 

Best epoch 10: acc 0.7779 +/-(0.0116)

(0.7778588807785889, 0.011586664337707655)

#### REDDIT-BINARY (RDT-B) dataset

In [10]:
dataset = get_dataset('REDDIT-BINARY', task='unsupervised', feat_str='')
embed_dim = 32
encoder = Encoder(feat_dim=dataset[0].x.shape[1], hidden_dim=embed_dim, n_layers=3, gnn='gin', bn=True)
graphcl = GraphCL(embed_dim*3, aug_1=None, aug_2='random2', tau=0.2)

evaluator = GraphUnsupervised(dataset, log_interval=10)
evaluator.evaluate(learning_model=graphcl, encoder=encoder)

Pretraining: epoch 20: 100%|██████████| 20/20 [14:57<00:00, 44.85s/it, loss=4.204421]

Best epoch 20: acc 0.8970 +/-(0.0247)

(0.897, 0.024743124746527536)