## Example of GraphCL

In [1]:
from sslgraph.utils import Encoder, get_dataset
from sslgraph.utils.eval_graph import EvalSemisupevised, EvalUnsupevised
from sslgraph.contrastive.model import GraphCL

### 1. Semi-supervised learning on NCI1

#### Load dataset

In this example, we evaluate the model on the NCI1 dataset in the semi-supervised setting.

In [2]:
dataset, dataset_pretrain = get_dataset('NCI1', task='semisupervised')
feat_dim = dataset[0].x.shape[1]
embed_dim = 128

#### Define your encoder and contrastive model (GraphCL)

For the semi-supervised setting, GraphCL uses ResGCN as the encoder.

Available augmentations include: dropN, maskN, permE, subgraph, random[2-4].
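To make the augmentation names more concrete, here is a minimal sketch of what a node-dropping view could look like on a plain edge list. It assumes `dropN` stands for the node dropping augmentation from the GraphCL paper; the function `drop_nodes`, its `drop_ratio` parameter, and the toy graph are hypothetical and are not the library's implementation.

```python
import random

def drop_nodes(num_nodes, edges, drop_ratio=0.2, seed=None):
    """Illustrative node dropping (hypothetical helper, not the sslgraph code).

    Removes a random subset of nodes together with every edge touching them.
    """
    rng = random.Random(seed)
    n_drop = int(num_nodes * drop_ratio)
    dropped = set(rng.sample(range(num_nodes), n_drop))
    kept_nodes = [n for n in range(num_nodes) if n not in dropped]
    kept_edges = [(u, v) for (u, v) in edges
                  if u not in dropped and v not in dropped]
    return kept_nodes, kept_edges

# Toy 5-node path graph: 0-1-2-3-4
edges = [(0, 1), (1, 2), (2, 3), (3, 4)]
kept_nodes, kept_edges = drop_nodes(5, edges, drop_ratio=0.2, seed=0)
print(kept_nodes, kept_edges)
```

A contrastive model such as GraphCL builds two such views of each graph (chosen by `aug_1` and `aug_2`) and trains the encoder so that the two views of the same graph get similar embeddings.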

In [3]:
encoder = Encoder(feat_dim, embed_dim, n_layers=3, gnn='resgcn')
graphcl = GraphCL(embed_dim, aug_1='subgraph', aug_2='subgraph')

#### Define evaluator instance

In this example, we use a label rate of 1%.

To set up the training configuration (number of epochs, learning rates, etc. for pretraining and finetuning), run

```python
evaluator.setup_train_config(batch_size = 128,
    p_optim = 'Adam', p_lr = 0.0001, p_weight_decay = 0, p_epoch = 100,
    f_optim = 'Adam', f_lr = 0.001, f_weight_decay = 0, f_epoch = 100)
```


In [4]:
evaluator = EvalSemisupevised(dataset, dataset_pretrain, label_rate=0.01)

#### Perform evaluation

You can also perform evaluation with a grid search over the pre-training epochs and
learning rate by running

```python
evaluator.grid_search(learning_model=graphcl, encoder=encoder,
    p_lr_lst=[0.1,0.01,0.001,0.0001], p_epoch_lst=[20,40,60,80,100])
```
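The grid above spans 4 learning rates and 5 epoch settings, so 20 configurations in total. The following is a minimal sketch of the general idea behind such a search, not of how `evaluator.grid_search` is implemented; `grid_search`, `score_fn`, and `toy_score` are hypothetical names, and the scoring function is a made-up stand-in for "pretrain, finetune, and return an accuracy".

```python
from itertools import product

def grid_search(score_fn, lr_list, epoch_list):
    """Try every (lr, epochs) pair and return the best pair with its score."""
    best = None
    for lr, epochs in product(lr_list, epoch_list):
        score = score_fn(lr, epochs)
        if best is None or score > best[2]:
            best = (lr, epochs, score)
    return best

def toy_score(lr, epochs):
    # Hypothetical stand-in for a real pretrain + finetune run.
    # It simply peaks at lr=0.001 and epochs=60.
    return -abs(lr - 0.001) - abs(epochs - 60) / 1000

p_lr_lst = [0.1, 0.01, 0.001, 0.0001]
p_epoch_lst = [20, 40, 60, 80, 100]
n_configs = len(p_lr_lst) * len(p_epoch_lst)
best_lr, best_epochs, best_score = grid_search(toy_score, p_lr_lst, p_epoch_lst)
print(n_configs, best_lr, best_epochs)
```

Because the cost grows with the product of the two lists, it may be worth trimming the lists before launching a full search.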

In [5]:
evaluator.evaluate(learning_model=graphcl, encoder=encoder)

Pretraining: epoch 100: 100%|██████████| 100/100 [14:51<00:00,  8.92s/it, loss=2.518394]
Fold 1, finetuning: 100%|██████████| 100/100 [00:10<00:00,  9.64it/s, acc=0.6399, val_loss=3.3378]
Fold 2, finetuning: 100%|██████████| 100/100 [00:11<00:00,  8.44it/s, acc=0.6229, val_loss=11.9820]
Fold 3, finetuning: 100%|██████████| 100/100 [00:11<00:00,  9.00it/s, acc=0.5596, val_loss=2.5892]
Fold 4, finetuning: 100%|██████████| 100/100 [00:11<00:00,  8.81it/s, acc=0.6010, val_loss=3.5205]
Fold 5, finetuning: 100%|██████████| 100/100 [00:11<00:00,  8.66it/s, acc=0.6107, val_loss=4.9785]
Fold 6, finetuning: 100%|██████████| 100/100 [00:11<00:00,  8.41it/s, acc=0.6691, val_loss=1.5976]
Fold 7, finetuning: 100%|██████████| 100/100 [00:11<00:00,  8.81it/s, acc=0.5937, val_loss=2.1159]
Fold 8, finetuning: 100%|██████████| 100/100 [00:11<00:00,  8.74it/s, acc=0.6058, val_loss=4.4338]
Fold 9, finetuning: 100%|██████████| 100/100 [00:11<00:00,  8.86it/s, acc=0.6326, val_loss=2.1934]
Fold 10, finetuning

(0.6326034069061279, 0.046801406890153885)

To reproduce the results in the paper, you may want to perform a grid search, run the evaluation 5 times, and average the results.
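Averaging over repeated runs can be sketched as below. This assumes that `evaluate` returns an `(accuracy, std)` tuple, which is what the printed outputs in this notebook suggest; `average_runs` and `fake_evaluate` are hypothetical helpers, and the numbers fed in are made-up placeholders, not real results.

```python
from statistics import mean, stdev

def average_runs(run_once, n_runs=5):
    """Call run_once() n_runs times and summarize the returned accuracies.

    run_once is assumed to return an (accuracy, std) tuple.
    """
    accs = [run_once()[0] for _ in range(n_runs)]
    return mean(accs), stdev(accs)

# Hypothetical stand-in that replays fixed, made-up (accuracy, std) pairs.
_fake_results = iter([(0.62, 0.04), (0.64, 0.05), (0.63, 0.04),
                      (0.61, 0.05), (0.65, 0.03)])

def fake_evaluate():
    return next(_fake_results)

avg_acc, run_std = average_runs(fake_evaluate)
print(avg_acc, run_std)
```

In the notebook, `run_once` would be a function that rebuilds the encoder, the GraphCL model, and the evaluator, then returns the result of `evaluator.evaluate(...)`, so that each run starts from freshly initialized weights.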

#### Another example with a label rate of 10%

In [6]:
encoder = Encoder(feat_dim, embed_dim, n_layers=3, gnn='resgcn')
graphcl = GraphCL(embed_dim, aug_1='random2', aug_2='random2')
evaluator = EvalSemisupevised(dataset, dataset_pretrain, label_rate=0.1)
evaluator.evaluate(learning_model=graphcl, encoder=encoder)

Pretraining: epoch 100: 100%|██████████| 100/100 [10:36<00:00,  6.37s/it, loss=1.778023]
Fold 1, finetuning: 100%|██████████| 100/100 [00:15<00:00,  6.26it/s, acc=0.7737, val_loss=0.7953]
Fold 2, finetuning: 100%|██████████| 100/100 [00:15<00:00,  6.49it/s, acc=0.7397, val_loss=1.5799]
Fold 3, finetuning: 100%|██████████| 100/100 [00:16<00:00,  6.04it/s, acc=0.7251, val_loss=1.4510]
Fold 4, finetuning: 100%|██████████| 100/100 [00:15<00:00,  6.63it/s, acc=0.7567, val_loss=1.1257]
Fold 5, finetuning: 100%|██████████| 100/100 [00:14<00:00,  6.82it/s, acc=0.7518, val_loss=1.2485]
Fold 6, finetuning: 100%|██████████| 100/100 [00:16<00:00,  6.14it/s, acc=0.7324, val_loss=1.3859]
Fold 7, finetuning: 100%|██████████| 100/100 [00:14<00:00,  6.79it/s, acc=0.7153, val_loss=1.6022]
Fold 8, finetuning: 100%|██████████| 100/100 [00:15<00:00,  6.41it/s, acc=0.7251, val_loss=1.5585]
Fold 9, finetuning: 100%|██████████| 100/100 [00:15<00:00,  6.43it/s, acc=0.7859, val_loss=1.1088]
Fold 10, finetuning:

(0.7440389394760132, 0.031175630167126656)

### 2. Unsupervised representation learning

#### Load dataset

In this example, we evaluate the model on the MUTAG dataset in the unsupervised setting.

In [7]:
dataset = get_dataset('MUTAG', task='unsupervised')

#### Define your encoder and contrastive model (GraphCL)

For the unsupervised setting, GraphCL uses GIN with jumping knowledge, so the encoder's output dimension is `output_dim = hidden_dim * n_layers`.
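The dimension formula can be checked with a small sketch. It assumes the jumping-knowledge step concatenates the per-layer embeddings, which is consistent with `output_dim = hidden_dim * n_layers`; `jk_concat` is a hypothetical helper operating on plain lists, not the library's encoder.

```python
def jk_concat(layer_outputs):
    """Concatenate per-layer embedding vectors into one long vector."""
    out = []
    for h in layer_outputs:
        out.extend(h)
    return out

hidden_dim, n_layers = 128, 3
# One dummy embedding of size hidden_dim per GNN layer.
layer_outputs = [[0.0] * hidden_dim for _ in range(n_layers)]
output_dim = len(jk_concat(layer_outputs))
print(output_dim)
```

This is why the GraphCL model in the next cell is constructed with `embed_dim*3` rather than `embed_dim`: with 3 layers of width 128, the concatenated embedding has 384 dimensions.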

Available augmentations include: dropN, maskN, permE, subgraph, random[2-4].

In [8]:
embed_dim = 128
encoder = Encoder(feat_dim=dataset[0].x.shape[1], hidden_dim=embed_dim, n_layers=3, gnn='gin')
graphcl = GraphCL(embed_dim*3, aug_1=None, aug_2='random2')

#### Perform evaluation with grid search

In [9]:
evaluator = EvalUnsupevised(dataset, log_interval=5)
evaluator.evaluate(learning_model=graphcl, encoder=encoder)

Pretraining: epoch 20: 100%|██████████| 20/20 [00:20<00:00,  1.04s/it, loss=5.231108]

Best epoch 5: acc 0.8681 +/-(0.0751)





(0.8681286549707604, 0.07512400973969227)