# 1. Example of training GNNs using AutoML

GRB provides an AutoML approach to training GNNs, built on [Optuna](https://github.com/optuna/optuna).

In [7]:
import os
import torch
import optuna

import grb.utils as utils
from grb.dataset import Dataset
from grb.trainer.trainer import AutoTrainer

## 1.1. Load Dataset

GRB datasets are named with the prefix *grb-*. There are four test-set *modes* ('easy', 'medium', 'hard', 'full'), which differ in the average degree of the test nodes and therefore in how difficult those nodes are to attack. Node features are processed with *arctan* normalization (standardization followed by the arctan function), which puts all features on the same scale.
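A minimal NumPy sketch of the normalization described above, assuming per-column standardization followed by `arctan`. The extra `2/π` rescaling into (-1, 1) is an assumption (it is consistent with the feature range reported when the dataset loads, but not confirmed here), and `arctan_normalize` is a hypothetical helper, not part of GRB's API.

```python
import numpy as np


def arctan_normalize(features: np.ndarray, rescale: bool = True) -> np.ndarray:
    """Standardize each feature column, then squash it with arctan.

    Hypothetical helper for illustration; GRB's own implementation may differ.
    """
    mean = features.mean(axis=0, keepdims=True)
    std = features.std(axis=0, keepdims=True)
    std[std == 0] = 1.0  # constant columns: avoid division by zero
    z = (features - mean) / std  # standardization: zero mean, unit variance
    out = np.arctan(z)  # squash into (-pi/2, pi/2)
    if rescale:
        out = out * (2.0 / np.pi)  # assumed rescaling into (-1, 1)
    return out
```

Because arctan is bounded, a column whose raw values are orders of magnitude larger than the others no longer dominates after normalization.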

In [9]:
from grb.dataset import Dataset

dataset_name = 'grb-cora'
dataset = Dataset(name=dataset_name, 
                  data_dir="../../data/",
                  mode='full',
                  feat_norm='arctan')

Dataset 'grb-cora' loaded.
    Number of nodes: 2680
    Number of edges: 5148
    Number of features: 302
    Number of classes: 7
    Number of train samples: 1608
    Number of val samples: 268
    Number of test samples: 804
    Dataset mode: full
    Feature range: [-0.9406, 0.9430]


## 1.2. AutoML for training GNNs

### 1.2.1. Define the parameter search function

In [3]:
def params_search(trial):
    model_params = {
        "hidden_features": trial.suggest_categorical("hidden_features", 
                                                     [32, 64, 128, 256]),
        "n_layers": trial.suggest_categorical("n_layers", [2, 3, 4, 5]),
        "dropout": trial.suggest_categorical("dropout", [0.5, 0.6, 0.7, 0.8]),
    }
    other_params = {
        "lr": trial.suggest_categorical("lr", [1e-2, 1e-3, 5e-3, 1e-4]),
        "n_epoch": 2000,
        "early_stop": True, 
        "early_stop_patience": 500,
        "train_mode": "inductive",
    }
    
    return model_params, other_params
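`params_search` enables early stopping with `early_stop_patience=500`. A common patience rule, sketched below in plain Python, stops training once the validation score has not improved for `patience` consecutive epochs. Whether GRB's trainer uses exactly this criterion (for example, score versus loss, or strict versus non-strict improvement) is an assumption, and the `EarlyStopper` class is hypothetical.

```python
class EarlyStopper:
    """Hypothetical patience-based early stopping on a score to maximize."""

    def __init__(self, patience: int):
        self.patience = patience
        self.best_score = float("-inf")
        self.best_epoch = -1
        self.bad_epochs = 0  # epochs since the last strict improvement

    def step(self, epoch: int, val_score: float) -> bool:
        """Record one epoch; return True if training should stop."""
        if val_score > self.best_score:
            self.best_score = val_score
            self.best_epoch = epoch
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience


# Usage: a toy validation curve that peaks at epoch 2 and then stalls.
stopper = EarlyStopper(patience=3)
scores = [0.70, 0.75, 0.80, 0.79, 0.80, 0.78, 0.81]
stopped_at = None
for epoch, score in enumerate(scores):
    if stopper.step(epoch, score):
        stopped_at = epoch
        break
```

With `n_epoch=2000` and a patience of 500, a run ends before epoch 2000 only after 500 epochs without improvement, which is what the "Training early stopped" messages in the log below report.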

### 1.2.2. Build AutoTrainer

Example of using `AutoTrainer` to tune and train a GCN.

In [4]:
from grb.model.torch import GCN
from grb.evaluator import metric

autotrainer = AutoTrainer(dataset=dataset, 
                          model_class=GCN,
                          eval_metric=metric.eval_acc,
                          params_search=params_search,
                          n_trials=10,
                          n_jobs=1,
                          seed=42,
                          device="cuda:0")
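The grid defined by `params_search` has 4 × 4 × 4 × 4 = 256 combinations (four choices each for `hidden_features`, `n_layers`, `dropout` and `lr`), so `n_trials=10` evaluates only a small fraction of it. The snippet below enumerates the grid with `itertools.product` to make that count explicit; it does not reproduce how Optuna's sampler decides which combinations to try.

```python
from itertools import product

# Candidate values copied from params_search above.
search_space = {
    "hidden_features": [32, 64, 128, 256],
    "n_layers": [2, 3, 4, 5],
    "dropout": [0.5, 0.6, 0.7, 0.8],
    "lr": [1e-2, 1e-3, 5e-3, 1e-4],
}

# Every combination of one value per hyperparameter.
grid = [dict(zip(search_space, values)) for values in product(*search_space.values())]

n_trials = 10
coverage = n_trials / len(grid)  # upper bound: trials may repeat a combination
print(f"{len(grid)} combinations, {n_trials} trials cover at most {coverage:.1%}")
```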

In [5]:
best_score, best_params, best_score_list = autotrainer.run()

[I 2021-08-17 23:58:35,275] A new study created in memory with name: no-name-41f50226-ab24-4def-a4e4-f53db014b6cd


Use default optimizer Adam.
Use default cross-entropy loss.


[I 2021-08-17 23:58:44,565] Trial 0 finished with value: 0.8022387623786926 and parameters: {'hidden_features': 32, 'n_layers': 2, 'dropout': 0.7, 'lr': 0.0001}. Best is trial 0 with value: 0.8022387623786926.


Training finished. Best validation score: 0.8022
Use default optimizer Adam.
Use default cross-entropy loss.


[I 2021-08-17 23:58:55,745] Trial 1 finished with value: 0.7574626803398132 and parameters: {'hidden_features': 32, 'n_layers': 5, 'dropout': 0.7, 'lr': 0.01}. Best is trial 0 with value: 0.8022387623786926.


Training early stopped. Best validation score: 0.7575
Use default optimizer Adam.
Use default cross-entropy loss.


[I 2021-08-17 23:59:09,055] Trial 2 finished with value: 0.7686566710472107 and parameters: {'hidden_features': 32, 'n_layers': 4, 'dropout': 0.5, 'lr': 0.0001}. Best is trial 0 with value: 0.8022387623786926.


Training finished. Best validation score: 0.7687
Use default optimizer Adam.
Use default cross-entropy loss.


[I 2021-08-17 23:59:21,431] Trial 3 finished with value: 0.8208954930305481 and parameters: {'hidden_features': 128, 'n_layers': 4, 'dropout': 0.6, 'lr': 0.0001}. Best is trial 3 with value: 0.8208954930305481.


Training early stopped. Best validation score: 0.8209
Use default optimizer Adam.
Use default cross-entropy loss.


[I 2021-08-17 23:59:28,096] Trial 4 finished with value: 0.8283581733703613 and parameters: {'hidden_features': 64, 'n_layers': 3, 'dropout': 0.6, 'lr': 0.005}. Best is trial 4 with value: 0.8283581733703613.


Training early stopped. Best validation score: 0.8284
Use default optimizer Adam.
Use default cross-entropy loss.


[I 2021-08-17 23:59:32,525] Trial 5 finished with value: 0.8171641826629639 and parameters: {'hidden_features': 256, 'n_layers': 5, 'dropout': 0.5, 'lr': 0.005}. Best is trial 4 with value: 0.8283581733703613.


Training early stopped. Best validation score: 0.8172
Use default optimizer Adam.
Use default cross-entropy loss.


[I 2021-08-17 23:59:35,214] Trial 6 finished with value: 0.8171641826629639 and parameters: {'hidden_features': 128, 'n_layers': 3, 'dropout': 0.8, 'lr': 0.01}. Best is trial 4 with value: 0.8283581733703613.


Training early stopped. Best validation score: 0.8172
Use default optimizer Adam.
Use default cross-entropy loss.


[I 2021-08-17 23:59:46,231] Trial 7 finished with value: 0.6194029450416565 and parameters: {'hidden_features': 32, 'n_layers': 5, 'dropout': 0.8, 'lr': 0.001}. Best is trial 4 with value: 0.8283581733703613.


Training early stopped. Best validation score: 0.6194
Use default optimizer Adam.
Use default cross-entropy loss.


[I 2021-08-17 23:59:49,686] Trial 8 finished with value: 0.8283581733703613 and parameters: {'hidden_features': 128, 'n_layers': 4, 'dropout': 0.6, 'lr': 0.005}. Best is trial 4 with value: 0.8283581733703613.


Training early stopped. Best validation score: 0.8284
Use default optimizer Adam.
Use default cross-entropy loss.


[I 2021-08-17 23:59:53,814] Trial 9 finished with value: 0.8358208537101746 and parameters: {'hidden_features': 64, 'n_layers': 2, 'dropout': 0.8, 'lr': 0.001}. Best is trial 9 with value: 0.8358208537101746.


Training early stopped. Best validation score: 0.8358
{'hidden_features': 64, 'n_layers': 2, 'dropout': 0.8, 'lr': 0.001}


In [6]:
print("Best parameters: ", best_params)
print("Best validation score: {:.4f}".format(best_score))

Best parameters:  {'model_params': {'hidden_features': 64, 'n_layers': 2, 'dropout': 0.8}, 'other_params': {'lr': 0.001, 'n_epoch': 2000, 'early_stop': True, 'early_stop_patience': 500, 'train_mode': 'inductive'}}
Best validation score: 0.8358
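`autotrainer.run()` printed a flat parameter dict, while the returned `best_params` splits the same values into `model_params` and `other_params`. One way to get from the former to the latter is to replay `params_search` with a trial object that returns fixed values, sketched below with a hypothetical `FixedChoiceTrial` stub. Optuna ships a similar class, `optuna.trial.FixedTrial`; whether `AutoTrainer` uses this mechanism internally is not confirmed here.

```python
class FixedChoiceTrial:
    """Hypothetical stand-in for an Optuna trial that returns preset values."""

    def __init__(self, params: dict):
        self.params = params

    def suggest_categorical(self, name, choices):
        value = self.params[name]
        if value not in choices:  # reject values outside the search space
            raise ValueError(f"{name}={value!r} is not one of {choices}")
        return value


def params_search(trial):
    # Same search space as the params_search cell above.
    model_params = {
        "hidden_features": trial.suggest_categorical("hidden_features", [32, 64, 128, 256]),
        "n_layers": trial.suggest_categorical("n_layers", [2, 3, 4, 5]),
        "dropout": trial.suggest_categorical("dropout", [0.5, 0.6, 0.7, 0.8]),
    }
    other_params = {
        "lr": trial.suggest_categorical("lr", [1e-2, 1e-3, 5e-3, 1e-4]),
        "n_epoch": 2000,
        "early_stop": True,
        "early_stop_patience": 500,
        "train_mode": "inductive",
    }
    return model_params, other_params


# Flat best parameters as printed by autotrainer.run() above.
flat_best = {"hidden_features": 64, "n_layers": 2, "dropout": 0.8, "lr": 0.001}
model_params, other_params = params_search(FixedChoiceTrial(flat_best))
nested_best = {"model_params": model_params, "other_params": other_params}
print(nested_best)
```

The resulting dict has the same contents as the `Best parameters` output above, so the winning configuration can be rebuilt from the flat dict alone.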
