<a href="https://colab.research.google.com/github/sigeisler/robustness_of_gnns_at_scale/blob/notebook/Quick_start_robustness_gnns_at_scale.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Robustness of Graph Neural Networks at Scale - Quick Start

This notebook can be run in Google Colab and serves as a quick introduction to the [Robustness of Graph Neural Networks at Scale](https://github.com/sigeisler/robustness_of_gnns_at_scale) repository.

## 0. Setup

First, let's get the code and install requirements.

In [None]:
# clone package repository
!git clone https://github.com/sigeisler/robustness_of_gnns_at_scale.git

# navigate to the repository
%cd robustness_of_gnns_at_scale

# install package requirements
# !pip install -r requirements.txt # colab already has these installed
!pip install -r requirements-dev.txt

# install package
# !python setup.py install
!pip install --use-feature=in-tree-build .

# build kernels
!pip install --use-feature=in-tree-build ./kernels

## 1. Training

The training and evaluation code is provided as Sacred experiments, which makes it easy to run the same code from the command line or on a cluster. To train or attack a model, either call the `script_execute_experiment` script with the respective configuration file, or execute the experiment directly and pass the desired configuration, as defined in [experiments/experiment_train.py](https://github.com/sigeisler/robustness_of_gnns_at_scale/blob/main/experiments/experiment_train.py#L74).

In the example below, we train a `GCN` on `Cora ML`. 

In [3]:
from experiments import experiment_train

experiment_train.run(
    data_dir = './data',
    dataset = 'cora_ml',
    model_params = dict(
        label="Vanilla GCN", 
        model="GCN", 
        do_cache_adj_prep=True, 
        n_filters=64, 
        dropout=0.5, 
        svd_params=None, 
        jaccard_params=None, 
        gdc_params={"alpha": 0.15, "k": 64}),
    train_params = dict(
        lr=1e-2,
        weight_decay=1e-3,
        patience=300,
        max_epochs=3000),
    binary_attr = False,
    make_undirected = True,
    seed=0,
    artifact_dir = 'cache',
    model_storage_type = 'demo',
    ppr_cache_params = dict(),
    device = 0,
    data_device = 0,
    display_steps = 100,
    debug_level = "info"     
)

2022-04-11 15:31:58 (INFO): {'dataset': 'cora_ml', 'model_params': {'label': 'Vanilla GCN', 'model': 'GCN', 'do_cache_adj_prep': True, 'n_filters': 64, 'dropout': 0.5, 'svd_params': None, 'jaccard_params': None, 'gdc_params': {'alpha': 0.15, 'k': 64}}, 'train_params': {'lr': 0.01, 'weight_decay': 0.001, 'patience': 300, 'max_epochs': 3000}, 'binary_attr': False, 'make_undirected': True, 'seed': 0, 'artifact_dir': 'cache', 'model_storage_type': 'demo', 'ppr_cache_params': {}, 'device': 0, 'display_steps': 100, 'data_device': 0}
2022-04-11 15:32:01 (INFO): Training set size: 140
2022-04-11 15:32:01 (INFO): Validation set size: 140
2022-04-11 15:32:01 (INFO): Test set size: 2530
2022-04-11 15:32:01 (INFO): Memory Usage after loading the dataset:
2022-04-11 15:32:01 (INFO): 3.087627410888672


Training...:   0%|          | 0/3000 [00:00<?, ?it/s]

2022-04-11 15:32:01 (INFO): 
Epoch    0: loss_train: 1.94546, loss_val: 1.94581, acc_train: 0.12857, acc_val: 0.15714 
2022-04-11 15:32:02 (INFO): 
Epoch  100: loss_train: 0.08831, loss_val: 0.39982, acc_train: 1.00000, acc_val: 0.90714 
2022-04-11 15:32:03 (INFO): 
Epoch  200: loss_train: 0.07764, loss_val: 0.37613, acc_train: 1.00000, acc_val: 0.90000 
2022-04-11 15:32:04 (INFO): 
Epoch  300: loss_train: 0.07414, loss_val: 0.39645, acc_train: 1.00000, acc_val: 0.88571 
2022-04-11 15:32:05 (INFO): 
Epoch  400: loss_train: 0.06669, loss_val: 0.37830, acc_train: 1.00000, acc_val: 0.91429 
2022-04-11 15:32:06 (INFO): 
Epoch  500: loss_train: 0.07280, loss_val: 0.38037, acc_train: 1.00000, acc_val: 0.90714 
2022-04-11 15:32:06 (INFO): Test accuracy is 0.8185770511627197 with seed 0


{'accuracy': 0.8185770511627197,
 'model_path': 'cache/demo/demo_1.pt',
 'trace_train': [1.9454624652862549,
  1.8765193223953247,
  1.7774474620819092,
  1.6563345193862915,
  1.536136507987976,
  1.4086090326309204,
  1.2841672897338867,
  1.1469098329544067,
  1.0158731937408447,
  0.910151481628418,
  0.7819910049438477,
  0.6642155051231384,
  0.6038099527359009,
  0.545272946357727,
  0.4568222463130951,
  0.4078209102153778,
  0.3527106046676636,
  0.34389984607696533,
  0.28623950481414795,
  0.26793235540390015,
  0.25208890438079834,
  0.2332736700773239,
  0.22999583184719086,
  0.20495042204856873,
  0.19868922233581543,
  0.19470176100730896,
  0.18154895305633545,
  0.1853218674659729,
  0.18129600584506989,
  0.1832646280527115,
  0.1607210785150528,
  0.1759190857410431,
  0.1606225222349167,
  0.17753200232982635,
  0.17196036875247955,
  0.17054834961891174,
  0.17432887852191925,
  0.16151244938373566,
  0.17313575744628906,
  0.15916259586811066,
  0.159146785736083

As the log shows, the model reached a test accuracy of 0.8186 with seed 0, and the trained model was stored at `cache/demo/demo_1.pt`.
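The cell output above is a dictionary with the keys `accuracy`, `model_path` and `trace_train`. As a minimal sketch, assuming the training run returns a dictionary of this shape, the helper below condenses it into a few numbers. The function name `summarize_training` is hypothetical and not part of the repository, and the example trace is shortened to its first three entries.

```python
from typing import Any, Dict


def summarize_training(result: Dict[str, Any]) -> Dict[str, Any]:
    """Condense a training result dictionary into a few headline numbers."""
    trace = result["trace_train"]
    return {
        "test_accuracy": round(result["accuracy"], 4),
        "model_path": result["model_path"],
        "n_logged_losses": len(trace),
        "first_train_loss": trace[0],
        "min_train_loss": min(trace),
    }


# Values copied from the output above; the loss trace is shortened on purpose.
example_result = {
    "accuracy": 0.8185770511627197,
    "model_path": "cache/demo/demo_1.pt",
    "trace_train": [1.9454624652862549, 1.8765193223953247, 1.7774474620819092],
}

summary = summarize_training(example_result)
print(summary)
```

In the notebook, the same helper could be applied to the value returned by the training cell instead of `example_result`.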

## 2. Evaluation

For evaluation, we use the locally stored models. As with training, we provide a script that runs the attacks with different seeds for all pretrained models. The configurations for all experiments are in the [config](https://github.com/sigeisler/robustness_of_gnns_at_scale/tree/main/config) folder.

### 2.1 Local PR-BCD Attack
We provide an example for a `local PR-BCD` attack on the `Vanilla GCN` model trained previously. First, we create the `config` file for the attack:

In [8]:
import yaml

demo_localprbcd_config = {
    'seml': {'name': 'rgnn_at_scale_attack_evasion_local_direct', 
             'executable': 'experiments/experiment_local_attack_direct.py', 
             'project_root_dir': '../..', 
             'output_dir': 'config/attack_evasion_local_direct/output'},
    'slurm': {'experiments_per_job': 4, 
              'sbatch_options': {'gres': 'gpu:1', 'mem': '16G', 'cpus-per-task': 4, 'time': '1-00:00'}}, 
    'fixed': {'data_dir': 'data/', 
              'artifact_dir': 'cache', 
              'nodes': 'None', 
              'nodes_topk': 4, 
              'attack_params.epochs': 500, 
              'attack_params.fine_tune_epochs': 100, 
              'attack_params.search_space_size': 10000, 
              'attack_params.ppr_recalc_at_end': True, 
              'attack_params.loss_type': 'Margin', 
              'device': 0, 
              'data_device': 0, 
              'binary_attr': False},
    'grid': {'epsilons': {'type': 'choice', 'options': [[0.5]]}, 
             'seed': {'type': 'choice', 'options': [0]}, 
             'dataset': {'type': 'choice', 'options': ['cora_ml']}}, 
    'localprbcd_gcn': {'fixed': {'attack': 'LocalPRBCD', 
                                 'model_label': 'Vanilla GCN', 
                                 'model_storage_type': 'demo', 
                                 'attack_params': {'lr_factor': 0.05}, 
                                 'make_undirected': True}}
}                      

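# write the attack configuration to the repository's config folder
# (yaml.dump returns None when given a stream, so its result is not stored)
with open(r'/content/robustness_of_gnns_at_scale/config/attack_evasion_local_direct/demo_localprbcd.yaml', 'w') as file:
    yaml.dump(demo_localprbcd_config, file)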
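The configuration above separates `fixed` values from a `grid` of `choice` options and uses dotted keys such as `attack_params.epochs`. The sketch below illustrates one plausible reading of that layout, assuming seml-style semantics: every combination of grid options yields one run, and dotted keys address nested dictionaries. The helpers `unflatten` and `expand` are hypothetical and not part of the repository, and named sub-configurations such as `localprbcd_gcn` are ignored for brevity.

```python
import itertools
from typing import Any, Dict, Iterator


def unflatten(flat: Dict[str, Any]) -> Dict[str, Any]:
    """Turn dotted keys such as 'attack_params.epochs' into nested dictionaries."""
    nested: Dict[str, Any] = {}
    for key, value in flat.items():
        *parents, leaf = key.split(".")
        node = nested
        for part in parents:
            node = node.setdefault(part, {})
        node[leaf] = value
    return nested


def expand(config: Dict[str, Any]) -> Iterator[Dict[str, Any]]:
    """Yield one run configuration per combination of the grid options."""
    fixed = config.get("fixed", {})
    grid = config.get("grid", {})
    names = list(grid)
    option_lists = [grid[name]["options"] for name in names]
    for combo in itertools.product(*option_lists):
        run = dict(fixed)              # start from the shared values
        run.update(zip(names, combo))  # add one choice per grid parameter
        yield unflatten(run)


# A reduced copy of the demo configuration (only a few of the fixed keys).
demo = {
    "fixed": {
        "attack_params.epochs": 500,
        "attack_params.loss_type": "Margin",
        "nodes_topk": 4,
    },
    "grid": {
        "epsilons": {"type": "choice", "options": [[0.5]]},
        "seed": {"type": "choice", "options": [0]},
        "dataset": {"type": "choice", "options": ["cora_ml"]},
    },
}

runs = list(expand(demo))
print(len(runs))
print(runs[0])
```

Under this reading, each grid parameter in the demo has a single option, so the configuration describes exactly one run. Listing a second seed, for example, would double the number of runs.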

Now let's run the attack using this `demo_localprbcd` config:

In [9]:
!python script_execute_experiment.py --config-file 'config/attack_evasion_local_direct/demo_localprbcd.yaml'

2022-04-11 16:03:38,585 - root - DEBUG - Namespace(config_file='config/attack_evasion_local_direct/demo_localprbcd.yaml', kwargs={}, output='output')
2022-04-11 16:03:41,526 - git.cmd - DEBUG - Popen(['git', 'version'], cwd=/content/robustness_of_gnns_at_scale, universal_newlines=False, shell=None, istream=None)
2022-04-11 16:03:41,540 - git.cmd - DEBUG - Popen(['git', 'version'], cwd=/content/robustness_of_gnns_at_scale, universal_newlines=False, shell=None, istream=None)
2022-04-11 16:03:41,552 - git.cmd - DEBUG - Popen(['git', 'diff', '--cached', '--abbrev=40', '--full-index', '--raw'], cwd=/content/robustness_of_gnns_at_scale, universal_newlines=False, shell=None, istream=None)
2022-04-11 16:03:41,566 - git.cmd - DEBUG - Popen(['git', 'diff', '--abbrev=40', '--full-index', '--raw'], cwd=/content/robustness_of_gnns_at_scale, universal_newlines=False, shell=None, istream=None)
2022-04-11 16:03:41,643 - git.cmd - DEBUG - Popen(['git', 'cat-file', '--batch-check'], cwd=/content/robustn

### 2.2 PR-BCD Attack

Now let's do the same with a global (non-local) `PR-BCD` attack. Again, we first create the `config` file and then run the attack:

In [10]:
demo_prbcd_config = {
    'seml': {'name': 'rgnn_at_scale_attack_evasion_global_direct', 
             'executable': 'experiments/experiment_global_attack_direct.py', 
             'project_root_dir': '../..', 
             'output_dir': 'config/attack_evasion_global_direct/output'}, 
    'slurm': {'experiments_per_job': 4, 
              'sbatch_options': {'gres': 'gpu:1', 'mem': '4G', 'cpus-per-task': 4, 'time': '1-00:00'}}, 
    'fixed': {'data_dir': 'data/', 
              'artifact_dir': 'cache', 
              'pert_adj_storage_type': 'evasion_global_adj',
              'pert_attr_storage_type': 'evasion_global_attr',
              'device': 0, 
              'data_device': 0, 
              'binary_attr': False},  
    'grid': {'epsilons': {'type': 'choice', 'options': [[0.5]]}, 
             'seed': {'type': 'choice', 'options': [0]}, 
             'dataset': {'type': 'choice', 'options': ['cora_ml']}}, 
    'prbcd_gcn': {'fixed': {'attack': 'PRBCD',
                            'model_label': 'Vanilla GCN', 
                            'model_storage_type': 'demo',
                            'attack_params': {
                              'epochs': 500,
                              'fine_tune_epochs': 100,
                              'keep_heuristic': 'WeightOnly',
                              'search_space_size': 100_000,
                              'do_synchronize': True,
                              'loss_type': 'tanhMargin'}}}
}                      

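# write the attack configuration to the repository's config folder
# (relies on `import yaml` from the earlier cell)
with open(r'/content/robustness_of_gnns_at_scale/config/attack_evasion_global_direct/demo_prbcd.yaml', 'w') as file:
    yaml.dump(demo_prbcd_config, file)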

In [11]:
!python script_execute_experiment.py --config-file 'config/attack_evasion_global_direct/demo_prbcd.yaml'

2022-04-11 16:18:03,865 - root - DEBUG - Namespace(config_file='config/attack_evasion_global_direct/demo_prbcd.yaml', kwargs={}, output='output')
2022-04-11 16:18:06,859 - git.cmd - DEBUG - Popen(['git', 'version'], cwd=/content/robustness_of_gnns_at_scale, universal_newlines=False, shell=None, istream=None)
2022-04-11 16:18:06,873 - git.cmd - DEBUG - Popen(['git', 'version'], cwd=/content/robustness_of_gnns_at_scale, universal_newlines=False, shell=None, istream=None)
2022-04-11 16:18:06,886 - git.cmd - DEBUG - Popen(['git', 'diff', '--cached', '--abbrev=40', '--full-index', '--raw'], cwd=/content/robustness_of_gnns_at_scale, universal_newlines=False, shell=None, istream=None)
2022-04-11 16:18:06,900 - git.cmd - DEBUG - Popen(['git', 'diff', '--abbrev=40', '--full-index', '--raw'], cwd=/content/robustness_of_gnns_at_scale, universal_newlines=False, shell=None, istream=None)
2022-04-11 16:18:06,986 - git.cmd - DEBUG - Popen(['git', 'cat-file', '--batch-check'], cwd=/content/robustness_