# Example of graph injection attack using GRB

GRB provides a unified evaluation scenario for fair comparisons between attacks and defenses. The scenario is **Black-box, Evasion, Inductive, Injection**. Consider a citation-graph classification system as an example: the platform collects labeled data from previous papers and trains a GNN model. When a batch of new papers is submitted, it updates the graph and uses the trained model to predict their labels.

* **Black-box**: Neither the attacker nor the defender knows which methods the other uses.
* **Evasion**: GNNs are already trained on trusted data (e.g. authenticated users), which attackers cannot touch but which may contain natural noise. Thus, attacks only happen during the inference phase.
* **Inductive**: GNNs are used to classify unseen data (e.g. new users), i.e. validation and test data are unseen during training, which requires GNNs to generalize to out-of-distribution data.
* **Injection**: Attackers can only inject new nodes; they cannot modify the target nodes directly. It is usually hard to hack into users' accounts and modify their profiles, but much easier to create fake accounts and connect them to existing users.
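
To make the injection setting concrete, here is a minimal sketch in plain Python of how injected nodes extend an adjacency matrix. The helper `inject_nodes` and the toy graph are hypothetical and not part of GRB; the only point is that new rows and columns are appended while the original block stays unchanged.

```python
def inject_nodes(adj, injected_edges, n_inject):
    """Return a new symmetric adjacency matrix with n_inject extra nodes.

    adj            -- square list-of-lists adjacency of the original graph
    injected_edges -- (injected_index, node_index) pairs, where node_index
                      is a node index in the augmented graph
    """
    n = len(adj)
    total = n + n_inject
    # Copy and widen the original rows so the input graph is never modified.
    new_adj = [row[:] + [0] * n_inject for row in adj]
    new_adj += [[0] * total for _ in range(n_inject)]
    for inj, node in injected_edges:
        src = n + inj
        if not (n <= src < total and 0 <= node < total):
            raise ValueError("edge endpoint out of range")
        # Every new edge has an injected node as one endpoint.
        new_adj[src][node] = 1
        new_adj[node][src] = 1
    return new_adj


# Toy graph: a path 0 - 1 - 2.
adj = [[0, 1, 0],
       [1, 0, 1],
       [0, 1, 0]]
# Inject one fake node (index 3) and connect it to nodes 0 and 2.
adj_attacked = inject_nodes(adj, [(0, 0), (0, 2)], n_inject=1)
```

Because every new edge must touch an injected node, the top-left 3 x 3 block of `adj_attacked` is identical to `adj`.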

In [1]:
import os
import torch
import grb.utils as utils

## Load Dataset

GRB datasets are named with the prefix *grb-*. The test set has four *modes* ('easy', 'medium', 'hard', 'full'), which correspond to different average degrees of the test nodes and therefore to different levels of attack difficulty. The node features are processed by *arctan* normalization (standardization followed by the arctan function), which puts all node features on the same scale.
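
As an illustration of arctan normalization, the sketch below standardizes one feature column and then applies arctan. The function name `arctan_normalize` is hypothetical, and the `2 / pi` rescaling to the interval (-1, 1) is an assumption: it is consistent with the feature range printed when the dataset loads, but this notebook does not state the exact formula.

```python
import math
import statistics


def arctan_normalize(column):
    """Standardize one feature column, then squash it with a scaled arctan."""
    mean = statistics.fmean(column)
    std = statistics.pstdev(column)
    if std == 0:
        # A constant column carries no information; map it to zeros.
        return [0.0 for _ in column]
    # Assumption: the 2/pi factor rescales arctan's (-pi/2, pi/2) range to (-1, 1).
    return [2.0 * math.atan((x - mean) / std) / math.pi for x in column]


raw = [0.0, 1.0, 2.0, 50.0]  # one feature with a large outlier
normalized = arctan_normalize(raw)
```

The outlier is compressed toward the upper bound instead of stretching the scale of the remaining values, which is why a bounded range is convenient when limiting injected features later.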

In [2]:
from grb.dataset import Dataset

dataset_name = 'grb-cora'
dataset = Dataset(name=dataset_name, 
                  data_dir="../data/",
                  mode='full',
                  feat_norm='arctan')

Dataset 'grb-cora' loaded.
    Number of nodes: 2680
    Number of edges: 5148
    Number of features: 302
    Number of classes: 7
    Number of train samples: 1608
    Number of val samples: 268
    Number of test samples: 804
    Dataset mode: full
    Feature range: [-0.9406, 0.9430]


In [3]:
adj = dataset.adj
features = dataset.features
labels = dataset.labels
num_features = dataset.num_features
num_classes = dataset.num_classes
test_mask = dataset.test_mask

## Graph Injection Attack

For a graph injection attack under the black-box setting, we first train a surrogate model and then transfer the generated attack nodes to a target model. Note that the attacker has no information about the target model: neither its architecture nor its parameters. Here is an example of training a GCN as the surrogate model and transferring the attack to other models.

### Train surrogate model

In [4]:
from grb.model.torch import GCN
from grb.utils.normalize import GCNAdjNorm

model_name = "gcn"
model_sur = GCN(in_features=dataset.num_features,
                out_features=dataset.num_classes,
                hidden_features=64, 
                n_layers=2,
                adj_norm_func=GCNAdjNorm,
                layer_norm=False,
                residual=False,
                dropout=0.5)
print(model_sur)

GCN(
  (layers): ModuleList(
    (0): GCNConv(
      (linear): Linear(in_features=302, out_features=64, bias=True)
      (dropout): Dropout(p=0.5, inplace=False)
    )
    (1): GCNConv(
      (linear): Linear(in_features=64, out_features=7, bias=True)
    )
  )
)


In [5]:
save_dir = "./saved_modes/{}/{}".format(dataset_name, model_name)
save_name = "model_sur.pt"
device = "cuda:0"
feat_norm = None
train_mode = "inductive"  # "transductive"

In [6]:
from grb.utils.trainer import Trainer

trainer = Trainer(dataset=dataset, 
                  optimizer=torch.optim.Adam(model_sur.parameters(), lr=0.01),
                  loss=torch.nn.functional.cross_entropy,
                  lr_scheduler=False,
                  early_stop=True,
                  early_stop_patience=500,
                  feat_norm=feat_norm,
                  device=device)

In [7]:
trainer.train(model=model_sur, 
              n_epoch=2000,
              eval_every=1,
              save_after=0,
              save_dir=save_dir,
              save_name=save_name,
              train_mode=train_mode,
              verbose=False)

  0%|          | 0/2000 [00:00<?, ?it/s]

Training early stopped. Best validation score: 0.8358


In [8]:
# by trainer
test_score = trainer.evaluate(model_sur, dataset.test_mask)
print("Test score of surrogate model: {:.4f}".format(test_score))

Test score of surrogate model: 0.8420


### Injection Attack

**Rules and constraints for attackers**:

* They know the entire graph (all nodes, edges, and labels, except the labels of the test nodes under attack), but do NOT know the target model or the defense mechanism.
* They may inject a limited number of new nodes with a limited number of edges, but may NOT modify the original graph.
* They may generate features for the injected nodes as long as these remain unnoticeable to defenders (e.g. features that exceed the normal range are easily detected).
* They may obtain classification results from the target model through a limited number of queries.
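
The budget and range constraints above can be checked mechanically. The following is a minimal sketch in plain Python; `find_violations` is a hypothetical helper, not a GRB function, and it assumes that the edge limit applies to the degree of each injected node.

```python
def find_violations(adj_original, adj_attacked, features_injected,
                    n_inject_max, n_edge_max, feat_lim_min, feat_lim_max):
    """Return a list of constraint violations (empty if the attack is valid)."""
    n = len(adj_original)
    n_inject = len(adj_attacked) - n
    problems = []
    if n_inject > n_inject_max:
        problems.append("too many injected nodes")
    # The original n x n block must be identical to the clean graph.
    if [row[:n] for row in adj_attacked[:n]] != adj_original:
        problems.append("original graph was modified")
    # Assumption: n_edge_max bounds the degree of each injected node.
    if any(sum(row) > n_edge_max for row in adj_attacked[n:]):
        problems.append("injected node exceeds edge budget")
    # Injected features must stay inside the allowed range.
    if any(x < feat_lim_min or x > feat_lim_max
           for feats in features_injected for x in feats):
        problems.append("feature outside allowed range")
    return problems


adj_original = [[0, 1], [1, 0]]
adj_attacked = [[0, 1, 1], [1, 0, 1], [1, 1, 0]]  # one injected node, degree 2
ok = find_violations(adj_original, adj_attacked, [[0.5, -0.2]], 1, 2, -1, 1)
bad = find_violations(adj_original, adj_attacked, [[1.5, -0.2]], 1, 1, -1, 1)
```

Here `ok` is empty, while `bad` reports both the exceeded edge budget and the out-of-range feature.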

In [9]:
from grb.attack import PGD
from grb.utils.normalize import GCNAdjNorm

In [10]:
epsilon = 0.01
n_epoch = 1000
n_inject_max = 100
n_edge_max = 200
feat_lim_min = -1
feat_lim_max = 1

In [11]:
attack = PGD(epsilon=epsilon,
             n_epoch=n_epoch,
             n_inject_max=n_inject_max,
             n_edge_max=n_edge_max,
             feat_lim_min=feat_lim_min,
             feat_lim_max=feat_lim_max,
             device=device)
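
For intuition about the `epsilon`, `feat_lim_min`, and `feat_lim_max` arguments, the sketch below shows one projected-gradient update on injected features: a step of size `epsilon` along the sign of the loss gradient, followed by clipping into the allowed range. This is a framework-free illustration under those assumptions; `pgd_step` and the gradient values are hypothetical, and GRB's own `PGD` implementation may differ in its details.

```python
def pgd_step(features, grads, epsilon, feat_lim_min, feat_lim_max):
    """One ascent step on the attack loss, then projection into the feature box."""
    def sign(g):
        return (g > 0) - (g < 0)

    updated = []
    for row, grad_row in zip(features, grads):
        new_row = []
        for x, g in zip(row, grad_row):
            x = x + epsilon * sign(g)                    # step along the gradient sign
            x = min(max(x, feat_lim_min), feat_lim_max)  # project back into range
            new_row.append(x)
        updated.append(new_row)
    return updated


features = [[0.0, 0.995], [-1.0, 0.3]]
grads = [[2.0, 0.7], [-0.1, 0.0]]  # hypothetical loss gradients
stepped = pgd_step(features, grads, epsilon=0.01, feat_lim_min=-1, feat_lim_max=1)
```

Values that would leave the range (the second entry of the first row and the first entry of the second row) are clipped to the bounds, which keeps the injected features within the limits a defender could check.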

In [12]:
adj_attack, features_attack = attack.attack(model=model_sur,
                                            adj=adj,
                                            features=features,
                                            target_mask=test_mask,
                                            adj_norm_func=model_sur.adj_norm_func)

Attacking: Epoch 999, Loss: 3.25438, Surrogate test score: 0.21766


In [14]:
# Append the injected node features to the original node features
features_attacked = torch.cat([features.to(device), features_attack])
test_score = utils.evaluate(model_sur, 
                            features=features_attacked,
                            adj=adj_attack,
                            labels=dataset.labels,
                            adj_norm_func=model_sur.adj_norm_func,
                            mask=dataset.test_mask,
                            device=device)
print("Test score after attack for surrogate model: {:.4f}.".format(test_score))

Test score after attack for surrogate model: 0.2164.


### Transfer to target model

In [21]:
model_name = "gcn"
save_dir = "./saved_modes/{}/{}".format(dataset_name, model_name)
save_name = "model.pt"
device = "cuda:0"

In [22]:
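# Load a previously trained target model (assumed to be saved at this path).
# This loads a fully pickled model; PyTorch >= 2.6 requires weights_only=False for that.
model = torch.load(os.path.join(save_dir, save_name))
model = model.to(device)
model.eval()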

GCN(
  (layers): ModuleList(
    (0): LayerNorm((302,), eps=1e-05, elementwise_affine=True)
    (1): GCNConv(
      (linear): Linear(in_features=302, out_features=64, bias=True)
      (dropout): Dropout(p=0.5, inplace=False)
    )
    (2): LayerNorm((64,), eps=1e-05, elementwise_affine=True)
    (3): GCNConv(
      (linear): Linear(in_features=64, out_features=64, bias=True)
      (dropout): Dropout(p=0.5, inplace=False)
    )
    (4): LayerNorm((64,), eps=1e-05, elementwise_affine=True)
    (5): GCNConv(
      (linear): Linear(in_features=64, out_features=7, bias=True)
    )
  )
)

In [23]:
test_score = utils.evaluate(model, 
                            features=features_attacked,
                            adj=adj_attack,
                            labels=dataset.labels,
                            adj_norm_func=model.adj_norm_func,
                            mask=dataset.test_mask,
                            device=device)
print("Test score after attack for target model: {:.4f}.".format(test_score))

Test score after attack for target model: 0.3955.
