# Experiments on Graph-Generative-Models
In this notebook, we aim to evluate the performance of "GDSS" proposed in "Score-based Generative Modeling of Graphs via the System of Stochastic Differential Equations" (https://arxiv.org/pdf/2202.02514.pdf). The baseline model is tested on 3 datasets (Grid, Protein, 3D Point Cloud) and measured under 4 metrics (degree, clustering, orbit, spectral).

It should be noted that we adopt the same datasets presets as in "Efficient Graph Generation with Graph Recurrent Attention Networks" (https://arxiv.org/pdf/1910.00760.pdf), where:
- Grid: 100 graphs are generated with $100\leq |V| \leq 400$;
- Protein: 918 graphs are generated with $100\leq |V| \leq 500$;
- 3D Point-Cloud (FirstMM-DB): 41 graphs are generated with $\bar{|V|} > 1000$

Following the experimental setting as in "GraphRNN: Generating Realistic Graphs with Deep Auto-regressive Models" (https://arxiv.org/abs/1802.08773), we conduct a 80\%-20\% split of the graph samples in each dataset. Then we generate the same size of graph samples as the test dataset and harness the maximum mean discrepancy (MMD) to evaluate the generative graph distribution.

### Experiment on GDSS
Here we immigrate the original terminal-executable GDSS codes into the notebook.

##### Change current directory.

In [2]:
import os
path = "./GDSS/"
os.chdir(path)

##### Install dependencies

In [2]:
!pip install -r requirements.txt
!conda install -c conda-forge rdkit=2020.09.1.0
!yes | pip install git+https://github.com/fabriziocosta/EDeN.git --user!

Defaulting to user installation because normal site-packages is not writeable
Collecting molsets
  Using cached molsets-0.3.1-py3-none-any.whl (51.6 MB)
Collecting sklearn
  Using cached sklearn-0.0.tar.gz (1.1 kB)
  Preparing metadata (setup.py) ... [?25ldone
[?25hCollecting pandas==1.1.5
  Using cached pandas-1.1.5-cp36-cp36m-manylinux1_x86_64.whl (9.5 MB)
Collecting easydict==1.9
  Using cached easydict-1.9.tar.gz (6.4 kB)
  Preparing metadata (setup.py) ... [?25ldone
[31mERROR: Could not find a version that satisfies the requirement kiwisolver==1.3.2 (from versions: 0.1.0, 0.1.1, 0.1.2, 0.1.3, 1.0.0, 1.0.1, 1.1.0, 1.2.0, 1.3.0, 1.3.1)[0m
[31mERROR: No matching distribution found for kiwisolver==1.3.2[0m
[?25hCollecting package metadata (current_repodata.json): - ^C
\ 
Usage:   
  pip install [options] <requirement specifier> [package-index-options] ...
  pip install [options] -r <requirements file> [package-index-options] ...
  pip install [options] [-e] <vcs project url> .

##### Assign dataset and seed

In [3]:
dataset = 'grid'
seed = 42

##### Generate dataset

In [4]:
!python data/data_generators.py --dataset $dataset

0 289 544
1 247 462
2 247 462
3 342 647
4 140 256
5 192 356
6 169 312
7 190 351
8 154 283
9 180 333
10 255 478
11 180 333
12 224 418
13 198 367
14 198 367
15 247 462
16 130 237
17 342 647
18 289 544
19 216 402
20 342 647
21 180 332
22 192 356
23 169 312
24 210 391
25 168 310
26 120 218
27 168 310
28 306 577
29 270 507
30 221 412
31 169 312
32 240 449
33 204 379
34 110 199
35 234 437
36 170 313
37 120 218
38 180 333
39 224 418
40 130 237
41 180 333
42 323 610
43 285 536
44 198 367
45 182 337
46 306 577
47 168 310
48 204 379
49 190 351
50 266 499
51 288 542
52 288 542
53 255 478
54 240 449
55 110 199
56 190 351
57 198 367
58 143 262
59 168 310
60 165 304
61 160 294
62 120 218
63 154 283
64 323 610
65 165 304
66 225 420
67 132 241
68 170 313
69 288 542
70 224 418
71 198 367
72 110 199
73 216 402
74 247 462
75 160 294
76 132 241
77 238 445
78 221 412
79 192 356
80 180 333
81 156 287
82 285 536
83 195 362
84 192 356
85 238 445
86 190 351
87 240 449
88 306 577
89 234 437
90 234 437
91 130 23

##### Decide which metric to be used

In [4]:
metric_selection = 'EMD'

if metric_selection == 'EMD':
    from sampler import Sampler, Sampler_mol
    from evaluation.stats import eval_graph_list
    from evaluation.mmd import gaussian, gaussian_emd
else:
    from sampler_new import Sampler, Sampler_mol
    from evaluation.stats_new import eval_graph_list
    import evaluation.mmd_new

##### Train the GDSS model

In [None]:
import torch
import argparse
import time
from parsers.config import get_config
from trainer import Trainer

os.environ["CUDA_VISIBLE_DEVICES"] = "6,7"
torch.cuda.empty_cache()

ts = time.strftime('%b%d-%H:%M:%S', time.gmtime())
config = get_config(dataset, seed)
trainer = Trainer(config) 
ckpt = trainer.train(ts)
if 'sample' in config.keys():
    config.ckpt = ckpt
    sampler = Sampler(config) 
    sampler.sample()

##### Generate new graphs by the trained GDSS model

In [7]:
import torch
import argparse
import time
from parsers.config import get_config
from trainer import Trainer

os.environ["CUDA_VISIBLE_DEVICES"] = "6,7"
torch.cuda.empty_cache()

config = get_config(dataset, seed)
ckpt = 'Sep07-17:14:31'
ckpt = 'Sep07-21:43:14'
ckpt = 'grid_5000'
config.ckpt = ckpt
if dataset in ['QM9', 'ZINC250k']:
    sampler = Sampler_mol(config)
else:
    sampler = Sampler(config) 
sampler.sample()

./checkpoints/grid/Sep07-21:43:14.pth loaded
----------------------------------------------------------------------------------------------------
Make Directory grid/test in Logs
(Reverse)+(Langevin): eps=0.0001 denoise=True ema=True || snr=0.1 seps=0.7 n_steps=1 
----------------------------------------------------------------------------------------------------
GEN SEED: 13


 50%|█████     | 1/2 [05:21<05:21, 321.61s/it]

 
Round 0 : 321.55s


100%|██████████| 2/2 [10:43<00:00, 321.89s/it]

 
Round 1 : 322.11s





[91mdegree   [0m : [94m0.000000[0m
[91mcluster  [0m : [94m0.000000[0m
[91morbit    [0m : [94m1.000000[0m
[91mspectral [0m : [94m0.000000[0m


##### Load and calculate the metrics

In [5]:
import pickle
import math

from utils.logger import Logger, set_log, start_log, train_log, sample_log, check_log
from data.data_generators import load_dataset

save_dir = './samples/pkl/grid/test/' + 'grid_5000.pkl'
with open(save_dir, 'rb') as f:
    gen_graph_list = pickle.load(f)

test_split = 0.2

graph_list = load_dataset(data_dir='./data', file_name='grid')
test_size = int(test_split * len(graph_list))
train_graph_list, test_graph_list = graph_list[test_size:], graph_list[:test_size]
methods = ['degree', 'cluster', 'orbit', 'spectral'] 
kernels = {}
if metric_selection == 'EMD':
    kernels = {'degree':gaussian_emd, 
                'cluster':gaussian_emd, 
                'orbit':gaussian,
                'spectral':gaussian_emd}
    result_dict = eval_graph_list(test_graph_list, gen_graph_list, methods, kernels)
else:
    result_dict = eval_graph_list(test_graph_list, gen_graph_list)