# Experiments on Graph Generative Models
In this notebook, we evaluate the performance of GDSS, proposed in "Score-based Generative Modeling of Graphs via the System of Stochastic Differential Equations" (https://arxiv.org/pdf/2202.02514.pdf). The baseline model is tested on three datasets (Grid, Protein, 3D Point Cloud) and measured with four metrics (degree, clustering, orbit, spectral).

We adopt the same dataset presets as in "Efficient Graph Generation with Graph Recurrent Attention Networks" (https://arxiv.org/pdf/1910.00760.pdf), where:
- Grid: 100 graphs are generated with $100\leq |V| \leq 400$;
- Protein: 918 graphs are generated with $100\leq |V| \leq 500$;
- 3D Point Cloud (FirstMM-DB): 41 graphs are generated with $\overline{|V|} > 1000$.
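
As a rough illustration of the Grid preset, the sketch below enumerates 2D lattices in plain Python. The side-length range of 10 to 19 is an assumption (it is not stated in this notebook); it is chosen because it yields exactly 100 graphs whose node counts fall within $100\leq |V| \leq 400$. The helper `grid_edges` is hypothetical and is not part of the GDSS code.

```python
def grid_edges(rows, cols):
    """Return the edges of a rows x cols 2D lattice; nodes are (row, col) tuples."""
    edges = []
    for r in range(rows):
        for c in range(cols):
            if c + 1 < cols:  # horizontal neighbour
                edges.append(((r, c), (r, c + 1)))
            if r + 1 < rows:  # vertical neighbour
                edges.append(((r, c), (r + 1, c)))
    return edges


# Assumption: both side lengths range over 10..19 (not stated in the notebook).
SIDES = range(10, 20)
grids = [(rows, cols, grid_edges(rows, cols)) for rows in SIDES for cols in SIDES]
node_counts = [rows * cols for rows, cols, _ in grids]

print(len(grids), min(node_counts), max(node_counts))  # 100 100 361
```

Under this assumption the largest grid has 361 nodes, which is consistent with the stated upper bound of 400.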

Following the experimental setting of "GraphRNN: Generating Realistic Graphs with Deep Auto-regressive Models" (https://arxiv.org/abs/1802.08773), we split the graphs in each dataset 80%-20% into training and test sets. We then generate as many graphs as the test set contains and use the maximum mean discrepancy (MMD) to compare the distribution of generated graphs against the test set.
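
To make the evaluation protocol concrete, here is a minimal sketch of the split and of a squared-MMD estimate between two sets of histograms (for example, normalized degree histograms). It is a simplified stand-in, not the code in `evaluation/`: the function names, the plain Gaussian kernel, and the biased estimator (which includes the self-pairs) are all assumptions made for illustration.

```python
import math


def split_dataset(graphs, test_fraction=0.2):
    """Hold out the first `test_fraction` of the graphs as the test set."""
    test_size = int(test_fraction * len(graphs))
    return graphs[test_size:], graphs[:test_size]


def gaussian_kernel(p, q, sigma=1.0):
    """Gaussian kernel on the Euclidean distance between two histograms."""
    n = max(len(p), len(q))
    p = list(p) + [0.0] * (n - len(p))  # zero-pad to a common length
    q = list(q) + [0.0] * (n - len(q))
    dist_sq = sum((a - b) ** 2 for a, b in zip(p, q))
    return math.exp(-dist_sq / (2 * sigma ** 2))


def mean_kernel(xs, ys, kernel):
    """Average kernel value over all pairs (x, y)."""
    return sum(kernel(x, y) for x in xs for y in ys) / (len(xs) * len(ys))


def mmd_squared(xs, ys, kernel=gaussian_kernel):
    """Biased estimate of MMD^2 = E[k(x,x')] + E[k(y,y')] - 2 E[k(x,y)]."""
    return (mean_kernel(xs, xs, kernel)
            + mean_kernel(ys, ys, kernel)
            - 2 * mean_kernel(xs, ys, kernel))


train, test = split_dataset(list(range(100)))
print(len(train), len(test))  # 80 20

test_hists = [[0.0, 0.5, 0.5], [0.0, 0.4, 0.6]]  # toy "test set" histograms
gen_hists = [[0.5, 0.5, 0.0], [0.6, 0.4, 0.0]]   # toy "generated" histograms
print(mmd_squared(test_hists, test_hists))       # 0.0
print(mmd_squared(test_hists, gen_hists) > 0)    # True
```

A value near zero means the two sets of statistics are hard to tell apart under the chosen kernel; larger values indicate a larger discrepancy.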

### Experiment on GDSS
Here we port the original command-line GDSS code into the notebook.

##### Change current directory

In [1]:
import os
path = "./GDSS/"
os.chdir(path)

##### Install dependencies

In [None]:
!pip install -r requirements.txt
!conda install -c conda-forge rdkit=2020.09.1.0
!yes | pip install git+https://github.com/fabriziocosta/EDeN.git --user

##### Assign dataset and seed

In [3]:
# Choose one dataset; keep exactly one assignment active.
# dataset = 'grid'
dataset = 'DD'
seed = 42

##### Generate dataset

In [6]:
!python data/data_generators.py --dataset $dataset

Loading graph dataset: DD
334925
843046
Graphs loaded, total num: 1168
DD 1168
903


##### Decide which metric to be used

In [7]:
metric_selection = 'EMD'

if metric_selection == 'EMD':
    from sampler import Sampler, Sampler_mol
    from evaluation.stats import eval_graph_list
    from evaluation.mmd import gaussian, gaussian_emd
else:
    from sampler_new import Sampler, Sampler_mol
    from evaluation.stats_new import eval_graph_list
    import evaluation.mmd_new

##### Train the GDSS model

In [9]:
import torch
import argparse
import time
from parsers.config import get_config
from trainer import Trainer

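# Restrict this process to GPUs 6 and 7; adjust to the devices available on your machine.
os.environ["CUDA_VISIBLE_DEVICES"] = "6,7"
torch.cuda.empty_cache()

# Timestamp that identifies this training run.
ts = time.strftime('%b%d-%H:%M:%S', time.gmtime())
config = get_config(dataset, seed)
trainer = Trainer(config)
ckpt = trainer.train(ts)
# Sample right after training if the config contains a 'sample' section.
if 'sample' in config.keys():
    config.ckpt = ckpt
    sampler = Sampler(config)
    sampler.sample()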

----------------------------------------------------------------------------------------------------
Make Directory DD/test in Logs
tensor(17)


NotImplementedError: max_feat_num mismatch

##### Generate new graphs by the trained GDSS model

In [7]:
import torch
import argparse
import time
from parsers.config import get_config
from trainer import Trainer

os.environ["CUDA_VISIBLE_DEVICES"] = "6,7"
torch.cuda.empty_cache()

config = get_config(dataset, seed)
# Candidate checkpoints; keep exactly one assignment active.
# ckpt = 'Sep07-17:14:31'
# ckpt = 'Sep07-21:43:14'
ckpt = 'grid_5000'
config.ckpt = ckpt
if dataset in ['QM9', 'ZINC250k']:
    sampler = Sampler_mol(config)
else:
    sampler = Sampler(config) 
sampler.sample()

./checkpoints/grid/Sep07-21:43:14.pth loaded
----------------------------------------------------------------------------------------------------
Make Directory grid/test in Logs
(Reverse)+(Langevin): eps=0.0001 denoise=True ema=True || snr=0.1 seps=0.7 n_steps=1 
----------------------------------------------------------------------------------------------------
GEN SEED: 13


 50%|█████     | 1/2 [05:21<05:21, 321.61s/it]
Round 0 : 321.55s
100%|██████████| 2/2 [10:43<00:00, 321.89s/it]
Round 1 : 322.11s

degree   : 0.000000
cluster  : 0.000000
orbit    : 1.000000
spectral : 0.000000


##### Load and calculate the metrics

In [5]:
import pickle
import math

from utils.logger import Logger, set_log, start_log, train_log, sample_log, check_log
from data.data_generators import load_dataset

save_dir = './samples/pkl/grid/test/' + 'grid_5000.pkl'
with open(save_dir, 'rb') as f:
    gen_graph_list = pickle.load(f)

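# Fraction of the dataset held out for testing (80%-20% split).
test_split = 0.2

graph_list = load_dataset(data_dir='./data', file_name='grid')
test_size = int(test_split * len(graph_list))
# The first 20% of the graphs form the test set; the rest are used for training.
train_graph_list, test_graph_list = graph_list[test_size:], graph_list[:test_size]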
methods = ['degree', 'cluster', 'orbit', 'spectral'] 
kernels = {}
if metric_selection == 'EMD':
    kernels = {'degree':gaussian_emd, 
                'cluster':gaussian_emd, 
                'orbit':gaussian,
                'spectral':gaussian_emd}
    result_dict = eval_graph_list(test_graph_list, gen_graph_list, methods, kernels)
else:
    result_dict = eval_graph_list(test_graph_list, gen_graph_list)
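
The kernel mapping above uses `gaussian_emd` for the degree, clustering, and spectral statistics and `gaussian` for orbit counts. The sketch below illustrates why an earth mover's distance (EMD) can be a sensible choice for ordered histograms. It assumes that `gaussian_emd` plugs an EMD into a Gaussian kernel, as its name suggests; the actual definition is in `evaluation/mmd.py`, and the functions here are hypothetical stand-ins. For 1D histograms with unit spacing between bins, the EMD equals the sum of absolute differences between the two cumulative distributions.

```python
import math
from itertools import accumulate


def normalize(hist):
    """Scale a histogram to sum to 1 (all-zero histograms are returned unchanged)."""
    total = sum(hist)
    return [h / total for h in hist] if total > 0 else list(hist)


def emd_1d(p, q):
    """EMD between two 1D histograms with unit distance between adjacent bins."""
    n = max(len(p), len(q))
    p = normalize(list(p) + [0.0] * (n - len(p)))  # zero-pad, then normalize
    q = normalize(list(q) + [0.0] * (n - len(q)))
    return sum(abs(a - b) for a, b in zip(accumulate(p), accumulate(q)))


def gaussian_emd_kernel(p, q, sigma=1.0):
    """Gaussian kernel applied to the EMD (assumed form, for illustration only)."""
    d = emd_1d(p, q)
    return math.exp(-d * d / (2 * sigma ** 2))


base = [1, 0, 0]
near = [0, 1, 0]  # mass moved by one bin
far = [0, 0, 1]   # mass moved by two bins

print(emd_1d(base, near), emd_1d(base, far))  # 1.0 2.0
print(gaussian_emd_kernel(base, near) > gaussian_emd_kernel(base, far))  # True
```

A Euclidean distance treats `near` and `far` as equally far from `base`, whereas the EMD reflects how far the mass has to move along the bin axis.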