# Experiments on Graph-Generative-Models
In this notebook, we evaluate the performance of GDSS, proposed in "Score-based Generative Modeling of Graphs via the System of Stochastic Differential Equations" (https://arxiv.org/pdf/2202.02514.pdf). The baseline model is tested on three datasets (Grid, Protein, 3D Point Cloud) and measured with four metrics (degree, clustering, orbit, spectral).

Note that we adopt the same dataset presets as "Efficient Graph Generation with Graph Recurrent Attention Networks" (https://arxiv.org/pdf/1910.00760.pdf):
- Grid: 100 graphs are generated with $100\leq |V| \leq 400$;
- Protein: 918 graphs are generated with $100\leq |V| \leq 500$;
- 3D Point Cloud (FirstMM-DB): 41 graphs are generated with $\overline{|V|} > 1000$.
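
As a rough illustration of the Grid preset, the sketch below enumerates 2D grid graphs whose side lengths run from 10 to 19, which yields 100 graphs with $100 \leq |V| \leq 361$. The side-length range is an assumption chosen to match the counts listed above, and `grid_edges` is a hypothetical helper written in plain Python. It is not the data generator shipped with GDSS or GRAN.

```python
from itertools import product

def grid_edges(rows, cols):
    """Return the edge list of a rows x cols 2D grid; nodes are (r, c) tuples."""
    edges = []
    for r, c in product(range(rows), range(cols)):
        if r + 1 < rows:
            edges.append(((r, c), (r + 1, c)))  # edge to the node below
        if c + 1 < cols:
            edges.append(((r, c), (r, c + 1)))  # edge to the node on the right
    return edges

# Assumption: side lengths 10..19 in both directions (10 x 10 = 100 grids).
grids = [(rows, cols, grid_edges(rows, cols))
         for rows in range(10, 20) for cols in range(10, 20)]
node_counts = [rows * cols for rows, cols, _ in grids]

print(len(grids), min(node_counts), max(node_counts))
```

Each entry keeps its dimensions next to its edge list, so the node count can be checked against the preset without rebuilding the graph.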

Following the experimental setting of "GraphRNN: Generating Realistic Graphs with Deep Auto-regressive Models" (https://arxiv.org/abs/1802.08773), we split the graphs in each dataset 80%-20% into training and test sets. We then generate as many graphs as the test set contains and use the maximum mean discrepancy (MMD) to compare the generated graph distribution with the test set.
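
To make the split and the MMD comparison concrete, here is a minimal sketch in plain Python. It assumes each graph has already been reduced to a fixed-length normalised histogram (for example a degree histogram) and uses a plain Gaussian kernel on the Euclidean distance. The helper names (`split_train_test`, `gaussian_kernel`, `mean_kernel`, `mmd_squared`) and the toy histograms are hypothetical; this is not the code in `evaluation/mmd.py`, which this notebook uses with a `gaussian_emd` kernel for three of the four metrics.

```python
import math

def split_train_test(graphs, test_split=0.2):
    """Hold out the first `test_split` fraction of the list as the test set."""
    test_size = int(test_split * len(graphs))
    return graphs[test_size:], graphs[:test_size]

def gaussian_kernel(x, y, sigma=1.0):
    """Gaussian kernel on the Euclidean distance between two equal-length histograms."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))

def mean_kernel(xs, ys, kernel):
    """Average kernel value over all pairs (x, y) with x in xs and y in ys."""
    return sum(kernel(x, y) for x in xs for y in ys) / (len(xs) * len(ys))

def mmd_squared(xs, ys, kernel=gaussian_kernel):
    """Biased estimate of the squared MMD between two sets of descriptors."""
    return (mean_kernel(xs, xs, kernel)
            + mean_kernel(ys, ys, kernel)
            - 2.0 * mean_kernel(xs, ys, kernel))

# Toy normalised degree histograms (made up for illustration only).
reference = [[0.5, 0.5, 0.0], [0.4, 0.6, 0.0]]
generated = [[0.0, 0.5, 0.5], [0.0, 0.4, 0.6]]

train, test = split_train_test(list(range(100)))
print(len(train), len(test))
print(mmd_squared(reference, reference))
print(mmd_squared(reference, generated))
```

Comparing a set with itself gives zero, and the value grows as the two sets of histograms drift apart, which is why lower MMD indicates a closer match between generated and test graphs.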

### Experiment on GDSS
Here we port the original command-line GDSS code into the notebook.

##### Change the current directory

In [1]:
import os
path = "./GDSS/"
os.chdir(path)

##### Install dependencies

In [None]:
!pip install -r requirements.txt
!conda install -c conda-forge rdkit=2020.09.1.0
!yes | pip install git+https://github.com/fabriziocosta/EDeN.git --user

##### Assign dataset and seed

In [2]:
# Select one dataset; the last uncommented assignment takes effect.
# dataset = 'grid'
# dataset = 'DD'
dataset = 'FIRSTMM_DB'
seed = 42

In [5]:
!pip install GPUtil

import torch
from GPUtil import showUtilization as gpu_usage
from numba import cuda

def free_gpu_cache():
    print("Initial GPU Usage")
    gpu_usage()                             

    torch.cuda.empty_cache()

    cuda.select_device(0)
    cuda.close()
    cuda.select_device(0)

    print("GPU Usage after emptying the cache")
    gpu_usage()

free_gpu_cache()                           

Collecting GPUtil
  Downloading GPUtil-1.4.0.tar.gz (5.5 kB)
  Preparing metadata (setup.py) ... done
Building wheels for collected packages: GPUtil
  Building wheel for GPUtil (setup.py) ... done
  Created wheel for GPUtil: filename=GPUtil-1.4.0-py3-none-any.whl size=7394 sha256=af89a85ea8e4b0687cbd23afec93f9ee6966f69092bbd4cef728fb642ba53c2b
  Stored in directory: /nethome/hsun409/.cache/pip/wheels/6e/f8/83/534c52482d6da64622ddbf72cd93c35d2ef2881b78fd08ff0c
Successfully built GPUtil
Installing collected packages: GPUtil
Successfully installed GPUtil-1.4.0


ModuleNotFoundError: No module named 'numba'
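
The error above comes from importing `numba`, which the cell never installs. One way to keep an optional clean-up step from aborting a cell is to check for the package first. The sketch below uses the standard-library `importlib.util.find_spec`; `module_available` is a hypothetical helper that is not part of GDSS, and it is only meant for top-level module names.

```python
import importlib.util

def module_available(name):
    """Return True if the top-level module `name` can be found in this environment."""
    return importlib.util.find_spec(name) is not None

# Only attempt the numba-based CUDA context reset when numba is installed.
if module_available("numba"):
    print("numba found: the CUDA context reset can be attempted")
else:
    print("numba missing: skipping the CUDA context reset")
```

With such a guard, `torch.cuda.empty_cache()` can still run when `numba` is absent.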

##### Generate dataset

In [3]:
!python data/data_generators.py --dataset $dataset

Loading graph dataset: FIRSTMM_DB
56468
126038
Graphs loaded, total num: 24
FIRSTMM_DB 24
995


##### Select the metric

In [4]:
metric_selection = 'EMD'

if metric_selection == 'EMD':
    from sampler import Sampler, Sampler_mol
    from evaluation.stats import eval_graph_list
    from evaluation.mmd import gaussian, gaussian_emd
else:
    from sampler_new import Sampler, Sampler_mol
    from evaluation.stats_new import eval_graph_list
    import evaluation.mmd_new

##### Train the GDSS model

In [None]:
import torch
import argparse
import time
from parsers.config import get_config
from trainer import Trainer

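# Restrict this run to GPUs 6 and 7; this must be set before CUDA is initialized.
os.environ["CUDA_VISIBLE_DEVICES"] = "6,7"
torch.cuda.empty_cache()

# UTC timestamp passed to the trainer for this run
ts = time.strftime('%b%d-%H:%M:%S', time.gmtime())
config = get_config(dataset, seed)
trainer = Trainer(config)
ckpt = trainer.train(ts)
# If the config has a 'sample' section, sample from the freshly trained checkpoint.
if 'sample' in config.keys():
    config.ckpt = ckpt
    sampler = Sampler(config)
    sampler.sample()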

##### Generate new graphs by the trained GDSS model

In [None]:
import torch
import argparse
import time
from parsers.config import get_config
from trainer import Trainer

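# Restrict sampling to GPUs 2, 5, 6 and 7; this must be set before CUDA is initialized.
os.environ["CUDA_VISIBLE_DEVICES"] = "2,5,6,7"
torch.cuda.empty_cache()

config = get_config(dataset, seed)
# Checkpoint to sample from; the last uncommented assignment takes effect.
# ckpt = 'grid_5000'
ckpt = 'DD_1500'
config.ckpt = ckpt
# Molecule datasets use the molecule sampler; all other datasets use Sampler.
if dataset in ['QM9', 'ZINC250k']:
    sampler = Sampler_mol(config)
else:
    sampler = Sampler(config)
sampler.sample()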

##### Load and calculate the metrics

In [6]:
import pickle
import math

from utils.logger import Logger, set_log, start_log, train_log, sample_log, check_log
from data.data_generators import load_dataset

# Path to the pickled generated graphs; the last uncommented assignment takes effect.
# save_dir = './samples/pkl/grid/test/' + 'grid_5000.pkl'
save_dir = './samples/pkl/DD/test/Sep09-17:57:49_500-sample.pkl'
# save_dir = './samples/pkl/DD/test/DD_1000-sample.pkl'
with open(save_dir, 'rb') as f:
    gen_graph_list = pickle.load(f)

test_split = 0.2

graph_list = load_dataset(data_dir='./data', file_name=dataset)
print('Target dataset:' + dataset)
test_size = int(test_split * len(graph_list))
train_graph_list, test_graph_list = graph_list[test_size:], graph_list[:test_size]
print('Length of testing dataset:' + str(len(test_graph_list)))
print('Length of gen dataset:' + str(len(gen_graph_list)))
methods = ['degree', 'cluster', 'orbit', 'spectral']
kernels = {}
if metric_selection == 'EMD':
    # Kernel used for the MMD of each metric
    kernels = {'degree': gaussian_emd,
               'cluster': gaussian_emd,
               'orbit': gaussian,
               'spectral': gaussian_emd}
    result_dict = eval_graph_list(test_graph_list, gen_graph_list, methods, kernels)
else:
    result_dict = eval_graph_list(test_graph_list, gen_graph_list)

Target dataset:DD
Shape of testing dataset:233
Shape of gen dataset:233
degree    : 0.46610833
cluster   : 0.52482759
orbit     : 0.96940001
spectral  : 0.45129634
