Towards Better Evaluation of GNN Expressiveness with BREC Dataset

About

This repository is the official implementation of the following paper: Towards Better Evaluation of GNN Expressiveness with BREC Dataset.

We also provide a PyPI package for simple usage. Please refer to the PyPI package.

BREC is a new dataset for comparing GNN expressiveness. It addresses the limitations of previous datasets, including limited difficulty, coarse granularity, and small scale, by incorporating 400 pairs of graphs in four categories (Basic, Regular, Extension, CFI). The graphs are organized pair-wise, and each pair is tested individually to determine whether a GNN can distinguish the two graphs. We also propose a new evaluation method, RPC (Reliable Paired Comparisons), together with a contrastive training framework.
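
As a rough illustration of the pair-wise protocol, here is a conceptual sketch (this is not the repo's RPC implementation; RPC replaces the plain distance test below with a statistical reliability check parameterized by $q$, and the model and threshold here are hypothetical):

import torch

# Conceptual sketch only: a pair counts as "distinguished" if the model's
# embeddings of the two graphs differ by more than a tolerance. The actual
# RPC test is statistical and also checks reliability on isomorphic copies.
def distinguishes(model, graph_1, graph_2, threshold=1e-4):
    with torch.no_grad():
        diff = torch.norm(model(graph_1) - model(graph_2))
    return diff.item() > threshold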

Usages

File Structure

We first introduce the general file structure of BREC:

├── Data
│   └── raw
│       └── brec_v3.npy       # unprocessed BREC dataset in graph6 format
├── BRECDataset_v3.py         # BREC dataset construction
├── test_BREC.py              # evaluation framework
└── test_BREC_search.py       # runs test_BREC.py with 10 seeds for the final result

To test on BREC, there are four steps to follow (a command-line sketch follows the list):

  1. Select a model and go to the corresponding directory.
  2. Prepare the dataset according to the selected model's requirements.
  3. Check test_BREC.py for the implementation if you want to test your own GNN.
  4. Run test_BREC_search.py for the final result. The result is valid only if the reliability check passes for all seeds.
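
For example, testing the NGNN baseline could look like the following (the dataset path is illustrative; see Data Preparation below for the exact placement):

cd NestedGNN                          # step 1: choose a model directory
cp path/to/brec_v3.npy Data/raw/      # step 2: place the dataset (illustrative path)
python test_BREC_search.py            # step 4: run 10 seeds for the final result

Step 3 applies only when testing your own GNN.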

Requirements

Tested combination: Python 3.8.13 + PyTorch 1.13.1 + PyTorch_Geometric 2.2

Other required Python libraries include numpy, networkx, loguru, etc.

To reproduce other results, refer to the corresponding directories for additional library requirements.

Data Preparation

Data preparation requires two steps: generating the dataset and placing it in the correct location.

Step 1: Data Generation

We provide the zipped data file BREC_data_all.zip. Unzipping it yields three data files in npy format. You can also customize the dataset; see Customize BREC Dataset.

For most methods, only brec_v3.npy is needed. For more detailed dataset requirements, refer to the corresponding implementations.

Step 2: Data Arrangement

Replace each $.txt placeholder with the corresponding $.npy file in the Data/raw directory. For most methods, only brec_v3.txt is present in Data/raw, so replacing brec_v3.txt with brec_v3.npy is enough.
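
As a quick sanity check after arranging the data, the raw file can be inspected directly (a minimal sketch assuming numpy and networkx; whether the entries are stored as str or bytes is an assumption):

import numpy as np
import networkx as nx

# Load the raw BREC data; entries are graphs in graph6 format (per this README).
graphs = np.load("Data/raw/brec_v3.npy", allow_pickle=True)
print(len(graphs))  # the 400 pairs are stored as consecutive graphs

# Decode the first graph from graph6 and report its size.
g6 = graphs[0]
g = nx.from_graph6_bytes(g6 if isinstance(g6, bytes) else g6.encode())
print(g.number_of_nodes(), g.number_of_edges())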

Reproduce Baselines

To reproduce the baseline results, refer to the respective directories:

Baseline                    Directory
NGNN                        NestedGNN
DS-GNN                      SUN
DSS-GNN                     SUN
SUN                         SUN
PPGN                        ProvablyPowerfulGraphNetworks_torch
GNN-AK                      GNNAsKernel
DE+NGNN                     NestedGNN
KP-GNN                      KP-GNN
KC-SetGNN                   KCSetGNN
I$^2$-GNN                   I2GNN
GSN                         GSN
Graphormer                  Graphormer
OSAN                        OSAN
$\delta$-LGNN (SparseWL)    SparseWL
SWL                         SWL
DropGNN                     DropGNN
Non-GNN Baselines           Non-GNN
Your Own GNN                Base

Test Your Own GNN

In addition to the previous steps for reproducing baselines, you need to implement test_BREC.py for your model.

Evaluation Step

To test your GNN, in addition to the previous steps, implement test_BREC.py with your model and run (${configs} represents the corresponding config options):

python test_BREC.py ${configs}

test_BREC.py is the evaluation pipeline, consisting of four stages:

1. pre-calculation;
2. dataset construction;
3. model construction;
4. evaluation.

Pre-calculation performs offline operations on the graphs.

Dataset construction processes the dataset with model-specific operations. BRECDataset is implemented on top of InMemoryDataset; it is recommended to use transform and pre_transform to transform the graphs.

Model construction builds the GNN to be tested.

Evaluation implements RPC: given the model and dataset, it produces the final results.
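
For the dataset construction stage, a minimal sketch of attaching a pre_transform is shown below (the BRECDataset constructor arguments follow the usual InMemoryDataset convention; the exact signature in BRECDataset_v3.py may differ, and the one-hot degree transform is an illustrative choice):

import torch_geometric.transforms as T
from BRECDataset_v3 import BRECDataset

# Illustrative: encode node degrees as one-hot features once, before caching.
# The constructor signature is assumed to follow InMemoryDataset conventions.
dataset = BRECDataset(root="Data", pre_transform=T.OneHotDegree(max_degree=10))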

Suppose your own experiment is run with python main.py. You can then implement test_BREC.py from main.py: drop the training and testing pipeline in main.py and split the rest into the corresponding stages of test_BREC.py, as in the skeleton below.
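
An illustrative skeleton of the result (function names are placeholders, not the repo's actual API):

# Placeholder stage functions mirroring test_BREC.py's four stages.
def pre_calculation():
    """Offline graph operations, e.g., precomputing structural features."""

def dataset_construction():
    """Build BRECDataset with the transform / pre_transform your model needs."""

def model_construction():
    """Instantiate your GNN exactly as main.py does, minus the training loop."""

def evaluation(dataset, model):
    """Run the RPC check over all graph pairs and report the results."""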

Customize BREC Dataset

Some graphs in BREC may be too difficult for some models, such as the strongly regular graphs that 3-WL cannot distinguish. You can discard some graphs from BREC to reduce test time. In addition, the parameter $q$ in RPC can be adjusted when customizing. Only the customize directory is needed for this.

├── Data                         # original graph files
│   └── raw
│       ├── basic.npy            # Basic graphs
│       ├── regular.npy          # simple regular graphs
│       ├── str.npy              # strongly regular graphs
│       ├── cfi.npy              # CFI graphs
│       ├── extension.npy        # Extension graphs
│       ├── 4vtx.npy             # 4-vertex condition graphs
│       └── dr.npy               # distance regular graphs
├── dataset_v3.py                # generates brec_v3.npy
├── dataset_v3_3wl.py            # generates brec_v3_3wl.npy
└── dataset_v3_no4v_60cfi.py     # generates brec_v3_no4v_60cfi.npy

For most methods, generating brec_v3.npy by running python dataset_v3.py is enough.

For customization, suppose you want to discard distance regular graphs from BREC. Delete the dr.npy-related code in dataset_v3.py, and adjust the total pair count and the "category-id_range" dictionary accordingly.

"NUM" represents $q$ in RPC and can be adjusted for a different RPC check.

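A hedged sketch of what these edits to dataset_v3.py might look like (variable names and values are illustrative; check the actual script):

import numpy as np

# Drop "dr.npy" from the list of source files.
part_files = ["basic.npy", "regular.npy", "str.npy", "cfi.npy",
              "extension.npy", "4vtx.npy"]

graphs = np.concatenate(
    [np.load(f"Data/raw/{name}", allow_pickle=True) for name in part_files]
)

pair_num = 380  # total pair count after dropping the 20 distance regular pairs
NUM = 32        # q in RPC; the value here is illustrative

part_dict = {   # adjusted "category-id_range" dictionary
    "Basic": (0, 60),
    "Regular": (60, 160),
    "Extension": (160, 260),
    "CFI": (260, 360),
    "4-Vertex_Condition": (360, 380),
}
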
Results Demonstration

The 400 pairs of graphs come from four categories: Basic, Regular, Extension, and CFI. We further split the 4-vertex condition and distance regular graphs out of Regular as separate categories. The "category-id_range" dictionary is as follows:

  "Basic": (0, 60),
  "Regular": (60, 160),
  "Extension": (160, 260),
  "CFI": (260, 360),
  "4-Vertex_Condition": (360, 380),
  "Distance_Regular": (380, 400),

You can refer to the detailed graph files in customize/Data/raw for analysis.
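
For analysis scripts, a small helper (not part of the repo) can map a pair index to its category using the dictionary above:

# Hypothetical helper: map a pair index to its BREC category.
part_dict = {
    "Basic": (0, 60),
    "Regular": (60, 160),
    "Extension": (160, 260),
    "CFI": (260, 360),
    "4-Vertex_Condition": (360, 380),
    "Distance_Regular": (380, 400),
}

def category_of(pair_id: int) -> str:
    for name, (lo, hi) in part_dict.items():
        if lo <= pair_id < hi:
            return name
    raise ValueError(f"pair id {pair_id} out of range [0, 400)")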
