To meet the urgent needs of emerging fields and keep pace with advances in Heterogeneous Graph Neural Networks (HGNNs), we build a new benchmark for two new fields: risky product detection (ICDM) and takeout recommendation (MTWM). In addition, we provide benchmark interfaces to over 30 heterogeneous graph datasets from other fields, along with a new toolkit for studying the characteristics of graph datasets. All of the above is publicly available and can be used with just a few lines of code.
This work is now deployed in the `utils` package of OpenHGNN.

Set up the environment:

    conda create --name hgbi python=3.7
    conda activate hgbi
    pip install -r requirement.txt

Then load the two new benchmark datasets:

    import utils.hgbi as hgbi

    ds_node = hgbi.build_dataset(name='RPDD', task='node_classification')
    ds_link = hgbi.build_dataset(name='TRD', task='link_prediction')

You can also load graph datasets from other fields:

    ds_node = hgbi.build_dataset(name='acm4NSHE', task='node_classification')
    print(ds_node.g)
    ds_link = hgbi.build_dataset(name='ohgbl-yelp2', task='link_prediction')
    print(ds_link.g)

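The toolkit can also analyze the characteristics of a dataset, such as its degree distribution, a t-SNE plot, and meta-path statistics:

    from utils.tsne_g import *
    import utils.hgbi as hgbi
    from utils import *
    from utils.meta_path_analyse import number_meta_path

    dataset = hgbi.build_dataset(name='dblp4GTN', task='node_classification')
    # Plot the degree distribution and a t-SNE visualization to image files
    plot_degree_dist(dataset.g, './degree.png')
    draw_tsne(dataset, './sne.png')
    # Meta-path statistics are computed on the dataset's graph
    meta_path_nums, heterophily, edge_radio = number_meta_path(
        dataset.g, meta_paths_dict=dataset.meta_paths_dict, strength=2)
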
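To give a feel for the kind of statistics such an analysis produces, here is a minimal, library-free sketch on a toy paper-author graph. The data, the helper names (`degree_histogram`, `meta_path_pairs`, `heterophily`), and the definition of heterophily used here (the fraction of meta-path-connected node pairs with different labels) are illustrative assumptions; they are not taken from the toolkit and may differ from what `number_meta_path` actually computes.

```python
from collections import Counter, defaultdict

# Hypothetical toy data: (paper, author) edges and one label per paper.
paper_author = [(0, 0), (1, 0), (1, 1), (2, 1), (3, 2)]
paper_label = {0: "db", 1: "db", 2: "ml", 3: "ml"}


def degree_histogram(edges):
    """Map each degree value to the number of source nodes with that degree."""
    degree = Counter(src for src, _ in edges)
    return dict(Counter(degree.values()))


def meta_path_pairs(edges):
    """Ordered pairs of distinct papers linked by a paper-author-paper path."""
    papers_of = defaultdict(set)
    for paper, author in edges:
        papers_of[author].add(paper)
    pairs = set()
    for papers in papers_of.values():
        pairs.update((p, q) for p in papers for q in papers if p != q)
    return pairs


def heterophily(pairs, labels):
    """Fraction of pairs whose two endpoints carry different labels."""
    if not pairs:
        return 0.0
    differing = sum(labels[p] != labels[q] for p, q in pairs)
    return differing / len(pairs)


pap = meta_path_pairs(paper_author)
print(degree_histogram(paper_author))  # paper degrees over the author edges
print(len(pap), heterophily(pap, paper_label))
```

On this toy graph, papers 0 and 1 share an author and a label, while papers 1 and 2 share an author but not a label, so half of the meta-path pairs are heterophilous.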
For more details, please refer to the `demo_*.py` files.

Node classification results:

| Dataset | Ntype | Node | Etype | Edge | Avg Attri | Label | Model | Original (default: Macro/Micro-F1%) | Reproduced (Macro/Micro-F1%) |
|---|---|---|---|---|---|---|---|---|---|
| acm4NSHE | 3 | 11,246 | 4 | 34,852 | 128 | 3 | NSHE | 83.27/84.12 | 84.78/84.95 |
| acm4HeCo | 3 | 11,246 | 4 | 34,852 | 3,043 | 3 | HeCo | 89.04/88.71 | 88.66/88.35 |
| acm4NARS | 3 | 21,488 | 4 | 34,864 | 720 | 3 | NARS | 92.9 (Accuracy) | 91.35/91.44 |
| acm4HetGNN | 3 | 49,708 | 5 | 202,067 | 387 | 4 | HetGNN | 97.8/97.9 | 97.01/97.05 |
| acm4GTN | 3 | 8,994 | 4 | 25,922 | 1,902 | 3 | GTN | 92.68 (F1 score) | 92.03/92 |
| dblp4MAGNN | 4 | 26,128 | 6 | 239,566 | 5,601 | 4 | SimpleHGN | 93.89/94.35 | 86.79/86.75 |
| imdb4MAGNN | 3 | 11,616 | 4 | 34,212 | 3,468 | 3 | MAGNN | 60.43/60.63 | 62.85/62.78 |
| imdb4GTN | 3 | 12,772 | 4 | 37,288 | 1,256 | 4 | GTN | 60.92 (F1 score) | 56.97/58.61 |
| yelp4HeGAN | 5 | 3,913 | 8 | 77,360 | 64 | 3 | HeGAN | 85.24/80.31 | 71.51/79.16 |
| HGBn-ACM | 4 | 10,942 | 8 | 547,872 | 1,902 | 3 | SimpleHGN | 93.2/93.12 | 66.64/88.4 |
| HGBn-DBLP | 4 | 26,128 | 6 | 239,566 | 1,538 | 4 | SimpleHGN | 93.77/94.35 | 86.31/87.24 |
| ohgbn-Freebase | 8 | 12,164,758 | 36 | 62,982,566 | N/A | 8 | RGCN | N/A | 53.07/69.33 |
| ohgbn-yelp2 | 4 | 82,465 | 4 | 30,542,675 | N/A | 16 | RGCN | 5.10/23.24 | 5.04/40.44 |
| ohgbn-acm | 3 | 8,994 | 2 | 25,922 | 1,902 | 3 | fastGTN | N/A | 92.92/92.85 |
| ohgbn-imdb | 3 | 12,772 | 4 | 37,288 | 1,256 | 3 | RGCN | N/A | 57.57/63.66 |
| dblp4GTN | 3 | 18,405 | 4 | 67,946 | 334 | 4 | fastGTN | 94.18 (F1 score) | 90.39/91.39 |
| aifb | 7 | 7,262 | 104 | 48,810 | N/A | 4 | RGCN | 95.83 (Accuracy) | 96.92/97.22 |
| mutag | 5 | 27,163 | 50 | 148,100 | N/A | 2 | RGCN | 73.23 (Accuracy) | 66.40/70.59 |
| bgs | 27 | 94,806 | 122 | 672,884 | N/A | 2 | RGCN | 83.10 (Accuracy) | 88.26/89.66 |
| am | 7 | 1,885,136 | 108 | 5,668,682 | N/A | 11 | RGCN | 89.29 (Accuracy) | 89.41/89.90 |
| RPDD | 7 | 13,806,619 | 7 | 157,814,864 | 256 | 2 | RGCN | N/A | 90.46/98.02 |
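The two numbers in the F1 columns can diverge sharply: Macro-F1 averages the per-class F1 scores, so rare classes weigh as much as common ones, while Micro-F1 pools the counts over all samples. The sketch below is a plain-Python illustration on hypothetical toy labels, not the evaluation code used to produce the table; `f1_scores` is a name introduced here for illustration.

```python
from collections import Counter


def f1_scores(y_true, y_pred):
    """Return (macro_f1, micro_f1) for single-label multi-class predictions."""
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted class gains a false positive
            fn[t] += 1  # true class gains a false negative

    def f1(tp_, fp_, fn_):
        denom = 2 * tp_ + fp_ + fn_
        return 2 * tp_ / denom if denom else 0.0

    # Macro: unweighted mean of per-class F1. Micro: F1 over pooled counts.
    macro = sum(f1(tp[c], fp[c], fn[c]) for c in labels) / len(labels)
    micro = f1(sum(tp.values()), sum(fp.values()), sum(fn.values()))
    return macro, micro


# Hypothetical imbalanced data: 8 samples of class 0, 2 samples of class 1.
y_true = [0] * 8 + [1] * 2
y_pred = [0] * 10  # a classifier that always predicts the majority class
macro, micro = f1_scores(y_true, y_pred)
print(f"macro={macro:.3f} micro={micro:.3f}")  # macro=0.444 micro=0.800
```

A large gap between the two values, as in some rows of the table, can therefore indicate class imbalance rather than a reporting error.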

Link prediction results:

| Dataset | Ntype | Node | Etype | Edge | Avg Attri | Label | Model | Original (AUC-ROC%) | Reproduced (AUC-ROC%) |
|---|---|---|---|---|---|---|---|---|---|
| amazon4SLICE | 1 | 10,099 | 2 | 170,783 | 1,156 | 2 | RGCN | N/A | 74.6(avg) |
| HGBl-ACM | 4 | 10,942 | 8 | 547,872 | 1,902 | 1 | HDE | N/A | 87.41 |
| HGBl-DBLP | 4 | 26,128 | 6 | 239,566 | 1,538 | 1 | HDE | N/A | 98.36 |
| HGBl-IMDB | 4 | 21,420 | 6 | 86,642 | 3,390 | 1 | HDE | N/A | 91.51 |
| HGBl-amazon | 1 | 10,099 | 2 | 148,659 | 1,156 | 2 | GATNE-T | N/A | 80.83(avg) |
| HGBl-LastFM | 3 | 20,612 | 6 | 283,042 | N/A | 1 | RGCN | 81.9 | 76.46 |
| HGBl-PubMed | 4 | 63,109 | 20 | 489,972 | 200 | 1 | RGCN | 88.32 | 89.3 |
| ohgbl-yelp1 | 4 | 2,353,365 | 4 | 10,417,742 | N/A | 1 | CompGCN | N/A | 61.21 |
| ohgbl-yelp2 | 4 | 82,465 | 4 | 31,206,253 | N/A | 1 | RGCN | N/A | 65.6 |
| ohgbl-Freebase | 8 | 12,164,755 | 36 | 63,906,230 | N/A | 1 | RGCN | 50.18 | 58.75 |
| DoubanMovie | 6 | 37,595 | 12 | 3,429,852 | N/A | 1 | RGCN | N/A | 91.55 |
| TRD | 3 | 408,849 | 4 | 18,931,400 | N/A | 1 | RGCN | N/A | 92.69 |
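For reference, AUC-ROC for link prediction can be read as the probability that a randomly chosen true edge receives a higher score than a randomly chosen negative edge. The following is a minimal pairwise sketch on hypothetical scores; it is not the evaluation code behind the table, and `auc_roc` is a name introduced here for illustration.

```python
def auc_roc(pos_scores, neg_scores):
    """AUC as the share of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


# Hypothetical model scores for true edges and sampled negative edges.
pos = [0.9, 0.8, 0.4]
neg = [0.7, 0.3, 0.2]
print(f"{100 * auc_roc(pos, neg):.2f}")  # reported as a percentage
```

The quadratic loop is fine for a toy example; on graphs of the size listed above, a rank-based implementation from an established library would be the practical choice.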