An efficient subgraph matching algorithm optimized for NVIDIA GPUs using CUDA, featuring high GPU utilization, low GPU memory consumption, and load-balanced parallel execution.
This project implements a GPU-accelerated subgraph matching algorithm featuring:
- Parallel BFS and DFS kernels for efficient subgraph isomorphism enumeration
- An optimized DFS backtracking stack that improves GPU thread utilization
- Candidate filtering techniques to produce reduced candidate sets
- Symmetry breaking techniques to ensure each unique subgraph is enumerated exactly once
- A GPU-optimized data structure for retrieving candidate edges
- A bitmap-based data structure to accelerate set intersection
- Load balancing techniques for inter-warp workload distribution
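The bitmap idea in the feature list can be illustrated in plain host-side C++ (an illustrative sketch of bitmap-based set intersection in general, not the project's CUDA implementation): a candidate set over the data vertices becomes a fixed-width bit vector, so intersecting two sets reduces to a word-wise AND followed by a population count.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Intersect two candidate sets encoded as bitmaps: bit (i % 64) of word
// (i / 64) is set iff vertex i is in the set. Returns the intersection bitmap.
std::vector<uint64_t> bitmap_and(const std::vector<uint64_t>& a,
                                 const std::vector<uint64_t>& b) {
    std::vector<uint64_t> out(a.size());
    for (std::size_t w = 0; w < a.size(); ++w) out[w] = a[w] & b[w];
    return out;
}

// Count the set bits, i.e., the size of the encoded set.
std::size_t popcount_all(const std::vector<uint64_t>& bm) {
    std::size_t n = 0;
    for (uint64_t w : bm) n += static_cast<std::size_t>(__builtin_popcountll(w));
    return n;
}
```

On the GPU the same pattern maps naturally to one thread per word with `__popc`-style intrinsics, which is what makes the bitmap representation attractive for set intersection.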
This project requires CMake (version 3.30.1), Make (version 4.3), GCC (version 10.5.0), and NVCC (version 12.5). One can compile the code using the following command.
```bash
bash compile.sh
```

After successful compilation, the binary files `bitmap/build/SubgraphMatching`, `hash_table/build/SubgraphMatching`, `neighbor/build/SubgraphMatching_unlabeled`, and `neighbor/build/SubgraphMatching_labeled` are created.
The correctness of the algorithm can be verified using the following commands:

```bash
cd test
python3 test.py --binary ../bitmap/build/SubgraphMatching
python3 test.py --binary ../hash_table/build/SubgraphMatching
python3 test.py --binary ../neighbor/build/SubgraphMatching_labeled
```

gMatch only accepts specific input formats. An invalid format is likely to cause errors. A valid graph file follows the structure below:
- **Header line (`t`):** starts with `t`, followed by two integers representing:
  - Total number of vertices
  - Total number of edges
- **Vertex lines (`v`):** each line starts with `v`, followed by three integers per vertex:
  - Vertex ID (0-based index)
  - Label (arbitrary numeric value)
  - Degree (number of edges connected to this vertex)
- **Edge lines (`e`):** each line starts with `e`, followed by two vertex IDs (0-based) representing an undirected edge between them. Edges are unordered (e.g., `e 0 1` is the same as `e 1 0`).

This format defines an undirected graph where vertices are labeled and degrees are explicitly listed for each node.
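As an illustration, a hypothetical labeled triangle (3 vertices, 3 edges; not a file shipped with the project) would look like:

```
t 3 3
v 0 0 2
v 1 0 2
v 2 1 2
e 0 1
e 0 2
e 1 2
```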
gMatch also supports a CSR (Compressed Sparse Row) binary format for large data graphs. The file begins with two 32-bit integers representing the vertex count and the edge count.
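Based on the header description above, the two leading 32-bit integers of a CSR binary file can be read as follows (a minimal sketch; `read_csr_header` is a hypothetical helper, and the layout of the rest of the file is not parsed here):

```cpp
#include <cstdint>
#include <fstream>
#include <string>

// Read the two 32-bit header integers (vertex count, then edge count) of a
// CSR binary graph file. Returns false if the header cannot be read.
bool read_csr_header(const std::string& path,
                     uint32_t& num_vertices, uint32_t& num_edges) {
    std::ifstream in(path, std::ios::binary);
    in.read(reinterpret_cast<char*>(&num_vertices), sizeof(uint32_t));
    in.read(reinterpret_cast<char*>(&num_edges), sizeof(uint32_t));
    return static_cast<bool>(in);
}
```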
To specify the input format, use the command-line parameter `-b` for the CSR binary format and `-d` for the text format.
After successful compilation, four binary files are created in the `bitmap/build`, `hash_table/build`, and `neighbor/build` directories. They all solve the subgraph matching problem but are designed for different scenarios.
`bitmap/build/SubgraphMatching` is designed for large queries over medium-sized graphs (e.g., dblp, enron, gowalla, github, and wikitalk).
One can execute bitmap/build/SubgraphMatching using the following command.
```bash
./bitmap/build/SubgraphMatching -d <data_graph_path> -q <query_graph_path>
```

Here is an example:

```bash
./bitmap/build/SubgraphMatching -d test/naive/data.graph -q test/naive/query_graph/Q_0.graph
```

| Argument | Description |
|---|---|
| `-q` | Path to the query graph (required). |
| `-d` | Path to the data graph (text format only, required). |
| `--device` | GPU device ID to use (0 by default). |
| `--initial` | Threshold for the number of partial matches at which BFS ends. |
`hash_table/build/SubgraphMatching` provides another version of gMatch, using a hash table-based data structure for candidate set retrieval. The hash table-based candidate graph is generally slower than the bitmap-based one but can maintain larger candidate sets.
This is designed for small queries over large-scale graphs (e.g., pokec, friendster, orkut, livejournal, ldbc, and rmat).
`neighbor/build/SubgraphMatching_unlabeled` is designed for large data graphs (e.g., pokec, friendster, orkut, livejournal, ldbc, and rmat), where the candidate graph is deactivated and local candidate sets are computed by intersecting the neighborhoods of backward neighbors. This program processes unlabeled graphs only. For labeled graphs, please use `neighbor/build/SubgraphMatching_labeled`.
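The neighborhood-intersection step described above can be sketched sequentially on the CPU (an illustration of the general technique, not the project's GPU kernel): given the sorted adjacency lists of the data vertices matched to a query vertex's backward neighbors, the local candidate set is their common intersection.

```cpp
#include <algorithm>
#include <iterator>
#include <vector>

// Intersect sorted adjacency lists pairwise. The result is the local
// candidate set for a query vertex whose backward neighbors are matched
// to the owners of these lists.
std::vector<int> intersect_neighborhoods(
        const std::vector<std::vector<int>>& adj_lists) {
    if (adj_lists.empty()) return {};
    std::vector<int> result = adj_lists[0];
    for (std::size_t i = 1; i < adj_lists.size(); ++i) {
        std::vector<int> tmp;
        std::set_intersection(result.begin(), result.end(),
                              adj_lists[i].begin(), adj_lists[i].end(),
                              std::back_inserter(tmp));
        result = std::move(tmp);
    }
    return result;
}
```

Because no candidate graph is materialized, this approach keeps memory consumption low, at the cost of recomputing intersections during enumeration.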
```bash
./neighbor/build/SubgraphMatching_unlabeled -b dataset/orkut/label_1/orkut.bin -q patterns/p2
```

| Argument | Description |
|---|---|
| `-q` | Path to the query graph file (required). |
| `-d` | Path to the data graph file in text format (deprecated). |
| `-b` | Path to the data graph file in binary CSR format. |
| `--device` | GPU device ID to use (0 by default). |
| `--memory-pool` | Enable memory pooling for BFS (deprecated, disabled by default). |
| `--no-filtering` | Manually disable candidate filtering (automatically added when the data graph is too large). |
| `--reorder [0/1]` | Reorder vertices by degree for CSR graphs (disabled by default; 0/1 for descending/ascending). |
| `--dump` | Save the reordered CSR graph to a cache file for future use (requires `--reorder`). |
| `--no-orientation` | Disable the conversion of the data graph to a DAG (an optimization for cliques, enabled by default). |
| `--no-induced` | Disable the induced-subgraph optimization (enabled by default). |
To reproduce the experimental results of gMatch in Figure 9 of the paper, please use `test_bitmap.py`. For example, `python3 test_bitmap.py --dataset enron --label 16 --query 12` runs a batch of 12-vertex queries on the data graph enron.
Note: It is necessary to prepare the data graph as `dataset/enron/label_16/data.graph`, and all the query graphs in the directory `dataset/enron/label_16/query_graph/12/` with file names such as `Q_0`, `Q_1`, `Q_2`, etc.
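Under these conventions, the expected directory layout looks like the following (illustrative, derived from the paths above):

```
dataset/
└── enron/
    └── label_16/
        ├── data.graph
        └── query_graph/
            └── 12/
                ├── Q_0
                ├── Q_1
                └── Q_2
```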
To reproduce the results of gMatch in Table 4 of the paper, please use `neighbor/build/SubgraphMatching_unlabeled`. For example: `./neighbor/build/SubgraphMatching_unlabeled -b dataset/livejournal/label_1/livejournal.bin -q patterns/p2`.
Our experimental results obtained using an RTX 4090 on the LiveJournal, Orkut, and LDBC datasets are presented below.
| Pattern | LiveJournal #Matches | LiveJournal GPU Time (ms) | Orkut #Matches | Orkut GPU Time (ms) | LDBC #Matches | LDBC GPU Time (ms) |
|---|---|---|---|---|---|---|
| P1 | 1,714,381,584 | 28 | 3,765,505,086 | 101 | 1,076,886,894 | 112 |
| P2 | 238,404,768,456 | 131 | 77,326,707,288 | 195 | 2,961,127,128 | 207 |
| P3 | 305,418,353,368 | 1,450 | 268,395,557,704 | 5,129 | 55,932,016,868 | 4,317 |
| P4 | 412,164,582,216 | 2,016 | 1,020,265,364,600 | 21,464 | 1,436,179,086,848 | 110,907 |
| P5 | 85,748,180,406,532 | 346,679 | 21,361,787,058,228 | 506,696 | 2,356,280,683,116 | 142,160 |
| P6 | 75,367,343,121,332 | 496,569 | 10,416,429,418,264 | 704,716 | 297,027,626,080 | 180,966 |
| P7 | 72,331,185,812,144 | 237,250 | 7,093,759,171,560 | 245,424 | 247,429,007,424 | 84,592 |
| P8 | 63,673,129,086,480 | 140,261 | 3,544,851,291,228 | 70,423 | 36,166,369,836 | 8,481 |
| P9 | 56,091,580,340,880 | 9,778 | 1,891,992,943,200 | 1,526 | 5,062,485,360 | 317 |
This algorithm is from the paper: "Xibo Sun and Qiong Luo. Efficient GPU-Accelerated Subgraph Matching. SIGMOD 2023." We use EGSM as a comparison system for our experiments. The code is from https://github.com/RapidsAtHKUST/EGSM.
One can use the following commands to compile EGSM.
```bash
cd compared_system/EGSM
mkdir build
cd build
cmake ..
make
```

Run `compared_system/test_egsm.py` for the experimental results of EGSM.
This algorithm is from the paper "Yihua Wei, Peng Jiang. STMatch: Accelerating Graph Pattern Matching on GPU with Stack-Based Loop Optimizations. SC 2022."
We use STMatch as a comparison system for our experiments. The code is from https://github.com/HPC-Research-Lab/STMatch. For fairness, we modify the code so that it uses the same matching order for subgraph matching. We also modify the code for graph file parsing so that it can process our graph format correctly.
One can use the following commands to compile STMatch.
```bash
cd compared_system/STMatch
mkdir bin
make
```

This algorithm is from the paper "Lyuheng Yuan, et al. Faster Depth-First Subgraph Matching on GPUs. ICDE 2024." The code is from https://github.com/lyuheng/tdfs. We modify the code so that it uses the same matching order for subgraph matching. We also modify the code for graph file parsing so that it can process our graph format correctly.
One can use the following commands to compile T-DFS.
```bash
cd compared_system/TDFS
mkdir bin
make
```