Multi-Level-Multi-Queue (MLMQ) is a GPU-friendly data structure designed to accelerate Single-Source Shortest Path (SSSP) computation on GPUs.
The main code of MLMQ is in core, including the high-level templated framework and instantiated queues.
SSSP is the SSSP program based on MLMQ, where users can specify the types of queues and their parameters by manually editing the configuration file.
SSSP_adaptive is the SSSP program with adaptive MLMQ setup policy. The parameters will be automatically configured based on the input graph information.
ADDS is the baseline with the best performance, from "Wang, K., Fussell, D., & Lin, C. (2021, February). A fast work-efficient SSSP algorithm for GPUs. In Proceedings of the 26th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (pp. 133-146)". You can also get the original version at https://zenodo.org/records/4365954#.X-2s8nVKiZQ.
Other baselines are not included. They can be found in:
H-BF from "Busato, F., & Bombieri, N. (2015). An efficient implementation of the Bellman-Ford algorithm for Kepler GPU architectures. IEEE Transactions on Parallel and Distributed Systems, 27(8), 2222-2233."
Gunrock and Gunrock-CPU at https://github.com/gunrock/gunrock, from "Wang, Y., Davidson, A., Pan, Y., Wu, Y., Riffel, A., & Owens, J. D. (2016, February). Gunrock: A high-performance graph processing library on the GPU. In Proceedings of the 21st ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (pp. 1-12)."
Ubuntu 20.04.1, gcc/g++ 7.5, CUDA >= 11.1, compute capability >= 8.0.
To compile, enter the target folder and execute the make command. For example,
```shell
cd SSSP
make
```
You can also compile with the script,
```shell
cd scripts
sh build_all.sh
```
Notice: users need to change the compilation option based on the compute capability of the GPU being used. The currently released version targets compute capability 8.6. For example, on an NVIDIA A100 (compute capability 8.0), the compilation option -gencode=arch=compute_86,code=sm_86 should be changed to -gencode=arch=compute_80,code=sm_80.
Most graphs we currently use have int edge weights. To run on a graph with float edge weights, change #define TYPE_INT to #define TYPE_FLOAT in core/include/common.h.
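For reference, the switch in core/include/common.h looks roughly like this (a sketch; the exact surrounding context in the header may differ):

```cpp
/* core/include/common.h -- edge-weight type selection (sketch) */
// #define TYPE_INT      /* default: int edge weights */
#define TYPE_FLOAT       /* use float edge weights instead */
```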
We use graph data in binary GR format, the same as ADDS. Download graph data at https://zenodo.org/records/4365954#.X-2s8nVKiZQ.
We also provide a tool in graph_transform to convert sparse matrix data in mtx format to gr format. Change #define TYPE_INT to #define TYPE_FLOAT in csr_graph.h to use float edge weights; otherwise int edge weights are used. Run the tool with the following commands:
```shell
cd graph_transform
make
./transform [sparse matrix file (.mtx)] [graph data file (.gr)]
```
Change the MLMQ configuration in core/include/common.h. Set MLMQ_TYPE to select the queue types. Tunable parameters include node_size, l2_batch_size, mlmq_delta, BNUM, and BUCKET_MAX.
The recommended configurations for the datasets in the paper are given in configuration.csv.
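As an illustration, a configuration in core/include/common.h might look like the following. The macro names come from the parameters listed above, but the values and the form of the MLMQ_TYPE setting shown here are placeholders, not recommended settings; take the actual values for each dataset from configuration.csv:

```cpp
/* core/include/common.h -- MLMQ configuration (illustrative values only) */
#define MLMQ_TYPE      L1NF_L2DQ  /* queue combination; this name is hypothetical */
#define node_size      64         /* placeholder value */
#define l2_batch_size  1024       /* placeholder value */
#define mlmq_delta     8192       /* placeholder value */
#define BNUM           64         /* placeholder value */
#define BUCKET_MAX     10         /* placeholder value */
```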
After configuring, compile again and run with ./main -i [graph file]. For example,
```shell
cd SSSP
make
./main -i USA-road-d.NY.gr
```
Run with our adaptive setup policy to enable automatic queue selection and parameter configuration on the current datasets. This version may show slight performance gaps on some graphs compared with the optimal manual configuration, but overall it achieves performance close to optimal.
Run with ./main -i [graph file]. For example,
```shell
cd SSSP_adaptive
make
./main -i USA-road-d.NY.gr
```
Run the ADDS baseline in ADDS. ads_int is for int edge weights and ads_float is for float edge weights.
Run with ./sssp [graph file]. For example,
```shell
cd ADDS/ads_int
make
./sssp USA-road-d.NY.gr
```
The currently released queue instances include: L0 vector (embedded in the SSSP program), L1 vector (L1V), L1 near-far queue (L1NF), L1 filter queue (L1FQ), L2 vector (L2V), L2 priority queue (L2PQ), and L2 delta queue (L2DQ). We do not recommend using L2PQ, as its performance is extremely low due to limited parallelism.
We are working towards better implementations for L2 Priority Queue (L2PQ), L2 Multi-vector Queue (L2MV), and L2 Multi-queue (L2MPQ).
This guide describes how to implement custom L1 (warp-local) and L2 (global) queues compatible with the MLMQ system.
L1 queues operate at the warp level, using registers or shared memory. Each warp maintains its own L1 queue instance.
```cpp
template <typename eletype>
class L1Queue {
public:
    __device__ init_status init(int wid, int lane_id);
    __device__ read_status read(eletype* node_in, int& read_num, int lane_id);
    __device__ write_status write(eletype* node_out, eletype* buffer, int& write_num, int& buffer_num, int lane_id);
    __device__ int get_queue_size();
};
```
| Function | Description |
|---|---|
| init(int wid, int lane_id) | Initializes the queue before use. Called per warp. |
| read(eletype* node_in, int& read_num, int lane_id) | Reads data from the queue. Updates read_num for the current thread. |
| write(eletype* node_out, eletype* buffer, int& write_num, int& buffer_num, int lane_id) | Writes items to the queue and optionally flushes them to the buffer for L2. |
| get_queue_size() | Returns the total size of the queue. |
Notes:
- lane_id: thread ID within the warp.
- buffer: used for flushing data to the L2 queue.
- write_num, buffer_num, read_num: per-warp counters.
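To make the control flow concrete, a worker warp might drive an L1 queue roughly as follows. This is an illustrative sketch against the interface above, not the shipped implementation; the staging-array size and the elided relaxation step are assumptions.

```cpp
// Illustrative per-warp driver loop for an L1 queue (sketch).
template <typename eletype>
__device__ void warp_process(L1Queue<eletype>& q, eletype* l2_buffer) {
    int lane_id = threadIdx.x % 32;    // thread ID within the warp
    int wid     = threadIdx.x / 32;    // warp ID within the block
    q.init(wid, lane_id);

    eletype node_in[4], node_out[4];   // per-thread staging (size assumed)
    while (q.get_queue_size() > 0) {
        int read_num = 0;
        q.read(node_in, read_num, lane_id);

        int write_num = 0, buffer_num = 0;
        // ... relax the edges of node_in[0..read_num) and collect
        //     improved neighbors into node_out[0..write_num) ...

        // Write back to L1; items that do not fit (or fall outside the
        // L1 priority range) go to the L2 flush buffer.
        q.write(node_out, l2_buffer, write_num, buffer_num, lane_id);
    }
}
```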
L2 queues reside in global memory and support multi-warp, multi-block coordination. Each L2 queue has a manager thread responsible for metadata handling.
```cpp
template <typename eletype>
class L2Queue {
public:
    init_status host_init(int max_size, mlmq_mdata& mdata, eletype init_limits, mlmq_setup setup);
    __device__ read_status read(eletype* node_in, int& read_num, int bid, int wid, int lane_id, unsigned* debug_time);
    __device__ write_status write(eletype* node_out, int& write_num, int bid, int wid, int lane_id, unsigned* debug_time);
    __device__ int get_queue_size();
    __device__ void manager_run(int wid, int lane_id);
    __device__ void update_done(int on_the_fly_num);
};
```
| Function | Description |
|---|---|
| host_init(...) | Host-side queue initialization before kernel launch. |
| read(...) | Reads elements from the global queue into a local buffer. |
| write(...) | Writes elements from a local buffer to the global queue. |
| get_queue_size() | Returns the current global queue size. |
| manager_run(...) | Manager-side function that maintains queue metadata. |
| update_done(on_the_fly_num) | Lazily updates progress-tracking metadata. |
Notes:
- bid, wid, lane_id: block ID, warp ID, and thread ID within the warp.
- debug_time: optional pointer for timing or profiling.
- on_the_fly_num: number of elements currently being processed, used for manager-worker coordination.
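A kernel using an L2 queue typically dedicates one warp to metadata management while the remaining warps act as workers; the sketch below shows that split under the interface above. The warp-role assignment, the simplified termination check, and the staging sizes are assumptions for illustration, not the shipped implementation.

```cpp
// Illustrative manager/worker split for an L2 queue (sketch).
template <typename eletype>
__global__ void l2_kernel(L2Queue<eletype> q) {
    int lane_id = threadIdx.x % 32;
    int wid     = threadIdx.x / 32;
    int bid     = blockIdx.x;

    if (bid == 0 && wid == 0) {
        // Manager warp: maintain queue metadata until the work is done.
        q.manager_run(wid, lane_id);
    } else {
        eletype node_in[4], node_out[4];  // per-thread staging (size assumed)
        while (q.get_queue_size() > 0) {  // termination check is simplified
            int read_num = 0;
            q.read(node_in, read_num, bid, wid, lane_id, /*debug_time=*/nullptr);

            int write_num = 0;
            // ... process node_in[0..read_num), filling node_out[0..write_num) ...
            q.write(node_out, write_num, bid, wid, lane_id, /*debug_time=*/nullptr);

            // Tell the manager how many items this warp still has in flight.
            q.update_done(/*on_the_fly_num=*/read_num);
        }
    }
}
```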
- L1 Queues: Use warp shuffle and ballot intrinsics for intra-warp coordination.
- L2 Queues: Use atomic operations for global coordination. Minimize contention.
- Buffering: Accumulate writes in L1 before flushing to L2 for efficiency.
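A common pattern combining these guidelines is warp-aggregated enqueueing: lanes vote with a ballot, compute their slot with a popcount prefix, and a single lane issues one atomic per warp to reserve space, so global contention stays low. The sketch below illustrates the pattern for int items; it is not taken from the MLMQ sources.

```cpp
// Sketch: warp-aggregated enqueue using ballot/shuffle intrinsics.
__device__ void warp_enqueue(int* queue, int* tail, int item, bool has_item) {
    unsigned active = __ballot_sync(0xffffffffu, has_item);
    if (active == 0) return;                           // nothing to write

    int lane   = threadIdx.x % 32;
    int prefix = __popc(active & ((1u << lane) - 1));  // items in lower lanes
    int leader = __ffs(active) - 1;                    // lowest active lane

    int base = 0;
    if (lane == leader)
        base = atomicAdd(tail, __popc(active));        // one atomic per warp
    base = __shfl_sync(0xffffffffu, base, leader);     // broadcast base index

    if (has_item)
        queue[base + prefix] = item;
}
```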
By following this interface, you can plug custom queue implementations into the MLMQ execution framework and explore the trade-offs between parallelism, memory locality, and priority guarantees.