A distributed multi-GPU system for fast graph processing.
CUDA is used to implemented Lux.
CUB is used as an external submodule for Lux's tasks.
Legion is the underlying runtime for launching tasks and managing data movement.
After you have cloned Lux, use the following command lines to clone CUB and Legion.
git submodule init git submodule update
- Download Lux source code:
# Using git to download Lux git clone --recursive https://github.com/LuxGraph/Lux
- Compile a Lux application (e.g., PageRank):
cd pagerank make clean; make -j 4
- To build a distributed version of Lux, set
USE_GASNETflag and rebuild:
make clean USE_GASNET=1 make -j 4
The applications take an input graph as well as several runtime flags starting with
-ll:. For example:
./pagerank -ll:gpu 4 -ll:fsize 12000 -ll:zsize 20000 -file twitter-2010.lux -ni 10 ./components -ll:gpu 1 -ll:fsize 6000 -ll:zsize 6000 -file indochina.lux -verbose -check ./sssp -ll:gpu 2 -ll:fsize 12000 -ll:zsize 20000 -file twitter-2010.lux -start 0
-ll:gpu: number of GPU processors to use in an execution
-ll:fsize: size of framebuffer memory for each GPU (in MB)
-ll:zsize: size of zero-copy memory (pinned DRAM with direct GPU access) on each node (in MB)
-file: path to the input graph
-ni: number of iterations to perform
-start: root vertex for SSSP
-verbose: show detailed task execution information
-check: perform correctness check after computation
Lux uses compressed sparse column (CSC) graph in binary format. The specific format is as follows in binary:
<nv> <ne> <c0> <c1> ... <c(nv-1)> <e0> <e1> ... <e(ne-1)>
<ne> denote the number of vertices and direct edges in the graph.
The CSC format starts with a sequence of offsets for the vertices in the CSC format, followed by a sequence of directed edges ordered by their target vertex.
<c(i)> refer to the start and end points of a contiguous block of in-edges for vertex
<e(c(i)-1)> is the collection of all in-neighbors for vertex
For weighted graphs, the weights are stored after the edge sources.
Some example graphs in Lux format are as follows.
Zhihao Jia, Yongkee Kwon, Galen Shipman, Pat McCormick, Mattan Erez, and Alex Aiken. A Distributed Multi-GPU System for Fast Graph Processing. PVLDB 11(3), 2017.