GitHub - ECP-ExaGraph/vite: MPI+OpenMP implementation of Louvain method for Graph Community Detection, with a number of parallel heuristics/approximate computing techniques

ECP-ExaGraph / vite Public
Notifications You must be signed in to change notification settings
Fork 10
Star 26
MPI+OpenMP implementation of Louvain method for Graph Community Detection, with a number of parallel heuristics/approximate computing techniques
BSD-3-Clause license
26 stars 10 forks Branches Tags Activity
Star
Notifications
Branches Tags
Name		Name	Last commit message	Last commit date
Latest commit History 151 Commits
converters		converters
generators		generators
parallel-converters		parallel-converters
LICENSE.md		LICENSE.md
Makefile		Makefile
README		README
coloring.cpp		coloring.cpp
coloring.hpp		coloring.hpp
compare.cpp		compare.cpp
compare.hpp		compare.hpp
distgraph.cpp		distgraph.cpp
distgraph.hpp		distgraph.hpp
edge.hpp		edge.hpp
graph.hpp		graph.hpp
locks.hpp		locks.hpp
louvain.cpp		louvain.cpp
louvain.hpp		louvain.hpp
main.cpp		main.cpp
rebuild.cpp		rebuild.cpp
rebuild.hpp		rebuild.hpp
utils.cpp		utils.cpp
utils.hpp		utils.hpp
Repository files navigation

***************
Vite (ˈviːte/)
***************

*******
-------
 ABOUT
-------
*******
`Vite` is an MPI+OpenMP implementation of Louvain method for (undirected) 
graph clustering or community detection, that supports a number of heuristics 
to improve the scalability. Vite can use both real-world and synthetic graph 
(uses an in-memory random geometric graph generator).

Please refer to the following paper that discusses the 
distributed implementation and associated heuristics:
https://ieeexplore.ieee.org/abstract/document/8425242/

We recently added an option to improve the distribution to
be more edge-balanced rather than standard vertex-based ("-b"), 
which can avoid communication and improve execution time. 
Relevant discussion in the following paper:
https://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=8916299


If quality is important for you, please consider the coloring 
options. Coloring will increase the overall execution time and
communication, but it is possible to combine coloring with
other heuristics to improve the performance. Relevant discussion 
in the following paper: 
https://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=8547534

Vite can be downloaded from (please don't use the past
PNNL/PNL link to download Vite mentioned in the paper, the 
current GitHub link is the correct one):
https://github.com/Exa-Graph/vite

This code requires an MPI library (preferably MPI-3 compatible) 
and C++11 compliant compiler for building. 

Please contact the following for any queries or support:

Sayan Ghosh, PNNL (sg0 at pnnl dot gov)
Mahantesh Halappanavar, PNNL (hala at pnnl dot gov)
Antonino Tumeo, PNNL (antonino.tumeo at pnnl dot gov)

Please '*' this repository on GitHub if the code is useful to you.

*************
-------------
 COMPILATION
-------------
*************

Pass -DDEBUG_PRINTF if detailed diagonostics is required along
program run. This program requires OpenMP and C++11 support,
so pass -fopenmp (for g++)/-qopenmp (for icpc) and -std=c++11/
-std=c++0x. For Cray systems, use CC.

Pass -DUSE_32_BIT_GRAPH if number of nodes in the graph are 
within 32-bit range (2 x 10^9), else 64-bit range is assumed.

Pass -DDONT_CREATE_DIAG_FILES if you dont want to create 2 files
per process with detail diagonostics.

Upon building, the program will generate two binaries:
bin/graphClustering (parallel) and bin/fileConvert (serial).

Please use bin/fileConvert for input graph conversion from 
native formats to a binary format that bin/graphClustering will
be able to read. 

Compiling on Intel KNL:

We made some modifications to the code to port it to Cray XC systems 
with Intel KNL. We use lib-memkind to allocate some heavily used data 
structures to KNL MCDRAM (for usage in flat-mode). 
Please ensure lib-memkind module is loaded and pass 
-DUSE_AUTOHBW_MEMALLOC, and pass -xmic-AVX512 instead of -xHost in 
the Makefile.

Experimental device offload:

The modularity computation is the only arithmetic logic in this code.
Even though, due to the limited computation associated with the modularity 
computation, it is doubtful how much GPU can help in improving the 
performance relative to using OpenMP host pragmas. However, we have added
an option to enable OpenMP offload only for the modularity computation
kernel. Vite must be built with -DOMP_TARGET_OFFLOAD and a compiler that
supports OpenMP 4.5. The default compiler options in the Makefile needs to 
be updated as well, for instance, on Summit, for enabling OpenMP offload 
in the XL compiler, one would need to pass: "-qsmp=omp -qoffload" currently.
Initial testing suggests that for certain graphs, it is possible to achieve
about 10-15% improvement in performance relative to host OpenMP.

************************
------------------------
 INPUT GRAPH CONVERSION
------------------------
************************

Convert input files to binary from various formats. 
Possible options (can be combined):

1. -n               : Input graph in SNAP format
2. -m               : Input graph in Matrix-market format
3. -e               : Input graph in METIS format
4. -p               : Input graph in Pajek format
5. -d               : Input graph in DIMACS format. Pass 0 or 1
                      to indicate undirected/directed graph
6. -s               : Input graph is directed edge list
7. -u               : Input graph is undirected edge list 
                      (Can be used for Graph Challenge official 
                      datasets - http://graphchallenge.mit.edu/) 
8. -x               : Read a number of files with edge list 
                      information, usage e.g.: 	
                      -x "<file-path> <start-chunk> <end-chunk>"
                      Requires conformant file names.
9. -i               : Accept weights as is from the file. If this 
                      option is not passed, then by default the 
                      absolute value of the original weight is 
                      chosen. 
10. -r              : Create random edge weights
11. -w              : Initialize edge weights to 1.0
12. -o              : Output binary file with full path
13. -z              : If the index of input graph is 1-based,
                      then this option makes it 0-based
14. -f              : Option preceding input graph file                      

------------------------
File conversion related
------------------------

Typical example: 
bin/./fileConvert -n -f karate.txt -o karate.bin

If your input file does not have edge weights, and you 
want random edge weights, then pass option "-r". 

bin/./fileConvert -m -r -f karate.mtx -o karate.bin

Note: DIMACS file could be directed or undirected, so
pass 0 if input is directed, or a number > 0 input
graph is undirected. Also, we expect DIMACS inputs to
have weights, therefore passing -r has no effect.

bin/./fileConvert -d 0 -f sample.gr -o sample.bin

Same as the one before, except initializes all weights
to 1, no matter what is in the input file.

bin/./fileConvert -d 0 -w -f sample.gr -o sample.bin

Note: If the input file is index 1-based, then indicate
that by passing -z, such that it will be converted to
0-based internally.

bin/./fileConvert -s -z -f network.dat -o lfrtest.bin

Note: We consider files containing edge list, as 'simple' 
formatted files (option "-s" for directed, and option "-u" 
for undirected). If you have a weighted simple format file 
(i.e., is `u v w` format) then you have to additionally
pass "-i". 
If there are weights in such an edge-list file and for some 
reason you do not want to consider them, pass "-w" or "-r" 
in addition to "-s" or "-u" such that we can assign one-weights 
or random-weights internally. 

By default, we assume edge list files to not have any weights 
(as in, just `u v` per line).

bin/./fileConvert -f uk2007-edges.txt -s -w -o uk2007.bin

Note on `-s` and `-u` options
-----------------------------
The `-s` or `-u` option is just to tell the implementation about 
your native graph format (as in if you pass `-s`, we assume in 
your edge list just `u v` exists and not its complement; whereas 
when you pass `-u`, we will assume that both `u v`  and `v u` 
exists). 

Regardless of the native format, internally, we always use an 
undirected graph representation, because the algorithm itself 
assumes an undirected input, and accordingly prints the number of 
edges as double if your native graph was directed (on the other 
hand, if your native graph was undirected, and for e.g., you used 
`-u` to convert to binary, the #edges printed by the implementation 
should exactly match with the #edges of your native graph).

Apart from the simple edge list formats and matrix-market formats, 
we do not actively check the correctness of the other formats.

****************************
----------------------------
 SYNTHETIC GRAPH GENERATION
----------------------------
****************************

RGG
---
Apart from real world graphs, users can use specific options
to generate a Random Geometric Graph (RGG) in parallel.
RGGs have been known to have good community structure:
https://arxiv.org/pdf/1604.03993.pdf

The way we have implemented a parallel RGG generator, vertices
owned by a process will only have cross edges with its logical
neighboring processes (each process owning 1x1/p chunk of the
1x1 unit square). If MPI process mapping is such that consecutive
processes (for e.g., p and p+1) are physically close to each other,
then there is not much communication stress in the application.
Therefore, we allow an option to add extra edges between randomly
chosen vertices, whose owners may be physically far apart. Relevant 
discussion in paper:
https://ieeexplore.ieee.org/abstract/document/8641631

We require the total number of processes to be a power of 2 and
total number of vertices to be perfectly divisible by the number of
processes when parallel RGG generation options are used. 

An n-D random geometric graph (RGG), is generated by randomly placing N 
vertices in an n-D space and connecting pairs of vertices whose Euclidean 
distance is less than or equal to d. We only consider 2D RGGs contained 
within a unit square, [0,1]^2. We distribute the domain such that each 
process receives N/p vertices (where p is the total number of processes).
Each process owns (1 * 1/p) portion of the unit square and d is computed 
as (please refer to Section 4 of above paper for details):

d = (dc + dt)/2;
where, dc = sqrt(ln(N) / pi*N); dt = sqrt(2.0736 / pi*N)

Therefore, the number of vertices (N) passed during miniVite execution on p
processes must satisfy the condition -- 1/p > d.

Please note, the default distribution of graph generated from the in-built random
geometric graph generator causes a process to only communicate with its two
immediate neighbors. If you want to increase the communication intensity for
generated graphs, please use the "-e" option to specify an extra percentage of edges
that will be generated, linking random vertices. As a side-effect, this option
significantly increases the time required to generate the graph.


E.g.:
mpiexec -n 2 bin/./minivite -f karate.bin
mpiexec -n 2 bin/./minivite -l -n 100
mpiexec -n 2 bin/./minivite -n 100
mpiexec -n 2 bin/./minivite -e 2 -n 100


NetworKit bindings
------------------
We also have bindings to NetworKit[*], that can help generate random and scale
free graphs, but this option is actively supported and may be purged from future
Vite versions. This is mostly serial, but NetworKit internally may use OpenMP. 

Note: We have only tested with NetworKit GitHub main (circa Oct 2023), and have no 
plans on supporting other NetworKit versions (we will most probably remove this option 
from future releases).

[*] https://networkit.github.io/

Networkit can be built from GitHub: 
git clone --recursive https://github.com/networkit/networkit.git

Possible options (can be combined):
1.  -e               : Generate Erdos-Renyi random (ER) graph
2.  -c               : Generate Clustered Random Graph (CRG)
3.  -b               : Generate Barabasi-Albert graph
4.  -h               : Generate hyperbolic graph
5.  -p <...>         : Probability for edge creation (for ER graph)
6.  -n <...>         : Number of (max) nodes/vertices
7.  -k <...>         : Number of clusters for CRG (each unit square is 
                       divided into clusters where edges are created 
                       within) OR attachments per
                       node for Barabasi-Albert graph
8.  -m <...>         : Initial nodes attached for Barabasi-Albert                      
9.  -r               : Create random edge weights 

***********************
-----------------------
 EXECUTING THE PROGRAM
-----------------------
***********************

0. Input file conversion to Vite binary format from the respective native 
file formats: convert the input file to binary format using fileConvert.

1. Set the desired number of processes and threads.  

2. Run distributed parallel Louvain algorithm (graphClustering binary) 
using the binary file created in Step #0:

E.g., pass a binary file using "-f" option and include other heuristics: 
mpiexec -n 2 bin/./graphClustering -i -f karate.bin

E.g., generate a random geometric graph by passing #vertices=64:
mpiexec -n 2 bin/./graphClustering -n 64

E.g., generate a random geometric graph by passing #vertices=64
and store the resultant binary graph:
mpiexec -n 2 bin/./graphClustering -n 64 -s rgg64.bin

E.g., read a previously generated random geometric binary graph:
mpiexec -n 2 bin/./graphClustering -f rgg64.bin


Possible options (can be combined):

1. -f <input>    : Input real-world binary file (read #1 above).
2. -c "<ncolors> <single-color-iteration?>"  : 
                   Enable coloring, adjacent vertices are processed
                   in different order. Synchronization happens once
                   per <ncolors>, so this option can significantly
                   increase execution time. 
                   We use incomplete iterative
                   coloring algorithm (Jones-Plassmann), so passed colors 
                   is just a hint, also pass whether a single-color-iteration
                   is expected, by default multiple iterations are invoked.
3. -d "<ncolors>  <single-color-iteration?>"  : 
                   Enable coloring, adjacent vertices are processes
                   in different order. No communication overhead like
                   above, only local overhead of coloring subgraph.
4. -i            : Threshold cycling, threshold changes dynamically
                   in every phase of Louvain algorithm.
5. -t 1          : If a vertex is in the same community for past 3
                   iterations, then consider it inactive.
6. -t 3          : If a vertex is in the same community for past 3
                   iterations, then consider it inactive. Also,
                   adds a communication step to gather inactive 
                   vertices and terminate Louvain if >= 90% vertices
                   at an iteration are inactive.
7. -t 2 -a <0-1> : Early termination* using probability alpha.
8. -t 4 -a <0-1> : Early termination* using probability alpha. Also,
                   adds a communication step to gather inactive 
                   vertices and terminate Louvain if >= 90% vertices
                   at an iteration are inactive.
9. -b            : Only valid for real-world inputs. Attempts to 
                   distribute approximately equal number of edges among 
                   processes. Irregular number of vertices owned by a 
                   particular process. Increases the distributed graph 
                   creation time due to serial overheads, but may 
                   significantly improve the overall execution time.
10. -o           : Output communities into a file. This option will result 
                   in Vite dumping the communities (community-per-vertex in 
                   each line, total number of lines == number of vertices) 
                   in a text file named <input-binary-file>.communities in 
                   the same path as the input binary file.
11. -r <nranks>  : This is used to control the number of aggregators in MPI 
                   I/O and is meaningful when an input binary graph file is 
                   passed with option "-f".
                   naggr := (nranks > 1) ? (nprocs/nranks) : nranks;
12. -g <gfile>   : Pass a ground truth file for community comparison. We 
                   expect the ground truth file to contain N lines (equal to 
                   the total #vertices in the graph), while each line containing 
                   a distinct vertex ID and associated community ID, separated by 
                   a space or tab. Ground truth community comparison is performed
                   in a single node, and it uses OpenMP to parallelize. It may take
                   a substantial amount of time for large files.
13. -z           : Only applicable if "-g <gfile>" option is passed. This tells us
                   that the passed ground truth file is 1-based. If this option is
                   not passed, we assume the ground truth to the 0-based.
14. -n <|V|>     : Generate graph in memory (uses a parallel Random Geometric Graph
                   generator).
15. -e <%>       : Used in conjunction with the "-n <|V|>" option to generate RGG. 
                   This option tells the percentage of edges to be added, randomly 
                   connecting vertices across processes. Currently, the maximum number
                   of randomly added edges to RGG cannot exceed INT_MAX (there is no 
                   check to determine this, so exercise caution).
16. -p           : Run a single phase of Louvain.
17. -s <output>  : Only applicable to the generated graphs (for e.g., -n <|V|>),
                   output is a binary file as per the output file path.
18: -j           : Just process (read or generate) the graph and exit without running
                   community detection.

Coloring:

mpiexec -n 2 bin/graphClustering -c "4" -f karate.bin
mpiexec -n 2 bin/graphClustering -c "8 false" -i -f karate.bin

By default, the coloring method is run for multiple iterations, to make it run for 
just a single iteration, pass "true". For e.g.:
mpiexec -n 2 bin/graphClustering -c "4 true" -f karate.bin


*Note: Option to update inactive vertices percentage is defined as
macro ET_CUTOFF in louvain.hpp, and the default is 2%.

**********************************************
----------------------------------------------
 COMPARING COMMUNITIES WITH GROUND TRUTH DATA
----------------------------------------------
**********************************************

1. Generate benchmark graphs with ground truth
community information. For e.g., we have tested
with LFR benchmark. 
(https://sites.google.com/site/santofortunato/inthepress2)
LFR generates an edge list (index 1-based), along with community
membership of each vertex.

2. Convert the generated network to binary using 
fileConvert binary. 

3. Execute the program as shown in the e.g. below:
mpiexec -n 2 bin/./graphClustering -r 1 -f lfrtest.bin -z -g community.dat

The difference between standard way to execute the program and this way
is that one needs to pass the ground truth vertex-community mapping 
using the "-g" option. Also, the "-z" option tells the program that
the input ground truth file -- `community.dat` is 1-based. If "-z" is
not passed, then we assume that the file is 0-based.

We return some statistics such as F-score, Gini-coefficient, 
false positives/negatives and true positives. We expect that the isolated
vertices will be placed into their respective communities, and not into 
a single one.

However, extra communication per phase is required when these metrics
are to be computed, so the program execution time will increase
significantly. Plus, the community comparison part is currently
multithreaded (uses OpenMP), so everything is brought to a single 
node/PE and then communities are compared. 

***************
---------------
 OUTPUT RESULT
---------------
***************

If -DDONT_CREATE_DIAG_FILES is passed during compilation,
then output is send to stdout.
Otherwise, the output result is dumped per process on files 
named as dat.out.<process-id>. Check dat.out.0 to review 
program diagonostics. 
Output files are cleared with: `make clean`.