-
Notifications
You must be signed in to change notification settings - Fork 10
MPI+OpenMP implementation of Louvain method for Graph Community Detection, with a number of parallel heuristics/approximate computing techniques
License
ECP-ExaGraph/vite
Folders and files
Name | Name | Last commit message | Last commit date | |
---|---|---|---|---|
Repository files navigation
*************** Vite (ˈviːte/) *************** ******* ------- ABOUT ------- ******* `Vite` is an MPI+OpenMP implementation of Louvain method for (undirected) graph clustering or community detection, that supports a number of heuristics to improve the scalability. Vite can use both real-world and synthetic graph (uses an in-memory random geometric graph generator). Please refer to the following paper that discusses the distributed implementation and associated heuristics: https://ieeexplore.ieee.org/abstract/document/8425242/ We recently added an option to improve the distribution to be more edge-balanced rather than standard vertex-based ("-b"), which can avoid communication and improve execution time. Relevant discussion in the following paper: https://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=8916299 If quality is important for you, please consider the coloring options. Coloring will increase the overall execution time and communication, but it is possible to combine coloring with other heuristics to improve the performance. Relevant discussion in the following paper: https://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=8547534 Vite can be downloaded from (please don't use the past PNNL/PNL link to download Vite mentioned in the paper, the current GitHub link is the correct one): https://github.com/Exa-Graph/vite This code requires an MPI library (preferably MPI-3 compatible) and C++11 compliant compiler for building. Please contact the following for any queries or support: Sayan Ghosh, PNNL (sg0 at pnnl dot gov) Mahantesh Halappanavar, PNNL (hala at pnnl dot gov) Antonino Tumeo, PNNL (antonino.tumeo at pnnl dot gov) Please '*' this repository on GitHub if the code is useful to you. ************* ------------- COMPILATION ------------- ************* Pass -DDEBUG_PRINTF if detailed diagonostics is required along program run. This program requires OpenMP and C++11 support, so pass -fopenmp (for g++)/-qopenmp (for icpc) and -std=c++11/ -std=c++0x. For Cray systems, use CC. Pass -DUSE_32_BIT_GRAPH if number of nodes in the graph are within 32-bit range (2 x 10^9), else 64-bit range is assumed. Pass -DDONT_CREATE_DIAG_FILES if you dont want to create 2 files per process with detail diagonostics. Upon building, the program will generate two binaries: bin/graphClustering (parallel) and bin/fileConvert (serial). Please use bin/fileConvert for input graph conversion from native formats to a binary format that bin/graphClustering will be able to read. Compiling on Intel KNL: We made some modifications to the code to port it to Cray XC systems with Intel KNL. We use lib-memkind to allocate some heavily used data structures to KNL MCDRAM (for usage in flat-mode). Please ensure lib-memkind module is loaded and pass -DUSE_AUTOHBW_MEMALLOC, and pass -xmic-AVX512 instead of -xHost in the Makefile. Experimental device offload: The modularity computation is the only arithmetic logic in this code. Even though, due to the limited computation associated with the modularity computation, it is doubtful how much GPU can help in improving the performance relative to using OpenMP host pragmas. However, we have added an option to enable OpenMP offload only for the modularity computation kernel. Vite must be built with -DOMP_TARGET_OFFLOAD and a compiler that supports OpenMP 4.5. The default compiler options in the Makefile needs to be updated as well, for instance, on Summit, for enabling OpenMP offload in the XL compiler, one would need to pass: "-qsmp=omp -qoffload" currently. Initial testing suggests that for certain graphs, it is possible to achieve about 10-15% improvement in performance relative to host OpenMP. ************************ ------------------------ INPUT GRAPH CONVERSION ------------------------ ************************ Convert input files to binary from various formats. Possible options (can be combined): 1. -n : Input graph in SNAP format 2. -m : Input graph in Matrix-market format 3. -e : Input graph in METIS format 4. -p : Input graph in Pajek format 5. -d : Input graph in DIMACS format. Pass 0 or 1 to indicate undirected/directed graph 6. -s : Input graph is directed edge list 7. -u : Input graph is undirected edge list (Can be used for Graph Challenge official datasets - http://graphchallenge.mit.edu/) 8. -x : Read a number of files with edge list information, usage e.g.: -x "<file-path> <start-chunk> <end-chunk>" Requires conformant file names. 9. -i : Accept weights as is from the file. If this option is not passed, then by default the absolute value of the original weight is chosen. 10. -r : Create random edge weights 11. -w : Initialize edge weights to 1.0 12. -o : Output binary file with full path 13. -z : If the index of input graph is 1-based, then this option makes it 0-based 14. -f : Option preceding input graph file ------------------------ File conversion related ------------------------ Typical example: bin/./fileConvert -n -f karate.txt -o karate.bin If your input file does not have edge weights, and you want random edge weights, then pass option "-r". bin/./fileConvert -m -r -f karate.mtx -o karate.bin Note: DIMACS file could be directed or undirected, so pass 0 if input is directed, or a number > 0 input graph is undirected. Also, we expect DIMACS inputs to have weights, therefore passing -r has no effect. bin/./fileConvert -d 0 -f sample.gr -o sample.bin Same as the one before, except initializes all weights to 1, no matter what is in the input file. bin/./fileConvert -d 0 -w -f sample.gr -o sample.bin Note: If the input file is index 1-based, then indicate that by passing -z, such that it will be converted to 0-based internally. bin/./fileConvert -s -z -f network.dat -o lfrtest.bin Note: We consider files containing edge list, as 'simple' formatted files (option "-s" for directed, and option "-u" for undirected). If you have a weighted simple format file (i.e., is `u v w` format) then you have to additionally pass "-i". If there are weights in such an edge-list file and for some reason you do not want to consider them, pass "-w" or "-r" in addition to "-s" or "-u" such that we can assign one-weights or random-weights internally. By default, we assume edge list files to not have any weights (as in, just `u v` per line). bin/./fileConvert -f uk2007-edges.txt -s -w -o uk2007.bin Note on `-s` and `-u` options ----------------------------- The `-s` or `-u` option is just to tell the implementation about your native graph format (as in if you pass `-s`, we assume in your edge list just `u v` exists and not its complement; whereas when you pass `-u`, we will assume that both `u v` and `v u` exists). Regardless of the native format, internally, we always use an undirected graph representation, because the algorithm itself assumes an undirected input, and accordingly prints the number of edges as double if your native graph was directed (on the other hand, if your native graph was undirected, and for e.g., you used `-u` to convert to binary, the #edges printed by the implementation should exactly match with the #edges of your native graph). Apart from the simple edge list formats and matrix-market formats, we do not actively check the correctness of the other formats. **************************** ---------------------------- SYNTHETIC GRAPH GENERATION ---------------------------- **************************** RGG --- Apart from real world graphs, users can use specific options to generate a Random Geometric Graph (RGG) in parallel. RGGs have been known to have good community structure: https://arxiv.org/pdf/1604.03993.pdf The way we have implemented a parallel RGG generator, vertices owned by a process will only have cross edges with its logical neighboring processes (each process owning 1x1/p chunk of the 1x1 unit square). If MPI process mapping is such that consecutive processes (for e.g., p and p+1) are physically close to each other, then there is not much communication stress in the application. Therefore, we allow an option to add extra edges between randomly chosen vertices, whose owners may be physically far apart. Relevant discussion in paper: https://ieeexplore.ieee.org/abstract/document/8641631 We require the total number of processes to be a power of 2 and total number of vertices to be perfectly divisible by the number of processes when parallel RGG generation options are used. An n-D random geometric graph (RGG), is generated by randomly placing N vertices in an n-D space and connecting pairs of vertices whose Euclidean distance is less than or equal to d. We only consider 2D RGGs contained within a unit square, [0,1]^2. We distribute the domain such that each process receives N/p vertices (where p is the total number of processes). Each process owns (1 * 1/p) portion of the unit square and d is computed as (please refer to Section 4 of above paper for details): d = (dc + dt)/2; where, dc = sqrt(ln(N) / pi*N); dt = sqrt(2.0736 / pi*N) Therefore, the number of vertices (N) passed during miniVite execution on p processes must satisfy the condition -- 1/p > d. Please note, the default distribution of graph generated from the in-built random geometric graph generator causes a process to only communicate with its two immediate neighbors. If you want to increase the communication intensity for generated graphs, please use the "-e" option to specify an extra percentage of edges that will be generated, linking random vertices. As a side-effect, this option significantly increases the time required to generate the graph. E.g.: mpiexec -n 2 bin/./minivite -f karate.bin mpiexec -n 2 bin/./minivite -l -n 100 mpiexec -n 2 bin/./minivite -n 100 mpiexec -n 2 bin/./minivite -e 2 -n 100 NetworKit bindings ------------------ We also have bindings to NetworKit[*], that can help generate random and scale free graphs, but this option is actively supported and may be purged from future Vite versions. This is mostly serial, but NetworKit internally may use OpenMP. Note: We have only tested with NetworKit GitHub main (circa Oct 2023), and have no plans on supporting other NetworKit versions (we will most probably remove this option from future releases). [*] https://networkit.github.io/ Networkit can be built from GitHub: git clone --recursive https://github.com/networkit/networkit.git Possible options (can be combined): 1. -e : Generate Erdos-Renyi random (ER) graph 2. -c : Generate Clustered Random Graph (CRG) 3. -b : Generate Barabasi-Albert graph 4. -h : Generate hyperbolic graph 5. -p <...> : Probability for edge creation (for ER graph) 6. -n <...> : Number of (max) nodes/vertices 7. -k <...> : Number of clusters for CRG (each unit square is divided into clusters where edges are created within) OR attachments per node for Barabasi-Albert graph 8. -m <...> : Initial nodes attached for Barabasi-Albert 9. -r : Create random edge weights *********************** ----------------------- EXECUTING THE PROGRAM ----------------------- *********************** 0. Input file conversion to Vite binary format from the respective native file formats: convert the input file to binary format using fileConvert. 1. Set the desired number of processes and threads. 2. Run distributed parallel Louvain algorithm (graphClustering binary) using the binary file created in Step #0: E.g., pass a binary file using "-f" option and include other heuristics: mpiexec -n 2 bin/./graphClustering -i -f karate.bin E.g., generate a random geometric graph by passing #vertices=64: mpiexec -n 2 bin/./graphClustering -n 64 E.g., generate a random geometric graph by passing #vertices=64 and store the resultant binary graph: mpiexec -n 2 bin/./graphClustering -n 64 -s rgg64.bin E.g., read a previously generated random geometric binary graph: mpiexec -n 2 bin/./graphClustering -f rgg64.bin Possible options (can be combined): 1. -f <input> : Input real-world binary file (read #1 above). 2. -c "<ncolors> <single-color-iteration?>" : Enable coloring, adjacent vertices are processed in different order. Synchronization happens once per <ncolors>, so this option can significantly increase execution time. We use incomplete iterative coloring algorithm (Jones-Plassmann), so passed colors is just a hint, also pass whether a single-color-iteration is expected, by default multiple iterations are invoked. 3. -d "<ncolors> <single-color-iteration?>" : Enable coloring, adjacent vertices are processes in different order. No communication overhead like above, only local overhead of coloring subgraph. 4. -i : Threshold cycling, threshold changes dynamically in every phase of Louvain algorithm. 5. -t 1 : If a vertex is in the same community for past 3 iterations, then consider it inactive. 6. -t 3 : If a vertex is in the same community for past 3 iterations, then consider it inactive. Also, adds a communication step to gather inactive vertices and terminate Louvain if >= 90% vertices at an iteration are inactive. 7. -t 2 -a <0-1> : Early termination* using probability alpha. 8. -t 4 -a <0-1> : Early termination* using probability alpha. Also, adds a communication step to gather inactive vertices and terminate Louvain if >= 90% vertices at an iteration are inactive. 9. -b : Only valid for real-world inputs. Attempts to distribute approximately equal number of edges among processes. Irregular number of vertices owned by a particular process. Increases the distributed graph creation time due to serial overheads, but may significantly improve the overall execution time. 10. -o : Output communities into a file. This option will result in Vite dumping the communities (community-per-vertex in each line, total number of lines == number of vertices) in a text file named <input-binary-file>.communities in the same path as the input binary file. 11. -r <nranks> : This is used to control the number of aggregators in MPI I/O and is meaningful when an input binary graph file is passed with option "-f". naggr := (nranks > 1) ? (nprocs/nranks) : nranks; 12. -g <gfile> : Pass a ground truth file for community comparison. We expect the ground truth file to contain N lines (equal to the total #vertices in the graph), while each line containing a distinct vertex ID and associated community ID, separated by a space or tab. Ground truth community comparison is performed in a single node, and it uses OpenMP to parallelize. It may take a substantial amount of time for large files. 13. -z : Only applicable if "-g <gfile>" option is passed. This tells us that the passed ground truth file is 1-based. If this option is not passed, we assume the ground truth to the 0-based. 14. -n <|V|> : Generate graph in memory (uses a parallel Random Geometric Graph generator). 15. -e <%> : Used in conjunction with the "-n <|V|>" option to generate RGG. This option tells the percentage of edges to be added, randomly connecting vertices across processes. Currently, the maximum number of randomly added edges to RGG cannot exceed INT_MAX (there is no check to determine this, so exercise caution). 16. -p : Run a single phase of Louvain. 17. -s <output> : Only applicable to the generated graphs (for e.g., -n <|V|>), output is a binary file as per the output file path. 18: -j : Just process (read or generate) the graph and exit without running community detection. Coloring: mpiexec -n 2 bin/graphClustering -c "4" -f karate.bin mpiexec -n 2 bin/graphClustering -c "8 false" -i -f karate.bin By default, the coloring method is run for multiple iterations, to make it run for just a single iteration, pass "true". For e.g.: mpiexec -n 2 bin/graphClustering -c "4 true" -f karate.bin *Note: Option to update inactive vertices percentage is defined as macro ET_CUTOFF in louvain.hpp, and the default is 2%. ********************************************** ---------------------------------------------- COMPARING COMMUNITIES WITH GROUND TRUTH DATA ---------------------------------------------- ********************************************** 1. Generate benchmark graphs with ground truth community information. For e.g., we have tested with LFR benchmark. (https://sites.google.com/site/santofortunato/inthepress2) LFR generates an edge list (index 1-based), along with community membership of each vertex. 2. Convert the generated network to binary using fileConvert binary. 3. Execute the program as shown in the e.g. below: mpiexec -n 2 bin/./graphClustering -r 1 -f lfrtest.bin -z -g community.dat The difference between standard way to execute the program and this way is that one needs to pass the ground truth vertex-community mapping using the "-g" option. Also, the "-z" option tells the program that the input ground truth file -- `community.dat` is 1-based. If "-z" is not passed, then we assume that the file is 0-based. We return some statistics such as F-score, Gini-coefficient, false positives/negatives and true positives. We expect that the isolated vertices will be placed into their respective communities, and not into a single one. However, extra communication per phase is required when these metrics are to be computed, so the program execution time will increase significantly. Plus, the community comparison part is currently multithreaded (uses OpenMP), so everything is brought to a single node/PE and then communities are compared. *************** --------------- OUTPUT RESULT --------------- *************** If -DDONT_CREATE_DIAG_FILES is passed during compilation, then output is send to stdout. Otherwise, the output result is dumped per process on files named as dat.out.<process-id>. Check dat.out.0 to review program diagonostics. Output files are cleared with: `make clean`.
About
MPI+OpenMP implementation of Louvain method for Graph Community Detection, with a number of parallel heuristics/approximate computing techniques
Resources
License
Stars
Watchers
Forks
Packages 0
No packages published