Add R-MAT generator in tests/bfs #62

Merged
merged 15 commits into from
Oct 29, 2018
6 changes: 6 additions & 0 deletions tests/bfs/CMakeLists.txt
@@ -31,4 +31,10 @@ else()
message("Skipping bfs, OpenMP required")
endif()

add_executable(compute_degree_distribution compute_degree_distribution.cpp)
install(TARGETS compute_degree_distribution
        LIBRARY DESTINATION lib
        ARCHIVE DESTINATION lib/static
        RUNTIME DESTINATION bin)

add_subdirectory(rmat_edge_generator)
44 changes: 41 additions & 3 deletions tests/bfs/README.md
@@ -1,12 +1,50 @@
# Run
# Usages

## Generate edge list using an R-MAT generator
cd rmat_edge_generator

./generate_edge_list \
-o [output edge list file name (required)] \
-s [seed for the random number generator; default is 123] \
-v [SCALE: the base-2 logarithm of the number of vertices; default is 17] \
-e [number of edges; default is 2^17 x 16, i.e. 2^{SCALE} x 16 at the default SCALE; the default is not recomputed when -v is given] \
-a [initiator parameter A; default is 0.57] \
-b [initiator parameter B; default is 0.19] \
-c [initiator parameter C; default is 0.19] \
-r [if nonzero, scrambles edge IDs; default is 1 (true)] \
-u [if nonzero, generates edges in both directions; default is 1 (true)]

* For details on the initiator parameters,
see [Graph500, 3.2 Detailed Text Description](https://graph500.org/?page_id=12#sec-3_2).
* Our edge list ingest program reads edge lists as a directed graph.
If you use the ingest program, keep the -u option enabled (it is enabled by default).
Note that -r and -u are parsed as integers, so pass 1 or 0 rather than 'true' or 'false'.
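
For readers unfamiliar with R-MAT, the role of the initiator parameters can be sketched as follows. This is a minimal illustration, not the code in rmat_edge_generator.hpp (which this diff does not show): all names prefixed with `sketch_` are hypothetical, the quadrant-to-bit mapping follows a common convention and is an assumption, ID scrambling is omitted, and it is assumed that reversed edges are emitted in addition to the requested edge count. As in generate_edge_list.cpp, the fourth parameter is taken to be D = 1 - (A + B + C).

```cpp
#include <cassert>
#include <cstdint>
#include <random>
#include <utility>
#include <vector>

using sketch_edge = std::pair<uint64_t, uint64_t>;

// Samples one edge by descending `scale` levels of the adjacency matrix.
// At each level one quadrant is chosen with probability a, b, c, or
// d = 1 - (a + b + c), and contributes one bit to each endpoint ID.
sketch_edge sketch_rmat_edge(std::mt19937_64 &rng, uint64_t scale,
                             double a, double b, double c) {
  assert(scale < 64);
  assert(a >= 0.0 && b >= 0.0 && c >= 0.0 && a + b + c <= 1.0);
  std::uniform_real_distribution<double> dist(0.0, 1.0);
  uint64_t source = 0;
  uint64_t destination = 0;
  for (uint64_t level = 0; level < scale; ++level) {
    const double r = dist(rng);
    source <<= 1;
    destination <<= 1;
    if (r < a) {
      // Quadrant A (assumed top-left): both new bits stay 0
    } else if (r < a + b) {
      destination |= 1;  // Quadrant B (assumed top-right)
    } else if (r < a + b + c) {
      source |= 1;  // Quadrant C (assumed bottom-left)
    } else {
      source |= 1;  // Quadrant D (assumed bottom-right)
      destination |= 1;
    }
  }
  return {source, destination};
}

// Generates `edge_count` sampled edges; when `both_directions` is set,
// each sampled edge is followed by its reverse (an assumption of this sketch).
std::vector<sketch_edge> sketch_rmat_edges(uint64_t seed, uint64_t scale,
                                           uint64_t edge_count, double a,
                                           double b, double c,
                                           bool both_directions) {
  std::mt19937_64 rng(seed);
  std::vector<sketch_edge> edges;
  for (uint64_t i = 0; i < edge_count; ++i) {
    const sketch_edge edge = sketch_rmat_edge(rng, scale, a, b, c);
    edges.push_back(edge);
    if (both_directions) edges.emplace_back(edge.second, edge.first);
  }
  return edges;
}
```

With the defaults (A = 0.57, B = C = 0.19), quadrant A is chosen most often at every level, which is what skews the degree distribution toward a few high-degree vertices.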


## Ingest Edge List (construct CSR graph)

./ingest_edge_list -g /l/ssd/csr_graph_file /l/ssd/edgelist1 /l/ssd/edgelist2

* Load edge data from files /l/ssd/edgelist1 and /l/ssd/edgelist2 (you can specify an arbitrary number of files).
* This is a multi-threaded (OpenMP) program.
You can control the number of threads using the environment variable OMP_NUM_THREADS.
* Each line of the input files must be a pair of source and destination vertex IDs (unsigned 64-bit numbers).
* The graph is written to /l/ssd/csr_graph_file.
* As for real-world datasets, [SNAP Datasets](http://snap.stanford.edu/data/index.html) is a very popular collection in the graph processing community.
Please note that some datasets in SNAP are formatted a little differently.
For example, the first line may be a comment; you have to delete that line before running this program.
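
As a rough illustration of what "construct CSR graph" means, the following sketch builds a compressed sparse row structure from an in-memory directed edge list. It is not the ingest_edge_list implementation (which is not part of this diff); the type and function names are hypothetical, and the sketch assumes vertex IDs are already in the range [0, num_vertices).

```cpp
#include <cassert>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical CSR layout: the neighbors of vertex v are
// edges[index[v]] .. edges[index[v + 1] - 1].
struct sketch_csr_graph {
  std::vector<uint64_t> index;  // num_vertices + 1 offsets
  std::vector<uint64_t> edges;  // destination IDs, grouped by source
};

sketch_csr_graph sketch_build_csr(
    uint64_t num_vertices,
    const std::vector<std::pair<uint64_t, uint64_t>> &edge_list) {
  sketch_csr_graph graph;
  graph.index.assign(num_vertices + 1, 0);

  // Pass 1: count the out-degree of every source vertex
  for (const auto &edge : edge_list) {
    assert(edge.first < num_vertices && edge.second < num_vertices);
    ++graph.index[edge.first + 1];
  }

  // Prefix sum turns the per-vertex counts into start offsets
  for (uint64_t v = 0; v < num_vertices; ++v) {
    graph.index[v + 1] += graph.index[v];
  }

  // Pass 2: place each destination into its source vertex's slot
  graph.edges.resize(edge_list.size());
  std::vector<uint64_t> cursor(graph.index.begin(), graph.index.end() - 1);
  for (const auto &edge : edge_list) {
    graph.edges[cursor[edge.first]++] = edge.second;
  }
  return graph;
}
```

Because only the source vertex receives an entry, an undirected graph needs both (u, v) and (v, u) in the input, which is why the generator's -u option matters for the ingest step.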

## Run BFS
./run_bfs -n [number of vertices] -m [number of edges] -g /path/to/graph_file

* You can get the number of vertices and the number of edges by running ingest_edge_list.
* This is a multi-threaded (OpenMP) program.
You can control the number of threads using the environment variable OMP_NUM_THREADS.
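
To make the BFS step concrete, here is a sequential, level-synchronous BFS over CSR arrays. It is a sketch only: run_bfs itself is an OpenMP program whose source is not shown in this diff, and the function name and array layout below are assumptions chosen for illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>
#include <vector>

// Returns the BFS level of every vertex, or the maximum uint64_t value for
// vertices that are unreachable from `root`.
// `index` holds num_vertices + 1 offsets into `edges` (assumed CSR layout).
std::vector<uint64_t> sketch_bfs_levels(const std::vector<uint64_t> &index,
                                        const std::vector<uint64_t> &edges,
                                        uint64_t root) {
  assert(!index.empty());
  const uint64_t num_vertices = index.size() - 1;
  assert(root < num_vertices);

  const uint64_t unvisited = std::numeric_limits<uint64_t>::max();
  std::vector<uint64_t> level(num_vertices, unvisited);
  std::vector<uint64_t> frontier{root};
  level[root] = 0;

  // Expand one level at a time until no new vertices are discovered
  for (uint64_t depth = 1; !frontier.empty(); ++depth) {
    std::vector<uint64_t> next_frontier;
    for (const uint64_t vertex : frontier) {
      for (uint64_t i = index[vertex]; i < index[vertex + 1]; ++i) {
        const uint64_t neighbor = edges[i];
        if (level[neighbor] == unvisited) {
          level[neighbor] = depth;
          next_frontier.push_back(neighbor);
        }
      }
    }
    frontier.swap(next_frontier);
  }
  return level;
}
```

The level-by-level structure is what makes this formulation easy to parallelize: all vertices in one frontier can be expanded independently, provided that updates to `level` are synchronized.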
93 changes: 93 additions & 0 deletions tests/bfs/compute_degree_distribution.cpp
@@ -0,0 +1,93 @@
/*
This file is part of UMAP. For copyright information see the COPYRIGHT
file in the top level directory, or at
https://github.com/LLNL/umap/blob/master/COPYRIGHT
This program is free software; you can redistribute it and/or modify it under
the terms of the GNU Lesser General Public License (as published by the Free
Software Foundation) version 2.1 dated February 1999. This program is
distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY;
without even the IMPLIED WARRANTY OF MERCHANTABILITY or FITNESS FOR A PARTICULAR
PURPOSE. See the terms and conditions of the GNU Lesser General Public License
for more details. You should have received a copy of the GNU Lesser General
Public License along with this program; if not, write to the Free Software
Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
*/

#include <cstdint>
#include <cstdlib>
#include <iostream>
#include <fstream>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>
#include <algorithm>

/// This is a utility program to compute a degree distribution.
/// This program treats the input files as a directed graph,
/// i.e., it counts the out-degree of each source vertex.
/// Usage:
/// ./compute_degree_distribution [out file name] [edge list file names]
int main(int argc, char **argv) {
  if (argc < 3) {
    std::cerr << "Wrong number of arguments" << std::endl;
    std::abort();
  }

  std::string out_file_name(argv[1]);

  // -- Count degree -- //
  std::unordered_map<uint64_t, uint64_t> degree_table;
  for (int i = 2; i < argc; ++i) {
    std::ifstream input_edge_list(argv[i]);
    if (!input_edge_list.is_open()) {
      std::cerr << "Cannot open " << argv[i] << std::endl;
      continue;
    }

    uint64_t source;
    uint64_t destination;
    while (input_edge_list >> source >> destination) {
      if (degree_table.count(source) == 0) {
        degree_table[source] = 0;
      }
      ++degree_table[source];
    }
  }

  // -- Compute degree distribution table -- //
  std::unordered_map<uint64_t, uint64_t> degree_dist_table;
  for (const auto &item : degree_table) {
    const uint64_t degree = item.second;
    if (degree_dist_table.count(degree) == 0) {
      degree_dist_table[degree] = 0;
    }
    ++degree_dist_table[degree];
  }

  // -- Sort the degree distribution table -- //
  std::vector<std::pair<uint64_t, uint64_t>> sorted_degree_dist_table;
  for (const auto &item : degree_dist_table) {
    const uint64_t degree = item.first;
    const uint64_t count = item.second;
    sorted_degree_dist_table.emplace_back(degree, count);
  }
  std::sort(sorted_degree_dist_table.begin(), sorted_degree_dist_table.end(),
            [](const std::pair<uint64_t, uint64_t> &lh, const std::pair<uint64_t, uint64_t> &rh) {
              return (lh.first < rh.first); // Sort in the ascending order of degree
            });

  // -- Dump the sorted degree distribution table -- //
  std::ofstream ofs(out_file_name);
  if (!ofs.is_open()) {
    std::cerr << "Cannot open " << out_file_name << std::endl;
    std::abort();
  }
  ofs << "Degree\tCount" << std::endl;
  for (const auto &item : sorted_degree_dist_table) {
    const uint64_t degree = item.first;
    const uint64_t count = item.second;
    ofs << degree << " " << count << "\n";
  }
  ofs.close();

  std::cout << "Finished degree distribution computation" << std::endl;

  return 0;
}
7 changes: 7 additions & 0 deletions tests/bfs/rmat_edge_generator/CMakeLists.txt
@@ -0,0 +1,7 @@
set(CMAKE_EXE_LINKER_FLAGS "${CMAKE_EXE_LINKER_FLAGS}")

add_executable(generate_edge_list generate_edge_list.cpp)
install(TARGETS generate_edge_list
        LIBRARY DESTINATION lib
        ARCHIVE DESTINATION lib/static
        RUNTIME DESTINATION bin)
120 changes: 120 additions & 0 deletions tests/bfs/rmat_edge_generator/generate_edge_list.cpp
@@ -0,0 +1,120 @@
/*
This file is part of UMAP. For copyright information see the COPYRIGHT
file in the top level directory, or at
https://github.com/LLNL/umap/blob/master/COPYRIGHT
This program is free software; you can redistribute it and/or modify it under
the terms of the GNU Lesser General Public License (as published by the Free
Software Foundation) version 2.1 dated February 1999. This program is
distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY;
without even the IMPLIED WARRANTY OF MERCHANTABILITY or FITNESS FOR A PARTICULAR
PURPOSE. See the terms and conditions of the GNU Lesser General Public License
for more details. You should have received a copy of the GNU Lesser General
Public License along with this program; if not, write to the Free Software
Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
*/

#include <unistd.h>
#include <cstdint>
#include <cstdlib>
#include <iostream>
#include <fstream>
#include <string>
#include <vector>

#include "rmat_edge_generator.hpp"

// ---------------------------------------- //
// Option
// ---------------------------------------- //
struct rmat_option_t {
  uint64_t seed{123};
  uint64_t vertex_scale{17};
  uint64_t edge_count{(1ULL << 17) * 16};
  double a{0.57};
  double b{0.19};
  double c{0.19};
  bool scramble_id{true};
  bool generate_both_directions{true};
};

bool parse_options(int argc, char **argv, rmat_option_t *option, std::string *out_edge_list_file_name) {
  int p;
  while ((p = getopt(argc, argv, "o:s:v:e:a:b:c:r:u:")) != -1) {
    switch (p) {
      case 'o':
        *out_edge_list_file_name = optarg; // required
        break;

      case 's':
        option->seed = std::stoull(optarg);
        break;

      case 'v':
        option->vertex_scale = std::stoull(optarg);
        break;

      case 'e':
        option->edge_count = std::stoull(optarg);
        break;

      case 'a':
        option->a = std::stod(optarg);
        break;

      case 'b':
        option->b = std::stod(optarg);
        break;

      case 'c':
        option->c = std::stod(optarg);
        break;

      case 'r':
        option->scramble_id = static_cast<bool>(std::stoi(optarg));
        break;

      case 'u':
        option->generate_both_directions = static_cast<bool>(std::stoi(optarg));
        break;

      default:
        std::cerr << "Illegal option" << std::endl;
        std::abort();
    }
  }

  if (out_edge_list_file_name->empty()) {
    std::cerr << "edge list file name (-o option) is required" << std::endl;
    std::abort();
  }

  std::cout << "seed: " << option->seed
            << "\nvertex_scale: " << option->vertex_scale
            << "\nedge_count: " << option->edge_count
            << "\na: " << option->a
            << "\nb: " << option->b
            << "\nc: " << option->c
            << "\nscramble_id: " << static_cast<int>(option->scramble_id)
            << "\ngenerate_both_directions: " << static_cast<int>(option->generate_both_directions)
            << "\nout_edge_list_file_name: " << *out_edge_list_file_name << std::endl;

  return true;
}

// ---------------------------------------- //
// Main
// ---------------------------------------- //
int main(int argc, char **argv) {

  rmat_option_t rmat_option;
  std::string out_edge_list_file_name;
  parse_options(argc, argv, &rmat_option, &out_edge_list_file_name);

  rmat_edge_generator rmat(rmat_option.seed, rmat_option.vertex_scale, rmat_option.edge_count,
                           rmat_option.a, rmat_option.b, rmat_option.c,
                           1.0 - (rmat_option.a + rmat_option.b + rmat_option.c),
                           rmat_option.scramble_id, rmat_option.generate_both_directions);

  std::ofstream edge_list_file(out_edge_list_file_name);
  if (!edge_list_file.is_open()) {
    std::cerr << "Cannot open " << out_edge_list_file_name << std::endl;
    std::abort();
  }

  for (auto edge : rmat) {
    edge_list_file << edge.first << " " << edge.second << "\n";
  }
  edge_list_file.close();

  std::cout << "Finished edge list generation" << std::endl;

  return 0;
}