Skip to content
FLEET: Butterfly Estimation from a Bipartite Graph Stream
C++ CMake
Branch: master
Clone or download
Fetching latest commit…
Cannot retrieve the latest commit at this time.
Permalink
Type Name Latest commit message Commit time
Failed to load latest commit information.
media
source
CMakeLists.txt
README.md
in.in

README.md

FLEET: Butterfly Estimation from a Bipartite Graph Stream

Paper

You may find our paper on ArXiv: https://pdfs.semanticscholar.org/0f05/3a75fbba33ec765faa723d258d9fe87f9813.pdf

Our paper is published in CIKM2019: https://dl.acm.org/citation.cfm?id=3357983

Please use the following BiBTeX for citing our paper:
@inproceedings{sanei2019fleet,
  title={FLEET: Butterfly Estimation from a Bipartite Graph Stream},
  author={Sanei-Mehri, Seyed-Vahid and Zhang, Yu and Sariy{\"u}ce, Ahmet Erdem and Tirthapura, Srikanta},
  booktitle={Proceedings of the 28th ACM International Conference on Information and Knowledge Management},
  pages={1201--1210},
  year={2019},
  organization={ACM}
}

Abstract

We consider space-efcient single-pass estimation of the number of butterflies, a fundamental bipartite graph motif, from a massive bipartite graph stream where each edge represents a connection between entities in two different partitions. We present a space lower bound for any streaming algorithm that can estimate the number of butterflies accurately, as well as FLEET, a suite of algorithms for accurately estimating the number of butterflies in the graph stream. Estimates returned by the algorithms come with provable guarantees on the approximation error, and experiments show good tradeoffs between the space used and the accuracy of approximation. We also present space-efcient algorithms for estimating the number of butterflies within a sliding window of the most recent elements in the stream. While there is a signifcant body of work on counting subgraphs such as triangles in a unipartite graph stream, our work seems to be one of the few to tackle the case of bipartite graph streams.

Four butterflies in the entire graph G.

Set up for processing a graph stream.

Compiling and Running the Code

To compile the code, make sure two prerequisite software already be installed on your computer:
  • Make tools: we use CMake for managing the build process, version ≥ 2.8.
  • C++ Compiler: g++ or visual studio C++, support C++11 standard.

Our code includes the file CMakeLists.txt for program build, please use the following steps to compile and build to generate executable program:

  • In the main folder of this repo, type “cmake .” (twice).
  • Run “make”, there will generate executable program named ”fleet”.
  • Run “./fleet [option]” to start, where option is the algorithm name. For example, use command “./fleet Fleet1” to run the Fleet1 algorithm. Please refer main.cpp for all the algorithm names.

Parameter Settings

Our program provides a user-friendly way to interactively input the parameters. After execute the “fleet” command, please check the printed messages to type in the parameters. The parameter γ is the sub-sampling probability – please refer to the paper for the impact of this parameter. A good value for this parameter is 0.75. The parameter M is the reservoir size – please refer to the experiments section of the paper to get an idea of the ranges of M that lead to good relative errors for the datasets that we have used.

For sequence-based sliding window, there is a parameter called “power of γ ”, which sets the sampling probability based on the ratio of the memory budget to the window size M/W . For example, when M/W is 5%, set this parameter to be 28, as 0.9 28 ≈ 0.05. Note that this code uses a default value of 0.9 for γ.

Our Previous Work on Butterfly Counting in Static Bipartite Graphs

You may find our paper on ArXiv: https://arxiv.org/pdf/1801.00338v4.pdf

This paper is published in KDD2018: https://dl.acm.org/citation.cfm?id=3220097

Please use the following BiBTeX for citing this paper:

@inproceedings{sanei2018butterfly,
  title={Butterfly Counting in Bipartite Networks},
  author={Sanei-Mehri, Seyed-Vahid and Sariyuce, Ahmet Erdem and Tirthapura, Srikanta},
  booktitle={Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery \& Data Mining},
  pages={2150--2159},
  year={2018},
  organization={ACM}
}

You can’t perform that action at this time.