Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

(GraphX): better partitioning strategies #38

Open
hucheng opened this issue Aug 26, 2015 · 3 comments
Open

(GraphX): better partitioning strategies #38

hucheng opened this issue Aug 26, 2015 · 3 comments

Comments

@hucheng
Copy link
Contributor

hucheng commented Aug 26, 2015

There are four partitioning strategies in GraphX:

  1. random hash
  2. edge1D (src or dst)
  3. edgePartition2D

Besides, we also implemented:

  1. DBH (Degree-Based Hashing)
  2. balanced label propagation from Facebook (http://stanford.edu/~jugander/papers/wsdm13-blp.pdf and https://code.facebook.com/posts/274771932683700/large-scale-graph-partitioning-with-apache-giraph/)
  3. Bounded and Balanced Partitioner (two stages, edges belongs to vertex partition that has larger degree, and a re-balanced partitioner, details later. )
@bhoppi
Copy link
Contributor

bhoppi commented Sep 11, 2015

BBR has been tested. Experimental result suggests that BBR is slower than DBH when both partitioners work, but when the topic number is large or it has too many partitions, DBH can't work while BBR works well.
Label propagation partitioner has basically close performance with BBR, but it has two weaknesses: 1. It is an iterative algorithm so it need re-partition again and again so it is slow; 2. It needs transfer a P * P matrix in each iteration which P is the partition number. When P is large, the rdd.collect() call can be very slow and cause the driver OOM.

@hucheng
Copy link
Contributor Author

hucheng commented Sep 12, 2015

Good.

@benmccann
Copy link
Contributor

Can you make these part of PartitionStrategy instead of implementing in a separate package? Seems like they belong in the graphx2 package...

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants