
[FLINK-2634] [gelly] Added a vertex-centric Triangle Count Lib Method #1105

Closed
wants to merge 1 commit

Conversation

andralungu
Contributor

This PR adds a vertex-centric version of the TriangleCount algorithm as discussed in #1054.

The main difference is that the reduceOn* calls have been replaced with groupReduceOn* calls.

Depending on the input graph, one may need to use the GAS version (skewed networks) or the vertex-centric version.

* Tuple1 which contains a single integer representing the number of triangles.
*/
public class TriangleCount implements
GraphAlgorithm<Long, NullValue, NullValue, DataSet<Tuple1<Integer>>> {
Contributor

Why not return Integer instead of Tuple1<Integer>?

Contributor

Why is the algorithm typed to Long keys? Shouldn't it work with any key type?
Couldn't we also allow any vertex and edge value type and replace them internally with NullValue?

Contributor Author

For the Tuple1 comment: I wanted to be consistent with the other examples. If you run it on a cluster, you'd like the result written to a file. You could print it with regular Java functions, but writeAsCsv and print were more appealing (to me at least) :)

For the key type: At some point, you check that your id is lower than the neighbors' ids... How would that comparison be made if you don't know the type? If, let's say, you received a String, you'd need to do Long.parseLong... If you know a way, I'd gladly change it, but I don't think this is the only library method that has the key type problem... Even ConnectedComponents uses Long. Maybe we can open a JIRA for that...

Contributor

The key must implement the org.apache.flink.types.Key interface. Otherwise, you could not group, join, or sort on this type. Key implements Comparable, so you can simply do key1.compareTo(key2).
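
For illustration, a minimal sketch of such a generic comparison (the class and helper names below are hypothetical, not from the PR):

```java
// Illustrative only: comparing generic vertex ids via Comparable instead of
// assuming Long. The helper below is hypothetical, not part of the PR.
public class KeyComparisonSketch {

    // True if vertexId is strictly lower than neighborId, for any comparable key.
    static <K extends Comparable<K>> boolean hasLowerId(K vertexId, K neighborId) {
        return vertexId.compareTo(neighborId) < 0;
    }

    public static void main(String[] args) {
        System.out.println(hasLowerId(1L, 2L));   // true
        System.out.println(hasLowerId("b", "a")); // false, Strings work as well
    }
}
```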

Contributor

DataSet<Integer>.writeAsText() prints the result of Integer.toString(). print() also works for Integer.
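
A minimal sketch of that, assuming a placeholder value and output path:

```java
import org.apache.flink.api.java.DataSet;
import org.apache.flink.api.java.ExecutionEnvironment;

public class IntegerSinkSketch {
    public static void main(String[] args) throws Exception {
        ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();

        // Placeholder for the triangle count produced by the algorithm.
        DataSet<Integer> count = env.fromElements(42);

        // Each Integer is written via Integer.toString(); no Tuple1 wrapper needed.
        count.writeAsText("/tmp/triangle-count");
        env.execute("write triangle count");
    }
}
```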

@fhueske
Contributor

fhueske commented Sep 11, 2015

I did not go through the code in detail. Maybe I am missing something, but isn't this algorithm doing pretty much the same thing as the example we already have in the DataSet API (Java + Scala)? The existing example (EnumTrianglesOpt) uses 3 reduces and 1 join to enumerate all triangles. We would need one more reduce to compute the number of triangles, which makes 4 reduces and 1 join.
In your code you are using 5 reduces and 2 joins, if I counted correctly.

Are those two algorithms doing different things or could you basically port the existing code to Gelly?
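
Roughly, the extra counting step mentioned above could look like the following sketch; the triangles input and its Tuple3<Long, Long, Long> element type are assumptions for illustration, not the actual EnumTrianglesOpt output:

```java
import org.apache.flink.api.common.functions.MapFunction;
import org.apache.flink.api.common.functions.ReduceFunction;
import org.apache.flink.api.java.DataSet;
import org.apache.flink.api.java.tuple.Tuple3;

public class TriangleCountingSketch {

    // Turns an enumerated-triangles DataSet into a single count with one more reduce.
    static DataSet<Integer> countTriangles(DataSet<Tuple3<Long, Long, Long>> triangles) {
        return triangles
            .map(new MapFunction<Tuple3<Long, Long, Long>, Integer>() {
                @Override
                public Integer map(Tuple3<Long, Long, Long> triangle) {
                    return 1; // one per enumerated triangle
                }
            })
            .reduce(new ReduceFunction<Integer>() {
                @Override
                public Integer reduce(Integer left, Integer right) {
                    return left + right; // sum into a total count
                }
            });
    }
}
```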

@andralungu
Contributor Author

The only argument I have for this particular code (as opposed to the one in the DataSet API) is that this one simulates what Pregel is doing... But since it's a library method, it should be used as is, without caring about the model. IMO, the discussion in #1054 asked for this particular version, but if there is a consensus I can change it to use the more efficient one... or even close the PR and point people to the DataSet API...

@fhueske
Contributor

fhueske commented Sep 11, 2015

I think having TriangleCounting (and TriangleEnumeration as well) would be very good additions for Gelly. As such, they should be implemented as efficiently as possible. As I said, I did not look into the code in detail, so I do not know exactly what is happening inside, but the DataSet example is pretty straightforward and quite efficient AFAIK.

@andralungu
Contributor Author

If it's okay with you, I'd like to see what @vasia has to say about adding the Triangle Count example from the DataSet API + a reduce as a library method. IMO, it's a better addition, but for some reason, we preferred the Pregel/GSA implementations at a certain point (it was because in my thesis, I take Pregel as a baseline).

Also the generic K instead of Long makes perfect sense. However, if we decide to change it, I'll have to open a JIRA to revise the entire set of library methods because apart from PageRank, I think they all restrict themselves to one common key type. I would be kind of scared of type erasure in the K implements Key case :-S

@fhueske
Contributor

fhueske commented Sep 13, 2015

Sure, always good to get more input :-)
Btw. the DataSet implementation is a Flink port of a MapReduce algorithm described in this article: http://lintool.github.io/UMD-courses/bigdata-2015-Spring/content/Cohen_2009.pdf

@vasia
Contributor

vasia commented Sep 14, 2015

Hi,

I agree with @fhueske that we should consider porting the existing DataSet example to the Gelly library. The algorithm is a bit different and it's not very clear (at least to me) which one would be the most efficient. The DataSet implementation applies the optimization that grouping happens on vertices with low degrees. The one we currently have in Gelly (and this one) applies the optimization that the vertex with the lowest ID will be the one detecting the triangle.

@andralungu, we have chosen these implementations for your thesis in order to test a specific method and prove a point. This doesn't necessarily mean that this is the most efficient way to implement triangle counting. Ideally, we should analyze the complexity of each implementation and run experiments to determine which one is best to add to the library.

Also, we should definitely support any Key type and value types and we should update existing library methods that assume a particular type for no good reason.
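
As a hedged illustration of the degree-based optimization mentioned above (plain Java, not the DataSet example's actual code; the helper name and types are hypothetical):

```java
import java.util.Map;

public class DegreeOrderingSketch {

    // Hypothetical helper: reorients an undirected edge (u, v) so that it points
    // from the lower-degree endpoint to the higher-degree one (ties broken by id).
    // Building triads only at the edge sources then concentrates the work on
    // low-degree vertices, which is the skew-friendly idea of the DataSet example.
    static long[] orientByDegree(long u, long v, Map<Long, Integer> degree) {
        int du = degree.getOrDefault(u, 0);
        int dv = degree.getOrDefault(v, 0);
        boolean uFirst = du < dv || (du == dv && u < v);
        return uFirst ? new long[]{u, v} : new long[]{v, u};
    }
}
```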

@fhueske
Contributor

fhueske commented Sep 14, 2015

As I said, I did not look at the actual implementation of the PR besides counting expensive operators.
Since we have implementations for both, we could simply run a few benchmarks to get an idea of the trade-offs. Of course these will be just a few points in the whole space but maybe better than nothing...

@andralungu
Contributor Author

Hey you two :)

I suggest that we add this one.
Let the JIRA that @vasia opened handle the keys.
Then add the DataSet example (another JIRA) and a JIRA for benchmarks.

I think that would more or less make everyone happy.

@fhueske
Contributor

fhueske commented Sep 14, 2015

Fair enough, we can stick with your implementation. I was only guessing about the performance implications. I did not check the code in detail though.

@vasia
Contributor

vasia commented Sep 14, 2015

Sounds good to me! A benchmark would be great, since I can also only guess about the performance ;)

@andralungu
Contributor Author

I've created the two JIRA issues: FLINK-2714 and FLINK-2715.

@vasia, if there are no other objections, can you have another look at this and merge it?

* This algorithm operates in three phases. First, vertices select neighbors with id greater than theirs
* and send messages to them. Each received message is then propagated to neighbors with higher id.
* Finally, if a node encounters the target id in the list of received messages, it increments the number
* of triangles found.
Contributor

Can we add a link to the paper (or some other resource) where this algorithm is described?
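
For readers without a reference at hand, here is a plain-Java sketch of the id-ordering idea described above; it is only an illustrative analogue, not the PR's vertex-centric message-passing code:

```java
import java.util.Collections;
import java.util.Map;
import java.util.Set;

public class TriangleCountIdeaSketch {

    // Counts triangles in an undirected graph given as vertex -> neighbor set.
    // A triangle (a, b, c) with a < b < c is counted exactly once, from its
    // smallest-id vertex a: a looks only at higher-id neighbors b (phase 1),
    // follows b only to higher-id neighbors c (phase 2), and checks whether
    // the closing edge a-c exists (phase 3).
    static long count(Map<Long, Set<Long>> adjacency) {
        long triangles = 0;
        for (Map.Entry<Long, Set<Long>> entry : adjacency.entrySet()) {
            long a = entry.getKey();
            for (long b : entry.getValue()) {
                if (b <= a) {
                    continue;
                }
                for (long c : adjacency.getOrDefault(b, Collections.<Long>emptySet())) {
                    if (c <= b) {
                        continue;
                    }
                    if (entry.getValue().contains(c)) {
                        triangles++;
                    }
                }
            }
        }
        return triangles;
    }
}
```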

@vasia
Contributor

vasia commented Sep 20, 2015

Hey @andralungu,

I left some minor comments. Apart from those, I think we should also address the previous comments on (a) returning an Integer instead of a Tuple1, (b) generic vertex keys, and (c) allowing any vertex and edge value type and replacing them internally with NullValue.
I know that other library methods also have these problems (and there's a JIRA), but this PR is open, so we can at least fix them for this one.

@andralungu
Contributor Author

Hi,

I agree with the issues raised in this discussion. They most definitely should be addressed, but unfortunately, I cannot do it. I am handling many bureaucratic tasks these days and the next months will be a bit hectic.

This being said, can someone please pick the issue up? I've never had to do this before, so I don't know how it works. Do I close the PR or leave it open and hope that someone will continue on it?

Thanks!

@vasia
Contributor

vasia commented Sep 23, 2015

Hi @andralungu,

if you're not planning to continue on this PR, then I believe you should close it. However, you might want to keep your local branch, in case someone else wants to pick up where you left off.

This is not an urgent issue, it's a nice-to-have feature, so you can come back to it later :)
If you think that it might take too much time, then it might also be a good idea to un-assign the JIRA.

Thanks!
