-
Notifications
You must be signed in to change notification settings - Fork 13k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[FLINK-3768] [gelly] Clustering Coefficient #1896
Conversation
aa1141f
to
6cfd98f
Compare
Hi @greghogan, |
It's certainly nicer to review small PRs, but I also like to present the big picture and context in which the features will be used. What if I leave this here, break out the dependencies into new PRs, and then rebase this PR down if those features are accepted? I'm not quite sure what a design document would cover. There's not much sophisticated here as with SG or GSA. |
|
||
import java.text.NumberFormat; | ||
|
||
public class TriangleListing { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could add JavaDoc to ALL new Java or Scala classes added to source? Explanation why and how it is interact with other code is something we are looking for. Thanks.
Ideally, each PR should make a single change and reference a single JIRA issue. If you could break this into smaller changes, that'd be great. Otherwise, it would really help if you could give a more detailed description of changes and additions. Thanks! |
9489061
to
306390e
Compare
I have rebased this PR against the newly committed dependent features so it should be good for discussion. |
Please address my comments about adding Javadoc header info to ALL new Java class added to source repo. I understand some of the classes are just for examples, but would be good to also have Javdoc explaining why the examples are added. |
|
||
import java.text.NumberFormat; | ||
|
||
public class LocalClusteringCoefficient { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I know this is just an example, but would be nice to add Javadoc information to give high level information why this class exists.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
8a58b41
to
3c31601
Compare
Thanks for the docs update. |
If there are no further review comments I'll look at merging this today. |
3c31601
to
1fdb056
Compare
Hi @greghogan, |
The local clustering coefficient measures the connectedness of each vertex's neighborhood. Scores range from 0.0 (no edges between neighbors) to 1.0 (neighborhood is a clique).
a76bb32
to
453e642
Compare
453e642
to
dd42d7b
Compare
The local clustering coefficient measures the connectedness of each vertex's neighborhood. Scores range from 0.0 (no edges between neighbors) to 1.0 (neighborhood is a clique). This closes apache#1896
Provides an algorithm for local clustering coefficient and dependent functions for degree annotation, algorithm caching, and graph translation.
I worked to improve the performance of
TriangleEnumerator
. Perhaps the API has changed sinceEdge.reverse()
is not in-place and the edges were not being sorted by degree. TheJoinHint
is also important so that theTriad
s are not spilled to disk.On an AWS ec2.4xlarge (16 vcores, 30 GiB) I am seeing for the following timings of 5s, 29s, and 183s for
TriangleListing
. WithTriangleEnumerator
the timings are 7s, 45s, and 281s. Without theJoinHint
the latterTriangleEnumerator
timings are 58s and 347s.The command I had used to run the tests: