-
Notifications
You must be signed in to change notification settings - Fork 13.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[FLINK-5104] Bipartite graph validation #2985
Conversation
I noticed that public Boolean validate(GraphValidator<K, VV, EV> validator) throws Exception {
return validator.validate(this);
} What is the reason for returning |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@mushketyk, with a few comments for your review. I would change Graph.validate
to return boolean unboxed as we aren't promissing ABI compatibility and the use of Boolean
is vestigial from an old implementation which returned DataSet<Boolean>
.
* limitations under the License. | ||
*/ | ||
|
||
package org.apache.flink.graph.validation; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Move to org.apache.flink.graph.bipartite.validation
?
DataSet<KT> invalidTopIds = invalidIds(bipartiteGraph.getTopVertices(), edgesTopIds); | ||
DataSet<KB> invalidBottomIds = invalidIds(bipartiteGraph.getBottomVertices(), edgesBottomIds); | ||
|
||
return invalidTopIds.count() == 0 && invalidBottomIds.count() == 0; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
count()
executes the program so shouldn't be called twice. Could use an accumulator in RichOutputFormat
instead.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since these two datasets have different types I couldn't figure out how to use accumulators and RichOutputFormat
to run collect()
only once, but I've replaced it with two outer joins and one count()
call.
Is it a feasible approach?
} | ||
|
||
@ForwardedFields("f0") | ||
private class GetTopIdsMap<KT, KB, EV> implements MapFunction<BipartiteEdge<KT,KB,EV>, Tuple1<KT>> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Inner Flink functions should be static
to prevent serialization of the outer class.
|
||
@Override | ||
public boolean validate(BipartiteGraph<KT, KB, VVT, VVB, EV> bipartiteGraph) throws Exception { | ||
DataSet<Tuple1<KT>> edgesTopIds = bipartiteGraph.getEdges().map(new GetTopIdsMap<KT, KB, EV>()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can GetTopIdsMap
be replaced with .<Tuple1<KT>>project(0)
?
Hi @greghogan , Thank you for your feedback. I've updated the PR accordingly. The only thing that I did differently is that I've replaced projections and two |
@mushketyk the build is failing due to unused imports (IntelliJ can be configured to automatically remove these). There are so many ways to implement the validator. Projections would reduce the data to be transmitted and sorted. I like to document when we are performing a different kind of join (here, an anti-join) for the day when these are available in Flink. I don't know how counting in the We'll need to rebase this PR once FLINK-5311 has been committed. |
Sorry for the failing build. I've removed the unused imports so it should be fine now. |
@greghogan Thank you for the review. |
@mushketyk I had meant just with a simple comment noting that with an anti-join we could also remove the FlatJoinFunction. |
87ee795
to
eea815c
Compare
Rebased this PR on top of |
I'm closing this as "Abandoned", since there is no more activity and the code base has moved on quite a bit. Please re-open this if you feel otherwise and work should continue. |
Thanks for contributing to Apache Flink. Before you open your pull request, please take the following check list into consideration.
If your changes take all of the items into account, feel free to open your pull request. For more information and/or questions please refer to the How To Contribute guide.
In addition to going through the list, please provide a meaningful description of your changes.
General
Documentation
Tests & Build
mvn clean verify
has been executed successfully locally or a Travis build has passedThis PR duplicates some the user documentation for the graph added in #2984 but I think this won't be an issue when documentation PR is pushed to the
master
.