Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[FLINK-2785] [gelly] implements fromCsvReader for gelly-scala #1205

Closed
wants to merge 1 commit into from

Conversation

vasia
Copy link
Contributor

@vasia vasia commented Oct 1, 2015

This is the last method missing from the Scala Graph implementation.
In order to be as close as possible to the Java implementation, I implemented one method where different options are given with optional parameters. This resulted into the method exceeding the allowed max number of parameters, as defined in the scala checkstyle. I decided to disable the checkstyle for this particular method, because most of the parameters are optional and are there mainly for completeness. In the common case, a user would only give the path and delimiters. If you think there's a better way to do this without having to disable the checkstyle, let me know!

* Creates a Graph with from a CSV file of vertices and a CSV file of edges
*
* @param pathVertices The file path containing the vertices.
* @param vertexValue Defines whether the vertices have associated values.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about we rename this parameter to sth like readVertices :Boolean and making it mandatory.
If readVertices is set, pathVertices must be specified. It also implies that there are vertex values (otherwise it would make no sense to read a vertex file, right?).

@fhueske
Copy link
Contributor

fhueske commented Oct 6, 2015

Hi, you might have noticed, I got a bit confused about the vertexValue parameter and how it affects the graph creation. I made a proposal inline to rename that parameter and making it mandatory. Let me know what you think. The documentation should discuss the vertexValue in more detail.

Otherwise the PR looks good.

@vasia
Copy link
Contributor Author

vasia commented Oct 6, 2015

Thanks so much for the review @fhueske!
I agree with your comments. I will rename the parameter to readVertices, make it mandatory, and better explain it on the docs. I also like the suggestion for edgeValue -> hasEdgeValues.

@chiwanpark
Copy link
Member

I'm worrying about the name of method fromCsvReader. It seems that the method should receive CsvReader object and read graph data from the reader object. Why don't we use other name such as fromCsvFile?

@vasia
Copy link
Contributor Author

vasia commented Oct 6, 2015

Hey @chiwanpark,
this is how the Java method works, i.e. creates a CsvReader object and returns the graph when calling the appropriate types methods. I only kept this name in Scala to be consistent with the Java API.

@chiwanpark
Copy link
Member

Okay, I understand it.

// This method exceeds the max allowed number of parameters -->
def fromCsvReader[K: TypeInformation : ClassTag, VV: TypeInformation : ClassTag,
EV: TypeInformation : ClassTag](
pathVertices: String = null,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we change the order of the parameters and move the most common once first? This will allow to specify some parameters without names. How about:

  • env (mandatory)
  • pathEdges (mandatory)
  • readVertices
  • pathVertices
  • hasEdgeValues
  • ... (the others in the current order)

@fhueske
Copy link
Contributor

fhueske commented Oct 7, 2015

Thanks for the fast update! Good to merge.
I added a comment about the parameter order that could be addressed if you agree.

@vasia
Copy link
Contributor Author

vasia commented Oct 7, 2015

Great, thanks! I will re-order the parameters and merge.

@vasia
Copy link
Contributor Author

vasia commented Oct 7, 2015

There was an error in RandomSamplerTest in one of the builds. I'll open a JIRA and merge this.

@asfgit asfgit closed this in 47b5cb7 Oct 7, 2015
lofifnc pushed a commit to lofifnc/flink that referenced this pull request Oct 8, 2015
@vasia vasia deleted the gelly-readCsv branch October 19, 2015 12:29
cfmcgrady pushed a commit to cfmcgrady/flink that referenced this pull request Oct 23, 2015
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
4 participants