[Scala Pipeline API] Add scala logisticRegression api for spark pipeline #70
Can an Admin verify this patch? |
thanks @Wenpei; @niketanpansare could you please take a look. |
Hi Wenpei, Thanks for your contribution. But I believe your code does exactly the same thing as https://github.com/apache/incubator-systemml/blob/master/src/main/java/org/apache/sysml/api/ml/LogisticRegression.java. That class is fully compatible with the Scala API and won't require a separate jar. Maybe I am missing something. Is there any particular reason why you think we should create wrappers in Scala instead? Thanks, Niketan.
@niketanpansare So I propose a Scala version as an example, since the Java version has some weaknesses.
Wenpei. |
@Wenpei Btw, I am OK with adding a Scala API to incubator-systemml if the following conditions are met:
Thanks, Niketan. |
I'm in favor of replacing the existing Java APIs (LogisticRegression, MLContext, etc.) with Scala versions in order to alleviate the types of issues raised here that are inherent with user-facing APIs. We could simply place these Scala APIs in So, currently our To alleviate build issues, we can just integrate Scala compilation into Maven. A quick search shows that the scala-maven-plugin can do just this by simply adding a few lines to the pom.xml, as shown in 1. Eclipse integration appears to also be easy by simply adding a few more lines to the pom.xml, as shown at the bottom of 2. Then, a simply This route would also be great in preparation for SYSTEMML-451, which aims to add DML as an embedded Scala DSL.
@dusenberrymw +1 on creating a Scala API for LogisticRegression, but I am not sure whether we need to convert MLContext to Scala. The scala-maven-plugin solution looks good for resolving condition 1. The flag "cTestGoals" should ideally allow us to run JUnit tests 1. Wrt condition 2, I had a bad experience with Scala IDE during the initial prototyping of the Spark backend, especially for mixed Scala/Java projects (e.g., code completion/refactoring/references). This was about a year and a half ago, and maybe it has gotten better. Before we make the switch, let's double-check the IDE support on both Windows and Mac. Then @Wenpei can push this commit to src/main/scala.
@niketanpansare Great! The goal for replacing MLContext as well would relate to the issue of default parameters. There are currently 18 variations of Also, I can't speak for Eclipse, but IntelliJ works great for Spark (which is now a mixed Scala/Java project) as well as SystemML. |
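Editorial sketch of the default-parameter point above: in Scala, one method with default values can stand in for a family of Java-style overloads. The names here (`ScriptRunner`, `execute`, and its parameters) are hypothetical and chosen only for illustration; this is not the MLContext API and makes no claim about which overloads it actually has.

```scala
// Hypothetical stand-in for an API with several optional arguments.
// This is NOT SystemML's MLContext; all names are illustrative assumptions.
object ScriptRunner {
  // A single definition with defaults replaces one overload per argument combination.
  def execute(
      scriptPath: String,
      args: Map[String, String] = Map.empty,
      parsePyDML: Boolean = false,
      configPath: Option[String] = None): String = {
    val lang = if (parsePyDML) "pydml" else "dml"
    val conf = configPath.getOrElse("default")
    // Return a description of the call instead of running anything.
    s"$scriptPath [$lang] args=${args.size} config=$conf"
  }
}

object DefaultParamsDemo {
  def main(argv: Array[String]): Unit = {
    // Callers name only the arguments they want to override.
    println(ScriptRunner.execute("LogReg.dml"))
    println(ScriptRunner.execute("LogReg.dml", parsePyDML = true))
    println(ScriptRunner.execute("LogReg.dml", Map("X" -> "in.csv"), configPath = Some("my-config.xml")))
  }
}
```

With four parameters, three of them optional, a Java API would need up to eight overloads to offer the same call shapes; named arguments also keep call sites readable when several flags share a type.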
@niketanpansare @dusenberrymw
I am not very good at Maven configuration (pom.xml), so could anyone help review this? Besides this, SystemML has a folder like "/systemml/src/test_suites/java"; we may need to change it to "/systemml/src/test/java" and add "systemml/src/test/scala" to follow the plugin's conventions. Regards.
@Wenpei Will look into this today. |
@Wenpei Thanks for updating the POM file.
|
@Wenpei Please do the following things before making the switch:
|
Well, I don't have a strong opinion on the wrapper code. The only hard requirement is that it has to work nicely with eclipse - no matter what. |
@Wenpei I was not able to reproduce the second issue. |
@niketanpansare I change flag in pom.xml to I will
|
Thanks @Wenpei, I really appreciate the effort :) |
@niketanpansare I have finished the instructions and the test suite. For the instructions, I have just left them there. @deroneriksson may want to add them to http://apache.github.io/incubator-systemml/ as part of the contributor instructions; that can be addressed in another PR. Regards.
Jenkins test this. |
test this please. |
@Wenpei Please check https://sparktc.ibmcloud.com/jenkins/job/SystemML-PullRequestBuilder/185/console and fix the following errors:
|
I will rebase this PR on the latest code.
Thanks :) |
@niketanpansare I rebased the PR on the latest code. Please launch the tests again.
@akchinSTC test this please. |
@niketanpansare @mboehm7 @gweidner |
Thanks @gweidner for looking into it :) |
I have added the Eclipse version to @gweidner Can you paste a screenshot of the failure here? Was it a class-not-found error or some other error?
Thank you @Wenpei for the information. When using the latest scala ide sdk (4.3.0 downloaded from http://scala-ide.org/download/current.html), I've observed two sets of errors: |
@gweidner I did some testing today with Scala IDE and Eclipse Luna.
For Eclipse Luna: If this works for you, I can add it as tips to the instruction doc.
Thank you WenPei for your time and tips. That did work except for Project-Clean... build. The configuration that worked best for me was Luna + Scala IDE 4.0 plugin + maven scala connector. That combination had no Eclipse build type issues shown in previous captures and full Project-Clean build was successful. Note my testing was on Windows only. |
@gweidner @deroneriksson @niketanpansare |
@gweidner @deroneriksson Do you have any more concerns for this PR ? @Wenpei Can you please send an email with the tips on dev mailing list with a link to this PR ? |
@niketanpansare @gweidner @mboehm7 |
Yes this works with Luna (my preferred IDE). I'm preparing a documentation PR with specific Eclipse version information and tips for cases where developers prefer not to include Scala |
@niketanpansare I sent an email to the dev mailing list with links to the PR and the tips.
 * under the License.
 */

package org.apache.sysml.api.ml.scala
Ideally, this should just be in the org.apache.sysml.api.ml package, as it shouldn't matter to the end-user if the source was written in Scala.
It causes a build error, since we would have both the Java and Scala LogisticRegression in the same package.
So I added the Scala version to a separate folder.
Okay, it might be better to move the Java version to an org.apache.sysml.api.ml.java folder (or possibly remove it), and move this file to org.apache.sysml.api.ml in order to encourage other Scala contributions to the API.
cc @niketanpansare Thoughts?
It might be good to go ahead and merge this and then work out any additional updates in separate JIRAs/PRs since the conversation count is already 50+.
Yeah, that would be fine as well. @Wenpei Can you go ahead and make a JIRA for this as well?
@Wenpei Awesome work here! I left a couple of comments, the latter of which would be particularly interesting to explore (and may even be an item for a subsequent JIRA instead). Aside from that, this should definitely be able to be merged.
Also, just to track, I just pulled this into IntelliJ, and there are no issues encountered thus far. |
Jenkins, test this please. |
Build passed: https://sparktc.ibmcloud.com/jenkins/job/SystemML-PullRequestBuilder/210/ Two JIRA issues have been submitted. This PR is ready for merge.
Great, LGTM. I'll merge this today. |
I've opened SYSTEMML-580 to reference this PR, and I've set the Reporter and Assignee to you, @Wenpei. |
This adds a Scala version of the LogisticRegression Spark ML pipeline API, as well as Scala build support for the project, effectively turning the project into a mixed Scala/Java project. Closes apache#70.
PR merged. Thanks for the hard work, @Wenpei! |
Also, to anyone interested, there are several follow-up JIRA issues attached to SYSTEMML-580. |
1. Main DML script (mice_linearReg.dml) 2. Java test file (BuiltinMiceLinearRegTest.java) 3. DML test script (mice_linearRegression.dml) Closes #70.
I wrote a Scala ML pipeline wrapper for the LogisticRegression model as an example for Scala users.
I think those APIs (Java, Python, Scala) should be put into separate jars, but this time I just followed the existing layout.
Regards.
Wenpei.
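Editorial sketch of the estimator/model wrapper shape that a pipeline API of this kind follows. It assumes nothing about the merged class: `SimpleEstimator`, `SimpleModel`, `LogReg`, and `LogRegModel` are hypothetical stand-ins, not Spark's `Estimator`/`Model` and not the SystemML wrapper. The training loop is plain stochastic gradient descent, swapped in so the example runs without Spark or a DML script.

```scala
// Stand-in traits that mimic the fit -> model -> predict contract of an ML pipeline stage.
// They are NOT Spark's classes; a real wrapper would build on Spark ML's own types.
trait SimpleModel {
  def predict(features: Array[Double]): Double
}

trait SimpleEstimator[M <: SimpleModel] {
  def fit(data: Seq[(Array[Double], Double)]): M
}

// Fitted model: holds learned weights and thresholds the sigmoid at 0.5.
final class LogRegModel(val weights: Array[Double]) extends SimpleModel {
  def predict(features: Array[Double]): Double = {
    val z = weights.zip(features).map { case (w, x) => w * x }.sum
    if (1.0 / (1.0 + math.exp(-z)) >= 0.5) 1.0 else 0.0
  }
}

// Estimator: hyperparameters are constructor arguments with defaults.
final class LogReg(val maxIter: Int = 100, val stepSize: Double = 0.1)
    extends SimpleEstimator[LogRegModel] {

  def fit(data: Seq[(Array[Double], Double)]): LogRegModel = {
    require(data.nonEmpty, "training data must not be empty")
    val dim = data.head._1.length
    val w = Array.fill(dim)(0.0)
    // Plain SGD on the logistic loss (an assumption made for this sketch only).
    for (_ <- 0 until maxIter; (x, y) <- data) {
      val z = w.zip(x).map { case (a, b) => a * b }.sum
      val p = 1.0 / (1.0 + math.exp(-z))
      for (j <- 0 until dim) w(j) += stepSize * (y - p) * x(j)
    }
    new LogRegModel(w)
  }
}
```

A caller would train with `new LogReg(maxIter = 50).fit(rows)` and then call `predict` on the returned model; the first feature can be fixed at 1.0 to act as an intercept.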