Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-15605] [ML] [Examples] Fix broken ML JavaDeveloperApiExample. #13353

Closed
wants to merge 2 commits into from

Conversation

yanboliang
Copy link
Contributor

@yanboliang yanboliang commented May 27, 2016

What changes were proposed in this pull request?

See SPARK-15605 for the detail of this bug. This PR fix 2 major bugs in this example:

  • The java example class use Param maxIter, it will fail when calling Param.shouldOwn. We need add a public method which return the maxIter Object. Because Params.params use java reflection to list all public method whose return type is Param, and invoke them to get all defined param objects in the instance.
  • The uid member defined in Java class will be initialized after Scala traits such as HasFeaturesCol. So when HasFeaturesCol being constructed, they get uid with null, which will cause Param.shouldOwn check fail.

so, here is my changes:

  • Add public method:
    public IntParam getMaxIterParam() {return maxIter;}
  • Use Java anonymous class overriding uid() to defined the uid, and it solve the second problem described above.
  • To make the getMaxIterParam can be invoked using java reflection, we must make the two class (MyJavaLogisticRegression and MyJavaLogisticRegressionModel) public. so I make them become inner public static class.

How was this patch tested?

Offline tests.

@SparkQA
Copy link

SparkQA commented May 27, 2016

Test build #59478 has finished for PR 13353 at commit 742cc12.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@yanboliang
Copy link
Contributor Author

cc @mengxr @jkbradley

@jkbradley
Copy link
Member

I don't think the answer is to remove it; the example used to work. We should identify which PR broke it and try to fix it. I'd guess it's some change to Params or another abstraction.

@yanboliang yanboliang changed the title [SPARK-15605] [ML] [Examples] Remove JavaDeveloperApiExample [SPARK-15605] [ML] [Examples] Fix broken ML JavaDeveloperApiExample. Jun 1, 2016
@SparkQA
Copy link

SparkQA commented Jun 1, 2016

Test build #59736 has finished for PR 13353 at commit 92317a7.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • // where index i corresponds to class i (i = 0, 1).

@yanboliang
Copy link
Contributor Author

@jkbradley Yes, I have found another way to fix this issue. Please let me know whether this fix is ok to you.

* Param for max number of iterations
* <p>
* NOTE: The usual way to add a parameter to a model or algorithm is to include:
* - val myParamName: ParamType
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is kinda confusing for a Java example having Scala code, no?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As discussed with @jkbradley , we will remove this example for 2.0. I will address this comments when we add this example back.

@jkbradley
Copy link
Member

jkbradley commented Jun 1, 2016

Thanks for exploring this! I had hoped we could figure out how to fix the abstractions. However, if it is not feasible in a short time, then maybe we should remove the example and add it back in later once we can fix the abstractions. I'll spend some more time trying to track down the change in the abstractions which broke things.

Update: I guess it's that params never worked, but shouldOwn was introduced and was called from set, thereby breaking set. After more exploring, this API is really broken for Java. I'd be OK with removing this example for now and targeting a fix for 2.1.

What do you think?

@yanboliang
Copy link
Contributor Author

yanboliang commented Jun 2, 2016

@jkbradley I found the main problem is that the uid defined in Java class will be initialized after Scala traits such as HasFeaturesCol. So when HasFeaturesCol being constructed, they get uid with null, which will cause Param.shouldOwn check fail. I think moving the uid as constructor argument of Classifier will solve this problem. But it involves public API changes for Classifier, Regressor, ProbabilisticClassifier, etc and should be further defined, I agree to remove this for now.
I will track this issue continuous and send my proposal of fix to JIRA for discussion. I think we can find an appropriate way to fix it at 2.1 and adding this example back.

@yanboliang
Copy link
Contributor Author

yanboliang commented Jun 2, 2016

@jkbradley @MLnick I removed this example now and it's ready to be get in. I think we can left the JIRA open and target to 2.1. Please feel free to assign it to me for further work, I have some ideas in my mind.

@SparkQA
Copy link

SparkQA commented Jun 2, 2016

Test build #59844 has finished for PR 13353 at commit b6e5f28.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@srowen
Copy link
Member

srowen commented Jun 2, 2016

LGTM @yanboliang do you want to try committing this?

@yanboliang
Copy link
Contributor Author

@srowen Please help to merge it. I have not set up commit environment completely and I will try to get it work in a few days. Thanks!

@srowen
Copy link
Member

srowen commented Jun 2, 2016

Merged to master/2.0 but the JIRA is not closed

asfgit pushed a commit that referenced this pull request Jun 2, 2016
## What changes were proposed in this pull request?
See [SPARK-15605](https://issues.apache.org/jira/browse/SPARK-15605) for the detail of this bug. This PR fix 2 major bugs in this example:
* The java example class use Param ```maxIter```, it will fail when calling ```Param.shouldOwn```. We need add a public method which return the ```maxIter``` Object. Because ```Params.params``` use java reflection to list all public method whose return type is ```Param```, and invoke them to get all defined param objects in the instance.
* The ```uid``` member defined in Java class will be initialized after Scala traits such as ```HasFeaturesCol```. So when ```HasFeaturesCol``` being constructed, they get ```uid``` with null, which will cause ```Param.shouldOwn``` check fail.

so, here is my changes:
* Add public method:
```public IntParam getMaxIterParam() {return maxIter;}```

* Use Java anonymous class overriding ```uid()``` to defined the ```uid```, and it solve the second problem described above.
* To make the ```getMaxIterParam ``` can be invoked using java reflection, we must make the two class (MyJavaLogisticRegression and MyJavaLogisticRegressionModel) public. so I make them become inner public static class.

## How was this patch tested?
Offline tests.

Author: Yanbo Liang <ybliang8@gmail.com>

Closes #13353 from yanboliang/spark-15605.

(cherry picked from commit 4fe7c7b)
Signed-off-by: Sean Owen <sowen@cloudera.com>
@asfgit asfgit closed this in 4fe7c7b Jun 2, 2016
@jkbradley
Copy link
Member

@yanboliang Thanks for tracking down the issue. With your suggested fix, we could leave the existing no-argument constructors for compatibility but deprecate them. Since few people have complained about this so far, I'd be OK with fixing it now or in 2.1.

@yanboliang yanboliang deleted the spark-15605 branch June 3, 2016 02:31
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
5 participants