[SPARK-29363][MLLIB] Make o.a.s.regression.Regressor public #26033

Closed
wants to merge 1 commit into from


zero323
Member

@zero323 zero323 commented Oct 5, 2019

What changes were proposed in this pull request?

  • Removal of private[ml] modifier from Regressor.
  • Marking Regressor as @DeveloperApi.

Why are the changes needed?

Consistency with the rest of ML API as described in the corresponding JIRA ticket.

Does this PR introduce any user-facing change?

Yes, as described above.

How was this patch tested?

Existing tests.

@SparkQA

SparkQA commented Oct 5, 2019

Test build #111805 has finished for PR 26033 at commit dadf1a8.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Oct 5, 2019

Test build #111806 has finished for PR 26033 at commit 1f42e82.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Member

@dongjoon-hyun dongjoon-hyun left a comment

To me, this looks reasonable because it makes Regressor and RegressionModel, which live in the same file, consistent.

cc @srowen , @viirya , @jiangxb1987

@dongjoon-hyun dongjoon-hyun added MLLIB and removed ML labels Oct 5, 2019
@dongjoon-hyun dongjoon-hyun changed the title [SPARK-29363][ML] Make o.a.s.regression.Regressor public [SPARK-29363][MLLIB] Make o.a.s.regression.Regressor public Oct 5, 2019
Member

@viirya viirya left a comment

Besides consistency, is there any benefit?

Member

@srowen srowen left a comment

Agree, I think this was an oversight.

@dongjoon-hyun
Member

dongjoon-hyun commented Oct 6, 2019

Thank you for the review, @srowen and @viirya.
Merged to master.

@viirya, this at least makes it one of the official developer APIs.

@dongjoon-hyun
Member

Since SPARK-29363 is marked as an Improvement, I merged this to master only, @zero323 .

@viirya
Member

viirya commented Oct 6, 2019

OK, sounds good. This is not a bug fix, so I think merging only to master is enough.

@zero323
Member Author

zero323 commented Oct 6, 2019

Since SPARK-29363 is marked as an Improvement, I merged this to master only, @zero323 .

Makes perfect sense @dongjoon-hyun, thank you!

@zero323
Member Author

zero323 commented Oct 6, 2019

Besides consistency, is there any benefit?

@viirya I believe there is (I tried to make this point in the linked JIRA, as well as in the somewhat related SPARK-29212), though I guess I am not that good at pointing it out.

Regressor and similarly positioned abstract classes have very little purely technical value: from the point of view of simple composition, they can simply be replaced by Predictor / PredictionModel. However, they provide very useful information about the meaning of the types that extend them.

Let's imagine for a moment that I want to build ensemble learners. Clearly, an ensemble for regression problems will expect a different type of base learners than one for classification problems:

class EnsembleRegressor extends Estimator[EnsembleRegressor] with ... {
  def setModels(learners: Array[Regressor]): this.type
  ...
}

class EnsembleClassifier extends Estimator[EnsembleClassifier] with ...  {
  def setModels(learners: Array[Classifier]): this.type
  ...
}
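A minimal, Spark-free sketch of the same idea, assuming hypothetical stand-in classes (Predictor, Regressor, Classifier, MeanRegressor, LinearRegressor, ThresholdClassifier, EnsembleRegressor) that only mirror the shape of the ml hierarchy. They are not Spark's API: the real classes are generic, F-bounded, and fit on DataFrames. The point it illustrates is that Regressor adds no members over Predictor, yet typing setModels with it lets the compiler reject a classifier.

```scala
// Hypothetical stand-ins, NOT Spark's API: Spark's Predictor/Regressor/Classifier
// are generic, F-bounded classes that fit on DataFrames.
abstract class Predictor {
  // Fit on one-dimensional data and return a prediction function.
  def fit(xs: Seq[Double], ys: Seq[Double]): Double => Double
}

// Marker subclasses: no new members, only information about the problem type.
abstract class Regressor extends Predictor
abstract class Classifier extends Predictor

// Predicts the mean label regardless of the input.
final class MeanRegressor extends Regressor {
  def fit(xs: Seq[Double], ys: Seq[Double]): Double => Double = {
    val mean = ys.sum / ys.size
    _ => mean
  }
}

// Ordinary least squares with a single feature and an intercept.
final class LinearRegressor extends Regressor {
  def fit(xs: Seq[Double], ys: Seq[Double]): Double => Double = {
    val xMean = xs.sum / xs.size
    val yMean = ys.sum / ys.size
    val cov = xs.zip(ys).map { case (x, y) => (x - xMean) * (y - yMean) }.sum
    val varX = xs.map(x => (x - xMean) * (x - xMean)).sum
    val slope = cov / varX
    val intercept = yMean - slope * xMean
    x => intercept + slope * x
  }
}

// Predicts 1.0 above a fixed threshold and 0.0 otherwise (ignores training data).
final class ThresholdClassifier(threshold: Double) extends Classifier {
  def fit(xs: Seq[Double], ys: Seq[Double]): Double => Double =
    x => if (x > threshold) 1.0 else 0.0
}

// Averages base-learner predictions, which only makes sense for regression,
// so the parameter type is Regressor rather than Predictor.
final class EnsembleRegressor {
  private var learners: Seq[Regressor] = Seq.empty

  def setModels(models: Seq[Regressor]): this.type = {
    learners = models
    this
  }

  def fit(xs: Seq[Double], ys: Seq[Double]): Double => Double = {
    require(learners.nonEmpty, "no base learners set")
    val fitted = learners.map(_.fit(xs, ys))
    x => fitted.map(f => f(x)).sum / fitted.size
  }
}

val ensemble = new EnsembleRegressor().setModels(Seq(new MeanRegressor, new LinearRegressor))
val predict = ensemble.fit(Seq(1.0, 2.0, 3.0), Seq(2.0, 4.0, 6.0))
println(predict(4.0)) // 6.0

// Does not compile: ThresholdClassifier is a Predictor, but not a Regressor.
// new EnsembleRegressor().setModels(Seq(new ThresholdClassifier(0.5)))
```

On the toy data the mean learner predicts 4.0 and the least-squares learner predicts 8.0 at x = 4.0, so the ensemble returns their average. If setModels accepted Seq[Predictor] instead, the commented-out line would compile and silently average class labels with regression outputs.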

So having these classes public allows users to write more precise code without hacks like putting third-party code in the o.a.s namespace.

Obviously, the benefits are rather limited, as the dynamic nature of DataFrame leaks all over the place, but at the same time there is no harm in giving users more precise tools to express their intentions.

@zero323 zero323 deleted the SPARK-29363 branch October 6, 2019 08:26
@zero323
Member Author

zero323 commented Oct 6, 2019

Thank you @dongjoon-hyun, @srowen, @viirya.
