Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-20249][ML][PYSPARK] Add summary for LinearSVCModel #17586

Closed
wants to merge 2 commits into from

Conversation

zjffdu
Copy link
Contributor

@zjffdu zjffdu commented Apr 10, 2017

What changes were proposed in this pull request?

Add summary for LinearSVCModel so that user can get the training process status, such as loss value of each iteration and total iteration number.

How was this patch tested?

Tested manually in both spark example and pyspark example.

@zjffdu
Copy link
Contributor Author

zjffdu commented Apr 10, 2017

I didn't add metrics like roc for this summary yet, I can add it if it is necessary.

@SparkQA
Copy link

SparkQA commented Apr 10, 2017

Test build #75645 has started for PR 17586 at commit 368af49.

@SparkQA
Copy link

SparkQA commented Apr 10, 2017

Test build #75648 has finished for PR 17586 at commit d44bec1.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • case class LinearSVCTrainingSummary(
  • class LinearSVCTrainingSummary(JavaWrapper):

@zjffdu
Copy link
Contributor Author

zjffdu commented Apr 10, 2017

@hhbyyh @jkbradley Please help review.

@SparkQA
Copy link

SparkQA commented Apr 10, 2017

Test build #75655 has finished for PR 17586 at commit 0daa041.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • case class LinearSVCTrainingSummary(
  • class LinearSVCTrainingSummary(JavaWrapper):

Copy link
Contributor

@hhbyyh hhbyyh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the PR. Not sure whether this will get merged before 2.2 release. Better check with @jkbradley

@@ -287,6 +290,16 @@ class LinearSVCModel private[classification] (
@Since("2.2.0")
def setWeightCol(value: Double): this.type = set(threshold, value)

private var trainingSummary: LinearSVCTrainingSummary = _
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would use Option[] here to be consistent with other implementation.

// Obtain the objective per iteration.
val objectiveHistory = trainingSummary.objectiveHistory
println("objectiveHistory:")
objectiveHistory.foreach(loss => println(loss))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please also update Java example

* Abstraction for Linear SVC Training results.
* Currently, the training summary ignores the training weights except
* for the objective trace.
*/
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure why we didn't include the weight column in the summary. Maybe add an option[String] ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

weight column also is not included in LogisticRegressionTrainingSummary, should I add that as well ?

@@ -355,6 +368,19 @@ object LinearSVCModel extends MLReadable[LinearSVCModel] {
}

/**
* Abstraction for Linear SVC Training results.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we don't need to add the abstraction here.

* Currently, the training summary ignores the training weights except
* for the objective trace.
*/
case class LinearSVCTrainingSummary(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

minor: I would put the summary related code at the bottom of this file.

Copy link
Contributor Author

@zjffdu zjffdu Apr 12, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The classes below LinearSVCTrainingSummary are private classes, so I think it would be better to keep LinearSVCTrainingSummary there (above the private classes)

this
}

def summary: LinearSVCTrainingSummary = trainingSummary
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since, and maybe some comments like in python

@@ -172,6 +172,47 @@ def intercept(self):
"""
return self._call_java("intercept")

@property
@since("2.2.0")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this targeting 2.2 ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes

@SparkQA
Copy link

SparkQA commented Apr 12, 2017

Test build #75718 has finished for PR 17586 at commit b34603c.

  • This patch fails Python style tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Apr 12, 2017

Test build #75721 has finished for PR 17586 at commit 6d1e5fa.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Copy link
Contributor

@hhbyyh hhbyyh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

generally looks fine. But I think we probably would need to add some unit tests.

*/
@Since("2.2.0")
def summary: LinearSVCTrainingSummary = trainingSummary.getOrElse(
throw new SparkException("No training summary available for this LinearSVCModel")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know other places has the same code here. But I don't think throwing an exception here is proper. If summary is not defined, we should simply return none, or keep this field as Option[LinearSVCTrainingSummary] to make that expected by the users. Need to confirm with committers.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These should be marked Since 3.0.0 too.

"""
.. note:: Experimental

Abstraction for LinearSVC Training results.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

remove the abstraction here.

@SparkQA
Copy link

SparkQA commented Apr 14, 2017

Test build #75801 has finished for PR 17586 at commit 7f20fc8.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@MLnick
Copy link
Contributor

MLnick commented May 16, 2017

This won't be in 2.2 just given time frames. So should we expand the summary to incorporate stats similar to the other classification summaries such as LogReg? I should think we can try to re-use and consolidate the code between them?

@sethah
Copy link
Contributor

sethah commented May 16, 2017

@MLnick There was some discussion here and also on the JIRA for that pr. We definitely want to design it carefully so it's easy to share code.

Also see: https://gist.github.com/sethah/83c57fd77385979579cb44f3d5730e67

@MLnick
Copy link
Contributor

MLnick commented May 17, 2017

Thanks @sethah. It seems #15435 doesn't quite yet implement a generic trait hierarchy for summaries that could be used here. I'd say this PR should ideally wait for that trait hierarchy to be properly defined and implemented, and it should then be pretty straightforward to do the summary for LinearSVC based off that.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
7 participants