-
Notifications
You must be signed in to change notification settings - Fork 28.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SPARK-20249][ML][PYSPARK] Add summary for LinearSVCModel #17586
Conversation
I didn't add metrics like roc for this summary yet, I can add it if it is necessary. |
Test build #75645 has started for PR 17586 at commit |
Test build #75648 has finished for PR 17586 at commit
|
@hhbyyh @jkbradley Please help review. |
Test build #75655 has finished for PR 17586 at commit
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the PR. Not sure whether this will get merged before 2.2 release. Better check with @jkbradley
@@ -287,6 +290,16 @@ class LinearSVCModel private[classification] ( | |||
@Since("2.2.0") | |||
def setWeightCol(value: Double): this.type = set(threshold, value) | |||
|
|||
private var trainingSummary: LinearSVCTrainingSummary = _ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would use Option[] here to be consistent with other implementation.
// Obtain the objective per iteration. | ||
val objectiveHistory = trainingSummary.objectiveHistory | ||
println("objectiveHistory:") | ||
objectiveHistory.foreach(loss => println(loss)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please also update Java example
* Abstraction for Linear SVC Training results. | ||
* Currently, the training summary ignores the training weights except | ||
* for the objective trace. | ||
*/ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure why we didn't include the weight column in the summary. Maybe add an option[String] ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
weight column also is not included in LogisticRegressionTrainingSummary
, should I add that as well ?
@@ -355,6 +368,19 @@ object LinearSVCModel extends MLReadable[LinearSVCModel] { | |||
} | |||
|
|||
/** | |||
* Abstraction for Linear SVC Training results. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we don't need to add the abstraction here.
* Currently, the training summary ignores the training weights except | ||
* for the objective trace. | ||
*/ | ||
case class LinearSVCTrainingSummary( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
minor: I would put the summary related code at the bottom of this file.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The classes below LinearSVCTrainingSummary
are private classes, so I think it would be better to keep LinearSVCTrainingSummary
there (above the private classes)
this | ||
} | ||
|
||
def summary: LinearSVCTrainingSummary = trainingSummary |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since, and maybe some comments like in python
@@ -172,6 +172,47 @@ def intercept(self): | |||
""" | |||
return self._call_java("intercept") | |||
|
|||
@property | |||
@since("2.2.0") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is this targeting 2.2 ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes
Test build #75718 has finished for PR 17586 at commit
|
Test build #75721 has finished for PR 17586 at commit
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
generally looks fine. But I think we probably would need to add some unit tests.
*/ | ||
@Since("2.2.0") | ||
def summary: LinearSVCTrainingSummary = trainingSummary.getOrElse( | ||
throw new SparkException("No training summary available for this LinearSVCModel") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I know other places has the same code here. But I don't think throwing an exception here is proper. If summary is not defined, we should simply return none, or keep this field as Option[LinearSVCTrainingSummary] to make that expected by the users. Need to confirm with committers.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These should be marked Since 3.0.0 too.
python/pyspark/ml/classification.py
Outdated
""" | ||
.. note:: Experimental | ||
|
||
Abstraction for LinearSVC Training results. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
remove the abstraction here.
Test build #75801 has finished for PR 17586 at commit
|
This won't be in 2.2 just given time frames. So should we expand the summary to incorporate stats similar to the other classification summaries such as LogReg? I should think we can try to re-use and consolidate the code between them? |
@MLnick There was some discussion here and also on the JIRA for that pr. We definitely want to design it carefully so it's easy to share code. Also see: https://gist.github.com/sethah/83c57fd77385979579cb44f3d5730e67 |
Thanks @sethah. It seems #15435 doesn't quite yet implement a generic trait hierarchy for summaries that could be used here. I'd say this PR should ideally wait for that trait hierarchy to be properly defined and implemented, and it should then be pretty straightforward to do the summary for |
What changes were proposed in this pull request?
Add summary for
LinearSVCModel
so that user can get the training process status, such as loss value of each iteration and total iteration number.How was this patch tested?
Tested manually in both spark example and pyspark example.