[SPARK-29095][ML] add extractInstances#25802
[SPARK-29095][ML] add extractInstances#25802zhengruifeng wants to merge 4 commits intoapache:masterfrom
Conversation
| } | ||
|
|
||
| /** | ||
| * Extract [[labelCol]], weightCol(if any) and [[featuresCol]] from the given dataset, |
There was a problem hiding this comment.
I place it in PredictorParam so that methods like GBTModel.evaluateEachIteration can reuse it in the future.
|
Test build #110632 has finished for PR 25802 at commit
|
|
retest this please |
|
Test build #110717 has finished for PR 25802 at commit
|
|
Test build #110721 has finished for PR 25802 at commit
|
|
Test build #110743 has finished for PR 25802 at commit
|
|
retest this please |
|
Test build #110748 has finished for PR 25802 at commit
|
|
friendly ping @srowen |
srowen
left a comment
There was a problem hiding this comment.
As mostly refactoring, seems OK. One question below.
| } else { | ||
| lit(1.0) | ||
| } | ||
| case _ => lit(1.0) |
There was a problem hiding this comment.
If it doesn't have a weight column, does it mean there's no point in selecting lit(1.0) as a weight column as it will be unused? or do some algorithms not have a weight column but nevertheless have ways of using a weight?
There was a problem hiding this comment.
You are right, if an alg do not have weightCol, it should not deal with weighting.
So, what about raising an exception instead of assign it to 1.0?
There was a problem hiding this comment.
I think an error will occur elsewhere? is it necessary to handle it here vs just not making an empty col?
There was a problem hiding this comment.
Since this method will only be called internally, so I think it is update to the developers to decide whether to use it or not. If an algorithm (like GBT) do not support weighting now, it can use existing extractLabeledPoints instead.
There was a problem hiding this comment.
OK, I suppose I'm just concerned about the small overhead of adding an unused column.
You're saying that it's up to implementations to call the method they need, one with weights or not? yeah I agree, and they will call the right method in this change? If true, then do you even need this check? it will already fail (correctly) if the code is calling the wrong method.
There was a problem hiding this comment.
OK, I will remove this line. Impls that do not support weighting call this method should fail.
|
Test build #111212 has finished for PR 25802 at commit
|
| if (isDefined(p.weightCol) && $(p.weightCol).nonEmpty) { | ||
| col($(p.weightCol)).cast(DoubleType) | ||
| } else { | ||
| lit(1.0) |
There was a problem hiding this comment.
Here too do you need a weight col, if the implementation doesn't support it (and shouldn't be calling this method)? or is it different?
There was a problem hiding this comment.
It is different from the above place. Even if a ML impl supports weighting, its weightCol is not necessary to be set, in this case, lit(1) is used implictly. Current all algs supporting weighting deal with weightCol in this way.
|
Merged to master |
What changes were proposed in this pull request?
common methods support extract weights
Why are the changes needed?
today more and more ML algs support weighting, add this method will make impls simple
Does this PR introduce any user-facing change?
no
How was this patch tested?
existing testsuites