[SPARK-39446][MLLIB][FOLLOWUP] Modify constructor of RankingMetrics class #36920
I think we could create a private RDD and then use it internally to minimize changes.
Something like this:
private val rdd = predictionAndLabels.map {
case (pred: Array[T], lab: Array[T]) => ...
case (pred: Array[T], lab: Array[T], rel: Array[Double]) => ...
}
Do you mean that we use rdd as a private variable whose type is RDD[(Array[T], Array[T], Array[Double])], and keep other methods almost the same?
yes, hope this can reduce modifications
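The suggestion above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the code from the PR: it uses a plain Seq[Product] in place of RDD[_ <: Product] so that it runs without Spark, fixes T to Int, and the names NormalizeInput, Triple, and normalize are hypothetical. How the elided match bodies were actually filled in is not shown in this thread; filling the missing relevance with an empty array is one possible choice.

```scala
object NormalizeInput {
  // Hypothetical alias for the normalized record: (predictions, labels, relevance).
  type Triple = (Array[Int], Array[Int], Array[Double])

  // Accepts records typed only as Product (as an RDD[_ <: Product] would carry)
  // and converts both supported tuple shapes into one uniform triple.
  def normalize(records: Seq[Product]): Seq[Triple] = records.map {
    case (pred: Array[Int], lab: Array[Int]) =>
      // Assumption: no relevance supplied, so use an empty array as a marker.
      (pred, lab, Array.empty[Double])
    case (pred: Array[Int], lab: Array[Int], rel: Array[Double]) =>
      (pred, lab, rel)
    case other =>
      throw new IllegalArgumentException(
        s"Expected a 2-tuple or 3-tuple of arrays, got: $other")
  }
}
```

With this shape, the remaining methods can read from the normalized triples only, which is the "minimize changes" point made above.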
Can one of the admins verify this patch?
Which do you think is better for ndcgAt?
- The previous one, where whether to use binary relevance is decided based on whether rel is an empty array.
- The current one, where whether to use binary relevance is decided directly by user input.
IMHO, the current one is easier to understand.
@zhengruifeng @srowen
Sorry to interrupt you, but which do you think is better?
I think the previous one may be more concise.
Thank you for your opinion.
Hm, I thought the current one might be more concise for developers, because it is easy to see that the calculation differs by input type (even though I wrote both of them).
Anyway, I changed this to the previous one.
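For readers following the ndcgAt discussion, here is a minimal sketch of the "previous" approach, where an empty rel switches the computation to binary relevance. It is not Spark's implementation: it works on plain arrays for a single query, fixes the item type to Int, assumes the predicted items are distinct, and uses the common 2^rel - 1 gain for graded relevance, which is an assumption on my part. The names NdcgSketch and discount are hypothetical.

```scala
object NdcgSketch {
  // Discount for 0-based rank i; the log base cancels in the DCG / ideal-DCG ratio.
  private def discount(i: Int): Double = 1.0 / math.log(i + 2.0)

  // NDCG@k for one query. rel(i) is the graded relevance of lab(i);
  // an empty rel means every labelled item has gain 1 (binary relevance).
  def ndcgAt(pred: Array[Int], lab: Array[Int], rel: Array[Double], k: Int): Double = {
    require(k > 0, "ranking position k should be positive")
    require(rel.isEmpty || rel.length == lab.length, "rel must be empty or match lab")
    val useBinary = rel.isEmpty
    // Gain per labelled item: 1.0 for binary, 2^rel - 1 for graded (assumed formula).
    val gains: Map[Int, Double] =
      if (useBinary) lab.map(item => item -> 1.0).toMap
      else lab.zip(rel).map { case (item, r) => item -> (math.pow(2.0, r) - 1.0) }.toMap
    // DCG of the predicted ranking, truncated at k.
    val dcg = pred.take(k).zipWithIndex.map { case (item, i) =>
      gains.getOrElse(item, 0.0) * discount(i)
    }.sum
    // Ideal DCG: the largest gains placed at the top ranks.
    val ascending = gains.values.toArray
    java.util.Arrays.sort(ascending)
    val idealDcg = ascending.reverse.take(k).zipWithIndex.map { case (g, i) =>
      g * discount(i)
    }.sum
    if (idealDcg == 0.0) 0.0 else dcg / idealDcg
  }
}
```

The alternative discussed above would replace the `rel.isEmpty` check with an explicit flag supplied by the caller; the rest of the computation would stay the same.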
Maybe we can rename this and the constructor arg; the constructor arg may also have relevance; this does not
- We may not want to change the name, because MulticlassMetrics also has an argument named predictionAndLabels.
spark/mllib/src/main/scala/org/apache/spark/mllib/evaluation/MulticlassMetrics.scala
Lines 28 to 35 in b588d07
/**
 * Evaluator for multiclass classification.
 *
 * @param predictionAndLabels an RDD of (prediction, label, weight, probability) or
 *                            (prediction, label, weight) or (prediction, label) tuples.
 */
@Since("1.1.0")
class MulticlassMetrics @Since("1.1.0") (predictionAndLabels: RDD[_ <: Product]) {
- We change the interface of the constructor to be able to support more inputs. Specific names like predictionAndLabelsWithOptionalRelevance may go against that goal.
OK that's reasonable
Why discard rel here?
If rdd is an RDD[(Array[T], Array[T], Array[Double])], then in ndcgAt we can simply check whether rel is empty.
Also, maybe we can rename rdd to something more meaningful.
WDYT @uchiiii ?
How about fullRDD?
IMO, names like predictionsAndLabelsAndRelevances are too long.
maybe predictionsLabelsRelevances?
Force-pushed from da3c694 to 6944723.
Merged to master
What changes were proposed in this pull request?
RDD[_ <: Product].
Why are the changes needed?
The previous code treats rel as an empty array when rel is not provided, which is not that beautiful. This change removes this.
Does this PR introduce any user-facing change?
NO
How was this patch tested?