Skip to content

Conversation

@uchiiii
Copy link
Contributor

@uchiiii uchiiii commented Jun 20, 2022

What changes were proposed in this pull request?

  • Merged the two constructor into one using RDD[_ <: Product].

Why are the changes needed?

  • To make code simpler.
  • To support even more inputs.
  • The previous code treats rel as an empty array when rel is not provided, which is not that beautiful. This change removes this.

Does this PR introduce any user-facing change?

NO

How was this patch tested?

@github-actions github-actions bot added the MLLIB label Jun 20, 2022
@zhengruifeng zhengruifeng changed the title [MLLIB] Modify constructor of RankingMetrics class [SPARK-39446][MLLIB][FOLLOWUP] Modify constructor of RankingMetrics class Jun 20, 2022
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we may create a private rdd, and than use it internally to minimize changes.

something like this:

private val rdd = predictionAndLabels.map {
    case (pred: Array[T], lab: Array[T]) => ...
    case (pred: Array[T], lab: Array[T], rel: Array[Double]) => ...
}

Copy link
Contributor Author

@uchiiii uchiiii Jun 20, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you mean that we use rdd as a private variable whose type is RDD[(Array[T], Array[T], Array[Double])], and keep other methods almost the same?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, hope this can reduce modifications

@AmplabJenkins
Copy link

Can one of the admins verify this patch?

Comment on lines 154 to 156
Copy link
Contributor Author

@uchiiii uchiiii Jun 20, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Which do you think is better for ndcgAt?

  • The previous one, where to use binary is decided based on whether rel is an empty array.
  • The current one, where to use binary is decided based on user input directly.

IMHO, the current one is easier to understand.

Copy link
Contributor Author

@uchiiii uchiiii Jun 22, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@zhengruifeng @srowen
Sorry to interrupt you, but which do you think is better?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think The previous one maybe more concise.

Copy link
Contributor Author

@uchiiii uchiiii Jun 22, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for your opinion.

Hm, I thought the current one may be more concise for the developers because you could easily understand the calculation process is different by input type (even though I wrote both of them).

Anyway, I changed this to the previous one.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we can rename this and the constructor arg; the constructor arg may also have relevance; this does not

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We may want not to change the name because

  • MulticlassMetrics also has the arguments whose name is predictionAndLabels.
    /**
    * Evaluator for multiclass classification.
    *
    * @param predictionAndLabels an RDD of (prediction, label, weight, probability) or
    * (prediction, label, weight) or (prediction, label) tuples.
    */
    @Since("1.1.0")
    class MulticlassMetrics @Since("1.1.0") (predictionAndLabels: RDD[_ <: Product]) {
  • We change the interface of the constructor to be able to support more inputs. Specific names like predictionAndLabelsWithOptionalRelevance may go against the goal.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK that's reasonable

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why discarding rel here?

If rdd is a RDD[Array[T], Array[T], Array[Double]], then in the ndcgAt, we can simply check whether rel is empty?

also, maybe we can rename rdd to a more meaningful name

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

WDYT @uchiiii ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about fullRDD?
IMO, Names like predictionsAndLabelsAndRelevances are too long.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe predictionsLabelsRelevances?

@uchiiii uchiiii force-pushed the modify_ranking_metrics branch from da3c694 to 6944723 Compare June 22, 2022 11:20
@srowen srowen closed this in f465a3d Jun 25, 2022
@srowen
Copy link
Member

srowen commented Jun 25, 2022

Merged to master

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants