imatiach-msft
Contributor

What changes were proposed in this pull request?

The evaluators BinaryClassificationEvaluator, RegressionEvaluator, and MulticlassClassificationEvaluator and the corresponding metrics classes BinaryClassificationMetrics, RegressionMetrics and MulticlassMetrics should use sample weight data.

As recommended, I've closed PR #16557 in favor of three separate pull requests, one for each of the evaluators (binary/regression/multiclass), to make them easier to review and update.

The updates to the regression metrics are based on the work for (with new changes made in response to comments):
https://issues.apache.org/jira/browse/SPARK-11520
("RegressionMetrics should support instance weights")
but that pull request was closed and its changes were never checked in.

How was this patch tested?

I added tests to the metrics class.
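
To illustrate what "using sample weight data" could mean for regression metrics, here is a minimal, Spark-free sketch. The names `Obs` and `WeightedRegressionMetrics` are hypothetical and are not the classes changed in this PR; the formulas are the usual weighted definitions, in which each squared or absolute error is multiplied by its instance weight and the total is divided by the sum of weights.

```scala
// Hypothetical sketch, not the Spark implementation: weighted regression
// metrics over an in-memory sequence. Assumes non-empty data whose weights
// are non-negative and sum to a positive value.
final case class Obs(prediction: Double, label: Double, weight: Double = 1.0)

object WeightedRegressionMetrics {
  /** Sum of all instance weights. */
  def weightSum(data: Seq[Obs]): Double = data.map(_.weight).sum

  /** Weighted sum of squared errors. */
  private def squaredErrorSum(data: Seq[Obs]): Double =
    data.map(o => o.weight * math.pow(o.label - o.prediction, 2)).sum

  /** Weighted mean squared error: sum(w * err^2) / sum(w). */
  def meanSquaredError(data: Seq[Obs]): Double =
    squaredErrorSum(data) / weightSum(data)

  /** Square root of the weighted mean squared error. */
  def rootMeanSquaredError(data: Seq[Obs]): Double =
    math.sqrt(meanSquaredError(data))

  /** Weighted mean absolute error: sum(w * |err|) / sum(w). */
  def meanAbsoluteError(data: Seq[Obs]): Double =
    data.map(o => o.weight * math.abs(o.label - o.prediction)).sum / weightSum(data)

  /** Weighted R^2: 1 - SSerr / SStot, with SStot taken around the weighted label mean. */
  def r2(data: Seq[Obs]): Double = {
    val meanLabel = data.map(o => o.weight * o.label).sum / weightSum(data)
    val ssTot = data.map(o => o.weight * math.pow(o.label - meanLabel, 2)).sum
    1.0 - squaredErrorSum(data) / ssTot
  }
}
```

For instance, with the two observations `Obs(2.0, 3.0, 1.0)` and `Obs(5.0, 5.0, 3.0)`, the weighted mean squared error is 0.25, whereas it would be 0.5 if both weights were 1.0. With all weights equal to 1.0 the formulas reduce to the unweighted metrics.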

@SparkQA

SparkQA commented Feb 27, 2017

Test build #73528 has finished for PR 17085 at commit 48800eb.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@imatiach-msft
Contributor Author

@sethah @Lewuathe @thunterdb @WeichenXu123 @jkbradley @actuaryzhang @srowen would you be able to take a look? I've split the larger pull request into three parts as suggested.

@imatiach-msft
Contributor Author

ping @sethah @Lewuathe @thunterdb @WeichenXu123 @jkbradley @actuaryzhang @srowen could you please take a look? thank you!

@imatiach-msft force-pushed the ilmat/regression-evaluate branch from 48800eb to d5acd46 on April 16, 2018 16:52
@SparkQA

SparkQA commented Apr 16, 2018

Test build #89406 has finished for PR 17085 at commit d5acd46.

  • This patch fails Scala style tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@imatiach-msft force-pushed the ilmat/regression-evaluate branch from d5acd46 to 17c1626 on April 16, 2018 19:03
@imatiach-msft
Contributor Author

Jenkins retest this please

@SparkQA

SparkQA commented Apr 16, 2018

Test build #89414 has finished for PR 17085 at commit 17c1626.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@imatiach-msft changed the title from "[SPARK-18693][ML][MLLIB] ML Evaluators should use weight column - added weight column for regression evaluator" to "[SPARK-24102][ML][MLLIB] ML Evaluators should use weight column - added weight column for regression evaluator" on May 14, 2018
@imatiach-msft force-pushed the ilmat/regression-evaluate branch from 17c1626 to aca6255 on December 11, 2018 05:12
@imatiach-msft
Contributor Author

ping @sethah @WeichenXu123 @jkbradley @actuaryzhang @srowen could you please take a look? I've brought the PR up to date and made it similar to the multiclass PR that was merged: #17086

@SparkQA

SparkQA commented Dec 11, 2018

Test build #99947 has finished for PR 17085 at commit 0de3209.

  • This patch fails to build.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Dec 11, 2018

Test build #99948 has finished for PR 17085 at commit 0480721.

  • This patch fails to build.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Dec 11, 2018

Test build #99952 has finished for PR 17085 at commit 0cb2daf.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Dec 11, 2018

Test build #99946 has finished for PR 17085 at commit aca6255.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.

Member

@srowen srowen left a comment

It looks OK to me. What about the classification evaluators? Is there the same meaningful notion of weights? I'd imagine it's possible, but I'm not sure I've seen weighted accuracy, etc.

Member

Nit: I don't think the rename was necessary, but it is OK

Contributor Author

done!

Contributor Author

@imatiach-msft imatiach-msft Dec 11, 2018

Never mind, it looks like the build failed because the private variable conflicts with the public method that is already defined:

    /**
     * Sum of weights.
     */
    override def weightSum: Double = totalWeightSum

I think this may be the best name for the public variable so I would prefer to keep it. The private variable now follows the naming convention of the other private array variables so I think this makes sense.
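
The naming conflict described above can be sketched as follows. This is an illustrative reconstruction under assumptions, not the code from the PR: the trait and class names (`StatisticalSummary`, `OnlineSummarizer`) and the `add` method are hypothetical, and only `weightSum` and `totalWeightSum` are taken from the snippet quoted in this comment.

```scala
// Hypothetical sketch: a public accessor and a private accumulator cannot
// share one name inside the same Scala class.
trait StatisticalSummary {
  /** Sum of weights. */
  def weightSum: Double
}

class OnlineSummarizer extends StatisticalSummary {
  // Private accumulator. Renaming this field to `weightSum` would define the
  // same member twice in this class (field and method), so it would not compile.
  private var totalWeightSum: Double = 0.0

  /** Adds one instance weight to the running total and returns this summarizer. */
  def add(weight: Double): this.type = {
    require(weight >= 0.0, s"weight must be non-negative but got $weight")
    totalWeightSum += weight
    this
  }

  /** Sum of weights. */
  override def weightSum: Double = totalWeightSum
}
```

Keeping the public name `weightSum` and giving the private field a distinct name avoids the clash while leaving the public API unchanged.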

@imatiach-msft
Contributor Author

imatiach-msft commented Dec 11, 2018

@srowen yes, exactly, there is a third PR here for classification: #17084
But I need to update it in a similar way to how I just updated this PR (e.g. 2.2.0 -> 3.0.0).

The original PR (#16557) had all three, but it was recommended that I break it up into three parts, so I closed it and opened three separate PRs.

@imatiach-msft force-pushed the ilmat/regression-evaluate branch from 0cb2daf to f708edb on December 11, 2018 16:11
@SparkQA

SparkQA commented Dec 11, 2018

Test build #99982 has finished for PR 17085 at commit f708edb.

  • This patch fails to build.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Dec 11, 2018

Test build #99984 has finished for PR 17085 at commit 24b66da.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@srowen
Member

srowen commented Dec 12, 2018

Merged to master

@srowen srowen closed this in 570b8f3 Dec 12, 2018
holdenk pushed a commit to holdenk/spark that referenced this pull request Jan 5, 2019
…ed weight column for regression evaluator

## What changes were proposed in this pull request?

The evaluators BinaryClassificationEvaluator, RegressionEvaluator, and MulticlassClassificationEvaluator and the corresponding metrics classes BinaryClassificationMetrics, RegressionMetrics and MulticlassMetrics should use sample weight data.

I've closed the PR: apache#16557
 as recommended in favor of creating three pull requests, one for each of the evaluators (binary/regression/multiclass) to make it easier to review/update.

The updates to the regression metrics were based on (and updated with new changes based on comments):
https://issues.apache.org/jira/browse/SPARK-11520
 ("RegressionMetrics should support instance weights")
 but the pull request was closed as the changes were never checked in.

## How was this patch tested?

I added tests to the metrics class.

Closes apache#17085 from imatiach-msft/ilmat/regression-evaluate.

Authored-by: Ilya Matiach <ilmat@microsoft.com>
Signed-off-by: Sean Owen <sean.owen@databricks.com>
jackylee-ch pushed a commit to jackylee-ch/spark that referenced this pull request Feb 18, 2019