Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-21681][ML] fix bug of MLOR do not work correctly when featureStd contains zero (backport PR for 2.2) #19026

Closed

Conversation

WeichenXu123
Copy link
Contributor

What changes were proposed in this pull request?

This is backport PR of #18896

fix bug of MLOR do not work correctly when featureStd contains zero

We can reproduce the bug through such dataset (features including zero variance), will generate wrong result (all coefficients becomes 0)

    val multinomialDatasetWithZeroVar = {
      val nPoints = 100
      val coefficients = Array(
        -0.57997, 0.912083, -0.371077,
        -0.16624, -0.84355, -0.048509)

      val xMean = Array(5.843, 3.0)
      val xVariance = Array(0.6856, 0.0)  // including zero variance

      val testData = generateMultinomialLogisticInput(
        coefficients, xMean, xVariance, addIntercept = true, nPoints, seed)

      val df = sc.parallelize(testData, 4).toDF().withColumn("weight", lit(1.0))
      df.cache()
      df
    }

How was this patch tested?

testcase added.

@SparkQA
Copy link

SparkQA commented Aug 23, 2017

Test build #81021 has started for PR 19026 at commit 4221fc6.

@WeichenXu123
Copy link
Contributor Author

Jenkins, test this please.

@SparkQA
Copy link

SparkQA commented Aug 23, 2017

Test build #81026 has finished for PR 19026 at commit 4221fc6.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@jkbradley
Copy link
Member

LGTM
Thanks @WeichenXu123 !
Merging with branch-2.2

asfgit pushed a commit that referenced this pull request Aug 24, 2017
…td contains zero (backport PR for 2.2)

## What changes were proposed in this pull request?

This is backport PR of #18896

fix bug of MLOR do not work correctly when featureStd contains zero

We can reproduce the bug through such dataset (features including zero variance), will generate wrong result (all coefficients becomes 0)
```
    val multinomialDatasetWithZeroVar = {
      val nPoints = 100
      val coefficients = Array(
        -0.57997, 0.912083, -0.371077,
        -0.16624, -0.84355, -0.048509)

      val xMean = Array(5.843, 3.0)
      val xVariance = Array(0.6856, 0.0)  // including zero variance

      val testData = generateMultinomialLogisticInput(
        coefficients, xMean, xVariance, addIntercept = true, nPoints, seed)

      val df = sc.parallelize(testData, 4).toDF().withColumn("weight", lit(1.0))
      df.cache()
      df
    }
```
## How was this patch tested?

testcase added.

Author: WeichenXu <WeichenXu123@outlook.com>

Closes #19026 from WeichenXu123/fix_mlor_zero_var_bug_2_2.
@jkbradley
Copy link
Member

@WeichenXu123 could you please close this manually? Thanks!

@WeichenXu123 WeichenXu123 deleted the fix_mlor_zero_var_bug_2_2 branch August 29, 2017 02:10
MatthewRBruce pushed a commit to Shopify/spark that referenced this pull request Jul 31, 2018
…td contains zero (backport PR for 2.2)

## What changes were proposed in this pull request?

This is backport PR of apache#18896

fix bug of MLOR do not work correctly when featureStd contains zero

We can reproduce the bug through such dataset (features including zero variance), will generate wrong result (all coefficients becomes 0)
```
    val multinomialDatasetWithZeroVar = {
      val nPoints = 100
      val coefficients = Array(
        -0.57997, 0.912083, -0.371077,
        -0.16624, -0.84355, -0.048509)

      val xMean = Array(5.843, 3.0)
      val xVariance = Array(0.6856, 0.0)  // including zero variance

      val testData = generateMultinomialLogisticInput(
        coefficients, xMean, xVariance, addIntercept = true, nPoints, seed)

      val df = sc.parallelize(testData, 4).toDF().withColumn("weight", lit(1.0))
      df.cache()
      df
    }
```
## How was this patch tested?

testcase added.

Author: WeichenXu <WeichenXu123@outlook.com>

Closes apache#19026 from WeichenXu123/fix_mlor_zero_var_bug_2_2.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
3 participants