[SPARK-30504][PYTHON][ML] Set weightCol in OneVsRest(Model) _to_java and _from_java#27190

Closed
zero323 wants to merge 4 commits into apache:master from zero323:SPARK-30504

@zero323
Member

@zero323 zero323 commented Jan 13, 2020

What changes were proposed in this pull request?

This PR adjusts _to_java and _from_java of OneVsRest and OneVsRestModel to preserve weightCol.

Why are the changes needed?

Currently neither OneVsRest nor OneVsRestModel preserves the weightCol Param when the object is saved and loaded:

from pyspark.ml.classification import LogisticRegression, OneVsRest, OneVsRestModel
from pyspark.ml.linalg import DenseVector

df = spark.createDataFrame([(0, 1, DenseVector([1.0, 0.0])), (0, 1, DenseVector([1.0, 0.0]))], ("label", "w", "features"))

ovr = OneVsRest(classifier=LogisticRegression()).setWeightCol("w")
ovrm = ovr.fit(df)
ovr.getWeightCol()
## 'w'
ovrm.getWeightCol()
## 'w'

ovr.write().overwrite().save("/tmp/ovr")
ovr_ = OneVsRest.load("/tmp/ovr")
ovr_.getWeightCol()
## KeyError
## ...
## KeyError: Param(parent='OneVsRest_5145d56b6bd1', name='weightCol', doc='weight column name. ...)

ovrm.write().overwrite().save("/tmp/ovrm")
ovrm_ = OneVsRestModel.load("/tmp/ovrm")
ovrm_.getWeightCol()
## KeyError
## ...
## KeyError: Param(parent='OneVsRestModel_598c6d900fad', name='weightCol', doc='weight column name ...
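
The failure mode above can be illustrated without Spark. The following is a minimal pure-Python sketch, not the code changed in this PR: `ToyParams`, `to_peer_lossy`, and `to_peer_preserving` are hypothetical names invented for illustration. It assumes the conversion copies a fixed list of params and shows how an optional param such as weightCol is lost unless it is copied when defined.

```python
class ToyParams:
    """Hypothetical stand-in for a Params object: stores only explicitly set values."""

    def __init__(self, **params):
        self._param_map = dict(params)

    def is_defined(self, name):
        return name in self._param_map

    def get(self, name):
        # Like the report above, reading a param that was never set raises KeyError.
        return self._param_map[name]

    def set(self, name, value):
        self._param_map[name] = value
        return self


# Params that are always copied in this toy conversion (an assumption of the sketch).
REQUIRED = ("featuresCol", "labelCol", "predictionCol")


def to_peer_lossy(stage):
    """Copy only the always-present params; an optional weightCol is dropped."""
    return ToyParams(**{name: stage.get(name) for name in REQUIRED})


def to_peer_preserving(stage):
    """Same copy, plus weightCol when it has been set on the source object."""
    peer = to_peer_lossy(stage)
    if stage.is_defined("weightCol"):
        peer.set("weightCol", stage.get("weightCol"))
    return peer


stage = ToyParams(featuresCol="features", labelCol="label",
                  predictionCol="prediction", weightCol="w")

lossy = to_peer_lossy(stage)
preserved = to_peer_preserving(stage)

print(lossy.is_defined("weightCol"))  # False
print(preserved.get("weightCol"))     # w
```

Guarding the copy with an is-defined check keeps objects that never had a weight column unchanged, so only explicitly set values survive the round trip.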

Does this PR introduce any user-facing change?

After this PR is merged, loaded OneVsRest and OneVsRestModel objects will have the weightCol Param set.

How was this patch tested?

  • Manual testing.
  • Extension of existing persistence tests.

@SparkQA

SparkQA commented Jan 13, 2020

Test build #116636 has finished for PR 27190 at commit c9ccbb2.

  • This patch fails PySpark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Jan 13, 2020

Test build #116646 has finished for PR 27190 at commit 88298ba.

  • This patch fails Python style tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Jan 13, 2020

Test build #116648 has finished for PR 27190 at commit 83b9565.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@srowen
Member

srowen commented Jan 14, 2020

CC @huaxingao @zhengruifeng as you've been looking at handling weight cols recently

@SparkQA

SparkQA commented Jan 14, 2020

Test build #116726 has finished for PR 27190 at commit aae798d.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@zhengruifeng
Contributor

LGTM

@srowen
Member

srowen commented Jan 15, 2020

Merged to master

@srowen srowen closed this in 525c569 Jan 15, 2020
@zero323 zero323 deleted the SPARK-30504 branch January 15, 2020 14:45
@zero323
Member Author

zero323 commented Jan 15, 2020

Thanks everyone.
