
[SPARK-12468] [Pyspark] extractParamMap returns empty dictionary #10419

Closed · wants to merge 2 commits
Conversation

ZacharySBrown

This addresses an issue where the extractParamMap() method of a fitted model returns an empty dictionary, e.g. (from the PySpark ML API documentation):

from pyspark.mllib.linalg import Vectors
from pyspark.ml.classification import LogisticRegression
from pyspark.ml.param import Param, Params

# Prepare training data from a list of (label, features) tuples.
# (sqlContext is assumed to be available, e.g. from the PySpark shell.)
training = sqlContext.createDataFrame([
    (1.0, Vectors.dense([0.0, 1.1, 0.1])),
    (0.0, Vectors.dense([2.0, 1.0, -1.0])),
    (0.0, Vectors.dense([2.0, 1.3, 1.0])),
    (1.0, Vectors.dense([0.0, 1.2, -0.5]))], ["label", "features"])

# Create a LogisticRegression instance. This instance is an Estimator.
lr = LogisticRegression(maxIter=10, regParam=0.01)
# Print out the parameters, documentation, and any default values.
print("LogisticRegression parameters:\n" + lr.explainParams() + "\n")

# Learn a LogisticRegression model. This uses the parameters stored in lr.
model1 = lr.fit(training)

# Since model1 is a Model (i.e., a transformer produced by an Estimator),
# we can view the parameters it used during fit().
# This prints the parameter (name: value) pairs, where names are unique IDs for this
# LogisticRegression instance.
print("Model 1 was fit using parameters: ")
print(model1.extractParamMap())

@AmplabJenkins

Can one of the admins verify this patch?

@ZacharySBrown ZacharySBrown changed the title [SPARK-12468] [Pyspark] [SPARK-12468] [Pyspark] extractParamMap returns empty dictionary Dec 21, 2015
@yanboliang
Contributor

@ZacharySBrown Thanks for catching this bug. But I think setting a._paramMap with self.extractParamMap() is not appropriate, because that sets the child Model's _paramMap from its parent Estimator's _paramMap. What we should do instead is call _transfer_params_from_java, which transfers the embedded params from the companion Scala/Java model to the Python one.

@yanboliang
Contributor

Furthermore, I think we should update the PySpark ML API documentation you mentioned. If you want to view the parameters used during fit(), you should call model1.parent.extractParamMap() rather than model1.extractParamMap().

@chrispe

chrispe commented Mar 22, 2016

Is there any workaround until this gets fixed? I would like, for example, to be able to save the parameters used for a StringIndexerModel. Is that possible?

@jkbradley
Member

@ZacharySBrown Thanks for this PR. I think it's a duplicate of [SPARK-10931], so could you please close this one? As @yanboliang mentioned, a proper fix will require transferring the Params from Java, which will also require that the Models contain the actual Params. It would be great to get your input on the other PR.

@chrispe92 There is not a great solution, but you can access the underlying Java object via the _java_obj attribute: list(pythonIndexer._java_obj.labels())

@asfgit asfgit closed this in 6acc72a Apr 23, 2016