[SPARK-33592][ML][PYTHON][3.0] Backport Fix: Pyspark ML Validator params in estimatorParamMaps may be lost after saving and reloading #30590

Closed

Commits on Dec 3, 2020

  1. [SPARK-33592] Fix: Pyspark ML Validator params in estimatorParamMaps …

    …may be lost after saving and reloading
    
    Fix: Pyspark ML Validator params in estimatorParamMaps may be lost after saving and reloading
    
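    When saving a validator's estimatorParamMaps, check all nested stages of the tuned estimator to find the correct parent of each param. Params that belong to a nested stage (for example a stage inside a Pipeline, or the classifier of a OneVsRest) could otherwise be lost after saving and reloading.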
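    
    The idea of resolving a param's parent across nested stages can be sketched without Spark. This is a simplified illustration, not the code in this patch: `Stage`, `all_nested_stages`, and `resolve_parent` are hypothetical names, and the uids are made up.
    
    ```python
    from dataclasses import dataclass, field
    from typing import Dict, List


    @dataclass
    class Stage:
        """Toy stand-in for an ML stage (hypothetical, not a pyspark class)."""
        uid: str
        children: List["Stage"] = field(default_factory=list)


    def all_nested_stages(stage: Stage) -> List[Stage]:
        """Return the stage itself plus every stage nested below it."""
        found = [stage]
        for child in stage.children:
            found.extend(all_nested_stages(child))
        return found


    def resolve_parent(root: Stage, parent_uid: str) -> Stage:
        """Find the stage that owns a param by searching the whole nested tree."""
        uid_map: Dict[str, Stage] = {s.uid: s for s in all_nested_stages(root)}
        if parent_uid not in uid_map:
            raise ValueError(f"no stage with uid {parent_uid!r} under {root.uid!r}")
        return uid_map[parent_uid]


    # Made-up uids mirroring the two manual test cases in this commit message.
    lr = Stage("LogisticRegression_1")
    hashing_tf = Stage("HashingTF_1")
    pipeline = Stage("Pipeline_1", [Stage("Tokenizer_1"), hashing_tf, lr])
    ova = Stage("OneVsRest_1", [lr])

    # A param tuned on a nested stage resolves to that stage,
    # not to the outer estimator that wraps it.
    owner = resolve_parent(pipeline, "HashingTF_1")
    ```
    
    Looking up the parent only on the top-level estimator would fail for `HashingTF_1`; building the uid map over every nested stage is what makes the lookup succeed.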
    
    Two typical cases to manually test:
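    ~~~python
    from pyspark.ml import Pipeline
    from pyspark.ml.classification import LogisticRegression
    from pyspark.ml.evaluation import MulticlassClassificationEvaluator
    from pyspark.ml.feature import HashingTF, Tokenizer
    from pyspark.ml.tuning import ParamGridBuilder, TrainValidationSplit
    
    # Case 1: the tuned params belong to stages nested inside a Pipeline.
    tokenizer = Tokenizer(inputCol="text", outputCol="words")
    hashingTF = HashingTF(inputCol=tokenizer.getOutputCol(), outputCol="features")
    lr = LogisticRegression()
    pipeline = Pipeline(stages=[tokenizer, hashingTF, lr])
    
    paramGrid = ParamGridBuilder() \
        .addGrid(hashingTF.numFeatures, [10, 100]) \
        .addGrid(lr.maxIter, [100, 200]) \
        .build()
    tvs = TrainValidationSplit(estimator=pipeline,
                               estimatorParamMaps=paramGrid,
                               evaluator=MulticlassClassificationEvaluator())
    
    # tvsPath is a placeholder for a writable path that does not exist yet.
    tvs.save(tvsPath)
    loadedTvs = TrainValidationSplit.load(tvsPath)
    ~~~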
    
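    ~~~python
    from pyspark.ml.classification import LogisticRegression, OneVsRest
    from pyspark.ml.evaluation import MulticlassClassificationEvaluator
    from pyspark.ml.tuning import ParamGridBuilder, TrainValidationSplit
    
    # Case 2: the tuned param belongs to the classifier wrapped by OneVsRest.
    lr = LogisticRegression()
    ova = OneVsRest(classifier=lr)
    grid = ParamGridBuilder().addGrid(lr.maxIter, [100, 200]).build()
    evaluator = MulticlassClassificationEvaluator()
    tvs = TrainValidationSplit(estimator=ova, estimatorParamMaps=grid, evaluator=evaluator)
    
    # tvsPath is a placeholder for a writable path that does not exist yet.
    tvs.save(tvsPath)
    loadedTvs = TrainValidationSplit.load(tvsPath)
    ~~~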
    
    Why the changes are needed: bug fix.
    
    User-facing change: no.
    
    How this patch was tested: unit test.
    
    Closes apache#30539 from WeichenXu123/fix_tuning_param_maps_io.
    
    Authored-by: Weichen Xu <weichen.xu@databricks.com>
    Signed-off-by: Ruifeng Zheng <ruifengz@foxmail.com>
    (cherry picked from commit 8016123)
    Signed-off-by: Weichen Xu <weichen.xu@databricks.com>
    WeichenXu123 committed Dec 3, 2020
    Commit 58b0c79
  2. update

    WeichenXu123 committed Dec 3, 2020
    Commit e3c04ea