[SparkML] PipelineModel don't load when contains a CatBoostClassificationModel #1936

Closed
AlexKbit opened this issue Nov 26, 2021 · 3 comments

@AlexKbit

Problem: a PipelineModel doesn't load when it contains a CatBoostClassificationModel
catboost version: ai.catboost:catboost-spark_3.0_2.12:1.0.3
Operating System: Google Colab (https://colab.research.google.com/drive/1syJg_WOLFMd4z16pjMzmufJxGPA-lHfS?usp=sharing)
CPU: 4
GPU: 0

Hi, my issue relates to saving and loading a SparkML PipelineModel.
I build and fit a pipeline (with a CatBoostClassifier) to get a PipelineModel:

```python
from pyspark.ml import Pipeline
from pyspark.ml.feature import StringIndexer, VectorAssembler
import catboost_spark

# TARGET_LABEL and train_df are defined earlier in the notebook.
sex_indexer = StringIndexer(inputCol='sex', outputCol='sex_index')
car_class_indexer = StringIndexer(inputCol='car_class', outputCol='car_class_index')
features = ["age", "sex_index", "car_class_index", "driving_experience",
            "speeding_penalties", "parking_penalties", "total_car_accident"]
assembler = VectorAssembler(inputCols=features, outputCol='features')
classifier = catboost_spark.CatBoostClassifier(featuresCol='features',
                                               labelCol=TARGET_LABEL)
pipeline = Pipeline(stages=[sex_indexer, car_class_indexer, assembler, classifier])
model = pipeline.fit(train_df)
```

Then I save it and try to load it:

```python
from pyspark.ml.pipeline import PipelineModel

model.write().overwrite().save('catboost_pipeline')
PipelineModel.load('catboost_pipeline')
```

I get the following exception:

```
Exception ignored in: <function JavaWrapper.__del__ at 0x7f67122887a0>
Traceback (most recent call last):
  File "/content/spark-3.0.3-bin-hadoop2.7/python/pyspark/ml/wrapper.py", line 42, in __del__
    if SparkContext._active_spark_context and self._java_obj is not None:
AttributeError: 'CatBoostClassificationModel' object has no attribute '_java_obj'
TypeError                                 Traceback (most recent call last)
<ipython-input-34-9521ec2f514b> in <module>()
      1 from pyspark.ml.pipeline import PipelineModel
----> 2 PipelineModel.load('catboost_pipeline')

5 frames
/tmp/spark-9b120f03-4fd5-4e00-831e-c355a529ddde/userFiles-87dc1f0f-6983-4768-ac52-34427f077389/ai.catboost_catboost-spark_3.0_2.12-1.0.3.jar/catboost_spark/core.py in _from_java_patched_for_catboost(java_stage)
     55     if issubclass(py_type, JavaParams):
     56         # Load information from java_stage to the instance.
---> 57         py_stage = py_type()
     58         py_stage._java_obj = java_stage
     59         py_stage._resetUid(java_stage.uid())

TypeError: __init__() missing 1 required positional argument: 'java_model'
```
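The two errors in this output appear to be linked. `_from_java_patched_for_catboost` constructs the Python stage with `py_type()` and only afterwards attaches `java_stage`, but `CatBoostClassificationModel.__init__` in 1.0.3 evidently requires a `java_model` argument, so the call fails with `TypeError`. The half-constructed object never received a `_java_obj` attribute, so when it is garbage-collected `JavaWrapper.__del__` raises the ignored `AttributeError`. The sketch below reproduces that sequence in plain Python without Spark. Every class and function in it (`JavaWrapperStub`, `ModelRequiringJavaModel`, `ModelWithOptionalJavaModel`, `load_stage`) is a hypothetical stand-in, not the real pyspark or catboost_spark code, and making `java_model` optional is only one possible fix pattern, not necessarily what the actual fix does.

```python
class JavaWrapperStub:
    """Hypothetical stand-in for pyspark.ml.wrapper.JavaWrapper."""

    def __init__(self, java_obj=None):
        self._java_obj = java_obj

    def __del__(self):
        # Mirrors JavaWrapper.__del__: reading self._java_obj raises
        # AttributeError if __init__ never got far enough to set it.
        if self._java_obj is not None:
            self._java_obj = None  # stand-in for detaching the JVM object


class ModelRequiringJavaModel(JavaWrapperStub):
    """Hypothetical model whose constructor requires java_model (failing case)."""

    def __init__(self, java_model):
        super().__init__(java_model)


class ModelWithOptionalJavaModel(JavaWrapperStub):
    """Hypothetical model whose java_model defaults to None (one possible fix)."""

    def __init__(self, java_model=None):
        super().__init__(java_model)


def load_stage(py_type, java_stage):
    """Hypothetical loader mimicking _from_java_patched_for_catboost:
    build the Python stage with no arguments, then attach the Java object."""
    py_stage = py_type()
    py_stage._java_obj = java_stage
    return py_stage


if __name__ == "__main__":
    java_stage = object()  # placeholder for the JVM-side model handle

    try:
        load_stage(ModelRequiringJavaModel, java_stage)
    except TypeError as exc:
        # Same kind of TypeError as in the traceback; CPython also reports an
        # ignored AttributeError from __del__ on stderr.
        print("load failed:", exc)

    stage = load_stage(ModelWithOptionalJavaModel, java_stage)
    print("loaded:", stage._java_obj is java_stage)
```

Run as a script, the required-argument model fails with a `TypeError` naming `java_model` (plus an ignored `AttributeError` from `__del__` on stderr), while the optional-argument model loads and keeps the attached handle.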
@AlexKbit AlexKbit changed the title PipelineModel don't load when contains a CatBoostClassificationModel SparkML: PipelineModel don't load when contains a CatBoostClassificationModel Nov 26, 2021
@AlexKbit AlexKbit changed the title SparkML: PipelineModel don't load when contains a CatBoostClassificationModel [SparkML] PipelineModel don't load when contains a CatBoostClassificationModel Nov 26, 2021
@Evgueni-Petrov-aka-espetrov
Contributor

Hi @AlexKbit!
@andrey-khropov will look into this shortly.

arcadia-devtools pushed a commit that referenced this issue Dec 31, 2021
…. MLTOOLS-6076. #1936.

ref:6d89a9639071e1495d8170129e8e760e49fc90f1
@andrey-khropov
Member

This issue has been fixed by 835c1a2 and will be included in the next release.

@AlexKbit
Author

AlexKbit commented Jan 9, 2022

Cool! Thanks for the quick fix.

robot-piglet pushed a commit that referenced this issue Jan 16, 2023
…. MLTOOLS-6076. #1936.

ref:6d89a9639071e1495d8170129e8e760e49fc90f1