Spark pipeline cannot load with spark CatBoostClassifierModel #2402

Open
spencer-wallace opened this issue Jun 2, 2023 · 5 comments

@spencer-wallace

Problem: When I try to load a pipeline containing a Spark CatBoost classifier, I get the following error:

AttributeError: module 'ai.catboost.spark' has no attribute 'CatBoostClassificationModel'

AttributeError Traceback (most recent call last)
in <cell line: 5>()
3
4 # Load model
----> 5 loaded_model = mlflow.spark.load_model(logged_model)

/databricks/python/lib/python3.9/site-packages/mlflow/spark.py in load_model(model_uri, dfs_tmpdir, dst_path)
793 get_databricks_profile_uri_from_artifact_uri(root_uri)
794 ):
--> 795 return PipelineModel.load(mlflowdbfs_path)
796
797 return _load_model(

/databricks/spark/python/pyspark/ml/util.py in load(cls, path)
444 def load(cls, path: str) -> RL:
445 """Reads an ML instance from the input path, a shortcut of read().load(path)."""
--> 446 return cls.read().load(path)
447
448

/databricks/spark/python/pyspark/ml/pipeline.py in load(self, path)
282 metadata = DefaultParamsReader.loadMetadata(path, self.sc)
283 if "language" not in metadata["paramMap"] or metadata["paramMap"]["language"] != "Python":
--> 284 return JavaMLReader(cast(Type["JavaMLReadable[PipelineModel]"], self.cls)).load(path)
285 else:
286 uid, stages = PipelineSharedReadWrite.load(metadata, self.sc, path)

/databricks/spark/python/pyspark/ml/util.py in load(self, path)
398 "This Java ML type cannot be loaded into Python currently: %r" % self._clazz
399 )
--> 400 return self._clazz._from_java(java_obj) # type: ignore[attr-defined]
401
402 def session(self: JR, sparkSession: SparkSession) -> JR:

/databricks/spark/python/pyspark/ml/pipeline.py in _from_java(cls, java_stage)
342 """
343 # Load information from java_stage to the instance.
--> 344 py_stages: List[Transformer] = [JavaParams._from_java(s) for s in java_stage.stages()]
345 # Create a new instance of this stage.
346 py_stage = cls(py_stages)

/databricks/spark/python/pyspark/ml/pipeline.py in <listcomp>(.0)
342 """
343 # Load information from java_stage to the instance.
--> 344 py_stages: List[Transformer] = [JavaParams._from_java(s) for s in java_stage.stages()]
345 # Create a new instance of this stage.
346 py_stage = cls(py_stages)

/databricks/spark/python/pyspark/ml/wrapper.py in _from_java(java_stage)
290 stage_name = java_stage.getClass().getName().replace("org.apache.spark", "pyspark")
291 # Generate a default new instance from the stage_name class.
--> 292 py_type = __get_class(stage_name)
293 if issubclass(py_type, JavaParams):
294 # Load information from java_stage to the instance.

/databricks/spark/python/pyspark/ml/wrapper.py in __get_class(clazz)
285 m = __import__(module)
286 for comp in parts[1:]:
--> 287 m = getattr(m, comp)
288 return m
289

AttributeError: module 'ai.catboost.spark' has no attribute 'CatBoostClassificationModel'
catboost version: spark 1.2
scala: 2.12
spark: 3.3
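
For readers following the traceback: the last two frames show PySpark turning the JVM class name into a Python class by importing the package and then walking attributes one name at a time. Below is a minimal sketch of that lookup, reconstructed from the frames above. The function name `resolve_class` is hypothetical, the two lines that compute `parts` and `module` are an assumption (they are not visible in the traceback), and a standard-library module stands in for `ai.catboost.spark`.

```python
def resolve_class(qualified_name):
    """Mimic the attribute walk shown in the wrapper.py frames."""
    # Assumption: the name is split on dots and everything except the
    # last component is treated as the module path.
    parts = qualified_name.split(".")
    module = ".".join(parts[:-1])
    # __import__("a.b") returns the top-level package "a", so the loop
    # walks down from there, one attribute per component.
    m = __import__(module)
    for comp in parts[1:]:
        m = getattr(m, comp)  # raises AttributeError if a name is missing
    return m


# A name the module defines resolves to the object itself.
ordered_dict = resolve_class("collections.OrderedDict")

# A name the module does not define fails with the same kind of
# AttributeError as the one reported in this issue.
try:
    resolve_class("collections.NoSuchModel")
    lookup_error = None
except AttributeError as exc:
    lookup_error = str(exc)
```

If the lookup really works this way, the error means the Python module `ai.catboost.spark` was imported successfully but does not expose a name matching the Java class `CatBoostClassificationModel`.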

@bakuteyev

Any workarounds?

@ek-ak
Collaborator

ek-ak commented Jun 30, 2023

Hello!
Looks like you don't have all necessary JARs in the CLASSPATH environment.

@bakuteyev

> Hello! Looks like you don't have all necessary JARs in the CLASSPATH environment.

In my case the environment is Databricks. It works perfectly with "ai.catboost:catboost-spark_3.2_2.12:1.1.1" but not with "ai.catboost:catboost-spark_3.4_2.12:1.2" (each with the corresponding environment).
I'm pretty sure the error is not on my side, because only the loading step of the pipeline fails.
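
The two artifact names above appear to follow the pattern `ai.catboost:catboost-spark_<spark>_<scala>:<catboost>`. As a rough sketch under that assumption, a small helper can build the coordinate from the cluster's versions so that a mismatch is easier to spot. The helper name `catboost_spark_coordinate` is hypothetical and not part of any library.

```python
def catboost_spark_coordinate(spark_version, scala_version, catboost_version):
    """Build a Maven coordinate in the pattern used by the artifacts above.

    Assumption: the pattern is inferred from the two coordinates quoted
    in this thread, not taken from CatBoost documentation.
    """
    return (
        f"ai.catboost:catboost-spark_{spark_version}_{scala_version}"
        f":{catboost_version}"
    )


# Reproduces the coordinate reported as working in this comment.
working = catboost_spark_coordinate("3.2", "2.12", "1.1.1")

# Coordinate matching the environment listed in the original report
# (spark 3.3, scala 2.12, catboost 1.2).
reported_env = catboost_spark_coordinate("3.3", "2.12", "1.2")
```

Outside Databricks, a coordinate like this is typically passed to Spark through the `spark.jars.packages` property. On Databricks it would be installed as a cluster library instead.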

@bakuteyev

Actually it doesn't work with any version of catboost inside Pipeline.

@andrey-khropov
Member

> Actually it doesn't work with any version of catboost inside Pipeline.

That's very weird. There was an error with Pipeline some time ago (#1936), but it has been fixed since CatBoost 1.0.4, and there are test cases that check it. Can you check whether the code from the test cases committed in 835c1a2 works in your environment?
