Spark pipeline cannot load with spark CatBoostClassifierModel #2402
Comments
Any workarounds?
Hello!
In my case the environment is Databricks. It works perfectly with "ai.catboost:catboost-spark_3.2_2.12:1.1.1" but not with "ai.catboost:catboost-spark_3.4_2.12:1.2" (with the corresponding environment).
Actually it doesn't work with any version of CatBoost inside a Pipeline.
That's very weird. Actually there was an error with the Pipeline some time ago: #1936. It has been fixed since CatBoost 1.0.4, and there are test cases to check that. Can you check whether the code from the test cases committed in 835c1a2 works in your environment?
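One workaround idea (untested on Databricks, names hypothetical): pyspark resolves the stage by looking up `CatBoostClassificationModel` as an attribute of the `ai.catboost.spark` Python module, so aliasing the class onto that module before calling `PipelineModel.load()` may satisfy the lookup if the class is actually exposed under a different name or package. A minimal, self-contained sketch of the alias pattern, using a stand-in module instead of the real one:

```python
import types

# Stand-in for the real 'ai.catboost.spark' Python module, which in the
# failing environment lacks the attribute pyspark looks for.
mod = types.ModuleType("ai.catboost.spark")

class CatBoostClassificationModel:
    """Stand-in for the real model class (e.g. from catboost_spark)."""
    pass

# The alias: make the name pyspark expects resolvable on the module.
setattr(mod, "CatBoostClassificationModel", CatBoostClassificationModel)

# This mirrors the getattr lookup pyspark performs in __get_class.
resolved = getattr(mod, "CatBoostClassificationModel")
print(resolved is CatBoostClassificationModel)  # True
```

In a real session the stand-ins would be the installed `ai.catboost.spark` module and the actual model class; whether the class is importable at all in the broken setup is exactly what needs checking first.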
Problem: When trying to load a pipeline containing a Spark CatBoost classifier, I receive the following error:
AttributeError: module 'ai.catboost.spark' has no attribute 'CatBoostClassificationModel'
AttributeError Traceback (most recent call last)
in <cell line: 5>()
3
4 # Load model
----> 5 loaded_model = mlflow.spark.load_model(logged_model)
/databricks/python/lib/python3.9/site-packages/mlflow/spark.py in load_model(model_uri, dfs_tmpdir, dst_path)
793 get_databricks_profile_uri_from_artifact_uri(root_uri)
794 ):
--> 795 return PipelineModel.load(mlflowdbfs_path)
796
797 return _load_model(
/databricks/spark/python/pyspark/ml/util.py in load(cls, path)
444 def load(cls, path: str) -> RL:
445 """Reads an ML instance from the input path, a shortcut of read().load(path)."""
--> 446 return cls.read().load(path)
447
448
/databricks/spark/python/pyspark/ml/pipeline.py in load(self, path)
282 metadata = DefaultParamsReader.loadMetadata(path, self.sc)
283 if "language" not in metadata["paramMap"] or metadata["paramMap"]["language"] != "Python":
--> 284 return JavaMLReader(cast(Type["JavaMLReadable[PipelineModel]"], self.cls)).load(path)
285 else:
286 uid, stages = PipelineSharedReadWrite.load(metadata, self.sc, path)
/databricks/spark/python/pyspark/ml/util.py in load(self, path)
398 "This Java ML type cannot be loaded into Python currently: %r" % self._clazz
399 )
--> 400 return self._clazz._from_java(java_obj) # type: ignore[attr-defined]
401
402 def session(self: JR, sparkSession: SparkSession) -> JR:
/databricks/spark/python/pyspark/ml/pipeline.py in _from_java(cls, java_stage)
342 """
343 # Load information from java_stage to the instance.
--> 344 py_stages: List[Transformer] = [JavaParams._from_java(s) for s in java_stage.stages()]
345 # Create a new instance of this stage.
346 py_stage = cls(py_stages)
/databricks/spark/python/pyspark/ml/pipeline.py in &lt;listcomp&gt;(.0)
342 """
343 # Load information from java_stage to the instance.
--> 344 py_stages: List[Transformer] = [JavaParams._from_java(s) for s in java_stage.stages()]
345 # Create a new instance of this stage.
346 py_stage = cls(py_stages)
/databricks/spark/python/pyspark/ml/wrapper.py in _from_java(java_stage)
290 stage_name = java_stage.getClass().getName().replace("org.apache.spark", "pyspark")
291 # Generate a default new instance from the stage_name class.
--> 292 py_type = __get_class(stage_name)
293 if issubclass(py_type, JavaParams):
294 # Load information from java_stage to the instance.
/databricks/spark/python/pyspark/ml/wrapper.py in __get_class(clazz)
285 m = __import__(module)
286 for comp in parts[1:]:
--> 287 m = getattr(m, comp)
288 return m
289
AttributeError: module 'ai.catboost.spark' has no attribute 'CatBoostClassificationModel'
catboost version: spark 1.2
scala: 2.12
spark: 3.3
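For context, the traceback ends inside pyspark's `__get_class`, which turns the Java class name into a dotted Python path and resolves it with `getattr`. A simplified sketch (not pyspark's exact code) of that resolution, showing why a missing module attribute surfaces as this AttributeError:

```python
import importlib

def get_class(clazz: str):
    """Import the dotted module prefix, then look up the final component,
    roughly mirroring pyspark.ml.wrapper's __get_class."""
    parts = clazz.split(".")
    m = importlib.import_module(".".join(parts[:-1]))
    return getattr(m, parts[-1])

# A name that exists resolves fine:
print(get_class("collections.OrderedDict"))

# But getattr raises AttributeError when the module does not expose the
# attribute -- which is what happens here: the 'ai.catboost.spark' Python
# module has no 'CatBoostClassificationModel' attribute, even though the
# Java class of that name exists on the JVM side.
try:
    get_class("collections.NoSuchClass")
except AttributeError as e:
    print("AttributeError:", e)
```

This is why the pipeline loads on the JVM but fails only at the Java-to-Python conversion step: the fix has to make the Python package expose the class under the name pyspark derives from the Java class name.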