
A KedroPipelineModel cannot be loaded from mlflow if its catalog contains non deepcopy-able DataSets #122

Closed
Galileo-Galilei opened this issue Nov 21, 2020 · 2 comments · Fixed by #129
Labels: bug (Something isn't working)

Comments

@Galileo-Galilei (Owner)

Description

I tried to load a KedroPipelineModel from mlflow, and I got a "cannot pickle context artifacts" error, which is due to the deepcopy of the catalog performed at loading time (see "Potential solution" below).

Context

I cannot load a previously saved KedroPipelineModel generated by pipeline_ml_factory.

Steps to Reproduce

Save a KedroPipelineModel with a dataset that contains an object which cannot be deepcopied (for me, a keras tokenizer), then try to load it back from mlflow

Expected Result

The model should be loaded

Actual Result

A "cannot pickle" error is raised when loading the model; a minimal sketch of the failing call follows.
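
For illustration, a sketch of the load call that triggers the error; the run id and artifact path are hypothetical placeholders:

    import mlflow

    # hypothetical URI: any run where a KedroPipelineModel was logged
    model = mlflow.pyfunc.load_model("runs:/<run_id>/model")
    # raises a "cannot pickle ..." TypeError when the catalog is deepcopied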

Your Environment

Include as many relevant details as possible about the environment in which you experienced the bug:

  • kedro and kedro-mlflow version used: 0.16.5 and 0.4.0
  • Python version used (python -V): 3.6.8
  • Windows 10 & CentOS were tested

Does the bug also happen with the latest version on develop?

Yes

Potential solution

The faulty line is:

self.loaded_catalog = deepcopy(self.initial_catalog)
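
For reference, a minimal standalone sketch of how such a line fails, using a plain dict as a stand-in for the catalog and a thread lock as a stand-in for any non deepcopy-able object (like the keras tokenizer above):

    from copy import deepcopy
    import threading

    catalog = {"tokenizer": threading.Lock()}  # a lock cannot be pickled/deepcopied
    deepcopy(catalog)  # raises TypeError: cannot pickle the lock object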

@Galileo-Galilei Galileo-Galilei added the bug Something isn't working label Nov 21, 2020
@Galileo-Galilei Galileo-Galilei added this to To do in Ongoing development via automation Nov 21, 2020
@Galileo-Galilei Galileo-Galilei added this to the Release 0.5.0 milestone Nov 21, 2020
@Galileo-Galilei Galileo-Galilei changed the title A KedroPipelineModel cannot be loaded from mlflow if its catalog contains non deepcopy-ish DataSets A KedroPipelineModel cannot be loaded from mlflow if its catalog contains non deepcopy-able DataSets Nov 21, 2020
@takikadiri (Collaborator)

Does removing the faulty line and using the initial_catalog directly make the model loadable again? If yes, we have two options:

  • We no longer deepcopy the initial_catalog
  • We copy each DataSet of the catalog with its own loader (for example, tf.keras.models.clone_model for a keras model DataSet; see the sketch below)

Knowing that the KedroPipelineModel is intended to be used in a separate process (at inference time), we can simply remove the deepcopy part (there won't be a conflict with another function using the same catalog).
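
A rough sketch of the second option, with a hypothetical copy_dataset_value helper and keras models as the only special-cased type (each non deepcopy-able type would need its own branch):

    from copy import deepcopy

    import tensorflow as tf

    def copy_dataset_value(value):
        # hypothetical dispatcher: use a type-specific copier when deepcopy would fail
        if isinstance(value, tf.keras.Model):
            clone = tf.keras.models.clone_model(value)  # copies the architecture
            clone.set_weights(value.get_weights())      # copies the weights
            return clone
        return deepcopy(value)  # generic fallback for everything else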

@Galileo-Galilei Galileo-Galilei moved this from To do to Planned for next release in Ongoing development Nov 25, 2020
@Galileo-Galilei (Owner, Author)

After some investigation, the issue comes from the MLflowAbstractModelDataSet, and particularly from its self._mlflow_model_module attribute, which is a module and therefore not deepcopy-able by nature. I suggest storing it as a string, and exposing a property attribute that loads the module on the fly, as sketched below.
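
A minimal sketch of that suggestion (class and attribute names are simplified, not the actual implementation):

    import importlib

    class ModelDataSetSketch:
        def __init__(self, flavor: str):
            # store the module path as a plain string: strings deepcopy fine
            self._flavor = flavor  # e.g. "mlflow.sklearn"

        @property
        def _mlflow_model_module(self):
            # import lazily on access instead of holding a module reference
            return importlib.import_module(self._flavor)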

Note that this problem occurs only when the DataSet itself is not deepcopy-able (and not the underlying value the DataSet can load()), so we can quite safely assume that it should not occur often. If it does, we should consider a more radical solution among the ones you suggest.

@Galileo-Galilei Galileo-Galilei moved this from Planned for next release to In progress in Ongoing development Nov 28, 2020
Ongoing development automation moved this from In progress to Done Nov 28, 2020