Description
When training a model with pipeline_ml, an MLflow model is created that contains the inference pipeline and the associated catalog, including the model artifacts. As a user, I have to specify the copy_mode of each dataset in the catalog, including the datasets that refer to model artifacts. I find myself systematically setting copy_mode to assign for model artifacts, since they are always read-only datasets. If I don't, they are deepcopied at each inference iteration (in model serving, for example), which causes a memory leak. By default, Kedro deepcopies every dataset other than pandas, numpy, and similar objects.
Is it possible to default copy_mode to assign for artifact datasets?
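To illustrate the difference between the two copy modes, here is a minimal sketch using only the standard library. The FakeModel class and the load_with_copy_mode helper are hypothetical stand-ins, not Kedro or kedro-mlflow APIs; they only mimic what deepcopy and assign mean for an object held in memory.

```python
import copy


class FakeModel:
    """Hypothetical stand-in for a large, read-only model artifact."""

    def __init__(self, n_weights):
        self.weights = [0.0] * n_weights


def load_with_copy_mode(data, copy_mode):
    """Return data the way an in-memory dataset would for a given copy mode.

    Hypothetical helper: "assign" hands back the same object,
    "deepcopy" builds a full independent copy on every call.
    """
    if copy_mode == "assign":
        return data
    if copy_mode == "deepcopy":
        return copy.deepcopy(data)
    raise ValueError(f"unsupported copy_mode: {copy_mode!r}")


model = FakeModel(n_weights=1000)

# Each simulated inference call with "deepcopy" allocates a new model object.
deep_copies = [load_with_copy_mode(model, "deepcopy") for _ in range(3)]

# With "assign", every call reuses the single object already in memory.
assigned = [load_with_copy_mode(model, "assign") for _ in range(3)]

print(all(m is model for m in assigned))
print(any(m is model for m in deep_copies))
```

Because a model artifact is only read during inference, sharing the single object is safe in this sketch, while the deep copies are extra allocations that bring no benefit.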
Context
This change would save me from retraining my models each time I forget to set copy_mode to assign for all the model artifacts. At worst, the current default behavior provokes a memory leak for users who are not aware of this problem.
It is likely a good idea to make it the default, but for the record, you don't need to specify the artifacts one by one with a dictionary: you can already pass "copy_mode=assign" to KedroPipelineModel, and it will apply to all artifact datasets.
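The two ways of specifying copy modes mentioned here (one value for everything, or a dictionary with one entry per dataset) can be pictured with the following sketch. expand_copy_mode is a hypothetical helper written for illustration, and the dataset names are invented; it does not reproduce kedro-mlflow's internal code.

```python
def expand_copy_mode(copy_mode, dataset_names):
    """Expand a copy_mode given as a string or a dict into one entry per dataset.

    Hypothetical helper: a single string applies to every dataset; a dict sets
    some datasets explicitly and leaves the others as None (meaning "let the
    framework infer the mode").
    """
    if isinstance(copy_mode, str):
        return {name: copy_mode for name in dataset_names}
    if isinstance(copy_mode, dict):
        return {name: copy_mode.get(name) for name in dataset_names}
    raise TypeError("copy_mode must be a str or a dict")


# Invented dataset names, for illustration only.
names = ["vectorizer", "classifier"]

# One string covers every dataset at once.
print(expand_copy_mode("assign", names))

# A dict only sets the datasets it mentions.
print(expand_copy_mode({"classifier": "assign"}, names))
```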
Possible Implementation
At KedroPipelineModel init, you can get the pipeline artifacts before initializing the loaded_catalog and default copy_mode to assign for those datasets.
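The defaulting rule proposed under Possible Implementation could look roughly like the sketch below, assuming the artifact names are known before the catalog is built. The function name, its parameters, and the dataset names are all hypothetical, and this is not necessarily how kedro-mlflow implements it: explicit user choices win, artifacts fall back to assign, and everything else is left as None so the framework can infer a mode.

```python
def default_artifact_copy_modes(dataset_names, artifact_names, overrides=None):
    """Resolve a copy mode per dataset, defaulting artifacts to "assign".

    Hypothetical helper, for illustration only:
    - an explicit entry in `overrides` always wins;
    - otherwise, artifact datasets get "assign";
    - any other dataset gets None (mode left to be inferred).
    """
    overrides = overrides or {}
    artifact_names = set(artifact_names)
    resolved = {}
    for name in dataset_names:
        if name in overrides:
            resolved[name] = overrides[name]
        elif name in artifact_names:
            resolved[name] = "assign"
        else:
            resolved[name] = None
    return resolved


# Invented names: two model artifacts and one intermediate dataset.
all_datasets = ["vectorizer", "classifier", "cleaned_text"]
artifacts = ["vectorizer", "classifier"]

# No user input: both artifacts default to "assign".
print(default_artifact_copy_modes(all_datasets, artifacts))

# A user can still force a different mode for a given artifact.
print(default_artifact_copy_modes(all_datasets, artifacts, {"vectorizer": "deepcopy"}))
```

Keeping user overrides as the highest priority means the new default would not break anyone who deliberately asks for a copy of an artifact.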