Avoid deepcopying model artifacts in model serving scenarios #463

Closed
takikadiri opened this issue Oct 21, 2023 · 1 comment · Fixed by #466

Comments

@takikadiri
Collaborator

Description

When training a model with `pipeline_ml`, an MLflow model is created that contains the inference pipeline and the associated catalog, including the model artifacts. As a user, I have to specify the `copy_mode` of each dataset in the catalog, including the datasets that refer to model artifacts. I find myself systematically setting `copy_mode` to `assign` for model artifacts, since they are always read-only datasets. If I don't, they are deep-copied at each inference iteration (in model serving, for example), which causes a memory leak. By default, Kedro deep-copies all datasets that are not pandas, NumPy, etc. objects.
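A minimal sketch of why this matters, written in plain Python rather than with Kedro itself. The `load` helper and the `FakeModel` class are hypothetical stand-ins: `load` only assumes the three copy modes behave as Kedro documents them (`deepcopy`, `copy`, `assign`). Each simulated inference call that loads the artifact with `deepcopy` produces a brand-new full copy of the model, whereas `assign` hands back the same object every time.

```python
import copy


class FakeModel:
    """Hypothetical stand-in for a loaded model artifact (e.g. a fitted estimator)."""

    def __init__(self, n_weights):
        self.weights = [0.0] * n_weights


def load(data, copy_mode):
    """Return `data` the way an in-memory dataset would, according to `copy_mode`.

    Assumption: mirrors the documented semantics of Kedro's copy modes.
    """
    if copy_mode == "deepcopy":
        return copy.deepcopy(data)  # full recursive copy on every load
    if copy_mode == "copy":
        return data.copy()  # shallow copy, e.g. for pandas / NumPy objects
    if copy_mode == "assign":
        return data  # no copy: the same object is returned
    raise ValueError(f"Invalid copy_mode: {copy_mode!r}")


model = FakeModel(n_weights=10_000)

# Simulate three inference calls, each loading the model artifact.
deep = [load(model, "deepcopy") for _ in range(3)]
shared = [load(model, "assign") for _ in range(3)]

print(len({id(m) for m in deep}))    # 3: three distinct full copies in memory
print(len({id(m) for m in shared}))  # 1: the same object reused each time
```

With a real model served over many requests, the `deepcopy` path repeats that full copy on every call, which is exactly the cost `assign` avoids for read-only artifacts.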

Would it be possible to default `copy_mode` to `assign` for artifact datasets?

Context

This change would save me from retraining my models every time I forget to set `copy_mode` to `assign` for all the model artifacts. More importantly, it would protect users who are not aware of this problem from memory leaks, since deep-copying is currently the default behavior.

Possible Implementation

In `KedroPipelineModel.__init__`, the pipeline artifacts could be retrieved before initializing `loaded_catalog`, and `copy_mode` could then default to `assign` for those datasets.
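A rough sketch of the defaulting logic this proposal implies, assuming the artifact dataset names are already known at init time (for instance, the inference pipeline's inputs other than the input dataset). `resolve_copy_modes` is a hypothetical helper, not part of kedro-mlflow: it simply turns whatever the user passed as `copy_mode` (nothing, a single string, or a per-dataset dict) into a per-dataset mapping in which every artifact falls back to `assign` unless the user said otherwise.

```python
from typing import Dict, Iterable, Optional, Union

CopyMode = Optional[Union[str, Dict[str, str]]]


def resolve_copy_modes(artifact_names: Iterable[str], copy_mode: CopyMode = None) -> Dict[str, str]:
    """Hypothetical helper: default every artifact dataset to "assign".

    - None: every artifact uses "assign".
    - str:  the same mode is applied to every artifact.
    - dict: user-provided entries win; artifacts missing from it fall back to "assign".
    """
    artifacts = list(artifact_names)
    if copy_mode is None:
        return {name: "assign" for name in artifacts}
    if isinstance(copy_mode, str):
        return {name: copy_mode for name in artifacts}
    resolved = {name: "assign" for name in artifacts}
    resolved.update(copy_mode)  # explicit user choices override the default
    return resolved


artifacts = ["model", "vectorizer"]  # hypothetical artifact dataset names
print(resolve_copy_modes(artifacts))
# {'model': 'assign', 'vectorizer': 'assign'}
print(resolve_copy_modes(artifacts, {"vectorizer": "deepcopy"}))
# {'model': 'assign', 'vectorizer': 'deepcopy'}
```

Keeping explicit user entries on top means the new default only changes behavior for artifacts nobody configured, so anyone who deliberately wants `deepcopy` for a mutable artifact keeps it.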

@Galileo-Galilei
Owner

Making it the default is likely a good idea, but for the record, you don't need to specify the artifacts one by one with a dictionary: you can already pass `copy_mode="assign"` to `KedroPipelineModel`, and it will apply to all artifact datasets.
