Ability to save pipelines using pickle #1956

Closed
dsherry opened this issue Mar 10, 2021 · 4 comments · Fixed by #2091
Labels
enhancement An improvement to an existing feature. needs design Issues requiring design documentation. spike To generate additional issues and kick off a sprint.

Comments

@dsherry
Contributor

dsherry commented Mar 10, 2021

Background
The pipeline save and load methods currently use cloudpickle instead of Python's standard pickle module.

We closed #1400 a couple of months ago, but that change introduced a bug (#1912): our code was setting class attributes, which apply across all instances of our template classes.
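To illustrate the kind of side effect described in #1912, here is a minimal sketch. It is not the project's actual code: `TemplatePipeline`, `configure_on_class`, and `configure_on_instance` are hypothetical names used only to show how writing to a class attribute leaks into every instance.

```python
class TemplatePipeline:
    # Class attributes: a single copy shared by every instance of this class.
    name = None
    component_graph = None


def configure_on_class(pipeline, name, component_graph):
    """Buggy pattern (hypothetical): writes to the class, so all instances change."""
    cls = type(pipeline)
    cls.name = name
    cls.component_graph = component_graph
    return pipeline


def configure_on_instance(pipeline, name, component_graph):
    """Safer pattern (hypothetical): writes to this one instance only."""
    pipeline.name = name
    pipeline.component_graph = component_graph
    return pipeline


first = configure_on_class(TemplatePipeline(), "first", ["Imputer"])
second = configure_on_class(TemplatePipeline(), "second", ["Scaler"])
# Both objects now report the name "second", because the value lives on the class.
print(first.name, second.name)
```

With `configure_on_instance`, each pipeline keeps its own values and the shared class is left untouched.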

Plan
The short-term resolution for #1912 is to revert the change for #1400, so that get_pipelines returns a pipeline instance which cannot be pickled with Python's pickle, but which also does not have the buggy side effect on class attributes like name and component_graph.

Once that's done, this issue tracks figuring out how we can modify our approach to support pickling our pipelines with Python's pickle, and filing issues to execute that plan.

@dsherry dsherry added enhancement An improvement to an existing feature. needs design Issues requiring design documentation. spike To generate additional issues and kick off a sprint. labels Mar 10, 2021
@angela97lin
Contributor

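The issue at hand is that make_pipeline returns a dynamically generated class, and pickle cannot serialize such classes: it stores a class by its importable name, and a class created at runtime typically has no module-level name to look up.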
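A minimal sketch of the failure, using only the standard library. `make_pipeline_class`, `StaticPipeline`, and `can_pickle` are hypothetical stand-ins, not the project's real functions; the point is only that pickle looks classes up by name.

```python
import pickle


class StaticPipeline:
    """Defined at module level, so pickle can find it by name."""


def make_pipeline_class(name, component_graph):
    """Hypothetical factory that builds a pipeline class at runtime."""
    attrs = {"name": name, "component_graph": component_graph}
    # The new class is never bound to a module-level name.
    return type("GeneratedPipeline", (), attrs)


def can_pickle(obj):
    """Return True if pickle.dumps succeeds on obj, False otherwise."""
    try:
        pickle.dumps(obj)
        return True
    except (pickle.PicklingError, AttributeError):
        # pickle could not resolve the object's class by module and name.
        return False


generated_cls = make_pipeline_class("demo", ["Imputer"])
print(can_pickle(StaticPipeline()))  # module-level class
print(can_pickle(generated_cls()))   # runtime-generated class
```

cloudpickle avoids this by serializing the class definition itself rather than a reference to it, which is why save and load work today.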

Options:

  • Be explicit about not supporting pickle and requiring users to use cloudpickle if they want to save their pipelines.
  • Right now, we need this dynamically generated class because we expect a pipeline class to have specific attributes (custom name, component graph, custom hyperparameters). We could move from storing those as class attributes to storing them as instance attributes. The class attributes were added in Pipeline API #345 to make it easier to create new pipelines, but the decision could be worth revisiting.
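A rough sketch of the second option, assuming a single importable class whose configuration lives on the instance. The `Pipeline` class, its constructor arguments, and `make_pipeline_instance` are hypothetical and do not reflect the current API.

```python
import pickle


class Pipeline:
    """Hypothetical pipeline: configuration is stored per instance."""

    def __init__(self, component_graph, custom_name=None, custom_hyperparameters=None):
        self.component_graph = list(component_graph)
        self.custom_name = custom_name
        self.custom_hyperparameters = dict(custom_hyperparameters or {})


def make_pipeline_instance(component_graph, custom_name=None):
    """Hypothetical factory: returns an instance instead of a new class."""
    return Pipeline(component_graph, custom_name=custom_name)


def round_trip(pipeline):
    """Serialize and restore a pipeline with the standard pickle module."""
    return pickle.loads(pickle.dumps(pipeline))


pipeline = make_pipeline_instance(["Imputer", "Random Forest Classifier"], "My Pipeline")
restored = round_trip(pipeline)
print(restored.custom_name, restored.component_graph)
```

Because `Pipeline` is an ordinary module-level class, pickle only has to store its name plus the instance's `__dict__`, and no class attributes are shared between pipelines.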

@dsherry
Contributor Author

dsherry commented Mar 19, 2021

@chukarsten suggested looking at defining `__reduce__` on our pipelines. I think that could be the way to go to make pipeline classes serializable with pickle. https://stackoverflow.com/questions/11658511/pickling-dynamically-generated-classes
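One way the `__reduce__` idea could look, as a sketch under stated assumptions: `make_pipeline_class` and `_rebuild_pipeline` are hypothetical names, and the generated class is assumed to be fully described by a name and a component graph. `__reduce__` tells pickle to store a call to an importable helper instead of a reference to the class.

```python
import pickle


def _rebuild_pipeline(name, component_graph, state):
    """Module-level helper that pickle can import and call at load time."""
    cls = make_pipeline_class(name, component_graph)
    obj = cls.__new__(cls)  # skip __init__; restore the saved state directly
    obj.__dict__.update(state)
    return obj


def make_pipeline_class(name, component_graph):
    """Hypothetical factory that builds a pipeline class at runtime."""

    def __reduce__(self):
        # Return (callable, args): pickle stores these rather than the class.
        return (_rebuild_pipeline, (name, component_graph, dict(self.__dict__)))

    namespace = {
        "name": name,
        "component_graph": component_graph,
        "__reduce__": __reduce__,
    }
    return type("GeneratedPipeline", (), namespace)


pipeline_cls = make_pipeline_class("demo", ["Imputer", "Estimator"])
pipeline = pipeline_cls()
pipeline.parameters = {"Imputer": {"strategy": "mean"}}

restored = pickle.loads(pickle.dumps(pipeline))
print(restored.name, restored.parameters)
```

Note that in this sketch each load regenerates the class, so `type(restored)` is an equivalent but distinct class object from `pipeline_cls`. Whether that is acceptable would need to be part of the design discussion.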

However, that still leaves us with another challenge. Currently we define the pipeline classes used in automl at runtime, and as @angela97lin said above, the original issue was that those definitions weren't being found or saved during pickling. That issue still needs a solution.

@dsherry
Contributor Author

dsherry commented Mar 24, 2021

We did some digging this morning and listed options.

@angela97lin angela97lin self-assigned this Mar 29, 2021
@angela97lin
Contributor

Next steps: open a spike PR, see what difficulties we encounter with updating our pipeline API.
