Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Model serialization? #524

Closed
dwilson1988 opened this issue Oct 22, 2018 · 12 comments · Fixed by #496
Closed

Model serialization? #524

dwilson1988 opened this issue Oct 22, 2018 · 12 comments · Fixed by #496

Comments

@dwilson1988
Copy link
Contributor

First off - AWESOME library. I've been using it for a few of my projects and it's been a lifesaver.

However, I'm looking to save models. I can't seem to find any documentation on how. My specific use case is just wanting to save the Kaplan-Meier estimator and use it to make predictions later. I can of course save off the data frame of the survival function, but I'd like to pickle (or otherwise) the model and reload in a different module as I'm performing analysis/fitting and then using the model later. Is there another way besides exporting the survival function (I'd like use the predict method...)

Thanks!

@CamDavidsonPilon
Copy link
Owner

Hey @dwilson1988, thanks for the kind words. So there's been an open issue in lifelines for a while around that point: #188

(pickle is a library to serialize python objects)

There's not a good solution atm, unfortunately. However, in a later release, I will come back to fix this.

@dwilson1988
Copy link
Contributor Author

I've written custom reduce methods for metaclasses before, would you be interested in a PR if I can break off some time to work on this?

@CamDavidsonPilon
Copy link
Owner

yesss so interested!

@dwilson1988
Copy link
Contributor Author

dwilson1988 commented Oct 23, 2018

So after inspection of the code, I see why this is an issue. Since the predict/divide/subtract methods are dynamically generated, they are unpicklable. I see two directions to go.

  1. Use the dill library, which can pickle functions including lambdas. I believe this should work out of the box. Either users can use this directly (just tested this) or it could be integrated into the __reduce__ method. I'm not a huge fan of this because it adds a dependency OR requires a user to know what serialization library to use.

  2. I can make it work with pickle, but it'll require a minor refactor on the way those methods work. Instead of implementing them as a factory method (e.g. _predict returns predict), I think those can easily be served as methods in the base UnivariateFitter class. The docstrings can easily be updated in UnivariateFitter.__init__ (started implementing this) and the attributes can even be hidden from the user until fit is called. This will require one minor mod to the derived classes, which is storing the attributes required for the base classes (estimate name and label for Kaplan-Meier as an example). If this approach is acceptable, I can probably get a PR in in the next day. I'm pretty sure I can make it general and clean and would allow these estimators to be picklable.

Thoughts?

@CamDavidsonPilon
Copy link
Owner

I'd prefer 2. - I don't like the additional dependency to clean up bad software architecture (I can say that because I wrote it heh)

@dwilson1988
Copy link
Contributor Author

:)

In progress, I'll send you a PR when I have it complete.

@CamDavidsonPilon
Copy link
Owner

If you can, feel free to work off my v0.15 branch (or point your contributions to branch v0.15.0), so I can add your contributions in the next release

@dwilson1988
Copy link
Contributor Author

Awesome, working on v0.15.0

@dwilson1988
Copy link
Contributor Author

Aha, I get it. You were trying to avoid rewriting docstrings because they are the same for the most part. A lot of docstrings are writable, but method docstrings aren't. You also can't modify the class __dict__ in 3.6+, so I'm working on workarounds.

@CamDavidsonPilon
Copy link
Owner

You were trying to avoid rewriting docstrings because they are the same for the most part.

I think a bit of over-engineering on my part, I feel

@CamDavidsonPilon
Copy link
Owner

I'd be just as happy with a "no-nonsense under-engineered" solution too

@dwilson1988
Copy link
Contributor Author

I like clever code that saves lines of code, but perhaps "Explicit is better than implicit" is a good call here. I'm almost done either way, should have the PR in soon. Thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants