Serializable Fonduer model #259

Closed
HiromuHota opened this issue May 13, 2019 · 8 comments · Fixed by #407

Comments

@HiromuHota
Contributor

Is your feature request related to a problem? Please describe.

I develop a Fonduer-based app locally on my laptop.
Once it's done, I'd like to package the whole Fonduer pipeline (parsing, extraction, featurization, and classification) and deploy it to a remote place to serve.
However, a Fonduer-based app is not easy to package and hence not easy to deploy.

Describe the solution you'd like

Add a Fonduer model class that is

  1. Serializable (e.g., a class with save and load member methods, as sketched below)

     class FonduerModel:
         def save(self, path_to_save): ...
         def load(self, path_to_load): ...

  2. Capable of executing any phase of the Fonduer pipeline
  3. (Hopefully) manageable by MLflow
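
The save/load interface proposed above could look roughly like the following. This is only a minimal sketch under stated assumptions: the `config` attribute and the use of `pickle` as the storage format are hypothetical stand-ins, not Fonduer's actual API or on-disk format.

```python
import pickle
import tempfile
from pathlib import Path


class FonduerModel:
    """Hypothetical sketch of a model object that can be saved and reloaded."""

    def __init__(self, config=None):
        # `config` is an assumed placeholder for whatever the pipeline needs
        # at serving time (names of mention classes, matcher settings, etc.).
        self.config = dict(config or {})

    def save(self, path_to_save):
        # Persist the state to a single file.
        with open(path_to_save, "wb") as f:
            pickle.dump(self.config, f)

    def load(self, path_to_load):
        # Restore the state written by save() and return self for chaining.
        with open(path_to_load, "rb") as f:
            self.config = pickle.load(f)
        return self


# Round trip through a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "model.pkl"
    FonduerModel({"mention_classes": ["Part"]}).save(path)
    restored = FonduerModel().load(path)
```

A real implementation would also need to carry the parser, extractor, featurizer, and classifier state, which is where most of the difficulty lies.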

Describe alternatives you've considered

I can create one or more Python scripts that run all the phases, package them, and deploy them.
This is cumbersome because the scripts have to include many things (matchers, mention_classes, mention_spaces, candidate_classes, etc.), and it is not obvious what should be included for serving.

Additional context

I'd like to make Fonduer more deployable and servable.
I've been testing MLflow to package a Fonduer-based app and found it was difficult to do so when there is no serializable Fonduer model.

@senwu
Collaborator

senwu commented May 14, 2019

This is a great idea! Really excited to see/chat about how we can approach it to make Fonduer much easier to use.

@HiromuHota
Contributor Author

I did some research on how to serialize a dynamically created class (e.g., mention/candidate subclasses in Fonduer) and found that cloudpickle or dill can serialize such a class.
However, things become more complicated when you try to serialize LambdaFunctionMatcher-based matchers. Those user-defined lambda functions have to be serialized along with the matchers, which could be cumbersome.
Serializability is nice, but packageability (of user-created Python files for mention/candidate subclasses, matchers, spaces, etc.) would be good enough.
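
To illustrate why lambdas complicate things: the standard `pickle` module stores functions by qualified name, so an anonymous function cannot be pickled, whereas cloudpickle serializes it by value. The sketch below is illustrative only; `is_picklable` and `is_long_mention` are hypothetical names, and cloudpickle is treated as an optional third-party dependency.

```python
import pickle


def is_picklable(obj):
    """Return True if the standard pickle module can serialize obj."""
    try:
        pickle.dumps(obj)
    except (pickle.PicklingError, AttributeError, TypeError):
        return False
    return True


# Hypothetical predicate of the kind a user might hand to a
# LambdaFunctionMatcher; it is not taken from Fonduer itself.
is_long_mention = lambda text: len(text) > 3

# pickle looks functions up by name, and a lambda has no importable name.
lambda_ok = is_picklable(is_long_mention)
# Plain data, by contrast, pickles without trouble.
data_ok = is_picklable({"threshold": 3})

try:
    import cloudpickle  # third-party and optional for this sketch
except ImportError:
    cloudpickle = None

if cloudpickle is not None:
    # cloudpickle serializes the function body itself, so the round trip works.
    restored = cloudpickle.loads(cloudpickle.dumps(is_long_mention))
    by_value_ok = restored("transistor")
```

Serializing by value is convenient, but it ties the saved artifact to the Python and library versions that produced it, which is one reason plain packaging of the user's source files may be the simpler route.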

@trungtv

trungtv commented Aug 15, 2019

Hello,
I am also interested in serving Fonduer models in concurrent environments. If there is an effort for that, I would like to join in.
Thanks for your great work.

@HiromuHota
Contributor Author

I've created a custom MLflow model to package a (trained) Fonduer model.
In addition to packaging, this custom model can be used to serve the packaged model.

Currently this custom MLflow model includes some hard-coded parts and hence needs cleanup, but I'd love to contribute it to the community if it is useful to other people.
@senwu @trungtv, let us know your thoughts.

@senwu
Collaborator

senwu commented Aug 15, 2019

@HiromuHota Awesome! We would definitely love to have it, since more and more people want to use it! Happy to chat and contribute as well! We should have a tutorial for that too. 👍

@trungtv

trungtv commented Aug 16, 2019

@HiromuHota great! I am happy to contribute to this milestone for Fonduer.

@HiromuHota
Contributor Author

@senwu @trungtv I've created a new repository (https://github.com/HiromuHota/fonduer-mlflow) for this custom MLflow model for Fonduer.
@senwu I'd like this custom MLflow model for Fonduer (fonduer_model.py) to be merged into the Fonduer repository in the future. So please take a look at the repository and get familiar with it. Let me know if you have any questions, suggestions, etc.

@HiromuHota
Contributor Author

I think fonduer-mlflow is now in good shape and ready to be submitted as a PR against fonduer.
Let me create a PR and submit it.

lukehsiao pushed a commit that referenced this issue Jun 3, 2020
Users would like to be able to develop a Fonduer-based application
locally on their workstation, and then package the whole pipeline
(parsing, extraction, featurization, and classification) to be deployed
somewhere remote to serve. However, Fonduer-based applications are not
easily packaged.

This commit changes that. Specifically, this creates a serializable
Fonduer model that is capable of executing any phase of the Fonduer
pipeline. It is manageable by MLflow [1], a platform for managing machine
learning pipelines, using the MLflow Model as a storage format.

Further documentation has been added in docs/user/packaging.rst, and
usage is also shown in the additional tests.

[1]: https://mlflow.org/

Closes #259.