
Add API for storing trained model metadata #881

Closed
parano opened this issue Jul 11, 2020 · 3 comments · Fixed by #1179
Labels
feature: Feature requests or pull requests implementing a new feature
help-wanted: An issue currently lacks a contributor

Comments

@parano
Member

parano commented Jul 11, 2020

Is your feature request related to a problem? Please describe.

There are many existing libraries and platforms that provide excellent metrics tracking capabilities, e.g. MLflow's auto logging and Comet.ml's hosted solution with a very nice UI.

In a typical ML workflow, the data science team may use an experimentation management platform during the model development phase to produce many trained models, then go through a model selection process in which they compare the different models and decide which ones work best. In that scenario, tools like MLflow and Comet.ml give the user a lot of insight into the training process and into which algorithms or model architectures work for their specific problem.

On the other hand, BentoML is a tool focused on serving and deploying trained models. Once the data scientist has selected a model during the model development phase, they can use BentoML to productionize the trained model and make it ready for production-grade deployment.

In a sense, models that are saved to BentoML's model registry YataiService are the "golden models": they are produced by a production training pipeline and are ready to be tested and shipped to serve production traffic.

However, when looking at BentoML's saved bundle before deploying it, the user may still want to see some context information about the trained model. For example, the version of the training dataset used, the training parameters used, the experimentation/training job ID in their experimentation management platform, or the evaluation results (precision, recall, inception score, BPC, etc.).

This information makes it much easier for users to trace back when they run into an issue with a new version of their BentoML service, and to make decisions about which service version to deploy.

Describe the solution you'd like

The proposal is to add an API for attaching context information as a loosely typed JSON blob, and to allow viewing this information from the CLI and Web UI.

from bentoml import env, artifacts, api, BentoService
from bentoml.adapters import DataframeInput
from bentoml.artifact import SklearnModelArtifact

@env(auto_pip_dependencies=True)
@artifacts([SklearnModelArtifact('model')])
class IrisClassifier(BentoService):

    @api(input=DataframeInput())
    def predict(self, df):
        # Optional pre-processing, post-processing code goes here
        return self.artifacts.model.predict(df)

-----

from sklearn import svm
from sklearn import datasets

from iris_classifier import IrisClassifier

if __name__ == "__main__":
    # Load training data
    iris = datasets.load_iris()
    X, y = iris.data, iris.target

    # Model Training
    clf = svm.SVC(gamma='scale')
    clf.fit(X, y)

    # Create an IrisClassifier service instance
    iris_classifier_service = IrisClassifier()

    # Pack the newly trained model artifact along with user-defined metadata
    model_metadata = {
        'k1': 'v1',
        'job_id': 'ABC',
        'score': 0.84,
    }
    iris_classifier_service.pack('model', clf, metadata=model_metadata)

    # Save the prediction service to disk for model serving
    saved_path = iris_classifier_service.save()
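
Once saved, the metadata could also be read back programmatically. A rough sketch — bentoml.load is existing 0.8.x API, but the .metadata accessor shown here is hypothetical and only illustrates what this proposal might expose:

import bentoml

# Load the saved bundle back into a BentoService instance (existing API)
svc = bentoml.load(saved_path)

# Hypothetical accessor from this proposal: retrieve the metadata
# that was attached at pack() time
print(svc.artifacts.get('model').metadata)  # hypothetical API
# {'k1': 'v1', 'job_id': 'ABC', 'score': 0.84}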

The user-provided metadata will be inserted into the bentoml.yml file under the saved bundle directory:

version: 0.8.2
kind: BentoService
metadata:
  created_at: 2020-07-02 05:25:02.809453
  service_name: IrisClassifier
  service_version: 20200701222443_EC5EAB
  module_name: iris_classifier
  module_file: iris_classifier.py
env:
  pip_dependencies:
  - scikit-learn
  - pandas
  - bentoml==0.8.2
  conda_env:
    name: bentoml-IrisClassifier
    channels:
    - defaults
    dependencies:
    - python=3.7.5
    - pip
  python_version: 3.7.5
  docker_base_image: bentoml/model-server:0.8.2
apis:
- name: predict
  docs: BentoService API
  input_type: DataframeInput
  output_type: DefaultOutput
  mb_max_batch_size: 2000
  mb_max_latency: 300
  input_config:
    is_batch_input: true
    orient:
    typ: frame
    input_dtypes:
  output_config:
    cors: '*'
artifacts:
- name: model
  artifact_type: SklearnModelArtifact
  metadata:
    k1: v1
    job_id: ABC
    score: 0.84
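
Until first-class CLI/Web UI support lands, the metadata would already be inspectable by parsing bentoml.yml directly. A minimal sketch, assuming saved_path is the bundle directory returned by save() in the example above:

import os
import yaml  # PyYAML

# bentoml.yml sits at the top level of the saved bundle directory
with open(os.path.join(saved_path, 'bentoml.yml')) as f:
    config = yaml.safe_load(f)

# Pull the user-supplied metadata for the 'model' artifact
for artifact in config['artifacts']:
    if artifact['name'] == 'model':
        print(artifact.get('metadata', {}))
        # {'k1': 'v1', 'job_id': 'ABC', 'score': 0.84}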


@parano parano added the help-wanted, feature and MLH labels Jul 12, 2020
@parano parano changed the title Add API for storing trained model meatadata Add API for storing trained model metadata Jul 13, 2020
@parano parano removed the MLH label Sep 13, 2020
@jackyzha0
Collaborator

Will try to take this on in the next week or two!!

@parano
Member Author

parano commented Sep 17, 2020

@jackyzha0 awesome!! this is my favorite feature on the roadmap!

@yubozhao yubozhao added the MLH label Sep 25, 2020
@fernandocamargoai

I'd also suggest creating an endpoint where we can GET this metadata (like /metadata).
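
For illustration, a hypothetical request against a running model server — the endpoint name and response shape here are only a suggestion, not existing API:

import requests

# Hypothetical GET /metadata endpoint on a running BentoML model server
resp = requests.get('http://localhost:5000/metadata')
print(resp.json())
# e.g. {'artifacts': [{'name': 'model', 'metadata': {'job_id': 'ABC', 'score': 0.84}}]}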
