
Add API for storing trained model metadata #881

Closed
parano opened this issue Jul 11, 2020 · 3 comments · Fixed by #1179
Labels
feature: Feature requests or pull requests implementing a new feature
help-wanted: An issue currently lacks a contributor

Comments

@parano
Member

parano commented Jul 11, 2020

Is your feature request related to a problem? Please describe.

There are many existing libraries and platforms that provide excellent metrics tracking capabilities, e.g. MLflow's auto logging and Comet.ml's hosted solution with a very nice UI.

In a typical ML workflow, the data science team may use an experimentation management platform during the model development phase to produce many trained models, then go through a model selection process in which they compare the different models and decide which ones work best. In that scenario, tools like MLflow and Comet.ml give the user a lot of insight into the training process and into which algorithms or model architectures work for their specific problem.

On the other hand, BentoML is a tool focused on serving and deploying trained models. Once the data scientist has selected a model during the model development phase, they can use BentoML to productionize the trained model and make it ready for production-grade deployment.

In a sense, models that are saved to BentoML's model registry YataiService are the "golden models": they are produced by a production training pipeline and are ready to be tested and shipped to serve production traffic.

However, when looking at BentoML's saved bundle before deploying it, the user may still want to see some context information about the trained model. For example, the version of the training dataset used, the training parameters used, the experimentation/training job ID in their experimentation management platform, or the evaluation results (precision, recall, inception score, BPC, etc.).

This information makes it much easier for users to trace back when they run into an issue with a new version of their BentoML service, and to make decisions about which service version to deploy.

Describe the solution you'd like

The proposal is to add an API for attaching context information as a loosely typed JSON blob, and to allow viewing this information from the CLI and Web UI.

from bentoml import env, artifacts, api, BentoService
from bentoml.adapters import DataframeInput
from bentoml.artifact import SklearnModelArtifact

@env(auto_pip_dependencies=True)
@artifacts([SklearnModelArtifact('model')])
class IrisClassifier(BentoService):

    @api(input=DataframeInput())
    def predict(self, df):
        # Optional pre-processing, post-processing code goes here
        return self.artifacts.model.predict(df)

-----

from sklearn import svm
from sklearn import datasets

from iris_classifier import IrisClassifier

if __name__ == "__main__":
    # Load training data
    iris = datasets.load_iris()
    X, y = iris.data, iris.target

    # Model Training
    clf = svm.SVC(gamma='scale')
    clf.fit(X, y)

    # Create an IrisClassifier service instance
    iris_classifier_service = IrisClassifier()

    # Pack the newly trained model artifact along with user-defined metadata
    model_metadata = {
        'k1': 'v1',
        'job_id': 'ABC',
        'score': 0.84,
    }
    iris_classifier_service.pack('model', clf, metadata=model_metadata)

    # Save the prediction service to disk for model serving
    saved_path = iris_classifier_service.save()
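
Once saved, the metadata could also be read back programmatically. A rough sketch — bentoml.load is existing 0.8.x API, but the .metadata accessor shown here is hypothetical and only illustrates what this proposal might expose:

import bentoml

# Load the saved bundle back into a BentoService instance (existing API)
svc = bentoml.load(saved_path)

# Hypothetical accessor from this proposal: retrieve the metadata
# that was attached at pack() time
print(svc.artifacts.get('model').metadata)  # hypothetical API
# {'k1': 'v1', 'job_id': 'ABC', 'score': 0.84}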

The user-provided metadata will be inserted into the bentoml.yml file under the saved bundle directory:

version: 0.8.2
kind: BentoService
metadata:
  created_at: 2020-07-02 05:25:02.809453
  service_name: IrisClassifier
  service_version: 20200701222443_EC5EAB
  module_name: iris_classifier
  module_file: iris_classifier.py
env:
  pip_dependencies:
  - scikit-learn
  - pandas
  - bentoml==0.8.2
  conda_env:
    name: bentoml-IrisClassifier
    channels:
    - defaults
    dependencies:
    - python=3.7.5
    - pip
  python_version: 3.7.5
  docker_base_image: bentoml/model-server:0.8.2
apis:
- name: predict
  docs: BentoService API
  input_type: DataframeInput
  output_type: DefaultOutput
  mb_max_batch_size: 2000
  mb_max_latency: 300
  input_config:
    is_batch_input: true
    orient:
    typ: frame
    input_dtypes:
  output_config:
    cors: '*'
artifacts:
- name: model
  artifact_type: SklearnModelArtifact
  metadata:
    k1: v1
    job_id: ABC
    score: 0.84
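
Until first-class CLI/Web UI support lands, the metadata would already be inspectable by parsing bentoml.yml directly. A minimal sketch, assuming saved_path is the bundle directory returned by save() in the example above:

import os
import yaml  # PyYAML

# bentoml.yml sits at the top level of the saved bundle directory
with open(os.path.join(saved_path, 'bentoml.yml')) as f:
    config = yaml.safe_load(f)

# Pull the user-supplied metadata for the 'model' artifact
for artifact in config['artifacts']:
    if artifact['name'] == 'model':
        print(artifact.get('metadata', {}))
        # {'k1': 'v1', 'job_id': 'ABC', 'score': 0.84}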


@parano parano added the help-wanted, feature and MLH labels Jul 12, 2020
@parano parano changed the title Add API for storing trained model meatadata Add API for storing trained model metadata Jul 13, 2020
@parano parano removed the MLH label Sep 13, 2020
@jackyzha0
Collaborator

Will try to take this on in the next week or two!!

@parano
Member Author

parano commented Sep 17, 2020

@jackyzha0 awesome!! this is my favorite feature on the roadmap!

@yubozhao yubozhao added the MLH label Sep 25, 2020
@fernandocamargoai

I'd also suggest creating an endpoint where we can GET this metadata (like /metadata).
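
For illustration, a hypothetical request against a running model server — the endpoint name and response shape here are only a suggestion, not existing API:

import requests

# Hypothetical GET /metadata endpoint on a running BentoML model server
resp = requests.get('http://localhost:5000/metadata')
print(resp.json())
# e.g. {'artifacts': [{'name': 'model', 'metadata': {'job_id': 'ABC', 'score': 0.84}}]}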
