[Feature] Model Metadata API #1179

Merged
parano merged 22 commits into bentoml:master
Nov 13, 2020

Conversation

Collaborator

@jackyzha0 jackyzha0 commented Oct 15, 2020

Description

  • ability to save metadata for any artifact type
  • make consistent format across all frameworks and artifact types
  • update proto defs and include metadata in info command
  • add saving of metadata to load/save, will write a .yml file containing metadata for each artifact
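A minimal sketch of the per-artifact metadata file described in the last bullet, not the code from this PR. The helper names and the `<artifact_name>.yml` file naming are assumptions made for illustration. The PR uses ruamel for YAML; to stay dependency-free, this sketch writes JSON text with the standard library, which a YAML 1.2 parser should also accept.

```python
import json
import os
import tempfile


def metadata_path(base_dir, artifact_name):
    # Hypothetical naming scheme: one .yml file per artifact.
    return os.path.join(base_dir, f"{artifact_name}.yml")


def save_metadata(base_dir, artifact_name, metadata):
    """Write the metadata file; skip writing when there is no metadata."""
    if not metadata:
        return None
    path = metadata_path(base_dir, artifact_name)
    with open(path, "w", encoding="utf-8") as f:
        # JSON is used here as a stand-in for a real YAML dump.
        json.dump(metadata, f, indent=2, sort_keys=True)
    return path


def load_metadata(base_dir, artifact_name):
    """Return the saved metadata dict, or None if no file was written."""
    path = metadata_path(base_dir, artifact_name)
    if not os.path.isfile(path):
        return None
    with open(path, encoding="utf-8") as f:
        return json.load(f)


# Usage: round-trip metadata through a temporary bundle directory.
with tempfile.TemporaryDirectory() as bundle_dir:
    save_metadata(bundle_dir, "model", {"accuracy": 0.93, "tags": ["prod"]})
    print(load_metadata(bundle_dir, "model"))
```

Keeping the file optional means artifacts packed without metadata leave the saved bundle unchanged.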

Motivation and Context

closes #881

How Has This Been Tested?

  • with tests and locally

Types of changes

  • Breaking change (fix or feature that would cause existing functionality to change)
  • New feature and improvements (non-breaking change which adds/improves functionality)
  • Bug fix (non-breaking change which fixes an issue)
  • Code Refactoring (internal change which is not user facing)
  • Documentation
  • Test, CI, or build

Component(s) if applicable

  • BentoService (service definition, dependency management, API input/output adapters)
  • Model Artifact (model serialization, multi-framework support)
  • Model Server (micro-batching, dockerisation, logging, OpenAPI, instruments)
  • YataiService gRPC server (model registry, cloud deployment automation)
  • YataiService web server (nodejs HTTP server and web UI)
  • Internal (BentoML's own configuration, logging, utility, exception handling)
  • BentoML CLI

Checklist:

  • My code follows the bentoml code style, both ./dev/format.sh and
    ./dev/lint.sh script have passed
    (instructions).
  • My change reduces project test coverage and requires unit tests to be added
  • I have added unit tests covering my code change
  • My change requires a change to the documentation
  • I have updated the documentation accordingly

@pep8speaks

pep8speaks commented Oct 15, 2020

Hello @jackyzha0, Thanks for updating this PR.

There are currently no PEP 8 issues detected in this PR. Cheers! 🍻

Comment last updated at 2020-11-13 00:57:08 UTC

@jackyzha0
Collaborator Author

still need to add coverage tests and web UI support, but would love to hear your thoughts

@codecov

codecov bot commented Oct 15, 2020

Codecov Report

Merging #1179 (59fff9d) into master (2923623) will increase coverage by 0.04%.
The diff coverage is 83.07%.


@@            Coverage Diff             @@
##           master    #1179      +/-   ##
==========================================
+ Coverage   66.03%   66.08%   +0.04%     
==========================================
  Files         135      141       +6     
  Lines        8553     9157     +604     
==========================================
+ Hits         5648     6051     +403     
- Misses       2905     3106     +201     
Impacted Files                            Coverage Δ
bentoml/frameworks/fastai.py              0.00% <0.00%> (ø)
bentoml/frameworks/fasttext.py            0.00% <0.00%> (ø)
bentoml/frameworks/keras.py               0.00% <0.00%> (ø)
bentoml/frameworks/lightgbm.py            0.00% <0.00%> (ø)
bentoml/frameworks/spacy.py               0.00% <0.00%> (ø)
bentoml/frameworks/tensorflow.py          79.43% <ø> (ø)
bentoml/saved_bundle/bundler.py           88.63% <ø> (ø)
bentoml/service/artifacts/text_file.py    0.00% <0.00%> (ø)
bentoml/frameworks/transformers.py        76.25% <77.77%> (+0.60%) ⬆️
bentoml/saved_bundle/config.py            88.63% <85.71%> (-0.17%) ⬇️
... and 46 more

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data

@@ -104,7 +104,7 @@ def _load_from_directory(self, path):
tokenizer = getattr(
import_module("transformers"), self._tokenizer_type
).from_pretrained(path)
self._model = {"model": transformers_model, "tokenizer": tokenizer}
return {"model": transformers_model, "tokenizer": tokenizer}
Collaborator Author

changed this to actually just return model information after looking at how all the other frameworks currently set their self._model property

Comment on lines +20 to +23
# Protos
gen-protos: ## Build protobufs for Python and Node
@./protos/generate-docker.sh

Contributor

Awesome!

Comment on lines 227 to 228
s = Struct()
s.update(artifact_config["metadata"])
Contributor

Can we add validation and type check for the metadata?
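One way the requested validation could look, as a hedged sketch rather than the check that was eventually merged. It assumes the metadata must fit into a protobuf `Struct`, which holds null, booleans, numbers, strings, lists, and nested structs with string keys. The function names are hypothetical.

```python
# Scalar types that map onto protobuf Struct values.
_SCALARS = (str, bool, int, float, type(None))


def validate_metadata(metadata, _path="metadata"):
    """Raise TypeError unless metadata is a Struct-compatible dict."""
    if not isinstance(metadata, dict):
        raise TypeError(
            f"{_path} must be a dict, got {type(metadata).__name__}"
        )
    for key, value in metadata.items():
        if not isinstance(key, str):
            raise TypeError(f"{_path} has a non-string key: {key!r}")
        _validate_value(value, f"{_path}[{key!r}]")
    return metadata


def _validate_value(value, path):
    if isinstance(value, _SCALARS):
        return
    if isinstance(value, dict):
        validate_metadata(value, path)
        return
    if isinstance(value, list):
        for index, item in enumerate(value):
            _validate_value(item, f"{path}[{index}]")
        return
    raise TypeError(f"{path} has unsupported type {type(value).__name__}")
```

Running this before `s.update(...)` would surface a bad value with its full path, for example `metadata['tags'][2]`, instead of failing inside the protobuf call.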

@yubozhao
Contributor

yubozhao commented Nov 1, 2020

@jackyzha0 let's fix the test and we should be good to go

@jackyzha0
Collaborator Author

> @jackyzha0 let's fix the test and we should be good to go

I've got some other work I need to wrap up today but I'll try to find some time to wrap it up this week

@jackyzha0 jackyzha0 marked this pull request as ready for review November 8, 2020 22:56
yubozhao previously approved these changes Nov 9, 2020
@@ -204,6 +210,7 @@ def setup_routes(self):
/healthz Health check ping
/feedback Submitting feedback
/metrics Prometheus metrics endpoint
/metadata BentoService Artifact Metadata
Member

Also update the get_open_api_spec_json method in the open_api.py file to include this new endpoint?
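A rough sketch of what registering the new endpoint in an OpenAPI 3 document could involve. The real structure built by `get_open_api_spec_json` is not shown in this thread, so the helper names, the `infra` tag, and the response schema below are assumptions.

```python
def metadata_path_item():
    """OpenAPI 3 path item for a GET /metadata endpoint (illustrative)."""
    return {
        "get": {
            "tags": ["infra"],  # hypothetical tag name
            "description": "BentoService Artifact Metadata",
            "responses": {
                "200": {
                    "description": "success",
                    "content": {
                        "application/json": {"schema": {"type": "object"}}
                    },
                }
            },
        }
    }


def add_metadata_endpoint(spec):
    """Add /metadata to the spec's paths without touching other entries."""
    spec.setdefault("paths", {})["/metadata"] = metadata_path_item()
    return spec
```

Calling `add_metadata_endpoint(spec)` on an existing spec dict leaves paths such as `/healthz` in place and only adds the new entry.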

"""
Pack the in-memory trained model object to this BentoServiceArtifact

Note: add "# pylint:disable=arguments-differ" to child class's pack method
"""
if metadata:
Member

I think it would be cleaner to move this and similarly the code in load below to the __getattribute__ method in this class. That way we can avoid modifying all the other artifact classes.

Everything else looks great! Thanks for the PR @jackyzha0! Sorry about the delay in getting to help with the review.
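The `__getattribute__` idea from the comment above can be sketched roughly as follows. This is an illustration under assumptions, not the merged implementation: `BaseArtifact` and `InMemoryArtifact` are hypothetical stand-ins for the artifact classes, and only `pack` is intercepted.

```python
import functools


class BaseArtifact:
    def __init__(self, name):
        self._name = name
        self._metadata = None

    @property
    def metadata(self):
        return self._metadata

    def pack(self, model):
        raise NotImplementedError

    def __getattribute__(self, item):
        attr = object.__getattribute__(self, item)
        if item != "pack":
            return attr

        # Wrap whichever pack() the subclass defines, so subclasses do not
        # need a metadata parameter of their own.
        @functools.wraps(attr)
        def wrapped_pack(*args, **kwargs):
            metadata = kwargs.pop("metadata", None)
            if metadata is not None:
                if not isinstance(metadata, dict):
                    raise TypeError(
                        "metadata must be a dict, got "
                        f"{type(metadata).__name__}"
                    )
                self._metadata = metadata
            return attr(*args, **kwargs)

        return wrapped_pack


class InMemoryArtifact(BaseArtifact):
    # Unmodified subclass: its pack() knows nothing about metadata.
    def pack(self, model):
        self._model = model
        return self


artifact = InMemoryArtifact("clf").pack(object(), metadata={"accuracy": 0.9})
print(artifact.metadata)
```

The trade-off is that every attribute access now passes through `__getattribute__`, in exchange for leaving each framework's `pack` signature untouched.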

Member

@parano parano left a comment

Hi @jackyzha0 - found a few small issues I didn't notice on the first pass, could you take a look at the comments above?

@@ -48,7 +56,7 @@ def name(self):
"""
return self._name

-def pack(self, model):
+def pack(self, model, metadata=None):  # pylint: disable=unused-argument
Member

add a type annotation here? e.g. metadata: dict=None

@@ -125,6 +146,16 @@ def wrapped_load(*args, **kwargs):
"`load` an artifact multiple times may lead to unexpected "
"behaviors"
)

# load metadata if exists
Member

Could we do the load lazily in the metadata property call?

@property
def metadata(self):
    if not self._metadata:
        ...  # load the metadata file here
    return self._metadata

Collaborator Author

I would imagine that introducing two other properties to the class like _metadata_path and _metadata_loaded is adding a lot of extra complexity for just a simple YAML load, what would be the main benefit of lazy loading here?

Member

@parano parano Nov 12, 2020

@jackyzha0 got it, that makes sense. I think we would just need to add a property _load_path, and if self._metadata is None can be used to check if it's loaded.

You are right, there isn't much benefit besides avoiding loading this file. I would assume that in most use cases the metadata YAML file will be very small, so it should not be a problem. This LGTM!
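For reference, the lazy variant discussed in this thread could look roughly like the sketch below, using a `_load_path` attribute and `self._metadata is None` as the "not loaded yet" check. The class name and the `<name>.yml` file layout are assumptions, and `json` stands in for the YAML loader to avoid a third-party dependency.

```python
import json
import os
import tempfile


class LazyMetadataArtifact:
    def __init__(self, name):
        self._name = name
        self._load_path = None
        self._metadata = None

    def load(self, path):
        # Only remember where the bundle lives; do not read the file yet.
        self._load_path = path
        return self

    @property
    def metadata(self):
        if self._metadata is None and self._load_path is not None:
            file_path = os.path.join(self._load_path, f"{self._name}.yml")
            if os.path.isfile(file_path):
                with open(file_path, encoding="utf-8") as f:
                    self._metadata = json.load(f)
        return self._metadata


# Usage: the file is read on first access and cached afterwards.
with tempfile.TemporaryDirectory() as bundle_dir:
    with open(os.path.join(bundle_dir, "model.yml"), "w", encoding="utf-8") as f:
        json.dump({"accuracy": 0.93}, f)
    artifact = LazyMetadataArtifact("model").load(bundle_dir)
    print(artifact.metadata)
```

Note that when no metadata file exists, this version checks the filesystem again on every access, which is part of the extra complexity weighed above.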

bentoml/service/artifacts/__init__.py (outdated)
if isinstance(kwargs['metadata'], dict):
self._metadata = kwargs['metadata']
else:
logger.warning(
Member

What's the reason for logging a warning here instead of raising a TypeError? It seems to me it's better to raise an error explicitly and help the user discover this mistake early.

@parano
Member

parano commented Nov 12, 2020

Thanks for updating the PR, looking great! I will re-trigger the test run to reflect a fix just pushed to master, will merge once all tests are passing.

@jackyzha0
Collaborator Author

> Thanks for updating the PR, looking great! I will re-trigger the test run to reflect a fix just pushed to master, will merge once all tests are passing.

Just added a fix for the new TypeError exception in the test

@parano parano merged commit 9dd001a into bentoml:master Nov 13, 2020
aarnphm pushed a commit to aarnphm/BentoML that referenced this pull request Jul 29, 2022
* basic implementation of metadata saving

* make consistent format across all frameworks and artifact types

* update proto defs and include metadata in info command

* formatting fixes

* added tests and improved error messages

* more complete tests for invalid artifact metadata

* add /metadata endpoint

* add basic api_server test

* fix test for restoring metadata from saved bundle

* update metadata load/save for frameworks

* run lint+format, move to ruamel instead of regular yaml

* remove comment

* bring tf file up to date w master

* fix naming and broken tf2 test

* refactor load, save, pack back to service>artifacts>__init__

* whitespace fixes and more cleanup

* update open_api spec

* address review comments

* fix raise exception test

* remove unused vars
Development

Successfully merging this pull request may close these issues.

Add API for storing trained model metadata
4 participants