BentoML - v1.0.4

@ssheng ssheng released this 26 Aug 18:10
6370972

🍱 BentoML v1.0.4 is here!

  • Added support for explicit GPU mapping for runners. In addition to specifying the number of GPU devices allocated to a runner, we can map a list of device IDs directly to a runner through configuration.

    runners:
      iris_clf_1:
        resources:
          nvidia.com/gpu: [2, 4] # Map devices 2 and 4 to the iris_clf_1 runner
      iris_clf_2:
        resources:
          nvidia.com/gpu: [1, 3] # Map devices 1 and 3 to the iris_clf_2 runner
  • Added SSL support for API server through both CLI and configuration.

      --ssl-certfile TEXT          SSL certificate file
      --ssl-keyfile TEXT           SSL key file
      --ssl-keyfile-password TEXT  SSL keyfile password
      --ssl-version INTEGER        SSL version to use (see stdlib 'ssl' module)
      --ssl-cert-reqs INTEGER      Whether client certificate is required (see stdlib 'ssl' module)
      --ssl-ca-certs TEXT          CA certificates file
      --ssl-ciphers TEXT           Ciphers to use (see stdlib 'ssl' module)
  • Added an adaptive batch size histogram metric, BENTOML_{runner}_{method}_adaptive_batch_size_bucket, for observing how the batching mechanism behaves.

  • Added support for the OpenTelemetry OTLP exporter for tracing. The OpenTelemetry resource is configured automatically if the user has not explicitly configured it through environment variables. Upgraded the OpenTelemetry Python packages to version 0.33b0.

  • Added support for saving external_modules alongside models in the save_model API. Saving external Python modules is useful for models with external dependencies, such as tokenizers, preprocessors, and configurations.

  • Enhanced Swagger UI to include additional documentation and helper links.
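
To make the explicit GPU mapping above more concrete, here is a minimal sketch of how a `nvidia.com/gpu` value could be turned into a per-runner device list. The helper `visible_devices` is hypothetical and not part of BentoML. `CUDA_VISIBLE_DEVICES` is a standard CUDA environment variable, but whether BentoML uses it internally, and the rule that a plain count selects the first N devices, are assumptions made only for illustration.

```python
from typing import Dict, List, Union


def visible_devices(gpu_resource: Union[int, List[int]]) -> str:
    """Hypothetical helper: format a `nvidia.com/gpu` value as a device ID string."""
    if isinstance(gpu_resource, int):
        # Assumption: a plain count means "the first N devices".
        ids = list(range(gpu_resource))
    else:
        # An explicit list is used as given.
        ids = list(gpu_resource)
    if len(set(ids)) != len(ids):
        raise ValueError(f"duplicate device IDs: {ids}")
    return ",".join(str(i) for i in ids)


# Mirrors the runner configuration shown in the release notes.
runner_gpus: Dict[str, Union[int, List[int]]] = {
    "iris_clf_1": [2, 4],
    "iris_clf_2": [1, 3],
}

env_by_runner = {name: visible_devices(value) for name, value in runner_gpus.items()}
for name, devices in env_by_runner.items():
    print(f"{name}: CUDA_VISIBLE_DEVICES={devices}")
```

A process started with such a value would see only the listed devices, which is one common way to pin a worker to specific GPUs.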
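
The `--ssl-version` and `--ssl-cert-reqs` options listed above take integers and point to the standard library `ssl` module. The sketch below prints the integer values of a few `ssl` constants so they can be looked up rather than guessed. It assumes that the flags accept these constant values directly, as the help text suggests; the release notes do not spell this out.

```python
import ssl

# Integer values of the stdlib constants that describe client certificate
# requirements (candidates for --ssl-cert-reqs, by assumption).
cert_reqs = {
    "CERT_NONE": int(ssl.CERT_NONE),          # no client certificate requested
    "CERT_OPTIONAL": int(ssl.CERT_OPTIONAL),  # requested, but not required
    "CERT_REQUIRED": int(ssl.CERT_REQUIRED),  # client certificate required
}

# Integer values of protocol constants (candidates for --ssl-version, by assumption).
protocols = {
    "PROTOCOL_TLS_CLIENT": int(ssl.PROTOCOL_TLS_CLIENT),
    "PROTOCOL_TLS_SERVER": int(ssl.PROTOCOL_TLS_SERVER),
}

for name, value in {**cert_reqs, **protocols}.items():
    print(f"ssl.{name} = {value}")
```

For example, requiring client certificates would correspond to passing the value printed for `ssl.CERT_REQUIRED`, together with `--ssl-ca-certs` so that the server can verify them.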

💡 We continue to update the documentation on every release to help our users unlock the full power of BentoML.

  • Check out the adaptive batching documentation on how to leverage batching to improve inference latency and efficiency.
  • Check out the runner configuration documentation on how to customize resource allocation for runners at run time.
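
When reading the adaptive batch size histogram introduced in this release, it helps to remember that Prometheus-style `_bucket` series are cumulative: each bucket counts every observation less than or equal to its upper bound. The sketch below illustrates that convention only. The helper name, the bucket bounds, and the sample batch sizes are all made up and are not BentoML's defaults.

```python
from bisect import bisect_right
from typing import Dict, List


def cumulative_buckets(batch_sizes: List[int], upper_bounds: List[int]) -> Dict[str, int]:
    """Hypothetical helper: count observations per cumulative `le` bucket."""
    ordered = sorted(batch_sizes)
    counts: Dict[str, int] = {}
    for bound in sorted(upper_bounds):
        # Number of observed batch sizes that are <= this upper bound.
        counts[str(bound)] = bisect_right(ordered, bound)
    # The +Inf bucket always contains every observation.
    counts["+Inf"] = len(ordered)
    return counts


observed_batch_sizes = [1, 3, 8, 8, 16, 40]   # illustrative data
bucket_bounds = [1, 2, 4, 8, 16, 32]          # illustrative bounds

buckets = cumulative_buckets(observed_batch_sizes, bucket_bounds)
for le, count in buckets.items():
    print(f'adaptive_batch_size_bucket{{le="{le}"}} {count}')
```

Because the buckets are cumulative, the number of batches with sizes in a range such as (8, 16] is the difference between two adjacent bucket counts.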

🙌 We continue to receive great engagement and support from the BentoML community.

  • Shout out to @sptowey for their contribution on adding SSL support.
  • Shout out to @dbuades for their contribution on adding the OTLP exporter.
  • Shout out to @tweeklab for their contribution on fixing a bug in import_model in the MLflow framework.

Full Changelog: v1.0.3...v1.0.4