diff --git a/docs/source/_static/img/jaeger-ui.png b/docs/source/_static/img/jaeger-ui.png
new file mode 100644
index 00000000000..f0a872b9662
Binary files /dev/null and b/docs/source/_static/img/jaeger-ui.png differ
diff --git a/docs/source/concepts/runner.rst b/docs/source/concepts/runner.rst
index 098a538ff4a..a63bba8fcc1 100644
--- a/docs/source/concepts/runner.rst
+++ b/docs/source/concepts/runner.rst
@@ -2,6 +2,11 @@
 Using Runners
 =============
 
+*time expected: 15 minutes*
+
+This page explains the concept of Runners and demonstrates their role within the
+BentoML architecture.
+
 What is Runner?
 ---------------
 
@@ -56,6 +61,10 @@ methods.
 Custom Runner
 -------------
 
+For more advanced use cases, BentoML also allows users to define their own Runner
+classes. This is useful when the pre-built Runners do not meet the requirements, or
+when the user wants to implement a Runner for a new ML framework.
+
 Creating a Runnable
 ^^^^^^^^^^^^^^^^^^^
 
@@ -300,6 +309,7 @@ Runner Configuration
 --------------------
 
 Runner behaviors and resource allocation can be specified via BentoML :ref:`configuration `.
+
 Runners can be both configured individually or in aggregate under the ``runners`` configuration key.
 To configure a specific runner, specify its name under the ``runners`` configuration key. Otherwise, the configuration will be applied to all runners. The examples below demonstrate both the configuration for all runners in aggregate and for an individual runner (``iris_clf``).
 
@@ -313,29 +323,29 @@ To explicitly disable or control adaptive batching behaviors at runtime, configu
 .. tab-set::
 
    .. tab-item:: All Runners
-       :sync: all_runners
+      :sync: all_runners
+
+      .. code-block:: yaml
+         :caption: ⚙️ `configuration.yml`
+
+         runners:
+           batching:
+             enabled: true
+             max_batch_size: 100
+             max_latency_ms: 500
 
-       .. code-block:: yaml
-          :caption: ⚙️ `configuration.yml`
-
-          runners:
-            batching:
-              enabled: true
-              max_batch_size: 100
-              max_latency_ms: 500
-
    .. tab-item:: Individual Runner
       :sync: individual_runner
-
+
       .. code-block:: yaml
-          :caption: ⚙️ `configuration.yml`
+         :caption: ⚙️ `configuration.yml`
 
-          runners:
-            iris_clf:
-              batching:
-                enabled: true
-                max_batch_size: 100
-                max_latency_ms: 500
+         runners:
+           iris_clf:
+             batching:
+               enabled: true
+               max_batch_size: 100
+               max_latency_ms: 500
 
 Resource Allocation
 ^^^^^^^^^^^^^^^^^^^
@@ -346,27 +356,27 @@ through configuration, with a `float` value for ``cpu`` and an `int` value for `
 .. tab-set::
 
    .. tab-item:: All Runners
-       :sync: all_runners
+      :sync: all_runners
 
-       .. code-block:: yaml
-          :caption: ⚙️ `configuration.yml`
+      .. code-block:: yaml
+         :caption: ⚙️ `configuration.yml`
+
+         runners:
+           resources:
+             cpu: 0.5
+             nvidia.com/gpu: 1
 
-          runners:
-            resources:
-              cpu: 0.5
-              nvidia.com/gpu: 1
-
    .. tab-item:: Individual Runner
       :sync: individual_runner
-
+
      .. code-block:: yaml
-         :caption: ⚙️ `configuration.yml`
+         :caption: ⚙️ `configuration.yml`
 
-         runners:
-           iris_clf:
-             resources:
-               cpu: 0.5
-               nvidia.com/gpu: 1
+         runners:
+           iris_clf:
+             resources:
+               cpu: 0.5
+               nvidia.com/gpu: 1
 
 Alternatively, a runner can be mapped to a specific set of GPUs. To specify GPU mapping,
 instead of defining an `integer` value, a list of device IDs
 can be specified for the ``nvidia.com/gpu`` key. For example, the following configuration maps the configured runners to GPU device 2 and 4.
 
@@ -374,25 +384,25 @@ can be specified for the ``nvidia.com/gpu`` key. For example, the following conf
 .. tab-set::
 
    .. tab-item:: All Runners
-       :sync: all_runners
+      :sync: all_runners
-       .. code-block:: yaml
-          :caption: ⚙️ `configuration.yml`
+      .. code-block:: yaml
+         :caption: ⚙️ `configuration.yml`
+
+         runners:
+           resources:
+             nvidia.com/gpu: [2, 4]
 
-          runners:
-            resources:
-              nvidia.com/gpu: [2, 4]
-
    .. tab-item:: Individual Runner
-       :sync: individual_runner
-
-       .. code-block:: yaml
-          :caption: ⚙️ `configuration.yml`
+      :sync: individual_runner
+
+      .. code-block:: yaml
+         :caption: ⚙️ `configuration.yml`
 
-          runners:
-            iris_clf:
-              resources:
-                nvidia.com/gpu: [2, 4]
+         runners:
+           iris_clf:
+             resources:
+               nvidia.com/gpu: [2, 4]
 
 Timeout
 ^^^^^^^
@@ -402,23 +412,23 @@ Runner timeout defines the amount of time in seconds to wait before calls a runn
 .. tab-set::
 
    .. tab-item:: All Runners
-       :sync: all_runners
+      :sync: all_runners
 
-       .. code-block:: yaml
-          :caption: ⚙️ `configuration.yml`
+      .. code-block:: yaml
+         :caption: ⚙️ `configuration.yml`
+
+         runners:
+           timeout: 60
 
-          runners:
-            timeout: 60
-
    .. tab-item:: Individual Runner
-       :sync: individual_runner
-
-       .. code-block:: yaml
-          :caption: ⚙️ `configuration.yml`
+      :sync: individual_runner
+
+      .. code-block:: yaml
+         :caption: ⚙️ `configuration.yml`
 
-          runners:
-            iris_clf:
-              timeout: 60
+         runners:
+           iris_clf:
+             timeout: 60
 
 Access Logging
 ^^^^^^^^^^^^^^
diff --git a/docs/source/conf.py b/docs/source/conf.py
index 7f418ab3c34..4646cfe56e5 100644
--- a/docs/source/conf.py
+++ b/docs/source/conf.py
@@ -35,6 +35,7 @@
     "sphinx.ext.viewcode",
     "sphinx.ext.ifconfig",
     "sphinx.ext.intersphinx",
+    "sphinx.ext.mathjax",
     "sphinx.ext.extlinks",
     "sphinx_click.ext",
     "sphinx_copybutton",
diff --git a/docs/source/guides/configuration.rst b/docs/source/guides/configuration.rst
index b250b3bfdc0..ef9bb76f9a7 100644
--- a/docs/source/guides/configuration.rst
+++ b/docs/source/guides/configuration.rst
@@ -2,36 +2,62 @@
 Configuration
 =============
 
-BentoML starts with an out-of-the-box configuration that works for common use cases. For advanced users, many
-features can be customized through configuration. Both BentoML CLI and Python APIs can be customized
-by the configuration. Configuration is best used for scenarios where the customizations can be specified once
-and applied to the entire team.
+*time expected: 11 minutes*
 
-BentoML configuration is defined by a YAML file placed in a directory specified by the ``BENTOML_CONFIG``
-environment variable. The example below starts the bento server with configuration defined in ``~/bentoml_configuration.yaml``:
+BentoML provides a configuration interface that allows you to customize the runtime
+behaviour of your BentoService. This article highlights and consolidates the configuration
+field definitions, as well as some recommendations and best practices for configuring
+BentoML.
 
-.. code-block:: shell
+   Configuration is best used for scenarios where the customizations can be specified once
+   and applied anywhere among your organization using BentoML.
 
-    $ BENTOML_CONFIG=~/bentoml_configuration.yaml bentoml serve iris_classifier:latest
+BentoML comes with out-of-the-box configuration that should work for most use cases.
 
-Users only need to specify a partial configuration with only the properties they wish to customize instead
-of a full configuration schema. In the example below, the microbatching workers count is overridden to 4.
-Remaining properties will take their defaults values.
+However, for more advanced users who want to fine-tune the features BentoML has to offer,
+runtime variables and settings can be configured via a configuration file, often referred to as
+``bentoml_configuration.yaml``.
+
+.. note::
+
+   This is not to be **confused** with ``bentofile.yaml``, which is used to define and
+   package your :ref:`Bento 🍱 `.
+
+   This configuration file is for BentoML runtime configuration.
+
+Providing configuration during serve runtime
+--------------------------------------------
+
+BentoML configuration is a :wiki:`YAML` file which can then be specified via the environment variable ``BENTOML_CONFIG``.
+
+For example, given the following ``bentoml_configuration.yaml``, which specifies that the
+server should only use 4 workers:
 
 .. code-block:: yaml
    :caption: `~/bentoml_configuration.yaml`
 
-    api_server:
-      workers: 4
-      timeout: 60
-      http:
-        port: 6000
+    version: 1
+    api_server:
+      workers: 4
+
+This configuration can then be passed to :ref:`bentoml serve ` as shown
+below:
+
+.. code-block:: bash
+
+   » BENTOML_CONFIG=~/bentoml_configuration.yaml bentoml serve iris_classifier:latest --production
+
+.. note::
+
+   Users only have to specify a partial configuration with the properties they wish to customize. BentoML
+   will then fill in the rest of the configuration with the default values [#default_configuration]_.
 
-Throughout the BentoML documentation, features that are customizable through configuration are demonstrated
-like the example above. For a full configuration schema including all customizable properties, refer to
-the BentoML configuration template defined in :github:`default_configuration.yml `.
+   In the example above, the number of API workers is overridden to 4.
+   Remaining properties will take their default values.
 
+.. seealso::
+
+   :ref:`guides/configuration:Configuration fields`
 
 Overrding configuration with environment variables
@@ -63,25 +89,270 @@ Which the override configuration will be intepreted as:
 
     :alt: Configuration override environment variable
 
-Docker Deployment
------------------
+Mounting configuration to containerized Bento
+---------------------------------------------
+
+To mount a configuration file to a containerized BentoService, users can use the
+|volume_mount|_ option to mount the configuration file to the container and the
+|env_flag|_ option to set the ``BENTOML_CONFIG`` environment variable:
+
+.. code-block:: bash
+
+   $ docker run --rm -v /path/to/configuration.yml:/home/bentoml/configuration.yml \
+     -e BENTOML_CONFIG=/home/bentoml/configuration.yml \
+     iris_classifier:6otbsmxzq6lwbgxi serve --production
+
+Voila! You have successfully mounted a configuration file to your containerized BentoService.
+
+.. _env_flag: https://docs.docker.com/engine/reference/commandline/run/#set-environment-variables--e---env---env-file
+
+.. |env_flag| replace:: ``-e``
+
+.. _volume_mount: https://docs.docker.com/storage/volumes/#choose-the--v-or---mount-flag
+
+.. |volume_mount| replace:: ``-v``
+
+
+Configuration fields
+--------------------
+
+At the top level, the BentoML configuration [#default_configuration]_ has three fields:
+
+* ``version``: The version of the configuration file. This is used to determine the
+  compatibility of the configuration file with the current BentoML version.
+
+* ``api_server``: Configuration for the BentoML API server.
+
+* ``runners`` [#runners_configuration]_: Configuration for BentoService runners.
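+
+Putting these together, a minimal configuration touching all three top-level fields
+might look like the sketch below (the values are illustrative only, not
+recommendations):
+
+.. code-block:: yaml
+
+    version: 1
+    api_server:
+      workers: 4
+    runners:
+      timeout: 60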
+
+``version``
+^^^^^^^^^^^
+
+BentoML configuration provides a ``version`` field, which enables users to easily specify
+and upgrade their configuration file as BentoML evolves.
+
+This field follows the BentoML major version number. For patch releases that
+introduce new configuration fields, a compatibility layer will be provided to ensure
+there are no breaking changes.
+
+.. epigraph::
+
+   Note that ``version`` is not a required field, and BentoML will default to version 1 if
+   it is not specified.
+
+   However, we encourage users to always version their BentoML configuration.
+
+``api_server``
+^^^^^^^^^^^^^^
+
+The following options are available for the ``api_server`` section:
+
++-------------+-------------------------------------------------------------+-------------------------------------------------+
+| Option      | Description                                                 | Default                                         |
++=============+=============================================================+=================================================+
+| ``workers`` | Number of API workers to spawn                              | null [#default_workers]_                        |
++-------------+-------------------------------------------------------------+-------------------------------------------------+
+| ``timeout`` | Timeout for API server in seconds                           | 60                                              |
++-------------+-------------------------------------------------------------+-------------------------------------------------+
+| ``backlog`` | Maximum number of connections to hold in backlog            | 2048                                            |
++-------------+-------------------------------------------------------------+-------------------------------------------------+
+| ``metrics`` | Key and values to enable metrics feature                    | See :ref:`guides/configuration:\`\`metrics\`\`` |
++-------------+-------------------------------------------------------------+-------------------------------------------------+
+| ``logging`` | Key and values to enable logging feature                    | See :ref:`guides/logging:Logging Configuration` |
++-------------+-------------------------------------------------------------+-------------------------------------------------+
+| ``http``    | Key and values to configure HTTP API server                 | See :ref:`guides/configuration:\`\`http\`\``    |
++-------------+-------------------------------------------------------------+-------------------------------------------------+
+| ``grpc``    | Key and values to configure gRPC API server                 | See :ref:`guides/configuration:\`\`grpc\`\``    |
++-------------+-------------------------------------------------------------+-------------------------------------------------+
+| ``ssl``     | Key and values to configure SSL                             | See :ref:`guides/configuration:\`\`ssl\`\``     |
++-------------+-------------------------------------------------------------+-------------------------------------------------+
+| ``tracing`` | Key and values to configure tracing exporter for API server | See :doc:`/guides/tracing`                      |
++-------------+-------------------------------------------------------------+-------------------------------------------------+
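+
+As a quick sketch, the scalar options above can be overridden together in a single
+block. The values below are illustrative only:
+
+.. code-block:: yaml
+
+    api_server:
+      workers: 2
+      timeout: 120
+      backlog: 4096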
+
+``metrics``
+"""""""""""
+
+BentoML utilises `Prometheus `_ to collect metrics from the API server. By default, this feature is enabled.
+
+To disable this feature, set ``api_server.metrics.enabled`` to ``false``:
+
+.. code-block:: yaml
+
+    api_server:
+      metrics:
+        enabled: false
+
+Following the `labeling convention `_ set by Prometheus, metrics generated
+by BentoML API server components will have the ``namespace`` `bentoml_api_server`, which can
+also be overridden by setting ``api_server.metrics.namespace``:
+
+.. code-block:: yaml
+
+    api_server:
+      metrics:
+        namespace: custom_namespace
+
+.. epigraph::
+
+   :bdg-info:`Note:` for most use cases, users should not need to change the default ``namespace`` value.
+
+There are three types of metrics every BentoML API server will generate:
+
+- ``request_duration_seconds``: This is a `Histogram `_ that measures the HTTP request duration in seconds.
+
+  There are two ways for users to customize the `duration bucket size `_ for this metric:
+
+  - Provide manual bucket steps via ``api_server.metrics.duration.buckets``:
+
+    .. code-block:: yaml
+
+        api_server:
+          metrics:
+            duration:
+              buckets: [0.1, 0.2, 0.5, 1, 2, 5, 10]
+
+  - Automatically generate exponential buckets from any given ``min``, ``max`` and ``factor``:
+
+    .. code-block:: yaml
+
+        api_server:
+          metrics:
+            duration:
+              min: 0.1
+              max: 10
+              factor: 1.2
+
+    .. note::
+
+       - ``duration.min``, ``duration.max`` and ``duration.factor`` are mutually exclusive with ``duration.buckets``.
+
+       - ``duration.factor`` must be greater than 1.
+
+  By default, BentoML will respect the default `duration buckets `_ provided by Prometheus.
+
+- ``request_total``: This is a `Counter `_ that measures the total number of HTTP requests.
+
+- ``request_in_progress``: This is a `Gauge `_ that measures the number of HTTP requests in progress.
+
+The following options are available for the ``metrics`` section:
+
++----------------------+-------------------------------------+-------------------------------------------------------+
+| Option               | Description                         | Default                                               |
++======================+=====================================+=======================================================+
+| ``enabled``          | Enable metrics feature              | ``true``                                              |
++----------------------+-------------------------------------+-------------------------------------------------------+
+| ``namespace``        | Namespace for metrics               | ``bentoml_api_server``                                |
++----------------------+-------------------------------------+-------------------------------------------------------+
+| ``duration.buckets`` | Duration buckets for Histogram      | Prometheus bucket value [#prometheus_default_bucket]_ |
++----------------------+-------------------------------------+-------------------------------------------------------+
+| ``duration.factor``  | Factor for exponential buckets      | null                                                  |
++----------------------+-------------------------------------+-------------------------------------------------------+
+| ``duration.max``     | Upper bound for exponential buckets | null                                                  |
++----------------------+-------------------------------------+-------------------------------------------------------+
+| ``duration.min``     | Lower bound for exponential buckets | null                                                  |
++----------------------+-------------------------------------+-------------------------------------------------------+
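+
+To build intuition for how ``duration.min``, ``duration.max`` and ``duration.factor``
+relate, the Python sketch below derives exponential buckets in the conventional way.
+It is an illustration only, not BentoML's exact implementation:
+
+.. code-block:: python
+
+    def exponential_buckets(low: float, factor: float, high: float) -> list[float]:
+        """Grow bucket bounds geometrically by `factor` from `low`, capped at `high`."""
+        assert low > 0 and high > low and factor > 1
+        buckets: list[float] = []
+        bound = low
+        while bound < high:
+            buckets.append(bound)
+            bound *= factor
+        # Always include the configured upper bound as the last bucket.
+        buckets.append(high)
+        return buckets
+
+    # min=0.1, factor=1.2, max=10 yields 0.1, 0.12, 0.144, ... capped at 10.
+    print(exponential_buckets(0.1, 1.2, 10))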
+
+``http``
+""""""""
+
+Configuration under ``api_server.http`` will be used to configure the HTTP API server.
+
+By default, BentoML will start an HTTP API server on port 3000. To change the port, set ``api_server.http.port``:
+
+.. code-block:: yaml
+
+    api_server:
+      http:
+        port: 5000
+
+Users can also configure `CORS `_ via ``api_server.http.cors``. By default, CORS is disabled.
+
+If specified, all fields under ``api_server.http.cors`` will then be passed to `CORSMiddleware `_:
+
+.. code-block:: yaml
+
+    api_server:
+      http:
+        cors:
+          enabled: true
+          allow_origin: ["myorg.com"]
+          allow_methods: ["GET", "OPTIONS", "POST", "HEAD", "PUT"]
+          allow_credentials: true
+          allow_headers: "*"
+          allow_origin_regex: 'https://.*\.my_org\.com'
+          max_age: 1200
+          expose_headers: ["Content-Length"]
+
+``grpc``
+""""""""
+
+This section goes through configuration options that are not yet covered in :ref:`our guides on performance tuning `.
+
+Similar to the HTTP API server, BentoML will start a gRPC API server on port 3000 by default. To change the port, set ``api_server.grpc.port``:
+
+.. code-block:: yaml
+
+    api_server:
+      grpc:
+        port: 5000
+
+Note that when using :ref:`bentoml serve-grpc ` and the metrics feature is
+enabled, a Prometheus metrics server will be started as a sidecar on port 3001. To change the port, set ``api_server.grpc.metrics.port``:
+
+.. code-block:: yaml
+
+    api_server:
+      grpc:
+        metrics:
+          port: 50051
+
+By default, the gRPC API server will disable reflection. To always enable :github:`server reflection `,
+set ``api_server.grpc.reflection.enabled`` to ``true``:
+
+.. code-block:: yaml
+
+    api_server:
+      grpc:
+        reflection:
+          enabled: true
+
+.. note::
+
+   Users can already enable reflection by passing ``--enable-reflection`` to the :ref:`bentoml serve-grpc ` CLI command.
+
+   However, we also provide this option in the config file to make it easier for users who wish to always enable reflection.
+
+``ssl``
+"""""""
+
+BentoML supports SSL/TLS for both the HTTP and gRPC API servers. To enable SSL/TLS, set ``api_server.ssl.enabled`` to ``true``:
+
+.. code-block:: yaml
+
+    api_server:
+      ssl:
+        enabled: true
+
+When using the HTTP API server, BentoML will pass all of the available fields directly to `Uvicorn `_.
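+
+As a sketch, an SSL setup would typically provide a certificate and key alongside the
+toggle. The ``certfile`` and ``keyfile`` names below mirror Uvicorn's SSL options and
+are assumptions here; consult the default configuration [#default_configuration]_ for
+the authoritative field names:
+
+.. code-block:: yaml
+
+    api_server:
+      ssl:
+        enabled: true
+        certfile: /path/to/cert.pem   # assumed field name, mirroring Uvicorn's ssl_certfile
+        keyfile: /path/to/key.pem     # assumed field name, mirroring Uvicorn's ssl_keyfile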
+.. TODO::
+
+   - Add instructions on how to set up SSL for the gRPC API server.
+
+----
+
+.. rubric:: Notes
+
+.. [#default_workers] The default number of workers is the number of available CPUs.
+
+.. [#default_configuration] The default configuration can also be found under :github:`configuration folder `.
+
+   .. dropdown:: `Expand for default configuration`
+      :icon: code
+
+      .. literalinclude:: ../../../bentoml/_internal/configuration/v1/default_configuration.yaml
+         :language: yaml
+
+.. [#prometheus_default_bucket] The default buckets are specified `here `_ for the Python client.
+
+.. [#runners_configuration] See :ref:`Runners' configuration `
diff --git a/docs/source/guides/containerization.rst b/docs/source/guides/containerization.rst
index efee1effa3f..86ec0c580fb 100644
--- a/docs/source/guides/containerization.rst
+++ b/docs/source/guides/containerization.rst
@@ -2,6 +2,8 @@
 Advanced Containerization
 =========================
 
+*time expected: 12 minutes*
+
 This guide describes advanced containerization options provided by BentoML:
 
diff --git a/docs/source/guides/grpc.rst b/docs/source/guides/grpc.rst
index 82aea0b6ba7..0df097bb182 100644
--- a/docs/source/guides/grpc.rst
+++ b/docs/source/guides/grpc.rst
@@ -2,6 +2,8 @@
 Serving with gRPC
 =================
 
+*time expected: 12 minutes*
+
 This guide will demonstrate advanced features that BentoML offers for you to get started
 with `gRPC `_:
 
@@ -1477,6 +1479,7 @@ A quick overview of the available configuration for gRPC:
 ``max_concurrent_streams``
 ^^^^^^^^^^^^^^^^^^^^^^^^^^
 
+.. epigraph::
 
    :bdg-info:`Definition:` Maximum number of concurrent incoming streams to allow on a HTTP2 connection.
 
 By default we don't set a limit cap. HTTP/2 connections typically has limit of `maximum concurrent streams `_
 on a connection at one time.
@@ -1505,6 +1508,7 @@
 ``maximum_concurrent_rpcs``
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^
 
+.. epigraph::
 
    :bdg-info:`Definition:` The maximum number of concurrent RPCs this server will service before returning ``RESOURCE_EXHAUSTED`` status.
 
 By default we set to ``None`` to indicate no limit, and let gRPC to decide the limit.
@@ -1514,6 +1518,7 @@
 ``max_message_length``
 ^^^^^^^^^^^^^^^^^^^^^^
 
+.. epigraph::
 
    :bdg-info:`Definition:` The maximum message length in bytes allowed to be received on/can be send to the server.
 
 By default we set to ``-1`` to indicate no limit.
diff --git a/docs/source/guides/logging.rst b/docs/source/guides/logging.rst
index 22c91211e3b..c89a99e15ed 100644
--- a/docs/source/guides/logging.rst
+++ b/docs/source/guides/logging.rst
@@ -2,6 +2,8 @@
 Logging
 =======
 
+*time expected: 6 minutes*
+
 Server Logging
 --------------
 
@@ -11,7 +13,7 @@ webservices are logged along with requests to each of the model runner
 services. The request log format is as follows:
 
 .. parsed-literal::
-    
+
     time [LEVEL] [component] ClientIP:ClientPort (scheme,method,path,type,length) (status,type,length) Latency (trace,span,sampled)
 
 For example, a log message might look like:
@@ -25,41 +27,38 @@ OpenTelemetry Compatible
 ^^^^^^^^^^^^^^^^^^^^^^^^
 
 The BentoML logging system implements the `OpenTelemetry `_ standard
-for `http `_
+for :github:`HTTP `
 throughout the call stack to provide for maximum debuggability. Propogation of the OpenTelemetry
-parameters follows the standard provided
-`here `_
+parameters follows the standard provided `here `_
 
 The following are parameters which are provided in the logs as well for correlation back to
 particular requests.
 
-- `trace` is the id of a trace which tracks “the progression of a single request, as it is handled
-  by services that make up an application”
-  - `OpenTelemetry Basic Documentation `_
-- `span is` the id of a span which is contained within a trace. “A span is the building block of a
-  trace and is a named, timed operation that represents a piece of the workflow in the distributed
-  system. Multiple spans are pieced together to create a trace.”
-  - `OpenTelemetry Span Documentation `_
-- `sampled is` the number of times this trace has been sampled. “Sampling is a mechanism to control
-  the noise and overhead introduced by OpenTelemetry by reducing the number of samples of traces
-  collected and sent to the backend.”
-  - `OpenTelemetry SDK Documentation `_
+- `trace_id` is the id of a trace which tracks “the progression of a single request, as it is handled by services that make up an application.” [#basic_documentation]_
+
+- `span_id` is the id of a span which is contained within a trace.
+
+  .. epigraph::
+
+     “A span is the building block of a trace and is a named, timed operation that represents a piece of the workflow in the distributed system. Multiple spans are pieced together to create a trace.” [#span_documentation]_
+
+- `sampled_id` is the number of times this trace has been sampled.
+
+  .. epigraph::
+
+     “Sampling is a mechanism to control the noise and overhead introduced by OpenTelemetry by reducing the number of samples of traces collected and sent to the backend.” [#sampling_documentation]_
 
 Logging Configuration
 ^^^^^^^^^^^^^^^^^^^^^
 
 Access logs can be configured by setting the appropriate flags in the bento configuration file
 for both web requests and model serving requests. Read more about how to use a bento configuration file
-here in the - :ref:`Configuration Guide `
+in the :ref:`Configuration Guide `.
 
-To configure other logs, use the
-`default python logging configuration `_. All BentoML
-logs are logged under the ``"bentoml"`` namespace.
+To configure other logs, please use the `default Python logging configuration `_. All BentoML logs are logged under the ``bentoml`` namespace.
 
-Web Service Request Logging
+API Service Request Logging
 """""""""""""""""""""""""""
 
-For web requests, logging can be enabled and disabled using the `api_server.logging.access` parameter at the
+For requests made to the API server, logging can be enabled and disabled using the ``api_server.logging.access`` parameter at the
 top level of the ``bentoml_configuration.yml``.
 
 .. code-block:: yaml
 
     api_server:
       logging:
         access:
-          enabled: False
+          enabled: true
           # whether to log the size of the request body
-          request_content_length: True
+          request_content_length: true
           # whether to log the content type of the request
-          request_content_type: True
+          request_content_type: true
           # whether to log the content length of the response
-          response_content_length: True
+          response_content_length: true
           # whether to log the content type of the response
-          response_content_type: True
+          response_content_type: true
 
-Model Runner Request Logging
+Model Server Request Logging
 """"""""""""""""""""""""""""
 
-Depending on how you've configured BentoML, the webserver may be separated from the model runner.
-In either case, we have special logging that is enabled specifically on the model side of the
-request. You may configure the runner access logs under the runners parameter at the top level of
-your ``bentoml_configuration.yml``:
+Depending on how you've configured BentoML, the API server and runner server can be run
+separately. In either case, BentoML also provides a logging configuration under
+``runners`` to allow users to configure the runner server output logs.
+
+You may configure the runner access logs under the runners parameter at the top level of your ``bentoml_configuration.yml``:
 
 .. code-block:: yaml
 
     runners:
       logging:
         access:
-          enabled: True
+          enabled: true
           ...
 
-The available configuration options are identical to the webserver request logging options above.
-These logs are disabled by default in order to prevent double logging of requests.
-
+The available configuration options are the same as the API server request logging options above.
 
 Access Logging Format
 """""""""""""""""""""
 
 You may configure the format of the Trace and Span IDs in the access logs in ``bentoml_configuration.yml``.
+
 The default configuration is shown below, where the opentelemetry ``trace_id`` and ``span_id`` are logged in
-hexadecimal format, consistent with opentelemetry logging instrumentation. You may also configure other format
+hexadecimal format, consistent with OpenTelemetry logging instrumentation. You may also configure other format
 specs, such as decimal ``d``.
 
 .. code-block:: yaml
 
@@ -133,3 +132,13 @@ When using BentoML as a library, BentoML does not configure any logs. By default
     bentoml_logger.addHandler(ch)
     bentoml_logger.setLevel(logging.DEBUG)
 
+----
+
+.. rubric:: Notes
+
+.. [#basic_documentation] `OpenTelemetry Basic Documentation `_
+
+.. [#span_documentation] `OpenTelemetry Span Documentation `_
+
+.. [#sampling_documentation] `OpenTelemetry SDK Documentation `_
+
diff --git a/docs/source/guides/snippets/tracing/bentoml_configuration.yaml b/docs/source/guides/snippets/tracing/bentoml_configuration.yaml
new file mode 100644
index 00000000000..f0b331e41fb
--- /dev/null
+++ b/docs/source/guides/snippets/tracing/bentoml_configuration.yaml
@@ -0,0 +1,21 @@
+version: 1
+api_server:
+  workers: 6
+  grpc:
+    reflection:
+      enabled: true
+    max_concurrent_streams: 100
+    maximum_concurrent_rpcs: 1
+  tracing:
+    sample_rate: 0.7
+    exporter_type: jaeger
+    jaeger:
+      thrift:
+        agent_host_name: jaeger
+        agent_port: 6831
+runners:
+  logging:
+    access:
+      enabled: false
+  iris_clf:
+    resources: system
diff --git a/docs/source/guides/snippets/tracing/docker-compose.yml b/docs/source/guides/snippets/tracing/docker-compose.yml
new file mode 100644
index 00000000000..02e80792f76
--- /dev/null
+++ b/docs/source/guides/snippets/tracing/docker-compose.yml
@@ -0,0 +1,34 @@
+version: "3.7"
+services:
+  jaeger:
+    image: jaegertracing/all-in-one:1.38
+    ports:
+      - "6831:6831/udp"
+      - "16686:16686"
+      - "14268:14268"
+      - "5778:5778"
+      - "4317:4317"
+      - "4318:4318"
+    networks:
+      - jaeger-network
+    environment:
+      - COLLECTOR_OTLP_ENABLED=true
+  iris_classifier:
+    image: iris_classifier:klncyjcfqwldtgxi
+    volumes:
+      - ./bentoml_configuration.yaml:/home/bentoml/bentoml_configuration.yaml
+    ports:
+      - "3000:3000"
+      - "3001:3001"
+    command: ["serve-grpc", "--production"]
+    environment:
+      - OTEL_EXPORTER_JAEGER_AGENT_HOST=jaeger
+      - OTEL_EXPORTER_JAEGER_AGENT_PORT=6831
+      - BENTOML_CONFIG=/home/bentoml/bentoml_configuration.yaml
+    networks:
+      - jaeger-network
+    depends_on:
+      - jaeger
+
+networks:
+  jaeger-network:
diff --git a/docs/source/guides/tracing.rst b/docs/source/guides/tracing.rst
index 39c29aaa0bc..18b2dce1e4a 100644
--- a/docs/source/guides/tracing.rst
+++ b/docs/source/guides/tracing.rst
@@ -2,9 +2,32 @@
 Tracing
 =======
 
-BentoML API server supports tracing with `Zipkin `_,
+*time expected: 8 minutes*
+
+This guide dives into the :wiki:`tracing ` capabilities that BentoML offers.
+
+BentoML allows users to export traces to `Zipkin `_,
 `Jaeger `_ and `OTLP `_.
 
+This guide will also provide a simple example of how to use BentoML tracing with `Jaeger `_.
+
+Why do you need this?
+---------------------
+
+Debugging models and services in production is hard. Adding logs and identifying
+the root cause of the problem is time consuming and error prone. Additionally, tracking
Additionally, tracking +logs across multiple services is difficult, which takes a lot of time, and slow down +your development agility. As a result, logs won’t always provide the required information to solve regressions. + +Tracing encompasses a much wider, continuous view of an application. The goal of tracing is to following a program’s flow and data progression. +As such, there is a lot more information at play; tracing can be a lot noisier than logging – and that’s intentional. + +BentoML comes with built-in tracing support, with :ref:`OpenTelemetry `. This means users +can then use any of the OpenTelemetry compatible tracing tools to visualize and analyze the traces. + +Running a BentoService +---------------------- + :bdg-info:`Requirements:` bentoml must be installed with the extras dependencies for tracing exporters. The following command will install BentoML with its coresponding tracing exporter: @@ -29,47 +52,121 @@ tracing exporter: pip install "bentoml[tracing-otlp]" -To config tracing server, user can provide a config YAML file specifying the tracer type and tracing server information: +We will be using the example from :ref:`the quickstart `. -.. code-block:: yaml +Run the Jaeger `all-in-one `_ docker image: - tracing: - type: jaeger - sample_rate: 1.0 - zipkin: - url: http://localhost:9411/api/v2/spans - jaeger: - address: localhost - port: 6831 - otlp: - protocol: grpc - url: http://localhost:4317 +.. code-block:: bash -By default, no traces will be collected. Set sample_rate to your desired fraction in order to start collecting them. -Here is an example config for tracing with a Zipkin server: + » docker run -d --name jaeger \ + -e COLLECTOR_ZIPKIN_HOST_PORT=:9411 \ + -e COLLECTOR_OTLP_ENABLED=true \ + -p 6831:6831/udp \ + -p 6832:6832/udp \ + -p 5778:5778 \ + -p 16686:16686 \ + -p 4317:4317 \ + -p 4318:4318 \ + -p 14250:14250 \ + -p 14268:14268 \ + -p 14269:14269 \ + -p 9411:9411 \ + jaegertracing/all-in-one:1.38 -.. code-block:: yaml +.. dropdown:: For our Mac users + :icon: cpu - tracing: - type: zipkin - sample_rate: 1.0 - zipkin: - url: http://localhost:9411/api/v2/spans + If you are running into this error: -When using Zipkin tracer, BentoML only supports its v2 protocol. If you are reporting to -the an OpenZipkin server directly, make sure to add the URL path :code:`/api/v2/spans` -to the server address. + .. parsed-literal:: + + 2022-10-05T01:32:21-0700 [WARNING] [api_server:iris_classifier:8] Data exceeds the max UDP packet size; size 216659, max 65000 + 2022-10-05T01:32:24-0700 [ERROR] [api_server:iris_classifier:3] Exception while exporting Span batch. + Traceback (most recent call last): + File "~/venv/lib/python3.10/site-packages/opentelemetry/sdk/trace/export/__init__.py", line 367, in _export_batch + self.span_exporter.export(self.spans_list[:idx]) # exporter_type: ignore + File "~/venv/lib/python3.10/site-packages/opentelemetry/exporter/jaeger/thrift/__init__.py", line 219, in export + self._agent_client.emit(batch) + File "~/venv/lib/python3.10/site-packages/opentelemetry/exporter/jaeger/thrift/send.py", line 95, in emit + udp_socket.sendto(buff, self.address) + OSError: [Errno 40] Message too long -Here is another example config file for tracing with Jaeger and opentracing: + This is because the default UDP packet size on Mac is set 9216 bytes, which is described `under Jaeger reporters `_. To increase the UDP packet size, run the following command: + + .. 
+   .. code-block:: bash
+
+      % sysctl net.inet.udp.maxdgram
+      # net.inet.udp.maxdgram: 9216
+      % sudo sysctl net.inet.udp.maxdgram=65536
+      # net.inet.udp.maxdgram: 9216 -> 65536
+      % sudo sysctl net.inet.udp.maxdgram
+      # net.inet.udp.maxdgram: 65536
+
+
+To configure the Jaeger exporter, users can provide a config :wiki:`YAML` file specifying the tracer type and tracing server information under ``api_server.tracing``:
+
+.. literalinclude:: ./snippets/tracing/bentoml_configuration.yaml
+   :language: yaml
+   :caption: `bentoml_configuration.yaml`
+
+Provide this configuration via the environment variable ``BENTOML_CONFIG`` to ``bentoml serve``:
+
+.. code-block:: bash
+
+   » BENTOML_CONFIG=bentoml_configuration.yaml bentoml serve iris_classifier:latest --production
+
+Send any request to the BentoService, and then you can visit the `Jaeger UI `_ to see the traces.
+
+.. image:: /_static/img/jaeger-ui.png
+   :alt: Jaeger UI
+
+Tracing your containerized BentoService
+---------------------------------------
+
+If you are running your BentoService within a container, you can use the following ``docker-compose`` configuration to run Jaeger and your BentoService together:
+
+.. literalinclude:: ./snippets/tracing/docker-compose.yml
+   :language: yaml
+   :caption: `docker-compose.yml`
+
+Start the services with ``docker-compose -f ./docker-compose.yml up``.
+
+To shut down the services, run ``docker-compose -f ./docker-compose.yml down``.
+
+Exporter Configuration
+----------------------
+
+.. note::
+
+   BentoML implements OpenTelemetry APIs, which means OpenTelemetry environment variables
+   will take precedence over the configuration file.
+
+   For example, if you have the following configuration in your config file:
+
+   .. code-block:: yaml
+
+      api_server:
+        tracing:
+          exporter_type: jaeger
+          sample_rate: 1.0
+          jaeger:
+            protocol: thrift
+            thrift:
+              agent_host_name: localhost
+
+   Then the environment variable ``OTEL_EXPORTER_JAEGER_AGENT_HOST`` will take precedence over the
+   ``agent_host_name`` setting in the config file.
+
+The following section describes the configuration options for each tracing exporter.
+
+By default, no traces will be collected. Set ``sample_rate`` to your desired fraction in order to start collecting them:
 
 .. code-block:: yaml
 
-    tracing:
-      type: zipkin
-      sample_rate: 1.0
-      zipkin:
-        url: http://localhost:9411/api/v2/spans
+    api_server:
+      tracing:
+        exporter_type: zipkin
+        sample_rate: 1.0
 
 If you would like to exclude some routes from tracing, you can specify them using
 the :code:`excluded_urls` parameter. This parameter can be either a comma-separated
@@ -78,52 +175,221 @@ string of routes, or a list of strings.
 
 .. code-block:: yaml
 
     tracing:
-      type: jaeger
+      exporter_type: jaeger
       sample_rate: 1.0
       jaeger:
         address: localhost
         port: 6831
       excluded_urls: readyz,livez,healthz,static_content,docs,metrics
 
+To set a timeout for the exporter to wait on each batch export, use the ``timeout`` parameter [#default_timeout]_:
+
+.. code-block:: yaml
 
-Finally, here is an example using OTLP. This allows easy integration with an OpenTelemetry Traces receiver.
-You may use either HTTP or gRPC as protocol. gRPC is the default, but HTTP may be easier to proxy or load-balance.
+
+    tracing:
+      exporter_type: jaeger
+      sample_rate: 1.0
+      timeout: 5
+
+To set the maximum length of string attribute values, use the ``max_tag_value_length`` parameter:
 
 .. code-block:: yaml
 
     tracing:
-      type: otlp
+      exporter_type: jaeger
       sample_rate: 1.0
-      otlp:
-        protocol: grpc
-        url: http://localhost:4317
+      max_tag_value_length: 256
+
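+As noted above, OpenTelemetry environment variables take precedence over these
+file-based settings. For example, assuming a hypothetical agent host ``jaeger.internal``,
+the variable below would override ``agent_host_name`` from the configuration file:
+
+.. code-block:: bash
+
+   » OTEL_EXPORTER_JAEGER_AGENT_HOST=jaeger.internal \
+     BENTOML_CONFIG=bentoml_configuration.yaml \
+     bentoml serve iris_classifier:latest --production
+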
+.. note::
+
+   All of the above values are shared among the exporters. This means they will be
+   applied to the corresponding exporter that is set via ``exporter_type``.
+
+Zipkin
+^^^^^^
+
+When using Zipkin, BentoML only supports its V2 protocol. If you are reporting to
+an OpenZipkin server directly, make sure to add the URL path ``/api/v2/spans``
+to the server address.
+
+Configuration fields are passed through the OpenTelemetry Zipkin exporter
+[#otlp_zipkin_exporter_docs]_.
+
+.. code-block:: yaml
+
+    tracing:
+      exporter_type: zipkin
+      sample_rate: 1.0
+      zipkin:
+        endpoint: http://localhost:9411/api/v2/spans
+        local_node_ipv4: "192.168.0.1"
+        local_node_ipv6: "2001:db8::c001"
+        local_node_port: 31313
+
+Jaeger
+^^^^^^
+
+The Jaeger exporter supports sending traces over both the Thrift and gRPC protocols. By default, BentoML
+will use the Thrift protocol.
+
+.. note::
+
+   When it is not feasible to deploy Jaeger Agent next to the application, for example, when the
+   application code is running as Lambda function, a collector can be configured to send spans
+   using Thrift over HTTP. If both agent and collector are configured, the exporter sends traces
+   only to the collector to eliminate the duplicate entries. [#otlp_jaeger_exporter_docs]_
+
+To set up the collector endpoint that will be used to receive either Thrift or Protobuf
+over HTTP/gRPC, use the ``collector_endpoint`` parameter:
+
+.. tab-set::
+
+   .. tab-item:: Thrift over HTTP
+      :sync: http
+
+      .. code-block:: yaml
+
+         tracing:
+           exporter_type: jaeger
+           sample_rate: 1.0
+           jaeger:
+             collector_endpoint: http://localhost:14268/api/traces?format=jaeger.thrift
+
+   .. tab-item:: Protobuf over gRPC
+      :sync: grpc
+
+      .. code-block:: yaml
+
+         tracing:
+           exporter_type: jaeger
+           sample_rate: 1.0
+           jaeger:
+             collector_endpoint: http://localhost:14250
+
+Configuration fields are passed through the OpenTelemetry Jaeger exporter
+[#jaeger_source]_.
+
+.. tab-set::
+
+   .. tab-item:: Thrift
+      :sync: http
+
+      .. code-block:: yaml
+
+         tracing:
+           exporter_type: jaeger
+           sample_rate: 1.0
+           jaeger:
+             protocol: thrift
+             thrift:
+               agent_host_name: localhost
+               agent_port: 6831
+               udp_split_oversized_batches: true
+
+      .. note::
+
+         If ``udp_split_oversized_batches`` [#default_udp_split_oversized_batches]_ is
+         true, an oversized batch will be split into smaller batches based on the UDP max
+         packet size (default: `65000`) when the given buffer is larger than the max
+         packet size:
+
+         .. math::
+
+            \mathrm{packets} \triangleq \left\lceil \frac{\mathrm{len}(\mathrm{buff})}{\mathrm{max\_packet\_size}} \right\rceil
+
+
+   .. tab-item:: gRPC
+      :sync: grpc
+
+      .. code-block:: yaml
+
+         tracing:
+           exporter_type: jaeger
+           sample_rate: 1.0
+           jaeger:
+             protocol: grpc
+             grpc:
+               endpoint: http://localhost:14250
+               insecure: true # set to true when the collector has no encryption or authentication
+
+OTLP Exporter
+^^^^^^^^^^^^^
+
+BentoML supports the OTLP exporter for easy integration with an OpenTelemetry Traces receiver.
+OTLP provides both a gRPC and an HTTP protocol that use Protobuf to send traces.
+You may use either HTTP or gRPC as the protocol; gRPC is the default.
+
+.. note::
+
+   You may prefer the HTTP protocol, as it is often easier to configure proxies and
+   load balancers for it.
+
+To change the protocol, use the ``protocol`` parameter:
 
-When starting a BentoML API model server, provide the path to this config file
-by setting the environment variable :code:`BENTOML_CONFIG`:
+.. code-block:: yaml
 
-.. code-block:: bash
+   api_server:
+     tracing:
+       exporter_type: otlp
+       sample_rate: 1.0
+       otlp:
+         protocol: http
 
-    BENTOML_CONFIG=my_config_file.yml bentoml serve $BENTO_BUNDLE_PATH
 
-Similarly when serving with BentoML API server docker image, assuming you have a
-:code:`my_config_file.yml` file ready in current directory:
+Configuration fields are passed through the OpenTelemetry OTLP exporter
+[#otlp_source]_.
 
-.. code-block:: bash
+.. tab-set::
+
+   .. tab-item:: HTTP
+      :sync: http
+
+      .. note::
+
+         Make sure to set ``endpoint`` to have the traces export path ``/v1/traces`` appended.
+
+      .. code-block:: yaml
+
+         tracing:
+           exporter_type: otlp
+           sample_rate: 1.0
+           otlp:
+             protocol: http
+             endpoint: http://localhost:4318/v1/traces
+             http:
+               certificate_file: /path/to/cert.pem
+               headers:
+                 Keep-Alive: timeout=5, max=1000
+
+   .. tab-item:: gRPC
+      :sync: grpc
+
+      .. code-block:: yaml
+
+         tracing:
+           exporter_type: otlp
+           sample_rate: 1.0
+           otlp:
+             protocol: grpc
+             endpoint: http://localhost:4317
+             grpc:
+               insecure: true
+               headers:
+                 - ["grpc-encoding", "gzip"]
+
+----
+
+.. rubric:: Notes
+
+.. [#otlp_zipkin_exporter_docs] `OpenTelemetry Zipkin Exporter API docs `_
+
+.. [#otlp_jaeger_exporter_docs] `OpenTelemetry Jaeger Exporter API docs `_
+
+.. [#jaeger_source] Jaeger exporter source code for :github:`Thrift ` and
+   :github:`gRPC `.
 
-    docker run -v $(PWD):/tmp -p 3000:3000 -e BENTOML_CONFIG=/tmp/my_config_file.yml my-bento-api-server
 
+.. [#default_timeout] The default timeout is 10 seconds. For most use cases, you don't need to change this value.
 
-.. spelling::
+.. [#default_udp_split_oversized_batches] Whether or not to re-emit oversized batches in smaller chunks. By default this is not set.
 
-    opentracing
+.. [#otlp_source] OTLP exporter source code for :github:`HTTP `
+   and :github:`gRPC `.