
Docs/core and guides #2417

Merged: 12 commits into main from docs/core-and-guides, Apr 26, 2022

Conversation

@timliubentoml (Collaborator) commented on Apr 19, 2022

Description

Small updates to the Core Concepts pages.

Guides added:

Adaptive Batching
  • Architecture description and an introduction to batching
  • Configuration details still need to be provided once the new design is implemented

Logging
  • Architecture description along with an overview of OpenTelemetry
  • Configuration description; currently only enable/disable and a few parameters

Multi-Model Serving
  • Code samples for simple and more advanced cases (see the sketch below)

Custom Endpoints
  • Descriptions along with how to mount ASGI/WSGI service endpoints (also covered in the sketch below)
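For the Multi-Model Serving and Custom Endpoints guides above, here is a minimal sketch of the two patterns, assuming the BentoML 1.0 Service/Runner API; the model tags, service name, and the mounted FastAPI app are illustrative placeholders rather than examples taken verbatim from the guides.

```python
import asyncio

import numpy as np
from fastapi import FastAPI

import bentoml
from bentoml.io import JSON, NumpyNdarray

# Illustrative model tags: substitute models saved in your local model store.
runner_a = bentoml.sklearn.get("model_a:latest").to_runner()
runner_b = bentoml.sklearn.get("model_b:latest").to_runner()

# Multi-model serving: a single Service composes several runners.
svc = bentoml.Service("multi_model_demo", runners=[runner_a, runner_b])

@svc.api(input=NumpyNdarray(), output=JSON())
async def predict(features: np.ndarray) -> dict:
    # Each runner wraps one saved model; run both inferences concurrently.
    result_a, result_b = await asyncio.gather(
        runner_a.predict.async_run(features),
        runner_b.predict.async_run(features),
    )
    return {"model_a": result_a.tolist(), "model_b": result_b.tolist()}

# Custom endpoints: mount an existing ASGI app (here FastAPI) alongside the
# BentoML-generated endpoints.
extras = FastAPI()

@extras.get("/ping")
def ping() -> dict:
    return {"status": "ok"}

svc.mount_asgi_app(extras, path="/extras")
```

A WSGI app (for example Flask) should be mountable the same way via mount_wsgi_app.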

Motivation and Context

How Has This Been Tested?

Checklist:

  • My code follows the bentoml code style; both the make format and
    make lint scripts have passed
    (instructions).
  • My change reduces project test coverage and requires unit tests to be added
  • I have added unit tests covering my code change
  • My change requires a change to the documentation
  • I have updated the documentation accordingly

@timliubentoml marked this pull request as ready for review on April 20, 2022 18:00
README.md (outdated review thread, resolved)
Comment on lines +17 to +19
- `trace_id` is the id of a trace, which tracks “the progression of a single request, as it is handled by services that make up an application” - `OpenTelemetry Basic Documentation <https://www.dynatrace.com/support/help/extend-dynatrace/opentelemetry/basics>`_
- `span_id` is the id of a span, which is contained within a trace. “A span is the building block of a trace and is a named, timed operation that represents a piece of the workflow in the distributed system. Multiple spans are pieced together to create a trace.” - `OpenTelemetry Span Documentation <https://opentelemetry.lightstep.com/spans/>`_
- `sampled` is the number of times this trace has been sampled. “Sampling is a mechanism to control the noise and overhead introduced by OpenTelemetry by reducing the number of samples of traces collected and sent to the backend.” - `OpenTelemetry SDK Documentation <https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/trace/sdk.md>`_
Contributor:

Should we make a table for this section instead?

Collaborator Author:

I couldn't find a convention for this style in our current doc. Main style I found was enumerating the options in titles like here: https://docs.bentoml.org/en/latest/concepts/building_bentos.html

I feel like bullet points are better. Maybe table if you have more than one thing you want to specify about the parameter?

Add further reading section
Further Reading
---------------
- :ref:`API Reference for IO descriptors <api-io-descriptors>`
Collaborator:

Is this linking back to itself?

Collaborator Author:

No, it's a little confusing, but this is actually linking to the "API Reference" documentation (https://docs.bentoml.org/en/latest/api/api_io_descriptors.html) for the io descriptors. This current page is called: "api-io-descriptors-page"
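As a quick illustration of the IO descriptors that the linked API reference covers, a small self-contained sketch, assuming the BentoML 1.0 API; the service and endpoint names here are made up:

```python
import bentoml
from bentoml.io import JSON, Text

svc = bentoml.Service("io_descriptor_demo")

# IO descriptors declare how HTTP payloads map to Python values:
# Text() parses the request body as a str, and JSON() serializes the
# returned dict into a JSON response.
@svc.api(input=Text(), output=JSON())
def count_characters(text: str) -> dict:
    return {"length": len(text)}
```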

Managing Models and Bentos Remotely with Yatai
----------------------------------------------

Yatai is BentoML's end-to-end deployment and monitoring tool. It also functions as a remote model and bento repository.
Collaborator:

Maybe a platform instead of tool?

Collaborator Author:

👍


Yatai is BentoML's end-to-end deployment and monitoring tool. It also functions as a remote model and bento repository.

To connect the CLI to a remote `Yatai <yatai-service-page>`, use the `bentoml login YATAI_URL` command.
Collaborator:

Let's have a code tab for bentoml login YATAI_URL

Collaborator Author:

👍

docs/source/guides/adaptive_batching.rst (outdated review thread, resolved)
@@ -3,4 +3,44 @@
Adaptive Batching
Collaborator:

Should this be moved under Core Concepts?

Collaborator Author:

I'm fine with it here for now, but I don't have a strong opinion about this.

"While serving a TensorFlow model, batching individual model inference requests together can be important for performance. In particular, batching is necessary to unlock the high throughput promised by hardware accelerators such as GPUs."
-- `TensorFlow documentation <https://github.com/tensorflow/serving/blob/master/tensorflow_serving/batching/README.md>`_

As an optimization for a real-time service, batching builds on two main concepts.
Collaborator:

Let's also explain why we do server-side batching. Server-side batching is advantageous because 1) it simplifies the client logic, and 2) the server can often batch more efficiently than the client due to traffic volume.

Collaborator Author:

👍
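To make this concrete, a minimal sketch of how a model opts in to adaptive batching when it is saved, assuming the BentoML 1.0 signatures API; the model name and the toy sklearn model are placeholders:

```python
from sklearn.ensemble import RandomForestClassifier

import bentoml

# Illustrative toy model; frameworks with batchable signatures work the same way.
model = RandomForestClassifier().fit([[0, 0], [1, 1]], [0, 1])

# batchable=True tells the runner it may merge concurrent requests into a
# single predict() call on the server side; batch_dim=0 means inputs are
# stacked along the first axis and results are split back along it.
bentoml.sklearn.save_model(
    "batching_demo",
    model,
    signatures={"predict": {"batchable": True, "batch_dim": 0}},
)
```

The batching itself then happens in the runner at serving time rather than in the client, which is the server-side behaviour described above.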

docs/source/guides/adaptive_batching.rst (outdated review thread, resolved)
Running with Adaptive Batching
------------------------------

There are 2 ways that adaptive batching will run depending on how you've deployed BentoML.
Collaborator:

Let's create a placeholder page for explaining the standalone mode vs distributed mode.

Collaborator Author:

do we want to take this explanation out then?

Comment on lines 12 to 14
.. parsed-literal::

"[%(component)s] %(message)s"
Collaborator:

I think this format is less relevant for users. For access logs, we can break down a log entry.

[api_server] 127.0.0.1:55723 (scheme=http,method=POST,path=/classify,type=application/json,length=9) (status=200,type=application/json,length=1) 0.465ms (trace=175923203261911804790364073480797463970,span=2163644694959317809,sampled=0)

In the access log example, the format has the following components.

  • Component
  • Client IP
  • Request
  • Response
  • Latency
  • Traces

Collaborator Author:

👍

Logging Configuration
---------------------

Logs can be configured from the bentofile.yaml file for both web requests and model serving requests.
Collaborator:

Logging is not configured through bentofile.yaml; rather, it is configured through the service configuration documented here: https://docs.bentoml.org/en/latest/guides/configuration.html

Collaborator Author:

👍

@parano parano merged commit debc858 into main Apr 26, 2022
@parano parano deleted the docs/core-and-guides branch April 26, 2022 02:54
aarnphm pushed a commit to aarnphm/BentoML that referenced this pull request Jul 29, 2022
* core concepts edits

* added content to a few guides

* initial logging documentation

* updates to format and added contributer images in readme

* more updates to formatting

* Update README.md

* Update README.md

* Update README.md

* Update bento_management.rst

* Update docs/source/guides/adaptive_batching.rst

Co-authored-by: Sean Sheng <s3sheng@gmail.com>

* Update docs/source/guides/adaptive_batching.rst

Co-authored-by: Sean Sheng <s3sheng@gmail.com>

* updates based on sean's feedback

Co-authored-by: Tim Liu <timliu@Tims-MBP.attlocal.net>
Co-authored-by: Sean Sheng <s3sheng@gmail.com>
Labels: none yet
Projects: none yet
4 participants