
feat(tracing): add support for otlp exporter #2918

Merged — 4 commits into bentoml:main on Aug 21, 2022

Conversation

@dbuades (Contributor) commented on Aug 17, 2022

What does this PR address?

Mainly, it fixes #2917 by adding support for the otlp trace exporter (see the sketch after the list below).
It also addresses the following concerns:

  • Add opentelemetry-exporter-jaeger-thrift as an extra dependency in order to solve this issue
  • Update OpenTelemetry to 1.12 (no incompatibilities found on my side)
  • Log a warning message when sample_rate is not set for tracing.
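For context, wiring up the OTLP span exporter in the OpenTelemetry Python SDK looks roughly like the sketch below. This is a standalone illustration of the exporter this PR enables, not the BentoML integration code; the endpoint is a placeholder.

    from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import OTLPSpanExporter
    from opentelemetry.sdk.trace import TracerProvider
    from opentelemetry.sdk.trace.export import BatchSpanProcessor

    # Export spans over OTLP/gRPC to a collector (placeholder endpoint).
    provider = TracerProvider()
    exporter = OTLPSpanExporter(endpoint="localhost:4317", insecure=True)
    provider.add_span_processor(BatchSpanProcessor(exporter))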

Before submitting:

Who can help review?

It was mentioned in Slack that @bojiang worked on the tracing feature.

@dbuades dbuades requested a review from a team as a code owner August 17, 2022 17:43
@dbuades dbuades requested review from parano and removed request for a team August 17, 2022 17:43
@@ -374,6 +388,9 @@ def tracer_provider(

    if sample_rate is None:
        sample_rate = 0.0
        logger.warning(
Member:
👍

@parano (Member):
Looks like this will show up for all users that don't use tracing?

@dbuades (author):
@parano You are right, I'm going to change it.

@ssheng (Collaborator):
Maybe we should warn if the user explicitly sets the sample_rate to 0.0. If the sample_rate is None, we can leave the behavior as-is.

@dbuades (author):
@ssheng I still think we should warn in both situations, since it took us a while to debug why traces weren't being sent. I'm open to discussing it, though.

I just pushed a fix for this; let me know what you think.
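To make the behavior under discussion concrete, here is a minimal sketch: warn whenever sampling is effectively disabled, whether sample_rate was left unset or explicitly set to 0.0. The message text and the function body around it are assumptions for illustration, not the merged code.

    import logging
    from typing import Optional

    logger = logging.getLogger(__name__)

    def tracer_provider(sample_rate: Optional[float] = None):
        # Treat an unset sample rate as 0.0 (no sampling), as in the diff above.
        if sample_rate is None:
            sample_rate = 0.0
        if sample_rate == 0.0:
            # Hypothetical message: surface that no traces will be exported.
            logger.warning(
                "Tracing is configured but sample_rate is 0.0; "
                "no traces will be collected or exported."
            )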

opentelemetry-sdk==1.9.0
opentelemetry-semantic-conventions==0.28b0
opentelemetry-util-http==0.28b0
opentelemetry-api>=1.9.0
Member:
@bojiang what's the reason we locked the opentelemetry version before? Was it to prevent accidentally installing pre-release versions? If that's the case, it should be OK to change this to >= now.

@aarnphm (Member) commented on Aug 17, 2022:
There were some issues with the opentelemetry pre-releases before; I can't remember what they were, and I don't know if they have been fixed upstream.

Is there any reason why we need to bump this up, @dbuades?

@bojiang (Member) commented on Aug 18, 2022:
We have to lock the versions of the otlp libs/contribs because the opentelemetry community does not organize its dependency map well. We should lock them to verified working versions.

@dbuades (author):
@aarnphm Unfortunately I needed to bump it, since the newer version is required by opentelemetry-exporter-otlp.
@bojiang Would you be comfortable with locking them to their current versions until the e2e tests are added? Although we don't have e2e tests yet, we tested tracing on our side with the jaeger, zipkin, and otlp exporters on these versions and didn't find any errors.
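For illustration, locking to verified working versions in setup.cfg would extend the pins shown above to the 1.12 release train (the contrib packages on that train are versioned 0.33b0). The exact pin set below is an assumption for the sake of example, not the final diff:

    opentelemetry-api==1.12.0
    opentelemetry-sdk==1.12.0
    opentelemetry-exporter-otlp==1.12.0
    opentelemetry-exporter-jaeger-thrift==1.12.0
    opentelemetry-semantic-conventions==0.33b0
    opentelemetry-util-http==0.33b0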

@aarnphm (Member) left a review comment:
LGTM with some minor changes

@ssheng (Collaborator) previously approved these changes on Aug 18, 2022:
I'm happy to merge and update the warning change for you in a separate PR.

codecov bot commented on Aug 18, 2022:

Codecov Report

Merging #2918 (70ecf88) into main (372b530) will decrease coverage by 0.97%.
The diff coverage is 28.57%.


@@            Coverage Diff             @@
##             main    #2918      +/-   ##
==========================================
- Coverage   70.82%   69.84%   -0.98%     
==========================================
  Files         104      121      +17     
  Lines        9435     9936     +501     
==========================================
+ Hits         6682     6940     +258     
- Misses       2753     2996     +243     
Impacted Files                                   Coverage Δ
bentoml/_internal/configuration/containers.py    76.53% <28.57%> (-3.26%) ⬇️
bentoml/_internal/service/loader.py              71.85% <0.00%> (-1.49%) ⬇️
bentoml/lightgbm.py                               0.00% <0.00%> (ø)
bentoml/bentos.py                                48.68% <0.00%> (ø)
bentoml/serve.py                                  0.00% <0.00%> (ø)
bentoml/transformers.py                           0.00% <0.00%> (ø)
bentoml/keras.py                                  0.00% <0.00%> (ø)
bentoml/sklearn.py                                0.00% <0.00%> (ø)
bentoml/catboost.py                               0.00% <0.00%> (ø)
bentoml/pytorch_lightning.py                      0.00% <0.00%> (ø)
... and 16 more

@bojiang bojiang self-requested a review August 18, 2022 03:39
@bojiang (Member) commented on Aug 18, 2022:

It seems that opentelemetry-exporter-otlp requires a newer version of opentelemetry-api.
We should fully test the otlp features, since the otlp plugins are at an early stage and have strict requirements on the API, and our e2e tests don't cover the otlp part.

setup.cfg (outdated review thread, resolved)
@aarnphm (Member) commented on Aug 19, 2022:

Hmm maybe you need to rebase this onto main to remove the merge commit?

@ssheng (Collaborator) commented on Aug 21, 2022:

I have verified that the change here is compatible with the existing jaeger and zipkin tracers. The otlp exporter is also working as expected. I will go ahead and merge this PR. Thanks again for your contribution, @dbuades!

Following this PR, I will lock the opentelemetry dependencies to 0.33b0 until they stabilize. I also discovered that the service name field was not being set to the API service or runner names, causing the UI to display "unknown_service". I will address this in the same follow-up PR. In the meantime, setting the service name through an environment variable works. Did you use the env var mechanism, @dbuades?
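For reference, the OpenTelemetry Python SDK picks up the service name from standard environment variables, so a workaround along these lines is possible today (values are placeholders):

    OTEL_SERVICE_NAME=my-bento-api-server
    OTEL_RESOURCE_ATTRIBUTES=service.name=my-bento-api-server,service.namespace=my-bento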

@ssheng ssheng merged commit 2552c8b into bentoml:main Aug 21, 2022
@dbuades (author) commented on Aug 21, 2022:


Thanks, @ssheng!
That's a very good point about the service name. We are currently using the env var mechanism, since we rely on that strategy in all of our microservices, so it felt natural to use it here as well.
However, it makes total sense to infer it from the Bento service name when the env var is not set. Let me know if you'd like me to help with the PR or with the review.

@ssheng (Collaborator) commented on Aug 21, 2022:

Thanks for the feedback, @dbuades! We'd love to get your review on this: #2928.

ssheng added a commit that referenced this pull request Aug 23, 2022
## What does this PR address?

Following #2918, verified that the latest OpenTelemetry version `0.33b0` is compatible with the existing `otlp`, `jaeger` and `zipkin` exporters. Locking the dependency versions until OTEL stabilizes.

In addition, this PR configures the OpenTelemetry resource automatically if the user has not explicitly configured it through environment variables (see the sketch below):
- API servers and runners are treated as OpenTelemetry services; the service name is set to the API server or runner name.
- The service namespace is set to the Bento name.
- The service version is set to the Bento version.
- The service instance ID is set to the worker ID.
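A minimal sketch of what this automatic resource configuration amounts to in the OpenTelemetry SDK; the attribute values are hypothetical placeholders standing in for the runner name, Bento name, Bento version, and worker ID:

    from opentelemetry.sdk.resources import Resource
    from opentelemetry.sdk.trace import TracerProvider

    # Placeholder values; the real ones come from the Bento and worker context.
    resource = Resource.create(
        {
            "service.name": "iris_classifier_runner",
            "service.namespace": "iris_classifier",
            "service.version": "v1.0.0",
            "service.instance.id": "1",
        }
    )
    tracer_provider = TracerProvider(resource=resource)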

Individual trace page on Jaeger. API server and runners are labeled as separate services.
![Screen Shot 2022-08-21 at 02 54 17](https://user-images.githubusercontent.com/861225/185787072-25e01da3-41a3-40a5-8277-fb8653311571.png)

Search page:
![Screen Shot 2022-08-21 at 03 00 02](https://user-images.githubusercontent.com/861225/185787156-5960e43c-a071-421d-9ba1-1752108d83b0.png)

## Before submitting:
- [ ] Does the Pull Request follow the [Conventional Commits specification](https://www.conventionalcommits.org/en/v1.0.0/#summary) naming? Here is [GitHub's guide](https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/proposing-changes-to-your-work-with-pull-requests/creating-a-pull-request) on how to create a pull request.
- [ ] Does the code follow BentoML's code style, and have both the `make format` and `make lint` scripts passed ([instructions](https://github.com/bentoml/BentoML/blob/main/DEVELOPMENT.md#style-check-auto-formatting-type-checking))?
- [ ] Did you read through the [contribution guidelines](https://github.com/bentoml/BentoML/blob/main/CONTRIBUTING.md#ways-to-contribute) and follow the [development guidelines](https://github.com/bentoml/BentoML/blob/main/DEVELOPMENT.md#start-developing)?
- [ ] Did your changes require updates to the documentation? Have you updated those accordingly? Here are the [documentation guidelines](https://github.com/bentoml/BentoML/tree/main/docs) and [tips on writing docs](https://github.com/bentoml/BentoML/tree/main/docs#writing-documentation).
- [ ] Did you write tests to cover your changes?

## Who can help review?

Feel free to tag members/contributors who can help review your PR.

Co-authored-by: Aaron Pham <29749331+aarnphm@users.noreply.github.com>