feat(tracing): add support for otlp exporter #2918
Conversation
@@ -374,6 +388,9 @@ def tracer_provider(
    if sample_rate is None:
        sample_rate = 0.0
        logger.warning(
👍
Looks like this will show up for all users that don't use tracing?
@parano You are right, I'm going to change it.
Maybe we should only warn if the user explicitly set the `sample_rate` to `0.0`. If the `sample_rate` is `None`, we can leave the behavior as-is.
@ssheng I still think that we should warn in both situations, since it took us a while to debug why the traces weren't being sent. I'm open to discussing it though.
I just pushed a fix for this, let me know what you think.
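A minimal sketch of the warning behavior discussed above — warn whenever tracing ends up with a zero sample rate, whether `sample_rate` was left unset (`None`) or set to `0.0` explicitly. The function name and message are illustrative, not BentoML's actual implementation:

```python
import logging

logger = logging.getLogger(__name__)


def resolve_sample_rate(sample_rate):
    # Hypothetical helper: normalize an unset sample rate to 0.0 and
    # warn in both cases, since a zero rate silently drops all traces.
    if sample_rate is None:
        sample_rate = 0.0
    if sample_rate == 0.0:
        logger.warning(
            "Tracing sample_rate is set to zero; no traces will be exported. "
            "Set a sample_rate between 0.0 and 1.0 to enable tracing."
        )
    return sample_rate
```

With this shape, both `resolve_sample_rate(None)` and `resolve_sample_rate(0.0)` emit the warning, which matches the debugging pain described above (traces silently not being sent).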
opentelemetry-sdk==1.9.0
opentelemetry-semantic-conventions==0.28b0
opentelemetry-util-http==0.28b0
opentelemetry-api>=1.9.0
@bojiang what's the reason we locked the opentelemetry version before? Was it to prevent installing pre-release versions by accident? If that's the case, it should be OK to change this to `>=` now.
There were some issues with opentelemetry pre-releases before; I can't remember exactly what they were, and I don't know if they got fixed upstream.
Is there any reason why we need to bump this up @dbuades ?
We have to lock the versions of the OTLP libs/contribs because the opentelemetry community does not organize its dependency map well. We should lock them to verified working versions.
@aarnphm Unfortunately I needed to bump it since it was required by opentelemetry-exporter-otlp.
@bojiang Would you be comfortable if I lock them to their current versions until the e2e tests are added? Although we don't have e2e tests, on our side we tested tracing with the jaeger, zipkin and otlp exporters on these versions and didn't find any errors.
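Locking to verified working versions, as discussed, would amount to pinning the whole OpenTelemetry family to one matched release set. A sketch of such a requirements fragment, using the versions visible in this PR's diff (the exact `opentelemetry-exporter-otlp` pin is an assumption; only the other pins appear in the diff):

```
# Pin the OTEL family to a matched, verified release set
opentelemetry-api==1.9.0
opentelemetry-sdk==1.9.0
opentelemetry-semantic-conventions==0.28b0
opentelemetry-util-http==0.28b0
# assumed pin for the exporter added in this PR
opentelemetry-exporter-otlp==1.9.0
```

Pinning with `==` rather than `>=` trades automatic upgrades for reproducibility, which is the point of the discussion above: the stable (1.x) and beta (0.xb0) package lines must come from the same release train to work together.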
LGTM with some minor changes
I'm happy to merge and update the warning change for you in a separate PR.
Codecov Report

@@            Coverage Diff            @@
##             main    #2918     +/-  ##
==========================================
- Coverage   70.82%   69.84%   -0.98%
==========================================
  Files         104      121      +17
  Lines        9435     9936     +501
==========================================
+ Hits         6682     6940     +258
- Misses       2753     2996     +243
Force-pushed from 8c53384 to ee76874
Hmm, maybe you need to rebase this onto main to remove the merge commit?
I have verified that the change here is compatible with the existing `jaeger` and `zipkin` exporters. Following this PR, I will lock the OpenTelemetry dependency versions.
Thanks @ssheng!
## What does this PR address?

Following #2918, verified that the latest OpenTelemetry version `0.33b0` is compatible with the existing `otlp`, `jaeger` and `zipkin` exporters. Locking the dependency versions until OTEL stabilizes.

In addition, this PR also configures the OpenTelemetry resource automatically if the user has not explicitly configured it through environment variables:

- API servers and runners are treated as OpenTelemetry services and named after their names.
- The service namespace is set as the Bento name.
- The service version is set as the Bento version.
- The service instance ID is set as the worker ID.

Individual trace page on Jaeger. API server and runners are labeled as separate services.

![Screen Shot 2022-08-21 at 02 54 17](https://user-images.githubusercontent.com/861225/185787072-25e01da3-41a3-40a5-8277-fb8653311571.png)

Search page:

![Screen Shot 2022-08-21 at 03 00 02](https://user-images.githubusercontent.com/861225/185787156-5960e43c-a071-421d-9ba1-1752108d83b0.png)

## Before submitting:

- [ ] Does the Pull Request follow [Conventional Commits specification](https://www.conventionalcommits.org/en/v1.0.0/#summary) naming? Here is [GitHub's guide](https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/proposing-changes-to-your-work-with-pull-requests/creating-a-pull-request) on how to create a pull request.
- [ ] Does the code follow BentoML's code style? Have both the `make format` and `make lint` scripts passed ([instructions](https://github.com/bentoml/BentoML/blob/main/DEVELOPMENT.md#style-check-auto-formatting-type-checking))?
- [ ] Did you read through the [contribution guidelines](https://github.com/bentoml/BentoML/blob/main/CONTRIBUTING.md#ways-to-contribute) and follow the [development guidelines](https://github.com/bentoml/BentoML/blob/main/DEVELOPMENT.md#start-developing)?
- [ ] Did your changes require updates to the documentation? Have you updated those accordingly? Here are the [documentation guidelines](https://github.com/bentoml/BentoML/tree/main/docs) and [tips on writing docs](https://github.com/bentoml/BentoML/tree/main/docs#writing-documentation).
- [ ] Did you write tests to cover your changes?

## Who can help review?

Feel free to tag members/contributors who can help review your PR.

Co-authored-by: Aaron Pham <29749331+aarnphm@users.noreply.github.com>
What does this PR address?

Mainly fixes #2917.

However, it also addresses the following concerns:

- Adds `opentelemetry-exporter-jaeger-thrift` as an extra dependency in order to solve this issue.
- Warns the user when `sample_rate` is not set for tracing.

Before submitting:

- [ ] Does the Pull Request follow Conventional Commits naming? Here is GitHub's guide on how to create a pull request.
- [ ] Does the code follow BentoML's code style? Have both the `make format` and `make lint` scripts passed (instructions)?
- [ ] Did your changes require updates to the documentation? Have you updated those accordingly? Here are the documentation guidelines and tips on writing docs.

Who can help review?

It was mentioned in Slack that @bojiang worked on the tracing feature.