Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

nri: enable otel tracing #7728

Merged
merged 2 commits into from
Feb 6, 2024

Conversation

klihub
Copy link
Contributor

@klihub klihub commented Feb 2, 2024

What type of PR is this?

/kind other

What this PR does / why we need it:

This PR updates our NRI dependency enabling OpenTelemetry trace context propagation over ttrpc connections. Additionally the PR also enables automatic OpenTelemetry instrumentation of NRI ttrpc calls when tracing is globally enabled in cri-o.

Which issue(s) this PR fixes:

None

Special notes for your reviewer:

Does this PR introduce a user-facing change?

None

- Enable automatic OpenTelemetry instrumentation of ttrpc calls to NRI plugins when tracing is globally enabled in cri-o.
- Bumping our NRI deps also fixes a bug where starting a cri-o launched NRI plugin could cause the accidental closing of an unrelated fd (socket, file, etc.) due to an fd double-close bug.

@klihub klihub requested a review from mrunalp as a code owner February 2, 2024 14:46
@openshift-ci openshift-ci bot added release-note Denotes a PR that will be considered when it comes time to generate release notes. dco-signoff: yes Indicates the PR's author has DCO signed all their commits. kind/other Categorizes issue or PR as not clearly related to any existing kind/* category labels Feb 2, 2024
@klihub
Copy link
Contributor Author

klihub commented Feb 3, 2024

/retest-required

@klihub klihub force-pushed the devel/nri-otel-tracing branch 5 times, most recently from 52a80c4 to 3596de7 Compare February 4, 2024 14:45
Copy link

codecov bot commented Feb 4, 2024

Codecov Report

Merging #7728 (d955623) into main (e9febd3) will not change coverage.
The diff coverage is 100.00%.

Additional details and impacted files
@@           Coverage Diff           @@
##             main    #7728   +/-   ##
=======================================
  Coverage   47.95%   47.95%           
=======================================
  Files         146      146           
  Lines       16274    16274           
=======================================
  Hits         7804     7804           
  Misses       7517     7517           
  Partials      953      953           

Copy link
Member

@saschagrunert saschagrunert left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you! Code LGTM

@openshift-ci openshift-ci bot added lgtm Indicates that a PR is ready to be merged. approved Indicates a PR has been approved by an approver from all required OWNERS files. labels Feb 5, 2024
@saschagrunert
Copy link
Member

/retest

Signed-off-by: Krisztian Litkey <krisztian.litkey@intel.com>
Signed-off-by: Krisztian Litkey <krisztian.litkey@intel.com>
@openshift-ci openshift-ci bot removed the lgtm Indicates that a PR is ready to be merged. label Feb 6, 2024
@klihub
Copy link
Contributor Author

klihub commented Feb 6, 2024

Thank you! Code LGTM

@saschagrunert ttrpc got tagged a new version. Updated deps to point the latest tag. That cost me your earlier lgtm.

@kwilczynski
Copy link
Member

Thank you! Code LGTM

@saschagrunert ttrpc got tagged a new version. Updated deps to point the latest tag. That cost me your earlier lgtm.

@klihub, might be a silly question, but... can we use the normal OTEL package we already have? Or does it have to be ttrpc explicitly (e.g., no way to adapt the other one in any way, etc.)?

@klihub
Copy link
Contributor Author

klihub commented Feb 6, 2024

Thank you! Code LGTM

@saschagrunert ttrpc got tagged a new version. Updated deps to point the latest tag. That cost me your earlier lgtm.

@klihub, might be a silly question, but... can we use the normal OTEL package we already have? Or does it have to be ttrpc explicitly (e.g., no way to adapt the other one in any way, etc.)?

otelttrpc tries to be for ttrpc what otelgrpc is for grpc. Currently it implements instrumenting unary client and server interceptors which enable 2 things:

  1. the automatic injection of otel traces for each unary RPC call invoked and served over ttrpc (we still lack support for streaming calls), and
  2. the automatic propagation of otel trace context (span ID/info) over the IPC transport used for ttrpc, so that related/resulting computation on the other side of the IPC link gets a proper parent span ID and gets properly nested in visualization under the right originating/triggering event

So otelttrpc is not a replacement for otel, it is a helper package which implements instrumenting interceptors so that we get ttrpc closer to grpc wrt. otel tracing transparency.

With this PR in place, if tracing is enabled any related processing in NRI plugins should correctly show up in the traces. If a plugin is uninstrumented, a single (unary client call) should show up. If the plugin itself is instrumented, more detailed spans should be available as child spans of this.

@kwilczynski
Copy link
Member

[...]

@klihub, might be a silly question, but... can we use the normal OTEL package we already have? Or does it have to be ttrpc explicitly (e.g., no way to adapt the other one in any way, etc.)?

otelttrpc tries to be for ttrpc what otelgrpc is for grpc. Currently it implements instrumenting unary client and server interceptors which enable 2 things:

Makes sense. Since NRI uses ttrpc as its protocol of choice. Thank you!

@kwilczynski
Copy link
Member

/approve
/lgtm

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Feb 6, 2024
Copy link
Contributor

openshift-ci bot commented Feb 6, 2024

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: klihub, kwilczynski, saschagrunert

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:
  • OWNERS [klihub,kwilczynski,saschagrunert]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-merge-bot openshift-merge-bot bot merged commit 8510499 into cri-o:main Feb 6, 2024
59 of 64 checks passed
@klihub klihub deleted the devel/nri-otel-tracing branch February 6, 2024 14:56
@kwilczynski
Copy link
Member

@klihub, sorry to appear out of the blue on this closed Pull Request, but I have a question.

The containerd/otelttrpc package, which we now use, seems to be causing some build issues when we build CRI-O binary packages (RPMs) for use in OpenShift per:

_output/src/github.com/cri-o/cri-o/vendor/github.com/containerd/otelttrpc/interceptor.go:50:2: code in directory /builddir/build/BUILD/cri-o-4fa897e3b639c3bf19ad0152f2429162548e5698/_output/src/github.com/cri-o/cri-o/vendor/github.com/containerd/otelttrpc/internal expects import "github.com/containerd/ttrpc/otelttrpc"

This seems to be due to the "import path checking" feature you are using there.

Just wondering, do you think this is something that you need to leverage in this specific package?

@klihub
Copy link
Contributor Author

klihub commented Mar 4, 2024

@klihub, sorry to appear out of the blue on this closed Pull Request, but I have a question.

The containerd/otelttrpc package, which we now use, seems to be causing some build issues when we build CRI-O binary packages (RPMs) for use in OpenShift per:

_output/src/github.com/cri-o/cri-o/vendor/github.com/containerd/otelttrpc/interceptor.go:50:2: code in directory /builddir/build/BUILD/cri-o-4fa897e3b639c3bf19ad0152f2429162548e5698/_output/src/github.com/cri-o/cri-o/vendor/github.com/containerd/otelttrpc/internal expects import "github.com/containerd/ttrpc/otelttrpc"

This seems to be due to the "import path checking" feature you are using there.

Just wondering, do you think this is something that you need to leverage in this specific package?

@kwilczynski No, I can't see why we would need any of that. And actually it is more of an accident (IOW, me not being aware of the consequences) that we have those in place. I'll file a PR to get rid of them.

@klihub
Copy link
Contributor Author

klihub commented Mar 4, 2024

@kwilczynski Filed here: containerd/otelttrpc#1. Once that gets merged, I'll file corresponding PRs to update the cri-o otelttrpc dependency

@kwilczynski
Copy link
Member

@kwilczynski Filed here: containerd/otelttrpc#1. Once that gets merged, I'll file corresponding PRs to update the cri-o otelttrpc dependency

@klihub, thank you on behalf of our build team! I appreciate a super fast response from you! 😄

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. dco-signoff: yes Indicates the PR's author has DCO signed all their commits. kind/other Categorizes issue or PR as not clearly related to any existing kind/* category lgtm Indicates that a PR is ready to be merged. release-note Denotes a PR that will be considered when it comes time to generate release notes.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants