Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add OpenCensus instrumentation #3303

Closed
wants to merge 2 commits into from
Closed

Conversation

kevpar
Copy link
Member

@kevpar kevpar commented May 24, 2019

This change has three parts:

  • Add the OC GRPC handler to the GRPC server.
    This causes a new OC span to be started and associated with the
    context for every incoming GRPC request. If the request metadata
    includes existing span context, the new span will be parented to the
    existing span.

  • Add a small exporter to log completed span data to Logrus.
    OC supports the concept of exporters which send the data for completed
    spans to an external system for storage and tracking. I created a
    logrusExporter which simply logs some of the span data to Logrus.

    While this obviously isn't a complete distributed tracing system, it
    allows span data to be easily stored in log files and seen on the
    console. It also works nicely with the existing use of Logrus hooks
    to send log data to other systems (such as ETW on Windows).

    Long-term, we should also add the OC agent exporter, which sends the
    span data to the OC agent running on the same system. This is the
    recommended path for exporting spans. However, since this requires
    an additional binary running on the system, I think there is value in
    keeping the logrusExporter as a light-weight alternative.

  • Modify log.GetLogger to add span correlation data to the logrus.Entry.
    When a Logrus entry is returned from log.GetLogger, if the context
    also contains a saved span, then information is added to the entry to
    allow correlating the log messages with the current span.

    This will cause all messages logged via log.G(ctx) to contain
    additional fields for trace ID, span ID, and sampling status. This
    does make these messages more verbose, but will be very useful when
    correlating logs to spans.

Signed-off-by: Kevin Parsons kevpar@microsoft.com

@kevpar
Copy link
Member Author

kevpar commented May 24, 2019

@jterry75 PTAL

@kevpar
Copy link
Member Author

kevpar commented May 24, 2019

As a later change, I would like to start moving many log messages to use log.G(ctx), that will cause the span correlation to light up.

@theopenlab-ci
Copy link

theopenlab-ci bot commented May 24, 2019

Build succeeded.

log/context.go Outdated Show resolved Hide resolved
vendor.conf Show resolved Hide resolved
log/context.go Outdated Show resolved Hide resolved
Copy link
Contributor

@jterry75 jterry75 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Few comments

cmd/containerd/trace.go Show resolved Hide resolved
vendor.conf Show resolved Hide resolved
@theopenlab-ci
Copy link

theopenlab-ci bot commented May 24, 2019

Build succeeded.

Copy link
Contributor

@jterry75 jterry75 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@jterry75
Copy link
Contributor

@crosbymichael - PTAL. The goal would be to do this: https://github.com/containerd/containerd/pull/3303/files#diff-6f891b9f28cfb10751fe182afa544a9cR87 on the ttrpc entry for a shim as well so the traceID/parentSpanID flows. Thanks!

@theopenlab-ci
Copy link

theopenlab-ci bot commented May 24, 2019

Build succeeded.

@codecov-io
Copy link

Codecov Report

Merging #3303 into master will decrease coverage by 0.05%.
The diff coverage is 23.52%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #3303      +/-   ##
==========================================
- Coverage    44.6%   44.55%   -0.06%     
==========================================
  Files         112      112              
  Lines       12180    12192      +12     
==========================================
- Hits         5433     5432       -1     
- Misses       5913     5925      +12     
- Partials      834      835       +1
Flag Coverage Δ
#linux 48.45% <33.33%> (-0.05%) ⬇️
#windows 39.81% <23.52%> (-0.06%) ⬇️
Impacted Files Coverage Δ
services/server/server.go 1.26% <0%> (-0.01%) ⬇️
log/context.go 22.85% <25%> (-14.65%) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update d57cf6f...e215c3b. Read the comment docs.

@fuweid
Copy link
Member

fuweid commented May 25, 2019

link to #3057

@Random-Liu
Copy link
Member

Random-Liu commented May 25, 2019

This is very useful!

Is it possible to make this a plugin?

So that:

  1. People who don't need it can easily compile it out, e.g. people run containerd in embedded system without a cluster.
  2. It is easier to switch to OpenTelemetry in the future, because the implementation is made self contained.

@jterry75
Copy link
Contributor

@Random-Liu - I think it could be a config option if necessary but I don't see how it could be a plugin. We need to add the gRPC option at containerd server init. I don't think this can take a plugin dependency can it?

@Random-Liu
Copy link
Member

@jterry75 At least the exporter seems to be a plugin to me.

@mxpv
Copy link
Member

mxpv commented May 28, 2019

@Random-Liu @jterry75 how do you feel about build tag?

@Random-Liu
Copy link
Member

@mxpv Yeah, if we make it a plugin we need a build tag.

@@ -62,15 +63,26 @@ func WithLogger(ctx context.Context, logger *logrus.Entry) context.Context {
}

// GetLogger retrieves the current logger from the context. If no logger is
// available, the default logger is returned.
// available, the default logger is returned. If the context has a span
// associated with it, add correlating information to the returned logger.
func GetLogger(ctx context.Context) *logrus.Entry {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it going to be confusing hiding the tracing functionality in the log pkg and GetLogger function?

@kevpar
Copy link
Member Author

kevpar commented Jun 3, 2019

@dmcgowan also prefers this implemented as a set of plugins for the various functionality. So I am going to look at breaking this change into several plugins.

Signed-off-by: Kevin Parsons <kevpar@microsoft.com>
This change has three parts:

- Add the OC GRPC handler to the GRPC server.
  This causes a new OC span to be started and associated with the
  context for every incoming GRPC request. If the request metadata
  includes existing span context, the new span will be parented to the
  existing span.

- Add a small exporter to log completed span data to Logrus.
  OC supports the concept of exporters which send the data for completed
  spans to an external system for storage and tracking. I created a
  logrusExporter which simply logs some of the span data to Logrus.

  While this obviously isn't a complete distributed tracing system, it
  allows span data to be easily stored in log files and seen on the
  console. It also works nicely with the existing use of Logrus hooks
  to send log data to other systems (such as ETW on Windows).

  Long-term, we should also add the OC agent exporter, which sends the
  span data to the OC agent running on the same system. This is the
  recommended path for exporting spans. However, since this requires
  an additional binary running on the system, I think there is value in
  keeping the logrusExporter as a light-weight alternative.

- Modify log.GetLogger to add span correlation data to the logrus.Entry.
  When a Logrus entry is returned from log.GetLogger, if the context
  also contains a saved span, then information is added to the entry to
  allow correlating the log messages with the current span.

  This will cause all messages logged via log.G(ctx) to contain
  additional fields for trace ID, span ID, and sampling status. This
  does make these messages more verbose, but will be very useful when
  correlating logs to spans.

Signed-off-by: Kevin Parsons <kevpar@microsoft.com>
@theopenlab-ci
Copy link

theopenlab-ci bot commented Jul 16, 2019

Build succeeded.

@kevpar kevpar closed this Jul 24, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants