
Fix gRPC client interceptor for channels reused across traces #539

Merged: 4 commits into census-instrumentation:master on Mar 11, 2019

Conversation

nikhaldi (Contributor) commented Mar 5, 2019

In a typical web server, gRPC clients/channels are reused across multiple
requests and hence across multiple traces. Previously the
`OpenCensusClientInterceptor` was instantiated for each channel with the
current tracer from the execution context, which caused all RPCs going
through that channel to share that tracer, essentially grouping them under
whatever happened to be the current trace when the channel was created.

Instead, instantiate `OpenCensusClientInterceptor` without a tracer by
default; the current tracer is then retrieved from the execution context at
the start of every RPC span.
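
In rough terms the lookup moves from construction time to call time. A minimal sketch of the idea (not the exact diff; the property-based fallback and the simplified interception body are illustrative):

```python
import grpc

from opencensus.trace import execution_context


class OpenCensusClientInterceptor(grpc.UnaryUnaryClientInterceptor):
    """Sketch only: resolve the tracer per RPC, not per channel."""

    def __init__(self, tracer=None, host_port=None):
        # No tracer is captured at channel-creation time by default.
        self._tracer = tracer
        self.host_port = host_port

    @property
    def tracer(self):
        # Look up the current tracer when the RPC starts, so the client span
        # joins whatever trace is active for the calling request.
        return self._tracer or execution_context.get_opencensus_tracer()

    def intercept_unary_unary(self, continuation, client_call_details, request):
        # Simplified interception: start the span on the tracer resolved now.
        tracer = self.tracer
        tracer.start_span(name=client_call_details.method)
        try:
            return continuation(client_call_details, request)
        finally:
            tracer.end_span()
```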

In addition, `OpenCensusClientInterceptor` was manipulating thread-local state
via the execution context. This seems unnecessary and misguided: the current
span state is already managed by the span/context tracers, and setting the
tracer explicitly risks further subtle bugs.

Also removes the unused method `OpenCensusClientInterceptor._end_span_between_context`.

Fixes #182

nikhaldi requested review from c24t, reyang, songy23 and a team as code owners on March 5, 2019 at 20:26
nikhaldi (Contributor, Author) commented Mar 5, 2019

I can reproduce the underlying problem reliably with a standard Flask integration using the `FlaskMiddleware`. This change does fix the Flask server for me.

I don't fully understand what the intent was with instantiating the client interceptor this way (I looked at the git history, still unclear), but it's clearly broken for common server use cases. I may be missing some context, and maybe there are other use cases where this made sense? Would love any hints on that.
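
For reference, the reuse pattern that triggers the bug looks roughly like this (a sketch only; `backend_pb2`, `backend_pb2_grpc`, `BackendStub`, and `DoSomething` are placeholder generated names, and the import paths assume the `opencensus.trace.ext` layout current at the time):

```python
import grpc
from flask import Flask

from opencensus.trace.ext.flask.flask_middleware import FlaskMiddleware
from opencensus.trace.ext.grpc import client_interceptor

import backend_pb2         # placeholder generated protobuf module
import backend_pb2_grpc    # placeholder generated gRPC stub module

app = Flask(__name__)
FlaskMiddleware(app)  # starts a new trace for every incoming request

# The channel and stub are created once at startup and reused by every request.
# Before this fix, the interceptor captured whatever tracer was current *here*,
# so all later RPCs were grouped under that single trace.
channel = grpc.intercept_channel(
    grpc.insecure_channel('backend:50051'),
    client_interceptor.OpenCensusClientInterceptor())
stub = backend_pb2_grpc.BackendStub(channel)


@app.route('/')
def handler():
    # With this fix, the client span is created under the tracer that is
    # current for *this* request, not the one from channel creation.
    reply = stub.DoSomething(backend_pb2.Request())
    return reply.message
```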

c24t (Member) commented Mar 6, 2019

> I don't fully understand what the intent was with instantiating the client interceptor this way (I looked at the git history, still unclear).

I don't understand it either. If the goal was to be able to use `OpenCensusClientInterceptor` with a tracer other than the context's current tracer, then we almost certainly wouldn't want to set that as the context's tracer.

Removing the `execution_context` manipulation here seems right to me, but @liyanhui1228 or @wkiser may be able to shed some light on a use case we're missing.

c24t (Member) left a comment

This seems like an unambiguously good change to me, let's see if @liyanhui1228 or @wkiser have any objections.

c24t merged commit 724bd0b into census-instrumentation:master on Mar 11, 2019
odeke-em (Member) commented

Thanks for working on this @nikhaldi, and @c24t for the review!

It would be nice to add a regression test too, to lock in this behavior and ensure that this bug never creeps back in.
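
Something along these lines might do it (a rough sketch; it assumes the interceptor exposes the resolved tracer via a `tracer` attribute or property, which may not match the final implementation, and the import path reflects the pre-split package layout):

```python
import unittest

from opencensus.trace import execution_context
from opencensus.trace.ext.grpc import client_interceptor


class TestTracerResolvedPerRpc(unittest.TestCase):
    def test_tracer_not_frozen_at_channel_creation(self):
        # Tracer that happens to be current when the channel is created.
        creation_tracer = object()
        execution_context.set_opencensus_tracer(creation_tracer)
        interceptor = client_interceptor.OpenCensusClientInterceptor()

        # A different trace is current by the time an RPC is actually made.
        request_tracer = object()
        execution_context.set_opencensus_tracer(request_tracer)

        # The interceptor should pick up the tracer at RPC time, not the one
        # that was current when the channel/interceptor was created.
        self.assertIs(interceptor.tracer, request_tracer)
```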
