Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

grpc server plugin reports wrong status code #2170

Closed
ghost opened this issue Jul 6, 2022 · 4 comments · Fixed by #2339
Closed

grpc server plugin reports wrong status code #2170

ghost opened this issue Jul 6, 2022 · 4 comments · Fixed by #2339
Labels
bug Something isn't working

Comments

@ghost
Copy link

ghost commented Jul 6, 2022

Expected behaviour
When a grpc server returns a non-0 status code, it is reported through to datadog tags properly.

Actual behaviour
In a high percentage of the time, the server grpc.request trace may report a status code of "1" instead of the proper status code, where the grpc.request span for the client side reports the correct one. It seems like there is an edge case between something with the grpc stream being closed and the grpc response being returned causing the tag to show the wrong status code.

Steps to reproduce

I spent several hours trying to get together a sample that would reproduce this problem and could not come up with anything.

Environment

  • Operation system: Linux
  • Node.js version: Varies
  • Tracer version: Latest version all the way back to at least 1.3.X
  • Agent version: 7.34.0 but has been happening with other versions as well
  • Relevant library versions: Tried it with @grpc/grpc-js versions 1.6.7 and 1.3.4
@ghost ghost added the bug Something isn't working label Jul 6, 2022
@ghost
Copy link
Author

ghost commented Aug 17, 2022

For anyone else that comes across this problem, what I found is that this happens in the case where there is a unfinished span that has been associated with that trace at the time that the grpc response finishes.

When the grpc.request span gets finished, it seems to be put into the processor queue in-memory, however as soon as the cancelled event happens, the grpc status code is updated on that span, updating the in-memory reference in the queue, and therefore gets reported to the datadog agent incorrectly. The cancelled event seems to fire as soon as the grpc stream is reset (even for rstCode 0).

If the grpc request ends without any unfinished spans, status codes get reported as normal, which is why I was unable to reproduce it before.

So either the processor needs to be updated to do a copy of the span (seems expensive), the Span class could be updated to prevent changes to tags after a span is finished (could cause problems with other plugins), or the grpc plugin updated to ignore a cancelled event if the span is already finished.

@rochdev
Copy link
Member

rochdev commented Aug 17, 2022

the grpc plugin updated to ignore a cancelled event if the span is already finished

I think that for now that's the best option, but later on we might also want to also freeze tags when the span finishes, although as you mentioned this will require making sure that it doesn't break any plugin that might depend on the current behaviour.

@rochdev
Copy link
Member

rochdev commented Sep 1, 2022

I can't seem to be able to reproduce this. Unless I explicitly cancel, no cancellation is happening.

@brandontuttle Do you have an example of a case where the cancel event is emitted without an actual cancellation?

@rochdev
Copy link
Member

rochdev commented Sep 1, 2022

Ended up being able to reproduce and fix the issue with #2339.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

Successfully merging a pull request may close this issue.

1 participant