Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

proposal: tracer: Warn when developer code does not close all spans #2064

Closed
adammw opened this issue Jun 23, 2023 · 5 comments · Fixed by #2149
Closed

proposal: tracer: Warn when developer code does not close all spans #2064

adammw opened this issue Jun 23, 2023 · 5 comments · Fixed by #2149
Labels
proposal/accepted Accepted proposals proposal more in depth change that requires full team approval scheduled This is for proposals which have been planned in the quarterly planning. tracer

Comments

@adammw
Copy link

adammw commented Jun 23, 2023

We recently spent a lot of developer and support time investigating why certain traces weren't being sent to APM only to find that the tracing library was still waiting for the full trace to complete as there were spans created that were never finished.

It would be a helpful feature if the library could enact some sort of configurable timeout that if a trace is not finished in a certain amount of time, to log that a misconfiguration is possible and which spans the library is waiting on to finish to finalise the trace.

The feature as suggested in #1428 may also have helped mask the issue so we could still continue to get usable data in spite of the unfinished spans.

@adammw adammw added the enhancement quick change/addition that does not need full team approval label Jun 23, 2023
@ajgajg1134 ajgajg1134 added proposal more in depth change that requires full team approval and removed enhancement quick change/addition that does not need full team approval labels Jun 27, 2023
@ajgajg1134 ajgajg1134 changed the title Warn when developer code does not close all spans tracer: Warn when developer code does not close all spans Jun 27, 2023
@ajgajg1134
Copy link
Contributor

This seems like a good idea and would serve well as a debugging tool for when people run into this issue.

A few implementation ideas from our proposal review meeting:

  • At trace start (read: span start) create a goroutine for new traces that check for open spans that are longer than a configured timeout. Downsides is this creates 1 goroutine per trace which could be expensive.
  • In the tracer we keep a big slice of all the started traces, periodically we iterate it. When older than X time we print warning and remove it from the list. Maybe just checking if the Trace has been around too long would be sufficient? But we'd be missing which span is missing. (We can also do this by just seeing an old trace and THEN iterating on the trace)

@adammw
Copy link
Author

adammw commented Jun 27, 2023

The second option sounds sufficient assuming the developer has turned on trace debug logging as the start/finish of spans will be logged, but agreed that it is not as user-friendly as seeing the names of the spans that it is waiting on - especially if that could be a warning without trace debugging on. Is there unacceptable overhead in storing which spans have been started from that trace? Perhaps if the tracer kept a slice of started spans rather than traces it would be able to look it up dynamically when the situation occurs?

For context, these are the modifications to logging I made temporarily to diagnose the issue on our end - seeing that finished and started counts mismatched

@knusbaum
Copy link
Contributor

@adammw

I think we can do the second option, and if we find a trace that's older than X time, we can iterate its spans (it keeps a list of them) and print out the ones that are not closed. Would that solution work for you?

The implementation details might change (keep a big slice vs some more performant concurrency-safe data structure) but I think the concept is actionable.

@knusbaum knusbaum changed the title tracer: Warn when developer code does not close all spans proposal: tracer: Warn when developer code does not close all spans Jul 11, 2023
@knusbaum
Copy link
Contributor

knusbaum commented Jul 11, 2023

I'm going to mark this as accepted. We plan on doing this in the near future (next few weeks). We will follow up with details.

@knusbaum knusbaum added proposal/accepted Accepted proposals scheduled This is for proposals which have been planned in the quarterly planning. tracer labels Jul 11, 2023
@ajgajg1134 ajgajg1134 reopened this Aug 22, 2023
@darccio
Copy link
Member

darccio commented Sep 6, 2023

@adammw We implemented the proposed feature. It'll be available in the upcoming v1.55.0 release.

Docs will be updated here at the release. Anyway, I paste here the relevant info so you can read it beforehand:

Abandoned span logs

The Datadog Go Tracer also supports logging for potentially abandoned spans. To enable this debug mode in Go, set the environment variable DD_TRACE_DEBUG_ABANDONED_SPANS=true. To change the duration after which spans are considered abandoned (default=10m), set the environment variable DD_TRACE_ABANDONED_SPAN_TIMEOUT to the desired time duration. Abandoned span logs appear at the Info level.

You can also enable debugging abandoned spans during the Start config:

package main

import (
  "time"

  "gopkg.in/DataDog/dd-trace-go.v1/ddtrace/tracer"
)

func main() {
    tracer.Start(tracer.WithDebugSpansMode(10 * time.Minute))
    defer tracer.Stop()
}

@darccio darccio closed this as completed Sep 6, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
proposal/accepted Accepted proposals proposal more in depth change that requires full team approval scheduled This is for proposals which have been planned in the quarterly planning. tracer
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants