[core] Implementation of the tracing package by palazzem · Pull Request #2 · DataDog/dd-trace-go

palazzem · 2016-09-09T14:22:28Z

Core implementation to handle spans and the tracer objects. The following is a working example of our tracer package:

package main

import (
    "fmt"
    "time"

    "golang.org/x/net/context"

    // import the Datadog tracer
    "github.com/DataDog/dd-trace-go/pkg/tracer"
)

var smallWait = time.Millisecond * 30

func main() {
    fmt.Println("Starting the fake application")
    for {
        // creation of the context object is delegated to the user
        ctx := context.Background()

        // main request
        span := tracer.NewSpan("pylons.request", "pylons", "/")
        time.Sleep(smallWait)

        ctx = tracer.ContextWithSpan(ctx, span)
        generateTemplate(ctx)

        // cache stuff
        cacheSpan := tracer.NewChildSpan("redis.command", span)
        time.Sleep(smallWait)
        cacheSpan.Finish()

        // close the parent span
        span.Finish()
    }
}

func generateTemplate(ctx context.Context) {
    // the ok flag is discarded here, so parent is nil when ctx carries no span
    parent, _ := tracer.SpanFromContext(ctx)

    // template stuff
    templateSpan := tracer.NewChildSpan("pylons.template", parent)
    defer templateSpan.Finish()
    templateSpan.Resource = "something"
    time.Sleep(smallWait)
}

What is missing?

~~tests are not pushed until this approach is accepted~~
~~the dispatch implementation~~
~~remove some parameters from the NewChildSpan() signature. We may just:~~

child := NewChildSpan("something", parent)

~~handling the context.Context object so that the ctx can be passed as argument between functions. A ctx must be used to retrieve a span if a parent is available.~~


clutchski · 2016-09-09T14:30:49Z

+	if !s.IsFinished() {
+		s.Duration = Now() - s.Start
+
+		go func() {


two issues with this:

1. spawning a goroutine per finished span adds tons of overhead.

2. as you commented, this could block forever, which is completely unacceptable for client-side code. a channel is not the right tool for this because if a reader crashes, a customer app will lock up forever.

i'd much prefer if we took this bit of code from dd-go/dogtrace. this is already running on high traffic production systems, so let's just use it instead of inventing something new. (except for the onFinish stuff, that's completely unnecessary)

OK, it was a proposal. I can update this code with the delivery process that we're using in dogtrace.

From what I've read, I can easily solve point 2 (the blocked goroutine) with a timeout: if the reader crashes (the reader is the one that sends all spans to the local/remote agent), the span is lost and the goroutine exits. I think it makes sense because if the goroutine that sends data is dead, it's OK to lose the span: nothing is left that can send it anywhere.

On the other hand, if we use the current approach in dogtrace (without a goroutine), our code will block as you describe, because it must obtain a lock, and that will surely add some delay (maybe it has to wait for another Span that holds the lock?). Furthermore, if the sender dies, the shared data structures (the children []*Span in the Span or a common global outgoingSpans []*Span) will grow without bound, causing a memory leak.

About the goroutine overhead, I think I can profile it: goroutines are a built-in feature and, according to the golang.org blog posts, they were created for similar cases.

By the way I have a free weekend so I want to play a little bit with this stuff 😄
If you agree, I prefer to revert this part at the end so that I can spend some time with benchmarks and experiments.

clutchski · 2016-09-09T14:44:28Z

i think we should only create spans from a tracer. and never from other spans. we've had APIs like this before and it's more confusing (when do i call span.Nest or tracer.NewSpan?)

i think something like this is simpler:

tracer := datadog.NewTracer()

// create a new root span with some info
span := tracer.NewSpan("whatever")
defer span.Finish()
span.Service = "foo"
span.Resource = "whatever"

// create a span from another
child := tracer.NewChildSpan("another", span)
defer child.Finish()

// or from a special `datadog_trace` variable in context
child2 := tracer.NewSpanFromContext("yahoo", context)
child2.Finish()

palazzem · 2016-09-10T09:39:58Z

Yeah I agree. Having an API where developers must always use the tracer default (or custom) instance is better. I've changed the high-level API according to your suggestions.

LeoCavaille · 2016-09-10T11:02:19Z

+import (
+	"sync"
+
+	log "github.com/cihub/seelog"


just a remark, but we definitely don't want to pull this dep in the final package. Also, seelog is madness, it often shows up in a top 30 cpu profile in our apps 😱

ah OK.... I've seen that in a lot of our projects. Is there something else I can use?

I wonder if we should use a logging lib at all, or just use the primitive "log" package in Go. Do we need the user to pull another dependency for logging if he already has one?
Also we should try to keep our logging at a minimum (except if the user toggles debug mode).

Yes, I totally agree about limiting our logging to warnings/errors. About the logging lib, I don't know... I mean, if the primitive log package provides all that we need, I think we can replace everything with that.

palazzem · 2016-09-11T10:17:11Z

+	"math/rand"
+	"time"
+
+	log "github.com/cihub/seelog"


@LeoCavaille what do you think about that library (https://github.com/op/go-logging)? Have you ever used it?


palazzem · 2016-09-11T10:46:07Z

+	Start    int64              `json:"start"`     // span start time expressed in nanoseconds since epoch
+	Duration int64              `json:"duration"`  // duration of the span expressed in nanoseconds
+	Error    int32              `json:"error"`     // error status of the span; 0 means no errors
+	Meta     map[string]string  `json:"meta"`      // arbitrary map of metadata


@LeoCavaille may I add omitempty for Meta and Metrics? Or does the agent expect the meta field in any case, even if it's nil?

if it's not in the payload, it will just set it to nil when decoding, so yea.
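
The effect of `omitempty` can be checked with a short sketch. The `Span` struct below is a trimmed, hypothetical version of the one in the diff (the `Metrics` value type is an assumption), and the decoding half shows how Go's `encoding/json` behaves, which is not necessarily how the agent decodes payloads.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Span is a trimmed, hypothetical version of the struct under review.
type Span struct {
	Name    string             `json:"name"`
	Meta    map[string]string  `json:"meta,omitempty"`
	Metrics map[string]float64 `json:"metrics,omitempty"` // value type assumed
}

// encode marshals a span and returns the JSON as a string.
func encode(s Span) string {
	b, err := json.Marshal(s)
	if err != nil {
		return ""
	}
	return string(b)
}

// decode unmarshals a JSON payload into a span.
func decode(payload string) (Span, error) {
	var s Span
	err := json.Unmarshal([]byte(payload), &s)
	return s, err
}

func main() {
	// Nil maps are dropped from the payload entirely.
	fmt.Println(encode(Span{Name: "redis.command"}))
	// → {"name":"redis.command"}

	// Non-empty maps are encoded as usual.
	fmt.Println(encode(Span{Name: "redis.command", Meta: map[string]string{"out.host": "localhost"}}))
	// → {"name":"redis.command","meta":{"out.host":"localhost"}}

	// A missing field simply leaves the map nil after decoding.
	s, err := decode(`{"name":"redis.command"}`)
	fmt.Println(s.Meta == nil, err)
	// → true <nil>
}
```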

palazzem · 2016-09-11T11:33:40Z

Side note: as a last step I'll implement tracer.Disable() and tracer.Enable() so that customers with a huge instrumented codebase can easily disable the tracer (with one line) if something bad happens.

palazzem · 2016-09-14T15:58:54Z

+func SpanFromContext(ctx context.Context) (*Span, bool) {
+	// TODO[manu]: split the return just for clarity of the review; one-liner later
+	span, ok := ctx.Value(datadogActiveSpanKey).(*Span)
+	return span, ok


A working example is:

func generateTemplate(ctx context.Context) {
    parent, ok := tracer.SpanFromContext(ctx)
    if !ok {
        parent = tracer.NewSpan("pylons.template", "pylons", "index.html")
    }

    // template stuff
    templateSpan := tracer.NewChildSpan("pylons.template", parent)
    defer templateSpan.Finish()
    templateSpan.Resource = "something"
    time.Sleep(smallWait)
}

On a second look, I don't like the idea of initializing a parent span when SpanFromContext returns nil. Whose responsibility is it to finish the parent span in this case? The below example is closer to how I think this should be used:

func generateTemplate(ctx context.Context) {
    // declared outside the if/else so it stays in scope below
    var templateSpan *tracer.Span

    parent, ok := tracer.SpanFromContext(ctx)
    if !ok {
        templateSpan = tracer.NewSpan("pylons.template", "pylons", "index.html")
    } else {
        templateSpan = tracer.NewChildSpan("pylons.template", parent)
    }

    // template stuff
    defer templateSpan.Finish()
    templateSpan.Resource = "something"
    time.Sleep(smallWait)
}

Yeah, totally agree.

talwai

I have reviewed this PR and contributed my humble feedback

talwai · 2016-09-15T09:08:37Z

+	// the child is finished but it's not recorded in
+	// the tracer buffer
+	assert.True(child.Duration > 0)
+	assert.Equal(len(tracer.finishedSpans), 0)


LeoCavaille · 2016-09-15T08:56:16Z

+def go_benchmark(path)
+  sh "go test -run=NONE -bench=. -memprofile=mem.out #{path}"
+  sh "go test -run=NONE -bench=. -cpuprofile=cpu.out #{path}"
+  sh "go test -run=NONE -bench=. -blockprofile=block.out #{path}"


or run them all together ?
go test ... -memprofile=mem.out -cpuprofile=cpu.out -blockprofile=block.out ...

Ah, I don't have a lot of experience with that. I split them because I found this sentence in The Go Programming Language book:

Gathering a profile for code under test is as easy as enabling one of the flags below [-cpuprofile, etc...]. Be careful when using more than one flag at a time, however: the machinery for gathering one kind of profile may skew the results of others.

Do you have more info on that? It's possible that the info is outdated, or that this kind of benchmark may be executed just once.

LeoCavaille · 2016-09-15T09:11:34Z

+	// initialize the Tracer
+	t := &Tracer{
+		transport:   NewHTTPTransport(defaultDeliveryURL),
+		flushTicker: time.NewTicker(flushInterval),


you just use the ticker in one method, not sure it's necessary to make it an attribute of the Tracer? Just instantiate and use it in that method?

for range time.Tick()

Yeah, totally correct, and thanks for catching that. I totally forgot to remove it because it was related to the previous approach (multiple channels and goroutines that can be stopped). Furthermore, our goroutine never stops, so there isn't any kind of "leak" from losing the reference.

clutchski · 2016-09-15T15:25:58Z

@@ -0,0 +1,121 @@
+package tracer


rm the pkg directory

clutchski · 2016-09-15T15:26:09Z

+
+// Tracer is the common struct we use to collect, buffer
+type Tracer struct {
+	enabled   int32     // acts as bool to define if the Tracer is enabled or not


use a bool here

clutchski · 2016-09-15T15:26:23Z

 import:
 - package: github.com/stretchr/testify
  version: ^1.1.3
+- package: github.com/cihub/seelog


drop this dependency

clutchski · 2016-09-15T15:27:10Z

+// Mock Transport
+type DummyTransport struct{}
+
+func (t *DummyTransport) Send(spans []*Span) error { return nil }


the dummy transport should encode the spans to make sure the test data is sane

clutchski · 2016-09-15T15:27:32Z

+	parent := tracer.NewSpan("pylons.request", "pylons", "/")
+	child := tracer.NewChildSpan("redis.command", parent)
+	assert.Equal(child.ParentID, parent.SpanID)
+	assert.Equal(child.TraceID, parent.TraceID)


please test the right things are inherited

clutchski · 2016-09-15T15:27:50Z

+	}
+
+	// child that is correctly configured
+	return newSpan(name, parent.Service, parent.Resource, spanID, parent.TraceID, parent.SpanID, parent.tracer)


it shouldn't be inheriting the resource. only the service & ids (see python client)
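
A sketch of the inheritance rule described here, with the checks a test could make. The `Span` struct and `newChildSpan` helper are simplified stand-ins, and defaulting the child's resource to its own name is an assumption made for the example, not something stated in this thread.

```go
package main

import "fmt"

// Span is a simplified, hypothetical span.
type Span struct {
	Name, Service, Resource   string
	SpanID, TraceID, ParentID uint64
}

// newChildSpan copies only the service and the IDs from the parent.
func newChildSpan(name string, parent *Span, spanID uint64) *Span {
	return &Span{
		Name:     name,
		Service:  parent.Service, // inherited
		Resource: name,           // NOT inherited; default chosen for this sketch
		SpanID:   spanID,
		TraceID:  parent.TraceID, // inherited
		ParentID: parent.SpanID,  // links the child to its parent
	}
}

func main() {
	parent := &Span{Name: "pylons.request", Service: "pylons", Resource: "/", SpanID: 1, TraceID: 42}
	child := newChildSpan("redis.command", parent, 2)

	fmt.Println(child.Service == parent.Service)   // true
	fmt.Println(child.TraceID == parent.TraceID)   // true
	fmt.Println(child.ParentID == parent.SpanID)   // true
	fmt.Println(child.Resource == parent.Resource) // false
}
```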

clutchski · 2016-09-15T15:44:10Z

+
+type datadogContextKey struct{}
+
+var datadogActiveSpanKey = datadogContextKey{}


using an empty struct as the key is kinda crazy. is this idiomatic? why not "datadog_trace_span"

Found that in the opentracing-go client. I'm not an expert here but yes.... think a string is enough?

clutchski · 2016-09-15T15:49:20Z

+// Encode returns a byte array related to the marshalling
+// of a list of spans.
+func (e *JSONEncoder) Encode(spans []*Span) ([]byte, error) {
+	return json.Marshal(spans)


this is really slow let's use the encoder pool from dogtrace. been down this road :)
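
The dogtrace encoder pool is not shown in this thread, so the following is only a guess at the general technique: reusing a buffer and a `json.Encoder` through `sync.Pool` instead of allocating on every call. The names `encoder`, `encoderPool`, and `encodeSpans` are hypothetical, and whether this beats `json.Marshal` for a given workload should be confirmed with a benchmark.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"sync"
)

// Span is a trimmed, hypothetical version of the tracer's span.
type Span struct {
	Name string `json:"name"`
}

// encoder pairs a reusable buffer with a json.Encoder writing into it.
type encoder struct {
	buf bytes.Buffer
	enc *json.Encoder
}

// encoderPool hands out encoders so buffers are reused across flushes.
var encoderPool = sync.Pool{
	New: func() interface{} {
		e := &encoder{}
		e.enc = json.NewEncoder(&e.buf)
		return e
	},
}

// encodeSpans encodes the spans with a pooled encoder.
func encodeSpans(spans []*Span) ([]byte, error) {
	e := encoderPool.Get().(*encoder)
	defer encoderPool.Put(e)

	e.buf.Reset()
	if err := e.enc.Encode(spans); err != nil {
		return nil, err
	}

	// Copy the bytes out: the buffer goes back to the pool and will be reused.
	out := make([]byte, e.buf.Len())
	copy(out, e.buf.Bytes())
	return out, nil
}

func main() {
	out, err := encodeSpans([]*Span{{Name: "pylons.request"}})
	if err != nil {
		fmt.Println("encode failed:", err)
		return
	}
	// json.Encoder appends a trailing newline to each encoded value.
	fmt.Printf("%q\n", out)
}
```

The copy at the end keeps the example safe; a transport that holds the pooled encoder until the HTTP request has been sent could avoid it.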

clutchski · 2016-09-15T15:54:56Z

+		return err
+	}
+
+	defer response.Body.Close()


no need to defer at the end of a function. needless perf cost. you're already at the end


Emanuele Palazzetti added 5 commits September 9, 2016 10:19

span struct with main fields

a1396cf

provide high-level API for the Span struct; partial management of the Finish() method

c5a782a

add error management for the Span struct

f9c35dd

add seelog dependency

de94320

add the Tracer object that holds the logic to start a tracing session; providing a DefaultTracer client, while missing the HTTP implementation

3416f3c

palazzem assigned LeoCavaille Sep 9, 2016

clutchski reviewed Sep 9, 2016

LeoCavaille reviewed Sep 10, 2016

palazzem added the core label Sep 10, 2016

Emanuele Palazzetti added 6 commits September 10, 2016 16:59

all functions to create spans (root or child) belongs to the Tracer

5a385aa

add SetMetrics to the Span struct

8db6cac

renaming the default error meta

a5dc8fe

tracer.Wait() waits until a timeout elapses

a42d77f

handling the err == nil case for span.SetError()

17756b4

add base span tests

c975ff8

palazzem force-pushed the palazzem/trace-span-structs branch from ab40cc4 to c975ff8 Compare September 10, 2016 14:59

Emanuele Palazzetti added 3 commits September 10, 2016 17:43

test to call Finish twice

4437580

add tests for the Tracer struct

76b21a9

spans enqueuing includes a timeout

9747630

palazzem reviewed Sep 11, 2016

add the encoder interface that handles the JSON encoding for the HTTPTransport

46b7868

palazzem reviewed Sep 11, 2016

move the Transport interface in its own file

b397e72

palazzem reviewed Sep 14, 2016

talwai reviewed Sep 15, 2016


LeoCavaille reviewed Sep 15, 2016


Emanuele Palazzetti added 5 commits September 15, 2016 14:33

add separate tests execution to detect race conditions

db0ad34

use atomic to make the boolean access thread-safe

bbb2444

run benchmarks once using all profilers

7ff57af

use a time.Tick() instead of a NewTicker channel

e104a4e

minor style changes

78989e1

palazzem force-pushed the palazzem/trace-span-structs branch from c0b1074 to 78989e1 Compare September 15, 2016 12:56

palazzem changed the title ~~[WIP] tracer, span and the high-level API~~ [core] Implementation of the tracing package Sep 15, 2016

palazzem merged commit 7b3a665 into master Sep 15, 2016

palazzem deleted the palazzem/trace-span-structs branch September 15, 2016 14:02

clutchski reviewed Sep 15, 2016


palazzem restored the palazzem/trace-span-structs branch September 15, 2016 16:18

mtoffl01 mentioned this pull request May 11, 2023

tracer: request Headers as tags for web integrations #1764

Merged

3 tasks

mtoffl01 mentioned this pull request Mar 26, 2025

[BUG] contrib/sirupsen/logrus: Span IDs changed and missing support for 128-bit Trace IDs #3324

Closed

rustybrooks-realitydefender mentioned this pull request Jun 24, 2025

[BUG]: new baggage propagator default breaks existing tracing #3556

Closed

leoromanovsky mentioned this pull request Mar 3, 2026

feat(openfeature): add flag evaluation tracking via OTel Metrics #4489

Merged

