[core] Implementation of the tracing package#2
Conversation
…; providing a DefaultTracer client, while missing the HTTP implementation
| if !s.IsFinished() { | ||
| s.Duration = Now() - s.Start | ||
|
|
||
| go func() { |
There was a problem hiding this comment.
two issues with this:
- spawning a goroutine per finished span tons of overhead.
- as you commented, this could block forever, which is completely unacceptable for client side code. a channel is not the right tool for this because if a reader crashes a customer app will lock up forever.
i'd much prefer if we took this bit of code from dd-go/dogtrace. this is already running on high traffic production systems, so let's just use it instead of inventing something new. (except for the onFinish stuff, that's completely unnecessary)
There was a problem hiding this comment.
OK it was a proposal. I can update this code with the delivery process that we're using in the dogtrace
There was a problem hiding this comment.
Reading some stuff around, it seems I can easily solve the point 2 (blocked go-routine) with a timeout so that if the reader crashes (the reader is the one that sends all spans to the locale/remote agent) the span is lost and the go-routine is killed. I think it makes sense because if the go-routine that sends data is dead, it's OK to lose the span because no one is capable to send it somewhere.
On the other hand, if we're using the current approach in dogtrace (without a go-routine), our code will block as you describe because it should obtain a lock and it will surely add some delay (maybe it should wait for another Span that has the lock?). Furthermore if the sender dies, the shared data structures (the children []*Span in the Span or a common global outgoingSpans []*Span) will grow without control causing a memory leak.
About the go-routine overhead, I think I can profile it because it's a built-in functionality and reading to golang.org blog posts, they were created for similar cases.
By the way I have a free weekend so I want to play a little bit with this stuff 😄
If you agree, I prefer to revert this part at the end so that I can spend some time with benchmarks and experiments.
|
i think we should only create spans from a tracer. and never from other spans. we've had APIs like this before and it's more confusing (when do i call i think something like this is simpler: tracer := datadog.NewTracer()
// create a new root span with some info
span := tracer.NewSpan("whatever")
defer span.Finish()
span.Service = "foo"
span.Resource = "whatever"
// create a span from another
child := tracer.NewChildSpan("another", span)
defer child.Finish()
// or from a special `datadog_trace` variable in context
child2 := tracer.NewSpanFromContext("yahoo", context)
child2.Finish() |
|
Yeah I agree. Having an API where developers must always use the |
| import ( | ||
| "sync" | ||
|
|
||
| log "github.com/cihub/seelog" |
There was a problem hiding this comment.
just a remark, but we definitely don't want to pull this dep in the final package. Also, seelog is madness, it often shows up in a top 30 cpu profile in our apps 😱
There was a problem hiding this comment.
ah OK.... I've seen that in a lot of our projects. There is something else I can use?
There was a problem hiding this comment.
I wonder if we should use a logging lib at all, or just use the primitive "log" package in Go. Do we need the user to pull another dependency for logging if he already has one?
Also we should try to keep our logging at a minimum (except if the user toggles debug mode).
There was a problem hiding this comment.
Yes, totally agree about limit our logging to warnings/errors. About the logging lib I don't know... I mean if the primitive log package provides all that we need, I think we can replace everything with that.
ab40cc4 to
c975ff8
Compare
| "math/rand" | ||
| "time" | ||
|
|
||
| log "github.com/cihub/seelog" |
There was a problem hiding this comment.
@LeoCavaille what do you think about that library (https://github.com/op/go-logging)? Have you ever used it?
| Start int64 `json:"start"` // span start time expressed in nanoseconds since epoch | ||
| Duration int64 `json:"duration"` // duration of the span expressed in nanoseconds | ||
| Error int32 `json:"error"` // error status of the span; 0 means no errors | ||
| Meta map[string]string `json:"meta"` // arbitrary map of metadata |
There was a problem hiding this comment.
@LeoCavaille may I add the omitempty for Meta and Metrics? the agent is expecting the meta field in any case even if it's nil?
There was a problem hiding this comment.
if it's not in the payload, it will just set it to nil when decoding, so yea.
|
Side note: as a last step I'll implement a kind of |
| func SpanFromContext(ctx context.Context) (*Span, bool) { | ||
| // TODO[manu]: split the return just for clarity of the review; one-liner later | ||
| span, ok := ctx.Value(datadogActiveSpanKey).(*Span) | ||
| return span, ok |
There was a problem hiding this comment.
A working example is:
func generateTemplate(ctx context.Context) {
parent, ok := tracer.SpanFromContext(ctx)
if !ok {
parent = tracer.NewSpan("pylons.template", "pylons", "index.html")
}
// template stuff
templateSpan := tracer.NewChildSpan("pylons.template", parent)
defer templateSpan.Finish()
templateSpan.Resource = "something"
time.Sleep(smallWait)
}There was a problem hiding this comment.
On a second look, I don't like the idea of initializing a parent span when SpanFromContext returns nil. Whose responsibility is it to finish the parent span in this case? The below example is closer to how I think this should be used:
func generateTemplate(ctx context.Context) {
parent, ok := tracer.SpanFromContext(ctx)
if !ok {
templateSpan := tracer.NewSpan("pylons.template", "pylons", "index.html")
} else {
templateSpan := tracer.NewChildSpan("pylons.template", parent)
}
// template stuff
defer templateSpan.Finish()
templateSpan.Resource = "something"
time.Sleep(smallWait)
}
There was a problem hiding this comment.
Yeah, totally agree.
talwai
left a comment
There was a problem hiding this comment.
I have reviewed this PR and contributed my humble feedback
| // the child is finished but it's not recorded in | ||
| // the tracer buffer | ||
| assert.True(child.Duration > 0) | ||
| assert.Equal(len(tracer.finishedSpans), 0) |
| def go_benchmark(path) | ||
| sh "go test -run=NONE -bench=. -memprofile=mem.out #{path}" | ||
| sh "go test -run=NONE -bench=. -cpuprofile=cpu.out #{path}" | ||
| sh "go test -run=NONE -bench=. -blockprofile=block.out #{path}" |
There was a problem hiding this comment.
or run them all together ?
go test ... -memprofile=mem.out -cpuprofile=cpu.out -blockprofile=block.out ...
There was a problem hiding this comment.
Ah, I don't have a lot of experience on that. I split them because I've found in The Go programming language book such sentence:
Gathering a profile for code under test is as easy as enabling one fo the flags below [-cpuprofile, etc...]. Be careful when using more than one flag at a time, however: the machinery for gathering one kind of profile may skew the results of others.
Do you have more info on that? It could be possible that the info is outdated or that this kind of benchmark may be executed just once.
| // initialize the Tracer | ||
| t := &Tracer{ | ||
| transport: NewHTTPTransport(defaultDeliveryURL), | ||
| flushTicker: time.NewTicker(flushInterval), |
There was a problem hiding this comment.
you just use the ticker in one method, not sure it's necessary to make it an attribute of the Tracer? Just instantiate and use it in that method?
for range time.Tick()
There was a problem hiding this comment.
Yeah, totally correct and thanks for catching that. Totally forget to remove that because it was related to the previous approach (multiple channels and go routines that can be stopped). Furthermore our goroutine never stops so there isn't any kind of "leak" for loosing the reference.
c0b1074 to
78989e1
Compare
| @@ -0,0 +1,121 @@ | |||
| package tracer | |||
|
|
||
| // Tracer is the common struct we use to collect, buffer | ||
| type Tracer struct { | ||
| enabled int32 // acts as bool to define if the Tracer is enabled or not |
| import: | ||
| - package: github.com/stretchr/testify | ||
| version: ^1.1.3 | ||
| - package: github.com/cihub/seelog |
| // Mock Transport | ||
| type DummyTransport struct{} | ||
|
|
||
| func (t *DummyTransport) Send(spans []*Span) error { return nil } |
There was a problem hiding this comment.
the dummy transport should encode the spans to make sure the test data in sane
| parent := tracer.NewSpan("pylons.request", "pylons", "/") | ||
| child := tracer.NewChildSpan("redis.command", parent) | ||
| assert.Equal(child.ParentID, parent.SpanID) | ||
| assert.Equal(child.TraceID, parent.TraceID) |
There was a problem hiding this comment.
please test the right thigs are inherited
| } | ||
|
|
||
| // child that is correctly configured | ||
| return newSpan(name, parent.Service, parent.Resource, spanID, parent.TraceID, parent.SpanID, parent.tracer) |
There was a problem hiding this comment.
it shouldn't be inheriting the resource. only the service & ids (see python client)
|
|
||
| type datadogContextKey struct{} | ||
|
|
||
| var datadogActiveSpanKey = datadogContextKey{} |
There was a problem hiding this comment.
using an empty struct as the key is kinda crazy. is this idiomatic? why not "datadog_trace_span"
There was a problem hiding this comment.
Found that in the opentracing-go client. I'm not an expert here but yes.... think a string is enough?
| // Encode returns a byte array related to the marshalling | ||
| // of a list of spans. | ||
| func (e *JSONEncoder) Encode(spans []*Span) ([]byte, error) { | ||
| return json.Marshal(spans) |
There was a problem hiding this comment.
this is really slow let's use the encoder pool from dogtrace. been down this road :)
| return err | ||
| } | ||
|
|
||
| defer response.Body.Close() |
There was a problem hiding this comment.
no need to defer at the end of a function. needless perf cost. you're already at the end
This PR updates the behavior of `WithMaxQuerySize` when `max=0` to avoid attaching the query tag entirely. This is more intuitive ("max query size of zero") and gives folks a way to disable serializing the command entirely.
Core implementation to handle
spans and thetracerobjects. The following is a working example of our tracer package:What is missing?
tests are not pushed until this approach is acceptedthe dispatch implementationremove some parameters from theNewChildSpan()signature. We may just:handling thecontext.Contextobject so that thectxcan be passed as argument between functions. Actxmust be used to retrieve aspanif a parent is available.