-
Notifications
You must be signed in to change notification settings - Fork 421
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[tracer] using channels to send complete consistent traces #78
Conversation
1a408d8
to
fd38cfe
Compare
tracer/buffer.go
Outdated
// and avoid re-allocing. | ||
spans := sb.spans | ||
sb.spans = nil | ||
for _, span := range tb.spans { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
instead of looping each time you try to do a doFlush
, can't you store an int
that counts the number of finished spans? even if looping here is not so expensive in the average case, we still have a loop and a RLock()
. So to check if a buffer is flushable, you may:
- when you add a
Span
indoPush()
, increase thetraceBuffer.finishedSpan
counter (when aSpan
is finished, its state is immutable so it's safe to always increase that number) - when you want to check if a buffer is finished, you check
if traceBuffer.finishedSpan == len(traceBuffer.spans)
makes sense?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the idea, indeed, checking number of finished spans with buffer length does make sense.
tracer/buffer.go
Outdated
} | ||
|
||
return &spansBuffer{maxSize: maxSize} | ||
tb.spans = append(tb.spans, span) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This array behaves a lot like a buffered channel. Switching to one would remove the need for an explicit lock and with Manu's suggestion of adding a counter it would still be possible to count the number of finished spans.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I considered this but at some point we really use a []*Span
later in the pipeline. IE it's not a pipeline but rather something that is filled until it's "complete" and then flushed/sent at once. Typically, filling the trace buffer would require emptying that buffer item by item and pump it into the final array. Will give it a 2nd look anyway, thanks for suggesting.
15bdf40
to
6c25cb6
Compare
9bc99f5
to
caab10f
Compare
Released this on staging for:
|
We've been running this version on prod for a couple of days now, looking fine. |
This contains two PRs: - #73 branch raphael/gocql (Cassandra support) - #78 branch christian/flushchannel (fix partial traces bug) Those 2 branches had conflicts, and the commit 6952e85 fixes the interactions between them. Additionnally, it has been running for 3 complete days on our prod env, so considered stable enough.
…ataDog#78) * Support the DD_ENTITY_ID envvar for container tagging * Support the DD_AGENT_HOST and DD_DOGSTATSD_PORT envvars for autoconfiguration * pr feedback * doc * don't use testing.Run, introduced in go 1.7 * golint * change entity tag name * Reword godoc
This is a variant of #77 it achieves the same thing (avoid sending partial, incomplete traces) only it does it with a deeper rewriting. The major change is: it uses a channel to send traces finished spans. Quick overview:
tracer.NewRootSpan
-> now creates a trace-bound in-memory buffer (Array) which should contain pointers to all spans belonging to this tracetracer.NewChildSpan
-> registers the child trace in the in-memory buffer abovespan.Finish()
-> now calls buffer.Flush()buffer.AckFinish()
-> called after every span.Finish(), checks if all spans in the buffer or finished, if yes, put all spans into a global, shared channeltracer.worker()
-> no more fills local buffers with traces or spans. Instead, only treats special events such as forced flush, and of course regular, time-based flushing. The low-level flush functions now locally empty the channels and send the data right away, suppressing the need for a persistent, local, mutex protected array.tracer.flush()
-> called on a regular basis, empties the tracer-bound buffer, sends data to agent (we still need this to bulk agent calls and mitigate protocol overhead)The trace channel is "non-blocking" (when writing to it, uses select/default combo to ensure it never affects user code) so at some point, under heavy heavy load, we could technically loose/ignore traces. It's going to show up in the logs anyway, and has to be a really heavy load (globally, more than 1000 traces per lib<->agent roundtrip, so, in practice, lib or agent being saturated).
IMPORTANT NOTICE, this patch deprecates:
SetSpansBufferSize
-> does not really make sense any more, there are several buffers & channels now (for spans, traces, services, errors, ...) and this one is not a game changer any more.FlushTraces
has been renamed toflushTraces
so becomes private. Instead, aForceFlush
method is provided, which is safe to use. It's not required in production, but it can be called, it won't harm, it's thread-safe. And flushes everything (not only traces).