DataDog and StatsD Metrics Support #1701
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,92 @@ | ||
package middlewares | ||
|
||
import ( | ||
"time" | ||
|
||
"github.com/containous/traefik/log" | ||
"github.com/containous/traefik/safe" | ||
"github.com/containous/traefik/types" | ||
kitlog "github.com/go-kit/kit/log" | ||
"github.com/go-kit/kit/metrics" | ||
"github.com/go-kit/kit/metrics/dogstatsd" | ||
) | ||
|
||
var _ Metrics = (Metrics)(nil) | ||
|
||
var datadogClient = dogstatsd.New("traefik.", kitlog.LoggerFunc(func(keyvals ...interface{}) error { | ||
log.Info(keyvals) | ||
return nil | ||
})) | ||
|
||
var datadogTicker *time.Ticker | ||
|
||
// Metric names consistent with https://github.com/DataDog/integrations-extras/pull/64 | ||
const ( | ||
ddMetricsReqsName = "requests.total" | ||
ddMetricsLatencyName = "request.duration" | ||
) | ||
|
||
// Datadog is an implementation of Metrics that exposes the following Datadog metrics: | ||
// - number of requests partitioned by status code and method | ||
// - request durations | ||
// - number of retries | ||
type Datadog struct { | ||
reqsCounter metrics.Counter | ||
reqDurationHistogram metrics.Histogram | ||
retryCounter metrics.Counter | ||
} | ||
|
||
func (dd *Datadog) getReqsCounter() metrics.Counter { | ||
return dd.reqsCounter | ||
} | ||
|
||
func (dd *Datadog) getReqDurationHistogram() metrics.Histogram { | ||
return dd.reqDurationHistogram | ||
} | ||
|
||
func (dd *Datadog) getRetryCounter() metrics.Counter { | ||
Review comment: As
Reply: @marco-jantke Great catch... the retry metric was added after the PR was already open, so I must've lost the change during one of the numerous rebases. The good news is that the code won't |
||
return dd.retryCounter | ||
} | ||
|
||
// NewDataDog creates new instance of Datadog | ||
func NewDataDog(name string) *Datadog { | ||
var m Datadog | ||
|
||
m.reqsCounter = datadogClient.NewCounter(ddMetricsReqsName, 1.0).With("service", name) | ||
m.reqDurationHistogram = datadogClient.NewHistogram(ddMetricsLatencyName, 1.0).With("service", name) | ||
|
||
return &m | ||
} | ||
|
||
// InitDatadogClient initializes metrics pusher and creates a datadogClient if not created already | ||
func InitDatadogClient(config *types.Datadog) *time.Ticker { | ||
if datadogTicker == nil { | ||
Review comment: I suppose we do not enter this function from multiple goroutines concurrently, and hence don't need to synchronize access?
Reply: Correct. We only call
Reply: An alternative to comparing the ticker could then be to use More of a suggestion though.
Reply: I think it's good enough in this case. Thanks for giving it a shot. 👍 |
||
address := config.Address | ||
if len(address) == 0 { | ||
address = "localhost:8125" | ||
Review comment: Why do we default to this address? Is this somehow conventional in Datadog? Instinctively, I tend to go with requiring proper configuration and -- if missing -- return an error, unless a default makes sense.
Reply: The default DataDog statsD listener is running on localhost:8125, which is a well known port, so it is very common for people to not define it explicitly and use the default (be that in DataDog agent config, or in the clients) (See default dd-agent config for reference - https://github.com/DataDog/dd-agent/blob/master/datadog.conf.example#L163) |
||
} | ||
pushInterval, err := time.ParseDuration(config.PushInterval) | ||
if err != nil { | ||
log.Warnf("Unable to parse %s into pushInterval, using 10s as default value", config.PushInterval) | ||
pushInterval = 10 * time.Second | ||
} | ||
|
||
report := time.NewTicker(pushInterval) | ||
|
||
safe.Go(func() { | ||
datadogClient.SendLoop(report.C, "udp", address) | ||
Review comment: I'm a bit concerned about how this routine will be stopped, as there is no stop chan nor cancel-able context.
Reply: Given your next comment below, would it be better to call InitDatadogClient in the startup sequence, if the config has the configuration and just keep the client reusable all throughout the runtime of Traefik? This way the
Reply: I would prefer being able to stop this routine gracefully. |
||
}) | ||
|
||
datadogTicker = report | ||
} | ||
return datadogTicker | ||
} | ||
|
||
// StopDatadogClient stops internal datadogTicker which controls the pushing of metrics to DD Agent and resets it to `nil` | ||
func StopDatadogClient() { | ||
if datadogTicker != nil { | ||
datadogTicker.Stop() | ||
} | ||
datadogTicker = nil | ||
} |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,53 @@ | ||
package middlewares | ||
|
||
import ( | ||
"fmt" | ||
"net/http" | ||
"net/http/httptest" | ||
"testing" | ||
"time" | ||
|
||
"github.com/containous/traefik/testhelpers" | ||
"github.com/containous/traefik/types" | ||
"github.com/stvp/go-udp-testing" | ||
"github.com/urfave/negroni" | ||
) | ||
|
||
func TestDatadog(t *testing.T) { | ||
udp.SetAddr(":18125") | ||
// This is needed to make sure that UDP Listener listens for data a bit longer, otherwise it will quit after a millisecond | ||
udp.Timeout = 5 * time.Second | ||
recorder := httptest.NewRecorder() | ||
InitDatadogClient(&types.Datadog{Address: ":18125", PushInterval: "1s"}) | ||
|
||
n := negroni.New() | ||
dd := NewDataDog("test") | ||
defer StopDatadogClient() | ||
metricsMiddlewareBackend := NewMetricsWrapper(dd) | ||
|
||
n.Use(metricsMiddlewareBackend) | ||
r := http.NewServeMux() | ||
r.HandleFunc(`/ok`, func(w http.ResponseWriter, r *http.Request) { | ||
w.WriteHeader(http.StatusOK) | ||
fmt.Fprintln(w, "ok") | ||
}) | ||
r.HandleFunc(`/not-found`, func(w http.ResponseWriter, r *http.Request) { | ||
w.WriteHeader(http.StatusNotFound) | ||
fmt.Fprintln(w, "not-found") | ||
}) | ||
n.UseHandler(r) | ||
|
||
req1 := testhelpers.MustNewRequest(http.MethodGet, "http://localhost:3000/ok", nil) | ||
req2 := testhelpers.MustNewRequest(http.MethodGet, "http://localhost:3000/not-found", nil) | ||
|
||
expected := []string{ | ||
// We are only validating counts, as it is nearly impossible to validate latency, since it varies every run | ||
"traefik.requests.total:1.000000|c|#service:test,code:404,method:GET\n", | ||
"traefik.requests.total:1.000000|c|#service:test,code:200,method:GET\n", | ||
} | ||
|
||
udp.ShouldReceiveAll(t, expected, func() { | ||
n.ServeHTTP(recorder, req1) | ||
n.ServeHTTP(recorder, req2) | ||
}) | ||
} |
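The expected strings in the test above have the shape `name:value|c|#tag:value,...`. As a rough illustration of that layout, the sketch below builds one such line by hand. `formatCounter` is a hypothetical helper written for this example, not a function of go-kit or of the PR, and it only covers counters without a sample rate.

```go
package main

import (
	"fmt"
	"strings"
)

// formatCounter is a hypothetical helper that renders one counter sample
// in the same shape as the strings the test expects.
// tags is a flat list of alternating keys and values.
func formatCounter(prefix, name string, value float64, tags ...string) string {
	pairs := make([]string, 0, len(tags)/2)
	for i := 0; i+1 < len(tags); i += 2 {
		pairs = append(pairs, tags[i]+":"+tags[i+1])
	}
	line := fmt.Sprintf("%s%s:%f|c", prefix, name, value)
	if len(pairs) > 0 {
		line += "|#" + strings.Join(pairs, ",")
	}
	return line + "\n"
}

func main() {
	// Prints: traefik.requests.total:1.000000|c|#service:test,code:200,method:GET
	fmt.Print(formatCounter("traefik.", "requests.total", 1,
		"service", "test", "code", "200", "method", "GET"))
}
```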
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -6,6 +6,7 @@ import ( | |
"time" | ||
|
||
"github.com/go-kit/kit/metrics" | ||
"github.com/go-kit/kit/metrics/multi" | ||
) | ||
|
||
// Metrics is an Interface that must be satisfied by any system that | ||
|
@@ -22,6 +23,52 @@ type RetryMetrics interface { | |
getRetryCounter() metrics.Counter | ||
} | ||
|
||
// MultiMetrics is a struct that provides a wrapper container for multiple Metrics, if they are configured | ||
type MultiMetrics struct { | ||
wrappedMetrics *[]Metrics | ||
reqsCounter metrics.Counter | ||
reqDurationHistogram metrics.Histogram | ||
retryCounter metrics.Counter | ||
} | ||
|
||
// NewMultiMetrics creates a new instance of MultiMetrics | ||
func NewMultiMetrics(manyMetrics []Metrics) *MultiMetrics { | ||
counters := []metrics.Counter{} | ||
histograms := []metrics.Histogram{} | ||
retryCounters := []metrics.Counter{} | ||
|
||
for _, m := range manyMetrics { | ||
counters = append(counters, m.getReqsCounter()) | ||
histograms = append(histograms, m.getReqDurationHistogram()) | ||
retryCounters = append(retryCounters, m.getRetryCounter()) | ||
} | ||
|
||
var mm MultiMetrics | ||
|
||
mm.wrappedMetrics = &manyMetrics | ||
mm.reqsCounter = multi.NewCounter(counters...) | ||
mm.reqDurationHistogram = multi.NewHistogram(histograms...) | ||
mm.retryCounter = multi.NewCounter(retryCounters...) | ||
|
||
return &mm | ||
} | ||
|
||
func (mm *MultiMetrics) getReqsCounter() metrics.Counter { | ||
return mm.reqsCounter | ||
} | ||
|
||
func (mm *MultiMetrics) getReqDurationHistogram() metrics.Histogram { | ||
return mm.reqDurationHistogram | ||
} | ||
|
||
func (mm *MultiMetrics) getRetryCounter() metrics.Counter { | ||
return mm.retryCounter | ||
} | ||
|
||
func (mm *MultiMetrics) getWrappedMetrics() *[]Metrics { | ||
Review comment: AFAICS this method is used nowhere.
Reply: Yep, you are correct... Same PR that added retryMetrics, removed the |
||
return mm.wrappedMetrics | ||
} | ||
|
||
// MetricsWrapper is a Negroni compatible Handler which relies on a | ||
// given Metrics implementation to expose and monitor Traefik Metrics. | ||
type MetricsWrapper struct { | ||
|
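The idea behind `MultiMetrics`, forwarding each observation to every configured backend, can be shown without go-kit. In the sketch below, `counter`, `memCounter`, and `fanOut` are hypothetical, simplified types made up for illustration; they are not the `metrics.Counter` interface or the `multi` package used in the diff.

```go
package main

import "fmt"

// counter is a simplified, hypothetical counter interface.
type counter interface {
	Add(delta float64)
}

// memCounter keeps a running total in memory.
type memCounter struct{ total float64 }

func (c *memCounter) Add(delta float64) { c.total += delta }

// fanOut forwards every Add call to all wrapped counters.
type fanOut []counter

func (f fanOut) Add(delta float64) {
	for _, c := range f {
		c.Add(delta)
	}
}

func main() {
	a, b := &memCounter{}, &memCounter{}
	var c counter = fanOut{a, b}
	c.Add(1)
	c.Add(2)
	fmt.Println(a.total, b.total) // prints "3 3"
}
```

Because the fan-out type satisfies the same interface as the counters it wraps, the middleware only ever needs to hold one counter, regardless of how many backends are configured.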
Review comment: Does this statement have any effect? I have seen this kind of construct used to verify that a concrete implementation implements an interface. Having `Metrics` on the right-hand side doesn't make any sense to me. Can you explain it or clean it up?
Reply: The right-hand side should have been `(*DataDog)(nil)`. Fixed.
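The construct discussed in this comment is Go's compile-time interface assertion. The sketch below contrasts the useful form with the no-op form, assuming hypothetical `Greeter` and `English` types that are unrelated to the PR.

```go
package main

import "fmt"

// Greeter and English are hypothetical names used only for illustration.
type Greeter interface {
	Greet() string
}

type English struct{}

func (e *English) Greet() string { return "hello" }

// Compile-time check: the build fails if *English stops implementing Greeter.
var _ Greeter = (*English)(nil)

// By contrast, converting nil to the interface type itself always compiles
// and therefore verifies nothing about any concrete type.
var _ Greeter = (Greeter)(nil)

func main() {
	var g Greeter = &English{}
	fmt.Println(g.Greet())
}
```

The assertion costs nothing at runtime, since the blank identifier discards the value; its only effect is the type check performed by the compiler.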