DataDog and StatsD Metrics Support #1701

Merged
9 commits merged on Jul 20, 2017
10 changes: 10 additions & 0 deletions docs/toml.md
@@ -648,6 +648,16 @@ address = ":8080"
# [web.metrics.prometheus]
# Buckets=[0.1,0.3,1.2,5.0]
#
# To enable Traefik to export internal metrics to DataDog
# [web.metrics.datadog]
# Address = "localhost:8125"
# PushInterval = "10s"
#
# To enable Traefik to export internal metrics to StatsD
# [web.metrics.statsd]
# Address = "localhost:8125"
# PushInterval = "10s"
#
# To enable basic auth on the webui
# with 2 user/pass: test:test and test2:test2
# Passwords can be encoded in MD5, SHA1 and BCrypt: you can use htpasswd to generate those ones
22 changes: 18 additions & 4 deletions glide.lock

Some generated files are not rendered by default.

9 changes: 8 additions & 1 deletion glide.yaml
@@ -22,7 +22,7 @@ import:
- roundrobin
- stream
- utils
- name: github.com/urfave/negroni
- package: github.com/urfave/negroni
version: 490e6a555d47ca891a89a150d0c1ef3922dfffe9
- package: github.com/containous/staert
version: 1e26a71803e428fd933f5f9c8e50a26878f53147
@@ -99,7 +99,13 @@ import:
- package: github.com/go-kit/kit
version: v0.3.0
subpackages:
- log
- metrics
- metrics/dogstatsd
- metrics/multi
- metrics/prometheus
- metrics/statsd
- util/conn
- package: github.com/prometheus/client_golang
version: 08fd2e12372a66e68e30523c7642e0cbc3e4fbde
subpackages:
@@ -190,6 +196,7 @@ import:
subpackages:
- spew
testImport:
- package: github.com/stvp/go-udp-testing
- package: github.com/docker/libcompose
version: 0ad950cbeb3d72107613dd220b5e9d7e001b890b
- package: github.com/go-check/check
92 changes: 92 additions & 0 deletions middlewares/datadog.go
@@ -0,0 +1,92 @@
package middlewares

import (
"time"

"github.com/containous/traefik/log"
"github.com/containous/traefik/safe"
"github.com/containous/traefik/types"
kitlog "github.com/go-kit/kit/log"
"github.com/go-kit/kit/metrics"
"github.com/go-kit/kit/metrics/dogstatsd"
)

var _ Metrics = (Metrics)(nil)
Contributor

Does this statement have any effect? I have seen this kind of construct used to verify that a concrete implementation satisfies an interface, but having Metrics on the right-hand side doesn't make sense to me. Can you explain it or clean it up?

Contributor Author

The right hand side should have been (*DataDog)(nil). Fixed.
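
For reference, a minimal sketch of the corrected compile-time assertion the thread refers to, matching the Datadog type defined below in this file:

// Compile-time check: the build fails if *Datadog ever stops satisfying Metrics.
var _ Metrics = (*Datadog)(nil)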


var datadogClient = dogstatsd.New("traefik.", kitlog.LoggerFunc(func(keyvals ...interface{}) error {
log.Info(keyvals)
return nil
}))

var datadogTicker *time.Ticker

// Metric names consistent with https://github.com/DataDog/integrations-extras/pull/64
const (
ddMetricsReqsName = "requests.total"
ddMetricsLatencyName = "request.duration"
)

// Datadog is an implementation of the Metrics interface that exposes DataDog metrics for:
// - the number of requests, partitioned by status code and method
// - request durations
// - the number of retries that happened
type Datadog struct {
reqsCounter metrics.Counter
reqDurationHistogram metrics.Histogram
retryCounter metrics.Counter
}

func (dd *Datadog) getReqsCounter() metrics.Counter {
return dd.reqsCounter
}

func (dd *Datadog) getReqDurationHistogram() metrics.Histogram {
return dd.reqDurationHistogram
}

func (dd *Datadog) getRetryCounter() metrics.Counter {
Contributor

As retryCounter never gets initialised, won't this lead to a panic when the DataDog implementation (same for StatsD) is used with retries activated? Did you try this out?

Contributor Author

@marco-jantke Great catch... the retry metric was added after the PR was already open, so I must've lost the change during one of the numerous rebases. The good news is that the code won't panic, since there is an m.retryMetrics != nil check before the counter is used, but the bad news is that we are missing retry metrics for DataDog & StatsD. Thanks a lot for noticing, I'll add those to the initialization block and open another PR.
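
A minimal sketch of what that follow-up initialization could look like inside NewDataDog below; the retry metric name here is an assumption, not something taken from this PR:

// Hypothetical addition to NewDataDog so DataDog also reports retries.
m.retryCounter = datadogClient.NewCounter("backend.retries.total", 1.0).With("service", name)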

return dd.retryCounter
}

// NewDataDog creates a new instance of Datadog
func NewDataDog(name string) *Datadog {
var m Datadog

m.reqsCounter = datadogClient.NewCounter(ddMetricsReqsName, 1.0).With("service", name)
m.reqDurationHistogram = datadogClient.NewHistogram(ddMetricsLatencyName, 1.0).With("service", name)

return &m
}

// InitDatadogClient initializes the metrics pusher and starts the datadogClient send loop if it is not running already
func InitDatadogClient(config *types.Datadog) *time.Ticker {
if datadogTicker == nil {
Contributor

I suppose we do not enter this function from multiple goroutines concurrently, and hence don't need to synchronize access?

Contributor Author

Correct. We only call InitDatadogClient once when the server starts.

Contributor

An alternative to comparing the ticker could then be to use sync.Once. This may also help us avoid accidentally initializing the client more than once if for whatever reason we toggle the ticker value inadvertently (say, due to a bug).

More of a suggestion though.

Contributor Author

sync.Once sounds like a good idea. I've tried to add it, but realized that I still need to keep track of the ticker reference so that I can stop it once shutdown occurs. Since I have to keep the ticker variable around anyway, do you want to introduce another variable to store the once, or just let it be and use the nil check instead?

Contributor

I think it's good enough in this case. Thanks for giving it a shot. 👍
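
For reference, a minimal sketch of the sync.Once variant discussed above, written as if it lived in middlewares/datadog.go; it keeps the ticker reference so shutdown can still stop it. The names, the fixed interval, and the extra sync import are assumptions:

var (
	datadogOnce   sync.Once
	datadogTicker *time.Ticker
)

// InitDatadogClient guarded by sync.Once instead of a nil check on the ticker.
func InitDatadogClient(config *types.Datadog) *time.Ticker {
	datadogOnce.Do(func() {
		// resolve address and push interval from config as in the PR ...
		datadogTicker = time.NewTicker(10 * time.Second)
		safe.Go(func() {
			datadogClient.SendLoop(datadogTicker.C, "udp", config.Address)
		})
	})
	return datadogTicker
}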

address := config.Address
if len(address) == 0 {
address = "localhost:8125"
Contributor

Why do we default to this address? Is this somehow conventional in Datadog?

Instinctively, I tend to go with requiring proper configuration and -- if missing -- returning an error, unless a default makes sense.

Contributor Author (@aantono, Jul 19, 2017)

The default DataDog StatsD listener runs on localhost:8125, which is a well-known port, so it is very common for people not to define it explicitly and to use the default (be that in the DataDog agent config or in the clients). See the default dd-agent config for reference: https://github.com/DataDog/dd-agent/blob/master/datadog.conf.example#L163

}
pushInterval, err := time.ParseDuration(config.PushInterval)
if err != nil {
log.Warnf("Unable to parse %s into pushInterval, using 10s as default value", config.PushInterval)
pushInterval = 10 * time.Second
}

report := time.NewTicker(pushInterval)

safe.Go(func() {
datadogClient.SendLoop(report.C, "udp", address)
Member

I'm a bit concerned about how this routine will be stopped, as there is no stop chan nor a cancelable context.
InitDatadogClient will be called at each new configuration refresh; we really should cancel this routine or we will get a memory leak here.

Contributor Author

Given your next comment below, would it be better to call InitDatadogClient in the startup sequence if the metrics configuration is present, and just keep the client reusable throughout the runtime of Traefik? This way the safe.Go routine would not need to be managed and stopped explicitly, as it would only need to be terminated when the Traefik process is shut down. Thoughts?

Member

I would prefer being able to stop this routine gracefully.
You already have a Stop() func that fits well, which is never called BTW.
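
A minimal sketch of one way to make the loop stoppable, assuming a package-level stop channel and assuming SendLoop returns once the time channel it reads from is closed (neither is part of this PR):

// Hypothetical stop channel so the goroutine can exit when Traefik shuts down.
var datadogStop = make(chan struct{})

safe.Go(func() {
	ticks := make(chan time.Time)
	defer close(ticks) // closing the channel lets SendLoop's receive loop return
	go datadogClient.SendLoop(ticks, "udp", address)
	for {
		select {
		case t := <-report.C:
			ticks <- t
		case <-datadogStop:
			return
		}
	}
})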

})

datadogTicker = report
}
return datadogTicker
}

// StopDatadogClient stops the internal datadogTicker, which controls pushing metrics to the DD agent, and resets it to nil
func StopDatadogClient() {
if datadogTicker != nil {
datadogTicker.Stop()
}
datadogTicker = nil
}
53 changes: 53 additions & 0 deletions middlewares/datadog_test.go
@@ -0,0 +1,53 @@
package middlewares

import (
"fmt"
"net/http"
"net/http/httptest"
"testing"
"time"

"github.com/containous/traefik/testhelpers"
"github.com/containous/traefik/types"
"github.com/stvp/go-udp-testing"
"github.com/urfave/negroni"
)

func TestDatadog(t *testing.T) {
udp.SetAddr(":18125")
// This is needed to make sure the UDP listener keeps listening for data a bit longer; otherwise it quits after a millisecond
udp.Timeout = 5 * time.Second
recorder := httptest.NewRecorder()
InitDatadogClient(&types.Datadog{":18125", "1s"})

n := negroni.New()
dd := NewDataDog("test")
defer StopDatadogClient()
metricsMiddlewareBackend := NewMetricsWrapper(dd)

n.Use(metricsMiddlewareBackend)
r := http.NewServeMux()
r.HandleFunc(`/ok`, func(w http.ResponseWriter, r *http.Request) {
w.WriteHeader(http.StatusOK)
fmt.Fprintln(w, "ok")
})
r.HandleFunc(`/not-found`, func(w http.ResponseWriter, r *http.Request) {
w.WriteHeader(http.StatusNotFound)
fmt.Fprintln(w, "not-found")
})
n.UseHandler(r)

req1 := testhelpers.MustNewRequest(http.MethodGet, "http://localhost:3000/ok", nil)
req2 := testhelpers.MustNewRequest(http.MethodGet, "http://localhost:3000/not-found", nil)

expected := []string{
// We are only validating counts, as it is nearly impossible to validate latency, since it varies every run
"traefik.requests.total:1.000000|c|#service:test,code:404,method:GET\n",
"traefik.requests.total:1.000000|c|#service:test,code:200,method:GET\n",
}

udp.ShouldReceiveAll(t, expected, func() {
n.ServeHTTP(recorder, req1)
n.ServeHTTP(recorder, req2)
})
}
47 changes: 47 additions & 0 deletions middlewares/metrics.go
@@ -6,6 +6,7 @@ import (
"time"

"github.com/go-kit/kit/metrics"
"github.com/go-kit/kit/metrics/multi"
)

// Metrics is an Interface that must be satisfied by any system that
@@ -22,6 +23,52 @@ type RetryMetrics interface {
getRetryCounter() metrics.Counter
}

// MultiMetrics is a wrapper container around multiple configured Metrics implementations
type MultiMetrics struct {
wrappedMetrics *[]Metrics
reqsCounter metrics.Counter
reqDurationHistogram metrics.Histogram
retryCounter metrics.Counter
}

// NewMultiMetrics creates a new instance of MultiMetrics
func NewMultiMetrics(manyMetrics []Metrics) *MultiMetrics {
counters := []metrics.Counter{}
histograms := []metrics.Histogram{}
retryCounters := []metrics.Counter{}

for _, m := range manyMetrics {
counters = append(counters, m.getReqsCounter())
histograms = append(histograms, m.getReqDurationHistogram())
retryCounters = append(retryCounters, m.getRetryCounter())
}

var mm MultiMetrics

mm.wrappedMetrics = &manyMetrics
mm.reqsCounter = multi.NewCounter(counters...)
mm.reqDurationHistogram = multi.NewHistogram(histograms...)
mm.retryCounter = multi.NewCounter(retryCounters...)

return &mm
}
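
As a usage sketch (not part of the diff), several backends could be combined behind the existing Negroni wrapper roughly like this; NewStatsD is assumed by analogy with NewDataDog elsewhere in this PR:

// Hypothetical wiring of two metrics backends behind one middleware.
dd := NewDataDog("backend1")
sd := NewStatsD("backend1") // assumed constructor from the StatsD half of this PR
mm := NewMultiMetrics([]Metrics{dd, sd})
n.Use(NewMetricsWrapper(mm)) // n is a *negroni.Negroni, as in datadog_test.go above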

func (mm *MultiMetrics) getReqsCounter() metrics.Counter {
return mm.reqsCounter
}

func (mm *MultiMetrics) getReqDurationHistogram() metrics.Histogram {
return mm.reqDurationHistogram
}

func (mm *MultiMetrics) getRetryCounter() metrics.Counter {
return mm.retryCounter
}

func (mm *MultiMetrics) getWrappedMetrics() *[]Metrics {
Contributor

AFAICS this method is used nowhere.

Contributor Author

Yep, you are correct... the same PR that added retryMetrics removed the getWrappedMetrics() method. It was originally present on the initial Metrics interface. Will clean up.

return mm.wrappedMetrics
}

// MetricsWrapper is a Negroni compatible Handler which relies on a
// given Metrics implementation to expose and monitor Traefik Metrics.
type MetricsWrapper struct {