This repository has been archived by the owner on Feb 27, 2023. It is now read-only.

add integration-test and docs
Signed-off-by: yeya24 <yb532204897@gmail.com>
yeya24 committed Jul 17, 2019
1 parent 745a06b commit 82ccf80
Showing 11 changed files with 258 additions and 33 deletions.
Binary file added docs/images/dragonfly_metrics.png
114 changes: 114 additions & 0 deletions docs/user_guide/monitoring.md
@@ -0,0 +1,114 @@
# Monitor Dragonfly with Prometheus

Metrics are an important part of observability. To monitor Dragonfly, we recommend Prometheus.

The Dragonfly project has two long-running processes: supernode and dfdaemon. Both expose their metrics through a `/metrics` endpoint, so Prometheus can scrape them directly. Support for dfget metrics is planned for the future. For the metrics currently available, see the metrics docs.

## How to set up Prometheus

### Setup Dragonfly Environment

First, make sure you know how to set up a Dragonfly environment. If you don't, read the [quick_start](https://github.com/dragonflyoss/Dragonfly/blob/master/docs/quick_start/README.md) docs first. Building from source also works:

``` bash
make build
# start supernode and dfdaemon
bin/linux_amd64/supernode --advertise-ip 127.0.0.1
bin/linux_amd64/dfdaemon
```

Once supernode and dfdaemon are running normally, you can check their metrics from the command line.

Check dfdaemon metrics:

``` bash
~ curl localhost:65001/metrics
# HELP go_gc_duration_seconds A summary of the GC invocation durations.
# TYPE go_gc_duration_seconds summary
go_gc_duration_seconds{quantile="0"} 0
go_gc_duration_seconds{quantile="0.25"} 0
go_gc_duration_seconds{quantile="0.5"} 0
go_gc_duration_seconds{quantile="0.75"} 0
go_gc_duration_seconds{quantile="1"} 0
go_gc_duration_seconds_sum 0
go_gc_duration_seconds_count 0
# HELP go_goroutines Number of goroutines that currently exist.
# TYPE go_goroutines gauge
go_goroutines 10
```

Check supernode metrics:

``` bash
~ curl localhost:8002/metrics
# HELP go_gc_duration_seconds A summary of the GC invocation durations.
# TYPE go_gc_duration_seconds summary
go_gc_duration_seconds{quantile="0"} 2.2854e-05
go_gc_duration_seconds{quantile="0.25"} 0.000150952
go_gc_duration_seconds{quantile="0.5"} 0.000155267
go_gc_duration_seconds{quantile="0.75"} 0.000171251
go_gc_duration_seconds{quantile="1"} 0.00018524
go_gc_duration_seconds_sum 0.000685564
go_gc_duration_seconds_count 5
# HELP go_goroutines Number of goroutines that currently exist.
# TYPE go_goroutines gauge
go_goroutines 8
```

If you get output like the above, your Dragonfly components are working. Next, set up Prometheus.

### Download Prometheus

[Download the release of Prometheus](https://prometheus.io/download/) for your platform, then extract it. The Linux build is used as an example here:

``` bash
wget https://github.com/prometheus/prometheus/releases/download/v2.11.1/prometheus-2.11.1.linux-amd64.tar.gz
tar -xvf prometheus-2.11.1.linux-amd64.tar.gz
cd prometheus-2.11.1.linux-amd64
```

Before starting Prometheus, configure it.

### Configure Prometheus

Below is a minimal configuration for monitoring Dragonfly. For more detailed options, see [Prometheus Configuration](https://prometheus.io/docs/prometheus/latest/configuration/configuration/).

```
global:
  scrape_interval: 15s # Set the scrape interval to every 15 seconds. Default is every 1 minute.
alerting:
  alertmanagers:
  - static_configs:
    - targets:
      # - alertmanager:9093
rule_files:
  # - "first_rules.yml"
  # - "second_rules.yml"
scrape_configs:
  - job_name: 'dragonfly'
    static_configs:
    - targets: ['localhost:8002', 'localhost:65001']
```

If you are new to Prometheus, you can simply replace the contents of `prometheus.yml` with the configuration above. It uses no alert rules and no Alertmanager, so those sections are left unset. After editing the file, validate it with `promtool`:

``` bash
./promtool check config prometheus.yml
Checking prometheus.yml
SUCCESS: 0 rule files found
```

Finally, start Prometheus from the same directory. If it starts correctly, open `localhost:9090` in your browser to see the Prometheus web UI.

``` bash
./prometheus
```

### Get Dragonfly Metrics Using Prometheus

In the Prometheus web UI, you can search for Dragonfly metrics, as shown below. To learn more about the Prometheus query language, see [promql](https://prometheus.io/docs/prometheus/latest/querying/basics/).

![dragonfly_metrics.png](../images/dragonfly_metrics.png)
34 changes: 24 additions & 10 deletions supernode/server/metrics.go
@@ -14,6 +14,7 @@ import (
type metrics struct {
requestCounter *prometheus.CounterVec
requestDuration *prometheus.HistogramVec
requestSize *prometheus.HistogramVec
responseSize *prometheus.HistogramVec
}

@@ -26,7 +27,7 @@ func newMetrics() *metrics {
Name: "http_requests_total",
Help: "Counter of HTTP requests.",
},
[]string{"handler", "code"},
[]string{"code", "handler", "method"},
),
requestDuration: promauto.NewHistogramVec(
prometheus.HistogramOpts{
@@ -36,7 +37,17 @@ func newMetrics() *metrics {
Help: "Histogram of latencies for HTTP requests.",
Buckets: []float64{.1, .2, .4, 1, 3, 8, 20, 60, 120},
},
[]string{"handler"},
[]string{"code", "handler", "method"},
),
requestSize: promauto.NewHistogramVec(
prometheus.HistogramOpts{
Namespace: config.Namespace,
Subsystem: config.Subsystem,
Name: "http_request_size_bytes",
Help: "Histogram of request size for HTTP requests.",
Buckets: prometheus.ExponentialBuckets(100, 10, 8),
},
[]string{"code", "handler", "method"},
),
responseSize: promauto.NewHistogramVec(
prometheus.HistogramOpts{
@@ -46,7 +57,7 @@ func newMetrics() *metrics {
Help: "Histogram of response size for HTTP requests.",
Buckets: prometheus.ExponentialBuckets(100, 10, 8),
},
[]string{"handler"},
[]string{"code", "handler", "method"},
),
}
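
Note on the bucket layout: `prometheus.ExponentialBuckets(100, 10, 8)` is documented to return 8 upper bounds, starting at 100 and multiplying by 10 at each step. The stdlib-only sketch below reproduces that arithmetic so the bounds can be inspected without the Prometheus client. `exponentialBuckets` is a hypothetical stand-in written for illustration, not the library function.

```go
package main

import "fmt"

// exponentialBuckets is an illustrative re-implementation of the documented
// behaviour of prometheus.ExponentialBuckets: it returns count upper bounds,
// the first equal to start and each following one factor times the previous.
func exponentialBuckets(start, factor float64, count int) []float64 {
	buckets := make([]float64, count)
	for i := range buckets {
		buckets[i] = start
		start *= factor
	}
	return buckets
}

func main() {
	// Same arguments as the request/response size histograms in this commit.
	for _, upperBound := range exponentialBuckets(100, 10, 8) {
		fmt.Printf("le=%g bytes\n", upperBound)
	}
}
```

For the size histograms this gives bounds from 100 bytes up to 1e9 bytes; Prometheus adds the `+Inf` bucket on its own.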

@@ -55,13 +66,16 @@ func newMetrics() *metrics {

// instrumentHandler will update metrics for every http request
func (m *metrics) instrumentHandler(handlerName string, handler http.HandlerFunc) http.HandlerFunc {
return promhttp.InstrumentHandlerCounter(
m.requestCounter.MustCurryWith(prometheus.Labels{"handler": handlerName}),
promhttp.InstrumentHandlerDuration(
m.requestDuration.MustCurryWith(prometheus.Labels{"handler": handlerName}),
promhttp.InstrumentHandlerResponseSize(
m.responseSize.MustCurryWith(prometheus.Labels{"handler": handlerName}),
handler,
return promhttp.InstrumentHandlerDuration(
m.requestDuration.MustCurryWith(prometheus.Labels{"handler": handlerName}),
promhttp.InstrumentHandlerCounter(
m.requestCounter.MustCurryWith(prometheus.Labels{"handler": handlerName}),
promhttp.InstrumentHandlerRequestSize(
m.requestSize.MustCurryWith(prometheus.Labels{"handler": handlerName}),
promhttp.InstrumentHandlerResponseSize(
m.responseSize.MustCurryWith(prometheus.Labels{"handler": handlerName}),
handler,
),
),
),
)
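
Note on the new nesting order: each `promhttp.InstrumentHandler*` call wraps the handler passed to it, so the outermost wrapper sees the request first and finishes last. The sketch below shows that ordering with the standard library only. `trace` is a hypothetical stand-in for the promhttp wrappers and records nothing but call order; it does not reproduce what the real wrappers measure.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
)

// trace wraps next and records when the wrapper is entered and left.
// It stands in for one promhttp.InstrumentHandler* layer.
func trace(name string, log *[]string, next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		*log = append(*log, "enter "+name)
		next.ServeHTTP(w, r)
		*log = append(*log, "leave "+name)
	})
}

// callOrder builds the same nesting as the new instrumentHandler
// (duration > counter > request size > response size > handler)
// and serves a single in-memory request through it.
func callOrder() []string {
	var log []string
	inner := http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		log = append(log, "handler")
		w.Write([]byte("OK"))
	})
	h := trace("duration", &log,
		trace("counter", &log,
			trace("requestSize", &log,
				trace("responseSize", &log, inner))))
	h.ServeHTTP(httptest.NewRecorder(), httptest.NewRequest(http.MethodGet, "/_ping", nil))
	return log
}

func main() {
	for _, step := range callOrder() {
		fmt.Println(step)
	}
}
```

With the duration wrapper outermost, the time it observes spans everything nested inside it, including the other three wrappers.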
25 changes: 12 additions & 13 deletions supernode/server/router.go
@@ -16,15 +16,10 @@ import (
// versionMatcher defines to parse version url path.
const versionMatcher = "/v{version:[0-9.]+}"

// metricsRouter is a wrapper for mux.Router and metrics.
type metricsRouter struct {
router *mux.Router
metrics *metrics
}
var m = newMetrics()

func initRoute(s *Server) *metricsRouter {
func initRoute(s *Server) *mux.Router {
r := mux.NewRouter()
router := &metricsRouter{r, newMetrics()}
handlers := []*HandlerSpec{
// system
{Method: http.MethodGet, Path: "/_ping", HandlerFunc: s.ping},
@@ -42,23 +37,27 @@ func initRoute(s *Server) *metricsRouter {
{Method: http.MethodDelete, Path: "/peers/{id}", HandlerFunc: s.deRegisterPeer},
{Method: http.MethodGet, Path: "/peers/{id}", HandlerFunc: s.getPeer},
{Method: http.MethodGet, Path: "/peers", HandlerFunc: s.listPeers},

{Method: http.MethodGet, Path: "/metrics", HandlerFunc: handleMetrics},
}

// register API
for _, h := range handlers {
if h != nil {
r.Path(versionMatcher + h.Path).Methods(h.Method).Handler(router.metrics.instrumentHandler(versionMatcher+h.Path, filter(h.HandlerFunc)))
r.Path(h.Path).Methods(h.Method).Handler(router.metrics.instrumentHandler(h.Path, filter(h.HandlerFunc)))
r.Path(versionMatcher + h.Path).Methods(h.Method).Handler(m.instrumentHandler(h.Path, filter(h.HandlerFunc)))
r.Path(h.Path).Methods(h.Method).Handler(m.instrumentHandler(h.Path, filter(h.HandlerFunc)))
}
}

// metrics
r.Handle("/metrics", router.metrics.instrumentHandler("/metrics", promhttp.Handler().ServeHTTP))

if s.Config.Debug || s.Config.EnableProfiler {
r.PathPrefix("/debug/pprof/").HandlerFunc(pprof.Index)
}
return router
return r
}

func handleMetrics(ctx context.Context, rw http.ResponseWriter, req *http.Request) (err error) {
promhttp.Handler().ServeHTTP(rw, req)
return nil
}

func filter(handler Handler) http.HandlerFunc {
7 changes: 4 additions & 3 deletions supernode/server/router_test.go
@@ -2,6 +2,7 @@ package server

import (
"encoding/json"
"github.com/gorilla/mux"
"io/ioutil"
"math/rand"
"net"
@@ -32,7 +33,7 @@ func init() {
type RouterTestSuite struct {
addr string
listener net.Listener
router *metricsRouter
router *mux.Router
}

func (rs *RouterTestSuite) SetUpSuite(c *check.C) {
@@ -63,7 +64,7 @@ func (rs *RouterTestSuite) SetUpSuite(c *check.C) {
rs.router = initRoute(s)
rs.listener, err = net.Listen("tcp", rs.addr)
c.Check(err, check.IsNil)
go http.Serve(rs.listener, rs.router.router)
go http.Serve(rs.listener, rs.router)
}

func (rs *RouterTestSuite) TearDownSuite(c *check.C) {
@@ -119,7 +120,7 @@ func (rs *RouterTestSuite) TestHTTPMetrics(c *check.C) {
c.Check(err, check.IsNil)
c.Assert(code, check.Equals, 200)

counter := rs.router.metrics.requestCounter
counter := m.requestCounter
c.Assert(1, check.Equals,
int(prom_testutil.ToFloat64(counter.WithLabelValues(strconv.Itoa(http.StatusOK), "/metrics", "get"))))

2 changes: 1 addition & 1 deletion supernode/server/server.go
@@ -91,7 +91,7 @@ func (s *Server) Start() error {
}

server := &http.Server{
Handler: router.router,
Handler: router,
ReadTimeout: time.Minute * 10,
ReadHeaderTimeout: time.Minute * 10,
IdleTimeout: time.Minute * 10,
72 changes: 72 additions & 0 deletions test/api_metrics_test.go
@@ -0,0 +1,72 @@
package main

import (
"fmt"

"github.com/dragonflyoss/Dragonfly/test/command"
"github.com/dragonflyoss/Dragonfly/test/request"

"github.com/go-check/check"
)

// APIMetricsSuite is the test suite for Prometheus metrics.
type APIMetricsSuite struct {
starter *command.Starter
}

func init() {
check.Suite(&APIMetricsSuite{})
}

// SetUpSuite does common setup once, before the first test in the suite runs.
func (s *APIMetricsSuite) SetUpSuite(c *check.C) {
s.starter = command.NewStarter("SupernodeMetricsTestSuite")
if _, err := s.starter.Supernode(0); err != nil {
panic(fmt.Sprintf("start supernode failed:%v", err))
}
}

func (s *APIMetricsSuite) TearDownSuite(c *check.C) {
s.starter.Clean()
}

// TestMetrics tests /metrics API.
func (s *APIMetricsSuite) TestMetrics(c *check.C) {
resp, err := request.Get("/metrics")
c.Assert(err, check.IsNil)
defer resp.Body.Close()

CheckRespStatus(c, resp, 200)
}

// TestHttpMetrics tests http-related metrics.
func (s *APIMetricsSuite) TestHttpMetrics(c *check.C) {
requestCounter := `dragonfly_supernode_http_requests_total{code="%d",handler="%s",method="%s"}`
responseSizeSum := `dragonfly_supernode_http_response_size_bytes_sum{code="%d",handler="%s",method="%s"}`
responseSizeCount := `dragonfly_supernode_http_response_size_bytes_count{code="%d",handler="%s",method="%s"}`
requestSizeCount := `dragonfly_supernode_http_request_size_bytes_count{code="%d",handler="%s",method="%s"}`

resp, err := request.Get("/_ping")
c.Assert(err, check.IsNil)
CheckRespStatus(c, resp, 200)

// Get httpRequest counter value equals 1.
pingTimes, found := GetMetricValue(c, fmt.Sprintf(requestCounter, 200, "/_ping", "get"))
c.Assert(found, check.Equals, true)
c.Assert(pingTimes, check.Equals, float64(1))

// Get httpResponse size sum value equals 2.
responseBytes, found := GetMetricValue(c, fmt.Sprintf(responseSizeSum, 200, "/_ping", "get"))
c.Assert(found, check.Equals, true)
c.Assert(responseBytes, check.Equals, float64(2))

// Get httpResponse size count value equals 1.
responseCount, found := GetMetricValue(c, fmt.Sprintf(responseSizeCount, 200, "/_ping", "get"))
c.Assert(found, check.Equals, true)
c.Assert(responseCount, check.Equals, float64(1))

// Get httpRequest size count value equals 1.
requestCount, found := GetMetricValue(c, fmt.Sprintf(requestSizeCount, 200, "/_ping", "get"))
c.Assert(found, check.Equals, true)
c.Assert(requestCount, check.Equals, float64(1))
}
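
Note on `GetMetricValue`: the helper is called above but its definition is not shown in this part of the diff. The sketch below is one way such a lookup could work over the Prometheus text exposition format, using only the standard library. `findMetricValue` and the `sample` text are hypothetical, not the helper added by this commit.

```go
package main

import (
	"bufio"
	"fmt"
	"strconv"
	"strings"
)

// sample is illustrative exposition text, not captured supernode output.
const sample = `# HELP dragonfly_supernode_http_requests_total Counter of HTTP requests.
# TYPE dragonfly_supernode_http_requests_total counter
dragonfly_supernode_http_requests_total{code="200",handler="/_ping",method="get"} 1
`

// findMetricValue scans exposition text for a sample line whose
// name-and-labels part equals key exactly and returns its value.
func findMetricValue(body, key string) (float64, bool) {
	scanner := bufio.NewScanner(strings.NewReader(body))
	for scanner.Scan() {
		line := strings.TrimSpace(scanner.Text())
		// Skip blank lines and # HELP / # TYPE comments.
		if line == "" || strings.HasPrefix(line, "#") {
			continue
		}
		if !strings.HasPrefix(line, key+" ") {
			continue
		}
		// After the key comes the value, optionally followed by a timestamp.
		fields := strings.Fields(line[len(key)+1:])
		if len(fields) == 0 {
			continue
		}
		value, err := strconv.ParseFloat(fields[0], 64)
		if err != nil {
			continue
		}
		return value, true
	}
	return 0, false
}

func main() {
	key := fmt.Sprintf(`dragonfly_supernode_http_requests_total{code="%d",handler="%s",method="%s"}`,
		200, "/_ping", "get")
	value, found := findMetricValue(sample, key)
	fmt.Println(value, found)
}
```

An exact string match like this only works if the labels in the key are written in the same order as in the exposition output, which is why the format strings in the test list `code`, `handler`, `method` in that order.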
6 changes: 3 additions & 3 deletions test/api_ping_test.go
@@ -18,8 +18,8 @@ func init() {
check.Suite(&APIPingSuite{})
}

// SetUpTest does common setup in the beginning of each test.
func (s *APIPingSuite) SetUpTest(c *check.C) {
// SetUpSuite does common setup once, before the first test in the suite runs.
func (s *APIPingSuite) SetUpSuite(c *check.C) {
s.starter = command.NewStarter("SupernodeAPITestSuite")
if _, err := s.starter.Supernode(0); err != nil {
panic(fmt.Sprintf("start supernode failed:%v", err))
@@ -30,7 +30,7 @@ func (s *APIPingSuite) TearDownSuite(c *check.C) {
s.starter.Clean()
}

// TestPing tests /info API.
// TestPing tests /_ping API.
func (s *APIPingSuite) TestPing(c *check.C) {
resp, err := request.Get("/_ping")
c.Assert(err, check.IsNil)
4 changes: 2 additions & 2 deletions test/cli_dfget_p2p_test.go
@@ -21,10 +21,10 @@ import (
. "path/filepath"
"time"

"github.com/go-check/check"

"github.com/dragonflyoss/Dragonfly/test/command"
"github.com/dragonflyoss/Dragonfly/test/environment"

"github.com/go-check/check"
)

func init() {
2 changes: 1 addition & 1 deletion test/request/request.go
@@ -90,7 +90,7 @@ func Delete(endpoint string, opts ...Option) (*http.Response, error) {

// Debug sends request to the default pouchd server to get the debug info.
//
// NOTE: without any vesion information.
// NOTE: without any version information.
func Debug(endpoint string) (*http.Response, error) {
apiClient, err := newAPIClient(environment.DragonflyAddress, environment.TLSConfig)
if err != nil {