feat(metrics): Monitor and publish metrics to Prometheus. #1437

Merged
11 commits merged on Mar 10, 2021
cmd/gossamer/config.go: 2 changes (2 additions, 0 deletions)
@@ -575,6 +575,8 @@ func setDotNetworkConfig(ctx *cli.Context, tomlCfg ctoml.NetworkConfig, cfg *dot
 		cfg.NoMDNS = true
 	}

+	cfg.PublishMetrics = ctx.Bool("publish-metrics")
+
 	logger.Debug(
 		"network configuration",
 		"port", cfg.Port,
cmd/gossamer/flags.go: 9 changes (9 additions, 0 deletions)
@@ -86,6 +86,12 @@ var (
 		Name:  "memprof",
 		Usage: "File to write memory profile to",
 	}
+
+	// PublishMetrics publishes node metrics to prometheus.
+	PublishMetrics = cli.BoolFlag{
+		Name:  "publish-metrics",
+		Usage: "publish node metrics",
+	}
 )

 // Initialization-only flags
@@ -263,6 +269,9 @@ var (
 		WSFlag,
 		WSExternalFlag,
 		WSPortFlag,
+
+		// metrics flag
+		PublishMetrics,
 	}
 )
cmd/gossamer/main.go: 1 change (0 additions, 1 deletion)
@@ -25,7 +25,6 @@ import (
 	"github.com/ChainSafe/gossamer/dot"
 	"github.com/ChainSafe/gossamer/lib/keystore"
 	"github.com/ChainSafe/gossamer/lib/utils"
-
 	log "github.com/ChainSafe/log15"
 	"github.com/urfave/cli"
 )
dot/config.go: 15 changes (8 additions, 7 deletions)
@@ -75,13 +75,14 @@ type AccountConfig struct {

 // NetworkConfig is to marshal/unmarshal toml network config vars
 type NetworkConfig struct {
-	Port        uint32
-	Bootnodes   []string
-	ProtocolID  string
-	NoBootstrap bool
-	NoMDNS      bool
-	MinPeers    int
-	MaxPeers    int
+	Port           uint32
+	Bootnodes      []string
+	ProtocolID     string
+	NoBootstrap    bool
+	NoMDNS         bool
+	MinPeers       int
+	MaxPeers       int
+	PublishMetrics bool
 }

 // CoreConfig is to marshal/unmarshal toml core config vars
dot/network/config.go: 2 changes (2 additions, 0 deletions)
@@ -87,6 +87,8 @@ type Config struct {

 	// privateKey the private key for the network p2p identity
 	privateKey crypto.PrivKey
+
+	PublishMetrics bool
 }

 // build checks the configuration, sets up the private key for the network service,
dot/network/metrics.go: 60 changes (60 additions, 0 deletions)
@@ -0,0 +1,60 @@
+package network
+
+import (
+	"runtime"
+	"time"
+
+	"github.com/ethereum/go-ethereum/metrics"
+)
+
+const (
+	refresh     = time.Second * 10
+	refreshFreq = int64(refresh / time.Second)
+)
+
+// CollectProcessMetrics periodically collects various metrics about the running process.
+func CollectProcessMetrics() {
+	metrics.Enabled = true
+	// Create the various data collectors
+	cpuStats := make([]*metrics.CPUStats, 2)
+	memStats := make([]*runtime.MemStats, 2)
+	for i := 0; i < len(memStats); i++ {
+		cpuStats[i] = new(metrics.CPUStats)
+		memStats[i] = new(runtime.MemStats)
+	}
+
+	// Define the various metrics to collect
+	var (
+		cpuSysLoad    = metrics.GetOrRegisterGauge("system/cpu/sysload", metrics.DefaultRegistry)
+		cpuSysWait    = metrics.GetOrRegisterGauge("system/cpu/syswait", metrics.DefaultRegistry)
+		cpuProcLoad   = metrics.GetOrRegisterGauge("system/cpu/procload", metrics.DefaultRegistry)
+		cpuGoroutines = metrics.GetOrRegisterGauge("system/cpu/goroutines", metrics.DefaultRegistry)
+
+		memPauses = metrics.GetOrRegisterMeter("system/memory/pauses", metrics.DefaultRegistry)
+		memAlloc  = metrics.GetOrRegisterMeter("system/memory/allocs", metrics.DefaultRegistry)
+		memFrees  = metrics.GetOrRegisterMeter("system/memory/frees", metrics.DefaultRegistry)
+		memHeld   = metrics.GetOrRegisterGauge("system/memory/held", metrics.DefaultRegistry)
+		memUsed   = metrics.GetOrRegisterGauge("system/memory/used", metrics.DefaultRegistry)
+	)
+
+	// Iterate loading the different stats and updating the meters
+	for i := 1; ; i++ {
+		location1 := i % 2
+		location2 := (i - 1) % 2
+
+		metrics.ReadCPUStats(cpuStats[location1])
+		cpuSysLoad.Update((cpuStats[location1].GlobalTime - cpuStats[location2].GlobalTime) / refreshFreq)
+		cpuSysWait.Update((cpuStats[location1].GlobalWait - cpuStats[location2].GlobalWait) / refreshFreq)
+		cpuProcLoad.Update((cpuStats[location1].LocalTime - cpuStats[location2].LocalTime) / refreshFreq)
+		cpuGoroutines.Update(int64(runtime.NumGoroutine()))
+
+		runtime.ReadMemStats(memStats[location1])
+		memPauses.Mark(int64(memStats[location1].PauseTotalNs - memStats[location2].PauseTotalNs))
+		memAlloc.Mark(int64(memStats[location1].Mallocs - memStats[location2].Mallocs))
+		memFrees.Mark(int64(memStats[location1].Frees - memStats[location2].Frees))
+		memHeld.Update(int64(memStats[location1].HeapSys - memStats[location1].HeapReleased))
+		memUsed.Update(int64(memStats[location1].Alloc))
+
+		time.Sleep(refresh)
+	}
+}
dot/network/service.go: 28 changes (28 additions, 0 deletions)
@@ -27,6 +27,7 @@ import (

 	"github.com/ChainSafe/gossamer/lib/common"
 	"github.com/ChainSafe/gossamer/lib/services"
+	"github.com/ethereum/go-ethereum/metrics"

 	log "github.com/ChainSafe/log15"
 	libp2pnetwork "github.com/libp2p/go-libp2p-core/network"
@@ -238,10 +239,37 @@ func (s *Service) Start() error {

 	logger.Info("started network service", "supported protocols", s.host.protocols())

+	if s.cfg.PublishMetrics {

Review comment (Contributor): Maybe I missed it, but I don't see this being set anywhere, e.g. in dot/services.go createNetworkService?

Reply (Contributor Author): Thanks for pointing it out. I missed it while refactoring the metrics package.

+		go s.collectNetworkMetrics()
+	}
+
 	go s.logPeerCount()
 	return nil
 }

+func (s *Service) collectNetworkMetrics() {
+	metrics.Enabled = true
+	for {
+		peerCount := metrics.GetOrRegisterGauge("network/node/peerCount", metrics.DefaultRegistry)
+		totalConn := metrics.GetOrRegisterGauge("network/node/totalConnection", metrics.DefaultRegistry)
+		networkLatency := metrics.GetOrRegisterGauge("network/node/latency", metrics.DefaultRegistry)
+		syncedBlocks := metrics.GetOrRegisterGauge("service/blocks/sync", metrics.DefaultRegistry)
+
+		peerCount.Update(int64(s.host.peerCount()))
+		totalConn.Update(int64(len(s.host.h.Network().Conns())))
+		networkLatency.Update(int64(s.host.h.Peerstore().LatencyEWMA(s.host.id())))
+
+		num, err := s.blockState.BestBlockNumber()
+		if err != nil {
+			syncedBlocks.Update(0)
+		} else {
+			syncedBlocks.Update(num.Int64())
+		}
+
+		time.Sleep(refresh)
+	}
+}
+
 func (s *Service) logPeerCount() {
 	for {
 		logger.Debug("peer count", "num", s.host.peerCount(), "min", s.cfg.MinPeers, "max", s.cfg.MaxPeers)
dot/node.go: 28 changes (28 additions, 0 deletions)
@@ -18,6 +18,7 @@ package dot

 import (
 	"fmt"
+	"net/http"
 	"os"
 	"os/signal"
 	"path"
@@ -30,6 +31,8 @@ import (
 	"github.com/ChainSafe/gossamer/lib/genesis"
 	"github.com/ChainSafe/gossamer/lib/keystore"
 	"github.com/ChainSafe/gossamer/lib/services"
+	"github.com/ethereum/go-ethereum/metrics"
+	"github.com/ethereum/go-ethereum/metrics/prometheus"

 	"github.com/ChainSafe/chaindb"
 	log "github.com/ChainSafe/log15"
@@ -282,9 +285,34 @@ func NewNode(cfg *Config, ks *keystore.GlobalKeystore, stopFunc func()) (*Node,
 		node.Services.RegisterService(srvc)
 	}

+	if cfg.Network.PublishMetrics {
+		publishMetrics(cfg)
+	}
+
 	return node, nil
 }

+func publishMetrics(cfg *Config) {
+	address := fmt.Sprintf("%s:%d", metrics.DefaultConfig.HTTP, cfg.Network.Port+11)
+	log.Info("Enabling stand-alone metrics HTTP endpoint", "address", address)
+	setupMetricsServer(address)
+
+	// Start system runtime metrics collection
+	go network.CollectProcessMetrics()
+}
+
+// setupMetricsServer starts a dedicated metrics server at the given address.
+func setupMetricsServer(address string) {
+	m := http.NewServeMux()
+	m.Handle("/metrics", prometheus.Handler(metrics.DefaultRegistry))
+	log.Info("Starting metrics server", "addr", fmt.Sprintf("http://%s/metrics", address))
+	go func() {
+		if err := http.ListenAndServe(address, m); err != nil {
+			log.Error("Failure in running metrics server", "err", err)
+		}
+	}()
+}
+
 // Start starts all dot node services
 func (n *Node) Start() error {
 	logger.Info("🕸️ starting node services...")
dot/services.go: 23 changes (12 additions, 11 deletions)
@@ -262,17 +262,18 @@ func createNetworkService(cfg *Config, stateSrvc *state.Service) (*network.Servi

 	// network service configuation
 	networkConfig := network.Config{
-		LogLvl:      cfg.Log.NetworkLvl,
-		BlockState:  stateSrvc.Block,
-		BasePath:    cfg.Global.BasePath,
-		Roles:       cfg.Core.Roles,
-		Port:        cfg.Network.Port,
-		Bootnodes:   cfg.Network.Bootnodes,
-		ProtocolID:  cfg.Network.ProtocolID,
-		NoBootstrap: cfg.Network.NoBootstrap,
-		NoMDNS:      cfg.Network.NoMDNS,
-		MinPeers:    cfg.Network.MinPeers,
-		MaxPeers:    cfg.Network.MaxPeers,
+		LogLvl:         cfg.Log.NetworkLvl,
+		BlockState:     stateSrvc.Block,
+		BasePath:       cfg.Global.BasePath,
+		Roles:          cfg.Core.Roles,
+		Port:           cfg.Network.Port,
+		Bootnodes:      cfg.Network.Bootnodes,
+		ProtocolID:     cfg.Network.ProtocolID,
+		NoBootstrap:    cfg.Network.NoBootstrap,
+		NoMDNS:         cfg.Network.NoMDNS,
+		MinPeers:       cfg.Network.MinPeers,
+		MaxPeers:       cfg.Network.MaxPeers,
+		PublishMetrics: cfg.Network.PublishMetrics,
 	}

 	networkSrvc, err := network.NewService(&networkConfig)
go.mod: 19 changes (6 additions, 13 deletions)
@@ -5,6 +5,7 @@ require (
github.com/ChainSafe/go-schnorrkel v0.0.0-20210127175223-0f934d64ecac
github.com/ChainSafe/log15 v1.0.0
github.com/OneOfOne/xxhash v1.2.5
github.com/StackExchange/wmi v0.0.0-20190523213315-cbe66965904d // indirect
github.com/btcsuite/btcutil v1.0.2
github.com/bytecodealliance/wasmtime-go v0.20.0
github.com/centrifuge/go-substrate-rpc-client/v2 v2.0.1
@@ -13,18 +14,17 @@ require (
github.com/dgraph-io/badger/v2 v2.2007.2 // indirect
github.com/disiqueira/gotree v1.0.0
github.com/docker/docker v1.13.1
github.com/ethereum/go-ethereum v1.9.7
github.com/ethereum/go-ethereum v1.10.0
github.com/go-ole/go-ole v1.2.4 // indirect
github.com/go-playground/validator/v10 v10.4.1
github.com/golang/protobuf v1.4.2
github.com/golang/snappy v0.0.3-0.20201103224600-674baa8c7fc3 // indirect
github.com/golang/protobuf v1.4.3
github.com/gorilla/mux v1.7.4
github.com/gorilla/rpc v1.2.0
github.com/gorilla/websocket v1.4.2
github.com/gtank/merlin v0.1.1
github.com/ipfs/go-ds-badger2 v0.1.0
github.com/jcelliott/lumber v0.0.0-20160324203708-dd349441af25 // indirect
github.com/jpillora/ipfilter v1.2.2
github.com/kylelemons/godebug v1.1.0 // indirect
github.com/libp2p/go-libp2p v0.12.0
github.com/libp2p/go-libp2p-core v0.7.0
github.com/libp2p/go-libp2p-discovery v0.5.0
@@ -37,21 +37,14 @@ require (
github.com/mattn/go-isatty v0.0.11 // indirect
github.com/multiformats/go-multiaddr v0.3.1
github.com/nanobox-io/golang-scribble v0.0.0-20190309225732-aa3e7c118975
github.com/naoina/go-stringutil v0.1.0 // indirect
github.com/naoina/toml v0.1.2-0.20170918210437-9fafd6967416
github.com/onsi/ginkgo v1.14.0 // indirect
github.com/perlin-network/life v0.0.0-20191203030451-05c0e0f7eaea
github.com/stretchr/testify v1.6.1
github.com/urfave/cli v1.20.0
github.com/stretchr/testify v1.7.0
github.com/urfave/cli v1.22.1
github.com/wasmerio/go-ext-wasm v0.3.2-0.20200326095750-0a32be6068ec
golang.org/x/crypto v0.0.0-20201221181555-eec23a3978ad
golang.org/x/mod v0.1.1-0.20191209134235-331c550502dd // indirect
golang.org/x/net v0.0.0-20200822124328-c89045814202 // indirect
golang.org/x/sync v0.0.0-20201020160332-67f06af15bc9 // indirect
golang.org/x/sys v0.0.0-20200824131525-c12d262b63d8 // indirect
golang.org/x/text v0.3.3 // indirect
golang.org/x/tools v0.0.0-20200221224223-e1da425f72fd // indirect
golang.org/x/xerrors v0.0.0-20200804184101-5ec99f83aff1 // indirect
google.golang.org/protobuf v1.25.0
)
