Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: sp services add metrics #211

Merged
merged 10 commits into from
Mar 23, 2023
Merged

feat: sp services add metrics #211

merged 10 commits into from
Mar 23, 2023

Conversation

sysvm
Copy link
Collaborator

@sysvm sysvm commented Mar 16, 2023

Description

This pr adds some metrics method to monitor sp services.

Rationale

For better monitoring systems

Example

N/A

Changes

N/A

@sysvm sysvm added the wip Working in process label Mar 16, 2023
@@ -24,6 +24,8 @@ var (
BlockSyncerService = strings.ToLower("BlockSyncer")
// ManagerService defines the name of manager service
ManagerService = strings.ToLower("Manager")
// MetricsService defines the name of metrics service
MetricsService = strings.ToLower("Metrics")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Make the "Metrics" name more meaningful, like "MetricsMonitor", "MetricsReporter", or "MetricsCollector".

Copy link
Collaborator Author

@sysvm sysvm Mar 17, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

use Metrics is better

@sysvm sysvm changed the title feat: sp service adds metrics feat: sp services add metrics Mar 17, 2023
@joeylichang joeylichang deleted the branch develop March 18, 2023 13:45
@sysvm sysvm reopened this Mar 19, 2023
@sysvm sysvm force-pushed the feat-metrics branch 2 times, most recently from a1e7f84 to 58868d1 Compare March 21, 2023 04:01
@@ -36,6 +36,8 @@ var (
utils.LogLevelFlag,
utils.LogPathFlag,
utils.LogStdOutputFlag,
utils.MetricsEnabledFlag,
utils.MetricsHTTPFlag,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Supplementary update the run-book docs/run-book.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK

@@ -15,48 +21,69 @@ var (
}
ConfigRemoteFlag = &cli.BoolFlag{
Name: "configremote",
Usage: "Flag load config from remote db,if 'configremote' be set, the db.user, " +
Usage: "Flag load config from remote db, if 'configremote' be set, the db.user, " +
"db.password and db.address flags are needed, otherwise use default value",
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

use the default value

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

// MetricsHTTPFlag defines the endpoint for a stand-alone metrics HTTP endpoint.
MetricsHTTPFlag = &cli.StringFlag{
Name: "metrics.addr",
Usage: `Enable stand-alone metrics HTTP server listening interface`,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consistent with above, use double quotes.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

@@ -10,6 +11,7 @@ receiver = "localhost:9533"
signer = "localhost:9633"
tasknode = "localhost:9433"
uploader = "localhost:9133"
metricsMonitor = "localhost:9833"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use metricsmonitor? which aligns with blocksyncer, metadata, and tasknode.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

@@ -235,3 +236,11 @@ func (cfg *StorageProviderConfig) MakeBlockSyncerConfig() (*tomlconfig.TomlConfi
},
}, nil
}

// MakeMetricsMonitorConfig make metrics monitor config from StorageProviderConfig
func (cfg StorageProviderConfig) MakeMetricsMonitorConfig() (*metrics.MetricsMonitorConfig, error) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cfg *StorageProviderConfig.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

}
if traceID != "" {
observer.(prometheus.ExemplarObserver).ObserveWithExemplar(
time.Since(now).Seconds(),
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the tradeoff here? I feel that the second is a bit big.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we find a tracingID, we can expose it as exemplar

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not use microsecond replace second, which is more precise?

Copy link
Collaborator Author

@sysvm sysvm Mar 22, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will improve this place when I add traceID for sp if we need more preciser time.

default:
log.Errorw("unknown service", "service", serviceName)
return nil, fmt.Errorf("unknown service: %s", serviceName)
}
return server, nil
}

// applyMetricConfig will use the config cli first
func applyMetricConfig(ctx *cli.Context, cfg *config.StorageProviderConfig) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

applyMetricConfig -> intMetricConfig ?
and move intMetricConfig to init.go begin , after intlog?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if not set address only set enable, should use default value?

Copy link
Collaborator Author

@sysvm sysvm Mar 22, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, fixed

Usage: "log output standard io",
}
// Metrics flags
// MetricsEnabledFlag defines whether start metrics service
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

flag Usage should have descripetion, so the note may be not need?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK

Usage: "Enable metrics collection and reporting",
Category: MetricsCategory,
}
// MetricsHTTPFlag defines the endpoint for a stand-alone metrics HTTP endpoint.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ditto

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok

config/config.go Outdated
@@ -30,6 +31,7 @@ type StorageProviderConfig struct {
SignerCfg *signer.SignerConfig
BlockSyncerCfg *blocksyncer.Config
LogCfg *LogConfig
MetricsMonitorCfg *metrics.MetricsMonitorConfig
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

MetricsMonitorCfg -> MetricsCfg?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

@@ -1,4 +1,5 @@
Service = ["gateway", "uploader", "downloader", "challenge", "tasknode", "receiver", "signer", "blocksyncer", "metadata", "manager"]
Service = ["gateway", "uploader", "downloader", "challenge", "tasknode", "receiver", "signer", "blocksyncer",
"metadata", "manager", "metricsmonitor"]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think metricsmonitor are the same log, it is only start from flag?
if no other services start, only metricsmonitor is meaningless.
add metricsmonitor to Service will confuse users

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, I will improve this.

statusCode int
}

func (wd *responseWriterDelegator) Header() http.Header {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Functions starting with capital letters need to be annotated as exported functions

Copy link
Collaborator Author

@sysvm sysvm Mar 22, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this place should not be lower case, because it implements the method of http.ResponseWriter: Header() Header, Write([]byte) (int, error), WriteHeader(statusCode int)

return wd.w.Header()
}

func (wd *responseWriterDelegator) Write(bytes []byte) (int, error) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ditto

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ditto

// GaugeOption lets you add options to gauge metrics using With* functions.
type GaugeOption func(opts *prometheus.GaugeOpts)

type gaugeOptions []GaugeOption
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

gaugeOptions -> GaugeOptions

Copy link
Collaborator Author

@sysvm sysvm Mar 22, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

gaugeOptions should not be upper case, it cannot be used by external method

@@ -21,9 +23,11 @@ type UploaderClient struct {
// NewUploaderClient return an UploaderClient instance
func NewUploaderClient(address string) (*UploaderClient, error) {
conn, err := grpc.DialContext(context.Background(), address,
grpc.WithTransportCredentials(insecure.NewCredentials()),
grpc.WithChainUnaryInterceptor(openmetrics.UnaryClientInterceptor(metrics.DefaultGRPCClientMetrics)),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if not set metrics flags, the openmetrics also running and collecting metrics?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, it still works, I'll improve this.

@@ -112,6 +113,7 @@ func (gateway *Gateway) Start(ctx context.Context) error {
// Serve starts http service.
func (gateway *Gateway) serve() {
router := mux.NewRouter().SkipClean(true)
router.Use(metrics.DefaultHTTPServerMetrics.InstrumentationHandler)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how to collect gateway metrics?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

use global DefaultHTTPServerMetrics to collect metrics

@@ -87,6 +89,7 @@ func makeConfig(ctx *cli.Context) (*config.StorageProviderConfig, error) {
services := util.SplitByComma(ctx.String(utils.ServerFlag.Name))
cfg.Service = services
}
applyMetricConfig(ctx, cfg)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

move after initLog?
maybe applyMetricConfig(initMetricConfig) use log in the future?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll improve this

@sysvm sysvm added r4r Ready for review and removed wip Working in process labels Mar 22, 2023
@@ -31,7 +31,7 @@ var ConfigUploadCmd = &cli.Command{
utils.DBUserFlag,
utils.DBPasswordFlag,
utils.DBAddressFlag,
utils.DBDataBaseFlag,
utils.DBDatabaseFlag,
},
Category: "CONFIG COMMANDS",
Copy link
Collaborator Author

@sysvm sysvm Mar 22, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

all these Category should be defined in the same file,do this in next pr

@@ -65,6 +66,10 @@ P2PPrivateKey = ""
Bootstrap = []
PingPeriod = 2

[MetricsMonitorCfg]
Copy link
Collaborator

@will-2012 will-2012 Mar 23, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

MetricsMonitorCfg to MetricsCfg?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

@@ -45,6 +46,23 @@ func initLog(ctx *cli.Context, cfg *config.StorageProviderConfig) error {
return nil
}

// initMetricsConfig initializes metrics, this will use cli first
func initMetricsConfig(ctx *cli.Context, cfg *config.StorageProviderConfig) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

initMetricsConfig -> initMetrics?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

@@ -1,4 +1,5 @@
Service = ["gateway", "uploader", "downloader", "challenge", "tasknode", "receiver", "signer", "blocksyncer", "metadata", "manager"]
Service = ["gateway", "uploader", "downloader", "challenge", "tasknode", "receiver", "signer", "blocksyncer",
"metadata", "manager"]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

one line?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

@@ -21,9 +23,11 @@ type UploaderClient struct {
// NewUploaderClient return an UploaderClient instance
func NewUploaderClient(address string) (*UploaderClient, error) {
conn, err := grpc.DialContext(context.Background(), address,
grpc.WithTransportCredentials(insecure.NewCredentials()),
grpc.WithChainUnaryInterceptor(openmetrics.UnaryClientInterceptor(metrics.DefaultGRPCClientMetrics)),
grpc.WithChainStreamInterceptor(openmetrics.StreamClientInterceptor(metrics.DefaultGRPCClientMetrics)),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not check metrics.Enabled?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

uploader.grpcServer = grpc.NewServer(
grpc.ChainUnaryInterceptor(openmetrics.UnaryServerInterceptor(metrics.DefaultGRPCServerMetrics),
grpcrecovery.UnaryServerInterceptor(grpcrecovery.WithRecoveryHandler(gRPCPanicRecoveryHandler))),
grpc.ChainStreamInterceptor(openmetrics.StreamServerInterceptor(metrics.DefaultGRPCServerMetrics)),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no need check metrics.Enabled()

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

@joeylichang
Copy link
Contributor

LGTM

@joeylichang joeylichang merged commit 6fcfe19 into develop Mar 23, 2023
@joeylichang joeylichang deleted the feat-metrics branch March 23, 2023 13:08
will-2012 added a commit that referenced this pull request Mar 27, 2023
* fix: fix verify permission bug (#225)

* feat: merge release v0.0.4 fix to develop (#230)

* fix: fix route bug

* build: change signer wiltlist ip

* fix: fix https sync failed

* fix: router getUserBucketsRouterName path (#229)

* fix: router getUserBucketsRouterName path and update router comments

Co-authored-by: BarryTong65 <122767193+BarryTong65@users.noreply.github.com>
Co-authored-by: will-2012 <will.w@nodereal.io>

* ci: add gosec action (#231)

* feat: implement p2p protocol and rpc service (#221)

* feat: implement p2p protocol node and RPC service

* feat: p2p RPC server

* fix: config port conflict

* feat: get secondary SP approval from p2p server

* feat: verify approval in gateway

* fix: golang ci lint error

* fix: integration testing bugs

* feat: load p2p private key from ENV var

* feat: implement p2p protocol msg sign and verify

* feat: p2p protocol msg sign in signer service

* feat: signer client method for sign p2p protocol msg

* feat: signature verify in related modules

* fix: manager grace stop

* fix: object Id proto clone to deep copy panic

* feat: p2p create key pairs command

* feat: p2p config to local up script

* fix: local up script deploy p2p error

* fix: pr comments

* fix: change p2p config name for following the project specification

* chore: refine error code

* feat: sp services add metrics (#211)

* feat: sp service adds metrics

* fix: add metrics config

* refactor: refactor metrics service

* fix: update go.mod

* fix: fix config toml template

* chore: coding adjustments

* fix: golang ci lint error

---------

Co-authored-by: DylanYong <dylan.y@nodereal.io>
Co-authored-by: joeylichang <joeycli0919@gmail.com>

* feat: implement metadata payment apis (#235)

* feat: implement metadata payment apis

fix: update commnets and db schema

fix: update OutFlows type json

fix: update metadata

* fix: add the errDesciption and fix requestID

* fix: fix Id & CheckSums

* feat: update bucket id (#241)

* feat: metadata schema update

fix: update config

* fix: update SettleTimestamp and format imports

* fix: fix owner and end_block_number

* fix: adjust check condition to avoid panic (#243)

Co-authored-by: DylanYong <dylan.y@nodereal.io>

* feat: update the juno version (#244)

* feat: resource manager (#246)

* feat: resouce manager

* feat: add resource manager flags

* feat: add resource manager to dowmloader service

* feat: add resource manager to challenge service

* feat: add resource manager to task node service

* chore: rich log info for rcmgr

* feat: init global resource manager

* fix: integration testing bugs

* feat: return preparations error to client during put object

* chore: fix cr comments

* docs: release changlog for v0.0.5 (#249)

* docs: release changlog for v0.0.5

* build: update the greenfield chain version (#248)

* build: update the greenfield chain version

* feat:update juno version

* fix: update metadata for chain v0.0.10

fix: update metadata for chain v0.0.10

fix: update comments for bsdb schema

* fix: update from OneRowId to OneRowID

---------

Co-authored-by: constwz <changbohao30@gmail.com>
Co-authored-by: BarryTong65 <barrytong.work@gmail.com>

---------

Co-authored-by: joeycli <joeycli0919@gmail.com>
Co-authored-by: BarryTong65 <122767193+BarryTong65@users.noreply.github.com>
Co-authored-by: VM <112189277+sysvm@users.noreply.github.com>
Co-authored-by: DylanYong <dylan.y@nodereal.io>
Co-authored-by: krish-nr <krish.z@nodereal.io>
Co-authored-by: krish-z <122767080+krish-nr@users.noreply.github.com>
Co-authored-by: constwz <122766871+constwz@users.noreply.github.com>
Co-authored-by: dylanhuang <j75689@gmail.com>
Co-authored-by: constwz <changbohao30@gmail.com>
Co-authored-by: BarryTong65 <barrytong.work@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
r4r Ready for review
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants