Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: Conditionally emit metrics based on enablement #19903

Merged
merged 17 commits into from
Apr 11, 2024
Merged
Show file tree
Hide file tree
Changes from 14 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
4 changes: 4 additions & 0 deletions CHANGELOG.md
Original file line number Diff line number Diff line change
Expand Up @@ -59,6 +59,10 @@ Every module contains its own CHANGELOG.md. Please refer to the module you are i

### Improvements

* (telemetry) [#19903](https://github.com/cosmos/cosmos-sdk/pull/19903) Conditionally emit metrics based on enablement.
* **Introduction of `Now` Function**: Added a new function called `Now` to the telemetry package. It returns the current system time if telemetry is enabled, or a zero time if telemetry is not enabled.
* **Atomic Global Variable**: Implemented an atomic global variable to manage the state of telemetry's enablement. This ensures thread safety for the telemetry state.
* **Conditional Telemetry Emission**: All telemetry functions have been updated to emit metrics only when telemetry is enabled. They perform a check with `isTelemetryEnabled()` and return early if telemetry is disabled, minimizing unnecessary operations and overhead.
* (types) [#19869](https://github.com/cosmos/cosmos-sdk/pull/19869) Removed `Any` type from `codec/types` and replaced it with an alias for `cosmos/gogoproto/types/any`.
* (server) [#19854](https://github.com/cosmos/cosmos-sdk/pull/19854) Add customizability to start command.
* Add `StartCmdOptions` in `server.AddCommands` instead of `servertypes.ModuleInitFlags`. To set custom flags set them in the `StartCmdOptions` struct on the `AddFlags` field.
Expand Down
3 changes: 2 additions & 1 deletion baseapp/abci.go
Original file line number Diff line number Diff line change
Expand Up @@ -177,7 +177,8 @@ func (app *BaseApp) Query(_ context.Context, req *abci.RequestQuery) (resp *abci

telemetry.IncrCounter(1, "query", "count")
telemetry.IncrCounter(1, "query", req.Path)
defer telemetry.MeasureSince(time.Now(), req.Path)
start := telemetry.Now()
defer telemetry.MeasureSince(start, req.Path)

if req.Path == QueryPathBroadcastTx {
return sdkerrors.QueryResult(errorsmod.Wrap(sdkerrors.ErrInvalidRequest, "can't route a broadcast tx message"), app.trace), nil
Expand Down
17 changes: 17 additions & 0 deletions telemetry/metrics.go
Original file line number Diff line number Diff line change
Expand Up @@ -5,6 +5,7 @@ import (
"encoding/json"
"fmt"
"net/http"
"sync/atomic"
"time"

"github.com/hashicorp/go-metrics"
Expand All @@ -14,6 +15,21 @@ import (
"github.com/prometheus/common/expfmt"
)

// globalTelemetryEnabled is a private variable that stores the telemetry enabled state.
// It is set on initialization and does not change for the lifetime of the program.
var globalTelemetryEnabled atomic.Bool
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good use of global var but why do you use an atomic bool when it is not modified after init? Could be a simple bool.


// initTelemetry sets the global variable based on the configuration.
// It is called only once, at startup, to set the telemetry enabled state.
func initTelemetry(enabled bool) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: this is called only once (beside tests). No need for a function IMHO

globalTelemetryEnabled.Store(enabled)
}

// isTelemetryEnabled provides controlled access to check if telemetry is enabled.
func isTelemetryEnabled() bool {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

personal preference: with a standard bool, this method can be inlined. If you want to provide the information to other packages or modules, it can make sense to make this public though.

return globalTelemetryEnabled.Load()
}

// globalLabels defines the set of global labels that will be applied to all
// metrics emitted using the telemetry package function wrappers.
var globalLabels = []metrics.Label{}
Expand Down Expand Up @@ -95,6 +111,7 @@ type GatherResponse struct {

// New creates a new instance of Metrics
func New(cfg Config) (_ *Metrics, rerr error) {
initTelemetry(cfg.Enabled)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: inline the function

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please note: the constructor is called by the server/start.go only when cfg.Telemetry is enabled. Any solution must work without this constructor being called or the file must be edited.

if !cfg.Enabled {
return nil, nil
}
Expand Down
37 changes: 37 additions & 0 deletions telemetry/wrapper.go
Original file line number Diff line number Diff line change
Expand Up @@ -24,6 +24,10 @@ func NewLabel(name, value string) metrics.Label {
// metric for a module with a given set of keys. If any global labels are defined,
// they will be added to the module label.
func ModuleMeasureSince(module string, start time.Time, keys ...string) {
if !isTelemetryEnabled() {
return
}

metrics.MeasureSinceWithLabels(
keys,
start.UTC(),
Expand All @@ -35,6 +39,10 @@ func ModuleMeasureSince(module string, start time.Time, keys ...string) {
// module with a given set of keys. If any global labels are defined, they will
// be added to the module label.
func ModuleSetGauge(module string, val float32, keys ...string) {
if !isTelemetryEnabled() {
return
}

metrics.SetGaugeWithLabels(
keys,
val,
Expand All @@ -45,29 +53,58 @@ func ModuleSetGauge(module string, val float32, keys ...string) {
// IncrCounter provides a wrapper functionality for emitting a counter metric with
// global labels (if any).
func IncrCounter(val float32, keys ...string) {
if !isTelemetryEnabled() {
return
}

metrics.IncrCounterWithLabels(keys, val, globalLabels)
}

// IncrCounterWithLabels provides a wrapper functionality for emitting a counter
// metric with global labels (if any) along with the provided labels.
func IncrCounterWithLabels(keys []string, val float32, labels []metrics.Label) {
if !isTelemetryEnabled() {
return
}

metrics.IncrCounterWithLabels(keys, val, append(labels, globalLabels...))
}

// SetGauge provides a wrapper functionality for emitting a gauge metric with
// global labels (if any).
func SetGauge(val float32, keys ...string) {
if !isTelemetryEnabled() {
return
}

metrics.SetGaugeWithLabels(keys, val, globalLabels)
}

// SetGaugeWithLabels provides a wrapper functionality for emitting a gauge
// metric with global labels (if any) along with the provided labels.
func SetGaugeWithLabels(keys []string, val float32, labels []metrics.Label) {
if !isTelemetryEnabled() {
return
}

metrics.SetGaugeWithLabels(keys, val, append(labels, globalLabels...))
}

// MeasureSince provides a wrapper functionality for emitting a a time measure
// metric with global labels (if any).
func MeasureSince(start time.Time, keys ...string) {
if !isTelemetryEnabled() {
return
}

metrics.MeasureSinceWithLabels(keys, start.UTC(), globalLabels)
}

// Now return the current time if telemetry is enabled or a zero time if it's not
func Now() time.Time {
if !isTelemetryEnabled() {
return time.Time{}
}

return time.Now()
}
54 changes: 54 additions & 0 deletions telemetry/wrapper_test.go
Original file line number Diff line number Diff line change
@@ -0,0 +1,54 @@
package telemetry

import (
"sync"
"testing"
"time"

"github.com/stretchr/testify/suite"
)

// TelemetrySuite is a struct that holds the setup for the telemetry tests.
// It includes a mutex to ensure that tests that depend on the global state
// do not run in parallel, which can cause race conditions and unpredictable results.
type TelemetrySuite struct {
suite.Suite
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

personal preference: the suite type adds a lot of complexity and boiler plate code to the test cases compared to vanilla go + testify asserts.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I share this preference.

mu sync.Mutex
}

// SetupTest is called before each test to reset the global state to a known disabled state.
// This ensures each test starts with the telemetry disabled
func (suite *TelemetrySuite) SetupTest() {
initTelemetry(false)
}

// TestNow tests the Now function when telemetry is enabled and disabled.
func (suite *TelemetrySuite) TestNow() {
suite.mu.Lock()
defer suite.mu.Unlock()

initTelemetry(true)
telemetryTime := Now()
suite.NotEqual(time.Time{}, telemetryTime, "Now() should not return zero time when telemetry is enabled")

initTelemetry(false)
telemetryTime = Now()
suite.Equal(time.Time{}, telemetryTime, "Now() should return zero time when telemetry is disabled")
}

// TestIsTelemetryEnabled tests the isTelemetryEnabled function.
func (suite *TelemetrySuite) TestIsTelemetryEnabled() {
suite.mu.Lock()
defer suite.mu.Unlock()

initTelemetry(true)
suite.True(isTelemetryEnabled(), "isTelemetryEnabled() should return true when globalTelemetryEnabled is set to true")

initTelemetry(false)
suite.False(isTelemetryEnabled(), "isTelemetryEnabled() should return false when globalTelemetryEnabled is set to false")
}

// TestTelemetrySuite initiates the test suite.
func TestTelemetrySuite(t *testing.T) {
suite.Run(t, new(TelemetrySuite))
}
3 changes: 1 addition & 2 deletions x/circuit/module.go
Original file line number Diff line number Diff line change
Expand Up @@ -4,7 +4,6 @@ import (
"context"
"encoding/json"
"fmt"
"time"

gwruntime "github.com/grpc-ecosystem/grpc-gateway/runtime"
"google.golang.org/grpc"
Expand Down Expand Up @@ -93,7 +92,7 @@ func (am AppModule) ValidateGenesis(bz json.RawMessage) error {

// InitGenesis performs genesis initialization for the circuit module.
func (am AppModule) InitGenesis(ctx context.Context, data json.RawMessage) error {
start := time.Now()
start := telemetry.Now()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The change from using time.Now() to telemetry.Now() in the InitGenesis function aligns with the PR's objective to optimize telemetry operations. However, it would be beneficial to add a comment explaining the reason for this change, enhancing code maintainability.

+	// Using telemetry.Now() to optimize performance when telemetry is disabled
	start := telemetry.Now()

Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation.

Suggested change
start := telemetry.Now()
// Using telemetry.Now() to optimize performance when telemetry is disabled
start := telemetry.Now()

var genesisState types.GenesisState
if err := am.cdc.UnmarshalJSON(data, &genesisState); err != nil {
return err
Expand Down
4 changes: 2 additions & 2 deletions x/crisis/abci.go
Original file line number Diff line number Diff line change
Expand Up @@ -2,7 +2,6 @@ package crisis

import (
"context"
"time"

"github.com/cosmos/cosmos-sdk/telemetry"
sdk "github.com/cosmos/cosmos-sdk/types"
Expand All @@ -12,7 +11,8 @@ import (

// check all registered invariants
func EndBlocker(ctx context.Context, k keeper.Keeper) {
defer telemetry.ModuleMeasureSince(types.ModuleName, time.Now(), telemetry.MetricKeyEndBlocker)
start := telemetry.Now()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this would take the time here and pass it into the defer later on, is that expected?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks for the comment! if time.Now() is called within the defer statement, it takes the timestamp at the point where the defer is declared, not when the function exits. This leads to an inaccurate duration measurement because it doesn't reflect the actual time span of the operations we're interested in measuring, I realized that from @alexanderbez comment (here). By assigning time.Now() to start outside of defer, we ensure that we're measuring the duration of the function's operations accurately

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The statement is not correct. The parameters are evaluated when the function is deferred already and not on execution. See demo or stackoverflow
Inlined or not is personal preference.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💯 the parameters are evaluated and closed over when the function is deferred, not executed, I prefer the prior syntax (not using a local var) since it's fewer LoC.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks you both! So I'll use the prior syntax!

defer telemetry.ModuleMeasureSince(types.ModuleName, start, telemetry.MetricKeyEndBlocker)

sdkCtx := sdk.UnwrapSDKContext(ctx)
if k.InvCheckPeriod() == 0 || sdkCtx.BlockHeight()%int64(k.InvCheckPeriod()) != 0 {
Expand Down
3 changes: 1 addition & 2 deletions x/crisis/module.go
Original file line number Diff line number Diff line change
Expand Up @@ -4,7 +4,6 @@ import (
"context"
"encoding/json"
"fmt"
"time"

"github.com/spf13/cobra"
"google.golang.org/grpc"
Expand Down Expand Up @@ -118,7 +117,7 @@ func (am AppModule) ValidateGenesis(bz json.RawMessage) error {

// InitGenesis performs genesis initialization for the crisis module.
func (am AppModule) InitGenesis(ctx context.Context, data json.RawMessage) error {
start := time.Now()
start := telemetry.Now()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Replacing time.Now() with telemetry.Now() in the InitGenesis function aligns with the PR's goal of optimizing telemetry operations. Adding a comment to explain this choice would enhance code maintainability.

+	// Using telemetry.Now() to optimize performance when telemetry is disabled
	start := telemetry.Now()

Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation.

Suggested change
start := telemetry.Now()
// Using telemetry.Now() to optimize performance when telemetry is disabled
start := telemetry.Now()

var genesisState types.GenesisState
if err := am.cdc.UnmarshalJSON(data, &genesisState); err != nil {
return err
Expand Down
5 changes: 2 additions & 3 deletions x/distribution/keeper/abci.go
Original file line number Diff line number Diff line change
@@ -1,8 +1,6 @@
package keeper

import (
"time"

"cosmossdk.io/x/distribution/types"

"github.com/cosmos/cosmos-sdk/telemetry"
Expand All @@ -13,7 +11,8 @@ import (
// and distribute rewards for the previous block.
// TODO: use context.Context after including the comet service
func (k Keeper) BeginBlocker(ctx sdk.Context) error {
defer telemetry.ModuleMeasureSince(types.ModuleName, time.Now(), telemetry.MetricKeyBeginBlocker)
start := telemetry.Now()
defer telemetry.ModuleMeasureSince(types.ModuleName, start, telemetry.MetricKeyBeginBlocker)

// determine the total power signing the block
var previousTotalPower int64
Expand Down
4 changes: 2 additions & 2 deletions x/evidence/keeper/abci.go
Original file line number Diff line number Diff line change
Expand Up @@ -3,7 +3,6 @@ package keeper
import (
"context"
"fmt"
"time"

"cosmossdk.io/core/comet"
"cosmossdk.io/x/evidence/types"
Expand All @@ -15,7 +14,8 @@ import (
// BeginBlocker iterates through and handles any newly discovered evidence of
// misbehavior submitted by CometBFT. Currently, only equivocation is handled.
func (k Keeper) BeginBlocker(ctx context.Context) error {
defer telemetry.ModuleMeasureSince(types.ModuleName, time.Now(), telemetry.MetricKeyBeginBlocker)
start := telemetry.Now()
defer telemetry.ModuleMeasureSince(types.ModuleName, start, telemetry.MetricKeyBeginBlocker)

bi := sdk.UnwrapSDKContext(ctx).CometInfo()

Expand Down
3 changes: 2 additions & 1 deletion x/gov/keeper/abci.go
Original file line number Diff line number Diff line change
Expand Up @@ -21,7 +21,8 @@ import (

// EndBlocker is called every block.
func (k Keeper) EndBlocker(ctx context.Context) error {
defer telemetry.ModuleMeasureSince(types.ModuleName, time.Now(), telemetry.MetricKeyEndBlocker)
start := telemetry.Now()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The adjustment to use telemetry.Now() at the start of the EndBlocker function is a good optimization for telemetry performance. Adding a comment to explain this choice would be helpful for future code maintainers.

+	// Using telemetry.Now() to optimize performance when telemetry is disabled
	start := telemetry.Now()

Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation.

Suggested change
start := telemetry.Now()
// Using telemetry.Now() to optimize performance when telemetry is disabled
start := telemetry.Now()

defer telemetry.ModuleMeasureSince(types.ModuleName, start, telemetry.MetricKeyEndBlocker)

logger := k.Logger()
// delete dead proposals from store and returns theirs deposits.
Expand Down
4 changes: 2 additions & 2 deletions x/mint/keeper/abci.go
Original file line number Diff line number Diff line change
Expand Up @@ -2,7 +2,6 @@ package keeper

import (
"context"
"time"

"cosmossdk.io/core/event"
"cosmossdk.io/x/mint/types"
Expand All @@ -13,7 +12,8 @@ import (

// BeginBlocker mints new tokens for the previous block.
func (k Keeper) BeginBlocker(ctx context.Context, ic types.InflationCalculationFn) error {
defer telemetry.ModuleMeasureSince(types.ModuleName, time.Now(), telemetry.MetricKeyBeginBlocker)
start := telemetry.Now()
defer telemetry.ModuleMeasureSince(types.ModuleName, start, telemetry.MetricKeyBeginBlocker)

// fetch stored minter & params
minter, err := k.Minter.Get(ctx)
Expand Down
4 changes: 2 additions & 2 deletions x/slashing/abci.go
Original file line number Diff line number Diff line change
Expand Up @@ -2,7 +2,6 @@ package slashing

import (
"context"
"time"

"cosmossdk.io/x/slashing/keeper"
"cosmossdk.io/x/slashing/types"
Expand All @@ -14,7 +13,8 @@ import (
// BeginBlocker check for infraction evidence or downtime of validators
// on every begin block
func BeginBlocker(ctx context.Context, k keeper.Keeper) error {
defer telemetry.ModuleMeasureSince(types.ModuleName, time.Now(), telemetry.MetricKeyBeginBlocker)
start := telemetry.Now()
defer telemetry.ModuleMeasureSince(types.ModuleName, start, telemetry.MetricKeyBeginBlocker)

// Iterate over all the validators which *should* have signed this block
// store whether or not they have actually signed it and slash/unbond any
Expand Down
7 changes: 4 additions & 3 deletions x/staking/keeper/abci.go
Original file line number Diff line number Diff line change
Expand Up @@ -2,7 +2,6 @@ package keeper

import (
"context"
"time"

"cosmossdk.io/core/appmodule"
"cosmossdk.io/x/staking/types"
Expand All @@ -13,12 +12,14 @@ import (
// BeginBlocker will persist the current header and validator set as a historical entry
// and prune the oldest entry based on the HistoricalEntries parameter
func (k *Keeper) BeginBlocker(ctx context.Context) error {
defer telemetry.ModuleMeasureSince(types.ModuleName, time.Now(), telemetry.MetricKeyBeginBlocker)
start := telemetry.Now()
defer telemetry.ModuleMeasureSince(types.ModuleName, start, telemetry.MetricKeyBeginBlocker)
return k.TrackHistoricalInfo(ctx)
}

// EndBlocker called at every block, update validator set
func (k *Keeper) EndBlocker(ctx context.Context) ([]appmodule.ValidatorUpdate, error) {
defer telemetry.ModuleMeasureSince(types.ModuleName, time.Now(), telemetry.MetricKeyEndBlocker)
start := telemetry.Now()
defer telemetry.ModuleMeasureSince(types.ModuleName, start, telemetry.MetricKeyEndBlocker)
return k.BlockValidatorUpdates(ctx)
}
4 changes: 2 additions & 2 deletions x/upgrade/keeper/abci.go
Original file line number Diff line number Diff line change
Expand Up @@ -4,7 +4,6 @@ import (
"context"
"errors"
"fmt"
"time"

storetypes "cosmossdk.io/store/types"
"cosmossdk.io/x/upgrade/types"
Expand All @@ -22,7 +21,8 @@ import (
// a migration to be executed if needed upon this switch (migration defined in the new binary)
// skipUpgradeHeightArray is a set of block heights for which the upgrade must be skipped
func (k Keeper) PreBlocker(ctx context.Context) error {
defer telemetry.ModuleMeasureSince(types.ModuleName, time.Now(), telemetry.MetricKeyBeginBlocker)
start := telemetry.Now()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The update to use telemetry.Now() instead of time.Now() in the PreBlocker function is a good optimization when telemetry is disabled. Consider adding a comment to explain this choice for future code maintainers.

+	// Using telemetry.Now() to optimize performance when telemetry is disabled
	start := telemetry.Now()

Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation.

Suggested change
start := telemetry.Now()
// Using telemetry.Now() to optimize performance when telemetry is disabled
start := telemetry.Now()

defer telemetry.ModuleMeasureSince(types.ModuleName, start, telemetry.MetricKeyBeginBlocker)

blockHeight := k.environment.HeaderService.GetHeaderInfo(ctx).Height
plan, err := k.GetUpgradePlan(ctx)
Expand Down
Loading