storage: support per-store IO metrics with fine granularity #119885

Merged
merged 3 commits into from
Mar 18, 2024
11 changes: 10 additions & 1 deletion docs/generated/metrics/metrics.html
@@ -630,6 +630,15 @@
<tr><td>STORAGE</td><td>storage.compactions.keys.pinned.count</td><td>Cumulative count of storage engine KVs written to sstables during flushes and compactions due to open LSM snapshots.<br/><br/>Various subsystems of CockroachDB take LSM snapshots to maintain a consistent view<br/>of the database over an extended duration. In order to maintain the consistent view,<br/>flushes and compactions within the storage engine must preserve keys that otherwise<br/>would have been dropped. This increases write amplification, and introduces keys<br/>that must be skipped during iteration. This metric records the cumulative count of<br/>KVs preserved during flushes and compactions over the lifetime of the process.<br/></td><td>Keys</td><td>GAUGE</td><td>COUNT</td><td>AVG</td><td>NONE</td></tr>
<tr><td>STORAGE</td><td>storage.disk-slow</td><td>Number of instances of disk operations taking longer than 10s</td><td>Events</td><td>GAUGE</td><td>COUNT</td><td>AVG</td><td>NONE</td></tr>
<tr><td>STORAGE</td><td>storage.disk-stalled</td><td>Number of instances of disk operations taking longer than 20s</td><td>Events</td><td>GAUGE</td><td>COUNT</td><td>AVG</td><td>NONE</td></tr>
<tr><td>STORAGE</td><td>storage.disk.io.time</td><td>Time spent reading from or writing to the store&#39;s disk since this process started (as reported by the OS)</td><td>Time</td><td>GAUGE</td><td>NANOSECONDS</td><td>AVG</td><td>NONE</td></tr>
<tr><td>STORAGE</td><td>storage.disk.iopsinprogress</td><td>IO operations currently in progress on the store&#39;s disk (as reported by the OS)</td><td>Operations</td><td>GAUGE</td><td>COUNT</td><td>AVG</td><td>NONE</td></tr>
<tr><td>STORAGE</td><td>storage.disk.read.bytes</td><td>Bytes read from the store&#39;s disk since this process started (as reported by the OS)</td><td>Bytes</td><td>GAUGE</td><td>BYTES</td><td>AVG</td><td>NONE</td></tr>
<tr><td>STORAGE</td><td>storage.disk.read.count</td><td>Disk read operations on the store&#39;s disk since this process started (as reported by the OS)</td><td>Operations</td><td>GAUGE</td><td>COUNT</td><td>AVG</td><td>NONE</td></tr>
<tr><td>STORAGE</td><td>storage.disk.read.time</td><td>Time spent reading from the store&#39;s disk since this process started (as reported by the OS)</td><td>Time</td><td>GAUGE</td><td>NANOSECONDS</td><td>AVG</td><td>NONE</td></tr>
<tr><td>STORAGE</td><td>storage.disk.weightedio.time</td><td>Weighted time spent reading from or writing to the store&#39;s disk since this process started (as reported by the OS)</td><td>Time</td><td>GAUGE</td><td>NANOSECONDS</td><td>AVG</td><td>NONE</td></tr>
<tr><td>STORAGE</td><td>storage.disk.write.bytes</td><td>Bytes written to the store&#39;s disk since this process started (as reported by the OS)</td><td>Bytes</td><td>GAUGE</td><td>BYTES</td><td>AVG</td><td>NONE</td></tr>
<tr><td>STORAGE</td><td>storage.disk.write.count</td><td>Disk write operations on the store&#39;s disk since this process started (as reported by the OS)</td><td>Operations</td><td>GAUGE</td><td>COUNT</td><td>AVG</td><td>NONE</td></tr>
<tr><td>STORAGE</td><td>storage.disk.write.time</td><td>Time spent writing to the store&#39;s disk since this process started (as reported by the OS)</td><td>Time</td><td>GAUGE</td><td>NANOSECONDS</td><td>AVG</td><td>NONE</td></tr>
<tr><td>STORAGE</td><td>storage.flush.ingest.count</td><td>Flushes performing an ingest (flushable ingestions)</td><td>Flushes</td><td>GAUGE</td><td>COUNT</td><td>AVG</td><td>NONE</td></tr>
<tr><td>STORAGE</td><td>storage.flush.ingest.table.bytes</td><td>Bytes ingested via flushes (flushable ingestions)</td><td>Bytes</td><td>GAUGE</td><td>BYTES</td><td>AVG</td><td>NONE</td></tr>
<tr><td>STORAGE</td><td>storage.flush.ingest.table.count</td><td>Tables ingested via flushes (flushable ingestions)</td><td>Tables</td><td>GAUGE</td><td>COUNT</td><td>AVG</td><td>NONE</td></tr>
@@ -1590,7 +1599,7 @@
<tr><td>SERVER</td><td>sys.host.disk.read.bytes</td><td>Bytes read from all disks since this process started (as reported by the OS)</td><td>Bytes</td><td>GAUGE</td><td>BYTES</td><td>AVG</td><td>NONE</td></tr>
<tr><td>SERVER</td><td>sys.host.disk.read.count</td><td>Disk read operations across all disks since this process started (as reported by the OS)</td><td>Operations</td><td>GAUGE</td><td>COUNT</td><td>AVG</td><td>NONE</td></tr>
<tr><td>SERVER</td><td>sys.host.disk.read.time</td><td>Time spent reading from all disks since this process started (as reported by the OS)</td><td>Time</td><td>GAUGE</td><td>NANOSECONDS</td><td>AVG</td><td>NONE</td></tr>
<tr><td>SERVER</td><td>sys.host.disk.weightedio.time</td><td>Weighted time spent reading from or writing to to all disks since this process started (as reported by the OS)</td><td>Time</td><td>GAUGE</td><td>NANOSECONDS</td><td>AVG</td><td>NONE</td></tr>
<tr><td>SERVER</td><td>sys.host.disk.weightedio.time</td><td>Weighted time spent reading from or writing to all disks since this process started (as reported by the OS)</td><td>Time</td><td>GAUGE</td><td>NANOSECONDS</td><td>AVG</td><td>NONE</td></tr>
<tr><td>SERVER</td><td>sys.host.disk.write.bytes</td><td>Bytes written to all disks since this process started (as reported by the OS)</td><td>Bytes</td><td>GAUGE</td><td>BYTES</td><td>AVG</td><td>NONE</td></tr>
<tr><td>SERVER</td><td>sys.host.disk.write.count</td><td>Disk write operations across all disks since this process started (as reported by the OS)</td><td>Operations</td><td>GAUGE</td><td>COUNT</td><td>AVG</td><td>NONE</td></tr>
<tr><td>SERVER</td><td>sys.host.disk.write.time</td><td>Time spent writing to all disks since this process started (as reported by the OS)</td><td>Time</td><td>GAUGE</td><td>NANOSECONDS</td><td>AVG</td><td>NONE</td></tr>
3 changes: 3 additions & 0 deletions pkg/BUILD.bazel
@@ -613,6 +613,7 @@ ALL_TESTS = [
"//pkg/sql/types:types_test",
"//pkg/sql:sql_disallowed_imports_test",
"//pkg/sql:sql_test",
"//pkg/storage/disk:disk_test",
"//pkg/storage/enginepb:enginepb_test",
"//pkg/storage/fs:fs_test",
"//pkg/storage/metamorphic:metamorphic_test",
@@ -2225,6 +2226,8 @@ GO_TARGETS = [
"//pkg/sql/vtable:vtable",
"//pkg/sql:sql",
"//pkg/sql:sql_test",
"//pkg/storage/disk:disk",
"//pkg/storage/disk:disk_test",
"//pkg/storage/enginepb:enginepb",
"//pkg/storage/enginepb:enginepb_test",
"//pkg/storage/fs:fs",
3 changes: 3 additions & 0 deletions pkg/kv/kvserver/BUILD.bazel
@@ -186,6 +186,7 @@ go_library(
"//pkg/spanconfig",
"//pkg/spanconfig/spanconfigstore",
"//pkg/storage",
"//pkg/storage/disk",
"//pkg/storage/enginepb",
"//pkg/storage/fs",
"//pkg/util",
@@ -462,6 +463,7 @@ go_test(
"//pkg/sql/sem/tree",
"//pkg/sql/sqlstats",
"//pkg/storage",
"//pkg/storage/disk",
"//pkg/storage/enginepb",
"//pkg/storage/fs",
"//pkg/testutils",
@@ -515,6 +517,7 @@ go_test(
"@com_github_cockroachdb_errors//oserror",
"@com_github_cockroachdb_logtags//:logtags",
"@com_github_cockroachdb_pebble//:pebble",
"@com_github_cockroachdb_pebble//vfs",
"@com_github_cockroachdb_redact//:redact",
"@com_github_dustin_go_humanize//:go-humanize",
"@com_github_gogo_protobuf//proto",
88 changes: 88 additions & 0 deletions pkg/kv/kvserver/metrics.go
@@ -28,6 +28,7 @@ import (
"github.com/cockroachdb/cockroach/pkg/raft/raftpb"
"github.com/cockroachdb/cockroach/pkg/roachpb"
"github.com/cockroachdb/cockroach/pkg/storage"
"github.com/cockroachdb/cockroach/pkg/storage/disk"
"github.com/cockroachdb/cockroach/pkg/storage/enginepb"
"github.com/cockroachdb/cockroach/pkg/storage/fs"
"github.com/cockroachdb/cockroach/pkg/util/log"
@@ -2355,6 +2356,60 @@ Note that the measurement does not include the duration for replicating the eval
Measurement: "Batches",
Unit: metric.Unit_COUNT,
}
metaDiskReadCount = metric.Metadata{
Name: "storage.disk.read.count",
Unit: metric.Unit_COUNT,
Measurement: "Operations",
Help: "Disk read operations on the store's disk since this process started (as reported by the OS)",
}
metaDiskReadBytes = metric.Metadata{
Name: "storage.disk.read.bytes",
Unit: metric.Unit_BYTES,
Measurement: "Bytes",
Help: "Bytes read from the store's disk since this process started (as reported by the OS)",
}
metaDiskReadTime = metric.Metadata{
Name: "storage.disk.read.time",
Unit: metric.Unit_NANOSECONDS,
Measurement: "Time",
Help: "Time spent reading from the store's disk since this process started (as reported by the OS)",
}
metaDiskWriteCount = metric.Metadata{
Name: "storage.disk.write.count",
Unit: metric.Unit_COUNT,
Measurement: "Operations",
Help: "Disk write operations on the store's disk since this process started (as reported by the OS)",
}
metaDiskWriteBytes = metric.Metadata{
Name: "storage.disk.write.bytes",
Unit: metric.Unit_BYTES,
Measurement: "Bytes",
Help: "Bytes written to the store's disk since this process started (as reported by the OS)",
}
metaDiskWriteTime = metric.Metadata{
Name: "storage.disk.write.time",
Unit: metric.Unit_NANOSECONDS,
Measurement: "Time",
Help: "Time spent writing to the store's disk since this process started (as reported by the OS)",
}
metaDiskIOTime = metric.Metadata{
Name: "storage.disk.io.time",
Unit: metric.Unit_NANOSECONDS,
Measurement: "Time",
Help: "Time spent reading from or writing to the store's disk since this process started (as reported by the OS)",
}
metaDiskWeightedIOTime = metric.Metadata{
Name: "storage.disk.weightedio.time",
Unit: metric.Unit_NANOSECONDS,
Measurement: "Time",
Help: "Weighted time spent reading from or writing to the store's disk since this process started (as reported by the OS)",
}
metaIopsInProgress = metric.Metadata{
Name: "storage.disk.iopsinprogress",
Unit: metric.Unit_COUNT,
Measurement: "Operations",
Help: "IO operations currently in progress on the store's disk (as reported by the OS)",
}
)

// StoreMetrics is the set of metrics for a given store.
@@ -2750,6 +2805,17 @@ type StoreMetrics struct {

FlushUtilization *metric.GaugeFloat64
FsyncLatency *metric.ManualWindowHistogram

// Disk metrics
DiskReadBytes *metric.Gauge
DiskReadCount *metric.Gauge
DiskReadTime *metric.Gauge
DiskWriteBytes *metric.Gauge
DiskWriteCount *metric.Gauge
DiskWriteTime *metric.Gauge
DiskIOTime *metric.Gauge
DiskWeightedIOTime *metric.Gauge
IopsInProgress *metric.Gauge
}

type tenantMetricsRef struct {
@@ -3491,6 +3557,16 @@ func newStoreMetrics(histogramWindow time.Duration) *StoreMetrics {

ReplicaReadBatchDroppedLatchesBeforeEval: metric.NewCounter(metaReplicaReadBatchDroppedLatchesBeforeEval),
ReplicaReadBatchWithoutInterleavingIter: metric.NewCounter(metaReplicaReadBatchWithoutInterleavingIter),

DiskReadBytes: metric.NewGauge(metaDiskReadBytes),
DiskReadCount: metric.NewGauge(metaDiskReadCount),
DiskReadTime: metric.NewGauge(metaDiskReadTime),
DiskWriteBytes: metric.NewGauge(metaDiskWriteBytes),
DiskWriteCount: metric.NewGauge(metaDiskWriteCount),
DiskWriteTime: metric.NewGauge(metaDiskWriteTime),
DiskIOTime: metric.NewGauge(metaDiskIOTime),
DiskWeightedIOTime: metric.NewGauge(metaDiskWeightedIOTime),
IopsInProgress: metric.NewGauge(metaIopsInProgress),
}

storeRegistry.AddMetricStruct(sm)
@@ -3700,6 +3776,18 @@ func (sm *StoreMetrics) updateEnvStats(stats fs.EnvStats) {
sm.EncryptionAlgorithm.Update(int64(stats.EncryptionType))
}

// updateDiskStats copies the cumulative per-disk stats into the store's disk gauges.
func (sm *StoreMetrics) updateDiskStats(stats disk.Stats) {
	sm.DiskReadCount.Update(int64(stats.ReadsCount))
	sm.DiskReadBytes.Update(int64(stats.BytesRead()))
	sm.DiskReadTime.Update(int64(stats.ReadsDuration))
	sm.DiskWriteCount.Update(int64(stats.WritesCount))
	sm.DiskWriteBytes.Update(int64(stats.BytesWritten()))
	sm.DiskWriteTime.Update(int64(stats.WritesDuration))
	sm.DiskIOTime.Update(int64(stats.CumulativeDuration))
	sm.DiskWeightedIOTime.Update(int64(stats.WeightedIODuration))
	sm.IopsInProgress.Update(int64(stats.InProgressCount))
}

func (sm *StoreMetrics) handleMetricsResult(ctx context.Context, metric result.Metrics) {
sm.LeaseRequestSuccessCount.Inc(int64(metric.LeaseRequestSuccess))
metric.LeaseRequestSuccess = 0
13 changes: 13 additions & 0 deletions pkg/kv/kvserver/store.go
@@ -68,6 +68,7 @@ import (
"github.com/cockroachdb/cockroach/pkg/spanconfig"
"github.com/cockroachdb/cockroach/pkg/spanconfig/spanconfigstore"
"github.com/cockroachdb/cockroach/pkg/storage"
"github.com/cockroachdb/cockroach/pkg/storage/disk"
"github.com/cockroachdb/cockroach/pkg/util"
"github.com/cockroachdb/cockroach/pkg/util/admission"
"github.com/cockroachdb/cockroach/pkg/util/admission/admissionpb"
@@ -1094,6 +1095,9 @@ type Store struct {
spanConfigUpdateQueueRateLimiter *quotapool.RateLimiter

rangeFeedSlowClosedTimestampNudge *singleflight.Group

// diskMonitor provides metrics for the disk associated with this store.
diskMonitor *disk.Monitor
}

var _ kv.Sender = &Store{}
@@ -3363,6 +3367,15 @@ func (s *Store) computeMetrics(ctx context.Context) (m storage.Metrics, err erro
		s.metrics.RdbCheckpoints.Update(int64(len(dirs)))
	}

	// Get disk stats for the disk associated with this store.
	if s.diskMonitor != nil {
		diskStats, err := s.diskMonitor.CumulativeStats()
		if err != nil {
			return m, err
		}
		s.metrics.updateDiskStats(diskStats)
	}

	return m, nil
}

30 changes: 30 additions & 0 deletions pkg/kv/kvserver/stores.go
@@ -22,6 +22,7 @@ import (
"github.com/cockroachdb/cockroach/pkg/kv/kvserver/kvadmission"
"github.com/cockroachdb/cockroach/pkg/roachpb"
"github.com/cockroachdb/cockroach/pkg/storage"
"github.com/cockroachdb/cockroach/pkg/storage/disk"
"github.com/cockroachdb/cockroach/pkg/util/future"
"github.com/cockroachdb/cockroach/pkg/util/hlc"
"github.com/cockroachdb/cockroach/pkg/util/log"
@@ -301,3 +302,32 @@ func (ls *Stores) updateBootstrapInfoLocked(bi *gossip.BootstrapInfo) error {
})
return err
}

// RegisterDiskMonitors injects a monitor into each store to track an individual disk's stats.
func (ls *Stores) RegisterDiskMonitors(
	diskManager *disk.MonitorManager, diskPathToStore map[string]roachpb.StoreID,
) error {
	monitors := make(map[roachpb.StoreID]disk.Monitor)
	for path, id := range diskPathToStore {
		monitor, err := diskManager.Monitor(path)
		if err != nil {
			return err
		}
		monitors[id] = *monitor
	}
	return ls.VisitStores(func(s *Store) error {
		if monitor, ok := monitors[s.StoreID()]; ok {
			s.diskMonitor = &monitor
		}
		return nil
	})
}

// CloseDiskMonitors closes the disk monitor of every store that has one registered.
func (ls *Stores) CloseDiskMonitors() {
	_ = ls.VisitStores(func(s *Store) error {
		if s.diskMonitor != nil {
			s.diskMonitor.Close()
		}
		return nil
	})
}
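
The register/close flow above can be sketched outside of CockroachDB as follows. This is an illustration under stated assumptions rather than the PR's implementation: `manager`, `monitor`, `store`, and `storeID` are hypothetical stand-ins, and unlike the diff the sketch keeps monitor pointers in the map instead of copying the monitor by value.

```go
package main

import (
	"errors"
	"fmt"
)

type storeID int

// monitor is a hypothetical stand-in for disk.Monitor.
type monitor struct {
	path   string
	closed bool
}

func (m *monitor) Close() { m.closed = true }

// manager is a hypothetical stand-in for disk.MonitorManager.
type manager struct{ known map[string]bool }

// Monitor returns a monitor for path, or an error if the path is unknown.
func (m *manager) Monitor(path string) (*monitor, error) {
	if !m.known[path] {
		return nil, errors.New("unknown path: " + path)
	}
	return &monitor{path: path}, nil
}

type store struct {
	id          storeID
	diskMonitor *monitor
}

// registerDiskMonitors resolves every path first and only then attaches the
// monitors, so a failed lookup leaves all stores untouched.
func registerDiskMonitors(mgr *manager, pathToStore map[string]storeID, stores []*store) error {
	monitors := make(map[storeID]*monitor, len(pathToStore))
	for path, id := range pathToStore {
		mon, err := mgr.Monitor(path)
		if err != nil {
			return err
		}
		monitors[id] = mon
	}
	for _, s := range stores {
		if mon, ok := monitors[s.id]; ok {
			s.diskMonitor = mon
		}
	}
	return nil
}

// closeDiskMonitors closes the monitor of every store that has one.
func closeDiskMonitors(stores []*store) {
	for _, s := range stores {
		if s.diskMonitor != nil {
			s.diskMonitor.Close()
		}
	}
}

func main() {
	mgr := &manager{known: map[string]bool{"/data/1": true, "/data/2": true}}
	stores := []*store{{id: 1}, {id: 2}}
	err := registerDiskMonitors(mgr, map[string]storeID{"/data/1": 1, "/data/2": 2}, stores)
	fmt.Println("register error:", err)
	closeDiskMonitors(stores)
	fmt.Println("closed:", stores[0].diskMonitor.closed, stores[1].diskMonitor.closed)
}
```

Stores without an entry in the path map keep a nil monitor, which is why both the close path and `computeMetrics` guard on `diskMonitor != nil`.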
38 changes: 38 additions & 0 deletions pkg/kv/kvserver/stores_test.go
@@ -12,21 +12,26 @@ package kvserver

import (
"context"
"path"
"reflect"
"strconv"
"testing"

"github.com/cockroachdb/cockroach/pkg/gossip"
"github.com/cockroachdb/cockroach/pkg/kv/kvpb"
"github.com/cockroachdb/cockroach/pkg/kv/kvserver/logstore"
"github.com/cockroachdb/cockroach/pkg/roachpb"
"github.com/cockroachdb/cockroach/pkg/storage"
"github.com/cockroachdb/cockroach/pkg/storage/disk"
"github.com/cockroachdb/cockroach/pkg/testutils"
"github.com/cockroachdb/cockroach/pkg/util"
"github.com/cockroachdb/cockroach/pkg/util/hlc"
"github.com/cockroachdb/cockroach/pkg/util/leaktest"
"github.com/cockroachdb/cockroach/pkg/util/log"
"github.com/cockroachdb/cockroach/pkg/util/stop"
"github.com/cockroachdb/cockroach/pkg/util/timeutil"
"github.com/cockroachdb/errors"
"github.com/cockroachdb/pebble/vfs"
"github.com/stretchr/testify/require"
)

@@ -339,3 +344,36 @@ func TestStoresGossipStorageReadLatest(t *testing.T) {
t.Errorf("bootstrap info %+v not equal to expected %+v", verifyBI, bi)
}
}

func TestRegisterDiskMonitors(t *testing.T) {
	defer leaktest.AfterTest(t)()
	defer log.Scope(t).Close(t)

	dir, dirCleanupFn := testutils.TempDir(t)
	defer dirCleanupFn()

	_, stores, ls, stopper := createStores(2)
	defer stopper.Stop(context.Background())
	defer ls.CloseDiskMonitors()

	ls.AddStore(stores[0])
	ls.AddStore(stores[1])

	fs := vfs.Default
	pathToStore := make(map[string]roachpb.StoreID, len(stores))
	for i, store := range stores {
		storePath := path.Join(dir, strconv.Itoa(i))
		pathToStore[storePath] = store.StoreID()

		_, err := fs.Create(storePath)
		require.NoError(t, err)
		require.Nil(t, store.diskMonitor)
	}

	diskManager := disk.NewMonitorManager(fs)
	err := ls.RegisterDiskMonitors(diskManager, pathToStore)
	require.NoError(t, err)
	for _, store := range stores {
		require.NotNil(t, store.diskMonitor)
	}
}
3 changes: 3 additions & 0 deletions pkg/server/BUILD.bazel
@@ -277,6 +277,7 @@ go_library(
"//pkg/sql/ttl/ttljob",
"//pkg/sql/ttl/ttlschedule",
"//pkg/storage",
"//pkg/storage/disk",
"//pkg/storage/enginepb",
"//pkg/storage/fs",
"//pkg/testutils",
@@ -518,6 +519,7 @@ go_test(
"//pkg/sql/sessiondata",
"//pkg/sql/sqlstats",
"//pkg/storage",
"//pkg/storage/disk",
"//pkg/storage/fs",
"//pkg/testutils",
"//pkg/testutils/datapathutils",
@@ -556,6 +558,7 @@ go_test(
"@com_github_cockroachdb_datadriven//:datadriven",
"@com_github_cockroachdb_errors//:errors",
"@com_github_cockroachdb_logtags//:logtags",
"@com_github_cockroachdb_pebble//vfs",
"@com_github_cockroachdb_redact//:redact",
"@com_github_dustin_go_humanize//:go-humanize",
"@com_github_gogo_protobuf//jsonpb",
6 changes: 6 additions & 0 deletions pkg/server/config.go
@@ -37,6 +37,7 @@ import (
"github.com/cockroachdb/cockroach/pkg/settings/cluster"
"github.com/cockroachdb/cockroach/pkg/sql/sem/catconstants"
"github.com/cockroachdb/cockroach/pkg/storage"
"github.com/cockroachdb/cockroach/pkg/storage/disk"
"github.com/cockroachdb/cockroach/pkg/storage/enginepb"
"github.com/cockroachdb/cockroach/pkg/storage/fs"
"github.com/cockroachdb/cockroach/pkg/ts"
@@ -51,6 +52,7 @@ import (
"github.com/cockroachdb/errors"
"github.com/cockroachdb/pebble"
"github.com/cockroachdb/pebble/bloom"
"github.com/cockroachdb/pebble/vfs"
"github.com/cockroachdb/redact"
)

@@ -275,6 +277,9 @@ type BaseConfig struct {
// listeners. This is set by in-memory tenants if the user has
// specified port range preferences.
RPCListenerFactory RPCListenerFactory

// DiskMonitorManager provides metrics for individual disks.
DiskMonitorManager *disk.MonitorManager
}

// MakeBaseConfig returns a BaseConfig with default values.
@@ -324,6 +329,7 @@ func (cfg *BaseConfig) SetDefaults(
cfg.Config.InitDefaults()
cfg.InitTestingKnobs()
cfg.EarlyBootExternalStorageAccessor = cloud.NewEarlyBootExternalStorageAccessor(st, cfg.ExternalIODirConfig)
cfg.DiskMonitorManager = disk.NewMonitorManager(vfs.Default)
}

// InitTestingKnobs sets up any testing knobs based on e.g. envvars.