Merge #95190
95190: ui: display more load stats on /hotranges page  r=koorosh a=kvoli

First two commits are from #95388.

Previously the hot ranges page only showed one statistic that helped
identify high load ("hot") ranges: queries per second (QPS). QPS is a
measure of the number of batch requests a replica processed per second,
averaged over the last 30 minutes. If the batch request composition is
non-uniform in terms of incidental load on the cluster, the correlation
between QPS and importance of a range to the end user weakens.

This commit adds more per-range statistics to the hot ranges UI page to
provide better insight into load composition. Each statistic is a
per-second rate averaged over the last 30 minutes, the same as the
existing load statistic, QPS.

- CPU: CPU time used in processing this range.
- Write (keys):  number of keys written on this range.
- Write (bytes): number of bytes written on this range.
- Read (keys):   number of keys read on this range.
- Read (bytes):  number of bytes read on this range.
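
The generated docs in this commit describe the CPU statistic as CPU time in nanoseconds per second, and the Go changes compute it as the sum of Raft CPU and request CPU. As a rough sketch of how to read that number, the snippet below converts a nanoseconds-per-second rate into an equivalent number of busy cores. The helper names `totalCPUNanosPerSecond` and `cpuCores` and the sample values are made up for illustration and are not part of the commit.

```go
package main

import (
	"fmt"
	"time"
)

// totalCPUNanosPerSecond mirrors the sum used in the diff: Raft CPU plus
// request CPU, both expressed in nanoseconds of CPU time per second.
// (Hypothetical helper, not part of the commit.)
func totalCPUNanosPerSecond(raftNanos, requestNanos float64) float64 {
	return raftNanos + requestNanos
}

// cpuCores converts a CPU-time rate in ns/s into the equivalent number of
// fully utilized cores: 1e9 ns of CPU time per second is one busy core.
// (Hypothetical helper, not part of the commit.)
func cpuCores(cpuNanosPerSecond float64) float64 {
	return cpuNanosPerSecond / float64(time.Second)
}

func main() {
	// Illustrative values only: 50ms/s of Raft CPU, 200ms/s of request CPU.
	total := totalCPUNanosPerSecond(50e6, 200e6)
	fmt.Printf("%.2f cores\n", cpuCores(total)) // prints "0.25 cores"
}
```
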

The ranges displayed on the hot ranges page can be ordered using any of
these statistics as the sort key. An example page view, taken while
running kv50, is shown below.

![image](https://user-images.githubusercontent.com/39606633/213811279-07348a81-05dd-463f-a970-706083c681c3.png)
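
Ordering by a chosen statistic can be sketched in Go as follows. This is only an illustration of the idea, not the db-console implementation: the `hotRange` struct is a hypothetical, trimmed-down row whose field names follow the Go names in this commit, `sortByStat` is a made-up helper, and the values are invented to show that the hottest range by QPS need not be the hottest by CPU.

```go
package main

import (
	"fmt"
	"sort"
)

// hotRange is a hypothetical, trimmed-down view of one row on the page.
type hotRange struct {
	RangeID             int64
	QPS                 float64
	CPUTimePerSecond    float64
	WritesPerSecond     float64
	WriteBytesPerSecond float64
	ReadsPerSecond      float64
	ReadBytesPerSecond  float64
}

// sortByStat orders ranges from hottest to coldest according to key.
// The sort is stable, so ties keep their original relative order.
func sortByStat(ranges []hotRange, key func(hotRange) float64) {
	sort.SliceStable(ranges, func(i, j int) bool {
		return key(ranges[i]) > key(ranges[j])
	})
}

func main() {
	// Illustrative values only.
	ranges := []hotRange{
		{RangeID: 1, QPS: 900, CPUTimePerSecond: 2e6},
		{RangeID: 2, QPS: 100, CPUTimePerSecond: 8e7},
		{RangeID: 3, QPS: 500, CPUTimePerSecond: 5e6},
	}
	sortByStat(ranges, func(r hotRange) float64 { return r.CPUTimePerSecond })
	for _, r := range ranges {
		fmt.Println(r.RangeID, r.QPS)
	}
}
```

With these sample values, range 2 sorts first by CPU even though it has the lowest QPS, which is the kind of mismatch the extra columns are meant to surface.
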

depends on: #95388

resolves: #95386

Release note (ui change): Add write bytes, write keys, read bytes, read
keys and CPU statistics to the `/hotranges` db-console page. These
statistics are per-second rates averaged over the last 30 minutes.

Co-authored-by: Austen McClernon <austen@cockroachlabs.com>
craig[bot] and kvoli committed Jan 27, 2023
2 parents 54ba1a0 + 4ba747e commit 85e146e
Showing 10 changed files with 197 additions and 32 deletions.
10 changes: 10 additions & 0 deletions docs/generated/http/full.md
@@ -1329,6 +1329,7 @@ only.
| reads_per_second | [double](#cockroach.server.serverpb.RaftDebugResponse-double) | | Reads per second served is the number of keys read from this range per second, averaged over the last 30 minute period. | [reserved](#support-status) |
| write_bytes_per_second | [double](#cockroach.server.serverpb.RaftDebugResponse-double) | | Writes (bytes) per second is the number of bytes written to this range per second, averaged over the last 30 minute period. | [reserved](#support-status) |
| read_bytes_per_second | [double](#cockroach.server.serverpb.RaftDebugResponse-double) | | Reads (bytes) per second is the number of bytes read from this range per second, averaged over the last 30 minute period. | [reserved](#support-status) |
| cpu_time_per_second | [double](#cockroach.server.serverpb.RaftDebugResponse-double) | | CPU time (ns) per second is the cpu usage of this range per second, averaged over the last 30 minute period. | [reserved](#support-status) |



@@ -1575,6 +1576,7 @@ only.
| reads_per_second | [double](#cockroach.server.serverpb.RangesResponse-double) | | Reads per second served is the number of keys read from this range per second, averaged over the last 30 minute period. | [reserved](#support-status) |
| write_bytes_per_second | [double](#cockroach.server.serverpb.RangesResponse-double) | | Writes (bytes) per second is the number of bytes written to this range per second, averaged over the last 30 minute period. | [reserved](#support-status) |
| read_bytes_per_second | [double](#cockroach.server.serverpb.RangesResponse-double) | | Reads (bytes) per second is the number of bytes read from this range per second, averaged over the last 30 minute period. | [reserved](#support-status) |
| cpu_time_per_second | [double](#cockroach.server.serverpb.RangesResponse-double) | | CPU time (ns) per second is the cpu usage of this range per second, averaged over the last 30 minute period. | [reserved](#support-status) |



@@ -1784,6 +1786,7 @@ only.
| reads_per_second | [double](#cockroach.server.serverpb.TenantRangesResponse-double) | | Reads per second served is the number of keys read from this range per second, averaged over the last 30 minute period. | [reserved](#support-status) |
| write_bytes_per_second | [double](#cockroach.server.serverpb.TenantRangesResponse-double) | | Writes (bytes) per second is the number of bytes written to this range per second, averaged over the last 30 minute period. | [reserved](#support-status) |
| read_bytes_per_second | [double](#cockroach.server.serverpb.TenantRangesResponse-double) | | Reads (bytes) per second is the number of bytes read from this range per second, averaged over the last 30 minute period. | [reserved](#support-status) |
| cpu_time_per_second | [double](#cockroach.server.serverpb.TenantRangesResponse-double) | | CPU time (ns) per second is the cpu usage of this range per second, averaged over the last 30 minute period. | [reserved](#support-status) |



@@ -3495,6 +3498,7 @@ target node(s) selected in a HotRangesRequest.
| reads_per_second | [double](#cockroach.server.serverpb.HotRangesResponse-double) | | Reads per second is the recent number of keys read per second on this range. | [reserved](#support-status) |
| write_bytes_per_second | [double](#cockroach.server.serverpb.HotRangesResponse-double) | | Write bytes per second is the recent number of bytes written per second on this range. | [reserved](#support-status) |
| read_bytes_per_second | [double](#cockroach.server.serverpb.HotRangesResponse-double) | | Read bytes per second is the recent number of bytes read per second on this range. | [reserved](#support-status) |
| cpu_time_per_second | [double](#cockroach.server.serverpb.HotRangesResponse-double) | | CPU time per second is the recent cpu usage in nanoseconds of this range. | [reserved](#support-status) |



@@ -3567,6 +3571,11 @@ HotRange message describes a single hot range, ie its QPS, node ID it belongs to
| leaseholder_node_id | [int32](#cockroach.server.serverpb.HotRangesResponseV2-int32) | | leaseholder_node_id indicates the Node ID that is the current leaseholder for the given range. | [reserved](#support-status) |
| schema_name | [string](#cockroach.server.serverpb.HotRangesResponseV2-string) | | schema_name provides the name of schema (if exists) for table in current range. | [reserved](#support-status) |
| store_id | [int32](#cockroach.server.serverpb.HotRangesResponseV2-int32) | | store_id indicates the Store ID where range is stored. | [reserved](#support-status) |
| writes_per_second | [double](#cockroach.server.serverpb.HotRangesResponseV2-double) | | writes_per_second is the recent number of keys written per second on this range. | [reserved](#support-status) |
| reads_per_second | [double](#cockroach.server.serverpb.HotRangesResponseV2-double) | | reads_per_second is the recent number of keys read per second on this range. | [reserved](#support-status) |
| write_bytes_per_second | [double](#cockroach.server.serverpb.HotRangesResponseV2-double) | | write_bytes_per_second is the recent number of bytes written per second on this range. | [reserved](#support-status) |
| read_bytes_per_second | [double](#cockroach.server.serverpb.HotRangesResponseV2-double) | | read_bytes_per_second is the recent number of bytes read per second on this range. | [reserved](#support-status) |
| cpu_time_per_second | [double](#cockroach.server.serverpb.HotRangesResponseV2-double) | | CPU time (ns) per second is the recent cpu usage per second on this range. | [reserved](#support-status) |



@@ -3881,6 +3890,7 @@ only.
| reads_per_second | [double](#cockroach.server.serverpb.RangeResponse-double) | | Reads per second served is the number of keys read from this range per second, averaged over the last 30 minute period. | [reserved](#support-status) |
| write_bytes_per_second | [double](#cockroach.server.serverpb.RangeResponse-double) | | Writes (bytes) per second is the number of bytes written to this range per second, averaged over the last 30 minute period. | [reserved](#support-status) |
| read_bytes_per_second | [double](#cockroach.server.serverpb.RangeResponse-double) | | Reads (bytes) per second is the number of bytes read from this range per second, averaged over the last 30 minute period. | [reserved](#support-status) |
| cpu_time_per_second | [double](#cockroach.server.serverpb.RangeResponse-double) | | CPU time (ns) per second is the cpu usage of this range per second, averaged over the last 30 minute period. | [reserved](#support-status) |



1 change: 1 addition & 0 deletions docs/generated/http/hotranges-other.md
@@ -68,5 +68,6 @@ Support status: [alpha](#support-status)
| reads_per_second | [double](#double) | | Reads per second is the recent number of keys read per second on this range. | [reserved](#support-status) |
| write_bytes_per_second | [double](#double) | | Write bytes per second is the recent number of bytes written per second on this range. | [reserved](#support-status) |
| read_bytes_per_second | [double](#double) | | Read bytes per second is the recent number of bytes read per second on this range. | [reserved](#support-status) |
| cpu_time_per_second | [double](#double) | | CPU time per second is the recent cpu usage in nanoseconds of this range. | [reserved](#support-status) |


27 changes: 26 additions & 1 deletion docs/generated/swagger/spec.json
@@ -2104,6 +2104,31 @@
"format": "double",
"x-go-name": "QPS"
},
"writes_per_second": {
"type": "number",
"format": "double",
"x-go-name": "WritesPerSecond"
},
"reads_per_second": {
"type": "number",
"format": "double",
"x-go-name": "ReadsPerSecond"
},
"write_bytes_per_second": {
"type": "number",
"format": "double",
"x-go-name": "WriteBytesPerSecond"
},
"read_bytes_per_second": {
"type": "number",
"format": "double",
"x-go-name": "ReadBytesPerSecond"
},
"cpu_time_per_second": {
"type": "number",
"format": "double",
"x-go-name": "CPUTimePerSecond"
},
"range_id": {
"$ref": "#/definitions/RangeID"
},
@@ -2499,4 +2524,4 @@
"in": "header"
}
}
}
}
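
A client of this endpoint could decode the new fields roughly as sketched below. The JSON keys are the ones added to the swagger spec above. The `hotRangeStats` struct, the `decodeHotRange` helper, and the sample payload are assumptions made for illustration, as is the choice to read `range_id` as a JSON integer.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// hotRangeStats is a hypothetical client-side struct covering only the
// load statistics of one hot range entry.
type hotRangeStats struct {
	RangeID             int64   `json:"range_id"` // assumed to be a JSON integer
	QPS                 float64 `json:"qps"`
	WritesPerSecond     float64 `json:"writes_per_second"`
	ReadsPerSecond      float64 `json:"reads_per_second"`
	WriteBytesPerSecond float64 `json:"write_bytes_per_second"`
	ReadBytesPerSecond  float64 `json:"read_bytes_per_second"`
	CPUTimePerSecond    float64 `json:"cpu_time_per_second"`
}

// decodeHotRange parses a single hot range entry from JSON.
func decodeHotRange(data []byte) (hotRangeStats, error) {
	var s hotRangeStats
	err := json.Unmarshal(data, &s)
	return s, err
}

// sample is an invented payload, not real server output.
const sample = `{
  "range_id": 42,
  "qps": 12.5,
  "writes_per_second": 3,
  "reads_per_second": 9.5,
  "write_bytes_per_second": 1024,
  "read_bytes_per_second": 4096,
  "cpu_time_per_second": 1500000
}`

func main() {
	s, err := decodeHotRange([]byte(sample))
	if err != nil {
		fmt.Println("decode error:", err)
		return
	}
	fmt.Printf("r%d qps=%.1f cpu=%.0fns/s\n", s.RangeID, s.QPS, s.CPUTimePerSecond)
}
```
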
3 changes: 2 additions & 1 deletion pkg/kv/kvserver/store.go
@@ -3032,7 +3032,7 @@ type HotReplicaInfo struct {
WriteKeysPerSecond float64
WriteBytesPerSecond float64
ReadBytesPerSecond float64
- CPUNanosPerSecond float64
+ CPUTimePerSecond float64
}

// HottestReplicas returns the hottest replicas on a store, sorted by their
@@ -3064,6 +3064,7 @@ func mapToHotReplicasInfo(repls []CandidateReplica) []HotReplicaInfo {
hotRepls[i].ReadKeysPerSecond = loadStats.ReadKeysPerSecond
hotRepls[i].WriteBytesPerSecond = loadStats.WriteBytesPerSecond
hotRepls[i].ReadBytesPerSecond = loadStats.ReadBytesPerSecond
hotRepls[i].CPUTimePerSecond = loadStats.RaftCPUNanosPerSecond + loadStats.RequestCPUNanosPerSecond
}
return hotRepls
}
1 change: 1 addition & 0 deletions pkg/server/BUILD.bazel
@@ -487,6 +487,7 @@ go_test(
"//pkg/util/encoding",
"//pkg/util/envutil",
"//pkg/util/grpcutil",
"//pkg/util/grunning",
"//pkg/util/hlc",
"//pkg/util/httputil",
"//pkg/util/humanizeutil",
50 changes: 30 additions & 20 deletions pkg/server/api_v2_ranges.go
@@ -440,16 +440,21 @@ type hotRangesResponse struct {
//
// swagger:model hotRangeInfo
type hotRangeInfo struct {
- RangeID roachpb.RangeID `json:"range_id"`
- NodeID roachpb.NodeID `json:"node_id"`
- QPS float64 `json:"qps"`
- LeaseholderNodeID roachpb.NodeID `json:"leaseholder_node_id"`
- TableName string `json:"table_name"`
- DatabaseName string `json:"database_name"`
- IndexName string `json:"index_name"`
- SchemaName string `json:"schema_name"`
- ReplicaNodeIDs []roachpb.NodeID `json:"replica_node_ids"`
- StoreID roachpb.StoreID `json:"store_id"`
+ RangeID roachpb.RangeID `json:"range_id"`
+ NodeID roachpb.NodeID `json:"node_id"`
+ QPS float64 `json:"qps"`
+ WritesPerSecond float64 `json:"writes_per_second"`
+ ReadsPerSecond float64 `json:"reads_per_second"`
+ WriteBytesPerSecond float64 `json:"write_bytes_per_second"`
+ ReadBytesPerSecond float64 `json:"read_bytes_per_second"`
+ CPUTimePerSecond float64 `json:"cpu_time_per_second"`
+ LeaseholderNodeID roachpb.NodeID `json:"leaseholder_node_id"`
+ TableName string `json:"table_name"`
+ DatabaseName string `json:"database_name"`
+ IndexName string `json:"index_name"`
+ SchemaName string `json:"schema_name"`
+ ReplicaNodeIDs []roachpb.NodeID `json:"replica_node_ids"`
+ StoreID roachpb.StoreID `json:"store_id"`
}

// swagger:operation GET /ranges/hot/ listHotRanges
@@ -522,16 +527,21 @@ func (a *apiV2Server) listHotRanges(w http.ResponseWriter, r *http.Request) {
var hotRangeInfos = make([]hotRangeInfo, len(resp.Ranges))
for i, r := range resp.Ranges {
hotRangeInfos[i] = hotRangeInfo{
- RangeID: r.RangeID,
- NodeID: r.NodeID,
- QPS: r.QPS,
- LeaseholderNodeID: r.LeaseholderNodeID,
- TableName: r.TableName,
- DatabaseName: r.DatabaseName,
- IndexName: r.IndexName,
- ReplicaNodeIDs: r.ReplicaNodeIds,
- SchemaName: r.SchemaName,
- StoreID: r.StoreID,
+ RangeID: r.RangeID,
+ NodeID: r.NodeID,
+ QPS: r.QPS,
+ WritesPerSecond: r.WritesPerSecond,
+ ReadsPerSecond: r.ReadsPerSecond,
+ WriteBytesPerSecond: r.WriteBytesPerSecond,
+ ReadBytesPerSecond: r.ReadBytesPerSecond,
+ CPUTimePerSecond: r.CPUTimePerSecond,
+ LeaseholderNodeID: r.LeaseholderNodeID,
+ TableName: r.TableName,
+ DatabaseName: r.DatabaseName,
+ IndexName: r.IndexName,
+ ReplicaNodeIDs: r.ReplicaNodeIds,
+ SchemaName: r.SchemaName,
+ StoreID: r.StoreID,
}
}
return hotRangeInfos, nil
20 changes: 20 additions & 0 deletions pkg/server/serverpb/status.proto
@@ -414,6 +414,9 @@ message RangeStatistics {
// Reads (bytes) per second is the number of bytes read from this range per
// second, averaged over the last 30 minute period.
double read_bytes_per_second = 6;
// CPU time (ns) per second is the cpu usage of this range per second,
// averaged over the last 30 minute period.
double cpu_time_per_second = 7 [(gogoproto.customname) = "CPUTimePerSecond"];
}

message PrettySpan {
@@ -1340,6 +1343,8 @@ message HotRangesResponse {
// Read bytes per second is the recent number of bytes read per second on
// this range.
double read_bytes_per_second = 8;
// CPU time per second is the recent cpu usage in nanoseconds of this range.
double cpu_time_per_second = 9 [(gogoproto.customname) = "CPUTimePerSecond"];
}

// StoreResponse contains the part of a hot ranges report that
@@ -1432,6 +1437,21 @@ message HotRangesResponseV2 {
(gogoproto.casttype) =
"github.com/cockroachdb/cockroach/pkg/roachpb.StoreID"
];
// writes_per_second is the recent number of keys written per second on
// this range.
double writes_per_second = 11;
// reads_per_second is the recent number of keys read per second on
// this range.
double reads_per_second = 12;
// write_bytes_per_second is the recent number of bytes written per second
// on this range.
double write_bytes_per_second = 13;
// read_bytes_per_second is the recent number of bytes read per second on
// this range.
double read_bytes_per_second = 14;
// CPU time (ns) per second is the recent cpu usage per second on this
// range.
double cpu_time_per_second = 15 [(gogoproto.customname) = "CPUTimePerSecond"];
}
// Ranges contain list of hot ranges info that has highest number of QPS.
repeated HotRange ranges = 1;
27 changes: 17 additions & 10 deletions pkg/server/status.go
@@ -2096,6 +2096,7 @@ func (s *systemStatusServer) rangesHelper(
ReadsPerSecond: loadStats.ReadKeysPerSecond,
WriteBytesPerSecond: loadStats.WriteBytesPerSecond,
ReadBytesPerSecond: loadStats.ReadBytesPerSecond,
CPUTimePerSecond: loadStats.RaftCPUNanosPerSecond + loadStats.RequestCPUNanosPerSecond,
},
Problems: serverpb.RangeProblems{
Unavailable: metrics.Unavailable,
@@ -2538,16 +2539,21 @@ func (s *systemStatusServer) HotRangesV2(
}

ranges = append(ranges, &serverpb.HotRangesResponseV2_HotRange{
- RangeID: r.Desc.RangeID,
- NodeID: requestedNodeID,
- QPS: r.QueriesPerSecond,
- TableName: tableName,
- SchemaName: schemaName,
- DatabaseName: dbName,
- IndexName: indexName,
- ReplicaNodeIds: replicaNodeIDs,
- LeaseholderNodeID: r.LeaseholderNodeID,
- StoreID: store.StoreID,
+ RangeID: r.Desc.RangeID,
+ NodeID: requestedNodeID,
+ QPS: r.QueriesPerSecond,
+ WritesPerSecond: r.WritesPerSecond,
+ ReadsPerSecond: r.ReadsPerSecond,
+ WriteBytesPerSecond: r.WriteBytesPerSecond,
+ ReadBytesPerSecond: r.ReadBytesPerSecond,
+ CPUTimePerSecond: r.CPUTimePerSecond,
+ TableName: tableName,
+ SchemaName: schemaName,
+ DatabaseName: dbName,
+ IndexName: indexName,
+ ReplicaNodeIds: replicaNodeIDs,
+ LeaseholderNodeID: r.LeaseholderNodeID,
+ StoreID: store.StoreID,
})
}
}
@@ -2641,6 +2647,7 @@ func (s *systemStatusServer) localHotRanges(
storeResp.HotRanges[i].ReadsPerSecond = r.ReadKeysPerSecond
storeResp.HotRanges[i].WriteBytesPerSecond = r.WriteBytesPerSecond
storeResp.HotRanges[i].ReadBytesPerSecond = r.ReadBytesPerSecond
storeResp.HotRanges[i].CPUTimePerSecond = r.CPUTimePerSecond
}
resp.Stores = append(resp.Stores, storeResp)
return nil
24 changes: 24 additions & 0 deletions pkg/server/status_test.go
@@ -59,6 +59,7 @@ import (
"github.com/cockroachdb/cockroach/pkg/testutils/sqlutils"
"github.com/cockroachdb/cockroach/pkg/ts"
"github.com/cockroachdb/cockroach/pkg/util"
"github.com/cockroachdb/cockroach/pkg/util/grunning"
"github.com/cockroachdb/cockroach/pkg/util/httputil"
"github.com/cockroachdb/cockroach/pkg/util/leaktest"
"github.com/cockroachdb/cockroach/pkg/util/log"
@@ -1054,6 +1055,18 @@ func TestHotRangesResponse(t *testing.T) {
if r.Desc.RangeID == 0 || (len(r.Desc.StartKey) == 0 && len(r.Desc.EndKey) == 0) {
t.Errorf("unexpected empty/unpopulated range descriptor: %+v", r.Desc)
}
if r.QueriesPerSecond > 0 {
if r.ReadsPerSecond == 0 && r.WritesPerSecond == 0 {
t.Errorf("qps %.2f > 0, expected either reads=%.2f or writes=%.2f to be non-zero",
r.QueriesPerSecond, r.ReadsPerSecond, r.WritesPerSecond)
}
// If the architecture doesn't support sampling CPU, it
// will also be zero.
if grunning.Supported() && r.CPUTimePerSecond == 0 {
t.Errorf("qps %.2f > 0, expected cpu=%.2f to be non-zero",
r.QueriesPerSecond, r.CPUTimePerSecond)
}
}
if r.QueriesPerSecond > lastQPS {
t.Errorf("unexpected increase in qps between ranges; prev=%.2f, current=%.2f, desc=%v",
lastQPS, r.QueriesPerSecond, r.Desc)
@@ -1083,6 +1096,17 @@ func TestHotRanges2Response(t *testing.T) {
if r.RangeID == 0 {
t.Errorf("unexpected empty range id: %d", r.RangeID)
}
if r.QPS > 0 {
if r.ReadsPerSecond == 0 && r.WritesPerSecond == 0 {
t.Errorf("qps %.2f > 0, expected either reads=%.2f or writes=%.2f to be non-zero",
r.QPS, r.ReadsPerSecond, r.WritesPerSecond)
}
// If the architecture doesn't support sampling CPU, it
// will also be zero.
if grunning.Supported() && r.CPUTimePerSecond == 0 {
t.Errorf("qps %.2f > 0, expected cpu=%.2f to be non-zero", r.QPS, r.CPUTimePerSecond)
}
}
if r.QPS > lastQPS {
t.Errorf("unexpected increase in qps between ranges; prev=%.2f, current=%.2f", lastQPS, r.QPS)
}