doc: add description in subsystem latency panels (#3017)
* doc: add description in subsystem latency panels
rahulguptajss committed Jun 26, 2024
1 parent 779a6e7 commit 2eec9a7
Showing 4 changed files with 80 additions and 52 deletions.
15 changes: 14 additions & 1 deletion cmd/tools/generate/counter.yaml
@@ -445,7 +445,20 @@ counters:
Unit: b_per_sec

- Name: qos_detail_resource_latency
Description: This refers to the average latency for workload within the subsystems of the Data ONTAP. These subsystems are the various modules or components within the system that could contribute to delays or latency during data or task processing. The calculated latency includes both the processing time within the subsystem and the waiting time at that subsystem.
Description: |
This refers to the average latency for workloads within the subsystems of Data ONTAP. These subsystems are the various modules or components within the system that could contribute to delays or latency during data or task processing. The calculated latency includes both the processing time within the subsystem and the waiting time at that subsystem. Below is the description of subsystems' latency.
* **frontend**: Represents the delays in the network layer of ONTAP.
* **backend**: Represents the delays in the data/WAFL layer of ONTAP.
* **cluster**: Represents delays caused by the cluster switches, cables, and adapters which physically connect clustered nodes. If the cluster interconnect component is in contention, it means high wait time for I/O requests at the cluster interconnect is impacting the latency of one or more workloads.
* **cp**: Represents delays due to buffered write flushes, called consistency points (cp).
* **disk**: Represents slowness due to attached hard drives or solid state drives.
* **network**: `Note:` Typically these latencies only apply to SAN not NAS. Represents the wait time of I/O requests by the external networking protocols on the cluster. The wait time is time spent waiting for transfer ready transactions to finish before the cluster can respond to an I/O request. If the network component is in contention, it means high wait time at the protocol layer is impacting the latency of one or more workloads.
* **nvlog**: Represents delays due to mirroring writes to the NVRAM/NVLOG memory and to the HA partner NVRAM/NVLOG memory.
* **suspend**: Represents delays due to operations suspending on a delay mechanism. Typically this is diagnosed by NetApp Support.
* **throttle**: Represents the throughput maximum (ceiling) setting of the storage Quality of Service (QoS) policy group assigned to the workload. If the policy group component is in contention, it means all workloads in the policy group are being throttled by the set throughput limit, which is impacting the latency of one or more of those workloads.
* **qos_min**: Represents the latency to a workload that is being caused by the QoS throughput floor (expected) setting assigned to other workloads. If the QoS floor set on certain workloads uses the majority of the bandwidth to guarantee the promised throughput, other workloads will be throttled and see more latency.
* **cloud**: Represents the software component in the cluster involved with I/O processing between the cluster and the cloud tier on which user data is stored. If the cloud latency component is in contention, it means that a large amount of reads from volumes that are hosted on the cloud tier are impacting the latency of one or more workloads.
APIs:
- API: REST
Endpoint: api/cluster/counter/tables/qos_detail
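
The subsystem list in the description above lends itself to a simple breakdown: given one average latency value per subsystem, rank them to see which component contributes most. The following is a minimal sketch under stated assumptions, not part of this commit. The sample values, the microsecond unit, and the idea of expressing each subsystem as a share of the summed values are all hypothetical; only the subsystem names come from the description.

```python
# Subsystem names taken from the qos_detail_resource_latency description.
SUBSYSTEMS = (
    "frontend", "backend", "cluster", "cp", "disk", "network",
    "nvlog", "suspend", "throttle", "qos_min", "cloud",
)


def latency_breakdown(samples):
    """Return (subsystem, latency, share) tuples, largest latency first.

    `samples` maps a subsystem name to its average latency. The share is the
    fraction of the summed subsystem latencies (an assumption made for this
    illustration, not a definition from ONTAP).
    """
    unknown = set(samples) - set(SUBSYSTEMS)
    if unknown:
        raise ValueError(f"unknown subsystem(s): {sorted(unknown)}")
    total = sum(samples.values())
    if total <= 0:
        return []
    ranked = sorted(samples.items(), key=lambda kv: kv[1], reverse=True)
    return [(name, value, value / total) for name, value in ranked]


# Hypothetical values (assumed to be microseconds) for a single workload.
example = {"frontend": 50.0, "backend": 100.0, "disk": 250.0, "throttle": 600.0}

for name, value, share in latency_breakdown(example):
    print(f"{name:<10}{value:>8.1f} us  {share:6.1%}")
```

With these made-up numbers, `throttle` ranks first, which per the description would point at the QoS policy group ceiling rather than at disks or the network.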
17 changes: 15 additions & 2 deletions docs/ontap-metrics.md
@@ -7,7 +7,7 @@ These can be generated on demand by running `bin/harvest grafana metrics`. See
- More information about ONTAP REST performance counters can be found [here](https://docs.netapp.com/us-en/ontap-pcmap-9121/index.html).

```
Creation Date : 2024-Jun-24
Creation Date : 2024-Jun-26
ONTAP Version: 9.15.1
```
## Understanding the structure
@@ -9434,7 +9434,20 @@ This is the average number of concurrent requests for the workload.

### qos_detail_resource_latency

This refers to the average latency for workload within the subsystems of the Data ONTAP. These subsystems are the various modules or components within the system that could contribute to delays or latency during data or task processing. The calculated latency includes both the processing time within the subsystem and the waiting time at that subsystem.
This refers to the average latency for workloads within the subsystems of Data ONTAP. These subsystems are the various modules or components within the system that could contribute to delays or latency during data or task processing. The calculated latency includes both the processing time within the subsystem and the waiting time at that subsystem. Below is the description of subsystems' latency.

* **frontend**: Represents the delays in the network layer of ONTAP.
* **backend**: Represents the delays in the data/WAFL layer of ONTAP.
* **cluster**: Represents delays caused by the cluster switches, cables, and adapters which physically connect clustered nodes. If the cluster interconnect component is in contention, it means high wait time for I/O requests at the cluster interconnect is impacting the latency of one or more workloads.
* **cp**: Represents delays due to buffered write flushes, called consistency points (cp).
* **disk**: Represents slowness due to attached hard drives or solid state drives.
* **network**: `Note:` Typically these latencies only apply to SAN not NAS. Represents the wait time of I/O requests by the external networking protocols on the cluster. The wait time is time spent waiting for transfer ready transactions to finish before the cluster can respond to an I/O request. If the network component is in contention, it means high wait time at the protocol layer is impacting the latency of one or more workloads.
* **nvlog**: Represents delays due to mirroring writes to the NVRAM/NVLOG memory and to the HA partner NVRAM/NVLOG memory.
* **suspend**: Represents delays due to operations suspending on a delay mechanism. Typically this is diagnosed by NetApp Support.
* **throttle**: Represents the throughput maximum (ceiling) setting of the storage Quality of Service (QoS) policy group assigned to the workload. If the policy group component is in contention, it means all workloads in the policy group are being throttled by the set throughput limit, which is impacting the latency of one or more of those workloads.
* **qos_min**: Represents the latency to a workload that is being caused by the QoS throughput floor (expected) setting assigned to other workloads. If the QoS floor set on certain workloads uses the majority of the bandwidth to guarantee the promised throughput, other workloads will be throttled and see more latency.
* **cloud**: Represents the software component in the cluster involved with I/O processing between the cluster and the cloud tier on which user data is stored. If the cloud latency component is in contention, it means that a large amount of reads from volumes that are hosted on the cloud tier are impacting the latency of one or more workloads.


| API | Endpoint | Metric | Template |
|--------|----------|--------|---------|
50 changes: 26 additions & 24 deletions grafana/dashboards/cmode-details/volumeDeepDive.json
@@ -66,11 +66,12 @@
}
]
},
"description": "",
"editable": true,
"gnetId": null,
"graphTooltip": 0,
"id": null,
"iteration": 1711032845860,
"iteration": 1719377699253,
"links": [],
"panels": [
{
@@ -1783,7 +1784,7 @@
"h": 5,
"w": 24,
"x": 0,
"y": 3
"y": 43
},
"id": 44,
"options": {
@@ -1796,7 +1797,7 @@
},
{
"datasource": "${DS_PROMETHEUS}",
"description": "average latency for workload on Data ONTAP subsystems.",
"description": "`Note:` Typically these latencies only apply to SAN not NAS.\n\nRepresents the wait time of I/O requests by the external networking protocols on the cluster. The wait time is time spent waiting for transfer ready transactions to finish before the cluster can respond to an I/O request. If the network component is in contention, it means high wait time at the protocol layer is impacting the latency of one or more workloads.",
"fieldConfig": {
"defaults": {
"color": {
@@ -1852,7 +1853,7 @@
"h": 11,
"w": 8,
"x": 0,
"y": 8
"y": 48
},
"id": 46,
"options": {
@@ -1886,7 +1887,7 @@
},
{
"datasource": "${DS_PROMETHEUS}",
"description": "average latency for workload on Data ONTAP subsystems.",
"description": "Represents the throughput maximum (ceiling) setting of the storage Quality of Service (QoS) policy group assigned to the workload. If the policy group component is in contention, it means all workloads in the policy group are being throttled by the set throughput limit, which is impacting the latency of one or more of those workloads.",
"fieldConfig": {
"defaults": {
"color": {
@@ -1943,7 +1944,7 @@
"h": 11,
"w": 8,
"x": 8,
"y": 8
"y": 48
},
"id": 48,
"options": {
@@ -2034,7 +2035,7 @@
"h": 11,
"w": 8,
"x": 16,
"y": 8
"y": 48
},
"id": 50,
"options": {
@@ -2068,7 +2069,7 @@
},
{
"datasource": "${DS_PROMETHEUS}",
"description": "average latency for workload on Data ONTAP subsystems.",
"description": "Represents delays caused by the cluster switches, cables, and adapters which physically connect clustered nodes. \n\nIf the cluster interconnect component is in contention, it means high wait time for I/O requests at the cluster interconnect is impacting the latency of one or more workloads.",
"fieldConfig": {
"defaults": {
"color": {
@@ -2124,7 +2125,7 @@
"h": 11,
"w": 8,
"x": 0,
"y": 19
"y": 59
},
"id": 52,
"options": {
@@ -2158,7 +2159,7 @@
},
{
"datasource": "${DS_PROMETHEUS}",
"description": "average latency for workload on Data ONTAP subsystems.",
"description": "Represents the delays in the data/WAFL layer of ONTAP.",
"fieldConfig": {
"defaults": {
"color": {
@@ -2214,7 +2215,7 @@
"h": 11,
"w": 8,
"x": 8,
"y": 19
"y": 59
},
"id": 54,
"options": {
@@ -2248,7 +2249,7 @@
},
{
"datasource": "${DS_PROMETHEUS}",
"description": "average latency for workload on Data ONTAP subsystems.",
"description": "Represents delays due to buffered write flushes, called consistency points (cp).",
"fieldConfig": {
"defaults": {
"color": {
@@ -2305,7 +2306,7 @@
"h": 11,
"w": 8,
"x": 16,
"y": 19
"y": 59
},
"id": 56,
"options": {
@@ -2339,7 +2340,7 @@
},
{
"datasource": "${DS_PROMETHEUS}",
"description": "average latency for workload on Data ONTAP subsystems.",
"description": "Represents delays due to operations suspending on a delay mechanism. Typically this is diagnosed by NetApp Support.",
"fieldConfig": {
"defaults": {
"color": {
@@ -2395,7 +2396,7 @@
"h": 11,
"w": 8,
"x": 0,
"y": 30
"y": 70
},
"id": 58,
"options": {
@@ -2430,7 +2431,7 @@
},
{
"datasource": "${DS_PROMETHEUS}",
"description": "average latency for workload on Data ONTAP subsystems.",
"description": "Represents the software component in the cluster involved with I/O processing between the cluster and the cloud tier on which user data is stored. If the cloud latency component is in contention, it means that a large amount of reads from volumes that are hosted on the cloud tier are impacting the latency of one or more workloads.",
"fieldConfig": {
"defaults": {
"color": {
@@ -2487,7 +2488,7 @@
"h": 11,
"w": 8,
"x": 8,
"y": 30
"y": 70
},
"id": 60,
"options": {
@@ -2521,7 +2522,7 @@
},
{
"datasource": "${DS_PROMETHEUS}",
"description": "average latency for workload on Data ONTAP subsystems.",
"description": "Represents the delays in the network layer of ONTAP.",
"fieldConfig": {
"defaults": {
"color": {
@@ -2577,7 +2578,7 @@
"h": 11,
"w": 8,
"x": 16,
"y": 30
"y": 70
},
"id": 62,
"options": {
@@ -2611,7 +2612,7 @@
},
{
"datasource": "${DS_PROMETHEUS}",
"description": "average latency for workload on Data ONTAP subsystems.",
"description": "Represents delays due to mirroring writes to the NVRAM/NVLOG memory and to the HA partner NVRAM/NVLOG memory.",
"fieldConfig": {
"defaults": {
"color": {
@@ -2667,7 +2668,7 @@
"h": 11,
"w": 8,
"x": 0,
"y": 41
"y": 81
},
"id": 64,
"options": {
@@ -2701,7 +2702,7 @@
},
{
"datasource": "${DS_PROMETHEUS}",
"description": "average latency for workload on Data ONTAP subsystems.",
"description": "Represents slowness due to attached hard drives or solid state drives.",
"fieldConfig": {
"defaults": {
"color": {
@@ -2757,7 +2758,7 @@
"h": 11,
"w": 8,
"x": 8,
"y": 41
"y": 81
},
"id": 66,
"options": {
@@ -3567,6 +3568,7 @@
"type": "row"
}
],
"refresh": "",
"schemaVersion": 30,
"style": "dark",
"tags": [
@@ -3734,5 +3736,5 @@
"timezone": "",
"title": "ONTAP: Volume Deep Dive",
"uid": "cdot-volume-deep-dive",
"version": 2
"version": 3
}
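
The dashboard hunks above replace one generic description, repeated across the subsystem latency panels, with a per-panel explanation. A check along the following lines could flag panels that still carry an empty or generic description. This is a sketch under assumptions, not tooling from the Harvest repository: the function names are hypothetical, the inline sample dashboard (including its panel titles) is made up, and it assumes the common Grafana layout in which row panels may nest their own `panels` list.

```python
import json

# The generic text that this commit replaces in the latency panels.
GENERIC = "average latency for workload on Data ONTAP subsystems."


def iter_panels(panels):
    """Yield every panel, descending into panels nested under rows."""
    for panel in panels:
        yield panel
        yield from iter_panels(panel.get("panels", []))


def panels_needing_description(dashboard, generic=GENERIC):
    """Return (id, title) for non-row panels with an empty or generic description."""
    flagged = []
    for panel in iter_panels(dashboard.get("panels", [])):
        if panel.get("type") == "row":
            continue
        description = (panel.get("description") or "").strip()
        if not description or description == generic:
            flagged.append((panel.get("id"), panel.get("title")))
    return flagged


# Made-up miniature dashboard so the sketch runs without the real file.
sample = json.loads("""
{
  "panels": [
    {"id": 44, "type": "row", "title": "Example row", "panels": [
      {"id": 46, "type": "timeseries", "title": "Panel A",
       "description": "average latency for workload on Data ONTAP subsystems."},
      {"id": 54, "type": "timeseries", "title": "Panel B",
       "description": "Represents the delays in the data/WAFL layer of ONTAP."}
    ]},
    {"id": 99, "type": "timeseries", "title": "Panel C"}
  ]
}
""")

for panel_id, title in panels_needing_description(sample):
    print(f"panel {panel_id} ({title}) needs a specific description")
```

To run it against a real dashboard, load the file with `json.load` and pass the resulting dictionary to `panels_needing_description`.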