NVMe Namespace dashboard #1829

Closed
rahulguptajss opened this issue Mar 17, 2023 Discussed in #1823 · 15 comments · Fixed by #1850 or #1862

@rahulguptajss
Contributor

Discussed in #1823

Originally posted by ddhti March 16, 2023
Hi,

I'm using nabox 3.2 with harvest 23.02.0-1. I have NVMe LUNs (namespaces) in use but they are not under the LUNs dashboard and I can't find them elsewhere. There are NVMe stats under the SVM dashboard but that's it. ... Also, I don't see a storagegrid dashboard (but I haven't added my SG yet).

What step or config am I missing?

thank you.

@rahulguptajss rahulguptajss self-assigned this Mar 17, 2023
@rahulguptajss rahulguptajss added the 23.05, feature, and customer labels Mar 17, 2023
@rahulguptajss
Contributor Author

rahulguptajss commented Mar 17, 2023

@ddhti Harvest already exposes namespace metrics, as listed in our docs. As part of this issue, we'll add them to a dashboard. Let us know if these metrics cover your use case.

@ddhti

ddhti commented Mar 17, 2023

Those are a good start. I guess what I'm looking for are the same metrics you'd get on the ONTAP: LUN dashboard, where applicable. For example, it's good to have latency counters, but knowing where the latency is coming from is key to troubleshooting. Backend? Frontend? Throttle? I don't know which counters can be reported on, but I see Namespaces as LUNs for all practical purposes.

@rahulguptajss rahulguptajss changed the title NVMe dashboard NVMe Namespace dashboard Mar 20, 2023
@rahulguptajss
Contributor Author

@ddhti Sure, we'll check with ONTAP whether a latency breakdown is available for this object. I only see a latency breakdown for workloads.

@rahulguptajss
Contributor Author

@ddhti According to ONTAP's response, the latency breakdown at the delay center level is not available for the Namespace. To obtain this information, you will need to submit a feature request to ONTAP. However, Harvest does provide delay center level latency breakdown for workload/volume objects.

We'll create an NVMe Namespace dashboard with the available metrics.

@rahulguptajss
Contributor Author

@ddhti The NVMe dashboard is ready. It is available in the nightly builds.

image
image

@ddhti

ddhti commented Mar 24, 2023

Very cool! I'll grab it and let you know.

@ddhti

ddhti commented Mar 24, 2023

At first glance it looks like it has all the data I need. However, I have "No Data" in the Table drilldown, and in my case the labels in the performance drilldown are a bit unwieldy. I'm running 9.10.1P10, if that matters.

image

@rahulguptajss
Contributor Author

I believe your Harvest templates are not up to date or are still from an older Harvest version. Could you share the Harvest version shown in the NABox UI?

Make sure you have templates like the ones below in NABox. After you SSH into NABox, look in the folder /opt/harvest2-conf/.

https://github.com/NetApp/harvest/blob/main/conf/zapiperf/cdot/9.10.1/namespace.yaml#L8
https://github.com/NetApp/harvest/blob/main/conf/zapi/cdot/9.8.0/namespace.yaml
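
The check can also be scripted. The following is a minimal sketch, assuming that /opt/harvest2-conf/ mirrors the layout of the repository's conf/ directory (an assumption; adjust the base path if your install differs). The names `EXPECTED_TEMPLATES` and `missing_templates` are illustrative and not part of Harvest.

```python
from pathlib import Path

# Template paths relative to the Harvest conf directory, taken from the two
# links above. That NABox uses the same relative layout is an assumption.
EXPECTED_TEMPLATES = (
    "zapiperf/cdot/9.10.1/namespace.yaml",
    "zapi/cdot/9.8.0/namespace.yaml",
)


def missing_templates(conf_dir, expected=EXPECTED_TEMPLATES):
    """Return the expected template paths that are not regular files under conf_dir."""
    base = Path(conf_dir)
    return [rel for rel in expected if not (base / rel).is_file()]
```

Calling `missing_templates("/opt/harvest2-conf")` on the NABox host should return an empty list if both namespace templates are present. Anything it returns is a template to look for.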

@rahulguptajss
Contributor Author

Also please share NABox logs with us.

If you're using nabox, see log collection.

Email them to ng-harvest-files@netapp.com. This email address is accessible to NetApp Harvest employees only.

@ddhti

ddhti commented Mar 24, 2023

Hmm. Embarrassing but I refreshed the page after about an hour and everything came up correctly, including the tables.

@rahulguptajss
Contributor Author

Thanks for the update.

@ddhti

ddhti commented Mar 24, 2023

One more thing: I think (and hope) the units are off. The page displays hundreds of ms for every NVMe LUN, but I think it should be us. Compared with the volumes that host them, the values (units) seem off. When I dig into Prometheus for a particular volume and look at volume_read_latency, I get an absolute number such as 632.1234 ... that's graphed as 632us. But when I look at namespace_avg_read_latency, its absolute number is similar but graphed in ms (or seconds!).
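
To put a number on the suspected mismatch, here is a minimal sketch, assuming the namespace counter is reported in microseconds like the volume one (an assumption on my part, not something confirmed here). The helper names `in_microseconds` and `inflation` are made up for illustration; 632.1234 is the raw value quoted above.

```python
# Size of each unit expressed in microseconds.
UNIT_IN_US = {"us": 1, "ms": 1_000, "s": 1_000_000}


def in_microseconds(raw, unit):
    """Interpret a raw counter value as `unit` and convert it to microseconds."""
    return raw * UNIT_IN_US[unit]


def inflation(actual_unit, displayed_unit):
    """Factor by which a value looks larger when graphed in the wrong unit."""
    return UNIT_IN_US[displayed_unit] / UNIT_IN_US[actual_unit]


raw = 632.1234                        # raw sample quoted in this comment
as_us = in_microseconds(raw, "us")    # read as microseconds
as_ms = in_microseconds(raw, "ms")    # same raw number read as milliseconds
print(inflation("us", "ms"))          # scale error if us is graphed as ms
```

If that assumption holds, a panel that labels the raw number as ms overstates latency by a factor of 1000, and by a factor of a million if it labels it as seconds, which would explain the "hundreds of ms" readings.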

@rahulguptajss
Contributor Author

Thanks. Yes, you are right. We'll fix the units.

@rahulguptajss rahulguptajss linked a pull request Mar 24, 2023 that will close this issue
@ddhti

ddhti commented Mar 24, 2023

Awesome!! I was worried for my performance lol

@cgrinds
Collaborator

cgrinds commented May 1, 2023

Verified on 3ffa2eb
