Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Harvest should improve CI to detect missing metrics #2820

Closed
1 of 3 tasks
cgrinds opened this issue Apr 9, 2024 · 4 comments · Fixed by #2934 or #2945
Closed
1 of 3 tasks

Harvest should improve CI to detect missing metrics #2820

cgrinds opened this issue Apr 9, 2024 · 4 comments · Fixed by #2934 or #2945
Assignees
Labels

Comments

@cgrinds
Copy link
Collaborator

cgrinds commented Apr 9, 2024

Recent example #2819

  • Run the harvest REST/ZAPI diff CLI and analyze result
  • Fix parity
  • Add ZapiRest Diff Tool to CI
@cgrinds cgrinds added feature New feature or request 24.05 CI labels Apr 9, 2024
@rahulguptajss
Copy link
Contributor

Consider catching examples like #2816

@rahulguptajss
Copy link
Contributor

Below is the summary for Rest/Zapi gaps

Most of these are known.

Missing Metrics by Object in Prometheus

  • flexcache: [flexcache_reconciled_data_entries, flexcache_evict_skipped_reason_offline, flexcache_blocks_retrieved_from_origin, flexcache_reconciled_lock_entries, flexcache_nix_skipped_reason_in_progress, flexcache_nix_skipped_reason_disconnected, flexcache_nix_retry_skipped_reason_initiator_retrieve, flexcache_evict_skipped_reason_disconnected, flexcache_nix_skipped_reason_offline, flexcache_nix_skipped_reason_config_noent, flexcache_evict_rw_cache_skipped_reason_disconnected, flexcache_invalidate_skipped_reason_config_noent, flexcache_invalidate_skipped_reason_disconnected, flexcache_evict_skipped_reason_config_noent, flexcache_invalidate_skipped_reason_offline, flexcache_blocks_requested_from_client, flexcache_miss_percent]

Reason: Missing flexcache template in Rest

  • iscsi: [iscsi_lif_cmd_transfered]

Reason: Rest publishes with correct spelling iscsi_lif_cmd_transferred

  • rw: [rw_ctx_nfs_rewinds, rw_ctx_qos_flowcontrol, rw_ctx_nfs_giveups, rw_ctx_qos_rewinds, rw_ctx_cifs_giveups, rw_ctx_cifs_rewinds]

Reason: Missing template in Rest

  • qtree: [qtree_threshold]

Reason: Not available in Rest. The REST API provides space.soft_limit which has similar functionality.

  • wafl: [wafl_reads_from_pmem]

Reason: Not available in Rest

  • security: [security_audit_destination_port]

Reason: Not available in Rest

  • aggr: [aggr_hybrid_cache_size_total, aggr_snapshot_inode_used_percent, aggr_space_reserved]

Reason: Not available in Rest

  • node: [node_nvme_fc_data_sent, node_nvme_fc_ops, node_nvme_fc_data_recv]

Reason We are publishing these in Zapi and Rest as node_nvmf_data_sent,node_nvmf_ops and node_nvmf_data_recv. But these are old names. Should we use new nvme names?

  • quota: [quota_threshold]

Reason: Not available in Rest. The REST API provides space.soft_limit which has similar functionality.

Missing Metrics from Dashboard

  • aggr: [aggr_snapshot_inode_used_percent]
  • flexcache: [flexcache_reconciled_data_entries, flexcache_evict_skipped_reason_offline, flexcache_blocks_retrieved_from_origin, flexcache_reconciled_lock_entries, flexcache_nix_skipped_reason_in_progress, flexcache_nix_skipped_reason_disconnected, flexcache_nix_retry_skipped_reason_initiator_retrieve, flexcache_evict_skipped_reason_disconnected, flexcache_nix_skipped_reason_offline, flexcache_nix_skipped_reason_config_noent, flexcache_evict_rw_cache_skipped_reason_disconnected, flexcache_invalidate_skipped_reason_config_noent, flexcache_invalidate_skipped_reason_disconnected, flexcache_evict_skipped_reason_config_noent, flexcache_blocks_requested_from_client, flexcache_miss_percent]
  • rw: [rw_ctx_nfs_rewinds, rw_ctx_qos_flowcontrol, rw_ctx_nfs_giveups, rw_ctx_qos_rewinds, rw_ctx_cifs_giveups, rw_ctx_cifs_rewinds]

@cgrinds
Copy link
Collaborator Author

cgrinds commented May 23, 2024

When ZAPI and REST have different spellings or metric names. Pick one:

  • change REST templates to match ZAPI

  • change ZAPI to match REST, after consulting with community

  • Nvme name changes happen because of counter deprecation. For REST, choose the new names

  • We should add space.soft_limit to REST. Change plugin to publish same as ZAPI

  • Consider consolidating the two tools: bin/harvest generate metrics, bin/harvest doctor diffzapirest and address the issue identify in this issue that neither of the tools handle well today. If needed, encode the known issues into the tool(s) so we track these and don't have to revisit before release

@cgrinds cgrinds added the 24.08 label May 23, 2024
@rahulguptajss rahulguptajss linked a pull request May 29, 2024 that will close this issue
@rahulguptajss
Copy link
Contributor

Here are some additional improvements for CI that we discussed:

Currently, we run all smoke tests for RPM, native, and containers. We could limit the RPM to only the upgrade test, and restrict containers to test Harvest start and stop commands only. We could run the full smoke suite for only native.

@rahulguptajss rahulguptajss linked a pull request May 30, 2024 that will close this issue
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants