Disk status incorrectly reported #182
cgrinds added the bug (Something isn't working) and status/open labels and removed the status/needs-triage label on Jun 16, 2021
cgrinds added a commit that referenced this issue on Jun 16, 2021:
Store reason as a label for disk.yaml so that disk status is correctly reported. Fixes #182
Thanks @hashi825 for the issue and fix
vgratian pushed a commit that referenced this issue on Jun 21, 2021:
deb/rpm harvest.example changes Handle special characters in passwords This change only addresses passwords in Pollers and Defaults. The bigger refactor is to use HarvestConfig through out the codebase, but that was too big a change at the moment. That change touches a lot more code. When that change is made, the code in conf.LoadConfig can be removed. fix remaining merge Enable GitHub code scanning Remove extra fmt workflow action Remove redundant Slack section and polish Add Dev team to clabot Add license check and GitHub action add zerolog pretty print for console InsecureSkipVerify with basicauth Correct httpd logging pattern Replace snake case with camel Fix mistyped package Shelf purges instances too soon Fixes #75 update clabot allow user-defined URL for the influxDB server update conf tests, move allow_addrs_regex: not influxdb parameter auth test cases Change triage label Replace CCLA.pdf with online link to CCLA Remove CONTRIBUTING_CCLA.pdf uniform structure of collector doc, add explanation about metric collection/calculation add known issue on WSL update toc add rename example, remove tabs disliked by markdown removed allow_addrs_regex, not a parameter tab to space tab to space remove redundant TOC; spelling typos in docs support/hacks for workload objects templates for 4 workload objects re-add earlier removed disk counters chrishenzie has signed the CCLA Make vendored copy of dependencies handle panic in collector Allow insecure Grafana TLS connections `harvest/grafana` should not rewrite https connections into http Fixes #111 enable caller for zerolog Remove buildmode=plugin Add support for cluster simulator WIP Implement Caddy style plugins for collectors Fix go vet warnings in node.go enable stacktrace during errors InfluxDB exporter should pass url unchanged Thanks to @steverweber for the suggestion Fixes #63 Add unique prom ports and export type checks to doctor Prometheus dashboards don't load when exemplar = true Fixes #96 Don't run harvest as 
root on RHEL/Deb See also #122 Improve harvest start behavior Two cases are improved here: 1) Harvest detects when there is a stale pidfile and correctly restarts the poller process. A stale pidfile is when the pidfile exists in `/var/run/harvest` but there is no running process associated with that pid. 1) Harvest no longer suggests killing an already running poller when you try to start it. This is a a no-op. Fixes #123 stop renamed pollers resolved comments for stop pollers in case of rename Addressed review comments Fixes #20 Restore Zapiperf support workload changes add missing tag for labels pseudometric cache ZAPI counters to distinct from own metircs Update needs triage label rpb deb bugs Fixes #50 Fixes #129 Auth_style should not be redacted Run workflows on release branch Remove unused graphite_leaves PrometheusPort should be int Trim absolute file system paths Add -trimpath to go build so errors and stacktraces print with module path@version instead of this {"level":"info","Poller":"infinity","collector":"ZapiPerf:WAFLAggr","caller":"/var/jenkins_home/workspace/BuildHarvestArtifacts/harvest/cmd/poller/collector/collector.go:318","time":"2021-06-11T13:40:03-04:00","message":"recovered from standby mode, back to normal schedule"} correct ghost poll kill Sridevi has signed CCLA Update README.md Added Upgrade steps to README file Removed specific links in the Installation steps Overall updated format Polish README.md Reduce redundant information Make tar gz example copy pasteable Fix panic in unix.go When a poller in harvest.yml is changed while a unix collector is running it panics Fixes #160 Remove pidfiles - Improve poller detection by injecting IS_HARVEST into exec-ed process's environment. 
- Simplify management code and improve accuracy - Remove /var/run logic from RPM and Deb script to validate metrics at runtime typo update changelog update support md update readme run ghost kill poller during harvest start Store reason as a label for disk.yaml so that disk status is correctly reported Fixes #182 check trailing newline needs to be done before splitlines make sure stream trails with newline label value can be empty fix mistake in label regex include empty keys, to make sure label set is consistent fix export options, to avoid duplicate labels properly parse boolean parameters avoid metric name conflict fix return value when nothing is scraped drop using lib alias typo in plugin params Correcting Grafana Cluster Dashboard Typo plus other same typos port range changes resolved merge commits port range review comments Encapsulate port mapping port range changes Reduce the amount of time and attempts spinning for status checks Makes a big difference on Mac when process is not found Goes from 19.5 seconds to (not) start 27 pollers to 1.9 seconds Add README on how to setup per poller systemd services. Add generate systemd subcommand check for duplicate metatags, since telegraf complains about this as well ugly temporary solution against duplicate metatags temporary fix to duplicate node labels, until fixed in Aggregator plugin resolve conflicting names with system_node.yaml, to prevent label inconsistency shelf dashboard: adding ovverride option for shelf field Node Dashboard Bugs
vgratian pushed a commit that referenced this issue on Jun 22, 2021:
* script to validate metrics at runtime * typo * check trailing newline needs to be done before splitlines * make sure stream trails with newline * label value can be empty * fix mistake in label regex * include empty keys, to make sure label set is consistent * fix export options, to avoid duplicate labels * properly parse boolean parameters * avoid metric name conflict * fix return value when nothing is scraped * drop using lib alias * typo in plugin params * check for duplicate metatags, since telegraf complains about this as well * ugly temporary solution against duplicate metatags * temporary fix to duplicate node labels, until fixed in Aggregator plugin * resolve conflicting names with system_node.yaml, to prevent label inconsistency * harvest yml changes deb/rpm harvest.example changes Handle special characters in passwords This change only addresses passwords in Pollers and Defaults. The bigger refactor is to use HarvestConfig through out the codebase, but that was too big a change at the moment. That change touches a lot more code. When that change is made, the code in conf.LoadConfig can be removed. 
fix remaining merge Enable GitHub code scanning Remove extra fmt workflow action Remove redundant Slack section and polish Add Dev team to clabot Add license check and GitHub action add zerolog pretty print for console InsecureSkipVerify with basicauth Correct httpd logging pattern Replace snake case with camel Fix mistyped package Shelf purges instances too soon Fixes #75 update clabot allow user-defined URL for the influxDB server update conf tests, move allow_addrs_regex: not influxdb parameter auth test cases Change triage label Replace CCLA.pdf with online link to CCLA Remove CONTRIBUTING_CCLA.pdf uniform structure of collector doc, add explanation about metric collection/calculation add known issue on WSL update toc add rename example, remove tabs disliked by markdown removed allow_addrs_regex, not a parameter tab to space tab to space remove redundant TOC; spelling typos in docs support/hacks for workload objects templates for 4 workload objects re-add earlier removed disk counters chrishenzie has signed the CCLA Make vendored copy of dependencies handle panic in collector Allow insecure Grafana TLS connections `harvest/grafana` should not rewrite https connections into http Fixes #111 enable caller for zerolog Remove buildmode=plugin Add support for cluster simulator WIP Implement Caddy style plugins for collectors Fix go vet warnings in node.go enable stacktrace during errors InfluxDB exporter should pass url unchanged Thanks to @steverweber for the suggestion Fixes #63 Add unique prom ports and export type checks to doctor Prometheus dashboards don't load when exemplar = true Fixes #96 Don't run harvest as root on RHEL/Deb See also #122 Improve harvest start behavior Two cases are improved here: 1) Harvest detects when there is a stale pidfile and correctly restarts the poller process. A stale pidfile is when the pidfile exists in `/var/run/harvest` but there is no running process associated with that pid. 
1) Harvest no longer suggests killing an already running poller when you try to start it. This is a a no-op. Fixes #123 stop renamed pollers resolved comments for stop pollers in case of rename Addressed review comments Fixes #20 Restore Zapiperf support workload changes add missing tag for labels pseudometric cache ZAPI counters to distinct from own metircs Update needs triage label rpb deb bugs Fixes #50 Fixes #129 Auth_style should not be redacted Run workflows on release branch Remove unused graphite_leaves PrometheusPort should be int Trim absolute file system paths Add -trimpath to go build so errors and stacktraces print with module path@version instead of this {"level":"info","Poller":"infinity","collector":"ZapiPerf:WAFLAggr","caller":"/var/jenkins_home/workspace/BuildHarvestArtifacts/harvest/cmd/poller/collector/collector.go:318","time":"2021-06-11T13:40:03-04:00","message":"recovered from standby mode, back to normal schedule"} correct ghost poll kill Sridevi has signed CCLA Update README.md Added Upgrade steps to README file Removed specific links in the Installation steps Overall updated format Polish README.md Reduce redundant information Make tar gz example copy pasteable Fix panic in unix.go When a poller in harvest.yml is changed while a unix collector is running it panics Fixes #160 Remove pidfiles - Improve poller detection by injecting IS_HARVEST into exec-ed process's environment. 
- Simplify management code and improve accuracy - Remove /var/run logic from RPM and Deb script to validate metrics at runtime typo update changelog update support md update readme run ghost kill poller during harvest start Store reason as a label for disk.yaml so that disk status is correctly reported Fixes #182 check trailing newline needs to be done before splitlines make sure stream trails with newline label value can be empty fix mistake in label regex include empty keys, to make sure label set is consistent fix export options, to avoid duplicate labels properly parse boolean parameters avoid metric name conflict fix return value when nothing is scraped drop using lib alias typo in plugin params Correcting Grafana Cluster Dashboard Typo plus other same typos port range changes resolved merge commits port range review comments Encapsulate port mapping port range changes Reduce the amount of time and attempts spinning for status checks Makes a big difference on Mac when process is not found Goes from 19.5 seconds to (not) start 27 pollers to 1.9 seconds Add README on how to setup per poller systemd services. Add generate systemd subcommand check for duplicate metatags, since telegraf complains about this as well ugly temporary solution against duplicate metatags temporary fix to duplicate node labels, until fixed in Aggregator plugin resolve conflicting names with system_node.yaml, to prevent label inconsistency shelf dashboard: adding ovverride option for shelf field Node Dashboard Bugs Co-authored-by: rahulg2 <rahul.gupta@netapp.com>
verified in 21.08
Describe the bug
Disk status incorrectly reported
Environment
Provide accurate information about the environment to help us reproduce the issue.
harvest version 21.05.2-1 (commit ce091de) (build date 2021-06-14T20:31:09+0530) linux/amd64
bin/harvest start --config=foo.yml --collectors Zapi
To Reproduce
N/A
Expected behavior
disk_status metrics should return 0 for a failed disk and 1 for a healthy disk
Actual behavior
disk_status reports 1 for all disks regardless of state
Possible solution, workaround, fix
The outage info is not included as a label, so the value mapping does not work in the file /conf/zapi/cdot/9.8.0/disk.yaml:
harvest/conf/zapi/cdot/9.8.0/disk.yaml
Line 32 in 667411a
Adding ^ to include the reason label corrects the status value mapping.
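To illustrate why a missing label makes every disk look healthy, here is a minimal Go sketch. It is not Harvest's actual code: `collect`, `statusFromLabels`, the disk name, and the `"failed"` value are hypothetical, and the mapping rule (empty reason means healthy) is an assumption based on the behavior described above.

```go
package main

import "fmt"

// collect keeps only the raw counters that the template exports as labels.
// Hypothetical stand-in for how a template selects labels.
func collect(raw map[string]string, exported []string) map[string]string {
	labels := make(map[string]string, len(exported))
	for _, name := range exported {
		if v, ok := raw[name]; ok {
			labels[name] = v
		}
	}
	return labels
}

// statusFromLabels is an assumed value mapping: an empty or missing label
// maps to 1 (healthy), any other value maps to 0 (failed).
func statusFromLabels(labels map[string]string, key string) int {
	if labels[key] == "" {
		return 1
	}
	return 0
}

func main() {
	// Raw data for a failed disk (values are made up for illustration).
	failedDisk := map[string]string{"disk": "1.0.3", "reason": "failed"}

	// Before the fix: reason is not exported, so the mapping never sees it.
	before := collect(failedDisk, []string{"disk"})
	// After the fix: reason is exported as a label.
	after := collect(failedDisk, []string{"disk", "reason"})

	fmt.Println(statusFromLabels(before, "reason")) // 1: failed disk reported as healthy
	fmt.Println(statusFromLabels(after, "reason"))  // 0: failed disk reported as failed
}
```

In this sketch the mapping itself is unchanged between the two calls; only the set of exported labels differs, which matches the one-character fix proposed above.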