Disk status incorrectly reported #182

Closed
hashi825 opened this issue Jun 16, 2021 · 2 comments · Fixed by #183
Labels
bug Something isn't working status/done

Comments

@hashi825

Describe the bug
Disk status incorrectly reported

Environment
Provide accurate information about the environment to help us reproduce the issue.

  • Harvest version: harvest version 21.05.2-1 (commit ce091de) (build date 2021-06-14T20:31:09+0530) linux/amd64
  • Command line arguments used: [e.g. bin/harvest start --config=foo.yml --collectors Zapi]
  • OS: RHEL 7.9
  • Install method: yum
  • ONTAP Version: 9.7
  • Other:

To Reproduce
N/A

Expected behavior
disk_status metrics should return 0 for a failed disk and 1 for a healthy disk

Actual behavior
disk_status reports 1 for all disks regardless of state

Possible solution, workaround, fix
The outage information is not stored as a label, so the value mapping in /conf/zapi/cdot/9.8.0/disk.yaml has no label to match against and does not work:

- reason => outage

Adding ^ to include reason as a label corrects the status value mapping.
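
To illustrate why a missing label makes every disk report 1, here is a minimal sketch in Python rather than Harvest's own code. The function `disk_status`, the dictionary-based instances, the label value `failed`, and the default-to-1 rule are all assumptions made for illustration; Harvest's actual value-mapping logic may differ.

```python
def disk_status(labels, label="outage", failed_value="failed", default=1):
    """Map a label value to a numeric status.

    Hypothetical rule: return 0 when the label holds the failed value,
    otherwise fall back to the default.
    """
    if labels.get(label) == failed_value:
        return 0
    return default


# If the reason counter is not stored as a label, no instance carries
# `outage`, so every disk falls through to the default value.
without_label = {"disk": "1.0.1"}
with_label = {"disk": "1.0.1", "outage": "failed"}
healthy = {"disk": "1.0.2", "outage": ""}

print(disk_status(without_label))
print(disk_status(with_label))
print(disk_status(healthy))
```

In this sketch the failed disk only maps to 0 once the `outage` label is present on the instance, which matches the behavior described above.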

@cgrinds cgrinds added bug Something isn't working status/open and removed status/needs-triage labels Jun 16, 2021
cgrinds added a commit that referenced this issue Jun 16, 2021
Store reason as a label for disk.yaml so that disk status is correctly reported

Fixes #182
@cgrinds
Collaborator

cgrinds commented Jun 16, 2021

Thanks @hashi825 for the issue and fix

vgratian pushed a commit that referenced this issue Jun 21, 2021
deb/rpm harvest.example changes

Handle special characters in passwords

This change only addresses passwords in Pollers and Defaults. The bigger
refactor is to use HarvestConfig throughout the codebase, but that was too
big a change at the moment. That change touches a lot more code.

When that change is made, the code in conf.LoadConfig can be removed.

fix remaining merge

Enable GitHub code scanning

Remove extra fmt workflow action

Remove redundant Slack section and polish

Add Dev team to clabot

Add license check and GitHub action

add zerolog pretty print for console

InsecureSkipVerify with basicauth

Correct httpd logging pattern

Replace snake case with camel

Fix mistyped package

Shelf purges instances too soon

Fixes #75

update clabot

allow user-defined URL for the influxDB server

update conf tests, move allow_addrs_regex: not influxdb parameter

auth test cases

Change triage label

Replace CCLA.pdf with online link to CCLA

Remove CONTRIBUTING_CCLA.pdf

uniform structure of collector doc, add explanation about metric collection/calculation

add known issue on WSL

update toc

add rename example, remove tabs disliked by markdown

removed allow_addrs_regex, not a parameter

tab to space

tab to space

remove redundant TOC; spelling

typos in docs

support/hacks for workload objects

templates for 4 workload objects

re-add earlier removed disk counters

chrishenzie has signed the CCLA

Make vendored copy of dependencies

handle panic in collector

Allow insecure Grafana TLS connections

`harvest/grafana` should not rewrite https connections into http

Fixes #111

enable caller for zerolog

Remove buildmode=plugin

Add support for cluster simulator
WIP Implement Caddy style plugins for collectors
Fix go vet warnings in node.go

enable stacktrace during errors

InfluxDB exporter should pass url unchanged

Thanks to @steverweber for the suggestion
Fixes #63

Add unique prom ports and export type

checks to doctor

Prometheus dashboards don't load when exemplar = true

Fixes #96

Don't run harvest as root on RHEL/Deb

See also #122

Improve harvest start behavior

Two cases are improved here:
1) Harvest detects when there is a stale pidfile and correctly restarts the poller process. A stale pidfile is when the pidfile exists in `/var/run/harvest` but there is no running process associated with that pid.

2) Harvest no longer suggests killing an already running poller when you try to start it. This is a no-op.

Fixes #123
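
The stale-pidfile case described in this commit message can be sketched as follows. This is a Python illustration, not Harvest's implementation (which is not shown in this thread); the function names `pid_is_running` and `pidfile_is_stale` are hypothetical, the check is POSIX-only, and treating unreadable contents as stale is an assumption.

```python
import os


def pid_is_running(pid: int) -> bool:
    """Return True if a process with this pid exists (POSIX only)."""
    if pid <= 0:
        return False
    try:
        os.kill(pid, 0)  # signal 0 checks for existence without signalling
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True


def pidfile_is_stale(path: str) -> bool:
    """A pidfile is stale when it exists but its pid has no live process."""
    try:
        with open(path) as f:
            pid = int(f.read().strip())
    except FileNotFoundError:
        return False  # no pidfile, so nothing is stale
    except ValueError:
        return True  # unreadable contents are treated as stale in this sketch
    return not pid_is_running(pid)
```

A caller could remove the pidfile and start the poller when `pidfile_is_stale` returns True, and do nothing when the recorded process is still alive.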

stop renamed pollers

resolved comments for stop pollers in case of rename

Addressed review comments Fixes #20

Restore Zapiperf support workload changes

add missing tag for labels pseudometric

cache ZAPI counters to distinguish them from own metrics

Update needs triage label

rpb deb bugs Fixes #50 Fixes #129

Auth_style should not be redacted

Run workflows on release branch

Remove unused graphite_leaves

PrometheusPort should be int

Trim absolute file system paths

Add -trimpath to go build so errors and stacktraces print
with module path@version instead of this

{"level":"info","Poller":"infinity","collector":"ZapiPerf:WAFLAggr","caller":"/var/jenkins_home/workspace/BuildHarvestArtifacts/harvest/cmd/poller/collector/collector.go:318","time":"2021-06-11T13:40:03-04:00","message":"recovered from standby mode, back to normal schedule"}

correct ghost poll kill

Sridevi has signed CCLA

Update README.md

Added Upgrade steps to README file
Removed specific links in the Installation steps
Overall updated format

Polish README.md

Reduce redundant information
Make tar gz example copy pasteable

Fix panic in unix.go

When a poller in harvest.yml is changed while a unix collector is running it panics

Fixes #160

Remove pidfiles

- Improve poller detection by injecting IS_HARVEST into exec-ed process's
environment.
- Simplify management code and improve accuracy
- Remove /var/run logic from RPM and Deb

script to validate metrics at runtime

typo

update changelog

update support md

update readme

run ghost kill poller during harvest start

Store reason as a label for disk.yaml so that disk status is correctly reported

Fixes #182

check trailing newline needs to be done before splitlines

make sure stream trails with newline

label value can be empty

fix mistake in label regex

include empty keys, to make sure label set is consistent

fix export options, to avoid duplicate labels

properly parse boolean parameters

avoid metric name conflict

fix return value when nothing is scraped

drop using lib alias

typo in plugin params

Correcting Grafana Cluster Dashboard Typo plus other same typos

port range changes

resolved merge commits

port range review comments

Encapsulate port mapping

port range changes

Reduce the amount of time and attempts spinning for status checks

Makes a big difference on Mac when process is not found
Goes from 19.5 seconds to (not) start 27 pollers to
1.9 seconds

Add README on how to setup per poller systemd services.

Add generate systemd subcommand

check for duplicate metatags, since telegraf complains about this as well

ugly temporary solution against duplicate metatags

temporary fix to duplicate node labels, until fixed in Aggregator plugin

resolve conflicting names with system_node.yaml, to prevent label inconsistency

shelf dashboard: adding override option for shelf field

Node Dashboard Bugs
vgratian pushed a commit that referenced this issue Jun 22, 2021
* script to validate metrics at runtime

* typo

* check trailing newline needs to be done before splitlines

* make sure stream trails with newline

* label value can be empty

* fix mistake in label regex

* include empty keys, to make sure label set is consistent

* fix export options, to avoid duplicate labels

* properly parse boolean parameters

* avoid metric name conflict

* fix return value when nothing is scraped

* drop using lib alias

* typo in plugin params

* check for duplicate metatags, since telegraf complains about this as well

* ugly temporary solution against duplicate metatags

* temporary fix to duplicate node labels, until fixed in Aggregator plugin

* resolve conflicting names with system_node.yaml, to prevent label inconsistency

* harvest yml changes

Co-authored-by: rahulg2 <rahul.gupta@netapp.com>
@rahulguptajss rahulguptajss self-assigned this Aug 26, 2021
@rahulguptajss
Contributor

verified in 21.08
