Ceph-mixin: Fix CephNodeNetworkPacket alerts #47707

Merged
merged 1 commit into from Aug 30, 2022

Conversation

@bosc0 bosc0 commented Aug 19, 2022

Currently the CephNodeNetworkPacketDrops and CephNodeNetworkPacketErrors alert queries count the number of packet drops/errors per minute rather than per second, as stated in the alert description. This commit fixes that and makes the threshold values customizable, with slightly higher defaults.
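
To illustrate the per-minute versus per-second mismatch described above, here is a minimal Python sketch. The sample values and the helper names `window_increase` and `per_second_rate` are hypothetical, and the arithmetic is a simplification that ignores counter resets and the extrapolation Prometheus applies at window edges.

```python
# Hypothetical counter samples: (timestamp in seconds, cumulative dropped packets).
samples = [(0, 1000), (15, 1003), (30, 1006), (45, 1009), (60, 1012)]


def window_increase(points):
    """Growth of the counter across the whole window (a per-window count)."""
    return points[-1][1] - points[0][1]


def per_second_rate(points):
    """Average growth per second across the same window."""
    elapsed = points[-1][0] - points[0][0]
    return window_increase(points) / elapsed


drops_per_minute = window_increase(samples)
drops_per_second = per_second_rate(samples)

print(drops_per_minute, drops_per_second)
```

Over a one-minute window the two figures differ by a factor of 60, so a threshold written for drops per second would fire far too easily if compared against the per-window count.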

Contribution Guidelines

Checklist

  • Tracker (select at least one)
    • References tracker ticket
    • [] Very recent bug; references commit where it was introduced
    • New feature (ticket optional)
    • Doc update (no ticket needed)
    • Code cleanup (no ticket needed)
  • Component impact
    • Affects Dashboard, opened tracker ticket
    • Affects Orchestrator, opened tracker ticket
    • No impact that needs to be tracked
  • Documentation (select at least one)
    • Updates relevant documentation
    • No doc update is appropriate
  • Tests (select at least one)

@bosc0 bosc0 requested a review from a team as a code owner August 19, 2022 16:13
@bosc0 bosc0 requested review from aaSharma14 and nizamial09 and removed request for a team August 19, 2022 16:13
@bosc0 bosc0 changed the title Ceph-mixin: Fix CephNodeNetworkPacketDrops alert Ceph-mixin: Fix CephNodeNetworkPacket alerts Aug 19, 2022
@github-actions github-actions bot added this to In progress in Dashboard Aug 19, 2022
Dashboard automation moved this from In progress to Review in progress Aug 22, 2022
Member

@MrFreezeex MrFreezeex left a comment

Could you regenerate the yaml file with the alerts?

@bosc0 bosc0 force-pushed the fix_alert branch 2 times, most recently from 93e1197 to b712484 Compare August 23, 2022 07:09
Dashboard automation moved this from Review in progress to Reviewer approved Aug 23, 2022
@bosc0 bosc0 force-pushed the fix_alert branch 5 times, most recently from 6d6abcf to 03a1981 Compare August 23, 2022 13:18
Signed-off-by: Aswin Toni <aswin.toni@cern.ch>
@MrFreezeex
Member

jenkins test make check

Member

@epuertat epuertat left a comment

LGTM! Thanks @bosc0 !

@mmgaggle could you please review this and check if this would be consistent with your fix in #46842?

) / (
rate(node_network_receive_packets_total{device!="lo"}[1m]) +
rate(node_network_transmit_packets_total{device!="lo"}[1m])
) >= 0.0050000000000000001 and (
Member

floats, you have to love them! 🙈
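
The long literal in the generated rule looks like a floating-point rendering artifact rather than a different threshold. A small Python check can show that both spellings denote the same value. It assumes the generator prints doubles with 17 significant digits, which is an assumption and not something confirmed in this thread.

```python
from decimal import Decimal

threshold = 0.005

# The literal from the generated rule parses to the very same double as 0.005.
same_value = float("0.0050000000000000001") == threshold

# Printing with 17 significant digits (assumed generator behaviour)
# exposes the nearest representable double instead of the short form.
rendered = format(threshold, ".17g")

# Decimal(float) gives the exact binary value stored for 0.005,
# which sits slightly above the decimal 0.005.
exact = Decimal(threshold)

print(same_value)
print(rendered)
print(exact > Decimal("0.005"))
```

If the assumption holds, the generated YAML and a hand-written `0.005` compare identically in the alert expression.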

@epuertat epuertat requested a review from mmgaggle August 24, 2022 12:03
@aaSharma14
Contributor

jenkins test dashboard cephadm

@aaSharma14
Contributor

jenkins test dashboard

@aaSharma14
Contributor

jenkins test windows

@MrFreezeex
Member

I'm merging this because I want to be able to backport the recent ceph-mixin changes up to pacific (and there are a couple not tracked in the trackers, oops). I'm leaving CERN rather soon and probably won't have access to a dev environment for a few weeks to do that, but if @mmgaggle or someone else has a follow-up suggestion or PR, I would be happy to help.

@MrFreezeex MrFreezeex merged commit f744a93 into ceph:main Aug 30, 2022
9 of 13 checks passed
Dashboard automation moved this from Reviewer approved to Done Aug 30, 2022
4 participants