Mean Time to Restore prometheus rules can miss tickets in calculation #1127

Open
1 task done
etsauer opened this issue Mar 14, 2024 · 0 comments
Labels
kind/bug Categorizes issue or PR as related to a bug. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one.

etsauer commented Mar 14, 2024

OpenShift version

Not related to OpenShift

Problem description

Something is wrong with how the Prometheus rules calculate the time to restore for individual issues: some issues are skipped entirely. I'm not sure why.

Here's a readout from my Prometheus. Computing the time to restore directly from the raw timestamps returns five issues:

min by (issue_number, service) (min_over_time(failure_resolution_timestamp{app=~".*pelorus-api.*"}[2d] @ 1710475200)) - min by (issue_number, service) (min_over_time(failure_creation_timestamp{app=~".*pelorus-api.*"}[2d] @ 1710475200))

{issue_number="23", service="github-failure-exporter"} 525
{issue_number="24", service="github-failure-exporter"} 64822
{issue_number="25", service="github-failure-exporter"} 66518
{issue_number="29", service="github-failure-exporter"} 6315
{issue_number="22", service="github-failure-exporter"} 564

Reading the rule's output over the same window returns only three:

sdp:time_to_restore:by_issue{app=~".*pelorus-api.*"}[2d] @ 1710475200

sdp:time_to_restore:by_issue{app="/pelorus-api/", container="github-failure-exporter", endpoint="http", instance="10.129.0.25:8080", issue_number="29", job="github-failure-exporter", namespace="pelorus", pod="github-failure-exporter-1-lnssw", service="github-failure-exporter"}
6315 @1710428928.453
6315 @1710428958.453
6315 @1710428988.453
6315 @1710429018.453
6315 @1710429048.453
6315 @1710429078.453
6315 @1710429108.453
6315 @1710429138.453
6315 @1710429168.453
6315 @1710429198.453
 
sdp:time_to_restore:by_issue{app="/pelorus-api/", container="github-failure-exporter", endpoint="http", instance="10.129.0.32:8080", issue_number="22", job="github-failure-exporter", namespace="pelorus", pod="github-failure-exporter-1-lnssw", service="github-failure-exporter"}
564 @1710340278.453
564 @1710340308.453
564 @1710340338.453
564 @1710340368.453
564 @1710340398.453
564 @1710340428.453
564 @1710340458.453
564 @1710340488.453
564 @1710340518.453
 
sdp:time_to_restore:by_issue{app="/pelorus-api/", container="github-failure-exporter", endpoint="http", instance="10.129.0.32:8080", issue_number="23", job="github-failure-exporter", namespace="pelorus", pod="github-failure-exporter-1-lnssw", service="github-failure-exporter"}
525 @1710355968.453
525 @1710355998.453
525 @1710356028.453
525 @1710356058.453
525 @1710356088.453
525 @1710356118.453
525 @1710356148.453

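These two queries should return the same set of issues, but they do not. The direct calculation returns issues 22, 23, 24, 25 and 29, while sdp:time_to_restore:by_issue has samples only for 22, 23 and 29. The two missing issues, 24 and 25, are also the two with the longest time to restore (about 18 hours each).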
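
To make the gap explicit, the issue numbers from the two readouts can be diffed. The snippet below is only a sketch that hard-codes the values shown in this issue; the helper name missing_issues is hypothetical and not part of Pelorus.

```python
# Values copied from the readout above: issue_number -> time to restore (seconds).
direct_query = {"23": 525, "24": 64822, "25": 66518, "29": 6315, "22": 564}

# Issues for which sdp:time_to_restore:by_issue returned at least one sample.
rule_query = {"29": 6315, "22": 564, "23": 525}


def missing_issues(expected, actual):
    """Return issue numbers present in `expected` but absent from `actual`."""
    return sorted(set(expected) - set(actual), key=int)


missing = missing_issues(direct_query, rule_query)
print(missing)  # ['24', '25']

# Show how long each skipped issue took to restore, in hours.
for issue in missing:
    print(issue, round(direct_query[issue] / 3600, 1), "hours")
```

Where both queries return an issue, the values agree, so the rule seems to drop whole series rather than compute wrong numbers.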

Steps to reproduce

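  1. Install Pelorus with github-failure-exporter
  2. Open and close several GitHub issues, leaving some open for a few minutes and others for many hours
  3. Run the two queries above and compare the issue numbers they return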
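
The comparison can also be scripted against the Prometheus HTTP API. This is a rough sketch, not a tested reproduction: it assumes Prometheus is reachable at a base URL you supply (for example through a port-forward), and the function names run_query, issue_numbers and find_missing are hypothetical. The two queries are the ones from the readout above.

```python
import json
import urllib.parse
import urllib.request

SELECTOR = '{app=~".*pelorus-api.*"}'
WINDOW = "[2d] @ 1710475200"

# Direct calculation from the raw timestamps (first query in this issue).
DIRECT = (
    f"min by (issue_number, service) (min_over_time(failure_resolution_timestamp{SELECTOR}{WINDOW}))"
    f" - min by (issue_number, service) (min_over_time(failure_creation_timestamp{SELECTOR}{WINDOW}))"
)
# Output of the rule over the same window (second query in this issue).
RULE = f"sdp:time_to_restore:by_issue{SELECTOR}{WINDOW}"


def run_query(base_url, promql):
    """Evaluate `promql` through Prometheus's /api/v1/query endpoint."""
    params = urllib.parse.urlencode({"query": promql})
    with urllib.request.urlopen(f"{base_url}/api/v1/query?{params}") as resp:
        return json.load(resp)


def issue_numbers(response):
    """Collect issue_number labels from a vector or matrix query response."""
    return {
        series["metric"]["issue_number"]
        for series in response["data"]["result"]
        if "issue_number" in series["metric"]
    }


def find_missing(base_url):
    """Issues returned by the direct calculation but absent from the rule."""
    direct = issue_numbers(run_query(base_url, DIRECT))
    rule = issue_numbers(run_query(base_url, RULE))
    return sorted(direct - rule, key=int)
```

Calling find_missing("http://localhost:9090") (the address is an assumption) should return an empty list once the rule covers every issue.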

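Current behavior

sdp:time_to_restore:by_issue has no samples for some issues that have both a creation and a resolution timestamp (issues 24 and 25 in the readout above).

Expected behavior

sdp:time_to_restore:by_issue has a value for every issue that the direct calculation from failure_creation_timestamp and failure_resolution_timestamp returns.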

Code of Conduct

  • I agree to follow Pelorus's Code of Conduct