KPI Request: Percent of Persistent Failures and Time Spent Fixing Failures #116

Closed
1 task done
chauhankaranraj opened this issue Feb 8, 2021 · 4 comments · Fixed by #122
Comments

@chauhankaranraj
Member

chauhankaranraj commented Feb 8, 2021

As an OpenShift manager, I would like to see how many tests fail consistently and how much time is devoted to fixing failing tests, so that I can track our engineering efficiency over time.

At the job level, this analysis of failures can help us understand the time and engineering resources spent on fixing issues (assuming more consecutive failures means more resources spent). A decrease in consecutive failures would suggest an improvement in the speed and efficiency of builds. Different jobs within the same dashboard can also be compared to evaluate resource allocation.

How to collect metric
The TestGrid platform records the result of each test run as one of the values specified in this doc, along with the timestamp at which each test was run. The metrics relevant to this issue can be calculated by finding runs of consecutive cells whose value is “12” (the failing status) and looking at the corresponding timestamps.
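
A minimal sketch of that idea, assuming the statuses for one test have already been unrolled into a flat list with a parallel list of timestamps. The function name, the `min_length` threshold, and the sample data are hypothetical and only illustrate the approach; this is not the function from the existing notebook.

```python
from itertools import groupby

FAIL = 12  # status value this issue treats as a failing cell


def consecutive_failure_runs(statuses, timestamps, min_length=2):
    """Return (run_length, first_ts, last_ts) for each run of FAIL cells.

    `statuses` and `timestamps` are parallel lists, one entry per test run.
    Runs shorter than `min_length` are ignored.
    """
    runs = []
    index = 0
    for value, group in groupby(statuses):
        length = len(list(group))
        if value == FAIL and length >= min_length:
            span = timestamps[index:index + length]
            # min/max so the result does not depend on column ordering
            runs.append((length, min(span), max(span)))
        index += length
    return runs


# Made-up example data: 1 stands for a passing cell, 12 for a failing one.
statuses = [1, 12, 12, 12, 1, 12, 1, 12, 12]
timestamps = [100, 200, 300, 400, 500, 600, 700, 800, 900]
print(consecutive_failure_runs(statuses, timestamps))
```

The difference between the last and first timestamp of a run can then serve as a rough proxy for the time spent fixing that failure.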

Acceptance criteria

  • A small notebook in notebooks/data-sources/TestGrid/metrics/ that collects this metric and stores it in Ceph as a parquet file.
@MichaelClifford
Member

This notebook and the existing function for finding consecutive failures might be helpful for this metric.


@chauhankaranraj
Member Author

> This notebook and the existing function for finding consecutive failures might be helpful for this metric.

Awesome, then I think we could just expand this function to calculate metrics like the average length (# cells), time difference, percent occurrence, etc :)
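
A rough sketch of what those summary metrics could look like, assuming each failure run has already been reduced to a `(length, first_timestamp, last_timestamp)` tuple. That tuple layout, the function name, and the dictionary keys are all hypothetical, chosen only for illustration.

```python
from statistics import mean


def summarize_runs(runs, total_cells):
    """Summarize failure runs given as (length, first_ts, last_ts) tuples.

    `total_cells` is the total number of test runs considered, used to
    express how often cells fall inside a failure run.
    """
    if not runs:
        return {"avg_length": 0.0, "avg_duration": 0.0, "pct_cells": 0.0}
    lengths = [length for length, _, _ in runs]
    durations = [last - first for _, first, last in runs]
    return {
        "avg_length": mean(lengths),      # average run length in cells
        "avg_duration": mean(durations),  # average time between first and last failure
        "pct_cells": 100.0 * sum(lengths) / total_cells,  # percent occurrence
    }


# Made-up example: two failure runs out of ten cells in total.
runs = [(3, 200, 400), (2, 800, 900)]
summary = summarize_runs(runs, total_cells=10)
print(summary)
```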

@MichaelClifford MichaelClifford added this to New in AI-4-CI via automation Feb 10, 2021
@MichaelClifford MichaelClifford moved this from New to In Progress in AI-4-CI Feb 10, 2021
@chauhankaranraj
Member Author

> Awesome, then I think we could just expand this function to calculate metrics like the average length (# cells), time difference, percent occurrence, etc :)

Turns out for many of the metrics @Shreyanand and I wanted to calculate, we can get away without actually unrolling the status dict. So since this function requires unrolled input, I think it won't be used in the notebook 😞
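
For illustration, here is a sketch of one such metric computed straight from the compressed form. It assumes the status data is a run-length-encoded list of `{"count": n, "value": v}` entries where each entry is a maximal run (adjacent entries never share a value); the function name and the `min_length` threshold are hypothetical.

```python
FAIL = 12  # status value this issue treats as a failing cell


def persistent_failure_pct(rle_statuses, min_length=2):
    """Percent of failing cells that sit in runs of at least `min_length`.

    Works on the run-length-encoded statuses directly, so nothing is unrolled.
    """
    failing = 0
    persistent = 0
    for entry in rle_statuses:
        if entry["value"] == FAIL:
            failing += entry["count"]
            if entry["count"] >= min_length:
                persistent += entry["count"]
    return 100.0 * persistent / failing if failing else 0.0


# Made-up example: 4 passes, 3 failures in a row, 2 passes, 1 isolated failure.
rle = [
    {"count": 4, "value": 1},
    {"count": 3, "value": 12},
    {"count": 2, "value": 1},
    {"count": 1, "value": 12},
]
print(persistent_failure_pct(rle))
```

Since each entry already carries the run length, the consecutive-failure information is available without expanding the list.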

@MichaelClifford
Member

@chauhankaranraj that's great! Not needing to unroll the data is probably better wherever possible.

AI-4-CI automation moved this from In Progress to Done Feb 10, 2021