KPI Request: Percent of Persistent Failures and Time Spent Fixing Failures #116

chauhankaranraj · 2021-02-08T15:44:17Z

As an OpenShift manager, I would like to see how many tests fail consistently and how much time is devoted to fixing failing tests, so that I can track our engineering efficiency over time.

On a job level, this analysis of failures can help us understand the time and engineering resources spent on fixing issues (assuming more consecutive failures means more resources spent). A decrease in consecutive failures would suggest an improvement in the speed and efficiency of builds. Different jobs within the same dashboard can also be compared to evaluate resource allocation.

How to collect the metric
The TestGrid platform records the result of each test run as one of the values specified in this doc. It also stores the timestamp at which each test was run. The metrics relevant to this issue can be calculated by finding runs of consecutive cells whose value is “12” (the failure status) and looking at the corresponding timestamps.
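As a rough illustration of the idea above, here is a minimal sketch that finds runs of consecutive “12” cells for a single test and measures the time each run spans. The function name `failure_streaks`, the `min_run` threshold, and the sample data are hypothetical and not taken from the project's notebooks; it also assumes the statuses have already been expanded into one value per cell, aligned with a list of timestamps.

```python
from itertools import groupby

FAIL = 12  # status value the issue text associates with a failing cell


def failure_streaks(statuses, timestamps, min_run=2):
    """Return (run_length, time_spanned) for each run of consecutive failures.

    `statuses` and `timestamps` are parallel lists, one entry per cell.
    `min_run` is an assumed threshold for calling a failure "persistent".
    """
    streaks = []
    idx = 0
    for value, group in groupby(statuses):
        n = len(list(group))
        if value == FAIL and n >= min_run:
            # Time between the first and last cell of the run,
            # in whatever unit the timestamps use.
            span = abs(timestamps[idx] - timestamps[idx + n - 1])
            streaks.append((n, span))
        idx += n
    return streaks


# Made-up data: one cell per run, newest first.
statuses = [1, 12, 12, 12, 1, 12, 1]
timestamps = [700, 600, 500, 400, 300, 200, 100]

streaks = failure_streaks(statuses, timestamps)
percent_persistent = 100 * sum(n for n, _ in streaks) / len(statuses)
```

With this sample, only the three-cell run counts as persistent; the isolated failure is skipped because it is shorter than `min_run`.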

Acceptance criteria

A small notebook in notebooks/data-sources/TestGrid/metrics/ that collects this metric and stores it in Ceph as a Parquet file.


MichaelClifford · 2021-02-08T15:52:55Z

This notebook and existing function for finding consecutive failures might be helpful for this metric.

chauhankaranraj · 2021-02-08T16:15:27Z

> This notebook and existing function for finding consecutive failures might be helpful for this metric.

Awesome, then I think we could just expand this function to calculate metrics like the average length (# cells), time difference, percent occurrence, etc :)

chauhankaranraj · 2021-02-10T21:48:15Z

> Awesome, then I think we could just expand this function to calculate metrics like the average length (# cells), time difference, percent occurrence, etc :)

Turns out for many of the metrics @Shreyanand and I wanted to calculate, we can get away without actually unrolling the status dict. So since this function requires unrolled input, I think it won't be used in the notebook 😞

MichaelClifford · 2021-02-10T23:38:23Z

@chauhankaranraj that's great! not needing to unroll the data is probably better wherever possible
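To illustrate why unrolling may be unnecessary, here is a minimal sketch that computes a couple of the metrics mentioned above directly from run-length-encoded statuses. It assumes each test's statuses arrive as a list of `{"count": n, "value": v}` entries in which adjacent entries have different values; the function name `streak_stats`, the `min_run` threshold, and the sample data are hypothetical, and this is not necessarily how the notebook in #122 does it.

```python
FAIL = 12  # status value the issue text associates with a failing cell


def streak_stats(rle_statuses, min_run=2):
    """Summarize persistent failures from run-length-encoded statuses.

    Each entry is assumed to look like {"count": n, "value": v}, so a run of
    consecutive failures is already a single entry and no unrolling is needed.
    """
    total_cells = sum(s["count"] for s in rle_statuses)
    runs = [
        s["count"]
        for s in rle_statuses
        if s["value"] == FAIL and s["count"] >= min_run
    ]
    persistent_cells = sum(runs)
    return {
        "percent_persistent": 100 * persistent_cells / total_cells if total_cells else 0.0,
        "mean_run_length": persistent_cells / len(runs) if runs else 0.0,
        "num_runs": len(runs),
    }


# Made-up encoding of the cell sequence 1, 12, 12, 12, 1, 12, 1.
rle = [
    {"count": 1, "value": 1},
    {"count": 3, "value": 12},
    {"count": 1, "value": 1},
    {"count": 1, "value": 12},
    {"count": 1, "value": 1},
]
stats = streak_stats(rle)
```

Time-based metrics would still need the timestamps, which can be indexed with a running sum of the `count` fields rather than by expanding every cell.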

chauhankaranraj assigned Shreyanand and chauhankaranraj Feb 8, 2021

MichaelClifford added this to New in AI-4-CI via automation Feb 10, 2021

MichaelClifford moved this from New to In Progress in AI-4-CI Feb 10, 2021

chauhankaranraj mentioned this issue Feb 10, 2021

KPIs: persistent failures and time to fix #122

Merged

2 tasks

sesheta closed this as completed in #122 Feb 10, 2021

AI-4-CI automation moved this from In Progress to Done Feb 10, 2021

MichaelClifford mentioned this issue Apr 5, 2021

Update persistent_failures_analysis notebook to match template #201

Merged

2 tasks
