KPI Request : Change in success/failure rate #136

Closed
2 of 3 tasks
aakankshaduggal opened this issue Mar 2, 2021 · 5 comments · Fixed by #172

Comments

@aakankshaduggal
Member

aakankshaduggal commented Mar 2, 2021

As an OpenShift product manager, I would like to see the change in success or failure rate, so that I can track and measure the effectiveness and success of builds and deployment.

Acceptance criteria

  • A small notebook in notebooks/data-sources/TestGrid/metrics/ that collects this metric and stores it in Ceph as a Parquet file.
  • Calculate build/deployment success and failure rate
  • Calculate the change in success and failure rate over time
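
A rough sketch of the last two criteria, using only the standard library. The input shape (one `(day, passed)` tuple per build run) and both function names are assumptions made for illustration; they are not taken from the TestGrid notebooks.

```python
from collections import defaultdict


def success_rate_by_day(runs):
    """Return {day: fraction of runs that passed}, ordered by day.

    `runs` is an iterable of (day, passed) tuples; this shape is hypothetical.
    """
    totals = defaultdict(lambda: [0, 0])  # day -> [passed count, total count]
    for day, passed in runs:
        totals[day][0] += int(passed)
        totals[day][1] += 1
    return {day: p / t for day, (p, t) in sorted(totals.items())}


def rate_change(rates):
    """Return the day-over-day difference in rate (the first day has none)."""
    days = list(rates)
    return {cur: rates[cur] - rates[prev] for prev, cur in zip(days, days[1:])}


# Made-up sample data: 1 of 2 runs pass on the first day, 3 of 4 on the second.
runs = [
    ("2021-03-01", True), ("2021-03-01", False),
    ("2021-03-02", True), ("2021-03-02", True),
    ("2021-03-02", True), ("2021-03-02", False),
]
rates = success_rate_by_day(runs)
changes = rate_change(rates)
```

The failure rate is one minus the success rate, so its change over time is the same series with the sign flipped.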
@aakankshaduggal aakankshaduggal created this issue from a note in AI-4-CI (New) Mar 2, 2021
@oindrillac oindrillac self-assigned this Mar 2, 2021
@MichaelClifford MichaelClifford moved this from New to In Progress in AI-4-CI Mar 4, 2021
@oindrillac
Member

If the goal of this metric is to capture "build" success and failure as per the description in the Potential KPI List, we have a notebook in #148 where we capture build pass and failure from TestGrid data. The current result for that metric is "No data for this metric currently".

Per acceptance criterion 3, if we are trying to capture the change in the build pass/fail metric over time, that will also have no results, so there is probably no point in capturing it over time.

If we see value in it and decide to extend this issue to capture "change in test pass/failure", we can make changes in the test_pass_failures.ipynb notebook to capture how the percent pass or percent fail metric changes over time.

@chauhankaranraj
Member

If the goal of this metric is to capture "build" success and failure as per the description in the Potential KPI List, we have a notebook in #148 where we capture build pass and failure from TestGrid data. The current result for that metric is "No data for this metric currently".

I think to figure out how to address this issue, we should clarify what we want to measure as the "build/deployment success rate". That is, clarify whether "build/deployment success" is defined as

  1. having a BUILD_PASS label in the test cell on testgrid, or
  2. having a PASS label for all tests during the current "run" (all cells green across Y-axis)

If it's option 1, then that calculation is already being done in #148.
If it's option 2, then we can add a cell in test_pass_failures.ipynb, in which we apply the required aggregation on passing_df and calculate the KPI.
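
Option 2 could look roughly like the sketch below. It uses plain Python rather than the actual passing_df, whose layout is not shown in this thread: the grid is assumed to be a mapping from test name to a list of per-run booleans (True meaning PASS), and `runs_all_green` is a hypothetical helper name.

```python
def runs_all_green(grid):
    """Return one boolean per run: True only if every test passed in that run.

    `grid` maps test name -> list of booleans, one per run (an assumed layout).
    """
    # zip(*rows) transposes the grid so each tuple holds one run's results.
    return [all(run_results) for run_results in zip(*grid.values())]


# Made-up grid with three tests (rows) and three runs (columns).
grid = {
    "test_a": [True, True, False],
    "test_b": [True, False, False],
    "test_c": [True, True, True],
}
green = runs_all_green(grid)
success_rate = sum(green) / len(green)
```

Here only the first run counts as a success, because the second and third each contain at least one failing test.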

cc @aakankshaduggal @oindrillac

@MichaelClifford
Member

Please see this comment from @hemajv #144. It looks like there is an additional field, "overall_status", that can be pulled from the TestGrid data and that corresponds to whether or not the build was a success. Collecting that additional field should solve our "build" success/failure issues.

If it's still not sufficient, we can follow @chauhankaranraj's option number 2 above: "having a PASS label for all tests during the current "run" (all cells green across Y-axis)". This may miss some passes where a couple of failures occur, but it should capture only true passes.

WDYT?

@chauhankaranraj
Member

It looks like there is an additional field, "overall_status" that can be pulled from the test grid data that corresponds to whether or not the build was a success.

Perfect, we can just use that label then :)

@hemajv
Collaborator

hemajv commented Mar 9, 2021

I am a little confused as to whether the overall_status actually corresponds to a "build" being successful or not, assuming that the terms "build" and "job" are not being used interchangeably. (see comment here)

My understanding was that the overall_status is defined for the entire "job" itself as passing, failing, flaky, etc., based on how it is defined here.
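
To make the job-level reading concrete, here is a small sketch that tallies overall_status across jobs. The job names, the status strings ("PASSING", "FAILING", "FLAKY") and the `status_shares` helper are assumptions for illustration only; the real values should be checked against the TestGrid data.

```python
from collections import Counter


def status_shares(job_statuses):
    """Return {status: fraction of jobs with that status}.

    `job_statuses` maps job name -> overall_status string (an assumed shape).
    """
    counts = Counter(job_statuses.values())
    total = sum(counts.values())
    return {status: n / total for status, n in counts.items()}


# Made-up jobs and statuses.
job_statuses = {
    "job-a": "PASSING",
    "job-b": "PASSING",
    "job-c": "FAILING",
    "job-d": "FLAKY",
}
shares = status_shares(job_statuses)
```

Under this reading the number describes the share of jobs in each state, not the share of individual builds that succeeded.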

Labels
None yet
Projects
AI-4-CI
  
Done
6 participants