
KPI Request: Test Runtime or time to Test #117

Closed
1 task
oindrillac opened this issue Feb 8, 2021 · 6 comments · Fixed by #163
Assignees
Labels
kind/feature Categorizes issue or PR as related to a new feature. kind/question Categorizes issue or PR as a support question.
Projects

Comments

@oindrillac
Member

oindrillac commented Feb 8, 2021

As an OpenShift product manager, I would like to see the time to test, or the total runtime for each test, as it would help filter and observe tests with longer-than-expected runtimes.

By measuring this metric, we can observe the trend of test runtimes and check whether the execution time of the tests exceeds the specified values. If the execution of the test suite takes a long time, we may wish to optimize our test code or track down the tests that are taking too long. This metric could also be correlated with tests turning out to be flaky.

How to collect the metric

In the TestGrid_EDA notebook, each test is associated with a last_run_timestamp and each run has fail_timestamp as well as pass_timestamp.

Is there any available documentation explaining the meaning of those labels? More specifically:

  • Does last_run_timestamp mean the timestamp of the final run?
  • Is there a label capturing the timestamp of the start of each passing and failing run?

Acceptance Criteria

  • Small notebook in notebooks/data-sources/TestGrid/metrics/ that collects this metric and stores it in Ceph as a parquet file.
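For illustration, a minimal sketch of the kind of computation such a notebook could perform, assuming hypothetical run records with start and end timestamps (the field names and test names below are illustrative, not actual TestGrid labels):

```python
from datetime import datetime

# Hypothetical per-run records; real data would come from TestGrid/Prow.
runs = [
    {"test": "e2e-aws-upgrade", "start": "2021-02-08T10:00:00", "end": "2021-02-08T10:31:03"},
    {"test": "e2e-azure-install", "start": "2021-02-08T10:05:00", "end": "2021-02-08T10:12:40"},
]

THRESHOLD_MINUTES = 30  # example "longer than expected" cutoff


def runtime_minutes(run):
    """Elapsed runtime of a single test run, in minutes."""
    start = datetime.fromisoformat(run["start"])
    end = datetime.fromisoformat(run["end"])
    return (end - start).total_seconds() / 60


# Flag tests whose runtime exceeds the threshold.
slow_tests = [r["test"] for r in runs if runtime_minutes(r) > THRESHOLD_MINUTES]
print(slow_tests)
```

The resulting per-test runtimes could then be written out as a parquet file, per the acceptance criteria.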

cc: @hemajv @MichaelClifford

@MichaelClifford
Member

@oindrillac where is fail_timestamp? Probably I am blind, but I don't see it in this notebook.

last_run_timestamp is probably the timestamp associated with the start time for the most recent run of a specific test on a specific grid. I'm not sure if that data will answer your question.

We can see in the Prow data, in the image below, that each test run has an associated runtime.

[image]

The question is: is this info present in the TestGrid data? Or do we have to step into the Prow data to get it (something I would like to avoid for this current round of work)? There are a number of metadata fields for each test run that we have not yet fully explored, and perhaps the runtime value can be found somewhere in there.

[image]

@hemajv
Collaborator

hemajv commented Feb 8, 2021

@MichaelClifford When we run this cell, we see that one of the columns in the resulting data frame is fail_timestamp:

[screenshot]

@MichaelClifford MichaelClifford added this to New in AI-4-CI via automation Feb 10, 2021
@MichaelClifford MichaelClifford moved this from New to In Progress in AI-4-CI Feb 10, 2021
@MichaelClifford
Member

@oindrillac @hemajv not sure where you are at on this issue, but here is a quick answer to what is meant by fail_timestamp, although I'm not sure it will answer the question you are trying to answer here.

If you go to:
https://testgrid.k8s.io/redhat-openshift-ocp-release-4.6-informing

Click "Show Alerts" on any of the grid names; it will display some highlighted tests and include fields for "First Failed" and "Last Passed". I'm pretty sure these are the values displayed in fail_timestamp and pass_timestamp.

[image]

Since these values exist only for the "alert" tests, I don't think they are sufficient to make a claim about the time to test or the total runtime for each test across the whole platform.

Were you able to find the test runtime any other way?

@MichaelClifford MichaelClifford moved this from In Progress to Backlog in AI-4-CI Feb 11, 2021
@sesheta sesheta added kind/question Categorizes issue or PR as a support question. kind/feature Categorizes issue or PR as related to a new feature. and removed question labels Feb 13, 2021
@MichaelClifford MichaelClifford moved this from Backlog to To Do in AI-4-CI Feb 25, 2021
@MichaelClifford MichaelClifford moved this from To Do to New in AI-4-CI Feb 25, 2021
@MichaelClifford MichaelClifford moved this from New to To Do in AI-4-CI Feb 25, 2021
@MichaelClifford MichaelClifford moved this from To Do to In Progress in AI-4-CI Mar 4, 2021
@oindrillac
Member Author

I think a fairly simple way to do this, without looking into the Prow data, would be to leverage the built-in graph within TestGrid for each test in the job.

[image]

These graphs, which can be toggled on and off by going to Graph > test-duration-minutes, capture the time elapsed while running each test.

eg: https://testgrid.k8s.io/redhat-openshift-ocp-release-4.2-informing#release-openshift-ocp-installer-e2e-azure-4.2&graph-metrics=test-duration-minutes

The time elapsed for the test (test-duration-minutes) is plotted over time and captures values such as 24.76 minutes, the time it took to run the test.

[image]

To access the JSON for this, we can fetch https://testgrid.k8s.io/redhat-openshift-ocp-release-4.2-informing/table?%20\%20&show-stale-tests=&tab=release-openshift-origin-installer-e2e-aws-upgrade-rollback-4.1-to-4.2&graph-metrics=test-duration-minutes

which contains the graphs metadata, like "graphs":[{"metric":["test-duration-minutes"],"values":[[24.766666666666666,37.516666666666666,31.05,41.85,32.65,28.216666666666665,30.3,29.333333333333332,36.75,27.133333333333333,40.36666666666667,35.6,27,42.96666666666667,55.733333333333334,31.85,28.016666666666666]]}].

This should be fairly straightforward to plot in a notebook.
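As a quick sanity check, the quoted graphs metadata can be parsed directly with the standard library; the payload below is the exact snippet from above:

```python
import json

# The "graphs" metadata returned by the TestGrid table endpoint, as quoted above.
payload = (
    '{"graphs":[{"metric":["test-duration-minutes"],'
    '"values":[[24.766666666666666,37.516666666666666,31.05,41.85,32.65,'
    '28.216666666666665,30.3,29.333333333333332,36.75,27.133333333333333,'
    '40.36666666666667,35.6,27,42.96666666666667,55.733333333333334,'
    '31.85,28.016666666666666]]}]}'
)

graphs = json.loads(payload)["graphs"]
# Each graph carries a metric name and one series of values per test row.
durations = graphs[0]["values"][0]
print(len(durations), round(min(durations), 2), round(max(durations), 2))
```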

@hemajv
Collaborator

hemajv commented Mar 10, 2021

I agree. By extracting the test-duration-minutes metric we can calculate other relevant information, such as the mean test duration, and relate it to the overall job.

@MichaelClifford
Copy link
Member

@oindrillac good catch on the &graph-metrics=test-duration-minutes

AI-4-CI automation moved this from In Progress to Done (03/25/21 - 04/08/21) Mar 25, 2021