# Quantify Tests

This notebook is an extension of the [number_of_flakes](number_of_flakes.ipynb) notebook. Here, the key performance indicator (KPI) that we would like to make more visible and track over time is the percentage of tests that pass or fail. Tracking it can help measure the effectiveness and success of our testing process.

By measuring this metric, we can observe the trend of test failures over time. A decrease in the percentage of failures (or an increase in the percentage of passes) should correlate with improved test efficiency, a better testing process, and error-free releases. In this notebook, we derive the following metrics from the TestGrid dataset:

* total number of test cases
* number of test cases passed
* number of test cases failed
* percent of tests that pass
* percent of tests that fail
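
As a minimal sketch of how these five metrics relate to one another, the snippet below computes them on a small, made-up DataFrame. The toy data and the helper name `pass_fail_summary` are assumptions for illustration only; the column names `passing` and `failure` mirror the boolean flags built later in this notebook.

```python
import pandas as pd

# Hypothetical toy data: one row per test run, with boolean pass/fail flags.
runs = pd.DataFrame(
    {
        "test": ["t1", "t1", "t2", "t2", "t3"],
        "passing": [True, False, True, True, False],
        "failure": [False, True, False, False, False],
    }
)


def pass_fail_summary(df):
    """Return counts and percentages of passing and failing rows."""
    total = len(df)
    passed = int(df["passing"].sum())  # True counts as 1
    failed = int(df["failure"].sum())
    return {
        "total": total,
        "passed": passed,
        "failed": failed,
        "pass_pct": 100 * passed / total if total else float("nan"),
        "fail_pct": 100 * failed / total if total else float("nan"),
    }


summary = pass_fail_summary(runs)
print(summary)
```

Note that the last toy row is neither passing nor failing, so the two percentages do not have to sum to 100.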

In [1]:
import gzip
import json
import os
import sys
import pandas as pd

sys.path.append("../../..")

module_path_1 = os.path.abspath(os.path.join("../../../data-sources/TestGrid"))
if module_path_1 not in sys.path:
    sys.path.append(module_path_1)

from ipynb.fs.defs.number_of_flakes import (  # noqa: E402
    testgrid_labelwise_encoding,
)  # noqa: E402

In [2]:
# Load test file
with gzip.open("../../../../data/raw/testgrid_810.json.gz", "rb") as read_file:
    testgrid_data = json.load(read_file)

In [3]:
failures_list = testgrid_labelwise_encoding(testgrid_data, 12)

In [4]:
len(failures_list)

19483548

In [5]:
failures_list[0]

(datetime.datetime(2020, 10, 8, 16, 48, 5),
 '"redhat-openshift-informing"',
 'release-openshift-okd-installer-e2e-aws-upgrade',
 'Application behind service load balancer with PDB is not disrupted',
 False)

In [6]:
# Convert to dataframe
failures_df = pd.DataFrame(
    failures_list, columns=["timestamp", "tab", "job", "test", "failure"]
)
failures_df.head()

| | timestamp | tab | job | test | failure |
|---|---|---|---|---|---|
| 0 | 2020-10-08 16:48:05 | "redhat-openshift-informing" | release-openshift-okd-installer-e2e-aws-upgrade | Application behind service load balancer with ... | False |
| 1 | 2020-10-08 15:12:01 | "redhat-openshift-informing" | release-openshift-okd-installer-e2e-aws-upgrade | Application behind service load balancer with ... | True |
| 2 | 2020-10-08 10:18:13 | "redhat-openshift-informing" | release-openshift-okd-installer-e2e-aws-upgrade | Application behind service load balancer with ... | False |
| 3 | 2020-10-08 07:15:28 | "redhat-openshift-informing" | release-openshift-okd-installer-e2e-aws-upgrade | Application behind service load balancer with ... | False |
| 4 | 2020-10-08 04:27:53 | "redhat-openshift-informing" | release-openshift-okd-installer-e2e-aws-upgrade | Application behind service load balancer with ... | False |


In [7]:
# Save only the first 1,000,000 of the ~19 million rows due to PVC limits.
# 1 million rows take about 250 MB, so all 19 million would need about 4,750 MB.
failures_df.head(1000000).to_csv(
    "../../../../data/processed/failures.csv",
    header=False,
)

In [8]:
passing_list = testgrid_labelwise_encoding(testgrid_data, 1)

In [9]:
len(passing_list)

19483548

In [10]:
# Convert to dataframe
passing_df = pd.DataFrame(
    passing_list, columns=["timestamp", "tab", "job", "test", "passing"]
)
passing_df.head()

| | timestamp | tab | job | test | passing |
|---|---|---|---|---|---|
| 0 | 2020-10-08 16:48:05 | "redhat-openshift-informing" | release-openshift-okd-installer-e2e-aws-upgrade | Application behind service load balancer with ... | False |
| 1 | 2020-10-08 15:12:01 | "redhat-openshift-informing" | release-openshift-okd-installer-e2e-aws-upgrade | Application behind service load balancer with ... | False |
| 2 | 2020-10-08 10:18:13 | "redhat-openshift-informing" | release-openshift-okd-installer-e2e-aws-upgrade | Application behind service load balancer with ... | False |
| 3 | 2020-10-08 07:15:28 | "redhat-openshift-informing" | release-openshift-okd-installer-e2e-aws-upgrade | Application behind service load balancer with ... | False |
| 4 | 2020-10-08 04:27:53 | "redhat-openshift-informing" | release-openshift-okd-installer-e2e-aws-upgrade | Application behind service load balancer with ... | False |


In [11]:
# Save only the first 1,000,000 of the ~19 million rows due to PVC limits.
# 1 million rows take about 250 MB, so all 19 million would need about 4,750 MB.
passing_df.head(1000000).to_csv(
    "../../../../data/processed/pass.csv",
    header=False,
)

In [12]:
# Metrics, computed on the same first 1,000,000 rows that were saved above
n_rows = 1000000
passing_subset = passing_df.head(n_rows)
failures_subset = failures_df.head(n_rows)

no_tests = passing_subset.test.count()
print("Total number of tests: %i" % (no_tests))

no_failures = failures_subset.failure.sum()
print("Total number of failing tests: %i" % (no_failures))
test_failures_percentage = (no_failures / failures_subset.test.count()) * 100
print("Test failure percentage: %f" % (test_failures_percentage))

no_pass = passing_subset.passing.sum()
print("Total number of passing tests: %i" % (no_pass))
test_pass_percentage = (no_pass / passing_subset.passing.count()) * 100
print("Test pass percentage: %f" % (test_pass_percentage))

# The pass and failure percentages are computed independently and need not
# sum to 100: rows flagged as neither passing nor failing count toward neither.

Total number of tests: 1000000
Total number of failing tests: 3989
Test failure percentage: 0.398900
Total number of passing tests: 704558
Test pass percentage: 70.455800
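
The numbers above are a single aggregate over the subset, while the stated goal is to follow the trend over time. One possible way to do that, sketched here under assumptions, is to resample the boolean flag by calendar day. The helper name `daily_rates` and the toy data are hypothetical; the sketch only assumes a `timestamp` column of datetimes and a boolean flag column such as `failure` or `passing`, as in the DataFrames built earlier.

```python
import pandas as pd


def daily_rates(df, flag_col):
    """Percent of rows per calendar day where flag_col is True."""
    flags = df.set_index("timestamp")[flag_col].astype(float)
    daily = flags.resample("D").mean() * 100
    # Days with no runs produce NaN; drop them from the trend.
    return daily.dropna()


# Hypothetical toy data spanning two days.
toy = pd.DataFrame(
    {
        "timestamp": pd.to_datetime(
            [
                "2020-10-07 04:00",
                "2020-10-07 16:00",
                "2020-10-08 10:00",
                "2020-10-08 15:00",
            ]
        ),
        "failure": [True, False, False, False],
    }
)

rates = daily_rates(toy, "failure")
print(rates)
```

Applied to the notebook's data, a call such as `daily_rates(failures_df.head(1000000), "failure")` would give one failure percentage per day, which could then be plotted or compared across releases.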
