# Time to Merge Prediction Inference Service

In the previous notebook, we explored some basic machine learning models for predicting the time to merge of a pull request (PR). We then deployed the model with the highest F1 score as a service using Seldon. The purpose of this notebook is to check that this service is running as intended and, more specifically, that the deployed model performs as expected. To do so, we use the test set from the previous notebook as the query payload for the service, and then verify that the returned predictions match those obtained locally during training and testing.

In [1]:
import os
import sys
import gzip
import json
import boto3
import requests
from dotenv import load_dotenv, find_dotenv

import numpy as np
import pandas as pd

from sklearn.metrics import classification_report

metric_template_path = "../../data-sources/TestGrid/metrics"
if metric_template_path not in sys.path:
    sys.path.insert(1, metric_template_path)

from ipynb.fs.defs.metric_template import (  # noqa: E402
    CephCommunication,
)

load_dotenv(find_dotenv())

True

In [2]:
## Ceph bucket variables
## Create a .env file on your local machine with the correct configuration
s3_endpoint_url = os.getenv("S3_ENDPOINT")
s3_access_key = os.getenv("S3_ACCESS_KEY")
s3_secret_key = os.getenv("S3_SECRET_KEY")
s3_bucket = os.getenv("S3_BUCKET")
s3_path = "github"
REMOTE = os.getenv("REMOTE")
INPUT_DATA_PATH = "../../../data/processed/github"

In [3]:
# read raw dataset
data_path = "../../data/raw/GitHub/PullRequest.json.gz"
OUTPUT_DATA_PATH = "../../data/processed/github"

if REMOTE:
    print("getting dataset from ceph")
    s3 = boto3.resource(
        "s3",
        endpoint_url=s3_endpoint_url,
        aws_access_key_id=s3_access_key,
        aws_secret_access_key=s3_secret_key,
    )
    content = s3.Object(s3_bucket, "thoth/mi/openshift/origin/PullRequest.json")
    file = content.get()["Body"].read().decode("utf-8")
    prs = json.loads(file)

    with gzip.open(data_path, "wb") as out_file:
        out_file.write(json.dumps(prs).encode("utf-8"))

else:
    print("getting dataset from local")
    with gzip.open(data_path, "r") as f:
        prs = json.loads(f.read().decode("utf-8"))

pr_df = pd.DataFrame(prs).T

getting dataset from ceph


In [4]:
# github pr dataset collected using thoth's mi-scheduler
pr_df.head()

Unnamed: 0,title,body,size,created_by,created_at,closed_at,closed_by,merged_at,commits_number,changed_files_number,interactions,reviews,labels,commits,changed_files
26100,bug 1949306: add e2e test to block usage of re...,Right now this is set to flake in CI and provi...,XXL,deads2k,1619104429,1619253940.0,openshift-merge-robot,1619253940.0,2,135,"{'deads2k': 1, 'openshift-ci-robot': 307, 'stt...",{},"[approved, bugzilla/severity-urgent, bugzilla/...","[13b0e99d5f35d85998af9f07eb0c5b7d6fcc0dd0, 0c3...","[go.mod, go.sum, test/extended/apiserver/api_r..."
26099,bug 1951705: allow HighOverallControlPlaneCPU ...,adding the alert in https://github.com/openshi...,XS,deads2k,1619091511,1619113470.0,openshift-merge-robot,1619113470.0,1,1,"{'openshift-ci-robot': 508, 'deads2k': 43, 'op...",{},"[approved, bugzilla/severity-high, bugzilla/va...",[7bc10c83eea29010f3b735c41847d43a992f606c],[test/extended/prometheus/prometheus.go]
26098,"refactor TestMultipleImageChangeBuildTriggers,...",So reworking this test was motivated by 2 thin...,XXL,gabemontero,1619037132,,gabemontero,,2,118,"{'gabemontero': 194, 'openshift-ci-robot': 89,...",{},"[do-not-merge/hold, vendor-update]","[2d329e88159f7af1594054ce467cfdb3c320b612, e7e...","[go.mod, go.sum, test/extended/images/imagecha..."
26097,"WIP: monitor: Move ""crashlooping pods"" test to...","Crashlooping, long pulls, and other container ...",L,smarterclayton,1619035732,,,,1,5,"{'openshift-ci-robot': 68, 'smarterclayton': 1...",{},"[approved, do-not-merge/work-in-progress]",[95756458436b0e79a7ea779c6b0f6de2e4ee5a1f],"[pkg/monitor/pod.go, pkg/synthetictests/event_..."
26096,Bug 1949050: fix images.sh script,This is a followup to https://github.com/opens...,S,soltysh,1618993979,1619069721.0,openshift-merge-robot,1619069721.0,1,2,"{'openshift-ci-robot': 279, 'soltysh': 2, 'ope...","{'641526396': {'author': 'adambkaplan', 'words...","[approved, bugzilla/severity-high, bugzilla/va...",[06830a927241a44ae24e9a4b5b2ac75f666d36bc],"[test/extended/testdata/bindata.go, test/exten..."


In [5]:
# read processed and split data created for train/test in the model training notebook
if REMOTE:
    cc = CephCommunication(s3_endpoint_url, s3_access_key, s3_secret_key, s3_bucket)
    X_test = cc.read_from_ceph(s3_path, "X_test.parquet")
    y_test = cc.read_from_ceph(s3_path, "y_test.parquet")

else:
    print(
        "The X_test.parquet and y_test.parquet files are not included in the ocp-ci-analysis github repo."
    )
    print(
        "Please set REMOTE=1 in the .env file and read this data from the S3 bucket instead."
    )

In [6]:
# endpoint from the seldon deployment
base_url = (
    "http://github-pr-ttm-ds-ml-workflows-ws.apps.smaug.na.operate-first.cloud/predict"
)

In [7]:
X_test

Unnamed: 0,size,is_reviewer,is_approver,created_at_day,created_at_month,created_at_weekday,created_at_hour,change_in_.github,change_in_docs,change_in_pkg,...,title_wordcount_fix,title_wordcount_haproxy,title_wordcount_oc,title_wordcount_publishing,title_wordcount_revert,title_wordcount_router,title_wordcount_sh,title_wordcount_staging,title_wordcount_support,title_wordcount_travis
22607,1.0,False,False,18,4,3,11,0,0,1,...,0,0,0,0,0,0,0,0,0,0
12013,1.0,False,False,23,11,2,12,0,1,1,...,1,0,0,0,0,0,0,0,0,0
16511,5.0,True,True,22,9,4,10,0,0,1,...,0,0,0,0,0,0,0,0,0,0
25449,1.0,False,False,26,8,2,10,0,0,0,...,0,0,0,0,0,0,0,0,0,0
17178,0.0,False,False,3,11,4,14,0,0,1,...,0,0,0,0,0,0,0,0,0,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
16194,0.0,False,False,7,9,3,6,0,0,0,...,0,0,0,0,0,0,0,0,0,0
20466,5.0,True,True,30,7,0,8,0,1,1,...,0,0,1,0,0,0,0,0,1,0
14993,5.0,True,True,30,6,4,15,0,0,1,...,1,0,0,0,0,0,0,0,0,0
8299,0.0,False,False,30,3,2,9,0,0,0,...,0,0,0,0,0,0,0,0,0,0


In [8]:
# let's extract the raw PR data corresponding to the PRs used in the test set
sample_payload = pr_df.reindex(X_test.index)
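One thing worth checking at this step: `reindex` does not fail when a requested label is absent from the source index. It returns a row filled with NaN instead, which can happen silently if, for example, one index holds strings and the other integers. The sketch below uses toy stand-ins (`raw` and `wanted_ids` are hypothetical, not objects from this notebook) to show how such unmatched rows could be detected.

```python
import pandas as pd

# Toy stand-ins for pr_df and X_test.index (hypothetical values)
raw = pd.DataFrame(
    {"title": ["fix images.sh", "add e2e test"], "size": ["S", "XXL"]},
    index=[26096, 26100],
)
wanted_ids = [26100, 99999]  # 99999 is deliberately absent from raw

subset = raw.reindex(wanted_ids)

# Labels that could not be matched come back as rows that are entirely NaN
missing_ids = subset.index[subset.isna().all(axis=1)].tolist()
print(missing_ids)
```

Applied here, the same check would be `sample_payload.isna().all(axis=1).sum()`, which should be zero if every test-set PR was found in `pr_df`.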

In [9]:
# convert the dataframe into a numpy array and then to a list (required by seldon)
data = {
    "data": {
        "names": sample_payload.columns.tolist(),
        "ndarray": sample_payload.to_numpy().tolist(),
    }
}

# create the query payload
json_data = json.dumps(data)
headers = {"Content-Type": "application/json"}
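The raw PR data contains missing values (for example, `closed_at` and `merged_at` for PRs that are still open), and `json.dumps` serializes a float NaN as a bare `NaN` token by default, which is not valid JSON. The following is a minimal sketch of a stricter payload builder; `nan_to_none` and `build_payload` are hypothetical helper names, and whether the deployed service accepts `null` in place of NaN is an assumption that would need to be verified.

```python
import json
import math


def nan_to_none(value):
    """Replace a float NaN with None so it serializes as JSON null."""
    if isinstance(value, float) and math.isnan(value):
        return None
    return value


def build_payload(names, rows):
    """Build a {"data": {"names": ..., "ndarray": ...}} JSON string."""
    # Only top-level cells are cleaned; NaN nested inside lists or dicts is not handled
    cleaned = [[nan_to_none(v) for v in row] for row in rows]
    body = {"data": {"names": list(names), "ndarray": cleaned}}
    # allow_nan=False makes json.dumps raise instead of emitting a bare NaN token
    return json.dumps(body, allow_nan=False)


example = build_payload(["title", "merged_at"], [["fix images.sh", float("nan")]])
print(example)
```

With the notebook's variables, a call could look like `build_payload(sample_payload.columns, sample_payload.to_numpy().tolist())`.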

In [10]:
# query our inference service
response = requests.post(base_url, data=json_data, headers=headers)
response

<Response [200]>
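A 200 status code only tells us that the request was accepted; it says nothing about whether the body has the shape we expect. As a rough sketch, the parsed response could be validated before it is used. The helper name `check_prediction_shape` and the `fake_body` dictionary are hypothetical and only mirror the `data`/`names`/`ndarray` layout seen in this notebook.

```python
def check_prediction_shape(body, n_rows):
    """Return (names, ndarray) after basic structural checks on a parsed response."""
    data = body.get("data")
    if data is None:
        raise ValueError("response has no 'data' field")
    names = data.get("names", [])
    ndarray = data.get("ndarray", [])
    if len(ndarray) != n_rows:
        raise ValueError(f"expected {n_rows} predictions, got {len(ndarray)}")
    if any(len(row) != len(names) for row in ndarray):
        raise ValueError("row width does not match the number of class names")
    return names, ndarray


# Invented response body used only to exercise the helper
fake_body = {"data": {"names": ["Class_0", "Class_1"], "ndarray": [[0.3, 0.7]]}}
names, ndarray = check_prediction_shape(fake_body, n_rows=1)
```

In this notebook the call would be `check_prediction_shape(response.json(), len(sample_payload))`, ideally preceded by `response.raise_for_status()` so that HTTP errors surface early.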

In [11]:
# what are the names of the prediction classes
json_response = response.json()
json_response["data"]["names"]

['Class_0',
 'Class_1',
 'Class_2',
 'Class_3',
 'Class_4',
 'Class_5',
 'Class_6',
 'Class_7',
 'Class_8',
 'Class_9']

In [12]:
# probability estimates for each class for a sample PR
json_response["data"]["ndarray"][0][:10]

[0.1, 0.02, 0.085, 0.06, 0.085, 0.02, 0.265, 0.14, 0.055, 0.17]
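Since these values are class probability estimates, a quick sanity check is that each one lies in [0, 1] and that the row sums to 1. The sketch below applies that check to the row printed above; the variable names are illustrative only.

```python
import numpy as np

# Probability row copied from the output above
row = np.array([0.1, 0.02, 0.085, 0.06, 0.085, 0.02, 0.265, 0.14, 0.055, 0.17])

# Every value should lie in [0, 1], and the row should sum to 1 within float tolerance
in_range = bool(np.all((row >= 0) & (row <= 1)))
sums_to_one = bool(np.isclose(row.sum(), 1.0))
print(in_range, sums_to_one)
```

The same check extends to the full response with `np.allclose(np.sum(json_response["data"]["ndarray"], axis=1), 1.0)`.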

In [13]:
# get predicted classes from probabilities for each PR
preds = np.argmax(json_response["data"]["ndarray"], axis=1)
preds[:10]

array([6, 3, 8, 9, 6, 0, 7, 0, 8, 0])
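To make the link between the probabilities and the predicted classes explicit, here is a small sketch that takes the sample row shown earlier, picks the index of the largest probability, and maps it back to the class name returned by the service. The names `class_names`, `first_row`, `best`, and `label` are illustrative only.

```python
import numpy as np

# Class names in the same order as returned by the service
class_names = [f"Class_{i}" for i in range(10)]

# Probability row for the first PR, copied from the earlier output
first_row = [0.1, 0.02, 0.085, 0.06, 0.085, 0.02, 0.265, 0.14, 0.055, 0.17]

# argmax returns the position of the largest value (the first one in case of ties)
best = int(np.argmax(first_row))
label = class_names[best]
print(best, label)
```

The largest probability in this row is 0.265 at position 6, which agrees with the first entry of `preds` above.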

In [14]:
# evaluate results
print(classification_report(y_test, preds))

              precision    recall  f1-score   support

           0       0.24      0.37      0.29       257
           1       0.21      0.06      0.09       214
           2       0.18      0.15      0.17       327
           3       0.13      0.15      0.14       275
           4       0.17      0.08      0.11       263
           5       0.14      0.12      0.13       234
           6       0.20      0.26      0.22       284
           7       0.14      0.13      0.13       281
           8       0.15      0.18      0.16       294
           9       0.18      0.22      0.20       277

    accuracy                           0.17      2706
   macro avg       0.17      0.17      0.16      2706
weighted avg       0.17      0.17      0.17      2706
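Comparing this report with the one from the training notebook is done by eye. A stricter check would be to compare the service's predictions element-wise with predictions from the local model on the same test set. The sketch below assumes such local predictions are available; `local_preds` and `agreement_rate` are hypothetical names, since this notebook does not load the local model.

```python
import numpy as np


def agreement_rate(service_preds, local_preds):
    """Fraction of samples on which two prediction vectors agree."""
    service_preds = np.asarray(service_preds)
    local_preds = np.asarray(local_preds)
    if service_preds.shape != local_preds.shape:
        raise ValueError("prediction arrays must have the same shape")
    return float(np.mean(service_preds == local_preds))


# Toy values: `service` reuses the first five predictions shown earlier,
# while `local_preds` is invented for illustration
service = [6, 3, 8, 9, 6]
local_preds = [6, 3, 8, 9, 0]
rate = agreement_rate(service, local_preds)
print(rate)
```

If the deployed model is the same artifact that was evaluated locally, one would expect an agreement rate of 1.0 on the full test set.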



# Conclusion

This notebook shows how raw PR data can be sent to the deployed Seldon service to get time-to-merge predictions. Additionally, the evaluation scores in the classification report match the ones we saw in the training notebook. So it looks like our inference service and model are working as expected, and are ready to predict the time to merge for GitHub PRs!