This repository contains a tool to train the GitHub Time to Merge (TTM) model. The model can be trained on any repository and then used to predict the time to merge of new pull requests. To learn more about this model, please see here.
To use the GitHub Action tool on your own repository and train the model, you first need the following prerequisites:
- S3 bucket credentials: You will need an S3 bucket to store the data and the model generated as a part of the training process. You can pass S3 bucket credentials in two ways: either set them up as GitHub Action Secrets or pass them as a payload in your HTTP request.
- Personal Access Token: You need a personal access token to trigger the workflow and download GitHub data. You can generate that by going here.
Once you have the prerequisites in place, add your S3 credentials to your repository's Action secrets if they are private and you don't want to pass them through the HTTP request.
To do that, go to repository "Settings" -> "Security" -> "Secrets" -> "Actions" -> "New Repository Secret" and add secrets for S3_BUCKET, S3_ENDPOINT_URL, AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, GITHUB_TOKEN, CEPH_BUCKET_PREFIX, REPO and ORG.
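As a quick sanity check before triggering a run, a minimal sketch like the following can confirm that none of the values named above is missing. It assumes the values are exposed as environment variables under the same names, which this README does not confirm, and the helper name missing_secrets is hypothetical.

```python
import os

# Names copied from the list of secrets above.
REQUIRED_SECRETS = [
    "S3_BUCKET",
    "S3_ENDPOINT_URL",
    "AWS_ACCESS_KEY_ID",
    "AWS_SECRET_ACCESS_KEY",
    "GITHUB_TOKEN",
    "CEPH_BUCKET_PREFIX",
    "REPO",
    "ORG",
]


def missing_secrets(env=None):
    """Return the required names that are unset or empty in `env`.

    Hypothetical helper: assumes the secrets are available as
    environment variables with the same names.
    """
    env = os.environ if env is None else env
    return [name for name in REQUIRED_SECRETS if not env.get(name)]


if __name__ == "__main__":
    missing = missing_secrets()
    if missing:
        print("Missing values: " + ", ".join(missing))
    else:
        print("All required values are set.")
```

If you pass the S3 credentials through the HTTP request instead, only the remaining names would need to be checked.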
We have created a GitHub Action workflow which carries out the model training process for the GitHub Time to Merge model. There are two modes in which this GitHub Action can be used:
- Training Mode:
This is a prerequisite for every new repository. The model first needs to be trained on the pull requests already made to the repository. To run the action in training mode, specify the MODE as 1. You will need to add a train-ttm.yaml file to .github/workflows/ like this.
This mode initiates the model training process, which runs data collection, feature engineering and model training on the available PR data, and finally runs inference, i.e. predicts the time to merge for the latest PR on the repository.
(NOTE: This workflow will fail if there are no PRs on the repository.)
You can also trigger a new run manually from the "Actions" tab of your repository. Select the Run Time to Merge Model Training workflow, then click Run workflow on the upper right to start it.
This will initiate the model training and inference action.
- Inference Mode:
Similar to the train-ttm.yaml file, you can add another file called predict-ttm.yaml to .github/workflows/ like this. This file has MODE set to 0, which runs inference only on each new incoming pull request and adds a comment to the pull request stating the approximate time it will take to be merged.
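The difference between the two modes can be summarized in a small sketch. The step names below are paraphrased from the descriptions above rather than taken from the workflow source, and steps_for_mode is a hypothetical helper used only for illustration.

```python
# Step names are paraphrased from this README, not from the workflow itself.
TRAINING_STEPS = [
    "data collection",
    "feature engineering",
    "model training",
    "inference",
]
INFERENCE_STEPS = ["inference"]


def steps_for_mode(mode):
    """Return the steps described for MODE 1 (training) or MODE 0 (inference)."""
    mode = str(mode)
    if mode == "1":
        return list(TRAINING_STEPS)
    if mode == "0":
        return list(INFERENCE_STEPS)
    raise ValueError(f"Unsupported MODE: {mode!r} (expected 0 or 1)")


if __name__ == "__main__":
    for mode in (1, 0):
        print(f"MODE={mode}: " + " -> ".join(steps_for_mode(mode)))
```

In other words, training mode ends with the same inference step that inference mode runs on its own.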
To view your running workflow from the GitHub UI, go to "Actions" and click on the workflow run. Then click on pipeline to see logs and errors.
You can also train this model on your repository using an alternate approach, without adding the workflow file to your repository. Here are the steps to follow:
- Fork this repository and add the secrets to your fork as mentioned here. Make sure to set REPO and ORG for the repository you want to run TTM on.
- Go to Actions for your fork and select the run in container workflow to train the model.
- You can also interact with this tool by sending a POST request to the GitHub API endpoint. From your terminal, clone your repository and run bash run-ttm.sh. This will run the training workflow and train the TTM model on the repo and org of your choice.
- Enter your GitHub username
- Enter the repository you want to train the model on, e.g. community
- Enter the organization the repo belongs to, e.g. operate-first
- Enter the personal access token generated in the previous step, e.g. ghp_xyzxyzxyz
If you are passing your S3 credentials here:
- Enter your bucket name
- Enter your endpoint URL
- Enter your Access Key
- Enter your Secret Key
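For readers who prefer to see what such a POST request might look like, here is a minimal sketch that builds (but does not send) a request to GitHub's repository_dispatch endpoint. Whether run-ttm.sh uses this endpoint is an assumption, and the event type train-ttm, the payload keys and the function name build_dispatch_request are hypothetical; check run-ttm.sh and the workflow file for the names actually expected.

```python
import json
import urllib.request


def build_dispatch_request(username, fork_repo, token, repo, org, s3=None):
    """Build (without sending) a repository_dispatch POST request.

    Assumptions: the workflow listens for repository_dispatch events;
    the event type and client_payload keys below are hypothetical.
    """
    payload = {"REPO": repo, "ORG": org}
    if s3:
        # Optional S3 credentials passed along with the request.
        payload.update(s3)
    body = {"event_type": "train-ttm", "client_payload": payload}
    return urllib.request.Request(
        url=f"https://api.github.com/repos/{username}/{fork_repo}/dispatches",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Accept": "application/vnd.github+json",
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


if __name__ == "__main__":
    # Placeholder values; nothing is sent over the network here.
    req = build_dispatch_request(
        "your-username", "your-fork", "ghp_xyzxyzxyz", "community", "operate-first"
    )
    print(req.get_method(), req.full_url)
    print(req.data.decode("utf-8"))
```

Passing the resulting object to urllib.request.urlopen would send it; the request is only constructed here so that the payload can be inspected before any credentials leave your machine.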