This project classifies GitHub issues by domain and subdomain using either a fine-tuned GPT model or a random forest classifier. The script reads issue data from an SQLite database, prepares training data, trains the selected model, and then classifies the open issues in a specified GitHub repository.
Refer to the requirements.txt file and ensure that each library is installed at the version specified there.
To run the script, change the working directory to the location of the Python script and run the following command, where --method is either LLM or RF:
python your_script.py --config path/to/conf.json --domains path/to/Domains.json --db path/to/main.db --method <LLM|RF>
Example: python your_script.py --config config/conf.json --domains config/Domains.json --db data/main.db --method RF
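For readers who want to see how these flags fit together, the following is a minimal sketch of an argument parser that accepts the same four options. It is an illustration only, not the script's actual code: the function name build_parser and the help strings are assumptions, and only the flag names and the LLM/RF choices come from the command above.

```python
import argparse


def build_parser():
    # Hypothetical helper: the flag names mirror the README command,
    # everything else (description, help text) is illustrative.
    parser = argparse.ArgumentParser(
        description="Classify GitHub issues by domain and subdomain."
    )
    parser.add_argument("--config", required=True, help="Path to the configuration JSON file")
    parser.add_argument("--domains", required=True, help="Path to the Domains JSON file")
    parser.add_argument("--db", required=True, help="Path to the SQLite database file")
    parser.add_argument(
        "--method",
        required=True,
        choices=["LLM", "RF"],  # argparse rejects any other value
        help="Model to train: LLM (fine-tuned GPT) or RF (random forest)",
    )
    return parser


# Parse the example command from this README instead of sys.argv.
args = build_parser().parse_args(
    ["--config", "config/conf.json",
     "--domains", "config/Domains.json",
     "--db", "data/main.db",
     "--method", "RF"]
)
print(args.method)
```

Restricting --method with choices means a typo such as "rf" fails immediately with a usage message rather than partway through a run.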
The configuration JSON file contains the settings needed to run the script, such as the target repository and the access tokens. Example:
{
"github_token": "your_github_token",
"repo_owner": "repository_owner",
"repo_name": "repository_name",
"openAI_key": "your_openai_api_key"
}
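Before starting a run, it can help to check that the configuration file contains the expected keys. The sketch below is one possible way to do that, assuming the key names shown in the example above; the helper names validate_config and load_config are hypothetical and are not taken from the project's code.

```python
import json

# Keys taken from the example configuration in this README.
# "openAI_key" is checked separately because only the LLM method needs it.
REQUIRED_KEYS = ("github_token", "repo_owner", "repo_name")


def validate_config(config):
    # Hypothetical helper: collect every required key that is absent or empty.
    missing = [key for key in REQUIRED_KEYS if not config.get(key)]
    if missing:
        raise KeyError("Missing configuration keys: " + ", ".join(missing))
    return config


def load_config(path):
    # Read the JSON file and validate it before returning the dictionary.
    with open(path, encoding="utf-8") as handle:
        return validate_config(json.load(handle))
```

Calling load_config("config/conf.json") returns the parsed dictionary, or raises a KeyError that names the missing keys.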
The Domains JSON file lists the domains, their subdomains, and a description of each. Download it from the repo to ensure you work with a known-good set of domains.
The database file should be the output file from the engine code, which contains the issue data.
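A quick way to confirm that the database file is the one you expect is to list the tables it contains. The sketch below uses only Python's standard sqlite3 module and makes no assumption about the schema produced by the engine code; the function name list_tables is hypothetical.

```python
import sqlite3


def list_tables(db_path):
    # Hypothetical helper: return the names of all tables in the database.
    # sqlite_master is SQLite's built-in catalog of schema objects.
    connection = sqlite3.connect(db_path)
    try:
        rows = connection.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
    finally:
        connection.close()
    return [name for (name,) in rows]
```

Note that sqlite3.connect creates an empty database file if the path does not exist, so an empty list may simply mean the path is wrong.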
The method argument controls which model is trained: a large language model (LLM) or a random forest (RF). The LLM method fine-tunes an OpenAI GPT model, so an OpenAI key is required for this method.
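The dependency between the method and the OpenAI key can be checked up front. The following is a minimal sketch under the assumption that the key is stored as "openAI_key" in the configuration dictionary, as in the example configuration; the function name check_method is hypothetical and not part of the project.

```python
def check_method(method, config):
    # Hypothetical helper: validate the method and its key requirement.
    if method not in ("LLM", "RF"):
        raise ValueError("method must be 'LLM' or 'RF', got " + repr(method))
    # Only the LLM method calls OpenAI, so RF runs do not need the key.
    if method == "LLM" and not config.get("openAI_key"):
        raise ValueError("The LLM method requires 'openAI_key' in the configuration file")
    return method
```

With this check, an RF run passes with or without the key, while an LLM run without it stops before any training data is prepared.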