# Fine-Tuning a GPT-3 model on GitHub data

In this project we fine-tune a model on the README files of a given GitHub organization's repositories, so that it learns about that organization's codebase. The steps are:


1.   Access credentials
2.   Get data from GitHub
3.   Prepare data for fine-tuning
4.   Train the model
5.   Test the model

The YouTube tutorial is available [here](https://).

References:


1.   [OpenAI Finetuning Docs](https://beta.openai.com/docs/guides/fine-tuning)
2.   [PyGithub Repo](https://github.com/PyGithub/PyGithub)
3.   [PyGithub Docs](https://pygithub.readthedocs.io/en/latest/introduction.html)
4.   [GitHub Docs](https://docs.github.com/en/rest/overview)
5.   [GitHub REST Tutorial](https://www.softwaretestinghelp.com/github-rest-api-tutorial/)
6.   [Tuning Dataset](https://github.com/matiassingers/awesome-readme)



# Setup

In [None]:
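# Folder on the mounted Drive; include a trailing slash, since file names are appended to it directly
PATH = r'<DIR_DRIVE>'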

## Install libraries

In [None]:
!pip install PyGithub
!pip install python-dotenv
!pip install --upgrade jsonlines
!pip install --upgrade openai

## Mount Drive

In [None]:
from google.colab import drive
drive.mount('/content/gdrive')

## Load environment variables

In [None]:
from dotenv import load_dotenv

load_dotenv(PATH+'.env')

## Import libraries

In [None]:
from github import Github
import os
import jsonlines

# Get data from GitHub repositories

In [None]:
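# Replace the placeholder with the *name* of the environment variable (defined in .env) that holds the token
g = Github(os.getenv("<YOUR_GITHUB_ACCESS_TOKEN>"))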
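
`os.getenv` returns `None` when the variable is not set, which tends to surface later as a confusing authentication or rate-limit error. A minimal sketch of a fail-fast lookup follows; the helper name `require_env` and the variable name `GITHUB_TOKEN` are illustrative assumptions, not part of the original notebook.

```python
import os


def require_env(name: str) -> str:
    """Return the value of environment variable `name`, or raise if it is unset or empty."""
    value = os.getenv(name)
    if not value:
        raise RuntimeError(f"Environment variable {name!r} is not set; check your .env file")
    return value


# Hypothetical usage (GITHUB_TOKEN is an assumed variable name):
# g = Github(require_env("GITHUB_TOKEN"))
```

Raising early keeps the error next to its cause instead of deep inside a later API call.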

In [None]:
repos = g.get_organization("<ORG_NAME>").get_repos()

## Get README content and associate it with each repo

In [None]:
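jsons = []

for repo in repos:
  try:
    # The repository's full name ("org/repo") is the prompt; its README text is the completion
    file_content = repo.get_contents("README.md")
    jsons.append({"prompt": repo.full_name, "completion": file_content.decoded_content.decode()})
  except Exception:
    # Skip repositories whose README.md cannot be fetched or decoded
    pass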
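
The legacy fine-tuning guide recommends ending every prompt with a fixed separator, starting each completion with a space, and ending it with a stop sequence. The `prepare_data` tool used later suggests similar changes, but they could also be applied while building the records. A sketch under those assumptions; `to_record`, the `\n\n###\n\n` separator, and the ` END` stop sequence are illustrative choices, not taken from the notebook:

```python
SEPARATOR = "\n\n###\n\n"  # assumed marker for the end of the prompt
STOP = " END"              # assumed marker for the end of the completion


def to_record(repo_name: str, readme_text: str) -> dict:
    """Build one prompt/completion pair with an explicit separator and stop sequence."""
    return {
        "prompt": repo_name + SEPARATOR,
        # Leading space plus a fixed stop sequence so the model can learn where to stop
        "completion": " " + readme_text.strip() + STOP,
    }


record = to_record("my-org/my-repo", "# My Repo\nA short description.\n")
print(record)
```

Whatever separator and stop sequence end up in the training file need to be reused unchanged when querying the fine-tuned model.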

In [None]:
print(len(jsons))

52


In [None]:
for j in jsons:
  print(j)

## Write data to a JSONL file

In [None]:
with jsonlines.open(PATH+'test.jsonl', 'w') as writer:
  writer.write_all(jsons)
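
Before uploading, it can help to read the file back and confirm that every line is a JSON object with exactly the `prompt` and `completion` keys. The following is a minimal sketch using only the standard library; `check_jsonl` is a hypothetical helper, not part of `jsonlines` or the OpenAI tooling.

```python
import json


def check_jsonl(path: str) -> int:
    """Count records in a JSONL file, raising if any line lacks the expected keys."""
    count = 0
    with open(path, encoding="utf-8") as f:
        for line_number, line in enumerate(f, start=1):
            if not line.strip():
                continue  # tolerate blank lines
            record = json.loads(line)
            if set(record) != {"prompt", "completion"}:
                raise ValueError(f"line {line_number}: unexpected keys {sorted(record)}")
            count += 1
    return count


# Hypothetical usage: the count should match len(jsons)
# check_jsonl(PATH + 'test.jsonl')
```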

# Tune the OpenAI model

## Prepare data gathered from GitHub

In [None]:
!openai tools fine_tunes.prepare_data -f '<PATH_TO_test.jsonl>'

## Create the new model

In [None]:
!openai api fine_tunes.create -t '<PATH_TO_test_prepared.jsonl>' --no_packing --batch_size 1

## Test the new model

In [None]:
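import openai

# openai.Completion is the legacy (pre-1.0) interface of the openai package;
# the API key is read from the OPENAI_API_KEY environment variable
openai.Completion.create(
    model='<YOUR_MODEL_NAME>',
    prompt='<YOUR_PROMPT>')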
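
If the training prompts were given a separator suffix during data preparation, the same suffix generally needs to be appended at inference time, and the stop sequence passed as `stop`. The sketch below only builds the keyword arguments; `build_request`, the separator, the stop sequence, and the `max_tokens` value are assumptions to adapt to whatever `prepare_data` reported for your file.

```python
def build_request(model: str, repo_name: str,
                  separator: str = "\n\n###\n\n", stop: str = " END",
                  max_tokens: int = 256) -> dict:
    """Assemble keyword arguments for a legacy completion call."""
    return {
        "model": model,
        "prompt": repo_name + separator,  # must match the training prompt format
        "stop": [stop],                   # stop generating at the end-of-completion marker
        "max_tokens": max_tokens,         # assumed budget; adjust to the README length you expect
    }


kwargs = build_request("<YOUR_MODEL_NAME>", "my-org/my-repo")
# Hypothetical call with the legacy client:
# response = openai.Completion.create(**kwargs)
# print(response["choices"][0]["text"])
```

Keeping the request construction in one function makes it harder for the inference prompt to drift away from the format used in training.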
