# Fine-Tuning GPT-3
This file covers how we fine-tuned GPT-3 for our specific use case and how we tested the model after fine-tuning. The API key has been removed for security reasons.

## Setup

In [2]:
import openai
import os

### Importing the OpenAI API key
* For how to handle the OpenAI API key safely, see: https://help.openai.com/en/articles/5112595-best-practices-for-api-key-safety
* If you are using a conda environment, you can set the API key as an environment variable, see: https://conda.io/projects/conda/en/latest/user-guide/tasks/manage-environments.html#setting-environment-variables
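The links above recommend keeping the key out of the notebook itself. A minimal sketch of that, assuming the key is stored in the `OPENAI_API_KEY` environment variable (the same name used in the commented-out line in the next cell). `load_api_key` is a hypothetical helper, not part of the openai library; it simply fails early with a clear message instead of a bare `KeyError`.

```python
import os


def load_api_key(var_name: str = "OPENAI_API_KEY") -> str:
    """Read the API key from an environment variable and fail early if it is missing."""
    # Hypothetical helper: strips accidental whitespace around the stored key.
    key = os.environ.get(var_name, "").strip()
    if not key:
        raise RuntimeError(
            f"Environment variable {var_name} is not set; "
            "export it (or set it in your conda environment) before running the notebook."
        )
    return key


# Usage in the notebook (requires the openai package):
# openai.api_key = load_api_key()
```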

In [3]:
# set the API key directly (replace the placeholder; never commit a real key)
openai.api_key = "INSERT KEY"

# alternative: read the key from an environment variable
#openai.api_key = os.environ["OPENAI_API_KEY"]

# Fine-Tuning in Python
* For preparing data and fine-tuning with OpenAI's CLI tool, see: https://beta.openai.com/docs/guides/fine-tuning
* For fine-tuning from Python, see: https://beta.openai.com/docs/api-reference/engines#finetune and https://betterprogramming.pub/fine-tune-gpt3-for-quality-results-3f91f1ab44ea
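The fine-tuning guide linked above expects a JSONL file with one `{"prompt": ..., "completion": ...}` object per line, where each prompt ends with a fixed separator and each completion starts with a space and ends with a fixed stop sequence. The sketch below assumes the separator `->` (the one the prompt later in this notebook ends with) and the stop sequence `"\n"` (the one passed to `Completion.create`); how the articles were actually split into prompt and completion is not shown here, so `make_record` and `to_jsonl` are hypothetical helpers.

```python
import json


def make_record(prompt_text: str, completion_text: str,
                separator: str = "->", stop: str = "\n") -> dict:
    """Build one prompt/completion pair in the legacy fine-tuning format.

    Assumed conventions (adjust to your own data): the prompt ends with a fixed
    separator; the completion starts with a space and ends with the stop sequence.
    """
    # Newlines inside the completion would collide with the "\n" stop sequence,
    # so collapse all whitespace runs to single spaces.
    body = " ".join(completion_text.split())
    return {
        "prompt": prompt_text.strip() + separator,
        "completion": " " + body + stop,
    }


def to_jsonl(records) -> str:
    """Serialize records as JSON Lines; ensure_ascii=False keeps æ, ø and å readable."""
    return "".join(json.dumps(r, ensure_ascii=False) + "\n" for r in records)


# Usage (the output path is a placeholder):
# with open("train.jsonl", "w", encoding="utf-8") as f:
#     f.write(to_jsonl(records))
```

Collapsing whitespace is a design choice tied to the `"\n"` stop sequence; if the articles' paragraph breaks matter, a different stop sequence (the guide mentions options such as `###`) would be needed instead.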


In [12]:
# upload the training data (JSONL); open the file in binary mode as in OpenAI's examples
openai.File.create(
  file=open("./data/samf_articles.jsonl", "rb"),
  purpose='fine-tune'
)

<File file id=file-WjG5a4gyx5hexEILoas2gHK9 at 0x7fcd8107b770> JSON: {
  "bytes": 1258601,
  "created_at": 1665131164,
  "filename": "file",
  "id": "file-WjG5a4gyx5hexEILoas2gHK9",
  "object": "file",
  "purpose": "fine-tune",
  "status": "uploaded",
  "status_details": null
}
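The upload above accepts the file as-is, so formatting problems only surface later. A quick local check before uploading can catch them; OpenAI's legacy CLI offered `openai tools fine_tunes.prepare_data` for this. The sketch below is a hypothetical stand-alone alternative that checks the same conventions assumed in this notebook (separator `->`, completions starting with a space).

```python
import json


def validate_jsonl_lines(lines, separator="->"):
    """Return (line_number, problem) pairs for a prompt/completion JSONL file.

    The checks mirror the conventions assumed in this notebook: every line is a
    JSON object with exactly "prompt" and "completion" string fields, every
    prompt ends with the separator and every completion starts with a space.
    """
    problems = []
    for i, line in enumerate(lines, start=1):
        if not line.strip():
            continue  # blank lines are skipped, not reported
        try:
            record = json.loads(line)
        except json.JSONDecodeError as exc:
            problems.append((i, f"invalid JSON: {exc.msg}"))
            continue
        if not isinstance(record, dict) or set(record) != {"prompt", "completion"}:
            problems.append((i, "expected exactly the keys 'prompt' and 'completion'"))
            continue
        prompt, completion = record["prompt"], record["completion"]
        if not isinstance(prompt, str) or not isinstance(completion, str):
            problems.append((i, "'prompt' and 'completion' must be strings"))
            continue
        if not prompt.endswith(separator):
            problems.append((i, "prompt does not end with the separator"))
        if not completion.startswith(" "):
            problems.append((i, "completion does not start with a space"))
    return problems


# Usage before calling openai.File.create:
# with open("./data/samf_articles.jsonl", encoding="utf-8") as f:
#     problems = validate_jsonl_lines(f)
# print(problems[:10])
```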

In [13]:
# create the fine-tune job: 2 epochs on the base davinci model, with a custom suffix for the model name
openai.FineTune.create(
    training_file="file-WjG5a4gyx5hexEILoas2gHK9",
    n_epochs=2,
    model="davinci",
    suffix="finetuning_samfonly",
)

<FineTune fine-tune id=ft-z6L9WsaZgZJrNlY9uWudqeDN at 0x7fcd7f6801d0> JSON: {
  "created_at": 1665131649,
  "events": [
    {
      "created_at": 1665131649,
      "level": "info",
      "message": "Created fine-tune: ft-z6L9WsaZgZJrNlY9uWudqeDN",
      "object": "fine-tune-event"
    }
  ],
  "fine_tuned_model": null,
  "hyperparams": {
    "batch_size": null,
    "learning_rate_multiplier": null,
    "n_epochs": 2,
    "prompt_loss_weight": 0.01
  },
  "id": "ft-z6L9WsaZgZJrNlY9uWudqeDN",
  "model": "davinci",
  "object": "fine-tune",
  "organization_id": "org-L8N5gTHRSsmO0cHZT9riy8tQ",
  "result_files": [],
  "status": "pending",
  "training_files": [
    {
      "bytes": 1258601,
      "created_at": 1665131164,
      "filename": "file",
      "id": "file-WjG5a4gyx5hexEILoas2gHK9",
      "object": "file",
      "purpose": "fine-tune",
      "status": "processed",
      "status_details": null
    }
  ],
  "updated_at": 1665131649,
  "validation_files": []
}
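The job's events (retrieved in the next cell) report "Fine-tune costs $27.24". Training cost scales with the number of training tokens times the number of epochs times a per-1,000-token price. The sketch below writes that out and inverts it to estimate how many tokens were billed. The davinci training price of $0.03 per 1K tokens is an assumption based on OpenAI's pricing at the time, not something this notebook reports; check the pricing page before relying on it.

```python
def estimate_training_cost(tokens_per_epoch: int, n_epochs: int, price_per_1k: float) -> float:
    """Training cost = tokens per epoch x number of epochs x price per 1,000 tokens."""
    return tokens_per_epoch * n_epochs * price_per_1k / 1000


def implied_tokens_per_epoch(cost: float, n_epochs: int, price_per_1k: float) -> float:
    """Invert the formula above to estimate how many tokens were billed per epoch."""
    return cost * 1000 / (n_epochs * price_per_1k)


# Assumed davinci training price in USD per 1K tokens (verify against OpenAI's pricing page).
ASSUMED_DAVINCI_PRICE_PER_1K = 0.03

# Reported cost ($27.24) with n_epochs=2 under the assumed price:
print(round(implied_tokens_per_epoch(27.24, 2, ASSUMED_DAVINCI_PRICE_PER_1K)))  # prints 454000
```

Under that assumed price, the 1,258,601-byte training file would correspond to roughly 454,000 tokens per epoch; a different price changes the estimate proportionally.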

In [14]:
# check the status and events of the fine-tune job
openai.FineTune.retrieve(id="ft-z6L9WsaZgZJrNlY9uWudqeDN")


<FineTune fine-tune id=ft-z6L9WsaZgZJrNlY9uWudqeDN at 0x7fcd7f680d60> JSON: {
  "created_at": 1665131649,
  "events": [
    {
      "created_at": 1665131649,
      "level": "info",
      "message": "Created fine-tune: ft-z6L9WsaZgZJrNlY9uWudqeDN",
      "object": "fine-tune-event"
    },
    {
      "created_at": 1665131655,
      "level": "info",
      "message": "Fine-tune costs $27.24",
      "object": "fine-tune-event"
    },
    {
      "created_at": 1665131655,
      "level": "info",
      "message": "Fine-tune enqueued. Queue number: 0",
      "object": "fine-tune-event"
    },
    {
      "created_at": 1665131657,
      "level": "info",
      "message": "Fine-tune started",
      "object": "fine-tune-event"
    }
  ],
  "fine_tuned_model": null,
  "hyperparams": {
    "batch_size": 2,
    "learning_rate_multiplier": 0.2,
    "n_epochs": 2,
    "prompt_loss_weight": 0.01
  },
  "id": "ft-z6L9WsaZgZJrNlY9uWudqeDN",
  "model": "davinci",
  "object": "fine-tune",
  "organization_id

In [15]:
# retrieve the uploaded training file (its status is now "processed")
openai.File.retrieve("file-WjG5a4gyx5hexEILoas2gHK9")

<File file id=file-WjG5a4gyx5hexEILoas2gHK9 at 0x7fcd7f683220> JSON: {
  "bytes": 1258601,
  "created_at": 1665131164,
  "filename": "file",
  "id": "file-WjG5a4gyx5hexEILoas2gHK9",
  "object": "file",
  "purpose": "fine-tune",
  "status": "processed",
  "status_details": null
}
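The cells above check the job manually, and the fine-tuned model name stays `null` until training finishes. A minimal polling sketch, assuming the legacy job statuses `succeeded`, `failed` and `cancelled` are the terminal ones; `wait_for_status` is a hypothetical helper that takes a status-fetching function so it can be tested without network access. The legacy CLI also offered `openai api fine_tunes.follow -i <ID>` for following a job's events.

```python
import time
from typing import Callable

# Assumed terminal statuses of a legacy fine-tune job.
TERMINAL_STATUSES = {"succeeded", "failed", "cancelled"}


def wait_for_status(fetch_status: Callable[[], str],
                    poll_seconds: float = 60.0,
                    timeout_seconds: float = 4 * 3600,
                    sleep: Callable[[float], None] = time.sleep) -> str:
    """Poll fetch_status() until it returns a terminal status or the timeout is reached."""
    waited = 0.0
    while True:
        status = fetch_status()
        if status in TERMINAL_STATUSES:
            return status
        if waited >= timeout_seconds:
            raise TimeoutError(f"fine-tune still '{status}' after {waited:.0f} s")
        sleep(poll_seconds)
        waited += poll_seconds


# Usage with the legacy openai library used in this notebook:
# status = wait_for_status(
#     lambda: openai.FineTune.retrieve(id="ft-z6L9WsaZgZJrNlY9uWudqeDN")["status"])
```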

# Post-Fine-Tuning
After fine-tuning a model, remember that your prompt has to end with the same indicator string that ended the training prompts (the prompt below ends with `->`) so that the model starts generating a completion rather than continuing the prompt. Also pass `stop=["\n"]` (or equivalently `stop="\n"`) so that the generated text ends at the expected place.
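The two rules above can be bundled so they are not forgotten on each call. A minimal sketch, assuming `->` is the training separator; `build_prompt` and `completion_params` are hypothetical helpers, and the parameter names they return (`model`, `prompt`, `stop`, `temperature`, `max_tokens`) are the ones this notebook passes to `openai.Completion.create`.

```python
SEPARATOR = "->"  # assumed: must match the separator used in the training prompts


def build_prompt(text: str, separator: str = SEPARATOR) -> str:
    """Append the separator once, so the model starts a completion instead of continuing the prompt."""
    text = text.rstrip()
    return text if text.endswith(separator) else text + separator


def completion_params(model: str, prompt: str, max_tokens: int = 300) -> dict:
    """Keyword arguments for openai.Completion.create, stopping at the first newline."""
    return {
        "model": model,
        "prompt": prompt,
        "stop": ["\n"],
        "temperature": 0,
        "max_tokens": max_tokens,
    }


# Usage:
# completion = openai.Completion.create(**completion_params(fine_tuned_model, build_prompt(text)))
```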

In [19]:
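# list all fine-tune jobs on the account
finetunes = openai.FineTune.list()

# get the fine-tuned model name; index 1 is specific to this account's job history
fine_tuned_model = finetunes.data[1].fine_tuned_model

# more robust alternative: look up our job by its id
#fine_tuned_model = openai.FineTune.retrieve(id="ft-z6L9WsaZgZJrNlY9uWudqeDN").fine_tuned_model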

print(fine_tuned_model)

davinci:ft-personal:finetuning-samfonly-2022-10-07-09-03-59
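Picking `finetunes.data[1]` works only for this account's job history. A sketch of a more robust lookup: choose the most recently created job that succeeded, optionally filtered by the suffix passed at creation time. Note that in the output above the suffix `finetuning_samfonly` appears as `finetuning-samfonly` in the model name, so the sketch also matches the hyphenated form; that conversion is an observation from this notebook, not a documented rule. `latest_fine_tuned_model` is a hypothetical helper that expects dict-like job objects (the legacy library's response objects behave like dicts).

```python
def latest_fine_tuned_model(jobs, suffix=None):
    """Return the model name of the most recently created succeeded fine-tune job.

    `jobs` is a list of dict-like job objects, e.g. openai.FineTune.list().data.
    `suffix` optionally filters on the suffix= value used when creating the job.
    """
    def matches(name):
        if suffix is None:
            return True
        # Observed in this notebook: underscores in the suffix became hyphens.
        return suffix in name or suffix.replace("_", "-") in name

    candidates = [
        job for job in jobs
        if job.get("status") == "succeeded"
        and job.get("fine_tuned_model")
        and matches(job["fine_tuned_model"])
    ]
    if not candidates:
        raise LookupError("no succeeded fine-tune job matches")
    return max(candidates, key=lambda job: job["created_at"])["fine_tuned_model"]


# Usage:
# fine_tuned_model = latest_fine_tuned_model(openai.FineTune.list().data, suffix="finetuning_samfonly")
```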


In [31]:
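# define the prompt (in Danish). English: "Government reaches agreement on a ban on a widespread fishing method. From 1 January it will be prohibited to catch fish in Bælthavet around Fyn and Langeland using bottom trawling."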
prompt = "Regeringen når aftale om forbud mod udbredt fiskemetode. Fra 1. januar skal det være forbudt at fange fisk i Bælthavet omkring Fyn og Langeland ved hjælp af bundtrawl.->"

In [34]:
# create the completion object
completion = openai.Completion.create(
    model=fine_tuned_model,
    prompt=prompt,
    stop="\n",              # stop at the first newline
    temperature=0,          # always pick the most likely next token
    max_tokens=300,
    best_of=1,
    frequency_penalty=0.2,
    presence_penalty=0.2,
)

In [35]:
# show the generated text
completion.choices[0].text

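' Det er nu officielt. I slutningen af måneden nåede regeringen og Enhedslisten, Alternativet og SF til en aftale om forbud mod det fiskemetoden, der kaldes bundtrawl. Det skriver miljø- og fødevareminister Jakob Ellemann-Jensen (V) i en pressemeddelelse. Fra 1. januar 2020 vil det være forbudt at fange fisk i Bælthavet - også kaldet Lillebælt - omkring Fyn og Langeland ved hjælp af bundtrawl. - Vi har med dette forbud taget et stort skridt på vejen mod at sikre os mod de negative effekter ved bundtrawl, siger Jakob Ellemann-Jensen i pressemeddelelsen. Det er selskaberne, der skal administrere forbuddet, men ministeren sikrer sig via aftalen, at der er mulighed for sanktioner, hvis ikke reglerne overholdes. - Hvis de selv administrerer forbuddet, sikrer vi os gennem aftalen, at de bliver sanktioneret meget'

English translation of the output: "It is now official. At the end of the month, the government and Enhedslisten, Alternativet and SF reached an agreement on a ban on the fishing method known as bottom trawling. This is stated by Minister for Environment and Food Jakob Ellemann-Jensen (V) in a press release. From 1 January 2020 it will be prohibited to catch fish in Bælthavet (also called Lillebælt) around Fyn and Langeland using bottom trawling. 'With this ban we have taken a big step towards protecting ourselves against the negative effects of bottom trawling,' says Jakob Ellemann-Jensen in the press release. It is the companies that will administer the ban, but through the agreement the minister ensures that sanctions are possible if the rules are not followed. 'If they administer the ban themselves, we ensure through the agreement that they will be sanctioned very"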
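The completion above ends mid-sentence, which is what reaching `max_tokens` before the `"\n"` stop sequence typically looks like. The legacy Completions API reports this in each choice's `finish_reason` (`"stop"` when a stop sequence ended the text, `"length"` when `max_tokens` was hit). A minimal sketch that surfaces the flag and, as a simple heuristic, trims a trailing partial sentence; both helpers are hypothetical, and the trimming rule can cut too early on abbreviations or dates such as "1. januar".

```python
def extract_completion(choice) -> tuple:
    """Return (text, truncated) for one choice of a Completion response.

    truncated is True when finish_reason == "length", i.e. max_tokens was reached
    before a stop sequence.
    """
    text = choice["text"].strip()
    truncated = choice.get("finish_reason") == "length"
    return text, truncated


def trim_to_last_sentence(text: str) -> str:
    """Heuristic: drop a trailing partial sentence by cutting after the last '.', '!' or '?'."""
    cut = max(text.rfind(p) for p in ".!?")
    return text[: cut + 1] if cut != -1 else text


# Usage:
# text, truncated = extract_completion(completion.choices[0])
# if truncated:
#     text = trim_to_last_sentence(text)  # or raise max_tokens and try again
```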