<a href="https://colab.research.google.com/github/ahwang16/gpt3-chatbot/blob/master/Finetune_GPT_3_for_Persona_Chatbot.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Fine-Tuning GPT-3 for a Persona Chatbot

## Install and Import Necessary Packages

In [None]:
!pip install openai

In [None]:
!pip install wandb

In [7]:
import json
import openai
import os
from getpass import getpass

## Load Data
In the left toolbar, click on the "Files" icon and then "Mount Drive."

Add this [folder](https://drive.google.com/drive/folders/1hpNqap_zPdWCYAMMz7L25w5kI7m2w9MP?usp=sharing) to your Drive, then update the `%cd` path below to match where you saved it.

In [3]:
%cd /content/drive/MyDrive/1 Penn Academics/2021-2022/CIS 810/Implementation/Data

/content/drive/MyDrive/1 Penn Academics/2021-2022/CIS 810/Implementation/Data


In [4]:
# The path here should match the path to your Data folder above.
!pwd

/content/drive/MyDrive/1 Penn Academics/2021-2022/CIS 810/Implementation/Data


In [None]:
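light_dialogue_path = "light_dialogue_data_train.jsonl"

# json.load parses the whole file as one JSON document, so despite the
# .jsonl extension this file must hold a single top-level list of dialogues.
with open(light_dialogue_path, "r", encoding="utf-8") as infile:
  light_dialogue = json.load(infile)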
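
The functions in the next section read three fields from each dialogue: `agents`, `character`, and `speech`. The sketch below is one way to check a record for those fields before building the fine-tuning data. The helper name `check_record` and the sample record are hypothetical; the sample only mimics the fields this notebook uses, not the full dataset schema.

```python
def check_record(record):
    """Return a list of problems found in one dialogue record (empty if none)."""
    problems = []
    for key in ("agents", "character", "speech"):
        if key not in record:
            problems.append("missing key: " + key)
    if not problems:
        # create_prompt reads the names of the first two agents.
        if len(record["agents"]) < 2:
            problems.append("fewer than two agents")
        # create_completion pairs character[i] with speech[i].
        if len(record["character"]) != len(record["speech"]):
            problems.append("character and speech lengths differ")
    return problems


# Hypothetical record shaped like the fields used in this notebook.
sample_record = {
    "agents": [{"name": "court wizard"}, {"name": "soldier"}],
    "character": ["court wizard", "soldier"],
    "speech": ["A quiet night this evening...", "yes it is"],
}

print(check_record(sample_record))
```

On the real data, the same check could be run as `[check_record(r) for r in light_dialogue]` to find records that would break the functions below.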

## Create Fine-Tuning Data

In [None]:
def create_prompt(idx):
  """Build the prompt: one sentence naming the two agents in dialogue idx."""
  agents = light_dialogue[idx]["agents"]
  prompt = "The following is a conversation between {} and {}.\n".format(
      agents[0]["name"], agents[1]["name"]
  )

  return prompt


def create_completion(idx):
  """Build the completion: one 'name: utterance' line per turn, then '###'."""
  character = light_dialogue[idx]["character"]
  speech = light_dialogue[idx]["speech"]

  dialogue = ""

  # character[i] is the speaker of speech[i].
  for name, utterance in zip(character, speech):
    dialogue += "{}: {}\n".format(name, utterance)

  # "###" marks the end of each conversation.
  dialogue += "###\n"

  return dialogue


def create_finetuning_data():
  """Write prompt/completion pairs to light_dialogue_finetuning.jsonl."""
  finetuning_data = []

  for i in range(len(light_dialogue)):
    # Skip dialogues with fewer than two turns.
    if len(light_dialogue[i]["speech"]) > 1:
      data = {
          "prompt": create_prompt(i),
          "completion": create_completion(i),
      }
      finetuning_data.append(data)

  # Report the number of examples and preview the first 10.
  print(len(finetuning_data))
  for example in finetuning_data[:10]:
    print(example["prompt"])
    print(example["completion"])

  # JSONL format: one JSON object per line.
  with open("light_dialogue_finetuning.jsonl", "w") as outfile:
    for data in finetuning_data:
      outfile.write(json.dumps(data) + "\n")

In [None]:
create_finetuning_data()

9807
The following is a conversation between court wizard and soldier.

court wizard: A quiet night this evening...
soldier: yes it is
court wizard: Have any else come up this eve? I had hoped for a quiet night to examine the stars
soldier: Yes, a few came through, but it is a cold night for me, I am used to warmer weather
court wizard: Well, you are but a common soldier.  No doubt you are used to such a lot.  Thankfully I have my spells to keep me warm.
soldier: I am a soldier doing my job
court wizard: Yes... well... Very well then.  See that you do!  No slacking off while your betters are about.
soldier: No sir
court wizard: When, for example, was this horn last tested?  It looks dented.  How can we be sure it will work?
soldier: A year ago, Test it out or cause a need to use it
court wizard: Mayhap I will speak to the king about such lackness.  Or perhaps I can sell him a spell that will serve just as well.
soldier: Good idea, I agree, go do that
court wizard: Get off of me, you fo
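
Before uploading, it can help to confirm that every line of the output file parses as JSON and has string `prompt` and `completion` fields. The following is a minimal sketch of such a check, not part of the original notebook. The helper name `validate_finetuning_file` is hypothetical, and the demo runs on a throwaway file rather than on `light_dialogue_finetuning.jsonl`.

```python
import json
import os
import tempfile


def validate_finetuning_file(path):
    """Return (number of valid examples, line numbers of invalid lines)."""
    count, bad_lines = 0, []
    with open(path, "r", encoding="utf-8") as infile:
        for line_number, line in enumerate(infile, start=1):
            try:
                example = json.loads(line)
            except json.JSONDecodeError:
                bad_lines.append(line_number)
                continue
            if (isinstance(example, dict)
                    and isinstance(example.get("prompt"), str)
                    and isinstance(example.get("completion"), str)):
                count += 1
            else:
                bad_lines.append(line_number)
    return count, bad_lines


# Demo on a temporary file: two valid examples and one broken line.
with tempfile.NamedTemporaryFile(
        "w", suffix=".jsonl", delete=False, encoding="utf-8") as tmp:
    tmp.write(json.dumps({"prompt": "p1\n", "completion": "a: hi\n###\n"}) + "\n")
    tmp.write(json.dumps({"prompt": "p2\n", "completion": "b: hello\n###\n"}) + "\n")
    tmp.write("not json\n")
    demo_path = tmp.name

demo_count, demo_bad = validate_finetuning_file(demo_path)
os.remove(demo_path)
print(demo_count, demo_bad)
```

Calling `validate_finetuning_file("light_dialogue_finetuning.jsonl")` after `create_finetuning_data()` should return the same count that the function printed and an empty list of bad lines.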

## Fine-Tune the Model with the OpenAI API

### First, enter your OpenAI API key.
You can find your key on the [API keys page](https://beta.openai.com/account/api-keys). Running the next cell prompts you for the key; `getpass` hides the input as you type.

In [9]:
print('Enter OpenAI API key:')
openai.api_key = getpass()

os.environ['OPENAI_API_KEY']=openai.api_key

Enter OpenAI API key:
··········
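
If the key is already set in the `OPENAI_API_KEY` environment variable, the prompt can be skipped. The sketch below shows one way to do that, falling back to `getpass` only when the variable is missing. The helper name `resolve_api_key` and its parameters are hypothetical, and the demo uses placeholder values rather than a real key.

```python
import os
from getpass import getpass


def resolve_api_key(environ=os.environ, prompt_fn=getpass):
    """Return the key from the environment, or prompt for it if unset."""
    key = environ.get("OPENAI_API_KEY")
    if key:
        return key
    # getpass hides the typed characters.
    return prompt_fn("Enter OpenAI API key: ")


# Demo with a fake environment so nothing is prompted or exposed.
demo_key = resolve_api_key(environ={"OPENAI_API_KEY": "sk-placeholder"})
print(demo_key == "sk-placeholder")
```

In the notebook this would be used as `openai.api_key = resolve_api_key()`.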


### Next, run the fine-tuning command.
The command below passes `light_dialogue_finetuning.jsonl` as the fine-tuning data file (`-t`) and `curie` as the base model to fine-tune (`-m`).

In [None]:
!openai api fine_tunes.create -t light_dialogue_finetuning.jsonl -m curie

Upload progress:   0% 0.00/12.5M [00:00<?, ?it/s]
Upload progress: 100% 12.5M/12.5M [00:00<00:00, 20.2Git/s]
Uploaded file from light_dialogue_finetuning.jsonl: file-EGwZldxTV3nryjY0ou8PFESi
Created fine-tune: ft-REphHUyNJeAIUCSdHnI82xzW
Streaming events until fine-tuning is complete...

(Ctrl-C will interrupt the stream, but not cancel the fine-tune)
[2022-07-31 21:42:39] Created fine-tune: ft-REphHUyNJeAIUCSdHnI82xzW
[2022-07-31 21:42:49] Fine-tune costs $37.56
[2022-07-31 21:42:50] Fine-tune enqueued. Queue number: 0
[2022-07-31 21:42:51] Fine-tune started

Stream interrupted (client disconnected).
To resume the stream, run:

  openai api fine_tunes.follow -i ft-REphHUyNJeAIUCSdHnI82xzW



### Optional: Restart streaming.
Sometimes the streaming output cuts off before fine-tuning is done. This does not cancel the job. To resume streaming, run the command below with your fine-tune ID (the `ft-...` value); the exact command also appears at the end of the output of the fine-tuning command above.

In [10]:
!openai api fine_tunes.follow -i ft-REphHUyNJeAIUCSdHnI82xzW

[2022-07-31 21:42:39] Created fine-tune: ft-REphHUyNJeAIUCSdHnI82xzW
[2022-07-31 21:42:49] Fine-tune costs $37.56
[2022-07-31 21:42:50] Fine-tune enqueued. Queue number: 0
[2022-07-31 21:42:51] Fine-tune started
[2022-07-31 21:54:06] Completed epoch 1/4
[2022-07-31 22:04:25] Completed epoch 2/4
[2022-07-31 22:14:46] Completed epoch 3/4
[2022-07-31 22:25:05] Completed epoch 4/4
[2022-07-31 22:25:29] Uploaded model: curie:ft-ccb-lab-members-2022-07-31-22-25-29
[2022-07-31 22:25:30] Uploaded result file: file-UPDNOfeEyr261DngZLU9o5xN
[2022-07-31 22:25:30] Fine-tune succeeded

Job complete! Status: succeeded 🎉
Try out your fine-tuned model:

openai api completions.create -m curie:ft-ccb-lab-members-2022-07-31-22-25-29 -p <YOUR_PROMPT>
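
If you save the streamed output as text, the fine-tune ID and the final model name can be pulled out with a regular expression. This is a minimal sketch that assumes the log lines look like the ones shown above; the helper name `parse_fine_tune_log` is hypothetical, and the CLI output format may differ in other versions.

```python
import re


def parse_fine_tune_log(log_text):
    """Return (fine-tune ID, model name); either is None if not found."""
    job = re.search(r"Created fine-tune: (\S+)", log_text)
    model = re.search(r"Uploaded model: (\S+)", log_text)
    return (job.group(1) if job else None,
            model.group(1) if model else None)


# Lines copied from the output shown in this notebook.
demo_log = """\
[2022-07-31 21:42:39] Created fine-tune: ft-REphHUyNJeAIUCSdHnI82xzW
[2022-07-31 21:42:51] Fine-tune started
[2022-07-31 22:25:29] Uploaded model: curie:ft-ccb-lab-members-2022-07-31-22-25-29
[2022-07-31 22:25:30] Fine-tune succeeded
"""

demo_job_id, demo_model = parse_fine_tune_log(demo_log)
print(demo_job_id)
print(demo_model)
```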


## You're done!
That completes fine-tuning GPT-3 for a persona chatbot. Head over to the [OpenAI Playground](https://beta.openai.com/playground) to try your new model.
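
When you try the model, prompts will likely work best if they match the training format: the same opening sentence used in `create_prompt`, with `###` as the end-of-conversation marker. The sketch below builds such a prompt and trims generated text at the marker. The helper names `build_prompt` and `trim_completion` are hypothetical, and the sample output is made up for illustration rather than taken from the fine-tuned model.

```python
STOP_MARKER = "###"  # the marker appended to every training completion


def build_prompt(first_name, second_name):
    """Build a prompt in the same format as the fine-tuning prompts."""
    return "The following is a conversation between {} and {}.\n".format(
        first_name, second_name
    )


def trim_completion(text, stop_marker=STOP_MARKER):
    """Keep only the text before the first stop marker."""
    return text.split(stop_marker, 1)[0].rstrip("\n")


demo_prompt = build_prompt("court wizard", "soldier")

# Made-up model output, used only to show the trimming step.
demo_raw = "court wizard: A quiet night.\nsoldier: yes it is\n###\nextra text"
demo_trimmed = trim_completion(demo_raw)

print(demo_prompt)
print(demo_trimmed)
```

In the Playground, entering `###` as a stop sequence should have the same effect as `trim_completion`, since generation then halts at the marker.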