**IFSC 7325 Final Project NLP Task - J. Dady**

This notebook contains the implementation for the NLP task of my final project. This task uses GPT-3 and Python to generate human-like text that mimics the seven main characters of the video game Xenoblade Chronicles: Shulk, Reyn, Fiora, Dunban, Sharla, Riki, and Melia. My implementation for this task consists of the following sections:

1.   Importing the dataset and formatting it into JSON files consisting of prompt-completion pairs
2.   Creating a fine-tuned GPT-3 Babbage model that generates text which mimics the characters
3.   Creating another fine-tuned model that generates text which mimics the characters with the more powerful Curie model **(Final Implementation)**
4.   Creating a fine-tuned Babbage model that classifies quotes based on which character would be most likely to say it
5.   My initial testing using the text-davinci-002 model

Section 2 serves as a point of comparison for the final implementation in Section 3: both use the same training set but different GPT-3 base models. The Babbage model is 1/5 the price of the Curie model, so it is expected to produce lower-quality results.
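
As a rough check of the 1/5 figure, the fine-tuning costs reported in the job logs of Sections 2 and 3 ($0.14 for Babbage and $0.71 for Curie, both on the same training file) can be compared directly. This is only a sanity check on the logged totals, not a statement of the per-token prices.

```python
# Fine-tuning costs as reported in the job logs of Sections 2 and 3.
babbage_cost = 0.14   # dollars, Babbage fine-tune
curie_cost = 0.71     # dollars, Curie fine-tune

# Ratio of the two logged costs; a value near 0.2 matches the "1/5 the price" claim.
ratio = babbage_cost / curie_cost
print(f"Babbage / Curie fine-tuning cost ratio: {ratio:.2f}")
```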

Section 3 serves as the final implementation of the task because it uses the most powerful fine-tuned model that I created. The Curie model used in this section was a good compromise between price and performance: it is the most powerful model that I could fine-tune without incurring a significant cost.

Section 4 was created to test how well GPT-3 could accomplish the task in the opposite direction: given a quote, predict which character said it. However, the results file generated for this model was not very useful for validating it. The training data for this model consists of the same prompt-completion pairs as the models in Sections 2 and 3, but with prompts and completions inverted, so that the quote appears in the prompt and the character name is the completion.
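
The relationship between the two training formats can be sketched as a small conversion function. This is only an illustration: the notebook builds both files directly from the script in Section 1, the function name `to_classification_pair` is hypothetical, and the example quote is a made-up placeholder rather than a line from the script.

```python
GENERATION_SUFFIX = " would say: "
CLASSIFICATION_PREFIX = "Who would say this? "

def to_classification_pair(generation_pair):
    """Turn a generation pair (name -> quote) into a classification pair (quote -> name)."""
    prompt = generation_pair["prompt"]
    if not prompt.endswith(GENERATION_SUFFIX):
        raise ValueError(f"unexpected prompt format: {prompt!r}")
    # The character name is everything before the " would say: " suffix.
    character = prompt[: -len(GENERATION_SUFFIX)]
    return {
        "prompt": CLASSIFICATION_PREFIX + generation_pair["completion"],
        "completion": character,
    }

# Placeholder quote for illustration only.
example = {"prompt": "Melia would say: ", "completion": "An example quote."}
print(to_classification_pair(example))
```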

Section 5 used the text-davinci-002 model, which I decided not to use after my initial testing because of its high cost and because it cannot be fine-tuned.









**Data used in this Task**

The dataset used for this NLP task was the story script of the video game Xenoblade Chronicles. All character quotes for the training sets are included in this script. This script is linked here: https://gamefaqs.gamespot.com/wii/960564-xenoblade-chronicles/faqs/70300. This script was named "script.txt" for use with my Python code.

Additionally, the following files were created using Python in Google Colab for the implementation of this task: "lines.json", "inverseLines.json", "lines_prepared.jsonl" (renamed to "generationTraining.jsonl"), "inverseLines_prepared_train.jsonl" (renamed to "classificationTraining.jsonl"), and "inverseLines_prepared_valid.jsonl" (renamed to "classificationValidation.jsonl"). These files will be submitted along with the script file to Blackboard.



**These are the fine tune IDs for the 4 fine-tuned models which I created during the implementation of the task. The implementation of the Ada model is not shown in this notebook as it used a less clean training set.**

Unused Generative Ada model: ft-u2P97XOj8cC9neVPNbqgPfpP

Generative Babbage Model: ft-yKWOw7d5CY1Lf0nG3peebmqh

**Generative Curie Model: ft-DMkFf59JEyZTCFfjXMBQEJnT (Final Implementation)**

Classification Babbage Model: ft-Kzwb684gcKj0dZaxQnp0wISt

**Section 1: Importing the dataset and formatting it into JSON files consisting of prompt-completion pairs**

In [None]:
#Uploads "script.txt".
from google.colab import files
files.upload()

Saving script.txt to script.txt




In [None]:
#This cell uses "script.txt" to create two different JSON files which will serve as the training data for fine-tuning the GPT-3 models.
#"lines.json" consists of prompt-completion pairs, where the prompts ask for a quote that a certain character would say, and the completions contain the matching quote.
#"inverseLines.json" consists of prompt-completion pairs, where the prompts ask who would say a certain quote, and the completions contain the character who said the quote.
import json

#Initialize temporary variables for creating the JSON files
foundCharacterLine = False
characterNames = ["Shulk", "Reyn", "Fiora", "Dunban", "Sharla", "Riki", "Melia"]
lineArray = []              #Array to temporarily store the data for "lines.json"
inverseLineArray = []       #Array to temporarily store the data for "inverseLines.json"

#Open the text file ("script.txt") and loop through each line
with open("script.txt", "r") as file:
    for line in file:
      #If the line is empty, then the current character's dialogue is over since each line of dialogue in "script.txt" is separated by an empty line.
      if (line.strip() == ""):
        foundCharacterLine = False

      #This statement will only be true when the current line contains dialogue for a character.
      if foundCharacterLine:
        #If the previous line has the same character name as this one, then append this line to the previous line's "completion" field. This allows lines of dialogue that span multiple lines in "script.txt" to be extracted correctly.
        if (not len(lineArray) == 0 and lineArray[-1]["prompt"].startswith(characterName)):
          lineArray[-1]["completion"] += " " + line.strip()
        #If the previous line does not have the same character name as the current one, then the current line is the beginning of a different character's line of dialogue, so it gets a new entry in lineArray.
        else:
          data = {"prompt": f"{characterName} would say: ", "completion": line.strip()}
          lineArray.append(data)

        #If the previous line has the same character name as this one, then append this line to the previous line's "prompt" field. This also allows lines of dialogue that span multiple lines to be extracted correctly.
        if (not len(inverseLineArray) == 0 and inverseLineArray[-1]["completion"] == characterName):
          inverseLineArray[-1]["prompt"] += " " + line.strip()
        #If the previous line does not have the same character name as the current one, then the current line is the beginning of a different character's line of dialogue, so it gets a new entry in inverseLineArray.
        else:
          data = {"prompt": f"Who would say this? {line.strip()}", "completion": characterName}
          inverseLineArray.append(data)

      #If the line consists only of the character's name or a character's name followed by a bracket, then this means the next line will contain dialogue for that character because "script.txt" is formatted that way.
      #The brackets are used to indicate certain characteristics about the line of dialogue.
      if (line.strip() in characterNames or any(line.startswith(name + " [") for name in characterNames)):
        foundCharacterLine = True                       #This is set to true so that the next line will be used to output dialogue to the .json files.
        characterName = line.split(None, 1)[0]          #The character name is extracted from the line for use in creating the .json files.

#The first two with statements write lineArray and inverseLineArray to their respective .json files.
with open("lines.json", "w") as characterLines:
  json.dump(lineArray, characterLines)

with open("inverseLines.json", "w") as inverseCharacterLines:
  json.dump(inverseLineArray, inverseCharacterLines)

#The last two with statements print out the new .json files to the console.
with open("lines.json", "r") as characterLines:
  print(characterLines.read())

with open("inverseLines.json", "r") as inverseCharacterLines:
  print(inverseCharacterLines.read())
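
The script layout that the cell above relies on (a name line, optionally followed by a bracketed annotation, then one or more dialogue lines, then an empty line) can be illustrated on a small in-memory sample. This is a simplified sketch rather than the notebook's code: the sample text is hypothetical and not copied from the script, the function name `extract_pairs` is an assumption, and unlike the cell above it starts a new entry at every name line instead of comparing against the previous entry.

```python
CHARACTER_NAMES = ["Shulk", "Reyn", "Fiora", "Dunban", "Sharla", "Riki", "Melia"]

# Hypothetical excerpt in the layout described above (not taken from "script.txt").
SAMPLE = """Shulk
This is a first line of dialogue
that wraps onto a second line.

Narrator
This speaker is not in the list, so the block is skipped.

Reyn [annotation]
A single-line quote.
"""

def extract_pairs(text):
    """Build generation-style prompt-completion pairs from script-formatted text."""
    pairs = []
    in_block = False  # True while reading dialogue that belongs to a tracked character
    for line in text.splitlines():
        stripped = line.strip()
        if not stripped:
            in_block = False  # an empty line ends the current dialogue block
        elif in_block:
            # Join wrapped dialogue lines into a single completion.
            pairs[-1]["completion"] = (pairs[-1]["completion"] + " " + stripped).strip()
        elif stripped in CHARACTER_NAMES or any(
            stripped.startswith(name + " [") for name in CHARACTER_NAMES
        ):
            name = stripped.split(None, 1)[0]
            pairs.append({"prompt": f"{name} would say: ", "completion": ""})
            in_block = True
    # Drop entries whose name line was not followed by any dialogue.
    return [pair for pair in pairs if pair["completion"]]

for pair in extract_pairs(SAMPLE):
    print(pair)
```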



In [None]:
#IMPORTANT: THIS CELL MUST BE RUN FOR COMMANDS IN SUBSEQUENT CELLS TO WORK.
#Installs the OpenAI API that is used to access GPT-3 models and functions.
!pip install --upgrade openai
...

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting openai
  Downloading openai-0.27.5-py3-none-any.whl (71 kB)
Collecting aiohttp
  Downloading aiohttp-3.8.4-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (1.0 MB)
Collecting aiosignal>=1.1.2
  Downloading aiosignal-1.3.1-py3-none-any.whl (7.6 kB)
Collecting frozenlist>=1.1.1
  Downloading frozenlist-1.3.3-cp310-cp310-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_17_x86_64.manylinux2014_x86_64.whl (149 kB)
Collecting async-timeout<5.0,>=4.0.0a3
  Downloading async_timeout-4.0.2-py3-none-any.whl (5.8 kB)
Colle

Ellipsis

**Section 2: Creating a fine-tuned GPT-3 Babbage model that generates text which mimics the characters**

In [None]:
#This command prepares the training data for GPT-3 fine-tuning by deleting duplicate rows, adding prefixes and suffixes that improve performance of the model, and converting the file to .jsonl format.
!openai tools fine_tunes.prepare_data -f lines.json

Analyzing...

- Your file contains 2651 prompt-completion pairs
- There are 112 duplicated prompt-completion sets. These are rows: [48, 214, 245, 267, 268, 269, 270, 272, 284, 292, 309, 339, 351, 352, 353, 372, 373, 377, 382, 387, 447, 456, 492, 516, 517, 523, 527, 533, 540, 576, 591, 661, 668, 768, 770, 798, 813, 901, 947, 1016, 1046, 1224, 1291, 1315, 1316, 1323, 1328, 1329, 1332, 1368, 1372, 1400, 1419, 1481, 1485, 1487, 1495, 1504, 1506, 1532, 1545, 1555, 1565, 1570, 1572, 1588, 1621, 1641, 1715, 1718, 1725, 1746, 1830, 1841, 1876, 1903, 1959, 1962, 1985, 2029, 2059, 2091, 2095, 2126, 2134, 2137, 2154, 2174, 2188, 2193, 2195, 2196, 2204, 2236, 2238, 2245, 2310, 2341, 2351, 2366, 2374, 2381, 2427, 2452, 2458, 2488, 2492, 2539, 2544, 2599, 2628, 2633]
- All prompts end with suffix ` would say: `. This suffix seems very long. Consider replacing with a shorter suffix, such as ` ->`
- Your data does not contain a common ending at the end of your completions. Having a common ending strin
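
The kind of clean-up described in the cell comment above (dropping duplicate pairs, adding a prefix and a common ending to completions, and writing one JSON object per line) can be approximated in plain Python. This is a sketch under assumptions, not the tool's actual implementation: the function names are hypothetical, the leading space and the newline ending are assumed choices (the output above is truncated before the tool's final recommendations), and the example pairs are illustrative only.

```python
import json

def prepare_pairs(pairs, completion_ending="\n"):
    """Deduplicate pairs and give every completion a leading space and a common ending."""
    seen = set()
    prepared = []
    for pair in pairs:
        key = (pair["prompt"], pair["completion"])
        if key in seen:
            continue  # skip exact duplicate prompt-completion pairs
        seen.add(key)
        completion = pair["completion"]
        if not completion.startswith(" "):
            completion = " " + completion      # assumed prefix
        if not completion.endswith(completion_ending):
            completion += completion_ending    # assumed common ending
        prepared.append({"prompt": pair["prompt"], "completion": completion})
    return prepared

def to_jsonl(pairs):
    """Serialize pairs in JSON Lines format: one JSON object per line."""
    return "".join(json.dumps(pair) + "\n" for pair in pairs)

# Illustrative input containing one duplicate.
example = [
    {"prompt": "Dunban would say: ", "completion": "What?!"},
    {"prompt": "Dunban would say: ", "completion": "What?!"},
    {"prompt": "Reyn would say: ", "completion": "OK."},
]
prepared = prepare_pairs(example)
print(to_jsonl(prepared), end="")
```

A common ending such as a newline is what makes a matching stop sequence usable at generation time.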

In [None]:
import openai
import os
os.environ["OPENAI_API_KEY"] = "<YOUR_OPENAI_API_KEY>"                      #Sets my OpenAI API key as an environment variable so that the API requests are authenticated.
!openai api fine_tunes.create -t generationTraining.jsonl -m babbage --suffix "Voice Line Generation"     #Creates a fine-tuned GPT-3 Babbage generative model using generationTraining.jsonl as the training data.

Upload progress:   0% 0.00/284k [00:00<?, ?it/s]Upload progress: 100% 284k/284k [00:00<00:00, 300Mit/s]
Uploaded file from generationTraining.jsonl: file-hNXN6GZYz5d60AsijvDNFi6g
Created fine-tune: ft-yKWOw7d5CY1Lf0nG3peebmqh
Streaming events until fine-tuning is complete...

(Ctrl-C will interrupt the stream, but not cancel the fine-tune)
[2023-04-15 05:47:12] Created fine-tune: ft-yKWOw7d5CY1Lf0nG3peebmqh
[2023-04-15 05:47:28] Fine-tune costs $0.14
[2023-04-15 05:47:29] Fine-tune enqueued. Queue number: 0
[2023-04-15 05:47:29] Fine-tune started



In [None]:
!openai api fine_tunes.follow -i ft-yKWOw7d5CY1Lf0nG3peebmqh      #Shows the progress of the fine-tuning job created in the last cell.

[2023-04-15 05:47:12] Created fine-tune: ft-yKWOw7d5CY1Lf0nG3peebmqh
[2023-04-15 05:47:28] Fine-tune costs $0.14
[2023-04-15 05:47:29] Fine-tune enqueued. Queue number: 0
[2023-04-15 05:47:29] Fine-tune started
[2023-04-15 05:51:57] Completed epoch 1/4
[2023-04-15 05:56:06] Completed epoch 2/4
[2023-04-15 06:00:16] Completed epoch 3/4
[2023-04-15 06:04:42] Uploaded model: babbage:ft-personal:voice-line-generation-2023-04-15-06-04-42
[2023-04-15 06:04:43] Uploaded result file: file-uAxt76Pdfvbxh3a57K8UhFSX
[2023-04-15 06:04:43] Fine-tune succeeded

Job complete! Status: succeeded 🎉
Try out your fine-tuned model:

openai api completions.create -m babbage:ft-personal:voice-line-generation-2023-04-15-06-04-42 -p <YOUR_PROMPT>


In [None]:
import openai
openai.api_key = "<YOUR_OPENAI_API_KEY>"        #Set up API credentials.
#This command uses the fine-tuned generative Babbage model to generate text that mimics the character named in the prompt.
response = openai.Completion.create(
    model="babbage:ft-personal:voice-line-generation-2023-04-15-06-04-42",
    prompt="Reyn would say: ",            #The prompt should consist of one of the seven character names followed by " would say: ".
    temperature=1,                        #A higher temperature value like 1 allows the model to select words with lower probabilities, which leads to more variation in its output.
    max_tokens=50,                        #This reduces the maximum length of responses to something that makes sense for a quote.
    stop=["\n"])                          #This makes the model only generate one line at a time.
generated_text = response.choices[0].text.strip()

#Print the text generated by the model.
print(generated_text)



A year ago, the Bionis reached its peak civilisation. We used advanced technology to develop the Military in the process. We eradicated the Bionis, and destroyed most of the Machina. But there was a tiny scar: If one single
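
The effect of the `temperature` parameter can be illustrated with a toy example. The sketch below uses a common formulation in which the scores are divided by the temperature before a softmax; the scores are made up, and this is not a description of how the OpenAI API is implemented internally.

```python
import math

def softmax_with_temperature(logits, temperature):
    """Convert raw scores into probabilities, dividing by the temperature first."""
    scaled = [score / temperature for score in logits]
    largest = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(value - largest) for value in scaled]
    total = sum(exps)
    return [value / total for value in exps]

# Made-up scores for three candidate tokens, from most to least likely.
logits = [2.0, 1.0, 0.0]

for temperature in (0.2, 1.0):
    probs = softmax_with_temperature(logits, temperature)
    print(temperature, [round(p, 3) for p in probs])
```

At the lower temperature almost all of the probability goes to the top-scoring token, while at a temperature of 1 the less likely tokens keep a noticeable share, which is why the generated quotes vary more from run to run.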


In [None]:
import openai
openai.api_key = "<YOUR_OPENAI_API_KEY>"
characterNames = ["Shulk", "Reyn", "Fiora", "Dunban", "Sharla", "Riki", "Melia"]

#Seven different text outputs are generated for each character so that the model can be evaluated by human judgement, which assesses the quality of the generated responses and their similarity to the characters' actual lines.
#This validation step is run for both the Babbage and Curie models, so their performance can be compared to see whether the more expensive model was worth using for this task.
for i in range(49):
  response = openai.Completion.create(
    model="babbage:ft-personal:voice-line-generation-2023-04-15-06-04-42",
    prompt=f"{characterNames[i % 7]} would say: ",       #Using %7 ensures that each character is used as input an equal number of times without accessing indexes of the characterNames array which do not exist.
    temperature=1,
    max_tokens=50,
    stop=["\n"])
  generated_text = response.choices[0].text.strip()
  print(f"{characterNames[i % 7]} would say: {generated_text}\n")

Shulk would say: Yeah, I guess so.

Reyn would say: Hey, Shulk. If she didn't look much like you, she'd be fine.

Fiora would say: No, I really can't. But please try and find me if you do. I'll be here. If anything happens to you... Promise?

Dunban would say: What?!

Sharla would say: By the way, does Egil have plans to use Telethia as a bridge?

Riki would say: Dickson, let me borrow your cell phone.

Melia would say: There is no help for it. I must go fetch Lord Dunga.

Shulk would say: Give me a minute. I'll be right out.

Reyn would say: Hey! What was that?!

Fiora would say: Inside your heart, Shulk, a hidden Machina is growing stronger. Do not let it control you! When you closed the Mechanical Heart, a dark power rose to the surface. You can still see it?

Dunban would say: And that is?

Sharla would say: It's not right. We can't just kill everyone.

Riki would say: Reyn sure he fit inside?

Melia would say: Shulk, go back to strictly observing!

Shulk would say: He was like a f
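
The round-robin indexing used in the loop above can be checked without calling the API. This small sketch only builds the 49 prompts and counts them; the variable names are chosen for the example.

```python
from collections import Counter

character_names = ["Shulk", "Reyn", "Fiora", "Dunban", "Sharla", "Riki", "Melia"]

# i % 7 cycles through indexes 0..6, so the 49 iterations visit each name in turn.
prompts = [f"{character_names[i % len(character_names)]} would say: " for i in range(49)]

counts = Counter(prompts)
for prompt, count in counts.items():
    print(repr(prompt), count)
```

Each of the seven prompts appears seven times, which is what produces seven samples per character for the analysis.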

**Analysis of Babbage Results**

**Prompt: “Shulk would say: ”**
1.	Yeah, I guess so.
a.	Quality: 1/5 (Very generic and short)
b.	Similarity: 2/5 (This sounds like something Shulk would say and sounds human, but it does not reference the game.)
2.	Give me a minute. I'll be right out.
a.	Quality: 2/5 (Generic and a bit longer)
b.	Similarity: 3/5 (This sounds like something Shulk would say and sounds human, but it does not reference the game.)
3.	He was like a father to us. We feel as if we're betraying him if we don't succeed.
a.	Quality: 5/5 (Quite unique and optimal length)
b.	Similarity: 4/5 (This definitely sounds like Shulk, it sounds very human, and this references the game because Shulk’s father figure betrays him.)
4.	Who would have thought we'd end up in prison just because of a world gone wrong.
a.	Quality: 5/5 (Unique optimal length dialogue)
b.	Similarity: 4/5 (This sounds a lot like Shulk, it sounds human, and this references Prison Island, which is a location in the game.)
5.	I can feel it. I know it.
a.	Quality: 2/5 (Slightly generic and short)
b.	Similarity: 3/5 (This sounds like Shulk and sounds human, but it does not reference the game.)
6.	I'm in! I think! Zanza! It wants us to kill Egil. Everyone, show these eyeless faced trash just how pointless their quest to murdering us is! Fiora, Reyn? Wait! Fiora! What?! I can
a.	Quality: 5/5 (Very unique and long)
b.	Similarity: 5/5 (This sounds very similar to Shulk, sounds very human, and references four characters from the game: Zanza, Egil, Fiora, and Reyn.)
7.	If we want to explore further, we'll have to catch up with her.[9]
a.	Quality: 3/5 (Pretty unique and good length, but there is a weird number at the end)
b.	Similarity: 3/5 (This sounds kind of like Shulk, sounds human, and sounds like a conversation that would happen in the game.)

**Prompt: “Reyn would say: ”**
1.	Hey, Shulk. If she didn't look much like you, she'd be fine.
a.	Quality: 5/5 (Unique and optimal length)
b.	Similarity: 5/5 (This sounds just like Reyn, sounds human, and it references Shulk.)
2.	Hey! What was that?!
a.	Quality: 2/5 (Pretty generic and short)
b.	Similarity: 2/5 (This sounds somewhat like Reyn, but it does not reference the game.)
3.	Why none of us? You got what you wanted. Shulk?
a.	Quality: 4/5 (Pretty unique and long)
b.	Similarity: 4/5 (This sounds like Reyn and references Shulk.)
4.	OK.
a.	Quality: 1/5 (Very generic and short)
b.	Similarity: 1/5 (Too short to tell)
5.	Oi! What're ya on about?
a.	Quality: 5/5 (Very unique and long enough)
b.	Similarity: 5/5 (This sounds exactly like Reyn because he often uses the words “oi” and “ya”.)
6.	Yeah! How 'bout us?
a.	Quality: 3/5 (Pretty unique but short)
b.	Similarity: 4/5 (This sounds a lot like Reyn because he often says “’bout”.)
7.	That was cool.
a.	Quality: 1/5 (Generic and short)
b.	Similarity: 2/5 (Sounds vaguely like Reyn)

**Prompt: “Fiora would say: ”**
1.	No, I really can't. But please try and find me if you do. I'll be here. If anything happens to you... Promise?
a.	Quality: 5/5 (Very unique and long)
b.	Similarity: 4/5 (This sounds a lot like Fiora because she often acts concerned about others.)
2.	Inside your heart, Shulk, a hidden Machina is growing stronger. Do not let it control you! When you closed the Mechanical Heart, a dark power rose to the surface. You can still see it?
a.	Quality: 5/5 (Very unique and long)
b.	Similarity: 5/5 (This sounds just like Fiora and references Shulk and the Machina, which is a race in the game.)
3.	Shulk? I'm just...
a.	Quality: 3/5 (Unique but short)
b.	Similarity: 4/5 (It sounds like Fiora and references Shulk.)
4.	I replace my body with ether, but I derive full usage from my Monado.
a.	Quality: 5/5 (Unique and good length)
b.	Similarity: 5/5 (This sounds a lot like Fiora, and it references ether and the Monado, which are from the game.)
5.	Enemy seeds! We have to destroy the pots! The seeds are creating a barrier on top of the Ka Free View in Assist
a.	Quality: 5/5 (Unique and long)
b.	Similarity: 4/5 (This sounds like Fiora but does not overtly reference the game.)
6.	Are you sure? They're like part of you now.
a.	Quality: 4/5 (Unique and pretty good length)
b.	Similarity: 3/5 (This sounds like Fiora but does not reference the game.)
7.	Her voice... Is it true that she cries because of the Mechonis?
a.	Quality: 5/5 (Unique and good length)
b.	Similarity: 5/5 (This sounds like Fiora and references the Mechonis.)

**Prompt: “Dunban would say: ”**
1.	What?!
a.	Quality: 1/5 (Generic and short)
b.	Similarity: 1/5 (Too short to tell)
2.	And that is?
a.	Quality: 1/5 (Generic and short)
b.	Similarity: 2/5 (Sounds vaguely like Dunban)
3.	Get down! Everyone, hold on to him! Everyone, get down! Everyone, get back! Everyone! What's wrong with you?! That shouldn't happen! We're Guild Army! Nobody let their ego get in the way ever! They're
a.	Quality: 5/5 (Unique and long)
b.	Similarity: 4/5 (It sounds a lot like Dunban but does not directly reference the game.)
4.	The imperial capital?
a.	Quality: 3/5 (Unique but short)
b.	Similarity: 3/5 (It references the imperial capital and sounds human.)
5.	What was that?!
a.	Quality: 1/5 (Generic and short)
b.	Similarity: 2/5 (Sounds vaguely like Dunban)
6.	Why would Tyrea want to control the Monado? I mean, Monado is a weapon of prosperity and peace, not war and destruction.
a.	Quality: 5/5 (Unique and long)
b.	Similarity: 5/5 (Sounds a lot like Dunban and references Tyrea and the Monado)
7.	Where am I? At home? Does that mean I'm Egil's heir? It's just a feeling, but...
a.	Quality: 5/5 (Unique and optimal length)
b.	Similarity: 5/5 (Sounds like Dunban and references Egil)

**Prompt: “Sharla would say: ”**
1.	By the way, does Egil have plans to use Telethia as a bridge?
a.	Quality: 5/5 (Unique and optimal length)
b.	Similarity: 5/5 (This sounds a lot like Sharla and references Egil and Telethia.)
2.	It's not right. We can't just kill everyone.
a.	Quality: 4/5 (Pretty unique and okay length)
b.	Similarity: 3/5 (This sounds like Sharla but does not reference the game.)
3.	You Mr Smith!
a.	Quality: 1/5 (Short and generic)
b.	Similarity: 1/5 (This does not sound like Sharla.)
4.	OK, everyone. I'll keep watch while you all rest.
a.	Quality: 3/5 (Slightly generic but nice length)
b.	Similarity: 3/5 (Sounds like Sharla but does not reference game)
5.	This arrow will kill the Telethia and weaken it. We can use the same technique to make the arrow disappear.
a.	Quality: 5/5 (Unique and long)
b.	Similarity: 5/5 (This sounds like Sharla and references Telethia.)
6.	As long as I have you I will fight and compete with Egil. There's no need to be afraid. Yes, I will go with Zanza to the foot of Prison Island. I will face him there.
a.	Quality: 5/5 (Unique and long)
b.	Similarity: 5/5 (Sounds a lot like Sharla and references Egil, Zanza, and Prison Island.)
7.	Majority of people in Colony 9 think you did it.
a.	Quality: 4/5 (Pretty unique and long)
b.	Similarity: 5/5 (Sounds like Sharla and references Colony 9)

**Prompt: “Riki would say: ”**
1.	Dickson, let me borrow your cell phone.
a.	Quality: 3/5 (Pretty unique and okay length)
b.	Similarity: 3/5 (This does not sound like Riki, but it references Dickson.)
2.	Reyn sure he fit inside?
a.	Quality: 3/5 (Pretty unique but a little short)
b.	Similarity: 4/5 (Sounds a bit like Riki and references Reyn)
3.	Reyn bad poni bird! Riki want to watch!
a.	Quality: 5/5 (Good length and unique)
b.	Similarity: 5/5 (This sounds exactly like Riki and references Reyn.)
4.	Heropon much better than Riki! Heropon ready to take on swarm! When Riki see, he say 'yea'!
a.	Quality: 5/5 (Unique and long)
b.	Similarity: 5/5 (This sounds exactly like Riki as he talks in third person and calls himself Heropon.)
5.	Fishing village. Folk say Chef nopon have special fish finger. So Riki go look.
a.	Quality: 5/5 (Unique and long)
b.	Similarity: 5/5 (This sounds a lot like Riki and references Nopon.)
6.	Riki all villagers come to see!
a.	Quality: 4/5 (Very unique but short)
b.	Similarity: 5/5 (Sounds just like Riki and references Frontier Village)
7.	Bugglemon. Famished. And thirsty. And cold. Me and Riki want a warm Cave Civi bath.
a.	Quality: 5/5 (Unique and long)
b.	Similarity: 3/5 (Sounds similar to Riki but sounds off)

**Prompt: “Melia would say: ”**
1.	There is no help for it. I must go fetch Lord Dunga.
a.	Quality: 5/5 (Very unique and long enough)
b.	Similarity: 5/5 (Sounds posh like Melia and references Chief Dunga)
2.	Shulk, go back to strictly observing!
a.	Quality: 4/5 (Unique but a bit short)
b.	Similarity: 5/5 (Sounds just like Melia and references Shulk)
3.	Our forefathers lived in peace with the Nopon. Their land and ours were one.
a.	Quality: 5/5 (Very unique and good length)
b.	Similarity: 5/5 (Sounds just like Melia as it sounds posh and references her forefathers and Nopon)
4.	It looks like the formerly submerged island is becoming accessible again. Perhaps that is the reason for all the recent unrest. And it looks like High Entia have been sighted near the temple.
a.	Quality: 5/5 (Very unique and long)
b.	Similarity: 5/5 (Sounds exactly like Melia and references the High Entia and islands)
5.	If it weren't for him I wouldn't have been able to guard the Bionis.
a.	Quality: 4/5 (A bit generic but good length)
b.	Similarity: 4/5 (Sounds like Melia and references the Bionis)
6.	How can this be? The Bionis has fundamentally changed! What is this?! (Close-up) Shulk?
a.	Quality: 5/5 (Very unique and long)
b.	Similarity: 5/5 (Sounds just like Melia and references the Bionis and Shulk)
7.	Kallian... More important than even your kingdom or life is the choice you made. To call it by any other name would not be enough. Kallian...
a.	Quality: 5/5 (Unique and long)
b.	Similarity: 5/5 (Sounds just like Melia and references her brother, Kallian)
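
To make the ratings above easier to compare across characters, they can be averaged. The sketch below transcribes the (quality, similarity) scores from the Babbage analysis in the order listed and computes per-character means; the variable names are chosen for this example, and the averages are only a summary of the subjective ratings, not an additional evaluation.

```python
from statistics import mean

# (quality, similarity) ratings transcribed from the Babbage analysis, in order.
babbage_ratings = {
    "Shulk":  [(1, 2), (2, 3), (5, 4), (5, 4), (2, 3), (5, 5), (3, 3)],
    "Reyn":   [(5, 5), (2, 2), (4, 4), (1, 1), (5, 5), (3, 4), (1, 2)],
    "Fiora":  [(5, 4), (5, 5), (3, 4), (5, 5), (5, 4), (4, 3), (5, 5)],
    "Dunban": [(1, 1), (1, 2), (5, 4), (3, 3), (1, 2), (5, 5), (5, 5)],
    "Sharla": [(5, 5), (4, 3), (1, 1), (3, 3), (5, 5), (5, 5), (4, 5)],
    "Riki":   [(3, 3), (3, 4), (5, 5), (5, 5), (5, 5), (4, 5), (5, 3)],
    "Melia":  [(5, 5), (4, 5), (5, 5), (5, 5), (4, 4), (5, 5), (5, 5)],
}

# Mean quality and mean similarity for each character.
averages = {
    name: (mean(q for q, _ in scores), mean(s for _, s in scores))
    for name, scores in babbage_ratings.items()
}

for name, (quality, similarity) in averages.items():
    print(f"{name:<7} quality {quality:.2f}  similarity {similarity:.2f}")
```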


**Section 3: Creating another fine-tuned model that generates text which mimics the characters with the more powerful Curie model (Final Implementation)**

In [None]:
import openai
import os
os.environ["OPENAI_API_KEY"] = "<YOUR_OPENAI_API_KEY>"                                        #Sets my OpenAI API key as an environment variable so that the API requests are authenticated.
!openai api fine_tunes.create -t generationTraining.jsonl -m curie --suffix "Voice Line Generation Curie Version"           #Creates a fine-tuned GPT-3 Curie generative model using generationTraining.jsonl as the training data.

Found potentially duplicated files with name 'generationTraining.jsonl', purpose 'fine-tune' and size 283768 bytes
file-hNXN6GZYz5d60AsijvDNFi6g
Enter file ID to reuse an already uploaded file, or an empty string to upload this file anyway: file-hNXN6GZYz5d60AsijvDNFi6g
Reusing already uploaded file: file-hNXN6GZYz5d60AsijvDNFi6g
Created fine-tune: ft-DMkFf59JEyZTCFfjXMBQEJnT
Streaming events until fine-tuning is complete...

(Ctrl-C will interrupt the stream, but not cancel the fine-tune)
[2023-04-17 03:24:16] Created fine-tune: ft-DMkFf59JEyZTCFfjXMBQEJnT
[2023-04-17 03:24:59] Fine-tune costs $0.71
[2023-04-17 03:24:59] Fine-tune enqueued. Queue number: 0
[2023-04-17 03:25:00] Fine-tune started



In [None]:
!openai api fine_tunes.follow -i ft-DMkFf59JEyZTCFfjXMBQEJnT          #Shows the progress of the fine-tuning job created in the last cell.

[2023-04-17 03:24:16] Created fine-tune: ft-DMkFf59JEyZTCFfjXMBQEJnT
[2023-04-17 03:24:59] Fine-tune costs $0.71
[2023-04-17 03:24:59] Fine-tune enqueued. Queue number: 0
[2023-04-17 03:25:00] Fine-tune started
[2023-04-17 03:31:06] Completed epoch 1/4
[2023-04-17 03:41:08] Completed epoch 3/4
[2023-04-17 03:46:35] Uploaded model: curie:ft-personal:voice-line-generation-curie-version-2023-04-17-03-46-35
[2023-04-17 03:46:36] Uploaded result file: file-8D2z6jH412s43SEPmcSjUrLM
[2023-04-17 03:46:36] Fine-tune succeeded

Job complete! Status: succeeded 🎉
Try out your fine-tuned model:

openai api completions.create -m curie:ft-personal:voice-line-generation-curie-version-2023-04-17-03-46-35 -p <YOUR_PROMPT>


In [None]:
import openai
openai.api_key = "<YOUR_OPENAI_API_KEY>"      #Set up API credentials.
#This command uses the fine-tuned generative Curie model to generate text that mimics the character named in the prompt.
response = openai.Completion.create(
    model="curie:ft-personal:voice-line-generation-curie-version-2023-04-17-03-46-35",
    prompt="Shulk would say: ",                #The prompt should consist of one of the seven character names followed by " would say: ".
    temperature=1,                             #A higher temperature value like 1 allows the model to select words with lower probabilities, which leads to more variation in its output.
    max_tokens=50,                             #This reduces the maximum length of responses to something that makes sense for a quote.
    stop=["\n"])                               #This makes the model only generate one line at a time.
generated_text = response.choices[0].text.strip()

#Print the text generated by the model.
print(generated_text)

Egil! Everyone! Get back! Egil! Dickson! No! Egil! What are you?! Stop this! No! I won't let you! Anyone! Anyone?! Anyone?! Wake up! It's not possible! Stop it


In [None]:
import openai
openai.api_key = "<YOUR_OPENAI_API_KEY>"
characterNames = ["Shulk", "Reyn", "Fiora", "Dunban", "Sharla", "Riki", "Melia"]

#Seven different text outputs are generated for each character so that the model can be evaluated by human judgement, which assesses the quality of the generated responses and their similarity to the characters' actual lines.
#This validation step is run for both the Babbage and Curie models, so their performance can be compared to see whether the more expensive model was worth using for this task.
for i in range(49):
  response = openai.Completion.create(
    model="curie:ft-personal:voice-line-generation-curie-version-2023-04-17-03-46-35",
    prompt=f"{characterNames[i % 7]} would say: ",       #Using %7 ensures that each character is used as input an equal number of times without accessing indexes of the characterNames array which do not exist.
    temperature=1,
    max_tokens=50,
    stop=["\n"])
  generated_text = response.choices[0].text.strip()
  print(f"{characterNames[i % 7]} would say: {generated_text}\n")


Shulk would say: What did I just...?

Reyn would say: But... What made you decide to live on Bionis?

Fiora would say: Is that how you see me?

Dunban would say: My head!

Sharla would say: What do we do?!

Riki would say: Oki! Just like Heropon Riki! Riki happy if listen to Melly's voice!

Melia would say: Fiora, is that?

Shulk would say: Then we will oppose them. To protect the ones we love. The ones we stand with. That is why we fight! Riki! What's wrong?! You idiot! You did it! You stopped the blast! We're alive and we have

Reyn would say: That was...different. You know, I never noticed before how pale your ears are.

Fiora would say: Dunban.

Dunban would say: Reyn, Melia. Shulk the High Entia must be a Homs too, then.

Sharla would say: Um, Reyn?

Riki would say: Riki say yes!

Melia would say: Are you sure?

Shulk would say: I think so. It seems to me that they are leading us somewhere.

Reyn would say: You all right, kid?

Fiora would say: But don't worry. I'll be with you th

**Analysis of Curie Results**

**Prompt: “Shulk would say: ”**
1.	What did I just...?
a.	Quality: 2/5 (Pretty generic and short)
b.	Similarity: 3/5 (Sounds like Shulk but does not reference the game)
2.	Then we will oppose them. To protect the ones we love. The ones we stand with. That is why we fight! Riki! What's wrong?! You idiot! You did it! You stopped the blast! We're alive and we have
a.	Quality: 5/5 (Unique and long)
b.	Similarity: 5/5 (Sounds a lot like Shulk and references Riki)
3.	I think so. It seems to me that they are leading us somewhere.
a.	Quality: 4/5 (Pretty unique and optimal length)
b.	Similarity: 4/5 (Sounds a lot like Shulk but does not reference the game)
4.	You made us laugh and cry for days.
a.	Quality: 3/5 (Kind of unique and kind of short)
b.	Similarity: 3/5 (Sounds like Shulk)
5.	A vision! I know! I've gotta try something! Keep your eye on that bronze clock! If we can get the time on the clock, we're home and dry!
a.	Quality: 5/5 (Unique and long)
b.	Similarity: 5/5 (Sounds just like Shulk and references his visions)
6.	One day... maybe one day we can be together...
a.	Quality: 4/5 (Pretty unique and okay length)
b.	Similarity: 4/5 (Sounds a lot like Shulk)
7.	It's me. Reyn.
a.	Quality: 3/5 (Kinda unique but short)
b.	Similarity: 4/5 (Sounds like Shulk and references Reyn)

Prompt: “Reyn would say: ”
1.	But... What made you decide to live on Bionis?
a.	Quality: 5/5 (Very unique and good length)
b.	Similarity: 5/5 (Sounds a lot like Reyn and references the Bionis)
2.	That was...different. You know, I never noticed before how pale your ears are.
a.	Quality: 5/5 (Unique and optimal length)
b.	Similarity: 4/5 (Sounds a lot like Reyn)
3.	You all right, kid?
a.	Quality: 3/5 (Pretty unique but short)
b.	Similarity: 4/5 (Sounds a lot like Reyn as he often calls other characters “kid”)
4.	A Mechon?! But there were three of you guys a minute ago.
a.	Quality: 5/5 (Unique and good length)
b.	Similarity: 5/5 (Sounds like Reyn and references Mechon)
5.	It's a floating city!
a.	Quality: 3/5 (Unique but short)
b.	Similarity: 5/5 (Sounds like Reyn and references Alcamoth, the floating city)
6.	That black bugger was right!
a.	Quality: 3/5 (Unique but short)
b.	Similarity: 4/5 (Sounds a lot like Reyn as he often says “bugger”)
7.	Shulk, Melia needs our help.
a.	Quality: 5/5 (Very unique)
b.	Similarity: 5/5 (Sounds like Reyn and references Shulk and Melia)

Prompt: “Fiora would say: ”
1.	Is that how you see me?
a.	Quality: 3/5 (Kinda generic and short)
b.	Similarity: 3/5 (Sounds like Fiora but does not reference the game)
2.	Dunban.
a.	Quality: 2/5 (Unique but very short)
b.	Similarity: 3/5 (References Dunban, Fiora’s brother)
3.	But don't worry. I'll be with you the whole way.
a.	Quality: 4/5 (Pretty unique and good length)
b.	Similarity: 4/5 (Sounds a lot like Fiora)
4.	There's not enough ether in the world to save me now!
a.	Quality: 5/5 (Unique and good length)
b.	Similarity: 5/5 (Sounds a lot like Fiora and references ether)
5.	A machine that controls bronze Paper of Life. Is that what you called it?
a.	Quality: 5/5 (Unique and long)
b.	Similarity: 4/5 (Sounds a lot like Fiora)
6.	I know, but I don't want to lose you too. I don't want to ever see you in pain again. So just hold on for a little longer.
a.	Quality: 5/5 (Unique and long)
b.	Similarity: 5/5 (Sounds just like Fiora)
7.	That voice. It's Zanza! But I haven't finished yet. I will use every last scrap of my life to protect all of you! What is that sound?! It's coming from outside! What's happening?! It's dark.
a.	Quality: 5/5 (Very unique and long)
b.	Similarity: 5/5 (Sounds just like Fiora and references Zanza)

Prompt: “Dunban would say: ”
1.	My head!
a.	Quality: 1/5 (Generic and short)
b.	Similarity: 2/5 (Sounds vaguely like Dunban)
2.	Reyn, Melia. Shulk the High Entia must be a Homs too, then.
a.	Quality: 5/5 (Very unique and good length)
b.	Similarity: 5/5 (Sounds just like Dunban and references Reyn, Melia, Shulk, and the High Entia)
3.	It's just an incident. No big deal.
a.	Quality: 3/5 (Kinda unique and kinda short)
b.	Similarity: 3/5 (Sounds like Dunban)
4.	And if he dies, what difference does that make to you? You wont lose anything. What reason do you have to risk your life? Where is the true Emperor?! Show yourself! You have the Nopon blood in you? Show yourself,
a.	Quality: 5/5 (Very unique and long)
b.	Similarity: 5/5 (Sounds a lot like Dunban and references the Emperor and Nopon)
5.	Thanks to you.
a.	Quality: 2/5 (Generic and short)
b.	Similarity: 2/5 (Sounds vaguely like Dunban)
6.	My sister is crying out for help. Please, Heropon! You must save her! A mature man would know the difference between affection and obsession.
a.	Quality: 5/5 (Very unique and long)
b.	Similarity: 5/5 (Sounds just like Dunban and references his sister, Fiora, and the Heropon, Riki)
7.	Right! Off we go!
a.	Quality: 2/5 (Generic and short)
b.	Similarity: 2/5 (Sounds vaguely like Dunban)

Prompt: “Sharla would say: ”
1.	What do we do?!
a.	Quality: 2/5 (Generic and short)
b.	Similarity: 2/5 (Sounds vaguely like Sharla)
2.	Um, Reyn?
a.	Quality: 3/5 (Unique but short)
b.	Similarity: 4/5 (Sounds like Sharla and references Reyn)
3.	She does look worn out. It's been a hard day for her.
a.	Quality: 5/5 (Unique and good length)
b.	Similarity: 4/5 (Sounds a lot like Sharla)
4.	Have you gone deaf?! Shulk?! A Telethia?! We have to get out of here! Right now! The entrance to the tomb... It's gonna crumble! The back of the tomb's gonna... No! Something's happening! The force
a.	Quality: 5/5 (Very unique and long)
b.	Similarity: 5/5 (Sounds a lot like Sharla and references Shulk and Telethia)
5.	He's right! Juju would never have gone along with this without some good reason!
a.	Quality: 5/5 (Unique and long)
b.	Similarity: 5/5 (Sounds a lot like Sharla and references Juju, Sharla’s brother)
6.	Mr. Zanza!
a.	Quality: 2/5 (Unique but short)
b.	Similarity: 3/5 (References Zanza)
7.	It wasn't my fault! Gadolt saved me. Then the Mechon attacked. And Gadolt... He left me with no choice.
a.	Quality: 5/5 (Unique and long)
b.	Similarity: 5/5 (Sounds just like Sharla and references Mechon and Gadolt)

Prompt: “Riki would say: ”
1.	Oki! Just like Heropon Riki! Riki happy if listen to Melly's voice!
a.	Quality: 5/5 (Very unique and optimal length)
b.	Similarity: 5/5 (Sounds just like Riki and references him being Heropon and Melia)
2.	Riki say yes!
a.	Quality: 2/5 (Unique but short)
b.	Similarity: 4/5 (Sounds just like Riki)
3.	Emore!
a.	Quality: 1/5 (Short and weird)
b.	Similarity: 1/5 (Too short to tell)
4.	Village empty! Soldiers leave signals!
a.	Quality: 5/5 (Very unique and good length)
b.	Similarity: 5/5 (Sounds just like Riki and references Frontier Village)
5.	They vill see soon enough!
a.	Quality: 4/5 (Very unique but short)
b.	Similarity: 4/5 (Sounds a lot like Riki)
6.	Shulk! Friends come! Everyone come see Riki's fire power! Friends see Riki's fire power! Everyone see Riki's fire power! Shulk! Friends come! Everyone come see Riki's fire power! Riki very happy
a.	Quality: 5/5 (Very unique and long)
b.	Similarity: 5/5 (Sounds just like Riki as he references himself in third person and says “friends” a lot; also references Shulk)
7.	Riki do!
a.	Quality: 3/5 (Unique but short)
b.	Similarity: 4/5 (Sounds a lot like Riki)

Prompt: “Melia would say: ”
1.	Fiora, is that?
a.	Quality: 3/5 (Unique but short)
b.	Similarity: 4/5 (Sounds like Melia and references Fiora)
2.	Are you sure?
a.	Quality: 1/5 (Very generic and short)
b.	Similarity: 2/5 (Sounds vaguely like Melia)
3.	Do not mock me. My homeland is Eryth Sea. That is, High Entia and Homs territory. High Entia are cruel by nature. They are nothing like us!
a.	Quality: 5/5 (Very unique and long)
b.	Similarity: 5/5 (Sounds just like Melia as it sounds posh and references Eryth Sea, High Entia, and Homs)
4.	Is that my birth-father?!
a.	Quality: 4/5 (Very unique but short)
b.	Similarity: 4/5 (Sounds a lot like Melia and references her father)
5.	Speak, old man.
a.	Quality: 4/5 (Very unique but short)
b.	Similarity: 3/5 (Sounds a lot like Melia as she often says “speak” in an authoritative way)
6.	Dalexia!
a.	Quality: 1/5 (Short and weird)
b.	Similarity: 1/5 (Too short to tell)
7.	Brother!
a.	Quality: 3/5 (Very unique but very short)
b.	Similarity: 4/5 (Sounds a lot like Melia and references her brother)
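
The per-quote ratings above can be averaged per character to make the comparison easier to read. The following is a minimal sketch that assumes the analysis text is available as a string in the same layout used in this notebook (a `Prompt:` line followed by `Quality: n/5` and `Similarity: n/5` lines). The function name `summarize_scores` is my own, and the sample only contains two of the Shulk entries from above.

```python
import re
from collections import defaultdict
from statistics import mean

# Matches lines such as: Prompt: “Shulk would say: ”
PROMPT_RE = re.compile(r"Prompt: [“\"](\w+) would say:")
# Matches lines such as: a.  Quality: 4/5 (...)
SCORE_RE = re.compile(r"(Quality|Similarity):\s*(\d)/5")


def summarize_scores(analysis_text):
    """Return {character: {"Quality": mean, "Similarity": mean}}."""
    scores = defaultdict(lambda: defaultdict(list))
    character = None
    for line in analysis_text.splitlines():
        prompt_match = PROMPT_RE.search(line)
        if prompt_match:
            character = prompt_match.group(1)
            continue
        score_match = SCORE_RE.search(line)
        if score_match and character is not None:
            metric, value = score_match.groups()
            scores[character][metric].append(int(value))
    return {
        name: {metric: mean(values) for metric, values in metrics.items()}
        for name, metrics in scores.items()
    }


# Two of the Shulk entries from the analysis above, used as sample input.
sample = (
    "Prompt: “Shulk would say: ”\n"
    "1.\tWhat did I just...?\n"
    "a.\tQuality: 2/5 (Pretty generic and short)\n"
    "b.\tSimilarity: 3/5 (Sounds like Shulk but does not reference the game)\n"
    "3.\tI think so. It seems to me that they are leading us somewhere.\n"
    "a.\tQuality: 4/5 (Pretty unique and optimal length)\n"
    "b.\tSimilarity: 4/5 (Sounds a lot like Shulk but does not reference the game)\n"
)
summary = summarize_scores(sample)
print(summary)
```

Passing the full analysis text instead of `sample` would give one averaged pair of scores for each of the seven characters.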


**Section 4: Creating a fine-tuned Babbage model that classifies quotes based on which character would be most likely to say it**
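
The training data for this section is the generation data with prompts and completions swapped. The exact layout of `inverseLines.json` is not shown in this notebook, so the sketch below is an assumption: it takes a generation pair whose prompt looks like `"Riki would say:"` and produces a classification pair whose prompt ends with the `" ->"` separator used later in this section. The function name `invert_pair` and its parameters are hypothetical.

```python
def invert_pair(pair, speaker_suffix=" would say:", separator=" ->"):
    """Turn a generation pair into a classification pair (assumed format)."""
    character = pair["prompt"].strip()
    if character.endswith(speaker_suffix):
        # Keep only the character name, e.g. "Riki would say:" -> "Riki".
        character = character[: -len(speaker_suffix)]
    quote = pair["completion"].strip()
    # The quote becomes the prompt; the character name becomes the label.
    # The leading space on the completion is an assumed convention.
    return {"prompt": quote + separator, "completion": " " + character}


generation_pair = {"prompt": "Riki would say:", "completion": " Riki say yes!"}
inverted = invert_pair(generation_pair)
print(inverted)
```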

In [None]:
#This command prepares the training data for GPT-3 fine-tuning by deleting duplicate rows, adding/removing prefixes and suffixes that improve performance of the model, splitting the data into a training and validation set, and converting the file to .jsonl format.
#Classification models allow more options for fine-tuning as seen with the addition of a validation set.
!openai tools fine_tunes.prepare_data -f inverseLines.json

Analyzing...

- Your file contains 2651 prompt-completion pairs
- Based on your data it seems like you're trying to fine-tune a model for classification
- For classification, we recommend you try one of the faster and cheaper models, such as `ada`
- For classification, you can estimate the expected model performance by keeping a held out dataset, which is not used for training
- There are 112 duplicated prompt-completion sets. These are rows: [48, 214, 245, 267, 268, 269, 270, 272, 284, 292, 309, 339, 351, 352, 353, 372, 373, 377, 382, 387, 447, 456, 492, 516, 517, 523, 527, 533, 540, 576, 591, 661, 668, 768, 770, 798, 813, 901, 947, 1016, 1046, 1224, 1291, 1315, 1316, 1323, 1328, 1329, 1332, 1368, 1372, 1400, 1419, 1481, 1485, 1487, 1495, 1504, 1506, 1532, 1545, 1555, 1565, 1570, 1572, 1588, 1621, 1641, 1715, 1718, 1725, 1746, 1830, 1841, 1876, 1903, 1959, 1962, 1985, 2029, 2059, 2091, 2095, 2126, 2134, 2137, 2154, 2174, 2188, 2193, 2195, 2196, 2204, 2236, 2238, 2245, 2310, 2341, 2351
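
To make the two main preparation steps concrete, here is a rough sketch of removing duplicate prompt-completion pairs and holding out a validation set. It is not the algorithm used by the `prepare_data` tool. The 20% validation fraction, the fixed seed, and the function names are my own assumptions, and the sample pairs are illustrative lines borrowed from the Curie outputs above rather than rows from the real training file.

```python
import json
import random


def dedupe_and_split(pairs, validation_fraction=0.2, seed=0):
    """Drop exact duplicate pairs, then split into (train, validation)."""
    seen = set()
    unique = []
    for pair in pairs:
        key = (pair["prompt"], pair["completion"])
        if key not in seen:
            seen.add(key)
            unique.append(pair)
    # Shuffle a copy deterministically so the split is reproducible.
    rng = random.Random(seed)
    rng.shuffle(unique)
    n_validation = int(len(unique) * validation_fraction)
    return unique[n_validation:], unique[:n_validation]


def to_jsonl(pairs):
    """Serialize pairs as JSON Lines: one JSON object per line."""
    return "\n".join(json.dumps(pair) for pair in pairs)


sample_pairs = [
    {"prompt": "Riki say yes! ->", "completion": " Riki"},
    {"prompt": "Riki say yes! ->", "completion": " Riki"},  # exact duplicate
    {"prompt": "Um, Reyn? ->", "completion": " Sharla"},
    {"prompt": "You all right, kid? ->", "completion": " Reyn"},
    {"prompt": "Are you sure? ->", "completion": " Melia"},
    {"prompt": "Dunban. ->", "completion": " Fiora"},
]
train, validation = dedupe_and_split(sample_pairs)
print(len(train), len(validation))
print(to_jsonl(validation))
```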

In [None]:
import openai
import os
os.environ["OPENAI_API_KEY"] = "<YOUR_OPENAI_API_KEY>"         #Sets the OpenAI API key as an environment variable so that the API requests are authenticated. Replace the placeholder with your own key and never commit a real key.
#Creates a fine-tuned GPT-3 Babbage classification model using "classificationTraining.jsonl" as the training data and "classificationValidation.jsonl" as the validation data.
#The two flags at the end of the line compute accuracy and weighted F1-score evaluation metrics on the validation set for the 7 character classes.
!openai api fine_tunes.create -t classificationTraining.jsonl -v classificationValidation.jsonl -m babbage --suffix "Voice Line Classification" --compute_classification_metrics --classification_n_classes 7

Upload progress:   0% 0.00/203k [00:00<?, ?it/s]
Upload progress: 100% 203k/203k [00:00<00:00, 252Mit/s]
Uploaded file from classificationTraining.jsonl: file-TutHDcpIdw8XHZ9P2l32s6pS
Upload progress: 100% 52.6k/52.6k [00:00<00:00, 103Mit/s]
Uploaded file from classificationValidation.jsonl: file-emP1S9PcSe3Po9RFynynebf0
Created fine-tune: ft-Kzwb684gcKj0dZaxQnp0wISt
Streaming events until fine-tuning is complete...

(Ctrl-C will interrupt the stream, but not cancel the fine-tune)
[2023-04-16 04:52:14] Created fine-tune: ft-Kzwb684gcKj0dZaxQnp0wISt
[2023-04-16 04:52:26] Fine-tune costs $0.09
[2023-04-16 04:52:26] Fine-tune enqueued. Queue number: 0
[2023-04-16 04:52:30] Fine-tune started



In [None]:
!openai api fine_tunes.follow -i ft-Kzwb684gcKj0dZaxQnp0wISt        #Shows the progress of the fine-tuning job created in the last cell.

[2023-04-16 04:52:14] Created fine-tune: ft-Kzwb684gcKj0dZaxQnp0wISt
[2023-04-16 04:52:26] Fine-tune costs $0.09
[2023-04-16 04:52:26] Fine-tune enqueued. Queue number: 0
[2023-04-16 04:52:30] Fine-tune started
[2023-04-16 04:57:23] Completed epoch 1/4
[2023-04-16 05:01:04] Completed epoch 2/4
[2023-04-16 05:04:43] Completed epoch 3/4
[2023-04-16 05:08:22] Completed epoch 4/4
[2023-04-16 05:09:02] Uploaded model: babbage:ft-personal:voice-line-classification-2023-04-16-05-09-02
[2023-04-16 05:09:04] Uploaded result file: file-4UGMPDm3KN99j4BChzhekOtN
[2023-04-16 05:09:04] Fine-tune succeeded

Job complete! Status: succeeded 🎉
Try out your fine-tuned model:

openai api completions.create -m babbage:ft-personal:voice-line-classification-2023-04-16-05-09-02 -p <YOUR_PROMPT>


In [None]:
import openai
import os
os.environ["OPENAI_API_KEY"] = "<YOUR_OPENAI_API_KEY>"         #Replace the placeholder with your own key and never commit a real key.
#Shows the results file for the classification model. This file includes the accuracy score and weighted F1 score.
#However, the results file reports its metrics per training step (one row per step) rather than as a single overall summary for the model, so this file did not prove very useful.
!openai api fine_tunes.results -i ft-Kzwb684gcKj0dZaxQnp0wISt

step,elapsed_tokens,elapsed_examples,training_loss,training_sequence_accuracy,training_token_accuracy,validation_loss,validation_sequence_accuracy,validation_token_accuracy,classification/accuracy,classification/weighted_f1_score
1,68,4,0.8187460209010168,0.0,0.0,0.16824201847764197,0.0,0.25,,
2,136,8,0.8188135071354918,0.0,0.0,,,,,
3,236,12,0.4765089602714094,0.0,0.14285714285714285,,,,,
4,336,16,0.3654776818441072,0.0,0.16666666666666666,,,,,
5,468,20,0.23077282275032301,0.0,0.375,,,,,
6,600,24,0.20328934001458948,0.25,0.5,,,,,
7,732,28,0.17986376824235778,0.0,0.42857142857142855,,,,,
8,1120,32,0.06826222266492551,0.0,0.2857142857142857,,,,,
9,1220,36,0.21895465038536108,0.0,0.2857142857142857,0.22574133440974295,0.25,0.5,,
10,1288,40,0.2594690138628573,0.5,0.7142857142857143,,,,,
11,1356,44,0.17873079873833092,0.5,0.7142857142857143,,,,,
12,1424,48,0.0989909355167805,0.75,0.8,,,,,
13,1492,52,0.21956196304449463,0.25,0.42857142857142855,,,,,
14,1688,56,0.13671910986664024,0.0,0.42857
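
One way to get more out of the results file is to read it as CSV and keep only the last row in which the classification columns are filled in. This is a sketch under the assumption that the file has the header shown above and that `classification/accuracy` and `classification/weighted_f1_score` are populated on at least some rows. The output shown above is cut off before any such row appears, so I cannot confirm that here. The function name and the `results.csv` filename are hypothetical.

```python
import csv
import io


def final_classification_metrics(results_csv):
    """Return the last row that has both classification metrics, or None."""
    last = None
    for row in csv.DictReader(io.StringIO(results_csv)):
        accuracy = row.get("classification/accuracy")
        weighted_f1 = row.get("classification/weighted_f1_score")
        # Most rows leave these columns empty; skip them.
        if accuracy and weighted_f1:
            last = {
                "step": int(row["step"]),
                "accuracy": float(accuracy),
                "weighted_f1_score": float(weighted_f1),
            }
    return last


# Example usage, assuming the results were first saved to a file
# (hypothetical filename):
#   with open("results.csv", newline="") as handle:
#       print(final_classification_metrics(handle.read()))
```

If the function returns `None`, the file contains no rows with classification metrics and the model would need to be validated another way.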

In [None]:
import openai
openai.api_key = "<YOUR_OPENAI_API_KEY>"      #Set up API credentials. Replace the placeholder with your own key and never commit a real key.
response = openai.Completion.create(
    model="babbage:ft-personal:voice-line-classification-2023-04-16-05-09-02",
    prompt="What's wrong, Gadolt?! Why would you say that?! Speak to me, Gadolt! Gadolt! Stop! Gadolt.. ->",        #The prompt should consist of a quote that sounds like one of the characters followed by " ->".
    temperature=1,                                                    #A higher temperature value like 1 allows the model to select words with lower probabilities, which leads to more variation in its output.
    max_tokens=2,)                                                    #This reduces the maximum length of responses to 2 tokens, which is always enough to output one of the characters' names.
generated_text = response.choices[0].text.strip()

#Print the classification output of the model.
print(generated_text)


Sharla
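
Since the results file was not very helpful, another way to validate the classifier is to run it over labelled quotes and count how often it returns the right character. The sketch below keeps the API call out of the picture: `evaluate` accepts any function that maps a prompt to raw completion text, and `keyword_stub` is a stand-in so the example runs offline. It is not the fine-tuned model, and the three example quotes are illustrative lines taken from the Curie outputs above, not a real validation set.

```python
from collections import Counter

CHARACTERS = ("Shulk", "Reyn", "Fiora", "Dunban", "Sharla", "Riki", "Melia")


def normalize_label(raw_completion):
    """Map raw model text to a character name, or None if nothing matches."""
    text = raw_completion.strip()
    for name in CHARACTERS:
        if text.startswith(name):
            return name
    return None


def evaluate(examples, classify, separator=" ->"):
    """Return (accuracy, Counter of (expected, predicted) mistakes)."""
    if not examples:
        raise ValueError("examples must not be empty")
    correct = 0
    confusions = Counter()
    for quote, expected in examples:
        predicted = normalize_label(classify(quote + separator))
        if predicted == expected:
            correct += 1
        else:
            confusions[(expected, predicted)] += 1
    return correct / len(examples), confusions


def keyword_stub(prompt):
    # Stand-in for the API call so the sketch runs offline (hypothetical).
    return " Riki" if "Riki" in prompt else " Reyn"


examples = [
    ("Riki say yes!", "Riki"),
    ("You all right, kid?", "Reyn"),
    ("Um, Reyn?", "Sharla"),
]
score, confusions = evaluate(examples, keyword_stub)
print(f"accuracy: {score:.2f}")
print(confusions)
```

To score the real model, `keyword_stub` could be replaced by a small function that wraps the `openai.Completion.create` call from the cell above and returns `response.choices[0].text`.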


**Section 5: My initial testing using the text-davinci-002 model**

In [None]:
import openai
import os

# set up API credentials
openai.api_key = os.environ["OPENAI_API_KEY"]  # read the key from the environment instead of hard-coding it

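# define example lines that Shulk might say
# (passing a list sends each string as a separate prompt; only the first completion is read below)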
prompt = [
    "I'm really feeling it!",
    "This is the Monado's power!",
    "We can definitely do this!",
    "Back Slash!",
    "Now it's Shulk time!"
]

# set up the parameters for the GPT-3 model
model = "text-davinci-002"
temperature = 0.7
max_tokens = 100

# use the GPT-3 model to generate text
response = openai.Completion.create(
    model=model,
    prompt=prompt,
    temperature=temperature,
    max_tokens=max_tokens
)

# extract the generated text
generated_text = response.choices[0].text.strip()

# print the generated text
print(generated_text)



I'm really enjoying this!
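
Because a list of strings is treated as several independent prompts, the model above only ever saw "I'm really feeling it!" when producing the first completion. An alternative I did not test is to join the example lines into a single few-shot prompt, so that every example is visible to the model at once. The sketch below only builds the prompt string. The header wording and the function name `build_few_shot_prompt` are my own assumptions.

```python
def build_few_shot_prompt(example_lines, character="Shulk"):
    """Join example lines into one prompt that ends with an open line."""
    header = f"The following are lines spoken by {character} in Xenoblade Chronicles.\n"
    body = "\n".join(f"{character}: {line}" for line in example_lines)
    # The trailing "<character>:" invites the model to write the next line.
    return f"{header}{body}\n{character}:"


example_lines = [
    "I'm really feeling it!",
    "This is the Monado's power!",
    "We can definitely do this!",
    "Back Slash!",
    "Now it's Shulk time!",
]
few_shot_prompt = build_few_shot_prompt(example_lines)
print(few_shot_prompt)
```

The resulting string could be passed as the `prompt` argument in place of the list, ideally with a newline stop sequence so the model writes only one line.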
