# Prototype for GPT Model
This prototype was tested with the AP Euro textbook, which is in the AP Euro folder, and it works when the notebook is placed in that folder. It has been moved to the model directory to serve as a template for training future models on other textbooks.

Saved references: <br>
https://platform.openai.com/docs/guides/fine-tuning (also includes JavaScript examples)

For JavaScript: <br>
https://platform.openai.com/docs/api-reference/fine-tunes

In [20]:
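import openai  # legacy (pre-1.0) openai package: provides openai.Completion and the fine_tunes CLI
import pandas as pd
import numpy as np
import pickle

# Read the API key from a file one directory up; strip() drops a trailing newline
with open("../api_key.txt", "r") as f:
    openai.api_key = f.read().strip()

COMPLETIONS_MODEL = "text-davinci-003"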
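Reading the key with a bare `f.read()` keeps any trailing newline the editor saved, so it is worth stripping whitespace before handing the key to the client. A minimal sketch of a more defensive loader, assuming you may also want to fall back to an `OPENAI_API_KEY` environment variable; `load_api_key` is a hypothetical helper, not part of the `openai` package:

```python
import os
from pathlib import Path


def load_api_key(path: str = "../api_key.txt", env_var: str = "OPENAI_API_KEY") -> str:
    """Return the API key from `path`, falling back to the `env_var` environment variable.

    Surrounding whitespace (e.g. a trailing newline) is stripped, since it would
    otherwise become part of the key sent to the API.
    """
    key_file = Path(path)
    if key_file.is_file():
        key = key_file.read_text(encoding="utf-8").strip()
        if key:
            return key
    # Hypothetical fallback: read the key from the environment instead
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(f"No API key found in {path} or ${env_var}")
    return key
```

With this helper, the setup cell would assign `openai.api_key = load_api_key()`.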

### Turning the textbook into fine-tuning data

##### Preprocessing text

In [2]:
with open("APUSH.txt", "r", encoding="utf-8", errors="ignore") as f:
    text = f.read()

In [9]:
import re

input_file = 'APUSH.txt'
output_file = 'APUSH1.txt'

# Normalize paragraph spacing; read and write as UTF-8 to match the cell that loads APUSH1.txt
with open(input_file, 'r', encoding="utf-8", errors="ignore") as file_in, \
        open(output_file, 'w', encoding="utf-8") as file_out:
    content = file_in.read()
    modified_content = re.sub(r'\n{2,}', '\n\n', content)  # Replace 2+ consecutive newlines with exactly two
    # modified_content = re.sub(r'(?<!\n)\n(?!\n)', ' ', modified_content)  # Replace a single new line with a space
    file_out.write(modified_content)
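To see what these substitutions do, here is a small stand-alone check on a made-up string (the function names are illustrative only). It also shows a subtlety with a lookahead-only pattern such as `\n(?!\n)`: it matches the second newline of every `\n\n` pair too, turning paragraph breaks into `\n `. Adding a lookbehind, `(?<!\n)\n(?!\n)`, restricts the match to newlines with no newline on either side.

```python
import re


def collapse_blank_lines(text: str) -> str:
    """Replace every run of two or more newlines with exactly two (one blank line)."""
    return re.sub(r"\n{2,}", "\n\n", text)


def join_lines_lookahead_only(text: str) -> str:
    """Lookahead-only version: also matches the second newline of each '\\n\\n' pair."""
    return re.sub(r"\n(?!\n)", " ", text)


def join_single_newlines(text: str) -> str:
    """Replace only newlines that have no newline on either side with a space."""
    return re.sub(r"(?<!\n)\n(?!\n)", " ", text)


sample = "Chapter 1\n\n\n\nThe colonies grew\nquickly.\n\nChapter 2"
collapsed = collapse_blank_lines(sample)
print(repr(collapsed))                             # 'Chapter 1\n\nThe colonies grew\nquickly.\n\nChapter 2'
print(repr(join_lines_lookahead_only(collapsed)))  # 'Chapter 1\n The colonies grew quickly.\n Chapter 2'
print(repr(join_single_newlines(collapsed)))       # 'Chapter 1\n\nThe colonies grew quickly.\n\nChapter 2'
```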

##### Loading the cleaned text into a fine-tuning DataFrame

In [12]:
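with open("APUSH1.txt", "r", encoding="utf-8", errors="ignore") as f:
    text = f.read()

# One row per line of text: each line becomes a completion with an empty prompt.
# Blank lines (paragraph breaks) become rows with an empty completion.
df = pd.DataFrame(text.split("\n"), columns=["completion"])
df["prompt"] = ""

# Order the columns as prompt, completion
df = df.reindex(columns=["prompt", "completion"])

df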

|       | prompt | completion |
|-------|--------|------------|
| 0     |        | By the People |
| 1     |        | A History of the United States |
| 2     |        | AP ® Edition |
| 3     |        | Boston Columbus Indianapolis New York San Fran... |
| 4     |        | Amsterdam Cape Town Dubai London Madrid Milan ... |
| ...   | ...    | ... |
| 74159 |        | Zuni, 4 , 7 |
| 74160 |        | Zuni Pueblo, 43 |
| 74161 |        | Zutucapan, 48 |
| 74162 |        |  |
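Because the cleaned file separates paragraphs with a blank line, `text.split("\n")` yields an empty string for every blank line (and for a trailing newline), and each one becomes a row with an empty completion; the empty last row above is one of them. A minimal stdlib sketch of the same prompt/completion layout that skips blank lines; `text_to_records` is a hypothetical helper:

```python
from typing import Dict, List


def text_to_records(text: str, skip_blank: bool = True) -> List[Dict[str, str]]:
    """Turn each line of `text` into a {"prompt": "", "completion": line} record.

    Mirrors the notebook's `text.split("\\n")` with an empty prompt column, but
    optionally drops lines that are empty or whitespace-only.
    """
    lines = text.split("\n")
    if skip_blank:
        lines = [line for line in lines if line.strip()]
    return [{"prompt": "", "completion": line} for line in lines]


raw = "By the People\n\nA History of the United States\n"
print(len(text_to_records(raw, skip_blank=False)))  # 4: two text lines, one blank separator, one trailing ""
print(len(text_to_records(raw)))                    # 2
```

The records can be passed straight to pandas, e.g. `pd.DataFrame(text_to_records(text), columns=["prompt", "completion"])`.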


##### Removing special characters from the DataFrame

In [13]:
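# Keep only ASCII letters, digits, newlines, commas, periods, and spaces; strip everything else
pattern = r"[^A-Za-z0-9\n,. ]"

# Apply the substitution to every cell (on pandas >= 2.1, DataFrame.map replaces the deprecated applymap)
df_cleaned = df.applymap(lambda x: re.sub(pattern, "", str(x)))

df = df_cleaned
df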

|       | prompt | completion |
|-------|--------|------------|
| 0     |        | By the People |
| 1     |        | A History of the United States |
| 2     |        | AP Edition |
| 3     |        | Boston Columbus Indianapolis New York San Fran... |
| 4     |        | Amsterdam Cape Town Dubai London Madrid Milan ... |
| ...   | ...    | ... |
| 74159 |        | Zuni, 4 , 7 |
| 74160 |        | Zuni Pueblo, 43 |
| 74161 |        | Zutucapan, 48 |
| 74162 |        |  |
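The pattern keeps only ASCII letters, digits, commas, periods, spaces, and newlines, so it also strips apostrophes, question marks, hyphens, and dashes, and it can glue date ranges together. The sketch below contrasts it with a gentler variant; `gentle_clean`, `GENTLE_PATTERN`, and the character mapping are assumptions for illustration, not part of the original notebook. Either function could be applied cell-wise the same way the notebook applies its lambda.

```python
import re

# The notebook's pattern: drop everything except ASCII letters, digits,
# newlines, commas, periods, and spaces.
STRICT_PATTERN = r"[^A-Za-z0-9\n,. ]"

# Hypothetical gentler pattern: additionally keep ; : ' ? ! ( ) and -
GENTLE_PATTERN = r"[^A-Za-z0-9\n,.;:'?!()\- ]"

# Hypothetical mapping of common typographic characters to ASCII before filtering
TYPOGRAPHIC = {"\u2013": "-", "\u2014": "-", "\u2018": "'", "\u2019": "'"}


def strict_clean(s: str) -> str:
    return re.sub(STRICT_PATTERN, "", s)


def gentle_clean(s: str) -> str:
    for src, dst in TYPOGRAPHIC.items():
        s = s.replace(src, dst)
    s = re.sub(GENTLE_PATTERN, "", s)
    # Collapse double spaces left where a symbol (e.g. the ® in "AP ® Edition") was removed
    return re.sub(r" {2,}", " ", s).strip()


print(repr(strict_clean("AP \u00ae Edition")), repr(gentle_clean("AP \u00ae Edition")))        # 'AP  Edition' 'AP Edition'
print(repr(strict_clean("1861\u20131865")), repr(gentle_clean("1861\u20131865")))              # '18611865' '1861-1865'
print(repr(strict_clean("Lincoln\u2019s plan?")), repr(gentle_clean("Lincoln\u2019s plan?")))  # 'Lincolns plan' "Lincoln's plan?"
```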


In [14]:
df.to_csv("APUSH.csv", index=False)

In [22]:
df = pd.read_csv("APUSH.csv")
df

|       | Unnamed: 0 | To the Student |
|-------|------------|----------------|
| 0     |            | I hope you enjoy reading By the People and tha... |
| 1     |            | as a result of reading it. |
| 2     |            | Th e title of this book By the People describe... |
| 3     |            | shaped the United States as it is today. Whene... |
| 4     |            | who have created this country. In a survey of ... |
| ...   | ...        | ... |
| 73011 |            | Zuni, 4 , 7 |
| 73012 |            | Zuni Pueblo, 43 |
| 73013 |            | Zutucapan, 48 |
| 73014 |            |  |
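The reloaded frame does not have `prompt` and `completion` columns: pandas reports `Unnamed: 0` and `To the Student`, and there are fewer rows than were written, which suggests the `APUSH.csv` on disk differs from what the previous cell saved (for example, it may have been trimmed by hand). Since the fine-tuning tools expect `prompt`/`completion` columns, a quick pre-flight check can catch this before uploading. A minimal stdlib sketch; `check_finetune_csv` is a hypothetical helper:

```python
import csv
from typing import List


def check_finetune_csv(path: str) -> List[str]:
    """Return a list of problems found in a prompt/completion CSV (empty if none)."""
    problems: List[str] = []
    with open(path, newline="", encoding="utf-8") as f:
        reader = csv.DictReader(f)
        header = reader.fieldnames or []
        missing = {"prompt", "completion"} - set(header)
        if missing:
            problems.append(f"missing column(s) {sorted(missing)}; header is {header}")
            return problems
        for row_number, row in enumerate(reader, start=1):
            # Short rows leave missing fields as None, so treat None as empty
            if not (row["completion"] or "").strip():
                problems.append(f"data row {row_number}: empty completion")
    return problems
```

For the file shown above, `check_finetune_csv("APUSH.csv")` should report the missing columns rather than individual rows.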


### Fine-tuning the model

To convert the saved CSV into a JSONL fine-tuning file, run the following command from this directory:
```
openai tools fine_tunes.prepare_data -f "APUSH.csv"
```
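`prepare_data` writes JSONL: one JSON object per line with `prompt` and `completion` keys. The tool may also propose changes interactively (for example, removing duplicate rows or adding separators), which you accept or decline as prompted. For reference, a minimal sketch that writes the same basic shape directly with the standard library, assuming the records are already cleaned; `write_jsonl` is a hypothetical helper and applies none of the tool's suggestions:

```python
import json
from typing import Dict, Iterable


def write_jsonl(records: Iterable[Dict[str, str]], path: str) -> int:
    """Write one {"prompt": ..., "completion": ...} object per line; return the count."""
    count = 0
    with open(path, "w", encoding="utf-8") as f:
        for record in records:
            line = {"prompt": record["prompt"], "completion": record["completion"]}
            f.write(json.dumps(line, ensure_ascii=False) + "\n")
            count += 1
    return count
```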
Then run this command to fine-tune an OpenAI base model (`curie`) on the prepared data:
```
openai api fine_tunes.create -t "APUSH_prepared.jsonl" -m "curie"
```
If the event stream is interrupted, resume following the job with:<br>
```
openai api fine_tunes.follow -i <YOUR_FINE_TUNE_JOB_ID>
```

### Testing model

In [28]:
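def get_model(prompt):
    """Send `prompt` to the fine-tuned model and return the generated text."""
    response = openai.Completion.create(
        model="curie:ft-personal-2023-06-02-17-24-33",  # name of the fine-tuned model created above
        prompt=prompt,
        max_tokens=500,
        temperature=0.7,
        top_p=1,
        frequency_penalty=0,
        presence_penalty=0.3,
        stop=["\n"]  # generation ends at the first newline, which is not included in the output
    )
    return response.choices[0].text

get_model("What happened during the Civil War?")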


''
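The call returned an empty string. One plausible cause is the `stop=["\n"]` argument: generation halts at the first occurrence of a stop sequence and the stop text itself is not returned, so if the model's first token is a newline the completion is empty. That may be more likely here because every training example had an empty prompt and a single textbook line as its completion. The pure-Python sketch below only imitates that truncation rule on made-up outputs (`truncate_at_stop` is hypothetical and does not call the API); rerunning with a different stop value, or none, is one way to check whether this is the cause.

```python
from typing import Optional, Sequence


def truncate_at_stop(generated: str, stop: Optional[Sequence[str]] = None) -> str:
    """Cut `generated` at the earliest occurrence of any stop sequence (stop text excluded)."""
    if not stop:
        return generated
    cut = len(generated)
    for s in stop:
        index = generated.find(s)
        if index != -1:
            cut = min(cut, index)
    return generated[:cut]


print(repr(truncate_at_stop("\nThe Civil War began in 1861.", ["\n"])))            # ''
print(repr(truncate_at_stop(" The Civil War began in 1861.\nNext line", ["\n"])))  # ' The Civil War began in 1861.'
```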