# Prototype for GPT Model
This prototype was tested with the AP Euro textbook, which is in the AP Euro folder. The notebook runs as-is when moved into the AP Euro folder; it is kept in the model directory to serve as a template for training future models on other textbooks.

Saved: <br>
https://platform.openai.com/docs/guides/fine-tuning (also has JS stuff)

On JS: <br>
https://platform.openai.com/docs/api-reference/fine-tunes

In [2]:
import openai
import pandas as pd
import numpy as np
import pickle

with open("../api_key.txt", "r") as f:
    # Strip the trailing newline so the key is sent exactly as issued
    openai.api_key = f.read().strip()

COMPLETIONS_MODEL = "text-davinci-003"

### Making textbook into finetuning data

##### Preprocessing text

In [1]:
with open("Euro.txt", "r", encoding="utf-8", errors="ignore") as f:
    text = f.read()

In [9]:
import re

input_file = 'Euro.txt'
output_file = 'Euro1.txt'

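# Use UTF-8 explicitly, matching the other cells that read these files
with open(input_file, 'r', encoding="utf-8", errors="ignore") as file_in, open(output_file, 'w', encoding="utf-8") as file_out:
    content = file_in.read()
    modified_content = re.sub(r'\n{2,}', '\n\n', content)  # Collapse runs of blank lines into one paragraph break
    # Replace every newline not followed by another newline with a space:
    # wrapped lines are joined, and each paragraph break becomes "\n " (one newline plus a space)
    modified_content = re.sub(r'\n(?!\n)', ' ', modified_content)
    file_out.write(modified_content)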
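
To make the effect of the two substitutions concrete, here is a minimal sketch that applies the same regular expressions to a short made-up string. The function name `reflow_paragraphs` and the sample text are illustrative assumptions, not part of the original notebook.

```python
import re

def reflow_paragraphs(content: str) -> str:
    """Apply the same two substitutions as the preprocessing cell."""
    # Collapse runs of two or more newlines into exactly two.
    content = re.sub(r"\n{2,}", "\n\n", content)
    # Replace every newline that is not followed by another newline with a space.
    return re.sub(r"\n(?!\n)", " ", content)

# Made-up sample: two paragraphs, each hard-wrapped over two lines.
sample = "First line\nwrapped line\n\n\nSecond paragraph\nwrapped again"
result = reflow_paragraphs(sample)
print(repr(result))
# → 'First line wrapped line\n Second paragraph wrapped again'
```

Each paragraph ends up on its own line, which is what lets a later `split("\n")` produce one row per paragraph. The second and later paragraphs keep a leading space.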

##### Putting new text file into valid finetuning dataframe

In [10]:
with open("Euro1.txt", "r", encoding="utf-8", errors="ignore") as f:
    text = f.read()

df = pd.DataFrame(text.split("\n"), columns=["completion"])
df["prompt"] = ""

df = df.reindex(columns=["prompt", "completion"])

df

Unnamed: 0,prompt,completion
0,,This page intentionally left blank
1,,A History of Western Society A History of W...
2,,"Since 1300 cover image: Juan de Pareja, 1650,..."
3,,*“AP and “Advanced Placement Program are regi...
4,,Library of Congress Control Number: 200792773...
...,...,...
11337,,"Reduced spending on Big Science, 1980s Compu..."
11338,,"U.S. Genome Project begins, 1990 First WWW s..."
11339,,"Solzhenitsyn returns to Russia, 1994 Author ..."
11340,,"Calatrava, Tenerife Concert Hall, 2003"


##### Removing invalid characters from dataframe

In [12]:
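# Keep only letters, digits, newlines, commas, periods, and spaces; everything else is removed
pattern = r"[^A-Za-z0-9\n,. ]"

# applymap applies the substitution to every cell of the dataframe
df_cleaned = df.applymap(lambda x: re.sub(pattern, "", str(x)))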

df = df_cleaned
df

Unnamed: 0,prompt,completion
0,,This page intentionally left blank
1,,A History of Western Society A History of We...
2,,"Since 1300 cover image Juan de Pareja, 1650, ..."
3,,AP and Advanced Placement Program are registe...
4,,Library of Congress Control Number 2007927730...
...,...,...
11337,,"Reduced spending on Big Science, 1980s Compu..."
11338,,"U.S. Genome Project begins, 1990 First WWW s..."
11339,,"Solzhenitsyn returns to Russia, 1994 Author ..."
11340,,"Calatrava, Tenerife Concert Hall, 2003"
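
The character filter is fairly aggressive, which is worth seeing on a single string before applying it to a whole textbook. The sketch below uses the same pattern; the helper name `strip_disallowed` and the sample sentence are made up for illustration.

```python
import re

# Same pattern as in the cell above: anything that is not a letter, digit,
# newline, comma, period, or space is removed.
PATTERN = r"[^A-Za-z0-9\n,. ]"

def strip_disallowed(text: str) -> str:
    """Remove every character that the pattern does not allow."""
    return re.sub(PATTERN, "", text)

cleaned = strip_disallowed("Hitler's middle-class SA (Brownshirts): 1932")
print(cleaned)
# → Hitlers middleclass SA Brownshirts 1932
```

Apostrophes, hyphens, colons, and parentheses all disappear, which is why the fine-tuned model later produces forms such as "Hitlers" and "middleclass". If that is undesirable, adding characters like `'` and `-` to the allowed set would be one option.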


In [13]:
df.to_csv("Euro.csv", index=False)

### Finetuning model

To create a fine-tuning JSONL file from the CSV saved above, run the following command in this directory:
```
openai tools fine_tunes.prepare_data -f "Euro.csv"
```
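
For orientation, the prepared file is in JSON Lines format: one JSON object per line with a `prompt` and a `completion` key. The sketch below shows only that basic CSV-to-JSONL conversion using the standard library. It is not a replacement for the `prepare_data` tool, which also analyses the data and may suggest changes; the function name `csv_to_jsonl` and the demo file names are assumptions made for illustration.

```python
import csv
import json
import os
import tempfile

def csv_to_jsonl(csv_path: str, jsonl_path: str) -> int:
    """Write one {"prompt": ..., "completion": ...} object per CSV row; return the row count."""
    count = 0
    with open(csv_path, newline="", encoding="utf-8") as src, \
         open(jsonl_path, "w", encoding="utf-8") as dst:
        for row in csv.DictReader(src):
            record = {"prompt": row["prompt"], "completion": row["completion"]}
            dst.write(json.dumps(record) + "\n")
            count += 1
    return count

# Tiny made-up CSV standing in for Euro.csv (empty prompts, text in completion).
workdir = tempfile.mkdtemp()
demo_csv = os.path.join(workdir, "demo.csv")
demo_jsonl = os.path.join(workdir, "demo.jsonl")
with open(demo_csv, "w", newline="", encoding="utf-8") as f:
    writer = csv.writer(f)
    writer.writerow(["prompt", "completion"])
    writer.writerow(["", "First paragraph."])
    writer.writerow(["", "Second paragraph, with a comma."])

written = csv_to_jsonl(demo_csv, demo_jsonl)
with open(demo_jsonl, encoding="utf-8") as f:
    records = [json.loads(line) for line in f]
print(written, records[0])
```
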
Then run this command to fine-tune an OpenAI model on the prepared data:
```
openai api fine_tunes.create -t "Euro_prepared.jsonl" -m "curie"
```
Run this if the stream is interrupted:<br>
```
openai api fine_tunes.follow -i <YOUR_FINE_TUNE_JOB_ID>
```

### Testing model

In [17]:
def get_model(prompt):
    response = openai.Completion.create(
        model="curie:ft-personal-2023-05-19-18-07-57",
        prompt=prompt,
        max_tokens=1000,
        temperature=0.7,
        top_p=1,
        frequency_penalty=0,
        presence_penalty=0.3,
        stop=["\n"]
    )
    return response.choices[0].text

get_model("What caused the rise of fascism in WW2?")


'  What was Hitlers answer to the crisis of modern society  In Hitlers Germany, a number of scholars have argued  that the general crisis of western European society in the  1920s and 1930s played a major role in bringing fascism  to power in Germany. After the Great War, these crises  included unemployment, inflation, and the breakdown  of political parties and traditional political systems.  Unemployment in Germany reached a high of 8 million  in 1932, more than 10 percent of the labor force. In the  1920s, many middleclass people and intellectuals also  lost much of their real income because they were unable to  keep up with the price increases of everyday items.  Hitlers answer to this general crisis of society was to attack  it from above and from below. He sought to create a new  elite of wellborn, ambitious Nazis who would commandeer  all the privileges and opportunities afforded by the  modern state. Hitler recruited poor and lowermiddle class people for his SA Brownshirts, and
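
The completion above contains leading spaces and doubled spaces, likely carried over from the textbook's line wrapping. A minimal post-processing sketch, assuming it is acceptable to collapse all whitespace, could look like this; the helper name `tidy_completion` is hypothetical, and the sample is the beginning of the output shown above.

```python
import re

def tidy_completion(text: str) -> str:
    """Collapse runs of whitespace into single spaces and trim both ends."""
    return re.sub(r"\s+", " ", text).strip()

raw = "  What was Hitlers answer to the crisis of modern society  In Hitlers Germany, a number of scholars have argued  that"
tidied = tidy_completion(raw)
print(tidied)
```

It could be applied to the return value of `get_model` before displaying the answer.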