# Grade False Presupposition Questions
- Identify the failure modes on the false presupposition questions.
- Did GPT-3.5 refute the false presupposition?
- Or did it riff off the misleading prompt?

-----
### Imports

In [360]:
## Widescreen the jupyter notebook (helpful for reading verbose printouts in Pandas)
from IPython.display import display, HTML
display(HTML("<style>.container { width:100% !important; }</style>"))

import re
import pandas as pd
pd.set_option('display.max_colwidth', 500)

### Define Functions

In [339]:
def refutes_presupposition(text, superset):
    """
    Simple regex-based check that the response:
    - says "is a {superset}" or "is an {superset}", and
    - does NOT say "is not a {superset}" or "is not an {superset}".

    If both criteria are met, we can say with high confidence (backed by manual review)
    that the model refuted the false presupposition question.
    """
    # Escape the label so any regex metacharacters in it are matched literally
    label = re.escape(superset)
    affirms = re.search(f"is (?:an|a) {label}", text)
    denies = re.search(f"is not (?:an|a) {label}", text)
    return bool(affirms and not denies)
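
As a quick sanity check, the rule can be exercised on a few strings before it is applied to the whole dataset. The sketch below restates the same two-pattern rule compactly so that it runs on its own. The first and last examples are taken from the responses in this notebook; the middle two are hypothetical sentences made up for illustration. The third case shows a limitation worth keeping in mind: any word between "is a" and the class label prevents a match.

```python
import re

def refutes_presupposition(text, superset):
    # Same rule as above: affirmative phrase present, negated phrase absent
    return bool(
        re.search(f"is (?:an|a) {superset}", text)
        and not re.search(f"is not (?:an|a) {superset}", text)
    )

# (response text, class label, expected grade)
checks = [
    ("Loggerheads is a 2005 independent drama film, so it is a film.", "film", True),
    ("Loggerheads is not a film.", "film", False),                    # hypothetical
    ("Loggerheads is a 2005 independent drama film.", "film", False), # hypothetical
    ("James Shaw is a human. He is a politician from New Zealand.", "human", True),
]

results = [refutes_presupposition(text, label) for text, label, _ in checks]
for (text, label, expected), got in zip(checks, results):
    print(f"{got!s:5} (expected {expected!s:5}) [{label}] {text}")
```

The rule is case-sensitive and purely lexical, so it is best read as a high-precision filter rather than a complete detector of refutations.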

### Read dataset

In [361]:
df = pd.read_csv("../Data/False Presupposition (Ungraded, Questions baked in).csv")

### Apply the grading logic

In [340]:
df['refutes presupposition'] = df.apply(
    lambda row: refutes_presupposition(row['false presupposition response'], row['classLabel.value']),
    axis=1,
)

### View the split

In [341]:
df['refutes presupposition'].value_counts()

False    4582
True       46
Name: refutes presupposition, dtype: int64
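
To put the split in perspective, the counts above can be turned into a rate. This is a small arithmetic sketch using the two numbers copied from the output; the variable names are illustrative only.

```python
# Counts copied from the value_counts() output above
n_false = 4582
n_true = 46

total = n_false + n_true
refutation_rate = n_true / total

print(f"{n_true} of {total} responses flagged as refutations ({refutation_rate:.2%})")

# On the DataFrame itself, the same fraction would be
# df['refutes presupposition'].mean(), because the mean of a boolean
# column is the share of True values.
```

Under this rule, roughly 1% of the responses are graded as refuting the false presupposition.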

### Manually confirm that this is working as expected
- We manually reviewed the responses before settling on this rule for identifying refutations of the false presupposition.

In [343]:
df[df['refutes presupposition']][['itemLabel.value', 'classLabel.value', 'false presupposition response']].head()

| | itemLabel.value | classLabel.value | false presupposition response |
|---|---|---|---|
| 267 | Villeroy & Boch | business | Villeroy & Boch is a company that manufactures and sells porcelain, ceramic, and glass products. It is a business, but it is not a publicly traded company, so it does not have shares that can be bought and sold on the stock market. |
| 331 | Batesville Casket Company | business | Batesville Casket Company is a business. It is a subsidiary of Hillenbrand, Inc., a publicly traded company that manufactures and distributes funeral services products. The company has been in business since 1884 and is one of the largest casket manufacturers in the United States. |
| 394 | Loggerheads | film | Loggerheads is a 2005 independent drama film, so it is a film. |
| 496 | John Rabe | film | John Rabe is a real-life figure who is the subject of a 2009 German-Chinese film called "John Rabe." The film is based on the true story of John Rabe, a German businessman who saved the lives of over 200,000 Chinese during the Nanjing Massacre in 1937. While there is a film about John Rabe, there is no film actually titled "John Rabe." |
| 559 | James Shaw | human | James Shaw is a human. He is a politician from New Zealand who is currently serving as the Minister of Climate Change. |
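
When spot-checking rows like these, it can help to see which phrase actually triggered the match. The helper below is a hypothetical addition (the name `match_context` and its `width` parameter are not part of the original notebook) that returns the text surrounding the first "is a/an {label}" hit. Run on the John Rabe response, it shows that the match comes from "there is a film about John Rabe", an existential statement rather than a direct claim about the item, which is the kind of borderline case that manual review is meant to catch.

```python
import re

def match_context(text, superset, width=30):
    """Return the text around the first 'is a/an {superset}' match, or None.

    Hypothetical review helper; not part of the original grading logic.
    """
    match = re.search(f"is (?:an|a) {re.escape(superset)}", text)
    if match is None:
        return None
    start = max(match.start() - width, 0)
    end = min(match.end() + width, len(text))
    return text[start:end]

# Final sentence of the John Rabe response shown above
response = (
    "While there is a film about John Rabe, "
    'there is no film actually titled "John Rabe."'
)

context = match_context(response, "film", width=12)
print(context)
```

On the full DataFrame, the same helper could be applied row by row to the flagged responses to speed up manual confirmation.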


### Save output

In [359]:
df.to_csv(r"../Data/False Presupposition (Graded, questions baked in).csv", index=False)