# What does GPT-3 really understand about negation?

In [1]:
__author__ = "Christopher Potts"

__date__ = "2022-10-09"

## Set-up

In [2]:
import openai
import pandas as pd
import re

In [3]:
# Add your API key here to re-run the experiments:

openai.api_key = None

In [4]:
pd.set_option('max_colwidth', 999)
pd.set_option('display.max_rows', 999)
pd.set_option('display.max_columns', 999)

## Dataset

This is a sample from a dataset created by Selena She and Atticus Geiger, extending the MoNLI dataset (https://aclanthology.org/2020.blackboxnlp-1.16/).

In [5]:
DATASET = pd.read_csv("extended-monli-sample.csv")

In [6]:
DATASET.shape[0]

200

In [7]:
overview = DATASET.groupby('split').apply(lambda s: s.groupby('gold_label').sample(1, random_state=1)).reset_index(drop=True)
overview = overview[['split', 'sentence1', 'gold_label', 'sentence2']]

In [8]:
overview

| | split | sentence1 | gold_label | sentence2 |
|---|---|---|---|---|
| 0 | both not scoping | The not allergic person did like cranberries but prefers not to eat vegetables. | entailment | The not allergic person did like fruit but prefers not to eat vegetables. |
| 1 | both not scoping | A man who is not here is holding something, but not in his hands. | neutral | A man who is not here is holding stamps, but not in his hands. |
| 2 | one not scoping | The skier noticed a large elm branch dragging behind him but did not move it. | entailment | The skier noticed a large tree branch dragging behind him but did not move it. |
| 3 | one not scoping | the boy plays an instrument, but he does not want to play anymore. | neutral | the boy plays a fiddle, but he does not want to play anymore. |
| 4 | one scoping | The guests do not have food. | entailment | The guests do not have rice. |
| 5 | one scoping | A child is not tossing blocks | neutral | A child is not tossing anything |
| 6 | one scoping, one not | People who are not wealthy do not have any computers | entailment | People who are not wealthy do not have any laptops |
| 7 | one scoping, one not | The woman who did not like clothes did not like meatloaf. | neutral | The woman who did not like clothes did not like meat. |


In [9]:
# LaTeX version of the above for the write-up:

print(overview.to_latex(index=None))

\begin{tabular}{llll}
\toprule
                split &                                                                        sentence1 &  gold\_label &                                                                       sentence2 \\
\midrule
     both not scoping &  The not allergic person did like cranberries but prefers not to eat vegetables. &  entailment &       The not allergic person did like fruit but prefers not to eat vegetables. \\
     both not scoping &                A man who is not here is holding something, but not in his hands. &     neutral &                  A man who is not here is holding stamps, but not in his hands. \\
      one not scoping &    The skier noticed a large elm branch dragging behind him but did not move it. &  entailment &  The skier noticed a large tree branch dragging behind him but did not move it. \\
      one not scoping &               the boy plays an instrument, but he does not want to play anymore. &     neutral &                   the 

In [10]:
DATASET.gold_label.value_counts()

entailment    100
neutral       100
Name: gold_label, dtype: int64

In [11]:
print(DATASET.split.value_counts().to_latex())

\begin{tabular}{lr}
\toprule
{} &  split \\
\midrule
one scoping, one not &     50 \\
one not scoping      &     50 \\
one scoping          &     50 \\
both not scoping     &     50 \\
\bottomrule
\end{tabular}



In [12]:
print(pd.crosstab(DATASET.split, DATASET.gold_label).to_latex())

\begin{tabular}{lrr}
\toprule
gold\_label &  entailment &  neutral \\
split                &             &          \\
\midrule
both not scoping     &          25 &       25 \\
one not scoping      &          25 &       25 \\
one scoping          &          25 &       25 \\
one scoping, one not &          25 &       25 \\
\bottomrule
\end{tabular}



## Prompts

In [13]:
toy_example = {"sentence1": "We didn't eat pizza", "sentence2": "We didn't eat food", 'split': 'one scoping'}

### Conditional questions

In [14]:
def conditional_question_prompt(row, i=None):   
    prem = _mid_sentence_normalization(row['sentence1'])
    hyp = _mid_sentence_normalization(row['sentence2'])    
    if i is None:
        prompt = f"Is it true that if {prem}, then {hyp}?"
    # Few-shot demonstration case:
    else:
        ans = "Yes" if row['gold_label'] == 'entailment' else "Maybe"
        prompt = f"Q{i}: Is it true that if {prem}, then {hyp}?\nA{i}: {ans}"
    return prompt


def _mid_sentence_normalization(s):
    s = s[0].lower() + s[1: ]
    s = s.rstrip(".")
    return s

In [15]:
print(conditional_question_prompt(toy_example))

Is it true that if we didn't eat pizza, then we didn't eat food?
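
The effect of the normalization step on its own can be checked without the dataset or the API. The sketch below copies the helper under a public name (`mid_sentence_normalization`, chosen here only for illustration) and applies it to a few sentences. Two things it makes visible: any capitalized first word is lowercased, including a proper noun, and an empty string would raise an `IndexError` because of the `s[0]` lookup.

```python
def mid_sentence_normalization(s):
    """Copy of the notebook helper: lowercase the first character
    and strip trailing periods so the sentence can be embedded
    in the middle of a prompt template."""
    s = s[0].lower() + s[1:]
    s = s.rstrip(".")
    return s


examples = [
    "The guests do not have food.",
    "A child is not tossing blocks",
    "We didn't eat pizza",
]

for ex in examples:
    # repr() makes the missing trailing period easy to see.
    print(repr(mid_sentence_normalization(ex)))
```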


### Few-shot conditional questions

In [16]:
def few_shot_conditional_question_prompt(row, n=3):
    splitname = row['split']
    demos = _sample_demos(row, n)
    strs = [conditional_question_prompt(row, i+1) for i, row in demos.iterrows()]
    strs.append("Q: " + conditional_question_prompt(row) + "\nA:")
    return "\n\n".join(strs)


def _sample_demos(row, n):
    # Demonstrations are different examples from the same split,
    # and we get 2 per label (4 in total). Note that `n` is not
    # used; the number per label is fixed in the call to `sample`:
    split = DATASET[
        (DATASET.split == row['split']) & 
        (DATASET.sentence1 != row['sentence1']) & 
        (DATASET.sentence2 != row['sentence2'])
    ]        
    demos = split.groupby('gold_label').sample(2).reset_index()
    return demos

In [17]:
print(few_shot_conditional_question_prompt(toy_example))

Q1: Is it true that if the girls are cutting the cake and not letting any man cut it, then the girls are cutting the cake and not letting any boyfriends cut it?
A1: Yes

Q2: Is it true that if the man is not listening to music, then the man is not listening to reggae?
A2: Yes

Q3: Is it true that if the three children are not holding ferns, then the three children are not holding plants?
A3: Maybe

Q4: Is it true that if there is not a tugboat nearby, then there is not a boat nearby?
A4: Maybe

Q: Is it true that if we didn't eat pizza, then we didn't eat food?
A:
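
Because the demonstrations are sampled without a seed, re-running a few-shot cell gives a different prompt each time. The following is a minimal sketch of a seeded variant, not the function used for the reported results: `sample_demos_seeded`, `n_per_label`, and the toy DataFrame are all hypothetical names and data introduced here for illustration.

```python
import pandas as pd


def sample_demos_seeded(dataset, row, n_per_label=2, random_state=None):
    """Hypothetical variant of `_sample_demos`: same filtering, but the
    number of demonstrations per label and the seed are arguments."""
    pool = dataset[
        (dataset["split"] == row["split"])
        & (dataset["sentence1"] != row["sentence1"])
        & (dataset["sentence2"] != row["sentence2"])
    ]
    return (
        pool.groupby("gold_label")
        .sample(n=n_per_label, random_state=random_state)
        .reset_index(drop=True)
    )


# Toy stand-in for DATASET (made-up rows, one split, alternating labels):
toy = pd.DataFrame({
    "split": ["one scoping"] * 6,
    "sentence1": [f"premise {k}" for k in range(6)],
    "sentence2": [f"hypothesis {k}" for k in range(6)],
    "gold_label": ["entailment", "neutral"] * 3,
})

target = toy.iloc[0]
demos_a = sample_demos_seeded(toy, target, n_per_label=1, random_state=0)
demos_b = sample_demos_seeded(toy, target, n_per_label=1, random_state=0)

# The same seed gives the same demonstrations, and the target row is excluded.
print(demos_a.equals(demos_b))
print(demos_a[["sentence1", "gold_label"]])
```

A fixed seed makes prompts reproducible, at the cost of showing every target example from a split the same demonstrations unless the seed is varied per row.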


### Hypothesis questions

In [18]:
def hypothesis_question_prompt(row, i=None):   
    prem = _mid_sentence_normalization(row['sentence1'])
    hyp = _mid_sentence_normalization(row['sentence2'])    
    if i is None:
        prompt = f"Assume that {prem}. Is it then definitely true that {hyp}? Answer Yes or No."
    # Few-shot demonstration case:
    else:
        ans = "Yes" if row['gold_label'] == 'entailment' else "No"
        prompt = f"Q{i}: Assume that {prem}. Is it then definitely true that {hyp}? Answer Yes or No.\nA{i}: {ans}"
    return prompt

In [19]:
print(hypothesis_question_prompt(toy_example))

Assume that we didn't eat pizza. Is it then definitely true that we didn't eat food? Answer Yes or No.


### Few-shot hypothesis questions

In [20]:
def few_shot_hypothesis_question_prompt(row, n=3):
    splitname = row['split']
    demos = _sample_demos(row, n)
    strs = [hypothesis_question_prompt(row, i+1) for i, row in demos.iterrows()]
    strs.append("Q: " + hypothesis_question_prompt(row) + "\nA:")
    return "\n\n".join(strs)

In [21]:
print(few_shot_hypothesis_question_prompt(toy_example))

Q1: Assume that the men were outside trying to keep their voices down so as not to waken any woman indoors. Is it then definitely true that the men were outside trying to keep their voices down so as not to waken any bachelorette indoors? Answer Yes or No.
A1: Yes

Q2: Assume that the woman is not wearing jewelry. Is it then definitely true that the woman is not wearing rings? Answer Yes or No.
A2: Yes

Q3: Assume that there is not a single Democrat walking in the city. Is it then definitely true that there is not a single person walking in the city? Answer Yes or No.
A3: No

Q4: Assume that the man is not listening to polka. Is it then definitely true that the man is not listening to music? Answer Yes or No.
A4: No

Q: Assume that we didn't eat pizza. Is it then definitely true that we didn't eat food? Answer Yes or No.
A:


### Conditional truth evaluation

In [22]:
def conditional_truth_evaluation_prompt(row, i=None):
    prem = _mid_sentence_normalization(row['sentence1'])
    hyp = _mid_sentence_normalization(row['sentence2'])
    if i is None:    
        prompt = f"If {prem}, then {hyp}. Is this true?" 
    # Few-shot demonstration case:
    else:
        ans = "Yes" if row['gold_label'] == 'entailment' else "Maybe"
        prompt = f"C{i}: If {prem}, then {hyp}. Is this true?\nA{i}: {ans}"
    return prompt

In [23]:
print(conditional_truth_evaluation_prompt(toy_example))

If we didn't eat pizza, then we didn't eat food. Is this true?


### Few-shot conditional truth evaluation

In [24]:
def few_shot_conditional_truth_evaluation_prompt(row, n=3):
    splitname = row['split']
    demos = _sample_demos(row, n)
    strs = [conditional_truth_evaluation_prompt(row, i+1) for i, row in demos.iterrows()]
    strs.append("C:" + conditional_truth_evaluation_prompt(row) + "\nA:")
    return "\n\n".join(strs)

In [25]:
print(few_shot_conditional_truth_evaluation_prompt(toy_example))

C1: If the guests do not have food, then the guests do not have rice. Is this true?
A1: Yes

C2: If there is not a single person walking in the city, then there is not a single official walking in the city. Is this true?
A2: Yes

C3: If there is not a single Democrat walking in the city, then there is not a single person walking in the city. Is this true?
A3: Maybe

C4: If the diver has not seen any tuna on his dive, then the diver has not seen any fish on his dive. Is this true?
A4: Maybe

C:If we didn't eat pizza, then we didn't eat food. Is this true?
A:


### Brown et al.-style

Adapted from https://arxiv.org/abs/2005.14165

In [26]:
def brown_et_al_style_prompt(row, i=None):
    prem = row['sentence1']
    hyp = row['sentence2']
    hyp = hyp.rstrip(".")
    if i is None:
        prompt = f"C: {prem}\nQ: {hyp}. Yes, No, or Maybe?"
    # Few-shot demonstration case. Note that the answer label is
    # numbered i+1, so demonstration i reads C{i}/Q{i}/A{i+1}:
    else:
        ans = "Yes" if row['gold_label'] == 'entailment' else "Maybe"
        prompt = f"C{i}: {prem}\nQ{i}: {hyp}. Yes, No, or Maybe?\nA{i+1}: {ans}"
    return prompt

In [27]:
print(brown_et_al_style_prompt(toy_example))

C: We didn't eat pizza
Q: We didn't eat food. Yes, No, or Maybe?


### Few-shot Brown et al.-style

In [28]:
def few_shot_brown_et_al_style_prompt(row, n=3):
    demos = _sample_demos(row, n) 
    strs = [brown_et_al_style_prompt(row, i+1) for i, row in demos.iterrows()]
    strs.append(brown_et_al_style_prompt(row) + "\nA:")
    return "\n\n".join(strs)   

In [29]:
print(few_shot_brown_et_al_style_prompt(toy_example))

C1: The woman is not wearing jewelry
Q1: The woman is not wearing necklaces. Yes, No, or Maybe?
A2: Yes

C2: A woman did not like any food.
Q2: A woman did not like any pudding. Yes, No, or Maybe?
A3: Yes

C3: The man is not listening to polka.
Q3: The man is not listening to music. Yes, No, or Maybe?
A4: Maybe

C4: A child is not tossing blocks
Q4: A child is not tossing anything. Yes, No, or Maybe?
A5: Maybe

C: We didn't eat pizza
Q: We didn't eat food. Yes, No, or Maybe?
A:


### Structured

In [30]:
def structured_prompt(row, i=None):
    prem = row['sentence1']
    hyp = row['sentence2']    
    if i is None:    
        prompt = f"P: {prem}\nH: {hyp}\nL:"
    # Few-shot demonstration case:
    else:
        prompt = f"P{i}: {prem}\nH{i}: {hyp}\nL{i}: {row['gold_label']}"
    return prompt

In [31]:
print(structured_prompt(toy_example))

P: We didn't eat pizza
H: We didn't eat food
L:


### Few-shot structured

In [32]:
def few_shot_structured_prompt(row, n=3):
    demos = _sample_demos(row, n)    
    strs = [structured_prompt(row, i+1) for i, row in demos.iterrows()]
    strs.append(structured_prompt(row))
    return "\n\n".join(strs) 

In [33]:
print(few_shot_structured_prompt(toy_example))

P1: The person did not like any fruit.
H1: The person did not like any grapes.
L1: entailment

P2: The woman is not wearing jewelry
H2: The woman is not wearing rings
L2: entailment

P3: Two people are not fishing from a skiff
H3: Two people are not fishing from a boat
L3: neutral

P4: The man is not listening to polka.
H4: The man is not listening to music.
L4: neutral

P: We didn't eat pizza
H: We didn't eat food
L:


### Reasoning

Loosely inspired by https://arxiv.org/pdf/2102.07350.pdf, https://arxiv.org/pdf/2201.11903.pdf, and others.

In [34]:
def reasoning_prompt(row):
    prem = row['sentence1']
    hyp = row['sentence2']
    prem = _mid_sentence_normalization(prem)
    hyp = _mid_sentence_normalization(hyp)
    prompt = f"""Logical and commonsense reasoning exam.

Explain your reasoning in detail, then answer with Yes or No. Your answers should follow this 4-line format:

Premise: <a tricky logical statement about the world>.
Question: <question requiring logical deduction>.
Reasoning: <an explanation of what you understand about the possible scenarios>.
Answer: <Yes or No>.

Premise: {prem}
Question: Can we logically conclude for sure that {hyp}?
Reasoning: Let's think logically step by step. The premise basically tells us that"""
    return prompt

In [35]:
print(reasoning_prompt(toy_example))

Logical and commonsense reasoning exam.

Explain your reasoning in detail, then answer with Yes or No. Your answers should follow this 4-line format:

Premise: <a tricky logical statement about the world>.
Question: <question requiring logical deduction>.
Reasoning: <an explanation of what you understand about the possible scenarios>.
Answer: <Yes or No>.

Premise: we didn't eat pizza
Question: Can we logically conclude for sure that we didn't eat food?
Reasoning: Let's think logically step by step. The premise basically tells us that


## Label inference

In [36]:
def run_gpt3(prompts, engine="text-davinci-002", temperature=0.0, batch_size=10, max_tokens=200, **gpt3_kwargs):    
    all_responses = []    
    for i in range(0, len(prompts), batch_size):        
        response = openai.Completion.create(
            engine=engine,       
            prompt=prompts[i: i+batch_size],
            temperature=temperature,
            echo=False,
            max_tokens=max_tokens,
            n=1,
            **gpt3_kwargs)
        # We'll keep just the response texts:
        all_responses += [d['text'].strip() for d in response['choices']]
    return all_responses
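
The batching logic in `run_gpt3` can be exercised offline by swapping the API call for a stub. This is a sketch under that assumption: `run_in_batches`, `complete_fn`, and `fake_complete` are hypothetical names, and the stub simply echoes its input. Like the loop above, it relies on one completion coming back per prompt, in prompt order.

```python
def run_in_batches(prompts, complete_fn, batch_size=10):
    """Call `complete_fn` on consecutive slices of `prompts` and
    concatenate the stripped response texts, one per prompt."""
    all_responses = []
    for i in range(0, len(prompts), batch_size):
        batch = prompts[i: i + batch_size]
        texts = complete_fn(batch)
        # Guard against a mismatch that would silently misalign rows.
        if len(texts) != len(batch):
            raise ValueError("expected one completion per prompt")
        all_responses += [t.strip() for t in texts]
    return all_responses


def fake_complete(batch):
    # Hypothetical stand-in for the API call: echoes each prompt
    # with padding so that the strip() step has something to do.
    return [f"  echo: {p}  " for p in batch]


prompts = [f"prompt {k}" for k in range(23)]
responses = run_in_batches(prompts, fake_complete, batch_size=10)

# 23 prompts with batch_size=10 means batches of 10, 10, and 3.
print(len(responses), responses[0], responses[-1])
```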

## Inferring labels

In [37]:
def infer_label(s):
    yes = re.compile(r"\byes\b", re.I | re.M)     
    if yes.search(s):
        return "entailment"
    else:
        return "neutral" 
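
To make the labeling rule concrete, the sketch below applies it to a few made-up response strings (they are not model outputs). It also includes a hypothetical stricter variant, `infer_label_from_answer_line`, which only inspects the text after the last `Answer:` marker; that variant is an illustration and was not used for the results in this notebook.

```python
import re

YES = re.compile(r"\byes\b", re.I | re.M)


def infer_label(s):
    """Same rule as above: a whole-word "yes" anywhere in the
    response counts as entailment; anything else is neutral."""
    return "entailment" if YES.search(s) else "neutral"


def infer_label_from_answer_line(s):
    """Hypothetical stricter variant: apply the rule only to the text
    after the last "Answer:" marker (the whole string if there is none)."""
    tail = s.rsplit("Answer:", 1)[-1]
    return "entailment" if YES.search(tail) else "neutral"


# Made-up responses for illustration:
responses = [
    "Yes",
    "Maybe",
    "Yesterday we ate.",  # "Yes" inside a longer word does not match
    "Some would say yes, but nothing is stated.\nAnswer: No",
]

for r in responses:
    print(repr(r), infer_label(r), infer_label_from_answer_line(r))
```

The last example shows the main difference: a "yes" in the reasoning text is enough for the original rule, whereas the stricter variant follows the final answer line.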

## Experiment wrapper

In [38]:
def run_experiment(samp, prompt_func, **gpt3_kwargs):
    prompt_func_name = prompt_func.__name__        
    samp[prompt_func_name] = samp.apply(prompt_func, axis=1)    
    responses = run_gpt3(list(samp[prompt_func_name].values), **gpt3_kwargs)    
    samp[prompt_func_name + "_response"] = responses    
    samp[prompt_func_name + "_prediction"] = samp[prompt_func_name + "_response"].apply(infer_label)    
    return samp

## Results reporting

In [39]:
def report_results(samp, prompt_func):
    accurate = samp[prompt_func.__name__ + "_prediction"] == samp.gold_label
    results = pd.crosstab(samp.split, accurate)
    results.loc['All'] = results.sum(axis=0)    
    acc = results.apply(lambda row: row[True] / row.sum(), axis=1)        
    results['Accuracy'] = acc
    # Some clean-up for the LaTeX output:
    results = results.rename(columns={True: "Correct", False: "Incorrect"})
    results.columns.name = None
    results.index.name = None
    return results
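
The accuracy table is a cross-tabulation of split against a boolean "prediction matches gold label" column. Here is a small self-contained sketch of that computation on made-up data; the `prediction` column and the toy rows are hypothetical. It adds one defensive step that `report_results` does not have, a `reindex` so that both count columns exist even if every prediction is right (or wrong).

```python
import pandas as pd

# Made-up predictions for two splits:
toy = pd.DataFrame({
    "split": ["one scoping", "one scoping", "one not scoping", "one not scoping"],
    "gold_label": ["entailment", "neutral", "entailment", "neutral"],
    "prediction": ["entailment", "entailment", "entailment", "neutral"],
})

accurate = toy["prediction"] == toy["gold_label"]

# Rows are splits; columns are False/True counts of `accurate`.
counts = pd.crosstab(toy["split"], accurate)
counts = counts.rename(columns={True: "Correct", False: "Incorrect"})
# Make sure both columns are present, filling a missing one with zeros.
counts = counts.reindex(columns=["Incorrect", "Correct"], fill_value=0)

# Totals row, then per-row accuracy.
counts.loc["All"] = counts.sum(axis=0)
counts["Accuracy"] = counts["Correct"] / (counts["Incorrect"] + counts["Correct"])

print(counts)
```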

## Experiments

### Prompt conditions

In [40]:
prompt_funcs = (
    conditional_question_prompt,
    few_shot_conditional_question_prompt,
    hypothesis_question_prompt,
    few_shot_hypothesis_question_prompt,
    conditional_truth_evaluation_prompt,
    few_shot_conditional_truth_evaluation_prompt,
    brown_et_al_style_prompt,
    few_shot_brown_et_al_style_prompt,
    structured_prompt,
    few_shot_structured_prompt,
    reasoning_prompt
)

### Experiment loop

In [41]:
for prompt_func in prompt_funcs:
    print(prompt_func.__name__)
    DATASET = run_experiment(DATASET, prompt_func, max_tokens=200)

conditional_question_prompt
few_shot_conditional_question_prompt
hypothesis_question_prompt
few_shot_hypothesis_question_prompt
conditional_truth_evaluation_prompt
few_shot_conditional_truth_evaluation_prompt
brown_et_al_style_prompt
few_shot_brown_et_al_style_prompt
structured_prompt
few_shot_structured_prompt
reasoning_prompt


### Results

These are formatted for direct use in the report.

In [42]:
for prompt_func in prompt_funcs:
    print("%" * 70, end="\n")
    sechead = prompt_func.__name__.replace("_", " ").title().replace("Few Shot", "Few-Shot")    
    print(f"\\subsection{{{sechead}}}",  end="\n\n") 
    p = prompt_func(toy_example)
    p = re.sub(r"\n", r"\\mynewline\n", p)
    print(f"\\promptExample{{{p}}}", end="\n\n")        
    print("\\begin{center}")
    print(report_results(DATASET, prompt_func).to_latex(float_format="%.2f"), end="")
    print("\\end{center}", end="\n\n")

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\subsection{Conditional Question Prompt}

\promptExample{Is it true that if we didn't eat pizza, then we didn't eat food?}

\begin{center}
\begin{tabular}{lrrr}
\toprule
{} &  Incorrect &  Correct &  Accuracy \\
\midrule
both not scoping     &         22 &       28 &      0.56 \\
one not scoping      &         23 &       27 &      0.54 \\
one scoping          &         25 &       25 &      0.50 \\
one scoping, one not &         25 &       25 &      0.50 \\
All                  &         95 &      105 &      0.53 \\
\bottomrule
\end{tabular}
\end{center}

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\subsection{Few-Shot Conditional Question Prompt}

\promptExample{Q1: Is it true that if the men were outside trying to keep their voices down so as not to waken any woman indoors, then the men were outside trying to keep their voices down so as not to waken any bachelorette indoors?\mynewline
A1

## Error analysis

In [43]:
def error_analysis(samp, prompt_func, splitname):
    prompt_func_name = prompt_func.__name__
    pred_colname = prompt_func_name + "_prediction"
    response_colname = prompt_func_name + "_response"
    err_df = samp[(samp[pred_colname] != samp.gold_label) & (samp.split == splitname)]
    return err_df[['sentence1', 'gold_label', 'sentence2', prompt_func_name, pred_colname, response_colname]]

In [44]:
error_analysis(DATASET, reasoning_prompt, "one scoping")

Unnamed: 0,sentence1,gold_label,sentence2,reasoning_prompt,reasoning_prompt_prediction,reasoning_prompt_response
0,The man is not allergic to any snakes.,entailment,The man is not allergic to any pythons.,"Logical and commonsense reasoning exam.\n\nExplain your reasoning in detail, then answer with Yes or No. Your answers should follow this 4-line format:\n\nPremise: <a tricky logical statement about the world>.\nQuestion: <question requiring logical deduction>.\nReasoning: <an explanation of what you understand about the possible scenarios>.\nAnswer: <Yes or No>.\n\nPremise: the man is not allergic to any snakes\nQuestion: Can we logically conclude for sure that the man is not allergic to any pythons?\nReasoning: Let's think logically step by step. The premise basically tells us that",neutral,"the man has no allergies to snakes in general. That doesn't mean that he can't be allergic to a specific type of snake, such as a python. So the answer is No, we cannot logically conclude for sure that the man is not allergic to any pythons."
1,The person did not like any fruit.,entailment,The person did not like any grapes.,"Logical and commonsense reasoning exam.\n\nExplain your reasoning in detail, then answer with Yes or No. Your answers should follow this 4-line format:\n\nPremise: <a tricky logical statement about the world>.\nQuestion: <question requiring logical deduction>.\nReasoning: <an explanation of what you understand about the possible scenarios>.\nAnswer: <Yes or No>.\n\nPremise: the person did not like any fruit\nQuestion: Can we logically conclude for sure that the person did not like any grapes?\nReasoning: Let's think logically step by step. The premise basically tells us that",neutral,"the person did not like any fruit. Grapes are a type of fruit, so it's possible that the person did not like grapes. \nAnswer: No"
2,A woman did not like any food.,entailment,A woman did not like any pudding.,"Logical and commonsense reasoning exam.\n\nExplain your reasoning in detail, then answer with Yes or No. Your answers should follow this 4-line format:\n\nPremise: <a tricky logical statement about the world>.\nQuestion: <question requiring logical deduction>.\nReasoning: <an explanation of what you understand about the possible scenarios>.\nAnswer: <Yes or No>.\n\nPremise: a woman did not like any food\nQuestion: Can we logically conclude for sure that a woman did not like any pudding?\nReasoning: Let's think logically step by step. The premise basically tells us that",neutral,"the woman did not like any food. So, we can't really say for sure if she liked pudding or not since the premise didn't mention anything about pudding specifically. \nAnswer: No"
3,The girls are cutting the cake and not letting any man cut it.,entailment,The girls are cutting the cake and not letting any boyfriends cut it.,"Logical and commonsense reasoning exam.\n\nExplain your reasoning in detail, then answer with Yes or No. Your answers should follow this 4-line format:\n\nPremise: <a tricky logical statement about the world>.\nQuestion: <question requiring logical deduction>.\nReasoning: <an explanation of what you understand about the possible scenarios>.\nAnswer: <Yes or No>.\n\nPremise: the girls are cutting the cake and not letting any man cut it\nQuestion: Can we logically conclude for sure that the girls are cutting the cake and not letting any boyfriends cut it?\nReasoning: Let's think logically step by step. The premise basically tells us that",neutral,"the girls are cutting the cake, and not letting any man cut it. This could be interpreted in two ways: either the girls are cutting the cake and not letting any boyfriends cut it, or the girls are cutting the cake and not letting any men cut it (including men who are not their boyfriends). We cannot logically conclude for sure which one of these scenarios is true, so the answer is No."
5,The person did not like fruit.,entailment,The person did not like raspberries.,"Logical and commonsense reasoning exam.\n\nExplain your reasoning in detail, then answer with Yes or No. Your answers should follow this 4-line format:\n\nPremise: <a tricky logical statement about the world>.\nQuestion: <question requiring logical deduction>.\nReasoning: <an explanation of what you understand about the possible scenarios>.\nAnswer: <Yes or No>.\n\nPremise: the person did not like fruit\nQuestion: Can we logically conclude for sure that the person did not like raspberries?\nReasoning: Let's think logically step by step. The premise basically tells us that",neutral,"the person did not like fruit in general, but it does not mention raspberries specifically. Therefore, we cannot logically conclude that the person did not like raspberries.\nAnswer: No"
7,The two people are not fishing from a boat,entailment,The two people are not fishing from a gondola,"Logical and commonsense reasoning exam.\n\nExplain your reasoning in detail, then answer with Yes or No. Your answers should follow this 4-line format:\n\nPremise: <a tricky logical statement about the world>.\nQuestion: <question requiring logical deduction>.\nReasoning: <an explanation of what you understand about the possible scenarios>.\nAnswer: <Yes or No>.\n\nPremise: the two people are not fishing from a boat\nQuestion: Can we logically conclude for sure that the two people are not fishing from a gondola?\nReasoning: Let's think logically step by step. The premise basically tells us that",neutral,"the two people are not on a boat, so we can't say for sure that they're not on a gondola. All we know is that they're not on a boat.\n\nAnswer: No"
8,the boy does not play any instrument,entailment,the boy does not play any saxophone,"Logical and commonsense reasoning exam.\n\nExplain your reasoning in detail, then answer with Yes or No. Your answers should follow this 4-line format:\n\nPremise: <a tricky logical statement about the world>.\nQuestion: <question requiring logical deduction>.\nReasoning: <an explanation of what you understand about the possible scenarios>.\nAnswer: <Yes or No>.\n\nPremise: the boy does not play any instrument\nQuestion: Can we logically conclude for sure that the boy does not play any saxophone?\nReasoning: Let's think logically step by step. The premise basically tells us that",neutral,the boy in question doesn't play any instrument. So we can't say for sure that he doesn't play the saxophone because we don't have any information about whether or not he plays the saxophone. All we know is that he doesn't play any instrument in general.\n\nAnswer: No
9,The person did not like any fruit.,entailment,The person did not like any cranberries.,"Logical and commonsense reasoning exam.\n\nExplain your reasoning in detail, then answer with Yes or No. Your answers should follow this 4-line format:\n\nPremise: <a tricky logical statement about the world>.\nQuestion: <question requiring logical deduction>.\nReasoning: <an explanation of what you understand about the possible scenarios>.\nAnswer: <Yes or No>.\n\nPremise: the person did not like any fruit\nQuestion: Can we logically conclude for sure that the person did not like any cranberries?\nReasoning: Let's think logically step by step. The premise basically tells us that",neutral,"the person did not like any fruit. This does not mean that the person did not like any cranberries, because cranberries are not fruit. The person could have still liked cranberries, even if they did not like any fruit. Therefore, we cannot logically conclude for sure that the person did not like any cranberries.\nAnswer: No"
11,There is not a single person walking in the city.,entailment,There is not a single hunter walking in the city.,"Logical and commonsense reasoning exam.\n\nExplain your reasoning in detail, then answer with Yes or No. Your answers should follow this 4-line format:\n\nPremise: <a tricky logical statement about the world>.\nQuestion: <question requiring logical deduction>.\nReasoning: <an explanation of what you understand about the possible scenarios>.\nAnswer: <Yes or No>.\n\nPremise: there is not a single person walking in the city\nQuestion: Can we logically conclude for sure that there is not a single hunter walking in the city?\nReasoning: Let's think logically step by step. The premise basically tells us that",neutral,"there is no one walking in the city. If we want to know if there is a hunter walking in the city, then we need to know if the hunter is a person. If the hunter is not a person, then we can logically conclude that there is not a single hunter walking in the city.\nAnswer: No"
13,The men were outside trying to keep their voices down so as not to waken any woman indoors,entailment,The men were outside trying to keep their voices down so as not to waken any bachelorette indoors,"Logical and commonsense reasoning exam.\n\nExplain your reasoning in detail, then answer with Yes or No. Your answers should follow this 4-line format:\n\nPremise: <a tricky logical statement about the world>.\nQuestion: <question requiring logical deduction>.\nReasoning: <an explanation of what you understand about the possible scenarios>.\nAnswer: <Yes or No>.\n\nPremise: the men were outside trying to keep their voices down so as not to waken any woman indoors\nQuestion: Can we logically conclude for sure that the men were outside trying to keep their voices down so as not to waken any bachelorette indoors?\nReasoning: Let's think logically step by step. The premise basically tells us that",neutral,"the men were outside trying to keep their voices down. The question is asking us to logically deduce whether or not the men were trying to keep their voices down so as not to waken any bachelorette indoors. We know that if the men were trying to keep their voices down so as not to waken any bachelorette indoors, then that would mean that there was a bachelorette indoors. However, the premise does not mention anything about there being a bachelorette indoors. Therefore, we cannot logically conclude for sure that the men were outside trying to keep their voices down so as not to waken any bachelorette indoors.\nAnswer: No."


In [45]:
# DATASET.to_json("extended-monli-sample-davinci-results.json", orient='records', indent=4)