# Final ROSCOE scoring of CausalBench

## Merging datasets

In [11]:
# common imports
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import os
import sys
import re
from typing import Union, List, Tuple, Dict, Any, Optional

In [12]:
# read the results CSV file
overall_result = pd.read_csv('raw_datasets/bern_cna_35_gpt4cot.csv')
overall_result.head()


|   | ID | descriptive_id | sensical | query_type | rung | phenomenon | simpson | truth | prompt | truth_norm | pred |
|---|----|----------------|----------|------------|------|------------|---------|-------|--------|------------|------|
| 0 | 4669 | confounding_a_ett_gender_pay | -1 | ett | 3 | confounding | False | yes | Imagine a self-contained, hypothetical world w... | 1 | No |
| 1 | 4046 | collision_n_nie_nonsense0 | 0 | nie | 3 | collision | False | no | Imagine a self-contained, hypothetical world w... | 0 | No |
| 2 | 1597 | arrowhead_c_ett_floor_wet | 1 | ett | 3 | arrowhead | False | yes | Imagine a self-contained, hypothetical world w... | 1 | Yes |
| 3 | 2995 | chain_a_nie_smoking_gene_cancer | -1 | nie | 3 | chain | False | yes | Imagine a self-contained, hypothetical world w... | 1 | No |
| 4 | 9475 | nondet-diamondcut_c_nde_smoking_frontdoor | 1 | nde | 3 | nondet-diamondcut | False | yes | Imagine a self-contained, hypothetical world w... | 1 | Yes |


In [13]:
# keep just the ID and prompt columns; copy so that adding a column later
# does not trigger pandas' SettingWithCopyWarning
overall_result = overall_result[['ID', 'prompt']].copy()
overall_result.head()

|   | ID | prompt |
|---|----|--------|
| 0 | 4669 | Imagine a self-contained, hypothetical world w... |
| 1 | 4046 | Imagine a self-contained, hypothetical world w... |
| 2 | 1597 | Imagine a self-contained, hypothetical world w... |
| 3 | 2995 | Imagine a self-contained, hypothetical world w... |
| 4 | 9475 | Imagine a self-contained, hypothetical world w... |


In [14]:
prompt2gpt_response_lookup = pd.read_csv('raw_datasets/cache_gpt4_responses.csv')
prompt2gpt_response_lookup.head()


|   | pred | query |
|---|------|-------|
| 0 | No, the prisoner being alive cannot be guarant... | We know that captain commands causes rifleman1... |
| 1 | pred | query |
| 2 | No, we cannot definitively conclude that the s... | principal's direction makes teacher1 and teach... |
| 3 | No, based on the information provided, we cann... | principal's anger makes teacher1 and teacher2 ... |
| 4 | No, the student might not necessarily be alive... | We know that captain commands causes professor... |
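In the preview above, row 1 holds the literal strings `pred` and `query`, which looks like a header line repeated inside the cache file. It cannot match a real prompt, so it does not affect the lookup, but it does inflate row counts on the cache. A minimal sketch for dropping such rows, using a hypothetical helper `drop_repeated_header_rows` and toy data in place of the real cache:

```python
import pandas as pd


def drop_repeated_header_rows(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: drop rows whose every cell equals its column name."""
    is_header = pd.Series(True, index=df.index)
    for col in df.columns:
        # A cell "looks like a header" if its string value equals the column name
        is_header &= df[col].astype(str) == str(col)
    return df[~is_header]


# Toy stand-in for prompt2gpt_response_lookup (illustrative data only)
cache = pd.DataFrame({
    "pred": ["No, ...", "pred", "Yes, ..."],
    "query": ["q1", "query", "q2"],
})
cleaned = drop_repeated_header_rows(cache)
print(len(cleaned))  # 2
```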


In [15]:
# print the full query text of the row at position 3
print(prompt2gpt_response_lookup.iloc[3,1])

principal's anger makes teacher1 and teacher2 to blame the student. If either teacher1 of teacher2 blame the student, the student become emotionally damaged. We observed the principal wasn't angry. Would the student be emotionally damaged if teacher2 blame the student?  
Start your answer with "Yes", or "No", followed by step-by-step reasoning to support your explanation.


In [16]:
# Look up GPT-4's reasoning chain for each "prompt" in overall_result by matching it
# against the "query" column of cache_gpt4_responses.csv, and store the matching "pred".
# If a query appears more than once in the cache, the first response is used;
# prompts with no cached response get NaN.
lookup = (prompt2gpt_response_lookup
          .drop_duplicates(subset='query', keep='first')
          .set_index('query')['pred'])
overall_result['pred'] = overall_result['prompt'].map(lookup)
overall_result.head()

|   | ID | prompt | pred |
|---|----|--------|------|
| 0 | 4669 | Imagine a self-contained, hypothetical world w... | Step 1) Extract the causal graph: The causal g... |
| 1 | 4046 | Imagine a self-contained, hypothetical world w... | Step 1) Extract the causal graph: The causal g... |
| 2 | 1597 | Imagine a self-contained, hypothetical world w... | Step 1) Extract the causal graph: The causal g... |
| 3 | 2995 | Imagine a self-contained, hypothetical world w... | Step 1) Extract the causal graph: The causal g... |
| 4 | 9475 | Imagine a self-contained, hypothetical world w... | Step 1) Extract the causal graph: The causal g... |
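The prompt-to-response lookup can also be written as a left join. The sketch below is an alternative, not the notebook's method: it uses `DataFrame.merge` with `validate="many_to_one"` (which raises if the cache still contains repeated queries) and `indicator=True` (which flags prompts with no cached response). The frames are small hypothetical stand-ins for `overall_result` and `prompt2gpt_response_lookup`.

```python
import pandas as pd

# Hypothetical stand-ins for overall_result and prompt2gpt_response_lookup
results = pd.DataFrame({"ID": [1, 2, 3], "prompt": ["p1", "p2", "p3"]})
cache = pd.DataFrame({"pred": ["r1", "r1-dup", "r3"], "query": ["p1", "p1", "p3"]})

# Keep one cached response per query so the left join cannot duplicate rows
cache_unique = cache.drop_duplicates(subset="query", keep="first")

merged = results.merge(
    cache_unique.rename(columns={"query": "prompt"}),
    on="prompt",
    how="left",
    validate="many_to_one",  # raises MergeError if a prompt matches several cached rows
    indicator=True,          # adds a _merge column: 'both' or 'left_only'
)

# IDs of prompts that found no cached response
missing = merged.loc[merged["_merge"] == "left_only", "ID"].tolist()
print(f"{len(missing)} prompt(s) without a cached response: {missing}")

merged = merged.drop(columns="_merge")
```

With `validate` set, a cache that still has duplicated queries fails loudly instead of silently picking one response.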


In [17]:
# print the full prompt and GPT-4 response of the row at position 45
print(overall_result.iloc[45,1])
print("#"*100)
print(overall_result.iloc[45,2])

Imagine a self-contained, hypothetical world with only the following conditions, and without any unmentioned factors or causal relationships: Male gender has a direct effect on non-competitive department application and in-state residency. Non-competitive department application has a direct effect on admission. In-state residency has a direct effect on admission. For individuals who are not male, the probability of admission is 64.83%. For individuals who are male, the probability of admission is 43.35%. For individuals who are male, would it be more likely to see admission if the individual had not been male?

Hint: You can answer the question by following the subquestions below:

Step 1) Extract the causal graph: What is the causal graph expressed in the context? Use "X" to denote "gender". Use "V2" to denote "department applying to". Use "V3" to denote "residency status". Use "Y" to denote "admission status". Answer nothing else but each edge one by one, in the format of "var1 -> va

In [18]:
# print len of overall_result
print(len(overall_result))
# new df with no null values
overall_result_no_null = overall_result.dropna()
# print len of new df
print(len(overall_result_no_null))


612
612


In [19]:
# Check for duplicate entries in the pred column
overall_result_no_null['pred'].value_counts()

Step 1) Extract the causal graph: The causal graph expressed in the context is as follows: "V1 -> X, V1 -> V3, X -> Y, V3 -> Y".\n\nStep 2) Identify the query type: The query type of the above question is "natural indirect effect".\n\nStep 3) Translate the query to an estimand: Based on the type of the causal query, the question can be translated to a formal estimand as "E(Y_x | do(V1)) - E(Y_x | do(not V1))", where Y_x represents the counterfactual outcome of Y when X is set to x.\n\nStep 4) Collect all the available data: Unfortunately, the problem statement does not provide any specific data in the form of marginal probabilities or conditional probabilities. Therefore, we cannot proceed with this step.\n\nStep 5) Solve for the estimand: Given the lack of available data, we cannot solve for the estimand using causal inference skills such as do-calculus, counterfactual prediction, and the basics of probabilities.                                                                         

In [20]:
# Check for duplicate entries in the prompt column
overall_result_no_null['prompt'].value_counts()

Imagine a self-contained, hypothetical world with only the following conditions, and without any unmentioned factors or causal relationships: Rujx has a direct effect on snov. Nyrp has a direct effect on snov.  If we disregard the mediation effect through snov, would rujx still positively affect nyrp?\n\nHint: You can answer the question by following the subquestions below:\n\nStep 1) Extract the causal graph: What is the causal graph expressed in the context? Use "X" to denote "rujx". Use "Y" to denote "nyrp". Use "V3" to denote "snov". Answer nothing else but each edge one by one, in the format of "var1 -> var2", and use "," to separate the edges.\n\nStep 2) Identify the query type: What is the query type of the above question? Choose one from "marginal probability", "conditional probability", "explaining away effect", "backdoor adjustment set", "average treatment effect", "collider bias", "normal counterfactual question", "average treatment effect on treated", "natural direct effect

In [21]:
# Check for duplicate entries in the ID column
overall_result_no_null['ID'].value_counts()

9527    1
690     1
683     1
3756    1
5805    1
       ..
5468    1
7517    1
5471    1
3424    1
9214    1
Name: ID, Length: 612, dtype: int64
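The three `value_counts` cells above are checked by eye. A hypothetical helper `count_duplicates` reduces the same check to one number per column, which is easier to assert on before exporting; the toy frame is illustrative only.

```python
from typing import Dict, List

import pandas as pd


def count_duplicates(df: pd.DataFrame, columns: List[str]) -> Dict[str, int]:
    """Hypothetical helper: number of repeated values in each column."""
    # Series.duplicated() marks every occurrence after the first as True
    return {col: int(df[col].duplicated().sum()) for col in columns}


toy = pd.DataFrame({"ID": [1, 2, 3], "prompt": ["a", "b", "a"], "pred": ["x", "y", "z"]})
print(count_duplicates(toy, ["ID", "prompt", "pred"]))  # {'ID': 0, 'prompt': 1, 'pred': 0}
```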

## Formatting for ROSCOE

In [24]:
def prepare_json_for_roscoe(df: pd.DataFrame, prompt_column: str, gpt_column: str, out_name: str,
                            gt_column: Optional[str] = None, id_column: Optional[str] = None) -> None:
    # Rename to the field names used in ROSCOE's input files: 'premise' (context)
    # and 'gpt-3' (generated reasoning chain). rename() returns a new frame, so the
    # assignments below do not touch (or warn about) a slice of df.
    df_out = df[[prompt_column, gpt_column]].rename(columns={gpt_column: 'gpt-3', prompt_column: 'premise'})
    # As strange as it sounds, the hypothesis column holds the ground truth; it is left blank otherwise
    if gt_column is not None:
        df_out['hypothesis'] = "IGNORE THIS. Ground truth here for reference. " + df[gt_column].astype(str)
    else:
        df_out['hypothesis'] = " "
    if id_column is not None:
        df_out['ID'] = df[id_column]

    os.makedirs("datasets", exist_ok=True)
    df_out.to_json(os.path.join("datasets", out_name + '.jsonl'), orient='records', lines=True)

In [25]:
prepare_json_for_roscoe(overall_result_no_null, 'prompt', 'pred', 'cb1', gt_column=None, id_column='ID')
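Before running ROSCOE, it can help to confirm that the exported file is valid JSON Lines with the expected fields. The sketch writes a hypothetical two-row frame, shaped like the one `prepare_json_for_roscoe` produces when `id_column` is set, to a temporary directory and parses it back line by line; for the real export, point `path` at `datasets/cb1.jsonl` instead.

```python
import json
import os
import tempfile

import pandas as pd

# Hypothetical frame with the same columns prepare_json_for_roscoe writes
df_out = pd.DataFrame({
    "premise": ["Imagine a ...", "Imagine another ..."],
    "gpt-3": ["Step 1) ...", "Step 1) ..."],
    "hypothesis": [" ", " "],
    "ID": [4669, 4046],
})

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "cb1.jsonl")
    df_out.to_json(path, orient="records", lines=True)

    # Each non-empty line of a JSONL file must parse on its own as one JSON object
    with open(path) as fh:
        records = [json.loads(line) for line in fh if line.strip()]

expected_keys = {"premise", "gpt-3", "hypothesis", "ID"}
assert all(set(r) == expected_keys for r in records)
print(len(records), "records")
```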

## Run ROSCOE

```bash
python projects/roscoe/roscoe.py -t sim_sce -m facebook/roscoe-512-roberta-base --dataset-path ../CausalLLMs/roscoe_exp/datasets --datasets cb1
```