# Qualitative Evaluation Picker

This notebook picks a sample of the produced results for qualitative evaluation.

In [126]:
import pandas as pd

In [127]:
all_original_results = pd.read_csv('../results/allResults.csv')
all_custom_results = pd.read_csv('../results/allResults-custom.csv')

all_original_results.rename(columns={'index': 'id'}, inplace=True)
all_custom_results.rename(columns={'index': 'id'}, inplace=True)

There are 6 models. The first three use the full dataset; the `-custom` variants use the cleaned dataset:
1. explanation
2. label
3. label-explanation
4. explanation-custom
5. label-custom
6. label-explanation-custom

In [128]:
def sampler(dataset, column_to_sort, samples_per_quartile=2):
    '''Sample examples from each quartile of a dataset based on column_to_sort.

    Args:
        dataset: pandas dataframe with the results
        column_to_sort: column whose values define the quartiles
        samples_per_quartile: number of samples to take from each quartile

    Returns:
        pandas dataframe with the sampled examples

    Raises:
        ValueError: if a quartile has fewer rows than samples_per_quartile
    '''
    values = dataset[column_to_sort]
    # Compute the quartile boundaries once instead of on every comparison
    q25, q50, q75 = values.quantile([0.25, 0.5, 0.75])

    # Quartile 1: values up to and including the 25th percentile
    q1 = dataset[values <= q25].sample(n=samples_per_quartile)

    # Quartile 2: values above the 25th and up to the 50th percentile
    q2 = dataset[(values > q25) & (values <= q50)].sample(n=samples_per_quartile)

    # Quartile 3: values above the 50th and up to the 75th percentile
    q3 = dataset[(values > q50) & (values <= q75)].sample(n=samples_per_quartile)

    # Quartile 4: values above the 75th percentile
    q4 = dataset[values > q75].sample(n=samples_per_quartile)

    # Return the samples from all four quartiles as one dataframe
    return pd.concat([q1, q2, q3, q4])
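
The same quartile sampling could also be sketched with `pd.qcut` and a fixed seed, so that the picked examples are reproducible across runs. This is only a minimal sketch, not the notebook's own method: the function name `sample_per_quartile`, the `seed` parameter, and the toy `score` column are assumptions made for illustration.

```python
import pandas as pd


def sample_per_quartile(dataset, column_to_sort, samples_per_quartile=2, seed=0):
    """Hypothetical variant of sampler() that bins rows with pd.qcut."""
    # Label each row with the quartile (0 to 3) of its column_to_sort value.
    quartile = pd.qcut(dataset[column_to_sort], q=4, labels=False)
    # Draw the same number of rows from every quartile, using a fixed seed.
    parts = [
        dataset[quartile == q].sample(n=samples_per_quartile, random_state=seed)
        for q in range(4)
    ]
    return pd.concat(parts)


# Toy data (made up for this sketch): 20 rows with distinct scores.
toy = pd.DataFrame({"id": range(20), "score": [i / 20 for i in range(20)]})
picked = sample_per_quartile(toy, "score", samples_per_quartile=2, seed=42)
```

Passing the same `seed` returns the same rows each time. If the score column has many tied values, `pd.qcut` can raise an error about non-unique bin edges, in which case the explicit comparisons used in `sampler` may be the safer choice.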
    

In [129]:
# Pick samples_per_quartile examples from each quartile, based on the neural score
# The quartiles use the explanation model's score instead of label-explanation's. Hopefully that does not have a big impact on the sorting.
custom_sampled_results = sampler(all_custom_results, 'rug-nlp-nli/flan-base-nli-explanation-custom_neural_score', samples_per_quartile=10)
# Pick the examples with the same id from the original results
original_sampled_results = all_original_results[all_original_results['id'].isin(custom_sampled_results['id'])]

masked_original_results = original_sampled_results[['id', 'premise', 'hypothesis',
                                      'label',
                                      'rug-nlp-nli/flan-base-nli-explanation_prediction', 
                                      'rug-nlp-nli/flan-base-nli-label-explanation_prediction'
                                      ]].copy()

masked_custom_results = custom_sampled_results[['id', 'premise', 'hypothesis',
                                      'label',
                                      'rug-nlp-nli/flan-base-nli-explanation-custom_prediction', 
                                      'rug-nlp-nli/flan-base-nli-label-explanation-custom_prediction'
                                      ]].copy()

# Rename the label column to correct_label for clarity, and hide the model
# names to reduce bias in the qualitative evaluation
masked_original_results.rename(columns={
    'label': 'correct_label',
    'rug-nlp-nli/flan-base-nli-explanation_prediction': 'prediction_1',
    'rug-nlp-nli/flan-base-nli-label-explanation_prediction': 'prediction_2',
}, inplace=True)

# Same renaming for the custom results
masked_custom_results.rename(columns={
    'label': 'correct_label',
    'rug-nlp-nli/flan-base-nli-explanation-custom_prediction': 'prediction_3',
    'rug-nlp-nli/flan-base-nli-label-explanation-custom_prediction': 'prediction_4',
}, inplace=True)

# Order based on id, so that it's easy to compare the two tables.
masked_original_results.sort_values(by=['id'], inplace=True)
masked_custom_results.sort_values(by=['id'], inplace=True)

In [130]:
# Make a copy of the original results to add the custom results to.
all_qualitative_results = masked_original_results.copy()

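# Copy prediction_3 and prediction_4 into the combined table.
# The copy is positional, so it relies on both tables being sorted by id and containing the same ids.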
all_qualitative_results['prediction_3'] = masked_custom_results['prediction_3'].to_numpy()
all_qualitative_results['prediction_4'] = masked_custom_results['prediction_4'].to_numpy()
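
The positional copy above only lines up if both tables hold exactly the same ids in the same order. A join on `id` would not depend on row order; the following is a minimal sketch of that alternative, using small made-up stand-ins for `masked_original_results` and `masked_custom_results` (the toy values and the names `original`, `custom`, and `combined` are assumptions for illustration).

```python
import pandas as pd

# Toy stand-ins (made up) for masked_original_results and masked_custom_results.
original = pd.DataFrame({
    "id": [148, 691, 793],
    "prediction_1": ["a1", "b1", "c1"],
    "prediction_2": ["a2", "b2", "c2"],
})
custom = pd.DataFrame({
    "id": [793, 148, 691],  # deliberately in a different order
    "prediction_3": ["c3", "a3", "b3"],
    "prediction_4": ["c4", "a4", "b4"],
})

# Join on id; validate raises an error if an id is duplicated on either side.
combined = original.merge(
    custom[["id", "prediction_3", "prediction_4"]],
    on="id",
    how="inner",
    validate="one_to_one",
)
```

With `how="inner"`, an id that is missing from one table is dropped silently, so comparing `len(combined)` with the number of sampled ids is a cheap sanity check.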

In [131]:
# Export the combined dataframe to a csv
all_qualitative_results.to_csv('../results/results_for_qual_evaluation.csv')
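
By default `to_csv` also writes the dataframe index as an extra, unnamed first column, which `pd.read_csv` reads back as `Unnamed: 0`. If that column is not wanted in the evaluation sheet, `index=False` leaves it out. The sketch below shows the difference on a made-up two-row frame, written to in-memory buffers instead of the notebook's results file.

```python
import io

import pandas as pd

# Made-up frame with a non-default index, similar in shape to the sampled results.
frame = pd.DataFrame(
    {"id": [148, 691], "prediction_1": ["a", "b"]},
    index=[4579, 4278],
)

# Default behaviour: the index is written as an unnamed first column.
with_index = io.StringIO()
frame.to_csv(with_index)
with_index.seek(0)
cols_with_index = list(pd.read_csv(with_index).columns)

# index=False: only the real columns are written.
without_index = io.StringIO()
frame.to_csv(without_index, index=False)
without_index.seek(0)
cols_without_index = list(pd.read_csv(without_index).columns)
```

Whether to keep the index is a judgment call: it points back to the row in the full results file, but `id` already identifies each example.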

In [132]:
# Show the results from the original models, which use the full dataset
masked_original_results

Unnamed: 0,id,premise,hypothesis,correct_label,prediction_1,prediction_2
4579,148,A person with a purple shirt is painting an im...,A woman paints a portrait of a person.,neutral,Just because a person with a purple shirt is p...,neutral: Just because a person is painting an ...
4278,691,A busy street with numerous people interacting...,The morning rush hour fills the streets with b...,neutral,Just because a busy street with numerous peopl...,neutral: A busy street with numerous people in...
1890,793,Two people talking on a dock.,people outside,entailment,People on a dock are outside.,"entailment: Two people are people, and a dock ..."
1123,1289,A man dances with a woman in a white egyptian ...,A man is disguise as he dances with someone.,neutral,Just because a man dances with a woman in a wh...,entailment: A man dances with a woman in a whi...
8036,1296,A tan dog chases a black and white soccer ball.,A dog is at the park with it's owner.,neutral,Just because a dog chases a soccer ball does n...,neutral: Just because a dog chases a soccer ba...
6421,1545,Man sitting on bench with a suitcase in front ...,A man taking a break,neutral,Just because a man is sitting on a bench does ...,neutral: Just because a man is sitting on a be...
1112,1625,A solo shot of a man riding a four wheeler in ...,The man is riding fast on the four wheeler.,neutral,Just because a man is riding a four wheeler do...,neutral: Just because a man is riding a four w...
6130,1780,A concert with a blue shirted boy in the front...,The moment has it's fan,contradiction,A concert with a blue shirted boy in the front...,neutral: A concert with a blue shirted boy in ...
1480,1847,Africans are walking down a dirt road with str...,The people are walking in the dirt,entailment,Africans are people.,entailment: Africans are people.
7955,2187,"Two skateboarders, one wearing a black t-shirt...",Two skateboarders race.,entailment,"""Two skateboarders race each other"" is a rephr...",entailment: Two skateboarders race each other ...


In [133]:
# Show the results from the custom models, which use the cleaned dataset
masked_custom_results

Unnamed: 0,id,premise,hypothesis,correct_label,prediction_3,prediction_4
5343,148,A person with a purple shirt is painting an im...,A woman paints a portrait of a person.,neutral,Just because a person is painting an image of ...,contradiction: A person painting an image of a...
6135,691,A busy street with numerous people interacting...,The morning rush hour fills the streets with b...,neutral,Just because a busy street with numerous peopl...,neutral: A busy street with numerous people in...
4210,793,Two people talking on a dock.,people outside,entailment,People are on a dock so they must be outside.,entailment: People on a dock are outside.
1144,1289,A man dances with a woman in a white egyptian ...,A man is disguise as he dances with someone.,neutral,Just because a man dances with a woman in a wh...,entailment: A man dances with a woman in a whi...
8568,1296,A tan dog chases a black and white soccer ball.,A dog is at the park with it's owner.,neutral,A dog chases a ball does not imply it is at th...,neutral: Just because a dog chases a ball does...
8228,1545,Man sitting on bench with a suitcase in front ...,A man taking a break,neutral,Just because a man is sitting on a bench does ...,neutral: Sitting on a bench does not imply tak...
1168,1625,A solo shot of a man riding a four wheeler in ...,The man is riding fast on the four wheeler.,neutral,Just because the man is riding a four wheeler ...,neutral: Just because a man is riding a four w...
6082,1780,A concert with a blue shirted boy in the front...,The moment has it's fan,contradiction,A concert with a blue shirted boy in the front...,neutral: A concert with a blue shirted boy in ...
1491,1847,Africans are walking down a dirt road with str...,The people are walking in the dirt,entailment,Africans are people.,entailment: Africans are people.
7966,2187,"Two skateboarders, one wearing a black t-shirt...",Two skateboarders race.,entailment,Two skateboarders race each other is the same ...,entailment: Two skateboarders race each other ...
