## Kendall Tau Rank Distance

In this notebook we calculate the Kendall tau rank distance for two cases:
- The distance between Lexical Similarity and Eigenvalue Laplacian for model GPT-3.5
- The distance between Lexical Similarity and Eigenvalue Laplacian for model GPT-4

In this way, we will find out how closely the two methods agree in their rankings for each model.
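
To make the quantity concrete, here is a minimal sketch of the normalized Kendall tau distance on a toy example: the fraction of item pairs that two score lists order differently. The function name `kendall_tau_distance` and the toy scores are illustrative assumptions, not part of the notebook's data, and the sketch does not handle ties. When there are no ties, this fraction equals (1 - tau) / 2, where tau is Kendall's correlation coefficient.

```python
from itertools import combinations


def kendall_tau_distance(scores_a, scores_b):
    """Fraction of item pairs that the two score lists order differently (ties not handled)."""
    if len(scores_a) != len(scores_b):
        raise ValueError("both lists must have the same length")
    if len(scores_a) < 2:
        raise ValueError("at least two items are needed to form a pair")
    pairs = list(combinations(range(len(scores_a)), 2))
    # A pair is discordant when the two lists disagree on which item scores higher
    discordant = sum(
        1
        for i, j in pairs
        if (scores_a[i] - scores_a[j]) * (scores_b[i] - scores_b[j]) < 0
    )
    return discordant / len(pairs)


# Hypothetical uncertainty scores for four questions under two methods
method_a = [0.1, 0.4, 0.2, 0.9]
method_b = [0.3, 0.2, 0.5, 0.8]

distance = kendall_tau_distance(method_a, method_b)
tau = 1 - 2 * distance  # valid only when there are no ties
print(distance, tau)
```

A distance of 0 means both lists rank the items identically, 1 means one ranking is the exact reverse of the other, and values near 0.5 are what unrelated rankings tend to produce.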

Note: the Lexical Similarity datasets are significantly longer (129 rows of uncertainty scores) than the Eigenvalue Laplacian datasets (19 rows of uncertainty scores), so we calculate the distance only over the questions common to both.
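
One way to restrict both tables to their common questions is an inner join on the prompt column. The sketch below uses small made-up DataFrames: the column names `Input Prompt` and `Uncertainty` match the notebook, while the prompts, scores, variable names, and suffixes are hypothetical.

```python
import pandas as pd

# Hypothetical stand-ins for the long (LS) and short (EVL) result tables
df_ls = pd.DataFrame({
    "Input Prompt": ["q1", "q2", "q3", "q4"],
    "Uncertainty": [0.10, 0.40, 0.20, 0.90],
})
df_evl = pd.DataFrame({
    "Input Prompt": ["q3", "q1", "q5"],
    "Uncertainty": [0.7, 0.3, 0.6],
})

# Inner join keeps only prompts present in both tables, already row-aligned
common = df_evl.merge(
    df_ls, on="Input Prompt", how="inner", suffixes=("_EVL", "_LS")
)

print(common)
```

The two score columns of `common` can then be passed directly to a rank correlation function. An inner join drops prompts that are missing from either table, so it is worth checking the number of rows that survive.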

In [5]:
#! pip install openpyxl

We start by calculating the distance between Lexical Similarity and Eigenvalue Laplacian for model GPT-3.5.

In [4]:
import pandas as pd
from scipy.stats import kendalltau

# Step 1: Read lists from Excel
excel_file_LS_3 = 'results_LS_3 - Sorted.xlsx'
excel_file_EVL_3 = 'results_EVL_3 - Sorted.xlsx'
df_LS_3 = pd.read_excel(excel_file_LS_3)
df_EVL_3 = pd.read_excel(excel_file_EVL_3)

# Step 2: Use the prompts of the shorter DataFrame (df_EVL_3) as the common set;
# this assumes each of them also appears in df_LS_3
common_prompts = df_EVL_3['Input Prompt'].dropna().tolist()

# Step 3: Filter and order the longer DataFrame (df_LS_3) to match the shorter list (df_EVL_3)
filtered_df_LS_3 = df_LS_3[df_LS_3['Input Prompt'].isin(common_prompts)]
ordered_filtered_df_LS_3 = filtered_df_LS_3.set_index('Input Prompt').loc[common_prompts].reset_index()

# Step 4: Ensure the order of the shorter DataFrame (df_EVL_3) follows the common prompts
ordered_df_EVL_3 = df_EVL_3.set_index('Input Prompt').loc[common_prompts].reset_index()

# Step 5: Extract the 'Uncertainty' values from both ordered DataFrames
uncertain_LS_3 = ordered_filtered_df_LS_3['Uncertainty'].tolist()
uncertain_EVL_3 = ordered_df_EVL_3['Uncertainty'].tolist()

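# Step 6: Calculate Kendall's tau; 1 - tau is reported below as the rank distance
# (it ranges from 0 for identical rankings to 2 for fully reversed rankings)
tau, p_value = kendalltau(uncertain_LS_3, uncertain_EVL_3)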

# Output the result
print(f"Kendall tau rank distance: {1 - tau}")
print(f"P-value: {p_value}")

Kendall tau rank distance: 0.9558823529411765
P-value: 0.7654110627827273


And now the distance between Lexical Similarity and Eigenvalue Laplacian for model GPT-4.

In [6]:
import pandas as pd
from scipy.stats import kendalltau

# Step 1: Read lists from Excel
excel_file_LS_4 = 'results_LS_4 - Sorted.xlsx'
excel_file_EVL_4 = 'results_EVL_4 - Sorted.xlsx'
df_LS_4 = pd.read_excel(excel_file_LS_4)
df_EVL_4 = pd.read_excel(excel_file_EVL_4)

# Step 2: Use the prompts of the shorter DataFrame (df_EVL_4) as the common set;
# this assumes each of them also appears in df_LS_4
common_prompts = df_EVL_4['Input Prompt'].dropna().tolist()

# Step 3: Filter and order the longer DataFrame (df_LS_4) to match the shorter list (df_EVL_4)
filtered_df_LS_4 = df_LS_4[df_LS_4['Input Prompt'].isin(common_prompts)]
ordered_filtered_df_LS_4 = filtered_df_LS_4.set_index('Input Prompt').loc[common_prompts].reset_index()

# Step 4: Ensure the order of the shorter DataFrame (df_EVL_4) follows the common prompts
ordered_df_EVL_4 = df_EVL_4.set_index('Input Prompt').loc[common_prompts].reset_index()

# Step 5: Extract the 'Uncertainty' values from both ordered DataFrames
uncertain_LS_4 = ordered_filtered_df_LS_4['Uncertainty'].tolist()
uncertain_EVL_4 = ordered_df_EVL_4['Uncertainty'].tolist()

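# Step 6: Calculate Kendall's tau; 1 - tau is reported below as the rank distance
# (it ranges from 0 for identical rankings to 2 for fully reversed rankings)
tau, p_value = kendalltau(uncertain_LS_4, uncertain_EVL_4)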

# Output the result
print(f"Kendall tau rank distance: {1 - tau}")
print(f"P-value: {p_value}")

Kendall tau rank distance: 1.0073529411764706
P-value: 0.9603371858957213
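
As a reading aid, the printed values can be converted back to Kendall's tau and to a normalized distance in [0, 1]. The sketch below only rearranges the two numbers printed above (each is 1 - tau). The variable names are illustrative, and the (1 - tau) / 2 normalization assumes ties can be ignored.

```python
# Values printed by the two cells above (each is 1 - tau)
reported = {"GPT-3.5": 0.9558823529411765, "GPT-4": 1.0073529411764706}

results = {}
for model, one_minus_tau in reported.items():
    tau = 1 - one_minus_tau          # Kendall's tau correlation
    normalized = one_minus_tau / 2   # 0 = same ranking, 1 = reversed ranking
    results[model] = (tau, normalized)
    print(f"{model}: tau = {tau:.4f}, normalized distance = {normalized:.4f}")
```

Both taus are close to 0 and both normalized distances are close to 0.5. Together with the large p-values, this suggests little agreement between the two rankings for either model.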
