# Analysis Overview of Advanced RAG Techniques

This notebook is the analysis component of a larger experimental study on advanced Retrieval-Augmented Generation (RAG) techniques, described in our repository. It takes the experimental outputs generated by `tonic_main.py` and analyzes how the techniques perform and what those results imply.

## Highlights:

- **Data Insight**: Initial processing and visualization of the experimental data show how each RAG technique's scores are distributed.
- **Statistical Validation**: ANOVA and Tukey's HSD tests check whether the performance differences between techniques are statistically significant.
- **Focused Analysis**: The notebook covers a broad range of RAG techniques but concentrates on the high-impact findings and their significance for LLM enhancement.



# You need to swap in your own domain data and prompt

## Data preparation

In [None]:
import pandas as pd

# Load the experimental data from an Excel file
df = pd.read_excel('checkpoint_2.xlsx')


In [None]:
import ast

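# Parse the 'OverallScores' column, converting string representations of dictionaries into actual dictionaries.
# ast.literal_eval raises ValueError or SyntaxError on malformed input; stop here rather than continue with unparsed strings.
try:
    df['OverallScores'] = df['OverallScores'].apply(ast.literal_eval)
except (ValueError, SyntaxError) as e:
    raise ValueError(f"Could not parse 'OverallScores': {e}") from e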

# Extract these dictionaries into separate columns and combine with the original DataFrame
metrics_df = df['OverallScores'].apply(pd.Series)
expanded_data = pd.concat([df.drop(columns=['OverallScores']), metrics_df], axis=1)

# Display the first few rows to verify the transformation
expanded_data.head()
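To make the transformation above concrete, here is a minimal sketch on a two-row toy DataFrame. The rows and score values are invented for illustration and are not taken from `checkpoint_2.xlsx`; only the column names `Experiment`, `OverallScores`, `retrieval_precision`, and `answer_similarity` come from this notebook.

```python
import ast

import pandas as pd

# Hypothetical stand-in for the spreadsheet; the values are made up.
toy = pd.DataFrame({
    "Experiment": ["Classic VDB + Naive RAG", "Classic VDB + HyDE"],
    "OverallScores": [
        "{'retrieval_precision': 0.5, 'answer_similarity': 3.2}",
        "{'retrieval_precision': 0.75, 'answer_similarity': 3.9}",
    ],
})

# Step 1: turn each string into a real dictionary.
parsed = toy["OverallScores"].apply(ast.literal_eval)

# Step 2: spread the dictionary keys into columns and attach them to the other columns.
toy_expanded = pd.concat(
    [toy.drop(columns=["OverallScores"]), parsed.apply(pd.Series)],
    axis=1,
)

print(toy_expanded)
```

Each dictionary key becomes its own column, which is what allows the later cells to refer to `retrieval_precision` and `answer_similarity` directly.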


## Visualization

### Boxplot of Retrieval Precision

In [None]:
import matplotlib.pyplot as plt
import seaborn as sns

# Visualization of Retrieval Precision across different experiments
plt.figure(figsize=(10, 8))
sns.boxplot(x='retrieval_precision', y='Experiment', data=expanded_data, palette="Set3", orient='h')
plt.title('Boxplot of Retrieval Precision by Experiment')
plt.xlabel('Retrieval Precision')
plt.ylabel('Experiment')
plt.tight_layout()
plt.show()


### Boxplot of Answer Similarity

In [None]:
# Visualization of Answer Similarity across different experiments
plt.figure(figsize=(10, 8))
sns.boxplot(x='answer_similarity', y='Experiment', data=expanded_data, palette="Set3", orient='h')
plt.title('Boxplot of Answer Similarity by Experiment')
plt.xlabel('Answer Similarity')
plt.ylabel('Experiment')
plt.tight_layout()
plt.show()


## Statistical Analysis

To evaluate the differences in `retrieval_precision` and `answer_similarity` across RAG techniques, we perform a one-way ANOVA followed by Tukey's Honestly Significant Difference (HSD) test. ANOVA tests whether at least one technique's mean differs from the others. Tukey's HSD then identifies which specific pairs of techniques differ while controlling the family-wise error rate across all pairwise comparisons.
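As a rough illustration of what the one-way ANOVA statistic measures, the sketch below computes F by hand as the ratio of between-group to within-group variance and compares it with `scipy.stats.f_oneway`. The three groups and their scores are invented for the example and are not experimental results.

```python
import numpy as np
from scipy.stats import f_oneway

# Made-up scores for three hypothetical techniques (not experimental results).
groups = [
    np.array([0.60, 0.65, 0.70, 0.62]),
    np.array([0.55, 0.58, 0.61, 0.57]),
    np.array([0.80, 0.78, 0.83, 0.79]),
]

all_scores = np.concatenate(groups)
grand_mean = all_scores.mean()
k = len(groups)        # number of groups
n = all_scores.size    # total number of observations

# Between-group variability: how far each group mean sits from the grand mean.
ss_between = sum(g.size * (g.mean() - grand_mean) ** 2 for g in groups)

# Within-group variability: spread of observations around their own group mean.
ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)

# F is the ratio of the two mean squares.
f_manual = (ss_between / (k - 1)) / (ss_within / (n - k))
f_scipy = f_oneway(*groups).statistic

print(f"manual F = {f_manual:.4f}, scipy F = {f_scipy:.4f}")
```

A large F means the group means are spread out relative to the noise inside each group. ANOVA alone does not say which groups differ, which is why a pairwise test follows it.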


### ANOVA

In [None]:
from scipy.stats import f_oneway

# ANOVA for retrieval precision and answer similarity
groups_rp = expanded_data.groupby('Experiment')['retrieval_precision'].apply(list)
groups_as = expanded_data.groupby('Experiment')['answer_similarity'].apply(list)

anova_rp = f_oneway(*groups_rp)
anova_as = f_oneway(*groups_as)

anova_results = {
    'Retrieval Precision': {'statistic': anova_rp.statistic, 'p-value': anova_rp.pvalue},
    'Answer Similarity': {'statistic': anova_as.statistic, 'p-value': anova_as.pvalue}
}

anova_results


### Tukey - all

In [None]:
from statsmodels.stats.multicomp import pairwise_tukeyhsd

# Tukey's HSD on retrieval precision for every pair of experiments
tukey_rp = pairwise_tukeyhsd(endog=expanded_data['retrieval_precision'], groups=expanded_data['Experiment'], alpha=0.05)

# The first summary row holds the column names; the remaining rows hold the pairwise results
summary_rows = tukey_rp.summary().data
tukey_result_df = pd.DataFrame(data=summary_rows[1:], columns=summary_rows[0])
tukey_result_df
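The cell above runs Tukey's HSD on `retrieval_precision` only. If the same table is wanted for `answer_similarity`, one possible approach is to wrap the call in a small helper and loop over the metrics. The sketch below assumes this layout; `tukey_table` is a hypothetical helper, and `demo` is randomly generated stand-in data with placeholder experiment names, not the real `expanded_data`.

```python
import numpy as np
import pandas as pd
from statsmodels.stats.multicomp import pairwise_tukeyhsd


def tukey_table(data, metric, group_col="Experiment", alpha=0.05):
    """Run Tukey's HSD for one metric and return the summary as a DataFrame."""
    result = pairwise_tukeyhsd(endog=data[metric], groups=data[group_col], alpha=alpha)
    rows = result.summary().data  # first row is the header
    table = pd.DataFrame(rows[1:], columns=rows[0])
    table.insert(0, "metric", metric)  # remember which metric each row belongs to
    return table


# Synthetic stand-in for expanded_data: three placeholder experiments, 20 rows each.
rng = np.random.default_rng(0)
demo = pd.DataFrame({
    "Experiment": np.repeat(["A", "B", "C"], 20),
    "retrieval_precision": np.concatenate(
        [rng.normal(mean, 0.05, 20) for mean in (0.5, 0.6, 0.9)]
    ),
    "answer_similarity": np.concatenate(
        [rng.normal(mean, 0.3, 20) for mean in (3.0, 3.1, 4.0)]
    ),
})

# One combined table: three pairwise rows per metric.
tables = pd.concat(
    [tukey_table(demo, metric) for metric in ("retrieval_precision", "answer_similarity")],
    ignore_index=True,
)
print(tables)
```

In the notebook itself, the call would presumably be `tukey_table(expanded_data, 'answer_similarity')`, and the result can be filtered in the same way as `tukey_result_df`.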


### Tukey - Naive RAG

In [None]:
# Keep every comparison that involves the Naive RAG baseline, whichever side it appears on
filtered_results = tukey_result_df[
    (tukey_result_df['group1'] == "Classic VDB + Naive RAG") |
    (tukey_result_df['group2'] == "Classic VDB + Naive RAG")
]
filtered_results

### Tukey - best Classic VDB techniques


In [None]:
# Compare each of the strongest Classic VDB variants against the Naive RAG baseline
experiments_focus_group1 = [
    "Classic VDB + HyDE",
    "Classic VDB + HyDE + Cohere Rerank",
    "Classic VDB + HyDE + LLM Rerank",
    "Classic VDB + LLM Rerank"
]

filtered_results = tukey_result_df[
    tukey_result_df['group1'].isin(experiments_focus_group1) &
    (tukey_result_df['group2'] == "Classic VDB + Naive RAG")
]
filtered_results


In [None]:
# Compare the strongest Classic VDB variants against one another
experiments_focus_group1 = [
    "Classic VDB + HyDE",
    "Classic VDB + HyDE + Cohere Rerank",
    "Classic VDB + HyDE + LLM Rerank",
    "Classic VDB + LLM Rerank"
]

filtered_results = tukey_result_df[
    tukey_result_df['group1'].isin(experiments_focus_group1) &
    tukey_result_df['group2'].isin(experiments_focus_group1)
]
filtered_results

### Tukey - best Classic VDB vs worst sentence window

In [None]:
# Filter the original Tukey HSD results DataFrame to include only comparisons between the two experiments of interest
filtered_results = tukey_result_df[
    ((tukey_result_df['group1'] == "Classic VDB + HyDE + LLM Rerank") & (tukey_result_df['group2'] == "Sentence window retrieval + Cohere rerank")) |
    ((tukey_result_df['group1'] == "Sentence window retrieval + Cohere rerank") & (tukey_result_df['group2'] == "Classic VDB + HyDE + LLM Rerank"))
]

# Display the filtered results
filtered_results
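The filter above checks both orderings by hand, because a given experiment may appear in either `group1` or `group2`. A small helper can capture that pattern for any two sets of experiments. The sketch below is one possible way to write it; `comparisons_between` is a hypothetical name, and `toy_tukey` is an invented table that only mimics the `group1`/`group2` layout of `tukey_result_df`.

```python
import pandas as pd


def comparisons_between(tukey_df, groups_a, groups_b):
    """Return rows comparing any group in groups_a with any group in groups_b, in either order."""
    a, b = list(groups_a), list(groups_b)
    forward = tukey_df["group1"].isin(a) & tukey_df["group2"].isin(b)
    backward = tukey_df["group1"].isin(b) & tukey_df["group2"].isin(a)
    return tukey_df[forward | backward]


# Invented table with the same group1/group2 layout as tukey_result_df.
toy_tukey = pd.DataFrame({
    "group1": ["A", "A", "B"],
    "group2": ["B", "C", "C"],
    "meandiff": [0.1, 0.3, 0.2],
    "reject": [False, True, True],
})

# The A-vs-C row is found even though the arguments are passed in the opposite order.
result = comparisons_between(toy_tukey, ["C"], ["A"])
print(result)
```

Passing the same list for both arguments returns all comparisons within that set of experiments.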


### Tukey - Sentence Window

In [None]:
# Filter the Tukey HSD results for rows where "Sentence window retrieval" is group1.
# Only group1 is checked, so rows that list it as group2 are not included.
filtered_results = tukey_result_df[
    (tukey_result_df['group1'] == "Sentence window retrieval")
]

filtered_results

### Tukey - Doc summary vs Classic VDB vs Sentence window

In [None]:
# Compare one representative of each retrieval family (sentence window, Classic VDB, document summary index) against the others
filtered_results = tukey_result_df[
    tukey_result_df['group1'].isin([
        "Sentence window retrieval",
        "Classic VDB + Naive RAG",
        "Document summary index + Cohere Rerank"
    ]) & tukey_result_df['group2'].isin([
        "Sentence window retrieval",
        "Classic VDB + Naive RAG",
        "Document summary index + Cohere Rerank"
    ])
]
filtered_results

### Tukey - Doc summary techniques comparison

In [None]:
# Compare the two document summary index variants directly (with and without HyDE)
filtered_results = tukey_result_df[
    (tukey_result_df['group1'] == "Document summary index + Cohere Rerank") &
    (tukey_result_df['group2'] == "Document summary index + HyDE + Cohere Rerank")
]
filtered_results
