# Evaluation Notebook

This notebook **reproduces the experimental tests** used to evaluate the performance of the system.

Before running the tests, make sure you have the following prerequisites correctly configured:

### Requirements
1. **A running instance of a Neo4j database**, either local or remote (e.g. AuraDB).
2. **A valid OpenAI API key** to access the language and embedding models.
3. **A properly configured `.env` file**, located at the root of the project or in the same folder as this notebook, that defines the following variables:
   - `NEO4J_URI`
   - `NEO4J_USERNAME`
   - `NEO4J_PASSWORD`
   - `OPENAI_API_KEY`
   - `ONTOLOGY_FILE=../models/TAMOntology.ttl`
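
A quick way to confirm the configuration before running any cell is to check that every variable listed above is set. The sketch below uses only the standard library and assumes the `.env` file has already been loaded into the process environment; the helper name `missing_env_vars` is hypothetical and not part of the project.

```python
import os

# Variable names taken from the requirements list above.
REQUIRED_VARS = [
    "NEO4J_URI",
    "NEO4J_USERNAME",
    "NEO4J_PASSWORD",
    "OPENAI_API_KEY",
    "ONTOLOGY_FILE",
]


def missing_env_vars(env=None):
    """Return the required variable names that are unset or blank in `env`.

    `env` defaults to the process environment; any mapping can be passed
    instead, which makes the check easy to try out in isolation.
    """
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not str(env.get(name, "")).strip()]


missing = missing_env_vars()
if missing:
    print("Missing environment variables:", ", ".join(missing))
else:
    print("All required environment variables are set.")
```

Running this first tends to surface a misplaced `.env` file earlier than a failed database connection or API call would.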

### Notebook Structure

The notebook is organized into the following main sections:

1. **Imports and Setup**
   - Loads environment variables and dependencies
   - Connects to Neo4j and configures the models

2. **Knowledge Graph Population**
   - Loads user interaction data
   - Processes and adds normalized information into the graph
   - Resolves entity duplication

3. **Evaluation and Testing**
   - Loads predefined QA datasets
   - Runs tests using both GraphRAG and RAG
   - Computes and compares performance metrics 

---

> Run the cells in order, from top to bottom: later cells depend on variables and graph state created by earlier ones.

## Imports and Setup

In [None]:
import sys, os
sys.path.append(os.path.abspath("../src"))

import KG_construction
import utils
import RAGAS_test
import nest_asyncio
import pandas as pd
import json
import csv

### Configuration: User and Paths
Defines the username and base paths for data used throughout the notebook. \
Update these values if you work with different test profiles or directory structures.

In [None]:
# Name of the user to be used in the test
user = "Mateo"

# Path to the data files
path_data_profiles = "./data/profiles/"  # Synthetic user profiles
path_data_qa = "./data/qa/"              # QA pairs for evaluation
path_data_results = "./data/results/"    # Output folder for saving results

## Knowledge Graph Population

### Load User Interaction Dataset (CSV)

In [None]:
import os
import pandas as pd

# Construct full path to the user's CSV profile
csv_filename = f"{user}.csv"
csv_path = os.path.join(path_data_profiles, csv_filename)

# Check file existence
if not os.path.exists(csv_path):
    raise FileNotFoundError(f"CSV not found at: {csv_path}")

# Load the CSV into a DataFrame
df = pd.read_csv(csv_path)

# Preview the first few rows
print("Loaded profile:")
display(df.head())

### Clear the existing graph

In [None]:
# Step 1: Clear the existing graph
print("Resetting the Knowledge Graph...")
utils.reset_knowledge_graph()
print("Graph cleared.")

### Process and insert each interaction

In [None]:
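import asyncio

# Jupyter already runs an event loop, so asyncio.run() raises a RuntimeError
# unless nested loops are allowed. nest_asyncio (imported above) patches this.
nest_asyncio.apply()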

# Step 2: Process and insert each interaction from the CSV
print("Processing and inserting user interactions into the graph...")

for index, row in df.iterrows():
    raw_input = row["interaction"]
    reference_date = row["date"]
    user_name = row["user"]

    processed_input = utils.process_text(
        text=raw_input,
        current_date=reference_date,
        user_name=user_name
    )

    # Insert into the Knowledge Graph
    result = asyncio.run(KG_construction.add_user_input_to_kg(processed_input))
    print(f"Inserted interaction [{index + 1}]: {result}")

# Run entity resolution after bulk insertion
asyncio.run(KG_construction.resolve_kg_entities())
print("Entity resolution completed.")
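
The loop above starts a new `asyncio.run` call for every row. One alternative is to wrap the whole sequence in a single coroutine that awaits each insertion in order. The sketch below illustrates only that pattern: `fake_add_to_kg` is a hypothetical stand-in for `KG_construction.add_user_input_to_kg`, whose real signature and return value are not shown here.

```python
import asyncio


async def fake_add_to_kg(text):
    """Hypothetical stand-in for KG_construction.add_user_input_to_kg."""
    await asyncio.sleep(0)  # yield control, as a real I/O call would
    return f"stored: {text}"


async def insert_all(interactions):
    """Insert interactions one at a time, preserving their order."""
    results = []
    for text in interactions:
        results.append(await fake_add_to_kg(text))
    return results


def run_insert_all(interactions):
    """Synchronous entry point for use outside a running event loop."""
    return asyncio.run(insert_all(interactions))
```

In a plain Python script, `run_insert_all([...])` drives the coroutine. In a notebook cell, `await insert_all([...])` can be used directly, because IPython supports top-level `await`.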

## Evaluation and Testing

### Load QA dataset

In [None]:
# Load QA dataset
qa_filename = user + "_qa.json"
qa_path = os.path.join(path_data_qa, qa_filename)

with open(qa_path, "r", encoding="utf-8") as f:
    qa_dataset = json.load(f)

print(f"Loaded {len(qa_dataset)} QA pairs for evaluation.")

### GraphRAG Evaluation

In [None]:
# Run GraphRAG evaluation
print("Running evaluation with GraphRAG (graph context)...")
graphRAG_results = RAGAS_test.evaluate_graphRAG(qa_dataset)

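# Append this profile's scores to the shared results file
os.makedirs(path_data_results, exist_ok=True)
with open(os.path.join(path_data_results, "graphRAG_results.csv"), mode='a', newline='', encoding='utf-8') as file:
    writer = csv.writer(file)
    writer.writerow([user, str(graphRAG_results)])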

### RAG Evaluation

In [None]:
# Run RAG evaluation
print("Running evaluation with standard RAG (text chunks)...")
rag_results = RAGAS_test.evaluate_RAG(qa_dataset)

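# Append this profile's scores to the shared results file
os.makedirs(path_data_results, exist_ok=True)
with open(os.path.join(path_data_results, "rag_results.csv"), mode='a', newline='', encoding='utf-8') as file:
    writer = csv.writer(file)
    writer.writerow([user, str(rag_results)])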

### Display and compare results

In [None]:
import pandas as pd
import matplotlib.pyplot as plt

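# Convert RAGAS results to DataFrames (one row per QA pair, one column per metric).
# Assumes the result objects expose per-sample scores through `.scores`.
df_graphRAG = pd.DataFrame(graphRAG_results.scores)
df_RAG = pd.DataFrame(rag_results.scores)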

# Compute average scores for each metric
mean_graphrag = df_graphRAG.mean()
mean_rag = df_RAG.mean()

# Create a comparison DataFrame
comparison_df = pd.DataFrame({
    'GraphRAG': mean_graphrag,
    'Standard RAG': mean_rag
})

# Display the average values
display(comparison_df.round(4))

# Plot comparison as a bar chart
ax = comparison_df.plot(kind='bar', figsize=(10, 6), rot=45, color=['#0f8b8d', '#07435d'])

plt.title('Average Metric Comparison: GraphRAG vs RAG')
plt.ylabel('Score (0–1)')
plt.ylim(0, 1)
plt.grid(axis='y', linestyle='--', alpha=0.7)
plt.legend(loc='lower right')
plt.tight_layout()
plt.show()


## Aggregate Metric Comparison Across Profiles
After the evaluation has been run for each test profile individually, the results are saved in two CSV files, one for GraphRAG and one for RAG. Each run appends one row containing the profile name and the stringified metric scores.

By reading these CSV files, we can:
- load the results for all tested profiles,
- **compute the average of each metric across users**, for both GraphRAG and RAG,
- **visualize the aggregated comparison** using a grouped bar chart.
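
The round trip from result rows to per-metric averages can be sketched with the standard library alone. The profile names and scores below are made up for illustration, and the helpers `parse_rows` and `average_metrics` are hypothetical; the notebook's own cell uses pandas for the same steps.

```python
import ast
import csv
import io
from statistics import mean

# Made-up scores for two made-up profiles, written the same way the
# evaluation cells write them: [profile name, str(metrics dict)].
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(["user_a", str({"faithfulness": 0.5, "context_recall": 1.0})])
writer.writerow(["user_b", str({"faithfulness": 1.0, "context_recall": 0.5})])


def parse_rows(text):
    """Map each profile name to its metrics dict, parsed with ast.literal_eval."""
    rows = {}
    for profile, metrics_str in csv.reader(io.StringIO(text)):
        rows[profile] = ast.literal_eval(metrics_str)
    return rows


def average_metrics(rows):
    """Average every metric across profiles (assumes all profiles share the same metrics)."""
    names = sorted({name for scores in rows.values() for name in scores})
    return {name: mean(scores[name] for scores in rows.values()) for name in names}


rows = parse_rows(buffer.getvalue())
averages = average_metrics(rows)
print(averages)
```

`ast.literal_eval` is used rather than `eval` because it only accepts Python literals, which is safer when parsing text read back from a file.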

In [None]:
import ast
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from pathlib import Path

# === File paths for result CSVs ===
CSV_GRAPH = Path(path_data_results + "graphRAG_results.csv")
CSV_RAG   = Path(path_data_results + "rag_results.csv")

# === Plot styling ===
COLOR_GRAPH = "#0f8b8d"
COLOR_RAG   = "#07435d"
BAR_WIDTH   = 0.30

# === Metric name mapping for display ===
METRIC_LABELS = {
    "llm_context_precision_with_reference": "Context precision",
    "context_recall":                      "Context recall",
    "answer_relevancy":                    "Answer relevancy",
    "faithfulness":                        "Faithfulness",
    "semantic_similarity":                 "Answer similarity",
}

# === Plot title and config ===
TITLE = "GraphRAG vs RAG"
YLABEL = "Mean value"
ANNOTATE_BARS = True


def load_metrics(csv_path: Path) -> pd.DataFrame:
    """
    Load metrics from a CSV file with rows like:
        "username","{'metricA': 0.5, 'metricB': 0.7, ...}"
    
    Returns:
        A DataFrame with:
            - index = profile names
            - columns = individual numeric metrics
    """
    raw = pd.read_csv(
        csv_path,
        header=None,
        names=["profile", "metrics_str"],
        quotechar='"',
        skipinitialspace=True,
        engine="python",
    )

    # Convert stringified dict into a dictionary, then to columns
    expanded = raw["metrics_str"].apply(ast.literal_eval).apply(pd.Series)
    expanded.index = raw["profile"]
    return expanded


def bar_labels(ax, bars):
    """
    Annotate bars with their height values.
    """
    for bar in bars:
        height = bar.get_height()
        ax.annotate(
            f"{height:.2f}",
            xy=(bar.get_x() + bar.get_width() / 2, height),
            xytext=(0, 4),
            textcoords="offset points",
            ha="center",
            va="bottom",
            fontsize=9,
        )
        
# Load evaluation results
df_graph = load_metrics(CSV_GRAPH)
df_rag   = load_metrics(CSV_RAG)

# Compute average metric scores
mean_graph = df_graph.mean().rename("GraphRAG")
mean_rag   = df_rag.mean().rename("RAG")

# Combine into a comparison DataFrame
comparison = pd.DataFrame([mean_graph, mean_rag]).T
comparison.index.name = "Metric"

# Show comparison table
print("\nMean metric scores (GraphRAG vs RAG):")
print(comparison.round(4))
print()


# Apply basic plot styling
plt.rcParams.update({
    "font.size": 11,
    "axes.spines.right": False,
    "axes.spines.top":   False,
})

# Metric selection and label formatting
metrics = list(METRIC_LABELS.keys())
labels = [METRIC_LABELS[m] for m in metrics]
x = np.arange(len(metrics))

# Create plot
fig, ax = plt.subplots(figsize=(9, 5))

# Plot GraphRAG bars
bars_graph = ax.bar(
    x - BAR_WIDTH / 2,
    comparison.loc[metrics, "GraphRAG"],
    BAR_WIDTH,
    label="GraphRAG",
    color=COLOR_GRAPH,
)

# Plot RAG bars
bars_rag = ax.bar(
    x + BAR_WIDTH / 2,
    comparison.loc[metrics, "RAG"],
    BAR_WIDTH,
    label="RAG",
    color=COLOR_RAG,
)

# Configure x-axis
ax.set_xticks(x)
ax.set_xticklabels(labels, rotation=20, ha="right")

# Axis labels and styling
ax.set_ylim(0, 1)
ax.set_title(TITLE, pad=15, weight="bold")
ax.set_ylabel(YLABEL)
ax.grid(True, axis="y", linestyle="--", linewidth=0.5, alpha=0.7)
ax.legend(frameon=False)

# Optional: annotate bar values
if ANNOTATE_BARS:
    bar_labels(ax, list(bars_graph) + list(bars_rag))

fig.tight_layout()
plt.show()