# Human Evaluation (W4)

This notebook consolidates human scores from `docs/human_eval_template.csv`.

Steps:
1) Fill the CSV with scores (1–5).
2) Run the cells below to compute summaries and visuals.
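
Before running the summaries, it can help to confirm that every filled-in score actually lies on the 1–5 scale. The following is a minimal sketch, not part of the original notebook: `find_out_of_range` is a hypothetical helper, and the demo frame stands in for the real CSV.

```python
import pandas as pd


def find_out_of_range(df, fields, low=1, high=5):
    """Return rows where any filled-in score falls outside [low, high].

    Hypothetical helper for illustration; the bounds assume the 1-5 scale.
    """
    scores = df[fields].apply(pd.to_numeric, errors="coerce")
    # NaN compares False on both sides, so blank cells are not flagged here.
    bad = (scores < low) | (scores > high)
    return df[bad.any(axis=1)]


# Small made-up frame standing in for human_eval_template.csv.
demo = pd.DataFrame(
    {
        "coherence_1_5": [4, 7, None],
        "overall_1_5": [5, 3, 0],
    }
)
bad_rows = find_out_of_range(demo, ["coherence_1_5", "overall_1_5"])
print(bad_rows)
```

In the notebook itself, the same call could be applied to `scores_df` and `score_fields` once the CSV is loaded.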

Notes:
- If plots do not render, install dependencies: `pip install pandas matplotlib`.
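
One way to check for missing dependencies before reaching the plotting cells is sketched below. It relies only on the standard library's `importlib.util.find_spec`; the helper name `missing_packages` is an assumption made for this example.

```python
from importlib.util import find_spec


def missing_packages(names):
    """Return the top-level module names that cannot be found for import."""
    return [name for name in names if find_spec(name) is None]


missing = missing_packages(["pandas", "matplotlib"])
if missing:
    print("Missing packages:", ", ".join(missing))
    print("Install with: pip install " + " ".join(missing))
else:
    print("All plotting dependencies are available.")
```

Running this first turns a mid-notebook `ModuleNotFoundError` into an explicit install hint.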


In [5]:
from pathlib import Path

import pandas as pd

csv_path = Path("../docs/human_eval_template.csv")
print("CSV:", csv_path.resolve())

score_fields = [
    "coherence_1_5",
    "creativity_1_5",
    "faithfulness_1_5",
    "overall_1_5",
]

# Load data
scores_df = pd.read_csv(csv_path)
print("Rows:", len(scores_df))

# Coerce score columns to numeric
for field in score_fields:
    scores_df[field] = pd.to_numeric(scores_df[field], errors="coerce")

# Quick preview
scores_df.head(3)

CSV: C:\Users\gemim\OneDrive\Bureau\M1-cours-Data engineer\MSC 1 AI\Semestre 2\Foundations of machine learning and datascience\Project\docs\human_eval_template.csv
Rows: 49


|  | prompt | baseline_response | tuned_response | coherence_1_5 | creativity_1_5 | faithfulness_1_5 | overall_1_5 | notes | evaluator | date |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | Once upon a time, there was a girl named Ali. ... | Once upon a time, there was a girl named Ali. ... | Once upon a time, there was a girl named Ali. ... |  |  |  |  |  |  |  |
| 1 | Once upon a time there was a bald man. He want... | Once upon a time there was a bald man. He want... | Once upon a time there was a bald man. He want... |  |  |  |  |  |  |  |
| 2 | Once upon a time, there was a little boy named... | Once upon a time, there was a little boy named... | Once upon a time, there was a little boy named... |  |  |  |  |  |  |  |


In [6]:
# Summary statistics per score field
summary_rows = []
for field in score_fields:
    series = scores_df[field].dropna()
    summary_rows.append(
        {
            "metric": field,
            "count": int(series.count()),
            "mean": round(series.mean(), 2) if not series.empty else None,
            "median": round(series.median(), 2) if not series.empty else None,
            "min": round(series.min(), 2) if not series.empty else None,
            "max": round(series.max(), 2) if not series.empty else None,
        }
    )

summary_df = pd.DataFrame(summary_rows)
summary_df

|  | metric | count | mean | median | min | max |
| --- | --- | --- | --- | --- | --- | --- |
| 0 | coherence_1_5 | 0 |  |  |  |  |
| 1 | creativity_1_5 | 0 |  |  |  |  |
| 2 | faithfulness_1_5 | 0 |  |  |  |  |
| 3 | overall_1_5 | 0 |  |  |  |  |


In [7]:
# Overall score by evaluator (if provided)
from IPython.display import display

if "evaluator" in scores_df.columns:
    evaluator_df = (
        scores_df.groupby(scores_df["evaluator"].fillna("(unknown)"))["overall_1_5"]
        .agg(["count", "mean"])
        .sort_values("mean", ascending=False)
    )
    # A bare expression inside an if-block is not rendered, so display it explicitly.
    display(evaluator_df)
else:
    print("No evaluator column in CSV.")

In [8]:
import matplotlib.pyplot as plt

# Bar chart of mean scores
means = summary_df.set_index("metric")["mean"].dropna()
if not means.empty:
    # Start the axis at 0 so that a mean of 1 still renders as a visible bar.
    ax = means.plot(kind="bar", title="Average score by metric", ylim=(0, 5))
    ax.set_xlabel("Metric")
    ax.set_ylabel("Mean score")
    plt.tight_layout()
    plt.show()
else:
    print("No scores yet to plot.")

ModuleNotFoundError: No module named 'matplotlib'

In [None]:
# Distribution plots (boxplot + histogram)
plot_df = scores_df[score_fields].dropna(how="all")
if not plot_df.empty:
    ax = plot_df.plot(kind="box", title="Score distributions")
    ax.set_ylabel("Score")
    plt.tight_layout()
    plt.show()

    overall_series = scores_df["overall_1_5"].dropna()
    if not overall_series.empty:
        # Center one bin on each integer score from 1 to 5.
        ax = overall_series.plot(kind="hist", bins=[0.5, 1.5, 2.5, 3.5, 4.5, 5.5], title="Overall score histogram")
        ax.set_xlabel("Overall score")
        plt.tight_layout()
        plt.show()
else:
    print("No scores yet to plot.")

In [None]:
# Correlation heatmap between metrics
corr = scores_df[score_fields].corr()
if corr.notna().any().any():
    fig, ax = plt.subplots(figsize=(4, 3))
    cax = ax.imshow(corr, vmin=-1, vmax=1, cmap="coolwarm")
    ax.set_xticks(range(len(score_fields)))
    ax.set_xticklabels(score_fields, rotation=45, ha="right")
    ax.set_yticks(range(len(score_fields)))
    ax.set_yticklabels(score_fields)
    fig.colorbar(cax, ax=ax, shrink=0.8)
    ax.set_title("Metric correlations")
    plt.tight_layout()
    plt.show()
else:
    print("Not enough data to compute correlations.")

## Interpretation guide

Use the visuals and tables above to write your report:
- Compare means across metrics to spot strengths/weaknesses.
- Inspect the boxplot for variability and outliers.
- Use the histogram to assess overall score concentration.
- Check correlations to see if metrics move together.
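
To make the correlation check concrete, the sketch below lists metric pairs whose Pearson correlation exceeds a cutoff. The helper `strong_pairs` and the 0.7 threshold are assumptions chosen for illustration, not values taken from this project, and the demo scores are made up.

```python
import pandas as pd


def strong_pairs(df, fields, threshold=0.7):
    """List (metric_a, metric_b, r) for pairs with |r| >= threshold.

    Hypothetical helper; the 0.7 cutoff is an arbitrary illustrative choice.
    """
    corr = df[fields].corr()
    pairs = []
    for i, a in enumerate(fields):
        for b in fields[i + 1:]:
            r = corr.loc[a, b]
            # Skip pairs where the correlation is undefined (e.g. no scores yet).
            if pd.notna(r) and abs(r) >= threshold:
                pairs.append((a, b, round(float(r), 2)))
    return pairs


# Made-up scores: coherence and overall move together, creativity does not.
demo = pd.DataFrame(
    {
        "coherence_1_5": [1, 2, 3, 4, 5],
        "overall_1_5": [1, 2, 3, 4, 5],
        "creativity_1_5": [3, 1, 4, 2, 3],
    }
)
fields = ["coherence_1_5", "overall_1_5", "creativity_1_5"]
pairs = strong_pairs(demo, fields)
print(pairs)
```

With only a handful of rated rows, such correlations are noisy, so any pair flagged this way is a prompt for closer reading rather than a conclusion.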

If scores are missing, ask evaluators to complete the CSV before summarizing.
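
A quick way to see who still has blank scores is sketched below. `missing_score_report` is a hypothetical helper; the `evaluator` column name and the `(unknown)` placeholder follow the notebook above, and the demo frame is made up.

```python
import pandas as pd


def missing_score_report(df, fields, by="evaluator"):
    """Count rows and blank score cells per evaluator (hypothetical helper)."""
    # Number of empty score cells in each row.
    blanks = df[fields].isna().sum(axis=1)
    who = df[by].fillna("(unknown)")
    report = blanks.groupby(who).agg(rows="count", missing="sum")
    # Keep only evaluators who still have something to fill in.
    return report[report["missing"] > 0]


# Made-up frame standing in for the real CSV.
demo = pd.DataFrame(
    {
        "overall_1_5": [4, None, None],
        "coherence_1_5": [5, 3, None],
        "evaluator": ["ana", "ana", None],
    }
)
report = missing_score_report(demo, ["overall_1_5", "coherence_1_5"])
print(report)
```

An empty report suggests the CSV is complete and ready to summarize.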
