# Experiment 2: Error Analysis

__Objective:__ Qualitatively analyze poor predictions and note the error category each one falls into (e.g., missing objects, hallucinations, grammatical errors).

__Research Question(s):__ 
* Where do modern generative captioning models hallucinate or misinterpret visual content?
* What failure modes occur when captioning under visual ambiguity (e.g., occlusions or cluttered scenes)?

In [1]:
import logging
# Configure logging
logging.basicConfig(
    level=logging.INFO, # Set the minimum logging level (e.g., INFO, DEBUG, WARNING, ERROR, CRITICAL)
    format="%(asctime)s - %(levelname)s - %(message)s",
    datefmt="%Y:%m:%d %H:%M"
)
# Get a logger instance for this notebook
logger = logging.getLogger(__name__) 

logger.info("Notebook execution started.")

2025:07:30 23:23 - INFO - Notebook execution started.


In [2]:
%load_ext autoreload
%autoreload 2

In [12]:
import os
import pandas as pd
from evaluation_sheet_reader import read_single_evaluation_sheet

In [13]:
excel_filename = 'error_taxonomy_sheet.xlsx'
excel_filepath = os.path.abspath(excel_filename)

# Log full path
logger.info(f"Looking for evaluation file at: {excel_filepath}")

# Check file existence and fail fast, so later cells never run with an undefined evaluation_df
if not os.path.isfile(excel_filepath):
    logger.error(f"Evaluation file not found: {excel_filepath}")
    raise FileNotFoundError(excel_filepath)

logger.info(f"Loading evaluation data from: {excel_filepath}")
evaluation_df = read_single_evaluation_sheet(excel_filepath)

# Report the shape, or warn if the sheet was read but contains no rows
if evaluation_df.empty:
    logger.warning("Evaluation DataFrame is empty.")
else:
    logger.info(f"Loaded evaluation data with shape: {evaluation_df.shape}")

2025:07:30 23:40 - INFO - Looking for evaluation file at: /Users/shonie/Documents/MS Data Analytics Engineering/Github_Repositories/GenAI_Project/experiments/experiment2/error_taxonomy_sheet.xlsx
2025:07:30 23:40 - INFO - Loading evaluation data from: /Users/shonie/Documents/MS Data Analytics Engineering/Github_Repositories/GenAI_Project/experiments/experiment2/error_taxonomy_sheet.xlsx
2025:07:30 23:40 - INFO - Loaded evaluation data with shape: (80, 13)


In [14]:
evaluation_df.info()
evaluation_df.head()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 80 entries, 0 to 79
Data columns (total 13 columns):
 #   Column                Non-Null Count  Dtype 
---  ------                --------------  ----- 
 0   Image Filename        80 non-null     object
 1   Ground Truth Caption  80 non-null     object
 2   Search Type           80 non-null     object
 3   Generated Caption     80 non-null     object
 4   Hallucination         79 non-null     Int64 
 5   Omission              80 non-null     Int64 
 6   Ambiguity             80 non-null     Int64 
 7   Grammatical Error     79 non-null     Int64 
 8   Repetition            79 non-null     Int64 
 9   Wrong Action          79 non-null     Int64 
 10  Fine-Grained Error    79 non-null     Int64 
 11  Correct (No Error)    79 non-null     Int64 
 12  source_file           80 non-null     object
dtypes: Int64(8), object(5)
memory usage: 8.9+ KB
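
The summary above reports 79 non-null entries for six of the eight flag columns, so at least one row has an annotation left blank. Below is a minimal sketch for locating such rows before computing any tallies. It assumes the flag columns are the eight nullable-integer columns listed above. The toy frame and the `rows_with_missing_flags` helper are hypothetical stand-ins for `evaluation_df`, not part of the project code.

```python
import pandas as pd

# Flag column names as they appear in the info() summary
FLAG_COLUMNS = [
    "Hallucination", "Omission", "Ambiguity", "Grammatical Error",
    "Repetition", "Wrong Action", "Fine-Grained Error", "Correct (No Error)",
]


def rows_with_missing_flags(df, flag_columns=FLAG_COLUMNS):
    """Return the rows where at least one annotation flag is missing."""
    missing_mask = df[flag_columns].isna().any(axis=1)
    return df.loc[missing_mask]


# Toy data (hypothetical): three images, every flag filled in at first
toy_df = pd.DataFrame(
    {
        "Image Filename": ["a.jpg", "b.jpg", "c.jpg"],
        **{col: pd.array([0, 1, 0], dtype="Int64") for col in FLAG_COLUMNS},
    }
)
# Blank out one annotation to mimic an unfilled cell in the sheet
toy_df.loc[1, "Hallucination"] = pd.NA

incomplete = rows_with_missing_flags(toy_df)
print(incomplete[["Image Filename", "Hallucination"]])
```

On the real data, the same call would be `rows_with_missing_flags(evaluation_df)`. Whether to re-annotate the affected rows or drop them is a judgment call that the sketch does not make.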


| | Image Filename | Ground Truth Caption | Search Type | Generated Caption | Hallucination | Omission | Ambiguity | Grammatical Error | Repetition | Wrong Action | Fine-Grained Error | Correct (No Error) | source_file |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 862054277_34b5a6f401.jpg | a young girl in a pink swimsuit | Greedy | a man in a wetsuit is riding a dirt bike | 1 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | error_taxonomy_sheet.xlsx |
| 1 | 862054277_34b5a6f401.jpg | a young girl in a pink swimsuit | Beam | a black and white dog is running through the w... | 1 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | error_taxonomy_sheet.xlsx |
| 2 | 3181701312_70a379ab6e.jpg | a man covered with a blanket is asleep on the ... | Greedy | a man in a red shirt is standing in front of a... | 1 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | error_taxonomy_sheet.xlsx |
| 3 | 3181701312_70a379ab6e.jpg | a man covered with a blanket is asleep on the ... | Beam | a man in a blue shirt is standing in front of ... | 1 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | error_taxonomy_sheet.xlsx |
| 4 | 624742559_ff467d8ebc.jpg | a child swinging on a playground play set | Greedy | a woman is sitting on a bench reading a newspaper | 1 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | error_taxonomy_sheet.xlsx |
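
Because the flags are stored as 0/1 integers, one way to get a first overview of the failure modes is to sum each flag per decoding strategy. Below is a minimal sketch, assuming the column names shown above. The `tally_errors` helper is hypothetical. The toy frame copies only the flag values from the five preview rows, not the full 80-row sheet, so its counts say nothing about the actual results.

```python
import pandas as pd

# Flag column names as they appear in the evaluation sheet
FLAG_COLUMNS = [
    "Hallucination", "Omission", "Ambiguity", "Grammatical Error",
    "Repetition", "Wrong Action", "Fine-Grained Error", "Correct (No Error)",
]

# Toy data: flag values of the five preview rows only
toy_df = pd.DataFrame(
    {
        "Search Type": ["Greedy", "Beam", "Greedy", "Beam", "Greedy"],
        "Hallucination": [1, 1, 1, 1, 1],
        "Omission": [1, 1, 1, 1, 1],
        "Ambiguity": [0, 0, 0, 0, 0],
        "Grammatical Error": [0, 0, 0, 0, 0],
        "Repetition": [0, 0, 0, 0, 0],
        "Wrong Action": [1, 1, 1, 1, 1],
        "Fine-Grained Error": [0, 0, 0, 0, 0],
        "Correct (No Error)": [0, 0, 0, 0, 0],
    }
).astype({col: "Int64" for col in FLAG_COLUMNS})


def tally_errors(df, flag_columns=FLAG_COLUMNS):
    """Count how often each error flag is set, per search type."""
    # sum() skips missing values by default, so blank annotations count as 0
    return df.groupby("Search Type")[flag_columns].sum()


counts = tally_errors(toy_df)
print(counts)
```

Applied to the full sheet, this would be `tally_errors(evaluation_df)`. Dividing by the number of rows per search type would turn the counts into rates, which are easier to compare if the groups differ in size.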
