## Project Overview: *Static Images Network Analyzer*

This is the final project for the Cognitive Learning course, titled *Static Images Network Analyzer*.  
The goal is to analyze **cognitive biases** in image classification models, focusing on **spurious correlations**—for example, when the model is influenced more by background context than by the actual object in the image.

The project has two main phases:
1. **Controlled dataset generation**: we created artificial images combining neutral objects with potentially bias-inducing visual contexts, using *Stable Diffusion v1.5*.  
   > The image generation code is located in the file: `generate_dataset_diff_v1_5.py.py`.
2. **Model analysis**: we evaluated how three pretrained classifiers (AlexNet, ResNet-18, ViT-18) responded to these images and measured whether their predictions aligned with the original prompt or were misled by context.  
   > All analysis and evaluation steps are implemented in this notebook.

We extract the **top-10 logits** for each image-model pair, and send both the original prompt and the predictions to a **language model (LLM)** for semantic auditing. The LLM provides a coherence score (0–1), a short explanation, and optional confidence. These evaluations are stored in a `.jsonl` file.

Finally, we build an aggregated **bias report** using precomputed statistics and LLM-generated justifications. This final Markdown report includes:
- Aggregate performance metrics
- Recurring error patterns
- Detailed list of incoherent predictions
- Class-specific logit behavior
- Overall model verdict

This workflow allows us to systematically study the effect of **spurious visual cues** and assess the **robustness and reliability** of vision models through a cognitively informed pipeline.

In [39]:
! pip3 install -q openai pandas pyarrow pillow tqdm urllib3 pycocotools requests torch torchvision python-dotenv

You should consider upgrading via the '/Library/Developer/CommandLineTools/usr/bin/python3 -m pip install --upgrade pip' command.[0m


### CONFIGURATION: load environment variables

In [40]:

import os
from pathlib import Path
from dotenv import load_dotenv

# Carica da file .env se presente
load_dotenv(override=True)

# Variabili configurabili
VISION_MODEL = os.getenv("VISION_MODEL", "alexnet")
LLM_MODEL = os.getenv("LLM_MODEL", "gpt-4o-mini")
OPENAI_API_KEY = os.getenv("OPENAI_API_KEY")
IMG_DIR = Path(os.getenv("IMG_DIR", "dataset_/images"))
META_CSV = Path(os.getenv("META_CSV", "dataset_/dataset_metadata.csv"))
OUTPUT_DIR = Path(os.getenv("OUTPUT_DIR", "analysis_out_"))
OUTPUT_DIR.mkdir(parents=True, exist_ok=True)
COHERENCE_THRESHOLD = float(os.getenv("COHERENCE_TH", 0.5))
TARGET_CLASSES = os.getenv("TARGET_CLASSES", "pillow,toilet seat,park bench,laptop,fox squirrel,tennis ball").split(",")

print(f"VISION_MODEL: {VISION_MODEL}")
print(f"LLM_MODEL: {LLM_MODEL}")
print(f"OPENAI_API_KEY: {OPENAI_API_KEY}")
print(f"IMG_DIR: {IMG_DIR}")
print(f"META_CSV: {META_CSV}")
print(f"OUTPUT_DIR: {OUTPUT_DIR}")
print(f"COHERENCE_THRESHOLD: {COHERENCE_THRESHOLD}")

VISION_MODEL: alexnet
LLM_MODEL: gpt-4o-mini
OPENAI_API_KEY: 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
IMG_DIR: dataset/images
META_CSV: dataset/dataset_metadata.csv
OUTPUT_DIR: analysis_alexnet_1
COHERENCE_THRESHOLD: 0.3


### Load view model and ImageNet classes

In [None]:
import torch
import torchvision.models as models
import torchvision.transforms as T

from PIL import Image

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# model dynamic loading
model = getattr(models, VISION_MODEL)(pretrained=True).eval().to(device)

#  ImageNet labels
import urllib.request
labels_url = "https://raw.githubusercontent.com/pytorch/hub/master/imagenet_classes.txt"
imagenet_labels = urllib.request.urlopen(labels_url).read().decode().splitlines()
idx2label = {i: l for i, l in enumerate(imagenet_labels)}

transform = T.Compose([
    T.Resize(256),
    #T.CenterCrop(224),
    T.ToTensor(),
    T.Normalize(mean=[0.485, 0.456, 0.406],
                std=[0.229, 0.224, 0.225]),
])



### Top-10 Extraction

The choice of 10 is not arbitrary.  
Studies on ImageNet show that the **top ten logits** account for, on average, **over 95% of the softmax probability mass** in models such as **ResNet-50** or **ViT-B/16**.  
This means that, in most cases, the remaining classes beyond the 10th position **contribute minimally to the overall semantic representation**.

> Source: [arXiv:2206.07290](https://arxiv.org/pdf/2206.07290)

In [42]:
import csv
import json
from tqdm.auto import tqdm
from collections import defaultdict
import os 

def get_topk(logits, k=10):
    probs = torch.softmax(logits, dim=-1)
    top_p, top_i = torch.topk(probs, k)
    return [(int(i), (idx2label[int(i)], float(p))) for p, i in zip(top_p.cpu(), top_i.cpu())]
# Softmax is monotonic, so it can be used to rank logits, is more interpretable than raw logits for us and for LLM.

def get_class_logits(logits, target_ids):
    return {i: float(logits[i].cpu()) for i in target_ids}

# load prompts from CSV
prompts = {}
with open(META_CSV, newline='') as f:
    reader = csv.DictReader(f)
    for row in reader:
        prompts[Path(row["file_name"].strip()).name] = row["prompt"]

########### 
# target classes, for pt. 4 of the report  
target_classes = TARGET_CLASSES
labels_url = "https://raw.githubusercontent.com/pytorch/hub/master/imagenet_classes.txt"
imagenet_labels = urllib.request.urlopen(labels_url).read().decode().splitlines()
idx2label = {i: l for i, l in enumerate(imagenet_labels)}
label2idx = {l: i for i, l in enumerate(imagenet_labels)}
target_ids = {label2idx[c]: c for c in target_classes}
per_class_logit = defaultdict(list)
#########


results = []

# Image analysis
for img_path in tqdm(sorted(IMG_DIR.glob("*.png"))):
    prompt = prompts.get(img_path.name)
    if not prompt:
        continue

    image = Image.open(img_path).convert("RGB")
    with torch.no_grad():
        logits = model(transform(image).unsqueeze(0).to(device))[0]

    # Softmax top‑10 logits
    top_logits = get_topk(logits, k=10)

    # Raw Logits of 6 target classes
    selected_logits = get_class_logits(logits, target_ids.keys())

    results.append({
        "file_name": str(img_path),
        "prompt": prompt,
        "top_logits": top_logits,
        "class_logits": selected_logits
    })

    # aggregate per class (for the dedicated report)
    for cls_id, val in selected_logits.items():
        per_class_logit[cls_id].append({
            "file_name": str(img_path),
            "prompt": prompt,
            "logit": val
        })


# Percorso del file in cui salvare i risultati
LOGIT_RESULTS = Path(os.getenv("OUTPUT_DIR", "analysis_out")) / "logits.json"

# Salva i risultati in JSON
with open(LOGIT_RESULTS, "w", encoding="utf-8") as f:
    json.dump(results, f, indent=2, ensure_ascii=False)

print(f"\n💾 Risultati salvati in: {LOGIT_RESULTS}")
# Debug/preview
print("\n✅ Esempi di top‑10 logits:")
for r in results[:10]:
    print(f"\n📌 {r['file_name']}")
    print(f"Prompt: {r['prompt']}")
    print("Top‑10 logits (softmax):")
    for i, (label, p) in r["top_logits"]:
        print(f"  - {i}: {label} ({p:.3f})")
    print("Logits classi target:")
    for i, val in r["class_logits"].items():
        print(f"  - {i}: {idx2label[i]} = {val:.2f}")

100%|██████████| 121/121 [00:01<00:00, 79.52it/s]


💾 Risultati salvati in: analysis_alexnet_1/logits.json

✅ Esempi di top‑10 logits:

📌 dataset/images/bookjacket__classroom__001.png
Prompt: A neutral bookjacket in a classroom background
Top‑10 logits (softmax):
  - 549: envelope (0.234)
  - 696: paintbrush (0.070)
  - 605: iPod (0.067)
  - 620: laptop (0.047)
  - 813: spatula (0.036)
  - 692: packet (0.035)
  - 831: studio couch (0.035)
  - 767: rubber eraser (0.034)
  - 418: ballpoint (0.034)
  - 446: binder (0.034)
Logits classi target:
  - 721: pillow = 3.52
  - 861: toilet seat = 5.36
  - 703: park bench = 3.45
  - 620: laptop = 8.96
  - 335: fox squirrel = -5.22
  - 852: tennis ball = 2.74

📌 dataset/images/bookjacket__classroom__002.png
Prompt: A neutral bookjacket in a classroom background
Top‑10 logits (softmax):
  - 910: wooden spoon (0.392)
  - 605: iPod (0.129)
  - 642: marimba (0.055)
  - 868: tray (0.041)
  - 620: laptop (0.027)
  - 823: stethoscope (0.025)
  - 813: spatula (0.025)
  - 681: notebook (0.023)
  - 767: rubb




### LLM Coherence Audit 

This script audits the coherence between a prompt and a vision model's top-10 predictions using an LLM.

- **`query_llm()`** sends the prompt and logits to an OpenAI model, which returns a JSON with a coherence `score`, a short `explanation`, and optional `confidence`.
- A prediction is considered coherent if the score ≥ `COHERENCE_THRESHOLD` (default: 0.3).
- Results are saved to `.jsonl` and `.txt` files.
- The final report includes total examples, coherence rate, and breakdowns by object and context.

This allows for automated semantic auditing of vision-language model outputs.

In [43]:
import openai, json, re, os, sys
from tqdm.auto import tqdm
from collections import Counter
from typing import List, Optional
import csv
from pathlib import Path 

# OpenAI API configuration
openai.api_key = os.getenv("OPENAI_API_KEY")

# Parameters
COHERENCE_THRESHOLD = float(os.getenv("COHERENCE_TH", 0.3))
META_CSV = Path(os.getenv("META_CSV", "dataset/dataset_metadata.csv"))
LOG_JSONL_PATH = Path(os.getenv("OUTPUT_DIR", "analysis_out")) / "llm_audit.jsonl"
LIVE_TXT_PATH = Path(os.getenv("OUTPUT_DIR", "analysis_out")) / "llm_live_output.txt"
LOGIT_RESULTS = Path(os.getenv("OUTPUT_DIR", "analysis_out")) / "logits.json"

print(f"COHERENCE_THRESHOLD: {COHERENCE_THRESHOLD}")

# Load logits JSON file
with open(LOGIT_RESULTS, "r", encoding="utf-8") as f:
    logits_data = json.load(f)

# Normalize file names to match those in metadata CSV
def normalize_filename(path: str) -> str:
    return Path(path).name  # estrae solo "bookjacket__classroom__001.png"

logits_lookup = {
    normalize_filename(entry["file_name"]): entry["top_logits"]
    for entry in logits_data
}

# Function to extract JSON from text
def extract_json_from_text(text: str) -> dict:
    m = re.search(r"\{.*?\}", text, re.DOTALL)
    if not m:
        raise ValueError("⚠️ No valid JSON found")
    return json.loads(m.group(0))

# Function to query the LLM
def query_llm(prompt: str, top_logits, vision_model: str) -> dict:
    def safe_label_prob(item):
        try:
            label = str(item[0])
            prob_raw = item[1]
            prob = float(prob_raw[0]) if isinstance(prob_raw, (tuple, list)) else float(prob_raw)
            return f"{label} ({prob:.3f})"
        except Exception:
            return f"[MALFORMED: {item}]"

    top_str = "; ".join([safe_label_prob(it) for it in top_logits])

    user_msg = f"""
You are auditing the output of **{vision_model}** to assess alignment with the prompt.

Prompt:
\"{prompt}\"

Top-10 predictions with probabilities:
{top_str}

Return JSON only:
{{
  "score": <float 0-1>,
  "explanation": <≤25 words>,
  "confidence": <float 0-1 (optional)>
}}
Be lenient; score ≥ 0.3 is considered coherent.
"""

    res = openai.chat.completions.create(
        model=os.getenv("LLM_MODEL", "gpt-4o-mini"),
        messages=[
            {"role": "system", "content": "Return strict JSON only."},
            {"role": "user", "content": user_msg}
        ],
        temperature=0.0
    )
    return extract_json_from_text(res.choices[0].message.content.strip())

# Load dataset_metadata.csv
metadata = []
with open(META_CSV, newline='', encoding='utf-8') as f:
    reader = csv.DictReader(f)
    for row in reader:
        metadata.append(row)

# Variables for counting
tot = tot_incoh = 0
per_obj_tot = Counter()
per_obj_incoh = Counter()
per_ctx_tot = Counter()
per_ctx_incoh = Counter()
incoherent_cases = []

# Main loop
with open(LOG_JSONL_PATH, "w") as fout, open(LIVE_TXT_PATH, "w") as live:
    for row in tqdm(metadata, desc="LLM analysis"):
        tot += 1
        prompt = row["prompt"]
        obj = row["object"]
        ctx = row["background"]

        file_key = Path(row["file_name"]).name
        top_logits = logits_lookup.get(file_key, [])

        per_obj_tot[obj] += 1
        per_ctx_tot[ctx] += 1

        # Query LLM
        llm_out = query_llm(prompt, top_logits, os.getenv("VISION_MODEL", "alexnet"))
        record = {
            **row,
            **llm_out,
            "subject": obj,
            "background": ctx,
            "top_logits": top_logits 
        }

        fout.write(json.dumps(record, ensure_ascii=False) + "\n")
        live.write(json.dumps({
            "id": tot,
            "score": llm_out.get("score"),
            "explanation": llm_out.get("explanation")
        }, ensure_ascii=False) + "\n")

        if llm_out.get("score", 0.0) < COHERENCE_THRESHOLD:
            incoherent_cases.append(record)
            tot_incoh += 1
            per_obj_incoh[obj] += 1
            per_ctx_incoh[ctx] += 1

# Final report
print("\n========== SUMMARY ==========")
pct = 100 * tot_incoh / tot if tot else 0
print(f"Total images:   {tot}")
print(f"Incoherent (<{COHERENCE_THRESHOLD}): {tot_incoh}  ({pct:.1f} %)")

print("\n-- Incoherence by *object* --")
for o in sorted(per_obj_tot):
    pct = 100 * per_obj_incoh[o] / per_obj_tot[o] if per_obj_tot[o] else 0
    print(f"  {o:35s}: {per_obj_incoh[o]}/{per_obj_tot[o]}  ({pct:.1f} %)")

print("\n-- Incoherence by *context* --")
for c in sorted(per_ctx_tot):
    pct = 100 * per_ctx_incoh[c] / per_ctx_tot[c] if per_ctx_tot[c] else 0
    print(f"  {c:35s}: {per_ctx_incoh[c]}/{per_ctx_tot[c]}  ({pct:.1f} %)")

COHERENCE_THRESHOLD: 0.3


LLM analysis: 100%|██████████| 121/121 [03:09<00:00,  1.57s/it]


Total images:   121
Incoherent (<0.3): 59  (48.8 %)

-- Incoherence by *object* --
  bookjacket                         : 12/15  (80.0 %)
  ceramiccoffeemug                   : 5/16  (31.2 %)
  grannysmith                        : 11/15  (73.3 %)
  notebookwithkraftcover             : 9/15  (60.0 %)
  opaquemetalwaterbottle             : 8/20  (40.0 %)
  softcouchpillow                    : 11/20  (55.0 %)
  tablelampwithshadeoff              : 3/20  (15.0 %)

-- Incoherence by *context* --
  bathroom                           : 4/8  (50.0 %)
  classroom                          : 7/12  (58.3 %)
  garage                             : 9/10  (90.0 %)
  green                              : 4/12  (33.3 %)
  hotel                              : 7/14  (50.0 %)
  kitchen                            : 5/12  (41.7 %)
  minimalist                         : 5/14  (35.7 %)
  modern                             : 8/14  (57.1 %)
  plain                              : 5/14  (35.7 %)
  science         




### Bias Report & Model Verdict (Uses Pre-computed Metrics)

This cell generates a bias analysis and verdict for the vision model based on previously computed LLM coherence scores and raw logits.

- Loads all results from the audit (`llm_audit.jsonl`) and filters incoherent cases (`score` < threshold).
- Computes global statistics: total images, mean/median/stdev scores, incoherence rates by object and context.
- Analyzes raw logits per target class: average activations, top-5 examples.
- Constructs a structured prompt for the LLM including:
  - (A) global metrics
  - (B) incoherent examples
  - (C) target class activation stats
- The LLM returns a detailed Markdown report with six required sections, including bias patterns and an overall model verdict.
- Final report is saved as `report.md`.

This step automates bias evaluation and model reliability assessment.

In [44]:
import json, openai, statistics, os
from pathlib import Path
from statistics import mean, stdev

openai.api_key = OPENAI_API_KEY

# ── load all the records from previous cell ───────────────────
with open(LOG_JSONL_PATH, "r", encoding="utf-8") as f:
    records = [json.loads(l) for l in f]

# Limit top_logits to 5 for each record
for rec in records:
    rec["top_logits"] = rec.get("top_logits", [])[:5]

# List of incoherent records (those with score < COHERENCE_THRESHOLD)
incoherent_recs = [
    {k: rec[k] for k in ("file_name", "prompt", "top_logits", "score", "explanation")}
    for rec in records if rec.get("score", 0) < COHERENCE_THRESHOLD
]

# ── global metrics already computed in previous cell ─────────────────────────
scores = [rec.get("score", 0.0) for rec in records]
metrics_summary = {
    "total_images": tot,
    "mean_score": statistics.mean(scores) if scores else 0.0,
    "median_score": statistics.median(scores) if scores else 0.0,
    "stdev_score": statistics.pstdev(scores) if len(scores) > 1 else 0.0,
    "percent_incoherent": 100 * tot_incoh / tot if tot else 0.0,
    "object_stats": {
        obj: {
            "total": per_obj_tot[obj],
            "incoherent": per_obj_incoh[obj],
            "percent_incoherent": 100 * per_obj_incoh[obj] / per_obj_tot[obj]
            if per_obj_tot[obj] else 0.0
        }
        for obj in per_obj_tot
    },
    "context_stats": {
        ctx: {
            "total": per_ctx_tot[ctx],
            "incoherent": per_ctx_incoh[ctx],
            "percent_incoherent": 100 * per_ctx_incoh[ctx] / per_ctx_tot[ctx]
            if per_ctx_tot[ctx] else 0.0
        }
        for ctx in per_ctx_tot
    }
}

# Save metrics to file (may be useful)
Path(OUTPUT_DIR /  "metrics.json").write_text(json.dumps(metrics_summary, indent=2), encoding="utf-8")


logit_report_section = "\n## Target Class Analysis (Raw Logits)\n"
for cls_id, cls_name in target_ids.items():
    values = [x["logit"] for x in per_class_logit[cls_id]]
    if not values:
        continue

    logit_report_section += f"\n### Class `{cls_name}` (ImageNet #{cls_id})\n"
    logit_report_section += f"- Average logit: {mean(values):.2f} (std: {stdev(values):.2f})\n"
    logit_report_section += "- Top‑5 activations:\n"
    top5 = sorted(per_class_logit[cls_id], key=lambda x: -x["logit"])[:5]
    for e in top5:
       logit_report_section += f"  - `{e['file_name']}` → logit={e['logit']:.2f}\n"

print(logit_report_section)

# ── Prompt for LLM ──────
prompt_header = f"""
You are an AI-bias auditor.  
Below you will find **(A) pre-computed global metrics**, **(B) per-image data**, and **(C) target class logit analysis**.

Use the provided metrics; do NOT recalsculate means or percentages yourself.
Respond in **Markdown** with the requested sections.

## Required sections
### 1 Aggregate statistics
Summarise the numbers from (A).

### 2 Recurring error patterns
Identify frequent error types and link them to biases in **{VISION_MODEL}**.

### 3 Detailed list of incoherent images
For every image in (B) (score < {COHERENCE_THRESHOLD}) list:
• file_name  • ≤15-word prompt summary  • three worst labels  • explanation (≤2 sentences).

### 4 Target class logit analysis (Full Details)
Include the full details of the target class analysis from (C).

### 5 Main biases of the model
At least three systematic biases, with examples.

### 6 Overall verdict
Bullet strengths/weaknesses of **{VISION_MODEL}** + final reliability rating 1–5 (no mitigation advice).

Respond **only** in Markdown, start each major section with '##'.
"""

payload = (
    prompt_header
    + "\n\n### (A) Global metrics\n```json\n"
    + json.dumps(metrics_summary, ensure_ascii=False, indent=2)
    + "\n```\n\n### (B) Incoherent images\n```json\n"
    + json.dumps(incoherent_recs, ensure_ascii=False)
    + "\n```\n\n### (C) Target class logit analysis (Full Details)\n"
    + logit_report_section
)

response = openai.chat.completions.create(
    model=LLM_MODEL,
    messages=[
        {"role": "system",
         "content": "You are a senior AI-bias analyst who MUST reply in Markdown headings."},
        {"role": "user",
         "content": payload}
    ],
    temperature=0.25
)

OUTPUT_DIR.mkdir(parents=True, exist_ok=True)
report_path = OUTPUT_DIR / "report.md"
report_path.write_text(response.choices[0].message.content, encoding="utf-8")
print("✅ Report saved to:", report_path)


## Target Class Analysis (Raw Logits)

### Class `pillow` (ImageNet #721)
- Average logit: 3.35 (std: 3.70)
- Top‑5 activations:
  - `dataset/images/softcouchpillow__modern__001.png` → logit=14.17
  - `dataset/images/softcouchpillow__green__002.png` → logit=13.05
  - `dataset/images/softcouchpillow__kitchen__002.png` → logit=12.28
  - `dataset/images/softcouchpillow__modern__002.png` → logit=11.82
  - `dataset/images/softcouchpillow__green__001.png` → logit=11.74

### Class `toilet seat` (ImageNet #861)
- Average logit: 4.16 (std: 2.71)
- Top‑5 activations:
  - `dataset/images/softcouchpillow__minimalist__001.png` → logit=12.37
  - `dataset/images/ceramiccoffeemug__bathroom__001.png` → logit=10.20
  - `dataset/images/ceramiccoffeemug__hotel__001.png` → logit=9.65
  - `dataset/images/softcouchpillow__classroom__001.png` → logit=9.57
  - `dataset/images/tablelampwithshadeoff__bathroom__002.png` → logit=8.97

### Class `park bench` (ImageNet #703)
- Average logit: 0.28 (std: 2.10)
- Top‑

📺 Visualizzazione

In [45]:
# Cell 7: Output finale
from IPython.display import Markdown, display

print("Report salvato in:", OUTPUT_DIR / "report.md")
report_md = report_path.read_text(encoding="utf-8")
display(Markdown(report_md))

Report salvato in: analysis_alexnet_1/report.md


## 1 Aggregate statistics
- **Total Images**: 121
- **Mean Score**: 0.31
- **Median Score**: 0.33
- **Standard Deviation of Score**: 0.21
- **Percentage of Incoherent Images**: 48.76%
- **Object Incoherence**:
  - Bookjacket: 80%
  - Ceramic Coffee Mug: 31.25%
  - Granny Smith: 73.33%
  - Notebook with Kraft Cover: 60%
  - Opaque Metal Water Bottle: 40%
  - Soft Couch Pillow: 55%
  - Table Lamp with Shade Off: 15%
- **Context Incoherence**:
  - Classroom: 58.33%
  - Garage: 90%
  - Green: 33.33%
  - Hotel: 50%
  - Kitchen: 41.67%
  - Minimalist: 35.71%
  - Modern: 57.14%
  - Plain: 35.71%
  - Science: 45.45%
  - Bathroom: 50%

## 2 Recurring error patterns
- **Misalignment with Prompts**: Many images scored below 0.3 show predictions that do not correlate with the provided prompts, indicating a lack of contextual understanding.
- **Object Confusion**: The model often confuses similar objects, such as mistaking a bookjacket for unrelated items like a laptop or envelope.
- **Contextual Misinterpretation**: The model struggles to relate objects to their specified contexts (e.g., a bookjacket in a classroom), leading to incoherent predictions.

## 3 Detailed list of incoherent images
1. **File Name**: images/bookjacket__classroom__001.png  
   **Prompt Summary**: A neutral bookjacket in a classroom background  
   **Three Worst Labels**: envelope, paintbrush, iPod  
   **Explanation**: Predictions do not align with the prompt; items are unrelated to a bookjacket or classroom.

2. **File Name**: images/bookjacket__classroom__002.png  
   **Prompt Summary**: A neutral bookjacket in a classroom background  
   **Three Worst Labels**: wooden spoon, iPod, marimba  
   **Explanation**: Predictions do not align with the prompt about a bookjacket in a classroom.

3. **File Name**: images/bookjacket__garage__002.png  
   **Prompt Summary**: A neutral bookjacket in a garage background  
   **Three Worst Labels**: fountain pen, revolver, radio  
   **Explanation**: Predictions are unrelated to a bookjacket or garage context.

4. **File Name**: images/bookjacket__green__002.png  
   **Prompt Summary**: A neutral bookjacket in a green background  
   **Three Worst Labels**: envelope, laptop, binder  
   **Explanation**: Predictions do not align with the prompt about a bookjacket.

5. **File Name**: images/bookjacket__hotel__001.png  
   **Prompt Summary**: A neutral bookjacket in a hotel background  
   **Three Worst Labels**: table lamp, studio couch, lampshade  
   **Explanation**: Predictions focus on furniture and electronics, not a bookjacket or hotel context.

6. **File Name**: images/bookjacket__hotel__002.png  
   **Prompt Summary**: A neutral bookjacket in a hotel background  
   **Three Worst Labels**: binder, shoji, quill  
   **Explanation**: Predictions do not align with the prompt about a bookjacket in a hotel background.

7. **File Name**: images/bookjacket__minimalist__002.png  
   **Prompt Summary**: A neutral bookjacket in a minimalist background  
   **Three Worst Labels**: dishwasher, refrigerator, paper towel  
   **Explanation**: Predictions are unrelated to a bookjacket or minimalist background.

8. **File Name**: images/bookjacket__modern__001.png  
   **Prompt Summary**: A neutral bookjacket in a modern background  
   **Three Worst Labels**: lighter, binder, medicine chest  
   **Explanation**: Predictions do not align with the concept of a bookjacket or modern background.

9. **File Name**: images/bookjacket__modern__002.png  
   **Prompt Summary**: A neutral bookjacket in a modern background  
   **Three Worst Labels**: fountain pen, binder, violin  
   **Explanation**: Predictions do not align with the prompt about a bookjacket.

10. **File Name**: images/bookjacket__plain__001.png  
    **Prompt Summary**: A neutral bookjacket in a plain background  
    **Three Worst Labels**: switch, iPod, loudspeaker  
    **Explanation**: Predictions do not align with the concept of a neutral bookjacket.

11. **File Name**: images/bookjacket__plain__002.png  
    **Prompt Summary**: A neutral bookjacket in a plain background  
    **Three Worst Labels**: lampshade, table lamp, ping-pong ball  
    **Explanation**: Predictions are unrelated to a bookjacket or plain background.

12. **File Name**: images/bookjacket__science__001.png  
    **Prompt Summary**: A neutral bookjacket in a science background  
    **Three Worst Labels**: cleaver, letter opener, envelope  
    **Explanation**: Predictions do not align with the prompt about a bookjacket in a science background.

13. **File Name**: images/ceramiccoffeemug__bathroom__001.png  
    **Prompt Summary**: A neutral ceramiccoffeemug in a bathroom background  
    **Three Worst Labels**: soap dispenser, washbasin, tub  
    **Explanation**: Predictions focus on bathroom items, lacking relevance to a coffee mug.

14. **File Name**: images/ceramiccoffeemug__garage__001.png  
    **Prompt Summary**: A neutral ceramiccoffeemug in a garage background  
    **Three Worst Labels**: hammer, screwdriver, pillow  
    **Explanation**: Predictions do not align with the prompt; items are unrelated to a coffee mug or garage.

15. **File Name**: images/ceramiccoffeemug__kitchen__001.png  
    **Prompt Summary**: A neutral ceramiccoffeemug in a kitchen background  
    **Three Worst Labels**: medicine chest, espresso maker, lotion  
    **Explanation**: Predictions mostly include kitchen items but lack a clear match to a neutral ceramic coffee mug.

16. **File Name**: images/ceramiccoffeemug__minimalist__001.png  
    **Prompt Summary**: A neutral ceramiccoffeemug in a minimalist background  
    **Three Worst Labels**: television, iPod, home theater  
    **Explanation**: Predictions are unrelated to the prompt about a coffee mug.

17. **File Name**: images/ceramiccoffeemug__modern__002.png  
    **Prompt Summary**: A neutral ceramiccoffeemug in a modern background  
    **Three Worst Labels**: loudspeaker, Polaroid camera, CD player  
    **Explanation**: Predictions do not align with the prompt about a coffee mug.

18. **File Name**: images/grannysmith__bathroom__002.png  
    **Prompt Summary**: A neutral grannysmith in a bathroom background  
    **Three Worst Labels**: washbasin, bathtub, tub  
    **Explanation**: Predictions focus on bathroom objects, not a Granny Smith apple.

19. **File Name**: images/grannysmith__classroom__001.png  
    **Prompt Summary**: A neutral grannysmith in a classroom background  
    **Three Worst Labels**: ping-pong ball, pool table, tennis ball  
    **Explanation**: Predictions are unrelated to a Granny Smith or classroom context.

20. **File Name**: images/grannysmith__classroom__002.png  
    **Prompt Summary**: A neutral grannysmith in a classroom background  
    **Three Worst Labels**: ping-pong ball, computer keyboard, croquet ball  
    **Explanation**: Predictions do not align with the prompt; no mention of a grannysmith or classroom.

21. **File Name**: images/grannysmith__garage__001.png  
    **Prompt Summary**: A neutral grannysmith in a garage background  
    **Three Worst Labels**: hammer, croquet ball, paintbrush  
    **Explanation**: Predictions do not align with the prompt; no mention of a grannysmith or garage.

22. **File Name**: images/grannysmith__green__002.png  
    **Prompt Summary**: A neutral grannysmith in a green background  
    **Three Worst Labels**: croquet ball, golf ball, baseball  
    **Explanation**: Predictions are unrelated to a Granny Smith apple or green background.

23. **File Name**: images/grannysmith__hotel__001.png  
    **Prompt Summary**: A neutral grannysmith in a hotel background  
    **Three Worst Labels**: hip, rubber eraser, pomegranate  
    **Explanation**: Predictions do not align with the prompt; no relevant items identified.

24. **File Name**: images/grannysmith__kitchen__002.png  
    **Prompt Summary**: A neutral grannysmith in a kitchen background  
    **Three Worst Labels**: pomegranate, croquet ball, maraca  
    **Explanation**: Predictions do not align with the prompt; no relevant kitchen or Granny Smith apple identified.

25. **File Name**: images/grannysmith__minimalist__002.png  
    **Prompt Summary**: A neutral grannysmith in a minimalist background  
    **Three Worst Labels**: desk, home theater, television  
    **Explanation**: Predictions do not relate to a neutral grannysmith or minimalist background.

26. **File Name**: images/grannysmith__modern__001.png  
    **Prompt Summary**: A neutral grannysmith in a modern background  
    **Three Worst Labels**: hook, table lamp, lampshade  
    **Explanation**: Predictions do not relate to a Granny Smith apple or modern background.

27. **File Name**: images/grannysmith__modern__002.png  
    **Prompt Summary**: A neutral grannysmith in a modern background  
    **Three Worst Labels**: ping-pong ball, puck, wall clock  
    **Explanation**: Predictions are unrelated to a Granny Smith apple or modern background.

28. **File Name**: images/grannysmith__science__002.png  
    **Prompt Summary**: A neutral grannysmith in a science background  
    **Three Worst Labels**: computer keyboard, scoreboard, ping-pong ball  
    **Explanation**: Predictions are unrelated to the prompt about a neutral grannysmith in a science background.

29. **File Name**: images/notebookwithkraftcover__garage__001.png  
    **Prompt Summary**: A neutral notebookwithkraftcover in a garage background  
    **Three Worst Labels**: hammer, carpenter's kit, pencil sharpener  
    **Explanation**: Predictions do not align with the prompt about a notebook in a garage.

30. **File Name**: images/notebookwithkraftcover__green__001.png  
    **Prompt Summary**: A neutral notebookwithkraftcover in a green background  
    **Three Worst Labels**: binder, lighter, book jacket  
    **Explanation**: Predictions do not align with the prompt; items are unrelated to a notebook.

31. **File Name**: images/notebookwithkraftcover__green__002.png  
    **Prompt Summary**: A neutral notebookwithkraftcover in a green background  
    **Three Worst Labels**: hatchet, hard disc, necklace  
    **Explanation**: Predictions do not relate to a notebook or kraft cover.

32. **File Name**: images/notebookwithkraftcover__hotel__002.png  
    **Prompt Summary**: A neutral notebookwithkraftcover in a hotel background  
    **Three Worst Labels**: loudspeaker, home theater, cleaver  
    **Explanation**: Predictions do not align with the prompt; items are unrelated to a notebook or hotel setting.

33. **File Name**: images/notebookwithkraftcover__kitchen__002.png  
    **Prompt Summary**: A neutral notebookwithkraftcover in a kitchen background  
    **Three Worst Labels**: television, screen, monitor  
    **Explanation**: Predictions are unrelated to the prompt about a notebook in a kitchen.

34. **File Name**: images/notebookwithkraftcover__modern__001.png  
    **Prompt Summary**: A neutral notebookwithkraftcover in a modern background  
    **Three Worst Labels**: binder, rule, cleaver  
    **Explanation**: Predictions do not align with the prompt about a notebook.

35. **File Name**: images/notebookwithkraftcover__modern__002.png  
    **Prompt Summary**: A neutral notebookwithkraftcover in a modern background  
    **Three Worst Labels**: hatchet, paintbrush, spatula  
    **Explanation**: Predictions do not align with the prompt about a notebook.

36. **File Name**: images/notebookwithkraftcover__plain__001.png  
    **Prompt Summary**: A neutral notebookwithkraftcover in a plain background  
    **Three Worst Labels**: binder, modem, envelope  
    **Explanation**: Predictions do not align with the prompt describing a notebook.

37. **File Name**: images/notebookwithkraftcover__science__002.png  
    **Prompt Summary**: A neutral notebookwithkraftcover in a science background  
    **Three Worst Labels**: binder, ballpoint, rule  
    **Explanation**: Predictions do not align with the prompt about a notebook in a science background.

38. **File Name**: images/opaquemetalwaterbottle__bathroom__001.png  
    **Prompt Summary**: A neutral opaquemetalwaterbottle in a bathroom background  
    **Three Worst Labels**: soap dispenser, washbasin, tub  
    **Explanation**: Predictions do not align with the prompt; items are unrelated to a water bottle.

39. **File Name**: images/opaquemetalwaterbottle__bathroom__002.png  
    **Prompt Summary**: A neutral opaquemetalwaterbottle in a bathroom background  
    **Three Worst Labels**: soap dispenser, combination lock, corkscrew  
    **Explanation**: Predictions do not align with the prompt; items are unrelated to a water bottle.

40. **File Name**: images/opaquemetalwaterbottle__garage__001.png  
    **Prompt Summary**: A neutral opaquemetalwaterbottle in a garage background  
    **Three Worst Labels**: carpenter's kit, screwdriver, pencil box  
    **Explanation**: Predictions do not align with the prompt; items are unrelated to a water bottle or garage.

41. **File Name**: images/opaquemetalwaterbottle__garage__002.png  
    **Prompt Summary**: A neutral opaquemetalwaterbottle in a garage background  
    **Three Worst Labels**: cocktail shaker, hourglass, saltshaker  
    **Explanation**: Predictions do not align with the prompt; items are unrelated to a water bottle.

42. **File Name**: images/opaquemetalwaterbottle__hotel__002.png  
    **Prompt Summary**: A neutral opaquemetalwaterbottle in a hotel background  
    **Three Worst Labels**: cocktail shaker, saltshaker, soap dispenser  
    **Explanation**: Predictions do not align with the prompt; items are unrelated to a neutral opaque metal water bottle.

43. **File Name**: images/opaquemetalwaterbottle__kitchen__002.png  
    **Prompt Summary**: A neutral opaquemetalwaterbottle in a kitchen background  
    **Three Worst Labels**: soap dispenser, coffeepot, perfume  
    **Explanation**: Predictions do not align with the prompt; items are unrelated to a metal water bottle.

44. **File Name**: images/opaquemetalwaterbottle__modern__002.png  
    **Prompt Summary**: A neutral opaquemetalwaterbottle in a modern background  
    **Three Worst Labels**: monitor, notebook, table lamp  
    **Explanation**: Predictions do not align with the prompt; items are unrelated to a water bottle.

45. **File Name**: images/opaquemetalwaterbottle__science__001.png  
    **Prompt Summary**: A neutral opaquemetalwaterbottle in a science background  
    **Three Worst Labels**: fountain pen, stethoscope, sewing machine  
    **Explanation**: Predictions do not align with the prompt; items are unrelated to a water bottle or science background.

46. **File Name**: images/softcouchpillow__classroom__001.png  
    **Prompt Summary**: A neutral softcouchpillow in a classroom background  
    **Three Worst Labels**: mouse, studio couch, bathtub  
    **Explanation**: Predictions do not align with the prompt; items are unrelated to a soft couch pillow in a classroom.

47. **File Name**: images/softcouchpillow__classroom__002.png  
    **Prompt Summary**: A neutral softcouchpillow in a classroom background  
    **Three Worst Labels**: paper towel, studio couch, home theater  
    **Explanation**: Predictions do not align with the prompt; items are unrelated to a neutral soft couch pillow in a classroom.

48. **File Name**: images/softcouchpillow__garage__001.png  
    **Prompt Summary**: A neutral softcouchpillow in a garage background  
    **Three Worst Labels**: pillow, paper towel, brassiere  
    **Explanation**: Predictions do not align with the prompt; only 'pillow' is relevant.

49. **File Name**: images/softcouchpillow__hotel__001.png  
    **Prompt Summary**: A neutral softcouchpillow in a hotel background  
    **Three Worst Labels**: dough, butternut squash, wooden spoon  
    **Explanation**: Predictions do not align with the prompt; items are unrelated to a soft couch pillow.

50. **File Name**: images/softcouchpillow__hotel__002.png  
    **Prompt Summary**: A neutral softcouchpillow in a hotel background  
    **Three Worst Labels**: pillow, carton, toilet tissue  
    **Explanation**: Predictions include 'pillow' but lack context of a hotel background.

51. **File Name**: images/softcouchpillow__kitchen__001.png  
    **Prompt Summary**: A neutral softcouchpillow in a kitchen background  
    **Three Worst Labels**: coffee mug, cup, mortar  
    **Explanation**: Predictions focus on kitchen items, not a soft couch pillow.

52. **File Name**: images/softcouchpillow__minimalist__001.png  
    **Prompt Summary**: A neutral softcouchpillow in a minimalist background  
    **Three Worst Labels**: mortar, toilet seat, cup  
    **Explanation**: Predictions do not align with the prompt; items are unrelated to a soft couch pillow.

53. **File Name**: images/softcouchpillow__minimalist__002.png  
    **Prompt Summary**: A neutral softcouchpillow in a minimalist background  
    **Three Worst Labels**: studio couch, toilet tissue, bathtub  
    **Explanation**: Predictions do not align with the prompt; items are unrelated to a soft couch pillow.

54. **File Name**: images/softcouchpillow__plain__001.png  
    **Prompt Summary**: A neutral softcouchpillow in a plain background  
    **Three Worst Labels**: mortar, lampshade, cup  
    **Explanation**: Predictions do not align with the prompt; items are unrelated to a soft couch pillow.

55. **File Name**: images/softcouchpillow__plain__002.png  
    **Prompt Summary**: A neutral softcouchpillow in a plain background  
    **Three Worst Labels**: home theater, laptop, iPod  
    **Explanation**: Predictions do not align with the prompt; items are unrelated to a soft couch pillow.

56. **File Name**: images/softcouchpillow__science__001.png  
    **Prompt Summary**: A neutral softcouchpillow in a science background  
    **Three Worst Labels**: tub, studio couch, carton  
    **Explanation**: Predictions do not align with the prompt; items are unrelated to a neutral soft couch pillow.

57. **File Name**: images/tablelampwithshadeoff__classroom__002.png  
    **Prompt Summary**: A neutral tablelampwithshadeoff in a classroom background  
    **Three Worst Labels**: marimba, crib, prison  
    **Explanation**: Predictions do not align with the prompt; none relate to a table lamp.

58. **File Name**: images/tablelampwithshadeoff__garage__001.png  
    **Prompt Summary**: A neutral tablelampwithshadeoff in a garage background  
    **Three Worst Labels**: syringe, screwdriver, lipstick  
    **Explanation**: Predictions do not align with the prompt; items are unrelated to a table lamp.

59. **File Name**: images/tablelampwithshadeoff__garage__002.png  
    **Prompt Summary**: A neutral tablelampwithshadeoff in a garage background  
    **Three Worst Labels**: lens cap, digital clock, loudspeaker  
    **Explanation**: Predictions do not align with the prompt; items are unrelated to a table lamp.

## 4 Target class logit analysis (Full Details)
### Class `pillow` (ImageNet #721)
- **Average logit**: 3.35 (std: 3.70)
- **Top‑5 activations**:
  - `dataset/images/softcouchpillow__modern__001.png` → logit=14.17
  - `dataset/images/softcouchpillow__green__002.png` → logit=13.05
  - `dataset/images/softcouchpillow__kitchen__002.png` → logit=12.28
  - `dataset/images/softcouchpillow__modern__002.png` → logit=11.82
  - `dataset/images/softcouchpillow__green__001.png` → logit=11.74

### Class `toilet seat` (ImageNet #861)
- **Average logit**: 4.16 (std: 2.71)
- **Top‑5 activations**:
  - `dataset/images/softcouchpillow__minimalist__001.png` → logit=12.37
  - `dataset/images/ceramiccoffeemug__bathroom__001.png` → logit=10.20
  - `dataset/images/ceramiccoffeemug__hotel__001.png` → logit=9.65
  - `dataset/images/softcouchpillow__classroom__001.png` → logit=9.57
  - `dataset/images/tablelampwithshadeoff__bathroom__002.png` → logit=8.97

### Class `park bench` (ImageNet #703)
- **Average logit**: 0.28 (std: 2.10)
- **Top‑5 activations**:
  - `dataset/images/tablelampwithshadeoff__classroom__002.png` → logit=9.02
  - `dataset/images/grannysmith__minimalist__002.png` → logit=5.27
  - `dataset/images/tablelampwithshadeoff__green__001.png` → logit=5.06
  - `dataset/images/ceramiccoffeemug__modern__002.png` → logit=4.34
  - `dataset/images/bookjacket__green__002.png` → logit=3.79

### Class `laptop` (ImageNet #620)
- **Average logit**: 5.29 (std: 2.96)
- **Top‑5 activations**:
  - `dataset/images/bookjacket__green__001.png` → logit=15.80
  - `dataset/images/notebookwithkraftcover__science__001.png` → logit=13.48
  - `dataset/images/bookjacket__green__002.png` → logit=13.10
  - `dataset/images/notebookwithkraftcover__minimalist__002.png` → logit=12.49
  - `dataset/images/bookjacket__classroom__002.png` → logit=11.38

### Class `fox squirrel` (ImageNet #335)
- **Average logit**: -3.04 (std: 1.80)
- **Top‑5 activations**:
  - `dataset/images/grannysmith__minimalist__002.png` → logit=1.41
  - `dataset/images/tablelampwithshadeoff__green__002.png` → logit=1.40
  - `dataset/images/grannysmith__green__002.png` → logit=1.39
  - `dataset/images/opaquemetalwaterbottle__green__002.png` → logit=0.63
  - `dataset/images/ceramiccoffeemug__modern__002.png` → logit=0.34

### Class `tennis ball` (ImageNet #852)
- **Average logit**: 3.44 (std: 3.08)
- **Top‑5 activations**:
  - `dataset/images/grannysmith__plain__002.png` → logit=17.15
  - `dataset/images/grannysmith__classroom__001.png` → logit=12.90
  - `dataset/images/grannysmith__plain__001.png` → logit=11.18
  - `dataset/images/grannysmith__minimalist__001.png` → logit=9.71
  - `dataset/images/grannysmith__hotel__001.png` → logit=9.10

## 5 Main biases of the model
1. **Contextual Bias**: The model often fails to associate objects with their correct contexts, leading to incoherent predictions. For example, it misidentifies a bookjacket in a classroom as unrelated items like a laptop or envelope.
2. **Object Confusion**: The model frequently confuses similar objects, such as mistaking a Granny Smith apple for a croquet ball or a pillow for a couch.
3. **Cultural Bias**: The model may reflect biases present in the training data, leading to skewed predictions based on cultural contexts that are not universally applicable.

## 6 Overall verdict
- **Strengths**:
  - Capable of identifying certain objects accurately in ideal contexts.
  - Demonstrates some level of understanding of object categories.
  
- **Weaknesses**:
  - High incoherence rate (48.76%) indicates significant misalignment with prompts.
  - Frequent contextual and object confusion leads to unreliable predictions.

- **Final Reliability Rating**: 2/5