# Counterfactual Evidence-Based Explanation for RAG

This notebook demonstrates a post-hoc explanation method for retrieval-augmented generation (RAG) systems that focuses on **evidence support rather than semantic similarity**.

Given a question and a set of retrieved documents, the system first generates a final answer using the standard RAG pipeline.  
It then explains this answer by identifying **which parts of the retrieved documents explicitly support the answer content**.

Unlike similarity-based or attention-based highlighting methods, this approach treats explanation as an **evidence verification problem**:  
a sentence is highlighted only if it can be shown to directly support the generated answer.

The method is:
- model-agnostic and fully post-hoc
- compatible with local LLMs (e.g. Ollama)
- lightweight and bounded in runtime
- designed to reduce over-highlighting and spurious evidence

The resulting highlights aim to answer the question:  
**“Which retrieved text actually justifies the answer?”**

Limitation: the same LLM is used to judge whether a sentence justifies the answer. If the LLM is weak, this verification step, and therefore the highlights, can be wrong.

In [31]:
import sys
from pathlib import Path

NOTEBOOK_DIR = Path.cwd()

project_root = None
for p in [NOTEBOOK_DIR] + list(NOTEBOOK_DIR.parents):
    if (p / "src").exists():
        project_root = p
        break

if project_root is None:
    raise RuntimeError("Notebook must live inside the repo")

sys.path.insert(0, str(project_root))

print("Repo root:", project_root)
print("Python:", sys.executable)


Repo root: c:\Users\Admin\Desktop\XAI\FINAL\xai-rag
Python: c:\Users\Admin\Desktop\XAI\FINAL\xai-rag\.venv\Scripts\python.exe


In [None]:
import requests

r = requests.get("http://localhost:11434/api/tags", timeout=10)
r.raise_for_status()
print("Ollama models:", [m["name"] for m in r.json()["models"]])


Ollama models: ['qwen3-vl:8b']


In [33]:
import tomllib
from pathlib import Path

CONFIG_PATH = project_root / "config.toml"

with open(CONFIG_PATH, "rb") as f:
    cfg = tomllib.load(f)

med_cfg = cfg["medmcqa"]
rag_cfg = cfg["rag"]
llm_cfg = cfg["llm"]

QUESTION_IDS = med_cfg["question_ids"]
KG_CAPABLE = set(med_cfg.get("kg_capable", []))
SPLIT = med_cfg["split"]

print("MedMCQA questions:", len(QUESTION_IDS))
print("KG capable:", len(KG_CAPABLE))
print("RAG hops:", rag_cfg["n_hops"])
print("LLM:", llm_cfg["provider"], llm_cfg["model"])


MedMCQA questions: 59
KG capable: 12
RAG hops: 2
LLM: ollama gemma3:4b


In [None]:
# Cell 4: Dataset loading
from src.modules.loader.medmcqa_data_loader import MedMCQADataLoader

loader = MedMCQADataLoader()
documents = loader.setup()

print("Loaded:", len(documents))

for d in documents[:3]:
    print()
    print(d.metadata["question"])
    print("Answer:", d.metadata["answer"])
    print("KG:", d.metadata["question_id"] in KG_CAPABLE)


Loaded: 59

Which of the following agents is likely to cause cerebral calcification and hydrocephalus in a newborn whose mother has history of taking spiramycin but was not compliant with therapy?
Answer: B
KG: False

Myocarditis is caused bya) Pertussisb) Measlesc) Diptheriad) Scorpion sting
Answer: A
KG: False

Childhood osteopetrosis is characterized by – a) B/L frontal bossingb) Multiple # (fracture)c) Hepatosplenomegalyd) Cataracte) Mental retardation
Answer: A
KG: True


In [42]:
from src.modules.llm.llm_client import LLMClient

llm_client = LLMClient(
    provider=llm_cfg["provider"],   # read from config.toml (loaded above)
    model_name=llm_cfg["model"]
)

llm = llm_client.get_llm()

print("LLM ready:", llm)


Connecting to local Ollama (gemma3:4b)...
LLM ready: model='gemma3:4b' reasoning=False temperature=0.0


In [43]:
# Cell 5: RAG engine setup
from src.modules.rag.rag_engine import RAGEngine

rag = RAGEngine(persist_dir="../data/vector_db_medmcqa")

rag.setup(
    documents=documents,
    reset=False,   # set True if you want to rebuild
    k=TOP_K_DOCS
)

print("Vector DB ready")


Loading existing vector store from ../data/vector_db_medmcqa...
RagEngine ready.
Vector DB ready


In [44]:
question = documents[0].metadata["question"]

docs = rag.retrieve_documents(question)

print("Retrieved:", len(docs))
for i, d in enumerate(docs):
    print(f"\nDOC {i}")
    print(d.page_content[:300])

Retrieved: 4

DOC 0
Which of the following agents is likely to cause cerebral calcification and hydrocephalus in a newborn whose mother has history of taking spiramycin but was not compliant with therapy?

A: Rubella
B: Toxoplasmosis
C: CMV
D: Herpes

Explanation: b. Toxoplasmosis(Ref: Nelson's 20/e p 2814, Ghai 8/e p 

DOC 1
Child with "Intracranial Calcification, Chorioretinitis" most appropriate cause is:

A: Toxoplasmosis
B: Herpes simplex
C: CMV
D: Syphilis

Explanation: Toxoplasmosis

DOC 2
Mechanism of action of beta-Lactam antibiotics would be

A: Inhibition of bacterial protein synthesis by binding to subunit of bacterial ribosomes
B: Inhibition of cell wall peptidoglycan synthesis by competitive inhibition of transpeptidases
C: Reduced drug causes strand breaks in DNA.
D: Inhibitio

DOC 3
Acyclovir is used for the following viral infection :

A: Rabies virus
B: Cytomegalovirus
C: Herpes simplex virus
D: Human immunodeficiency virus

Explanation: None


In [45]:
context = "\n\n".join(d.page_content for d in docs if d.page_content)


In [47]:
prompt = f"""
Answer using only the context.
If the answer is missing, say unknown.
Return only the final answer.

Question: {question}

Context:
{context}

Answer:
""".strip()

baseline = llm.invoke(prompt).content.strip()
print("Baseline:", baseline)

Baseline: Toxoplasmosis


In [49]:
# Cell 8: Sentence splitter + HTML highlight helpers
import html
import re

def split_sentences(text: str):
    text = (text or "").strip()
    if not text:
        return []
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

def _find_all_spans(text: str, needle: str):
    spans = []
    if not needle:
        return spans
    start = 0
    n = len(needle)
    while True:
        idx = text.find(needle, start)
        if idx == -1:
            break
        spans.append((idx, idx + n))
        start = idx + max(1, n)
    return spans

def _merge_spans(spans):
    if not spans:
        return []
    spans = sorted(spans, key=lambda x: (x[0], x[1]))
    merged = [spans[0]]
    for s, e in spans[1:]:
        ps, pe = merged[-1]
        if s <= pe:
            merged[-1] = (ps, max(pe, e))
        else:
            merged.append((s, e))
    return merged

def highlight_html_exact(text: str, snippets):
    spans = []
    for snip in snippets:
        snip = (snip or "").strip()
        if not snip:
            continue
        spans.extend(_find_all_spans(text, snip))
    spans = _merge_spans(spans)

    out = []
    last = 0
    for s, e in spans:
        out.append(html.escape(text[last:s]))
        out.append("<mark>")
        out.append(html.escape(text[s:e]))
        out.append("</mark>")
        last = e
    out.append(html.escape(text[last:]))
    return "".join(out)
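
To make the span logic above easier to follow, here is a compact, self-contained sketch of the same idea: find every occurrence of each snippet, merge overlapping spans, then escape and wrap. The function names (`find_spans`, `merge_spans`, `highlight`) and the sample text are made up for this illustration and are not the notebook's helpers.

```python
import html

def find_spans(text, needle):
    """Return (start, end) for every non-overlapping occurrence of needle."""
    spans, start = [], 0
    while needle:
        idx = text.find(needle, start)
        if idx == -1:
            break
        spans.append((idx, idx + len(needle)))
        start = idx + len(needle)
    return spans

def merge_spans(spans):
    """Merge spans that overlap or touch so <mark> tags never nest."""
    merged = []
    for s, e in sorted(spans):
        if merged and s <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], e))
        else:
            merged.append((s, e))
    return merged

def highlight(text, snippets):
    """Escape the text and wrap every merged snippet span in <mark>."""
    spans = merge_spans([sp for sn in snippets for sp in find_spans(text, sn)])
    out, last = [], 0
    for s, e in spans:
        out.append(html.escape(text[last:s]))
        out.append("<mark>" + html.escape(text[s:e]) + "</mark>")
        last = e
    out.append(html.escape(text[last:]))
    return "".join(out)

# Two overlapping snippets collapse into a single highlighted region.
text = "CMV & Toxoplasmosis cause calcification."
demo = highlight(text, ["Toxoplasmosis cause", "cause calcification"])
print(demo)
# CMV &amp; <mark>Toxoplasmosis cause calcification</mark>.
```

Working on character offsets of the raw text and escaping each slice separately avoids the problem of searching for snippets inside already-escaped HTML.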


In [50]:
# Cell 9: Claim extraction (simple and strict)
def extract_claims(answer: str):
    a = (answer or "").strip()
    if not a or a.lower() == "unknown":
        return []
    # For short answers (a single term or phrase), treat the whole answer as one claim
    return [a]

claims = extract_claims(baseline)
print("claims:", claims)


claims: ['Toxoplasmosis']


In [51]:
# Cell 10: Support check prompt (strict entailment)
def support_check(claim: str, window: str):
    check_prompt = f"""
You are verifying evidence support.

Claim:
{claim}

Evidence context:
{window}

Decide if the evidence context explicitly supports the claim.
Reply with exactly one token: YES or NO.
""".strip()

    out = llm.invoke(check_prompt).content.strip().upper()
    return out.startswith("YES")

print("Support checker ready")


Support checker ready
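
`support_check` accepts any reply that starts with `YES`. Small local models do not always follow a "one token" instruction, so a slightly stricter parser may be worth considering. The sketch below is one possible approach, not what the notebook does: `parse_verdict` is a hypothetical helper, and the reply formats in the examples are illustrative rather than observed outputs.

```python
import re

def parse_verdict(reply):
    """Map a raw model reply to True (YES), False (NO), or None (unclear)."""
    # Look only at the first run of letters, ignoring markdown or punctuation
    # such as "**YES**" or "Yes."
    match = re.search(r"[A-Za-z]+", reply or "")
    if match is None:
        return None
    token = match.group(0).upper()
    if token == "YES":
        return True
    if token == "NO":
        return False
    return None  # anything else is treated as "no clear verdict"

print(parse_verdict("YES"))                       # True
print(parse_verdict("**Yes.**"))                  # True
print(parse_verdict("No, it is not mentioned."))  # False
print(parse_verdict("Yesterday's note says..."))  # None
```

Returning `None` for unclear replies lets the caller decide whether to retry or to treat the candidate as unsupported, whereas a plain prefix test would accept the last example as a YES.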


In [52]:
# Cell 11: Candidate sentences (cheap prefilter)
def candidate_windows(docs, max_sents_per_doc: int, window_size: int = 3):
    per_doc = {}
    for i, d in enumerate(docs):
        sents = split_sentences(d.page_content)[:max_sents_per_doc]
        windows = []
        for idx in range(len(sents)):
            start = max(0, idx - (window_size - 1))
            window = " ".join(sents[start:idx+1])
            windows.append({
                "sentence": sents[idx],
                "window": window,
                "sent_idx": idx
            })
        per_doc[i] = windows
    return per_doc

cands_by_doc = candidate_windows(docs, MAX_SENTENCES_PER_DOC, window_size=3)

for di, items in cands_by_doc.items():
    print("doc", di, "candidates:", len(items))



doc 0 candidates: 3
doc 1 candidates: 1
doc 2 candidates: 2
doc 3 candidates: 1
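
The candidate counts above depend on how the regex splitter cuts each document and on the trailing window attached to each sentence. The following self-contained sketch reproduces both steps on a toy string; `trailing_windows` is a hypothetical name for the windowing part of `candidate_windows`, and the sample text is invented.

```python
import re

def split_sentences(text):
    """Split after '.', '!' or '?' followed by whitespace (same regex as above)."""
    parts = re.split(r"(?<=[.!?])\s+", (text or "").strip())
    return [p.strip() for p in parts if p.strip()]

def trailing_windows(sentences, window_size=3):
    """Pair each sentence with itself plus up to window_size - 1 predecessors."""
    windows = []
    for idx, sent in enumerate(sentences):
        start = max(0, idx - (window_size - 1))
        windows.append((sent, " ".join(sentences[start:idx + 1])))
    return windows

text = "Which agent causes hydrocephalus? Explanation: b. Toxoplasmosis is the cause."
sents = split_sentences(text)
print(sents)
# ['Which agent causes hydrocephalus?', 'Explanation: b.', 'Toxoplasmosis is the cause.']

for sentence, window in trailing_windows(sents, window_size=2):
    print(repr(sentence), "<-", repr(window))
```

Note that the splitter cuts after `b.`, so an option marker ends up as its own "sentence". The verifier sees the window, but only the last sentence of the window is highlighted when the window is accepted.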


In [53]:
# Cell 12: Evidence selection per claim (with progress bar)
from tqdm.auto import tqdm

highlights_by_doc = {i: [] for i in range(len(docs))}

total_checks = sum(len(items) for items in cands_by_doc.values()) * max(1, len(claims))
pbar = tqdm(total=total_checks, desc="Checking evidence support")

for claim in claims:
    found = 0
    for doc_idx, items in cands_by_doc.items():
        for item in items:
            pbar.update(1)

            if support_check(claim, item["window"]):
                highlights_by_doc[doc_idx].append(item["sentence"])
                found += 1
                if found >= MAX_SUPPORT_PER_CLAIM:
                    break

        if found >= MAX_SUPPORT_PER_CLAIM:
            break

pbar.close()

for i in highlights_by_doc:
    highlights_by_doc[i] = list(dict.fromkeys(highlights_by_doc[i]))

highlights_by_doc

Checking evidence support:   0%|          | 0/7 [00:00<?, ?it/s]

{0: ['Which of the following agents is likely to cause cerebral calcification and hydrocephalus in a newborn whose mother has history of taking spiramycin but was not compliant with therapy?',
  'A: Rubella\nB: Toxoplasmosis\nC: CMV\nD: Herpes\n\nExplanation: b.'],
 1: [],
 2: [],
 3: []}
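
The selection loop stops as soon as a fixed number of supporting sentences has been accepted, which is what keeps the number of verifier calls bounded. The sketch below isolates that control flow. It assumes a pluggable `verifier` callable; `select_evidence` and `stub_verifier` are hypothetical names, and the stub uses a substring test purely as a stand-in for the LLM call.

```python
def select_evidence(claim, windows_by_doc, verifier, max_support):
    """Return ({doc_idx: [sentences]}, number of verifier calls made)."""
    highlights = {doc_idx: [] for doc_idx in windows_by_doc}
    calls = found = 0
    for doc_idx, items in windows_by_doc.items():
        for sentence, window in items:
            calls += 1
            if verifier(claim, window):
                if sentence not in highlights[doc_idx]:
                    highlights[doc_idx].append(sentence)
                found += 1
                if found >= max_support:
                    return highlights, calls  # early exit bounds the runtime
    return highlights, calls

def stub_verifier(claim, window):
    # Stand-in for the LLM support check: plain case-insensitive substring test.
    return claim.lower() in window.lower()

windows_by_doc = {
    0: [("Which agent causes calcification?", "Which agent causes calcification?"),
        ("B: Toxoplasmosis", "Which agent causes calcification? B: Toxoplasmosis")],
    1: [("Explanation: Toxoplasmosis", "Explanation: Toxoplasmosis"),
        ("Unrelated sentence.", "Explanation: Toxoplasmosis Unrelated sentence.")],
}

highlights, calls = select_evidence("Toxoplasmosis", windows_by_doc, stub_verifier, max_support=2)
print(highlights)  # {0: ['B: Toxoplasmosis'], 1: ['Explanation: Toxoplasmosis']}
print(calls)       # 3
```

With a cap of 3, the last candidate would also be accepted, because its window contains the claim even though the sentence itself does not. This is one way a trailing window can lead to over-highlighting.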

In [54]:
# Cell 13: Render output
import html
from IPython.display import display, HTML

parts = []
parts.append("<style>mark{padding:0.08em 0.15em; border-radius:3px;}</style>")
parts.append("<h2>Answer</h2>")
parts.append(f"<div style='white-space: pre-wrap;'>{html.escape(baseline)}</div>")
parts.append("<hr/>")

parts.append("<h2>Retrieved Documents</h2>")
parts.append("<p>Highlights are sentences that explicitly support the claim.</p>")

for i, d in enumerate(docs):
    snippets = highlights_by_doc.get(i, [])
    body = highlight_html_exact(d.page_content, snippets) if snippets else html.escape(d.page_content)

    parts.append(f"<h3>Document {i}</h3>")
    parts.append("<div style='white-space: pre-wrap; border: 1px solid #ddd; padding: 10px; border-radius: 6px;'>")
    parts.append(body)
    parts.append("</div><br/>")

display(HTML("".join(parts)))


In [55]:
# Cell 14: Simple table for debugging
import pandas as pd

rows = []
for doc_idx, sents in highlights_by_doc.items():
    for s in sents:
        rows.append({
            "doc_idx": doc_idx,
            "highlighted_sentence": s
        })

df = pd.DataFrame(rows)
df


| | doc_idx | highlighted_sentence |
|---|---|---|
| 0 | 0 | Which of the following agents is likely to cau... |
| 1 | 0 | A: Rubella\nB: Toxoplasmosis\nC: CMV\nD: Herpe... |


In [56]:
import html
from IPython.display import display, HTML
def highlight_text(text, snippets):
    out = html.escape(text)
    for s in snippets:
        out = out.replace(
            html.escape(s),
            f"<mark>{html.escape(s)}</mark>"
        )
    return out

html_out = "<h2>Answer</h2>"
html_out += f"<p>{html.escape(baseline)}</p><hr>"

for i, d in enumerate(docs):
    html_out += f"<h3>Document {i}</h3>"
    body = highlight_text(d.page_content, highlights_by_doc.get(i, []))
    html_out += f"<pre>{body}</pre>"

display(HTML(html_out))

In [57]:
# We prepare helpers for removing/keeping evidence and mapping highlights to sentence indices.
# We assume highlights mark full sentences.

def mask_remove(indices, sentences):
    return " ".join([s for i, s in enumerate(sentences) if i not in indices])

def mask_except(indices, sentences):
    return " ".join([s for i, s in enumerate(sentences) if i in indices])

def get_highlight_indices(sentences, highlights_by_doc):
    hl = set()
    for doc_sents in highlights_by_doc.values():
        for h in doc_sents:
            for i, s in enumerate(sentences):
                if h.strip() == s.strip():
                    hl.add(i)
    return sorted(list(hl))

sentences_full = split_sentences(context)
highlight_indices = get_highlight_indices(sentences_full, highlights_by_doc)
print("highlighted sentence indices:", highlight_indices)


highlighted sentence indices: [0, 1]
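
The helpers above map highlighted sentences back to positions in the full context and then build two counterfactual contexts: one with the evidence removed and one with only the evidence kept. A minimal sketch on toy sentences follows; the function names `highlight_indices`, `without` and `only` are chosen for this example.

```python
def highlight_indices(sentences, highlighted):
    """Indices of context sentences that exactly match a highlighted sentence."""
    wanted = {h.strip() for h in highlighted}
    return [i for i, s in enumerate(sentences) if s.strip() in wanted]

def without(indices, sentences):
    """Context with the given sentence indices removed."""
    drop = set(indices)
    return " ".join(s for i, s in enumerate(sentences) if i not in drop)

def only(indices, sentences):
    """Context consisting of the given sentence indices alone."""
    keep = set(indices)
    return " ".join(s for i, s in enumerate(sentences) if i in keep)

sentences = [
    "Rubella causes cataract.",
    "Toxoplasmosis causes calcification.",
    "CMV causes deafness.",
]
idx = highlight_indices(sentences, ["Toxoplasmosis causes calcification. "])

print(idx)                      # [1]
print(without(idx, sentences))  # Rubella causes cataract. CMV causes deafness.
print(only(idx, sentences))     # Toxoplasmosis causes calcification.
```

Because the match is an exact string comparison, a highlight only maps to an index if the sentence splitter produces the same string for the joined context as it did for the individual document. Sentences at document boundaries may not satisfy this, in which case they are silently skipped.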


In [60]:
import time
import math
import random
import pandas as pd
from tqdm.auto import tqdm

# ------------------------
# Configuration
# ------------------------
EVAL_MAX_Q = len(QUESTION_IDS)        # use all configured questions
CURVE_STEPS = 6                      # resolution of deletion / insertion curves
random.seed(0)

# ------------------------
# LLM call accounting
# ------------------------
LLM_CALLS = 0
LLM_TIME = 0.0

def timed_invoke(prompt):
    global LLM_CALLS, LLM_TIME
    t0 = time.time()
    out = llm.invoke(prompt).content.strip()
    LLM_CALLS += 1
    LLM_TIME += time.time() - t0
    return out

# ------------------------
# Utilities
# ------------------------
def norm(x):
    return (x or "").strip().lower()

def exact_match(a, b):
    return int(norm(a) == norm(b))

def answer_from_context(question, context):
    prompt = f"""
Answer using only the context.
If the answer is missing, say unknown.
Return only the final answer.

Question: {question}

Context:
{context}

Answer:
""".strip()
    return timed_invoke(prompt)

def build_context(docs):
    return "\n\n".join(d.page_content for d in docs if d.page_content)

def remove_indices(indices, sents):
    ix = set(indices)
    return " ".join(s for i,s in enumerate(sents) if i not in ix)

def keep_indices(indices, sents):
    ix = set(indices)
    return " ".join(s for i,s in enumerate(sents) if i in ix)

def get_highlight_indices(full_sentences, highlights_by_doc):
    hl = set()
    for doc_sents in highlights_by_doc.values():
        for h in doc_sents:
            for i,s in enumerate(full_sentences):
                if s.strip() == h.strip():
                    hl.add(i)
    return sorted(list(hl))

def curve_prefixes(indices, steps):
    if not indices:
        return [set() for _ in range(steps)]
    n = len(indices)
    cuts = []
    for k in range(1, steps+1):
        m = min(n, max(1, math.ceil(k*n/steps)))
        cuts.append(set(indices[:m]))
    return cuts

# ------------------------
# One full counterfactual run
# ------------------------
def run_one(question, gold):
    docs = rag.retrieve_documents(question)
    ctx = build_context(docs)

    baseline = answer_from_context(question, ctx)
    claims = extract_claims(baseline)

    cands = candidate_windows(docs, MAX_SENTENCES_PER_DOC, window_size=3)
    highlights = {i: [] for i in range(len(docs))}

    for claim in claims[:1]:
        found = 0
        for di, items in cands.items():
            for item in items:
                check_prompt = f"""
You are verifying evidence support.

Claim:
{claim}

Evidence context:
{item["window"]}

Decide if the evidence context explicitly supports the claim.
Reply with exactly one token: YES or NO.
""".strip()

                if timed_invoke(check_prompt).upper().startswith("YES"):
                    highlights[di].append(item["sentence"])
                    found += 1
                    if found >= MAX_SUPPORT_PER_CLAIM:
                        break
            if found >= MAX_SUPPORT_PER_CLAIM:
                break

    for i in highlights:
        highlights[i] = list(dict.fromkeys(highlights[i]))

    full_sentences = split_sentences(ctx)
    hl_indices = get_highlight_indices(full_sentences, highlights)

    ctx_wo = remove_indices(hl_indices, full_sentences)
    ctx_only = keep_indices(hl_indices, full_sentences)

    ans_wo = answer_from_context(question, ctx_wo) if ctx_wo.strip() else "unknown"
    ans_only = answer_from_context(question, ctx_only) if ctx_only.strip() else "unknown"

    # Faithfulness
    comprehensiveness = int(norm(ans_wo) != norm(baseline))
    sufficiency = int(norm(ans_only) == norm(baseline))

    # Deletion curve
    del_prefixes = curve_prefixes(hl_indices, CURVE_STEPS)
    del_curve = []
    for pref in del_prefixes:
        c = remove_indices(sorted(pref), full_sentences)
        a = answer_from_context(question, c) if c.strip() else "unknown"
        del_curve.append(int(norm(a) != norm(baseline)))
    deletion_auc = sum(del_curve) / len(del_curve)

    # Insertion curve
    ins_curve = []
    for pref in del_prefixes:
        c = keep_indices(sorted(pref), full_sentences)
        a = answer_from_context(question, c) if c.strip() else "unknown"
        ins_curve.append(int(norm(a) == norm(baseline)))
    insertion_auc = sum(ins_curve) / len(ins_curve)

    task = exact_match(baseline, gold)

    return {
        "question": question,
        "gold": gold,
        "baseline": baseline,
        "task_correct": task,
        "comprehensiveness": comprehensiveness,
        "sufficiency": sufficiency,
        "deletion_auc": deletion_auc,
        "insertion_auc": insertion_auc,
        "n_sentences": len(full_sentences),
        "n_highlighted": len(hl_indices),
        "highlight_frac": len(hl_indices)/max(1,len(full_sentences))
    }

# ------------------------
# Build evaluation set
# ------------------------
eval_items = []
for d in documents:
    q = d.metadata["question"]
    a = d.metadata["answer"]
    if q and a:
        eval_items.append((q,a))

random.shuffle(eval_items)
eval_items = eval_items[:EVAL_MAX_Q]

# ------------------------
# Run sweep
# ------------------------
t0 = time.time()
rows = []
pbar = tqdm(total=len(eval_items), desc="Evaluating")

for q,gold in eval_items:
    rows.append(run_one(q,gold))
    pbar.update(1)

pbar.close()
wall = time.time() - t0

df = pd.DataFrame(rows)

# ------------------------
# Aggregate summary
# ------------------------
summary = {
    "n_questions": len(df),
    "task_accuracy": df["task_correct"].mean(),
    "comprehensiveness": df["comprehensiveness"].mean(),
    "sufficiency": df["sufficiency"].mean(),
    "deletion_auc": df["deletion_auc"].mean(),
    "insertion_auc": df["insertion_auc"].mean(),
    "highlight_frac": df["highlight_frac"].mean(),
    "wall_time_s": wall,
    "llm_calls": LLM_CALLS,
    "llm_total_time_s": LLM_TIME,
    "llm_avg_call_s": LLM_TIME / max(1, LLM_CALLS)
}

print("=== COUNTERFACTUAL EVIDENCE EVALUATION ===")
for k,v in summary.items():
    if isinstance(v,float):
        print(f"{k}: {v:.4f}")
    else:
        print(f"{k}: {v}")

display(pd.DataFrame([summary]))
display(df.head(10))


Evaluating:   0%|          | 0/59 [00:00<?, ?it/s]

=== COUNTERFACTUAL EVIDENCE EVALUATION ===
n_questions: 59
task_accuracy: 0.0847
comprehensiveness: 0.6441
sufficiency: 0.4915
deletion_auc: 0.5819
insertion_auc: 0.4209
highlight_frac: 0.0952
wall_time_s: 186.1360
llm_calls: 857
llm_total_time_s: 185.6045
llm_avg_call_s: 0.2166


| | n_questions | task_accuracy | comprehensiveness | sufficiency | deletion_auc | insertion_auc | highlight_frac | wall_time_s | llm_calls | llm_total_time_s | llm_avg_call_s |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 59 | 0.084746 | 0.644068 | 0.491525 | 0.581921 | 0.420904 | 0.09522 | 186.135994 | 857 | 185.604475 | 0.216575 |


| | question | gold | baseline | task_correct | comprehensiveness | sufficiency | deletion_auc | insertion_auc | n_sentences | n_highlighted | highlight_frac |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Edwards syndrome is | B | A: Trisomy 21 | 0 | 1 | 1 | 1.0 | 1.0 | 4 | 1 | 0.25 |
| 1 | Epley's test is used for? | A | A: Benign paroxysmal vertigo | 0 | 1 | 0 | 1.0 | 0.0 | 16 | 2 | 0.125 |
| 2 | Which muscle is responsible for sitting to sta... | B | Gluteus maximus | 0 | 0 | 0 | 0.0 | 0.0 | 21 | 2 | 0.095238 |
| 3 | A 19-year-old presents at the emergency depart... | C | Subarachnoid space | 0 | 1 | 0 | 0.666667 | 0.0 | 17 | 2 | 0.117647 |
| 4 | 30 yr old mom presents with cramping gluteal p... | A | unknown | 0 | 0 | 1 | 0.0 | 1.0 | 19 | 0 | 0.0 |
| 5 | Which of the following statements regarding co... | C | Specular microscopy analysis is used to assess... | 0 | 1 | 0 | 0.5 | 0.0 | 12 | 2 | 0.166667 |
| 6 | Child with "Intracranial Calcification, Chorio... | A | unknown | 0 | 0 | 1 | 0.0 | 1.0 | 18 | 0 | 0.0 |
| 7 | Macrosomia is/are associated with:a) Gestation... | D | a) Gestational diabetes mellitusb) Maternal ob... | 0 | 1 | 0 | 1.0 | 0.0 | 13 | 2 | 0.153846 |
| 8 | Dupuytren's contracture mvolves- | B | Palmar fascia | 0 | 1 | 1 | 1.0 | 0.5 | 20 | 2 | 0.1 |
| 9 | Renshaw inhibition | D | unknown | 0 | 1 | 1 | 1.0 | 1.0 | 15 | 0 | 0.0 |
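
In the rows above, `gold` is an option letter while `baseline` is usually option text, so an exact string comparison can score an answer as wrong even when it names the right option (row 1 has gold `A` and baseline `A: Benign paroxysmal vertigo`, yet `task_correct` is 0). One possible remedy is to map the gold letter to its option text before comparing. The sketch below assumes options appear on separate lines as `A: text`, as in the retrieved documents printed earlier; `option_text` and `matches_gold` are hypothetical helpers, not part of the notebook.

```python
import re

def option_text(doc_text, letter):
    """Return the text of the option line 'X: ...' for the given letter, or None."""
    pattern = rf"^{re.escape(letter)}:\s*(.+)$"
    match = re.search(pattern, doc_text, flags=re.MULTILINE | re.IGNORECASE)
    return match.group(1).strip() if match else None

def matches_gold(answer, gold_letter, doc_text):
    """True if the answer equals the gold letter, its option text, or 'X: text'."""
    def norm(s):
        return (s or "").strip().lower()

    text = option_text(doc_text, gold_letter)
    accepted = {norm(gold_letter)}
    if text:
        accepted |= {norm(text), norm(f"{gold_letter}: {text}")}
    return norm(answer) in accepted

doc = (
    "Which agent causes cerebral calcification?\n\n"
    "A: Rubella\nB: Toxoplasmosis\nC: CMV\nD: Herpes"
)

print(matches_gold("Toxoplasmosis", "B", doc))     # True
print(matches_gold("B: Toxoplasmosis", "B", doc))  # True
print(matches_gold("Rubella", "B", doc))           # False
```

This only works for documents whose options sit on their own lines; questions with inline options such as `a) ... b) ...` would need a different parser.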


## Counterfactual Evidence-Based RAG Explainer

**What it does**
- The RAG system answers first, then we highlight only the retrieved sentences that are verified to entail that answer.

**How it is evaluated**
- We test whether removing or keeping the highlighted sentences changes the answer.

## Metrics

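- **Task accuracy**: Fraction of questions where the RAG answer exactly matches the dataset gold answer (case-insensitive, whitespace-trimmed string comparison).
- **Comprehensiveness**: Whether removing the highlighted sentences changes the answer (1 if it changes, 0 otherwise).
- **Sufficiency**: Whether the highlighted sentences alone reproduce the answer (1 if they do, 0 otherwise).
- **Deletion AUC**: Fraction of deletion steps at which the answer differs from the baseline as progressively more highlighted sentences are removed. Higher values mean the answer breaks earlier.
- **Insertion AUC**: Fraction of insertion steps at which the answer matches the baseline as progressively more highlighted sentences are provided as the only context. Higher values mean the answer is recovered earlier.
- **Highlight fraction**: Fraction of context sentences that are highlighted.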
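
Comprehensiveness and sufficiency are binary per-question checks. The sketch below computes both with a stub answer function in place of the LLM; `faithfulness` and `stub_answer` are hypothetical names, and the toy sentences are invented.

```python
def norm(x):
    return (x or "").strip().lower()

def faithfulness(answer_fn, question, sentences, hl_indices):
    """Return (comprehensiveness, sufficiency) as 0/1 flags for one question."""
    hl = set(hl_indices)
    full = " ".join(sentences)
    ctx_without = " ".join(s for i, s in enumerate(sentences) if i not in hl)
    ctx_only = " ".join(s for i, s in enumerate(sentences) if i in hl)

    baseline = answer_fn(question, full)
    # An empty context is treated as "unknown", as in the evaluation cell.
    ans_without = answer_fn(question, ctx_without) if ctx_without.strip() else "unknown"
    ans_only = answer_fn(question, ctx_only) if ctx_only.strip() else "unknown"

    comprehensiveness = int(norm(ans_without) != norm(baseline))
    sufficiency = int(norm(ans_only) == norm(baseline))
    return comprehensiveness, sufficiency

def stub_answer(question, context):
    # Stand-in for the LLM: answers only if the key term is in the context.
    return "Toxoplasmosis" if "toxoplasmosis" in context.lower() else "unknown"

sentences = [
    "Rubella causes cataract.",
    "Toxoplasmosis causes calcification.",
    "CMV causes deafness.",
]

print(faithfulness(stub_answer, "Which agent?", sentences, [1]))  # (1, 1)
print(faithfulness(stub_answer, "Which agent?", sentences, []))   # (0, 0)
print(faithfulness(stub_answer, "Which agent?", sentences[:1], []))  # (0, 1)
```

The last call shows an edge case: when the baseline is `unknown` and nothing is highlighted, the empty "evidence only" context also yields `unknown`, so sufficiency is 1 even though there is no evidence. This is consistent with some of the `unknown` rows in the results table, and it may inflate the aggregate sufficiency.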

## Interpretation

- High comprehensiveness and deletion AUC suggest that the answer depends on the highlighted sentences (the highlights are necessary).
- High sufficiency and insertion AUC suggest that the highlighted sentences alone carry enough information to reproduce the answer.
- Low highlight fraction means the explanation is sparse and focused.
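
The deletion and insertion scores are averages of binary flags over a fixed number of steps, where each step uses a growing prefix of the highlighted indices. The sketch below reproduces the prefix schedule from `curve_prefixes` and shows how a score arises; `curve_score` is a hypothetical helper and the flag values are illustrative.

```python
import math

def curve_prefixes(indices, steps):
    """Growing prefixes of the highlighted indices, one per curve step."""
    if not indices:
        return [set() for _ in range(steps)]
    n = len(indices)
    return [
        set(indices[:min(n, max(1, math.ceil(k * n / steps)))])
        for k in range(1, steps + 1)
    ]

def curve_score(flags):
    """Mean of 0/1 flags, one flag per step."""
    return sum(flags) / len(flags)

prefixes = curve_prefixes([3, 7], steps=6)
print([sorted(p) for p in prefixes])
# [[3], [3], [3], [3, 7], [3, 7], [3, 7]]

# Illustrative deletion flags: the answer survives removing sentence 3 alone,
# but changes once sentences 3 and 7 are both removed.
print(curve_score([0, 0, 0, 1, 1, 1]))  # 0.5
```

With two highlighted sentences and six steps there are only two distinct prefixes, each repeated three times. The score is a step-wise mean rather than a trapezoidal area under a curve.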


In [None]:
# ================================
# CELL: End-User View of Explanations
# ================================

import html
from IPython.display import display, HTML

def explain_question(question):
    docs = rag.retrieve_documents(question)
    context = "\n\n".join(d.page_content for d in docs if d.page_content)

    # Get baseline answer
    baseline = answer_from_context(question, context)

    # Extract claim
    claims = extract_claims(baseline)

    # Candidate windows
    cands = candidate_windows(docs, MAX_SENTENCES_PER_DOC, window_size=3)

    # Select supporting sentences
    highlights = {i: [] for i in range(len(docs))}
    for claim in claims[:1]:
        found = 0
        for di, items in cands.items():
            for item in items:
                check_prompt = f"""
You are verifying evidence support.

Claim:
{claim}

Evidence context:
{item["window"]}

Decide if the evidence context explicitly supports the claim.
Reply with exactly one token: YES or NO.
""".strip()

                if llm.invoke(check_prompt).content.strip().upper().startswith("YES"):
                    highlights[di].append(item["sentence"])
                    found += 1
                    if found >= MAX_SUPPORT_PER_CLAIM:
                        break
            if found >= MAX_SUPPORT_PER_CLAIM:
                break

    for i in highlights:
        highlights[i] = list(dict.fromkeys(highlights[i]))

    # Render HTML
    parts = []
    parts.append("<h3>Question</h3>")
    parts.append(f"<div style='white-space: pre-wrap;'>{html.escape(question)}</div>")
    parts.append("<h3>RAG Answer</h3>")
    parts.append(f"<div style='white-space: pre-wrap; font-weight: bold;'>{html.escape(baseline)}</div>")
    parts.append("<hr/>")
    parts.append("<h3>Retrieved Documents</h3>")
    parts.append("<p>Highlighted sentences explicitly support the answer.</p>")

    for i, d in enumerate(docs):
        snippets = highlights.get(i, [])
        body = highlight_html_exact(d.page_content, snippets) if snippets else html.escape(d.page_content)
        parts.append(f"<h4>Document {i+1}</h4>")
        parts.append("<div style='white-space: pre-wrap; border: 1px solid #ddd; padding: 10px; border-radius: 6px;'>")
        parts.append(body)
        parts.append("</div><br/>")

    return HTML("".join(parts))


# ------------------------
# Run for all configured questions
# ------------------------

questions = [d.metadata["question"] for d in documents]

for i, q in enumerate(questions):
    print(f"===== QUESTION {i+1} =====")
    display(explain_question(q))


# To see the output (all 59 answered questions), expand the collapsed cell output.

[Output: one rendered HTML explanation per question, under the headers "===== QUESTION 1 =====" through "===== QUESTION 59 ====="; the rendered content is not preserved in this export.]