# Dynamic Truncation + Validation + Cross-Encoder + Fusion

## Notebook Summary: Truncation + Validation Fusion RAG (Setup 8)

This notebook implements an optimized RAG pipeline for extractive QA over telecom documents, integrating robust safeguards against hallucination and prompt overload.

### Key Enhancements:

1. **Cross-Encoder Reranking**  
   The FAISS retriever over-fetches 2 × `top_k` candidate chunks, which are then reranked with `cross-encoder/ms-marco-MiniLM-L-6-v2` so that only the `top_k` best query-chunk matches are kept.

2. **Dynamic Truncation**  
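   Top-ranked chunks are split into overlapping word windows (150 words, stride 75; windows shorter than 30 words are dropped) and re-scored with a weighted combination of lexical overlap (Jaccard, weight 0.6) and TF-IDF cosine similarity (weight 0.4). Only the highest-scoring spans (5 by default) are retained.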

3. **Fusion Prompting**  
   Selected spans are fused into a single prompt, annotated with source identifiers, and passed to a LoRA-fine-tuned LLaMA-2 model for extractive generation.

4. **Fuzzy Validation**  
   Checks that the final answer text is approximately contained in the retained spans, using character-level `difflib.SequenceMatcher` similarity with a 0.75 threshold. A warning is printed if validation fails.

5. **Evaluation**  
   Assessed on 100 QA pairs with:
   - **SQuAD (Exact Match, F1)**
   - **ROUGE-L**
   - **BLEU**

This setup is the most robust centralized RAG architecture in this series so far, balancing precision, safety, and context richness. It is a candidate for downstream deployment or for extension to federated learning (FL).
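
The word-window arithmetic behind Dynamic Truncation can be sketched in isolation. This is a minimal illustration rather than the notebook's own function: `window_bounds` is a hypothetical helper, and the 30-word minimum is taken from the `truncate_and_filter_chunks` code in cell 2.

```python
def window_bounds(n_words, window_size=150, stride=75, min_words=30):
    """Return (start, end) word offsets of the sliding windows over one chunk.

    Hypothetical helper for illustration. It mirrors the loop
    `for i in range(0, n_words, stride)` and discards windows that
    contain fewer than `min_words` words.
    """
    bounds = []
    for start in range(0, n_words, stride):
        end = min(start + window_size, n_words)
        if end - start >= min_words:
            bounds.append((start, end))
    return bounds


# A 200-word chunk yields three windows; the last one is a 50-word tail.
print(window_bounds(200))  # [(0, 150), (75, 200), (150, 200)]
```

Consecutive windows share 75 words, so an answer that straddles a window boundary still appears whole in at least one span as long as it is no longer than the overlap.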

In [1]:
# Imports
import re
import faiss
import pickle
import numpy as np
from sentence_transformers import SentenceTransformer, CrossEncoder
from transformers import AutoTokenizer, AutoModelForCausalLM, pipeline

In [2]:
from sklearn.feature_extraction.text import TfidfVectorizer
from nltk.corpus import stopwords
from difflib import SequenceMatcher

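def is_similar(a, b, threshold=0.75):
    """Return True if `a` is approximately contained in `b`.

    A whole-string ratio is capped at 2*len(a)/(len(a)+len(b)), so a short
    answer can never reach the threshold against a long span. Instead, align
    on the longest common block and compare `a` with the equally long slice of `b`.
    """
    if not a or not b:
        return False
    if len(a) >= len(b):
        return SequenceMatcher(None, a, b, autojunk=False).ratio() >= threshold
    block = SequenceMatcher(None, a, b, autojunk=False).find_longest_match(0, len(a), 0, len(b))
    start = max(0, min(block.b - block.a, len(b) - len(a)))
    return SequenceMatcher(None, a, b[start:start + len(a)], autojunk=False).ratio() >= threshold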

def truncate_and_filter_chunks(chunks, query, window_size=150, stride=75, max_windows=5):
    STOPWORDS = set(stopwords.words("english"))

    def normalize(text):
        return re.sub(r'\W+', ' ', text.lower())

    def lexical_overlap(query, span):
        q_tokens = set(normalize(query).split()) - STOPWORDS
        c_tokens = set(normalize(span).split()) - STOPWORDS
        return len(q_tokens & c_tokens) / (len(q_tokens | c_tokens) + 1e-5)

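    def tfidf_score(query, span):
        # Cosine similarity of the L2-normalised TF-IDF vectors fitted on this pair only
        X = TfidfVectorizer().fit_transform([query, span])
        # .toarray() replaces the .A shortcut, which newer SciPy releases no longer provide
        return (X[0] @ X[1].T).toarray()[0][0]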

    scored_spans = []
    for chunk in chunks:
        words = chunk["content"].split()
        for i in range(0, len(words), stride):
            span_words = words[i:i + window_size]
            if len(span_words) < 30:
                continue
            span = " ".join(span_words)
            score = 0.6 * lexical_overlap(query, span) + 0.4 * tfidf_score(query, span)
            scored_spans.append({
                "content": span,
                "score": score,
                "source": chunk.get("source", "unknown")
            })

    return sorted(scored_spans, key=lambda x: x["score"], reverse=True)[:max_windows]
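
To make the span score concrete, the lexical-overlap term can be reproduced with the standard library alone. The sketch below is an illustration under stated assumptions: `content_tokens`, `jaccard`, and `combined_score` are hypothetical names, the stopword set is a tiny stand-in for NLTK's list, the `1e-5` smoothing term is omitted, and the TF-IDF term is passed in as a plain number instead of being computed.

```python
import re

# Tiny stand-in for NLTK's English stopword list (assumption, for illustration only).
STOPWORDS = {"the", "is", "a", "of", "after", "which", "to"}


def content_tokens(text):
    """Lowercase, replace non-word characters with spaces, and drop stopwords."""
    return set(re.sub(r"\W+", " ", text.lower()).split()) - STOPWORDS


def jaccard(query, span):
    """Intersection over union of the two content-token sets."""
    q, s = content_tokens(query), content_tokens(span)
    return len(q & s) / len(q | s) if (q | s) else 0.0


def combined_score(lexical, tfidf, w_lexical=0.6, w_tfidf=0.4):
    """Weighted sum used to rank spans (weights taken from the cell above)."""
    return w_lexical * lexical + w_tfidf * tfidf


query = "Which timer is started after the request is sent?"
span = "After the request is sent, the device starts the retry timer."

overlap = jaccard(query, span)
print(round(overlap, 3))                        # 0.429 (3 shared tokens out of 7)
print(round(combined_score(overlap, 0.5), 3))   # 0.457 with an assumed TF-IDF term of 0.5
```

Two properties follow from this form. The denominator is the union of both token sets, so a long span with many unrelated words scores lower even when it contains every query term. There is also no stemming, so "started" and "starts" count as different tokens.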

In [3]:
import torch
# Load FAISS index and chunked docs
index = faiss.read_index("/mnt/data/RAG/3gpp_index.faiss")
with open("/mnt/data/RAG/3gpp_chunks.pkl", "rb") as f:
    documents = pickle.load(f)

# Load embedding + cross-encoder models
embedding_model = SentenceTransformer("all-MiniLM-L6-v2")
reranker = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")

# FAISS + Cross-Encoder Retrieval
def retrieve_with_rerank(query, top_k=5):
    query_vec = embedding_model.encode(query, normalize_embeddings=True)
    query_vec = np.array(query_vec).reshape(1, -1).astype("float32") 

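    # Over-fetch 2x candidates so the cross-encoder has room to reorder them
    D, I = index.search(query_vec, top_k * 2)

    # FAISS pads the result with -1 when fewer than k neighbours exist; skip those slots
    initial_results = [documents[i] for i in I[0] if i != -1]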
    pairs = [(query, doc["content"]) for doc in initial_results]

    scores = reranker.predict(pairs)
    reranked = sorted(zip(scores, initial_results), key=lambda x: x[0], reverse=True)[:top_k]

    return [doc for _, doc in reranked]

# Multi-Chunk Fusion Prompt Builder
SYSTEM_PROMPT = (
    "You are a precise assistant. Extract the exact answer span from the context. "
    "Do not paraphrase, summarize, or add extra information. "
    "The answer must appear exactly in the context. "
    "If the context lists multiple conditions, actions, or branches, include them all as written. "
    "Do not summarize or paraphrase — copy the exact text from the context, line by line."
)

def build_fusion_prompt(context_chunks, question):
    context_lines = []
    for chunk in context_chunks:
        source = chunk.get("source", "unknown").split("/")[-1]
        context_lines.append(f"[Source: {source}]\n-----\n{chunk['content'].strip()}")
    fused_context = "\n\n".join(context_lines)

    user_prompt = (
        f"Context:\n{fused_context}\n\n"
        f"Question: {question}\n"
        f"Answer from the context only:"
    )

    return f"<s>[INST] <<SYS>>\n{SYSTEM_PROMPT}\n<</SYS>>\n\n{user_prompt} [/INST]"

# Output Cleaning
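def clean_prediction(raw_text):
    # Keep only the text generated after the prompt
    answer = raw_text.split("[/INST]")[-1].strip()
    # Drop characters outside a conservative whitelist
    answer = re.sub(r"[^\w\s\-.,:/()]", "", answer)
    # Collapse immediately repeated "label:" prefixes
    answer = re.sub(r'(\b.+?:)(\s*\1)+', r'\1', answer)

    # If the output opens with the same phrase twice, keep one copy and drop the rest
    tokens = answer.split()
    for i in range(1, len(tokens) // 2 + 1):  # +1 so an exact doubling is also caught
        if tokens[:i] == tokens[i:2*i]:
            answer = " ".join(tokens[:i])
            break

    # Cut at the first sentence end; the lookahead keeps tokens such as "23.501" intact
    sentence_end = re.search(r'[.?!](?=\s|$)', answer)
    if sentence_end:
        answer = answer[:sentence_end.end()]
    return answer.strip()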

# Load Fine-Tuned LLaMA-2 + Pipeline
model_path = "/mnt/data/llama2_qa_lora_output5/final"

tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForCausalLM.from_pretrained(model_path, torch_dtype=torch.bfloat16).to("cuda")

qa_pipeline = pipeline("text-generation", model=model, tokenizer=tokenizer)

def answer_with_fusion_cross_rag_truncated(question, top_k=6, max_windows=5, verbose=False):
    # Step 1: Retrieve + rerank with cross-encoder
    initial_chunks = retrieve_with_rerank(question, top_k=top_k)

    # Step 2: Dynamic truncation filtering
    final_chunks = truncate_and_filter_chunks(initial_chunks, question, max_windows=max_windows)

    # Step 3: Build fusion prompt
    prompt = build_fusion_prompt(final_chunks, question)

    # Step 4: Run model
    output = qa_pipeline(
        prompt,
        max_new_tokens=160,
        do_sample=False,
        eos_token_id=tokenizer.eos_token_id,
        pad_token_id=tokenizer.eos_token_id
    )[0]["generated_text"]

    answer = clean_prediction(output)

    # Step 5: Fuzzy validation (warn if no retained span approximately matches the answer)
    if not any(is_similar(answer.lower(), c["content"].lower()) for c in final_chunks):
        print("🚨 WARNING: Approximate match for answer not found in final context. Review answer relevance.")

    if verbose:
        print("📌 Prompt (truncated):\n", prompt[:500], "...\n")
        print("🧾 Raw Output:\n", output)
        print("✅ Final Answer:\n", answer)
        for i, chunk in enumerate(final_chunks):
            print(f"\n--- Context {i+1} ---\n{chunk['content'][:300]}...\n")

    return answer, final_chunks

Device set to use cuda:0
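
One property of `difflib.SequenceMatcher`, which the validation step relies on, is worth seeing in numbers. `ratio()` is defined as 2·M/T, where M is the number of matched characters and T is the combined length of both strings, so a short answer compared against a much longer span scores low even when it is copied verbatim. The sketch below illustrates this and shows one possible containment-style comparison; `whole_ratio` and `containment_ratio` are hypothetical helpers, and the example strings are made up.

```python
from difflib import SequenceMatcher


def whole_ratio(a, b):
    """Plain SequenceMatcher ratio: 2 * matched_chars / (len(a) + len(b))."""
    return SequenceMatcher(None, a, b, autojunk=False).ratio()


def containment_ratio(answer, context):
    """Similarity of `answer` to the best-aligned, equally long slice of `context`.

    Hypothetical helper: aligns on the longest common block, then scores
    only the slice of `context` that the answer would occupy.
    """
    if not answer or not context:
        return 0.0
    if len(answer) >= len(context):
        return whole_ratio(answer, context)
    matcher = SequenceMatcher(None, answer, context, autojunk=False)
    block = matcher.find_longest_match(0, len(answer), 0, len(context))
    start = max(0, min(block.b - block.a, len(context) - len(answer)))
    return whole_ratio(answer, context[start:start + len(answer)])


context = "the device starts the retry timer after the request is sent " * 5
answer = "starts the retry timer"

print(whole_ratio(answer, context) < 0.75)        # True: a verbatim copy still fails a 0.75 threshold
print(containment_ratio(answer, context) == 1.0)  # True: the aligned slice matches exactly
```

Here the answer is 22 characters and the context 300, so the whole-string ratio cannot exceed 2·22/322 ≈ 0.14 regardless of how faithful the answer is.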


In [4]:
import nltk

# Fetch the stopword list used by truncate_and_filter_chunks.
# This must run before that function is first called.
nltk.download('stopwords')

[nltk_data] Downloading package stopwords to
[nltk_data]     /home/ec2-user/nltk_data...
[nltk_data]   Package stopwords is already up-to-date!


True

In [5]:
import json
from tqdm import tqdm
from evaluate import load

# Load QA pairs
def load_qa_pairs(path):
    with open(path, "r", encoding="utf-8") as f:
        return [json.loads(line) for line in f]

qa_pairs = load_qa_pairs("3gpp_qa_100_pairs.jsonl")

# Load metrics
squad_metric = load("squad")
rouge = load("rouge")
bleu = load("bleu")

bleu_predictions = []
bleu_references = []
results = []

for sample in tqdm(qa_pairs):
    question = sample["question"]
    reference = sample["answer"]

    try:
        prediction, _ = answer_with_fusion_cross_rag_truncated(question)
    except Exception as e:
        print(f"⚠️ Error on: {question}\n{e}")
        prediction = ""

    # Add to metrics
    squad_metric.add(
        prediction={"id": str(hash(question)), "prediction_text": prediction},
        reference={"id": str(hash(question)), "answers": {"text": [reference], "answer_start": [0]}}
    )
    rouge.add(prediction=prediction, reference=reference)
    bleu_predictions.append(prediction)
    bleu_references.append([reference])
    results.append({
        "question": question,
        "reference": reference,
        "prediction": prediction
    })

# Compute final scores (squad reports EM/F1 on a 0-100 scale; ROUGE and BLEU are fractions in [0, 1])
squad_scores = squad_metric.compute()
rouge_scores = rouge.compute()
bleu_score = bleu.compute(predictions=bleu_predictions, references=bleu_references)["bleu"]

# Print results
print("\n📊 Final Evaluation Results (Setup 8 — Truncation + Validation + Cross-Encoder + Fusion):")
print(f"Exact Match (EM): {squad_scores['exact_match']:.2f}")
print(f"F1 Score        : {squad_scores['f1']:.2f}")
print(f"ROUGE-L         : {rouge_scores['rougeL']:.4f}")
print(f"BLEU            : {bleu_score:.4f}")

  0%|                                                   | 0/100 [00:00<?, ?it/s]The following generation flags are not valid and may be ignored: ['temperature', 'top_p']. Set `TRANSFORMERS_VERBOSITY=info` for more details.
 10%|████▏                                     | 10/100 [00:39<05:47,  3.86s/it]You seem to be using the pipelines sequentially on GPU. In order to maximize efficiency please use a dataset
100%|█████████████████████████████████████████| 100/100 [08:33<00:00,  5.13s/it]


📊 Final Evaluation Results (Setup 8 — Truncation + Validation + Cross-Encoder + Fusion):
Exact Match (EM): 2.00
F1 Score        : 21.19
ROUGE-L         : 0.2249
BLEU            : 0.0328
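
The gap between Exact Match and F1 follows from how SQuAD-style scoring works: EM requires the normalized strings to be identical, while F1 gives partial credit for shared tokens. The sketch below re-implements that scoring in simplified form for a single prediction/reference pair. It is an illustration of the idea, not the `evaluate` library's code, and the example strings are made up.

```python
import re
import string
from collections import Counter


def normalize_answer(text):
    """Lowercase, drop punctuation and English articles, collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(prediction, reference):
    """1.0 if the normalized strings are identical, else 0.0."""
    return float(normalize_answer(prediction) == normalize_answer(reference))


def token_f1(prediction, reference):
    """Harmonic mean of token-level precision and recall."""
    pred = normalize_answer(prediction).split()
    ref = normalize_answer(reference).split()
    common = sum((Counter(pred) & Counter(ref)).values())
    if common == 0:
        return 0.0
    precision = common / len(pred)
    recall = common / len(ref)
    return 2 * precision * recall / (precision + recall)


reference = "The device starts the retry timer."
prediction = "the device starts the retry timer after the request is sent"

print(exact_match(prediction, reference))         # 0.0
print(round(token_f1(prediction, reference), 3))  # 0.667
```

An extraction that contains the reference but runs past it scores zero on EM for that question while still contributing to F1, which is one way a low EM and a noticeably higher F1 can coexist.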



