# AskQE Baseline Pipeline (Vanilla + NLI-Atomic)

This notebook runs the full **baseline AskQE pipeline** end-to-end:

1. **Fact Extraction**: extract atomic facts from source sentences (NLI-Atomic pipeline only)
2. **Question Generation**: Vanilla and Atomic QG with Qwen3-4B
3. **Question Answering**: answer the generated questions on the source sentences
4. **Add Questions to BT**: merge the generated questions into the back-translation (BT) files
5. **QA on Back-Translations**: answer the same questions on the perturbed back-translations
6. **Evaluation**: SBERT, string comparison, xCOMET, BT-Score, Pearson correlation, silhouette score

> **Runtime**: Requires a **GPU** (a Kaggle or Colab T4 is sufficient). The full run takes roughly 6–7 hours.

In [None]:
!git clone https://github.com/AlessandroMaini/CucumBERT_askqe.git
!pip install -q -r CucumBERT_askqe/requirements.txt

In [None]:
# Log in to Hugging Face (prompts for an access token)
import huggingface_hub
huggingface_hub.login()

In [None]:
import os, sys, shutil
from pathlib import Path

BASE = Path("CucumBERT_askqe")
os.environ["GROQ_API_KEY"] = ""  # ← paste your key here

# Constants used throughout
LANG_PAIRS  = {"en-es": "es", "en-fr": "fr"}  # language pair -> target language code
PERTURBATIONS = ["synonym", "alteration", "omission", "expansion_noimpact"]
PIPELINES = ["vanilla", "atomic"]
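The fact-extraction step calls the Groq API, so a blank key only surfaces as an error once that cell is already running. A minimal pre-flight check is sketched below. It assumes the extraction script reads `GROQ_API_KEY` from the environment, and the helper name `require_env` is hypothetical (it is not part of the repository).

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable, or raise if it is unset or blank."""
    value = os.environ.get(name, "").strip()
    if not value:
        raise RuntimeError(f"{name} is empty; set it before running fact extraction.")
    return value
```

Calling `require_env("GROQ_API_KEY")` right after the setup cell fails fast instead of partway through the pipeline.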

## 1. Fact Extraction (NLI-Atomic pipeline only)
Extract atomic facts from source sentences using Groq, then filter by NLI entailment.

In [None]:
EXTRACT_SCRIPT = BASE / "QG/fact-extraction/extract_facts_groq.py"
ENTAIL_SCRIPT  = BASE / "QG/fact-extraction/entail_facts.py"

!python {EXTRACT_SCRIPT} --input_file "data/processed/en-es.jsonl"
!python {ENTAIL_SCRIPT}  --input_file "QG/atomic_facts.jsonl" --threshold 0.9
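To illustrate what the `--threshold 0.9` filter does conceptually, here is a small sketch. It assumes a fact is kept when its entailment probability given the source sentence is at least the threshold. The names `filter_entailed` and `toy_score` are hypothetical, the toy scorer is only a stand-in for a real NLI model, and the internals of `entail_facts.py` may differ (for example in whether the comparison is strict).

```python
from typing import Callable, Iterable, List


def filter_entailed(
    source: str,
    facts: Iterable[str],
    score: Callable[[str, str], float],
    threshold: float = 0.9,
) -> List[str]:
    """Keep the facts whose entailment score against the source meets the threshold."""
    return [fact for fact in facts if score(source, fact) >= threshold]


def toy_score(premise: str, hypothesis: str) -> float:
    """Stand-in for an NLI model: fraction of hypothesis tokens found in the premise."""
    premise_tokens = set(premise.lower().split())
    hypothesis_tokens = hypothesis.lower().split()
    if not hypothesis_tokens:
        return 0.0
    return sum(t in premise_tokens for t in hypothesis_tokens) / len(hypothesis_tokens)


kept = filter_entailed(
    "the patient has a mild fever",
    ["the patient has a fever", "the patient has a cough"],
    toy_score,
)
```

With the toy scorer, the second fact scores 0.8 and is dropped. In practice `score` would wrap an NLI classifier's entailment probability.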

## 2. Question Generation (Vanilla + Atomic)

In [None]:
sys.path.insert(0, str(BASE / "QG/code"))
from qg_qwen3_4b import QuestionGenerator

qg_engine = QuestionGenerator(model_id="Qwen/Qwen3-4B-Instruct-2507")

for variant in PIPELINES:
    print(f"\n── QG: {variant} ──")
    qg_engine.generate_questions("data/processed/en-es.jsonl", variant)

In [None]:
# Free QG model memory: drop the reference, collect garbage, then release cached CUDA blocks
import gc
import torch

del qg_engine
gc.collect()
torch.cuda.empty_cache()

cache_dir = BASE / "QG/code/models--Qwen--Qwen3-4B-Instruct-2507"
if cache_dir.exists():
    shutil.rmtree(cache_dir)
    print("QG model cache deleted.")

## 3. Question Answering on Source Sentences

In [None]:
sys.path.insert(0, str(BASE / "QA/code"))
from qa_qwen3_4b import QuestionAnswerer

qa_engine = QuestionAnswerer(model_id="Qwen/Qwen3-4B-Instruct-2507")

for variant in PIPELINES:
    print(f"\n── QA source: {variant} ──")
    qa_engine.answer_questions(
        input_file=f"QG/qwen3-4b/questions-{variant}.jsonl",
        pipeline_type=variant,
        sentence_key="en",
    )

## 4. Add Questions to Back-Translation Files
Merge generated questions into BT files for both language pairs.

In [None]:
ADD_Q_SCRIPT = BASE / "backtranslation/add_questions.py"

for variant in PIPELINES:
    qg_file = f"QG/qwen3-4b/questions-{variant}.jsonl"
    for lp in LANG_PAIRS:
        for pert in PERTURBATIONS:
            target  = f"backtranslation/{lp}/bt-{pert}.jsonl"
            output  = f"backtranslation/{lp}/bt-{pert}-{variant}.jsonl"
            print(f"  {variant} | {lp} | {pert}")
            !python {ADD_Q_SCRIPT} --qg_file {qg_file} --target_file {target} --output_file {output}
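The loop above launches 16 merges (2 pipelines × 2 language pairs × 4 perturbations), and a missing input file only shows up when its turn comes. The sketch below enumerates the same path triples up front so they can be checked first. The helpers `bt_jobs` and `missing_inputs` are hypothetical, and `root` is an assumption about the directory the relative paths resolve against.

```python
from itertools import product
from pathlib import Path


def bt_jobs(variants, lang_pairs, perturbations, root=Path(".")):
    """Yield (qg_file, target, output) path triples in the same order as the nested loops."""
    for variant, lp, pert in product(variants, lang_pairs, perturbations):
        yield (
            root / f"QG/qwen3-4b/questions-{variant}.jsonl",
            root / f"backtranslation/{lp}/bt-{pert}.jsonl",
            root / f"backtranslation/{lp}/bt-{pert}-{variant}.jsonl",
        )


def missing_inputs(jobs):
    """Return the sorted, de-duplicated input files (QG and target) that do not exist."""
    return sorted({p for qg, target, _ in jobs for p in (qg, target) if not p.exists()})
```

For example, `missing_inputs(list(bt_jobs(PIPELINES, LANG_PAIRS, PERTURBATIONS)))` returns an empty list when every question file and BT file is in place.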

## 5. QA on Back-Translated Perturbations

In [None]:
for variant in PIPELINES:
    for lp, lang in LANG_PAIRS.items():
        for pert in PERTURBATIONS:
            inp = f"backtranslation/{lp}/bt-{pert}-{variant}.jsonl"
            print(f"  {variant} | {lp} | {pert}")
            qa_engine.answer_questions(
                input_file=inp,
                pipeline_type=variant,
                sentence_key=f"bt_pert_{lang}",
            )

## 6. Evaluation

In [None]:
# AskQE metrics: SBERT + string comparison (F1, chrF, BLEU)
!python {BASE / "evaluation/sbert/sbert.py"} --model "qwen3-4b" --output_file "evaluation/sbert/qwen3-4b.csv"
!python {BASE / "evaluation/string-comparison/string_comparison.py"} --model "qwen3-4b"
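As a rough illustration of the F1 part of the string comparison, the sketch below computes a SQuAD-style token-overlap F1 between an answer derived from the source sentence and the answer to the same question derived from the back-translation. This is an assumption about the general technique, not a copy of `string_comparison.py`. The function name `token_f1` and the lowercase, whitespace-based tokenization are hypothetical choices.

```python
from collections import Counter


def token_f1(reference: str, prediction: str) -> float:
    """Token-overlap F1 between two answers (lowercased, whitespace-tokenized)."""
    ref = reference.lower().split()
    pred = prediction.lower().split()
    if not ref or not pred:
        # Two empty answers match perfectly; one empty answer never matches.
        return float(ref == pred)
    # Multiset intersection counts each shared token at most as often as it appears in both.
    overlap = sum((Counter(ref) & Counter(pred)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred)
    recall = overlap / len(ref)
    return 2 * precision * recall / (precision + recall)
```

Identical answers score 1.0. Answers that share two of three tokens, such as "a mild fever" and "a high fever", score 2/3.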

In [None]:
# Standard MT metrics: xCOMET + BT-Score (BERTScore)
!python {BASE / "evaluation/xcomet/xcomet.py"}
!python {BASE / "evaluation/bt-score/bt_score.py"}

In [None]:
# Pearson correlation + Silhouette score
PEARSON_SCRIPT    = BASE / "evaluation/pearson-correlation/compute_correlation.py"
SILHOUETTE_SCRIPT = BASE / "evaluation/silhouette/silhouette_score.py"

for lp in LANG_PAIRS:
    !python {PEARSON_SCRIPT}    --dataset {lp}
    !python {SILHOUETTE_SCRIPT} --target-lang {lp}
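For reference, the Pearson coefficient reported by the correlation step can be sketched in plain Python as below. This is the textbook formula applied to two equal-length score lists, for example an AskQE metric and xCOMET per sentence. Which score pairs `compute_correlation.py` actually correlates is not shown in this notebook, and the function name `pearson` is hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length with at least 2 values")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of centered products; the 1/n factors cancel in the ratio.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)
```

A value near 1 means the two metrics rank sentences almost identically, while a value near 0 means they carry largely unrelated signals.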