# AskQE — Answerability-Check Extension Pipeline

This notebook runs the full **answerability-check extension** of AskQE on Kaggle. It reproduces the pipeline end-to-end:

1. **Answerability Check** — filter Vanilla questions with Longformer & ELECTRA
2. **Question Answering** — answer filtered questions on the source sentences
3. **Add Questions to BT** — merge filtered Q&A into back-translation files
4. **QA on Back-Translations** — answer questions on perturbed back-translations
5. **Evaluation** — compute SBERT and string-comparison (F1, CHRF, BLEU) metrics

> **Runtime**: Requires **GPU** (Kaggle T4 is sufficient). Total ~3 hours.

In [None]:
!git clone https://github.com/AlessandroMaini/CucumBERT_askqe.git -b answerability-check

In [None]:
%%capture
!pip install -q -r CucumBERT_askqe/requirements.txt

## 1. Answerability Check — Filter Vanilla Questions

Run the answerability-check filter on the Vanilla QG output using both models:
- **Longformer** (`potsawee/longformer-large-4096-answerable-squad2`) — sequence classification
- **ELECTRA** (`deepset/electra-base-squad2`) — extractive QA with null-score calibration

Output: `questions-anscheck-longformer.jsonl` and `questions-anscheck-electra.jsonl`
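
The null-score calibration mentioned for ELECTRA can be illustrated with a minimal sketch of the usual SQuAD 2.0 decision rule: compare the "no answer" score at the `[CLS]` position with the best answer-span score. The function name `is_answerable`, the `null_threshold` default, and the `max_answer_len` cap are hypothetical choices for illustration. The repository's `answerability_check.py` may calibrate differently, and a real implementation would also mask out question tokens when searching for spans.

```python
from typing import Sequence


def is_answerable(start_logits: Sequence[float],
                  end_logits: Sequence[float],
                  null_threshold: float = 0.0,
                  max_answer_len: int = 30) -> bool:
    """Return True when the best answer span is competitive with the null score."""
    # Assumption: index 0 is the [CLS] token, which stands for "no answer".
    null_score = start_logits[0] + end_logits[0]

    # Search every valid span (start <= end) over the remaining positions.
    best_span_score = float("-inf")
    n = len(start_logits)
    for i in range(1, n):
        for j in range(i, min(i + max_answer_len, n)):
            best_span_score = max(best_span_score, start_logits[i] + end_logits[j])

    # The question is kept unless the null score beats the best span
    # by more than the (hypothetical) calibration threshold.
    return (null_score - best_span_score) <= null_threshold
```

Raising `null_threshold` keeps more questions and lowering it filters more aggressively, so in practice the value would be tuned on held-out data.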

In [None]:
import sys
from pathlib import Path

BASE = Path('CucumBERT_askqe')

# Run answerability check for both models
!python {BASE / 'QG/answerability-check/answerability_check.py'} \
    --input_file QG/qwen3-4b/questions-vanilla.jsonl \
    --anscheck_type longformer

!python {BASE / 'QG/answerability-check/answerability_check.py'} \
    --input_file QG/qwen3-4b/questions-vanilla.jsonl \
    --anscheck_type electra

## 2. Question Answering — Answer Filtered Questions on Source

Load `Qwen/Qwen3-4B-Instruct-2507` once, then answer the filtered question sets.
Each variant produces a QA output file used downstream.

In [None]:
# Add QA module to path and load the model
sys.path.insert(0, str(BASE / 'QA/code'))

from qa_qwen3_4b import QuestionAnswerer
qa_engine = QuestionAnswerer(model_id="Qwen/Qwen3-4B-Instruct-2507")

In [None]:
VARIANTS = ['longformer', 'electra']
SENTENCE_KEY = 'en'

for variant in VARIANTS:
    qa_engine.answer_questions(
        input_file=f"QG/qwen3-4b/questions-anscheck-{variant}.jsonl",
        pipeline_type="anscheck",
        sentence_key=SENTENCE_KEY,
        check_variant=variant
    )

## 3. Merge Filtered Questions into Back-Translations

For each anscheck variant × perturbation × language pair, merge the filtered questions
into the existing back-translation files. This produces new BT files with the
anscheck question sets attached.
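
As a rough illustration of what such a merge involves, the sketch below joins two JSONL files on a shared record key. The field names `id` and `questions`, the function name `merge_questions`, and the choice to skip records that have no filtered question set are all assumptions made for this example. They are not taken from `add_questions.py`, which may behave differently.

```python
import json


def merge_questions(qg_path, target_path, output_path, key="id", field="questions"):
    """Attach question sets from qg_path to matching records in target_path."""
    # Index the filtered question sets by record key (hypothetical field names).
    questions_by_key = {}
    with open(qg_path, encoding="utf-8") as f:
        for line in f:
            if line.strip():
                rec = json.loads(line)
                questions_by_key[rec[key]] = rec.get(field, [])

    written = 0
    with open(target_path, encoding="utf-8") as src, \
            open(output_path, "w", encoding="utf-8") as dst:
        for line in src:
            if not line.strip():
                continue
            rec = json.loads(line)
            # Assumption: records without a filtered question set are dropped.
            if rec.get(key) not in questions_by_key:
                continue
            rec[field] = questions_by_key[rec[key]]
            dst.write(json.dumps(rec, ensure_ascii=False) + "\n")
            written += 1
    return written
```

The function returns the number of records written, which gives a quick check on how many back-translated sentences survive the answerability filter.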

In [None]:
VARIANTS = ['anscheck-longformer', 'anscheck-electra']
PERTURBATIONS = ['synonym', 'alteration', 'expansion_noimpact', 'omission']
LANG_PAIRS = ['en-es', 'en-fr']

for variant in VARIANTS:
    qg_file = f"QG/qwen3-4b/questions-{variant}.jsonl"
    for pert in PERTURBATIONS:
        for lp in LANG_PAIRS:
            target = f"backtranslation/{lp}/bt-{pert}.jsonl"
            output = f"backtranslation/{lp}/bt-{pert}-{variant}.jsonl"
            print(f"  {variant} | {pert} | {lp}")
            !python {BASE / 'backtranslation/add_questions.py'} \
                --qg_file {qg_file} --target_file {target} --output_file {output}

## 4. QA on Back-Translated Perturbations

Answer the same filtered questions, this time on the **back-translated perturbed** sentences.
The comparison between source answers (step 2) and BT answers (this step) is the core of AskQE.

In [None]:
VARIANTS = ['longformer', 'electra']
PERTURBATIONS = ['synonym', 'alteration', 'expansion_noimpact', 'omission']
TARGET_LANGS = ['es', 'fr']  # en→es first, then en→fr

for lang in TARGET_LANGS:
    for variant in VARIANTS:
        for pert in PERTURBATIONS:
            inp = f"backtranslation/en-{lang}/bt-{pert}-anscheck-{variant}.jsonl"
            print(f"  {inp}")
            qa_engine.answer_questions(
                input_file=inp, pipeline_type="anscheck",
                sentence_key=f'bt_pert_{lang}', check_variant=variant)

## 5. Evaluation

Compute AskQE answer-comparison metrics on the anscheck pipeline outputs:
- **SBERT** — semantic similarity between source and BT answers
- **String comparison** — F1, CHRF, and BLEU overlap scores (token-level for F1 and BLEU, character-level for CHRF)
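
To make the F1 metric concrete, here is a minimal sketch of a SQuAD-style token-level F1 between a back-translation answer and the corresponding source answer. The function name `token_f1` and the lowercase, whitespace-based tokenization are assumptions for illustration. The repository's `string_comparison.py` may normalize and tokenize differently.

```python
from collections import Counter


def token_f1(prediction: str, reference: str) -> float:
    """Token-level F1 between two answer strings (lowercased, whitespace-split)."""
    pred_tokens = prediction.lower().split()
    ref_tokens = reference.lower().split()

    # If either side is empty, score 1.0 only when both are empty.
    if not pred_tokens or not ref_tokens:
        return float(pred_tokens == ref_tokens)

    # Multiset intersection counts each shared token at most as often
    # as it appears in both answers.
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0

    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)
```

For example, comparing the BT answer "the patient has fever" with the source answer "the patient has a fever" gives precision 1.0 and recall 0.8, so the F1 is 8/9.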

In [None]:
!python {BASE / 'evaluation/sbert/sbert.py'} --model "qwen3-4b"

In [None]:
!python {BASE / 'evaluation/string-comparison/string_comparison.py'} --model "qwen3-4b"