# BIOMQM Question Answering Pipeline

This notebook runs the question-answering (QA) pipeline for the BIOMQM dataset.

**Steps:**
1. **QA Source** - Generate answers for 489 unique source sentences (run FIRST)
2. **QA BT** - Generate answers for each language's backtranslations (can run in PARALLEL)
3. **Mapping** - Combine source and BT answers into 5216 complete rows
4. **Verification** - Check output files and row counts
5. **Evaluation** (optional) - Calculate metrics per language and severity

**Languages:** de, es, fr, ru, zh-CN

**Compatible with:** Google Colab & Kaggle

## 0. Environment Setup

In [None]:
import os
import sys
import subprocess

IN_COLAB = 'google.colab' in sys.modules
IN_KAGGLE = 'KAGGLE_KERNEL_RUN_TYPE' in os.environ

print(f'Running on: {"Colab" if IN_COLAB else "Kaggle" if IN_KAGGLE else "Local"}')

if IN_COLAB:
    from google.colab import drive
    drive.mount('/content/drive')
    
    DRIVE_CACHE_DIR = '/content/drive/MyDrive/AskQE_Models_Cache'
    os.makedirs(DRIVE_CACHE_DIR, exist_ok=True)
    
    os.environ['HF_HOME'] = DRIVE_CACHE_DIR
    os.environ['TRANSFORMERS_CACHE'] = os.path.join(DRIVE_CACHE_DIR, 'transformers')
    os.environ['SENTENCE_TRANSFORMERS_HOME'] = os.path.join(DRIVE_CACHE_DIR, 'sentence_transformers')
    
    print(f'Model cache directory: {DRIVE_CACHE_DIR}')
else:
    print('Not running in Colab - using default cache directories')

In [None]:
subprocess.run([sys.executable, '-m', 'pip', 'install', '-q', 
                'transformers', 'torch', 'accelerate', 'nltk', 
                'sentence-transformers', 'sacrebleu', 'textstat'], check=True)

REPO_URL = 'https://github.com/FedeBaldi-28/askqe-weighted-extension.git'

if IN_COLAB:
    PROJECT_ROOT = '/content/askqe'
elif IN_KAGGLE:
    PROJECT_ROOT = '/kaggle/working/askqe'
else:
    PROJECT_ROOT = os.getcwd()

if not os.path.exists(PROJECT_ROOT):
    subprocess.run(['git', 'clone', REPO_URL, PROJECT_ROOT], check=True)
    print(f'Repository cloned to: {PROJECT_ROOT}')
else:
    print(f'Repository already exists at: {PROJECT_ROOT}')

RESULTS_DIR = os.path.join(PROJECT_ROOT, 'results Qwen3B baseline')
os.makedirs(RESULTS_DIR, exist_ok=True)

---
## 1. QA Source (Run FIRST!)

Generates answers for 489 unique source sentences.
This must complete before running BT QA.

In [None]:
os.chdir(os.path.join(PROJECT_ROOT, 'QA', 'code'))
subprocess.run([sys.executable, '-u', 'qwen-3b-biomqm.py', '--mode', 'source', '--pipeline', 'vanilla'], check=True)
print('\n✓ Source QA completed!')

---
## 2. QA BT per Language

Run these cells AFTER source QA is complete.
You can run them in parallel on different GPU instances if available.

Each language generates answers for unique (src, bt_tgt) pairs.
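
As a rough illustration of what "unique (src, bt_tgt) pairs" means, the sketch below deduplicates rows by that pair. The row layout (dicts with `src`, `bt_tgt` and a `severity` field) is an assumption made for the example; the actual input format read by `qwen-3b-biomqm.py` may differ.

```python
def unique_pairs(rows):
    """Return the distinct (src, bt_tgt) pairs in first-seen order."""
    seen = set()
    pairs = []
    for row in rows:
        key = (row["src"], row["bt_tgt"])
        if key not in seen:
            seen.add(key)
            pairs.append(key)
    return pairs


# Hypothetical rows: two annotations share the same source/backtranslation pair.
rows = [
    {"src": "A", "bt_tgt": "a1", "severity": "minor"},
    {"src": "A", "bt_tgt": "a1", "severity": "major"},
    {"src": "B", "bt_tgt": "b1", "severity": "minor"},
]
print(unique_pairs(rows))  # [('A', 'a1'), ('B', 'b1')]
```

Answering each pair once and reusing the answer for every row that shares it is what would keep the number of model calls below the full row count.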

In [None]:
# German (de) - 1309 rows
os.chdir(os.path.join(PROJECT_ROOT, 'QA', 'code'))
subprocess.run([sys.executable, '-u', 'qwen-3b-biomqm.py', '--mode', 'bt', '--lang', 'de', '--pipeline', 'vanilla'], check=True)
print('\n✓ German BT QA completed!')

In [None]:
# Spanish (es) - 801 rows
os.chdir(os.path.join(PROJECT_ROOT, 'QA', 'code'))
subprocess.run([sys.executable, '-u', 'qwen-3b-biomqm.py', '--mode', 'bt', '--lang', 'es', '--pipeline', 'vanilla'], check=True)
print('\n✓ Spanish BT QA completed!')

In [None]:
# French (fr) - 757 rows
os.chdir(os.path.join(PROJECT_ROOT, 'QA', 'code'))
subprocess.run([sys.executable, '-u', 'qwen-3b-biomqm.py', '--mode', 'bt', '--lang', 'fr', '--pipeline', 'vanilla'], check=True)
print('\n✓ French BT QA completed!')

In [None]:
# Russian (ru) - 680 rows
os.chdir(os.path.join(PROJECT_ROOT, 'QA', 'code'))
subprocess.run([sys.executable, '-u', 'qwen-3b-biomqm.py', '--mode', 'bt', '--lang', 'ru', '--pipeline', 'vanilla'], check=True)
print('\n✓ Russian BT QA completed!')

In [None]:
# Chinese (zh-CN) - 1669 rows
os.chdir(os.path.join(PROJECT_ROOT, 'QA', 'code'))
subprocess.run([sys.executable, '-u', 'qwen-3b-biomqm.py', '--mode', 'bt', '--lang', 'zh-CN', '--pipeline', 'vanilla'], check=True)
print('\n✓ Chinese BT QA completed!')
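
The five cells above differ only in the `--lang` value, so when everything runs on a single instance they could be collapsed into one loop. The sketch below only builds the command lists; `build_bt_command` and `LANGUAGES` are names introduced here for illustration, while the script name and flags are the ones used in the cells above.

```python
import sys

# Language codes listed at the top of this notebook.
LANGUAGES = ["de", "es", "fr", "ru", "zh-CN"]


def build_bt_command(lang, pipeline="vanilla"):
    """Build the argument list for one BT QA run (same flags as the cells above)."""
    return [
        sys.executable, "-u", "qwen-3b-biomqm.py",
        "--mode", "bt",
        "--lang", lang,
        "--pipeline", pipeline,
    ]


for lang in LANGUAGES:
    # Print the command without the interpreter path, for a quick visual check.
    print(" ".join(build_bt_command(lang)[1:]))
```

Each list can then be passed to `subprocess.run(..., check=True)` from the `QA/code` directory, exactly as the individual cells do. Keeping the cells separate remains the better choice when the languages are spread across different GPU instances.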

---
## 3. Mapping

Combines source and BT answers to reconstruct all 5216 rows.
Run AFTER all QA jobs are complete.
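
Conceptually, the mapping step is a lookup join: each full row picks up the answers stored for its source sentence and for its (src, bt_tgt) pair. The sketch below illustrates that idea only. The function name, the field names `answers_src` and `answers_bt`, and the in-memory dict layout are assumptions, not the actual logic of `mapping_biomqm.py`.

```python
def map_answers(rows, source_answers, bt_answers):
    """Attach source and BT answers to every row.

    source_answers: dict keyed by src
    bt_answers:     dict keyed by (src, bt_tgt)
    """
    mapped = []
    for row in rows:
        src, bt = row["src"], row["bt_tgt"]
        mapped.append({
            **row,
            "answers_src": source_answers[src],
            "answers_bt": bt_answers[(src, bt)],
        })
    return mapped


# Hypothetical data: two rows share one source sentence.
rows = [
    {"src": "A", "bt_tgt": "a1"},
    {"src": "A", "bt_tgt": "a2"},
]
source_answers = {"A": ["yes"]}
bt_answers = {("A", "a1"): ["yes"], ("A", "a2"): ["no"]}

mapped = map_answers(rows, source_answers, bt_answers)
print(len(mapped))  # 2
```

A missing key raises `KeyError` here, which surfaces an incomplete QA run immediately rather than silently dropping rows.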

In [None]:
os.chdir(os.path.join(PROJECT_ROOT, 'QA', 'code'))
subprocess.run([sys.executable, 'mapping_biomqm.py', '--pipeline', 'vanilla'], check=True)
print('\n✓ Mapping completed!')

---
## 4. Verification

Check output files and row counts.

In [None]:
import os

results_dir = os.path.join(PROJECT_ROOT, 'results Qwen3B baseline', 'QA', 'biomqm')

# Check unique answers
print("=" * 50)
print("UNIQUE ANSWERS")
print("=" * 50)
unique_dir = os.path.join(results_dir, "unique")
if os.path.exists(unique_dir):
    for f in sorted(os.listdir(unique_dir)):
        path = os.path.join(unique_dir, f)
        # Explicit encoding avoids platform-dependent decoding errors on non-ASCII text
        with open(path, 'r', encoding='utf-8') as file:
            count = sum(1 for _ in file)
        print(f"{f}: {count} rows")

# Check mapped files
print("\n" + "=" * 50)
print("MAPPED FILES (should sum to 5216)")
print("=" * 50)
mapped_dir = os.path.join(results_dir, "mapped")
total = 0
if os.path.exists(mapped_dir):
    for f in sorted(os.listdir(mapped_dir)):
        path = os.path.join(mapped_dir, f)
        # Explicit encoding avoids platform-dependent decoding errors on non-ASCII text
        with open(path, 'r', encoding='utf-8') as file:
            count = sum(1 for _ in file)
        print(f"{f}: {count} rows")
        total += count
    print(f"\nTOTAL: {total} rows")
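
The per-language row counts noted in the BT QA cells (1309, 801, 757, 680, 1669) add up to the expected 5216, so the check can be made explicit. The sketch below assumes those counts also apply to the mapped output per language. `EXPECTED_ROWS` and `check_counts` are names introduced here, and the `counts` dict would have to be filled from the file counts printed above.

```python
# Row counts taken from the comments in the BT QA cells.
EXPECTED_ROWS = {"de": 1309, "es": 801, "fr": 757, "ru": 680, "zh-CN": 1669}
EXPECTED_TOTAL = 5216


def check_counts(counts, expected=EXPECTED_ROWS):
    """Return (lang, found, expected) for every language whose count is off."""
    return [
        (lang, counts.get(lang), n)
        for lang, n in expected.items()
        if counts.get(lang) != n
    ]


# The per-language figures are consistent with the stated total.
assert sum(EXPECTED_ROWS.values()) == EXPECTED_TOTAL

# Hypothetical result where the Chinese file is short by 69 rows.
print(check_counts({"de": 1309, "es": 801, "fr": 757, "ru": 680, "zh-CN": 1600}))
# [('zh-CN', 1600, 1669)]
```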

---
## 5. Evaluation (Optional)

Runs the string-comparison evaluation to calculate metrics per language and severity.
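
The notebook does not state which metrics `string_comparison_biomqm.py` computes. For orientation only, the sketch below shows two common string-comparison measures, exact match and token-level F1, applied to a source answer and a BT answer. Both functions are illustrative assumptions and are not taken from the repository.

```python
from collections import Counter


def exact_match(pred, ref):
    """True when the two answers are identical after trimming and lowercasing."""
    return pred.strip().lower() == ref.strip().lower()


def token_f1(pred, ref):
    """Harmonic mean of token precision and recall on whitespace tokens."""
    p = pred.lower().split()
    r = ref.lower().split()
    if not p or not r:
        # Two empty answers count as a match; one empty answer does not.
        return float(p == r)
    overlap = sum((Counter(p) & Counter(r)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(p)
    recall = overlap / len(r)
    return 2 * precision * recall / (precision + recall)


# Hypothetical answer pair: the BT answer drops one token.
print(exact_match("Yes ", "yes"))
print(round(token_f1("the patient has fever", "the patient has a fever"), 3))
```

A low score between the source answer and the BT answer suggests the translation changed the information being asked about, which is the signal the evaluation aggregates per language and severity.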

In [None]:
os.chdir(os.path.join(PROJECT_ROOT, 'evaluation', 'string-comparison'))
subprocess.run([sys.executable, 'string_comparison_biomqm.py'], check=True)
print('\n✓ String comparison completed!')

---
## Pipeline Complete!