# Open-Set LID Prototype – Colab

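This notebook trains a multilingual language-identification (LID) classifier on the SEEN languages, extracts logits and features for the SEEN test split and the UNSEEN languages, then plots out-of-distribution (OOD) precision-recall curves for three scores (maximum softmax probability (MSP), Energy, Mahalanobis) and a reliability diagram with expected calibration error (ECE).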
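
As a rough illustration of two of the OOD scores named above, the sketch below computes MSP and energy scores from a matrix of logits with NumPy. The function names, the toy logits, and the "higher means more OOD" sign convention are assumptions made for this example; the repository's plotting script may define them differently.

```python
import numpy as np

def softmax(logits):
    """Row-wise softmax, shifted by the row max for numerical stability."""
    z = logits - logits.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def msp_score(logits):
    """OOD score from maximum softmax probability: 1 - max_k p(k | x)."""
    return 1.0 - softmax(logits).max(axis=1)

def energy_score(logits, temperature=1.0):
    """Energy E(x) = -T * logsumexp(logits / T); higher energy = more OOD."""
    z = logits / temperature
    m = z.max(axis=1, keepdims=True)
    lse = m[:, 0] + np.log(np.exp(z - m).sum(axis=1))
    return -temperature * lse

# Toy logits over 5 SEEN classes (illustrative values, not model output).
logits = np.array([
    [8.0, 0.5, 0.2, 0.1, 0.1],  # one dominant class: looks in-distribution
    [1.1, 1.0, 0.9, 1.0, 1.0],  # flat logits: looks out-of-distribution
])
msp = msp_score(logits)
energy = energy_score(logits)
print("MSP score:", msp)
print("Energy score:", energy)
```

Both scores depend only on the logits, so they can be computed after the fact from saved predictions without rerunning the model.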

In [None]:
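# Report the GPU if one is attached; otherwise note that this is a CPU runtime.
!nvidia-smi || echo "CPU runtime"
%cd /content
REPO_URL = "https://github.com/Vabi-tech/open-set-lid-prototype.git"
# Start from a fresh clone so reruns do not pick up stale files.
!rm -rf open-set-lid-prototype && git clone $REPO_URL
%cd /content/open-set-lid-prototype
!pip install -q -r requirements.txt
# Disable Weights & Biases logging and Hugging Face Hub telemetry.
%env WANDB_DISABLED=true
%env HF_HUB_DISABLE_TELEMETRY=1
# Make the `src` package importable for the `python -m src.*` commands below.
%env PYTHONPATH=/content/open-set-lid-prototype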

## 1) Train on SEEN languages

In [None]:
!python -m src.train_classifier \
  --dataset papluca \
  --seen_langs en,es,de,fr,it \
  --output_dir models/lid-seen \
  --epochs 3 --batch_size 16 --lr 3e-5

## 2) Extract logits/features for SEEN test + UNSEEN languages

In [None]:
!python -m src.extract_logits \
  --dataset papluca \
  --model_dir /content/open-set-lid-prototype/models/lid-seen \
  --seen_langs en,es,de,fr,it \
  --unseen_langs pt,ru,sv \
  --max_per_lang 2000 \
  --out_path outputs/preds.jsonl
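
The extracted features are what a Mahalanobis-style detector typically consumes. The sketch below shows one common formulation, assuming class-conditional Gaussians with a shared covariance fitted on SEEN features, and scoring each sample by its squared distance to the nearest class mean. The helper names and the synthetic features are hypothetical; this does not read `outputs/preds.jsonl`, whose schema is not described here, and the repository's own implementation may differ.

```python
import numpy as np

def fit_gaussians(feats, labels):
    """Fit per-class means and a shared (tied) covariance on SEEN features."""
    classes = np.unique(labels)
    means = np.stack([feats[labels == c].mean(axis=0) for c in classes])
    # Center every sample on its own class mean before pooling the covariance.
    centered = feats - means[np.searchsorted(classes, labels)]
    cov = centered.T @ centered / len(feats)
    precision = np.linalg.pinv(cov)  # pseudo-inverse tolerates a singular cov
    return means, precision

def mahalanobis_score(feats, means, precision):
    """Squared Mahalanobis distance to the closest class mean (higher = more OOD)."""
    diffs = feats[:, None, :] - means[None, :, :]  # shape (N, C, D)
    d2 = np.einsum("ncd,de,nce->nc", diffs, precision, diffs)
    return d2.min(axis=1)

# Synthetic stand-ins for encoder features (not real model output).
rng = np.random.default_rng(0)
seen = np.concatenate([rng.normal(0, 1, (200, 4)), rng.normal(5, 1, (200, 4))])
labels = np.repeat([0, 1], 200)
unseen = rng.normal(20, 1, (50, 4))

means, precision = fit_gaussians(seen, labels)
seen_scores = mahalanobis_score(seen, means, precision)
unseen_scores = mahalanobis_score(unseen, means, precision)
print("mean score, seen:", seen_scores.mean())
print("mean score, unseen:", unseen_scores.mean())
```

Unlike MSP and energy, this score needs the feature vectors and statistics fitted on SEEN data, which is why the extraction step saves features as well as logits.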

## 3) Plot OOD PR curves + reliability diagram

In [None]:
!python scripts/plot_ood_and_calibration.py \
  --preds outputs/preds.jsonl \
  --out_dir outputs

from IPython.display import Image, display

# Show the figures written by the plotting script.
for name in ("pr_msp", "pr_energy", "pr_mahalanobis", "reliability"):
    display(Image(f"outputs/{name}.png"))
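
For reference, the ECE summarised by a reliability diagram can be sketched as follows: predictions are grouped into equal-width confidence bins, and the gap between accuracy and mean confidence in each bin is averaged, weighted by bin size. The function name, the choice of 10 bins, and the toy inputs are assumptions for illustration; the plotting script may bin differently.

```python
import numpy as np

def expected_calibration_error(confidences, correct, n_bins=10):
    """ECE = sum over bins of (bin fraction) * |accuracy - mean confidence|."""
    confidences = np.asarray(confidences, dtype=float)
    correct = np.asarray(correct, dtype=float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        # Half-open bins (lo, hi]; empty bins contribute nothing.
        mask = (confidences > lo) & (confidences <= hi)
        if mask.any():
            gap = abs(correct[mask].mean() - confidences[mask].mean())
            ece += mask.mean() * gap
    return ece

# Toy case: the model always reports 0.9 confidence but is right half the time.
conf = np.full(10, 0.9)
hits = np.array([1, 1, 1, 1, 1, 0, 0, 0, 0, 0])
ece = expected_calibration_error(conf, hits)
print("ECE:", ece)
```

A perfectly calibrated model has an ECE of zero, which corresponds to a reliability curve lying on the diagonal.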