Authors:
Jonathan Mutal×, Perla Al Almaoui×, Simon Hengchen×+, and Pierrette Bouillon×
Affiliations:
×TIM, University of Geneva
+iguanodon.ai
Paper:
Aladdin-FTI @ AMIYA: Three Wishes for Arabic NLP: Fidelity, Diglossia, and Multidialectal Generation
If you use (part(s) of) this code, or models, in your research, please cite the following paper:
@inproceedings{mutal2026aladdinfti,
title = {Aladdin-FTI @ AMIYA: Three Wishes for Arabic NLP: Fidelity, Diglossia, and Multidialectal Generation},
author = {Mutal, Jonathan and Al Almaoui, Perla and Hengchen, Simon and Bouillon, Pierrette},
booktitle = {Proceedings of the AMIYA Shared Task, co-located with VarDial at EACL 2026},
year = {2026},
address = {Rabat, Morocco},
publisher = {Association for Computational Linguistics},
}
Models:
🤗 Available on Hugging Face
Utilities to fine-tune, generate, and evaluate causal LLMs for machine translation (MT) and dialect experiments.
This repository is organized around:
- a Typer CLI with YAML configuration files (LoRA + optional quantization, custom eval hooks) (`scripts/python/finetune/instruct.py`)
- batch generation, automatic evaluation (ChrF++, SpBLEU, and dialect-ID-based fidelity scores), and MBR ranking
- Fine-tuning
- Full fine-tuning (TRL / SFTTrainer) or LoRA/PEFT
- YAML configs for reproducible runs (training hyperparams, dataset lists, metrics, checkpoint selection)
- Generation
- Single prompt generation
- Batch generation from a text file
- Evaluation
- MT metrics: ChrF++ and SpBLEU (SacreBLEU)
- Dialect/fidelity scoring via ADI-family models (ALDI/NADI) + fastText language-ID helpers
- Ranking
- MBR-style ranking script (`scripts/python/rank/mbr_rank.py`)
- HPC-friendly
- SLURM scripts (`scripts/slurm/**`) + bash wrappers (`scripts/bash/**`)
```
.
├── configs.py             # Small helper configs (LoRA, SFTConfig, prompt templates)
├── configs/               # YAML training configs (instruct pipeline)
│   └── instruct/*.yaml
├── scripts/
│   ├── python/
│   │   ├── finetune/      # fine-tuning entrypoints
│   │   ├── generate/      # generation entrypoints
│   │   ├── evaluate/      # evaluation (ChrF/SpBLEU + dialect/fidelity)
│   │   ├── preprocess/    # dataset preparation helpers
│   │   └── rank/          # MBR ranking
│   ├── slurm/             # SLURM job scripts (call the python entrypoints)
│   └── bash/              # wrappers to submit batches
└── uv.lock                # dependency lockfile (uv)
```
- Python: 3.10+
- PyTorch: 2.x
- CUDA GPU recommended
Key libraries (see `uv.lock`): transformers, datasets, trl, peft, accelerate, sacrebleu, fasttext, huggingface_hub, typer, and pyyaml (for the YAML pipeline).
The original README targets an HPC module environment. Adapt the module load lines to your cluster.
```bash
ml load GCCcore/11.3.0 Python/3.10.4 CUDA/12.8.0
curl -LsSf https://astral.sh/uv/install.sh | sh
uv venv .env
source .env/bin/activate
uv sync
uv pip install torch torchvision --index-url https://download.pytorch.org/whl/cu128
./scripts/bash/health.sh
# or
uv run scripts/python/health.py
```

The instruct.py pipeline reads a YAML config (see `configs/instruct/*.yaml`) and supports:
- LoRA + optional quantization
- multiple datasets (train/eval) loaded from disk (`datasets.load_from_disk`)
- structured logging (log file + metrics JSONL)
- evaluation hooks + “best checkpoint” tracking
Example:
```bash
uv run scripts/python/finetune/instruct.py --help
uv run scripts/python/finetune/instruct.py train configs/instruct/mt-fidelity-all-small-template-llama-8B.yaml
```

Single-prompt generation:

```bash
uv run scripts/python/generate/generate.py generate "Translate: Bonjour" --model-path SmolLM3-3B-aladdinFTI-sft-trl --method trl
```

Batch generation:

```bash
uv run scripts/python/generate/generate.py generate-batch data/prompts.txt --model-path SmolLM3-3B-aladdinFTI-sft-trl --method trl --batch-size 16 --max-new-tokens 256
```

The evaluator supports .txt, .out, and .csv inputs:
- MT metrics (SacreBLEU):
- ChrF++
- SpBLEU (optional) (`BLEU(tokenize="flores200")`)
- Fidelity / dialect scores:
- ADI-family classification models (ALDI/NADI) + mapping utilities in `scripts/python/evaluate/maps.py`
Example (see help for exact flags):
```bash
uv run scripts/python/evaluate/evaluator.py --help
```

Use `mbr_rank.py` to rank candidate generations using reference-free/metric-based selection.

```bash
uv run scripts/python/rank/mbr_rank.py --help
```

Cluster-friendly wrappers exist in:
- `scripts/bash/rank/*`
- `scripts/slurm/rank/mbr_rank.sh`
You can submit jobs directly with the provided SLURM scripts.
YAML training (example):

```bash
sbatch scripts/slurm/finetune/instruct.sh configs/instruct/mt-fidelity-all-small-template-llama-8B.yaml
```

Batch generation:

```bash
sbatch scripts/slurm/generate/generate-batch.sh path/to/prompts.txt SmolLM3-3B-aladdinFTI-sft-trl trl
```