MULTYPO provides realistic, keyboard-based typographical noise across 12+ languages, enabling robust evaluation, stress testing, and synthetic data generation for NLP and LLM systems.
It was originally designed for multilingual robustness research, and is now packaged for public use.
- Multilingual support (English, German, French, Russian, Greek, Arabic, Hindi, Bengali, Tamil, Armenian, Georgian, Hebrew)
- Keyboard-aware typo modeling
  - replace: wrong nearby key
  - insert: accidental double press
  - delete: skipped key
  - transpose: swapped left/right-hand keys
- Horizontal + vertical keyboard neighbors with configurable weights
- Configurable typo distributions
- Built-in excluding sets (numbers, number words, etc.)
- Register custom keyboards and ignoring sets
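
The four typo operations listed above can be illustrated with a small standalone sketch. This is not MULTYPO's implementation: the `NEIGHBORS` map, the `LEFT`/`RIGHT` hand sets, and the `apply_typo` helper are hypothetical names, and reading "transpose" as "swap two adjacent characters typed by different hands" is an assumption based on the feature description.

```python
import random

# Toy neighbor map for a few QWERTY keys (assumption: a real layout covers every key).
NEIGHBORS = {"a": "qsz", "e": "wrd", "s": "adw", "t": "ryg"}

# Assumed hand assignment for a QWERTY layout.
LEFT = set("qwertasdfgzxcvb")
RIGHT = set("yuiophjklnm")


def different_hands(a: str, b: str) -> bool:
    """Return True if the two characters are typed by different hands."""
    a, b = a.lower(), b.lower()
    return (a in LEFT and b in RIGHT) or (a in RIGHT and b in LEFT)


def apply_typo(word: str, op: str, pos: int, rng: random.Random) -> str:
    """Apply one typo operation to `word` at index `pos` (illustrative only)."""
    ch = word[pos]
    if op == "replace":
        # Wrong nearby key: substitute a neighbor, if the key is in the toy map.
        candidates = NEIGHBORS.get(ch.lower(), "")
        if not candidates:
            return word
        return word[:pos] + rng.choice(candidates) + word[pos + 1:]
    if op == "insert":
        # Accidental double press: duplicate the character.
        return word[:pos] + ch + word[pos:]
    if op == "delete":
        # Skipped key: drop the character.
        return word[:pos] + word[pos + 1:]
    if op == "transpose":
        # Swap with the next character only when the two keys are on different hands.
        if pos + 1 >= len(word) or not different_hands(ch, word[pos + 1]):
            return word
        return word[:pos] + word[pos + 1] + ch + word[pos + 2:]
    raise ValueError(f"unknown operation: {op}")


rng = random.Random(0)
print(apply_typo("test", "insert", 1, rng))     # teest
print(apply_typo("test", "delete", 1, rng))     # tst
print(apply_typo("the", "transpose", 0, rng))   # hte
```

Words for which an operation does not apply (no known neighbor, or both keys on the same hand) are returned unchanged in this sketch; the library may handle such cases differently.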
pip install multypo

from multypo import generate_typos

text = "This is an example sentence. And here is another one."
noisy = generate_typos(
    text=text,
    language="english",
    typo_rate=0.25,
)
print(noisy)

Example output:

Thi is an ezample sentence. Amd here is anoter one.
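
When using noisy text for evaluation, it can help to check how much noise a given output actually contains. The sketch below is independent of MULTYPO: it compares the clean and noisy sentences from the example using a plain Levenshtein distance and a per-word change count. The helper names `levenshtein` and `changed_word_fraction` are hypothetical, and how `typo_rate` relates to either measure is not specified here, so treat them only as sanity checks.

```python
def levenshtein(a: str, b: str) -> int:
    """Character-level edit distance (insertions, deletions, substitutions)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def changed_word_fraction(clean: str, noisy: str) -> float:
    """Fraction of whitespace-separated words that differ, position by position."""
    clean_words, noisy_words = clean.split(), noisy.split()
    if len(clean_words) != len(noisy_words):
        raise ValueError("word counts differ; positional comparison is not meaningful")
    changed = sum(c != n for c, n in zip(clean_words, noisy_words))
    return changed / len(clean_words)


clean = "This is an example sentence. And here is another one."
noisy = "Thi is an ezample sentence. Amd here is anoter one."

print(levenshtein(clean, noisy))            # 4
print(changed_word_fraction(clean, noisy))  # 0.4
```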

from multypo import MultiTypoGenerator

gen = MultiTypoGenerator(
    language="english",
    use_excluding_set=True,
    typo_distribution={"delete": 0.2, "insert": 0.2, "replace": 0.4, "transpose": 0.2},
    horizontal_vs_vertical=(2.0, 1.0),
)
typoed = gen.insert_typos_in_text(
    "This is a test sentence. Second sentence here.",
    typo_rate=0.3,
)
print(typoed)

from multypo import register_keyboard_layout

custom_keyboard = [
    list("qwertyuiop"),
    list("asdfghjkl"),
    list("zxcvbnm"),
]
register_keyboard_layout(
    lang_code="en-custom",
    language="english-custom",
    keyboard_rows=custom_keyboard,
    left_keys=list("qwertasdfgzxcvb"),
    right_keys=list("yuiophjklbnm"),
    ignoring_set={"million", "billion", "42"},
)

Use like any other language:

generate_typos("This is custom.", language="english-custom", typo_rate=0.3)

Set global defaults:

from multypo import set_default_typo_distribution

set_default_typo_distribution({
    "delete": 0.1,
    "insert": 0.2,
    "replace": 0.5,
    "transpose": 0.2,
})

Or override per call:

noisy = generate_typos(
    text="Example",
    language="english",
    typo_rate=0.3,
    typo_distribution={"delete": 0.2, "insert": 0.2, "replace": 0.4, "transpose": 0.2},
)

from multypo import get_supported_languages

print(get_supported_languages())

Of course, you can always register new languages and their custom keyboard layouts!
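
When designing a custom layout, it may help to see how horizontal and vertical neighbors could be derived from a list of keyboard rows and weighted against each other. The sketch below is an assumption-laden illustration, not the library's internals: `weighted_neighbors` is a hypothetical helper, "vertical" is approximated as the same column index in the adjacent row (ignoring physical row stagger), and the pair `(2.0, 1.0)` is read as (horizontal weight, vertical weight).

```python
def weighted_neighbors(rows, key, horizontal_vs_vertical=(2.0, 1.0)):
    """Return {neighbor: probability} for `key`, given keyboard rows (illustrative)."""
    h_weight, v_weight = horizontal_vs_vertical

    # Locate the key in the layout.
    for r, row in enumerate(rows):
        if key in row:
            c = row.index(key)
            break
    else:
        raise KeyError(f"{key!r} is not on this keyboard")

    weights = {}
    # Horizontal neighbors: keys directly left and right in the same row.
    for dc in (-1, 1):
        if 0 <= c + dc < len(row):
            weights[row[c + dc]] = h_weight
    # Vertical neighbors: same column index in the rows above and below.
    for dr in (-1, 1):
        rr = r + dr
        if 0 <= rr < len(rows) and c < len(rows[rr]):
            weights[rows[rr][c]] = v_weight

    total = sum(weights.values())
    if total == 0:
        return {}
    # Normalize so the weights form a probability distribution.
    return {k: w / total for k, w in weights.items()}


rows = [list("qwertyuiop"), list("asdfghjkl"), list("zxcvbnm")]
print(weighted_neighbors(rows, "s"))
print(weighted_neighbors(rows, "p"))
```

With these weights, the horizontal neighbors of `s` (`a` and `d`) are each twice as likely as its vertical neighbors (`w` and `x`), while `p` has only `o` as a neighbor because the row below is shorter.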
MIT License.
@misc{liu2025evaluating,
  title={Evaluating Robustness of Large Language Models Against Multilingual Typographical Errors},
  author={Yihong Liu and Raoyuan Zhao and Lena Altinger and Hinrich Schütze and Michael A. Hedderich},
  year={2025},
  eprint={2510.09536},
  archivePrefix={arXiv},
  primaryClass={cs.CL},
  url={https://arxiv.org/abs/2510.09536},
}