MMTIT-Bench

A Multilingual and Multi-Scenario Benchmark with Cognition–Perception–Reasoning Guided Text-Image Machine Translation

Chinese version (中文版) • Paper • GitHub • HuggingFace

Overview

MMTIT-Bench is a human-verified benchmark for end-to-end Text-Image Machine Translation (TIMT). It contains 1,400 images covering 14 languages other than Chinese and English and a range of real-world scenarios, each annotated with reference translations into both Chinese and English.

We also propose CPR-Trans (Cognition–Perception–Reasoning for Translation), a reasoning-oriented data paradigm that unifies scene cognition, text perception, and translation reasoning within a structured chain-of-thought framework.

Benchmark Statistics

Item	Details
Total Images	1,400
Languages	14 (AR, DE, ES, FR, ID, IT, JA, KO, MS, PT, RU, TH, TR, VI)
Translation Directions	Other→Chinese, Other→English
Scenarios	Documents, Menus, Books, Attractions, Posters, Commodities, etc.
Annotation	Human-verified OCR + Bilingual translations

Data Format

Directory Structure

MMTIT-Bench/
├── README.md
├── README_ZH.md
├── annotation.jsonl        # Benchmark annotations
├── images.zip              # Benchmark images
├── eval_comet_demo.py      # COMET evaluation script
└── prediction_demo.jsonl   # Example prediction file
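To locate an image from its `image_id`, one option is to extract `images.zip` and index the files by basename. The following is a minimal Python sketch. The archive's internal folder layout is not documented here, so the sketch does not assume one. The function name `extract_images` is hypothetical and not part of the repository.

```python
import zipfile
from pathlib import Path


def extract_images(zip_path: str, out_dir: str) -> dict[str, Path]:
    """Extract images.zip and map each image filename to its extracted path.

    Assumption: the folder layout inside images.zip is unknown, so files are
    indexed by basename regardless of which subfolder they land in.
    """
    out = Path(out_dir)
    with zipfile.ZipFile(zip_path) as zf:
        zf.extractall(out)  # creates out_dir if it does not exist
    index: dict[str, Path] = {}
    for p in out.rglob("*"):
        if p.is_file() and p.suffix.lower() in {".jpg", ".jpeg", ".png"}:
            index[p.name] = p
    return index
```

Usage: `images = extract_images("images.zip", "images")`, then `images["Korea_Menu_20843.jpg"]` gives the path to pass to your model.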

Annotation (`annotation.jsonl`)

Each line is a JSON object:

{
    "image_id": "Korea_Menu_20843.jpg",
    "parsing_anno": "멜로우스트리트\n\n위치: 서울특별시 관악구...",
    "translation_zh": "梅尔街\n\n位置：首尔特别市 冠岳区...",
    "translation_en": "Mellow Street\n\nLocation: 1st Floor, 104 Gwanak-ro..."
}

Field	Description
`image_id`	Image filename, formatted as `{Language}_{Scenario}_{ID}.jpg`
`parsing_anno`	OCR text parsing annotation (source language)
`translation_zh`	Chinese translation
`translation_en`	English translation
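The next sketch reads `annotation.jsonl` and splits `image_id` into its three parts. It assumes the language prefix contains no underscore, and it takes the ID from after the last underscore. Note that the example prefix is `Korea`, so prefixes may not match the two-letter codes in the statistics table. All function names are hypothetical.

```python
import json
from collections import Counter


def load_jsonl(path: str) -> list[dict]:
    """Read a JSONL file (one JSON object per line), skipping blank lines."""
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


def parse_image_id(image_id: str) -> tuple[str, str, str]:
    """Split '{Language}_{Scenario}_{ID}.jpg' into (language, scenario, id).

    Assumes the language prefix has no underscore; the ID is taken after the
    last underscore, so a scenario name containing '_' would stay intact.
    """
    stem = image_id.rsplit(".", 1)[0]
    head, sample_id = stem.rsplit("_", 1)
    language, scenario = head.split("_", 1)
    return language, scenario, sample_id


def count_by_language(records: list[dict]) -> Counter:
    """Count annotation records per language prefix of image_id."""
    return Counter(parse_image_id(r["image_id"])[0] for r in records)
```

For example, `count_by_language(load_jsonl("annotation.jsonl"))` gives a per-prefix breakdown of the 1,400 images.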

Prediction File

Your prediction file should be a JSONL file with one JSON object per line, containing the following fields:

{"image_id": "Korea_Menu_20843.jpg", "pred": "Your model's translation output"}
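The next sketch writes a prediction file and checks that every benchmark image has a prediction. `json.dumps` escapes newlines as `\n`, so multi-line translations stay on a single JSONL line. `ensure_ascii=False` keeps non-Latin scripts readable. The helper names are illustrative and not part of the repository.

```python
import json


def write_predictions(preds: dict[str, str], path: str) -> None:
    """Write one {"image_id": ..., "pred": ...} object per line."""
    with open(path, "w", encoding="utf-8") as f:
        for image_id, pred in preds.items():
            record = {"image_id": image_id, "pred": pred}
            f.write(json.dumps(record, ensure_ascii=False) + "\n")


def missing_ids(pred_path: str, anno_path: str) -> set[str]:
    """Return annotation image_ids that have no prediction (a sanity check)."""
    def ids(path: str) -> set[str]:
        with open(path, encoding="utf-8") as f:
            return {json.loads(line)["image_id"] for line in f if line.strip()}
    return ids(anno_path) - ids(pred_path)
```

Running `missing_ids("your_prediction.jsonl", "annotation.jsonl")` before evaluation should return an empty set if every image is covered.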

Evaluation

We use COMET (Unbabel/wmt22-comet-da), a learned neural metric that scores each translation against its source text and reference translation, for evaluation.
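The next sketch shows the scoring step. It assembles COMET's `src`/`mt`/`ref` triplets and scores them through the `unbabel-comet` Python API (`download_model`, `load_from_checkpoint`, `predict`). Which annotation field `eval_comet_demo.py` uses as the source is an assumption here (`parsing_anno`). `build_comet_inputs` and `comet_score` are hypothetical names. The COMET import is deferred, so the triplet-building part runs without the package installed.

```python
def build_comet_inputs(annotations: list[dict], predictions: list[dict],
                       direction: str) -> list[dict]:
    """Pair each prediction with its source text and the reference for `direction`.

    Assumption: src = parsing_anno; ref = translation_zh (other2zh) or
    translation_en (other2en). The official script may differ in details.
    """
    ref_field = {"other2zh": "translation_zh", "other2en": "translation_en"}[direction]
    by_id = {a["image_id"]: a for a in annotations}
    samples = []
    for p in predictions:
        anno = by_id[p["image_id"]]  # KeyError if the ID is not in the benchmark
        samples.append({"src": anno["parsing_anno"], "mt": p["pred"], "ref": anno[ref_field]})
    return samples


def comet_score(samples: list[dict], batch_size: int = 16, gpus: int = 0):
    """Score samples with wmt22-comet-da (needs `pip install unbabel-comet`)."""
    from comet import download_model, load_from_checkpoint  # deferred import
    model = load_from_checkpoint(download_model("Unbabel/wmt22-comet-da"))
    output = model.predict(samples, batch_size=batch_size, gpus=gpus)  # gpus=0 -> CPU
    return output.system_score, output.scores  # corpus score, per-sample scores
```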

Install

pip install unbabel-comet

Run

# Other → Chinese
python eval_comet_demo.py \
    --prediction your_prediction.jsonl \
    --annotation annotation.jsonl \
    --direction other2zh \
    --batch_size 16 --gpus 1

# Other → English
python eval_comet_demo.py \
    --prediction your_prediction.jsonl \
    --annotation annotation.jsonl \
    --direction other2en \
    --batch_size 16 --gpus 1

Arguments

Argument	Default	Description
`--prediction`	(required)	Path to your prediction JSONL
`--annotation`	`annotation.jsonl`	Path to benchmark annotations
`--direction`	(required)	`other2zh` or `other2en`
`--batch_size`	`16`	Batch size for inference
`--gpus`	`0`	Number of GPUs (0 = CPU)
`--output`	`comet_results_{direction}.jsonl`	Output path for per-sample scores
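The argument table maps onto `argparse`. The following is only a sketch of an equivalent interface. It assumes the defaults listed above, including the direction-dependent output path. It is not taken from `eval_comet_demo.py`, whose implementation may differ.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical argparse mirror of the documented command-line options."""
    p = argparse.ArgumentParser(description="COMET evaluation for MMTIT-Bench")
    p.add_argument("--prediction", required=True, help="Path to your prediction JSONL")
    p.add_argument("--annotation", default="annotation.jsonl", help="Path to benchmark annotations")
    p.add_argument("--direction", required=True, choices=["other2zh", "other2en"])
    p.add_argument("--batch_size", type=int, default=16, help="Batch size for inference")
    p.add_argument("--gpus", type=int, default=0, help="Number of GPUs (0 = CPU)")
    p.add_argument("--output", default=None, help="Output path for per-sample scores")
    return p


def parse(argv: list[str]) -> argparse.Namespace:
    args = build_parser().parse_args(argv)
    if args.output is None:
        # Fill in the direction-dependent default from the argument table.
        args.output = f"comet_results_{args.direction}.jsonl"
    return args
```

Here `--output` is resolved after parsing because its default depends on the value of `--direction`.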

Citation

@misc{li2026mmtitbench,
      title={MMTIT-Bench: A Multilingual and Multi-Scenario Benchmark with Cognition-Perception-Reasoning Guided Text-Image Machine Translation},
      author={Gengluo Li and Chengquan Zhang and Yupu Liang and Huawen Shen and Yaping Zhang and Pengyuan Lyu and Weinong Wang and Xingyu Wan and Gangyan Zeng and Han Hu and Can Ma and Yu Zhou},
      year={2026},
      journal={arXiv preprint arXiv:2603.23896},
      url={https://arxiv.org/abs/2603.23896},
}

License

This benchmark is released for research purposes only.
