
TextPecker

TextPecker Website · Paper on arXiv · Model · Demo · TextPecker-1.5M Dataset

TextPecker: Rewarding Structural Anomaly Quantification for Enhancing Visual Text Rendering

Hanshen Zhu1,*, Yuliang Liu1, Xuecheng Wu2, An-Lan Wang2, Hao Feng2, Dingkang Yang2, Chao Feng2, Can Huang2, Jingqun Tang2,†, Xiang Bai1,✉

1 Huazhong University of Science & Technology   2 ByteDance   † Project Lead   ✉ Corresponding Author

Abstract

Visual Text Rendering (VTR) remains a critical challenge in text-to-image generation: even advanced models frequently produce text with structural anomalies such as distortion, blurriness, and misalignment. However, we find that leading MLLMs and specialist OCR models largely fail to perceive these structural anomalies, creating a critical bottleneck for both VTR evaluation and RL-based optimization.
As a result, even state-of-the-art generators (e.g., Seedream4.0, Qwen-Image) still struggle to render structurally faithful text. To address this, we propose TextPecker, a plug-and-play, structural-anomaly-perceptive RL strategy that mitigates noisy reward signals and works with any text-to-image generator. To enable this capability, we construct a recognition dataset with character-level structural-anomaly annotations and develop a stroke-editing synthesis engine to expand structural-error coverage. Experiments show that TextPecker consistently improves diverse text-to-image models; even on the well-optimized Qwen-Image, it yields average gains of 4% in structural fidelity and 8.7% in semantic alignment for Chinese text rendering, establishing a new state of the art in high-fidelity VTR. Our work fills a gap in VTR optimization, providing a foundational step toward reliable, structurally faithful visual text generation.

📢 News

  • Feb 24, 2026: Our arXiv paper is now publicly available.
  • Feb 21, 2026: TextPecker has been accepted to CVPR 2026.
  • Feb 18, 2026: We released the LoRA weights for the TextPecker-optimized generative models: SD3.5-M, Flux.1-dev, and Qwen-Image.
  • Feb 15, 2026: We released the official website, model, and dataset for TextPecker.

🔥 Quick Start

Training, deployment, and evaluation of TextPecker are all built upon ms-swift. We currently provide two versions of model checkpoints: TextPecker-8B-Qwen3VL and TextPecker-8B-InternVL3. For detailed environment setup and model deployment/testing instructions, please refer to the official documentation.

1️⃣ Environment Setup

```shell
git clone https://github.com/CIawevy/TextPecker.git
cd TextPecker/train
git clone https://github.com/modelscope/ms-swift.git
cd ms-swift
conda create -n TextPecker python=3.11.13 -y
conda activate TextPecker
pip install -e .
cd ..
sh install_all.sh
```

2️⃣ Download Models & Dataset

We have uploaded our models and datasets to Hugging Face, and you can download them using the provided scripts. Modify the parameters (e.g., local paths, HF token) in scripts/download_models.sh and scripts/download_dataset.sh as needed, then run the corresponding script with bash. Additionally, refer to DATA to use our data engine to synthesize your own datasets if needed.

3️⃣ Deployment (See TRAIN for more details.)

Example

```shell
bash train/deploy_textpecker.sh
```

4️⃣ Demo

After deployment, you can run the following command to try our demo:

```shell
python eval/TextPecker_eval/demo.py
```
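Beyond the demo script, ms-swift's deployment serves an OpenAI-compatible HTTP endpoint, so the evaluator can also be queried programmatically. The sketch below is illustrative only: the endpoint address, model name, and prompt wording are assumptions, not the official evaluation protocol (see the demo script for that).

```python
import base64
import json
from urllib.request import Request, urlopen

# Default ms-swift deploy address; adjust to match your deployment (assumed).
ENDPOINT = "http://127.0.0.1:8000/v1/chat/completions"


def build_eval_request(image_path: str, expected_text: str,
                       model: str = "TextPecker-8B-Qwen3VL") -> dict:
    """Build an OpenAI-compatible chat payload asking the evaluator to
    assess rendered text in an image.

    The prompt wording here is a placeholder, not the official prompt.
    """
    with open(image_path, "rb") as f:
        image_b64 = base64.b64encode(f.read()).decode()
    return {
        "model": model,
        "messages": [{
            "role": "user",
            "content": [
                {"type": "image_url",
                 "image_url": {"url": f"data:image/png;base64,{image_b64}"}},
                {"type": "text",
                 "text": f"Assess the structural quality of the rendered text: {expected_text}"},
            ],
        }],
    }


def query(payload: dict) -> str:
    """POST the payload to the deployed endpoint and return the reply text."""
    req = Request(ENDPOINT, data=json.dumps(payload).encode(),
                  headers={"Content-Type": "application/json"})
    with urlopen(req) as resp:
        return json.load(resp)["choices"][0]["message"]["content"]
```

This follows the standard OpenAI chat-completions request shape that ms-swift deployments accept; only the payload construction is shown since the actual scoring format is defined by the TextPecker checkpoints.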

🔥 Train & Eval

TextPecker training

TextPecker training, deployment, and evaluation are built on top of ms-swift. We provide backbone-specific training scripts under the train/ folder. See TRAIN for more details.

VTR RL with TextPecker

Our RL framework builds on Flow-GRPO. We provide training code for optimizing text rendering models with TextPecker under ./RL/flow_grpo/. For details, please refer to RL.

Re-evaluate Benchmarks with TextPecker

TextPecker can evaluate text structural quality and image- or box-level semantic consistency in any text generation or editing scenario. We provide re-evaluation instructions for the following benchmarks: OneIG-Bench, CVTG-2K, LongText, TextAtlas, LeX-Bench, and TIIF-Bench. For more details, see EVAL.

🤗 Resource Collection

All fully open-sourced core resources for TextPecker are listed below:

Evaluator

| Variant | Model |
| --- | --- |
| InternVL-3 | TextPecker-8B-InternVL3 |
| Qwen3-VL | TextPecker-8B-Qwen3VL |

VTR Models

| Variant | Model |
| --- | --- |
| SD3.5-M | SD3.5M-TextPecker-SQPA |
| Flux.1-dev | Flux.1-dev-TextPecker-SQPA |
| Qwen-Image | QwenImage-TextPecker-SQPA |

Dataset & Engine

| Type | Link |
| --- | --- |
| Evaluator Dataset | TextPecker-1.5M |
| VTR RL Dataset | TextPecker-RL |
| Engine | TextPecker-engine |

Acknowledgement

We sincerely thank ms-swift and Flow-GRPO for their valuable methodological contributions.

Additionally, we thank TextAtlas5M, LeX-10k, SynTIGER, WanJuan1.0, Flux.1-dev, Qwen-Image, SD3.5, CogView4, Kolors, and Seedream4.0 for their roles in data generation.

We also thank the evaluation benchmarks including CVTG-2K, LongText, OneIG-Bench, TIIF-Bench, TextAtlas and LeX-Bench for facilitating text rendering evaluation.

✍️ Citation

If you find TextPecker useful in your research or work, please cite our paper:

```bibtex
@article{zhu2026TextPecker,
  title   = {TextPecker: Rewarding Structural Anomaly Quantification for Enhancing Visual Text Rendering},
  author  = {Zhu, Hanshen and Liu, Yuliang and Wu, Xuecheng and Wang, An-Lan and Feng, Hao and Yang, Dingkang and Feng, Chao and Huang, Can and Tang, Jingqun and Bai, Xiang},
  journal = {arXiv preprint arXiv:2602.20903},
  year    = {2026}
}
```

📜 License

TextPecker is licensed under the Apache License 2.0.

About

[CVPR2026] TextPecker: Rewarding Structural Anomaly Quantification for Enhancing Visual Text Rendering
