
TextPecker

TextPecker Website · Paper on arXiv · Model · Demo · TextPecker-1.5M Dataset

TextPecker: Rewarding Structural Anomaly Quantification for Enhancing Visual Text Rendering

Hanshen Zhu1,*, Yuliang Liu1, Xuecheng Wu2, An-Lan Wang2, Hao Feng2, Dingkang Yang2, Chao Feng2, Can Huang2, Jingqun Tang2,†, Xiang Bai1,✉

1 Huazhong University of Science & Technology   2 ByteDance   † Project Lead   ✉ Corresponding Author

Abstract

Visual Text Rendering (VTR) remains a critical challenge in text-to-image generation: even advanced models frequently produce text with structural anomalies such as distortion, blurriness, and misalignment. However, we find that leading MLLMs and specialist OCR models largely fail to perceive these structural anomalies, creating a critical bottleneck for both VTR evaluation and RL-based optimization.
As a result, even state-of-the-art generators (e.g., Seedream4.0, Qwen-Image) still struggle to render structurally faithful text. To address this, we propose TextPecker, a plug-and-play, structural-anomaly-perceptive RL strategy that mitigates noisy reward signals and works with any text-to-image generator. To enable this capability, we construct a recognition dataset with character-level structural-anomaly annotations and develop a stroke-editing synthesis engine to expand structural-error coverage. Experiments show that TextPecker consistently improves diverse text-to-image models; even on the well-optimized Qwen-Image, it yields average gains of 4% in structural fidelity and 8.7% in semantic alignment for Chinese text rendering, establishing a new state of the art in high-fidelity VTR. Our work fills a gap in VTR optimization, providing a foundational step toward reliable, structurally faithful visual text generation.

📢 News

  • Feb 24, 2026: Our arXiv paper is now publicly available.
  • Feb 21, 2026: TextPecker has been accepted to CVPR 2026.
  • Feb 18, 2026: We released the LoRA weights for the TextPecker-optimized generative models: SD3.5-M, Flux.1-dev, and Qwen-Image.
  • Feb 15, 2026: We released the official website, model, and dataset for TextPecker.

🔥 Quick Start

Training, deployment, and evaluation of TextPecker are all built upon ms-swift. We currently provide two versions of model checkpoints: TextPecker-8B-Qwen3VL and TextPecker-8B-InternVL3. For detailed environment setup and model deployment/testing instructions, please refer to the official documentation.

1️⃣ Environment Setup

```shell
git clone https://github.com/CIawevy/TextPecker.git
cd TextPecker/train
git clone https://github.com/modelscope/ms-swift.git
cd ms-swift
conda create -n TextPecker python=3.11.13 -y
conda activate TextPecker
pip install -e .
cd ..
sh install_all.sh
```

2️⃣ Download Models & Dataset

We have uploaded our models and datasets to Hugging Face, and you can download them using the provided scripts. Modify the parameters (e.g., local paths, HF token) in scripts/download_models.sh and scripts/download_dataset.sh as needed, then run the corresponding script with bash. Additionally, refer to DATA to use our data engine to synthesize your own datasets if needed.

3️⃣ Deployment (See TRAIN for more details.)

Example

```shell
bash train/deploy_textpecker.sh
```

4️⃣ Demo

After deployment, you can run the following command to try our demo:

```shell
python eval/TextPecker_eval/demo.py
```
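Beyond the demo script, ms-swift's deployment serves an OpenAI-compatible HTTP endpoint, so the evaluator can also be queried programmatically. The sketch below is illustrative only: the endpoint address, model name, and prompt wording are assumptions, not the official evaluation protocol (see the demo script for that).

```python
import base64
import json
from urllib.request import Request, urlopen

# Default ms-swift deploy address; adjust to match your deployment (assumed).
ENDPOINT = "http://127.0.0.1:8000/v1/chat/completions"


def build_eval_request(image_path: str, expected_text: str,
                       model: str = "TextPecker-8B-Qwen3VL") -> dict:
    """Build an OpenAI-compatible chat payload asking the evaluator to
    assess rendered text in an image.

    The prompt wording here is a placeholder, not the official prompt.
    """
    with open(image_path, "rb") as f:
        image_b64 = base64.b64encode(f.read()).decode()
    return {
        "model": model,
        "messages": [{
            "role": "user",
            "content": [
                {"type": "image_url",
                 "image_url": {"url": f"data:image/png;base64,{image_b64}"}},
                {"type": "text",
                 "text": f"Assess the structural quality of the rendered text: {expected_text}"},
            ],
        }],
    }


def query(payload: dict) -> str:
    """POST the payload to the deployed endpoint and return the reply text."""
    req = Request(ENDPOINT, data=json.dumps(payload).encode(),
                  headers={"Content-Type": "application/json"})
    with urlopen(req) as resp:
        return json.load(resp)["choices"][0]["message"]["content"]
```

This follows the standard OpenAI chat-completions request shape that ms-swift deployments accept; only the payload construction is shown since the actual scoring format is defined by the TextPecker checkpoints.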

🔥 Train & Eval

TextPecker training

TextPecker training, deployment, and evaluation are built on top of ms-swift. We provide backbone-specific training scripts under the train/ folder. See TRAIN for more details.

VTR RL with TextPecker

Our RL framework builds on Flow-GRPO. We provide training code for optimizing text rendering models with TextPecker under ./RL/flow_grpo/. For details, please refer to RL.

Re-evaluate Benchmarks with TextPecker

TextPecker can evaluate text structural quality and image- or box-level semantic consistency in any text generation or editing scenario. We provide re-evaluation instructions for the following benchmarks: OneIG-Bench, CVTG-2K, LongText, TextAtlas, LeX-Bench, and TIIF-Bench. For more details, see EVAL.

🤗 Resource Collection

All fully open-sourced core resources for TextPecker are listed below:

Evaluator

| Variant | Model |
| --- | --- |
| InternVL-3 | TextPecker-8B-InternVL3 |
| Qwen3-VL | TextPecker-8B-Qwen3VL |

VTR Models

| Variant | Model |
| --- | --- |
| SD3.5-M | SD3.5M-TextPecker-SQPA |
| Flux.1-dev | Flux.1-dev-TextPecker-SQPA |
| Qwen-Image | QwenImage-TextPecker-SQPA |

Dataset & Engine

| Type | Link |
| --- | --- |
| Evaluator Dataset | TextPecker-1.5M |
| VTR RL Dataset | TextPecker-RL |
| Engine | TextPecker-engine |

Acknowledgement

We sincerely thank ms-swift and Flow-GRPO for their valuable methodological contributions.

Additionally, we thank TextAtlas5M, LeX-10k, SynTIGER, WanJuan1.0, Flux.1-dev, Qwen-Image, SD3.5, CogView4, Kolors, and Seedream4.0 for their roles in data generation.

We also thank the evaluation benchmarks including CVTG-2K, LongText, OneIG-Bench, TIIF-Bench, TextAtlas and LeX-Bench for facilitating text rendering evaluation.

✍️ Citation

If you find TextPecker useful in your research or work, please cite our paper:

```bibtex
@article{zhu2026TextPecker,
  title   = {TextPecker: Rewarding Structural Anomaly Quantification for Enhancing Visual Text Rendering},
  author  = {Zhu, Hanshen and Liu, Yuliang and Wu, Xuecheng and Wang, An-Lan and Feng, Hao and Yang, Dingkang and Feng, Chao and Huang, Can and Tang, Jingqun and Bai, Xiang},
  journal = {arXiv preprint arXiv:2602.20903},
  year    = {2026}
}
```

📜 License

TextPecker is licensed under the Apache License 2.0.

About

[CVPR2026] TextPecker: Rewarding Structural Anomaly Quantification for Enhancing Visual Text Rendering
