ETCHR: Editing To Clarify and Harness Reasoning

Beichen Zhang^* · Yuhong Liu^* · Jinsong Li · Yuhang Zang^† · Jiaqi Wang^† · Dahua Lin^†

^*Equal Contribution ^†Corresponding authors.

📢 News

🚀 [2026/05/24] We have released the training and evaluation code of ETCHR.
🚀 [2026/05/21] We have released the ETCHR-FLUX.2-klein-9B Model, ETCHR-SFT-400K Dataset and ETCHR GRPO-10K Dataset.

🌈 Overview

We are thrilled to introduce ETCHR (Editing To Clarify and Harness Reasoning), a novel question-conditioned, reasoning-aware image editor designed to serve as a decoupled visual reasoning assistant for Multimodal Large Language Models (MLLMs).

By decoupling the specialized image editor from the downstream understanding model, ETCHR bridges the critical bottleneck where a purely textual chain of thought fails in fine-grained focus or complex spatial transformations.

💡 Highlights

🔥 Decoupled & Plug-and-Play: ETCHR functions as a separate module, allowing it to assist diverse downstream MLLMs (such as Qwen3-VL-8B, Gemini-3.1-Flash-Lite, or Kimi K2.5) without requiring any task-specific fine-tuning on the understanding models themselves.
🔥 Naturally Reflective Pipeline: Introduces an Edit-Verify-Reason inference mechanism where the understanding model filters out noisy or flawed edits, reverting safely to the original image when verification fails.

📊 Results

We evaluate ETCHR across five distinct task families spanning fine-grained perception, chart understanding, logic reasoning, jigsaw restoration, and 3D understanding. Across all evaluated backbones, ETCHR consistently yields major improvements in Pass@1 accuracy:

🛠️ Evaluation

Prepare your environment:

git clone https://github.com/InternLM/ETCHR.git
conda create -n ETCHR python==3.11
conda activate ETCHR
cd RL/Pref-GRPO
bash env_setup.sh fastvideo
pip install "vllm>=0.11.0"
pip install qwen-vl-utils==0.0.14

We Provide an example code running ETCHR on DL3DV-2K Benchmark in Evaluation/inference_dl3dv.py, you can start the evaluation with the following two steps:

Step 1: start a VLLM server for an understanding model (eg. Qwen3-VL-8B, Kimi K2.5, ...).

cd Evaluation
bash launch_vllm.sh

Step 2: Run ETCHR atop any understanding model

python inference_dl3dv.py

🛠️ Training

We adopt a two-stage Training Pipeline. See SFT.md and RL.md for further details.

Cases

ETCHR can assist with a broad spectrum of understanding tasks, including fine-grained perception, chart reasoning, maze navigation, jigsaw puzzles, and 3D spatial understanding.

✒️Citation

If you find this project useful, please kindly cite:

📄 License

Our work is based on FLUX.2-klein-base-9B, so please follow FLUX Non-Commercial License.

❤️ Acknowledgement

The work is built upon DiffSynth-Studio and Pref-GRPO, two excellent codebases for Diffusion models training!

Name		Name	Last commit message	Last commit date
Latest commit History 7 Commits
Evaluation		Evaluation
RL		RL
SFT		SFT
assets		assets
.gitattributes		.gitattributes
LICENSE-FLUX-NON-COMMERICAL.txt		LICENSE-FLUX-NON-COMMERICAL.txt
README.md		README.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

ETCHR: Editing To Clarify and Harness Reasoning

📢 News

🌈 Overview

💡 Highlights

📊 Results

🛠️ Evaluation

🛠️ Training

Cases

✒️Citation

📄 License

❤️ Acknowledgement

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

ETCHR: Editing To Clarify and Harness Reasoning

📢 News

🌈 Overview

💡 Highlights

📊 Results

🛠️ Evaluation

🛠️ Training

Cases

✒️Citation

📄 License

❤️ Acknowledgement

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages