We release our ScreenSpot-Pro evaluation code for both UI-TARS and Qwen2.5-VL. Hyperparameters and prompts have not been carefully tuned.
Please first download the data from the ScreenSpot-Pro page on Hugging Face and put the `images` and `annotations` folders in the same directory as the code. Then, launch the inference servers for the different models (commands are listed below; the model names and ports are already mapped inside the code). Finally, run `bash run_ss_pro_xxx.sh`.
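For the download step, a minimal sketch using `huggingface_hub` (the dataset repo id below is an assumption; check the ScreenSpot-Pro Hugging Face page for the exact one):

```python
# Sketch: fetch ScreenSpot-Pro so that images/ and annotations/ land beside the code.
# ASSUMPTION: the dataset repo id; verify it on the ScreenSpot-Pro Hugging Face page.
from huggingface_hub import snapshot_download

snapshot_download(
    repo_id="likaixin/ScreenSpot-Pro",
    repo_type="dataset",
    local_dir=".",  # places the downloaded folders next to the scripts
)
```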
You can use `summarize_results.py` to print ScreenSpot-Pro results by category, following the column order of Table 1 in our paper.
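For orientation, here is a minimal sketch of the kind of per-category aggregation such a script performs; the result-entry schema (`group`, `ui_type`, `correct`) and the category names below are assumptions, not the actual file format, and the real usage example follows:

```python
# Sketch: aggregate per-category accuracies into a LaTeX-ready table row.
# ASSUMPTION: each entry looks like {"group": "CAD", "ui_type": "text", "correct": true};
# the actual JSON schema and category names/order may differ.
import json
import sys
from collections import defaultdict

results = json.load(open(sys.argv[1]))
buckets = defaultdict(list)
for r in results:
    buckets[(r["group"], r["ui_type"])].append(r["correct"])

groups = ["CAD", "Development", "Creative", "Scientific", "Office", "OS"]  # placeholder order
cells = []
for group in groups:
    text, icon = buckets[(group, "text")], buckets[(group, "icon")]
    for hits in (text, icon, text + icon):  # text acc, icon acc, category average
        cells.append(f"{100 * sum(hits) / len(hits):.1f}" if hits else "-")
print(" & ".join(cells))
```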
One example:

```bash
python summarize_results.py results/qwen25vl_AutoFocus.json
# output:
# results/qwen25vl_AutoFocus_72b_PPL.json 85.1 & 32.9 & 60.2 & 77.7 & 40.1 & 62.3 & 74.1 & 33.9 & 64.1 & 91.0 & 39.4 & 69.4 & 87.6 & 60.8 & 81.6 & 78.8 & 31.4 & 57.4 & 82.1 & 38.1 & 65.6 1545
```

You can turn on `--debug` inside `eval_screenspot_pro_RegionFocus.py` to save intermediate RegionFocus step images, such as the image-as-map stars used for judgment, the zoom-ins, and the projection of zoomed-in predictions back onto the original input.
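The projection step mentioned above is simple coordinate arithmetic. A minimal sketch, assuming the zoomed view is a crop whose top-left corner sits at (x0, y0) in the original image and which was rescaled by a uniform factor before inference (the function and variable names are illustrative, not taken from the code):

```python
def project_back(pred_x, pred_y, crop_x0, crop_y0, scale):
    """Map a prediction made on a zoomed-in crop back to full-image pixels.

    Illustrative sketch: (crop_x0, crop_y0) is the crop's top-left corner in
    the original image, and the crop was resized by `scale` before inference.
    """
    return crop_x0 + pred_x / scale, crop_y0 + pred_y / scale

# e.g. a point predicted at (250, 120) on a 2x zoom of a crop starting at (800, 600)
print(project_back(250, 120, 800, 600, 2.0))  # -> (925.0, 660.0)
```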
Command for launching Qwen2.5-VL-72B
Please first install Qwen-Agent (https://github.com/QwenLM/Qwen-Agent).
```bash
export CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7
vllm serve Qwen/Qwen2.5-VL-72B-Instruct --port 8300 --dtype bfloat16 --limit-mm-per-prompt '{"images": 5}' --tensor-parallel-size 8
```
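Once the server is up, the evaluation scripts reach it through vLLM's OpenAI-compatible API. A minimal connectivity check, assuming the port and model name from the command above (the image path and instruction are placeholders, not our actual grounding prompt):

```python
# Sketch: query the vLLM server launched above via its OpenAI-compatible endpoint.
# The image path and instruction are placeholders; the repo's scripts build their own prompts.
import base64
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8300/v1", api_key="EMPTY")
with open("images/example.png", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode()

response = client.chat.completions.create(
    model="Qwen/Qwen2.5-VL-72B-Instruct",
    messages=[{
        "role": "user",
        "content": [
            {"type": "image_url", "image_url": {"url": f"data:image/png;base64,{image_b64}"}},
            {"type": "text", "text": "Locate the 'Save' button and answer with a point (x, y)."},
        ],
    }],
)
print(response.choices[0].message.content)
```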
If you find our code or paper useful, please consider citing:

```bibtex
@article{autofocus2026,
  title={AutoFocus: Uncertainty-Aware Active Visual Search for GUI Grounding},
  author={Anonymous},
  year={2026},
}
```
This codebase is partially based on RegionFocus. Many thanks!