- 2026-04-28: 🔥 We release the DV-World dataset and the paper.
- **DV-Sheet** focuses on native spreadsheet visualization workflows. Instead of generating standalone plotting code, an agent must directly manipulate spreadsheet workbooks to create charts, repair broken visualizations, and assemble dashboards under realistic software constraints.
- **DV-Evolution** targets cross-modal and cross-framework visualization adaptation. Given a reference visual artifact, a new dataset, and modification requirements, the agent must infer the original visual semantics and produce an updated executable visualization in a target framework such as Python, D3.js, Plotly.js, Vega-Lite, or Apache ECharts.
- **DV-Interact** evaluates proactive clarification and intent alignment in ambiguous visualization tasks. The agent operates in a stateful environment and interacts with a user simulator, testing whether it can ask appropriate questions, resolve ambiguity through interaction, and avoid assumption-first execution.
Set up the environment using the following commands:
```bash
conda create -n dvworld python=3.12
conda activate dvworld
pip install -r requirements.txt
```
The dataset is hosted at:
https://huggingface.co/datasets/DV-World/dvworld
After downloading, place the files into the corresponding `gold` and `tasks` folders:

- `dv-evolution/gold` and `dv-evolution/tasks`
- `dv-interact/gold` and `dv-interact/tasks`
- `dv-sheet/gold` and `dv-sheet/tasks`
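One way to fetch the data is with the Hugging Face CLI (installed via `pip install huggingface_hub`). The local target directory below is an assumption; copy the downloaded files into the per-family folders afterwards.

```bash
# Download the full dataset repository; adjust --local-dir as needed,
# then move the files into the gold/ and tasks/ folders listed above.
huggingface-cli download DV-World/dvworld --repo-type dataset --local-dir ./dvworld-data
```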
Each task family has its own baseline runner:
- `dv-evolution/dvworld-agent-evolution`
- `dv-interact/dvworld-agent-interact`
- `dv-sheet/dvworld_agent_sheet`
The typical workflow is:
- Download the dataset into the task-specific `gold` and `tasks` folders.
- Configure the model in `dvworld_agent_fcmode/agent/config.py` inside the corresponding agent directory.
- Run the agent with `run.py`.
- Convert raw outputs into evaluation format with `get_results.py`.
- Evaluate the converted results with the matching script in `evaluation_suite` (see the sketch after this list).
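As a concrete illustration, here is a minimal end-to-end sketch for the DV-Evolution family. It assumes `run.py` and `get_results.py` take no required arguments, which may not hold for your setup; the agent-specific README is authoritative.

```bash
# Hypothetical end-to-end run for DV-Evolution; script arguments are assumptions.
cd dv-evolution/dvworld-agent-evolution
python run.py            # run the agent over dv-evolution/tasks
python get_results.py    # convert raw outputs into the evaluation format
cd ../../evaluation_suite/dv_evolution
python run_eval.py       # score the converted results
```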
Agent-specific usage guides:
- dv-evolution/dvworld-agent-evolution/README.md
- dv-interact/dvworld-agent-interact/README.md
- dv-sheet/dvworld_agent_sheet/README.md
Evaluation is organized by task family inside evaluation_suite.
Converted candidate outputs are expected under:
`evaluation_suite/results/<run_name>`

Evaluation outputs are written to:

`evaluation_suite/model_score/<run_name>`

Task-specific evaluators:

- `evaluation_suite/dv_evolution/run_eval.py`
- `evaluation_suite/dv_interact/run_eval.py`
- `evaluation_suite/dvsheet_create/run_eval.py`
- `evaluation_suite/dvsheet_dashboards/run_eval.py`
- `evaluation_suite/dvsheet_fix/run_eval.py`
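Putting these paths together, a run named `my_run` (the name is illustrative) would be staged like this; how each evaluator discovers the run name is not specified here, so check the per-task evaluation README.

```bash
# Stage converted outputs under results/<run_name>, then invoke an evaluator.
# Scores are written under evaluation_suite/model_score/<run_name>.
mkdir -p evaluation_suite/results/my_run
cp -r converted_outputs/* evaluation_suite/results/my_run/
python evaluation_suite/dv_evolution/run_eval.py
```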
Evaluation guides:
- evaluation_suite/dv_evolution/README.md
- evaluation_suite/dv_interact/README.md
- evaluation_suite/dvsheet_create/README.md
- evaluation_suite/dvsheet_dashboards/README.md
- evaluation_suite/dvsheet_fix/README.md
- DV-Evolution and DV-Interact can be run in a standard Python environment.
- DV-Sheet evaluation should be run on Windows. In particular, `dvsheet-create`, `dvsheet-dashboards`, and `dvsheet-fix` rely on Excel-related workflows during evaluation.
To submit your agent results to the leaderboard, please follow the instructions in the DV-World Submission Guidelines.
If you find our work helpful, please cite it as follows:
```bibtex
@misc{meng2026dvworldbenchmarkingdatavisualization,
  title={DV-World: Benchmarking Data Visualization Agents in Real-World Scenarios},
  author={Jinxiang Meng and Shaoping Huang and Fangyu Lei and Jingyu Guo and Haoxiang Liu and Jiahao Su and Sihan Wang and Yao Wang and Enrui Wang and Ye Yang and Hongze Chai and Jinming Lv and Anbang Yu and Huangjing Zhang and Yitong Zhang and Yiming Huang and Zeyao Ma and Shizhu He and Jun Zhao and Kang Liu},
  year={2026},
  eprint={2604.25914},
  archivePrefix={arXiv},
  primaryClass={cs.CL},
  url={https://arxiv.org/abs/2604.25914},
}
```

