- 2026-04-28: 🔥 We release the DV-World dataset and the paper.
- **DV-Sheet** focuses on native spreadsheet visualization workflows. Instead of generating standalone plotting code, an agent must directly manipulate spreadsheet workbooks to create charts, repair broken visualizations, and assemble dashboards under realistic software constraints.
- **DV-Evolution** targets cross-modal and cross-framework visualization adaptation. Given a reference visual artifact, a new dataset, and modification requirements, the agent must infer the original visual semantics and produce an updated executable visualization in a target framework such as Python, D3.js, Plotly.js, Vega-Lite, or Apache ECharts.
- **DV-Interact** evaluates proactive clarification and intent alignment in ambiguous visualization tasks. The agent operates in a stateful environment and interacts with a user simulator, testing whether it can ask appropriate questions, resolve ambiguity through interaction, and avoid assumption-first execution.
Set up the environment using the following commands:
```bash
conda create -n dvworld python=3.12
conda activate dvworld
pip install -r requirements.txt
```
The dataset is hosted at:
https://huggingface.co/datasets/DV-World/dvworld
After downloading, place the files into the corresponding `gold` and `tasks` folders:

- `dv-evolution/gold` and `dv-evolution/tasks`
- `dv-interact/gold` and `dv-interact/tasks`
- `dv-sheet/gold` and `dv-sheet/tasks`
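One way to fetch the data is with the Hugging Face CLI (installed via `pip install huggingface_hub`). The local target directory below is an assumption; copy the downloaded files into the per-family folders afterwards.

```bash
# Download the full dataset repository; adjust --local-dir as needed,
# then move the files into the gold/ and tasks/ folders listed above.
huggingface-cli download DV-World/dvworld --repo-type dataset --local-dir ./dvworld-data
```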
Each task family has its own baseline runner:
- `dv-evolution/dvworld-agent-evolution`
- `dv-interact/dvworld-agent-interact`
- `dv-sheet/dvworld_agent_sheet`
The typical workflow is:
- Download the dataset into the task-specific `gold` and `tasks` folders.
- Configure the model in `dvworld_agent_fcmode/agent/config.py` inside the corresponding agent directory.
- Run the agent with `run.py`.
- Convert raw outputs into evaluation format with `get_results.py`.
- Evaluate the converted results with the matching script in `evaluation_suite` (see the sketch after this list).
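As a concrete illustration, here is a minimal end-to-end sketch for the DV-Evolution family. It assumes `run.py` and `get_results.py` take no required arguments, which may not hold for your setup; the agent-specific README is authoritative.

```bash
# Hypothetical end-to-end run for DV-Evolution; script arguments are assumptions.
cd dv-evolution/dvworld-agent-evolution
python run.py            # run the agent over dv-evolution/tasks
python get_results.py    # convert raw outputs into the evaluation format
cd ../../evaluation_suite/dv_evolution
python run_eval.py       # score the converted results
```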
Agent-specific usage guides:
- dv-evolution/dvworld-agent-evolution/README.md
- dv-interact/dvworld-agent-interact/README.md
- dv-sheet/dvworld_agent_sheet/README.md
Evaluation is organized by task family inside evaluation_suite.
Converted candidate outputs are expected under:
`evaluation_suite/results/<run_name>`

Evaluation outputs are written to:

`evaluation_suite/model_score/<run_name>`

Task-specific evaluators:

- `evaluation_suite/dv_evolution/run_eval.py`
- `evaluation_suite/dv_interact/run_eval.py`
- `evaluation_suite/dvsheet_create/run_eval.py`
- `evaluation_suite/dvsheet_dashboards/run_eval.py`
- `evaluation_suite/dvsheet_fix/run_eval.py`
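Putting these paths together, a run named `my_run` (the name is illustrative) would be staged like this; how each evaluator discovers the run name is not specified here, so check the per-task evaluation README.

```bash
# Stage converted outputs under results/<run_name>, then invoke an evaluator.
# Scores are written under evaluation_suite/model_score/<run_name>.
mkdir -p evaluation_suite/results/my_run
cp -r converted_outputs/* evaluation_suite/results/my_run/
python evaluation_suite/dv_evolution/run_eval.py
```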
Evaluation guides:
- evaluation_suite/dv_evolution/README.md
- evaluation_suite/dv_interact/README.md
- evaluation_suite/dvsheet_create/README.md
- evaluation_suite/dvsheet_dashboards/README.md
- evaluation_suite/dvsheet_fix/README.md
- DV-Evolution and DV-Interact can be run in a standard Python environment.
- DV-Sheet evaluation should be run on Windows. In particular, `dvsheet-create`, `dvsheet-dashboards`, and `dvsheet-fix` rely on Excel-related workflows during evaluation.
To submit your agent results to the leaderboard, please follow the instructions in the DV-World Submission Guidelines.
If you find our work helpful, please cite it as follows:
```bibtex
@misc{meng2026dvworldbenchmarkingdatavisualization,
  title={DV-World: Benchmarking Data Visualization Agents in Real-World Scenarios},
  author={Jinxiang Meng and Shaoping Huang and Fangyu Lei and Jingyu Guo and Haoxiang Liu and Jiahao Su and Sihan Wang and Yao Wang and Enrui Wang and Ye Yang and Hongze Chai and Jinming Lv and Anbang Yu and Huangjing Zhang and Yitong Zhang and Yiming Huang and Zeyao Ma and Shizhu He and Jun Zhao and Kang Liu},
  year={2026},
  eprint={2604.25914},
  archivePrefix={arXiv},
  primaryClass={cs.CL},
  url={https://arxiv.org/abs/2604.25914},
}
```

