- [2024.06.13] 📣 ChartMimic is released.
ChartMimic aims at assessing the visually-grounded code generation capabilities of large multimodal models (LMMs). ChartMimic utilizes information-intensive visual charts and textual instructions as inputs, requiring LMMs to generate the corresponding code for chart rendering.
ChartMimic includes 1,000 human-curated (figure, instruction, code) triplets, which represent authentic chart use cases found in scientific papers across various domains (e.g., Physics, Computer Science, and Economics). These charts span 18 regular types and 4 advanced types, diversifying into 191 subcategories. Furthermore, we propose multi-level evaluation metrics to provide an automatic and thorough assessment of the output code and the rendered charts. Unlike existing code generation benchmarks, ChartMimic emphasizes evaluating LMMs' capacity to harmonize a blend of cognitive capabilities, encompassing visual understanding, code generation, and cross-modal reasoning.
Here we provide a quick start guide to evaluate LMMs on ChartMimic.
conda env create -f environment.yaml
conda activate chartmimic
Set up the environment variables in the .env file.
PROJECT_PATH=${YOUR_PROJECT_PATH}
OPENAI_BASE_URL=${YOUR_OPEN_AI_BASE_URL}
OPENAI_API_KEY=${YOUR_OPENAI_API_KEY}
ANTHROPIC_API_KEY=${YOUR_ANTHROPIC_API_KEY}
GOOGLE_API_KEY=${YOUR_GOOGLE_API_KEY}
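Before launching a run, it can help to confirm that the .env file no longer contains unfilled placeholders. The following is a minimal sketch, not part of the ChartMimic codebase: the helper names (`parse_env_text`, `missing_keys`, `check_env_file`) are hypothetical, and treating all five variables as required is an assumption, since the keys you actually need depend on which model you evaluate.

```python
from pathlib import Path

# Variable names taken from the .env template above.
# Assumption: all five are treated as required; trim this to the providers you use.
REQUIRED_KEYS = (
    "PROJECT_PATH",
    "OPENAI_BASE_URL",
    "OPENAI_API_KEY",
    "ANTHROPIC_API_KEY",
    "GOOGLE_API_KEY",
)


def parse_env_text(text):
    """Parse simple KEY=VALUE lines, skipping blank lines and '#' comments."""
    values = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def missing_keys(values, required=REQUIRED_KEYS):
    """Return required keys that are absent, empty, or still a ${...} placeholder."""
    missing = []
    for key in required:
        value = values.get(key, "")
        if not value or (value.startswith("${") and value.endswith("}")):
            missing.append(key)
    return missing


def check_env_file(path=".env"):
    """Hypothetical helper: read an .env file and list the keys left unfilled."""
    return missing_keys(parse_env_text(Path(path).read_text(encoding="utf-8")))
```

Calling `check_env_file(".env")` from the repository root returns an empty list when every listed variable has a real value.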
You can download the whole evaluation data by running the following command:
cd ChartMimic # cd to the root directory of this repository
mkdir dataset
wget https://huggingface.co/datasets/ChartMimic/ChartMimic/resolve/main/test.tar.gz
tar -xzvf test.tar.gz -C dataset
Example script for gpt-4-vision-preview on the Direct Mimic task:
export PROJECT_PATH=${YOUR_PROJECT_PATH}
# Step 1: Get Model Response
bash scripts/direct_mimic/run_generation.sh
# Step 2: Run the Code in the Response
bash scripts/direct_mimic/run_code.sh
# Step 3: Get Lowlevel Score
bash scripts/direct_mimic/run_evaluation_lowlevel.sh
# Step 4: Get Highlevel Score
bash scripts/direct_mimic/run_evaluation_highlevel.sh
Example script for gpt-4-vision-preview on the Customized Mimic task:
export PROJECT_PATH=${YOUR_PROJECT_PATH}
# Step 1: Get Model Response
bash scripts/customized_mimic/run_generation.sh
# Step 2: Run the Code in the Response
bash scripts/customized_mimic/run_code.sh
# Step 3: Get Lowlevel Score
bash scripts/customized_mimic/run_evaluation_lowlevel.sh
# Step 4: Get Highlevel Score
bash scripts/customized_mimic/run_evaluation_highlevel.sh
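The four steps above follow the same order for both tasks, so they can be chained from a small driver. This is only a sketch under stated assumptions: `build_commands` and `run_pipeline` are hypothetical helpers that are not part of the repository, the script paths are taken verbatim from the commands above, and `PROJECT_PATH` is assumed to be exported already so the child processes inherit it.

```python
import subprocess

# Step and task names mirror the script paths shown above.
STEPS = (
    "run_generation",
    "run_code",
    "run_evaluation_lowlevel",
    "run_evaluation_highlevel",
)
TASKS = ("direct_mimic", "customized_mimic")


def build_commands(task):
    """Return the four bash commands for one task, in execution order."""
    if task not in TASKS:
        raise ValueError(f"unknown task {task!r}; expected one of {TASKS}")
    return [["bash", f"scripts/{task}/{step}.sh"] for step in STEPS]


def run_pipeline(task, repo_root="."):
    """Run each step from the repository root, stopping at the first failure."""
    for command in build_commands(task):
        print("Running:", " ".join(command))
        # check=True raises CalledProcessError on a non-zero exit code,
        # so later steps never run on top of a failed earlier step.
        subprocess.run(command, cwd=repo_root, check=True)
```

For example, `run_pipeline("direct_mimic")` executes generation, code execution, and both scoring steps in sequence. Stopping on the first failure is a design choice here: the scoring steps depend on the output of the earlier ones.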
We now offer configurations for 14 SOTA LMMs (gpt-4-vision-preview, claude-3-opus-20240229, gemini-pro-vision, Phi-3-vision-128k-instruct, MiniCPM-Llama3-V-2_5, InternVL-Chat-V1-5, cogvlm2-llama3-chat-19B, deepseekvl, llava-v1.6-mistral-7b-hf, llava-v1.6-34b-hf, idefics2-8b, llava-v1.6-vicuna-13b-hf, llava-v1.6-vicuna-7b-hf, and qwenvl).
You can download the whole evaluation data by running the following command:
cd ChartMimic # cd to the root directory of this repository
mkdir dataset
wget https://huggingface.co/datasets/ChartMimic/ChartMimic/resolve/main/test.tar.gz
tar -xzvf test.tar.gz -C dataset
To help researchers quickly understand the evaluation data, we provide a Dataset Viewer on Hugging Face Datasets: 🤗 ChartMimic.
The file structure of evaluation data is as follows:
.
├── customized_500/ # Data for Customized Mimic
├── ori_500/ # Data for Direct Mimic
└── test.jsonl # Data for both tasks
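To get a first look at test.jsonl without the Dataset Viewer, the file can be read line by line. The sketch below makes no assumption about the record schema, which this README does not describe: it only counts records and reports which top-level keys appear. The helper names are hypothetical, and the location of test.jsonl after extraction is an assumption you should adjust to your local layout.

```python
import json
from collections import Counter


def read_jsonl(path):
    """Load one JSON value per non-empty line of a JSON Lines file."""
    records = []
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:
                records.append(json.loads(line))
    return records


def summarize_keys(records):
    """Count how many records contain each top-level key."""
    counts = Counter()
    for record in records:
        if isinstance(record, dict):
            counts.update(record.keys())
    return counts
```

A typical use is `records = read_jsonl(path_to_test_jsonl)` followed by `print(len(records), summarize_keys(records))`, where `path_to_test_jsonl` points at the extracted file.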
If you find this repository useful, please consider giving it a star and citing our paper:
@article{shi2024chartmimic,
title={ChartMimic: Evaluating LMM's Cross-Modal Reasoning Capability via Chart-to-Code Generation},
author={Chufan Shi and Cheng Yang and Yaxin Liu and Bo Shui and Junjie Wang and Mohan Jing and Linran Xu and Xinyu Zhu and Siheng Li and Yuxiang Zhang and Gongye Liu and Xiaomei Nie and Deng Cai and Yujiu Yang},
year={2024},
journal={arXiv preprint arXiv:2406.09961},
}
The ChartMimic data and codebase are licensed under the Apache-2.0 License.
We would like to express our gratitude to agentboard for their project codebase.