HKUST-LongGroup/CoMM

CoMM: A Coherent Interleaved Image-Text Dataset for Multimodal Understanding and Generation

CoMM is a high-quality coherent interleaved image-text dataset designed to improve the coherence, consistency, and alignment of multimodal content. Its raw data is drawn from diverse sources, focusing on instructional content and visual storytelling to establish a strong foundation.

[Figure: data comparison]

Dataset

  • Download the dataset from Google Drive.
  • Unzip the downloaded file and place the three data splits in ./datasets.
  • Download the dataset images by running: bash scripts/download_images.sh
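Before starting the image download, it can help to confirm that the splits landed where the steps above expect them. The following is a minimal sketch: check_splits is a hypothetical helper, not part of the repository, and because the split file names are not listed here it only counts the entries in the directory.

```shell
# Hypothetical helper (not part of the CoMM repository).
# Succeeds when the given directory exists and holds at least three entries,
# matching the "three data splits" described above.
check_splits() {
  local dir count
  dir="${1:-./datasets}"
  if [ ! -d "$dir" ]; then
    echo "missing directory: $dir" >&2
    return 1
  fi
  # Count every entry in the directory, including hidden ones.
  count=$(ls -A "$dir" | wc -l | tr -d ' ')
  if [ "$count" -lt 3 ]; then
    echo "expected at least 3 split files in $dir, found $count" >&2
    return 1
  fi
  echo "$dir contains $count entries"
}
```

A possible use is to gate the download on the check: check_splits ./datasets && bash scripts/download_images.sh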

Environment Setup

conda create -n comm python=3.8
conda activate comm
conda install pytorch==2.1.2 torchvision==0.16.2 torchaudio==2.1.2 pytorch-cuda=12.1 -c pytorch -c nvidia
pip install -r requirements.txt
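
After installation, a quick check can confirm that the activated environment picked up the pinned PyTorch release. This is a sketch under assumptions: version_matches and check_torch are hypothetical helpers, and the expected version is simply copied from the conda install command above.

```shell
# Expected version, taken from the conda install command above.
expected_torch="2.1.2"

# Hypothetical helper: succeeds when the installed version equals the expected
# one, ignoring a local build suffix such as "+cu121".
version_matches() {
  [ "${1%%+*}" = "$2" ]
}

# Hypothetical helper: asks the active Python for its torch version and
# compares it with the pinned one.
check_torch() {
  local installed
  installed=$(python -c 'import torch; print(torch.__version__)') || return 1
  if version_matches "$installed" "$expected_torch"; then
    echo "torch $installed OK"
  else
    echo "torch $installed found, expected $expected_torch" >&2
    return 1
  fi
}
```

Run check_torch inside the activated comm environment; a non-zero exit status means torch is missing or a different version is installed.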

Evaluation

Prediction results must follow the format shown in eval/example. We provide evaluation scripts for the four tasks in the CoMM dataset:

cd eval

results_path="/path/to/predict_results"
model_type="your_model_name"

# Task 1: Image-to-Text Sequence Generation
python -u eval_metric.py --predict_results_path "${results_path}" --model_type "${model_type}" --task_type task1
python -u cal_gpt4o_score.py --predict_results_path "${results_path}" --model_type "${model_type}" --task_type task1

# Task 2: Text-to-Image Sequence Generation
python -u eval_metric.py --predict_results_path "${results_path}" --model_type "${model_type}" --task_type task2
python -u cal_gpt4o_score.py --predict_results_path "${results_path}" --model_type "${model_type}" --task_type task2

# Task 3: Interleaved Image-Text Content Continuation
python -u eval_metric.py --predict_results_path "${results_path}" --model_type "${model_type}" --task_type task3
python -u cal_gpt4o_score.py --predict_results_path "${results_path}" --model_type "${model_type}" --task_type task3

# Task 4: Question-based Interleaved Image-Text Generation
python -u eval_metric.py --predict_results_path "${results_path}" --model_type "${model_type}" --task_type task4
python -u cal_gpt4o_score.py --predict_results_path "${results_path}" --model_type "${model_type}" --task_type task4
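
Since the eight commands above differ only in the script name and the task type, they can be collapsed into a loop. The sketch below uses only the scripts and flags shown above. RUN and run_all_tasks are hypothetical names introduced for illustration: RUN defaults to echo, so the script only prints the commands (a dry run) until RUN is set to an empty string.

```shell
# Run from the eval directory (cd eval), as in the commands above.
results_path="${results_path:-/path/to/predict_results}"
model_type="${model_type:-your_model_name}"

# Hypothetical switch: "echo" prints each command instead of running it.
# Set RUN="" to execute the evaluation for real.
RUN="${RUN-echo}"

# Hypothetical wrapper: calls both evaluation scripts for each of the four tasks
# and stops at the first command that fails.
run_all_tasks() {
  local task script
  for task in task1 task2 task3 task4; do
    for script in eval_metric.py cal_gpt4o_score.py; do
      $RUN python -u "$script" \
        --predict_results_path "$results_path" \
        --model_type "$model_type" \
        --task_type "$task" || return 1
    done
  done
}

run_all_tasks
```

Quoting the variables keeps paths or model names that contain spaces intact as single arguments.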

TODO

  • Release the training and inference code
    • Emu2
    • SEED
    • MiniGPT5

Citation

If you find this dataset useful, please cite our paper:

@article{chen2024comm,
  title={CoMM: A Coherent Interleaved Image-Text Dataset for Multimodal Understanding and Generation},
  author={Chen, Wei and Li, Lin and Yang, Yongqi and Wen, Bin and Yang, Fan and Gao, Tingting and Wu, Yu and Chen, Long},
  journal={arXiv preprint arXiv:2406.10462},
  year={2024}
}

About

Official repository for CoMM Dataset
