
AIGCBench

🎯🎯 AIGCBench is a novel and comprehensive benchmark designed for evaluating the capabilities of state-of-the-art video generation algorithms. Official code for the paper:

AIGCBench: Comprehensive Evaluation of Image-to-Video Content Generated by AI, BenchCouncil Transactions on Benchmarks, Standards and Evaluations (TBench).

Fanda Fan, Chunjie Luo, Wanling Gao, Jianfeng Zhan

Illustration of AIGCBench. The benchmark is divided into three modules: the evaluation dataset, the evaluation metrics, and the video generation models to be assessed.

Key Features of AIGCBench:

  • Diverse Datasets: AIGCBench incorporates a variety of datasets, including real-world video-text pairs and image-text pairs, to ensure a broad and realistic evaluation spectrum. Additionally, it includes a newly generated dataset created through an innovative text-to-image generation pipeline, enhancing the diversity and representativeness of the benchmark.
  • Extensive Evaluation Metrics: AIGCBench introduces a set of evaluation metrics that cover four crucial dimensions of video generation—control-video alignment, motion effects, temporal consistency, and video quality. Our evaluation metrics encompass both reference video-based metrics and video-free metrics.
  • Validated by Human Judgment: The benchmark's evaluation criteria are thoroughly verified against human preferences to confirm their reliability and alignment with human judgments.
  • In-Depth Analysis: Through extensive evaluations, AIGCBench reveals insightful findings about the current strengths and limitations of existing I2V models, offering valuable guidance for future advancements in the field.
  • Future Expansion: AIGCBench is comprehensive and scalable in its current form and is designed to encompass a wider range of video generation tasks in the future, enabling unified, in-depth benchmarking of AI-generated content (AIGC) and setting a new standard for the evaluation of video generation technologies.

🔥 News

  • [01/24/2024] Our paper has been accepted by BenchCouncil Transactions on Benchmarks, Standards and Evaluations (TBench)!
  • [01/10/2024] The evaluation dataset and evaluation code have been released.

Dataset

😄 The dataset is available on Hugging Face.

This dataset is intended for the evaluation of video generation tasks. Our dataset includes image-text pairs and video-text pairs. The dataset comprises three parts:

  1. Ours - Custom image-text samples produced by our text-to-image generation pipeline.
  2. WebVid val - A subset of 1,000 video samples from the WebVid validation set.
  3. LAION-Aesthetics - A subset of the LAION-Aesthetics dataset comprising 925 curated image-text samples.
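
As a minimal sketch of downloading the dataset locally, the snippet below uses huggingface_hub; note that '<org>/AIGCBench' is a placeholder, not the real repo id, so substitute the actual dataset id behind the Hugging Face link above:

from huggingface_hub import snapshot_download

# NOTE: '<org>/AIGCBench' is a placeholder; replace it with the
# actual dataset id from the Hugging Face link above.
local_dir = snapshot_download(repo_id="<org>/AIGCBench", repo_type="dataset")
print(local_dir)  # root folder containing the three dataset parts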

Below are the text prompts for some of the images we generated (the rendered images are omitted here):

  • Amidst the lush canopy of a deep jungle, a playful panda is brewing a potion, captured with the stark realism of a photo.
  • Behold a noble king in the throes of skillfully strumming the guitar surrounded by the tranquil waters of a serene lake, envisioned in the style of an oil painting.
  • Amidst a sun-dappled forest, a mischievous fairy is carefully repairing a broken robot, captured in the style of an oil painting.
  • Within the realm of the backdrop of an alien planet's red skies, a treasure-seeking pirate cleverly solving a puzzle, each moment immortalized in the style of an oil painting.

Metrics

We have encapsulated the evaluation metrics used in our paper in eval.py; for more details, please refer to the paper. To use the code, first download the CLIP model weights and replace 'path_to_dir' with the actual path.

Below is a simple example:

import os
import glob

from eval import compute_video_video_similarity  # metric helpers live in this repo's eval.py

# Reference video and the directory of generated videos to score.
ref_video_path = 'path_to_reference_video.mp4'
batch_video_path = os.path.join('path_to_videos', '*.mp4')
video_path_list = sorted(glob.glob(batch_video_path))

sum_res = 0
cnt = 0
for video_path in video_path_list:
    res = compute_video_video_similarity(ref_video_path, video_path)
    sum_res += res['clip']   # accumulated CLIP similarity score
    cnt += res['state']      # counts videos processed successfully
print(sum_res / cnt)         # average similarity over all generated videos
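
For intuition, below is a minimal sketch of the adjacent-frame CLIP consistency idea; it is an illustration rather than the exact implementation in eval.py, and it assumes opencv-python, torch, and transformers are installed:

import cv2
import torch
from transformers import CLIPModel, CLIPProcessor

# Any CLIP checkpoint works for this sketch; eval.py may use a different one.
model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

def adjacent_frame_clip(video_path):
    # Decode all frames of the generated video.
    cap = cv2.VideoCapture(video_path)
    frames = []
    ok, frame = cap.read()
    while ok:
        frames.append(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
        ok, frame = cap.read()
    cap.release()
    # Embed every frame with the CLIP image encoder and L2-normalize.
    inputs = processor(images=frames, return_tensors="pt")
    with torch.no_grad():
        emb = model.get_image_features(**inputs)
    emb = emb / emb.norm(dim=-1, keepdim=True)
    # Mean cosine similarity between consecutive frames.
    return (emb[:-1] * emb[1:]).sum(-1).mean().item()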

Evaluation Results

Quantitative analysis for different Image-to-Video algorithms. An upward arrow indicates that higher values are better, while a downward arrow means lower values are preferable.

| Dimensions | Metrics | VideoCrafter | I2VGen-XL | SVD | Pika | Gen2 |
| --- | --- | --- | --- | --- | --- | --- |
| Control-video Alignment | MSE (First) ↓ | 3929.65 | 4491.90 | 640.75 | 155.30 | 235.53 |
| | SSIM (First) ↑ | 0.300 | 0.354 | 0.612 | 0.800 | 0.803 |
| | Image-GenVideo CLIP ↑ | 0.830 | 0.832 | 0.919 | 0.930 | 0.939 |
| | GenVideo-Text CLIP ↑ | 0.23 | 0.24 | - | 0.271 | 0.270 |
| | GenVideo-RefVideo CLIP (Keyframes) ↑ | 0.763 | 0.764 | - | 0.824 | 0.820 |
| Motion Effects | Flow-Square-Mean | 1.24 | 1.80 | 2.52 | 0.281 | 1.18 |
| | GenVideo-RefVideo CLIP (Corresponding frames) ↑ | 0.764 | 0.764 | 0.796 | 0.823 | 0.818 |
| Temporal Consistency | GenVideo CLIP (Adjacent frames) ↑ | 0.980 | 0.971 | 0.974 | 0.996 | 0.995 |
| | GenVideo-RefVideo CLIP (Corresponding frames) ↑ | 0.764 | 0.764 | 0.796 | 0.823 | 0.818 |
| Video Quality | Frame Count ↑ | 16 | 32 | 25 | 72 | 96 |
| | DOVER ↑ | 0.518 | 0.510 | 0.623 | 0.715 | 0.775 |
| | GenVideo-RefVideo SSIM ↑ | 0.367 | 0.304 | 0.507 | 0.560 | 0.504 |
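
For the control-video alignment rows, MSE (First) and SSIM (First) compare the conditioning image with the first generated frame. A minimal sketch of that comparison, assuming numpy and scikit-image (the exact implementation is in eval.py):

import numpy as np
from skimage.metrics import structural_similarity

def first_frame_alignment(input_image, first_frame):
    # Both arguments are HxWx3 uint8 arrays of the same size.
    diff = input_image.astype(np.float64) - first_frame.astype(np.float64)
    mse = float(np.mean(diff ** 2))  # lower is better
    ssim = structural_similarity(input_image, first_frame, channel_axis=-1)  # higher is better
    return mse, ssim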

To validate the alignment of our proposed evaluation standards with human preferences, we conducted a study. We randomly selected 30 generated results from each of the five methods. Then, we asked participants to vote on the best algorithm outcomes across four dimensions: Image Fidelity, Motion Effects, Temporal Consistency, and Video Quality. A total of 42 individuals participated in the voting process. The specific results of the study are presented below:

Figure: human preference voting results for the five methods across the four evaluation dimensions.

Contact Us

📧 If you have any questions, please feel free to contact us via email at fanfanda@ict.ac.cn and jianfengzhan.benchcouncil@gmail.com.

Citation

If you find our work useful in your research, please consider citing our paper:

@article{fan2024aigcbench,
  title={AIGCBench: Comprehensive evaluation of image-to-video content generated by AI},
  author={Fan, Fanda and Luo, Chunjie and Gao, Wanling and Zhan, Jianfeng},
  journal={BenchCouncil Transactions on Benchmarks, Standards and Evaluations},
  pages={100152},
  year={2024},
  publisher={Elsevier}
}
