FunQA benchmarks funny, creative, and magic videos for challenging tasks including timestamp localization, video description, reasoning, and beyond.

Nicous20/FunQA

Welcome to FunQA's Codebase Repository!

This repo provides the code for evaluating your model's output (a JSON file).

Introducing FunQA

The motivation for FunQA is straightforward: humans enjoy surprising videos, including funny clips, creative performances, and visual illusions. We aim to evaluate and empower AI models with similar capabilities.

FunQA is a VideoQA dataset for evaluating and enhancing a model's video reasoning capability on counter-intuitive videos, including humorous viral videos from TikTok, creative performances from Kasou Taishou (欽ちゃん&香取慎吾の全日本仮装大賞), and magic videos from YouTube and TikTok.

We establish rigorous QA tasks designed to assess a model's capability in counter-intuitive timestamp localization, detailed video description, and reasoning around counter-intuitiveness. We also pose higher-level tasks, such as attributing a fitting and vivid title to the video and scoring the video's creativity.

In total, the FunQA benchmark consists of 312K free-text QA pairs derived from 4.3K video clips, spanning a total of 24 video hours. Extensive experiments with existing VideoQA models reveal significant performance gaps for the FunQA videos across spatial-temporal reasoning, visual-centered reasoning, and free-text generation.
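As a rough sanity check on these figures, the headline numbers imply about 73 QA pairs per clip and an average clip length of about 20 seconds. The sketch below only redoes that arithmetic; because the inputs are the rounded values quoted above (312K, 4.3K, 24 hours), the results are approximate, and the function name is purely illustrative.

```python
# Rounded headline figures quoted in the text above (not exact counts).
QA_PAIRS = 312_000
VIDEO_CLIPS = 4_300
TOTAL_HOURS = 24


def dataset_averages(qa_pairs: int, clips: int, hours: float) -> tuple[float, float]:
    """Return (QA pairs per clip, average clip length in seconds)."""
    pairs_per_clip = qa_pairs / clips
    seconds_per_clip = hours * 3600 / clips
    return pairs_per_clip, seconds_per_clip


pairs_per_clip, seconds_per_clip = dataset_averages(QA_PAIRS, VIDEO_CLIPS, TOTAL_HOURS)
print(f"~{pairs_per_clip:.1f} QA pairs per clip")
print(f"~{seconds_per_clip:.1f} seconds per clip")
```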

Updates

  • 16 June, 2023: 💥💥 The FunQA challenge with $1M prize starts! At the same time, we released the evaluation code.

Todo

  1. Release the FunQA dataset and arXiv paper.
  2. Release evaluation code.
  3. Release the FunQA Extended dataset.

1 - FunQA Benchmark

1.1 - FunQA Main Tasks

FunQA comprises three subsets of surprising videos: 1) HumorQA, 2) CreativeQA, and 3) MagicQA. Each subset is associated with three common tasks: 1) counter-intuitive timestamp localization, 2) detailed video description, and 3) reasoning around counter-intuitiveness (see H1-3, C1-3, and M1-3). Furthermore, we offer higher-level tasks tailored for each video type, such as attributing a fitting and vivid title for HumorQA and CreativeQA (see H4, C4), etc.
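To make the task naming concrete, the sketch below enumerates the task IDs mentioned above (H1-3, C1-3, M1-3, plus the title tasks H4 and C4). The dictionary layout and the function name `build_task_table` are illustrative assumptions, not part of the FunQA codebase, and higher-level tasks whose IDs are not given in this text are deliberately left out.

```python
# Illustrative only: names and structure are assumptions, not the FunQA codebase.
SUBSET_PREFIXES = {"HumorQA": "H", "CreativeQA": "C", "MagicQA": "M"}

# The three tasks shared by every subset, numbered as in the text.
COMMON_TASKS = {
    1: "counter-intuitive timestamp localization",
    2: "detailed video description",
    3: "reasoning around counter-intuitiveness",
}

# The title task is stated for HumorQA and CreativeQA only (H4, C4).
TITLE_TASK_SUBSETS = {"HumorQA", "CreativeQA"}


def build_task_table() -> dict[str, str]:
    """Map each task ID named in the text to its description."""
    table = {}
    for subset, prefix in SUBSET_PREFIXES.items():
        for number, description in COMMON_TASKS.items():
            table[f"{prefix}{number}"] = description
        if subset in TITLE_TASK_SUBSETS:
            table[f"{prefix}4"] = "attributing a fitting and vivid title"
    return table


for task_id, description in build_task_table().items():
    print(task_id, "-", description)
```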

1.2 - FunQA Extended Tasks

FunQA Multi-choice Dataset

The FunQA Multi-choice Dataset provides training and testing data for arbitrary models. Its QA pairs are in multiple-choice form: each answer is a word, phrase, or short sentence, and all questions are of the description type.

FunQA Dialog Dataset

Most current LLMs operate in a dialogue format. To match their input format, we produced the FunQA Dialog dataset, in which we used GPT-3.5 to convert QA pairs into recursive dialogues with added context.

2 - Data Preparation

Please download all the videos and annotation files from here.

For the FunQA Dataset, the download contains:

  • train.zip, val.zip, test.zip: Videos for training, validation and test.
  • FunQA_train.json, FunQA_val.json, FunQA_test.json: Annotation files for FunQA Base Dataset.

For FunQA Multi-choice Dataset:

  • Funqa_mcqa_v1.json: Annotation file for FunQA-MC Dataset.
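
Since the annotation schema is not described here, a cautious first step after downloading is to inspect a file without assuming any field names. The sketch below does that with the standard library only; the helper name `summarize_annotations` is hypothetical, and the assumption that the top level is either a JSON list or a JSON object is a guess that covers the two common cases.

```python
import json
from pathlib import Path


def summarize_annotations(path: str) -> dict:
    """Report the top-level type, entry count, and keys of the first record."""
    with Path(path).open(encoding="utf-8") as f:
        data = json.load(f)

    # Assumption: annotations are stored as a list of records, or as an
    # object mapping some identifier to a record. Neither is confirmed here.
    if isinstance(data, list):
        records = data
    elif isinstance(data, dict):
        records = list(data.values())
    else:
        raise ValueError(f"Unexpected top-level JSON type: {type(data).__name__}")

    first = records[0] if records else None
    return {
        "top_level_type": type(data).__name__,
        "num_entries": len(records),
        "first_record_keys": sorted(first) if isinstance(first, dict) else None,
    }


if __name__ == "__main__":
    annotation_file = Path("FunQA_train.json")  # file name taken from the list above
    if annotation_file.exists():
        print(summarize_annotations(str(annotation_file)))
    else:
        print(f"{annotation_file} not found; download it first.")
```

Printing the keys of the first record is usually enough to see how questions, answers, and video identifiers are stored before writing any evaluation glue code.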

3 - Evaluation

cd FunQA
# create and activate the environment before installing anything into it
conda create -n funqa python=3.10
conda activate funqa

# install bleurt
git clone https://github.com/google-research/bleurt.git
cd bleurt
pip install .

# download recommended checkpoint for bleurt
wget https://storage.googleapis.com/bleurt-oss-21/BLEURT-20.zip
unzip BLEURT-20.zip

# install the remaining dependencies
pip install -r requirements.txt

Please move the archive bleurt/bleurt to bleurt/. Then edit and run ./scripts/run_classic_eval.sh and ./scripts/run_gpt4_eval.sh for evaluation.

Acknowledgement

This study is supported by the Ministry of Education, Singapore, under its MOE AcRF Tier 2 (MOE-T2EP20221-0012), NTU NAP, and under the RIE2020 Industry Alignment Fund – Industry Collaboration Projects (IAF-ICP) Funding Initiative, as well as cash and in-kind contributions from the industry partner(s).

If you're using FunQA in your research or applications, please cite using this BibTeX:

  @misc{xie2024funqasurprisingvideocomprehension,
        title={FunQA: Towards Surprising Video Comprehension}, 
        author={Binzhu Xie and Sicheng Zhang and Zitang Zhou and Bo Li and Yuanhan Zhang and Jack Hessel and Jingkang Yang and Ziwei Liu},
        year={2024},
        eprint={2306.14899},
        archivePrefix={arXiv},
        primaryClass={cs.CV},
        url={https://arxiv.org/abs/2306.14899}, 
  }

License

This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.

We look forward to your feedback; please raise any issues or questions here.
