Video question answering dataset visualization

Chinese blog post: link

Load and visualize different datasets in video question answering.

  1. Project structure
  2. Dataset structure
  3. Install
  4. Usage
  5. Dataset
  6. Results

Project structure

$videoqa_dataset_visualization
    |──results              # result images
        |──msvd.png
        |──msrvtt.png
        |──tgif.png
        |──ui.png
    |──mainwindow.py        # ui file based on PyQt5
    |──mainwindow.ui        # can be modified with Qt Creator
    |──msvd_visualize.py    # msvd-qa dataset visualization
    |──msrvtt_visualize.py  # msrvtt-qa dataset visualization
    |──tgif_visualize.py    # tgif-qa dataset visualization
    |──ui.py                # GUI to visualize the datasets above
    |──vqa.png              # image asset
    |──requirements.txt
    |──README.md

Dataset structure

$all_dataset_path
    |──MSVD-QA
        |──video
            |──vid1.avi
            |── ...
            |──vid1970.avi
        |──train_qa.json
        |──val_qa.json
        |──test_qa.json
    |──MSRVTT-QA
        |──video
            |──video0.mp4
            |── ...
            |──video9999.mp4
        |──train_qa.json
        |──val_qa.json
        |──test_qa.json
    |──tgif-qa
        |──gifs
            |──tumblr_lgkgapWO4E1qbjyiko1_500.gif
            |── ...
        |──Test_action_question.csv
        |──Test_count_question.csv
        |──Test_frameqa_question.csv
        |──Test_transition_question.csv
        |──Total_action_question.csv
        |──Total_count_question.csv
        |──Total_frameqa_question.csv
        |──Total_transition_question.csv
        |──Train_action_question.csv
        |──Train_count_question.csv
        |──Train_frameqa_question.csv
        |──Train_transition_question.csv

Install

  1. Clone the project
git clone https://github.com/Horizon2333/videoqa_dataset_visualization
cd videoqa_dataset_visualization
  2. Install dependencies
pip install -r requirements.txt

Usage

  1. Visualize each dataset individually with matplotlib:
python msvd_visualize.py --path {your msvd-qa dataset path such as F:/Dataset/MSVD-QA}
python msrvtt_visualize.py --path {your msrvtt-qa dataset path such as F:/Dataset/MSRVTT-QA}
python tgif_visualize.py --path {your tgif-qa dataset path such as F:/Dataset/tgif-qa}
  2. Use the GUI to visualize the different datasets:
python ui.py

Dataset

MSVD-QA

Dataset size: ~1.7G

Download links

MSVD official link, MSVD video link (with a better name format), MSVD-QA annotation link

Annotation

Annotation format: a list of dicts stored in a JSON file.

Loading annotation:

import json

with open("val_qa.json") as f:
    annotation = json.load(f)

One annotation entry looks like this:

>>> annotation[0]
{'answer': 'someone', 'id': 30933, 'question': 'who pours liquid from a plastic container into a ziploc bag containing meat pieces?', 'video_id': 1201}

Get video name, question and answer:

video_name = 'vid' + str(annotation[index]['video_id']) + '.avi'
# video_name = 'vid1201.avi'
question = annotation[index]['question']                    
# question = 'who pours liquid from a plastic container into a ziploc bag containing meat pieces?'
answer = annotation[index]['answer']
# answer = 'someone'

Then we can combine the video name with the dataset path and load the video with OpenCV.
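For example, a minimal sketch of loading one QA pair and reading the frames of its video with OpenCV, assuming the dataset layout above (the dataset path and index are placeholders for illustration):

import json
import os

import cv2

dataset_path = "F:/Dataset/MSVD-QA"  # placeholder, use your own dataset root
index = 0                            # placeholder, pick any annotation index

# Load the validation annotations
with open(os.path.join(dataset_path, "val_qa.json")) as f:
    annotation = json.load(f)

# Build the full path of the video this QA pair refers to
video_name = 'vid' + str(annotation[index]['video_id']) + '.avi'
video_path = os.path.join(dataset_path, "video", video_name)

# Read all frames with OpenCV, converting BGR to RGB for display with matplotlib
cap = cv2.VideoCapture(video_path)
frames = []
while True:
    ret, frame = cap.read()
    if not ret:
        break
    frames.append(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
cap.release()

print(annotation[index]['question'], '->', annotation[index]['answer'])
print('Loaded', len(frames), 'frames from', video_name)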

MSRVTT-QA

Dataset size: ~6.3G

Download links

MSRVTT video and annotation link , MSRVTT-QA annotation link

Annotation

There are only a few differences between the MSRVTT-QA annotation and the MSVD-QA annotation.

Annotation format: a list of dicts stored in a JSON file.

Loading annotation:

import json

with open("val_qa.json") as f:
    annotation = json.load(f)

One annotation entry looks like this:

>>> annotation[0]
{'answer': 'couch', 'category_id': 14, 'id': 158581, 'question': 'what are three people sitting on?', 'video_id': 6513}

Get video name, question and answer:

video_name = 'video' + str(annotation[index]['video_id']) + '.mp4'
# video_name = 'video6513.mp4'
question = annotation[index]['question']                    
# question = 'what are three people sitting on?'
answer = annotation[index]['answer']
# answer = 'couch'

Then we can combine the video name with the dataset path and load the video with OpenCV, just as in the MSVD-QA example above.

tgif-qa

Dataset size: ~123G

Download links

tgif-qa gif and annotation link

Annotation

The tgif-qa dataset has 4 different types of QA pairs, so the annotation format also differs between types.

Annotation format: arrays stored in CSV files with a tab (\t) delimiter.

Action

Loading annotation:

import numpy as np

tgif_test_action_annotation = np.loadtxt("Test_action_question.csv", dtype=str, delimiter='\t')

The first row of the CSV gives the column names:

>>> tgif_test_action_annotation[0]
array(['gif_name', 'question', 'a1', 'a2', 'a3', 'a4', 'a5', 'answer', 'vid_id', 'key'], dtype='<U73')

The above output shows that action is a multiple-choice task.

One annotation entry looks like this:

>>> tgif_test_action_annotation[1]
array(['tumblr_nk172bbdPI1u1lr18o1_250',
       'What does the butterfly do 10 or more than 10 times ?',
       'stuff marshmallow', 'holds a phone towards face', 'fall over',
       'talk', 'flap wings', '4', 'ACTION4', '26'], dtype='<U73')

Get gif name, question and answer:

gif_name = tgif_test_action_annotation[index][0] + '.gif' 
# gif_name = 'tumblr_nk172bbdPI1u1lr18o1_250.gif'
question = tgif_test_action_annotation[index][1]                    
# question = 'What does the butterfly do 10 or more than 10 times ?'
multi_choice = tgif_test_action_annotation[index][2:7]
# multi_choice = array(['stuff marshmallow', 'holds a phone towards face', 'fall over', 'talk', 'flap wings'], dtype='<U73')
answer = tgif_test_action_annotation[index][7]
# answer = '4', meaning the correct answer is 'flap wings'
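
Since answer stores the index of the correct choice (0-based here, matching the example above), the answer text can be recovered from the candidate list; continuing from the snippet above:

# Map the stored answer index back to the choice text
answer_text = multi_choice[int(answer)]
# answer_text = 'flap wings'
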
Count

Loading annotation:

import numpy as np

tgif_test_count_annotation = np.loadtxt("Test_count_question.csv", dtype=str, delimiter='\t')

The first row of the CSV gives the column names:

>>> tgif_test_count_annotation[0]
array(['gif_name', 'question', 'answer', 'vid_id', 'key'], dtype='<U97')

The above output shows that count is an open-ended task.

One annotation entry looks like this:

>>> tgif_test_count_annotation[1]
array(['tumblr_nezfs4uELd1u1a7cmo1_250',
       'How many times does the man adjust waistband ?', '3', 'COUNT12',
       '52'], dtype='<U97')

Get gif name, question and answer:

gif_name = tgif_test_count_annotation[index][0] + '.gif' 
# gif_name = 'tumblr_nezfs4uELd1u1a7cmo1_250.gif'
question = tgif_test_count_annotation[index][1]                    
# question = 'How many times does the man adjust waistband ?'
answer = tgif_test_count_annotation[index][2]
# answer = '3'
Other

The frameqa type follows the same annotation format as count, and the transition type follows the same format as action.

Results

Visualize MSVD-QA:

(see results/msvd.png)

Visualize MSRVTT-QA:

(see results/msrvtt.png)

Visualize tgif-qa:

(see results/tgif.png)

Visualize with the GUI:

(see results/ui.png)


If there is something wrong with my code or you have any questions, please let me know. Thanks a lot!
