Modality Bias in TVQA

The official GitHub repository for the paper "On Modality Bias in the TVQA Dataset".


Our framework is adapted from the official TVQA repository. That repository provides access to the original dataset, the official website, the submission leaderboard and related projects, including TVQA+.

Modality Data Subsets:

Using the inclusion-exclusion measure (IEM) from our paper, we propose subsets that respond to a mixture of modalities and features.

Using our framework:

The essence of our framework can be used for any video-QA dataset with appropriate features. You'll have to adapt at least the dataloader and model classes to fit your new dataset. They function almost identically to the baseline TVQA classes, with added functionality. You may find it helpful to replicate our TVQA experiments first:

  1. git clone
  2. pip install -r requirements.txt
  3. Now assemble the dataset to run:
  4. Install the PyTorch block fusion package, and place it in this directory. You will need to edit imports in the model/ file to accommodate this fusion package for bilinear pooling.


Questions, Answers, Subtitles and ImageNet:

Clone the TVQA GitHub repository and follow steps 1, 2 and 3 for data extraction. This gives you the processed json files for the training and validation sets, which contain the questions, answers and subtitles. ImageNet features are stored in an h5 file. This file is large and needs a significant amount of memory if loaded in full. To avoid this, open it without the core driver so that features are read lazily from disk.
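As a minimal sketch of the lazy-read option, the snippet below opens an h5 file with h5py's default driver, which reads a dataset from disk only when it is sliced, and optionally with the `core` driver, which loads the whole file into RAM. The helper names `open_features` and `read_clip_features` are hypothetical, and the assumption that the file holds one dataset per clip name should be checked against your own file.

```python
import h5py


def open_features(path, in_memory=False):
    """Open an HDF5 feature file read-only.

    in_memory=False: h5py's default driver; data is read from disk only when sliced.
    in_memory=True:  the "core" driver; the whole file is loaded into RAM up front.
    """
    driver = "core" if in_memory else None
    return h5py.File(path, "r", driver=driver)


def read_clip_features(h5_file, clip_name):
    """Read a single clip's features into a NumPy array.

    Assumption: the file stores one dataset per clip, keyed by clip name.
    """
    return h5_file[clip_name][:]
```

With the default driver, only the clips you index are read, so memory use scales with the batch and not with the size of the file.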

Visual Concepts:

Visual concepts are contained in the det_visual_concepts_hq.pickle file.
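A pickle file like this can be read with Python's standard library. The sketch below is illustrative only: the helper names are hypothetical, and it assumes the pickle holds a dictionary mapping clip names to their visual concepts, which you should verify against the actual file. Only unpickle files from a source you trust.

```python
import pickle


def load_visual_concepts(path):
    """Load the visual concepts pickle from disk."""
    with open(path, "rb") as f:
        return pickle.load(f)


def concepts_for_clip(concepts, clip_name):
    """Return the concepts stored for one clip, or an empty list if absent.

    Assumption: `concepts` is a dict keyed by clip name.
    """
    return concepts.get(clip_name, [])
```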

Regional Features:

There are at most 20 regional features per frame, each 2048d, making this far too big to share. The original TVQA repository doesn't supply regional features or support them in the dataloader. We have implemented regional features seen in our paper under the name regional_topk (not regional).
You will need to follow the instructions here, apply for the raw TVQA video frames, and extract the regional features yourself.
Specifically, follow the instructions from here. Once you have set up this repository, add our tools/ from our repository to the bottom-up-attention/tools/ directory. Adapt this file to your raw video file location and run it to extract an h5 file for the entire dataset of frames (in our scripts we have called our regional file 100.h5). It will take a while, but our generation script should help a lot, and it shows the exact structure our dataloader expects from the h5 file.
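To give a sense of why regional features are too big to share, the arithmetic can be sketched as follows. The figures of 20 regions and 2048 dimensions come from the description above; storing values as 4-byte float32 is an assumption, and the function name `regional_bytes` is hypothetical.

```python
MAX_REGIONS_PER_FRAME = 20  # at most 20 regional features per frame
FEATURE_DIM = 2048          # each feature is 2048-dimensional
BYTES_PER_VALUE = 4         # assumption: features stored as float32


def regional_bytes(num_frames,
                   regions=MAX_REGIONS_PER_FRAME,
                   dim=FEATURE_DIM,
                   itemsize=BYTES_PER_VALUE):
    """Upper bound on uncompressed storage for the given number of frames."""
    return num_frames * regions * dim * itemsize


per_frame = regional_bytes(1)  # 163840 bytes, i.e. 160 KiB per frame
```

Multiplying `per_frame` by the number of frames you extract gives an upper bound on the size of the resulting h5 file before any compression.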

See our example_data_directory as a guideline.


Scripts to run our experiments. After the data is collected, edit the relevant dataset and import paths in the main, config, utils and tvqa_dataset files to suit your repository structure, then run these scripts.


Some tools used in our experiments for visualisation and convenience.


Published at BMVC 2020

title={On Modality Bias in the TVQA Dataset},
author={Winterbottom, T. and Xiao, S. and McLean, A. and Al Moubayed, N.},
booktitle={Proceedings of the British Machine Vision Conference ({BMVC})},


Feel free to contact me @ if you have any criticisms you'd like me to hear out or would like any help

