
Lightweight Recurrent Cross-modal Encoder

(Figure: overall architecture of LRCE)

Setup and Configurations

Environment

Install all the dependencies using a conda environment by running:

conda env create -f env.yaml
conda activate lrce
pip install 'git+https://github.com/katsura-jp/pytorch-cosine-annealing-with-warmup'
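
After activating the environment, a quick sanity check can confirm that PyTorch and the scheduler package installed above import cleanly (a minimal sketch, not part of the project code):

# Sanity check: verify PyTorch and the warm-restart scheduler package import cleanly.
import torch
from cosine_annealing_warmup import CosineAnnealingWarmupRestarts  # installed via pip above

print(f"PyTorch {torch.__version__}, CUDA available: {torch.cuda.is_available()}")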

Dataset

MSVD-QA

Download the annotations and videos. Extract them into a single directory and place all the videos under a folder named video. Download idx-video-mapping.pkl and place it in the same directory. The dataset directory should look as follows:

MSVD-QA
├── idx-video-mapping.pkl
├── readme.txt
├── test_qa.json
├── train_qa.json
├── val_qa.json
└── video
    ├── 00jrXRMlZOY_0_10.avi
    ├── 02Z-kuB3IaM_2_13.avi
    ...
    └── zzit5b_-ukg_5_20.avi
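
Before training, a short script like the following can verify the layout (a minimal sketch; the file names match the tree above, and idx-video-mapping.pkl is assumed to be a plain pickle):

import os
import pickle

root = "MSVD-QA"  # adjust to your dataset path
assert os.path.isdir(os.path.join(root, "video")), "missing video/ folder"
for name in ["idx-video-mapping.pkl", "train_qa.json", "val_qa.json", "test_qa.json"]:
    assert os.path.isfile(os.path.join(root, name)), f"missing {name}"

with open(os.path.join(root, "idx-video-mapping.pkl"), "rb") as fh:
    mapping = pickle.load(fh)
print(len(mapping), "index-to-video entries")  # assumes the pickle holds a dict or list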

MSRVTT-QA

Download the annotations and videos. Extract them into a single directory and place all the videos under a folder named video. Download idx-video-mapping.pkl and place it in the same directory. The dataset directory should look as follows:

MSRVTT-QA
├── category.txt
├── idx-video-mapping.pkl
├── readme.txt
├── test_qa.json
├── train_qa.json
├── val_qa.json
└── video
    ├── video0.mp4
    ├── video1000.mp4
    ...
    └── video9.mp4
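
The *_qa.json annotation files can be inspected directly; a minimal sketch, assuming the standard MSRVTT-QA format (a JSON list of question/answer records):

import json
import os

root = "MSRVTT-QA"  # adjust to your dataset path
with open(os.path.join(root, "train_qa.json")) as fh:
    qa = json.load(fh)
print(len(qa), "training QA pairs")
print(qa[0])  # one record, e.g. question, answer, and video id fields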

TGIF-QA

Download the annotations and GIFs from the official repo. Combine all the files into a single directory and restructure it as follows:

TGIF-QA
├── annotations
│   ├── README.md
│   ├── Test_action_question.csv
│   ├── Test_count_question.csv
│   ├── Test_frameqa_question.csv
│   ├── Test_transition_question.csv
│   ├── Total_action_question.csv
│   ├── Total_count_question.csv
│   ├── Total_frameqa_question.csv
│   ├── Total_transition_question.csv
│   ├── Train_action_question.csv
│   ├── Train_count_question.csv
│   ├── Train_frameqa_question.csv
│   └── Train_transition_question.csv
└── gifs
    ├── tumblr_ku4lzkM5fg1qa47qco1_250.gif
    ├── tumblr_ky2syrOMmW1qawjc8o1_250.gif
    ...
    └── tumblr_nrlo5nKKip1uz642so1_400.mp4
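
Note that the annotation files from the official repo are tab-separated despite the .csv extension; a minimal pandas sketch for loading one split:

import pandas as pd

# TGIF-QA annotation files use tabs as delimiters despite the .csv extension.
df = pd.read_csv("TGIF-QA/annotations/Train_frameqa_question.csv", sep="\t")
print(df.columns.tolist())  # inspect the column names (gif_name, question, answer, ...)
print(len(df), "training questions")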

GPU

By default, this code utilizes all of the GPUs on your machine. To use only some of them, set the CUDA_VISIBLE_DEVICES environment variable. For example, to use only the first GPU, type:

export CUDA_VISIBLE_DEVICES=0
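
You can confirm which devices a process actually sees from Python (a minimal check, separate from the training script):

import os
import torch

# With CUDA_VISIBLE_DEVICES=0, only one device should be reported.
print("CUDA_VISIBLE_DEVICES =", os.environ.get("CUDA_VISIBLE_DEVICES"))
print("Visible GPUs:", torch.cuda.device_count())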

Feature Extractor

Download the pre-trained Video Swin Transformer here. Then, place it under the pretrained_models directory of this project.
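
To make sure the download is intact, the checkpoint can be loaded standalone (a minimal sketch; the file name swin.pth is a placeholder for whatever the download provides):

import torch

# Placeholder file name -- substitute the actual checkpoint you downloaded.
ckpt = torch.load("pretrained_models/swin.pth", map_location="cpu")
if isinstance(ckpt, dict):
    print("checkpoint keys:", list(ckpt.keys())[:5])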

Performance

(Figure: performance results)

Training

To see all the available training arguments and their explanations, run:

python train.py -h

Below are the arguments we used to reproduce the performance reported in the paper; a sketch of how the scheduler flags map onto the installed cosine scheduler follows the list:

  • MSVD-QA
python train.py --dataset msvd-qa-oe \
--dataset-dir <path/to/dataset> --ckpt-interval 2 --batch-size 10 \
--epoch 8 --drop-out-rate 0.1 --lr 5e-5 --reg-strength 0.001 --num-workers 4 \
--use-cosine-scheduler --lr-restart-epoch 1 --lr-restart-mul 2 \
--lr-decay-factor 0.5 --lr-warm-up 0.1 --min-lr 1e-8 \
--temporal-scale 3 --eval-per-epoch 3
  • MSRVTT-QA
python train.py --dataset msrvtt-qa-oe \
--dataset-dir <path/to/dataset> --ckpt-interval 2 --batch-size 10 \
--epoch 7 --drop-out-rate 0.1 --lr 2e-5 --reg-strength 0.001 --num-workers 4 \
--use-cosine-scheduler --lr-restart-epoch 1 --lr-restart-mul 2 \
--lr-decay-factor 1 --lr-warm-up 0.05 --min-lr 1e-8 \
--temporal-scale 3 --eval-per-epoch 3
  • TGIF-FrameQA
python train.py --dataset tgif-frameqa \
--dataset-dir <path/to/dataset> --ckpt-interval 3 --batch-size 10 \
--epoch 15 --drop-out-rate 0.1 --lr 1e-4 --reg-strength 0.001 --num-workers 4 \
--use-cosine-scheduler --lr-restart-epoch 1 --lr-restart-mul 2 \
--lr-decay-factor 0.5 --lr-warm-up 0.1 --min-lr 1e-8 \
--temporal-scale 3 --eval-per-epoch 3
  • TGIF-Transition
python train.py --dataset tgif-transition \
--dataset-dir <path/to/dataset> --ckpt-interval 3 --batch-size 9 \
--epoch 5 --drop-out-rate 0.1 --lr 2e-5 --reg-strength 0.001 --num-workers 4 \
--use-cosine-scheduler --lr-restart-epoch 1 --lr-restart-mul 2 \
--lr-decay-factor 1 --lr-warm-up 0 --min-lr 1e-8 \
--temporal-scale 3 --eval-per-epoch 3
  • TGIF-Action
python train.py --dataset tgif-action \
--dataset-dir <path/to/dataset> --ckpt-interval 3 --batch-size 16 \
--epoch 10 --drop-out-rate 0.1 --lr 3e-5 --reg-strength 0.001 --num-workers 4 \
--use-cosine-scheduler --lr-restart-epoch 1 --lr-restart-mul 2 \
--lr-decay-factor 1 --lr-warm-up 0.1 --min-lr 1e-8 \
--temporal-scale 3 --eval-per-epoch 3
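
The scheduler flags above correspond to the CosineAnnealingWarmupRestarts class installed in the Environment step; below is a minimal sketch of how we read that mapping, using the MSVD-QA values (the per-epoch step count and the exact wiring inside train.py are assumptions):

import torch
from cosine_annealing_warmup import CosineAnnealingWarmupRestarts

model = torch.nn.Linear(8, 8)                    # stand-in for the real model
opt = torch.optim.AdamW(model.parameters(), lr=5e-5)

steps_per_epoch = 100                            # illustrative; depends on dataset and batch size
scheduler = CosineAnnealingWarmupRestarts(
    opt,
    first_cycle_steps=1 * steps_per_epoch,       # --lr-restart-epoch 1 (in epochs)
    cycle_mult=2.0,                              # --lr-restart-mul 2
    max_lr=5e-5,                                 # --lr 5e-5
    min_lr=1e-8,                                 # --min-lr 1e-8
    warmup_steps=int(0.1 * steps_per_epoch),     # --lr-warm-up 0.1 (assumed fraction of a cycle)
    gamma=0.5,                                   # --lr-decay-factor 0.5
)
for _ in range(steps_per_epoch):                 # one scheduler step per training step
    opt.step()
    scheduler.step()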

Note: We trained our models on 4 GPUs with the DDP training strategy, so the results might vary when the models are trained on a different number of GPUs due to the change in effective batch size.

Evaluation

To perform evaluation on a trained model, run:

python eval.py --dataset <dataset/name> \
--dataset-dir <path/to/dataset> \
--batch-size 32 --num-workers 4 --temporal-scale 3 \
--model-path <path/to/model>

The --dataset argument can be one of:

  • msvd-qa-oe for MSVD-QA
  • msrvtt-qa-oe for MSRVTT-QA
  • tgif-frameqa for TGIF-FrameQA
  • tgif-transition for TGIF-Transition
  • tgif-action for TGIF-Action

To get our reported performance, download our best training checkpoints here.

Citation

@article{Immanuel2023,
    author  = {S. A. Immanuel and C. Jeong},
    title   = {Lightweight recurrent cross-modal encoder for video question answering},
    journal = {Knowledge-Based Systems},
    month   = {6},
    year    = {2023},
}

Credits