
VLG: General Video Recognition with Web Textual Knowledge

Usage

First, install PyTorch 1.7.1+, torchvision 0.8.2+, and the other required packages as follows:

conda install -c pytorch pytorch torchvision
pip install timm==0.3.2
pip install ftfy regex tqdm
pip install git+https://github.com/openai/CLIP.git
pip install mmcv==1.3.14
pip install decord
pip install git+https://github.com/ildoonet/pytorch-randaugment
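
To verify the installation, you can run a quick import check (all modules come from the packages installed above); it should print the library versions without errors:

python -c "import torch, torchvision, timm, clip, mmcv, decord; print('torch', torch.__version__, 'timm', timm.__version__, 'mmcv', mmcv.__version__)"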

Data Preparation

Kinetics-Close/Kinetics-LT

Download the Kinetics videos from here.

Then download and extract the wiki text into the same directory. The directory tree of data is expected to be like this:

./data/kinetics400/
  videos_train/
    vid1.mp4
    ...
  videos_val/
    vid2.mp4
    ...
  wiki/
    desc_0.txt
    ...
  k400_LT_train_videos.txt
  k400_LT_val_videos.txt
  kinetics_video_train_list.txt
  kinetics_video_val_list.txt
  labels.txt
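
Before training, you can sanity-check this layout with a few commands (the paths simply mirror the tree above):

ls ./data/kinetics400/videos_train | head -n 3
ls ./data/kinetics400/wiki | head -n 3
wc -l ./data/kinetics400/kinetics_video_train_list.txt ./data/kinetics400/kinetics_video_val_list.txt ./data/kinetics400/labels.txt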

Kinetics-Fewshot

We used the split from CMN for Kinetics-Fewshot.

Download and extract the wiki text into the same directory. The directory tree of data is expected to be like this:

./data/kinetics100_base
  wiki/
    desc_0.txt
    ...
  k100_base_train_list.txt
  labels.txt
./data/kinetics100_test
  wiki/
    desc_0.txt
    ...
  k100_support_query_list.txt
  labels.txt
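
For example, assuming the wiki text ships as tarballs (the archive names below are hypothetical; substitute the actual downloads), extraction would look like:

tar -xf wiki_k100_base.tar.gz -C ./data/kinetics100_base/
tar -xf wiki_k100_test.tar.gz -C ./data/kinetics100_test/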

Kinetics-Fewshot-C-way

We used the split from Efficient-Prompt for Kinetics-Fewshot-C-way.

Download and extract the wiki text into the same directory. The directory tree of data is expected to be like this:

./data/kinetics400_fewshot_C
  wiki/
    desc_0.txt
    ...
  k400_fewshot_c_train_split_0.txt
  k400_fewshot_c_train_split_1.txt
  ...
  k400_fewshot_c_train_split_9.txt
  kinetics_video_val_list.txt
  labels.txt

Kinetics-Openset

Download the split from here for Kinetics-Openset.

Then download and extract the wiki text into the same directory. The directory tree of data is expected to be like this:

./data/kinetics400_openset
  wiki/
    desc_0.txt
    ...
  k400_openset_train_list.txt
  k400_openset_val_list.txt
  labels.txt

Evaluation

To evaluate VLG, you can run:

  • For the pre-training stage:
bash dist_train_arun.sh ${CONFIG_PATH} 8 --eval --eval-pretrain
  • For the fine-tuning stage:
bash dist_train_arun.sh ${CONFIG_PATH} 8 --eval

For fewshot cases, you can run:

bash dist_train_arun_fewshot.sh ${CONFIG_PATH} 8

For openset cases, you can run:

bash dist_train_arun_openset.sh ${CONFIG_PATH} 8 --test --dist-eval --eval

${CONFIG_PATH} is the relative path of the corresponding configuration file in the config directory.
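
For instance, evaluating a pre-trained model on 8 GPUs could look like this (the config filename below is hypothetical; substitute a real file from the config directory):

bash dist_train_arun.sh config/vlg_k400_pretrain.py 8 --eval --eval-pretrain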

Training

To train VLG on a single node with 8 GPUs:

  • For the pre-training stage, run:
bash dist_train_arun.sh ${CONFIG_PATH} 8
  • For the fine-tuning stage:

    • First, select the salient sentences by running:

      bash dist_train_arun.sh ${CONFIG_PATH} 8 --eval --select

    • Then, start the fine-tuning by running:

      bash dist_train_arun.sh ${CONFIG_PATH} 8
      

${CONFIG_PATH} is the relative path of the corresponding configuration file in the config directory.
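
Putting the two fine-tuning steps together, a full run could look like this (again, the config filename is hypothetical):

CONFIG_PATH=config/vlg_k400_finetune.py  # hypothetical; substitute a real config
bash dist_train_arun.sh ${CONFIG_PATH} 8 --eval --select
bash dist_train_arun.sh ${CONFIG_PATH} 8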

Pretrained Models

The checkpoints are provided on Baidu Netdisk; the extraction code is nc6e.

Citation

If you are interested in our work, please cite as follows:

@article{lin2022vlg,
  title={VLG: General Video Recognition with Web Textual Knowledge},
  author={Lin, Jintao and Liu, Zhaoyang and Wang, Wenhai and Wu, Wayne and Wang, Limin},
  journal={arXiv preprint arXiv:2212.01638},
  year={2022}
}

Acknowledgements

This repo contains modified code from VL-LTR, ActionCLIP, and OpenMax.