
Official implementation of "Capturing Temporal Information in a Single Frame: Channel Sampling Strategies for Action Recognition", BMVC 2022


Capturing Temporal Information in a Single Frame: Channel Sampling Strategies for Action Recognition

Kiyoon Kim, Shreyank N Gowda, Oisin Mac Aodha, Laura Sevilla-Lara
In BMVC 2022. Links: arXiv, presentation video.

8-frame TC Reordering

3-frame GrayST
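
The two captions above correspond to the paper's two sampling strategies: TC Reordering redistributes colour channels across neighbouring frames, and GrayST stacks consecutive grayscale frames into the channels of a single frame. Below is a rough, hypothetical NumPy sketch of these ideas (not the repository's implementation; the grayscale conversion, frame indexing, and boundary handling here are our own simplifying assumptions, see the paper for the exact definitions):

import numpy as np

def grayst_frame(f0, f1, f2):
    """Hypothetical GrayST sketch: stack three consecutive RGB frames
    (each H x W x 3, uint8) as grayscale images in the R, G, B channels
    of one output frame, so a single frame carries short-term motion."""
    grays = [f.mean(axis=2) for f in (f0, f1, f2)]   # naive per-frame grayscale
    return np.stack(grays, axis=2).astype(np.uint8)  # channel axis now encodes time

def tc_reorder(clip):
    """Hypothetical TC-reordering sketch: each output frame keeps its own R
    channel but borrows G and B from the next two frames. clip: T x H x W x 3."""
    out = clip.copy()
    num_frames = clip.shape[0]
    for t in range(num_frames):
        out[t, ..., 1] = clip[min(t + 1, num_frames - 1), ..., 1]  # G from frame t+1
        out[t, ..., 2] = clip[min(t + 2, num_frames - 1), ..., 2]  # B from frame t+2
    return out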

Installation

conda create -n videoai python=3.9
conda activate videoai
conda install pytorch==1.12.1 torchvision cudatoolkit=10.2 -c pytorch
### For RTX 30xx GPUs,
#conda install pytorch==1.12.1 torchvision cudatoolkit=11.3 -c pytorch
 

git clone --recurse-submodules https://github.com/kiyoon/channel_sampling
cd channel_sampling
git submodule update --recursive
cd submodules/video_datasets_api
pip install -e .
cd ../experiment_utils
pip install -e .
cd ../..
pip install -e .
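
To confirm the environment works before continuing, you can check the installed versions and CUDA visibility (the exact output depends on your machine):

python -c "import torch, torchvision; print(torch.__version__, torchvision.__version__, torch.cuda.is_available())"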

Optional: install Pillow-SIMD and libjpeg-turbo to improve data-loading performance.
Run this at the end of the installation:

conda uninstall -y --force pillow pil jpeg libtiff libjpeg-turbo
pip   uninstall -y         pillow pil jpeg libtiff libjpeg-turbo
conda install -yc conda-forge libjpeg-turbo
CFLAGS="${CFLAGS} -mavx2" pip install --upgrade --no-cache-dir --force-reinstall --no-binary :all: --compile pillow-simd
conda install -y jpeg libtiff
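
To check that Pillow-SIMD actually replaced stock Pillow (Pillow-SIMD version strings typically end in a .postN suffix):

python -c "import PIL; print(PIL.__version__)"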

Getting started

Preparing the datasets

Something-Something-V1

  1. Download the dataset and annotations. Rename the directories to frames and annotations, and put them in data/something-something-v1 (the expected layout after these steps is sketched below).
  2. Generate splits.
conda activate videoai
python tools/datasets/generate_somethingv1_splits.py data/something-something-v1/splits_frames data/something-something-v1/annotations --root data/something-something-v1/frames --mode frames
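
After these steps, the dataset directory should look roughly like this (the exact contents depend on the download; this layout is an assumption based on the paths used above):

data/something-something-v1/
    frames/          # one folder of frames per video
    annotations/     # official annotation files
    splits_frames/   # generated by generate_somethingv1_splits.py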

Something-Something-V2

  1. Download the dataset and annotations. Rename the directories to videos and annotations, and put them in data/something-something-v2 (the expected layout after these steps is sketched below).
  2. Extract the videos into frames and save them to data/something-something-v2/frames_q5.
submodules/video_datasets_api/tools/something-something-v2/extract_frames.sh data/something-something-v2/videos data/something-something-v2/frames_q5
  3. Generate splits.
conda activate videoai
python tools/datasets/generate_somethingv2_splits.py data/something-something-v2/splits_frames data/something-something-v2/annotations data/something-something-v2/frames_q5 --mode frames
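
As with V1, the resulting layout should be roughly as follows (an assumption based on the paths used above):

data/something-something-v2/
    videos/          # original videos
    annotations/     # official annotation files
    frames_q5/       # frames produced by extract_frames.sh
    splits_frames/   # generated by generate_somethingv2_splits.py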

About the paper

Example: training and evaluating the GrayST 8-frame model (experiment config GreyST_8frame) on Something-Something-V1 with 4 GPUs:

# Run training
tools/run_singlenode.sh train 4 -R ~/experiment_root -D something_v1 -M tsm_resnet50_nopartialbn -E GreyST_8frame -c:e tcgrey
# Run evaluation
tools/run_singlenode.sh eval 4 -R ~/experiment_root -D something_v1 -M tsm_resnet50_nopartialbn -E GreyST_8frame -c:e tcgrey

Once you have prepared the datasets, modify the script below to choose a dataset, model, and sampling method, then run it.

#!/bin/bash
exp_root="$HOME/experiments"  # Experiment results will be saved here.

export CUDA_VISIBLE_DEVICES=0
num_gpus=1

subfolder="test_run"           # Name subfolder as you like.

## Choose the dataset
dataset=something_v1
#dataset=something_v2
#dataset=cater_task2
#dataset=cater_task2_cameramotion

## Choose the model
model=tsn_resnet50
#model=trn_resnet50
#model=mtrn_resnet50
#model=tsm_resnet50_nopartialbn     # NOTE: use tsm_resnet50 for CATER experiments
#model=mvf_resnet50_nopartialbn     # NOTE: use mvf_resnet50 for CATER experiments

## Choose the sampling method.
## NOTE: Use 32-frame settings for CATER experiments.
exp_name="RGB_8frame"
#exp_name="TC_8frame"
#exp_name="TCPlus2_8frame"
#exp_name="GreyST_8frame"

# Training script
# -S creates a subdirectory with the name of your choice. (optional)
tools/run_singlenode.sh train $num_gpus -R $exp_root -D $dataset -M $model -E $exp_name -c:e tcgrey -S "$subfolder" #--wandb_project kiyoon_kim_tcgrey

# Evaluating script
# -l -2 loads the best model
# -p saves the predictions. (optional)
tools/run_singlenode.sh eval $num_gpus -R $exp_root -D $dataset -M $model -E $exp_name -c:e tcgrey -S "$subfolder" -l -2 -p #--wandb
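
The quick-start commands at the top of this section use 4 GPUs; the same script scales to multiple GPUs by exposing more devices, for example:

export CUDA_VISIBLE_DEVICES=0,1,2,3
num_gpus=4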

Citing the paper

If you find our work or code useful, please cite:

@inproceedings{kim2022capturing,
  author    = {Kiyoon Kim and Shreyank N Gowda and Oisin Mac Aodha and Laura Sevilla-Lara},
  title     = {Capturing Temporal Information in a Single Frame: Channel Sampling Strategies for Action Recognition},
  booktitle = {33rd British Machine Vision Conference 2022, {BMVC} 2022, London, UK, November 21-24, 2022},
  publisher = {{BMVA} Press},
  year      = {2022},
  url       = {https://bmvc2022.mpi-inf.mpg.de/0355.pdf}
}

Framework Used

This repository is a fork of the PyVideoAI framework.
Learn how to use it with the PyVideoAI-examples notebooks.
