arXiv | Primary contact: Yitian Zhang
Comparison between existing methods and our proposed Ample and Focal Network (AFNet). Most existing works reduce data redundancy at the beginning of the deep network, which leads to a loss of information. We propose a two-branch design that processes frames with different computational resources within the network while preserving all input information.
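For intuition, here is a minimal, hypothetical sketch of the two-branch idea in PyTorch. It is not the authors' implementation, and the names (`TwoBranchBlock`, `navigate`, etc.) are our own: the ample branch processes every frame at reduced resolution, a navigation module scores frames via Gumbel-Softmax, and the focal branch spends full computation only on the selected frames before the two streams are fused.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TwoBranchBlock(nn.Module):
    """Toy sketch of an ample/focal block (illustrative only)."""

    def __init__(self, channels):
        super().__init__()
        self.ample = nn.Conv2d(channels, channels, 3, padding=1)  # cheap branch, half resolution
        self.focal = nn.Conv2d(channels, channels, 3, padding=1)  # costly branch, full resolution
        self.navigate = nn.Linear(channels, 2)                    # per-frame keep/skip logits

    def forward(self, x):  # x: (num_frames, C, H, W)
        # Ample branch: all frames, but at 1/2 spatial resolution.
        coarse = F.interpolate(self.ample(F.avg_pool2d(x, 2)),
                               scale_factor=2, mode='nearest')
        # Navigation: differentiable binary frame selection via Gumbel-Softmax.
        logits = self.navigate(x.mean(dim=(2, 3)))                # (num_frames, 2)
        keep = F.gumbel_softmax(logits, hard=True)[:, :1]         # (num_frames, 1) in {0, 1}
        # Focal branch: full computation, masked to the selected frames.
        # (A real implementation would skip computation for masked frames;
        # zeroing them here just keeps the sketch short.)
        fine = self.focal(x) * keep.view(-1, 1, 1, 1)
        return coarse + fine  # fuse the branches: nothing from the input is discarded

frames = torch.randn(8, 16, 32, 32)  # 8 frames, 16 channels
out = TwoBranchBlock(16)(frames)
print(out.shape)                     # torch.Size([8, 16, 32, 32])
```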
Requirements:
- python 3.7
- pytorch 1.7.0
- torchvision 0.9.0
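A possible way to set up a matching environment (the environment name `afnet` is our own choice; pick builds that match your CUDA toolkit):

```bash
conda create -n afnet python=3.7 -y
conda activate afnet
# Install PyTorch 1.7.0 / torchvision 0.9.0 wheels matching your CUDA version,
# e.g. following https://pytorch.org/get-started/previous-versions/
```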
Please follow the instructions of TSM to prepare the Something-Something V1/V2 datasets.
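For reference, TSM-style preprocessing extracts RGB frames and generates annotation lists; the resulting layout is roughly as follows (directory names are illustrative, and each list line follows TSM's `<frame_folder> <num_frames> <label>` convention):

```
something-something-v1/
├── 20bn-something-something-v1/   # one folder of extracted RGB frames per video
├── category.txt                   # class names
├── train_videofolder.txt          # <frame_folder> <num_frames> <label>
└── val_videofolder.txt
```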
Here we provide AF-MobileNetv3, AF-ResNet50, and AF-ResNet101 pretrained on ImageNet, as well as all the pretrained models on the Something-Something V1 dataset.
Checkpoints are available through the link.
Model | Top-1 Acc. | GFLOPs |
---|---|---|
AF-MobileNetv3 | 72.09% | 0.2 |
AF-ResNet50 | 77.24% | 2.9 |
AF-ResNet101 | 78.36% | 5.0 |
Checkpoints and logs are available through the link.
Less is More:
Model | Frames | Top-1 Acc. | GFLOPs |
---|---|---|---|
TSN | 8 | 18.6% | 32.7 |
AFNet(RT=0.50) | 8 | 26.8% | 19.5 |
AFNet(RT=0.25) | 8 | 27.7% | 18.3 |
More is Less:
Model | Backbone | Frames | Top-1 Acc. | GFLOPs |
---|---|---|---|---|
TSM | ResNet50 | 8 | 45.6% | 32.7 |
AFNet-TSM(RT=0.4) | AF-ResNet50 | 12 | 49.0% | 27.9 |
AFNet-TSM(RT=0.8) | AF-ResNet50 | 12 | 49.9% | 31.7 |
AFNet-TSM(RT=0.4) | AF-MobileNetv3 | 12 | 45.3% | 2.2 |
AFNet-TSM(RT=0.8) | AF-MobileNetv3 | 12 | 45.9% | 2.3 |
AFNet-TSM(RT=0.4) | AF-ResNet101 | 12 | 49.8% | 42.1 |
AFNet-TSM(RT=0.8) | AF-ResNet101 | 12 | 50.1% | 48.9 |
Training:
- Specify the directory of datasets with `root_dataset` in `train_sth.sh`.
- Download the backbone pretrained on ImageNet from Google Drive.
- Specify the directory of the downloaded backbone with `path_backbone` in `train_sth.sh`.
- Specify the ratio of selected frames with `rt` and run `bash train_sth.sh` (see the example below).
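Putting the steps above together, a typical training run might look like this (all paths are placeholders):

```bash
# In train_sth.sh, set:
#   root_dataset=/path/to/something-something-v1
#   path_backbone=/path/to/imagenet_pretrained_backbone
#   rt=0.4    # ratio of selected frames
bash train_sth.sh
```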
Note that there is a small variance during evaluation because of the sampling in Gumbel-Softmax, so the testing results may not exactly align with the numbers in our paper. We provide the logs in Table 2 for verification.
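To see where the variance comes from: Gumbel-Softmax perturbs the frame-selection logits with random Gumbel noise before taking a (hard) argmax, so two runs can select different frames. A toy illustration:

```python
import torch
import torch.nn.functional as F

logits = torch.tensor([[1.0, 0.8]])  # keep-vs-skip scores for one frame
for _ in range(3):
    # hard=True returns a one-hot sample; the chosen index can differ across runs
    print(F.gumbel_softmax(logits, tau=1.0, hard=True))
# Calling torch.manual_seed(...) beforehand makes the draws reproducible.
```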
Evaluation:
- Specify the directory of datasets with `root_dataset` in `eval_sth.sh`.
- Download the pretrained models from Google Drive.
- Specify the directory of the pretrained model with `resume` in `eval_sth.sh`.
- Run `bash eval_sth.sh` (see the example below).
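Analogously, an evaluation run might look like this (paths are placeholders):

```bash
# In eval_sth.sh, set:
#   root_dataset=/path/to/something-something-v1
#   resume=/path/to/pretrained_afnet_checkpoint
bash eval_sth.sh
```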
If you find our code or paper useful for your research, please cite:
@article{zhang2022look,
title={Look More but Care Less in Video Recognition},
author={Zhang, Yitian and Bai, Yue and Wang, Huan and Xu, Yi and Fu, Yun},
journal={arXiv preprint arXiv:2211.09992},
year={2022}
}