arXiv | Primary contact: Yitian Zhang
Comparison between existing methods and our proposed Ample and Focal Network (AFNet). Most existing works reduce data redundancy at the beginning of the deep network, which leads to a loss of information. We propose a two-branch design that processes frames with different computational resources within the network while preserving all input information.
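For intuition, here is a minimal, hypothetical sketch of the two-branch idea in PyTorch. It is not the authors' implementation, and the names (`TwoBranchBlock`, `navigate`, etc.) are our own: the ample branch processes every frame at reduced resolution, a navigation module scores frames via Gumbel-Softmax, and the focal branch spends full computation only on the selected frames before the two streams are fused.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TwoBranchBlock(nn.Module):
    """Toy sketch of an ample/focal block (illustrative only)."""

    def __init__(self, channels):
        super().__init__()
        self.ample = nn.Conv2d(channels, channels, 3, padding=1)  # cheap branch, half resolution
        self.focal = nn.Conv2d(channels, channels, 3, padding=1)  # costly branch, full resolution
        self.navigate = nn.Linear(channels, 2)                    # per-frame keep/skip logits

    def forward(self, x):  # x: (num_frames, C, H, W)
        # Ample branch: all frames, but at 1/2 spatial resolution.
        coarse = F.interpolate(self.ample(F.avg_pool2d(x, 2)),
                               scale_factor=2, mode='nearest')
        # Navigation: differentiable binary frame selection via Gumbel-Softmax.
        logits = self.navigate(x.mean(dim=(2, 3)))                # (num_frames, 2)
        keep = F.gumbel_softmax(logits, hard=True)[:, :1]         # (num_frames, 1) in {0, 1}
        # Focal branch: full computation, masked to the selected frames.
        # (A real implementation would skip computation for masked frames;
        # zeroing them here just keeps the sketch short.)
        fine = self.focal(x) * keep.view(-1, 1, 1, 1)
        return coarse + fine  # fuse the branches: nothing from the input is discarded

frames = torch.randn(8, 16, 32, 32)  # 8 frames, 16 channels
out = TwoBranchBlock(16)(frames)
print(out.shape)                     # torch.Size([8, 16, 32, 32])
```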
Requirements:
- python 3.7
- pytorch 1.7.0
- torchvision 0.9.0
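A possible way to set up a matching environment (the environment name `afnet` is our own choice; pick builds that match your CUDA toolkit):

```bash
conda create -n afnet python=3.7 -y
conda activate afnet
# Install PyTorch 1.7.0 / torchvision 0.9.0 wheels matching your CUDA version,
# e.g. following https://pytorch.org/get-started/previous-versions/
```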
Please follow the instructions of TSM to prepare the Something-Something V1/V2 datasets.
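For reference, TSM-style preprocessing extracts RGB frames and generates annotation lists; the resulting layout is roughly as follows (directory names are illustrative, and each list line follows TSM's `<frame_folder> <num_frames> <label>` convention):

```
something-something-v1/
├── 20bn-something-something-v1/   # one folder of extracted RGB frames per video
├── category.txt                   # class names
├── train_videofolder.txt          # <frame_folder> <num_frames> <label>
└── val_videofolder.txt
```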
Here we provide AF-MobileNetv3, AF-ResNet50, and AF-ResNet101 pretrained on ImageNet, as well as all the pretrained models on the Something-Something V1 dataset.
Checkpoints are available through the link.
Model | Top-1 Acc. | GFLOPs |
---|---|---|
AF-MobileNetv3 | 72.09% | 0.2 |
AF-ResNet50 | 77.24% | 2.9 |
AF-ResNet101 | 78.36% | 5.0 |
Checkpoints and logs are available through the link.
Less is More:
Model | Frames | Top-1 Acc. | GFLOPs |
---|---|---|---|
TSN | 8 | 18.6% | 32.7 |
AFNet(RT=0.50) | 8 | 26.8% | 19.5 |
AFNet(RT=0.25) | 8 | 27.7% | 18.3 |
More is Less:
Model | Backbone | Frames | Top-1 Acc. | GFLOPs |
---|---|---|---|---|
TSM | ResNet50 | 8 | 45.6% | 32.7 |
AFNet-TSM(RT=0.4) | AF-ResNet50 | 12 | 49.0% | 27.9 |
AFNet-TSM(RT=0.8) | AF-ResNet50 | 12 | 49.9% | 31.7 |
AFNet-TSM(RT=0.4) | AF-MobileNetv3 | 12 | 45.3% | 2.2 |
AFNet-TSM(RT=0.8) | AF-MobileNetv3 | 12 | 45.9% | 2.3 |
AFNet-TSM(RT=0.4) | AF-ResNet101 | 12 | 49.8% | 42.1 |
AFNet-TSM(RT=0.8) | AF-ResNet101 | 12 | 50.1% | 48.9 |
Training:
- Specify the directory of datasets with `root_dataset` in `train_sth.sh`.
- Download the backbone pretrained on ImageNet from Google Drive.
- Specify the directory of the downloaded backbone with `path_backbone` in `train_sth.sh`.
- Specify the ratio of selected frames with `rt` and run `bash train_sth.sh` (see the example below).
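Putting the steps above together, a typical training run might look like this (all paths are placeholders):

```bash
# In train_sth.sh, set:
#   root_dataset=/path/to/something-something-v1
#   path_backbone=/path/to/imagenet_pretrained_backbone
#   rt=0.4    # ratio of selected frames
bash train_sth.sh
```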
Note that there is a small variance during evaluation because of the sampling in Gumbel-Softmax, so the testing results may not exactly align with the numbers in our paper. We provide the logs in Table 2 for verification.
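To see where the variance comes from: Gumbel-Softmax perturbs the frame-selection logits with random Gumbel noise before taking a (hard) argmax, so two runs can select different frames. A toy illustration:

```python
import torch
import torch.nn.functional as F

logits = torch.tensor([[1.0, 0.8]])  # keep-vs-skip scores for one frame
for _ in range(3):
    # hard=True returns a one-hot sample; the chosen index can differ across runs
    print(F.gumbel_softmax(logits, tau=1.0, hard=True))
# Calling torch.manual_seed(...) beforehand makes the draws reproducible.
```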
Evaluation:
- Specify the directory of datasets with `root_dataset` in `eval_sth.sh`.
- Download the pretrained models from Google Drive.
- Specify the directory of the pretrained model with `resume` in `eval_sth.sh`.
- Run `bash eval_sth.sh` (see the example below).
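Analogously, an evaluation run might look like this (paths are placeholders):

```bash
# In eval_sth.sh, set:
#   root_dataset=/path/to/something-something-v1
#   resume=/path/to/pretrained_afnet_checkpoint
bash eval_sth.sh
```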
If you find our code or paper useful for your research, please cite:
@article{zhang2022look,
title={Look More but Care Less in Video Recognition},
author={Zhang, Yitian and Bai, Yue and Wang, Huan and Xu, Yi and Fu, Yun},
journal={arXiv preprint arXiv:2211.09992},
year={2022}
}