GitHub - June01/icip19-tad: Implementation of Exploring Feature Representation and Training strategies in Temporal Action Localization

Exploring Feature Representation and Training strategies in Temporal Action Localization

This repo holds the codes and models for the temporal action localization framework presented on ICIP 2019.

Exploring Feature Representation and Training strategies in Temporal Action Localization Tingting Xie, Xiaoshan Yang, Tianzhu Zhang, Changsheng Xu, Ioannis Patras, ICIP 2019

[Arxiv Preprint]

If you find this helps your research, please cite:

@article{xie2019exploring,
  title={Exploring Feature Representation and Training strategies in Temporal Action Localization},
  author={Xie, Tingting and Yang, Xiaoshan and Zhang, Tianzhu and Xu, Changsheng and Patras, Ioannis},
  journal={arXiv preprint arXiv:1905.10608},
  year={2019}
}

In this paper, unit-level two-stream feature was using in thumos14 dataset. The RGB feature could be downloaded here: val set, test set; the denseflow features can be downloaded here: val set, test set. Note that, val set is used for training, as the train set for THUMOS-14 does not contain untrimmed videos.

Get the code

The training and testing in the work is implemented in Tensorflow for ease of use. We need the following software mainly to run it.

Python3
Tensorflow1.14

GPUs are required for running this code. Usually 1 GPU and 3~4GB of the memory would ensure a smooth training experience.

Then clone this repo with git.

git clone git@github.com:June01/icip19-tad.git

Note: Before running the code, please remember to change the path of the features(named byself.prefix) in config.py.

Proposals

The test action proposals are provided in ./props/test_proposals_from_TURN.txt. If you want to generate your own proposals, please go to TURN repository. Also, in this paper we report the performance according to different Average Number(AN) proposals, which are also provided in ./props/.

Train the network

In the original paper, we train the network with the following command.

python main.py --pool_level=k --fusion_type=fusion_type

k is the granularity we used to divide each proposal into units. Mostly, we usek=5 by default. fusion_type represents the way we deal with two-stream features, such as RGB, Flow, early fusion. As to the late fusion, please turn to postprocessing.

Note: All the results in the paper was reported on THUMOS14 evaluation 2014. However, there is another one THUMOS14 evaluation 2015, which is not obviously stated on the website even though it should have been done years ago. (We figured out the differences between these two evaluation codes, please file an issue if any explanation about it needed.) Based on the new evaluation metric, we make some changes during training, you can train your own model with the following command. Also, the results on it could be found in the next section.

python main.py --pool_level=k --fusion_type=fusion_type --dropout=True --opm_type='adam_wd' --l1_loss=True

Use reference models for evaluation

We provide the pretrained reference models in tensorflow ckpt format, which could be downloaded here. And the results correspond to each model could be found here.

First, you need to get the detection scores for all proposals by running:

python main.py --pool_level=k --fusion_type=fusion_type  --mode=test --cas_step=3 --test_model_path=MODEL_PATH

Postprocessing

Then, the result pickle file PKL_FILE will be saved in ./eval/test_results/, and it could be used to compute the class it belongs to and the corresponding offsets.

python gen_prop_outputs.py PKL_FILE_1 PKL_FILE_2 T

For rgb, flow and early fusion results, PKL_FILE_1 and PKL_FILE_2 should be set the same; while for late fusion, PKL_FILE_1 should be set to be the rgb pkl file and PKL_FILE_2 should be set to be the flow pkl file. After this step, you may get the FUSION_PKL_FILE. Note:T=1 should be set to the baseline method in the paper and T=3 to the improved version.

Finally, NMS is used to supppress the redundant proposals. The final predicted actions list will be saved in ./eval/after_postprocessing/.

python postproc.py FUSION_PKL_FILE 0.5

Temporal Action Detection Performance on THUMOS14

The mAP@0.5 performance of the baseline model we provide is 44.85% under the evaluation method 2014. Based on evaluation method 2015, we also report some important results on it as follows, which is also comparable with the state-of-the-art 36.9%.

Table 1: mAP@tIoU(%) with different k(cascade step = 3)

|      mAP@IoU (%)    |  0.3  |  0.4  |  0.5  |  0.6  |  0.7  |
---------------------------------------------------------------
| STPP(L=3)           | 52.08 | 45.11 | 35.32 | 23.62 | 11.61 |
| BSP(2/4/2)          | 51.17 | 43.92 | 34.59 | 22.02 | 10.94 |
| Ours(k=1)           | 46.69 | 40.48 | 31.23 | 19.95 | 9.78  |
| Ours(k=2)           | 50.20 | 43.67 | 34.31 | 23.77 | 10.83 |
| Ours(k=5)           | 51.66 | 46.56 | 36.83 | 25.39 | 12.69 |
| Ours(k=10)          | 52.49 | 46.58 | 37.37 | 24.54 | 12.43 |
---------------------------------------------------------------

Table 2: mAP@tIoU (%) with different fusion methods(k=5).

|      mAP@IoU (%)    |  0.3  |  0.4  |  0.5  |  0.6  |  0.7  |
---------------------------------------------------------------
| RGB                 | 39.07 | 33.67 | 23.55 | 13.15 | 5.70  |
| Flow                | 47.12 | 42.05 | 33.80 | 22.89 | 12.13 |
| Early Fusion        | 51.66 | 46.56 | 36.83 | 25.39 | 12.69 |
| Late Fusion         | 49.77 | 44.45 | 34.98 | 21.33 | 10.36 |
---------------------------------------------------------------

Other Info

Related project

Anet-2016: The two-stream based feature extractor used in this paper.
CBR: The foundmental network we based on.
TURN-TAP: The first stage proposals generated from.

Contact

For any question, please file an issue or contact

Tingting Xie: t.xie@qmul.ac.uk

Also, I would like to thank Yu-le Li and Christos Tzelepis for his valuable suggestions and discussions both in this project and the paper.

Name		Name	Last commit message	Last commit date
Latest commit History 11 Commits
eval		eval
props		props
util		util
.gitignore		.gitignore
README.md		README.md
background_samples.txt		background_samples.txt
cbr_model.py		cbr_model.py
config.py		config.py
dataset.py		dataset.py
main.py		main.py
thumos14_video_length_test.txt		thumos14_video_length_test.txt
thumos14_video_length_val.txt		thumos14_video_length_val.txt
tools.py		tools.py
val_training_samples.txt		val_training_samples.txt
val_training_samples_newlabel.txt		val_training_samples_newlabel.txt
vs_multilayer.py		vs_multilayer.py

June01/icip19-tad

Folders and files

Latest commit

History

Repository files navigation

Exploring Feature Representation and Training strategies in Temporal Action Localization

Contents

Usage Guide

Download data

Get the code

Proposals

Train the network

Use reference models for evaluation

Postprocessing

Temporal Action Detection Performance on THUMOS14

Table 1: mAP@tIoU(%) with different k(cascade step = 3)

Table 2: mAP@tIoU (%) with different fusion methods(k=5).

Other Info

Related project

Contact

About

Resources

Stars

Watchers

Forks

Languages