
VSGN

This repo holds the code for the paper "Video Self-Stitching Graph Network for Temporal Action Localization", accepted to ICCV 2021.

Updates

  • Aug. 15th: Code and pre-trained model on THUMOS14 are released.

VSGN Introduction

[Figure: VSGN Overview]

Temporal action localization (TAL) in videos is a challenging task, especially due to the large variation in action temporal scales. Short actions usually account for a large proportion of the datasets, yet tend to have the lowest performance. In this paper, we confront the challenge of short actions and propose a multi-level cross-scale solution dubbed video self-stitching graph network (VSGN). VSGN has two key components: video self-stitching (VSS) and the cross-scale graph pyramid network (xGPN). In VSS, we focus on a short period of a video and magnify it along the temporal dimension to obtain a larger scale. We stitch the original clip and its magnified counterpart into one input sequence to take advantage of the complementary properties of both scales. The xGPN component further exploits cross-scale correlations through a pyramid of cross-scale graph networks, each containing a hybrid module that aggregates features across scales as well as within the same scale. VSGN not only enhances the feature representations, but also generates more positive anchors for short actions and more short training samples. Experiments demonstrate that VSGN significantly improves the localization performance of short actions while also achieving state-of-the-art overall performance on THUMOS-14 and ActivityNet-v1.3.
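To make the VSS idea concrete, below is a minimal sketch of stitching a clip with its temporally magnified counterpart. It is only an illustration of the concept, not the repo's actual implementation; the feature layout, scale factor, and interpolation mode are assumptions.

    import torch
    import torch.nn.functional as F

    def self_stitch(feats, scale=2):
        """Illustrative video self-stitching (VSS) sketch.

        feats: (C, T) tensor of per-snippet features for a short clip.
        Returns a (C, T + scale*T) sequence: the original clip followed by
        a temporally magnified copy of it.
        """
        # Magnify the clip along the temporal dimension by interpolation.
        magnified = F.interpolate(
            feats.unsqueeze(0), scale_factor=scale, mode="linear",
            align_corners=False,
        ).squeeze(0)                      # (C, scale*T)
        # Stitch the original clip and its magnified counterpart into one input.
        return torch.cat([feats, magnified], dim=1)

    # Example: a 100-snippet clip with 2048-D features becomes a 300-snippet input.
    clip = torch.randn(2048, 100)
    stitched = self_stitch(clip, scale=2)
    print(stitched.shape)                 # torch.Size([2048, 300])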

Project Architecture

An overview of the project architecture in this repo is shown below.

    VSGN                            
    ├── Models/*                    # Network modules and losses
    ├── Utils/*                     # Data loading and hyper-parameters
    ├── Evaluation/*                # Post-processing and performance evaluation
    ├── DETAD/*                     # DETAD evaluation to report performance for different action durations
    ├── Cut_long_videos.py          # Cutting long videos      
    ├── Train.py                    # Training starts from here      
    ├── Infer.py                    # Inference starts from here    
    ├── Eval.py                     # Evaluation starts from here             
    └── ...

Pre-trained Models and Performance

THUMOS14

In the following table, we report the results in terms of mAP at different tIoU thresholds (0.3-0.7), as well as the average mAP and the mAP for short actions. The results differ slightly from those reported in the paper due to randomness in training.

Method   Model                        mAP@0.3   mAP@0.4   mAP@0.5   mAP@0.6   mAP@0.7   Avg. mAP   Short
VSGN     Pre-trained VSGN THUMOS14    67.92     61.09     52.99     41.78     29.24     56.99      56.5

Environment Installation

Create a conda environment and install the required packages from scratch by following the steps below:

    conda create -n pytorch160 python=3.7 
    conda activate pytorch160   
    conda install pytorch=1.6.0 torchvision cudatoolkit=10.1.243 -c pytorch   
    conda install -c anaconda pandas    
    conda install -c anaconda h5py  
    conda install -c anaconda scipy 
    conda install -c conda-forge tensorboardx   
    conda install -c anaconda joblib    
    conda install -c conda-forge matplotlib 
    conda install -c conda-forge urllib3

Alternatively, you can create the conda environment from our env.yml file using the following command:

    conda env create -f env.yml

Code and Data Preparation

Download the TSN features of the THUMOS14 dataset from here, and save them in [DATA_PATH].
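To sanity-check the download, you can inspect the feature file, e.g. with h5py (already installed above). This is a minimal sketch; the file name and dataset keys below are hypothetical and depend on how the features are packaged.

    import h5py

    # Hypothetical path and keys -- adjust to the actual downloaded feature file.
    with h5py.File("DATA_PATH/thumos14_tsn_features.h5", "r") as f:
        video_ids = list(f.keys())
        print(f"{len(video_ids)} videos, e.g. {video_ids[:3]}")
        sample = f[video_ids[0]][()]      # e.g. (num_snippets, feature_dim)
        print("feature shape of first video:", sample.shape)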

Clone this repo with git

    git clone git@github.com:coolbay/VSGN.git

Run the Code

Prepare the input by cutting long videos

    python Cut_long_videos.py [--use_VSS] 

Training

     python Train.py [--use_VSS] [--use_xGPN] --is_train true --dataset thumos --feature_path DATA_PATH  --checkpoint_path CHECKPOINT_PATH  

Inference

     python Infer.py [--use_VSS] [--use_xGPN] --is_train false --dataset thumos --feature_path DATA_PATH --checkpoint_path CHECKPOINT_PATH  --output_path OUTPUT_PATH   

Evaluation

     python Eval.py --dataset thumos --output_path OUTPUT_PATH

Run training / inference / evaluation in one command

    bash run_vsgn.sh traininfereval     # Run train, infer, and eval 
    bash run_vsgn.sh train              # Only run train
    bash run_vsgn.sh infer              # Only run infer
    bash run_vsgn.sh eval               # Only run eval
    bash run_vsgn.sh traininfer         # Run train and infer

Cite this paper

Please cite the following paper if this codebase is useful for your work.

  @inproceedings{zhao2021video,
    title={Video Self-Stitching Graph Network for Temporal Action Localization},
    author={Zhao, Chen and Thabet, Ali K and Ghanem, Bernard},
    booktitle={Proceedings of the IEEE/CVF International Conference on Computer Vision},
    pages={13658--13667},
    year={2021}
  }

Acknowledgements

VSGN is built with reference to the implementations of G-TAD, BSN, and ATSS, and the description in PBRNet. If you use our model, please consider citing these works as well.
