
SCS Framework

Source Code

Usage

  1. Dependencies:

    • Python 2.7
    • Pytorch >= 0.4
    • torchvision
    • Numpy
    • Pillow
    • tqdm
  2. Download Kinetics-400 from the official website or from the copy provided by facebookresearch/video-nonlocal-net, and organize the image files (extracted from the videos) in the same layout as UCF101 and HMDB (a frame-extraction sketch follows the layout below):

    Dataset
    ├── train_frames
    │   ├── action0
    │   │   ├── video0
    |   |   |   ├── frame0
    ├── test_frames
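    This repository does not ship a frame-extraction script; the sketch below is one possible way to produce the layout above with ffmpeg. The input directory Dataset/train_videos and the frame naming pattern are assumptions, not conventions defined by this repository, so adapt them to whatever your dataloader expects.

    # Sketch only: dump JPEG frames into Dataset/train_frames/<action>/<video>/.
    # The source tree Dataset/train_videos/<action>/<video>.mp4 is an assumption.
    import os
    import subprocess

    def extract_frames(video_root, frame_root):
        for action in sorted(os.listdir(video_root)):
            for video in sorted(os.listdir(os.path.join(video_root, action))):
                name = os.path.splitext(video)[0]
                out_dir = os.path.join(frame_root, action, name)
                if not os.path.isdir(out_dir):
                    os.makedirs(out_dir)
                subprocess.check_call([
                    "ffmpeg", "-i", os.path.join(video_root, action, video),
                    "-qscale:v", "2",  # high-quality JPEG output
                    os.path.join(out_dir, "frame%05d.jpg"),
                ])

    extract_frames("Dataset/train_videos", "Dataset/train_frames")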
    
  3. Extract optical flow from the original RGB frames. Note that the stride between the two RGB frames used to compute the optical flow needs to be the same as the stride of the original inputs. The optical flow has only two channels (horizontal and vertical), but we still save it as a JPEG, padding the third channel with zeros. Store the optical flow in train_ofs (an extraction sketch follows the layout below, and a data-pairing sketch appears at the end of this section):

     Dataset
    ├── train_frames
    │   ├── action0
    │   │   ├── video0
    |   |   |   ├── frame0
    ├── train_ofs
    │   ├── action0
    │   │   ├── video0
    |   |   |   ├── frame0
    ├── test_frames
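    The repository does not include an optical-flow extractor either; the sketch below uses OpenCV's Farneback dense flow as one possible implementation. The clipping bound, the rescaling to 0-255, and the stride default are assumptions to be adapted; only the two-channel-plus-zero-padding JPEG convention comes from the step above.

    # Sketch: Farneback dense optical flow for frame pairs at a given stride.
    # The stride must match the frame stride used as input to the model.
    import os
    import cv2
    import numpy as np

    def flow_to_jpg(prev_path, next_path, out_path, bound=20.0):
        prev = cv2.imread(prev_path, cv2.IMREAD_GRAYSCALE)
        nxt = cv2.imread(next_path, cv2.IMREAD_GRAYSCALE)
        flow = cv2.calcOpticalFlowFarneback(prev, nxt, None,
                                            0.5, 3, 15, 3, 5, 1.2, 0)
        flow = np.clip(flow, -bound, bound)                     # limit large displacements
        flow = ((flow + bound) / (2 * bound) * 255.0).astype(np.uint8)
        h, w = flow.shape[:2]
        out = np.zeros((h, w, 3), dtype=np.uint8)
        out[:, :, 0] = flow[:, :, 0]                            # horizontal (h) component
        out[:, :, 1] = flow[:, :, 1]                            # vertical (v) component
        cv2.imwrite(out_path, out)                              # third channel stays 0

    def extract_video_flow(frame_dir, flow_dir, stride=1):
        if not os.path.isdir(flow_dir):
            os.makedirs(flow_dir)
        frames = sorted(os.listdir(frame_dir))
        for i in range(len(frames) - stride):
            flow_to_jpg(os.path.join(frame_dir, frames[i]),
                        os.path.join(frame_dir, frames[i + stride]),
                        os.path.join(flow_dir, frames[i]))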
    
  4. This standalone model includes only the action recognition task:

    a. Run the following command to train.

    # start from scratch
    python main.py --train 
    
    # start from our pre-trained model
    python main.py --model_path [path_to_model] --model_name [model's name] --resume --train
    

    b. Run the following command to test.

    python main.py --test
    
  5. Action recognition results on standalone RNN models:

    | Architecture               | Kinetics | UCF-101 | HMDB-51 |
    | -------------------------- | -------- | ------- | ------- |
    | Shallow LSTM with Backbone | 53.9     | 86.8    | 49.7    |
    | C3D                        | 56.1     | 79.9    | 49.4    |
    | Two-Stream                 | 62.8     | 93.8    | 64.3    |
    | 3D-Fused                   | 62.3     | 91.5    | 66.5    |
    | Deep RBM without Backbone  | 60.2     | 91.9    | 61.7    |
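
For reference, the sketch below shows one way a loader could pair each RGB frame with its optical-flow JPEG given the layout from steps 2 and 3. The class name, the PIL-based loading, and the label indexing are illustrative; this is not the dataloader used by main.py.

    # Illustrative pairing of train_frames/<action>/<video>/<frame> with the
    # matching file under train_ofs; not the loader shipped with main.py.
    import os
    from PIL import Image
    from torch.utils.data import Dataset

    class FrameFlowDataset(Dataset):
        def __init__(self, root, split="train"):
            self.frame_root = os.path.join(root, split + "_frames")
            self.flow_root = os.path.join(root, split + "_ofs")
            self.samples = []
            for action in sorted(os.listdir(self.frame_root)):
                for video in sorted(os.listdir(os.path.join(self.frame_root, action))):
                    video_dir = os.path.join(self.frame_root, action, video)
                    for frame in sorted(os.listdir(video_dir)):
                        # keep only frames whose optical flow exists
                        # (the last frame(s) of each video have none)
                        if os.path.exists(os.path.join(self.flow_root, action, video, frame)):
                            self.samples.append((action, video, frame))
            self.classes = sorted(set(s[0] for s in self.samples))

        def __len__(self):
            return len(self.samples)

        def __getitem__(self, idx):
            action, video, frame = self.samples[idx]
            rgb = Image.open(os.path.join(self.frame_root, action, video, frame)).convert("RGB")
            flow = Image.open(os.path.join(self.flow_root, action, video, frame)).convert("RGB")
            return rgb, flow, self.classes.index(action)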

Demo

  1. Dependencies:

    • Python 3.5
    • Pytorch >= 1.1.0
    • torchvision
    • Numpy
    • Pillow
    • tqdm
    • PyQt5
  2. Usage

    1. Download the pre-trained model from Google Drive and put it into Demo/Code/
    2. Run the demo by:
    python main.py
    
    3. After the main window appears:

      (screenshot: main window)

      1. Click "Select Image":

        (screenshot: image selected)

      2. Click "Choose Object" and drag out a bounding box around the target:

        (screenshot: bounding box drawn around the target)

      3. Click "Annotate" and wait a moment (about 5 s on an i7 CPU):

        (screenshot: annotation result)

        (Because we automatically assign the bottom-left corner as the start point, the result may be suboptimal in some scenes.)

About

Code and demo for the paper “Complex Sequential Understanding through the Awareness of Spatial and Temporal Concepts”, submitted to Nature Machine Intelligence.
