# Predictive-Corrective Networks for Action Detection

This is the source code for training and evaluating "Predictive-Corrective Networks".

Please file an issue if you run into any problems, or contact me.

## Download models and data

To download the models and data, run:

```bash
bash download.sh
```

This will create a directory with the following structure:

```
data/
    <dataset>/
        models/
            vgg16-init.t7: Initial VGG-16 model, pre-trained on ImageNet.
            vgg16-trained.t7: Trained VGG-16 single-frame model.
            pc_c33-1_fc7-8.t7: Trained predictive-corrective model.
        labels/
            trainval.h5: Training and validation labels.
            test.h5: Test labels.
```

Currently only models for THUMOS/MultiTHUMOS are included, but we will release Charades models as soon as possible.

## Dumping frames

Before running on any videos, you will need to dump frames (resized to 256x256) into a root directory that contains one subdirectory for each video. Each video subdirectory should contain frames named `frame%04d.png` (e.g. `frame0012.png`), extracted at 10 frames per second. If you would like to train or evaluate models at different frame rates, please file an issue or contact me and I can point you in the right direction.
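
Concretely, the expected layout looks like the following (the video names here are placeholders):

```
frames_root/
    video_name_1/
        frame0001.png
        frame0002.png
        ...
    video_name_2/
        frame0001.png
        ...
```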

You may find my `dump_frames` and `resize_images` scripts useful for this.
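
Alternatively, a minimal sketch along these lines should produce the expected layout. It assumes `ffmpeg` is available and that your videos are `.mp4` files in a `videos/` directory; the paths and extension are hypothetical, not part of this repository:

```bash
# Extract frames at 10 fps, resized to 256x256, into one
# subdirectory per video (frames are numbered from frame0001.png).
for video in videos/*.mp4; do
    name=$(basename "$video" .mp4)
    mkdir -p "frames_root/$name"
    ffmpeg -i "$video" -vf "fps=10,scale=256:256" \
        "frames_root/$name/frame%04d.png"
done
```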

## Running a pre-trained model

Store the frames from your videos in a single directory, `frames_root`, with frames at `frames_root/video_name/frame%04d.png`, as described above.

To evaluate the predictive-corrective model, run:

```bash
th scripts/evaluate_model.lua \
    --model data/multithumos/models/pc_c33-1_fc7-8.t7 \
    --frames /path/to/frames_root \
    --output_log /path/to/output.log \
    --sequence_length 8 \
    --step_size 1 \
    --batch_size 16 \
    --output_hdf5 /path/to/output_predictions.h5
```
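
The predictions are written to the file passed as `--output_hdf5`. As a quick sanity check, you can list the datasets inside it with `h5ls` from the hdf5-tools package (my suggestion; any HDF5 viewer works):

```bash
# List the datasets in the predictions file (requires hdf5-tools).
h5ls -r /path/to/output_predictions.h5
```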

## Training a model

### Single-frame

To train a single-frame model, start from `config/config-vgg.yaml`. Documentation for each config parameter is available in `main.lua`, but the only ones you really need to change are the paths to the training and test frames:

```yaml
train_source_options:
    frames_root: '/path/to/multithumos/trainval/frames'
    labels_hdf5: 'data/multithumos/labels/trainval.h5'

val_source_options:
    frames_root: '/path/to/multithumos/test/frames'
    labels_hdf5: 'data/multithumos/labels/test.h5'
```

Once you have updated these, run:

```bash
th main.lua config/config-vgg.yaml /path/to/output/directory
```

### Predictive-corrective

First, generate a predictive-corrective model initialized from a trained single-frame model:

```bash
th scripts/make_predictive_corrective.lua \
    --model data/multithumos/models/vgg16-trained.t7 \
    --output data/multithumos/models/pc_c33-1_fc7-8-init.t7
```

Next, update `config/config-predictive-corrective.yaml` to point to your dumped frames, as described above. Then run:

```bash
th main.lua config/config-predictive-corrective.yaml /path/to/output/directory
```

This usually takes 2-3 days to run on 4 GPUs.

## Required packages

Note: This is an incomplete list! TODO(achald): Document all required packages.

* argparse
* classic
* cudnn
* cutorch
* luaposix
* lyaml
* nnlr
* rnn
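
As a rough starting point, these can typically be installed with luarocks. This is a sketch, not a tested setup: it assumes a working Torch installation, `cutorch` and `cudnn` additionally require CUDA and cuDNN on your machine, and the list above is incomplete:

```bash
# Install the Lua dependencies listed above (assumes Torch + luarocks).
for rock in argparse classic cudnn cutorch luaposix lyaml nnlr rnn; do
    luarocks install "$rock"
done
```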

## Caution: Other configs/scripts

Please note that there are a number of other scripts and configs in this repository that are not well documented. I am sharing them in case they are useful to look at (for example, to see how I use the model), but beware that they may be broken, and I may not be able to help you fix them.

## Extra: Generate labels HDF5 files

For convenience, we provide the labels for the datasets we use as HDF5 files. However, it is possible to generate these yourself: here is the script I used to generate the MultiTHUMOS labels HDF5 file, and here is a similar script for Charades. These are not very well documented, but feel free to contact me if you run into any issues.