This is the implementation of our paper accepted by the Information Sciences journal.
The code consists of three parts:
- Multi-view dynamic image (MVDI) generation.
- Multi-view CNN training.
- Faster R-CNN based human motion detection.
The dataset (such as NTU RGB+D) and the pretrained model imagenet-vgg-f need to be prepared in advance. The code depends on the following (a minimal path-setup sketch follows the list):
- LIBLINEAR (the MATLAB interface) is used to generate the dynamic images.
- MatConvNet (matconvnet-1.0-beta23) is used for the CNN training stage.
- A Caffe build for Faster R-CNN is required. If you are using Windows, you may download a compiled mex file by running fetch_data/fetch_caffe_mex_windows_vs2013_cuda65.m.
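A minimal setup sketch, assuming the dependency folders sit at the repository root (the folder names below are assumptions, not the repository's guaranteed layout):

```matlab
% Put the dependencies on the MATLAB path. The folder names are
% assumptions based on the versions listed above.
addpath('liblinear/matlab');                      % LIBLINEAR mex interface
run('matconvnet-1.0-beta23/matlab/vl_setupnn.m'); % MatConvNet setup
addpath(genpath('faster_rcnn'));                  % Faster R-CNN MATLAB code
```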
Prepare your data path, then run the function file 'MVDI_Generation/dynamic_mutil_general_NTU.m'. All the sub-functions involved are contained in that folder; a sketch of the underlying dynamic-image computation is shown below.
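Dynamic images are commonly computed by rank pooling: a linear ranking machine is trained (here via LIBLINEAR's SVR) to order the frames in time, and its weight vector, reshaped to image size, is the dynamic image. The sketch below only illustrates that idea; `rank_pool`, `frames`, and the solver parameters are illustrative, not the repository's exact code.

```matlab
function u = rank_pool(frames)
% Rank pooling sketch: learn a linear function whose score increases with
% time; its weights summarize the motion. frames is a T x D matrix with
% one vectorized depth frame per row.
T = size(frames, 1);
V = bsxfun(@rdivide, cumsum(frames, 1), (1:T)'); % running temporal means
nrm = sqrt(sum(V.^2, 2));
V = bsxfun(@rdivide, V, max(nrm, eps));          % L2-normalize each row
labels = double((1:T)');                         % frame order = ranking target
% L2-regularized L2-loss support vector regression (LIBLINEAR option -s 11)
model = train(labels, sparse(double(V)), '-s 11 -c 1 -q');
u = model.w(:);                                  % weight vector = dynamic image
end
```

In the multi-view setting, such pooling would be applied to each projected view of the depth sequence, yielding one dynamic image per viewpoint.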
After the MVDI data has been generated by the previous step, feed it to the CNN. 'View_Shared_CNN_Training/train_depth_share_ntu_view_DMM/cnn_dicnn.m' should run without trouble if your data paths are set correctly. The general fine-tuning pattern it follows is sketched below.
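For orientation, fine-tuning imagenet-vgg-f with MatConvNet's SimpleNN trainer typically looks like the sketch below. This is not cnn_dicnn.m itself; the imdb file, the getBatch helper, the class count, and the hyperparameters are assumptions.

```matlab
% Minimal fine-tuning sketch with MatConvNet (requires MATLAB R2016b+
% for the local function in a script).
run('matconvnet-1.0-beta23/matlab/vl_setupnn.m');

net  = load('imagenet-vgg-f.mat');         % pretrained ImageNet model
imdb = load('data/mvdi_imdb_ntu.mat');     % hypothetical MVDI image database

% Replace the 1000-way ImageNet classifier with one sized for the action
% classes (e.g. 60 classes in NTU RGB+D).
numClasses = 60;
net.layers{end-1} = struct('type', 'conv', ...
    'weights', {{0.01*randn(1, 1, 4096, numClasses, 'single'), ...
                 zeros(1, numClasses, 'single')}}, ...
    'stride', 1, 'pad', 0);
net.layers{end} = struct('type', 'softmaxloss');

[net, info] = cnn_train(net, imdb, @getBatch, ...
    'expDir', 'exp/ntu_mvdi', 'batchSize', 128, ...
    'numEpochs', 20, 'learningRate', 1e-4, 'gpus', 1);

function [im, labels] = getBatch(imdb, batch)
% Return one mini-batch of MVDI images and their action labels.
im     = imdb.images.data(:, :, :, batch);
labels = imdb.images.labels(batch);
end
```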
For human detection, we construct the training samples from the human skeleton information, as described in the sub-function 'get_bounding_box_skelen.m'; a sketch of the idea follows.
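A bounding box can be derived from the skeleton by taking the extent of the projected joints plus a margin. The sketch below is illustrative only (the function name, arguments, and margin policy are assumptions; the real get_bounding_box_skelen.m may differ).

```matlab
function bbox = skeleton_to_bbox(joints, margin, imSize)
% Derive a person bounding box from 2-D skeleton joints.
% joints: N x 2 pixel coordinates (x, y); margin: padding in pixels;
% imSize: [height, width] of the depth frame.
x1 = max(min(joints(:, 1)) - margin, 1);
y1 = max(min(joints(:, 2)) - margin, 1);
x2 = min(max(joints(:, 1)) + margin, imSize(2));
y2 = min(max(joints(:, 2)) + margin, imSize(1));
bbox = [x1, y1, x2, y2];   % [left, top, right, bottom]
end
```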
'create_output.m' and 'create_train_test.m' are used to create the training and test samples.
'script_faster_rcnn_demo_testall_ntu.m' is the final training script.
For more detailed Faster R-CNN usage, please refer to the faster_rcnn repository.
Please cite the following paper if you use this repository in your research.
@article{xiao2018action,
  title={Action Recognition for Depth Video using Multi-view Dynamic Images},
  author={Xiao, Yang and Chen, Jun and Wang, Yancheng and Cao, Zhiguo and Zhou, Joey Tianyi and Bai, Xiang},
  journal={Information Sciences},
  year={2018},
  publisher={Elsevier}
}
For any questions, feel free to contact:
Yancheng Wang: yancheng_wang@hust.edu.cn
Yang Xiao: Yang_Xiao@hust.edu.cn