By Bo Pang, Kaiwen Zha, Cewu Lu.


ADHA is the first dataset for human action adverb recognition. This hybrid model is the baseline for the dataset. It fuses a two-stream model, a pose-based LSTM (PBLSTM) model, and an expression model. The expression information serves as a feature that is combined into the CNN features of the PBLSTM and two-stream models. The framework of the model is shown below:

[Figure: framework of the hybrid model (image alt text: "RMPE Framework")]


  1. Get the code.
git clone
cd Hybrid-model-for-human-action-adverb-recognition
  2. Get the dataset: you can download the ADHA dataset from here

  3. PBLSTM:

  • Get the pose information using OpenPose. The output is a set of skeleton videos.
  • Use ./pose/ to get the input of the PBLSTM model.
  • Use ./PBLSTM/ and ./PBLSTM/ to train the model and output its result.
  4. Two-Stream model:
  • Use ./Two_Stream/get_input_data/get_optical_flow to get the optical flow of the raw video.
  • Use ./Two_Stream/get_input_data/ to get the input of the two-stream model. The output has two folders: "of" for the motion stream and "rgb" for the spatial stream.
  • Use ./Two-Stream/motion/ and ./Two-Stream/spatial/ to train the model and use ./Two-Stream/Fusion/ to output the result.
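
As a rough illustration of the "of"/"rgb" layout described above, the sketch below pairs each entry in the "rgb" folder with the entry of the same name in the "of" folder, so that both streams see the same videos. The function name `pair_stream_inputs` and the assumption that matching entries share a name are hypothetical and not taken from the repository; its scripts may organise the data differently.

```python
from pathlib import Path


def pair_stream_inputs(root):
    """Pair entries under <root>/rgb with same-named entries under <root>/of.

    Assumption (not from the repository): both folders hold one entry per
    video, and matching entries share the same name.
    """
    root = Path(root)
    rgb_dir, of_dir = root / "rgb", root / "of"
    if not rgb_dir.is_dir() or not of_dir.is_dir():
        raise FileNotFoundError("expected 'rgb' and 'of' folders under %s" % root)

    rgb_names = {p.name for p in rgb_dir.iterdir()}
    of_names = {p.name for p in of_dir.iterdir()}

    # Keep only samples present in both streams, in a stable order.
    common = sorted(rgb_names & of_names)
    return [(rgb_dir / name, of_dir / name) for name in common]
```

Entries that exist in only one of the two folders are skipped rather than reported, which keeps the sketch short; a real pipeline would probably want to log them.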
  5. Expression:
  • Use this hybrid expression model to get the expression result of the video. This model is the winner of EmotiW2016. The result is saved as a txt file.
  • To combine the expression feature into the above two models, set the parameter "withexpression" to "True" in the corresponding files of the two models, and set the parameter "expression_path" to the expression result folder.
  • Retrain the models.
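
To make the role of "withexpression" more concrete, here is a minimal sketch of how an expression result could be read from a txt file and appended to a CNN feature vector. It rests on two assumptions that the README does not confirm: that the txt file holds whitespace-separated numbers, and that the features are combined by plain concatenation. The helper names `load_expression_feature` and `combine_features` are hypothetical.

```python
from pathlib import Path


def load_expression_feature(txt_path):
    """Read one expression result file as a flat list of floats.

    Assumption: the txt file holds whitespace-separated numbers. The real
    output format of the expression model may differ.
    """
    text = Path(txt_path).read_text()
    return [float(tok) for tok in text.split()]


def combine_features(cnn_feature, expression_feature, withexpression=True):
    """Return the CNN feature, with the expression feature appended if enabled."""
    if not withexpression:
        return list(cnn_feature)
    # Concatenation is an assumed way of combining the two features;
    # the repository may combine them differently.
    return list(cnn_feature) + list(expression_feature)
```

Because the combined vector is longer than the original CNN feature, the layers that consume it change shape, which is consistent with the need to retrain the models after enabling the option.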
  6. Fusion to get the final result:
  • Run ./Hybrid_Fusion/ to get the final result of the hybrid model.
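
The fusion step can be pictured with a small late-fusion sketch: each model produces a vector of per-class scores, and the vectors are merged by a weighted average. This is an assumed fusion rule chosen for illustration, not necessarily what the script in ./Hybrid_Fusion/ does; the names `fuse_scores` and `predict` and the example weights are hypothetical.

```python
def fuse_scores(model_scores, weights=None):
    """Weighted average of per-class scores from several models.

    model_scores: list of equal-length score lists, one per model.
    weights: optional list of non-negative weights, one per model
             (defaults to equal weights).
    """
    if not model_scores:
        raise ValueError("need at least one score vector")
    n_classes = len(model_scores[0])
    if any(len(s) != n_classes for s in model_scores):
        raise ValueError("all score vectors must have the same length")
    if weights is None:
        weights = [1.0] * len(model_scores)
    if len(weights) != len(model_scores):
        raise ValueError("need exactly one weight per model")
    total = float(sum(weights))
    if total <= 0:
        raise ValueError("weights must sum to a positive value")

    # Average class by class across the models.
    return [
        sum(w * s[c] for w, s in zip(weights, model_scores)) / total
        for c in range(n_classes)
    ]


def predict(fused_scores):
    """Index of the class with the highest fused score."""
    return max(range(len(fused_scores)), key=fused_scores.__getitem__)
```

For example, `fuse_scores([pblstm_scores, two_stream_scores], weights=[1, 3])` would trust the second model three times as much as the first; suitable weights would have to be chosen on validation data.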


Please cite the paper in your publications if it helps your research:

  title={Human Action Adverb Recognition: ADHA Dataset and A Hybrid Model},
  author={Bo, Pang and Zha, Kaiwen and Lu, Cewu},
  booktitle={ArXiv preprint},


Thanks to OpenPose and Hybrid expression model.


