Action Recognition

We follow the 3D skeleton-based action recognition setup and implementation from Shi et al. [2].

Task

Sample (n_frames, feat_dim): Each action segment (start-end span) from BABEL is divided into contiguous 5-second chunks. See the paper for more details.

Label <int>: Index of the ground-truth action label of the segment that the current chunk belongs to.

Features

We extract the joint positions (in x, y, z coordinates) from the AMASS mocap sequences in NTU RGB+D [1] skeleton format. There are 25 joints, resulting in feat_dim=25*3=75.

Each sample is a 5-second chunk @ 30fps, resulting in n_frames=150.
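The chunking described above can be sketched as follows. This is an illustrative sketch, not the official preprocessing code; the function name and the policy of dropping the trailing partial chunk are assumptions.

```python
import numpy as np

FPS = 30
CHUNK_SECONDS = 5
N_FRAMES = FPS * CHUNK_SECONDS  # 150 frames per chunk
FEAT_DIM = 25 * 3               # 25 joints x (x, y, z) = 75

def to_chunks(seq: np.ndarray) -> np.ndarray:
    """Split a (T, 75) mocap sequence into contiguous 5-second chunks.

    Returns an array of shape (n_chunks, 150, 75); any trailing partial
    chunk is dropped (an assumption of this sketch).
    """
    n_chunks = seq.shape[0] // N_FRAMES
    return seq[: n_chunks * N_FRAMES].reshape(n_chunks, N_FRAMES, FEAT_DIM)

seq = np.zeros((400, FEAT_DIM))  # a ~13.3-second toy sequence
chunks = to_chunks(seq)
print(chunks.shape)              # (2, 150, 75)
```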

Pre-processing of the skeleton joints follows Shi et al. [2]. Download the pre-processed sample features and corresponding labels:

# BABEL Dense
cd data/
wget https://human-movement.is.tue.mpg.de/babel_feats_labels.tar.gz
tar -xzvf babel_feats_labels.tar.gz -C ./

# BABEL Dense+Extra
wget https://human-movement.is.tue.mpg.de/babel_dense_and_extra_feats_labels.tar.gz
tar -xzvf babel_dense_and_extra_feats_labels.tar.gz -C ./

Note: We only train and test with Dense annotations. For details regarding Dense and Extra annotations, please see BABEL's Data page.

Training and Inference

Set up and activate a virtual environment:

python3 -m venv babel-env
source babel-env/bin/activate
pip install --upgrade pip setuptools
pip install -r requirements.txt

Model

We use this implementation of the 2S-AGCN [2] model for 3D skeleton-based action recognition. Note that we use only the Joint stream.

Training

From the top directory babel/, run the following to train a model with the Cross-Entropy (CE) loss:

python action_recognition/train_test.py --config action_recognition/config/babel_v1.0/train_60.yaml

To train a model with the Focal loss [3] and class-balancing [4]:

python action_recognition/train_test.py --config action_recognition/config/babel_v1.0/train_60_wfl.yaml

You can use the respective configuration files inside config/babel_v1.0 to train the model with 120 classes in both settings.

Inference

Provide the path to the trained model in the weights key in the respective config file.

To perform inference, use the same command as for training, passing the test config file as the argument. E.g.:

python action_recognition/main.py --config action_recognition/config/babel_v1.0/test_60.yaml

or

python action_recognition/main_wl.py --config action_recognition/config/babel_v1.0/test_60_wfl.yaml

To save the predicted scores to disk, set save_score: True in the config file.
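A hypothetical fragment of a test config illustrating the two keys mentioned above. The key names (weights, save_score) and the checkpoint filename come from this README; the surrounding file structure is assumed.

```yaml
# Fragment of e.g. test_60.yaml (structure assumed)
weights: action_recognition/ckpts/ntu_sk_60_agcn_joint_const_lr_1e-3-17-6390.pt
save_score: True
```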

Pre-trained models

Download the checkpoints from the links below and place them in action_recognition/ckpts/.

Performing inference on the validation set should result in the following performance.

# Classes   Loss type  Ckpt                                                  Top-5  Top-1  Top-1-norm
BABEL-60    CE         ntu_sk_60_agcn_joint_const_lr_1e-3-17-6390.pt         0.74   0.42   0.24
BABEL-60    Focal      wfl_ntu_sk_60_agcn_joint_const_lr_1e-3-93-33370.pt    0.69   0.34   0.30
BABEL-120   CE         ntu_sk_120_agcn_joint_const_lr_1e-3-15-12240.pt       0.72   0.40   0.16
BABEL-120   Focal      wfl_ntu_sk_120_agcn_joint_const_lr_1e-3-157-60356.pt  0.59   0.29   0.23

Note: The models are only trained with dense labels from train.json (See project webpage for more details about the data).

Metrics

Description

  1. Top-1 measures the accuracy of the highest-scoring prediction.
  2. Top-5 evaluates whether the ground-truth category is present among the top 5 highest-scoring predictions.
    1. It accounts for labeling noise and inherent label ambiguity.
    2. It also accounts for the possible association of multiple action categories with a single input movement sequence. For instance, a person walking in a circle is mapped to the two action categories walk and circular movement. Ideal models will predict high scores for all the categories relevant to the movement sample.
  3. Top-1-norm is the mean Top-1 accuracy across categories. The magnitude of (Top-1-norm - Top-1) illustrates the class-specific bias in the model's performance. In BABEL, it reflects the impact of class imbalance on learning.
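The three metrics can be sketched as below. This is an illustrative implementation assuming scores of shape (N, C) and integer ground-truth labels of shape (N,); it is not the evaluation-server code, and the function names are made up.

```python
import numpy as np

def top_k(scores: np.ndarray, labels: np.ndarray, k: int) -> float:
    """Fraction of samples whose label is among the k highest-scoring classes."""
    topk = np.argsort(scores, axis=1)[:, -k:]
    return float(np.mean([l in row for l, row in zip(labels, topk)]))

def top1_norm(scores: np.ndarray, labels: np.ndarray) -> float:
    """Mean per-class Top-1 accuracy (macro average over categories)."""
    preds = scores.argmax(axis=1)
    per_class = [np.mean(preds[labels == c] == c) for c in np.unique(labels)]
    return float(np.mean(per_class))

# Toy example: 4 samples, 3 classes
scores = np.array([[0.1, 0.7, 0.2],
                   [0.5, 0.3, 0.2],
                   [0.2, 0.2, 0.6],
                   [0.4, 0.4, 0.2]])
labels = np.array([1, 0, 0, 2])
print(top_k(scores, labels, 1))    # 0.5 (samples 0 and 1 are correct)
print(top1_norm(scores, labels))   # 0.5 (per-class: 0.5, 1.0, 0.0)
```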

Challenge

To make a submission:

  1. Store the predictions (variable pred_scores in L591 of train_test.py) as a python pickle.
    • pred_scores is a list of tuples, each containing the following 4 elements: (sequence ID, segment ID, chunk ID, score). Here score is an np.array of size (N, C), where N is the number of samples in the test set and C is the number of classes.
    • By default, train_test.py stores this pickle file as <work_dir>/epoch1_test_score.pkl (see L604).
  2. In the command line, type the following commands:
    1. cd action_recognition/challenge/
    2. python create_submission.py --pred_path <work_dir>/epoch1_test_score.pkl --sub_path <path on disk to write submission file>
    • Note: This code assumes that the GT test samples (test_label_{60, 120}.pkl) are present in the following path: action_recognition/data/release/
  3. Submit the .npz submission file to the BABEL Action Recognition Challenge evaluation server.
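The pickle format described in step 1 can be sketched as below. The IDs and scores here are placeholders, not real BABEL data; only the tuple layout and the default filename come from this README.

```python
import pickle
import numpy as np

# Build a toy pred_scores list: tuples of
# (sequence ID, segment ID, chunk ID, score), where score is (N, C).
N, C = 4, 60                               # N test samples, C classes (BABEL-60)
scores = np.random.rand(N, C)
pred_scores = [("seq_0001", "seg_0", 0, scores)]  # placeholder IDs

# Store as a python pickle (default name used by train_test.py)
with open("epoch1_test_score.pkl", "wb") as f:
    pickle.dump(pred_scores, f)

# Round-trip check
with open("epoch1_test_score.pkl", "rb") as f:
    loaded = pickle.load(f)
print(loaded[0][3].shape)                  # (4, 60)
```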

References

[1] Shahroudy, Amir, et al. "NTU RGB+D: A large scale dataset for 3d human activity analysis." Proceedings of the IEEE conference on computer vision and pattern recognition. 2016.
[2] Shi, Lei, et al. "Two-stream adaptive graph convolutional networks for skeleton-based action recognition." Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2019.
[3] Lin, Tsung-Yi, et al. "Focal loss for dense object detection." Proceedings of the IEEE international conference on computer vision. 2017.
[4] Cui, Yin, et al. "Class-balanced loss based on effective number of samples." Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2019.