We follow the 3D skeleton-based action recognition setup and implementation from Shi et al. [2].
- `Sample (n_frames, feat_dim)`: Each action segment (start-end span) from BABEL is divided into contiguous 5-second chunks. See the paper for more details.
- `Label <int>`: Index of the ground-truth action label of the segment that the current chunk belongs to.
We extract the joint positions (in x, y, z coordinates) from the AMASS mocap sequences in the NTU RGB+D [1] skeleton format. There are 25 joints, resulting in `feat_dim = 25*3 = 75`. Each sample is a 5-second chunk at 30 fps, resulting in `n_frames = 150`.
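As a rough sketch, the chunking described above can be illustrated as follows. The function and constant names here are our own, not from the official preprocessing code, and the handling of leftover frames at the end of a segment is an assumption:

```python
import numpy as np

FPS = 30
CHUNK_SECONDS = 5
N_FRAMES = FPS * CHUNK_SECONDS      # 150 frames per chunk
N_JOINTS, DIMS = 25, 3
FEAT_DIM = N_JOINTS * DIMS          # 75 features per frame

def chunk_segment(joints):
    """Split a (T, 75) joint-position array into contiguous (150, 75) chunks.

    Frames beyond the last full chunk are dropped in this sketch; the
    official preprocessing may handle the remainder differently.
    """
    n_chunks = joints.shape[0] // N_FRAMES
    return [joints[i * N_FRAMES:(i + 1) * N_FRAMES] for i in range(n_chunks)]

# Example: a 12-second segment yields two full 5-second chunks.
segment = np.zeros((12 * FPS, FEAT_DIM))
chunks = chunk_segment(segment)
```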
Pre-processing of the skeleton joints follows Shi et al. [2]. Download the pre-processed sample features and corresponding labels:
```shell
# BABEL Dense
cd data/
wget https://human-movement.is.tue.mpg.de/babel_feats_labels.tar.gz
tar -xzvf babel_feats_labels.tar.gz -C ./

# BABEL Dense+Extra
wget https://human-movement.is.tue.mpg.de/babel_dense_and_extra_feats_labels.tar.gz
tar -xzvf babel_dense_and_extra_feats_labels.tar.gz -C ./
```
Note: We only train and test with Dense annotations. For details regarding Dense and Extra annotations, please see BABEL's Data page.
Set up and activate a virtual environment:
```shell
python3 -m venv babel-env
source $PWD/babel-env/bin/activate
$PWD/babel-env/bin/pip install --upgrade pip setuptools
$PWD/babel-env/bin/pip install -r requirements.txt
```
We use this implementation of the 2S-AGCN [2] model for 3D skeleton-based action recognition. Note that we use only the joint stream.
From the top directory `babel/`, enter the following to train a model with the Cross-Entropy (CE) loss:

```shell
python action_recognition/train_test.py --config action_recognition/config/babel_v1.0/train_60.yaml
```
To train a model with the Focal loss [3] with class-balancing [4]:

```shell
python action_recognition/train_test.py --config action_recognition/config/babel_v1.0/train_60_wfl.yaml
```
You can use the respective configuration files inside `config/babel_v1.0` to train the model with 120 classes in both ways.
Provide the path to the trained model in the `weights` key in the respective config file.
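For example, the `weights` key in a test config might point at one of the released checkpoints (an illustrative fragment; the checkpoint filename is taken from the results table below, the exact config layout may differ):

```yaml
# Fragment of a test config, e.g. test_60.yaml (illustrative)
weights: action_recognition/ckpts/ntu_sk_60_agcn_joint_const_lr_1e-3-17-6390.pt
```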
To perform inference, use the same command as for training, passing the test config file as argument. E.g.:

```shell
python action_recognition/main.py --config action_recognition/config/babel_v1.0/test_60.yaml
```

or

```shell
python action_recognition/main_wl.py --config action_recognition/config/babel_v1.0/test_60_wfl.yaml
```

To save the predicted scores to disk, set `save_score: True` in the config file.
Download the checkpoints from the links below and place them in `action_recognition/ckpts/`.
Performing inference on the validation set should result in the following performance.
| # Classes | Loss type | Ckpt | Top-5 | Top-1 | Top-1-norm |
|---|---|---|---|---|---|
| BABEL-60 | CE | `ntu_sk_60_agcn_joint_const_lr_1e-3-17-6390.pt` | 0.74 | 0.42 | 0.24 |
| BABEL-60 | Focal | `wfl_ntu_sk_60_agcn_joint_const_lr_1e-3-93-33370.pt` | 0.69 | 0.34 | 0.30 |
| BABEL-120 | CE | `ntu_sk_120_agcn_joint_const_lr_1e-3-15-12240.pt` | 0.72 | 0.40 | 0.16 |
| BABEL-120 | Focal | `wfl_ntu_sk_120_agcn_joint_const_lr_1e-3-157-60356.pt` | 0.59 | 0.29 | 0.23 |
Note: The models are trained only with dense labels from `train.json` (see the project webpage for more details about the data).
Description

- Top-1 measures the accuracy of the highest-scoring prediction.
- Top-5 evaluates whether the ground-truth category is present among the 5 highest-scoring predictions.
  - It accounts for labeling noise and inherent label ambiguity.
  - It also accounts for the possible association of multiple action categories with a single input movement sequence. For instance, a person walking in a circle is mapped to the two action categories `walk` and `circular movement`. Ideal models will predict high scores for all the categories relevant to the movement sample.
- Top-1-norm is the mean Top-1 across categories. The magnitude of `Top-1-norm` − `Top-1` illustrates the class-specific bias in the model's performance. In BABEL, it reflects the impact of class imbalance on learning.
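The three metrics above can be computed from a score matrix with a minimal sketch like the following (these helper functions are our own, not part of the repository):

```python
import numpy as np

def topk_acc(scores, labels, k):
    """Fraction of samples whose true label is among the k highest scores.

    scores: (N, C) array of class scores; labels: (N,) array of class indices.
    """
    topk = np.argsort(scores, axis=1)[:, -k:]          # (N, k) top-k class indices
    return float(np.mean([labels[i] in topk[i] for i in range(len(labels))]))

def top1_norm(scores, labels):
    """Mean per-class Top-1 accuracy (macro average over categories)."""
    preds = scores.argmax(axis=1)
    classes = np.unique(labels)
    per_class = [np.mean(preds[labels == c] == c) for c in classes]
    return float(np.mean(per_class))

# Toy example: 4 samples, 3 classes.
scores = np.array([[0.9, 0.05, 0.05],
                   [0.1, 0.8, 0.1],
                   [0.2, 0.7, 0.1],
                   [0.3, 0.3, 0.4]])
labels = np.array([0, 1, 0, 2])
```

With a class-imbalanced test set, `top1_norm` weighs every category equally, so rare classes pull the macro average down even when overall Top-1 looks healthy.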
To make a submission:

- Store the predictions (variable `pred_scores` in L591 of `train_test.py`) as a Python pickle. `pred_scores` is a list of tuples, each containing the following 4 elements: (sequence ID, segment ID, chunk ID, score). Here score is an `np.array` of size `(N, C)`, where `N` is the # samples in the test set and `C` is the # classes. By default, `train_test.py` stores this pickle file as `<work_dir>/epoch1_test_score.pkl` (see L604).
- In the command line, type the following commands:

```shell
cd action_recognition/challenge/
python create_submission.py --pred_path <work_dir>/epoch1_test_score.pkl --sub_path <path on disk to write submission file>
```

- Note: This code assumes that the GT test samples (`test_label_{60, 120}.pkl`) are present in the following path: `action_recognition/data/release/`
- Submit the `.npz` submission file to the BABEL Action Recognition Challenge evaluation server.
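As an illustration of the expected pickle layout, the sketch below builds and stores a toy `pred_scores` list. The sequence/segment/chunk IDs are invented for illustration, and we assume each tuple carries one score vector over the `C` classes for its chunk (stacking these over all `N` test chunks yields the `(N, C)` score matrix described above):

```python
import pickle
import numpy as np

n_classes = 60  # C, e.g. for BABEL-60

# Toy predictions: one (sequence ID, segment ID, chunk ID, score) tuple per
# test chunk. IDs below are made up for illustration.
pred_scores = [
    ("seq_0001", 0, 0, np.random.rand(n_classes)),
    ("seq_0001", 0, 1, np.random.rand(n_classes)),
    ("seq_0002", 1, 0, np.random.rand(n_classes)),
]

with open("epoch1_test_score.pkl", "wb") as f:
    pickle.dump(pred_scores, f)

# Reload to verify the layout round-trips.
with open("epoch1_test_score.pkl", "rb") as f:
    loaded = pickle.load(f)
```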
[1] Shahroudy, Amir, et al. "NTU RGB+D: A large scale dataset for 3D human activity analysis." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2016.
[2] Shi, Lei, et al. "Two-stream adaptive graph convolutional networks for skeleton-based action recognition." Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2019.
[3] Lin, Tsung-Yi, et al. "Focal loss for dense object detection." Proceedings of the IEEE international conference on computer vision. 2017.
[4] Cui, Yin, et al. "Class-balanced loss based on effective number of samples." Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2019.