
Towards Human Pose Prediction using the Encoder-Decoder LSTM

Ranked 3rd on the Social Motion Forecasting (SoMoF) benchmark, a challenge held by Stanford University.

Abstract:

Human pose prediction is the task of predicting human keypoint locations in future frames, given the observed locations in past frames. It has numerous applications in fields such as autonomous driving. Pose prediction is a fine-grained task, whereas human bounding-box prediction deals with coarser information. The former has been investigated less; here, we modify an architecture previously used for bounding-box prediction to tackle the harder task of pose prediction in the SoMoF challenge. The results show the effectiveness of the proposed method on the evaluation metrics.

Introduction:

This is the official code for the extended abstract "Towards Human Pose Prediction using the Encoder-Decoder LSTM", accepted and published at ICCVW 2021.

Proposed Method


The proposed method is a sequence-to-sequence LSTM model based on PV-LSTM. It takes as input the velocities and positions of the observed past joints and outputs the predicted velocities of the future joints, from which the future positions can be computed. As the figure below shows, the model encodes the position and the velocity of each person into a hidden state, which is used as the initial state of the decoder. Using this encoded state, the decoder takes the velocity of the last observed frame as input and generates the predicted velocity of the first future frame, which is then fed as input to the next LSTM cell. The model is trained with an L1 loss between the predicted and ground-truth velocities.

Our proposed method
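
For concreteness, here is a minimal sketch of this encoder-decoder idea in PyTorch. It is an illustration only, not the exact implementation in models/: the pose dimensionality, layer sizes, and the way the two encoder states are combined are assumptions.

import torch
import torch.nn as nn

class PosePredictionLSTM(nn.Module):
    """Minimal sketch of the velocity-based encoder-decoder LSTM (illustrative)."""

    def __init__(self, pose_dim=28, hidden_size=1000):
        # pose_dim = number of joints * coordinate dimension (illustrative value)
        super().__init__()
        self.vel_encoder = nn.LSTM(pose_dim, hidden_size, batch_first=True)
        self.pos_encoder = nn.LSTM(pose_dim, hidden_size, batch_first=True)
        self.decoder_cell = nn.LSTMCell(pose_dim, hidden_size)
        self.fc = nn.Linear(hidden_size, pose_dim)

    def forward(self, obs_pos, obs_vel, pred_len=14):
        # obs_pos, obs_vel: (batch, obs_len, pose_dim)
        _, (h_v, c_v) = self.vel_encoder(obs_vel)
        _, (h_p, c_p) = self.pos_encoder(obs_pos)
        # Combine the two encoded states to initialize the decoder (assumption).
        h, c = (h_v + h_p).squeeze(0), (c_v + c_p).squeeze(0)

        vel = obs_vel[:, -1]   # velocity of the last observed frame
        pos = obs_pos[:, -1]   # position of the last observed frame
        pred_vels, pred_poses = [], []
        for _ in range(pred_len):
            h, c = self.decoder_cell(vel, (h, c))
            vel = self.fc(h)    # predicted velocity of the next frame
            pos = pos + vel     # future position = previous position + velocity
            pred_vels.append(vel)
            pred_poses.append(pos)
        return torch.stack(pred_vels, dim=1), torch.stack(pred_poses, dim=1)

# Training minimizes the L1 loss between predicted and ground-truth velocities:
# loss = nn.L1Loss()(pred_vels, gt_vels)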

Results

Below, you can see a comparison of our model against several baselines, such as zero-vel, SC-MPF, and TRiPOD, on both PoseTrack and 3DPW.

PoseTrack Results

3DPW Results

Installation:


Start by cloning this repository:

git clone https://github.com/Armin-Saadat/SoMoF.git
cd SoMoF

Create a virtual environment:

virtualenv myenv
source myenv/bin/activate

And install the dependencies:

pip install -r requirements.txt

Dataset:

  • We use the PoseTrack and 3DPW datasets from the SoMoF challenge. For easy usage, these datasets are preprocessed; the clean version is available at /preprocess_csvs.
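
If you want a quick look at the preprocessed data, a pandas one-liner is enough. The file name below is a hypothetical example, not necessarily the exact name used in /preprocess_csvs:

import pandas as pd

# Hypothetical file name; check preprocess_csvs/ for the actual CSVs.
df = pd.read_csv("preprocess_csvs/posetrack_train.csv")
print(df.columns)   # inspect which fields are stored
print(df.head())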

Training / Validation / Prediction:

To train the model on PoseTrack:

python3 -m train_scripts.lstmvel_posetrack

To train the model on 3DPW:

python3 -m train_scripts.lstmvel_3dpw

The model is also validated at the end of each training epoch.

The output reports the VIM and VAM values; you can also visualize your outputs using utils/vis.py.
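
For intuition, VIM-style errors boil down to the average distance between predicted and ground-truth keypoints at each future frame. The sketch below shows that general idea only; it is not the official SoMoF evaluation code.

import numpy as np

def mean_keypoint_error(pred, gt):
    """Average per-joint L2 error for each predicted frame (illustrative only).

    pred, gt: arrays of shape (frames, joints, dims).
    """
    return np.linalg.norm(pred - gt, axis=-1).mean(axis=-1)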

To test the trained network on PoseTrack:

python -m predict.lstmvel_posetrack --load_ckpt=<path_to_saved_snapshot.pth>

To test and predict with the trained network on 3DPW:

python -m predict.lstmvel_3dpw --load_ckpt=<path_to_saved_snapshot.pth>

Other options are the same as in training.

We have also implemented several other models, which you can find in the models/ directory. To run them, repeat the procedure above with the corresponding scripts.

Arguments

This section details the arguments of the Posepred API. Posepred is an open-source toolbox for pose prediction in PyTorch and is part of the VitaLab project.

usage: python -m train_scripts.lstm_vel_posetrack [-h] [--dataset_name] [--dataset_path] [--data_usage]
                                                  [--obs_frames_num] [--pred_frames_num] [--keypoint_dim]
                                                  [--interactive] [--use_mask] [--skip_num]
                                                  [--use_video_once] [--output_name] [--annotaion]
  
mandatory arguments:
  --batch_size          Batch size (int, default=80) (training only)
  --epochs              Number of epochs (int, default=200) (training only)
  --learning_rate       Learning rate (float, default=0.01) (training only)
  --lr_decay_rate       Decay the learning rate by <lr_decay_rate> on plateau (float, default=0.25) (training only)
  --output              Number of frames to predict (int, default=14)
  --hidden_size         Size of the LSTM hidden layer (int, default=1000) (training only)
  --load_ckpt           Load the model from <load_ckpt> path (prediction only)

    
optional arguments:
  -h, --help               Show this help message and exit
  --num_workers            Number of data-loading workers (default=1)
  --pin_memory             Whether to pin memory (default=False)
  --device                 Device to run on (default='cuda')
  --n_layers               Number of LSTM layers (int, default=1) (training only)
  --name                   Name of the saved snapshot (str, default=None)
  --dropout_encoder        Dropout rate in the encoder during training (default=0) (training only)
  --dropout_pose_decoder   Dropout rate on pose data in the pose decoder during training (default=0) (training only)
  --dropout_mask_decoder   Dropout rate on mask data in the mask decoder during training (default=0) (training only)
  --save_folder            Folder to which model weights are saved (training only)
  --save_freq              Save the model every <save_freq> epochs (training only)
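
For reference, these options map onto a standard argparse setup. The sketch below mirrors the documented names and defaults and is only an approximation of the repository's actual option parser:

import argparse

def build_parser():
    # Approximate reconstruction of the documented options (not the real parser).
    p = argparse.ArgumentParser("train_scripts.lstm_vel_posetrack")
    p.add_argument("--batch_size", type=int, default=80)
    p.add_argument("--epochs", type=int, default=200)
    p.add_argument("--learning_rate", type=float, default=0.01)
    p.add_argument("--lr_decay_rate", type=float, default=0.25)
    p.add_argument("--output", type=int, default=14, help="number of frames to predict")
    p.add_argument("--hidden_size", type=int, default=1000)
    p.add_argument("--load_ckpt", type=str, default=None, help="checkpoint path (prediction only)")
    p.add_argument("--device", type=str, default="cuda")
    p.add_argument("--name", type=str, default=None, help="name of the saved snapshot")
    return p

if __name__ == "__main__":
    args = build_parser().parse_args()
    print(args)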

Example for prediction:

python3 -m predict.lstmvel_posetrack --load_ckpt="../snapshots/lstm_vel_epoch250.pth"  

Example for training:

python3 -m train_scripts.lstmvel_posetrack --learning_rate=0.01 --lr_decay_rate=0.8 --batch_size=3000 --save_freq=100 --epochs=250 --name='local_lr0.01_dec0.8'

Tested Environments:


  • Ubuntu 20.04, CUDA 10.1
