
Robust pose transfer with dynamic details using neural video rendering framework

Teaser image

Abstract

Pose transfer of human videos aims to generate a high-fidelity video of a target person imitating actions of a source person. A few studies have made great progress either through image translation with deep latent features or neural rendering with explicit 3D features. However, both of them rely on large amounts of training data to generate realistic results, and the performance degrades on more accessible Internet videos due to insufficient training frames. In this paper, we demonstrate that the dynamic details can be preserved even when trained from short monocular videos. Overall, we propose a neural video rendering framework coupled with an image-translation-based dynamic details generation network (D$^2$G-Net), which fully utilizes both the stability of explicit 3D features and the capacity of learning components. To be specific, a novel hybrid texture representation is presented to encode both the static and pose-varying appearance characteristics, which is then mapped to the image space and rendered as a detail-rich frame in the neural rendering stage. Through extensive comparisons, we demonstrate that our neural human video renderer is capable of achieving both clearer dynamic details and more robust performance.

Environment

conda create -n JittorEnv python=3.8 
conda activate JittorEnv

First, install Jittor.
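A typical pip-based install is sketched below; the exact commands are an assumption here, so please follow the official Jittor instructions (https://github.com/Jittor/jittor) for the compiler and CUDA setup on your platform.

python -m pip install jittor
python -m jittor.test.test_example   # optional sanity check from the Jittor docs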

There are also some other dependencies; they can be installed with pip, as shown after the list:

  • opencv-python
  • scikit-image
  • scipy
  • dominate
  • tensorflow (for tensorboard)
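They can be installed with pip in one step; this is a minimal sketch based on the package names above (tensorflow is only needed for the TensorBoard logs):

pip install opencv-python scikit-image scipy dominate tensorflow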

Dataset

The data used in this code can be any single-person video. It should be organized like this:

dataName
├── bg.jpg
├── dataName (this is the video frames folder)
├── densepose
├── flow
├── flow_inv
├── LaplaceProj (this folder contains the 3D pose labels)
├── mask
├── openpose_json
└── texture.jpg

The bg.jpg is the background image extracted from the video with an inpainting approach (you can even use an ordinary frame of the video). It is updated during the training process.

The densepose folder is extracted with DensePose.

The flow and flow_inv folders contain the forward and backward optical flow calculated by FlowNet2.

The LaplaceProj contains the 3D pose labels explained in this paper. It's optional in our approach.

The mask folder is extracted with Human-Segmentation-PyTorch. The predicted mask is also refined automatically during training and becomes more accurate than this "ground truth".

The openpose_json folder is extracted by OpenPose (https://github.com/CMU-Perceptual-Computing-Lab/openpose).

The texture.jpg is the initial texture calculated from the DensePose results and video frames by executing python3 unfold_texture.py $video_frame_dir $densepose_dir.
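Before training, it can help to verify that a dataset folder matches the layout above. The following is a minimal Python sketch, not part of the repository; the script name check_dataset.py and the assumption that the video-frame folder shares the dataset's name simply follow the tree shown above.

import os, sys

# expected layout of a dataset folder, as described above
REQUIRED_DIRS = ["densepose", "flow", "flow_inv", "mask", "openpose_json"]
OPTIONAL_DIRS = ["LaplaceProj"]          # the 3D pose labels are optional
REQUIRED_FILES = ["bg.jpg", "texture.jpg"]

def check_dataset(root):
    name = os.path.basename(os.path.normpath(root))
    ok = True
    # the video frames live in a sub-folder named after the dataset itself
    for d in REQUIRED_DIRS + [name]:
        if not os.path.isdir(os.path.join(root, d)):
            print("missing folder:", d)
            ok = False
    for f in REQUIRED_FILES:
        if not os.path.isfile(os.path.join(root, f)):
            print("missing file:", f)
            ok = False
    for d in OPTIONAL_DIRS:
        if not os.path.isdir(os.path.join(root, d)):
            print("optional folder not found (this is fine):", d)
    return ok

if __name__ == "__main__":
    sys.exit(0 if check_dataset(sys.argv[1]) else 1)

Run it as python3 check_dataset.py dataName before starting training.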

Training

Our approach needs to pre-train the UV generator first. This pre-training is person-agnostic.

  • Pre-train the UV generator:
bash pretrainTrans.sh

The data required for pre-training includes pose_path (2D keypoints JSON / 2D+3D pose labels), densepose_path, and mask_path.

  • Train the whole model end-to-end:
bash train_start/train.sh

To train successfully, you need to specify the aforementioned prepared dataset.

  • To view training results, please check out the intermediate results in $checkpoints_dir/$name/web/index.html. If you have TensorFlow installed, you can see TensorBoard logs in $checkpoints_dir/$name/logs (see the command below).
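The logs can be opened with the standard TensorBoard command-line tool (installed together with TensorFlow); the port below is just an example:

tensorboard --logdir $checkpoints_dir/$name/logs --port 6006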

Transfer results
