This is a project for COMS 4995 Deep Learning. We propose a new task: text-guided pose synthesis across a large range of human activities. We first analyze state-of-the-art models for pose keypoint estimation, the power of human semantic parsing, and existing text-guided image synthesis methods. We then propose a three-stage approach to the target task and discuss the potential improvements and drawbacks of our model.
Install all dependencies in requirements.txt. pytorch-ssim should also be installed for cGAN training.
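Before training, it can help to confirm that the required packages are importable. The following is a minimal sketch, not part of this repository: the helper name `missing_modules` is hypothetical, and the import names `torch` and `pytorch_ssim` are assumptions about how the PyTorch and pytorch-ssim packages are exposed.

```python
import importlib.util


def missing_modules(module_names):
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in module_names
            if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # Assumed import names; adjust them to match your installation.
    required = ["torch", "pytorch_ssim"]
    missing = missing_modules(required)
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All required modules were found.")
```

The check only looks up top-level module names and does not verify package versions.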
- data: processed MPII data and data path CSV.
- intermediate: directory for preprocessed data and pretrained models.
- model: implementation of various neural networks, including the annotation classifier and the conditional GAN.
- output: sample results for synthesized pose keypoints and semantic parsing.
- utils: some helper methods to process images and texts.
- data_clustering.py: k-means clustering algorithm for images.
- pose_dataset.py: customized dataset and dataloader using PyTorch.
- test.py: test code to run the whole pipeline.
- train*.py: training code for different neural networks.
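To illustrate the kind of k-means clustering that data_clustering.py is described as performing, here is a minimal pure-Python sketch of Lloyd's algorithm. It is not the code in this repository: the function names, the seeded random initialization, and the assumption that each image has already been flattened into an equal-length feature vector are all illustrative.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iters=20, seed=0):
    """Cluster feature vectors with Lloyd's algorithm (hypothetical helper).

    points: list of equal-length lists of numbers (e.g. flattened images).
    Returns (labels, centroids). Assumes iters >= 1 and k <= len(points).
    """
    rng = random.Random(seed)
    # Initialize centroids from k distinct input points.
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(iters):
        # Assignment step: attach each point to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing, so the algorithm has converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


if __name__ == "__main__":
    toy = [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]]
    labels, centroids = kmeans(toy, k=2)
    print(labels, centroids)
```

For real image data, a vectorized library implementation would be much faster; this version only shows the assignment and update steps.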
Download our preprocessed data and the pretrained classifier, generator, and discriminator models from Google Drive and put them inside the ./intermediate folder. Run python3 test.py for testing. It prompts you to enter a brief annotation (no longer than 15 words) and generates the corresponding pose and semantic parsing.
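The 15-word limit on annotations can be checked before any text reaches the models. The sketch below is only an illustration of that constraint: the helper `prepare_annotation`, the lowercasing, and the whitespace tokenization are assumptions, and test.py may handle input differently.

```python
MAX_WORDS = 15  # limit stated above for test.py annotations


def prepare_annotation(text, max_words=MAX_WORDS):
    """Split an annotation into lowercase words and enforce the word limit.

    Hypothetical helper: whitespace tokenization and lowercasing are
    assumptions, not the repository's actual preprocessing.
    """
    words = text.lower().split()
    if not words:
        raise ValueError("annotation is empty")
    if len(words) > max_words:
        raise ValueError(
            f"annotation has {len(words)} words; the limit is {max_words}"
        )
    return words


if __name__ == "__main__":
    print(prepare_annotation("A man is riding a bike"))
```

Rejecting over-long input, rather than silently truncating it, makes it obvious to the user why an annotation was not accepted.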
- Unsupervised Person Image Generation with Semantic Parsing Transformation
- OpenPose
- Detectron2
- LIP_JPPNet
- pytorch-ssim
- PyTorch-GAN
- vae-torch
- More references are cited in the project report.