# VIBE: Video Inference for Human Body Pose and Shape Estimation

Demo of the original PyTorch-based implementation, available at https://github.com/mkocabas/VIBE

## Note
Before running this notebook, make sure that your runtime type is 'Python 3 with GPU acceleration': go to Edit > Notebook settings > Hardware accelerator and select "GPU".
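
A quick way to confirm that the runtime actually has a GPU attached is to look for the `nvidia-smi` tool. The following is a minimal sketch using only the standard library; the helper name `gpu_runtime_available` is hypothetical, and the check assumes an NVIDIA GPU whose driver puts `nvidia-smi` on the PATH.

```python
import shutil
import subprocess

def gpu_runtime_available():
    """Return True if nvidia-smi is found and lists at least one GPU."""
    exe = shutil.which("nvidia-smi")
    if exe is None:
        # No NVIDIA tooling on the PATH: assume no GPU runtime.
        return False
    try:
        # `nvidia-smi -L` prints one line per GPU it can see.
        result = subprocess.run([exe, "-L"], capture_output=True, text=True, timeout=30)
    except (OSError, subprocess.TimeoutExpired):
        return False
    return result.returncode == 0 and "GPU" in result.stdout

print("GPU runtime detected:", gpu_runtime_available())
```

If this prints `False`, changing the hardware accelerator as described above and restarting the runtime is the likely fix.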

## More Info
- Paper: https://arxiv.org/abs/1912.05656
- Repo: https://github.com/mkocabas/VIBE

In [None]:
# Clone the repo
!git clone https://github.com/mkocabas/VIBE.git

In [None]:
%cd VIBE/

In [1]:
# Install the other requirements
# !pip install torch==1.4.0 numpy==1.17.5
# !pip install git+https://github.com/giacaglia/pytube.git --upgrade
# # !pip install git+https://github.com/mkocabas/multi-person-tracker.git
# !pip install -r requirements.txt
%cd /content
!git clone -b dev https://github.com/camenduru/VIBE
%cd /content/VIBE

!source scripts/prepare_data.sh

!pip install -q git+https://github.com/mkocabas/multi-person-tracker
!pip install -q git+https://github.com/mkocabas/yolov3-pytorch
!pip install -q git+https://github.com/mattloper/chumpy
!pip install -q git+https://github.com/giacaglia/pytube
!pip install -q yacs smplx trimesh pyrender progress filterpy scikit-video

/content
Cloning into 'VIBE'...
remote: Enumerating objects: 418, done.
remote: Counting objects: 100% (225/225), done.
remote: Compressing objects: 100% (93/93), done.
remote: Total 418 (delta 164), reused 133 (delta 132), pack-reused 193 (from 1)
Receiving objects: 100% (418/418), 15.10 MiB | 15.57 MiB/s, done.
Resolving deltas: 100% (209/209), done.
/content/VIBE
Downloading...
From (original): https://drive.google.com/uc?id=1untXhYOLQtpNEy4GTY_0fL_H-k6cTf_r
From (redirected): https://drive.google.com/uc?id=1untXhYOLQtpNEy4GTY_0fL_H-k6cTf_r&confirm=t&uuid=e471a00d-66a4-44ac-b1bc-f661797ffbb3
To: /content/VIBE/data/vibe_data.zip
100% 561M/561M [00:06<00:00, 90.5MB/s]
Archive:  vibe_data.zip
   creating: vibe_data/
  inflating: vibe_data/smpl_mean_params.npz  
  inflating: vibe_data/vibe_model_w_3dpw.pth.tar  
  inflating: vibe_data/gmm_08.pkl    
  inflating: vibe_data/J_regressor_h36m.npy  
  inflating: vibe_data/vibe_model_wo_3dpw.pth.tar  
  inflating: vibe_data/SMPL_N

In [None]:
# Download pretrained weights and SMPL data
!source scripts/prepare_data.sh

In [2]:
# Mount Google Drive
from google.colab import drive
import os
import sys

drive.mount('/content/gdrive')
# Define base folder path
base_path = '/content/gdrive/MyDrive/RGB_data_stream'
sys.path.append(os.path.abspath("VIBE"))

Mounted at /content/gdrive


In [3]:
video_dir = os.path.join(base_path, 'short')
output_dir = os.path.join(base_path, 'VIBE')
os.makedirs(output_dir, exist_ok = True)

In [13]:
test_video = os.path.join(video_dir, '1002.mp4')

### Run the demo code.

Check https://github.com/mkocabas/VIBE/blob/master/doc/demo.md for more details about demo.

**Note:** Final rendering is slow compared to inference. We use pyrender with GPU acceleration, and it renders only 2-3 frames per second. Please let us know if you know of a faster alternative.
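
As a rough illustration of what that rate means for a clip, the sketch below estimates rendering time from a frame count. The helper `estimate_render_seconds` is hypothetical; the 2-3 FPS range comes from the note above, and the doubling for `--sideview` follows the comment in the demo cell rather than a measurement.

```python
def estimate_render_seconds(n_frames, fps, sideview=False):
    """Estimate rendering time in seconds for n_frames at a given rendering rate."""
    if fps <= 0:
        raise ValueError("fps must be positive")
    seconds = n_frames / fps
    # Assumption: the side view renders every frame a second time.
    return 2 * seconds if sideview else seconds

# 151 frames is the length of the clip processed in this notebook.
for fps in (2, 3):
    plain = estimate_render_seconds(151, fps)
    side = estimate_render_seconds(151, fps, sideview=True)
    print(f"{fps} FPS: about {plain:.0f} s, or {side:.0f} s with a side view")
```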

In [None]:
# Run the demo
!python demo.py --vid_file {test_video} --output_folder {output_dir} --sideview

# The --sideview flag additionally renders the result from a different viewpoint; note that this doubles rendering time.
# !python demo.py --vid_file sample_video.mp4 --output_folder output/ --sideview

# You may also run VIBE on a YouTube video by providing a link
# !python demo.py --vid_file https://www.youtube.com/watch?v=c4DAnQ6DtF8 --output_folder output/ --display

Running "ffmpeg -i /content/gdrive/MyDrive/RGB_data_stream/short/1002.mp4 -f image2 -v error /tmp/1002_mp4/%06d.png"
Images saved to "/tmp/1002_mp4"
Input video number of frames 151
Running Multi-Person-Tracker
  self.scaled_anchors = FloatTensor([(a_w / self.stride, a_h / self.stride) for a_w, a_h in self.anchors])
100% 13/13 [00:09<00:00,  1.33it/s]
Finished. Detection + Tracking FPS 15.46
  checkpoint = torch.load(pretrained)
  pretrained_dict = torch.load(pretrained)['model']
=> loaded pretrained model from 'data/vibe_data/spin_model_checkpoint.pth.tar'
  ckpt = torch.load(pretrained_file)
Performance of pretrained model on 3DPW: 56.56075477600098
Loaded pretrained weights from "data/vibe_data/vibe_model_wo_3dpw.pth.tar"
Running VIBE on each tracklet...
100% 1/1 [00:03<00:00,  3.90s/it]
VIBE FPS: 38.67
Total time spent: 16.13 seconds (including model loading time).
Total FPS (including model loading time): 9.36.
Saving output results to "/content/gdrive/MyDrive/RGB_data_stream/VIBE
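
If a video path contains spaces or shell metacharacters, interpolating it directly into a `!python` line can break. One option, sketched below, is to build the argument list in Python and run it with `subprocess`. The helper `build_demo_command` and the example path are hypothetical; the flags are the ones already used in the demo cell (`--vid_file`, `--output_folder`, `--sideview`).

```python
import sys

def build_demo_command(vid_file, output_folder, sideview=False):
    """Build the demo.py argument list; list form needs no shell quoting."""
    cmd = [sys.executable, "demo.py",
           "--vid_file", vid_file,
           "--output_folder", output_folder]
    if sideview:
        cmd.append("--sideview")
    return cmd

# Hypothetical path containing a space
cmd = build_demo_command("/content/my videos/clip.mp4", "output/", sideview=True)
print(cmd)

# To actually run it from inside the VIBE directory:
# import subprocess
# subprocess.run(cmd, check=True)
```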

In [None]:
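# Play the generated video
from IPython.display import HTML
from base64 import b64encode

def video(path):
  # Embed the MP4 as a base64 data URL so it plays inline in the notebook
  with open(path, 'rb') as f:
    mp4 = f.read()
  data_url = "data:video/mp4;base64," + b64encode(mp4).decode()
  return HTML('<video width=500 controls loop> <source src="%s" type="video/mp4"></video>' % data_url)

# demo.py writes to <output_folder>/<video name>/<video name>_vibe_result.mp4,
# the same layout as output/sample_video/sample_video_vibe_result.mp4
video_name = os.path.splitext(os.path.basename(test_video))[0]
video_result = os.path.join(output_dir, video_name, video_name + '_vibe_result.mp4')
video(video_result)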

In [None]:
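# Inspect the output file content
import joblib

# demo.py saves results under <output_folder>/<video name>/, the same
# layout as output/sample_video/vibe_output.pkl
video_name = os.path.splitext(os.path.basename(test_video))[0]
output = joblib.load(os.path.join(output_dir, video_name, 'vibe_output.pkl'))
print('Track ids:', list(output.keys()), end='\n\n')

print('VIBE output file content:', end='\n\n')
# Use the first track id present rather than assuming id 1 exists
first_id = next(iter(output))
for k, v in output[first_id].items():
  if k != 'joints2d':
    print(k, v.shape)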
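
The loop above inspects a single track. To look at every tracked person, the same idea can be wrapped in a small helper. This is a sketch on synthetic data: the helper name `summarize_tracks` is hypothetical, and the stand-in dictionary (the keys `pose`, `betas`, `joints2d` and their shapes) is an assumption made for illustration, not real VIBE output.

```python
import numpy as np

def summarize_tracks(output):
    """Map each track id to {key: array shape}, skipping values without a shape (e.g. None)."""
    summary = {}
    for track_id, track in output.items():
        summary[track_id] = {
            key: value.shape
            for key, value in track.items()
            if hasattr(value, "shape")
        }
    return summary

# Synthetic stand-in for a loaded vibe_output.pkl (shapes are illustrative assumptions)
fake_output = {
    1: {"pose": np.zeros((151, 72)), "betas": np.zeros((151, 10)), "joints2d": None},
    2: {"pose": np.zeros((40, 72)), "betas": np.zeros((40, 10)), "joints2d": None},
}

for track_id, shapes in summarize_tracks(fake_output).items():
    print(track_id, shapes)
```

Filtering on `hasattr(value, "shape")` instead of on the key name means any entry stored as `None` is skipped without special-casing it.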