# Pose Detection from YouTube Instructional Videos with VIBE

This notebook uses the open-source project [/mkocabas/VIBE](https://github.com/mkocabas/VIBE) to estimate human body shape and pose from YouTube instructional videos.

It is inspired by [/tugstugi/dl-colab-notebooks](https://github.com/tugstugi/dl-colab-notebooks/blob/master/notebooks/OpenPose.ipynb) and uses [youtube-dl](https://github.com/ytdl-org/youtube-dl) to download YouTube videos, from which a short clip is cut with ffmpeg.


## Install VIBE

In [None]:
# Clone the repo
!git clone https://github.com/mkocabas/VIBE.git

%cd VIBE/

# Install the other requirements
!pip install torch==1.4.0 numpy==1.17.5
!pip install git+https://github.com/giacaglia/pytube.git --upgrade
!pip install -r requirements.txt

# Download pretrained weights and SMPL data
!source scripts/prepare_data.sh

## Load Video

In [None]:
from IPython.display import YouTubeVideo
YOUTUBE_ID = 'Ae3AkGYpWsM'  # 00:17 
YouTubeVideo(YOUTUBE_ID)

In [None]:
# install python dependencies
!pip install -q youtube-dl

!rm -rf youtube.mp4

# download the YouTube video with the given ID
!youtube-dl -f 'bestvideo[ext=mp4]' --output "youtube.%(ext)s" https://www.youtube.com/watch?v=$YOUTUBE_ID

# extract a 7-second clip starting at 00:36
!ffmpeg -y -loglevel info -i youtube.mp4 -ss 00:00:36 -t 7 video.mp4
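
The trimming step can also be driven from Python, which makes the start time and duration easy to parameterize. The sketch below only assembles the same `ffmpeg` arguments used in the cell above; the helper name `build_clip_command` is hypothetical, and actually running the command assumes `ffmpeg` is installed and `youtube.mp4` has been downloaded.

```python
import subprocess


def build_clip_command(src, dst, start, duration):
    """Return the ffmpeg argument list that cuts `duration` seconds from `src`."""
    return [
        "ffmpeg", "-y", "-loglevel", "info",
        "-i", src,
        "-ss", start,          # start offset, e.g. "00:00:36"
        "-t", str(duration),   # clip length in seconds
        dst,
    ]


cmd = build_clip_command("youtube.mp4", "video.mp4", "00:00:36", 7)
print(" ".join(cmd))

# To execute it (requires ffmpeg and the downloaded file):
# subprocess.run(cmd, check=True)
```

Passing the arguments as a list avoids shell quoting issues if the file names contain spaces.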

## Apply VIBE

To apply VIBE, we run [VIBE/demo.py](https://github.com/mkocabas/VIBE/blob/master/demo.py) with its default settings: the **bbox** tracking method and the **YOLO** detector, which together give a good trade-off between speed and accuracy.

Please refer to [VIBE/doc/demo.md](https://github.com/mkocabas/VIBE/blob/master/doc/demo.md) for further details about the demo.

In [None]:
%cd /content/VIBE

!python demo.py --vid_file video.mp4 --output_folder ../

## Show results

In [None]:
# This function is adapted from https://github.com/tugstugi/dl-colab-notebooks/blob/master/notebooks/OpenPose.ipynb
def show_local_mp4_video(file_name, width=640, height=480):
  """Embed a local MP4 file in the notebook as a base64-encoded HTML5 video."""
  import base64
  from IPython.display import HTML
  # Read inside a context manager so the file handle is closed promptly
  with open(file_name, 'rb') as f:
    video_encoded = base64.b64encode(f.read())
  return HTML(data='''<video width="{0}" height="{1}" alt="test" controls>
                        <source src="data:video/mp4;base64,{2}" type="video/mp4" />
                      </video>'''.format(width, height, video_encoded.decode('ascii')))

In [None]:
show_local_mp4_video('/content/video/video_vibe_result.mp4', width=960, height=720)

## VIBE Output format

In [None]:
import joblib
output = joblib.load('/content/video/vibe_output.pkl')
print(output.keys()) 

In [None]:
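# Print the shape of every array predicted for the person with tracking id 1
for k, v in output[1].items():
  # joints2d is None unless pose tracking is enabled, so it has no shape
  if k != "joints2d":
    print(k, v.shape)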

VIBE outputs a dictionary that maps each tracked person id to that person's pose and shape predictions across frames, in SMPL format:



```
pred_cam (n_frames, 3)      # weak perspective camera parameters in cropped image space (s,tx,ty)
orig_cam (n_frames, 4)      # weak perspective camera parameters in original image space (sx,sy,tx,ty)
verts (n_frames, 6890, 3)   # SMPL mesh vertices
pose (n_frames, 72)         # SMPL pose parameters
betas (n_frames, 10)        # SMPL body shape parameters
joints3d (n_frames, 49, 3)  # SMPL 3D joints
joints2d (n_frames, 21, 3)  # 2D keypoint detections by STAF if pose tracking is enabled, otherwise None
bboxes (n_frames, 4)        # bbox detections (cx,cy,w,h)
frame_ids (n_frames,)       # frame ids in which subject with tracking id #1 appears
```
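
As a rough illustration of how these arrays can be consumed, the sketch below builds a small synthetic dictionary with the shapes listed above (placeholder values, not real VIBE predictions) and shows two common manipulations: reshaping `pose` into 24 per-joint axis-angle vectors, and converting `bboxes` from center format to corner format. The helper names `split_pose` and `bboxes_to_corners` are hypothetical and not part of VIBE.

```python
import numpy as np


def split_pose(pose):
    """Split SMPL pose (n_frames, 72) into global orientation and body pose.

    The 72 values are 24 joints x 3 axis-angle components; the first joint
    holds the global (root) orientation.
    """
    per_joint = pose.reshape(-1, 24, 3)
    return per_joint[:, 0], per_joint[:, 1:]


def bboxes_to_corners(bboxes):
    """Convert (cx, cy, w, h) boxes to (x1, y1, x2, y2) corners."""
    cx, cy, w, h = bboxes.T
    return np.stack([cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2], axis=1)


# Synthetic stand-in for output[person_id]; values are placeholders.
n_frames = 5
rng = np.random.default_rng(0)
person = {
    "pose": rng.normal(size=(n_frames, 72)),
    "bboxes": np.tile([100.0, 50.0, 40.0, 20.0], (n_frames, 1)),
    "frame_ids": np.arange(n_frames),
}

global_orient, body_pose = split_pose(person["pose"])
corners = bboxes_to_corners(person["bboxes"])

print(global_orient.shape, body_pose.shape)  # (5, 3) (5, 23, 3)
print(corners[0])  # corners of the first box: 80, 40, 120, 60
```

With a real result, `person` would be replaced by `output[person_id]` for one of the ids returned by `output.keys()`.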

