# **Human Action Recognition with MMPose and Spatio-Temporal Graph Convolutional Network**
HigherHRNet48 + STGCN

# Download the Florence 3D action dataset.

## Florence 3D actions dataset

The dataset, collected at the University of Florence in 2012, was captured using a Kinect camera. It includes 9 activities: wave, drink from a bottle, answer phone, clap, tight lace, sit down, stand up, read watch, bow. During acquisition, 10 subjects were asked to perform each of these actions two or three times, which resulted in a total of 215 activity samples.
We suggest a leave-one-actor-out protocol: train your classifier using all the sequences from 9 out of 10 actors and test on the remaining one. Repeat this procedure for all actors and average the 10 classification accuracy values.
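
The leave-one-actor-out protocol described above can be sketched as follows. This is a minimal illustration, not the dataset authors' code: the `(actor_id, sample)` pair layout, the function names, and the toy data are all assumptions made for the example.

```python
from collections import defaultdict


def leave_one_actor_out(samples):
    """Yield (held_out_actor, train, test) splits.

    `samples` is a list of (actor_id, sample) pairs; this layout is an
    assumption made for the sketch, not something the dataset prescribes.
    """
    by_actor = defaultdict(list)
    for actor_id, sample in samples:
        by_actor[actor_id].append(sample)

    for held_out in sorted(by_actor):
        test = by_actor[held_out]
        train = [s for a in sorted(by_actor) if a != held_out for s in by_actor[a]]
        yield held_out, train, test


def mean_accuracy(fold_accuracies):
    """Average the per-fold accuracies, as the protocol prescribes."""
    return sum(fold_accuracies) / len(fold_accuracies)


# Toy data: 3 actors with 2 samples each (placeholder strings, not real data).
toy = [(1, "a1_s1"), (1, "a1_s2"), (2, "a2_s1"),
       (2, "a2_s2"), (3, "a3_s1"), (3, "a3_s2")]
folds = list(leave_one_actor_out(toy))
```

With the real dataset there would be 10 folds, one per actor, and `mean_accuracy` would be applied to the 10 per-fold accuracy values.
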

Actions 
1.	wave
2.	drink from a bottle
3.	answer phone
4.	clap
5.	tight lace
6.	sit down
7.	stand up
8.	read watch
9.	bow

Videos depicting the actions are named GestureRecording_Id\<ID_GESTURE\>actor\<ID_ACTOR\>idAction\<ID_ACTION\>category\<ID_CATEGORY\>.avi
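
The IDs can be recovered from a video file name with a regular expression. The sketch below assumes every ID is a run of decimal digits, which the naming pattern suggests but does not state; `parse_video_name` and the sample file name are hypothetical.

```python
import re

# Assumption: every ID in the file name is a run of decimal digits.
VIDEO_NAME = re.compile(
    r"GestureRecording_Id(?P<gesture>\d+)actor(?P<actor>\d+)"
    r"idAction(?P<action>\d+)category(?P<category>\d+)\.avi$"
)


def parse_video_name(filename):
    """Return the four IDs encoded in a video file name as ints."""
    match = VIDEO_NAME.search(filename)
    if match is None:
        raise ValueError(f"unexpected file name: {filename}")
    return {key: int(value) for key, value in match.groupdict().items()}


# Hypothetical file name that follows the documented pattern.
info = parse_video_name("GestureRecording_Id12actor3idAction5category5.avi")
```

The actor ID extracted this way is what a leave-one-actor-out split would group on.
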
The file Florence_dataset_Features.txt contains all the pose features, annotated with actor and action. Each line is formatted as follows:

%idvideo idactor idcategory  f1....fn

where f1-f24 are our normalized body part coordinates
and f25 is the normalized frame value.

Specifically:  
  elbows: f1-f6; (1-3 left elbow, 4-6 right elbow, same applies for all other joints)  
  wrists: f13-f18  
  knees: f7-f12  
  ankles: f19-f24  
  normalized frame value: f25  
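
A line in this layout could be split into named fields as sketched below. The sketch assumes whitespace-separated values, integer IDs, and an (x, y, z) order within each joint triple; none of these is confirmed by the description above, and `parse_feature_line` is a hypothetical helper.

```python
def parse_feature_line(line):
    """Split one line of Florence_dataset_Features.txt into its fields."""
    values = line.split()
    if len(values) != 3 + 25:
        raise ValueError(f"expected 28 fields, got {len(values)}")

    video_id, actor_id, category_id = (int(float(v)) for v in values[:3])
    f = [float(v) for v in values[3:]]

    # Slices follow the layout above: f1-f6 elbows, f7-f12 knees,
    # f13-f18 wrists, f19-f24 ankles, each as left triple then right triple.
    joints = {
        "left_elbow": f[0:3], "right_elbow": f[3:6],
        "left_knee": f[6:9], "right_knee": f[9:12],
        "left_wrist": f[12:15], "right_wrist": f[15:18],
        "left_ankle": f[18:21], "right_ankle": f[21:24],
    }
    return {"video": video_id, "actor": actor_id, "category": category_id,
            "joints": joints, "frame": f[24]}


# Synthetic line (made-up values) with 3 IDs followed by f1..f25.
sample_line = "1 2 3 " + " ".join(str(i / 100) for i in range(1, 26))
record = parse_feature_line(sample_line)
```
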

The file Florence_dataset_WorldCoordinates.txt contains the world coordinates for all the joints. Thanks to Maxime Devanne for parsing this data! Each line is formatted as follows:

%idvideo idactor idcategory  f1....fn
where f1-f45 are world coordinates of all the 15 joints.

Specifically:  
  Head: f1-f3  
  Neck: f4-f6  
  Spine: f7-f9  
  Left Shoulder: f10-f12  
  Left Elbow: f13-f15  
  Left Wrist: f16-f18  
  Right Shoulder: f19-f21  
  Right Elbow: f22-f24  
  Right Wrist: f25-f27  
  Left Hip: f28-f30  
  Left Knee: f31-f33  
  Left Ankle: f34-f36  
  Right Hip: f37-f39  
  Right Knee: f40-f42  
  Right Ankle: f43-f45  
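
The joint layout above maps naturally onto a `(15, 3)` array per line. The following sketch assumes whitespace-separated values and one frame per line, with the lines of a video sharing the same three IDs; `JOINT_NAMES`, `parse_world_line`, and `group_by_video` are hypothetical names, and the sample lines are synthetic.

```python
import numpy as np

# Joint order as listed above (f1-f3 head ... f43-f45 right ankle).
JOINT_NAMES = [
    "head", "neck", "spine",
    "left_shoulder", "left_elbow", "left_wrist",
    "right_shoulder", "right_elbow", "right_wrist",
    "left_hip", "left_knee", "left_ankle",
    "right_hip", "right_knee", "right_ankle",
]


def parse_world_line(line):
    """Return ((idvideo, idactor, idcategory), joints of shape (15, 3))."""
    values = line.split()
    if len(values) != 3 + 45:
        raise ValueError(f"expected 48 fields, got {len(values)}")
    ids = tuple(int(float(v)) for v in values[:3])
    joints = np.array([float(v) for v in values[3:]]).reshape(15, 3)
    return ids, joints


def group_by_video(lines):
    """Stack the frames of each video into an array of shape (T, 15, 3)."""
    frames = {}
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        ids, joints = parse_world_line(line)
        frames.setdefault(ids, []).append(joints)
    return {ids: np.stack(seq) for ids, seq in frames.items()}


# Synthetic lines (made-up values): two frames of one video, one of another.
lines = [
    "1 1 1 " + " ".join(["0.0"] * 45),
    "1 1 1 " + " ".join(["1.0"] * 45),
    "2 1 3 " + " ".join(["2.0"] * 45),
]
sequences = group_by_video(lines)
```

Each value in `sequences` is a per-video skeleton sequence, and `JOINT_NAMES.index("left_wrist")` gives the row of a joint within a frame.
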


In [None]:
%%shell
curl https://www.micc.unifi.it/vim/wp-content/uploads/datasets/florence3d_actions.zip -o florence3d_actions.zip
unzip -o -q florence3d_actions.zip





# Functions

In [None]:
import cv2
from tqdm.notebook import trange, tqdm

def frame_iter(capture, description = ""):
  def _iterator():
    while capture.grab():
      yield capture.retrieve()

  return tqdm(
    _iterator(),
    desc=description,
    total=int(capture.get(cv2.CAP_PROP_FRAME_COUNT)),
    leave=False,
  )


def process_mmdet_results(mmdet_results, cat_id=0):
  """Process mmdet results, and return a list of bboxes.

  :param mmdet_results: detector output, either the per-category bbox list
    or a tuple whose first element is that list
  :param cat_id: category id (default: 0 for human)
  :return: a list of detected bounding boxes
  """
  if isinstance(mmdet_results, tuple):
    det_results = mmdet_results[0]
  else:
    det_results = mmdet_results
  return det_results[cat_id]


# visualization
def show_local_mp4_video(file_name, width=640, height=480):
  import base64
  from IPython.display import HTML
  # Read the file inside a context manager so the handle is closed promptly.
  with open(file_name, 'rb') as video_file:
    video_encoded = base64.b64encode(video_file.read())
  return HTML(data='''<video width="{0}" height="{1}" alt="test" controls>
                        <source src="data:video/mp4;base64,{2}" type="video/mp4" />
                      </video>'''.format(width, height, video_encoded.decode('ascii')))

# Installation

## Install MMPose

In [None]:
%%shell
pip install -q tqdm
# This installation takes a long time due to the compilation of mmcv-full
pip install -q mmdet #mmcv-full

git clone https://github.com/open-mmlab/mmpose.git
cd mmpose
pip install -q -r requirements.txt
python setup.py -q develop




# Pose estimation using HigherHRNet 48

In [None]:
%cd /content/mmpose
import os
import re
import cv2
import glob
import pickle
import os.path as osp
from tqdm.notebook import trange, tqdm

from mmpose.apis import inference_bottom_up_pose_model, init_pose_model, vis_pose_result

def frame_iter(capture, description = ""):
  def _iterator():
    while capture.grab():
      yield capture.retrieve()

  return tqdm(
    _iterator(),
    desc=description,
    total=int(capture.get(cv2.CAP_PROP_FRAME_COUNT)),
    leave=False,
  )

def inference_pose_estimation_model(video_path, 
                            return_heatmap = False, 
                            save_out_video = True, 
                            out_video_root = '/content/video_results'):
  # build the pose model from a config file and a checkpoint file
  pose_model = init_pose_model(
      '/content/mmpose/configs/bottom_up/higherhrnet/coco/higher_hrnet48_coco_512x512.py', # model configuration
      'https://download.openmmlab.com/mmpose/bottom_up/higher_hrnet48_coco_512x512-60fedcbc_20200712.pth', # pretrained model
      device='cuda:0')

  dataset = pose_model.cfg.data['test']['type']
  cap = cv2.VideoCapture(video_path)

  if save_out_video:
    os.makedirs(out_video_root, exist_ok=True)
    fps = cap.get(cv2.CAP_PROP_FPS)
    size = (int(cap.get(cv2.CAP_PROP_FRAME_WIDTH)),
            int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT)))
    fourcc = cv2.VideoWriter_fourcc(*'mp4v')
    videoWriter = cv2.VideoWriter(
        os.path.join(out_video_root,f'vis_{os.path.basename(video_path)}'), fourcc, fps, size)

  # e.g. use ('backbone', ) to return backbone feature
  output_layer_names = ()#('backbone', )
  results = []
  video = frame_iter(cap)
  video.set_postfix({'filename': osp.basename(video_path)})
  for flag, img in video:
    if not flag:
      break

    # inference the model
    pose_results, returned_outputs = inference_bottom_up_pose_model(
      pose_model,
      img,
      return_heatmap=return_heatmap,
      outputs=output_layer_names)
    results.append(pose_results)

    if save_out_video:
      # show the results
      vis_img = vis_pose_result(
        pose_model,
        img,
        pose_results,
        dataset=dataset,
        kpt_score_thr=0.3,
        show=False)
      
      videoWriter.write(vis_img)

  video.close()
  cap.release()
  if save_out_video:
      videoWriter.release()
  cv2.destroyAllWindows()
  
  return results

/content/mmpose


In [None]:
save_out_video = True

video_list = glob.glob('/content/Florence_3d_actions/*.avi')
video_list.sort(reverse=True)
pose_estimation_results = {}
with tqdm(total=len(video_list)) as pbar:
  for video_path in video_list:
    filename = osp.basename(video_path)
    results = inference_pose_estimation_model(video_path, save_out_video = save_out_video)
    pose_estimation_results[filename] = results
    pbar.update(1)
    if save_out_video:
      video_result = video_path.replace('Florence_3d_actions/','video_results/vis_')
      # Swap only the file extension; replacing every 'avi' substring could corrupt the path.
      video_result1 = osp.splitext(video_result)[0] + '.mp4'
      !ffmpeg -y -loglevel panic -i $video_result -vcodec libx264 $video_result1
      !rm $video_result


Downloading: "https://download.openmmlab.com/mmpose/bottom_up/higher_hrnet48_coco_512x512-60fedcbc_20200712.pth" to /root/.cache/torch/hub/checkpoints/higher_hrnet48_coco_512x512-60fedcbc_20200712.pth



In [None]:
print(pose_estimation_results.keys())
# Use a context manager so the pickle file is flushed and closed.
with open("pose_estimation_results_higherhrnet48.pkl", "wb") as f:
  pickle.dump(pose_estimation_results, f)
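
Before the saved results can feed a skeleton-based classifier, each video's per-frame detections need to become a fixed-shape array. The sketch below is one possible way to do that, under a loudly stated assumption: each frame entry is a list of dicts carrying a `'keypoints'` array of shape `(17, 3)` (x, y, score) and a `'score'` value, which is how MMPose bottom-up inference results were typically structured at the time. Verify this against the installed version. `poses_to_array` is a hypothetical helper, and the demo data is synthetic.

```python
import numpy as np


def poses_to_array(frames, num_joints=17):
    """Keep the highest-scoring person per frame.

    Returns an array of shape (T, num_joints, 3). Frames without any
    detection are left as zeros.
    """
    sequence = np.zeros((len(frames), num_joints, 3), dtype=np.float32)
    for t, people in enumerate(frames):
        if not people:
            continue  # no detection in this frame
        # Assumed keys: 'score' (scalar or array) and 'keypoints' (num_joints x 3).
        best = max(people, key=lambda person: float(np.mean(person["score"])))
        sequence[t] = np.asarray(best["keypoints"], dtype=np.float32)[:, :3]
    return sequence


# Synthetic detections: frame 0 has two people, frame 1 has none.
demo_frames = [
    [{"keypoints": np.ones((17, 3)), "score": 0.2},
     {"keypoints": 2 * np.ones((17, 3)), "score": 0.9}],
    [],
]
demo_sequence = poses_to_array(demo_frames)
```

Applied to the notebook's data, this would be called as `poses_to_array(pose_estimation_results[filename])` for each video. Keeping only the top-scoring person is a simplification that suits single-subject recordings.
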

In [None]:
import random
video_result_list = glob.glob('/content/video_results/*.mp4')
video_result = random.choice(video_result_list)
show_local_mp4_video(video_result)

In [None]:
# Change current working directory to /content
%cd /content
!zip -q -r video_results_higherhrnet48.zip video_results

# Action Recognition


In [None]:
%%shell
git clone https://github.com/taznux/st-gcn-pytorch
cd st-gcn-pytorch
mkdir dataset models
ln -sf /content/Florence_3d_actions dataset/

## Original Florence 3D skeleton data

In [None]:
%cd st-gcn-pytorch
!python preprocess.py # skeleton conversion
!python main.py # train and test