# **Human Action Recognition with MMAction2**


# Download datasets

## Florence 3D actions dataset

The dataset, collected at the University of Florence in 2012, was captured with a Kinect camera. It includes 9 activities: wave, drink from a bottle, answer phone, clap, tight lace, sit down, stand up, read watch, bow. During acquisition, 10 subjects were asked to perform each of these actions two or three times, resulting in a total of 215 activity samples.
We suggest a leave-one-actor-out protocol: train your classifier using all the sequences from 9 out of 10 actors and test on the remaining one. Repeat this procedure for all actors and average the 10 classification accuracy values.
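
The leave-one-actor-out protocol described above can be sketched as follows. This is a minimal illustration, not part of the dataset's tooling: the helper names `leave_one_actor_out` and `mean_accuracy`, and the `(actor_id, item)` sample layout, are assumptions made for the example.

```python
from collections import defaultdict


def leave_one_actor_out(samples):
    """Yield (held_out_actor, train_items, test_items) for every actor.

    `samples` is a list of (actor_id, item) pairs; `item` can be anything
    (a feature vector, a video path, ...). This layout is hypothetical.
    """
    by_actor = defaultdict(list)
    for actor_id, item in samples:
        by_actor[actor_id].append(item)

    for held_out in sorted(by_actor):
        test = by_actor[held_out]
        # Train on every sequence that does not belong to the held-out actor
        train = [item
                 for actor, items in by_actor.items() if actor != held_out
                 for item in items]
        yield held_out, train, test


def mean_accuracy(samples, evaluate):
    """Average the accuracies returned by `evaluate(train, test)` per actor."""
    scores = [evaluate(train, test)
              for _, train, test in leave_one_actor_out(samples)]
    return sum(scores) / len(scores)
```

Here `evaluate` stands for whatever routine trains a classifier on `train` and returns its accuracy on `test`; with 10 actors, `mean_accuracy` averages 10 values.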

Actions 
1.	wave
2.	drink from a bottle
3.	answer phone
4.	clap
5.	tight lace
6.	sit down
7.	stand up
8.	read watch
9.	bow

Videos depicting the actions are named GestureRecording_Id\<ID_GESTURE\>actor\<ID_ACTOR\>idAction\<ID_ACTION\>category\<ID_CATEGORY\>.avi.
The file Florence_dataset_Features.txt contains all the pose features, annotated with actor and action. Each line is formatted as follows:

%idvideo idactor idcategory  f1....fn

where f1-f24 are our normalized body part coordinates
and f25 is the normalized frame value.

Specifically:  
  elbows: f1-f6; (1-3 left elbow, 4-6 right elbow, same applies for all other joints)  
  wrists: f13-f18  
  knees: f7-f12  
  ankles: f19-f24  
  normalized frame value: f25  

The file Florence_dataset_WorldCoordinates.txt contains the world coordinates for all the joints. Thanks to Maxime Devanne for parsing this data! Each line is formatted as follows:

%idvideo idactor idcategory  f1....fn

where f1-f45 are the world coordinates of all 15 joints.

Specifically:  
  Head: f1-f3  
  Neck: f4-f6  
  Spine: f7-f9  
  Left Shoulder: f10-f12  
  Left Elbow: f13-f15  
  Left Wrist: f16-f18  
  Right Shoulder: f19-f21  
  Right Elbow: f22-f24  
  Right Wrist: f25-f27  
  Left Hip: f28-f30  
  Left Knee: f31-f33  
  Left Ankle: f34-f36  
  Right Hip: f37-f39  
  Right Knee: f40-f42  
  Right Ankle: f43-f45  
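
The layout above can be turned into a small parser. The sketch below assumes that the fields on each line are separated by whitespace, which the description does not state explicitly; the function name `parse_world_coordinates_line` is made up for this example.

```python
# Joint order as listed above: three consecutive values per joint
JOINT_NAMES = [
    "Head", "Neck", "Spine",
    "Left Shoulder", "Left Elbow", "Left Wrist",
    "Right Shoulder", "Right Elbow", "Right Wrist",
    "Left Hip", "Left Knee", "Left Ankle",
    "Right Hip", "Right Knee", "Right Ankle",
]


def parse_world_coordinates_line(line):
    """Split one line into (video_id, actor_id, category_id, joints).

    `joints` maps each joint name to a 3-tuple of floats (f1-f3 for the
    head, f4-f6 for the neck, and so on). Whitespace-separated fields
    are an assumption.
    """
    fields = line.split()
    expected = 3 + 3 * len(JOINT_NAMES)
    if len(fields) != expected:
        raise ValueError(f"expected {expected} fields, got {len(fields)}")

    video_id, actor_id, category_id = (int(float(v)) for v in fields[:3])
    coords = [float(v) for v in fields[3:]]
    joints = {
        name: tuple(coords[3 * i:3 * i + 3])
        for i, name in enumerate(JOINT_NAMES)
    }
    return video_id, actor_id, category_id, joints
```

Grouping the rows by `video_id` then yields one sequence of 15-joint skeletons per recording.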


In [None]:
%%shell
curl https://www.micc.unifi.it/vim/wp-content/uploads/datasets/florence3d_actions.zip -o florence3d_actions.zip
unzip -o -q florence3d_actions.zip





## Download CCAM actions data

In [None]:
file_id = "1wUFcGp8sXcoiGSRmxL3a4sRw13I56Hk1"
file_name = "ccam_actions.zip"
!wget --load-cookies ~/cookies.txt "https://docs.google.com/uc?export=download&confirm=$(wget --quiet --save-cookies ~/cookies.txt --keep-session-cookies --no-check-certificate 'https://docs.google.com/uc?export=download&id=$file_id' -O- | sed -rn 's/.*confirm=([0-9A-Za-z_]+).*/\1\n/p')&id=$file_id" -O $file_name && rm -rf ~/cookies.txt
!unzip -o -q ccam_actions.zip

!mkdir -p ccam_actions_org
file_list = [
             ("WaveHand_cam_rosberry1_color_image_raw.mp4","1FIenejZENb9FrnxGUFjG4kI-ANmLokFu"),
             ("TightLace_cam_rosberry1_color_image_raw.mp4", "1WtqrJ2eK88jqn8ryfmSBroHHE51Qvg4X"),
             ("Stationary_cam_rosberry1_color_image_raw.mp4", "1BE1BvQOmM6-PAn8H3m4_Hv1pN8cEB-nT"),
             ("SitDownStandUp_cam_rosberry1_color_image_raw.mp4", "1nsR3VRgK_8HPLCb99149KvJih1ctZR96"),
             ("ReadWatch_cam_rosberry1_color_image_raw.mp4", "1p4kUOaJtDxrruWk7cBULQ9zEh0ndPJb8"),
             ("G230_Master_cam_rosberry1_color_image_raw.mp4", "1Kj7aUfFe5ufg9tqMUcC_j0oHSx7EHLkn"),
             ("DrinkFromBottle_cam_rosberry1_color_image_raw.mp4", "1iQlLiHjT2ewN4piBiAvGLcTwVTe1Dhtz"),
             ("Clapping_cam_rosberry1_color_image_raw.mp4", "1jkJmIAV9SfcqRPY45BEP5cb6rOac5j3f"),
             ("Bow_cam_rosberry1_color_image_raw.mp4", "1LKVDZ_FORByXGRGSp7GgSMsd_Q4ZrEWZ"),
             ("AnswerPhone_cam_rosberry1_color_image_raw.mp4","1FQhu9bRJqe7PGNniktk6TUaXGn0PjFRd"),
             ]

for file_name, file_id in file_list:
  print(file_id, file_name)
  file_name = "ccam_actions_org/"+file_name
  !wget --load-cookies ~/cookies.txt "https://docs.google.com/uc?export=download&confirm=$(wget --quiet --save-cookies ~/cookies.txt --keep-session-cookies --no-check-certificate 'https://docs.google.com/uc?export=download&id=$file_id' -O- | sed -rn 's/.*confirm=([0-9A-Za-z_]+).*/\1\n/p')&id=$file_id" -O $file_name && rm -rf ~/cookies.txt



In [None]:
%%shell
curl https://gist.githubusercontent.com/willprice/f19da185c9c5f32847134b87c1960769/raw/9dc94028ecced572f302225c49fcdee2f3d748d8/kinetics_400_labels.csv  -o /content/kinetics_400_labels.csv
curl https://gist.githubusercontent.com/willprice/f19da185c9c5f32847134b87c1960769/raw/9dc94028ecced572f302225c49fcdee2f3d748d8/kinetics_600_labels.csv  -o /content/kinetics_600_labels.csv
curl https://gist.githubusercontent.com/willprice/f19da185c9c5f32847134b87c1960769/raw/9dc94028ecced572f302225c49fcdee2f3d748d8/kinetics_700_labels.csv  -o /content/kinetics_700_labels.csv
for file in /content/*.csv; do sed '1d' $file > $file.txt; done
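
The `sed '1d'` command above only drops the first (header) row of each CSV. If the label name is needed on its own, one label per line, a conversion along the following lines may help. This is a sketch under the assumption that every row stores the label name in its last column; the helper name `csv_to_label_map` is hypothetical.

```python
import csv


def csv_to_label_map(csv_path, txt_path):
    """Write the last column of every non-header row to `txt_path`.

    Returns the number of labels written. The column layout is an
    assumption about the downloaded CSV files.
    """
    count = 0
    with open(csv_path, newline="") as src, open(txt_path, "w") as dst:
        reader = csv.reader(src)
        next(reader, None)  # skip the header row, like `sed '1d'`
        for row in reader:
            if not row:
                continue  # ignore blank lines
            dst.write(row[-1].strip() + "\n")
            count += 1
    return count
```

For example, `csv_to_label_map('/content/kinetics_600_labels.csv', '/content/kinetics_600_labels.txt')` would write one label per line.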





# Functions

In [None]:
import cv2
from tqdm.notebook import trange, tqdm

def frame_iter(capture, description = ""):
  def _iterator():
    while capture.grab():
      yield capture.retrieve()

  return tqdm(
    _iterator(),
    desc=description,
    total=int(capture.get(cv2.CAP_PROP_FRAME_COUNT)),
    leave=False,
  )


def process_mmdet_results(mmdet_results, cat_id=0):
  """Process mmdet results, and return a list of bboxes.

  :param mmdet_results: raw detector output; either the per-class bbox
    results, or a tuple whose first element is the per-class bbox results
  :param cat_id: category id (default: 0 for human)
  :return: a list of detected bounding boxes
  """
  if isinstance(mmdet_results, tuple):
    det_results = mmdet_results[0]
  else:
    det_results = mmdet_results
  return det_results[cat_id]


# visualization
def show_local_mp4_video(file_name, width=640, height=480):
  import base64
  from IPython.display import HTML
  # Read the file inside a context manager so the handle is closed promptly
  with open(file_name, 'rb') as video_file:
    video_encoded = base64.b64encode(video_file.read())
  return HTML(data='''<video width="{0}" height="{1}" alt="test" controls>
                        <source src="data:video/mp4;base64,{2}" type="video/mp4" />
                      </video>'''.format(width, height, video_encoded.decode('ascii')))
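
The boxes returned by `process_mmdet_results` usually need a confidence filter before further use. The sketch below assumes each box is a sequence `[x1, y1, x2, y2, score]`, which is the usual MMDetection 2.x layout but is not confirmed by this notebook; `filter_bboxes` is a hypothetical helper.

```python
def filter_bboxes(bboxes, score_thr=0.5):
    """Keep only the boxes whose score (5th value) reaches `score_thr`.

    Each box is assumed to be [x1, y1, x2, y2, score]. Works on plain
    lists as well as on the rows of a NumPy array.
    """
    return [list(box) for box in bboxes if box[4] >= score_thr]


# Usage with two made-up person detections
detections = [
    [10, 20, 110, 220, 0.92],
    [300, 40, 360, 180, 0.31],
]
confident = filter_bboxes(detections, score_thr=0.5)
```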

# Installation

## Install MMAction2

In [None]:
%%shell
pip install -q tqdm
# This installation takes a long time due to the compilation of mmcv
pip install -q mmdet mmcv mmaction2 decord

git clone https://github.com/open-mmlab/mmaction2.git





In [None]:
%%shell
apt install -y ffmpeg imagemagick

# Note: these exports only apply to this shell cell, not to the Python kernel
export FFMPEG_BINARY='/usr/bin/ffmpeg'
export IMAGEMAGICK_BINARY='/usr/bin/convert'

sed -i '/<policy domain="path" rights="none" pattern="@\*"/d' /etc/ImageMagick-6/policy.xml

#cat /etc/ImageMagick-6/policy.xml




In [None]:
%cd mmaction2/
import torch
from mmaction.apis import init_recognizer, inference_recognizer

config_file = 'configs/recognition/tsn/tsn_r50_video_inference_1x1x3_100e_kinetics400_rgb.py'
device = 'cuda:0' # or 'cpu'
device = torch.device(device)

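# NOTE: no checkpoint is passed here, so no pretrained recognizer weights are
# loaded and the scores printed below are not meaningful. The Action Recognition
# section passes a checkpoint URL to init_recognizer.
model = init_recognizer(config_file, device=device)
# run inference on the demo video
inference_recognizer(model, 'demo/demo.mp4', 'demo/label_map_k400.txt')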

/content/mmaction2


[('tapping guitar', 0.71775305),
 ('riding or walking with horse', 0.1292942),
 ('high jump', 0.10205663),
 ('waiting in line', 0.022162875),
 ('archery', 0.012408177)]

# Action Recognition


In [None]:
import os
import os.path as osp
import glob
from moviepy.editor import (CompositeVideoClip, ImageSequenceClip,
                                    TextClip, VideoFileClip)

label_map = "demo/label_map_k400.txt"

# config = "configs/recognition/i3d/i3d_r50_video_32x2x1_100e_kinetics400_rgb.py"
# # config = "configs/recognition/i3d/i3d_r50_video_heavy_8x8x1_100e_kinetics400_rgb.py"
# checkpoint = "https://download.openmmlab.com/mmaction/recognition/i3d/i3d_r50_video_32x2x1_100e_kinetics400_rgb/i3d_r50_video_32x2x1_100e_kinetics400_rgb_20200826-e31c6f52.pth"


# config = "configs/recognition/r2plus1d/r2plus1d_r34_video_8x8x1_180e_kinetics400_rgb.py"
# config = "configs/recognition/r2plus1d/r2plus1d_r34_video_inference_8x8x1_180e_kinetics400_rgb.py"
# checkpoint = "https://download.openmmlab.com/mmaction/recognition/r2plus1d/r2plus1d_r34_video_8x8x1_180e_kinetics400_rgb/r2plus1d_r34_video_8x8x1_180e_kinetics400_rgb_20200826-ab35a529.pth"


# config = "configs/recognition/slowonly/slowonly_r50_video_4x16x1_256e_kinetics400_rgb.py"
# config = "configs/recognition/slowonly/slowonly_r50_video_inference_4x16x1_256e_kinetics400_rgb.py"
# checkpoint = "https://download.openmmlab.com/mmaction/recognition/slowonly/slowonly_r50_video_320p_4x16x1_256e_kinetics400_rgb/slowonly_r50_video_320p_4x16x1_256e_kinetics400_rgb_20201014-c9cdc656.pth"


# config = "configs/recognition/slowfast/slowfast_r50_video_4x16x1_256e_kinetics400_rgb.py"
config = "configs/recognition/slowfast/slowfast_r50_video_inference_4x16x1_256e_kinetics400_rgb.py"
checkpoint = "https://download.openmmlab.com/mmaction/recognition/slowfast/slowfast_r50_video_4x16x1_256e_kinetics400_rgb/slowfast_r50_video_4x16x1_256e_kinetics400_rgb_20200826-f85b90c5.pth"


# config = "configs/recognition/tsm/tsm_r50_video_1x1x8_50e_kinetics400_rgb.py"
# config = "configs/recognition/tsm/tsm_r50_video_inference_1x1x8_100e_kinetics400_rgb.py"
# checkpoint = "https://download.openmmlab.com/mmaction/recognition/tsm/tsm_r50_video_1x1x8_100e_kinetics400_rgb/tsm_r50_video_1x1x8_100e_kinetics400_rgb_20200702-a77f4328.pth"



# config = "configs/recognition/tsn/tsn_r50_video_inference_1x1x3_100e_kinetics400_rgb.py"
# checkpoint = "https://download.openmmlab.com/mmaction/recognition/tsn/tsn_r50_1x1x3_100e_kinetics400_rgb/tsn_r50_1x1x3_100e_kinetics400_rgb_20200614-e508be42.pth"

# config = "configs/recognition/tsn/tsn_r50_video_320p_1x1x3_100e_kinetics400_rgb.py"
# checkpoint = "https://download.openmmlab.com/mmaction/recognition/tsn/tsn_r50_video_320p_1x1x3_100e_kinetics400_rgb/tsn_r50_video_320p_1x1x3_100e_kinetics400_rgb_20201014-5ae1ee79.pth"

# config = "configs/recognition/tsn/tsn_r50_video_imgaug_1x1x8_100e_kinetics400_rgb.py"
# config = "configs/recognition/tsn/tsn_r50_video_dense_1x1x8_100e_kinetics400_rgb.py"
# checkpoint = "https://download.openmmlab.com/mmaction/recognition/tsn/tsn_r50_video_dense_1x1x8_100e_kinetics400_rgb/tsn_r50_video_dense_1x1x8_100e_kinetics400_rgb_20200703-0f19175f.pth"

# config = "configs/recognition/tsn/tsn_r50_video_1x1x8_100e_kinetics400_rgb.py" 
# checkpoint = "https://download.openmmlab.com/mmaction/recognition/tsn/tsn_r50_video_1x1x8_100e_kinetics400_rgb/tsn_r50_video_1x1x8_100e_kinetics400_rgb_20200702-568cde33.pth"

# label_map = "/content/kinetics_600_labels.csv.txt"
# config = "configs/recognition/tsn/tsn_r50_video_1x1x8_100e_kinetics600_rgb.py"
# checkpoint = "https://download.openmmlab.com/mmaction/recognition/tsn/tsn_r50_video_1x1x8_100e_kinetics600_rgb/tsn_r50_video_1x1x8_100e_kinetics600_rgb_20201015-4db3c461.pth"

# label_map = "/content/kinetics_700_labels.csv.txt"
# config = "configs/recognition/tsn/tsn_r50_video_1x1x8_100e_kinetics700_rgb.py"
# checkpoint = "https://download.openmmlab.com/mmaction/recognition/tsn/tsn_r50_video_1x1x8_100e_kinetics700_rgb/tsn_r50_video_1x1x8_100e_kinetics700_rgb_20201015-e381a6c7.pth"






device = torch.device('cuda:0')
# build the recognizer from a config file and checkpoint file/url
model = init_recognizer(
    config,
    checkpoint,
    device=device,
    use_frames=False)

# e.g. use ('backbone', ) to return backbone feature
output_layer_names = None

video_list = glob.glob('/content/ccam_actions/*.mp4')
for i, video_path in enumerate(video_list):
  print(i+1, video_path)
  results = inference_recognizer(model, video_path, label_map, use_frames=False)

  print(' The top-5 labels with corresponding scores are:')
  for result in results:
    print(f'    {result[0]:30}: {result[1]:0.2}')
  print()

# write results

# out_filename = '/content/test.mp4'

# target_resolution = (None, None)
# resize_algorithm='bicubic'
# fps=30
# font_size=20
# font_color='white'
# video_clips = VideoFileClip(
#     video_path,
#     target_resolution=target_resolution,
#     resize_algorithm=resize_algorithm)

# duration_video_clip = video_clips.duration
# text_clips = TextClip(label, fontsize=font_size, color=font_color)
# text_clips = (
#     text_clips.set_position(
#         ('right', 'bottom'),
#         relative=True).set_duration(duration_video_clip))

# video_clips = CompositeVideoClip([video_clips, text_clips])
# video_clips.write_videofile(out_filename, remove_temp=True)
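
To compare predictions against the recorded actions, the top-1 label of each video can be collected next to the action name embedded in its file name. This is a sketch: the helpers `action_from_filename` and `top1_summary` are hypothetical, and it assumes that file names start with the action followed by `_cam_` and that each result list is sorted by descending score.

```python
import os.path as osp


def action_from_filename(video_path):
    """Return the action prefix, e.g. 'AnswerPhone' for 'AnswerPhone_cam_...'."""
    return osp.basename(video_path).split("_cam_")[0]


def top1_summary(predictions):
    """Build (recorded_action, top1_label, top1_score) rows.

    `predictions` maps a video path to its list of (label, score) pairs,
    assumed to be sorted by descending score.
    """
    rows = []
    for video_path, results in predictions.items():
        label, score = results[0]
        rows.append((action_from_filename(video_path), label, float(score)))
    return rows


# Usage with scores copied from the first result printed by this cell
example = {
    "/content/ccam_actions/AnswerPhone_cam_rosberry3_color_image_raw_act3.mp4":
        [("archery", 0.19), ("brush painting", 0.062)],
}
summary = top1_summary(example)
```

Because the Kinetics-400 labels do not match the CCAM action names one to one, such a table is meant for manual inspection rather than for computing an accuracy.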

Downloading: "https://download.openmmlab.com/mmaction/recognition/slowfast/slowfast_r50_video_4x16x1_256e_kinetics400_rgb/slowfast_r50_video_4x16x1_256e_kinetics400_rgb_20200826-f85b90c5.pth" to /root/.cache/torch/hub/checkpoints/slowfast_r50_video_4x16x1_256e_kinetics400_rgb_20200826-f85b90c5.pth




1 /content/ccam_actions/AnswerPhone_cam_rosberry3_color_image_raw_act3.mp4
 The top-5 labels with corresponding scores are:
    archery                       : 0.19
    brush painting                : 0.062
    building cabinet              : 0.062
    trimming or shaving beard     : 0.05
    playing harmonica             : 0.031

2 /content/ccam_actions/AnswerPhone_cam_rosberry2_color_image_raw_comingin.mp4
 The top-5 labels with corresponding scores are:
    building cabinet              : 0.26
    pull ups                      : 0.031
    snatch weight lifting         : 0.024
    squat                         : 0.024
    changing wheel                : 0.022

3 /content/ccam_actions/AnswerPhone_cam_rosberry2_color_image_raw_act2.mp4
 The top-5 labels with corresponding scores are:
    building cabinet              : 0.29
    spray painting                : 0.091
    checking tires                : 0.04
    pumping gas                   : 0.033
    exercising arm                : 0.