# Detection of similar tennis-shot motions using BlazePose

## Preparation

### Mounting Google Drive
This notebook reads and writes all of its data on Google Drive.  
First, mount the drive with the following code so that its folders can be accessed.  
Note: Run this cell every time the Colab runtime is started.

In [1]:
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


In [2]:
import os

# The location of the folder visible when accessing Google Drive
BASE_PATH = 'drive/MyDrive'

# The location of the folder to save the data
project_name = 'project'
os.makedirs(f'{BASE_PATH}/{project_name}', exist_ok=True)

### Specifying the project
The videos you want to compare are managed together in a single project folder.  
Change "project_name" in the cell above to give your project a name.  
A folder with that name is created on Google Drive.  

In [4]:
# Creating other folders for use
# exist_ok=True skips folders that already exist, so the remaining folders
# are still created even when some of them are already present
for sub_folder in ['video', 'gif', 'splitted_gif', 'vector']:
    os.makedirs(f'{BASE_PATH}/{project_name}/{sub_folder}', exist_ok=True)

Project Structure  
  
We assume the following management structure for the project:  
  
＜project_name＞ (Note: name can be chosen freely)  
  |-- video ・・・location for original videos  
  |-- gif ・・・videos converted to GIF (described later)  
  |-- splitted_gif ・・・GIFs split into shorter segments (described later)  
  |-- vector ・・・vector files containing the skeletal information for each split GIF, saved in NumPy (.npy) format  
  
Furthermore, you are free to create multiple projects as needed.

### Uploading Videos  
To add videos to the project, please upload them under the "video" folder located in the "project_name" folder on Google Drive.

In [26]:
# Determining the frame rate
fps=15

### Converting to GIF (not required if you don't upload new videos)  
Since video files have large file sizes, we will convert them to GIF files.  
__For each individual file you have uploaded__, please execute the following command, replacing the file name accordingly, to convert them to GIF data:

In [None]:
# !ffmpeg -i drive/MyDrive/＜project_name＞/video/＜video file name＞ -r ＜frame rate＞ drive/MyDrive/＜project_name＞/gif/＜video file name＞.gif
!ffmpeg -i drive/MyDrive/＜project_name＞/video/サーブ1.mov -r 15 drive/MyDrive/＜project_name＞/gif/サーブ1.gif

ffmpeg version 3.4.8-0ubuntu0.2 Copyright (c) 2000-2020 the FFmpeg developers
  built with gcc 7 (Ubuntu 7.5.0-3ubuntu1~18.04)
  configuration: --prefix=/usr --extra-version=0ubuntu0.2 --toolchain=hardened --libdir=/usr/lib/x86_64-linux-gnu --incdir=/usr/include/x86_64-linux-gnu --enable-gpl --disable-stripping --enable-avresample --enable-avisynth --enable-gnutls --enable-ladspa --enable-libass --enable-libbluray --enable-libbs2b --enable-libcaca --enable-libcdio --enable-libflite --enable-libfontconfig --enable-libfreetype --enable-libfribidi --enable-libgme --enable-libgsm --enable-libmp3lame --enable-libmysofa --enable-libopenjpeg --enable-libopenmpt --enable-libopus --enable-libpulse --enable-librubberband --enable-librsvg --enable-libshine --enable-libsnappy --enable-libsoxr --enable-libspeex --enable-libssh --enable-libtheora --enable-libtwolame --enable-libvorbis --enable-libvpx --enable-libwavpack --enable-libwebp --enable-libx265 --enable-libxml2 --enable-libxvid --enable-lib
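
When many videos have been uploaded, typing the command once per file gets tedious. The following is a minimal sketch of one way to generate the same `ffmpeg -i ... -r ... ...gif` command for every file in the "video" folder. The helper name `build_ffmpeg_commands` is hypothetical and not part of the original notebook; it only builds the argument lists and does not run ffmpeg itself.

```python
import os

def build_ffmpeg_commands(video_dir, gif_dir, fps):
    """Return one ffmpeg argument list per video file (nothing is executed here)."""
    commands = []
    for name in sorted(os.listdir(video_dir)):
        stem, _ext = os.path.splitext(name)
        src = os.path.join(video_dir, name)
        dst = os.path.join(gif_dir, f'{stem}.gif')
        if os.path.exists(dst):
            # Skip videos that were already converted
            continue
        commands.append(['ffmpeg', '-i', src, '-r', str(fps), dst])
    return commands

# Possible usage in Colab, assuming BASE_PATH, project_name and fps
# are defined as in the cells above:
# import subprocess
# for cmd in build_ffmpeg_commands(f'{BASE_PATH}/{project_name}/video',
#                                  f'{BASE_PATH}/{project_name}/gif', fps):
#     subprocess.run(cmd, check=True)
```

Skipping files whose GIF already exists matches the note in the heading that conversion is only needed for newly uploaded videos.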

## Running the code

### Installing Related Libraries  
Install the required external libraries.  
Packages installed in a Colab runtime are lost when the runtime is reset, so run this cell each time you start Colab.

In [7]:
!pip install mediapipe
!pip install tensorflow
!pip install tensorflow_hub
!apt-get install -y libgl1-mesa-dev
!pip install tslearn
!pip install moviepy

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Reading package lists... Done
Building dependency tree       
Reading state information... Done
The following additional packages will be installed:
  libgles-dev libgles1 libglvnd-dev libopengl-dev
The following NEW packages will be installed:
  libgl1-mesa-dev libgles-dev libgles1 libglvnd-dev libopengl-dev
0 upgraded, 5 newly installed, 0 to remove and 24 not upgraded.
Need to get 79.9 kB of archives.
After this operation, 954 kB of additional disk space will be used.
Get:1 http://archive.ubuntu.com/ubuntu focal-updates/main amd64 libgles1 amd64 1.3.2-1~ubuntu0.20.04.2 [10.3 kB]
Get:2 http://archive.ubuntu.com/ubuntu focal-updates/main amd64 libgles-dev amd64 1.3.2-1~ubuntu0.20.04.2 [47.9 kB

### Importing Related Libraries

In [8]:
import cv2
import mediapipe as mp
import numpy as np

# Preparation of BlazePose
mp_drawing = mp.solutions.drawing_utils
mp_drawing_styles = mp.solutions.drawing_styles
mp_pose = mp.solutions.pose

### Preparing Functions
This section defines the functions to be used.

Here is a class for storing information:
* Landmark class: A class for storing the coordinate information of the extracted skeleton.

In [9]:
class Landmark:
    ind:int = 0 # The skeleton number
    x:float = 0.0 # x-coordinate
    y:float = 0.0 # y-coordinate
    z:float = 0.0 # z-coordinate
    visibility:float = 0.0 # confidence of BlazePose output
    
    def __init__(self, ind, x, y, z, visibility):
        self.x = x
        self.y = y
        self.z = z
        self.ind = ind
        self.visibility = visibility
    
    def get_xyz(self):
        return [self.x, self.y, self.z]

Here are the functions for converting skeleton information:  
* rotation2D: Takes an angle (in degrees, despite the parameter name "radians") and XY coordinates as input and rotates the XY coordinates around the origin.  
* radian2D: Calculates the angle (in degrees) of a 2D vector from its two components.  
* get_standard_radian: Calculates the angles of the reference vector using the above two functions.  
* landmark_conversion: Translates all landmarks so that the anchor is at the origin, then rotates them by the reference angles.  
* get_std_landmark: Builds the reference vector from two landmarks and performs all the above processes.
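
To make the idea behind these functions concrete, here is a small standalone sketch using only the standard library. It measures the angle of a vector and rotates the vector by the negative of that angle, which lays it along the x-axis. The helper names `angle_deg` and `rotate_deg` and the sample vector are assumptions for illustration; they mirror, but are not, the notebook's own functions.

```python
import math

def angle_deg(x, y):
    """Angle of the vector (x, y) from the x-axis, in degrees (between -90 and 90)."""
    return math.degrees(math.atan(y / x))

def rotate_deg(angle, x, y):
    """Rotate the point (x, y) counter-clockwise around the origin by `angle` degrees."""
    s = math.sin(math.radians(angle))
    c = math.cos(math.radians(angle))
    return c * x - s * y, s * x + c * y

# Hypothetical shoulder-to-shoulder vector projected onto the xy-plane
shoulder_x, shoulder_y = 3.0, 4.0
theta = angle_deg(shoulder_x, shoulder_y)

# Rotating by -theta removes the tilt: the vector ends up on the x-axis
aligned_x, aligned_y = rotate_deg(-theta, shoulder_x, shoulder_y)
print(round(theta, 2), round(aligned_x, 6), round(aligned_y, 6))
```

Applying the same rotation to every landmark of a frame is what makes poses comparable regardless of how the player is oriented relative to the camera.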

In [10]:
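import math

# 2D rotation around the origin.
# Note: despite the parameter name, the angle is given in degrees
# and converted to radians inside the function.
def rotation2D(radians, x, y):
    sin = np.sin(np.radians(radians))
    cos = np.cos(np.radians(radians))
    new_x = cos * x - sin * y
    new_y = sin * x + cos * y
    return new_x, new_y

# Angle of the vector (x, y) measured from the x-axis.
# Note: the result is returned in degrees, in the range -90 to 90.
def radian2D(x, y):
    tan = y/x
    atan = np.arctan(tan)*180/math.pi
    return atan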

def get_standard_radian(standard_vector):
    # xy-plane
    radian_xy = radian2D(standard_vector[0], standard_vector[1])
    new_x, new_y = rotation2D(-1.0*radian_xy, standard_vector[0], standard_vector[1])
    standard_vector[0] = new_x
    standard_vector[1] = new_y
    # # yz-plane
    # radian_yz = radian2D(standard_vector[1], standard_vector[2])
    # new_y, new_z = rotation2D(-1.0*radian_yz, standard_vector[1], standard_vector[2])
    # standard_vector[1] = new_y
    # standard_vector[2] = new_z
    # zx-plane
    radian_zx = radian2D(standard_vector[2], standard_vector[0])
    new_z, new_x = rotation2D(-1.0*radian_zx, standard_vector[2], standard_vector[0])
    standard_vector[2] = new_z
    standard_vector[0] = new_x
    
    return radian_xy, radian_zx

def landmark_conversion(landmark_list, anchor_vector, standard_vector):
    radian_xy, radian_zx = get_standard_radian(standard_vector)
    for i, landmark in enumerate(landmark_list):
        # Translation
        landmark.x -= anchor_vector[0]
        landmark.y -= anchor_vector[1]
        landmark.z -= anchor_vector[2]
        # Rotation
        # xy-plane
        new_x, new_y = rotation2D(-1.0*radian_xy, landmark.x, landmark.y)
        landmark.x = new_x
        landmark.y = new_y
        # zx-plane
        new_z, new_x = rotation2D(-1.0*radian_zx, landmark.z, landmark.x)
        landmark.z = new_z
        landmark.x = new_x
        
        landmark_list[i] = landmark
    return landmark_list
        

# Setting both shoulders as reference vectors (12: right shoulder, 11: left shoulder)
def get_std_landmark(landmark_list, anchor=12, target=11):
    anchor_vector = np.array([landmark_list[anchor].x,landmark_list[anchor].y,landmark_list[anchor].z])
    target_vector = np.array([landmark_list[target].x,landmark_list[target].y,landmark_list[target].z])
    standard_vector = target_vector - anchor_vector
    landmark_list = landmark_conversion(landmark_list, anchor_vector, standard_vector)
    return landmark_list

Here are the functions for reading and writing videos:

In [11]:
from PIL import Image
import imageio

# Load the first T frames of a GIF as RGB arrays
# (OpenCV returns BGR, so the channel axis is reversed)
def vread(path, T):
    cap = cv2.VideoCapture(path)
    gif = [cap.read()[1][...,::-1] for i in range(T)]
    gif = np.array(gif)
    cap.release()
    return gif

# Write a list of frames to a GIF file
def make_gif(frames, filename, duration=1./60.):
    imageio.mimsave(filename, frames, 'GIF', duration=duration)

The following function extracts skeleton information from images.  
A video (GIF) is treated as a sequence of images: the function receives an array of frames and processes them one by one.

In [12]:
def blazepose_extractlandmark(IMAGE_FILES):
    num_frames, y, x, _ = IMAGE_FILES.shape
    landmarks = []
    # Executing skeleton detection
    with mp_pose.Pose(
        static_image_mode=True,
        model_complexity=2,
        enable_segmentation=False,
        min_detection_confidence=0.2,
        min_tracking_confidence=0.0
        ) as pose:
        for idx in range(num_frames):
            image = IMAGE_FILES[idx]
            image_height, image_width, _ = image.shape
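            # Retrieving skeleton information
            # (frames returned by vread() are already RGB, which is what BlazePose
            # expects, so no BGR-to-RGB conversion is applied here)
            results = pose.process(np.ascontiguousarray(image))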
            if not results.pose_landmarks:
                # Skip if there is no person detected
                print('skip')
                continue
            landmark_list = []
            for ind, pose_landmarks in enumerate(results.pose_landmarks.landmark):
                landmark = Landmark(ind, pose_landmarks.x, pose_landmarks.y, pose_landmarks.z, pose_landmarks.visibility)
                landmark_list.append(landmark)
            # Conversion of XYZ coordinates
            converted_landmark_list = get_std_landmark(landmark_list)
            xyz_list = []
            for landmark in converted_landmark_list:
                xyz_list.append(landmark.get_xyz())
            landmarks.append(xyz_list)
            
    return landmarks

The following is a function to organize skeleton information:

In [13]:
# Retaining information only for the specified skeleton numbers (list)
def impt_keypoint_extraction(np_landmarks):
    landmarks = []
    for ext in extract:
        landmarks.append(np_landmarks[ext])
    return np.asarray(landmarks)
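
The shapes involved are easy to lose track of, so here is a small sketch of the same selection on dummy data. It assumes a clip of 30 frames with the 33 landmarks that BlazePose outputs; the array contents are placeholders, and NumPy list indexing is used as an equivalent of the loop above.

```python
import numpy as np

# Landmark numbers to keep (same list as used later in this notebook)
extract = [0, 11, 12, 13, 14, 15, 16, 23, 24, 25, 26, 27, 28]

# Hypothetical clip: 30 frames, 33 landmarks, xyz coordinates
frames = np.arange(30 * 33 * 3, dtype=float).reshape(30, 33, 3)

# (frames, landmarks, xyz) -> (landmarks, frames, xyz)
per_landmark = frames.transpose(1, 0, 2)

# Indexing with a list keeps only the chosen landmarks, in list order
selected = per_landmark[extract]
print(selected.shape)  # (13, 30, 3)
```

After this step, `selected[k]` is the full trajectory of the k-th retained landmark, which is the unit that gets compared between clips.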

### Execute functions

Convert the GIF image into skeletal information and save it under the "vector" folder.

In [14]:
# Time and stride of segmented video
time:int = 2 # seconds
stride:float = 1.0 # seconds

# The skeleton numbers to be retained.
extract = [
    0, 11, 12, 13, 14, 15, 16, 23, 24, 25, 26, 27, 28
]

# The corresponding weights for the above skeleton numbers
extract_weight = [
    0.1, 1.0, 1.0, 2.0, 2.0, 1.5, 1.5, 1.0, 1.0, 1.2, 1.2, 1.5, 1.5
]

In [None]:
# Defining the videos to be compared as a list.
all_path = [f'{BASE_PATH}/{project_name}/gif/サーブ1.gif', f'{BASE_PATH}/{project_name}/gif/サーブ2.gif']

In [None]:
for image_path in all_path:
    image_name = image_path.split('/')[-1].split('.')[0]
    IMAGE_FILES = Image.open(image_path)
    IMAGE_FILES = vread(image_path, T=IMAGE_FILES.n_frames)
    for i, frame in enumerate(range(0, len(IMAGE_FILES), int(fps*stride))):
        GIF_TARGET = IMAGE_FILES[frame:frame+fps*time]
        make_gif(GIF_TARGET, f'{BASE_PATH}/{project_name}/splitted_gif/{image_name}_{frame}_{fps}_{stride}.gif', duration=1./60.)
        # Obtaining the skeleton using BlazePose.
        np_landmarks = np.array(blazepose_extractlandmark(GIF_TARGET))
        # Skip segments in which too few frames yielded a skeleton (0.5 seconds' worth or fewer)
        if len(np_landmarks) <= fps*0.5:
            print('passed')
            continue
        # transpose
        np_landmarks = np_landmarks.transpose(1, 0, 2)
        # Extracting only the necessary features.
        np_landmarks = impt_keypoint_extraction(np_landmarks)
        # Save the features
        np.save(f'{BASE_PATH}/{project_name}/vector/{image_name}_{frame}_{fps}_{stride}.npy', np_landmarks)

Files saved here follow the naming convention "＜original filename＞_＜start frame＞_＜FPS＞_＜stride＞.npy". For example, "サーブ1_105_15_1.0.npy" holds the segment of the "サーブ1" video, converted to 15 frames per second, that starts at frame 105 (i.e., at the 7-second mark).
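
The relation between the window settings and the file names can be sketched as follows. Both helper names (`window_starts`, `parse_vector_name`) are hypothetical and only illustrate the convention; the first mirrors the `range(0, len(IMAGE_FILES), int(fps*stride))` loop above, and the second recovers the start time from a saved file name.

```python
def window_starts(n_frames, fps, stride_sec):
    """Start frames of the segments, stepping by int(fps * stride_sec) frames."""
    return list(range(0, n_frames, int(fps * stride_sec)))

def parse_vector_name(filename):
    """Split '<name>_<start frame>_<fps>_<stride>.npy' into its parts."""
    stem = filename[:-len('.npy')] if filename.endswith('.npy') else filename
    # Split from the right so that underscores inside <name> are preserved
    name, frame, fps, stride = stem.rsplit('_', 3)
    frame, fps, stride = int(frame), int(fps), float(stride)
    return {
        'name': name,
        'start_frame': frame,
        'fps': fps,
        'stride': stride,
        'start_sec': frame / fps,
    }

print(window_starts(n_frames=60, fps=15, stride_sec=1.0))  # [0, 15, 30, 45]
info = parse_vector_name('サーブ1_105_15_1.0.npy')
print(info['name'], info['start_sec'])  # サーブ1 7.0
```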

With the saved skeletal information of the video, it becomes possible to perform tasks such as detecting similar videos. 
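
As a rough illustration of how two saved trajectories can be compared, the sketch below implements textbook dynamic time warping (DTW) in pure Python and combines per-keypoint distances with weights. It is not the tslearn implementation used in this notebook: the function names, the squared-Euclidean local cost with a final square root, and the toy trajectories are all assumptions for illustration, and the z-normalization step applied in the notebook is omitted.

```python
import math

def dtw_distance(a, b):
    """DTW distance between two sequences of equal-dimension points (tuples)."""
    n, m = len(a), len(b)
    inf = float('inf')
    # cost[i][j]: best accumulated cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = sum((p - q) ** 2 for p, q in zip(a[i - 1], b[j - 1]))
            cost[i][j] = local + min(cost[i - 1][j],      # repeat b[j-1]
                                     cost[i][j - 1],      # repeat a[i-1]
                                     cost[i - 1][j - 1])  # advance both
    return math.sqrt(cost[n][m])

def weighted_mean_dtw(traj_a, traj_b, weights):
    """Mean of the weighted per-keypoint DTW distances, plus the individual values."""
    per_keypoint = [w * dtw_distance(x, y)
                    for x, y, w in zip(traj_a, traj_b, weights)]
    return sum(per_keypoint) / len(per_keypoint), per_keypoint

# Toy 1-D trajectories: `slow` is the same motion as `base`, performed more slowly
base = [(0.0,), (1.0,), (2.0,)]
slow = [(0.0,), (0.0,), (1.0,), (1.0,), (2.0,)]
other = [(0.0,), (2.0,), (4.0,)]

print(dtw_distance(base, slow))   # 0.0: timing differences are absorbed
print(dtw_distance(base, other))  # greater than 0: the motion itself differs
```

Because DTW stretches the time axis, a slower or faster execution of the same shot scores as similar, which is why it suits comparing swings filmed at different tempos.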

#### Search for similar motions for a single video.

In [23]:
from tslearn.preprocessing import TimeSeriesScalerMeanVariance
from tslearn.utils import to_time_series_dataset
from tslearn import metrics
import unicodedata
import glob

In [28]:
# The file name to search for
search_file = 'サーブ1_120_15_1.0.gif'
# The similarity threshold (lower values indicate stricter criteria)
threshold = 12.0

search_name = '.'.join(search_file.split('.')[:-1])
search_vector = f'{BASE_PATH}/{project_name}/vector/{search_name}.npy'
all_files = glob.glob(f'{BASE_PATH}/{project_name}/vector/*')

print(f'Searching {search_name}...')

landmark = np.load(search_vector)
results = []
similar_found_flag = False
for i in range(len(all_files)):
    vector_cmp = all_files[i]
    if unicodedata.normalize("NFKC", all_files[i]) == unicodedata.normalize("NFKC", search_vector):
        continue
    landmark_cmp = np.load(vector_cmp)
    dtw_list = []
    for k, ext in enumerate(extract):
        scaler = TimeSeriesScalerMeanVariance(mu=0., std=1.)
        t = scaler.fit_transform(np.nan_to_num(to_time_series_dataset([landmark[k], landmark_cmp[k]])))
        dtw_list.append(metrics.dtw(t[0], t[1]))
    for k, (dtw, weight) in enumerate(zip(dtw_list, extract_weight)):
        dtw_list[k] = dtw * weight
    dtw = np.mean(dtw_list)
    result_dict = {}
    result_dict['filename'] = vector_cmp
    result_dict['mean'] = dtw
    result_dict['all'] = dtw_list
    results.append(result_dict)
    if (dtw < threshold):
        print(f'　{all_files[i].split("/")[-1]} has been identified as a similar video.')
        print(f'　- score: {dtw}')
        similar_found_flag = True
if not similar_found_flag:
    print('　No match found.')

Searching サーブ1_120_15_1.0...
　サーブ1_30_15_1.0.npy has been identified as a similar video.
　- score: 11.915247283203133


If you want to retrieve them in order of similarity, you can execute the following code:

In [30]:
print(f'Searching {search_name}...')

sorted_results = sorted(results, key=lambda x: x['mean'])
# Number of top-ranked videos to display
analysis_num = 4
# Number of keypoints to list for each video
rank = 3
# Correspondence table of skeleton numbers and names
# (left/right labels are kept as in the original notebook; MediaPipe's own
# landmark names assign odd indices such as 11 to the subject's left side)
keypoint_dict = {
    0: 'nose',
    11: 'right shoulder',
    12: 'left shoulder',
    13: 'right elbow',
    14: 'left elbow',
    15: 'right wrist',
    16: 'left wrist',
    23: 'right hip',
    24: 'left hip',
    25: 'right knee',
    26: 'left knee',
    27: 'right ankle',
    28: 'left ankle'
}

for i, res in enumerate(sorted_results[:analysis_num]):
    print(f'No. {i+1} similar video: {res["filename"].split("/")[-1]}')
    print(f'  - total score: {res["mean"]}')
    # Undo the weighting to get the raw per-keypoint DTW distances
    score = [x / y for (x, y) in zip(res["all"], extract_weight)]
    # Descending order: the keypoints whose motion differs most come first
    indexes = np.argsort(-np.array(score))
    for r in range(rank):
        index = indexes[r]
        keypoint_num = extract[index]
        keypoint_name = keypoint_dict[keypoint_num]
        print(f'    - No. {r+1} largest distance: {keypoint_name}  score: {score[index]}')

Searching サーブ1_120_15_1.0...
No. 1 similar video: サーブ1_30_15_1.0.npy
  - total score: 11.915247283203133
    - No. 1 largest distance: left hip  score: 11.420595006365607
    - No. 2 largest distance: right ankle  score: 10.93372653483073
    - No. 3 largest distance: left knee  score: 10.762509288871968
No. 2 similar video: サーブ2_15_15_1.0.npy
  - total score: 12.014208048458661
    - No. 1 largest distance: right wrist  score: 10.864235574887994
    - No. 2 largest distance: right elbow  score: 10.826748392252686
    - No. 3 largest distance: left wrist  score: 10.405362154237762
No. 3 similar video: サーブ2_30_15_1.0.npy
  - total score: 12.467526102894539
    - No. 1 largest distance: left wrist  score: 11.045189948146346
    - No. 2 largest distance: left elbow  score: 10.839383818672479
    - No. 3 largest distance: left ankle  score: 10.786438588116596
No. 4 similar video: サーブ2_150_15_1.0.npy
  - total score: 12.519438227013678
    - No. 1 largest distance: left elbow  score: 11.328751332133683
    - No. 2 largest distance: right elbow  score: 11.175861486913474
    - No. 3 largest distance: left wrist  score: 11.013964442779573


Additional: Output in tabular format

You can save the comparison results in CSV format.  
Note: The left-shoulder column (landmark 12) is always zero because that landmark is the anchor passed to get_std_landmark. Every pose is translated so that the anchor sits at the origin, so its trajectory is identical in all clips.  
All positions and angles after skeleton detection are expressed relative to this reference point.  
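
The effect described in the note can be reproduced with a few lines. This is a minimal sketch on made-up coordinates; `translate_to_anchor` is a hypothetical helper that performs only the translation step of the normalization, with landmark 0 of the toy data standing in for the anchor.

```python
def translate_to_anchor(frames, anchor_index):
    """Shift every landmark of every frame so that the anchor landmark is at the origin.

    frames: list of frames, each a list of (x, y, z) tuples.
    """
    translated = []
    for frame in frames:
        ax, ay, az = frame[anchor_index]
        translated.append([(x - ax, y - ay, z - az) for (x, y, z) in frame])
    return translated

# Two made-up clips (2 frames, 2 landmarks each) with different anchor positions
clip_a = [[(0.5, 0.25, 0.0), (0.75, 0.5, 0.0)],
          [(0.5, 0.5, 0.0), (1.0, 0.75, 0.0)]]
clip_b = [[(0.1, 0.9, 0.2), (0.3, 0.8, 0.1)],
          [(0.2, 0.7, 0.2), (0.6, 0.4, 0.3)]]

anchor_a = [frame[0] for frame in translate_to_anchor(clip_a, 0)]
anchor_b = [frame[0] for frame in translate_to_anchor(clip_b, 0)]

# Both anchor trajectories are all zeros, so any distance between them is zero
print(anchor_a == anchor_b)
```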

In [31]:
import pandas as pd

print(f'Searching {search_name}...')

sorted_results = sorted(results, key=lambda x: x['mean'])
# Correspondence table of skeleton numbers and names
# (left/right labels are kept as in the original notebook; MediaPipe's own
# landmark names assign odd indices such as 11 to the subject's left side)
keypoint_dict = {
    0: 'nose',
    11: 'right shoulder',
    12: 'left shoulder',
    13: 'right elbow',
    14: 'left elbow',
    15: 'right wrist',
    16: 'left wrist',
    23: 'right hip',
    24: 'left hip',
    25: 'right knee',
    26: 'left knee',
    27: 'right ankle',
    28: 'left ankle'
}

# Column layout of the CSV file
columns = list(keypoint_dict.values())
columns.append('file_name')
columns.append('similarity')

# DataFrame.append was removed in pandas 2.0, so the rows are collected
# in a list and converted to a DataFrame once at the end
rows = []
for i, res in enumerate(sorted_results):
    print(f'No. {i+1} similar video: {res["filename"].split("/")[-1]}')
    print(f'  - total score: {res["mean"]}')
    # Undo the weighting
    scores = [x / y for (x, y) in zip(res["all"], extract_weight)]

    score_data = dict(zip(keypoint_dict.values(), scores))
    score_data['file_name'] = res["filename"].split("/")[-1]
    score_data['similarity'] = res["mean"]
    rows.append(score_data)

df = pd.DataFrame(rows, columns=columns)

Searching サーブ1_120_15_1.0...
No. 1 similar video: サーブ1_30_15_1.0.npy
  - total score: 11.915247283203133
No. 2 similar video: サーブ2_15_15_1.0.npy
  - total score: 12.014208048458661
No. 3 similar video: サーブ2_30_15_1.0.npy
  - total score: 12.467526102894539
No. 4 similar video: サーブ2_150_15_1.0.npy
  - total score: 12.519438227013678
No. 5 similar video: サーブ2_60_15_1.0.npy
  - total score: 12.568235338372405
No. 6 similar video: サーブ2_90_15_1.0.npy
  - total score: 12.618646678099246
No. 7 similar video: サーブ1_90_15_1.0.npy
  - total score: 12.653199625914905
No. 8 similar video: サーブ1_105_15_1.0.npy
  - total score: 12.657629926892051
No. 9 similar video: サーブ2_120_15_1.0.npy
  - total score: 12.658618575605137
No. 10 similar video: サーブ1_150_15_1.0.npy
  - total score: 12.685224278892294
No. 11 similar video: サーブ2_240_15_1.0.npy
  - total score: 12.716884327338487
No. 12 similar video: サーブ1_135_15_1.0.npy
  - total score: 12.772010888923289
No. 13 similar video: サーブ1_15_15_1.0.npy
  - total score: 12.780175665776731
No. 14 similar video: サーブ2_210_15_1.0.np

In [None]:
# Save as CSV (the code below saves the file under the {project_name} folder)
df.to_csv(f'{BASE_PATH}/{project_name}/results.csv', index=False)

In [32]:
# Display the contents
df

| | nose | right shoulder | left shoulder | right elbow | left elbow | right wrist | left wrist | right hip | left hip | right knee | left knee | right ankle | left ankle | file_name | similarity |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 10.530944 | 10.003211 | 0.0 | 9.404942 | 8.278753 | 9.510084 | 9.361551 | 10.705535 | 11.420595 | 10.557432 | 10.762509 | 10.933727 | 10.704278 | サーブ1_30_15_1.0.npy | 11.915247 |
| 1 | 9.460973 | 9.722023 | 0.0 | 10.826748 | 10.305222 | 10.864236 | 10.405362 | 9.500116 | 9.517612 | 9.388684 | 9.975478 | 9.519267 | 9.876415 | サーブ2_15_15_1.0.npy | 12.014208 |
| 2 | 10.646173 | 10.071616 | 0.0 | 10.355234 | 10.839384 | 10.186174 | 11.04519 | 9.828375 | 9.552508 | 10.356968 | 10.484279 | 10.75686 | 10.786439 | サーブ2_30_15_1.0.npy | 12.467526 |
| 3 | 10.493186 | 10.665916 | 0.0 | 11.175861 | 11.328751 | 10.934781 | 11.013964 | 10.019562 | 9.957584 | 9.919444 | 9.654678 | 9.916218 | 9.843133 | サーブ2_150_15_1.0.npy | 12.519438 |
| 4 | 10.973795 | 10.916623 | 0.0 | 10.77758 | 10.632383 | 10.987153 | 11.189973 | 9.969554 | 9.636027 | 9.771134 | 9.989706 | 10.561488 | 10.751081 | サーブ2_60_15_1.0.npy | 12.568235 |
| 5 | 10.598616 | 10.997832 | 0.0 | 9.422247 | 10.716 | 10.182743 | 11.008348 | 10.644884 | 10.60252 | 10.595572 | 10.992383 | 10.621867 | 11.223889 | サーブ2_90_15_1.0.npy | 12.618647 |
| 6 | 11.421835 | 11.119179 | 0.0 | 9.968349 | 10.038041 | 10.24543 | 11.245733 | 11.004838 | 11.175724 | 10.504591 | 10.87461 | 10.522607 | 10.907462 | サーブ1_90_15_1.0.npy | 12.6532 |
| 7 | 11.131419 | 9.116492 | 0.0 | 10.275647 | 11.045237 | 11.270842 | 11.059599 | 10.771642 | 11.146334 | 10.484306 | 10.518569 | 10.229593 | 10.47754 | サーブ1_105_15_1.0.npy | 12.65763 |
| 8 | 10.694967 | 10.020308 | 0.0 | 10.49225 | 10.903459 | 10.381126 | 10.502706 | 11.141283 | 10.586713 | 11.049262 | 10.503767 | 10.933999 | 10.24163 | サーブ2_120_15_1.0.npy | 12.658619 |
| 9 | 9.848438 | 11.392883 | 0.0 | 10.551944 | 10.209263 | 10.569222 | 10.613404 | 10.612303 | 10.962648 | 10.526537 | 10.5505 | 11.064394 | 10.513234 | サーブ1_150_15_1.0.npy | 12.685224 |
