# Using mediapipe for motion tracking

In this module, we learn how to run mediapipe to automatically track movements in videos. <br>
This notebook is based on https://github.com/WimPouw/envisionBOX_modulesWP/blob/main/MultimodalMerging/Masking_Mediapiping.ipynb.

***

## Overview of the script
The script performs the following steps in order:

1. Import packages and define paths for the input and output folders
1. Check how many video files are in the input folder
1. Load the mediapipe module and define landmarks (which parts of the body do we want to track?)
1. Make empty objects (similar to dataframes) with column names for face, body, and hands separately <-- later we will add data to them in step 6.1
1. Make a folder for each video
1. Run mediapipe frame by frame on each video file
    1. For each frame, add the coordinates of each landmark as a row to the face, body, and hands objects
    1. Once all frames in the video have been processed, save the face, body, and hands objects as csv files in the output folder
    1. Move on to the next video file (if any)


### Import packages and define paths for input and output folders
Let's first import required packages.

<font color = "orange">If you haven't installed the packages yet, follow the steps below:</font>

1. Open a terminal (Mac) or Anaconda Prompt (Windows) in the folder where this notebook is stored
    - Mac:
        1. go to the "mediapipe" folder
        1. right-click the "mediapipe" folder
        1. click "New Terminal at Folder" (under Services) <br><br>
    - Windows:
        1. go to the "mediapipe" folder
        1. copy the path to the folder
        1. open Anaconda Prompt
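        1. type cd, a space, and the path (e.g., cd D:/users/shoakamine/mld_study_group/Python/mediapipe)
        1. if the folder is on a different drive than the one shown in the prompt, type the drive letter (e.g., D:) to switch to it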
1. Activate your conda environment (e.g., conda activate mld_study_group)
1. Run this code: pip install -r requirements.txt
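After installing, you can quickly confirm that the packages can be imported from the active environment. The sketch below uses only the standard library; the list of module names is an assumption based on the imports in the next cell (the OpenCV bindings are imported as cv2 whatever the pip package is called), so adjust it to your requirements.txt.

```python
import importlib.util

# Module names as imported in this notebook (assumed; adjust to your requirements.txt).
REQUIRED_MODULES = ["mediapipe", "cv2", "numpy", "pandas", "tqdm"]

def check_modules(names):
    """Return a dict mapping each top-level module name to True if it can be imported."""
    return {name: importlib.util.find_spec(name) is not None for name in names}

if __name__ == "__main__":
    for name, found in check_modules(REQUIRED_MODULES).items():
        print(f"{name:10s} {'ok' if found else 'MISSING'}")
```

Any module reported as MISSING points to a package that did not install into the environment you activated.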

In [2]:
#load in required packages
import mediapipe as mp #mediapipe
import cv2 #opencv
import math #basic operations
import numpy as np #basic operations
import pandas as pd #data wrangling
import csv #csv saving
import os #some basic functions for inspecting folder structure etc.
import time as tm #for timing the processing time
from pathlib import Path #for setting paths
from os import listdir
from os.path import isfile, join, exists
from tqdm import tqdm #for progress bars


### define folders
cwd = Path(os.getcwd())
root_f = cwd.parent.parent.parent.parent.absolute() #get the path to the folder four levels above the current folder (root folder)
media_f = os.path.join(root_f, "05_data", "01_interaction", "02_media")
# list all folders whose names contain neither "_" nor "p"; sort them because os.listdir() returns names in arbitrary order
media_folders = sorted(f for f in os.listdir(media_f) if "_" not in f and "p" not in f)
media_folders = media_folders[:-1] #remove the last folder
print(media_folders)

input_flipped = "flipped_videos/"
outputf_video = "output_videos/"
outputf_ts = "output_timeseries/"
outputf_video_flipped = "output_videos_flipped/"
outputf_ts_flipped = "output_timeseries_flipped/"

['001', '002', '003', '005', '006', '007', '008', '009', '010', '011', '012', '013', '015', '016', '017', '018', '019', '020', '021', '022', '023', '024', '025', '026', '027', '028', '029', '030', '031', '032', '033', '034', '035', '036', '037', '038', '039', '040', '041', '042', '043', '044', '045', '046', '047']
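The filter above keeps every name without "_" or "p" and then drops the last entry, which only identifies a fixed folder if the list is sorted. A stricter alternative is sketched below, assuming session folders are named with digits only (as '001' to '047' in the output above). Both function names are hypothetical, and the sketch deliberately does not reproduce the manual removal of the last folder; that choice should stay explicit in your own code.

```python
import os

def select_session_names(names):
    """Keep names made only of digits (e.g. '001') and return them sorted."""
    return sorted(n for n in names if n.isdigit())

def list_session_folders(media_dir):
    """List the digit-named sub-folders of media_dir in a deterministic (sorted) order."""
    names = [n for n in os.listdir(media_dir) if os.path.isdir(os.path.join(media_dir, n))]
    return select_session_names(names)
```

For example, select_session_names(["010", "002", "001_x", "practice", "001"]) keeps only "001", "002" and "010", in that order.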


### Check the number of video files to process

In [3]:
# vfiles = [f for f in listdir(inputf_video) if join(inputf_video, f).endswith(".mp4")] #loop through the filenames and collect them in a list
vfiles = []
vfilenames = []
vfiles_flipped = []
vfilenames_flipped = []

# iterate over files in the videos folder and make a list of the video files to be processed
for foldername in media_folders:
    processed_folder = os.path.join(media_f, foldername, "processed")

    # skip the whole session if any existing output in outputf_ts starts with its folder name
    if any(fn.startswith(foldername) for fn in os.listdir(outputf_ts)):
        continue

    for filename in os.listdir(processed_folder):
        if filename.endswith(".mp4"):
            vfiles.append(os.path.join(processed_folder, filename))
            vfilenames.append(filename)

### for flipped videos
for filename in os.listdir(input_flipped):
    if filename.endswith(".mp4") and filename.split(".mp4")[0] not in os.listdir(outputf_ts_flipped):
        vfiles_flipped.append(os.path.join(input_flipped, filename))
        vfilenames_flipped.append(filename)

#check videos to be processed
print("The following folder is set as the output folder where all the pose time series are stored")
print(os.path.abspath(outputf_ts))
print("\n The following folder is set as the output folder for saving the masked videos ")
print(os.path.abspath(outputf_video))
print(f"\n The following {len(vfilenames)} videos will be processed: ")
print(vfilenames)
print(f"\n The following {len(vfilenames_flipped)} flipped videos will be processed: ")
print(vfilenames_flipped)

The following folder is set as the output folder where all the pose time series are stored
p:\workspaces\mld-akamine\working_data\01_fribble_zoom_experiment\08_analysis\kinematics\data\mediapipe\output_timeseries

 The following folder is set as the output folder for saving the masked videos 
p:\workspaces\mld-akamine\working_data\01_fribble_zoom_experiment\08_analysis\kinematics\data\mediapipe\output_videos

 The following 46 videos will be processed: 
['001_a.mp4', '001_b.mp4', '002_a.mp4', '002_b.mp4', '003_a.mp4', '003_b.mp4', '005_a.mp4', '005_b.mp4', '006_a.mp4', '006_b.mp4', '007_a.mp4', '007_b.mp4', '008_a.mp4', '008_b.mp4', '009_a.mp4', '009_b.mp4', '010_a.mp4', '010_b.mp4', '011_a.mp4', '011_b.mp4', '012_a.mp4', '012_b.mp4', '013_a.mp4', '013_b.mp4', '015_a.mp4', '015_b.mp4', '016_a.mp4', '016_b.mp4', '017_a.mp4', '017_b.mp4', '018_a.mp4', '018_b.mp4', '019_a.mp4', '019_b.mp4', '020_a.mp4', '020_b.mp4', '021_a.mp4', '021_b.mp4', '022_a.mp4', '022_b.mp4', '023_a.mp4', '023_b.m
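The skip check above works per session: if any output in output_timeseries starts with the session number, both videos of that session are skipped, even if only one of them was processed. Since run_mediapipe (defined further down) writes each video's csv files to a folder named after the video without ".mp4", a per-video check could look like the sketch below. The function name is hypothetical and it assumes that output layout.

```python
import os

def videos_to_process(video_paths, output_ts_folder):
    """Return the videos that do not yet have an output folder in output_ts_folder.

    Assumes one output folder per video, named after the video file without its
    extension (e.g. '001_a.mp4' -> '001_a/').
    """
    if os.path.isdir(output_ts_folder):
        done = set(os.listdir(output_ts_folder))
    else:
        done = set()  # the output folder does not exist yet, so nothing is processed
    todo = []
    for path in video_paths:
        stem = os.path.splitext(os.path.basename(path))[0]
        if stem not in done:
            todo.append(path)
    return todo
```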

### Initialize mediapipe modules

In [4]:
### initialize modules and functions=======================

#load in mediapipe modules
mp_holistic = mp.solutions.holistic
# Import drawing_utils and drawing_styles.
mp_drawing = mp.solutions.drawing_utils
mp_drawing_styles = mp.solutions.drawing_styles


### Define landmarks=====================================
#the 33 landmarks used by MediaPipe Pose (BlazePose)
markersbody = ['NOSE', 'LEFT_EYE_INNER', 'LEFT_EYE', 'LEFT_EYE_OUTER', 'RIGHT_EYE_INNER', 
                'RIGHT_EYE', 'RIGHT_EYE_OUTER', 'LEFT_EAR', 'RIGHT_EAR', 'MOUTH_LEFT', 
                'MOUTH_RIGHT', 'LEFT_SHOULDER', 'RIGHT_SHOULDER', 'LEFT_ELBOW', 'RIGHT_ELBOW', 
                'LEFT_WRIST', 'RIGHT_WRIST', 'LEFT_PINKY', 'RIGHT_PINKY', 'LEFT_INDEX', 
                'RIGHT_INDEX', 'LEFT_THUMB', 'RIGHT_THUMB', 'LEFT_HIP', 'RIGHT_HIP', 
                'LEFT_KNEE', 'RIGHT_KNEE', 'LEFT_ANKLE', 'RIGHT_ANKLE', 'LEFT_HEEL', 
                'RIGHT_HEEL', 'LEFT_FOOT_INDEX', 'RIGHT_FOOT_INDEX']

markershands = ['LEFT_WRIST', 'LEFT_THUMB_CMC', 'LEFT_THUMB_MCP', 'LEFT_THUMB_IP', 'LEFT_THUMB_TIP', 'LEFT_INDEX_FINGER_MCP',
              'LEFT_INDEX_FINGER_PIP', 'LEFT_INDEX_FINGER_DIP', 'LEFT_INDEX_FINGER_TIP', 'LEFT_MIDDLE_FINGER_MCP', 
               'LEFT_MIDDLE_FINGER_PIP', 'LEFT_MIDDLE_FINGER_DIP', 'LEFT_MIDDLE_FINGER_TIP', 'LEFT_RING_FINGER_MCP', 
               'LEFT_RING_FINGER_PIP', 'LEFT_RING_FINGER_DIP', 'LEFT_RING_FINGER_TIP', 'LEFT_PINKY_FINGER_MCP', 
               'LEFT_PINKY_FINGER_PIP', 'LEFT_PINKY_FINGER_DIP', 'LEFT_PINKY_FINGER_TIP',
              'RIGHT_WRIST', 'RIGHT_THUMB_CMC', 'RIGHT_THUMB_MCP', 'RIGHT_THUMB_IP', 'RIGHT_THUMB_TIP', 'RIGHT_INDEX_FINGER_MCP',
              'RIGHT_INDEX_FINGER_PIP', 'RIGHT_INDEX_FINGER_DIP', 'RIGHT_INDEX_FINGER_TIP', 'RIGHT_MIDDLE_FINGER_MCP', 
               'RIGHT_MIDDLE_FINGER_PIP', 'RIGHT_MIDDLE_FINGER_DIP', 'RIGHT_MIDDLE_FINGER_TIP', 'RIGHT_RING_FINGER_MCP', 
               'RIGHT_RING_FINGER_PIP', 'RIGHT_RING_FINGER_DIP', 'RIGHT_RING_FINGER_TIP', 'RIGHT_PINKY_FINGER_MCP', 
               'RIGHT_PINKY_FINGER_PIP', 'RIGHT_PINKY_FINGER_DIP', 'RIGHT_PINKY_FINGER_TIP']
facemarks = [str(x) for x in range(478)] #478 face points: the 468-point face mesh plus 10 iris points, which are only returned with refine_face_landmarks=True

print("Note that we have the following number of pose keypoints for markers body")
print(len(markersbody))

print("\n Note that we have the following number of pose keypoints for markers hands")
print(len(markershands))

print("\n Note that we have the following number of pose keypoints for markers face")
print(len(facemarks ))

Note that we have the following number of pose keypoints for markers body
33

 Note that we have the following number of pose keypoints for markers hands
42

 Note that we have the following number of pose keypoints for markers face
478


### Make empty objects with column names for face, body, and hands separately

In [5]:
### set up the column names and objects for the time series data (add time as the first variable/column)=======================
markerxyzbody = ['time']
markerxyzhands = ['time']
markerxyzface = ['time']

for mark in markersbody:
    for pos in ['X', 'Y', 'Z', 'visibility']: #for markers of the body you also have a visibility reliability score
        nm = pos + "_" + mark
        markerxyzbody.append(nm)
for mark in markershands:
    for pos in ['X', 'Y', 'Z']:
        nm = pos + "_" + mark
        markerxyzhands.append(nm)
for mark in facemarks:
    for pos in ['X', 'Y', 'Z']:
        nm = pos + "_" + mark
        markerxyzface.append(nm)
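Because each body landmark contributes four columns (X, Y, Z, visibility) and each hand or face landmark three, the header widths follow directly from the landmark counts printed above: 1 + 33·4 = 133 columns for the body, 1 + 42·3 = 127 for the hands, and 1 + 478·3 = 1435 for the face (the extra 1 is the time column). The helper below is a compact, hypothetical equivalent of the three loops, useful for checking those numbers; it uses stand-in marker lists of the same lengths.

```python
def make_header(markers, fields):
    """Build ['time', '<field>_<marker>', ...] in the same order as the loops above."""
    return ["time"] + [f"{field}_{mark}" for mark in markers for field in fields]

# Stand-in marker lists with the same lengths as markersbody, markershands and facemarks.
body = make_header([f"B{i}" for i in range(33)], ["X", "Y", "Z", "visibility"])
hands = make_header([f"H{i}" for i in range(42)], ["X", "Y", "Z"])
face = make_header([str(i) for i in range(478)], ["X", "Y", "Z"])
print(len(body), len(hands), len(face))  # 133 127 1435
```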

### Define functions to convert MediaPipe landmark objects into lists of coordinate values

In [6]:
#check if there are numbers in a string
def num_there(s):
    return any(i.isdigit() for i in s)

#convert a mediapipe landmark object (a protobuf message) into a list of text lines
def makegoginto_str(gogobj):
    gogobj = str(gogobj).strip("[]")
    gogobj = gogobj.split("\n")
    return(gogobj[:-1]) #ignore the last element, which is empty

#extract the values after ':' on each line of the stringified landmarks (the values stay strings)
def listpostions(newsamplemarks):
    newsamplemarks = makegoginto_str(newsamplemarks)
    tracking_p = []
    for value in newsamplemarks:
        if num_there(value):
            stripped = value.split(':', 1)[1]
            stripped = stripped.strip() #remove spaces in the string if present
            tracking_p.append(stripped) #add to this list  
    return(tracking_p)
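The helpers above print the landmark object to text and split each line on ':', so the values end up in the csv files as strings. MediaPipe's landmark results are protobuf messages (NormalizedLandmarkList) whose repeated landmark field holds entries with x, y, z and visibility attributes, so the same values can also be read as floats directly. The sketch below illustrates this; to stay runnable without MediaPipe it uses small stand-in classes that mimic that structure (the class and function names are hypothetical), and like listpostions it returns an empty list when a result is None.

```python
from dataclasses import dataclass, field
from typing import List

# Stand-ins that mimic the shape of MediaPipe's NormalizedLandmarkList
# (a repeated `landmark` field whose entries have x, y, z and visibility).
@dataclass
class FakeLandmark:
    x: float
    y: float
    z: float
    visibility: float = 0.0

@dataclass
class FakeLandmarkList:
    landmark: List[FakeLandmark] = field(default_factory=list)

def flatten_landmarks(landmark_list, fields=("x", "y", "z")):
    """Return [x0, y0, z0, x1, ...] as floats, or [] if nothing was detected (None)."""
    if landmark_list is None:
        return []
    return [float(getattr(lm, f)) for lm in landmark_list.landmark for f in fields]

pose = FakeLandmarkList([FakeLandmark(0.5, 0.25, -0.1, 0.99), FakeLandmark(0.4, 0.3, 0.0, 0.8)])
print(flatten_landmarks(pose, fields=("x", "y", "z", "visibility")))
# [0.5, 0.25, -0.1, 0.99, 0.4, 0.3, 0.0, 0.8]
print(flatten_landmarks(None))  # []
```

With the real results object the call would be, e.g., flatten_landmarks(results.pose_landmarks, fields=("x", "y", "z", "visibility")), which yields the same column order as markerxyzbody.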

### Let's run mediapipe

In [9]:
def run_mediapipe(vidf, output_video_folder, output_ts_folder):
    # get the file name of the video (e.g., 001_a.mp4); on Windows, os.path.basename handles both / and \ separators
    videoname = os.path.basename(vidf)
    outputf_ts_processed = output_ts_folder + videoname.split(".mp4")[0] + "/"

    #capture the video, and check video settings
    capture = cv2.VideoCapture(vidf) #load in the videocapture
    frameWidth = capture.get(cv2.CAP_PROP_FRAME_WIDTH) #check frame width
    frameHeight = capture.get(cv2.CAP_PROP_FRAME_HEIGHT) #check frame height
    samplerate = capture.get(cv2.CAP_PROP_FPS)   #fps = frames per second

    #make an 'empty' video file where we project the pose tracking on
    fourcc = cv2.VideoWriter_fourcc(*'mp4v') #'mp4v' (lowercase) is the usual tag for .mp4 output; for other formats you could use e.g., *'XVID'
    out = cv2.VideoWriter(output_video_folder+videoname, fourcc, 
                            fps = samplerate, frameSize = (int(frameWidth), int(frameHeight)))

    # Run MediaPipe frame by frame using Holistic with `enable_segmentation=True` to get pose segmentation.
    time = 0
    tsbody = [markerxyzbody]   #these will be your time series objects, which start with the column names initialized above
    tshands = [markerxyzhands] #these will be your time series objects, which start with the column names initialized above
    tsface = [markerxyzface]   #these will be your time series objects, which start with the column names initialized above
    with mp_holistic.Holistic(
            static_image_mode=False, smooth_landmarks = True, enable_segmentation=True, refine_face_landmarks=True, 
            min_tracking_confidence = 0.7, model_complexity=1) as holistic:
        while True:
            ret, image = capture.read() #read frame
            if ret: #if there is a frame
                # To improve performance, optionally mark the image as not writeable to pass by reference.
                image.flags.writeable = False
                image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB) #make sure the image is in RGB format
                results = holistic.process(image) #apply Mediapipe holistic processing

                # Draw landmark annotation on the image.
                image.flags.writeable = True
                image = cv2.cvtColor(image, cv2.COLOR_RGB2BGR)
                if results.segmentation_mask is not None: #check if a pose was found (the mask is None when no pose is detected)
                    #now lets draw on the original_image the left and right hand landmarks, the facemesh and the body poses
                    #left hand
                    mp_drawing.draw_landmarks(image, results.left_hand_landmarks, mp_holistic.HAND_CONNECTIONS)
                    #right hand
                    mp_drawing.draw_landmarks(image, results.right_hand_landmarks, mp_holistic.HAND_CONNECTIONS)
                    #face
                    mp_drawing.draw_landmarks(
                        image,
                        results.face_landmarks,
                        mp_holistic.FACEMESH_TESSELATION,
                        landmark_drawing_spec=None,
                        connection_drawing_spec=mp_drawing_styles
                        .get_default_face_mesh_tesselation_style())
                    #body
                    mp_drawing.draw_landmarks(
                        image,
                        results.pose_landmarks,
                        mp_holistic.POSE_CONNECTIONS,
                        landmark_drawing_spec=mp_drawing_styles.
                        get_default_pose_landmarks_style())
                    
                    #######################now save everything to a time series
                    #extract the x, y, z (and, for the body, visibility) values of each landmark
                    samplebody = listpostions(results.pose_landmarks)
                    sampleface = listpostions(results.face_landmarks)
                    sampleLH = listpostions(results.left_hand_landmarks)
                    sampleRH = listpostions(results.right_hand_landmarks)
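                    if len(sampleLH) == 0: #no left hand detected: pad with empty cells (21 landmarks x 3 values) so the right-hand values stay in their own columns
                        sampleLH = [''] * (len(markershands) // 2 * 3)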
                    samplehands = sampleLH + sampleRH

                    samplebody.insert(0, time)
                    samplehands.insert(0, time)
                    sampleface.insert(0, time)

                    tsbody.append(samplebody)   #append to the timeseries object
                    tshands.append(samplehands) #append to the timeseries object
                    tsface.append(sampleface)   #append to the timeseries object
                #show the video as we process (you can comment this out, if you want to run this process in the background)
                # cv2.imshow("resizedimage", image)
                out.write(image) #save the frame to the new masked video
                time = time+(1000/samplerate) #update the time variable (in ms) for the next frame
            if cv2.waitKey(1) == 27: #allow the use of ESCAPE to break the loop (only works while an OpenCV window is shown)
                break
            if not ret: #if there are no more frames, break the loop
                break

    #once done de-initialize all processes
    out.release()
    capture.release()
    cv2.destroyAllWindows()
    cv2.waitKey(1) #this is needed to close the window for mac users

    ####################################################### data to be written row-wise in csv file
    os.makedirs(outputf_ts_processed, exist_ok=True) #make a folder for the output (no error if it already exists)
    
    # opening the csv file in 'w+' mode
    filebody = open(outputf_ts_processed + videoname[:-4]+'_body.csv', 'w+', newline ='')
    #write it
    with filebody:    
        write = csv.writer(filebody)
        write.writerows(tsbody)

    # opening the csv file in 'w+' mode
    filehands = open(outputf_ts_processed + videoname[:-4]+'_hands.csv', 'w+', newline ='')
    #write it
    with filehands:
        write = csv.writer(filehands)
        write.writerows(tshands)

    # opening the csv file in 'w+' mode
    fileface = open(outputf_ts_processed + videoname[:-4]+'_face.csv', 'w+', newline ='')
    #write it
    with fileface:    
        write = csv.writer(fileface)
        write.writerows(tsface)
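Inside the loop, each frame's time stamp (in milliseconds) is obtained by repeatedly adding 1000/samplerate. For a frame rate such as 29.97 fps that step is not exactly representable in binary floating point, so small rounding errors can accumulate over a long video. Computing the time from the frame index avoids this; the function below is a hypothetical sketch of that alternative, assuming a constant frame rate as the original loop does.

```python
def frame_times_ms(n_frames, fps):
    """Time stamp (ms) of each frame, computed from the frame index instead of a running sum.

    Assumes a constant frame rate, as the loop in run_mediapipe does.
    """
    return [i * 1000.0 / fps for i in range(n_frames)]

print(frame_times_ms(4, 25))  # [0.0, 40.0, 80.0, 120.0]
```

Inside run_mediapipe the equivalent change would be to keep a frame counter and derive time from it, rather than adding to time after every frame.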

In [10]:
for vidf in tqdm(vfiles[26:]): #[26:] skips the first 26 videos in vfiles; use tqdm(vfiles) to process all of them
    run_mediapipe(vidf, outputf_video, outputf_ts)

# for vidf in tqdm(vfiles_flipped):
#     run_mediapipe(vidf, outputf_video_flipped, outputf_ts_flipped)

100%|██████████| 20/20 [32:29:06<00:00, 5847.34s/it]   
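At roughly 1.6 hours per video (5847 s/it in the progress bar above), a single failing video, for example an unreadable file, stops the whole loop and everything after it. One option, sketched below, is to wrap each call so that errors are recorded and the loop moves on. run_all is a hypothetical helper; it takes the processing function as an argument, so in this notebook you would pass run_mediapipe and the output folders.

```python
import traceback

def run_all(video_paths, process, *args):
    """Call process(path, *args) for each video; collect failures instead of stopping.

    Returns a dict {path: "ErrorType: message"} for the videos that raised an exception.
    KeyboardInterrupt is not caught, so the loop can still be stopped manually.
    """
    failures = {}
    for path in video_paths:
        try:
            process(path, *args)
        except Exception as exc:  # record the error and continue with the next video
            failures[path] = f"{type(exc).__name__}: {exc}"
            traceback.print_exc()
    return failures
```

In this notebook the call would be, e.g., failed = run_all(tqdm(vfiles), run_mediapipe, outputf_video, outputf_ts); afterwards, failed lists the videos to inspect and retry.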
