<h1> Quantitative Measures for Gesture Space </h1>

In this module, we learn different ways to measure the space used by gestures. This notebook is based on https://envisionbox.org/embedded_Analysis_kinematic_features_module.html

<h2> Overview of the script </h2>

1. Vertical Amplitude
2. McNeillian Space
3. Volumetric Space


In [None]:
import os
import numpy as np
import pandas as pd


In [None]:
df = pd.read_csv('p:/shared/FOCUS-GROUPS/Gesture-Kinematics/gesture_space_compuation/1130_JS_body.csv')


In [None]:
df.head()


In [None]:

def convert_MP_to_OP(df_MP):
    # first we create a dictionary that maps the names in our MediaPipe output to the names in our OpenPose output
    conv_dict = {"RIGHT_WRIST":"R_Hand", "LEFT_WRIST":"L_Hand","NOSE":"Nose","RIGHT_ELBOW":"RElb","LEFT_ELBOW":"LElb","RIGHT_HIP":"RHip","LEFT_HIP":"LHip", "LEFT_EYE":"LEye","RIGHT_EYE":"REye"}
    
    OP_df = pd.DataFrame()
    
    for key in conv_dict:
        
        OP_df[conv_dict[key]] = [[row["X_"+key],row["Y_"+key],row["Z_"+key]] for _,row in df_MP.iterrows()]
    OP_df["time"] = df_MP["time"].copy()
    
    # NOTE: not all methods track the exact same keypoints. For some of our calculations we need a Neck point, and a Mid-Hip Point.
    # We need to calculate these based on others
    OP_df["Neck"] = [[np.mean([row["X_LEFT_SHOULDER"],row["X_RIGHT_SHOULDER"]]),np.mean([row["Y_LEFT_SHOULDER"],row["Y_RIGHT_SHOULDER"]]),row["Z_RIGHT_SHOULDER"] ] for _, row in df_MP.iterrows()]
    OP_df["MidHip"] = [[np.mean([row["X_LEFT_HIP"],row["X_RIGHT_HIP"]]),np.mean([row["Y_LEFT_HIP"],row["Y_RIGHT_HIP"]]),row["Z_RIGHT_HIP"] ] for _, row in df_MP.iterrows()]
    return OP_df

    


In [None]:
df_OP = convert_MP_to_OP(df)


In [None]:
df_OP.head()
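
The derived Neck and MidHip points are midpoints between a left and a right keypoint. As a rough illustration of that idea, the sketch below computes a midpoint on a made-up single-frame table. The helper name `midpoint` and the toy values are assumptions for illustration only. Unlike `convert_MP_to_OP`, which takes the z value from the right-side keypoint, this sketch averages all three axes.

```python
import numpy as np
import pandas as pd

# Toy MediaPipe-style frame; the coordinate values are made up for illustration.
toy = pd.DataFrame({
    "X_LEFT_SHOULDER": [0.6], "Y_LEFT_SHOULDER": [0.40], "Z_LEFT_SHOULDER": [-0.1],
    "X_RIGHT_SHOULDER": [0.4], "Y_RIGHT_SHOULDER": [0.42], "Z_RIGHT_SHOULDER": [-0.3],
})

def midpoint(df_MP, left, right):
    """Per-frame [x, y, z] midpoint of two keypoints (hypothetical helper)."""
    left_xyz = df_MP[["X_" + left, "Y_" + left, "Z_" + left]].to_numpy()
    right_xyz = df_MP[["X_" + right, "Y_" + right, "Z_" + right]].to_numpy()
    # Element-wise mean of the two keypoints, one [x, y, z] list per frame
    return ((left_xyz + right_xyz) / 2).tolist()

neck = midpoint(toy, "LEFT_SHOULDER", "RIGHT_SHOULDER")
print(neck)
```

Working on whole columns like this also avoids looping over rows with `iterrows`, which can be slow on long recordings.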


<h2> Vertical Amplitude </h2>



The first feature, vertical amplitude, is the maximum height reached by the hands. It is expressed not in image-specific units (such as pixels or meters), but in a person-specific way, giving you the maximum height relative to the body of the person performing the gesture.


In [None]:

def calc_vert_height(df, hand):
    # Vertical amplitude
    # H: 0 = below midline;
    #    1 = between midline and middle-upper body;
    #    2 = above middle-upper body, but below shoulders;
    #    3 = between shoulders and middle of face;
    #    4 = between middle of face and top of head;
    #    5 = above head

    H = []
    for index, frame in df.iterrows():
        SP_mid = ((df.loc[index, "Neck"][1] - df.loc[index, "MidHip"][1]) / 2) + df.loc[index, "MidHip"][1]
        Mid_up = ((df.loc[index, "Nose"][1] - df.loc[index, "Neck"][1]) / 2) + df.loc[index, "Neck"][1]
        Eye_mid = (df.loc[index, "REye"][1] + df.loc[index, "LEye"][1]) / 2  # mean of the two eyes vert height
        Head_TP = ((df.loc[index, "Nose"][1] - Eye_mid) * 2) + df.loc[index, "Nose"][1]

        if hand == "B":
            hand_height = max([df.loc[index, "R_Hand"][1], df.loc[index, "L_Hand"][1]])
        else:
            hand_str = hand + "_Hand"
            hand_height = df.loc[index][hand_str][1]

        if hand_height > SP_mid:
            if hand_height > Mid_up:
                if hand_height > df.loc[index, "Neck"][1]:
                    if hand_height > df.loc[index, "Nose"][1] :
                        if hand_height > Head_TP:
                            H.append(5)
                        else:
                            H.append(4)
                    else:
                        H.append(3)
                else:
                    H.append(2)
            else:
                H.append(1)
        else:
            H.append(0)
    MaxHeight = max(H)
    return MaxHeight





In [None]:
max_height_B = calc_vert_height(df_OP, "B")

print("Max height for the two hands: " + str(max_height_B))


We get a value of 1, indicating that the maximum height (considering both hands) is "between midline and middle-upper body". The function lets you choose whether to consider one hand or both, which can be useful if you've annotated the handedness of a gesture. In this case, we know that this is a two-handed gesture, so we'll use the "B" tag for our calculations.
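
The banding idea itself can be sketched compactly: given body-landmark heights in ascending order, a hand height falls into band 0 to 5 depending on how many thresholds it exceeds. The example below is only an illustration under stated assumptions: the threshold values are made up, larger y is assumed to mean higher, and `height_band` is a hypothetical helper, not part of the function above.

```python
import numpy as np

# Hypothetical landmark heights for one person, in arbitrary units where
# larger y means higher: midline, mid-upper body, shoulders, mid-face, top of head.
thresholds = np.array([1.0, 1.5, 2.0, 2.4, 2.8])

def height_band(hand_y, thresholds):
    """Return the 0-5 band of a hand height (hypothetical helper).

    right=True counts only thresholds that are strictly exceeded,
    mirroring the strict `>` comparisons used in calc_vert_height.
    """
    return int(np.digitize(hand_y, thresholds, right=True))

# Made-up hand heights for four frames
bands = [height_band(y, thresholds) for y in (0.8, 1.2, 2.1, 3.0)]
print(bands)       # band per frame
print(max(bands))  # maximum vertical amplitude over the frames
```

Keeping the per-frame bands, rather than only the maximum, also makes it possible to look at how long the hands stayed in each band.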


<h2>McNeillian Space</h2>




What if we want to be more specific? Those familiar with gesture studies will likely have seen David McNeill's (1992) delineation of <i>gesture space</i>. 
<br><center><img src="./gesture_space.jpg"></center><br>
The gesture space offers some interesting insights into the way we use visual space during a gesture. For example, is it produced primarily directly in front of us? Do we cover a lot of space around us? How 'expansive' is the gesture? These features can be difficult or time-consuming to annotate manually, even more so if the video angle is not straight ahead. However, we can compute them from our motion tracking keypoints.<br>
This requires several calculations, such as defining the grid seen in the image above, checking the position of the hands in relation to the grid, and calculating different features about where the hands moved, which areas they occupied, etc. 


In [None]:
import statistics

def calc_mcneillian_space(df, hand_idx):
    # this calls the define_mcneillian_grid function for each frame, then assigns each hand to one space per frame
    # output:
    # space_use - how many unique spaces were traversed
    # mcneillian_max - outer-most main space entered
    # mcneillian_mode - which main space was primarily used
    # 1 = Center-center
    # 2 = Center
    # 3 = Periphery
    # 4 = Extra-Periphery
    # subsections for periphery and extra periphery:
    # 1 = upper right
    # 2 = right
    # 3 = lower right
    # 4 = lower
    # 5 = lower left
    # 6 = left
    # 7 = upper left
    # 8 = upper
    if hand_idx == 'B':
        hands = ['L_Hand','R_Hand']
    else:
        hands = [hand_idx + '_Hand']
    # compare, at each frame, each hand to the (sub)section limits, going from inner to outer, clockwise
    for hand in hands:
        Space = []

        for frame in range(len(df)):

            cc_xmin, cc_xmax, cc_ymin, cc_ymax, c_xmin, c_xmax, c_ymin, c_ymax, p_xmin, p_xmax, p_ymin, p_ymax = \
            define_mcneillian_grid(df, frame)
            # centre-centre
            if cc_xmin < df[hand][frame][0] < cc_xmax and cc_ymin < df[hand][frame][1] < cc_ymax:
                Space.append(1)
            # centre
            elif c_xmin < df[hand][frame][0] < c_xmax and c_ymin < df[hand][frame][1] < c_ymax:
                Space.append(2)
            # periph
            elif p_xmin < df[hand][frame][0] < p_xmax and p_ymin < df[hand][frame][1] < p_ymax:
                # if it's in the periphery, we need to also get the subsection
                # first, is it on the right side?
                if cc_xmax < df[hand][frame][0]:
                    # if so, we narrow down the y location
                    if cc_ymax < df[hand][frame][1]:
                        Space.append(31)
                    elif cc_ymin < df[hand][frame][1]:
                        Space.append(32)
                    else:
                        Space.append(33)
                elif cc_xmin < df[hand][frame][0]:
                    if c_ymax < df[hand][frame][1]:
                        Space.append(38)
                    else:
                        Space.append(34)
                else:
                    if cc_ymax < df[hand][frame][1]:
                        Space.append(37)
                    elif cc_ymin < df[hand][frame][1]:
                        Space.append(36)
                    else:
                        Space.append(35)
            else:  # if it's not periphery, it has to be extra periphery. We just need to get subsections
                if c_xmax < df[hand][frame][0]:
                    if cc_ymax < df[hand][frame][1]:
                        Space.append(41)
                    elif cc_ymin < df[hand][frame][1]:
                        Space.append(42)
                    else:
                        Space.append(43)
                elif cc_xmin < df[hand][frame][0]:
                    if c_ymax < df[hand][frame][1]:
                        Space.append(48)
                    else:
                        Space.append(44)
                else:
                    if c_ymax < df[hand][frame][1]:
                        Space.append(47)
                    elif c_ymin < df[hand][frame][1]:
                        Space.append(46)
                    else:
                        Space.append(45)
        if hand == 'L_Hand':
            Space_L = Space
        else:
            Space_R = Space

    # how many spaces used?
    if hand_idx == 'L' or hand_idx == 'B':
        space_use_L = len(set(Space_L))
        if max(Space_L) > 40:
            mcneillian_maxL = 4
        elif max(Space_L) > 30:
            mcneillian_maxL = 3
        else:
            mcneillian_maxL = max(Space_L)
        # which main space was most used?
        mcneillian_modeL = get_mcneillian_mode(Space_L)
    else:
        space_use_L = 'NA'
        mcneillian_maxL = 'NA'
        mcneillian_modeL = 'NA'

    if hand_idx == 'R' or hand_idx == 'B':
        space_use_R = len(set(Space_R))
        # maximum distance (main spaces)
        if max(Space_R) > 40:
            mcneillian_maxR = 4
        elif max(Space_R) > 30:
            mcneillian_maxR = 3
        else:
            mcneillian_maxR = max(Space_R)
        # which main space was most used?
        mcneillian_modeR = get_mcneillian_mode(Space_R)
    else:
        space_use_R = 'NA'
        mcneillian_maxR = 'NA'
        mcneillian_modeR = 'NA'

    return space_use_L, space_use_R, mcneillian_maxL, mcneillian_maxR, mcneillian_modeL, mcneillian_modeR


def get_mcneillian_mode(spaces):
    mainspace = []
    for space in spaces:
        if space > 40:
            mainspace.append(4)
        elif space > 30:
            mainspace.append(3)
        else:
            mainspace.append(space)

    mcneillian_mode = statistics.mode(mainspace)
    return mcneillian_mode

def define_mcneillian_grid(df, frame):
    # define the grid based on a single frame, output xmin,xmax, ymin, ymax for each main section
    # subsections can all be found based on these boundaries
    bodycent = df['Neck'][frame][1] - (df['Neck'][frame][1] - df['MidHip'][frame][1])/2
    face_width = (df['LEye'][frame][0] - df['REye'][frame][0])*2
    body_width = df['LHip'][frame][0] - df['RHip'][frame][0]

    # define boundaries for center-center
    cc_xmin = df['RHip'][frame][0]
    cc_xmax = df['LHip'][frame][0]
    cc_len = cc_xmax - cc_xmin
    cc_ymin = bodycent - cc_len/2
    cc_ymax = bodycent + cc_len/2

    # define boundaries for center
    c_xmin = df['RHip'][frame][0] - body_width/2
    c_xmax = df['LHip'][frame][0] + body_width/2
    c_len = c_xmax - c_xmin
    c_ymin = bodycent - c_len/2
    c_ymax = bodycent + c_len/2

    # define boundaries of periphery
    p_ymax = df['LEye'][frame][1] + (df['LEye'][frame][1] - df['Nose'][frame][1])
    p_ymin = bodycent - (p_ymax - bodycent) # make the box symmetrical around the body center
    p_xmin = c_xmin - face_width
    p_xmax = c_xmax + face_width

    return  cc_xmin, cc_xmax, cc_ymin, cc_ymax, c_xmin, c_xmax, c_ymin, c_ymax, p_xmin, p_xmax, p_ymin, p_ymax




In [None]:
space_use_L, space_use_R, mcneillian_maxL, mcneillian_maxR, mcneillian_modeL, mcneillian_modeR = calc_mcneillian_space(df_OP, "B")

print("Number of spaces used by the right hand: " + str(space_use_R))
print("Most peripheral space used by right hand: " + str(mcneillian_maxR))
print("Right hand spent most time in space number: " + str(mcneillian_modeR))


As we see, 4 different spaces were used (centre, 2 in periphery, and 1 in extra-periphery, in this case). Additionally, space 4 (extra-periphery) is both the maximally peripheral space used and where the right hand spent most of its time.<br>
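
The mode only tells us which main space was used most. If a finer picture is wanted, the share of frames spent in each main space can be computed from a per-frame list of space codes. The sketch below is an assumption-laden illustration: `calc_mcneillian_space` as written does not return the per-frame list, so a made-up list is used here, and `main_space` and `main_space_proportions` are hypothetical helper names.

```python
from collections import Counter

def main_space(code):
    """Collapse a subsection code (31-38, 41-48) into its main space (1-4)."""
    if code > 40:
        return 4
    if code > 30:
        return 3
    return code

def main_space_proportions(spaces):
    """Share of frames spent in each main space (hypothetical helper)."""
    counts = Counter(main_space(code) for code in spaces)
    total = sum(counts.values())
    return {space: count / total for space, count in sorted(counts.items())}

# Made-up per-frame codes: 2 frames in centre, 2 in periphery, 4 in extra-periphery
toy_spaces = [2, 2, 32, 33, 42, 42, 42, 42]
props = main_space_proportions(toy_spaces)
print(props)
```

With real data, the same helper could be applied to the `Space` list built inside the loop, if that list were returned alongside the summary values.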



<h2>Volumetric Space</h2>


Another way to think about the space used is to calculate the volumetric space. Imagine that at the beginning of your annotation, we draw a cube (if the data is 3D, otherwise a box) around the hands, such that the hands' positions along the x-axis form the outer side limits of the cube/box, and their positions on the y-axis form the upper and lower limits. Our cube/box will therefore be quite flat at the beginning. But if we update and expand this cube/box with each frame, we can get an idea of the dynamic space that is used during a gesture.


In [None]:

def calc_volume_size(df, hand):
    # calculates the volumetric size of the gesture, i.e. how much visual space was utilized by the hands
    # for 3D data, this is actual volume (i.e. using the z-axis); for 2D this is area, using only x and y
    # first we check if we should use one or both hands for calculating the initial boundaries
    if hand == 'B':
        x_max = max([df['R_Hand'][0][0], df['L_Hand'][0][0]])
        x_min = min([df['R_Hand'][0][0], df['L_Hand'][0][0]])
        y_max = max([df['R_Hand'][0][1], df['L_Hand'][0][1]])
        y_min = min([df['R_Hand'][0][1], df['L_Hand'][0][1]])
    else:
        hand_str = hand + '_Hand'
        x_min = df[hand_str][0][0]
        x_max = df[hand_str][0][0]
        y_min = df[hand_str][0][1]
        y_max = df[hand_str][0][1]
    # then we check if it's 3D or 2D data
    if len(df['R_Hand'][0]) > 2:
        if hand == 'B':
            z_max = max([df['R_Hand'][0][2], df['L_Hand'][0][2]])
            z_min = min([df['R_Hand'][0][2], df['L_Hand'][0][2]])
        else:
            z_min = df[hand_str][0][2]
            z_max = df[hand_str][0][2]
    # at each frame, compare the current min and max with the previous, to ultimately find the outer values
    if hand == 'B':
        hand_list = ['R_Hand', 'L_Hand']
    else:
        hand_list = [hand_str]

    for frame in range(1, len(df)):
        for hand_idx in hand_list:
            if df[hand_idx][frame][0] < x_min:
                x_min = df[hand_idx][frame][0]
            if df[hand_idx][frame][0] > x_max:
                x_max = df[hand_idx][frame][0]
            if df[hand_idx][frame][1] < y_min:
                y_min = df[hand_idx][frame][1]
            if df[hand_idx][frame][1] > y_max:
                y_max = df[hand_idx][frame][1]
            if len(df[hand_idx][0]) > 2:
                if df[hand_idx][frame][2] < z_min:
                    z_min = df[hand_idx][frame][2]
                if df[hand_idx][frame][2] > z_max:
                    z_max = df[hand_idx][frame][2]

    if len(df['R_Hand'][0]) > 2:
        # get range
        x_len = x_max - x_min
        y_len = y_max - y_min
        z_len = z_max - z_min
        # get volume
        vol = x_len * y_len * z_len
    else:
        x_len = x_max - x_min
        y_len = y_max - y_min
        # get area (ie volume)
        vol = x_len * y_len
    return vol




In [None]:
volume = calc_volume_size(df_OP,'B')
print('Volumetric space of the two hands: ' + str(volume) + " cubic meters")
volume_R = calc_volume_size(df_OP,'R')
print('Volumetric space of the right hand: ' + str(volume_R) + " cubic meters")
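
As a cross-check on the frame-by-frame loop, the same bounding box can be computed in one step with NumPy: stack all hand positions, take the minimum and maximum along each axis, and multiply the extents. This is a minimal sketch on made-up coordinates; `bounding_box_size` is a hypothetical helper, and it works for 2D or 3D points alike.

```python
import numpy as np

def bounding_box_size(points):
    """Area (2D) or volume (3D) of the axis-aligned box around all points."""
    pts = np.asarray(points, dtype=float)          # shape: (n_points, n_dims)
    extents = pts.max(axis=0) - pts.min(axis=0)    # range covered along each axis
    return float(np.prod(extents))

# Made-up positions for two frames of each hand
toy_R = [[0.0, 0.0, 0.0], [1.0, 2.0, 0.5]]
toy_L = [[-1.0, 0.5, 0.0], [0.5, 1.0, 1.0]]

vol = bounding_box_size(toy_R + toy_L)
print(vol)
```

With the notebook's data, the point lists could be obtained with, for example, `df_OP['R_Hand'].tolist() + df_OP['L_Hand'].tolist()`, and the result compared against `calc_volume_size(df_OP, 'B')`.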


<h2> Working with Multiple Files </h2>


In [None]:
# !pip install pympi-ling


In [None]:
import pympi


In [None]:


path = 'p:/shared/FOCUS-GROUPS/Gesture-Kinematics/gesture_space_compuation/'

MT_files = [file for file in os.listdir(path) if file.endswith(".csv")] #Reading list of motion tracking files from the folder
eaf_files = [file for file in os.listdir(path) if file.endswith(".eaf")] #Reading list of ELAN files from the folder

results_df = pd.DataFrame(columns = ["file", "gesture", 
                            
                            "MN_mode_L","MN_mode_R", "volume_B", "volume_R", "volume_L"]) #Creating an empty dataframe to store the results

for MT_file in MT_files: #Iterating through each motion tracking file 
    print('Processing file: ' + MT_file)

    df_MT = pd.read_csv(path + MT_file)
    df_OP = convert_MP_to_OP(df_MT) #Converting the MediaPipe output to OpenPose output (only the required columns)

    eaf_file = [eaf_file for eaf_file in eaf_files if eaf_file.startswith(MT_file.split("_")[0])][0]           

    eafob = pympi.Elan.Eaf(path + eaf_file) #Reading the corresponding ELAN file

    # make sure you set the name of the tier where the annotations of interest are located
    gesture_annots = eafob.get_annotation_data_for_tier("Director_Speech")

    # then we can look through each annotation, and calculate our kinematic values
    file_list = []
    g_index_list = []
    PV_R_list = []

    mode_L_list = []
    mode_R_list = []

    volume_B_list = []
    volume_R_list = []
    volume_L_list = []

    g_index = 1
    for annot in gesture_annots:
        # this first line just takes the rows that correspond to our annotation
        g_data = df_OP[(df_OP.time >= annot[0]) & (df_OP.time <= annot[1])].reset_index(drop=True)
        
        if len(g_data) > 10:
        
            
            space_use_L, space_use_R, mcneillian_maxL, mcneillian_maxR, mcneillian_modeL, mcneillian_modeR = calc_mcneillian_space(g_data, "B")

            volume_B = calc_volume_size(g_data,'B')
            volume_R = calc_volume_size(g_data,'R')
            volume_L = calc_volume_size(g_data,'L')

            # now store them all in a dataframe
            file_list.append(MT_file)
            g_index_list.append(g_index)
        
            mode_L_list.append(mcneillian_modeL)
            mode_R_list.append(mcneillian_modeR)

            volume_B_list.append(volume_B)
            volume_R_list.append(volume_R)
            volume_L_list.append(volume_L)

            g_index += 1
        
    # build the per-file results from a dict so that numeric columns keep their numeric type
    file_results = pd.DataFrame({"file": file_list, "gesture": g_index_list,
                                 "MN_mode_L": mode_L_list, "MN_mode_R": mode_R_list,
                                 "volume_B": volume_B_list, "volume_R": volume_R_list, "volume_L": volume_L_list})
    results_df = pd.concat([results_df, file_results], ignore_index=True)
                    
    


In [None]:
results_df
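
One thing worth checking when combining ELAN annotations with motion tracking data is that both use the same time unit. ELAN stores annotation times in milliseconds, whereas the `time` column of a tracking file may be in milliseconds or seconds depending on how it was exported. The sketch below shows the slicing step on made-up data. `slice_by_annotation` and its `time_scale` parameter are hypothetical, and the annotation tuples are assumed to have the form (start, end, value).

```python
import pandas as pd

def slice_by_annotation(df, start_ms, end_ms, time_scale=1.0):
    """Rows of df whose time falls inside one annotation (hypothetical helper).

    time_scale converts annotation milliseconds into the unit of df['time'],
    e.g. 1.0 if the tracking time is in ms, 0.001 if it is in seconds.
    """
    start, end = start_ms * time_scale, end_ms * time_scale
    inside = (df["time"] >= start) & (df["time"] <= end)
    # Reset the index so that frames can be addressed as 0..n-1
    return df[inside].reset_index(drop=True)

# Made-up tracking data (time in ms) and a single made-up annotation
toy = pd.DataFrame({"time": [0, 100, 200, 300, 400], "value": list("abcde")})
annots = [(100, 300, "gesture 1")]

for start_ms, end_ms, label in annots:
    g_data = slice_by_annotation(toy, start_ms, end_ms)
    print(label, len(g_data))
```

If a slice comes back empty or much shorter than expected, a unit mismatch between the two time columns is a likely cause.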


In [None]:
# !conda install anaconda::openpyxl #Go to terminal and run this command to install openpyxl package


In [None]:
results_df.to_excel('space_results.xlsx')
