# Gesture Kinematic Analysis: Speed, Acceleration, and Jerk

## Attribution

The custom functions for smoothing and calculating speed vectors and derivatives are sourced from the following EnvisionBox module: **Selecting, smoothing, and deriving measures from motion tracking, and merging with acoustics and annotations in Python**.

If you found this code helpful for your research, please cite the original source code material from the EnvisionBox website:

**Pouw, W. (2023).** *Selecting, smoothing, and deriving measures from motion tracking, and merging with acoustics and annotations.* [21.10.2025]. Retrieved from: https://envisionbox.org/embedded_MergingMultimodal_inPython.html

---

In this tutorial, we learn how to compute the kinematic features (speed, velocity, acceleration and jerk) for gestures using Mediapipe keypoints.

## Script Overview

 - Import necessary packages
 - Read saved Mediapipe keypoints 
 - Normalise the keypoints
 - Perform smoothing
 - Extract the speed, velocity, acceleration, and jerk
 - Visualise the kinematic measures

In [15]:
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np

import plotly.graph_objects as go
from scipy.signal import butter, filtfilt

from mpl_toolkits.mplot3d import Axes3D
%matplotlib notebook

In [16]:
data = pd.read_csv('../Mediapipe_results/Cheeseburger_short_body.csv')
data.head()

Unnamed: 0,time,X_NOSE,Y_NOSE,Z_NOSE,visibility_NOSE,X_LEFT_EYE_INNER,Y_LEFT_EYE_INNER,Z_LEFT_EYE_INNER,visibility_LEFT_EYE_INNER,X_LEFT_EYE,...,Z_RIGHT_HEEL,visibility_RIGHT_HEEL,X_LEFT_FOOT_INDEX,Y_LEFT_FOOT_INDEX,Z_LEFT_FOOT_INDEX,visibility_LEFT_FOOT_INDEX,X_RIGHT_FOOT_INDEX,Y_RIGHT_FOOT_INDEX,Z_RIGHT_FOOT_INDEX,visibility_RIGHT_FOOT_INDEX
0,0.0,0.503632,0.170614,-0.387601,0.999965,0.51501,0.149007,-0.368177,0.999947,0.521415,...,0.512078,0.006639,0.512488,1.341411,0.087873,0.001159,0.396702,1.288286,0.352186,0.005587
1,33.366667,0.500433,0.181421,-0.547669,0.999965,0.512496,0.154659,-0.530533,0.999948,0.519399,...,0.637187,0.006476,0.513931,1.390587,0.222031,0.001126,0.401639,1.37105,0.413664,0.005456
2,66.733333,0.499375,0.182815,-0.469713,0.999967,0.511689,0.155652,-0.450981,0.99995,0.518643,...,0.585826,0.005932,0.51694,1.402249,0.224946,0.00103,0.420132,1.386711,0.365694,0.004993
3,100.1,0.49917,0.183356,-0.476485,0.999966,0.511697,0.156496,-0.456981,0.999951,0.518673,...,0.616054,0.005448,0.517169,1.417265,0.201427,0.000959,0.425954,1.394233,0.395156,0.004646
4,133.466667,0.498991,0.185082,-0.459819,0.999969,0.511738,0.157429,-0.443996,0.999955,0.518811,...,0.584259,0.00495,0.512583,1.434925,0.177233,0.000887,0.431666,1.419939,0.355257,0.004256


## Normalisation 

For more information on normalisation, please refer to the [Normalization](https://github.com/Multimodal-Language-Department-MPI-NL/Normalization) notebook.
Normalising keypoints gives a consistent representation of poses across frames and individuals. It also removes variation caused by where the person happens to be in the image, so that the pose representation reflects the relative positions of body parts rather than their absolute positions.


The following code normalises a set of keypoints representing a skeleton by centring them on the midpoint between the left and right shoulders and dividing them by the distance between the shoulders.

In [17]:
def normalize_skeleton_landmarks(keypoints):
    left_shoulder, right_shoulder = 11, 12  # Indices of LEFT_SHOULDER and RIGHT_SHOULDER in markersbody
    mid = keypoints[:, [left_shoulder, right_shoulder], :].mean(axis=1, keepdims=True)

    shoulder_length = np.linalg.norm(keypoints[:, left_shoulder, :] - keypoints[:, right_shoulder, :], ord=2, axis=1)
    normalized_keypts = (keypoints - mid) / shoulder_length[:, None, None]
    return normalized_keypts

In [18]:
markersbody = ['NOSE', 'LEFT_EYE_INNER', 'LEFT_EYE', 'LEFT_EYE_OUTER', 'RIGHT_EYE_INNER', 'RIGHT_EYE', 'RIGHT_EYE_OUTER',
          'LEFT_EAR', 'RIGHT_EAR', 'MOUTH_LEFT', 'MOUTH_RIGHT', 'LEFT_SHOULDER', 'RIGHT_SHOULDER', 'LEFT_ELBOW', 
          'RIGHT_ELBOW', 'LEFT_WRIST', 'RIGHT_WRIST', 'LEFT_PINKY', 'RIGHT_PINKY', 'LEFT_INDEX', 'RIGHT_INDEX',
          'LEFT_THUMB', 'RIGHT_THUMB', 'LEFT_HIP', 'RIGHT_HIP', 'LEFT_KNEE', 'RIGHT_KNEE', 'LEFT_ANKLE', 'RIGHT_ANKLE',
          'LEFT_HEEL', 'RIGHT_HEEL', 'LEFT_FOOT_INDEX', 'RIGHT_FOOT_INDEX']

xyz_columns = [axis + '_' + landmark  for landmark in markersbody for axis in ['X', 'Y', 'Z']]


# Select only the columns representing X, Y, Z coordinates
keypoints_columns = data[xyz_columns]


# Collect the Z columns (kept for reference only: no absolute value is applied to Z in this notebook)
z_columns = [col for col in keypoints_columns.columns if col.startswith('Z_')]

# Reshape the data into the required format (num_frames, num_keypoints, 3)
num_keypoints = len(xyz_columns) // 3
keypoints_array = keypoints_columns.values.reshape(-1, num_keypoints, 3)

# Apply normalization function
normalized_keypoints = normalize_skeleton_landmarks(keypoints_array)

# Update the DataFrame with the normalized values
normalized_keypoints_df = pd.DataFrame(normalized_keypoints.reshape(-1, num_keypoints * 3), columns=xyz_columns)
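
As a quick sanity check on the normalisation above, the following minimal sketch applies the same centring and scaling to a small synthetic array and verifies that, in every frame, the shoulder midpoint lands on the origin and the shoulder distance equals 1. The array `toy_keypoints` and the helper name `normalize_toy` are made up for illustration; only the shoulder indices 11 and 12 come from the notebook.

```python
import numpy as np

def normalize_toy(keypoints, left_shoulder=11, right_shoulder=12):
    # Same steps as normalize_skeleton_landmarks: centre on the shoulder
    # midpoint, then divide by the shoulder-to-shoulder distance.
    mid = keypoints[:, [left_shoulder, right_shoulder], :].mean(axis=1, keepdims=True)
    shoulder_length = np.linalg.norm(
        keypoints[:, left_shoulder, :] - keypoints[:, right_shoulder, :], axis=1
    )
    return (keypoints - mid) / shoulder_length[:, None, None]

# Synthetic data (an assumption for illustration): 5 frames, 33 keypoints, 3 axes
rng = np.random.default_rng(0)
toy_keypoints = rng.normal(size=(5, 33, 3))

normalized_toy = normalize_toy(toy_keypoints)

# Shoulder midpoint and shoulder distance per frame, after normalisation
toy_mid = normalized_toy[:, [11, 12], :].mean(axis=1)
toy_dist = np.linalg.norm(normalized_toy[:, 11, :] - normalized_toy[:, 12, :], axis=1)

print(np.allclose(toy_mid, 0.0))
print(np.allclose(toy_dist, 1.0))
```

Because every coordinate is divided by the shoulder distance of its own frame, all positions (and anything derived from them later) are expressed in shoulder widths rather than in image units.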


## Data Visualisation

In [19]:
def getSkeletalModelStructure():
    # Definition of skeleton model structure:
    # The structure is an n-tuple of:
    # (index of a start point, index of an end point, index of a bone)

    return (
        # nose
        (0, 1, 0),
        (0, 4, 0),

        # eyes
        (1, 2, 1),
        (2, 3, 1),
        (4, 5, 1),
        (5, 6, 1),

        # ears
        (6, 8, 2),
        (3, 7, 2),

        # mouth
        (9, 10, 3),

        # shoulders, torso, legs, and feet
        (11, 12, 4),
        (12, 24, 4),
        (11, 23, 4),
        (24, 23, 4),
        (24, 26, 4),
        (23, 25, 4),
        (26, 28, 4),
        (25, 27, 4),
        (28, 30, 4),
        (30, 32, 4),
        (28, 32, 4),
        (27, 31, 4),
        (27, 29, 4),
        (29, 31, 4),

        # upper arms (shoulder to elbow)
        (12, 14, 5),
        (11, 13, 5),

        # forearms (elbow to wrist)
        (13, 15, 6),
        (14, 16, 6),

        # hands (wrist to thumb, pinky, and index)
        (15, 21, 7),
        (15, 17, 7),
        (15, 19, 7),
        (17, 19, 7),
        (16, 18, 7),
        (16, 22, 7),
        (16, 20, 7),
        (18, 20, 7)
    )

In [20]:

def plot_3d_skeleton(keypoints, skeletal_model_structure, ax, color):

    # Extract x, y, z coordinates
    x, y, z = zip(*keypoints)

    # Plot keypoints
    ax.scatter(x, y, z, color='blue', label='Keypoints')
    ax.view_init(elev=-90, azim=-90)
    # Plot skeleton connections
    for connection in skeletal_model_structure:
        start_idx, end_idx, _ = connection
        start_point = keypoints[start_idx]
        end_point = keypoints[end_idx]
        xs = [start_point[0], end_point[0]]
        ys = [start_point[1], end_point[1]]
        zs = [start_point[2], end_point[2]]
        ax.plot(xs, ys, zs, color=color)

    ax.set_xlabel('X Label')
    ax.set_ylabel('Y Label')
    ax.set_zlabel('Z Label')

    plt.show()



# Build the list of skeleton connections defined above

skeletal_model_structure = getSkeletalModelStructure()

fig = plt.figure(figsize=(10, 10))
ax = fig.add_subplot(111, projection='3d')


# Plot the 3D skeleton for frame 20 of the raw (un-normalised) keypoints
plot_3d_skeleton(keypoints_array[20], skeletal_model_structure, ax, color='red')

plt.show()




In [21]:
data.update(normalized_keypoints_df)

## Smoothing

For more information on smoothing, please refer to the [Smoothing](https://github.com/Multimodal-Language-Department-MPI-NL/Smoothing) notebook.
One thing you will run into with motion tracking data, especially video-based motion tracking data, is noise-related jitter in your time series. Sometimes this noise is minimal, e.g., when using very accurate device-based motion tracking. In other cases, you will see sudden jumps or kinks from time point to time point due to tracking inaccuracies (which can be caused by occlusions, poor lighting, changes in camera position, etc.).

It is therefore good practice to smooth the position traces of your motion tracking data, as well as any derivatives that are approximated afterwards (e.g., 3D speed, vertical velocity). You can, for example, apply a low-pass filter, which passes slow fluctuations (gradual changes from point to point) and filters out (i.e., reduces the amplitude of) the jitter that occurs at very high frequencies (sudden changes from point to point). Note that a low-pass filter can introduce a time shift; you can undo that shift by running the filter forwards and then backwards (we do this with `filtfilt`), which gives what is called a “zero-phase” low-pass filter. Zero-phase filtering matters if you care about precise timing relative to another time series (e.g., acoustics).
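
To make the time shift concrete, here is a minimal sketch that filters a synthetic pulse once with a single forward pass (`lfilter`) and once forwards and backwards (`filtfilt`), then compares where the peak ends up. The pulse, the 100 Hz sampling rate, and the 10 Hz cutoff are illustrative assumptions, not values taken from the recording.

```python
import numpy as np
from scipy.signal import butter, filtfilt, lfilter

# Synthetic signal (an assumption for illustration): a smooth pulse sampled at
# 100 Hz whose peak sits exactly at sample 100 (t = 1 s).
fs = 100
t = np.arange(200) / fs
pulse = np.exp(-0.5 * ((t - 1.0) / 0.05) ** 2)

# Second-order Butterworth low-pass with a 10 Hz cutoff
b, a = butter(2, 10 / (fs / 2), btype='low')

causal = lfilter(b, a, pulse)       # one forward pass: output lags the input
zero_phase = filtfilt(b, a, pulse)  # forward and backward pass: the lag cancels

peak_input = int(np.argmax(pulse))
peak_causal = int(np.argmax(causal))
peak_zero_phase = int(np.argmax(zero_phase))
print(peak_input, peak_causal, peak_zero_phase)
```

The forward-only filter moves the peak later in time, whereas the forward-backward filter leaves it where it was, which is what you want when aligning movement peaks with another signal.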

In [22]:
def butter_it(x, sampling_rate, order, lowpass_cutoff):
    nyquist = sampling_rate / 2
    cutoff = lowpass_cutoff / nyquist  # Normalized frequency
    b, a = butter(order, cutoff, btype='low')
    filtered_x = filtfilt(b, a, x)
    return np.asarray(filtered_x, dtype=np.float64)

In [23]:
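# Apply a Butterworth low-pass filter to each position trace.
# Note: sampling_rate=100 is hard-coded, while the time column above advances by
# about 33.37 ms per frame (roughly 30 Hz). The normalised cutoff passed to butter is
# 20 / (100 / 2) = 0.4 of Nyquist, which corresponds to about 6 Hz at 30 Hz sampling.
# An order of 20 is also high for a filter in (b, a) form and may be numerically fragile.

data[xyz_columns] = data[xyz_columns].apply(lambda x: butter_it(x=x,sampling_rate=100, order=20, lowpass_cutoff=20))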
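
The `sampling_rate` argument of the filter should match the rate at which the keypoints were recorded, and the cutoff must lie below the Nyquist frequency (half the sampling rate). The following is a minimal sketch of estimating the sampling rate from a millisecond time column, assuming evenly spaced frames. The helper `estimate_sampling_rate` and the synthetic `time_ms` array are hypothetical; the frame interval is taken from the first rows of the table shown earlier.

```python
import numpy as np

def estimate_sampling_rate(time_ms):
    # Mean interval between consecutive frames, converted from ms to seconds
    mean_interval_s = np.mean(np.diff(time_ms)) / 1000
    return 1.0 / mean_interval_s

# Synthetic time stamps (an assumption): 100 frames spaced 33.366667 ms apart
time_ms = np.arange(100) * 33.366667

sampling_rate = estimate_sampling_rate(time_ms)
nyquist = sampling_rate / 2

# A low-pass cutoff is only valid for butter() if it lies below Nyquist
candidate_cutoff = 10  # Hz, an illustrative value rather than a recommendation
print(round(sampling_rate, 2), round(nyquist, 2), candidate_cutoff < nyquist)
```

With real data, the same estimate could be computed from `data['time']` and passed to `butter_it` in place of a hard-coded value.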

## Kinematic Measures
**Speed**: Distance travelled per unit time (a magnitude, so never negative)

**Velocity**: Change in displacement per unit time (signed, along a given direction)

**Acceleration**: Change in velocity per unit time

**Jerk**: Change in acceleration per unit time
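
These definitions can be checked numerically on a trajectory whose derivatives are known in closed form. The sketch below uses a made-up one-dimensional position y(t) = t^3 - 3t, for which velocity is 3t^2 - 3, acceleration is 6t, and jerk is the constant 6, and approximates each derivative with finite differences (`np.diff` divided by the time step). No smoothing is applied here, and the 100 Hz time step is an illustrative assumption.

```python
import numpy as np

dt = 0.01                      # time step in seconds (100 Hz, illustrative)
t = np.arange(0, 2, dt)
y = t ** 3 - 3 * t             # position along one axis

velocity = np.diff(y) / dt             # signed: negative while y decreases
speed = np.abs(velocity)               # magnitude only, never negative
acceleration = np.diff(velocity) / dt  # change in velocity per second
jerk = np.diff(acceleration) / dt      # change in acceleration per second

print(velocity[0] < 0, speed[0] > 0)
print(np.allclose(jerk, 6.0, atol=1e-6))
```

Each call to `np.diff` shortens the series by one sample, which is why code that needs equal-length columns pads the differenced series with a leading zero.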

In [24]:
# function that differentiates a time series and Butterworth-filters the result
def derive_it(x):
    x = np.concatenate(([0], np.diff(x)))
    x = butter_it(x, sampling_rate=100, order=2, lowpass_cutoff=20)
    return x

# function to calculate the speed vector
def get_speed_vector(x, y, z, time_millisecond):
    # z = abs(z)
    # calculate the Euclidean distance between consecutive time points, in 3 dimensions
    speed = np.concatenate(([0], np.sqrt(np.diff(x) ** 2 + np.diff(y) ** 2 + np.diff(z) ** 2)))
    speed = butter_it(speed, sampling_rate=100, order=2, lowpass_cutoff=20)

    # scale the speed vector so that it is expressed in units per second
    time_diff = np.mean(np.diff(time_millisecond)) / 1000
    speed = speed / time_diff
    return speed

# function to scale the time series
def sc_it(x):
    return (x - np.mean(x)) / np.std(x, ddof=0)

# add the kinematic measures as new columns of the pandas DataFrame
dt = np.mean(np.diff(data['time'])) / 1000  # mean frame interval in seconds (time is in ms)
data['speed'] = get_speed_vector(data['X_RIGHT_INDEX'], data['Y_RIGHT_INDEX'], data['Z_RIGHT_INDEX'], data['time'])
data['vertical_velocity'] = derive_it(data['Y_RIGHT_INDEX']) / dt  # units per second
data['acceleration'] = derive_it(data['speed']) / dt  # units per second squared
data['jerk'] = derive_it(data['acceleration']) / dt  # units per second cubed

cs = ['speed', 'vertical_velocity', 'acceleration', 'jerk']
data[cs] = data[cs].apply(sc_it)  # z-score each measure so that the features are on a similar scale


# Plot the four z-scored measures with Plotly graph objects (go)
fig4 = go.Figure()
fig4.add_trace(go.Scatter(x=data['time'], y=data['speed'], name='speed', mode='lines', line=dict(color='black')))
fig4.add_trace(go.Scatter(x=data['time'], y=data['vertical_velocity'], name='vertical velocity', mode='lines', line=dict(color='red')))
fig4.add_trace(go.Scatter(x=data['time'], y=data['acceleration'], name='acceleration', mode='lines', line=dict(color='gold')))
fig4.add_trace(go.Scatter(x=data['time'], y=data['jerk'], name='jerk', mode='lines', line=dict(color='green')))
# show only a portion of the plot for the x axis
fig4.update_xaxes(range=[1800, 2600])
fig4.show()
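
One way to sanity-check a 3D speed computation is to feed it a movement whose speed is known. The sketch below uses a hypothetical helper `raw_speed` that follows the same steps as `get_speed_vector` but skips the Butterworth smoothing, and applies it to a synthetic point moving around a circle of radius 1 once per second. The expected speed is the circumference, 2π units per second.

```python
import numpy as np

def raw_speed(x, y, z, time_ms):
    # Distance between consecutive samples, padded with a leading 0
    step = np.sqrt(np.diff(x) ** 2 + np.diff(y) ** 2 + np.diff(z) ** 2)
    step = np.concatenate(([0], step))
    # Divide by the mean frame interval in seconds to get units per second
    return step / (np.mean(np.diff(time_ms)) / 1000)

# Synthetic trajectory (an assumption): 100 Hz, one revolution per second
time_ms = np.arange(300) * 10.0
angle = 2 * np.pi * time_ms / 1000
x, y, z = np.cos(angle), np.sin(angle), np.zeros_like(angle)

speed = raw_speed(x, y, z, time_ms)
print(speed[1:].mean(), 2 * np.pi)
```

The first sample is 0 only because of the padding, so it is left out of the mean. The small remaining difference from 2π comes from measuring straight chords between samples instead of the arc itself.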

