# <strong><u>2: Speed and Upsample Notebook</u></strong>

This notebook performs speed alignment and data upsampling for gesture analysis. It uses pandas for data management, NumPy for numerical operations, SciPy for interpolation and filtering, and Matplotlib and Plotly for plotting.

In this workflow, the notebook reads keypoint data from a CSV file into a DataFrame, upsamples it to a regular 10 ms time grid, and computes a smoothed, normalized speed for each selected keypoint.

### <strong>Requirements</strong>

To run this notebook, you will need the following Python packages:

- pandas
- numpy
- scipy
- matplotlib
- tqdm
- plotly

You can install these packages using pip:
```shell
    pip install pandas numpy scipy matplotlib tqdm plotly
```

### <strong>Key Variables and Functions</strong>
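- `PARTICIPANT`: Name of the participant or video, used to build the input and output file names
- `KEYPOINTS`: List specifying which keypoints (e.g., `"right_wrist"`) to process
- `keypoints_df`: DataFrame storing the keypoint data loaded from the CSV file
- `final_df`: DataFrame storing the upsampled positions, velocity components, and speed for each keypoint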



### <strong>Importing Libraries</strong>
- `pandas`: For DataFrame operations and data analysis
- `numpy`: For numerical computations
- `scipy`: For scientific computations, including interpolation and filtering
- `matplotlib`: For plotting and visualizations
- `tqdm`: For progress bars while looping over keypoints


In [1]:
import pandas as pd
import numpy as np
from scipy.signal import savgol_filter
import matplotlib.pyplot as plt
from scipy.interpolate import interp1d
import tqdm as tq

### <strong>Choosing The Right Keypoints</strong>

Using this script, we can isolate the keypoints on which we would like to perform the analyses. Below is our index of keypoints as `keypoint_name(keypoint_id)`:

![Keypoints](https://i.imgur.com/SkXsWRZ.png)

### <strong><u>Available Keypoints:</u></strong>
<ul>
    <li><font color='#f2ce72'>nose</font>
    <li><font color='#f2ce72'>right_eye</font>
    <li><font color='#f2ce72'>left_eye</font>
    <li><font color='#f2ce72'>right_shoulder</font>
    <li><font color='#f2ce72'>left_shoulder</font>
    <li><font color='#f2ce72'>right_elbow</font>
    <li><font color='#f2ce72'>left_elbow</font>
    <li><font color='#f2ce72'>right_wrist</font>
    <li><font color='#f2ce72'>left_wrist</font>
</ul>
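As a rough illustration of how a list of keypoint names maps onto DataFrame columns, the sketch below builds the `<name>_x` / `<name>_y` column names and pulls them out together with `time_ms`. The helper `select_keypoint_columns` and the tiny `demo_df` are hypothetical and exist only for this example; they are not part of the notebook.

```python
import pandas as pd


def select_keypoint_columns(df, keypoints, time_col="time_ms"):
    """Return the time column plus the x/y columns of each requested keypoint.

    Hypothetical helper for illustration; assumes columns are named
    '<keypoint>_x' and '<keypoint>_y' as in this notebook.
    """
    wanted = [time_col]
    for name in keypoints:
        wanted += [f"{name}_x", f"{name}_y"]

    # Fail early with a readable message if a keypoint name is misspelled
    missing = [col for col in wanted if col not in df.columns]
    if missing:
        raise KeyError(f"Columns not found in DataFrame: {missing}")
    return df[wanted]


# Tiny made-up frame, only to show the column selection
demo_df = pd.DataFrame({
    "time_ms": [0.0, 40.0],
    "right_wrist_x": [0.30, 0.31],
    "right_wrist_y": [0.56, 0.57],
    "nose_x": [0.33, 0.33],
    "nose_y": [0.61, 0.61],
})

subset = select_keypoint_columns(demo_df, ["right_wrist"])
print(list(subset.columns))
```

Checking for missing columns up front means a typo in a keypoint name surfaces before any processing starts.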


In [2]:
PARTICIPANT = "TEST_VIDEO"
KEYPOINTS = ["right_wrist", "right_index"]

### <strong>Load Data</strong>
In this section, the data is loaded into the notebook for further processing. You can adjust the source file paths as needed.

In [3]:
# Load keypoint data
keypoints_df = pd.read_csv('../Keypoints/' + PARTICIPANT + '_keypoints.csv', encoding='utf-8')
column_names = [
    "time_ms", 
    "right_shoulder_x", "right_shoulder_y", 
    "left_shoulder_x", "left_shoulder_y", 
    "right_elbow_x", "right_elbow_y", 
    "left_elbow_x", "left_elbow_y", 
    "right_wrist_x", "right_wrist_y", 
    "left_wrist_x", "left_wrist_y", 
    "right_eye_x", "right_eye_y", 
    "left_eye_x", "left_eye_y",
    "nose_x", "nose_y",
    "right_index_x", "right_index_y"
]

keypoints_df.columns = column_names

keypoints_df

Unnamed: 0,time_ms,right_shoulder_x,right_shoulder_y,left_shoulder_x,left_shoulder_y,right_elbow_x,right_elbow_y,left_elbow_x,left_elbow_y,right_wrist_x,...,left_wrist_x,left_wrist_y,right_eye_x,right_eye_y,left_eye_x,left_eye_y,nose_x,nose_y,right_index_x,right_index_y
0,84.461929,0.198001,0.342151,0.307497,0.323048,0.210005,0.519039,0.318844,0.468460,0.308525,...,0.333631,0.566624,0.261904,0.205538,0.284583,0.213101,0.335072,0.612606,0.279609,0.233554
1,84.461929,0.197970,0.342321,0.307493,0.322877,0.211355,0.520423,0.318701,0.468566,0.309275,...,0.332906,0.568208,0.265967,0.209468,0.284984,0.214662,0.335579,0.613424,0.281538,0.234615
2,84.461929,0.197890,0.342544,0.307435,0.322611,0.211838,0.521279,0.318527,0.468604,0.309307,...,0.332512,0.569205,0.267718,0.211380,0.285211,0.215398,0.335682,0.613453,0.282556,0.235052
3,84.461929,0.197889,0.342544,0.307390,0.322506,0.211925,0.521493,0.318324,0.468644,0.309303,...,0.332125,0.569669,0.268629,0.211813,0.285317,0.215475,0.335877,0.613462,0.283162,0.235046
4,84.461929,0.197964,0.342510,0.307357,0.322399,0.211912,0.521668,0.318284,0.468521,0.309232,...,0.331974,0.569654,0.268504,0.212028,0.285359,0.215490,0.335872,0.613231,0.283221,0.235012
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
2527,84.461929,0.191529,0.324052,0.300764,0.320215,0.209532,0.501772,0.312723,0.475984,0.304429,...,0.323519,0.592008,0.262365,0.198722,0.279472,0.204498,0.330853,0.618805,0.277494,0.221118
2528,84.461929,0.191777,0.324050,0.300904,0.320389,0.209514,0.501767,0.312701,0.476053,0.304514,...,0.323430,0.592078,0.261025,0.196125,0.279524,0.202596,0.330863,0.619109,0.276265,0.218602
2529,84.461929,0.192015,0.324011,0.301009,0.320790,0.209501,0.501707,0.312693,0.476201,0.304509,...,0.323402,0.592055,0.260169,0.195268,0.279784,0.202405,0.330866,0.619205,0.275564,0.218108
2530,84.461929,0.192362,0.324010,0.301183,0.321365,0.209491,0.501691,0.312690,0.476482,0.304342,...,0.323432,0.592058,0.259832,0.195248,0.280289,0.202582,0.330836,0.619253,0.275666,0.218125


### <strong>Calculate Unsmoothed Speed</strong>
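Here, the speed of each keypoint in `KEYPOINTS` (for example, the right wrist) is calculated from the change in position between consecutive samples, before the speed signal itself is smoothed. Parameters such as the resampling interval (10 ms in this notebook) can be adjusted.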
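The idea can be sketched on synthetic data: positions are linearly interpolated onto a 10 ms grid, the velocity components are the per-sample position differences divided by the time step, and speed is the magnitude of that velocity vector. The function name `upsample_and_speed` and the 25 fps toy trajectory are assumptions made for this example only.

```python
import numpy as np
from scipy.interpolate import interp1d


def upsample_and_speed(time_ms, x, y, step_ms=10.0):
    """Upsample a 2D trajectory to a regular grid and return its speed.

    Hypothetical helper: speed is in position units per millisecond.
    """
    new_time = np.arange(time_ms[0], time_ms[-1], step_ms)

    # Linear interpolation onto the regular time grid
    x_up = interp1d(time_ms, x, kind="linear")(new_time)
    y_up = interp1d(time_ms, y, kind="linear")(new_time)

    # Velocity components, then the magnitude of the velocity vector
    dt = np.diff(new_time)
    vx = np.diff(x_up) / dt
    vy = np.diff(y_up) / dt
    speed = np.sqrt(vx ** 2 + vy ** 2)

    # One fewer speed sample than positions, so drop the last time stamp
    return new_time[:-1], speed


# Toy trajectory sampled every 40 ms, moving 0.003 (x) and 0.004 (y) per ms
t = np.arange(0.0, 400.0, 40.0)
x = 0.003 * t
y = 0.004 * t

new_t, speed = upsample_and_speed(t, x, y)
print(len(speed), speed[:3])
```

Because the toy point moves at a constant velocity of (0.003, 0.004) per ms, every speed sample should come out close to 0.005, which makes the example easy to check by hand.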

### <strong>Apply Savitzky-Golay Filtering</strong>
The Savitzky-Golay filtering method is applied to smooth the speed data. You can adjust the window size and polynomial order.

The processing cell in this section runs the same steps for every keypoint in `KEYPOINTS`. The x and y coordinates are first linearly interpolated onto a regular 10 ms time grid with `interp1d()`, then smoothed with `savgol_filter()` using a window length of 11 samples and a polynomial order of 3. Speed is computed from the smoothed positions and divided by its maximum so that values are comparable across participants.

`np.nanpercentile()` then gives the 20th percentile of the normalized speed, and every value below that threshold is set to 0 to suppress low-speed jitter. Finally, a second Savitzky-Golay pass with the same window length and polynomial order smooths the thresholded speed. A Savitzky-Golay filter fits a low-order polynomial within a sliding window, which removes noise while preserving important features such as peaks and valleys.

The results for each keypoint (smoothed position, velocity components, and speed) are collected in `final_df` and merged on `time_ms`.
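The thresholding and smoothing steps can be tried in isolation on a synthetic signal. This is a minimal sketch, assuming a made-up noisy speed profile; `threshold_and_smooth` is a hypothetical name, and the default parameters (20th percentile, window length 11, polynomial order 3) simply mirror the values described above.

```python
import numpy as np
from scipy.signal import savgol_filter


def threshold_and_smooth(speed, percentile=20, window_length=11, polyorder=3):
    """Normalize speed, zero out values below a percentile, then smooth.

    Hypothetical helper for illustration only.
    """
    # Normalize so the maximum is 1
    normalized = speed / np.nanmax(speed)

    # Everything below the chosen percentile is treated as "not moving"
    cutoff = np.nanpercentile(normalized, percentile)
    thresholded = np.where(normalized < cutoff, 0.0, normalized)

    # Savitzky-Golay smoothing of the thresholded signal
    smoothed = savgol_filter(thresholded, window_length, polyorder)
    return thresholded, smoothed


# Made-up speed profile: two bursts of movement plus measurement noise
rng = np.random.default_rng(0)
t = np.linspace(0.0, 1.0, 200)
noisy_speed = np.sin(2 * np.pi * t) ** 2 + rng.normal(scale=0.05, size=t.size)

thresholded, smoothed = threshold_and_smooth(noisy_speed)
print(int((thresholded == 0).sum()), smoothed.shape)
```

Note that `savgol_filter` requires `window_length` to be no larger than the number of samples, so very short recordings need a smaller window.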

In [9]:
# Initialize an empty DataFrame to store final results
final_df = pd.DataFrame()

# Generate the new time series at 10 ms intervals
new_time_series = np.arange(keypoints_df['time_ms'].iloc[0], keypoints_df['time_ms'].iloc[-1], 10)

# Loop through each keypoint in KEYPOINTS, with a tqdm progress bar
for keypoint in tq.tqdm(KEYPOINTS, desc='Processing keypoints', unit='keypoint'):
    # Extracting original x and y coordinates
    x_original = keypoints_df[f'{keypoint}_x'].values
    y_original = keypoints_df[f'{keypoint}_y'].values
    time_original = keypoints_df['time_ms'].values

    # Interpolation for upsampling to 10 ms intervals
    interp_x = interp1d(time_original, x_original, kind='linear')
    interp_y = interp1d(time_original, y_original, kind='linear')
    x_upsampled = interp_x(new_time_series)
    y_upsampled = interp_y(new_time_series)

    # Smoothing using Savitzky-Golay filter
    window_length = 11  # Choose an appropriate window length
    polyorder = 3  # Choose an appropriate polynomial order
    x_smooth = savgol_filter(x_upsampled, window_length, polyorder)
    y_smooth = savgol_filter(y_upsampled, window_length, polyorder)

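    # Velocity components and speed (position units per ms).
    # delta_x and delta_y are already divided by delta_time, so speed is
    # simply the magnitude of the velocity vector.
    delta_time = np.diff(new_time_series)
    delta_x = np.diff(x_smooth) / delta_time
    delta_y = np.diff(y_smooth) / delta_time
    speed = np.sqrt(np.nan_to_num(delta_x ** 2 + delta_y ** 2))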
    
    # Normalize speed to make it comparable across participants
    speed = speed / np.nanmax(speed)
    
    # If speed is below the 20th percentile, set it to 0
    speed[speed < np.nanpercentile(speed, 20)] = 0
    
    # smooth the speed
    speed = savgol_filter(speed, 11, 3)

    # Add to the final DataFrame
    temp_df = pd.DataFrame({
        'time_ms': new_time_series[:-1],
        f'{keypoint}_x': x_smooth[:-1],
        f'{keypoint}_y': y_smooth[:-1],
        f'{keypoint}_speed': speed,
        f'{keypoint}_delta_x': delta_x,
        f'{keypoint}_delta_y': delta_y, 
    })
    if final_df.empty:
        final_df = temp_df
    else:
        final_df = pd.merge(final_df, temp_df, on='time_ms', how='right')

final_df.head()

ValueError: If mode is 'interp', window_length must be less than or equal to the size of x.

In [6]:
print("Length of x_upsampled:", len(x_upsampled))
print("Length of y_upsampled:", len(y_upsampled))
print(len(new_time_series))

Length of x_upsampled: 0
Length of y_upsampled: 0
0
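The zero lengths printed above are consistent with the `time_ms` column shown in the loaded DataFrame, where every displayed row holds the same value (84.461929). When the first and last time stamps are equal, `np.arange(start, stop, 10)` returns an empty array, so nothing is upsampled and `savgol_filter` fails because its window is longer than the data. The sketch below is one possible guard, not the notebook's own code: `build_time_grid` is a hypothetical helper that raises a clearer error before any filtering happens.

```python
import numpy as np


def build_time_grid(time_ms, step_ms=10.0, min_samples=11):
    """Build the upsampled time grid, failing early if it would be unusable.

    Hypothetical helper; min_samples defaults to the Savitzky-Golay
    window length used in this notebook.
    """
    time_ms = np.asarray(time_ms, dtype=float)

    # A constant time column gives start == stop, hence an empty grid
    if np.unique(time_ms).size == 1:
        raise ValueError(
            "time_ms holds a single repeated value; "
            "check how the time stamps were written to the CSV."
        )

    grid = np.arange(time_ms[0], time_ms[-1], step_ms)
    if grid.size < min_samples:
        raise ValueError(
            f"Only {grid.size} upsampled samples; need at least {min_samples}."
        )
    return grid


# Reproduce the situation seen above with a constant time column
constant_time = np.full(5, 84.461929)
empty_grid = np.arange(constant_time[0], constant_time[-1], 10)
print(empty_grid.size)

# A well-formed time axis (40 ms per frame) produces a usable grid
good_grid = build_time_grid(np.arange(0.0, 400.0, 40.0))
print(good_grid.size)
```

Why the time stamps are constant in the source CSV cannot be determined from this notebook alone; the check only makes the failure easier to recognize.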


### <strong>Save Aligned Data as CSV</strong>
Finally, the aligned data is saved as a CSV file. You can specify the destination path for the output file.
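One way to keep the destination path adjustable is to build it with `pathlib` from an output directory and the participant name. This is a sketch under assumptions: `save_processed` is a hypothetical helper, and the temporary directory stands in for whatever output folder you actually use.

```python
import tempfile
from pathlib import Path

import pandas as pd


def save_processed(df, participant, out_dir):
    """Write df to '<out_dir>/<participant>_processed_data.csv'.

    Hypothetical helper; creates the output directory if needed.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    out_path = out_dir / f"{participant}_processed_data.csv"
    df.to_csv(out_path, index=False)
    return out_path


# Made-up data and a temporary folder, only to demonstrate the call
demo_df = pd.DataFrame({"time_ms": [0.0, 10.0], "right_wrist_speed": [0.0, 0.5]})
demo_dir = tempfile.mkdtemp()
saved_path = save_processed(demo_df, "TEST_VIDEO", demo_dir)
print(saved_path.name)
```

Passing the directory as an argument avoids hard-coding a machine-specific path in the cell.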

In [None]:
# Save the aligned data as a CSV file
final_df.to_csv(f'C:/Users/cosmo/OneDrive/Desktop/Repositories/Co-Speech-Gesture-Automation/TEST_FILES/{PARTICIPANT}_processed_data.csv', index=False)
final_df.head()

print(f'File saved as {PARTICIPANT}_processed_data.csv')

### <strong>Plot Data</strong>

Plotly's `go.Scatter` is used to build line traces of the right-wrist speed and velocity components over time. The `name` argument sets the legend label for each trace, and `mode='lines'` draws each trace as a line.



In [None]:
import plotly.graph_objects as go

# Create traces
trace0 = go.Scatter(x=final_df['time_ms'], y=final_df['right_wrist_speed'], mode='lines', name='Right Wrist Speed')
trace1 = go.Scatter(x=final_df['time_ms'], y=final_df['right_wrist_delta_x'], mode='lines', name='Right Wrist Raw Velocity X', yaxis='y2')
trace2 = go.Scatter(x=final_df['time_ms'], y=final_df['right_wrist_delta_y'], mode='lines', name='Right Wrist Raw Velocity Y', yaxis='y2')


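# Create layout: speed on the left axis, velocity components on a secondary right axis
layout = go.Layout(title='Right Wrist Speed and Velocity', xaxis=dict(title='Time (ms)'), yaxis=dict(title='Speed (normalized)'), yaxis2=dict(title='Velocity', overlaying='y', side='right'))

# Create figure with all three traces
fig = go.Figure(data=[trace0, trace1, trace2], layout=layout)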

# Show figure

fig.show()



In [None]:
# !pip install nbformat==4.2.0