## Data Description

The data description is the same as for the aiming_script.


| File Type | Granularity | Description | Role in the Project |
| --- | --- | --- | --- |
| AimingData.csv | Per Trial | Subject demographics (Gender, DOB) and trial conditions (visCond, surface, distance). This is the main metadata and conditions file. | The base/core DataFrame. We will build everything onto this. |
| s...timeInfoData.csv | Per Trial | Summary of movement timing for each trial (onset, offset, movTime). | Mostly redundant, since this information also appears in other files, but useful for verification. |
| s...grasp_paramData.csv | Per Trial | Calculated features that summarise the entire grasp for a trial (e.g. maximum grip aperture, MGA; maximum velocity, MVel). The signal column is 'grasp'. | Precalculated summaries that we can use directly. |
| s...reach_paramData.csv | Per Marker, Per Trial | Calculated features for individual markers (index, thumb, etc.) during the reach, so there are multiple rows per trial. | Needs reshaping to fit the one-row-per-trial format. |
| s...trajData.csv | Per Time-Step, Per Trial | The raw 3D coordinates over time for all markers. | Too granular to feed directly into a standard ML model. We use it for advanced feature engineering or deep learning (e.g. transformers). The paramData files are summaries of this raw data. |

# First we will merge all this information into a single DataFrame

1. Combine all the single-subject files: we first read the files for each individual subject (s001, etc.) and concatenate them into a single DataFrame per data type.
2. Handle the multi-row-per-trial data (reach_paramData): this file is in long format, with one row per marker per trial. We pivot it to a wide format so there is only one row per trial. For each parameter in that file (MVel, MAcc, etc.), we create new columns named after the signal (e.g. MVel_index, MVel_thumb, MAcc_index, MAcc_thumb).
3. Merge everything together.
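
The long-to-wide reshaping described in step 2 can be sketched on a toy table. This is only an illustration: the values are made up, and `toy_long` and `toy_wide` are hypothetical names that do not appear in the notebook.

```python
import pandas as pd

# Toy long-format data: two trials, two markers each (values are made up).
toy_long = pd.DataFrame({
    "subjName": [1, 1, 1, 1],
    "trialN": [1, 1, 2, 2],
    "signal": ["index", "thumb", "index", "thumb"],
    "MVel": [1.0, 1.2, 0.9, 1.1],
    "MAcc": [5.0, 5.5, 4.8, 5.2],
})

# Pivot so that each (subjName, trialN) pair becomes a single row.
toy_wide = toy_long.pivot_table(
    index=["subjName", "trialN"],
    columns="signal",
    values=["MVel", "MAcc"],
)

# The pivot yields two-level column labels such as ("MVel", "index");
# join the levels into flat names such as "MVel_index".
toy_wide.columns = [f"{var}_{sig}" for var, sig in toy_wide.columns]
toy_wide = toy_wide.reset_index()

print(toy_wide.columns.tolist())
print(toy_wide.shape)
```

Four long rows collapse into two wide rows, one per trial, with one column per parameter and marker combination.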

In [11]:
import pandas as pd
import glob 
import os

# Step 0: Set Data Path

In [17]:
data_path = "C:/CourseWork/Dissertation Classifying grip strategies using machine learning/data/01_raw/Prehension/filtered_data/"

# Step 1: Load Grasp Information

In [18]:
grasp_param_files = glob.glob(os.path.join(data_path, "s???grasp_paramData.csv"))
if not grasp_param_files:
    raise FileNotFoundError(f"No grasp_paramData files found in {data_path}")
grasp_params_full = pd.concat([pd.read_csv(f) for f in grasp_param_files], ignore_index=True)
# Drop the 'signal' column as it's redundant here
grasp_params_full.drop(columns=["signal"], errors="ignore", inplace=True)
print(f"Combined {len(grasp_param_files)} grasp_param files. Shape: {grasp_params_full.shape}")

Combined 20 grasp_param files. Shape: (2826, 49)
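
Because the same glob, check, and concatenate pattern is repeated for each file type, it could be wrapped in a small helper. The sketch below is one possible shape for it, not the notebook's actual code: `load_subject_files` and the `source_file` column are hypothetical, and the demo files contain made-up values.

```python
import glob
import os
import tempfile

import pandas as pd


def load_subject_files(data_dir, pattern):
    """Concatenate every CSV in data_dir matching pattern (hypothetical helper)."""
    files = sorted(glob.glob(os.path.join(data_dir, pattern)))
    if not files:
        raise FileNotFoundError(f"No files matching {pattern!r} found in {data_dir}")
    frames = []
    for path in files:
        frame = pd.read_csv(path)
        # Record the originating file so rows can be traced back later.
        frame["source_file"] = os.path.basename(path)
        frames.append(frame)
    return pd.concat(frames, ignore_index=True), len(files)


# Demonstration on two throwaway files with made-up values.
with tempfile.TemporaryDirectory() as tmp_dir:
    for subj in ("s001", "s002"):
        pd.DataFrame({"trialN": [1, 2], "MGA": [0.08, 0.09]}).to_csv(
            os.path.join(tmp_dir, f"{subj}grasp_paramData.csv"), index=False
        )
    demo_df, n_files = load_subject_files(tmp_dir, "s???grasp_paramData.csv")

print(n_files, demo_df.shape)
```

Sorting the file list makes the row order reproducible across runs, since `glob.glob` does not guarantee any particular order.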


In [20]:
# 1b. Load REACH parameters
reach_param_files = glob.glob(os.path.join(data_path, "s*reach_paramData.csv"))
if not reach_param_files:
    raise FileNotFoundError(f"No reach_paramData files found in {data_path}")
reach_params_long = pd.concat([pd.read_csv(f) for f in reach_param_files], ignore_index=True)
print(f"Combined {len(reach_param_files)} reach_param files. Shape: {reach_params_long.shape}")

Combined 20 reach_param files. Shape: (11304, 39)


# Step 2: Create a Clean Base DataFrame with Trial Conditions

In [21]:
# We will use the GRASP DataFrame as our base
base_df = grasp_params_full[['subjName', 'trialN', 'visCond', 'surface', 'distance']].copy()

# Remove duplicate rows so that each trial has a single entry
base_df.drop_duplicates(inplace=True)
print(f"Created clean base DataFrame with {base_df.shape[0]} unique trials")

Created clean base DataFrame with 2826 unique trials
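
Note that `drop_duplicates` removes only rows that are identical in every column, so it does not by itself guarantee that each (subjName, trialN) pair is unique. A minimal check, sketched here on made-up data with hypothetical names (`toy_base`, `conflicts`), could look like this:

```python
import pandas as pd

# Toy base table: trial 2 of subject 1 appears twice with different surfaces.
toy_base = pd.DataFrame({
    "subjName": [1, 1, 1, 2],
    "trialN": [1, 2, 2, 1],
    "surface": ["black", "wood", "black", "wood"],
})

# Full-row de-duplication keeps both conflicting rows, because they differ.
deduped = toy_base.drop_duplicates()

# Flag every row whose merge key occurs more than once.
key_cols = ["subjName", "trialN"]
conflicts = deduped[deduped.duplicated(subset=key_cols, keep=False)]

print(f"{len(deduped)} rows after drop_duplicates, {len(conflicts)} rows share a key")
```

An empty `conflicts` frame on the real data would confirm that the merge keys identify trials uniquely.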


# Step 3: Create aggregate Kinematics from REACH data


In [26]:
# Calculate summary kinematics by averaging across markers.
# These variables describe the overall movement.
kinematic_vars_to_average = [
    'FX', 'FY', 'FZ', 'FXVel', 'FYVel', 'FZVel', 'FVel', 'FAcc',
    'MVel', 'MAcc', 'MDec', 'pathLength', 'Xmax', 'Ymax', 'Zmax'
]
# Calculate the mean for each trial across the 4 markers ('index', 'knuck', 'thumb', 'wrist')
reach_summary = reach_params_long.groupby(['subjName', 'trialN'])[kinematic_vars_to_average].mean().reset_index()
print(f"Created correct summary kinematics. Shape: {reach_summary.shape}")


Created correct summary kinematics. Shape: (2826, 17)
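
Averaging across markers is only comparable between trials if every trial contributes the same markers. The following sketch, on made-up data with hypothetical names, shows one way to list trials that do not have all four markers before taking the mean:

```python
import pandas as pd

# Toy reach data: trial 1 has four markers, trial 2 is missing 'wrist'.
toy_reach = pd.DataFrame({
    "subjName": [1] * 7,
    "trialN": [1, 1, 1, 1, 2, 2, 2],
    "signal": ["index", "knuck", "thumb", "wrist", "index", "knuck", "thumb"],
    "MVel": [1.0, 0.9, 1.1, 0.8, 1.2, 1.0, 1.3],
})

expected_markers = 4  # 'index', 'knuck', 'thumb', 'wrist'

# Count the distinct markers recorded for each trial.
markers_per_trial = toy_reach.groupby(["subjName", "trialN"])["signal"].nunique()

# Keep only the trials with an unexpected number of markers.
incomplete = markers_per_trial[markers_per_trial != expected_markers]

print(incomplete)
```

In the notebook's data, 11304 long rows over 2826 trials is exactly four rows per trial on average, which is consistent with complete marker sets, but a check like this would confirm it per trial.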


# Step 4: Create the wide format Dataframe for marker specific kinematics

In [29]:
marker_specific_vars = [
    'timeMAcc', 'timeMVel', 'timeMDec', 'timeMDecToOffset',
    'timeMVelToMDec', 'timeMAccToMVel',
    'timeToXmax', 'timeToYmax', 'timeToZmax',
    'XlocMinN', 'YlocMinN', 'ZlocMinN', 'XlocMaxN', 'YlocMaxN', 'ZlocMaxN',
    'Xmax_index', 'Xmax_knuck', 'Xmax_thumb', 'Xmax_wrist',
]

# Note: marker_specific_vars above is kept for reference only and is not used below.
# A more robust approach is to pivot every column that is NOT trial-level info.
trial_level_info = ['subjName', 'trialN', 'visCond', 'surface', 'distance', 'onset', 'offset', 'movTime']
all_kinematic_vars = [col for col in reach_params_long.columns if col not in trial_level_info and col != 'signal']

reach_params_wide = reach_params_long.pivot_table(
    index=['subjName', 'trialN'],
    columns='signal',
    values=all_kinematic_vars
)

# Create clearer column names
reach_params_wide.columns = [f"{val}_{sig}" for val, sig in reach_params_wide.columns]
reach_params_wide.reset_index(inplace=True)
print(f"Pivoted marker-specific data. Shape: {reach_params_wide.shape}")

Pivoted marker-specific data. Shape: (2826, 122)
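
One property of `pivot_table` worth keeping in mind is that its default aggregation is the mean, so duplicate (trial, marker) rows would be averaged silently. `DataFrame.pivot` raises an error instead. The sketch below contrasts the two on made-up data; `trial_id` is a hypothetical single-column key used only to keep the example short.

```python
import pandas as pd

# Toy data with an accidental duplicate: 'index' appears twice for one trial.
toy_dup = pd.DataFrame({
    "trial_id": ["1_1", "1_1", "1_1"],
    "signal": ["index", "index", "thumb"],
    "MVel": [1.0, 3.0, 2.0],
})

# pivot_table aggregates duplicates with the mean by default.
averaged = toy_dup.pivot_table(index="trial_id", columns="signal", values="MVel")
print(averaged)

# pivot refuses to reshape when an index/column pair is duplicated.
try:
    toy_dup.pivot(index="trial_id", columns="signal", values="MVel")
    strict_pivot_failed = False
except ValueError as err:
    strict_pivot_failed = True
    print(f"pivot raised: {err}")
```

If silent averaging is not wanted, a uniqueness check on (subjName, trialN, signal) before the pivot gives the same protection while keeping `pivot_table`.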


# Step 5: Merge Everything into a Final Master DataFrame

In [31]:
# Define merge keys
merge_keys = ["subjName", "trialN"]

# 5a. Merge the clean base with the CORRECT summary kinematics
master_df = pd.merge(base_df, reach_summary, on=merge_keys, how='inner')

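# 5b. Merge with the detailed grasp parameters (dropping redundant columns).
# Columns present on both sides (e.g. FX) receive pandas' default suffixes:
# _x for the reach summary mean and _y for the grasp value.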
master_df = pd.merge(
    master_df,
    grasp_params_full.drop(columns=['visCond', 'surface', 'distance']), on=merge_keys, how="inner"
)

# 5c. Merge with the wide format marker specific data
master_df = pd.merge(master_df, reach_params_wide, on=merge_keys, how="inner")
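
The default `_x` and `_y` suffixes say little about where a column came from. As a possible alternative, `pd.merge` accepts explicit `suffixes`, and its `validate` argument raises an error if the keys are not unique on either side. The sketch uses made-up values, and the suffix names `_reach_mean` and `_grasp` are assumptions chosen for illustration.

```python
import pandas as pd

keys = ["subjName", "trialN"]

# Toy tables that both contain an 'FX' column (values are made up).
toy_reach_summary = pd.DataFrame({
    "subjName": [1, 1],
    "trialN": [1, 2],
    "FX": [0.066, 0.081],
})
toy_grasp = pd.DataFrame({
    "subjName": [1, 1],
    "trialN": [1, 2],
    "FX": [0.070, 0.079],
    "MGA": [0.09, 0.10],
})

merged = pd.merge(
    toy_reach_summary,
    toy_grasp,
    on=keys,
    how="inner",
    suffixes=("_reach_mean", "_grasp"),  # descriptive instead of _x / _y
    validate="one_to_one",               # fail loudly on duplicated keys
)

print(merged.columns.tolist())
```

Only the overlapping column is renamed; `MGA` keeps its name because it exists on one side only.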

In [32]:
# FINAL CLEANUP
if "distance" in master_df.columns:
    master_df["distance"] = master_df["distance"].str.replace(".csv", "", regex=False)
print("\n--- New Corrected Master DataFrame")
print(f"Final combined dataframe has {master_df.shape[0]} rows and {master_df.shape[1]} columns")
print("\nFirst 5 rows of the new Prehension DataFrame:")
print(master_df.head())


--- New Corrected Master DataFrame
Final combined dataframe has 2826 rows and 184 columns

First 5 rows of the new Prehension DataFrame:
   subjName  trialN visCond surface distance      FX_x      FY_x      FZ_x  \
0         1       1   clear   black     near  0.066117  0.109976  0.153287   
1         1       2   clear   black      far  0.080668  0.115076  0.359722   
2         1       3   clear    wood   middle  0.071642  0.117912  0.249301   
3         1       4   clear   black   middle  0.072217  0.115635  0.250760   
4         1       5   clear    wood     near  0.070657  0.111656  0.146179   

    FXVel_x   FYVel_x  ...  timeToXmax_thumb  timeToXmax_wrist  \
0  0.007951  0.013748  ...          3.308333          3.316667   
1  0.010462  0.015378  ...          3.775000          4.050000   
2  0.008205  0.016199  ...          3.666667          3.691667   
3  0.012554  0.011980  ...          3.791667          3.958333   
4  0.008591  0.014774  ...          3.875000          3.891667 

In [33]:
# --- Save the new, reliable master dataset ---
master_df.to_csv("prehension_master_dataset.csv", index=False)
print("\nSuccessfully saved the new, corrected data to 'prehension_master_dataset.csv'")


Successfully saved the new, corrected data to 'prehension_master_dataset.csv'
