# Gait analysis
This tutorial showcases the high-level functions that make up the gait pipeline. Before following along, make sure you have completed all steps in the [data preparation tutorial](data_preparation.ipynb).

Running the complete gait pipeline requires both accelerometer and gyroscope data, although part of the pipeline requires only accelerometer data. Roughly, the pipeline can be split into seven steps:
1. Data preprocessing
2. Gait feature extraction
3. Gait detection
4. Arm activity feature extraction
5. Filtering gait
6. Arm swing quantification
7. Aggregation

Using only accelerometer data, the first three steps can be completed. 
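As a quick way to keep track of which steps are available for a given set of sensors, the list above can be written down as data. This is only an illustrative sketch: the names `PIPELINE_STEPS` and `runnable_steps` are hypothetical and not part of the toolbox, and the sensor requirements simply encode the statement that only the first three steps work with accelerometer data alone.

```python
# Step names copied from the list above. The sensor sets encode the statement
# that only the first three steps can be completed without gyroscope data.
PIPELINE_STEPS = [
    ("Data preprocessing", {"accelerometer"}),
    ("Gait feature extraction", {"accelerometer"}),
    ("Gait detection", {"accelerometer"}),
    ("Arm activity feature extraction", {"accelerometer", "gyroscope"}),
    ("Filtering gait", {"accelerometer", "gyroscope"}),
    ("Arm swing quantification", {"accelerometer", "gyroscope"}),
    ("Aggregation", {"accelerometer", "gyroscope"}),
]


def runnable_steps(available_sensors):
    """Return the leading pipeline steps whose sensor requirements are met."""
    available = set(available_sensors)
    steps = []
    for name, required in PIPELINE_STEPS:
        # Steps build on each other, so stop at the first unmet requirement
        if not required <= available:
            break
        steps.append(name)
    return steps


print(runnable_steps({"accelerometer"}))
```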

[!WARNING] The gait pipeline was developed on data from the Gait Up Physilog 4 and is currently being validated on the Verily Study Watch. Different sensors and positions on the wrist may affect outcomes.

Throughout the tutorial, a small segment of data from a participant in the Personalized Parkinson Project is used to demonstrate the functionality.

## Load data
Load the prepared data into memory. For example, the following functions can be used depending on the file extension of the data:
- _.csv_: `pandas.read_csv()` ([documentation](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_csv.html))
- _.json_: `json.load()` ([documentation](https://docs.python.org/3/library/json.html#json.load))
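A minimal sketch of both loaders is shown below. To keep it self-contained it reads from in-memory strings rather than files; the column names and the `sampling_frequency` key are made up for illustration and do not describe any required file layout.

```python
import io
import json

import pandas as pd

# Hypothetical CSV content; in practice, pass a file path to pd.read_csv
csv_text = "time,accelerometer_x\n0.0,0.55\n0.01,0.54\n"
df = pd.read_csv(io.StringIO(csv_text))

# Hypothetical JSON content; in practice, pass an open file handle to json.load
json_text = '{"sampling_frequency": 100}'
metadata = json.load(io.StringIO(json_text))

print(df)
print(metadata)
```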

We use the internally developed `TSDF` ([documentation](https://biomarkersparkinson.github.io/tsdf/)) to load and store data [[1](https://arxiv.org/abs/2211.11294)].

In [1]:
from pathlib import Path
from paradigma.util import load_tsdf_dataframe

# Set the path to the data file location
path_to_prepared_data = Path('../../tests/data/1.prepared_data/imu')

# Load the data from the file
df_imu, _, _ = load_tsdf_dataframe(path_to_prepared_data, prefix='IMU')

df_imu

| | time | accelerometer_x | accelerometer_y | accelerometer_z | gyroscope_x | gyroscope_y | gyroscope_z |
|---|---|---|---|---|---|---|---|
| 0 | 0.00000 | 0.550718 | 0.574163 | -0.273684 | -115.670732 | 32.012195 | -26.097561 |
| 1 | 0.01004 | 0.535885 | 0.623445 | -0.254545 | -110.609757 | 34.634146 | -24.695122 |
| 2 | 0.02008 | 0.504306 | 0.651675 | -0.251675 | -103.231708 | 36.768293 | -22.926829 |
| 3 | 0.03012 | 0.488517 | 0.686603 | -0.265550 | -96.280488 | 38.719512 | -21.158537 |
| 4 | 0.04016 | 0.494258 | 0.725359 | -0.278469 | -92.560976 | 41.280488 | -20.304878 |
| ... | ... | ... | ... | ... | ... | ... | ... |
| 72942 | 730.74468 | 0.234928 | -0.516268 | -0.802871 | 0.975610 | -2.256098 | 2.256098 |
| 72943 | 730.75472 | 0.245455 | -0.514354 | -0.806699 | 0.304878 | -1.707317 | 1.768293 |
| 72944 | 730.76476 | 0.243541 | -0.511005 | -0.807177 | 0.304878 | -1.585366 | 1.890244 |
| 72945 | 730.77480 | 0.240191 | -0.514354 | -0.808134 | 0.000000 | -1.280488 | 1.585366 |


## Step 1: Preprocess data
The single function `preprocess_imu_data` in the cell below runs all necessary preprocessing steps. It requires the loaded dataframe, a configuration object `config` specifying parameters used for preprocessing, and a selection of sensors. For the sensors, options include `'accelerometer'`, `'gyroscope'`, or `'both'`.

The function `preprocess_imu_data` processes the data as follows:
1. Resample the data to ensure a uniform sampling rate
2. Apply filtering to separate the gravity component from the accelerometer signal
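To give an intuition for these two steps, the sketch below resamples a toy signal onto a uniform 100 Hz grid and then splits it into a slow, gravity-like component and a movement component. It is not the implementation used by `preprocess_imu_data`: linear interpolation and a 1-second moving average are simple stand-ins chosen for illustration, and the toy signal and all variable names are assumptions.

```python
import numpy as np

fs = 100.0  # target sampling frequency in Hz

# Toy timestamps spaced about 0.01004 s apart, i.e. not exactly 100 Hz
t_raw = np.arange(500) * 0.01004
# Toy signal: constant 0.5 offset (stand-in for gravity) plus a 2 Hz movement
x_raw = 0.5 + 0.1 * np.sin(2 * np.pi * 2.0 * t_raw)

# 1. Resample onto a uniform time grid using linear interpolation
t_uniform = np.arange(0, t_raw[-1], 1 / fs)
x_uniform = np.interp(t_uniform, t_raw, x_raw)

# 2. Estimate the slow component with a 1-second moving average (a crude
#    low-pass filter) and subtract it to obtain the movement component
kernel = np.ones(int(fs)) / int(fs)
x_grav = np.convolve(x_uniform, kernel, mode="same")
x_movement = x_uniform - x_grav

print(f"Uniform step: {np.diff(t_uniform).mean():.4f} s")
print(f"Estimated slow component mid-signal: {x_grav[250]:.3f}")
```

Note that the moving average is unreliable near the start and end of the signal, where the window extends past the available samples. A dedicated low-pass filter is the more usual choice in practice.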

In [2]:
from paradigma.config import IMUConfig
from paradigma.preprocessing import preprocess_imu_data

config = IMUConfig()

df_preprocessed = preprocess_imu_data(
    df=df_imu, 
    config=config,
    sensor='both'
)

print(f"The dataset is automatically resampled to {config.sampling_frequency} Hz.")
df_preprocessed

The dataset is automatically resampled to 100 Hz.


| | time | accelerometer_x | accelerometer_y | accelerometer_z | gyroscope_x | gyroscope_y | gyroscope_z | accelerometer_x_grav | accelerometer_y_grav | accelerometer_z_grav |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0.00 | 0.053078 | 0.010040 | -0.273154 | -115.670732 | 32.012195 | -26.097561 | 0.497639 | 0.564123 | -0.000530 |
| 1 | 0.01 | 0.038337 | 0.058802 | -0.256899 | -110.636301 | 34.624710 | -24.701537 | 0.497666 | 0.564510 | 0.002305 |
| 2 | 0.02 | 0.006824 | 0.086559 | -0.256739 | -103.292766 | 36.753000 | -22.942002 | 0.497698 | 0.564887 | 0.005122 |
| 3 | 0.03 | -0.009156 | 0.120855 | -0.273280 | -96.349062 | 38.692931 | -21.175227 | 0.497733 | 0.565254 | 0.007919 |
| 4 | 0.04 | -0.003770 | 0.159316 | -0.289007 | -92.585735 | 41.237328 | -20.311531 | 0.497772 | 0.565610 | 0.010696 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 73074 | 730.74 | -0.008013 | 0.000749 | -0.004196 | 1.150220 | -2.561552 | 2.440945 | 0.252408 | -0.502961 | -0.819643 |
| 73075 | 730.75 | -0.001078 | 0.000216 | -0.001827 | 0.588721 | -1.917765 | 1.948620 | 0.252408 | -0.502961 | -0.819643 |
| 73076 | 730.76 | 0.004707 | 0.004335 | -0.005821 | 0.270257 | -1.626831 | 1.813725 | 0.252408 | -0.502961 | -0.819643 |
| 73077 | 730.77 | 0.000607 | 0.004143 | -0.006275 | 0.185022 | -1.451942 | 1.793145 | 0.252408 | -0.502961 | -0.819643 |


The resulting dataframe shown above contains uniformly spaced timestamps with corresponding accelerometer and gyroscope values. Note that for accelerometer values, the following notation is used:
- `accelerometer_x`: the accelerometer signal after filtering out the gravitational component
- `accelerometer_x_grav`: the gravitational component of the accelerometer signal

The gravitational component is retained and used to compute gravity-related features for the classification tasks, because gravity is informative of the position of the arm.
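The relationship between the two components can be checked against the first row of the raw and preprocessed tables shown above, where both timestamps are 0. In that row, the filtered signal and the gravitational component add up to the raw accelerometer value within rounding. This is an observation from the displayed values, not a documented guarantee for every row.

```python
# Values copied from row 0 of the raw and preprocessed tables above
raw = {"x": 0.550718, "y": 0.574163, "z": -0.273684}
filtered = {"x": 0.053078, "y": 0.010040, "z": -0.273154}
gravity = {"x": 0.497639, "y": 0.564123, "z": -0.000530}

for axis in ("x", "y", "z"):
    reconstructed = filtered[axis] + gravity[axis]
    # Any difference is at the level of rounding in the displayed values
    print(f"{axis}: filtered + gravity = {reconstructed:.6f}, raw = {raw[axis]:.6f}")
```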

## Step 2: Extract gait features
With the data uniformly resampled and the gravitational component separated from the accelerometer signal, features can be extracted from the time series data. This step does not require gyroscope data. To extract the features, the pipeline executes the following steps:
- Use overlapping windows to group timestamps;
- Extract temporal features;
- Use the Fast Fourier Transform to transform the windowed data into the spectral domain;
- Extract spectral features;
- Combine both temporal and spectral features into a final dataframe.
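The steps above can be sketched with NumPy on a toy signal. This is a simplified illustration, not the code behind `extract_gait_features`: the window settings mirror the 6-second windows with 5 seconds overlap used in this tutorial, but the specific features, the 0.5 to 3 Hz band, and all variable names are assumptions made for the example.

```python
import numpy as np

fs = 100              # sampling frequency in Hz
window_length_s = 6   # window length in seconds
step_length_s = 1     # 6 s windows with 5 s overlap imply a 1 s step

# Toy signal: 20 s of a 1.5 Hz oscillation plus a little noise
rng = np.random.default_rng(0)
t = np.arange(20 * fs) / fs
signal = np.sin(2 * np.pi * 1.5 * t) + 0.05 * rng.standard_normal(t.size)

win = window_length_s * fs
step = step_length_s * fs
starts = np.arange(0, signal.size - win + 1, step)

# 1. Group samples into overlapping windows (one row per window)
windows = np.stack([signal[s:s + win] for s in starts])

# 2. Temporal features
feat_mean = windows.mean(axis=1)
feat_std = windows.std(axis=1)

# 3. Transform each window into the spectral domain
freqs = np.fft.rfftfreq(win, d=1 / fs)
power = np.abs(np.fft.rfft(windows, axis=1)) ** 2

# 4. Spectral features (hypothetical examples)
dominant_freq = freqs[np.argmax(power[:, 1:], axis=1) + 1]  # skip the DC bin
band = (freqs >= 0.5) & (freqs <= 3.0)  # illustrative band, chosen arbitrarily
band_power = power[:, band].sum(axis=1)

# 5. Combine temporal and spectral features, one row per window;
#    the first column is the window start time in seconds
features = np.column_stack(
    [starts / fs, feat_mean, feat_std, dominant_freq, band_power]
)
print(features.shape)
```

Building the windows with a list comprehension keeps the example readable; for long recordings, a strided view such as `numpy.lib.stride_tricks.sliding_window_view` avoids copying the data.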

These steps are encapsulated in `extract_gait_features` (documentation can be found [here](https://github.com/biomarkersParkinson/paradigma/blob/main/src/paradigma/gait/gait_analysis.py)).

In [4]:
from paradigma.config import GaitConfig
from paradigma.pipelines.gait_pipeline import extract_gait_features

config = GaitConfig(step='gait')

df_gait_features = extract_gait_features(
    df=df_preprocessed, 
    config=config
)

print(f"A total of {df_gait_features.shape[1]-1} features have been extracted from {config.window_length_s}-second windows with {config.window_length_s-config.window_step_length_s} seconds overlap.")
df_gait_features

A total of 34 features have been extracted from 6-second windows with 5 seconds overlap.


| | time | accelerometer_x_grav_mean | accelerometer_y_grav_mean | accelerometer_z_grav_mean | accelerometer_x_grav_std | accelerometer_y_grav_std | accelerometer_z_grav_std | accelerometer_std_norm | accelerometer_x_power_below_gait | accelerometer_y_power_below_gait | ... | accelerometer_mfcc_3 | accelerometer_mfcc_4 | accelerometer_mfcc_5 | accelerometer_mfcc_6 | accelerometer_mfcc_7 | accelerometer_mfcc_8 | accelerometer_mfcc_9 | accelerometer_mfcc_10 | accelerometer_mfcc_11 | accelerometer_mfcc_12 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0.0 | 0.527557 | 0.485994 | -0.314004 | 0.045376 | 0.061073 | 0.312533 | 0.187336 | 0.002437 | 3.817437e-02 | ... | 0.622230 | 0.700691 | 0.150885 | 0.458676 | 0.033595 | 0.243243 | 0.178762 | 0.066051 | 0.090567 | 0.128075 |
| 1 | 1.0 | 0.532392 | 0.466982 | -0.426883 | 0.045834 | 0.052777 | 0.261492 | 0.196270 | 0.001498 | 1.291698e-02 | ... | 0.288084 | 0.581373 | 0.086737 | 0.447655 | -0.129908 | 0.292478 | 0.028656 | 0.123552 | 0.049143 | 0.081229 |
| 2 | 2.0 | 0.545189 | 0.433756 | -0.539777 | 0.057979 | 0.080084 | 0.145126 | 0.200461 | 0.001840 | 1.876028e-03 | ... | 0.277590 | 0.481786 | 0.069445 | 0.331342 | -0.197669 | 0.244441 | -0.151760 | 0.091923 | -0.100824 | 0.003800 |
| 3 | 3.0 | 0.556586 | 0.397208 | -0.613691 | 0.070765 | 0.122509 | 0.054681 | 0.104950 | 0.003253 | 1.984698e-03 | ... | 0.521323 | 0.321901 | 0.237257 | 0.078296 | -0.074240 | 0.283644 | -0.239699 | 0.028845 | -0.050743 | 0.036805 |
| 4 | 4.0 | 0.571852 | 0.359068 | -0.639196 | 0.079765 | 0.144845 | 0.042924 | 0.095547 | 0.002794 | 2.084068e-03 | ... | 0.538269 | 0.111283 | 0.293136 | -0.069686 | -0.059406 | 0.356973 | -0.266953 | 0.050041 | 0.058963 | 0.082503 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 720 | 720.0 | 0.256444 | -0.500508 | -0.819729 | 0.010743 | 0.002128 | 0.001750 | 0.005931 | 0.000012 | 4.725561e-07 | ... | -0.139455 | -0.188946 | -0.132557 | 0.084801 | -0.017492 | 0.192915 | 0.245377 | 0.193867 | 0.187995 | -0.012953 |
| 721 | 721.0 | 0.253641 | -0.501306 | -0.820199 | 0.009944 | 0.002160 | 0.001446 | 0.006012 | 0.000006 | 1.032106e-06 | ... | -0.132443 | -0.210135 | -0.139654 | 0.145187 | 0.114746 | 0.259192 | 0.164914 | 0.155829 | 0.138032 | 0.018775 |
| 722 | 722.0 | 0.250637 | -0.502115 | -0.820543 | 0.006747 | 0.001614 | 0.001049 | 0.005955 | 0.000005 | 1.246072e-06 | ... | -0.253100 | -0.130186 | -0.120869 | 0.202868 | 0.257120 | 0.277669 | 0.102645 | 0.244376 | 0.095645 | 0.016884 |
| 723 | 723.0 | 0.248829 | -0.502711 | -0.820712 | 0.003810 | 0.000894 | 0.000793 | 0.005670 | 0.000004 | 8.045864e-07 | ... | -0.264555 | 0.052061 | -0.069921 | 0.162502 | 0.154580 | 0.324308 | 0.110056 | 0.302551 | 0.060131 | -0.000505 |


Each row in this dataframe corresponds to a single window, with the window length and overlap set in the `config` object. Note that the `time` column now has a 1-second interval instead of the 10-millisecond interval of the preprocessed data, because it represents the starting time of each window relative to the first data point of the dataframe.

## Step 3: Gait detection

## Step 4: Arm activity feature extraction

## Step 5: Filtering gait

## Step 6: Arm swing quantification

## Step 7: Aggregation