# Data Preparation
ParaDigMa requires input data in a specific format. This tutorial shows how to prepare your data for analysis.

## Load data
This example uses data from the Personalized Parkinson Project, stored in Time Series Data Format (TSDF). IMU and PPG data are sampled at different frequencies and are therefore stored separately.

In [None]:
import os
from paradigma.util import load_tsdf_dataframe

path_to_raw_data = os.path.join('..', '..', 'tests', 'data', '0.raw_data')
path_to_imu_data = os.path.join(path_to_raw_data, 'imu')

df_imu, imu_time, imu_values = load_tsdf_dataframe(
    path_to_data=path_to_imu_data, 
    prefix='IMU'
)
df_imu.head(5)

In [None]:
import os
from paradigma.util import load_tsdf_dataframe

path_to_raw_data = os.path.join('..', '..', 'tests', 'data', '0.raw_data')
path_to_ppg_data = os.path.join(path_to_raw_data, 'ppg')

df_ppg, ppg_time, ppg_values = load_tsdf_dataframe(
    path_to_data=path_to_ppg_data, 
    prefix='PPG'
)
df_ppg.head(5)

## Prepare dataframe

#### Change column names
To keep the pipeline robust, ParaDigMa expects column names to follow a predefined standard.

In [None]:
from paradigma.constants import DataColumns

accelerometer_columns = [DataColumns.ACCELEROMETER_X, DataColumns.ACCELEROMETER_Y, DataColumns.ACCELEROMETER_Z]
gyroscope_columns = [DataColumns.GYROSCOPE_X, DataColumns.GYROSCOPE_Y, DataColumns.GYROSCOPE_Z]

# Rename dataframe columns
df_imu = df_imu.rename(columns={
    'time': DataColumns.TIME,
    'acceleration_x': DataColumns.ACCELEROMETER_X,
    'acceleration_y': DataColumns.ACCELEROMETER_Y,
    'acceleration_z': DataColumns.ACCELEROMETER_Z,
    'rotation_x': DataColumns.GYROSCOPE_X,
    'rotation_y': DataColumns.GYROSCOPE_Y,
    'rotation_z': DataColumns.GYROSCOPE_Z,
})

# Set columns to a fixed order
df_imu = df_imu[[DataColumns.TIME] + accelerometer_columns + gyroscope_columns]
df_imu.head(5)

In [None]:
from paradigma.constants import DataColumns

ppg_columns = [DataColumns.PPG]

# Rename dataframe columns
df_ppg = df_ppg.rename(columns={
    'time': DataColumns.TIME,
    'ppg': DataColumns.PPG,
})

# Set columns to a fixed order
df_ppg = df_ppg[[DataColumns.TIME] + ppg_columns]
df_ppg.head(5)

#### Set sensor values to the correct units
TSDF stores data compactly using scale factors, so the stored sensor values must first be multiplied by these factors to recover the true values.

In [None]:
df_imu[accelerometer_columns + gyroscope_columns] *= imu_values.scale_factors
df_imu.head(5)

In [None]:
df_ppg[ppg_columns] *= ppg_values.scale_factors
df_ppg.head(5)
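
To make the scaling step concrete, here is a minimal sketch on made-up numbers. The column names, raw values, and scale factors are illustrative assumptions and do not come from the dataset; the sketch assumes one scale factor per channel.

```python
import numpy as np
import pandas as pd

# Hypothetical raw integer samples for three channels (illustrative values only)
df_raw = pd.DataFrame({
    "ch_x": [100, 200, -300],
    "ch_y": [10, 20, 30],
    "ch_z": [0, 5, -5],
})

# Hypothetical per-channel scale factors, one entry per column
scale_factors = np.array([0.01, 0.1, 2.0])

# Broadcasting multiplies each column by its own factor
df_scaled = df_raw * scale_factors
print(df_scaled)
```

Because the array has one entry per column, pandas applies each factor to the matching column, in column order. The order of the scale factors therefore has to match the order of the selected columns.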

ParaDigMa expects acceleration in g and rotation in deg/s. The toolbox provides utility functions to convert the sensor values to these units.

In [None]:
from paradigma.util import convert_units_accelerometer, convert_units_gyroscope

# Units in which the input data are currently expressed
accelerometer_units = 'm/s^2'
gyroscope_units = 'deg/s'

accelerometer_data = df_imu[accelerometer_columns].values
gyroscope_data = df_imu[gyroscope_columns].values

df_imu[accelerometer_columns] = convert_units_accelerometer(accelerometer_data, accelerometer_units)
df_imu[gyroscope_columns] = convert_units_gyroscope(gyroscope_data, gyroscope_units)
df_imu.head(5)
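
The arithmetic behind these conversions is simple. The sketch below uses plain NumPy with hypothetical helper names (`ms2_to_g`, `rad_s_to_deg_s`); it only illustrates the conversion factors and is not ParaDigMa's implementation.

```python
import numpy as np

STANDARD_GRAVITY = 9.80665  # m/s^2 per g (standard gravity)

def ms2_to_g(acc_ms2):
    """Convert acceleration from m/s^2 to g (hypothetical helper)."""
    return np.asarray(acc_ms2, dtype=float) / STANDARD_GRAVITY

def rad_s_to_deg_s(gyro_rad_s):
    """Convert angular velocity from rad/s to deg/s (hypothetical helper)."""
    return np.rad2deg(np.asarray(gyro_rad_s, dtype=float))

# Illustrative samples: one row per sample, one column per axis
acc_g = ms2_to_g([[0.0, 0.0, 9.80665]])
gyro_deg = rad_s_to_deg_s([[np.pi, 0.0, -np.pi / 2]])
print(acc_g, gyro_deg)
```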

#### Account for watch side
For the Gait & Arm Swing pipeline, it is essential that the sensor axes are oriented correctly. For more information, please read [X]. If the sensors are not correctly aligned, you can use `invert_watch_side` to ensure consistency between sensors worn on the left and right wrist.

In [None]:
from paradigma.util import invert_watch_side

watch_side = 'right'

df_imu = invert_watch_side(df_imu, watch_side)
df_imu.head(5)

#### Change time column
ParaDigMa expects the time column to be in seconds relative to the first row. The built-in function `transform_time_array` helps you transform the time column to this format.

In [None]:
from paradigma.constants import TimeUnit
from paradigma.util import transform_time_array

df_imu[DataColumns.TIME] = transform_time_array(
    time_array=df_imu[DataColumns.TIME], 
    input_unit_type=TimeUnit.DIFFERENCE_MS, 
    output_unit_type=TimeUnit.RELATIVE_S,
)
df_imu.head(5)

In [None]:
from paradigma.constants import TimeUnit
from paradigma.util import transform_time_array

df_ppg[DataColumns.TIME] = transform_time_array(
    time_array=df_ppg[DataColumns.TIME], 
    input_unit_type=TimeUnit.DIFFERENCE_MS, 
    output_unit_type=TimeUnit.RELATIVE_S,
)
df_ppg.head(5)
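
As a rough illustration of what this transformation involves, the sketch below converts a time column to seconds relative to the first row. It assumes each entry holds the milliseconds elapsed since the previous sample; the helper name `differences_ms_to_relative_s` is hypothetical, and the sketch is not necessarily how `transform_time_array` is implemented.

```python
import numpy as np

def differences_ms_to_relative_s(diff_ms):
    """Convert per-sample time differences (ms) to seconds since the first sample."""
    # Accumulate the differences to get elapsed time in milliseconds
    elapsed_ms = np.cumsum(np.asarray(diff_ms, dtype=float))
    # Shift so the first row is zero, then convert milliseconds to seconds
    return (elapsed_ms - elapsed_ms[0]) / 1000.0

# Illustrative input: samples spaced 10 ms apart
relative_s = differences_ms_to_relative_s([0, 10, 10, 10])
print(relative_s)
```

A quick sanity check after any such conversion is that the first value is zero and the values increase monotonically.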

These dataframes are ready to be processed by ParaDigMa.