# ANOVA over all DBS configurations for a specific walking test

What we are roughly doing in this notebook:
```
for each paradigm (fast, free,...):
   for each patient:
      for each gait parameter (stride length, gait speed, ...):
         if there are significant differences between 130, 100, OFF, ... according to anova:
            n_patients_significant[gait parameter] += 1
```
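The loop above can be sketched in runnable form as follows. This is a minimal illustration, not the code in `helper.anova`, whose internals are not shown here: it assumes a one-way ANOVA via `scipy.stats.f_oneway`, and the function `count_significant`, the patient names, and the deterministic toy data are all hypothetical.

```python
import numpy as np
import pandas as pd
from scipy.stats import f_oneway

# Hypothetical toy data: Pat_A's stride length shifts with the configuration,
# Pat_B has exactly the same strides in every configuration.
base = np.linspace(-2.0, 2.0, 30)
rows = []
for patient, step in [("Pat_A", 7.5), ("Pat_B", 0.0)]:
    for i, config in enumerate(["OFF", "085", "130"]):
        for value in 100.0 + i * step + base:
            rows.append(
                {
                    "patient_id": patient,
                    "test": "free",
                    "configuration": config,
                    "stride_length_cm": value,
                }
            )
strides = pd.DataFrame(rows)


def count_significant(strides, gait_parameters, alpha=0.05):
    """Count, per paradigm and gait parameter, the patients whose
    one-way ANOVA across configurations gives p < alpha."""
    counts = {}
    for paradigm, per_paradigm in strides.groupby("test"):
        counts[paradigm] = {param: 0 for param in gait_parameters}
        for _, per_patient in per_paradigm.groupby("patient_id"):
            for param in gait_parameters:
                # One sample of stride values per DBS configuration.
                groups = [
                    g[param].dropna().to_numpy()
                    for _, g in per_patient.groupby("configuration")
                ]
                groups = [g for g in groups if len(g) > 1]
                if len(groups) < 2:
                    continue  # nothing to compare for this patient
                _, p_value = f_oneway(*groups)
                if p_value < alpha:
                    counts[paradigm][param] += 1
    return counts


n_patients_significant = count_significant(strides, ["stride_length_cm"])
print(n_patients_significant)
```

In this toy data set only Pat_A differs between configurations, so only one patient is counted for `stride_length_cm`.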


In [1]:
import sys
from tqdm.notebook import tqdm

sys.path.append("../")
from helper.util import reshape_to_multicolumn
from helper.anova import anova, extract_significant_p, conclude_results
from typing import Tuple, Dict
import pprint

import numpy as np
import pandas as pd

#%load_ext lab_black

## Select DBS configurations to use

In [2]:
configs = {
    "frequencies": ["OFF", "030", "085", "130"],
    "intensities": ["OFF", "033", "066", "100"],
    "pulse_widths": ["OFF", "040", "090"],
}
dbs_parameter = "frequencies"
selected_configs = configs[dbs_parameter]
print(
    f"Configurations that will be used for ANOVA: {dbs_parameter} ({', '.join(selected_configs)})"
)
print(
    "We are using the frequencies here, but we can do the same for pulse width or intensity, if necessary."
)

Configurations that will be used for ANOVA: frequencies (OFF, 030, 085, 130)
We are using the frequencies here, but we can do the same for pulse width or intensity, if necessary.


## Load data (not visible in html)

In [3]:
stride_params = pd.read_csv("../../data/stride_params.csv")
stride_params.drop("time_stamp_s", axis=1, inplace=True)

df = stride_params.set_index("configuration")
df = df.loc[selected_configs]
df = df.reset_index()
config = df["configuration"]
df.drop("configuration", axis=1, inplace=True)
df.insert(1, "configuration", config)

stride_params = df

stride_params[
    (stride_params["patient_id"] == "Pat_03")
    & (stride_params["configuration"] == "040")
]
stride_params.drop("preferred", axis=1, inplace=True)
available_gait_parameters = stride_params.columns[5:]
strides_dict = {
    patient: stride_params[stride_params["patient_id"] == patient]
    for patient in set(stride_params["patient_id"])
}

## Calculate anovas (not visible in html)

In [4]:
results_anova = {}
anova_significant = {}
unprocessed = []
paradigms = list(set(sorted(stride_params["test"])))
with tqdm(
    total=len(paradigms) * len(available_gait_parameters), file=sys.stdout
) as pbar:
    # handled_cases = 0
    pbar.set_description("Processing...")
    for paradigm in paradigms:
        results_anova[paradigm] = {}
        anova_significant[paradigm] = {}
        for gait_para in available_gait_parameters:
            pbar.update(1)
            results, n_a = anova(strides_dict, paradigm, gait_para)
            results_anova[paradigm][gait_para] = results
            unprocessed.extend(list(n_a))


# Not every test was recorded for every patient, so some patient/test combinations have no gait parameters
# pprint.pprint("Not processed because no strides detected: ")
# pprint.pprint(set(unprocessed))


# Display results

## Show one of the resulting dataframes


In [5]:
# Here you can replace "free" by any of "slow", "normal", "fast", "tug_one", "tug_two" to get results for another test paradigm.
paradigm = "free"
df = conclude_results(results_anova[paradigm], p_value_limit=0.05)
print(
    f"This shows an example result for the gait test paradigm '{paradigm}' (2-minute walk):"
)
df

This shows an example result for the gait test paradigm 'free' (2-minute walk):


| gait parameter | n_patients | n_patients_significant | n_feet_significant |
|---|---|---|---|
| gait_speed_meter_per_second | 23 | 20 | 34 |
| heel_strike_angle_deg | 23 | 21 | 37 |
| landing_impact_g | 23 | 20 | 34 |
| max_lateral_excursion_cm | 23 | 17 | 23 |
| max_sensor_lift_cm | 23 | 20 | 34 |
| stance_time_per_cent | 23 | 20 | 30 |
| stance_time_s | 23 | 20 | 34 |
| stride_length_cm | 23 | 22 | 38 |
| stride_time_s | 23 | 18 | 32 |
| swing_time_per_cent | 23 | 20 | 30 |


I would not expect the DBS configuration to affect the turning angle, yet the ANOVA reports significant differences for some patients.
Let's take a closer look at what might be going on. First, we find out which patients show a significant difference.

In [6]:
pats = extract_significant_p(
    results_anova[paradigm]["turning_angle_deg"], p_value_limit=0.05
).index
print(
    f"These patients show significant differences for the paradigm '{paradigm}' and turning_angle_deg:\n",
    list(pats),
)

These patients show significant differences for the paradigm 'free' and turning_angle_deg:
 ['Pat_01', 'Pat_27', 'Pat_08', 'Pat_19']


Now let's look at the mean turning angle per configuration for Pat_19 as an example:

In [7]:
patient = "Pat_19"
paradigm = "free"

df = strides_dict[patient].set_index("test")
gait_paradigm = df.loc[paradigm]
# Select the column before aggregating so that non-numeric columns are never averaged
gait_paradigm.groupby("configuration")["turning_angle_deg"].mean()

configuration
085    1.725107
130    2.172212
OFF   -2.379655
Name: turning_angle_deg, dtype: float64

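The mean turning angle is negative for OFF but positive for 085 and 130, while the magnitudes are similar. Possibly the patient simply turned in a different direction under the different DBS configurations?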
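One way to check the direction hypothesis is to compare the signed means with the means of the absolute turning angle. The sketch below uses hypothetical stride values, not the study data; only the column names `configuration` and `turning_angle_deg` are taken from the notebook.

```python
import pandas as pd

# Hypothetical strides for one patient: the turn magnitude is about 2 degrees
# in every configuration, but the sign is flipped for OFF.
strides = pd.DataFrame(
    {
        "configuration": ["OFF", "OFF", "085", "085", "130", "130"],
        "turning_angle_deg": [-2.5, -1.5, 1.5, 2.5, 1.0, 3.0],
    }
)

# Signed means differ between OFF and the other configurations.
signed_mean = strides.groupby("configuration")["turning_angle_deg"].mean()

# Means of the per-stride absolute angle ignore the turning direction.
strides["abs_turning_angle_deg"] = strides["turning_angle_deg"].abs()
abs_mean = strides.groupby("configuration")["abs_turning_angle_deg"].mean()

print(signed_mean)
print(abs_mean)
```

If the group differences disappear once the sign is removed, the ANOVA result reflects turning direction rather than an effect of stimulation. Note that the absolute value is taken per stride, so configurations that mix left and right turns would need a closer look.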

## Explaining what is going to be shown next
Next, for one gait test paradigm (e.g. "free"), we count for each gait parameter the number of patients with a significant ANOVA result and then average these counts over all gait parameters. Example:

In [8]:
df = pd.DataFrame(
    data=[10, 20],
    columns=["n_patients_significant"],
    index=["Stride length", "Stride time"],
)
print(
    f"In this example, the gait parameters of {df['n_patients_significant'].mean()} patients reach significance on average according to ANOVA.\n"
    "Note that if stride length and stride time were each significant for 5 patients, these need not be the same 5 patients. We ignore this for now.\n"
)
df

In this example, the gait parameters of 15.0 patients reach significance on average according to ANOVA.
Note that if stride length and stride time were each significant for 5 patients, these need not be the same 5 patients. We ignore this for now.



| gait parameter | n_patients_significant |
|---|---|
| Stride length | 10 |
| Stride time | 20 |
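The caveat about overlapping patients can be made concrete with set operations. The following is a small sketch with hypothetical patient sets; in the notebook, such sets could presumably be built from the index returned by `extract_significant_p` for each gait parameter, but that is an assumption about the helper.

```python
# Hypothetical sets of patients with a significant ANOVA result per gait parameter.
significant_patients = {
    "stride_length_cm": {"Pat_01", "Pat_02", "Pat_03"},
    "stride_time_s": {"Pat_03", "Pat_04"},
}

# The quantity reported in this notebook: the mean count per gait parameter.
mean_per_parameter = sum(
    len(patients) for patients in significant_patients.values()
) / len(significant_patients)

# Patients significant for at least one parameter (union) and for all parameters (intersection).
any_parameter = set().union(*significant_patients.values())
all_parameters = set.intersection(*significant_patients.values())

print(mean_per_parameter, len(any_parameter), len(all_parameters))
```

Here the mean count is 2.5, although 4 distinct patients are significant for at least one parameter and only 1 is significant for both. The averaged count therefore says nothing about how much the patient groups overlap.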


## Show the conclusion of results

In [17]:
print(
    "For an explanation of where these values come from, please see the cell above this one (and its output).\n"
)
for paradigm in paradigms:
    df = conclude_results(results_anova[paradigm], p_value_limit=0.05)
    # display(df)
    mean = df["n_patients_significant"].mean()
    print(
        f"For '{paradigm}':"
        f"  On average, gait parameters of {mean:.1f} patients ({mean/23*100:.1f}%) reach significance according to anova."
    )

For an explanation of where these values come from, please see the cell above this one (and its output).

For 'slow':  On average, gait parameters of 17.3 patients (75.3%) reach significance according to anova.
For 'normal':  On average, gait parameters of 16.3 patients (70.9%) reach significance according to anova.
For 'fast':  On average, gait parameters of 14.8 patients (64.5%) reach significance according to anova.
For 'tug_one':  On average, gait parameters of 15.5 patients (67.6%) reach significance according to anova.
For 'free':  On average, gait parameters of 18.7 patients (81.3%) reach significance according to anova.
For 'tug_two':  On average, gait parameters of 14.9 patients (64.9%) reach significance according to anova.
