# Statistical Analysis
This notebook analyses the results of the tracking algorithm (see [4b_segment_and_track_points_with_kalman_alignment](4b_segment_and_track_points_with_kalman_alignment.ipynb)). First, you specify which videos to consider and assign them to the groups you want to compare. The notebook then runs Welch's t-tests to compare the groups at three levels: individual velocity measurements, average velocity per path, and average velocity per video. Finally, it generates boxplots of the measurements in the videos.

Note that the data must first be processed in 3b.

The data and pipeline version used for this notebook correspond to commit {insert commit hash here}.

# Steps to be done before analysis:
1. Generate markers for every video with the notebook place_and_evaluate_markers.ipynb.
2. Run segment_and_track_points_with_kalman_alignment.ipynb for every video.

In [None]:
# Imports
from IPython import display
import os

import yaml
import numpy as np
import pandas as pd
from scipy.stats import ttest_ind
import matplotlib.pyplot as plt

from fam13a import utils

Specify below which videos to include in the analysis. Assign each video to one of the two groups so that the groups can be compared against each other in the Welch's t-tests performed later in the notebook.

In [None]:
DATA_DIR = os.path.join(utils.here(True), 'data', 'processed', 'xenopus', 'statistics')
group_1_video_keys = ['15_L2_MO_late_1', '15_L2_MO_late_2', '15_L2_MO_late_3']
group_2_video_keys = ['C_MO_1', 'C_MO_2', 'C_MO_3']

In [None]:
def determine_category(video_id):
    if video_id in group_1_video_keys:
        return "Group 1"
    elif video_id in group_2_video_keys:
        return "Group 2"
    else:
        return "undefined"
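The group assignment above can also be written as a lookup table. The following is only a sketch: `category_by_video` and `lookup_category` are hypothetical names that are not used elsewhere in this notebook, and the video keys are copied from the cell above.

```python
group_1_video_keys = ['15_L2_MO_late_1', '15_L2_MO_late_2', '15_L2_MO_late_3']
group_2_video_keys = ['C_MO_1', 'C_MO_2', 'C_MO_3']

# Insert Group 2 first so that Group 1 wins if a key is listed in both groups,
# which matches the if/elif order of determine_category.
category_by_video = {key: "Group 2" for key in group_2_video_keys}
category_by_video.update({key: "Group 1" for key in group_1_video_keys})


def lookup_category(video_id):
    """Return the group label of a video, or "undefined" if it is not listed."""
    return category_by_video.get(video_id, "undefined")


print(lookup_category("C_MO_2"))
print(lookup_category("not_in_any_group"))
```

With a dictionary like this, `df["video_id"].map(category_by_video)` would be an alternative to `apply`, although unlisted videos would then become `NaN` instead of `"undefined"`.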

# Results

In [None]:
# Collect one DataFrame per path and concatenate once at the end
# (DataFrame.append was removed in pandas 2.0).
columns = ["video_id", "path_id", "vel"]
frames = []
video_keys = group_1_video_keys + group_2_video_keys
for VIDEO_ID in video_keys:
    yml_file = os.path.join(DATA_DIR, VIDEO_ID + ".yml")
    # Videos without a statistics file are skipped silently
    if os.path.exists(yml_file):
        with open(yml_file, "r") as f:
            result = yaml.load(f, Loader=yaml.SafeLoader)[0]
        # Path ids are 1-based
        for path_id, path_velocities in enumerate(result['vel_per_path'], start=1):
            frames.append(pd.DataFrame({"video_id": VIDEO_ID,
                                        "path_id": path_id,
                                        "vel": path_velocities}))
df = pd.concat(frames, ignore_index=True) if frames else pd.DataFrame(columns=columns)
df["video_category"] = df["video_id"].apply(determine_category)
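To make the expected input more concrete, the sketch below builds the same long-format table from a hand-written stand-in for one parsed statistics file. The structure of `parsed` (a list whose first element holds `vel_per_path`, with one list of velocities per path) is an assumption inferred from the loading code above. The video name and all numbers are made up.

```python
import pandas as pd

# Hypothetical parsed content of one <VIDEO_ID>.yml file (assumed structure):
# a list whose first element maps 'vel_per_path' to one list of velocity
# measurements per tracked path.
parsed = [{"vel_per_path": [[0.5, 0.7, 0.6], [1.1, 0.9]]}]

result = parsed[0]
frames = [
    # The scalar video_id and path_id are broadcast over the velocity list
    pd.DataFrame({"video_id": "example_video", "path_id": path_id, "vel": velocities})
    for path_id, velocities in enumerate(result["vel_per_path"], start=1)
]
example_df = pd.concat(frames, ignore_index=True)
print(example_df)
```

Each row of the resulting table is a single velocity measurement, labelled with the video and the path it belongs to.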

## Save results to CSV
The following lines of code store the results in a CSV file for further analysis in a different program. Note that only the experiments listed in `group_1_video_keys` or `group_2_video_keys` are included. If you wish to include more experiments, add them to these lists and ignore the group column in the CSV if it is then no longer meaningful.

In [None]:
filename = os.path.join(DATA_DIR, "xenopus_path_data.csv")
df.to_csv(filename, index=False)
print(f"Saved outputs to {filename}")
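As a quick check of the exported format, the sketch below writes a small table with the same columns to a temporary directory and reads it back with pandas. The rows are made-up values and the temporary location is only for illustration. The real file is written to `DATA_DIR` by the cell above.

```python
import os
import tempfile

import pandas as pd

# Made-up rows with the same columns as the exported table
example_df = pd.DataFrame({
    "video_id": ["C_MO_1", "C_MO_1", "15_L2_MO_late_1"],
    "path_id": [1, 2, 1],
    "vel": [0.5, 0.7, 1.1],
    "video_category": ["Group 2", "Group 2", "Group 1"],
})

with tempfile.TemporaryDirectory() as tmp_dir:
    csv_path = os.path.join(tmp_dir, "xenopus_path_data.csv")
    example_df.to_csv(csv_path, index=False)
    reloaded = pd.read_csv(csv_path)

# One row per velocity measurement; summarise per group
print(reloaded.groupby("video_category")["vel"].describe())
```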

# t-test all individual velocity measurements
https://en.wikipedia.org/wiki/Welch%27s_t-test

In [None]:
# equal_var=False selects Welch's t-test (no equal-variance assumption)
test_result = ttest_ind(df.loc[df["video_category"] == "Group 1", "vel"],
                        df.loc[df["video_category"] == "Group 2", "vel"],
                        equal_var=False)
print(f"t-statistic: {test_result[0]}")
print(f"two sided p-value: {test_result[1]}")
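For reference, the statistic reported by `ttest_ind(..., equal_var=False)` can be sketched by hand. The helper `welch_t` below is a hypothetical name, not part of the pipeline, and the two samples are made-up numbers. It computes the t-statistic and the Welch-Satterthwaite degrees of freedom using only the standard library. The p-value is omitted because it requires the t-distribution.

```python
import math
import statistics


def welch_t(sample_a, sample_b):
    """Return (t, dof) for Welch's unequal-variance t-test."""
    n_a, n_b = len(sample_a), len(sample_b)
    mean_a, mean_b = statistics.fmean(sample_a), statistics.fmean(sample_b)
    # Squared standard errors: sample variance (n - 1 denominator) over n
    se2_a = statistics.variance(sample_a) / n_a
    se2_b = statistics.variance(sample_b) / n_b
    t = (mean_a - mean_b) / math.sqrt(se2_a + se2_b)
    # Welch-Satterthwaite approximation of the degrees of freedom
    dof = (se2_a + se2_b) ** 2 / (se2_a ** 2 / (n_a - 1) + se2_b ** 2 / (n_b - 1))
    return t, dof


# Made-up velocities, for illustration only
group_1 = [1.0, 2.0, 3.0, 4.0, 5.0]
group_2 = [2.0, 4.0, 6.0, 8.0, 10.0]
t_stat, dof = welch_t(group_1, group_2)
print(f"t-statistic: {t_stat:.4f}, degrees of freedom: {dof:.2f}")
```

Unlike Student's t-test, the two variances are not pooled, so the groups may differ in both spread and size.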

# t-test average velocity per path
https://en.wikipedia.org/wiki/Welch%27s_t-test

In [None]:
# Select "vel" before averaging so that the non-numeric video_category column
# is not passed to mean() (this raises a TypeError in pandas >= 2.0).
test_result = ttest_ind(df.loc[df["video_category"]=="Group 1",].groupby(["video_id", "path_id"])["vel"].mean().values,
                        df.loc[df["video_category"]=="Group 2",].groupby(["video_id", "path_id"])["vel"].mean().values,
                        equal_var=False)
print(f"t-statistic: {test_result[0]}")
print(f"two sided p-value: {test_result[1]}")

# t-test average velocity per video
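https://en.wikipedia.org/wiki/Welch%27s_t-test

The per-video average is taken over all individual measurements in a video. It is therefore a weighted average: longer paths contribute more measurements, and hence more weight, to a video's average velocity than shorter paths.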

In [None]:
# Select "vel" before averaging so that the non-numeric video_category column
# is not passed to mean() (this raises a TypeError in pandas >= 2.0).
test_result = ttest_ind(df.loc[df["video_category"]=="Group 1",].groupby("video_id")["vel"].mean().values,
                        df.loc[df["video_category"]=="Group 2",].groupby("video_id")["vel"].mean().values,
                        equal_var=False)
print(f"t-statistic: {test_result[0]}")
print(f"two sided p-value: {test_result[1]}")
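The effect of this weighting can be seen on a toy example. The sketch below uses made-up numbers for one video with a long path of four measurements and a short path of one measurement. It compares the mean over all measurements, as used in the cell above, with an unweighted mean of the per-path averages.

```python
import pandas as pd

# Made-up data: path 1 has four measurements, path 2 has one
toy = pd.DataFrame({
    "video_id": ["v1"] * 5,
    "path_id": [1, 1, 1, 1, 2],
    "vel": [1.0, 1.0, 1.0, 1.0, 6.0],
})

# Mean over all measurements in the video: the long path dominates
pooled_mean = toy.groupby("video_id")["vel"].mean()["v1"]

# Unweighted alternative: average each path first, then average the path means
path_means = toy.groupby(["video_id", "path_id"])["vel"].mean()
unweighted_mean = path_means.groupby(level="video_id").mean()["v1"]

print(f"pooled (weighted by path length): {pooled_mean}")
print(f"mean of path means (unweighted):  {unweighted_mean}")
```

Here the pooled mean is 2.0 and the mean of path means is 3.5, so the choice of aggregation can change the per-video values that enter the t-test.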

# Generate plots

In [None]:
plt.rcParams['figure.figsize'] = [17, 10]
df.boxplot(column="vel", by="video_id", fontsize=20)
plt.show()
df.boxplot(column="vel", by="video_id", showfliers=False, fontsize=16)
plt.show()

In [None]:
plt.rcParams['figure.figsize'] = [50, 17]
df.groupby("video_id").boxplot(column="vel", by="path_id", sharey=True, fontsize=20)
plt.show()
df.groupby("video_id").boxplot(column="vel", by="path_id", sharey=False, fontsize=20)
plt.show()

In [None]:
plt.rcParams['figure.figsize'] = [15, 10]
df.boxplot(column="vel", by="video_category", fontsize=15)
plt.show()
df.boxplot(column="vel", by="video_category", showfliers=False, fontsize=15)
plt.show()

In [None]:
plt.rcParams['figure.figsize'] = [30, 15]
df.groupby("video_category").boxplot(column="vel", by="path_id", sharey=True, fontsize=20)
plt.show()
df.groupby("video_category").boxplot(column="vel", by="path_id", sharey=False, fontsize=20)
plt.show()