## Exploration (Number)
In this exploration, we will be looking at the four kinds of active motions (walking, jogging, going upstairs, and going downstairs).
Specifically, we will be testing for whether there is any significable difference between the different kinds of motions and the general rotation velocities that we see.

In [32]:
import os
import pandas as pd
import statsmodels.api as sm
from statsmodels.multivariate.manova import MANOVA

manova_df = pd.DataFrame(columns = ['Activity', 'rotationRateX', 'rotationRateY', 'rotationRateZ'])

# Fill up the manova data
for activity_prefix in ['wlk', 'ups', 'dws', 'jog']:
  csv_names = []

  # For every folder whose name starts with the prefix
  folder_names = [f for f in os.listdir('A_DeviceMotion_data') if f.startswith(activity_prefix)]
  for folder_name in folder_names:
    # For every csv inside those folders:
    these_names = [f for f in os.listdir('A_DeviceMotion_data/'+folder_name) if f.endswith('.csv')]
    csv_names.extend(these_names)

  # Now, we have all the csv's
  for csv_name in csv_names:
    # Read the csv
    full_csv_name = 'A_DeviceMotion_data/'+folder_name+'/'+csv_name
    df = pd.read_csv(full_csv_name)

    df.dropna(inplace=True) # just in case
    
    # TODO: This is highly inefficent.
    # Fill up the new dataframe
    avg_x = df['rotationRate.x'].abs().mean()
    avg_y = df['rotationRate.y'].abs().mean()
    avg_z = df['rotationRate.z'].abs().mean()

    manova_df.loc[len(manova_df)] = [activity_prefix, avg_x, avg_y, avg_z]

display(manova_df)

Unnamed: 0,Activity,rotationRateX,rotationRateY,rotationRateZ
0,wlk,1.876488,1.491614,0.551218
1,wlk,1.649619,1.020533,0.670874
2,wlk,0.863968,0.686256,0.770936
3,wlk,1.182331,1.198608,1.363506
4,wlk,1.186141,1.200116,0.584594
...,...,...,...,...
259,jog,2.784325,2.728351,1.214659
260,jog,1.058893,0.704312,0.598713
261,jog,1.373832,1.739999,1.685387
262,jog,1.778247,1.629658,1.812501


In [33]:
# Now we can perform a MANOVA test, does the activities differ significantly based on the rotation rates?
manova_calculation = MANOVA.from_formula('Activity ~ rotationRateX + rotationRateY + rotationRateZ', data=manova_df)
print(manova_calculation.mv_test())

                                 Multivariate linear model
                                                                                            
--------------------------------------------------------------------------------------------
       Intercept                Value         Num DF  Den DF          F Value         Pr > F
--------------------------------------------------------------------------------------------
          Wilks' lambda                0.0000 4.0000 257.0000 289356276058554304.0000 0.0000
         Pillai's trace                1.0000 4.0000 257.0000 289356276058554304.0000 0.0000
 Hotelling-Lawley trace 4503599627370495.0000 4.0000 257.0000 289356276058554304.0000 0.0000
    Roy's greatest root 4503599627370495.0000 4.0000 257.0000 289356276058554304.0000 0.0000
--------------------------------------------------------------------------------------------
                                                                                            
-----------

In this case, the p-values (indicated by the column Pr > F) we get are well below our usual threshold of 0.05, meaning we can reject our null hypothesis that the rotation rates combined do not differ between the different kinds of movement activity.
However, there are many serious deficiencies with this testing that we are aware of.
Namely,
1. Assuming that this testing is well-conceived, there is an obvious numerical stability issue with these calculations.
2. Whether a simple additive model makes sense to combine rotation rates from a physical perspective (given they're in something like radians per unit time, I would assume.)
3. This testing does not take into account the time-series nature of the data.
4. This testing does not appropriately account for possible idiosyncrasies in the time evolution of the data (like slowing down or speeding up of walking, tripping, whatever.)
5. There needs to be a post-hoc analysis since by the intrinsic linkage defined by physics, these rotation rates are definitionally not independent.
In a future direction, these and other deficiencies will need to be corrected.
The scope should also be extended to a classification approach where the rotations are able to be used as predictors.