# Exploratory Data Analysis for Example Data of Gait Abnormality

This EDA looks at example data that I recorded myself in order to experiment with novel GCN architectures and to work out what qualities those architectures need in order to extract the information required for the medical inferences we want to make.

The data is split into 3 categories of 10 sequences each. Each sequence is 88-157 frames long at 480 x 480 resolution, and all of them show the same individual (me) in the same environment, though sometimes wearing slightly different clothing. The main difference between the three categories is my gait.

Set 1: My gait is normal. I walk past the camera 5 times in each direction to produce 10 sequences of my normal, healthy gait.

Set 2: My gait is marginally impaired. I achieved this by simulating a slight limp.

Set 3: My gait is severely impaired. I achieved this by shuffling my feet, not moving my arms and keeping my head down, in an effort to mimic the effect of advanced dementia on standard gait.

The goal of this EDA is to identify the key differences in the data between these sets and work out how to make them as differentiable as possible. The plan thus far is to process all of the images into skeletal graphs via HigherHRNet, load them in here and chart each of the 17 joints across sequences, with one colour of dot per set (so 3 separate colours), to see how different each one is.
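The charting step could start from something like the sketch below, which groups one coordinate of one joint by set so that each set can be drawn in its own colour. It is only a sketch under assumptions: the function name, the `(class_label, joints)` in-memory layout and the demo values are all hypothetical, and it uses plain Python lists rather than the DataFrame loaded later.

```python
from collections import defaultdict

def joint_series_by_class(frames, joint_index, axis=0):
    """Group one coordinate of one joint by class label.

    frames: iterable of (class_label, joints) pairs, where joints is a
            list of 17 three-value lists (hypothetical layout).
    Returns {class_label: [value, ...]} in input order.
    """
    series = defaultdict(list)
    for class_label, joints in frames:
        series[class_label].append(joints[joint_index][axis])
    return dict(series)

def make_frame(v):
    """Build a made-up frame of 17 joints for demonstration only."""
    return [[v + j, 0, 0] for j in range(17)]

# Made-up data: two frames of set 1, one each of sets 2 and 3
demo_frames = [(1, make_frame(0)), (1, make_frame(1)),
               (2, make_frame(10)), (3, make_frame(20))]
series = joint_series_by_class(demo_frames, joint_index=4)
print(series)
```

Each list in the result can then be passed to a scatter plot (for example matplotlib's `plt.scatter`) with one colour per set.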

Then I want to fit a normal distribution to each of the 3 sets by reducing the 17 joints in each frame to a single set of 3D co-ordinates, and see whether the resulting distributions are clean-cut.
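One possible reading of this step is sketched below: each frame's 17 joints are averaged into a single centroid, and a mean and standard deviation are then estimated per set and per coordinate. Treating the reduction as a centroid is an assumption, as are the function names and the `(class_label, joints)` layout. Blank frames would need to be removed first, since all-zero joints would drag the centroid towards the origin.

```python
from statistics import mean, stdev

def frame_centroid(joints):
    """Average the joints of one frame into a single three-value point."""
    return [mean([j[k] for j in joints]) for k in range(3)]

def fit_normal_per_class(frames):
    """Return {class_label: [(mean, std), (mean, std), (mean, std)]}.

    frames: list of (class_label, joints) pairs (hypothetical layout).
    Each class needs at least two frames for the standard deviation.
    """
    centroids = {}
    for label, joints in frames:
        centroids.setdefault(label, []).append(frame_centroid(joints))
    params = {}
    for label, pts in centroids.items():
        params[label] = [
            (mean([p[k] for p in pts]), stdev([p[k] for p in pts]))
            for k in range(3)
        ]
    return params

# Made-up data: two frames of one set, every joint identical within a frame
demo_frames = [(1, [[0, 0, 0]] * 17), (1, [[2, 4, 6]] * 17)]
params = fit_normal_per_class(demo_frames)
print(params[1])
```

Comparing the per-set means and standard deviations gives a first impression of how much the distributions overlap.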

There are two eventualities when I process this data: either the sets are extremely separable or they aren't.

If they are separable, then there are no major problems and I can implement a GCN in here and see how accurate it is.
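As a starting point for a GCN, the skeleton has to be expressed as a graph. The sketch below builds the symmetrically normalised adjacency matrix D^-1/2 (A + I) D^-1/2 that is commonly used in graph convolutions, plus one parameter-free propagation step. It assumes the 17 joints follow the COCO keypoint order and the usual COCO skeleton edges, which the data here does not confirm. A real GCN layer would also multiply by a learned weight matrix and apply a non-linearity.

```python
import math

# ASSUMPTION: joints are in COCO 17-keypoint order and connected by the
# usual COCO skeleton edges (0-indexed). Not confirmed by this dataset.
COCO_EDGES = [
    (15, 13), (13, 11), (16, 14), (14, 12), (11, 12), (5, 11), (6, 12),
    (5, 6), (5, 7), (6, 8), (7, 9), (8, 10), (1, 2), (0, 1), (0, 2),
    (1, 3), (2, 4), (3, 5), (4, 6),
]

def normalized_adjacency(edges, n_nodes=17):
    """Return D^-1/2 (A + I) D^-1/2 as a nested list of floats."""
    # Start from the identity so that every node has a self-loop
    a = [[1.0 if i == j else 0.0 for j in range(n_nodes)]
         for i in range(n_nodes)]
    for i, j in edges:
        a[i][j] = a[j][i] = 1.0
    deg = [sum(row) for row in a]
    return [[a[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n_nodes)]
            for i in range(n_nodes)]

def propagate(a_hat, features):
    """Mix each joint's features with its neighbours': a_hat @ features."""
    n_feat = len(features[0])
    return [[sum(a_hat[i][k] * features[k][c] for k in range(len(features)))
             for c in range(n_feat)]
            for i in range(len(a_hat))]

a_hat = normalized_adjacency(COCO_EDGES)
# Made-up features: one three-value vector per joint
mixed = propagate(a_hat, [[float(j), 0.0, 1.0] for j in range(17)])
```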

If they aren't separable, then I need to work with the data to find the most defining joints, the redundant ones, etc.
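One simple way to rank joints is a Fisher-style ratio of between-set variance to within-set variance, sketched below. This is a suggested technique rather than anything already used in this notebook. The function name and the `(class_label, joints)` layout are hypothetical, and the demo uses 2 joints instead of 17 to stay short.

```python
from statistics import mean, pvariance

def joint_separability(frames, n_joints=17):
    """Score each joint by between-class over within-class variance.

    frames: list of (class_label, joints) pairs, joints being n_joints
            three-value lists. Higher scores suggest the joint differs
            more between classes; scores near zero suggest redundancy.
    """
    scores = []
    for j in range(n_joints):
        score = 0.0
        for k in range(3):
            by_class = {}
            for label, joints in frames:
                by_class.setdefault(label, []).append(joints[j][k])
            class_means = [mean(v) for v in by_class.values()]
            between = pvariance(class_means)
            within = mean([pvariance(v) for v in by_class.values()])
            score += between / (within + 1e-9)  # constant avoids 0 / 0
        scores.append(score)
    return scores

# Made-up data: joint 0 differs between the two sets, joint 1 never moves
demo_frames = [
    (1, [[0, 0, 0], [5, 5, 5]]), (1, [[1, 0, 0], [5, 5, 5]]),
    (2, [[10, 0, 0], [5, 5, 5]]), (2, [[11, 0, 0], [5, 5, 5]]),
]
scores = joint_separability(demo_frames, n_joints=2)
ranking = sorted(range(len(scores)), key=scores.__getitem__, reverse=True)
print(ranking)
```

Joints at the top of `ranking` are candidates for the most defining ones, while joints with scores near zero are candidates for removal.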

I then want to compare different GCNs on the data, see whether there are major differences, and experiment with different modules, potentially inventing a new one to improve on the performance of the standard ones.

In [45]:
import pandas as pd
import os
import numpy as np

print(os.getcwd())
# Load the dataset, supplying the column names explicitly
colnames = ['Instance', 'No_In_Sequence', 'Class'] + [f'Joint_{i}' for i in range(1, 18)]
dataset_master = pd.read_csv("example_dataset.csv", names=colnames, header=None)

# The dataset currently holds 2 instances of severe gait obstruction (simulated Parkinson's).
dataset_master.head()

C:\Users\chris\Documents


Unnamed: 0,Instance,No_In_Sequence,Class,Joint_1,Joint_2,Joint_3,Joint_4,Joint_5,Joint_6,Joint_7,Joint_8,Joint_9,Joint_10,Joint_11,Joint_12,Joint_13,Joint_14,Joint_15,Joint_16,Joint_17
0,1,0,1,"[0, 0, 0]","[0, 0, 0]","[0, 0, 0]","[0, 0, 0]","[0, 0, 0]","[0, 0, 0]","[0, 0, 0]","[0, 0, 0]","[0, 0, 0]","[0, 0, 0]","[0, 0, 0]","[0, 0, 0]","[0, 0, 0]","[0, 0, 0]","[0, 0, 0]","[0, 0, 0]","[0, 0, 0]"
1,1,1,1,"[0, 0, 0]","[0, 0, 0]","[0, 0, 0]","[0, 0, 0]","[0, 0, 0]","[0, 0, 0]","[0, 0, 0]","[0, 0, 0]","[0, 0, 0]","[0, 0, 0]","[0, 0, 0]","[0, 0, 0]","[0, 0, 0]","[0, 0, 0]","[0, 0, 0]","[0, 0, 0]","[0, 0, 0]"
2,1,2,1,"[0, 0, 0]","[0, 0, 0]","[0, 0, 0]","[0, 0, 0]","[0, 0, 0]","[0, 0, 0]","[0, 0, 0]","[0, 0, 0]","[0, 0, 0]","[0, 0, 0]","[0, 0, 0]","[0, 0, 0]","[0, 0, 0]","[0, 0, 0]","[0, 0, 0]","[0, 0, 0]","[0, 0, 0]"
3,1,3,1,"[0, 0, 0]","[0, 0, 0]","[0, 0, 0]","[0, 0, 0]","[0, 0, 0]","[0, 0, 0]","[0, 0, 0]","[0, 0, 0]","[0, 0, 0]","[0, 0, 0]","[0, 0, 0]","[0, 0, 0]","[0, 0, 0]","[0, 0, 0]","[0, 0, 0]","[0, 0, 0]","[0, 0, 0]"
4,1,4,1,"[88.94531, -10.5390625, 109]","[84.02344, -12.8828125, 105]","[89.17969, -12.8828125, 110]","[84.49219, -0.4609375, 15]","[84.02344, -10.5390625, 104]","[100.19531, 4.4609375, 191]","[104.41406, -11.0078125, 172]","[126.21094, 13.1328125, 187]","[126.21094, 8.4453125, 186]","[150.35156, 15.0078125, 185]","[149.17969, 14.5390625, 185]","[146.60156, 2.3515625, 187]","[141.21094, -10.7734375, 172]","[177.53906, 8.9140625, 190]","[174.25781, -10.3046875, 157]","[208.94531, 14.7734375, 184]","[199.10156, 13.8359375, 187]"


In [46]:
import ast

# Parse each joint cell from its string form (e.g. "[0, 0, 0]") into a list
# and cast the three values to integers.
# TODO: remove all blank frames (frames with no joints present).
for index, row in dataset_master.iterrows():
    for col_index, col in enumerate(row):
        if col_index >= 3:  # skip Instance, No_In_Sequence and Class
            tmp = ast.literal_eval(col)
            tmp = list(map(int, tmp))  # int() truncates towards zero
            dataset_master.iat[index, col_index] = tmp
            print(dataset_master.iat[index, col_index])


[0, 0, 0]
[0, 0, 0]
[0, 0, 0]
[0, 0, 0]
[0, 0, 0]
[0, 0, 0]
[0, 0, 0]
[0, 0, 0]
[0, 0, 0]
[0, 0, 0]
[0, 0, 0]
[0, 0, 0]
[0, 0, 0]
[0, 0, 0]
[0, 0, 0]
[0, 0, 0]
[0, 0, 0]
[0, 0, 0]
[0, 0, 0]
[0, 0, 0]
[0, 0, 0]
[0, 0, 0]
[0, 0, 0]
[0, 0, 0]
[0, 0, 0]
[0, 0, 0]
[0, 0, 0]
[0, 0, 0]
[0, 0, 0]
[0, 0, 0]
[0, 0, 0]
[0, 0, 0]
[0, 0, 0]
[0, 0, 0]
[0, 0, 0]
[0, 0, 0]
[0, 0, 0]
[0, 0, 0]
[0, 0, 0]
[0, 0, 0]
[0, 0, 0]
[0, 0, 0]
[0, 0, 0]
[0, 0, 0]
[0, 0, 0]
[0, 0, 0]
[0, 0, 0]
[0, 0, 0]
[0, 0, 0]
[0, 0, 0]
[0, 0, 0]
[0, 0, 0]
[0, 0, 0]
[0, 0, 0]
[0, 0, 0]
[0, 0, 0]
[0, 0, 0]
[0, 0, 0]
[0, 0, 0]
[0, 0, 0]
[0, 0, 0]
[0, 0, 0]
[0, 0, 0]
[0, 0, 0]
[0, 0, 0]
[0, 0, 0]
[0, 0, 0]
[0, 0, 0]
[88, -10, 109]
[84, -12, 105]
[89, -12, 110]
[84, 0, 15]
[84, -10, 104]
[100, 4, 191]
[104, -11, 172]
[126, 13, 187]
[126, 8, 186]
[150, 15, 185]
[149, 14, 185]
[146, 2, 187]
[141, -10, 172]
[177, 8, 190]
[174, -10, 157]
[208, 14, 184]
[199, 13, 187]
[90, 10, 190]
[86, 13, 189]
[85, 7, 189]
[86, 15, 190]
[80, -4, 103]


SyntaxError: invalid syntax (<unknown>, line 1)
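The traceback does not say which cell `ast.literal_eval` failed on. A more tolerant parser, sketched below, records the position of every cell that cannot be parsed instead of stopping at the first one, and also drops blank frames. The function names and demo rows are hypothetical. The sketch works on plain lists of joint-cell strings and makes no claim about what the offending cell in the real file looks like.

```python
import ast

def parse_joint(cell):
    """Return a joint cell as a list of ints, or None if unparseable."""
    try:
        return [int(v) for v in ast.literal_eval(cell)]
    except (SyntaxError, ValueError, TypeError):
        return None

def parse_rows(rows):
    """Parse every joint cell; collect (row, column) positions that fail.

    rows: list of rows, each a list of joint-cell strings.
    Returns (parsed_rows, failures); rows with any failure are left out.
    """
    parsed, failures = [], []
    for r, row in enumerate(rows):
        joints = [parse_joint(cell) for cell in row]
        bad = [c for c, joint in enumerate(joints) if joint is None]
        if bad:
            failures.extend((r, c) for c in bad)
        else:
            parsed.append(joints)
    return parsed, failures

def drop_blank_frames(frames):
    """Keep only frames in which at least one joint value is non-zero."""
    return [f for f in frames if any(v != 0 for joint in f for v in joint)]

# Made-up rows with two joints each; the last cell is deliberately malformed
demo_rows = [
    ["[0, 0, 0]", "[0, 0, 0]"],
    ["[88.9, -10.5, 109]", "[84.0, -12.8, 105]"],
    ["[80, -4, 103]", "[1 2 3]"],
]
parsed, failures = parse_rows(demo_rows)
frames = drop_blank_frames(parsed)
print(failures)
```

On the DataFrame, the same `parse_joint` function could be applied per column with `dataset_master[col].map(parse_joint)`, after which the `failures` positions point to the raw cells worth inspecting.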