The goal is to build a database of false examples for training the algorithm on one chosen exercise, waterbag hard.
All the data provided are combined into a single table, and each repetition of an exercise is placed in its own row.
The main difficulty is resizing movements of different durations so that they all fit the same table layout. Within one exercise, movement lengths are fairly similar, so choosing a standard duration does not affect the result; using the same duration for all exercises, however, can be misleading. The chosen solution is to remove rows, or add rows (using forward-fill), at regular intervals until every movement fits a fixed length of 137 rows (i.e. 1.37 seconds), which is appropriate for the waterbag exercise.
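
As a minimal sketch of this resizing idea (not the code used further down), rows can be picked at evenly spaced positions. The function name `resize_movement`, the constant `TARGET_ROWS` and the toy data are assumptions made for illustration; the sketch also assumes a non-empty movement.

```python
import numpy as np
import pandas as pd

TARGET_ROWS = 137  # standard movement length used in this notebook


def resize_movement(movement, target_rows=TARGET_ROWS):
    """Return a copy of `movement` with exactly `target_rows` rows.

    Rows are picked at evenly spaced positions. A longer movement loses rows
    at regular intervals; a shorter one gets some rows repeated, which has the
    same effect as inserting empty rows and forward-filling them.
    """
    positions = np.linspace(0, len(movement) - 1, num=target_rows)
    positions = np.floor(positions).astype(int)
    return movement.iloc[positions].reset_index(drop=True)


# Toy data standing in for one sensor column (not taken from data1.csv)
long_mvt = pd.DataFrame({'gyroX': np.arange(300.0)})
short_mvt = pd.DataFrame({'gyroX': np.arange(50.0)})
print(resize_movement(long_mvt).shape, resize_movement(short_mvt).shape)
```

Both toy movements come out with 137 rows, whichever side of the standard length they started on.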

In [1]:
import pandas as pd
trackers=pd.read_csv('data1.csv')
trackers.head()

Unnamed: 0,gyroX,gyroY,gyroZ,lowAccelX,lowAccelY,lowAccelZ,highAccelX,highAccelY,highAccelZ,exercise
0,617.0,342.0,-1120.0,1338.0,2865.0,2215.0,-267.0,686.0,621.0,(jab_)cross_shadowboxing_medium_20_T7R_1531551...
1,477.0,-49.0,-1214.0,1516.0,3041.0,2332.0,-249.0,704.0,633.0,(jab_)cross_shadowboxing_medium_20_T7R_1531551...
2,228.0,-596.0,-1256.0,1602.0,3406.0,1796.0,-240.0,740.0,579.0,(jab_)cross_shadowboxing_medium_20_T7R_1531551...
3,24.0,-1098.0,-1276.0,1264.0,3484.0,1194.0,-274.0,748.0,519.0,(jab_)cross_shadowboxing_medium_20_T7R_1531551...
4,-164.0,-1552.0,-1236.0,1016.0,3876.0,576.0,-299.0,787.0,457.0,(jab_)cross_shadowboxing_medium_20_T7R_1531551...


In [2]:
#Use a margin of 68 rows before and after the maximum as the standard length, based on waterbag hard (68*2+1=137)
sensors_columns=list(trackers.columns)
sensors_columns.remove('exercise')
#One column name per sensor and time step, ordered by time step first (all sensors at step 0, then step 1, ...)
col_names=[col+str(n) for n in range(137) for col in sensors_columns]
flat_trackers = pd.DataFrame(columns=col_names)
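
To make the column layout concrete, here is a hedged miniature of how one movement could be flattened into a single row whose labels follow the same ordering as `col_names`. The names `demo_sensors`, `demo_steps` and `demo_cols` and the values are hypothetical; only two sensors and three time steps are used instead of 137.

```python
import pandas as pd

# Hypothetical miniature: 2 sensors and 3 time steps instead of 137
demo_sensors = ['gyroX', 'gyroY']
demo_steps = 3
demo_cols = [s + str(n) for n in range(demo_steps) for s in demo_sensors]

# One movement: one row per time step, one column per sensor (made-up values)
one_mvt = pd.DataFrame({'gyroX': [1, 2, 3], 'gyroY': [10, 20, 30]})

# Row-major flattening walks through time step 0 (all sensors), then step 1, ...
flat_values = one_mvt[demo_sensors].to_numpy().flatten()
flat_row = pd.Series(flat_values, index=demo_cols)
print(flat_row)
```

Because the flattening is row-major, the value labelled `gyroY1` is the `gyroY` reading at the second time step.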

In [3]:
#Find the maximum sensor value of each row (numeric columns only; 'exercise' is text)
trackers['max_value']=trackers.max(axis=1,numeric_only=True)
trackers.shape

(363327, 11)

In [4]:
trackers.dropna(inplace=True)

In [5]:
trackers.shape

(345729, 11)

In [6]:
#Count movements in each series from the series title (when the count is indicated)
trackers['repetitions']=trackers['exercise'].str.extract(r'(\d+)_T7', expand=False)
trackers['repetitions'] = pd.to_numeric(trackers['repetitions'],errors='coerce')
#For series of noise (no count in the title), estimate the repetitions as series length / 137 (our chosen standard length based on waterbag_hard)
trackers['length'] = trackers.groupby('exercise')['gyroX'].transform('count')
fill_value=(trackers['length']/137).round()
trackers['repetitions']=trackers['repetitions'].fillna(fill_value)
trackers.repetitions.unique()

array([ 20.,  10.,   8.,   9., 162.,  40.,  29., 114., 110.,  18.,  17.])

In [7]:
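#Create a list of exercise names
#Note: str.strip() removes any of the listed characters from both ends (not a literal prefix or suffix),
#so names may lose extra leading or trailing characters; they are only used with str.contains below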
import glob
list_files=glob.glob('./recordings+7.14+Luc/*.txt')
list_names=[s.strip('(jab_recordings+7.14+Luc/\\.txt').replace('-','_').replace(')','') for s in list_files]

In [8]:
#Basic visualization function
import matplotlib.pyplot as plt
%matplotlib inline

def pre_visualize(data,title):
    data.plot(figsize=(12, 6))
    plt.xlabel('time')
    plt.ylabel('acceleration')
    plt.title(title)
    plt.show()

The peak-based solution in the following cells (find_max and the loop after it) does not work very well. A demonstration of RapidMiner showed that, instead of using the local maximum both as a clue to the number of repetitions and as the midpoint of one movement, it would be more efficient to train the model to recognize the beginning of a pattern, using the first movement as a model. We propose trying to imitate this RapidMiner approach in another program in the future.
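
Separately from the peak-based code that follows, the idea of using the first movement as a model can be illustrated with a rough sketch. This is plain template matching by correlation, not RapidMiner's method and not a trained model. The function names, the `threshold` value and the synthetic signal are assumptions, and the signal is assumed to be at least as long as the template.

```python
import numpy as np


def correlation_scores(signal, template):
    """Pearson correlation between `template` and every window of `signal`."""
    n = len(template)
    scores = np.full(len(signal) - n + 1, -1.0)
    for pos in range(len(scores)):
        window = signal[pos:pos + n]
        if window.std() > 0:  # correlation is undefined for a flat window
            scores[pos] = np.corrcoef(window, template)[0, 1]
    return scores


def find_pattern_starts(signal, template, threshold=0.9):
    """Return start positions where `signal` closely resembles `template`."""
    signal = np.asarray(signal, dtype=float)
    template = np.asarray(template, dtype=float)
    n = len(template)
    scores = correlation_scores(signal, template)
    starts = []
    while scores.max() >= threshold:
        best = int(scores.argmax())
        starts.append(best)
        # Blank out overlapping windows so each movement is reported once
        scores[max(0, best - n + 1):best + n] = -1.0
    return sorted(starts)


# Synthetic signal: the same bump repeated three times with pauses in between
bump = np.sin(np.linspace(0, np.pi, 20))
pause = np.zeros(10)
signal = np.concatenate([bump, pause, bump, pause, bump])
print(find_pattern_starts(signal, template=bump))
```

On real tracker data the first movement would play the role of `bump`, and the threshold would need tuning.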

In [9]:
#Find local maximums & extract margin
def find_max(df):
    reps=int(df.repetitions.iloc[0]) #the repetition count is expected to be constant within one exercise
    #Select only the maximums corresponding to the number of repetitions
    maximums=df.max_value.sort_values(ascending=False)[0:reps] #imperfect solution when the maximum is reached slowly
    maximums=maximums.sort_index()
    #The margin is half the average distance between consecutive maximums (requires reps>1)
    margin=int((maximums.index[reps-1]-maximums.index[0])/(reps-1)/2)
    start=df.index[0]
    return maximums.index.values,margin,start
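
The inline comment in find_max notes that taking the largest values is imperfect when the maximum is reached slowly, because several of the top values can belong to the same peak. One possible workaround, sketched here under assumptions, is to require a minimum distance between the selected rows. The function `pick_separated_peaks` and its `min_gap` parameter are hypothetical and not part of the notebook; an integer row index is assumed.

```python
import pandas as pd


def pick_separated_peaks(values, reps, min_gap):
    """Pick up to `reps` largest values that are at least `min_gap` rows apart."""
    chosen = []
    for idx in values.sort_values(ascending=False).index:
        # Accept a candidate only if it is far enough from every earlier pick
        if all(abs(idx - other) >= min_gap for other in chosen):
            chosen.append(idx)
            if len(chosen) == reps:
                break
    return sorted(chosen)


# Toy series with two slow peaks, around rows 3 and 12 (values are made up)
toy = pd.Series([0, 5, 9, 10, 9, 5, 0, 0, 0, 0, 4, 7, 8, 7, 4, 0])
print(pick_separated_peaks(toy, reps=2, min_gap=5))
```

In the toy series the two largest values sit next to each other on the first peak, so a plain top-2 selection would miss the second movement.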

In [15]:
import numpy as np

for i in range(0,len(list_names))[0:10]:
    #Take each exercise one by one
    exercise_indices = trackers[trackers.exercise.str.contains(list_names[i])].index
    exercise=trackers.loc[exercise_indices, :]
    loc_maxs,margin,start=find_max(exercise)
    #Standardize the length of movements (1): intervals are built from a ratio of the mvt_length to 137
    mvt_length=2*margin+1 #rows from (maximum - margin) to (maximum + margin), inclusive
    if mvt_length>137:
        ratio=mvt_length//137 #integer ratio: it is used below as a slice width and a range step
    else:
        ratio=int(round(137/mvt_length,0))
    
    #Separate by individual movements 
    for j in loc_maxs:     
        exercise1=exercise.drop(['exercise','max_value','repetitions','length'],axis=1)
        one_mvt=exercise1[j-margin-start:j+1+margin-start]
          
        #Standardize the length of movements to 1.37 seconds (137 rows) (2)
        #create a custom-made list starting at the start of the df and jumping at regular intervals
        if mvt_length>137:
            new_index=[list(range(j-margin,j+137)[x:x+ratio]) for x in range(0,mvt_length,ratio+1)]
            new_index=np.array(new_index).flatten().tolist()
            print(len(new_index))
            #one_mvt=one_mvt.reindex(new_index)
            #print(one_mvt.shape)
        #if mvt_length<137:
            

#         one_mvt.reset_index(inplace=True)
#         one_mvt=one_mvt.drop('index',axis=1)
#         one_mvt=one_mvt.reindex(range(0,137))
#         a=pd.Series(one_mvt.values.flatten())
#         a=a.rename(index=lambda x:col_names[x]).T
#         flat_trackers=flat_trackers.append(a,ignore_index=True)

#   if i<3:
#       pre_visualize(one_mvt,list_names[i])

301
101
101
101
101
101
101
101
101
101
101
219
146
146
146
146
146
146
146
146
146
146


In [11]:
9984-137-150

9697

In [12]:
#Find movement of max length to determine the necessary padding to the average movement length
# print(trackers[trackers.repetitions<=40]['length'].max())
# print(trackers[trackers.exercise.str.contains('waterbag_hard_20_T7R_1531551')]['length'].unique())

In [13]:
#[k+ratio for k in range(j-margin,j+137,ratio)]
[list(range(0,20)[x:x+2]) for x in range(0,20,3)]

[[0, 1], [3, 4], [6, 7], [9, 10], [12, 13], [15, 16], [18, 19]]

In [14]:
import numpy as np
flatten_list=np.array(_).flatten().tolist()
flatten_list

[0, 1, 3, 4, 6, 7, 9, 10, 12, 13, 15, 16, 18, 19]
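
The two scratch cells above (chunking, then flattening) could be folded into one helper, sketched below. The name `keep_then_skip` and its parameters are hypothetical. Building a flat list directly also avoids passing chunks of unequal length to `np.array`, which can happen when the last chunk is shorter than the others.

```python
def keep_then_skip(indices, keep, skip=1):
    """Keep `keep` consecutive items, drop the next `skip`, and repeat."""
    indices = list(indices)
    step = keep + skip
    return [idx for pos, idx in enumerate(indices) if pos % step < keep]


# Same pattern as the scratch cell: keep two indices, skip one
print(keep_then_skip(range(0, 20), keep=2))
```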

In [17]:
import random

def remove(l,n):
    return random.sample(l,int(len(l)*(1-n)))

print(remove(list(range(1,11)),0.25))

[10, 2, 3, 4, 8, 5, 7]
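
The output above shows that `random.sample` returns the surviving items in random order, which would scramble a time series. A hedged variant that samples positions and then restores the original order might look like this; `remove_keep_order` and its `seed` parameter are assumptions for illustration.

```python
import random


def remove_keep_order(items, fraction, seed=None):
    """Drop roughly `fraction` of `items` at random, keeping the original order."""
    rng = random.Random(seed)
    n_keep = int(len(items) * (1 - fraction))
    # Sample positions rather than values, then sort them to preserve order
    kept_positions = sorted(rng.sample(range(len(items)), n_keep))
    return [items[pos] for pos in kept_positions]


print(remove_keep_order(list(range(1, 11)), 0.25, seed=0))
```

The number of surviving items is the same as with `remove`; only their order differs.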
