# Generating Data Set for Neural Network

1) After pre-processing and data augmentation, we merge the data from all participants into a single data frame that serves as training data for the CNN model.<br>

2) Data from the finger orientation study (https://doi.org/10.1145/3132272.3134130) is also added to enlarge the data set.<br>

3) After merging, we pad each image with zeros to a 27 x 15 matrix, resembling the screen size of the smartphone, and re-label the data so that it contains only the labels 'finger' and 'phalanx' for binary classification.

In [1]:
import pandas as pd
import numpy as np
import time

In [3]:
# Merge the augmented data of all 25 participants into one training data set.
# Frames are collected in a list and concatenated once, which avoids
# re-copying the growing data frame on every iteration.
frames = []

for i in range(1, 26):
    df = pd.read_pickle('/home/rahul/Documents/phalanx_detection/pre_processed_aug/Participant_' + str(i) + '.pkl')
    print('Number of Images for Participant ' + str(i) + ' are ' + str(df.shape[0]) + '.')
    frames.append(df)

df_train = pd.concat(frames, ignore_index=True)
print('Total number of images in training set are ' + str(len(df_train)))

Number of Images for Participant 1 are 31792.
Number of Images for Participant 2 are 42700.
Number of Images for Participant 3 are 37532.
Number of Images for Participant 4 are 46896.
Number of Images for Participant 5 are 53932.
Number of Images for Participant 6 are 66296.
Number of Images for Participant 7 are 66832.
Number of Images for Participant 8 are 48152.
Number of Images for Participant 9 are 52336.
Number of Images for Participant 10 are 47464.
Number of Images for Participant 11 are 48232.
Number of Images for Participant 12 are 45632.
Number of Images for Participant 13 are 52848.
Number of Images for Participant 14 are 48340.
Number of Images for Participant 15 are 54680.
Number of Images for Participant 16 are 59520.
Number of Images for Participant 17 are 52348.
Number of Images for Participant 18 are 51128.
Number of Images for Participant 19 are 51988.
Number of Images for Participant 20 are 36720.
Number of Images for Participant 21 are 52700.
Number of Images for P

In [4]:
# Also include the data set from the finger orientation study.

df_orientation = pd.read_pickle('/home/rahul/Documents/phalanx_detection/pre_processed_aug/orientationdata.pkl')
df_train = pd.concat([df_train, df_orientation], ignore_index=True, sort=True)
print('Total number of images in training set including orientation data are ' + str(len(df_train)))

Total number of images in training set including orientation data are 1764902
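
The call above passes `sort=True`, which matters when the two frames do not share the same columns. The following is a minimal sketch of that behaviour on toy data; the `Angle` column and all values are hypothetical and only stand in for whatever extra columns the orientation data may carry.

```python
import pandas as pd

# Toy frames with different column sets (hypothetical data).
participants = pd.DataFrame({'Task': ['TAP'], 'Cropped_Matrix': [1]})
orientation = pd.DataFrame({'Task': ['DRAG'], 'Angle': [30.0], 'Cropped_Matrix': [2]})

# sort=True orders the union of the columns alphabetically;
# cells that a frame does not provide are filled with NaN.
merged = pd.concat([participants, orientation], ignore_index=True, sort=True)

print(list(merged.columns))
print(merged['Angle'].isna().tolist())
```

Rows from the frame without the extra column end up with `NaN` there, so it can be worth inspecting `df_train.isna().sum()` after the merge.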


In [5]:
%%time
# Zero-pad each cropped matrix into a full 27x15 image (top-left aligned).
for i in range(df_train.shape[0]):
    full_matrix = np.zeros(shape=(27, 15))
    cropped_matrix = df_train.Cropped_Matrix[i]
    x,y = cropped_matrix.shape
    full_matrix[:x, :y] = cropped_matrix
    df_train.at[i, 'Cropped_Matrix'] = full_matrix.astype(np.int32)

CPU times: user 2min 18s, sys: 1.84 s, total: 2min 19s
Wall time: 2min 17s
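
The padding step can also be wrapped in a small helper, which makes it easy to test on a toy matrix. This is a sketch under the same assumption as the loop above (the cropped matrix is placed in the top-left corner); the function name `pad_to_screen` and the explicit size check are additions for illustration.

```python
import numpy as np

def pad_to_screen(cropped, rows=27, cols=15):
    """Place `cropped` in the top-left corner of a zero-filled rows x cols matrix."""
    cropped = np.asarray(cropped)
    x, y = cropped.shape
    if x > rows or y > cols:
        # Fail with a clear message instead of a broadcasting error.
        raise ValueError(f'cropped matrix {x}x{y} exceeds {rows}x{cols}')
    full = np.zeros((rows, cols), dtype=np.int32)
    full[:x, :y] = cropped
    return full

# Toy 3x2 blob (hypothetical values).
toy = np.array([[1, 2], [3, 4], [5, 6]])
padded = pad_to_screen(toy)
print(padded.shape, padded.dtype, int(padded.sum()))
```

With such a helper the whole column could be converted with `df_train['Cropped_Matrix'].apply(pad_to_screen)` instead of an index-based loop.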


In [7]:
%%time
# Delete images captured while the task was changing. They are labelled PAUSE
# and would otherwise end up with a wrong class label.
for i in range(df_train.shape[0]):
    if df_train.Task[i] == 'PAUSE':
        print(i, df_train.Task[i])
        df_train.drop(i, inplace=True)

229241 PAUSE
245815 PAUSE
262389 PAUSE
278963 PAUSE
285255 PAUSE
301963 PAUSE
318671 PAUSE
335379 PAUSE
996562 PAUSE
1009737 PAUSE
1022912 PAUSE
1036087 PAUSE
CPU times: user 56.8 s, sys: 242 ms, total: 57.1 s
Wall time: 50 s
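
Dropping rows one by one is slow on a frame of this size. A boolean mask gives the same result in a single pass; the sketch below shows the idea on toy data (the frame contents are hypothetical).

```python
import pandas as pd

# Toy frame standing in for df_train (hypothetical data).
toy = pd.DataFrame({'Task': ['TAP', 'PAUSE', 'PHALANX_DRAG', 'PAUSE', 'SCROLL']})

# Mark every PAUSE row, report the affected index labels, then keep the rest.
pause_mask = toy['Task'] == 'PAUSE'
print(toy.index[pause_mask].tolist())
filtered = toy[~pause_mask]

print(filtered.index.tolist())
```

As with `drop`, the remaining rows keep their original index labels. Call `reset_index(drop=True)` afterwards if a gap-free index is needed.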


In [11]:
%%time
# Rename the columns and collapse the six task labels into the two classes
# 'finger' and 'phalanx'.

df_train.rename(columns={'Task': 'Label', 'Cropped_Matrix': 'Input'}, inplace=True)
df_train['Label'] = df_train['Label'].replace({'DRAG': 'finger',
                                               'TAP': 'finger',
                                               'SCROLL': 'finger',
                                               'PHALANX_SCROLL': 'phalanx',
                                               'PHALANX_TAP': 'phalanx',
                                               'PHALANX_DRAG': 'phalanx',
                                               })

CPU times: user 1.09 s, sys: 4.03 ms, total: 1.09 s
Wall time: 557 ms
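
`replace` silently leaves any label that is not in the dictionary unchanged. A possible variant, sketched here on toy data, uses `Series.map`, which turns unknown labels into `NaN` so they can be detected before training. The toy frame and the `HOVER` label are hypothetical.

```python
import pandas as pd

LABEL_MAP = {
    'DRAG': 'finger', 'TAP': 'finger', 'SCROLL': 'finger',
    'PHALANX_SCROLL': 'phalanx', 'PHALANX_TAP': 'phalanx', 'PHALANX_DRAG': 'phalanx',
}

# Toy frame standing in for df_train (hypothetical data).
toy = pd.DataFrame({'Label': ['TAP', 'PHALANX_TAP', 'DRAG', 'PHALANX_SCROLL']})

# map() yields NaN for every label missing from LABEL_MAP.
binary = toy['Label'].map(LABEL_MAP)
unmapped = toy.loc[binary.isna(), 'Label'].unique().tolist()
if unmapped:
    raise ValueError(f'labels without a class: {unmapped}')
toy['Label'] = binary

# An unknown label (hypothetical 'HOVER') is flagged rather than passed through.
unknown_flagged = pd.Series(['HOVER']).map(LABEL_MAP).isna().all()
print(toy['Label'].tolist(), unknown_flagged)
```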


In [13]:
# Number of samples for each class
df_train['Label'].value_counts()

finger     1073386
phalanx     691504
Name: Label, dtype: int64
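
As a quick sanity check, the class counts should add up to the merged total minus the 12 PAUSE rows that were dropped. The numbers below are copied from the outputs above; only the variable names are new.

```python
# Figures taken from the cell outputs in this notebook.
total_after_merge = 1764902   # including orientation data
pause_rows_dropped = 12       # rows printed by the PAUSE cell
finger = 1073386
phalanx = 691504

remaining = total_after_merge - pause_rows_dropped
assert finger + phalanx == remaining

# Share of each class, useful for judging class imbalance.
finger_share = finger / remaining
phalanx_share = phalanx / remaining
print(f'finger: {finger_share:.1%}, phalanx: {phalanx_share:.1%}')
```

The split is roughly 61% finger to 39% phalanx, so the two classes are moderately imbalanced.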

In [16]:
# save training data in .pkl
df_train.to_pickle('/home/rahul/Documents/phalanx_detection/training_data.pkl')
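
Before training on the saved file, it can help to confirm that a pickle round trip preserves the matrices. The sketch below does this with a two-row toy frame written to a temporary directory; the frame contents and the file name `toy_training_data.pkl` are hypothetical.

```python
import os
import tempfile

import numpy as np
import pandas as pd

# Toy frame with the same column layout as the training data (hypothetical values).
toy = pd.DataFrame({
    'Label': ['finger', 'phalanx'],
    'Input': [np.zeros((27, 15), dtype=np.int32), np.ones((27, 15), dtype=np.int32)],
})

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'toy_training_data.pkl')
    toy.to_pickle(path)
    restored = pd.read_pickle(path)

# Every restored image should still be a 27x15 int32 matrix.
shapes_ok = all(m.shape == (27, 15) and m.dtype == np.int32 for m in restored['Input'])
print(shapes_ok, restored['Label'].tolist())
```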