# Dividing of Data

## Abstract
Below is the code used to divide the data set into `Training`, `Cross-Validation (CV)` and `Testing` portions. The data is split by subject (`Users` = `subjects`). The data set contains data from 14 users, with IDs 0 to 14 and no `User ID 3`.

This enables us to determine whether our system only works for the users in the training set or whether it is able to generalize to new, unseen users (which is the ideal case).
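
The idea of splitting by subject can be sketched on a toy table. This is a minimal illustration, not the notebook's own code: the frame `toy` and the user lists are hypothetical stand-ins, chosen only to show that whole users (not individual rows) are assigned to a portion.

```python
import pandas as pd

# Toy stand-in for the posture data: two rows per user (hypothetical values).
toy = pd.DataFrame({
    "User": [0, 0, 1, 1, 4, 4, 7, 7],
    "Class": [1, 2, 1, 2, 1, 2, 1, 2],
})

# Hypothetical assignment of whole users to each portion.
train_users = [0, 1]
cv_users = [4]
test_users = [7]

train_part = toy[toy["User"].isin(train_users)]
cv_part = toy[toy["User"].isin(cv_users)]
test_part = toy[toy["User"].isin(test_users)]

# No user may appear in more than one portion.
assert set(train_users).isdisjoint(cv_users)
assert set(train_users).isdisjoint(test_users)
assert set(cv_users).isdisjoint(test_users)

# Every row lands in exactly one portion.
assert len(train_part) + len(cv_part) + len(test_part) == len(toy)
```

Because no user is shared between portions, performance on the CV and testing portions reflects behaviour on people the model has never seen.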

## Libraries

`import pandas as pd` - Accesses the Pandas library. It takes data from files such as CSV or TSV and creates a Python object with rows and columns, otherwise known as a DataFrame.

`import numpy as np` - NumPy’s main object is the homogeneous multidimensional array. It is a table of elements (usually numbers), all of the same type, indexed by a tuple of non-negative integers.
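
As a small illustration of that description (the array values are arbitrary), every element shares one `dtype`, and an element is addressed by a tuple of indices:

```python
import numpy as np

# A 2 x 3 array; all elements share a single dtype.
a = np.array([[1, 2, 3], [4, 5, 6]])

print(a.shape)    # (2, 3)
print(a.dtype)    # one integer dtype for the whole array

# Indexing with a tuple: row 1, column 2.
print(a[(1, 2)])  # same element as a[1, 2]
```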

`import matplotlib.pyplot as plt` - Matplotlib.pyplot is a collection of command style functions that make matplotlib work like MATLAB. Each pyplot function makes some change to a figure, e.g., creates a figure, creates a plotting area in a figure, plots some lines in a plotting area, decorates the plot with labels.

In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline

## Reading in the file "Posture_new"

`pd.read_csv()` reads a comma-separated values (CSV) file into a DataFrame.

In [2]:
final_data = pd.read_csv("Posture_new.csv")

`.head()` displays the first rows of `Posture_new.csv` to check that it is indeed the correct data that has been read in.

In [3]:
final_data.head(5)

Unnamed: 0,User,X0,Y0,Z0,X1,Y1,Z1,X2,Y2,Z2,X3,Y3,Z3,X4,Y4,Z4,Class
0,0,54.26388,71.466776,-64.807709,76.895635,42.4625,-72.780545,36.621229,81.680557,-52.919272,85.232264,67.749219,-73.68413,59.188576,10.678936,-71.297781,1
1,0,56.527558,72.266609,-61.935252,39.135978,82.53853,-49.596509,79.223743,43.254091,-69.982489,87.450873,68.400808,-70.703991,61.587452,11.779919,-68.827418,1
2,0,55.849928,72.469064,-62.562788,37.988804,82.631347,-50.606259,78.451526,43.567403,-70.658489,86.835388,68.907925,-71.138344,61.686427,11.79344,-68.889316,1
3,0,55.329647,71.707275,-63.688956,36.561863,81.868749,-52.752784,86.32063,68.214645,-72.228461,61.596157,11.250648,-68.956425,77.387225,42.717833,-72.015146,1
4,0,55.142401,71.435607,-64.177303,36.175818,81.556874,-53.475747,76.986143,42.426849,-72.574743,86.368748,67.90126,-72.44465,61.275402,10.841109,-69.279906,1
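
The `Unnamed: 0` column in the output above is what pandas shows when the first header field of a CSV is empty, which usually means the file was saved together with its row index. The sketch below uses a two-row, in-memory stand-in for the file (the values are shortened and hypothetical) to show how `index_col=0` would read that column as the index instead.

```python
import io
import pandas as pd

# Two-row stand-in for Posture_new.csv; note the empty first header field.
csv_text = ",User,X0,Class\n0,0,54.26,1\n1,0,56.52,1\n"

# Default read: the unnamed first column becomes a data column.
plain = pd.read_csv(io.StringIO(csv_text))

# With index_col=0 the first column is used as the row index instead.
indexed = pd.read_csv(io.StringIO(csv_text), index_col=0)

print(list(plain.columns))    # includes 'Unnamed: 0'
print(list(indexed.columns))  # only the data columns
```

Whether this is wanted here depends on how `Posture_new.csv` was produced; the rest of this notebook selects columns by name (`'X0':'Z4'`), so the extra column does not affect the scaling.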


## Scaling of data

In [4]:
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

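def getScaledX(X):
    # Fit a StandardScaler on X: each column is shifted to zero mean
    # and scaled to unit variance.
    scaler = StandardScaler().fit(X)
    X_scaled = scaler.transform(X)

    # Return the fitted scaler as well, so it can be reused on other data.
    return X_scaled, scaler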
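
To make clear what `StandardScaler` computes, the following sketch compares it with the same operation written by hand on a small, made-up array. It standardizes each column as `(x - mean) / std`, using the population standard deviation.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Small made-up feature matrix: 3 samples, 2 features.
X = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 60.0]])

scaler = StandardScaler().fit(X)
X_scaled = scaler.transform(X)

# The same result by hand: subtract the column mean and divide by the
# column standard deviation (population std, ddof=0).
manual = (X - X.mean(axis=0)) / X.std(axis=0)

assert np.allclose(X_scaled, manual)
print(X_scaled.mean(axis=0))  # approximately 0 for each column
print(X_scaled.std(axis=0))   # approximately 1 for each column
```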

## Splitting Data

In [5]:
user_groups = pd.DataFrame(final_data)

The following user groups will be used in the training data: `0`, `1`, `2`, `8`, `10`, `11` and `13`.

In [6]:
user_groups0 = user_groups[final_data.User == 0]
user_groups1 = user_groups[final_data.User == 1]
user_groups2 = user_groups[final_data.User == 2]
user_groups8 = user_groups[final_data.User == 8]
user_groups10 = user_groups[final_data.User == 10]
user_groups11 = user_groups[final_data.User == 11]
user_groups13 = user_groups[final_data.User == 13]

train = [user_groups0,user_groups1,user_groups2,user_groups8,user_groups10,user_groups11,user_groups13]

The following user groups will be used in the cross-validation data: `4`, `5` and `6`.

In [7]:
user_groups4 = user_groups[final_data.User == 4]
user_groups5 = user_groups[final_data.User == 5]
user_groups6 = user_groups[final_data.User == 6]

cv = [user_groups4,user_groups5,user_groups6]

The following user groups will be used in the testing data: `7`, `9`, `12` and `14`.

In [8]:
user_groups7 = user_groups[final_data.User == 7]
user_groups9 = user_groups[final_data.User == 9]
user_groups12 = user_groups[final_data.User == 12]
user_groups14 = user_groups[final_data.User == 14]

test = [user_groups7,user_groups9,user_groups12,user_groups14]
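
The per-user filters above could also be written more compactly with `Series.isin`. The sketch below is one possible variant, not the notebook's code: `toy` is a hypothetical stand-in for `final_data` with one row per user, and `split_users` and `parts` are names introduced only for this example. The user IDs are the ones listed in the text.

```python
import pandas as pd

# Toy frame with one row per user ID (0 to 14, without 3).
user_ids = [u for u in range(15) if u != 3]
toy = pd.DataFrame({"User": user_ids, "X0": [float(u) for u in user_ids]})

# User IDs per portion, as listed in the text.
split_users = {
    "train": [0, 1, 2, 8, 10, 11, 13],
    "cv": [4, 5, 6],
    "test": [7, 9, 12, 14],
}

# One boolean mask per portion replaces the per-user filters and pd.concat.
parts = {name: toy[toy["User"].isin(ids)] for name, ids in split_users.items()}

for name, part in parts.items():
    print(name, sorted(part["User"].unique()))
```

One difference to keep in mind: `isin` keeps rows in their original file order, whereas concatenating per-user frames orders the rows by the order of the list.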

## Final Step 

In [9]:
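# Combine the training users into one DataFrame.
result_train = pd.concat(train)

# Scale the 15 coordinate columns (X0 to Z4); the other columns are left untouched.
Sliced_X = result_train.loc[:, 'X0':'Z4']
X_scaled, scalerT = getScaledX(Sliced_X)
result_train.loc[:, 'X0':'Z4'] = X_scaled

# Same steps for the cross-validation users. Note that getScaledX fits a new
# scaler on this portion, and scalerT is overwritten.
result_cv = pd.concat(cv)

Sliced_Xcv = result_cv.loc[:, 'X0':'Z4']
Xcv_scaled, scalerT = getScaledX(Sliced_Xcv)
result_cv.loc[:, 'X0':'Z4'] = Xcv_scaled

# Same steps for the testing users, again with a newly fitted scaler.
result_test = pd.concat(test)

Sliced_Xtest = result_test.loc[:, 'X0':'Z4']
Xtest_scaled, scalerT = getScaledX(Sliced_Xtest)
result_test.loc[:, 'X0':'Z4'] = Xtest_scaled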
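
The cell above fits a separate `StandardScaler` on each portion, so the three portions are standardized with different means and variances. A commonly used alternative is to fit the scaler on the training portion only and reuse it for the other two, so that all data passes through the same transformation. The sketch below shows that variant on random stand-in data; the frames, the shortened column list and the random values are assumptions made for the example, not part of the original notebook.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
cols = ["X0", "Y0", "Z0"]  # shortened, hypothetical feature list

# Random stand-ins for the three portions.
train_df = pd.DataFrame(rng.normal(50, 10, size=(20, 3)), columns=cols)
cv_df = pd.DataFrame(rng.normal(55, 12, size=(10, 3)), columns=cols)
test_df = pd.DataFrame(rng.normal(45, 8, size=(10, 3)), columns=cols)

# Fit once, on the training features only.
scaler = StandardScaler().fit(train_df[cols])

# Reuse the same mean and scale for every portion.
train_df[cols] = scaler.transform(train_df[cols])
cv_df[cols] = scaler.transform(cv_df[cols])
test_df[cols] = scaler.transform(test_df[cols])

print(train_df[cols].mean().round(6).tolist())  # close to 0 by construction
print(cv_df[cols].mean().round(3).tolist())     # generally not exactly 0
```

With this variant the CV and testing columns are generally not exactly zero-mean, which is expected: they are measured relative to the training statistics.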

In [10]:
training_set = pd.DataFrame(result_train)
training_set.to_csv('train_data.csv')

crossval_set = pd.DataFrame(result_cv) 
crossval_set.to_csv('cv_data.csv')

testing_set = pd.DataFrame(result_test) 
testing_set.to_csv('test_data.csv')
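
By default, `to_csv` also writes the row index as a first, unnamed column, which then shows up as `Unnamed: 0` when the file is read back. If that is not wanted, `index=False` can be passed. The following sketch uses in-memory buffers and a made-up two-row frame to show the difference.

```python
import io
import pandas as pd

# Made-up frame with a non-default index.
df = pd.DataFrame(
    {"User": [0, 0], "X0": [0.5, -0.5], "Class": [1, 1]},
    index=[10, 11],
)

# Default: the index is written as an unnamed first column.
with_index = io.StringIO()
df.to_csv(with_index)

# index=False: only the data columns are written.
without_index = io.StringIO()
df.to_csv(without_index, index=False)

with_index.seek(0)
without_index.seek(0)
cols_with = list(pd.read_csv(with_index).columns)
cols_without = list(pd.read_csv(without_index).columns)

print(cols_with)     # starts with 'Unnamed: 0'
print(cols_without)  # ['User', 'X0', 'Class']
```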