In [None]:
# Question 3 Project

Wearable sensors have become increasingly popular over the last few years with the success of smartphones, fitness trackers, and smartwatches. Current generations of phones even contain special low-power cores that constantly monitor movement. This allows them to act as pedometers, for example, with a negligible impact on battery life. Other use cases exist as well. For example, Android phones can be set to stay unlocked until they detect that they haven't moved for some time. The newest version of Android even puts the phone into deep sleep to save power if it has not moved for a set period of time. I feel this is just the tip of the iceberg, and that the same data can be applied to many other applications.

I propose that we apply this data to advertising. For example, I show that using accelerometer data alone it is possible to differentiate between distracted states, such as standing while talking and walking while talking, and undistracted states, such as standing, walking, and working at the computer. I argue that a person is more likely to interact with an advertisement if there is less competing stimulus from other people. This would allow a company to sell ads at a higher price when the person's phone reports them as not being distracted.
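The grouping described above can be sketched as a simple lookup. This is only an illustration: the names `DISTRACTED_LABELS`, `FOCUSED_LABELS`, and `is_distracted` are hypothetical, and the numeric labels assume the numbering used in the `labels` dictionary of the analysis code (6 and 7 for the talking states; 1, 3, and 4 for working at the computer, standing, and walking).

```python
# Hypothetical grouping of activity labels into "distracted" or not.
# Label numbers are assumed to match the labels dictionary in the analysis code.
DISTRACTED_LABELS = {6, 7}   # walking while talking, standing while talking
FOCUSED_LABELS = {1, 3, 4}   # working at computer, standing, walking


def is_distracted(label):
    """Return True or False for grouped labels, None for labels in neither group."""
    if label in DISTRACTED_LABELS:
        return True
    if label in FOCUSED_LABELS:
        return False
    return None  # labels 2 and 5 are not assigned to either group in the proposal


# Example: a short sequence of predicted labels mapped to distraction states
states = [is_distracted(label) for label in [1, 6, 3, 7, 5]]
```

Leaving labels 2 and 5 unassigned is a choice made for this sketch; the proposal does not say how they should be treated.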

To show the feasibility, I use sensor data from UCI that contains only the raw x, y, z accelerometer data. The data comes from 15 subjects and is labeled with 7 categories: Working at Computer; Standing Up, Walking and Going Up/Down Stairs; Standing; Walking; Going Up/Down Stairs; Walking while Talking; and Standing while Talking.

Figure 1 ![Figure 1](https://raw.githubusercontent.com/damienrj/data_incubator/master/data.png) shows an example of the data, where I calculate the magnitude of the acceleration vs. time. The plot is color coded to indicate the labeled state. I then performed feature engineering to generate more features, including low-pass and high-pass filtered versions of the data to highlight different behaviors. I also used a rolling average to calculate the average acceleration within a one-second window. The data was then split into training and testing sets. Applying a random forest classifier with k-fold cross validation gave a mean score of 93% on the training data. On the test data, I also achieved 93%. Figure 2 ![Matrix](https://raw.githubusercontent.com/damienrj/data_incubator/master/matrix.png) is the confusion matrix for the test data, showing how well the model predicts each label.
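To make the feature engineering step concrete, here is a minimal sketch on a made-up signal rather than the UCI data. It assumes the 52 Hz sampling rate and the same second-order Butterworth cutoffs (0.1 and 0.9 of the Nyquist frequency) used in the analysis code; the synthetic 1 Hz and 25 Hz components and the column names are invented for illustration.

```python
import numpy as np
import pandas as pd
from scipy import signal

fs = 52                        # accelerometer sampling frequency in Hz
t = np.arange(0, 10, 1 / fs)   # ten seconds of samples

# Made-up test signal: a slow 1 Hz sway plus a fast 25 Hz vibration
slow = np.sin(2 * np.pi * 1 * t)
fast = 0.5 * np.sin(2 * np.pi * 25 * t)
acc = pd.Series(slow + fast)

# Second-order Butterworth filters; cutoffs are fractions of the 26 Hz Nyquist frequency
b_low, a_low = signal.butter(2, 0.1)             # low pass, about 2.6 Hz
b_high, a_high = signal.butter(2, 0.9, 'high')   # high pass, about 23.4 Hz

features = pd.DataFrame({
    'acc': acc,
    'acc_filt': signal.filtfilt(b_low, a_low, acc),       # keeps the slow sway
    'acc_filt_h': signal.filtfilt(b_high, a_high, acc),   # keeps the fast vibration
    'mean_1s': acc.rolling(fs).mean(),                    # one-second rolling average
})
```

`filtfilt` runs each filter forward and backward, so the filtered columns stay aligned in time with the raw signal. The rolling average is undefined for the first `fs - 1` samples, which is why such rows have to be dropped before training.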

This work shows that it is possible to differentiate between states using only one accelerometer. Most smartphones have many more sensors available, which should make it possible to achieve even higher accuracy.
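One way to check how well each individual state is separated is to read the diagonal of a row-normalized confusion matrix, which gives the fraction of each true label that was predicted correctly. The sketch below uses made-up labels for three states, not the study's results, and only illustrates the normalization.

```python
import numpy as np
from sklearn.metrics import confusion_matrix

# Made-up true and predicted labels for three states, purely for illustration
y_true = [1, 1, 1, 1, 2, 2, 3, 3, 3, 3]
y_pred = [1, 1, 1, 2, 2, 2, 3, 3, 1, 3]

cm = confusion_matrix(y_true, y_pred, labels=[1, 2, 3])

# Divide each row by its total so that every row sums to 1.
# The [:, np.newaxis] makes the row totals a column vector for broadcasting.
cm_normalized = cm.astype(float) / cm.sum(axis=1)[:, np.newaxis]

# Diagonal entries are the per-label fraction of correct predictions
per_label_accuracy = np.diag(cm_normalized)
```

Without the `[:, np.newaxis]`, NumPy would divide each column by the row totals instead, which silently gives the wrong matrix whenever the classes differ in size.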




In [463]:
from __future__ import division
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.ensemble import RandomForestClassifier
#sklearn.cross_validation was replaced by sklearn.model_selection in newer scikit-learn
from sklearn.model_selection import train_test_split
from sklearn.model_selection import KFold
from scipy import signal
from sklearn.metrics import confusion_matrix
import seaborn as sns
#Sampling frequency of the accelerometer: 52 Hz 
labels={1: 'Working at Computer', 2:'Standing Up, Walking, and Going Up/Down Stairs',
        3:'Standing', 4:'Walking', 5:'Going Up/Down Stairs', 6: 'Walking and Talking', 7:'Standing and Talking'}

# Butterworth low pass filter
N  = 2    # Filter order
Wn = .1 # Cutoff as a fraction of the 26 Hz Nyquist frequency: 2.6 Hz
B, A = signal.butter(N, Wn)

#High pass
Wn = .9 # Cutoff as a fraction of the 26 Hz Nyquist frequency: 23.4 Hz
Bh, Ah = signal.butter(N, Wn, 'high')

data=[]
for a in range(1,16):
    df = pd.read_csv('ActivityData/' + str(a) + '.csv' , header=None)
    df.drop(0, axis=1, inplace=True)
    df.columns = ['Accx', 'Accy', 'Accz', 'Label']
    df['ID'] = a
    df['Accx_filt'] = signal.filtfilt(B,A, df.Accx)
    df['Accy_filt'] = signal.filtfilt(B,A, df.Accy)
    df['Accz_filt'] = signal.filtfilt(B,A, df.Accz)
    df['Accx_filt_h'] = signal.filtfilt(Bh,Ah, df.Accx)
    df['Accy_filt_h'] = signal.filtfilt(Bh,Ah, df.Accy)
    df['Accz_filt_h'] = signal.filtfilt(Bh,Ah, df.Accz)
    #One second (52 sample) rolling mean; pd.rolling_mean was removed from pandas
    df['Mean_x'] = df.Accx.rolling(52).mean()
    df['Mean_y'] = df.Accy.rolling(52).mean()
    df['Mean_z'] = df.Accz.rolling(52).mean()
    if len(data)==0:
        data = df
    else:
        data = pd.concat([data, df])
        
    
data = data[~(data.Label==0)]
data['Mag']= np.sqrt(data.Accx**2 + data.Accy**2 + data.Accz**2)

#Remove the leading rows of each subject that have no rolling average (~50 rows per subject)
data=data[~data.Mean_x.isnull()]

features = ['Mag', 'Accx', 'Accy', 'Accz', 'ID', 'Accx_filt',
            'Accy_filt', 'Accz_filt', 'Accx_filt_h', 'Accy_filt_h',
            'Accz_filt_h', 'Mean_x', 'Mean_y', 'Mean_z']

#Split into training and testing data
features_train, features_test, labels_train, labels_test = train_test_split(data[features], data['Label'])

#Random forest used for both cross validation and the final model
rf = RandomForestClassifier(n_estimators=20, n_jobs=-1, verbose=1, min_samples_leaf=3, oob_score=True)

#Simple K-Fold cross validation. 10 folds.
#KFold from sklearn.model_selection takes n_splits and yields row positions
from sklearn.model_selection import KFold
cv = KFold(n_splits=10)

#iterate through the training and test cross validation segments and
#run the classifier on each one, aggregating the results into a list
results = []
for traincv, testcv in cv.split(features_train):
    rf.fit(features_train.iloc[traincv], labels_train.iloc[traincv])
    results.append(rf.score(features_train.iloc[testcv], labels_train.iloc[testcv]))

print('Mean score ' + str(np.mean(results)))

#Train model on whole training set
rf.fit(features_train, labels_train)

print('Test score ' + str(rf.score(features_test,labels_test)))
#Generate confusion_matrix
con_matrix = confusion_matrix(labels_test, rf.predict(features_test))
#Normalize each row (true label) so that it sums to 1
cm_normalized = con_matrix.astype('float') / con_matrix.sum(axis=1)[:, np.newaxis]


%matplotlib 

plt.imshow(cm_normalized, interpolation='nearest', cmap=plt.cm.Blues)

plt.title('Confusion Matrix')
plt.ylabel('True label')
plt.xlabel('Predicted label')
plt.colorbar()
plt.xticks([0,1,2,3,4,5,6], [1,2,3,4,5,6,7])
plt.yticks([0,1,2,3,4,5,6], [1,2,3,4,5,6,7])
plt.grid(False)
plt.savefig("matrix.png")

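#Make plot for user 12
#Copy so that normalizing Mag does not touch the full data set
user = data[data.ID==12].copy()
user['Mag'] -= user['Mag'].min()
user['Mag'] /= user['Mag'].max()
#Start a new figure so the traces are not drawn on top of the confusion matrix
plt.figure()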
legend_text=[]

t = np.arange(0, len(user.Label))/52
sns.set_palette(sns.color_palette("Set2", 8))
for a in range(1,8):
    plt.plot(t[np.array(user.Label==a)], user.Mag[user.Label==a], linewidth = 1)
    legend_text.append(labels[a])

plt.legend(legend_text, loc='best')
plt.xlabel('Time(s)')
plt.ylabel('Acceleration Magnitude (normalized)')
plt.title('Subject 12 Data')
plt.xlim(xmax=max(t))
plt.savefig("data.png")

Using matplotlib backend: MacOSX
