
<img src="https://www.tibco.com/blog/wp-content/uploads/2015/08/tibco-logo-620x360.jpg" style="float: left; width: 30%; margin-right: 1%; margin-bottom: 0.5em;"><img src="https://www.skylinelabs.in/blog/images/tensorflow.jpg" style="float: center; width: 40%; margin-right: 1%; margin-bottom: 0.5em;">
<p style="clear: both;">

# Human Activity Detection using TensorFlow and Flogo

Author : Venkata Jagannath, Data Scientist, TIBCO Software

Date : October 17, 2017



In [1]:
import IPython

## Deep neural networks and TensorFlow

### Introduction to neural networks

Neural networks are the building blocks of deep learning, a specialized branch of machine learning. These models are used in supervised and unsupervised learning tasks to find non-linear relationships among a set of variables.
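To make "non-linear" concrete, here is a minimal sketch of a network with one hidden layer computing XOR, a pattern that no purely linear model can reproduce. The weights are hand-picked for illustration rather than learned, and the names `relu`, `W1`, `b1` and `W2` are our own.

```python
import numpy as np

def relu(z):
    # Rectified linear unit: the non-linearity applied in the hidden layer
    return np.maximum(z, 0)

# All four combinations of two binary inputs
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])

# Hand-picked (not learned) weights for a hidden layer of two neurons
W1 = np.array([[1, 1], [1, 1]])   # input -> hidden
b1 = np.array([0, -1])            # hidden biases
W2 = np.array([1, -2])            # hidden -> output

hidden = relu(X @ W1 + b1)
y = hidden @ W2
print(y)  # [0 1 1 0]
```

Removing the `relu` call turns the whole network into a single linear map, which cannot produce this output.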



### TensorFlow

TensorFlow is an open source deep learning library first released by Google in November 2015. It has since become one of the most widely used libraries for both development and production tasks. Its backend lets users distribute computation across multiple CPUs or GPUs. TensorFlow models can be exported as [protocol buffer](https://developers.google.com/protocol-buffers/docs/overview) files.


### Introduction

#### Problem

Sensors such as accelerometers produce several records of data every second. As sensors become more commonplace, having these devices communicate with web servers will give rise to latency and bandwidth issues. Data privacy is also a big concern. To address these problems, there is a need to let the data remain on the edge device and make decisions on the device itself. This approach has the following advantages -

* Speed of execution - Since there are no latency & bandwidth issues, the speed of execution will greatly improve.
* Cost of maintenance - Organizations can avoid the cost of setting up and maintaining large servers to store and process data.
* Usage during network disruptions - The devices can be used even during network disruptions
* Data privacy - Since the data does not leave the device, the issue of data privacy does not arise.


#### Goal

Our objective with this code is to train a classifier that can accurately predict a human activity from accelerometer readings. We also need to output a protocol buffer (.pb) file that can be deployed directly on the edge device, so that scoring happens at the data source.


#### Approach

For this project, we will be using the SEMMA approach - **S**ample the data, **E**xplore the sample for patterns, **M**odify the data, build predictive **M**odels & **A**ssess the results.

#### Import all necessary python packages

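Python packages, including TensorFlow, can be installed from the command line using pip. The code in this notebook uses the TensorFlow 1.x API (for example `tf.logging` and `tf.estimator.inputs`), so it needs a 1.x release to run unchanged.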

pip install tensorflow

In [2]:
import pandas as pd
import numpy as np
import tensorflow as tf
from sklearn import metrics

## Sample

The data collected from an accelerometer contains three columns - 'x', 'y' & 'z'. The accelerometer outputs 20 records of data every second. An initial sample of 21 seconds of data is collected for each activity.
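As a rough illustration of the shape of one such sample, the sketch below builds a synthetic stand-in recording and reads it back the same way the notebook reads its files later on. It assumes headerless CSV files with three columns; the values are random numbers, not real accelerometer data, and the variable names are ours.

```python
import io
import numpy as np
import pandas as pd

records_per_second = 20
sample_seconds = 21
n_rows = records_per_second * sample_seconds  # 420 records per sample

# Synthetic stand-in for a recording (random values, not real sensor data)
rng = np.random.RandomState(0)
fake_csv = io.StringIO()
pd.DataFrame(rng.normal(size=(n_rows, 3))).to_csv(fake_csv, header=False, index=False)
fake_csv.seek(0)

# Read it as a headerless three-column file
sample = pd.read_csv(fake_csv, header=None, names=['x', 'y', 'z'])
print(sample.shape)  # (420, 3)
```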


#### Specify the Inputs:

**'data_location' :** location of the data files

**'model_output_loc' :** location where the protoBuffer file must be saved

**'hidden_units' :** List with one entry per hidden layer of the DNN classifier; each entry is the number of neurons in that layer.

**'learn_rate' :** Learning rate for the neural network

In [3]:
data_location = './tn_training_data/'
model_output_loc = './models/TB/'
hidden_units = [100, 40, 3]
learn_rate = 0.01
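To get a feel for the size of the network these settings describe, the sketch below counts the trainable weights and biases of a fully connected network. It assumes 33 input features (the three axes plus ten shifted copies of each, as built later in the notebook), three activity classes, and a final output layer with one unit per class after the hidden layers. It is a back-of-the-envelope count, not something read from the trained model.

```python
hidden_units = [100, 40, 3]
n_features = 33   # assumption: x, y, z plus 10 shifted copies of each
n_classes = 3     # assumption: jogging, walking, standing

# Input layer, hidden layers, then one output unit per class
layer_sizes = [n_features] + hidden_units + [n_classes]

total = 0
for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:]):
    params = n_in * n_out + n_out   # weight matrix plus bias vector
    print(f"{n_in:>3} -> {n_out:<3}: {params} parameters")
    total += params

print("total:", total)  # total: 7575
```

Note that the last hidden layer has only three neurons, so everything the output layer sees passes through a three-dimensional bottleneck.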

## Explore & Modify

While exploring the above 21 seconds of data, we can observe a pattern for each activity. 

Autocorrelation plots over a period of values provide insights on how correlated a value at time t1 is to a value at time t2. 

Based on the insights obtained from a TIBCO Spotfire analysis, we can conclude that a time lag of 10 is likely to capture significant variations between the three activities.
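The analysis itself was done in Spotfire, but the idea of an autocorrelation at a given lag can be sketched in pandas. The example below uses a synthetic sine wave as a stand-in for one accelerometer axis; the signal, its period of 20 records and the chosen lags are all assumptions for illustration, not the project's data.

```python
import numpy as np
import pandas as pd

# Synthetic periodic signal standing in for one accelerometer axis
# (period of 20 records, i.e. one cycle per second at 20 records/second)
t = np.arange(200)
signal = pd.Series(np.sin(2 * np.pi * t / 20))

# Correlation between the signal and a copy of itself shifted by `lag` records
acf = {lag: signal.autocorr(lag=lag) for lag in (1, 5, 10, 20)}
for lag, value in acf.items():
    print(f"lag {lag:>2}: {value:+.3f}")
```

For this signal the autocorrelation is close to -1 at lag 10 (half a period) and close to +1 at lag 20 (a full period). Activities with different rhythms would show their peaks and troughs at different lags, which is what makes the lagged values useful as features.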

We collect 5 training samples for each of these activities while altering the orientation of the accelerometer device. 

We use this information to create the final training dataset with the following steps -

* Read in all five datasets for each activity
* For each dataset of each activity - 
    * Create 10 new features for each of 'x', 'y' & 'z' by shifting the column by 1 to 10 records
    * Drop 'na' values in the temporary data frame
    * Append to the final data frame
* Return the final data frame to be used for training our model

In [4]:
def training_data(path, label, lag):
    # Build the windowed training frame for one activity from its five recordings
    column_names = ['x', 'y', 'z']
    frames = []

    # Files are named <label>.csv, <label>2.csv, ..., <label>5.csv
    for suffix in ["", "2", "3", "4", "5"]:
        file_path = path + label.lower() + suffix + ".csv"
        temp = pd.read_csv(file_path, header=None, names=column_names)
        temp['activity'] = label

        # shift(-i) pulls the reading i records ahead onto the current row,
        # so each row ends up holding lag + 1 consecutive readings per axis
        for i in range(1, lag + 1):
            for axis in column_names:
                temp[axis + str(i)] = temp[axis].shift(-i)

        # The last `lag` rows have incomplete windows, so drop them
        frames.append(temp.dropna())

    # pd.concat replaces the repeated DataFrame.append calls,
    # which newer pandas releases no longer support
    return pd.concat(frames)


dataset = pd.concat([training_data(data_location, "jogging", 10),
                     training_data(data_location, "Walking", 10),
                     training_data(data_location, "Standing", 10)])
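To see what the shifting step in `training_data` produces, here is a minimal sketch on a made-up six-row recording of a single axis, with a lag of 2 instead of 10. The values are invented for illustration.

```python
import pandas as pd

# Toy recording: 6 readings of a single axis (made-up values)
toy = pd.DataFrame({'x': [10, 11, 12, 13, 14, 15]})
lag = 2

# Same shifting scheme as training_data, with a smaller lag
for i in range(1, lag + 1):
    toy['x' + str(i)] = toy['x'].shift(-i)

# Rows whose window runs past the end of the recording contain NaN
windows = toy.dropna()
print(windows)
```

Each surviving row holds a reading together with the two readings that follow it, and the last `lag` rows are dropped, leaving 6 - 2 = 4 rows. By the same arithmetic, a 420-row recording with a lag of 10 would yield 410 rows.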

Separate the final dataset into training & test data (an approximate 80-20 split, since each row is assigned at random)

In [5]:
msk = np.random.rand(len(dataset)) < 0.8
train = dataset[msk]
test = dataset[~msk]
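Because the mask is drawn at random, the split differs from run to run and the training share is only approximately 80%. A minimal sketch of a reproducible variant, assuming a hypothetical dataset of 1000 rows and an arbitrary seed of 42:

```python
import numpy as np

n_rows = 1000   # hypothetical dataset size

# Fixing the seed makes the random mask reproducible across runs
rng = np.random.RandomState(42)
msk = rng.rand(n_rows) < 0.8

train_share = msk.mean()
print(f"train share: {train_share:.3f}")

# Drawing again from a generator with the same seed gives the same mask
msk_again = np.random.RandomState(42).rand(n_rows) < 0.8
print((msk == msk_again).all())  # True
```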

Create variables for the number of labels, the continuous columns, the label column & the label names

In [6]:
num_labels = train['activity'].nunique()
cont_cols = [x for x in list(train.columns.values) if x != 'activity']
lab_col = 'activity'
label_names = list(train['activity'].unique())

Use a function to convert a pandas dataframe to tensors

In [7]:
tf.logging.set_verbosity(0)

def get_input_fn_from_pandas(data_set, num_epochs=None, shuffle=False):
    
    return tf.estimator.inputs.pandas_input_fn(x=data_set[cont_cols],y=data_set[lab_col],num_epochs=num_epochs,shuffle=shuffle)

feat_cols = [tf.feature_column.numeric_column(feat) for feat in cont_cols]

## Train DNNClassifier model

Fit a DNN classifier with the hidden layers and learning rate specified above

In [8]:
clf = tf.estimator.DNNClassifier(model_dir=model_output_loc,hidden_units=hidden_units,
                                 feature_columns=feat_cols,n_classes=num_labels,
                                 label_vocabulary= label_names,
                                 optimizer= tf.train.ProximalAdagradOptimizer(learning_rate=learn_rate,
                                                                                         l1_regularization_strength=0.001))
clf.train(input_fn=get_input_fn_from_pandas(train),steps=10000)

<tensorflow.python.estimator.canned.dnn.DNNClassifier at 0x290a4c4e550>

## Evaluate

Evaluate the classifier on the unseen 20% of the dataset

In [9]:
clf.evaluate(get_input_fn_from_pandas(test,10))

{'accuracy': 0.94217604,
 'average_loss': 0.22659916,
 'global_step': 10000,
 'loss': 28.962204}
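The two loss figures differ by a large factor. Our understanding of the TensorFlow 1.x canned estimators is that 'average_loss' is the mean loss per example while 'loss' is the mean loss per mini-batch, so their ratio should sit near the batch size. The sketch below checks this against the numbers above, assuming the default batch size of 128 in `pandas_input_fn`; the name `metrics_out` is ours.

```python
# Metrics reported by clf.evaluate above
metrics_out = {'accuracy': 0.94217604,
               'average_loss': 0.22659916,
               'loss': 28.962204}

# Ratio of per-batch loss to per-example loss
ratio = metrics_out['loss'] / metrics_out['average_loss']
print(round(ratio, 1))  # 127.8
```

A ratio just under 128 is consistent with full batches of 128 examples plus a smaller final batch.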

## Output .pb file

Create a protocol buffer (.pb) file and save it to the model output location specified above

In [10]:
feature_spec = tf.estimator.classifier_parse_example_spec(feat_cols,label_key=lab_col)
serving_input_fn = tf.estimator.export.build_parsing_serving_input_receiver_fn(feature_spec)
servable_model_path = clf.export_savedmodel(model_output_loc, serving_input_fn, as_text=False)


print ("Tensorflow model saved at : " + model_output_loc)

Tensorflow model saved at : ./models/TB/
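Note that `export_savedmodel` does not write directly into the base directory: it creates a timestamp-named subdirectory there and returns its path (stored in `servable_model_path` above). If that return value is not at hand, the newest export can be located as in the sketch below. The helper `latest_export_dir` is hypothetical, not part of TensorFlow, and the demonstration uses a temporary directory with made-up timestamps.

```python
import os
import tempfile

def latest_export_dir(export_base):
    """Return the newest timestamp-named subdirectory of export_base, or None."""
    candidates = [d for d in os.listdir(export_base)
                  if d.isdigit() and os.path.isdir(os.path.join(export_base, d))]
    if not candidates:
        return None
    return os.path.join(export_base, max(candidates, key=int))

# Demonstration on a throwaway directory with made-up timestamps
with tempfile.TemporaryDirectory() as base:
    for stamp in ("1508200000", "1508250000"):
        os.makedirs(os.path.join(base, stamp))
    newest = os.path.basename(latest_export_dir(base))
    print(newest)  # 1508250000
```

In the notebook this would be called as `latest_export_dir(model_output_loc)`. Since the same directory is also used as `model_dir`, the digit-only filter is what keeps checkpoint files and event logs out of the candidates.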
