# Human Activity Recognition with LSTMs

Hello everybody!
This is our *notebook* for Human Activity Recognition (HAR) with LSTMs, using simple smartphone sensor data. 
>**This is part of an Android application project that will be used for non-intrusive medical surveillance systems.** 

* **The short-term goal** is to adapt existing HAR classifications to a more realistic, everyday environment (a smartphone carried in a pocket, a handbag, etc.) and to combine everything into a user-friendly Android application that aims to protect and help the elderly by giving relatives a way to monitor them remotely. 
* **The long-term goal** is to build an open-source application that can be used and adapted for any kind of non-intrusive medical surveillance system (no camera, microphone, etc. needed) for all kinds of people in need. 

### Resources

*Speaking for myself,* I'm no expert in HAR with deep learning.
That's why I researched already existing projects and tutorials (there are quite a few).
Check them out in our **spreadsheet under the RESSOURCE tab:**
https://docs.google.com/spreadsheets/d/1EDc84oX6Z9HOHBKIYNhE6iemN4Il1MR1ylGJyiCkh-M/edit#gid=251000036

### What's next?

As I said in our *Slack channel*, there will be a small team responsible for each part of this project.<br> 
**The topics/groups are the following** *(If you have a better idea, feel free to tell me!)* <br>
**--> ADD YOUR NAME BEHIND ONE (OR MORE) OF THE TOPICS AND START WORKING WITH THE OTHERS IN YOUR TEAM!!**

* _Data-Extraction/Generation and Pre-Processing:_ Standardize, Normalize, Batching-fct., etc.
<br> `In Charge: @Nicolas Remerscheid`
* _The Model:_ Maybe Embedded Layer (Dim. Reduction), LSTMs, Linear-Layer, etc.
<br> `In Charge: ...`
* _The Training:_ Training loops, visualization
<br> `In Charge: ...`
* _Validation and Testing:_ finding best hyperparams (Validation), test-loops, check-points
<br> `In Charge: ...`

Once everybody has chosen a section, feel free to edit this notebook and **add your section!**

#### Imports

In [1]:
import numpy as np
import torch 
import os
from torch.utils.data.sampler import SubsetRandomSampler
from torchvision import datasets, models, transforms
import matplotlib.pyplot as plt

# For inline-visual.
%matplotlib inline

### 1. Data-Extraction/Generation and Pre-Processing
* For now, we mostly use the concepts and principles of: *(with custom implementations and explanations added in places)*
> Guillaume Chevalier, LSTMs for Human Activity Recognition, 2016, https://github.com/guillaume-chevalier/LSTM-Human-Activity-Recognition


In [2]:
######## DEFINED PARAMS ########
# DATA: 
#    - UCI-Dataset: 10299 windows (7352 training + 2947 testing series)
# Model: 
#    - Input-dimension/features: 9 (one per entry in INPUT_SIGNAL_TYPES)
#    - Output-dimension: 6 (WALKING, STANDING, ...)
#    - Seq.-Length: 128 (2.56 s windows at the UCI data set's 50 Hz sampling rate: 2.56 s * 50 Hz = 128)
#    - Overlap: 50% between data-windows (-> a new window starts every 64 samples)
#    - Batch-Size: 1500 (1500 sequences processed in parallel -> vectorization!)
#    -> More params to be defined!

# Note/Reminder: an unrolled LSTM layer repeats the same structure at every time step
#                -> one set of weights per layer/cell, shared across all time steps
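With a window length of 128 and 50% overlap, a new window starts every 64 samples. The Android app will have to cut the live sensor stream the same way before feeding it to the model. A minimal NumPy sketch, assuming the raw recording is a (time, channels) array; `sliding_windows` is a hypothetical helper, not part of the UCI tooling or of Guillaume Chevalier's project:

```python
import numpy as np

def sliding_windows(signal, window_size=128, overlap=0.5):
    """Cut a (T, n_channels) recording into overlapping windows.

    Returns an array of shape (n_windows, window_size, n_channels).
    Trailing samples that do not fill a complete window are dropped.
    """
    step = int(window_size * (1 - overlap))  # 128 * 0.5 = 64 samples
    n_windows = (len(signal) - window_size) // step + 1
    if n_windows <= 0:
        # Recording shorter than one window -> no complete window
        return np.empty((0, window_size, signal.shape[1]), dtype=signal.dtype)
    return np.stack([signal[i * step:i * step + window_size] for i in range(n_windows)])

# 10 s of hypothetical 9-channel sensor data at 50 Hz -> 500 samples
recording = np.random.randn(500, 9).astype(np.float32)
windows = sliding_windows(recording)
print(windows.shape)  # (6, 128, 9)
```

Trailing samples that do not fill a full window are simply dropped here; padding them instead would be an equally valid design choice.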

In [3]:
# Useful Constants

# Those are separate normalised input features for the neural network 
# They equal exactly the file-name prefixes in the UCI data set's "Inertial Signals" folders -> array later used for data loading!
INPUT_SIGNAL_TYPES = [
    "body_acc_x_",
    "body_acc_y_",
    "body_acc_z_",
    "body_gyro_x_",
    "body_gyro_y_",
    "body_gyro_z_",
    "total_acc_x_",
    "total_acc_y_",
    "total_acc_z_"
]

# Output classes to learn how to classify 
LABELS = [
    "WALKING", 
    "WALKING_UPSTAIRS", 
    "WALKING_DOWNSTAIRS", 
    "SITTING", 
    "STANDING", 
    "LAYING"
]



### Downloading the UCI-Dataset
* More information on the data-set: UCI Machine Learning Repository
* **For now:** taken 1:1 from Guillaume Chevalier

In [4]:
# Note: Linux bash commands start with a "!" inside those "ipython notebook" cells

DATA_PATH = "data/"

# ******* Downloading-mechanism (with 'download_dataset.py')*******
#!pwd && ls
#os.chdir(DATA_PATH)
#!pwd && ls

#!python download_dataset.py

#!pwd && ls
#os.chdir("..")
#!pwd && ls

# -> Downloading and unzipping the data set can also be done manually!

DATASET_PATH = DATA_PATH + "UCI HAR Dataset/"
print("\n" + "Dataset is now located at: " + DATASET_PATH)


Dataset is now located at: data/UCI HAR Dataset/
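The commented-out cell relies on Guillaume Chevalier's `download_dataset.py`. A manual alternative using only the Python standard library could look like the sketch below. The archive URL is our assumption of where the UCI Machine Learning Repository hosts the zip file, so please verify it before use; `download_uci_har` is a hypothetical helper name.

```python
import os
import urllib.request
import zipfile

# ASSUMPTION: location of the archive in the UCI Machine Learning Repository; please verify.
UCI_HAR_URL = ("https://archive.ics.uci.edu/ml/machine-learning-databases/"
               "00240/UCI%20HAR%20Dataset.zip")

def download_uci_har(data_path="data/", url=UCI_HAR_URL):
    """Download and unzip the UCI HAR archive into data_path; skipped if already present."""
    dataset_path = os.path.join(data_path, "UCI HAR Dataset")
    if os.path.isdir(dataset_path):
        return dataset_path  # already downloaded and extracted
    os.makedirs(data_path, exist_ok=True)
    zip_path = os.path.join(data_path, "UCI_HAR_Dataset.zip")
    urllib.request.urlretrieve(url, zip_path)  # fetch the zip file
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(data_path)  # expected to create "UCI HAR Dataset/"
    return dataset_path
```

`download_uci_har(DATA_PATH)` would then return the same location as `DATASET_PATH` (without the trailing slash).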


### Utility functions for training
* data-processing-functions
* How Guillaume Chevalier did batching in his project, ...

In [5]:
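# Batching function from Guillaume Chevalier's project (kept for reference; we use DataLoaders instead).
# Working principle of the for-loop: training step `step` (starting at 1) reads the `batch_size`
# consecutive sequences starting at index (step-1)*batch_size; the modulo wraps around to the
# start of the data once its end is reached.

#def extract_batch_size(_train, step, batch_size):
#    # Function to fetch a "batch_size" amount of data from "(X|y)_train" data.
#
#    # list() because a tuple shape cannot be modified in place
#    shape = list(_train.shape)
#    # Only the first dimension has to be changed: TIME-STEP x INPUT-VECTOR remains the same,
#    # only the number of sequences varies -> has to be limited to batch_size
#    shape[0] = batch_size
#    batch_s = np.empty(shape)
#
#    for i in range(batch_size):
#        # step := current training step (1-based), NOT the time steps per sequence
#        index = ((step-1)*batch_size + i) % len(_train)
#        # First index of _train := index of the sequence
#        batch_s[i] = _train[index]
#
#    return batch_s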


def one_hot(y_, n_classes):
    # Function to encode integer class indexes as one-hot output labels for the network
    # e.g. WITH 3 ENTRIES IN Y:
    # one_hot(y_=[[5], [0], [3]], n_classes=6):
    #     return [[0, 0, 0, 0, 0, 1], [1, 0, 0, 0, 0, 0], [0, 0, 0, 1, 0, 0]]

    y_ = y_.reshape(len(y_))
    return np.eye(n_classes)[np.array(y_, dtype=np.int32)]  # Returns FLOATS
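To see the working principle of the commented-out `extract_batch_size` in action, it can be run on a tiny toy array (the toy data is made up for illustration). Each training step reads the next `batch_size` consecutive sequences, and the modulo wraps around to the beginning once the data runs out:

```python
import numpy as np

def extract_batch_size(_train, step, batch_size):
    """Fetch `batch_size` consecutive examples for training step `step` (1-based)."""
    shape = list(_train.shape)  # list, because a tuple shape cannot be modified
    shape[0] = batch_size       # keep TIME-STEP x INPUT-VECTOR, limit the number of sequences
    batch_s = np.empty(shape)
    for i in range(batch_size):
        # Start at (step-1)*batch_size and wrap around with the modulo
        index = ((step - 1) * batch_size + i) % len(_train)
        batch_s[i] = _train[index]
    return batch_s

toy = np.arange(5).reshape(5, 1)  # 5 hypothetical "sequences" with one value each
print(extract_batch_size(toy, step=1, batch_size=3).ravel())  # [0. 1. 2.]
print(extract_batch_size(toy, step=2, batch_size=3).ravel())  # [3. 4. 0.] (wrapped around)
```

A shuffling DataLoader replaces this: it also visits every sequence once per epoch, but in random order.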

### Preparing dataset:
* **Loading-Mechanism:** *from Guillaume Chevalier*
* Added separation of the TEST dataset into 30% validation and 70% testing data
* Instead of implementing the batching through a function (extract_batch_size), we made use of PyTorch DATA-LOADERS
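`load_X` below stacks one (series x time-step) table per signal file, which gives an array of shape SIGNAL x SERIES x TIME-STEP, and then permutes the axes with `np.transpose(..., (1, 2, 0))`. A quick NumPy check with random stand-in data (not the real UCI files) shows the resulting layout:

```python
import numpy as np

# Hypothetical stand-in for the loaded files: 9 signal types, 4 series, 128 time steps each
n_signals, n_series, n_steps = 9, 4, 128
X_signals = np.random.rand(n_signals, n_series, n_steps).astype(np.float32)

# Same axis permutation as in load_X: (signal, series, step) -> (series, step, signal)
X = np.transpose(X_signals, (1, 2, 0))
print(X.shape)  # (4, 128, 9)

# Element [s, t, c] of the result is reading t of series s from signal file c
assert X[2, 5, 7] == X_signals[7, 2, 5]
```

This is the (batch, seq_len, features) layout the DataLoaders later deliver.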

In [6]:
# UCI dataset is stored with a classic folder structure 
TRAIN = "train/"
TEST = "test/"


# ***** FUNCTION DESCRIPTION *****
# Input (Argument): X_signals_paths := array which contains the different path-locations
# Output: 3D-array containing the complete UCI data set (sorted by signal category)
# Functionality: Takes an array of paths to the different signal files (one file per signal type) and loads
#                them one by one. np.array(X_signals) has the shape SIGNAL x SERIES x TIME-STEP; the transpose
#                reorders it to SERIES x TIME-STEP x SIGNAL, e.g. (7352, 128, 9) for the training data.
# SERIES are ordered after the ROWS in the different FILES whose PATHS are built from INPUT_SIGNAL_TYPES

def load_X(X_signals_paths):
    X_signals = []

    for signal_type_path in X_signals_paths:
        # Read dataset from disk; each row is one series, values separated by (multiple) spaces
        with open(signal_type_path, 'r') as file:
            X_signals.append(
                [np.array(row.split(), dtype=np.float32) for row in file]
            )

    return np.transpose(np.array(X_signals), (1, 2, 0))

# ***** FUNCTION DESCRIPTION *****
# SAME AS FOR X, but all labels are stored in one .txt file in one location (one label per row/series)

# TODO: 
#     1. Check how the order of the labels relates to the different input-signal-types 
#     2. Check whether we use a MANY_TO_ONE or MANY_TO_MANY architecture
#        (y holds one label per 128-step window, which points to MANY_TO_ONE)

def load_y(y_path):
    # Read dataset from disk, dealing with text file's syntax -> shape (#SERIES, 1)
    with open(y_path, 'r') as file:
        y_ = np.array([row.split() for row in file], dtype=np.int32)

    # Subtract 1 from each output class for friendly 0-based indexing
    return y_ - 1

# 1. Load Train and Testing Data in: INPUTS

# UCI-Data is separated into the following folder structure, e.g. body acc.-data (x-axis) for training:
# data/UCI HAR Dataset/train/Inertial Signals/body_acc_x_train.txt
X_train_signals_paths = [
    DATASET_PATH + TRAIN + "Inertial Signals/" + signal + "train.txt" for signal in INPUT_SIGNAL_TYPES
]
X_test_signals_paths = [
    DATASET_PATH + TEST + "Inertial Signals/" + signal + "test.txt" for signal in INPUT_SIGNAL_TYPES
]

# Input path-arrays into the load-function 
x_train = load_X(X_train_signals_paths)
x_test = load_X(X_test_signals_paths)

# 2. Load Test and Training Data in: TARGET/LABELS 

y_train_path = DATASET_PATH + TRAIN + "y_train.txt"
y_test_path = DATASET_PATH + TEST + "y_test.txt"

y_train = load_y(y_train_path)
y_test = load_y(y_test_path)


# 3. One-Hot encode LABEL/TARGET data for later use in the loss/error computation 

# Resulting dimension: #DATA_EXAMPLES x #CLASSES(=6)
y_train = one_hot(y_train, 6)
y_test = one_hot(y_test, 6)


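# 4. Separate testing data into validation data (30%) and testing data (70%)

# Assumption (unverified): testing data is already shuffled -> no need to further shuffle before splitting into testing and val.
# If the rows turn out to be ordered (e.g. by subject), this slice is not a random split.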
limit_ind = int(len(x_test) * 0.3)
x_val, y_val = x_test[:limit_ind], y_test[:limit_ind]
x_test, y_test = x_test[limit_ind:], y_test[limit_ind:]
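The 30/70 split above slices the test data in file order and relies on the assumption that it is already shuffled. If the rows turn out to be ordered (for example by subject, see `subject_test.txt` in the data set), a seeded random split is a cheap alternative. A minimal sketch; `shuffled_split` is a hypothetical helper and the toy arrays stand in for `x_test`/`y_test`:

```python
import numpy as np

def shuffled_split(x, y, val_fraction=0.3, seed=0):
    """Randomly split (x, y) into a validation part and a test part.

    A seeded permutation keeps the split reproducible across runs,
    and indexing x and y with the same indices keeps them aligned.
    """
    rng = np.random.default_rng(seed)
    perm = rng.permutation(len(x))
    n_val = int(len(x) * val_fraction)
    val_idx, test_idx = perm[:n_val], perm[n_val:]
    return (x[val_idx], y[val_idx]), (x[test_idx], y[test_idx])

# Hypothetical toy data in place of the real x_test / y_test
x_demo = np.arange(10 * 2).reshape(10, 2)
y_demo = np.arange(10)
(x_v, y_v), (x_t, y_t) = shuffled_split(x_demo, y_demo)
print(len(x_v), len(x_t))  # 3 7
```

Applied to the real data it would read `(x_val, y_val), (x_test, y_test) = shuffled_split(x_test, y_test)`. Splitting by subject instead, so that no person appears in both validation and test data, is another option for the validation team to consider.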


In [7]:
# Create Data-Loaders for later use in Training, Validation and Testing Loops 
from torch.utils.data import TensorDataset, DataLoader

# 5. create Tensor datasets

train_data = TensorDataset(torch.from_numpy(x_train), torch.from_numpy(y_train))
valid_data = TensorDataset(torch.from_numpy(x_val), torch.from_numpy(y_val))
test_data = TensorDataset(torch.from_numpy(x_test), torch.from_numpy(y_test))

# 6. Define BASIC TRAINING PARAMS (1500 from Guillaume Chevalier's Project)
# TODO: Adjust params if necessary (@the person who does the training)
batch_size = 1500

# 7. Create Data-Loaders and do the SHUFFLING as well 

train_loader = DataLoader(train_data, shuffle=True, batch_size=batch_size)
valid_loader = DataLoader(valid_data, shuffle=True, batch_size=batch_size)
test_loader = DataLoader(test_data, shuffle=True, batch_size=batch_size)

In [8]:
# obtain one batch of training data -> TO TEST if the Data-Loaders are functioning properly!
dataiter = iter(train_loader)
sample_x, sample_y = next(dataiter)  # built-in next(); the .next() method is not available in recent PyTorch versions

print('Sample input size: ', sample_x.size()) # batch_size, seq_length, n_features
print('Sample input: \n', sample_x)
print()
print('Sample label size (one-hot-encoded): ', sample_y.size()) # batch_size, n_classes
print('Sample label: \n', sample_y)

Sample input size:  torch.Size([1500, 128, 9])
Sample input: 
 tensor([[[-0.0557,  0.0406, -0.1243,  ...,  0.9464, -0.2393, -0.1661],
         [-0.0604,  0.0076, -0.1454,  ...,  0.9417, -0.2718, -0.1861],
         [-0.0635,  0.0014, -0.1504,  ...,  0.9387, -0.2777, -0.1900],
         ...,
         [ 0.3361,  0.1754, -0.1093,  ...,  1.3202, -0.1211, -0.2267],
         [ 0.3717,  0.0962, -0.1715,  ...,  1.3555, -0.2003, -0.2888],
         [ 0.1735, -0.0609, -0.1232,  ...,  1.1570, -0.3573, -0.2404]],

        [[ 0.0002,  0.0057, -0.0197,  ...,  1.0071, -0.2273, -0.0802],
         [-0.0014,  0.0053, -0.0168,  ...,  1.0056, -0.2275, -0.0774],
         [-0.0017,  0.0012, -0.0206,  ...,  1.0053, -0.2313, -0.0812],
         ...,
         [ 0.0340,  0.1306,  0.0128,  ...,  1.0402, -0.1039, -0.0269],
         [ 0.0337,  0.1500,  0.0059,  ...,  1.0400, -0.0836, -0.0346],
         [ 0.0257,  0.1688,  0.0051,  ...,  1.0323, -0.0639, -0.0362]],

        [[ 0.2448,  0.1185, -0.2049,  ...,  1.2364, -

## 2. The Model
### Basic params 
* Centrally define all **params** and the overall **network structure**
* Params **taken 1:1 from Guillaume Chevalier** *-> to be adapted and tuned in the future* 

In [None]:
############ START OF 2.,3.,4. TOPIC (Model, Training, Validation/Test.) ############
# MY SUGGESTION: ALSO USE EXISTING PROJECTS (i.e.: the SAME as me) AS A BASIS 
# WE CAN FIRST REPRODUCE THEIR RESULTS, IMPLEMENT THEM IN AN APPLICATION AND THEN TUNE THEM!

# The training and testing can be done exactly THE SAME AS in the SENTIMENT_RNN_EXERCISE 
# Use the DATA-LOADERS to create the TRAIN-, VAL- and TEST-LOOPS!

# Input Data
training_data_count = len(x_train)  # 7352 training series (with 50% overlap between each series)
test_data_count = len(x_test)  # 2063 testing series left after the 30% validation split (2947 before)
n_steps = len(x_train[0])  # 128 timesteps per series
n_input = len(x_train[0][0])  # 9 input parameters per timestep

# LSTM Neural Network's internal structure

n_hidden = 32 # Hidden layer num of features
n_classes = len(LABELS) # Total classes (6 activities)

# Training 
learning_rate = 0.0025
lambda_loss_amount = 0.0015 # L2-regularization strength; may NOT BE NECESSARY depending on the optimization algorithm
training_iters = training_data_count * 300  # Loop 300 times on the dataset
display_iter = 30000  # To show test set accuracy during training

# Some debugging info

print("Some useful info to get an insight on dataset's shape and normalisation:")
print("(X shape, y shape, every X's mean, every X's standard deviation)")
print(x_test.shape, y_test.shape, np.mean(x_test), np.std(x_test))
print("Check mean/std above for normalisation; y is already one-hot encoded at this point.")
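The cell above only prints one global mean and standard deviation. The channels differ in scale, though: the `total_acc_*` signals include gravity, which is why the sample batch shows values near 1.0 in the `total_acc_x_` column. Per-channel standardization may therefore be worth trying as part of the pre-processing. A minimal sketch, assuming the statistics are computed on the training data only and reused for validation/test data; `fit_standardizer`/`standardize` are hypothetical names and the data is random:

```python
import numpy as np

def fit_standardizer(x_train):
    """Per-channel mean/std over all series and time steps of the TRAINING data only."""
    mean = x_train.mean(axis=(0, 1), keepdims=True)  # shape (1, 1, n_channels)
    std = x_train.std(axis=(0, 1), keepdims=True)
    return mean, std

def standardize(x, mean, std, eps=1e-8):
    # Reuse the training statistics for validation/test data to avoid information leakage
    return (x - mean) / (std + eps)

# Hypothetical data shaped like x_train: (#series, 128 time steps, 9 channels)
rng = np.random.default_rng(0)
x_fake = rng.normal(loc=1.0, scale=0.5, size=(200, 128, 9)).astype(np.float32)
mean, std = fit_standardizer(x_fake)
x_std = standardize(x_fake, mean, std)
print(x_std.mean(axis=(0, 1)).round(3))  # close to 0 for every channel
print(x_std.std(axis=(0, 1)).round(3))   # close to 1 for every channel
```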

In [None]:
# TODO: MODEL-CREATION
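A minimal starting point for the model, assuming a many-to-one setup (the labels hold one class per 128-step window) and the parameters from the cell above (`n_input = 9`, `n_hidden = 32`, `n_classes = 6`). Two stacked LSTM layers, a linear output layer on the last time step and the class name `HARLSTM` are our own assumptions, not a 1:1 port of Guillaume Chevalier's TensorFlow model:

```python
import torch
import torch.nn as nn

class HARLSTM(nn.Module):
    """Many-to-one LSTM classifier: one activity label per 128-step window."""

    def __init__(self, n_input=9, n_hidden=32, n_classes=6, n_layers=2):
        super().__init__()
        # batch_first=True matches the DataLoader layout (batch, seq_len, features)
        self.lstm = nn.LSTM(input_size=n_input, hidden_size=n_hidden,
                            num_layers=n_layers, batch_first=True)
        self.fc = nn.Linear(n_hidden, n_classes)

    def forward(self, x):
        out, _ = self.lstm(x)          # out: (batch, seq_len, n_hidden)
        return self.fc(out[:, -1, :])  # logits from the last time step: (batch, n_classes)

model = HARLSTM()
dummy = torch.randn(4, 128, 9)  # 4 hypothetical windows
print(model(dummy).shape)       # torch.Size([4, 6])
```

The model returns raw logits; the softmax is left to the loss function (or to `argmax` at inference time).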

## 3. Training & Validation
### To be edited

In [None]:
# TODO: Implement Training-Loop and Validation-Loop (same as in SENTIMENT_RNN_EXERCISE)
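A sketch of a combined training/validation epoch in the spirit of the SENTIMENT_RNN_EXERCISE. Because `one_hot` turned the labels into one-hot vectors, they are converted back to class indices with `argmax`, the classic target format of `nn.CrossEntropyLoss`. `run_epoch`, the toy data and the toy model are hypothetical; with the real data, pass `train_loader`/`valid_loader` and the LSTM model instead:

```python
import torch
import torch.nn as nn
from torch.utils.data import TensorDataset, DataLoader

def run_epoch(model, loader, criterion, optimizer=None, device="cpu"):
    """One pass over `loader`; trains if an optimizer is given, otherwise only evaluates.

    Returns (mean loss, accuracy).
    """
    training = optimizer is not None
    model.train(training)
    total_loss, correct, seen = 0.0, 0, 0
    with torch.set_grad_enabled(training):
        for inputs, labels in loader:
            inputs = inputs.to(device).float()
            targets = labels.argmax(dim=1).to(device)  # one-hot -> class index
            logits = model(inputs)
            loss = criterion(logits, targets)
            if training:
                optimizer.zero_grad()
                loss.backward()
                optimizer.step()
            total_loss += loss.item() * len(inputs)
            correct += (logits.argmax(dim=1) == targets).sum().item()
            seen += len(inputs)
    return total_loss / seen, correct / seen

# Usage with hypothetical toy data and a toy model
torch.manual_seed(0)
x = torch.randn(60, 8, 3)
y = torch.eye(6)[torch.randint(0, 6, (60,))]  # one-hot labels, like y_train
loader = DataLoader(TensorDataset(x, y), batch_size=20, shuffle=True)
toy_model = nn.Sequential(nn.Flatten(), nn.Linear(8 * 3, 6))
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(toy_model.parameters(), lr=0.0025)  # learning_rate from above
for epoch in range(3):
    train_loss, train_acc = run_epoch(toy_model, loader, criterion, optimizer)
    val_loss, val_acc = run_epoch(toy_model, loader, criterion)
    print(f"epoch {epoch}: train loss {train_loss:.3f}, val loss {val_loss:.3f}, val acc {val_acc:.2f}")
```

Checkpoints could be added by saving the model whenever `val_loss` improves.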

## 4. Testing
### To be edited

In [None]:
# TODO: Implement TEST-Loop and Visualizations(same as in SENTIMENT_RNN_EXERCISE)
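For the test loop, a sketch that collects true and predicted classes over a DataLoader and summarizes them in a confusion matrix (rows = true class, columns = predicted class). `predict` and `confusion_matrix` are hypothetical helpers; the hard-coded arrays below only illustrate the bookkeeping:

```python
import numpy as np
import torch

def predict(model, loader, device="cpu"):
    """Collect true and predicted class indices over a whole DataLoader (no gradients)."""
    model.eval()
    y_true, y_pred = [], []
    with torch.no_grad():
        for inputs, labels in loader:
            logits = model(inputs.to(device).float())
            y_pred.append(logits.argmax(dim=1).cpu().numpy())
            y_true.append(labels.argmax(dim=1).numpy())  # one-hot -> class index
    return np.concatenate(y_true), np.concatenate(y_pred)

def confusion_matrix(y_true, y_pred, n_classes=6):
    """cm[i, j] = number of windows with true class i predicted as class j."""
    cm = np.zeros((n_classes, n_classes), dtype=np.int64)
    np.add.at(cm, (y_true, y_pred), 1)
    return cm

# Hypothetical indices standing in for predict(model, test_loader)
y_true = np.array([0, 0, 1, 2, 2, 2])
y_pred = np.array([0, 1, 1, 2, 2, 0])
cm = confusion_matrix(y_true, y_pred, n_classes=3)
print(cm)
print("accuracy:", np.trace(cm) / cm.sum())  # 4 of 6 correct
```

For the visualization, `plt.imshow(cm)` together with `plt.xticks(range(6), LABELS, rotation=90)` and `plt.yticks(range(6), LABELS)` is one simple option. The shuffling in `test_loader` does not matter here, since true and predicted labels are collected batch by batch in the same order.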

## 5. Deployment 
### To be imported into an Android application
* After converting the trained model to a Keras model, it has to be converted to a TensorFlow Mobile or TensorFlow Lite model and then imported into an Android application, where it can be used for inference. 
* A very good tutorial: https://heartbeat.fritz.ai/deploying-pytorch-and-keras-models-to-android-with-tensorflow-mobile-a16a1fb83f2

In [None]:
# TODO: Save trained model 
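Saving the learned weights with PyTorch's `state_dict` is a common first step before any conversion for the Android app (the Keras/TensorFlow Lite conversion described in the deployment section is not covered here). A minimal sketch; the helper names and the file name `har_lstm.pt` are hypothetical, and a plain `nn.LSTM` stands in for the trained model:

```python
import os
import tempfile

import torch
import torch.nn as nn

def save_model(model, path):
    # Store only the learned parameters; the model class is needed again to reload them
    torch.save(model.state_dict(), path)

def load_model(model_class, path, **model_kwargs):
    # Rebuild the architecture, then copy the saved parameters into it
    model = model_class(**model_kwargs)
    model.load_state_dict(torch.load(path, map_location="cpu"))
    model.eval()  # switch to inference mode (relevant for dropout/batch-norm layers)
    return model

# Usage with a stand-in model
path = os.path.join(tempfile.mkdtemp(), "har_lstm.pt")
net = nn.LSTM(input_size=9, hidden_size=32, batch_first=True)
save_model(net, path)
restored = load_model(nn.LSTM, path, input_size=9, hidden_size=32, batch_first=True)
print(restored)
```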