### Human Activity Recognition (HAR):
- __Definition:__ Identifying specific movements or actions of a person using sensor data.
- __Typical Activities:__ Includes walking, talking, standing, sitting, and more focused activities like cooking or factory work.
- __Sensor Data Sources:__
  - Remote Recording: Video, radar, or other wireless methods.
  - Direct Recording: Sensors on the person, such as accelerometers and gyroscopes in smartphones or custom hardware.
- __Historical Context:__
  1. Challenges:
      - Sensor data collection was once challenging and expensive, requiring custom hardware.
  2. Modern Solutions:
      - Smartphones and Personal Devices: Now ubiquitous and inexpensive, making sensor data collection easier and more common.
      - Fitness and Health Monitoring: Common applications of HAR with readily available data.
- __Problem Statement:__
  - Objective: Predict the activity from a snapshot of sensor data.
  - Data Types: Typically involves univariate or multivariate time series data from one or more sensor types.
- __Challenges:__
  - Data Variability: Different subjects perform activities differently, leading to variations in sensor data.
  - Modeling Difficulty: There is no obvious, direct way to relate raw sensor readings to specific activities, so the mapping has to be learned from labeled data.
- __Approach:__
  - Data Collection: Record sensor data and corresponding activities from specific subjects.
  - Model Training: Fit a model using this data.
  - Generalization: Use the trained model to classify activities of new, unseen subjects based on their sensor data.
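
The generalization step above implies that evaluation should hold out whole subjects rather than random windows. A minimal sketch of such a split, assuming a per-window array of subject identifiers is available (the dataset described later ships one as subject_train.txt); the function name `split_by_subject` and the toy arrays are hypothetical:

```python
import numpy as np

def split_by_subject(X, y, subjects, test_subjects):
    """Split windows so that no subject appears in both train and test."""
    subjects = np.asarray(subjects).ravel()
    test_mask = np.isin(subjects, list(test_subjects))
    return X[~test_mask], y[~test_mask], X[test_mask], y[test_mask]

# Hypothetical toy data: 6 windows of 4 timesteps and 3 channels, from 3 subjects
X = np.arange(6 * 4 * 3, dtype=float).reshape(6, 4, 3)
y = np.array([1, 2, 1, 3, 2, 3])
subjects = np.array([1, 1, 2, 2, 3, 3])

# Hold out every window recorded from subject 3
X_tr, y_tr, X_te, y_te = split_by_subject(X, y, subjects, test_subjects={3})
print(X_tr.shape, X_te.shape)  # (4, 4, 3) (2, 4, 3)
```

Splitting on subject rather than on individual windows keeps the test score an estimate of performance on people the model has never seen.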

Two main neural network approaches are effective for time series classification and have shown strong performance in activity recognition using sensor data from smartphones and fitness trackers:
1. Convolutional Neural Network (CNN) Models.
2. Recurrent Neural Network (RNN) Models.
   
__Recommendations:__
- __RNN and LSTM:__
  - Best For: Recognizing short activities with a natural order.
  - Reason: They utilize the time-order relationship between sensor readings.
- __CNN:__
  - Best For: Inferring long-term repetitive activities.
  - Reason: They excel at learning deep features in recursive patterns.

### Data Preparation for Time Series Classification:

__Sliding Window Approach:__
- Definition: Dividing the input signal into fixed-length windows, each typically holding from one to a few seconds of observations. This is often called the sliding window approach.
- Usage: Applied in both classical machine learning methods on hand-crafted features and neural networks.
  
__Window Size Considerations:__ 
- No Best Window Size: Depends on the model, nature of the sensor data, and the activities being classified.

__Trade-offs:__
- Larger Windows: Require larger models, slower to train.
- Smaller Windows: Require smaller models, faster to train and fit.
  
__Intuitive Effects:__
- Smaller Windows: Faster activity detection, reduced resource and energy needs.
- Larger Windows: Better for recognizing complex activities.
  
__Overlapping Windows:__
- Purpose: Mitigates the risk of missing transitions between activities by overlapping the end of one window with the start of the next.
- Common Overlap: 50%, where the first half of the new window contains the last half of the previous window.
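
The windowing and overlap described above can be sketched in a few lines of NumPy. This is only an illustration: the function name `sliding_windows` and the toy signal are hypothetical, and trailing samples that do not fill a complete window are simply dropped.

```python
import numpy as np

def sliding_windows(signal, window_size, overlap=0.5):
    """Cut a (timesteps, channels) signal into fixed-size, overlapping windows."""
    signal = np.asarray(signal)
    # Number of timesteps to advance between consecutive windows
    step = int(round(window_size * (1 - overlap)))
    if step < 1:
        raise ValueError("overlap is too large for this window size")
    if len(signal) < window_size:
        raise ValueError("signal is shorter than one window")
    starts = range(0, len(signal) - window_size + 1, step)
    return np.stack([signal[s:s + window_size] for s in starts])

# Hypothetical signal: 10 timesteps, 1 channel, values 0..9
signal = np.arange(10).reshape(10, 1)
windows = sliding_windows(signal, window_size=4, overlap=0.5)
print(windows.shape)     # (4, 4, 1)
print(windows[:, :, 0])  # rows: 0..3, 2..5, 4..7, 6..9
```

With 50% overlap, the first half of each window repeats the second half of the previous one, which matches the description above.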
  
__Risks:__
- Transition Errors: Errors can appear at the beginning or end of activities.
- Incorrect Lengths: Truncated activity instances due to improper window lengths.
  
__Overlap in Neural Networks:__
- Effect:
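  - Increases Training Data: A 50% overlap roughly doubles the number of training windows, which can be useful for smaller datasets.
  - Risk of Overfitting: The extra windows share half of their samples with their neighbors, so they add redundant rather than new information and can encourage overfitting.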
- Usage:
  - Common in Some Applications: Overlapping windows are tolerated and useful in specific contexts.
  - Less Frequent: Not always necessary and less frequently used with neural networks.
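
The effect of overlap on the amount of training data can be checked with simple arithmetic. The sketch below counts how many complete windows fit in a recording; the helper `count_windows` and the one-minute, 50 Hz recording are assumptions chosen for illustration.

```python
def count_windows(n_samples, window_size, overlap=0.0):
    """Number of complete windows that fit in a recording of n_samples."""
    step = max(1, int(round(window_size * (1 - overlap))))
    if n_samples < window_size:
        return 0
    return (n_samples - window_size) // step + 1

n_samples = 50 * 60  # hypothetical recording: one minute sampled at 50 Hz

print(count_windows(n_samples, 128, overlap=0.0))  # 23
print(count_windows(n_samples, 128, overlap=0.5))  # 45
```

Going from 23 to 45 windows is close to a doubling, but every added window is built from samples that already appear in its neighbors.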

### Combined CNN-LSTM Model for HAR:
- __Common Approach:__ Using an LSTM in conjunction with a CNN for Human Activity Recognition (HAR) problems.
- __Model Types:__
  1. CNN-LSTM Model.
  2. ConvLSTM Model.
- __How It Works:__
  - CNN:
    - Purpose: Extracts features from subsequences of raw sample data.
    - Function: Processes short-term dependencies and patterns in the data.
  - LSTM:
    - Purpose: Interprets the features extracted by the CNN.
    - Function: Aggregates and models long-term dependencies and sequential relationships.
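
For the CNN to read subsequences and the LSTM to read the sequence of extracted features, each window first has to be split into shorter blocks. A minimal sketch of that reshaping step, assuming windows of 128 timesteps with 9 channels and an arbitrary split into 4 subsequences of 32 timesteps (the split and the random batch are illustrative choices, not taken from the text above):

```python
import numpy as np

n_steps, n_length = 4, 32  # assumed split: 4 subsequences of 32 timesteps each

# Hypothetical batch: 10 windows, 128 timesteps, 9 channels
X = np.random.rand(10, 128, 9)

# (samples, timesteps, features) -> (samples, subsequences, timesteps, features)
X_sub = X.reshape((X.shape[0], n_steps, n_length, X.shape[2]))
print(X_sub.shape)  # (10, 4, 32, 9)

# Subsequence 1 of window 0 holds timesteps 32..63 of the original window
print(np.array_equal(X_sub[0, 1], X[0, 32:64]))  # True
```

In Keras, one common way to apply the same convolutional layers to every subsequence is to wrap them in `TimeDistributed` before the LSTM layer; the model definition itself is outside the scope of this sketch.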

### Human Activity Recognition Dataset
- __Problem:__ Classifying sequences of accelerometer data from specialized harnesses or smartphones into known movements.
- __Challenges:__
  - Large number of observations per second.
  - Temporal nature of observations.
  - Difficulty in relating accelerometer data to specific movements.
- __Standard Dataset:__ Activity Recognition Using Smartphones
  - Published: 2012 by Davide Anguita et al., University of Genova, Italy.
  - Paper: "A Public Domain Dataset for Human Activity Recognition Using Smartphones" (2013).
  - Link: https://www.esann.org/sites/default/files/proceedings/legacy/es2013-84.pdf.
- __Dataset Details:__
  - Participants: 30 volunteers aged 19-48 years.
  - Protocol:
       - Each participant performed activities wearing a waist-mounted Samsung Galaxy S II smartphone.
       - Six activities: standing, sitting, lying down, walking, walking downstairs, walking upstairs.
       - Each activity was performed twice: first with the phone fixed on the left side of the belt, then as preferred by the user.
  - Pre-processing:
       - Noise filtering on accelerometer and gyroscope data.
       - Data split into fixed windows of 2.56 seconds (128 data points) with 50% overlap.
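       - Accelerometer signal separated into gravity and body motion components; the total_acc files hold the unseparated (total) acceleration and the body_acc files hold the signal with gravity removed.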
- __Dataset Usage:__
  - Link: https://raw.githubusercontent.com/jbrownlee/Datasets/master/HAR_Smartphones.zip
  - Download the archive into your working directory, unzip it, and rename the extracted folder to "HARDataset".
- __Contents:__
  - Directories:
       - train: Training data (70% of the dataset).
       - test: Testing data (30% of the dataset).
  - Files:
       - README.txt: Detailed technical description.
       - features.txt: Description of engineered features.     
- __Train and Test Folders:__ Both folders contain similar files but with different data.
- __Important Files in Train Folder:__
  - Inertial Signals Folder: Contains preprocessed data.
  - X_train.txt: Engineered features for model fitting.
  - y_train.txt: Class labels for each observation (1-6).
  - subject_train.txt: Mapping of each data record to a subject identifier (1-30).
- __Inertial Signals Directory:__
  - Total Acceleration Data (gravity plus body motion): total_acc_x_train.txt, total_acc_y_train.txt, total_acc_z_train.txt.
  - Body Acceleration Data: body_acc_x_train.txt, body_acc_y_train.txt, body_acc_z_train.txt.
  - Body Gyroscope Data: body_gyro_x_train.txt, body_gyro_y_train.txt, body_gyro_z_train.txt.
- __Data Format:__
  - Separation: Columns are separated by whitespace.
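  - Scaling: Values fall roughly in the range -1 to 1. The README.txt file states that the engineered features are normalized to [-1, 1]; the inertial signals are not strictly bounded (total acceleration is recorded in units of g and can slightly exceed 1).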
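
The format and value range of any single file can be checked before building a full loader. The sketch below uses `np.loadtxt`, which splits on whitespace by default; the helper `value_range` is hypothetical, and a small temporary file stands in for a real dataset file so the example runs without the download.

```python
import os
import tempfile

import numpy as np

def value_range(filepath):
    """Return the shape, minimum and maximum of a whitespace-separated file."""
    data = np.loadtxt(filepath)
    return data.shape, float(data.min()), float(data.max())

# Stand-in file with two rows of three whitespace-separated values
sample = (
    "  1.012817  -0.1232167   0.1029341\n"
    "  1.022833  -0.1268756   0.1056872\n"
)
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as handle:
    handle.write(sample)
    path = handle.name

shape, lowest, highest = value_range(path)
os.remove(path)
print(shape, lowest, highest)  # (2, 3) -0.1268756 1.022833
```

Pointing `value_range` at a file such as HARDataset/train/Inertial Signals/total_acc_x_train.txt would show whether its values stay inside the expected range.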

### Load and Explore Human Activity Data

In [4]:
import numpy as np
import pandas as pd

In [5]:
# Load a single file from the HAR dataset as a numpy array
def load_file(filepath):
    """
    Load a file from the HAR dataset.
    
    Parameters:
    filepath (str): The path to the file to load.
    
    Returns:
    numpy.ndarray: The loaded data.
    """
    # sep=r'\s+' splits on runs of whitespace; delim_whitespace is deprecated in recent pandas
    dataframe = pd.read_csv(filepath, header=None, sep=r'\s+')
    return dataframe.values

In [22]:
# Load the total_acc_y_train.txt file and print its shape
data = load_file('HARDataset/train/Inertial Signals/total_acc_y_train.txt')
# The training data consists of 7,352 rows (windows), each holding 128 observations
print(data.shape)

(7352, 128)


In [23]:
data[0]

array([-0.1232167, -0.1268756, -0.1240037, -0.1249279, -0.1257667,
       -0.124462 , -0.1273606, -0.1278912, -0.1258682, -0.1243682,
       -0.1231382, -0.1213345, -0.1183578, -0.120062 , -0.1221186,
       -0.12008  , -0.1209017, -0.1213949, -0.1215677, -0.1246812,
       -0.1254896, -0.1249345, -0.1249063, -0.1249926, -0.1251552,
       -0.1247985, -0.1254793, -0.1268068, -0.1272888, -0.123713 ,
       -0.1192631, -0.1226967, -0.1271224, -0.126278 , -0.1261419,
       -0.1251686, -0.121594 , -0.1190558, -0.1179128, -0.1174034,
       -0.1172102, -0.1181487, -0.1185709, -0.1179084, -0.1205067,
       -0.1243031, -0.1256299, -0.1246896, -0.1218014, -0.1202801,
       -0.1206562, -0.1210648, -0.1216185, -0.1241114, -0.1280997,
       -0.1280257, -0.126537 , -0.1274474, -0.1273523, -0.1264597,
       -0.1247455, -0.1236691, -0.1229069, -0.1215528, -0.123976 ,
       -0.1268078, -0.1277862, -0.1266547, -0.1236336, -0.1249187,
       -0.1243005, -0.1197982, -0.1192223, -0.120174 , -0.1213

In [24]:
# Load a list of files and stack them as a 3D numpy array
def load_group(filenames, prefix=''):
    """
    Load multiple files and stack them into a single 3D numpy array.
    
    Parameters:
    filenames (list): List of filenames to load.
    prefix (str): Prefix path to the files.
    
    Returns:
    numpy.ndarray: The stacked data.
    """
    loaded = [load_file(prefix + name) for name in filenames]
    return np.dstack(loaded)

In [26]:
# Load the total accelerometer data (x, y, z) for training
filenames = ['total_acc_x_train.txt', 'total_acc_y_train.txt', 'total_acc_z_train.txt']
total_acc = load_group(filenames, prefix='HARDataset/train/Inertial Signals/')
print(total_acc.shape)
# (samples, timesteps, features)

(7352, 128, 3)
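
To see why `np.dstack` produces the (samples, timesteps, features) layout, it helps to stack three tiny 2D arrays. The arrays below are made up for illustration, with each one standing in for a single axis file:

```python
import numpy as np

# Three hypothetical "files", each with 2 windows of 3 timesteps
x = np.array([[1, 2, 3], [4, 5, 6]])
y = x * 10
z = x * 100

# dstack joins the arrays along a new third axis
stacked = np.dstack([x, y, z])
print(stacked.shape)            # (2, 3, 3)
print(stacked[0, 1].tolist())   # [2, 20, 200]
```

Each position along the last axis holds the values of one input array at the same window and timestep, which is how the x, y and z signals end up as three features per timestep.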


In [27]:
total_acc[0]

array([[ 1.012817  , -0.1232167 ,  0.1029341 ],
       [ 1.022833  , -0.1268756 ,  0.1056872 ],
       [ 1.022028  , -0.1240037 ,  0.1021025 ],
       [ 1.017877  , -0.1249279 ,  0.1065527 ],
       [ 1.02368   , -0.1257667 ,  0.1028135 ],
       [ 1.016974  , -0.124462  ,  0.1074931 ],
       [ 1.017746  , -0.1273606 ,  0.1093857 ],
       [ 1.019263  , -0.1278912 ,  0.1038862 ],
       [ 1.016417  , -0.1258682 ,  0.1024732 ],
       [ 1.020745  , -0.1243682 ,  0.0975659 ],
       [ 1.018643  , -0.1231382 ,  0.09764665],
       [ 1.019521  , -0.1213345 ,  0.09537356],
       [ 1.02026   , -0.1183578 ,  0.09367106],
       [ 1.018041  , -0.120062  ,  0.09921876],
       [ 1.020829  , -0.1221186 ,  0.09997368],
       [ 1.018644  , -0.12008   ,  0.09889572],
       [ 1.019398  , -0.1209017 ,  0.0962825 ],
       [ 1.020399  , -0.1213949 ,  0.09765831],
       [ 1.019222  , -0.1215677 ,  0.1004408 ],
       [ 1.022093  , -0.1246812 ,  0.09846986],
       [ 1.020433  , -0.1254896 ,  0.101

- Given the parallel structure of the train and test folders, we will create a new function to load input and output data for a specified folder.
- This function compiles a list of the 9 data files, combines them into a NumPy array with 9 features, and loads the output class data.
- The load_dataset() function below can be used for either the train or test group by passing the group name as an argument.

In [28]:
# Load a dataset group, such as train or test
def load_dataset(group, prefix=''):
    """
    Load all data for a given dataset group (train or test).
    
    Parameters:
    group (str): The dataset group to load ('train' or 'test').
    prefix (str): Prefix path to the dataset.
    
    Returns:
    tuple: Input data (X) and output class data (y).
    """
    filepath = prefix + group + '/Inertial Signals/'
    # List all 9 data files to load
    filenames = [
        # Total acceleration
        'total_acc_x_'+group+'.txt', 'total_acc_y_'+group+'.txt', 'total_acc_z_'+group+'.txt',
        # Body acceleration
        'body_acc_x_'+group+'.txt', 'body_acc_y_'+group+'.txt', 'body_acc_z_'+group+'.txt',
        # Body gyroscope
        'body_gyro_x_'+group+'.txt', 'body_gyro_y_'+group+'.txt', 'body_gyro_z_'+group+'.txt'
    ]
    # Load input data
    X = load_group(filenames, filepath)
    # Load class output
    y = load_file(prefix + group + '/y_'+group+'.txt')
    return X, y

In [29]:
# Example usage
X_train, y_train = load_dataset('train', prefix='HARDataset/')
X_test, y_test = load_dataset('test', prefix='HARDataset/')
print(X_train.shape, y_train.shape)
print(X_test.shape, y_test.shape)

(7352, 128, 9) (7352, 1)
(2947, 128, 9) (2947, 1)
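
A natural next exploration step is to check how evenly the six classes are represented. A minimal sketch, assuming labels shaped like `y_train` above (one integer per row); the helper `class_breakdown` and the small `y_demo` array are hypothetical stand-ins so the example runs without the dataset:

```python
import numpy as np

def class_breakdown(y):
    """Map each class label to (count, percentage of all rows)."""
    labels, counts = np.unique(np.asarray(y).ravel(), return_counts=True)
    total = int(counts.sum())
    return {
        int(label): (int(count), round(100.0 * int(count) / total, 1))
        for label, count in zip(labels, counts)
    }

# Hypothetical labels with the same (n, 1) shape as y_train
y_demo = np.array([[1], [1], [2], [3], [3], [3], [4], [5], [6], [6]])

for label, (count, pct) in class_breakdown(y_demo).items():
    print(f"class {label}: {count} windows ({pct}%)")
```

Calling `class_breakdown(y_train)` and `class_breakdown(y_test)` on the loaded arrays would show whether the two groups have similar class balance.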
