# IMU-Based Human Activity Recognition  
## Feature Engineering and Dataset Preparation

This notebook performs window-based feature extraction on raw IMU data.  
The goal is to convert raw accelerometer and gyroscope signals into structured feature vectors suitable for machine learning.


## 1. Import Required Libraries

This section imports libraries required for numerical computation, data handling, and file operations.


In [1]:
# Numeric computation
import numpy as np

# Data handling
import pandas as pd

# File-system utilities
import os

## 2. Define Data Paths and Activity List

Raw IMU data for each activity is stored as separate CSV files.  
This section defines the data directory and lists all available activities.


In [2]:
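# Directory containing the raw IMU recordings (one CSV per activity)
RAW_PATH = "../data/Raw data"
# Directory where the extracted feature files are written
FEAT_PATH = "../data/featured"

# List activity files; os.listdir returns entries in arbitrary order, so sort them
activities = sorted(f for f in os.listdir(RAW_PATH) if f.endswith(".csv"))

# Display available activities
activities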


['brisk walking.csv',
 'cycling.csv',
 'eating.csv',
 'jogging.csv',
 'phone interaction.csv',
 'pick and place.csv',
 'sit-stand-sit.csv',
 'sitting.csv',
 'stair-down.csv',
 'stair-up.csv',
 'standing.csv',
 'walking.csv']

## 3. Sliding Window Configuration

A fixed-size sliding window is used to segment continuous IMU signals into overlapping windows.


In [3]:
# Window configuration
WINDOW_SIZE = 20        # number of samples per window
STEP_SIZE = 10          # 50% overlap
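
As a quick check on these settings, the number of complete windows in a recording of `N` samples is `(N - WINDOW_SIZE) // STEP_SIZE + 1`, and the overlap fraction is `1 - STEP_SIZE / WINDOW_SIZE`. The sketch below evaluates both. The helper `count_windows` and the recording length of 9000 samples are assumptions made for illustration; they are not taken from the data files.

```python
# Window-count arithmetic for a fixed-size sliding window.
WINDOW_SIZE = 20   # samples per window (same value as configured above)
STEP_SIZE = 10     # samples between consecutive window starts


def count_windows(n_samples, window_size, step_size):
    """Number of complete windows that fit in n_samples (hypothetical helper)."""
    if n_samples < window_size:
        return 0
    return (n_samples - window_size) // step_size + 1


# Fraction of each window shared with the next one
overlap = 1 - STEP_SIZE / WINDOW_SIZE

# n_samples = 9000 is an assumed recording length, used only for illustration
n_windows = count_windows(9000, WINDOW_SIZE, STEP_SIZE)
print(f"overlap = {overlap:.0%}, windows = {n_windows}")
# prints: overlap = 50%, windows = 899
```

Any recording shorter than one window yields zero windows, and samples left over after the last complete window are not used.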


## 4. Sliding Window Function

This function splits IMU data into overlapping windows based on the defined window size and step size.


In [4]:
def sliding_window(data, window_size, step_size):
    """
    Splits data into overlapping sliding windows.

    Trailing samples that do not fill a complete window are dropped.

    Parameters:
        data (ndarray): IMU data of shape (N, 6)
        window_size (int): number of samples per window
        step_size (int): step between window starts

    Returns:
        windows (list): list of arrays, each of shape (window_size, 6)
    """
    windows = []
    for start in range(0, len(data) - window_size + 1, step_size):
        end = start + window_size
        windows.append(data[start:end])
    return windows
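
For long recordings, the same segmentation can be expressed without a Python loop. The sketch below is one possible alternative, not the method used in this notebook: it assumes NumPy 1.20 or later (for `sliding_window_view`), and `sliding_window_vectorized` is a hypothetical name. It checks on synthetic data that both versions produce identical windows.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def sliding_window(data, window_size, step_size):
    """Loop-based version, same logic as the function above."""
    return [data[s:s + window_size]
            for s in range(0, len(data) - window_size + 1, step_size)]


def sliding_window_vectorized(data, window_size, step_size):
    """Hypothetical vectorized variant.

    Returns an array of shape (n_windows, window_size, n_channels).
    """
    # sliding_window_view appends the window axis last: (N - W + 1, C, W)
    view = sliding_window_view(data, window_size, axis=0)
    # Keep every step_size-th window, then put the window axis before channels
    return view[::step_size].transpose(0, 2, 1)


# Synthetic stand-in for IMU data: 55 samples, 6 channels
data = np.arange(55 * 6, dtype=float).reshape(55, 6)

loop_windows = np.array(sliding_window(data, 20, 10))
vec_windows = sliding_window_vectorized(data, 20, 10)

print(loop_windows.shape)                          # (4, 20, 6)
print(np.array_equal(loop_windows, vec_windows))   # True
```

One behavioral difference: when the input has fewer samples than `window_size`, the loop version returns an empty list, whereas `sliding_window_view` raises a `ValueError`.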

## 5. Time-Domain Feature Extraction Function

This function computes statistical time-domain features from each window for all IMU channels.


In [5]:
def extract_features(window):
    """
    Extracts time-domain features from a window of IMU data.

    Parameters:
        window (ndarray): shape (window_size, 6)

    Returns:
        features (list): 10 statistics per channel, channel by channel
            (60 values for 6 channels)
    """
    features = []
    for col in range(window.shape[1]):
        x = window[:, col]

        features.extend([
            np.mean(x),
            np.min(x),
            np.max(x),
            np.std(x),                        # population std (ddof=0)
            np.var(x),                        # population variance (ddof=0)
            np.sqrt(np.mean(x**2)),           # root mean square (RMS)
            np.ptp(x),                        # peak-to-peak (max - min)
            np.median(x),
            np.mean(np.abs(x - np.mean(x))),  # mean absolute deviation (MAD)
            np.sum(x**2)                      # signal energy
        ])
    return features
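
To make the ten statistics concrete, the sketch below evaluates them on a small single-channel signal whose values can be checked by hand. `channel_features` is a hypothetical helper that mirrors the per-channel computation in `extract_features`; the input `[1, 2, 3, 4]` is made up for illustration.

```python
import numpy as np


def channel_features(x):
    """Same ten statistics as extract_features, for one 1-D channel."""
    return {
        "mean": np.mean(x),
        "min": np.min(x),
        "max": np.max(x),
        "std": np.std(x),                        # population std (ddof=0)
        "var": np.var(x),                        # population variance (ddof=0)
        "rms": np.sqrt(np.mean(x**2)),           # root mean square
        "ptp": np.ptp(x),                        # max - min
        "median": np.median(x),
        "mad": np.mean(np.abs(x - np.mean(x))),  # mean absolute deviation
        "energy": np.sum(x**2),                  # sum of squares
    }


# Illustrative signal, not real IMU data
x = np.array([1.0, 2.0, 3.0, 4.0])
feats = channel_features(x)

for name, value in feats.items():
    print(f"{name:>6}: {value:.4f}")
# mean 2.5, min 1, max 4, std 1.1180, var 1.25, rms 2.7386,
# ptp 3, median 2.5, mad 1.0, energy 30
```

Several of these features are deterministic functions of others: `var` is `std` squared, `ptp` is `max - min`, and `energy` equals the window length times `rms` squared. Whether that redundancy matters depends on the downstream model.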

In [6]:
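# Column names for the last raw sample of each window
raw_cols = ['Ax_last', 'Ay_last', 'Az_last', 'Gx_last', 'Gy_last', 'Gz_last']
# Statistics computed per channel, in the order returned by extract_features
feature_names = ['mean', 'min', 'max', 'std', 'var', 'rms', 'ptp', 'median', 'mad', 'energy']
channels = ['Ax', 'Ay', 'Az', 'Gx', 'Gy', 'Gz']

# 6 raw columns + 6 channels x 10 statistics = 66 columns
feature_cols = raw_cols[:]
for ch in channels:
    for f in feature_names:
        feature_cols.append(f"{ch}_{f}")
len(feature_cols)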

66

In [7]:
os.makedirs(FEAT_PATH,exist_ok=True)

In [11]:
for file_name in activities:
    if not file_name.endswith(".csv"):
        continue

    activity = file_name.replace(".csv", "")
    file_path = os.path.join(RAW_PATH, file_name)

    # Load raw IMU data
    df = pd.read_csv(file_path)

    # Select the first six columns, assumed to be Ax, Ay, Az, Gx, Gy, Gz in that order
    imu = df.iloc[:, 0:6].astype(float).values

    # Generate sliding windows
    windows = sliding_window(imu, WINDOW_SIZE, STEP_SIZE)

    # Build one row per window: last raw sample (6 values) + window statistics (60 values)
    X_act = []
    for w in windows:
        row = w[-1].tolist() + extract_features(w)
        X_act.append(row)

    X_act = np.array(X_act)

    # Convert to a DataFrame with named columns
    df_feat = pd.DataFrame(X_act, columns=feature_cols)

    # Save the feature CSV
    save_path = os.path.join(FEAT_PATH, f"{activity}_features.csv")
    df_feat.to_csv(save_path, index=False)

    # Report the file name actually written (suffix "_features.csv")
    print(f"Saved:{activity}_features.csv -> {df_feat.shape}")


Saved:brisk walking_features.csv -> (899, 66)
Saved:cycling_features.csv -> (899, 66)
Saved:eating_features.csv -> (899, 66)
Saved:jogging_features.csv -> (899, 66)
Saved:phone interaction_features.csv -> (899, 66)
Saved:pick and place_features.csv -> (899, 66)
Saved:sit-stand-sit_features.csv -> (899, 66)
Saved:sitting_features.csv -> (899, 66)
Saved:stair-down_features.csv -> (899, 66)
Saved:stair-up_features.csv -> (899, 66)
Saved:standing_features.csv -> (899, 66)
Saved:walking_features.csv -> (899, 66)
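
The per-activity files written by the loop above carry no label column, so a later step has to attach one before training. The sketch below shows one way this might be done. It is an assumption about downstream use, not part of this notebook: `load_labeled_features` and the `activity` column name are hypothetical, and the `_features.csv` suffix is taken from `save_path` in the loop.

```python
import os

import pandas as pd


def load_labeled_features(feat_dir, suffix="_features.csv"):
    """Concatenate per-activity feature files and add an 'activity' label.

    Hypothetical helper: the label is derived from the file name by
    stripping the suffix, e.g. 'walking_features.csv' -> 'walking'.
    """
    frames = []
    for file_name in sorted(os.listdir(feat_dir)):
        if not file_name.endswith(suffix):
            continue
        df = pd.read_csv(os.path.join(feat_dir, file_name))
        df["activity"] = file_name[:-len(suffix)]
        frames.append(df)
    if not frames:
        raise FileNotFoundError(f"no '*{suffix}' files found in {feat_dir}")
    return pd.concat(frames, ignore_index=True)


# Example call, assuming the feature files already exist on disk:
# dataset = load_labeled_features("../data/featured")
# dataset["activity"].value_counts()
```

Because consecutive windows overlap by 50%, neighboring rows in each file share samples. A random row-level train/test split would therefore place near-duplicate windows on both sides, so splitting by contiguous time blocks or by recording is usually the safer choice.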
