# Human Activity Recognition - Feature Extraction
This notebook processes accelerometer and gyroscope sensor data to extract features for human activity recognition. The data covers four activities: sitting, lying down, standing, and walking.

## Overview

The pipeline consists of:

1. Loading and merging accelerometer and gyroscope data

2. Combining data from all activities

3. Extracting statistical features using sliding windows

4. Saving the processed data for model training

## Import Libraries


In [10]:
import pandas as pd
import numpy as np
import os 

## Data Loading and Merging Function

In [11]:

def merge_activity(folder_path, activity_name):
    """
    Load and merge accelerometer and gyroscope data for a specific activity.

    Each accelerometer row is paired with the gyroscope row whose timestamp
    is nearest. The first four columns of each CSV are assumed to be
    time, x, y, z, in that order.

    Parameters:
    folder_path (str): Path to the folder containing sensor data files
    activity_name (str): Name of the activity (e.g., 'sitting', 'walking')

    Returns:
    pandas.DataFrame: Merged dataframe with accelerometer and gyroscope data
    """
    # Load Accelerometer
    acc = pd.read_csv(os.path.join(folder_path, "Accelerometer.csv"))
    acc = acc.rename(columns={
        acc.columns[0]: "time",
        acc.columns[1]: "accX",
        acc.columns[2]: "accY",
        acc.columns[3]: "accZ"
    })
    
    # Load Gyroscope
    gyr = pd.read_csv(os.path.join(folder_path, "Gyroscope.csv"))
    gyr = gyr.rename(columns={
        gyr.columns[0]: "time",
        gyr.columns[1]: "gyrX",
        gyr.columns[2]: "gyrY",
        gyr.columns[3]: "gyrZ"
    })
    
    # Merge on time
    df = pd.merge_asof(
        acc.sort_values("time"),
        gyr.sort_values("time"),
        on="time",
        direction="nearest"
    )
    
    df["activity"] = activity_name
    return df
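
With `direction="nearest"` and no `tolerance`, `pd.merge_asof` pairs every accelerometer row with the closest gyroscope row no matter how far apart the two timestamps are, so one gyroscope reading can be repeated across several rows. The following is a minimal sketch on made-up toy data (the values and the 0.015 tolerance are assumptions for illustration, not taken from the recordings) showing how `tolerance` leaves distant rows unmatched instead.

```python
import pandas as pd

# Toy data: the gyroscope is sampled more sparsely than the accelerometer
acc = pd.DataFrame({"time": [0.00, 0.02, 0.04, 0.06],
                    "accX": [1.0, 1.1, 1.2, 1.3]})
gyr = pd.DataFrame({"time": [0.01, 0.10],
                    "gyrX": [0.5, 0.6]})

# Unbounded nearest match: every row receives some gyroscope value
nearest = pd.merge_asof(acc, gyr, on="time", direction="nearest")

# Bounded nearest match: rows farther than 0.015 from any gyroscope
# timestamp get NaN instead of a stale reading
bounded = pd.merge_asof(acc, gyr, on="time", direction="nearest",
                        tolerance=0.015)

print(nearest)
print(bounded)
```

Whether a tolerance is appropriate depends on how closely the two sensors are synchronized; rows left as NaN would need to be dropped or interpolated before feature extraction.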



## Load Data for All Activities

In [12]:
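# os.path.join keeps the paths portable across Windows, Linux, and macOS
lying_df = merge_activity(os.path.join("separated", "lying down"), "lying")
standing_df = merge_activity(os.path.join("separated", "standing"), "standing")
sitting_df = merge_activity(os.path.join("separated", "sitting"), "sitting")
walking_df = merge_activity(os.path.join("separated", "walking"), "walking")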
print(sitting_df.head())


       time      accX      accY      accZ      gyrX      gyrY      gyrZ  \
0  0.029746 -4.468596 -3.722036  7.714460  0.034437  0.065895 -0.019547   
1  0.048555 -4.398008 -3.804588  7.808977  0.034437  0.065895 -0.019547   
2  0.067364 -4.444668 -3.777071  7.905886  0.034437  0.065895 -0.019547   
3  0.086175 -4.432704 -3.789035  7.856833  0.034437  0.065895 -0.019547   
4  0.104985 -4.435097 -3.857230  7.971689  0.034437  0.065895 -0.019547   

  activity  
0  sitting  
1  sitting  
2  sitting  
3  sitting  
4  sitting  
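
The timestamps printed above are roughly 0.019 apart. Assuming the `time` column is in seconds (the notebook does not state the unit), that corresponds to a sampling rate of about 53 Hz, so a window of 100 samples, the size used for feature extraction later in this notebook, spans a little under two seconds. A minimal sketch of that arithmetic, using the five timestamps from the output; `estimate_rate_hz` is a hypothetical helper name:

```python
import numpy as np

def estimate_rate_hz(times):
    """Estimate the sampling rate from the median timestamp spacing.

    Assumes the timestamps are in seconds.
    """
    dt = np.median(np.diff(np.asarray(times, dtype=float)))
    return 1.0 / dt

# First five timestamps of the sitting recording, copied from the output
times = [0.029746, 0.048555, 0.067364, 0.086175, 0.104985]

rate = estimate_rate_hz(times)
window_seconds = 100 / rate  # duration covered by a 100-sample window

print(f"{rate:.1f} Hz, {window_seconds:.2f} s per window")
```

Five samples are only a spot check; running the same function on the full `time` column of each recording would show whether the rate is stable.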


## Combine All Activity Data

In [13]:

# Combine into one dataset
combined_df = pd.concat([sitting_df, lying_df, standing_df, walking_df], ignore_index=True)

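# Save combined raw dataset (create the output folder if it does not exist)
os.makedirs("full_data", exist_ok=True)
combined_df.to_csv(os.path.join("full_data", "combined_raw.csv"), index=False)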

print("Combined dataset shape:", combined_df.shape)
combined_df.head()


Combined dataset shape: (167402, 8)


| | time | accX | accY | accZ | gyrX | gyrY | gyrZ | activity |
|---|---|---|---|---|---|---|---|---|
| 0 | 0.029746 | -4.468596 | -3.722036 | 7.71446 | 0.034437 | 0.065895 | -0.019547 | sitting |
| 1 | 0.048555 | -4.398008 | -3.804588 | 7.808977 | 0.034437 | 0.065895 | -0.019547 | sitting |
| 2 | 0.067364 | -4.444668 | -3.777071 | 7.905886 | 0.034437 | 0.065895 | -0.019547 | sitting |
| 3 | 0.086175 | -4.432704 | -3.789035 | 7.856833 | 0.034437 | 0.065895 | -0.019547 | sitting |
| 4 | 0.104985 | -4.435097 | -3.85723 | 7.971689 | 0.034437 | 0.065895 | -0.019547 | sitting |
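
After concatenation it is worth checking how many samples each activity contributes and what time range it covers, since each recording presumably keeps its own clock and class imbalance carries over into the windowed features. A minimal sketch on a made-up toy frame standing in for `combined_df` (the values are illustrative assumptions):

```python
import pandas as pd

# Toy stand-in for combined_df; values are made up for illustration
toy = pd.DataFrame({
    "time": [0.0, 0.1, 0.2, 0.0, 0.1],
    "accX": [1.0, 1.1, 0.9, 5.0, 5.2],
    "activity": ["sitting"] * 3 + ["walking"] * 2,
})

# Per-activity sample count and covered time range
summary = toy.groupby("activity")["time"].agg(
    samples="count", start="min", end="max"
)
print(summary)
```

Applied to the real data, the same call would be `combined_df.groupby("activity")["time"].agg(...)`.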


## Feature Extraction and Saving the Results

In [14]:

def extract_features(df, window_size=100, step_size=50):
    """
    Extract statistical features from sensor data using sliding windows.
        
    Parameters:
    df (pandas.DataFrame): Input dataframe with sensor data
    window_size (int): Size of the sliding window (number of samples)
    step_size (int): Step size between consecutive windows
    
    Returns:
    pandas.DataFrame: Dataframe with extracted features for each window
    """
    features = []

    # +1 so that a final window ending exactly on the last row is included
    for start in range(0, len(df) - window_size + 1, step_size):
        window = df.iloc[start:start+window_size]
        feats = {}

        # Extract stats for each sensor axis
        for col in ["accX", "accY", "accZ", "gyrX", "gyrY", "gyrZ"]:
            vals = window[col].values
            feats[f"{col}_mean"] = np.mean(vals)
            feats[f"{col}_std"] = np.std(vals)
            feats[f"{col}_min"] = np.min(vals)
            feats[f"{col}_max"] = np.max(vals)
            feats[f"{col}_energy"] = np.sum(vals**2) / len(vals)

        # Assign label by majority vote; a window that straddles two
        # activities in the concatenated data takes the more frequent label
        feats["activity"] = window["activity"].mode()[0]

        features.append(feats)

    return pd.DataFrame(features)

# Run feature extraction
features_df = extract_features(combined_df, window_size=100, step_size=50)

# Save feature dataset
features_df.to_csv(os.path.join("full_data", "features.csv"), index=False)

print("Feature dataset shape:", features_df.shape)
features_df.head()


Feature dataset shape: (3347, 31)


| | accX_mean | accX_std | accX_min | accX_max | accX_energy | accY_mean | accY_std | accY_min | accY_max | accY_energy | ... | gyrY_std | gyrY_min | gyrY_max | gyrY_energy | gyrZ_mean | gyrZ_std | gyrZ_min | gyrZ_max | gyrZ_energy | activity |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | -3.685473 | 1.997135 | -8.252846 | 0.440279 | 17.571261 | -3.616931 | 0.858293 | -5.18405 | -1.010968 | 13.818856 | ... | 0.483096 | -1.694338 | 0.598859 | 0.279357 | -0.042829 | 0.578459 | -1.49482 | 1.724727 | 0.336449 | sitting |
| 1 | -1.785979 | 1.86733 | -6.859026 | 1.311267 | 6.676641 | -3.775539 | 0.683741 | -5.955736 | -2.402394 | 14.722198 | ... | 0.597027 | -1.694338 | 1.200926 | 0.369895 | -0.375712 | 0.733944 | -2.632828 | 0.654217 | 0.679832 | sitting |
| 2 | -0.931023 | 0.853553 | -3.101098 | 1.311267 | 1.595357 | -2.954239 | 1.242214 | -5.955736 | 0.447458 | 10.270621 | ... | 0.368278 | -0.519297 | 1.200926 | 0.179956 | -0.322764 | 0.725953 | -2.632828 | 0.654217 | 0.631184 | sitting |
| 3 | -1.660033 | 1.116699 | -4.991428 | 1.032503 | 4.002726 | -2.788966 | 1.344942 | -5.565705 | 0.447458 | 9.587203 | ... | 0.634349 | -2.404753 | 1.997165 | 0.407521 | 0.005821 | 0.231265 | -0.454776 | 0.707361 | 0.053518 | sitting |
| 4 | -3.069298 | 1.26626 | -5.53101 | 1.032503 | 11.024001 | -2.652121 | 1.681772 | -6.028717 | -0.277567 | 9.862104 | ... | 0.636009 | -2.404753 | 1.997165 | 0.413608 | 0.105961 | 0.221813 | -0.380863 | 0.76417 | 0.060429 | sitting |
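
`extract_features` runs over the concatenated frame, so a window near the boundary between two recordings can mix samples from both activities. One possible alternative, shown here as a sketch rather than the notebook's method, is to window each activity separately. `window_starts` and `windows_per_activity` are hypothetical helper names, the toy frame is made up, and only mean and standard deviation are computed to keep the example short.

```python
import numpy as np
import pandas as pd

def window_starts(n_rows, window_size, step_size):
    """Start indices of all full windows that fit in n_rows."""
    return range(0, n_rows - window_size + 1, step_size)

def windows_per_activity(df, window_size, step_size, cols):
    """Compute per-window mean and std separately for each activity."""
    rows = []
    for activity, group in df.groupby("activity", sort=False):
        for start in window_starts(len(group), window_size, step_size):
            window = group.iloc[start:start + window_size]
            feats = {"activity": activity}
            for col in cols:
                feats[f"{col}_mean"] = window[col].mean()
                # ddof=0 gives the population std, matching np.std
                feats[f"{col}_std"] = window[col].std(ddof=0)
            rows.append(feats)
    return pd.DataFrame(rows)

# Toy data: 6 "sitting" samples followed by 4 "walking" samples
toy = pd.DataFrame({
    "accX": np.arange(10, dtype=float),
    "activity": ["sitting"] * 6 + ["walking"] * 4,
})

out = windows_per_activity(toy, window_size=4, step_size=2, cols=["accX"])
print(out)
```

With this approach every window carries a single, unambiguous label, at the cost of dropping the few samples at the end of each recording that do not fill a complete window.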


## Summary

* Raw Data Size: 167,402 samples with 8 columns (timestamp, 3 accelerometer axes, 3 gyroscope axes, activity label)
* Feature Data Size: 3,347 windows with 31 columns (30 statistical features + activity label)
* Window Parameters: 100 samples per window with a step of 50 samples (50% overlap between consecutive windows)
* Extracted Features: Mean, standard deviation, min, max, and energy (mean of squared values) for each sensor axis
* The resulting feature dataset is now ready for machine learning model training for activity classification.
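
The figures in this summary can be cross-checked with a little arithmetic. A sliding window of size W and step S fits floor((N - W) / S) + 1 full windows into N samples, and an energy defined as the mean of squared values equals mean² + std² when the standard deviation is the population one (NumPy's default). A short sketch; `count_windows` is a hypothetical helper name, and the numeric inputs are copied from the outputs above:

```python
import math

def count_windows(n_samples, window_size, step_size):
    """Number of full windows that fit into n_samples."""
    if n_samples < window_size:
        return 0
    return (n_samples - window_size) // step_size + 1

# 167,402 raw samples, 100-sample windows, 50-sample step
n_windows = count_windows(167402, 100, 50)
print(n_windows)  # 3347

# Energy identity, checked on the accX values of the first feature row
mean, std, energy = -3.685473, 1.997135, 17.571261
identity_holds = math.isclose(mean**2 + std**2, energy, rel_tol=1e-5)
print(identity_holds)
```

Because energy is fully determined by mean and standard deviation, the three features are not independent, which may matter for models that are sensitive to redundant inputs.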

