<a href="https://colab.research.google.com/github/Seham283/CNN-LSTM/blob/main/HAR_(CNN_LSTM).ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Human Activity Recognition (HAR)
This notebook builds a CNN-LSTM deep learning model to classify human activities using the UCI HAR Dataset.

**Steps:**

1.   Load Data: upload and extract the dataset.
2.   Preprocess Data: extract features, scale values, and encode labels.
3.   Visualize Data: explore activity distribution and signal patterns.
4.   Train Model: use a CNN-LSTM for classification.
5.   Evaluate Performance: check accuracy, confusion matrices, and the classification report.
6.   Make Predictions: test the model on new samples.


In [1]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import plotly.express as px
import plotly.graph_objects as go
from sklearn.model_selection import train_test_split, KFold
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import classification_report, confusion_matrix
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv1D, MaxPooling1D, LSTM, Dense, Dropout, TimeDistributed, BatchNormalization
import os
import shap

# **Step 1:** Data Loading

1.   Upload and extract the UCI HAR Dataset ZIP file in Colab.
2.   Load the train and test data from the extracted files.
3.   Assign activity labels for readability.


In [3]:
from google.colab import files
import zipfile

# Step 1: Data Loading
print("Step 1: Uploading and unzipping UCI HAR Dataset in Colab...")

# Upload the ZIP file
print("upload the UCI HAR Dataset ZIP file ")
uploaded = files.upload()

# Find the uploaded ZIP file
if not uploaded:
    raise FileNotFoundError("No file uploaded. Please upload the UCI HAR Dataset ZIP.")
zip_file = list(uploaded.keys())[0]
print(f"Detected uploaded file: {zip_file}")

# Unzip the file in Colab's temporary directory
with zipfile.ZipFile(zip_file, 'r') as zip_ref:
    zip_ref.extractall("/content/UCI_HAR_Dataset")

# Define the base path to the unzipped dataset
base_path = "/content/UCI_HAR_Dataset/UCI HAR Dataset/"

# Function to load training and test data from unzipped files
def load_har_data(base_path):
    if not os.path.exists(base_path):
        raise FileNotFoundError(f"Directory {base_path} not found. Ensure the ZIP was uploaded and unzipped correctly.")

    train_X = np.loadtxt(os.path.join(base_path, "train/X_train.txt"))  # Load training features
    train_y = np.loadtxt(os.path.join(base_path, "train/y_train.txt"))  # Load training labels
    test_X = np.loadtxt(os.path.join(base_path, "test/X_test.txt"))     # Load test features
    test_y = np.loadtxt(os.path.join(base_path, "test/y_test.txt"))     # Load test labels
    return train_X, train_y, test_X, test_y

# Load the data
X_train, y_train, X_test, y_test = load_har_data(base_path)
activity_labels = {0: 'Walking', 1: 'Walking Upstairs', 2: 'Walking Downstairs',
                   3: 'Sitting', 4: 'Standing', 5: 'Laying'}  # Define activity names

Step 1: Uploading and unzipping UCI HAR Dataset in Colab...
upload the UCI HAR Dataset ZIP file 


Saving UCI HAR Dataset.zip to UCI HAR Dataset.zip
Detected uploaded file: UCI HAR Dataset.zip
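The loading cell hard-codes `base_path` as `/content/UCI_HAR_Dataset/UCI HAR Dataset/`, which only works if the archive unpacks to a top-level `UCI HAR Dataset/` folder. A small search over the extracted tree avoids depending on that layout. This is a sketch: `find_har_root` is a hypothetical helper (not part of the original notebook) that uses only the standard library and returns the first directory containing `train/X_train.txt`.

```python
import os

def find_har_root(extract_dir, marker="train/X_train.txt"):
    """Return the first directory under extract_dir that contains `marker`.

    Hypothetical helper: walks the extracted tree top-down instead of assuming
    the ZIP has a top-level "UCI HAR Dataset/" folder.
    """
    for root, _dirs, _files in os.walk(extract_dir):
        if os.path.isfile(os.path.join(root, marker)):
            return root
    raise FileNotFoundError(f"No directory containing {marker} under {extract_dir}")
```

Usage in the notebook would be `base_path = find_har_root("/content/UCI_HAR_Dataset")`, after which `load_har_data(base_path)` works unchanged.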


# **Step 2:** Data Preprocessing
This step prepares the input data for model training. Each row of `X_train.txt` / `X_test.txt` holds 561 values, which the notebook treats as a 561-step sequence with one channel:

1. Feature Extraction:
   *   Time-domain features: computes the mean and standard deviation of each sample's 561 values.
   *   Frequency-domain features: computes FFT spectral energy to capture signal variation.
   *   Feature combination: repeats these three per-sample statistics at every position and stacks them with the original values, giving 4 channels per step.
2. Normalization:
   *   Applies StandardScaler, fitted on the training set only, to standardize each of the 4 channels.
3. Label Encoding:
   *   Converts the activity labels (1 to 6) to 0-based one-hot vectors for training a classification model.


In [6]:
# Step 2: Data Preprocessing
def extract_features(data, timesteps=561):
    samples = data.shape[0]  # Number of samples (7352 for train, 2947 for test)
    features = 1
    data_reshaped = data.reshape(samples, timesteps, features)

    # Extract time-domain features: mean and standard deviation
    mean_features = np.mean(data_reshaped, axis=1, keepdims=True)
    std_features = np.std(data_reshaped, axis=1, keepdims=True)

    # Extract frequency-domain features: FFT spectral energy
    fft_data = np.abs(np.fft.fft(data_reshaped, axis=1))[:, :timesteps//2, :]
    energy_features = np.sum(fft_data**2, axis=1, keepdims=True) / (timesteps//2)

    # Combine original data with new features
    # Broadcast new features to match timesteps dimension
    mean_features = np.repeat(mean_features, timesteps, axis=1)
    std_features = np.repeat(std_features, timesteps, axis=1)
    energy_features = np.repeat(energy_features, timesteps, axis=1)
    enhanced_data = np.concatenate([data_reshaped, mean_features, std_features, energy_features], axis=2)
    return enhanced_data

timesteps = 561  # Each of the 561 values per sample is treated as one step of a sequence
X_train_enhanced = extract_features(X_train, timesteps)
X_test_enhanced = extract_features(X_test, timesteps)
n_features = X_train_enhanced.shape[-1]  # 4 features per timestep

# Normalize the data using StandardScaler
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train_enhanced.reshape(-1, n_features)).reshape(X_train_enhanced.shape)
X_test_scaled = scaler.transform(X_test_enhanced.reshape(-1, n_features)).reshape(X_test_enhanced.shape)

# Convert labels to one-hot encoded format (0-based indexing)
y_train = tf.keras.utils.to_categorical(y_train - 1, num_classes=6)
y_test = tf.keras.utils.to_categorical(y_test - 1, num_classes=6)
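To see what `extract_features` produces, here is the same computation on a toy batch of two 4-value samples. `extract_features_demo` is a self-contained copy written for illustration, not the notebook's function. It shows that the mean, standard-deviation, and FFT-energy channels are constant along the sequence: they add per-sample context at every position rather than new per-position information.

```python
import numpy as np

def extract_features_demo(data):
    """Same steps as extract_features above, for any sequence length."""
    samples, timesteps = data.shape
    x = data.reshape(samples, timesteps, 1)
    mean = x.mean(axis=1, keepdims=True)
    std = x.std(axis=1, keepdims=True)  # population std, as np.std defaults to
    # FFT magnitude over the first half of the bins, then mean squared magnitude
    fft_mag = np.abs(np.fft.fft(x, axis=1))[:, :timesteps // 2, :]
    energy = np.sum(fft_mag**2, axis=1, keepdims=True) / (timesteps // 2)
    # Repeat each per-sample statistic along the sequence axis, then stack as channels
    stats = [np.repeat(s, timesteps, axis=1) for s in (mean, std, energy)]
    return np.concatenate([x] + stats, axis=2)

toy = np.array([[1.0, 2.0, 3.0, 4.0],
                [0.0, 0.0, 0.0, 0.0]])
out = extract_features_demo(toy)
print(out.shape)     # (2, 4, 4)
print(out[0, :, 1])  # [2.5 2.5 2.5 2.5]
```

For the first sample the FFT is `[10, -2+2j, -2, -2-2j]`; keeping the first two bins gives an energy of `(100 + 8) / 2 = 54`, dominated by the DC (mean) term.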

# **Step 3:** Data Visualization
This step creates four visualizations to explore the preprocessed data:

1.   Activity Distribution: a histogram of how many training samples belong to each activity class.
2.   Sample Signals: a line plot of the first training sample of each activity (channel 0, after scaling). In `X_train.txt` these 561 values are precomputed time- and frequency-domain features of one window, not raw accelerometer readings, so the x-axis indexes features rather than time.
3.   FFT Energy: the average FFT magnitude of channel 0 for each activity class, computed over the 561-value sequence.
4.   Feature Correlation Heatmap: Pearson correlations between the four channels (original values, mean, standard deviation, FFT energy), each averaged over the sequence for the first 100 training samples.


In [11]:
# Step 3: Data Visualization
# Visualization 1: Activity Distribution in Training Set
y_train_labels = np.argmax(y_train, axis=1)
fig = px.histogram(x=[activity_labels[i] for i in y_train_labels], title="Activity Distribution in Training Set",
                   labels={'x': 'Activity'}, color_discrete_sequence=['#636EFA'])
fig.update_layout(bargap=0.2)
fig.show()

# Visualization 2: First training sample of each activity (channel 0 = the 561 input values)
fig = go.Figure()
for i in range(6):
    idx = np.where(y_train_labels == i)[0][0]  # First sample of each activity
    fig.add_trace(go.Scatter(y=X_train_scaled[idx, :, 0], name=activity_labels[i]))
fig.update_layout(title="Sample Input Sequences by Activity (Channel 0)", xaxis_title="Feature Index", yaxis_title="Scaled Value")
fig.show()

# Visualization 3: FFT Energy by Activity Class
raw_signals = X_train_scaled[:, :, 0]  # Shape: (7352, 561)
fft_all = np.abs(np.fft.fft(raw_signals, axis=1))[:, :561//2]  # Shape: (7352, 280)
fig = go.Figure()
for class_idx in range(6):
    class_mask = (y_train_labels == class_idx)
    class_fft = fft_all[class_mask]
    mean_class_fft = np.mean(class_fft, axis=0)
    fig.add_trace(go.Scatter(y=mean_class_fft, name=activity_labels[class_idx],
                             x=np.arange(len(mean_class_fft))))
fig.update_layout(title="Average FFT Magnitude by Activity Class",
                  xaxis_title="Frequency Bin (0 to 279)", yaxis_title="Magnitude",
                  legend_title="Activity")
fig.show()

# Visualization 4: Feature Correlation Heatmap (Pearson Correlation)
subset = X_train_scaled[:100, :, :]
subset_avg = np.mean(subset, axis=1)
corr = np.corrcoef(subset_avg.T)
feature_names = ['Raw Signal', 'Mean', 'Std Dev', 'FFT Energy']
fig = px.imshow(corr, text_auto='.2f',
                labels=dict(x="Feature", y="Feature", color="Pearson Correlation"),
                x=feature_names, y=feature_names,
                title="Pearson Correlation Heatmap of Enhanced Features",
                color_continuous_scale='Spectral')
fig.update_layout(width=600, height=600)
fig.show()
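One consequence of the preprocessing is worth knowing before reading the heatmap. The scaled "Mean" channel is constant along the sequence, and averaging the scaled "Raw Signal" channel over its 561 positions gives an affine function of the same per-sample mean. Both are standardized with positive scale factors, so their Pearson correlation is 1 by construction, whatever the data. The sketch below checks this on synthetic random data standing in for `X_train` (not values from the dataset).

```python
import numpy as np

rng = np.random.default_rng(0)
raw = rng.normal(size=(100, 561))                      # synthetic stand-in: 100 samples x 561 values
means = raw.mean(axis=1)

# Standardize each channel with its own global statistics, as StandardScaler does per column
raw_scaled = (raw - raw.mean()) / raw.std()
mean_channel = np.repeat(means[:, None], 561, axis=1)  # constant along the sequence
mean_scaled = (mean_channel - mean_channel.mean()) / mean_channel.std()

# Average over positions, as the heatmap cell does with np.mean(subset, axis=1)
avg_raw = raw_scaled.mean(axis=1)
avg_mean = mean_scaled.mean(axis=1)
r = np.corrcoef(avg_raw, avg_mean)[0, 1]
print(round(r, 6))  # 1.0
```

So the Raw Signal / Mean cell of the heatmap reflects how the channels were built rather than a property of the activities.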

# **Step 4:** Model Definition
This step builds a CNN-LSTM model to classify activities.

*   The Conv1D layers extract local patterns along the 561-step input sequence.
*   The LSTM layers model longer-range dependencies along the pooled sequence.
*   The Dense layers classify the activity type.

The model is compiled with the Adam optimizer and categorical cross-entropy loss, then its structure is displayed.
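As a cross-check on `model.summary()` (its table is not shown in this notebook's output), the parameter count of each layer in the Step 4 cell can be worked out by hand. This sketch assumes Keras's standard parameterization: Conv1D and Dense have one weight per input-output pair plus one bias per output; LSTM has four gates, each with input weights, recurrent weights, and a bias; BatchNormalization stores four vectors per channel, two of which (moving mean and variance) Keras reports as non-trainable. Dropout and MaxPooling1D have no parameters.

```python
def conv1d_params(kernel, c_in, filters):
    return kernel * c_in * filters + filters       # weights + biases

def lstm_params(c_in, units):
    return 4 * ((c_in + units) * units + units)    # 4 gates: input, recurrent, bias

def dense_params(c_in, units):
    return c_in * units + units

def bn_params(channels):
    return 4 * channels                            # gamma, beta, moving mean, moving variance

def pool_len(length, pool=2):
    return (length - pool) // pool + 1             # MaxPooling1D default: padding='valid', strides=pool

layers = [
    ("Conv1D(64, 5)", conv1d_params(5, 4, 64)),
    ("BatchNorm(64)", bn_params(64)),
    ("Conv1D(32, 3)", conv1d_params(3, 64, 32)),
    ("LSTM(128)", lstm_params(32, 128)),
    ("BatchNorm(128)", bn_params(128)),
    ("LSTM(64)", lstm_params(128, 64)),
    ("Dense(256)", dense_params(64, 256)),
    ("BatchNorm(256)", bn_params(256)),
    ("Dense(6)", dense_params(256, 6)),
]
for name, n in layers:
    print(f"{name:15s} {n:>7,d}")
total = sum(n for _, n in layers)
print("sequence length after pooling:", pool_len(561))  # 280
print("total parameters:", total)                       # 159334
```

Because both Conv1D layers use `padding='same'`, all 561 steps survive until pooling, so the LSTMs read 280 steps of 32 channels. The LSTM parameter count does not depend on sequence length, only on input channels and units.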

In [13]:
# Step 4: Model Definition
from tensorflow.keras import Input

# Define a deeper CNN-LSTM model with batch normalization
model = Sequential()
model.add(Input(shape=(561, 4)))  # 561 steps x 4 channels (values, mean, std, FFT energy)
# CNN layers for local pattern extraction along the sequence
model.add(Conv1D(64, kernel_size=5, activation='relu', padding='same'))
model.add(BatchNormalization())
model.add(Conv1D(32, kernel_size=3, activation='relu', padding='same'))
model.add(MaxPooling1D(pool_size=2))  # 561 -> 280 steps
model.add(Dropout(0.3))

# LSTM layers for sequential modeling
model.add(LSTM(128, return_sequences=True))
model.add(BatchNormalization())
model.add(LSTM(64, return_sequences=False))
model.add(Dropout(0.3))

# Dense layers for classification
model.add(Dense(256, activation='relu'))
model.add(BatchNormalization())
model.add(Dropout(0.3))
model.add(Dense(6, activation='softmax'))

# Compile the model
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
model.summary()



# **Step 5:** Model Training
This step trains the CNN-LSTM model using 5-fold cross-validation to estimate how reliably it performs.

*   The training set is split into five folds; each fold serves once as the validation set while the other four are used for training.
*   The model is trained five times, once per fold.
*   After each fold, the validation accuracy is recorded.
*   The cross-validation accuracy is reported as the mean and standard deviation across the five folds.

Once cross-validation is complete, the model is trained once more on the training set, holding out the last 20% of it for validation (`validation_split=0.2`).
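A cross-validation loop should build a fresh, untrained model for each fold. If one model object is reused, later folds start from weights that were already trained on their own validation samples, which inflates the scores. The sketch below shows the pattern independently of Keras: `cross_validate` takes a factory `build_fn`, and a tiny nearest-centroid classifier plays the model. Both names are hypothetical stand-ins for illustration, and the data are two synthetic Gaussian blobs.

```python
import numpy as np

class NearestCentroid:
    """Hypothetical stand-in for the CNN-LSTM: a tiny nearest-centroid classifier."""
    def fit(self, X, y):
        self.classes_ = np.unique(y)
        self.centroids_ = np.stack([X[y == c].mean(axis=0) for c in self.classes_])
        return self

    def score(self, X, y):
        dists = ((X[:, None, :] - self.centroids_[None, :, :]) ** 2).sum(axis=-1)
        preds = self.classes_[np.argmin(dists, axis=1)]
        return float(np.mean(preds == y))

def cross_validate(build_fn, X, y, n_splits=5, seed=42):
    """Shuffled k-fold CV that calls build_fn() once per fold, so each fold starts from scratch."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(X)), n_splits)
    scores = []
    for k in range(n_splits):
        val_idx = folds[k]
        train_idx = np.concatenate([folds[j] for j in range(n_splits) if j != k])
        model = build_fn()  # fresh, untrained model for this fold
        model.fit(X[train_idx], y[train_idx])
        scores.append(model.score(X[val_idx], y[val_idx]))
    return np.array(scores)

# Synthetic data: two well-separated Gaussian blobs
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (50, 2)), rng.normal(8, 1, (50, 2))])
y = np.array([0] * 50 + [1] * 50)
scores = cross_validate(NearestCentroid, X, y)
print(f"{scores.mean():.4f} ± {scores.std():.4f}")
```

With Keras, the factory could return `tf.keras.models.clone_model(model)` after compiling it (`clone_model` builds the same architecture with newly initialized weights), and the fold body would call `fit` and `evaluate` instead of `score`.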



In [14]:
# Step 5: Model Training
# Perform 5-fold cross-validation
kf = KFold(n_splits=5, shuffle=True, random_state=42)
fold_scores = []
for fold, (train_idx, val_idx) in enumerate(kf.split(X_train_scaled)):
    print(f"Training Fold {fold+1}/5")
    X_fold_train, X_fold_val = X_train_scaled[train_idx], X_train_scaled[val_idx]
    y_fold_train, y_fold_val = y_train[train_idx], y_train[val_idx]
    # Re-initialized copy of the architecture: reusing `model` across folds would let
    # later folds start from weights already trained on their validation samples
    fold_model = tf.keras.models.clone_model(model)
    fold_model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
    fold_model.fit(X_fold_train, y_fold_train, epochs=20, batch_size=32,
                   validation_data=(X_fold_val, y_fold_val), verbose=1)
    fold_scores.append(fold_model.evaluate(X_fold_val, y_fold_val, verbose=0)[1])

print(f"Cross-Validation Accuracy: {np.mean(fold_scores):.4f} ± {np.std(fold_scores):.4f}")

# Final training run on the training set, holding out the last 20% for validation
history = model.fit(X_train_scaled, y_train, epochs=20, batch_size=32, validation_split=0.2, verbose=1)

Training Fold 1/5
Epoch 1/20
184/184 ━━━━━━━━━━━━━━━━━━━━ 134s 675ms/step - accuracy: 0.4809 - loss: 1.2905 - val_accuracy: 0.5153 - val_loss: 1.0299
Epoch 2/20
184/184 ━━━━━━━━━━━━━━━━━━━━ 142s 680ms/step - accuracy: 0.6576 - loss: 0.7570 - val_accuracy: 0.6771 - val_loss: 0.6178
Epoch 3/20
184/184 ━━━━━━━━━━━━━━━━━━━━ 142s 680ms/step - accuracy: 0.7391 - loss: 0.6180 - val_accuracy: 0.7886 - val_loss: 0.5087
Epoch 4/20
184/184 ━━━━━━━━━━━━━━━━━━━━ 141s 677ms/step - accuracy: 0.7757 - loss: 0.5311 - val_accuracy: 0.7451 - val_loss: 0.5654
Epoch 5/20
184/184 ━━━━━━━━━━━━━━━━━━━━ 142s 676ms/step - accuracy: 0.8001 - loss: 0.4806 - val_accuracy: 0.8600 - val_loss: 0.3639
Epoch 6/20
184/184 ━━━━━━━━━━━━━━━━━━━━ 141s 671ms/step - accuracy: 0.8463 - loss: 0.3979 - val_accuracy: 0.8742 - val

# **Step 6:** Training Visualization
This step visualizes the final training run over its 20 epochs. The `history` object holds only that run, not the cross-validation folds.

*   The first plot shows training and validation accuracy, to track how the model improves.
*   The second plot shows training and validation loss; a widening gap between the two curves can indicate overfitting.


In [15]:
# Step 6: Training Visualization
# Plot training and validation accuracy
fig = go.Figure()
fig.add_trace(go.Scatter(y=history.history['accuracy'], name='Train Accuracy'))
fig.add_trace(go.Scatter(y=history.history['val_accuracy'], name='Val Accuracy'))
fig.update_layout(title="Model Accuracy Over Epochs", xaxis_title="Epoch", yaxis_title="Accuracy")
fig.show()

# Plot training and validation loss
fig = go.Figure()
fig.add_trace(go.Scatter(y=history.history['loss'], name='Train Loss'))
fig.add_trace(go.Scatter(y=history.history['val_loss'], name='Val Loss'))
fig.update_layout(title="Model Loss Over Epochs", xaxis_title="Epoch", yaxis_title="Loss")
fig.show()
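The plots can be complemented with a small numeric summary: the epoch with the highest validation accuracy and the train/validation accuracy gap at that epoch. `summarize_history` is a hypothetical helper that reads a dictionary shaped like Keras's `history.history` (keys `accuracy` and `val_accuracy` when the model is compiled with `metrics=['accuracy']`); the values below are made up for illustration and are not results from this notebook.

```python
def summarize_history(history_dict):
    """Best validation epoch (1-based), its val accuracy, and the train/val gap there."""
    val_acc = history_dict["val_accuracy"]
    best = max(range(len(val_acc)), key=lambda i: val_acc[i])  # first epoch with the max
    gap = history_dict["accuracy"][best] - val_acc[best]
    return best + 1, val_acc[best], gap

# Made-up values shaped like history.history
demo = {"accuracy": [0.70, 0.85, 0.92, 0.96],
        "val_accuracy": [0.72, 0.86, 0.88, 0.87]}
epoch, best_val, gap = summarize_history(demo)
print(epoch, best_val, round(gap, 2))  # 3 0.88 0.04
```

In the notebook this would be called as `summarize_history(history.history)`. Keras's `tf.keras.callbacks.EarlyStopping(monitor='val_accuracy', restore_best_weights=True)` acts on the same signal during training instead of after it.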

# **Step 7:** Model Evaluation
This step evaluates the trained model's performance on the test set.

*   Predictions are made on the test set using the trained model.
*   A classification report shows precision, recall, and F1-score for each activity.
*   Two confusion matrices visualize correct and incorrect predictions:
    1.   The raw confusion matrix shows actual counts.
    2.   The normalized confusion matrix divides each row by the number of samples in that true class, so each row sums to 1 and the diagonal equals per-class recall.


In [20]:
# Step 7: Model Evaluation
# Make predictions on test set
y_pred = model.predict(X_test_scaled)
y_pred_classes = np.argmax(y_pred, axis=1)
y_test_classes = np.argmax(y_test, axis=1)

# Print classification report
print("Classification Report:")
print(classification_report(y_test_classes, y_pred_classes, target_names=list(activity_labels.values())))

# Confusion Matrix (Raw Counts)
cm = confusion_matrix(y_test_classes, y_pred_classes)
blue_scale = ['#ADD8E6', '#00008B']
fig = px.imshow(cm, text_auto=True, labels=dict(x="Predicted", y="True", color="Count"),
                x=list(activity_labels.values()), y=list(activity_labels.values()),
                title="Confusion Matrix (Raw Counts)",
                color_continuous_scale=blue_scale)
fig.show()

# Normalized Confusion Matrix (Proportions)
cm_normalized = cm.astype('float') / cm.sum(axis=1)[:, np.newaxis]  # Normalize by true class counts
fig = px.imshow(cm_normalized, text_auto='.2f', labels=dict(x="Predicted", y="True", color="Proportion"),
                x=list(activity_labels.values()), y=list(activity_labels.values()),
                title="Normalized Confusion Matrix (Proportions)",
                color_continuous_scale=blue_scale)
fig.show()

93/93 ━━━━━━━━━━━━━━━━━━━━ 18s 191ms/step
Classification Report:
                    precision    recall  f1-score   support

           Walking       0.95      0.97      0.96       496
  Walking Upstairs       0.91      0.93      0.92       471
Walking Downstairs       0.93      0.89      0.91       420
           Sitting       0.93      0.79      0.86       491
          Standing       0.84      0.94      0.89       532
            Laying       0.99      1.00      1.00       537

          accuracy                           0.92      2947
         macro avg       0.93      0.92      0.92      2947
      weighted avg       0.93      0.92      0.92      2947
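The two views are linked: after row normalization, the diagonal of the confusion matrix is each class's recall (for example, the 0.79 recall for Sitting above), while precision comes from normalizing by column instead. The sketch below shows this on a made-up 3-class matrix, not on this notebook's results.

```python
import numpy as np

cm = np.array([[8, 2, 0],
               [1, 9, 0],
               [0, 3, 7]])  # made-up counts: rows = true class, columns = predicted class

cm_normalized = cm.astype(float) / cm.sum(axis=1)[:, np.newaxis]  # same row normalization as above
recall = np.diag(cm_normalized)            # TP / (TP + FN) per true class
precision = np.diag(cm) / cm.sum(axis=0)   # TP / (TP + FP); undefined if a class is never predicted
print(recall)  # [0.8 0.9 0.7]
print(precision.round(3))
```

scikit-learn's `confusion_matrix` can also produce the row-normalized matrix directly with `normalize='true'`.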



# **Step 8:** Prediction Section
This step tests the model on a random input sample and visualizes the results.

*   A function is created to predict the activity of a given input sample.
*   The function extracts features, normalizes the input with the fitted scaler, and predicts the activity class.
*   A random test sample is selected, and its true activity is compared with the predicted activity.
*   The prediction probabilities for each activity class are displayed in a line chart.


In [30]:
# Step 8: Prediction Section
# Function to predict activity on a custom input sample
def predict_activity(sample, model, scaler, timesteps=561):
    # Expect one raw feature vector of length `timesteps`
    if sample.shape != (timesteps,):
        raise ValueError(f"Sample must have shape ({timesteps},), got {sample.shape}")

    # Enhance features
    sample_enhanced = extract_features(sample[np.newaxis, :], timesteps)
    # Normalize
    sample_scaled = scaler.transform(sample_enhanced.reshape(-1, n_features)).reshape(sample_enhanced.shape)
    # Predict
    pred = model.predict(sample_scaled)
    pred_class = np.argmax(pred, axis=1)[0]
    return activity_labels[pred_class], pred[0]

# Test prediction on a random test sample
random_idx = np.random.randint(0, X_test.shape[0])
test_sample = X_test[random_idx]  # Shape: (561,)
predicted_activity, pred_probs = predict_activity(test_sample, model, scaler)
true_activity = activity_labels[np.argmax(y_test[random_idx])]

# 1.0 if this single prediction is correct, else 0.0 (not an accuracy estimate)
prediction_accuracy = 1.0 if predicted_activity == true_activity else 0.0

print("\n=== Prediction Test ===")
print(f"True Activity: {true_activity}")
print(f"Predicted Activity: {predicted_activity}")
print(f"Prediction Accuracy: {prediction_accuracy:.0%}")
print(f"Prediction Probabilities: {dict(zip(activity_labels.values(), pred_probs.round(4)))}")

# Visualization: Prediction Probabilities
fig = go.Figure()
fig.add_trace(go.Scatter(x=list(activity_labels.values()), y=pred_probs, mode='lines+markers',
                         name='Probability', line=dict(color='#636EFA')))
fig.update_layout(title=f"Prediction Probabilities (True: {true_activity}, Accuracy: {prediction_accuracy:.0%})",
                  xaxis_title="Activity", yaxis_title="Probability")
fig.show()

[1m1/1[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 147ms/step

=== Prediction Test ===
True Activity: Walking
Predicted Activity: Walking
Prediction Accuracy: 100%
Prediction Probabilities: {'Walking': np.float32(0.9974), 'Walking Upstairs': np.float32(0.0005), 'Walking Downstairs': np.float32(0.002), 'Sitting': np.float32(0.0), 'Standing': np.float32(0.0), 'Laying': np.float32(0.0)}
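When the model is less certain than in this example, listing the few most probable activities is often more informative than the single argmax. `top_k_activities` is a hypothetical helper for that; the probability vector below is made up, not model output. Converting each value with `float()` also avoids the `np.float32(...)` reprs seen in the printout above.

```python
import numpy as np

def top_k_activities(probs, labels, k=3):
    """Return the k most probable (label, probability) pairs, highest first."""
    order = np.argsort(probs)[::-1][:k]
    return [(labels[i], float(probs[i])) for i in order]

labels = {0: 'Walking', 1: 'Walking Upstairs', 2: 'Walking Downstairs',
          3: 'Sitting', 4: 'Standing', 5: 'Laying'}
probs = np.array([0.05, 0.02, 0.03, 0.55, 0.33, 0.02])  # made-up example, not model output
print(top_k_activities(probs, labels, k=2))  # [('Sitting', 0.55), ('Standing', 0.33)]
```

In the notebook, `top_k_activities(pred_probs, activity_labels)` would rank the output of `predict_activity`.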


# **Brief Summary of the Project**

In [33]:
# Summary report
print("\n=== Enhanced Human Activity Recognition Report ===")
print("Dataset: UCI HAR Dataset")
print("Model: Deeper CNN-LSTM with Batch Normalization")
print("Custom Features: Time-domain (mean, std), Frequency-domain (FFT energy)")
print(f"Test Set Accuracy: {model.evaluate(X_test_scaled, y_test, verbose=0)[1]:.4f}")
print("Innovations:")
print("- Multi-layer CNN and LSTM with batch normalization for feature extraction.")
print("- Enhanced feature set (mean, std, FFT energy channels) intended to improve generalization.")
print("- Visualizations, including raw and normalized confusion matrices, for per-class error analysis.")
print("The model combines a deep CNN-LSTM architecture, custom features, and per-class evaluation.")


=== Enhanced Human Activity Recognition Report ===
Dataset: UCI HAR Dataset
Model: Deeper CNN-LSTM with Batch Normalization
Custom Features: Time-domain (mean, std), Frequency-domain (FFT energy)
Test Set Accuracy: 0.9233
Innovations:
- Multi-layer CNN and LSTM with batch normalization for feature extraction.
- Enhanced feature set (mean, std, FFT energy channels) intended to improve generalization.
- Visualizations, including raw and normalized confusion matrices, for per-class error analysis.
The model combines a deep CNN-LSTM architecture, custom features, and per-class evaluation.
