# Human Activity Recognition - Machine Learning Model
This notebook implements a machine learning pipeline for classifying human activities based on sensor data features. We use a Random Forest classifier to distinguish between four activities: lying, sitting, standing, and walking.

### Overview

The pipeline consists of:

1. Loading preprocessed feature data

2. Encoding categorical labels

3. Splitting data into training and testing sets

4. Training a Random Forest classifier

5. Evaluating model performance

6. Training on the full dataset for deployment

7. Saving the trained model for deployment

## Import Libraries

In [7]:
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report, accuracy_score

## Load Data and Prepare for Training

In [8]:

# Load feature dataset
features_df = pd.read_csv("C:\\Users\\Dell\\Desktop\\AI Motion Sensor App\\Data\\full_data\\features.csv")

# Split into X (features) and y (labels)
X = features_df.drop(columns=["activity"])
y = features_df["activity"]

# Encode activity labels into numbers
le = LabelEncoder()
y_encoded = le.fit_transform(y)

# Train-test split
X_train, X_test, y_train, y_test = train_test_split(
    X, y_encoded, test_size=0.2, random_state=42, stratify=y_encoded
)

print("Train shape:", X_train.shape, "Test shape:", X_test.shape)


Train shape: (2677, 30) Test shape: (670, 30)
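
The encoding and stratified split above can be checked with a small sketch like the one below. It does not read `features.csv`; the `labels` and `features` arrays are hypothetical stand-ins with made-up class counts, used only to show how `LabelEncoder` assigns integer codes and how `stratify` keeps class proportions the same in both splits.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder

# Hypothetical stand-in for the "activity" column (counts are illustrative only)
labels = pd.Series(
    ["lying"] * 40 + ["sitting"] * 40 + ["standing"] * 40 + ["walking"] * 30
)
# Placeholder feature matrix with one column; the real data has 30 features
features = np.arange(len(labels)).reshape(-1, 1)

encoder = LabelEncoder()
encoded = encoder.fit_transform(labels)

# LabelEncoder assigns integer codes in sorted order of the class names
mapping = {name: code for code, name in enumerate(encoder.classes_)}
print("Label mapping:", mapping)

# Same split settings as the notebook cell above
_, _, y_tr, y_te = train_test_split(
    features, encoded, test_size=0.2, random_state=42, stratify=encoded
)

# Fraction of each class in the training and test splits
train_share = np.bincount(y_tr) / len(y_tr)
test_share = np.bincount(y_te) / len(y_te)
print("Train class shares:", train_share.round(3))
print("Test class shares: ", test_share.round(3))
```

With `stratify` set, the two rows of class shares should match closely, so the slightly smaller `walking` class is not under-represented in the test set by chance.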


## Train Random Forest Classifier

In [9]:

# Train RandomForest
rf = RandomForestClassifier(n_estimators=200, random_state=42)
rf.fit(X_train, y_train)

# Predictions
y_pred = rf.predict(X_test)

# Evaluation
print("Accuracy:", accuracy_score(y_test, y_pred))
print(classification_report(y_test, y_pred, target_names=le.classes_))


Accuracy: 0.9970149253731343
              precision    recall  f1-score   support

       lying       1.00      1.00      1.00       178
     sitting       1.00      0.99      1.00       176
    standing       1.00      0.99      1.00       178
     walking       0.99      1.00      0.99       138

    accuracy                           1.00       670
   macro avg       1.00      1.00      1.00       670
weighted avg       1.00      1.00      1.00       670
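
The classification report summarizes errors per class but does not show which activities are confused with each other. A confusion matrix makes that visible. The sketch below is a minimal illustration on synthetic data from `make_classification` (an assumption, standing in for the real feature file); in the notebook the same `confusion_matrix` call could be applied to `y_test` and `y_pred`.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split

# Synthetic 4-class data with 30 features (stand-in for the real dataset)
X_demo, y_demo = make_classification(
    n_samples=400, n_features=30, n_informative=10, n_classes=4, random_state=42
)
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42, stratify=y_demo
)

clf = RandomForestClassifier(n_estimators=50, random_state=42)
clf.fit(X_tr, y_tr)
demo_pred = clf.predict(X_te)

# Rows are true classes, columns are predicted classes
cm = confusion_matrix(y_te, demo_pred, labels=[0, 1, 2, 3])
demo_accuracy = accuracy_score(y_te, demo_pred)

print(cm)
# The diagonal holds correct predictions, so trace / total equals accuracy
print("Accuracy from matrix:", cm.trace() / cm.sum())
```

Off-diagonal cells count the misclassifications, so a non-zero entry in row `sitting`, column `walking` would mean a sitting sample was predicted as walking.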



## Train Model on Full Dataset

In [10]:
# Train a new model on the entire dataset for deployment
# This ensures the model learns from all available data

print("Training final model on entire dataset...")
print("Full dataset shape:", X.shape)

# Initialize a new Random Forest classifier
final_model = RandomForestClassifier(n_estimators=200, random_state=42)

# Train on the complete dataset
final_model.fit(X, y_encoded)

# Sanity check: accuracy on the same data the model was trained on
# (optimistic by construction; not an estimate of performance on new data)
full_predictions = final_model.predict(X)
full_accuracy = accuracy_score(y_encoded, full_predictions)

print(f"Final model trained on full dataset with {X.shape[0]} samples")
print(f"Training accuracy on full dataset: {full_accuracy:.4f}")

# Show the class distribution to confirm all four activities are represented
print("\nClass distribution in full dataset:")
print(y.value_counts())

Training final model on entire dataset...
Full dataset shape: (3347, 30)
Final model trained on full dataset with 3347 samples
Training accuracy on full dataset: 1.0000

Class distribution in full dataset:
activity
lying       889
standing    889
sitting     880
walking     689
Name: count, dtype: int64
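
Because the final model is scored on its own training data, the 1.0000 above says little about unseen data. One common way to get a less optimistic estimate while still using every sample is stratified k-fold cross-validation. The sketch below is an illustration on synthetic data (an assumption, not the notebook's features); in the notebook, `X` and `y_encoded` could be passed in place of `X_demo` and `y_demo`.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score

# Synthetic 4-class data with 30 features (stand-in for the real dataset)
X_demo, y_demo = make_classification(
    n_samples=400, n_features=30, n_informative=10, n_classes=4, random_state=42
)

# Five folds, each preserving the class proportions of the full dataset
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
model = RandomForestClassifier(n_estimators=50, random_state=42)

# Each score is the accuracy on a fold the model did not see during fitting
scores = cross_val_score(model, X_demo, y_demo, cv=cv, scoring="accuracy")

print("Fold accuracies:", scores.round(3))
print(f"Mean accuracy: {scores.mean():.3f} (std {scores.std():.3f})")
```

Every sample is used for testing exactly once, so the mean fold accuracy reflects held-out performance rather than memorization of the training set.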


## Save Trained Model for Deployment

In [11]:
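import os
import joblib

# Create the output directory if it does not exist yet
os.makedirs("models", exist_ok=True)

joblib.dump(final_model, os.path.join("models", "activity_final_model.pkl"))
joblib.dump(le, os.path.join("models", "label_encoder.pkl"))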

print("Final model and label encoder saved successfully!")

Final model and label encoder saved successfully!
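
For deployment, the two saved files need to be loaded together: the model returns integer codes, and the label encoder turns them back into activity names. The sketch below shows that round trip under stated assumptions: it trains a small throwaway model on random data and writes to a temporary directory, so the file locations, the random features, and the `new_window` sample are all hypothetical.

```python
import tempfile
from pathlib import Path

import joblib
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.preprocessing import LabelEncoder

# Random stand-in data: 80 samples, 30 features, 4 activity labels
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(80, 30))
y_demo = np.repeat(["lying", "sitting", "standing", "walking"], 20)

encoder = LabelEncoder()
model = RandomForestClassifier(n_estimators=20, random_state=42)
model.fit(X_demo, encoder.fit_transform(y_demo))

with tempfile.TemporaryDirectory() as tmp:
    model_path = Path(tmp) / "activity_final_model.pkl"
    encoder_path = Path(tmp) / "label_encoder.pkl"

    # Save both artifacts, then load them back as a deployed app would
    joblib.dump(model, model_path)
    joblib.dump(encoder, encoder_path)
    loaded_model = joblib.load(model_path)
    loaded_encoder = joblib.load(encoder_path)

# One new feature vector with the same 30 columns used in training
new_window = rng.normal(size=(1, 30))
predicted_code = loaded_model.predict(new_window)
predicted_label = loaded_encoder.inverse_transform(predicted_code)[0]
print("Predicted activity:", predicted_label)
```

The notebook fits its model on a pandas DataFrame, so passing a DataFrame with the same column names and order at inference time avoids scikit-learn's feature-name warning.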


## Model Performance Summary

* Test Accuracy: 99.7% (668 of 670 held-out test samples classified correctly)

* Precision: 99-100% for all classes - Very few false positives

* Recall: 99-100% for all classes - Very few false negatives

* F1-score: 99-100% for all classes - Balanced performance across metrics

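* Full Dataset Training: The final model is retrained on all 3,347 samples for deployment. Its 100% accuracy is measured on its own training data, so it is not an estimate of performance on new data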

The model classifies all four activities with high accuracy on the held-out test set, which supports its use in the motion sensor application. The final model, retrained on the complete dataset, is the one saved for deployment. Because no held-out data remains for that model, the 99.7% test accuracy from the 80/20 split is the best available estimate of how it will perform on new, unseen data.