# AI-Powered Unsafe Activity Detector

## Project Overview
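This notebook trains a Random Forest classifier to detect unsafe activities in the UCI Human Activity Recognition (HAR) dataset. The six recorded activities are collapsed into two classes: the three walking activities are labeled unsafe, and the three static postures (sitting, standing, laying) are labeled safe.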

### Objectives
- Load and preprocess the UCI HAR Dataset
- Relabel the six activities into a binary safe/unsafe target and train a classifier on it
- Evaluate model performance with a focus on detecting the unsafe class (recall and F1-score)


In [None]:
# Import Required Libraries
import sys

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import confusion_matrix, classification_report, ConfusionMatrixDisplay

# Detect whether the notebook is running in Google Colab
in_colab = 'google.colab' in sys.modules

# Download and extract the UCI HAR Dataset when running in Colab.
# The `!` lines are IPython shell escapes and only work inside a notebook.
if in_colab:
    !wget -nc https://archive.ics.uci.edu/ml/machine-learning-databases/00240/UCI%20HAR%20Dataset.zip
    !unzip -q -o "UCI HAR Dataset.zip"
    data_path = 'UCI HAR Dataset/'
else:
    # Local path to the extracted dataset folder (containing train/, test/
    # and activity_labels.txt); it must end with '/'.
    data_path = './data/'
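The `!wget`/`!unzip` lines are IPython shell escapes, so the local branch expects the data to have been downloaded by hand. A minimal standard-library sketch of doing the same from plain Python follows. `ensure_har_data` and `HAR_URL` are hypothetical names, the URL is copied from the cell above and may have moved since, and the sketch assumes the zip's top-level folder is `UCI HAR Dataset`, as the Colab branch does.

```python
import os
import urllib.request
import zipfile

# Hypothetical constant: URL copied from the Colab cell; the archive may have moved.
HAR_URL = ("https://archive.ics.uci.edu/ml/machine-learning-databases/"
           "00240/UCI%20HAR%20Dataset.zip")


def ensure_har_data(dest_dir=".", url=HAR_URL):
    """Download and unzip the UCI HAR Dataset into dest_dir if it is not there yet.

    Assumes the zip's top-level folder is 'UCI HAR Dataset'. Returns that
    folder's path with a trailing '/', the form load_har_data expects.
    """
    dataset_dir = os.path.join(dest_dir, "UCI HAR Dataset")
    if not os.path.isdir(dataset_dir):
        zip_path = os.path.join(dest_dir, "UCI HAR Dataset.zip")
        if not os.path.exists(zip_path):  # skip the download if the zip is cached
            urllib.request.urlretrieve(url, zip_path)
        with zipfile.ZipFile(zip_path) as zf:
            zf.extractall(dest_dir)
    return dataset_dir + "/"


# Usage (downloads on the first call only):
# data_path = ensure_har_data("./data")
```

Checking for the extracted folder first makes the call cheap to repeat, which matters in a notebook that is re-run from the top.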


In [None]:
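# Load Training and Testing Data
def load_har_data(data_path):
    """Load the pre-computed feature vectors, activity IDs and activity names.

    data_path must point at the extracted 'UCI HAR Dataset' folder and end with '/'.
    """
    # Feature matrices: one row per window, whitespace-separated values
    X_train = np.loadtxt(data_path + 'train/X_train.txt')
    X_test = np.loadtxt(data_path + 'test/X_test.txt')

    # Activity IDs (1-6), loaded as integers rather than floats
    y_train = np.loadtxt(data_path + 'train/y_train.txt', dtype=int)
    y_test = np.loadtxt(data_path + 'test/y_test.txt', dtype=int)

    # Activity names; each line of activity_labels.txt is "<id> <NAME>"
    with open(data_path + 'activity_labels.txt', 'r') as f:
        activity_labels = [line.split()[1] for line in f if line.strip()]

    return X_train, X_test, y_train, y_test, activity_labels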

# Load data
X_train, X_test, y_train, y_test, activity_labels = load_har_data(data_path)

# Print dataset shapes
print("Training Features Shape:", X_train.shape)
print("Testing Features Shape:", X_test.shape)
print("Activity Labels:", activity_labels)
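`y_train` and `y_test` hold 1-based activity IDs, while `activity_labels` is a 0-based Python list read from `activity_labels.txt`. The hypothetical helper `ids_to_names` sketches the off-by-one translation; the label list below is written out by hand to mirror the file's order and is an illustration, not read from disk.

```python
import numpy as np


def ids_to_names(y, activity_labels):
    """Translate 1-based activity IDs (int or float) into activity names."""
    idx = np.asarray(y).astype(int) - 1  # ID 1 -> list index 0
    return np.asarray(activity_labels)[idx]


# Hand-written copy of the activity_labels.txt order (illustrative)
labels = ["WALKING", "WALKING_UPSTAIRS", "WALKING_DOWNSTAIRS",
          "SITTING", "STANDING", "LAYING"]

print(ids_to_names(np.array([1.0, 4.0, 6.0]), labels))
# ['WALKING' 'SITTING' 'LAYING']
```

On the real data, `pd.Series(ids_to_names(y_train, activity_labels)).value_counts()` would give per-activity sample counts before the labels are collapsed to two classes.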


In [None]:
# Safety Re-Labeling
def relabel_safety(y_original):
    """Map the six UCI HAR activity IDs to a binary safety label."""
    # Mapping (IDs as listed in activity_labels.txt):
    # Safe (0):   SITTING (4), STANDING (5), LAYING (6)
    # Unsafe (1): WALKING (1), WALKING_UPSTAIRS (2), WALKING_DOWNSTAIRS (3)
    unsafe_activities = [1, 2, 3]

    # Any ID not in the unsafe list (i.e. 4, 5, 6) becomes 0 (safe)
    return np.where(np.isin(y_original, unsafe_activities), 1, 0)

# Relabel training and testing data
y_train_safety = relabel_safety(y_train)
y_test_safety = relabel_safety(y_test)

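# Visualize Safety Class Distribution
plt.figure(figsize=(8, 5))
# sort_index() keeps the bars in class order (0, 1) rather than frequency order
safety_counts_train = pd.Series(y_train_safety).value_counts().sort_index()
safety_counts_train.plot(kind='bar', rot=0)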
plt.title('Safety Class Distribution in Training Data')
plt.xlabel('Safety Class (0: Safe, 1: Unsafe)')
plt.ylabel('Number of Samples')
plt.tight_layout()
plt.show()

# Print class distribution
print("Training Data Safety Class Distribution:")
print(safety_counts_train)
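
Before training, it is worth checking the relabeling on a few known IDs. The sketch below restates the same rule (IDs 1 to 3 map to 1, everything else to 0) under the hypothetical name `to_safety_label` so it runs on its own, and shows that float IDs, which `np.loadtxt` returns by default, are matched correctly by `np.isin`.

```python
import numpy as np

# Same rule as relabel_safety: walking IDs 1-3 -> 1 (unsafe), IDs 4-6 -> 0 (safe)
UNSAFE_IDS = [1, 2, 3]


def to_safety_label(y_original):
    return np.where(np.isin(y_original, UNSAFE_IDS), 1, 0)


# One example of each activity ID, as floats (np.loadtxt's default dtype)
y_demo = np.array([1.0, 4.0, 2.0, 6.0, 3.0, 5.0])
print(to_safety_label(y_demo))  # [1 0 1 0 1 0]
```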


In [None]:
# Model Training and Evaluation
# Initialize and Train Random Forest Classifier
rf_classifier = RandomForestClassifier(
    n_estimators=100,
    random_state=42,          # fixed seed for reproducible trees
    class_weight='balanced'   # reweight classes inversely to their training frequency
)
rf_classifier.fit(X_train, y_train_safety)

# Make Predictions
y_pred = rf_classifier.predict(X_test)

# Confusion Matrix
cm = confusion_matrix(y_test_safety, y_pred)
disp = ConfusionMatrixDisplay(confusion_matrix=cm, display_labels=['Safe', 'Unsafe'])
disp.plot(cmap='Blues', values_format='d')
plt.title('Confusion Matrix: Safety Activity Detection')
plt.tight_layout()
plt.show()

# Classification Report
print("Classification Report:")
print(classification_report(y_test_safety, y_pred, 
                            target_names=['Safe', 'Unsafe'], 
                            zero_division=0))
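
The `class_weight='balanced'` setting follows scikit-learn's documented formula `n_samples / (n_classes * np.bincount(y))`. The sketch below computes it by hand on made-up toy labels (6 safe, 4 unsafe; these are not the HAR counts) and checks the result against `sklearn.utils.class_weight.compute_class_weight`.

```python
import numpy as np
from sklearn.utils.class_weight import compute_class_weight

# Toy safety labels (made up): 6 safe (0), 4 unsafe (1)
y = np.array([0] * 6 + [1] * 4)

# 'balanced' rule: n_samples / (n_classes * count of each class)
manual = len(y) / (2 * np.bincount(y))  # safe: 10/12 ≈ 0.833, unsafe: 10/8 = 1.25

sklearn_weights = compute_class_weight("balanced", classes=np.array([0, 1]), y=y)
print(manual, sklearn_weights)
```

The rarer class receives a weight above 1 and the more common one a weight below 1. For `RandomForestClassifier`, `'balanced'` computes these weights once from the full training labels; the separate `'balanced_subsample'` option recomputes them on each tree's bootstrap sample.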


## Safety Performance Analysis

### Key Insights
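- **Unsafe class performance**: judge the model primarily by recall and F1-score for the 'Unsafe' class rather than by overall accuracy
- High unsafe recall means few false negatives (unsafe activity classified as safe), which is the costlier error in a safety-critical application
- `class_weight='balanced'` reweights each class inversely to its frequency in the training labels, so any imbalance between safe and unsafe samples does not bias the forest toward the majority class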
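
Recall for the unsafe class can be read directly off the binary confusion matrix as TP / (TP + FN). Beyond `class_weight`, a common way to trade precision for recall is to threshold the unsafe-class probability from `predict_proba` at a value below the one `predict` effectively uses, since `predict` simply returns the most probable class. The sketch below uses synthetic data from `make_classification` in place of the HAR features; `unsafe_recall`, `threshold_for_recall`, and the 0.99 target are hypothetical choices. The threshold is picked on a validation split carved from the training data, so the test set stays untouched.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split


def unsafe_recall(y_true, y_pred):
    """Recall of class 1 (unsafe), computed from the binary confusion matrix."""
    # With labels=[0, 1], ravel() returns tn, fp, fn, tp in that order
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred, labels=[0, 1]).ravel()
    return tp / (tp + fn) if (tp + fn) > 0 else 0.0


def threshold_for_recall(y_true, p_unsafe, target=0.99):
    """Highest threshold (on a 0.01 grid) whose unsafe recall meets the target."""
    # Recall can only rise as the threshold falls, so scan from 1.0 downward
    for t in np.linspace(1.0, 0.0, 101):
        if unsafe_recall(y_true, (p_unsafe >= t).astype(int)) >= target:
            return float(t)
    return 0.0


# Synthetic stand-in for the HAR features and safety labels
X, y = make_classification(n_samples=2000, n_features=20,
                           weights=[0.55, 0.45], random_state=0)
X_tr, X_val, y_tr, y_val = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0)

clf = RandomForestClassifier(n_estimators=100, class_weight="balanced",
                             random_state=0).fit(X_tr, y_tr)
p_val = clf.predict_proba(X_val)[:, 1]  # column 1 = class 1 because classes_ is [0, 1]

t = threshold_for_recall(y_val, p_val, target=0.99)
y_val_pred = (p_val >= t).astype(int)
print(f"threshold={t:.2f}  unsafe recall={unsafe_recall(y_val, y_val_pred):.3f}")
```

In this notebook the same functions would be applied to a validation split of `X_train`/`y_train_safety`, and the chosen threshold then applied to `rf_classifier.predict_proba(X_test)[:, 1]`. Lowering the threshold raises unsafe recall at the cost of more false alarms, so the target should reflect how costly each error type is.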

### Recommendations
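1. Collect more diverse examples of unsafe activity
2. Experiment with feature engineering or feature selection on the precomputed features
3. Compare against other ensemble methods (e.g., gradient boosting) and try resampling techniques such as oversampling the minority class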
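
Random oversampling is one of the simplest sampling techniques: minority-class rows are duplicated, drawn with replacement, until both classes are the same size. A minimal sketch using `sklearn.utils.resample` follows; `oversample_minority` is a hypothetical helper for the two-class case, and the data is random toy data. It should be applied only to the training split, never to the test set. The imbalanced-learn library offers ready-made alternatives such as `RandomOverSampler` and `SMOTE`.

```python
import numpy as np
from sklearn.utils import resample


def oversample_minority(X, y, random_state=0):
    """Duplicate random minority-class rows until both classes have equal counts.

    Assumes exactly two classes in y.
    """
    classes, counts = np.unique(y, return_counts=True)
    minority = classes[np.argmin(counts)]
    n_extra = int(counts.max() - counts.min())
    if n_extra == 0:
        return X, y  # already balanced
    X_extra = resample(X[y == minority], replace=True,
                       n_samples=n_extra, random_state=random_state)
    X_bal = np.vstack([X, X_extra])
    y_bal = np.concatenate([y, np.full(n_extra, minority)])
    return X_bal, y_bal


# Toy data: 7 safe (0) rows, 3 unsafe (1) rows
rng = np.random.default_rng(0)
X = rng.normal(size=(10, 3))
y = np.array([0] * 7 + [1] * 3)

X_bal, y_bal = oversample_minority(X, y)
print(np.bincount(y_bal))  # [7 7]
```

Unlike `class_weight`, oversampling changes the training set itself, so the two approaches are usually compared rather than stacked.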
