# Exercise 3: Classification

## Objective
Predict human activity from sensor features using Logistic Regression. Learn how to train a classifier, evaluate its performance, visualize confusion matrices, and test model robustness.

## Step 1: Import Libraries
We import the necessary Python libraries for preprocessing, training, evaluation, and visualization.

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, confusion_matrix

## Step 2: Load Data
Load training and test features and labels, and map activity IDs to readable names.

In [None]:
# Load activity labels
activity_labels = pd.read_csv('dataset/activity_labels.txt', sep=r'\s+', header=None, index_col=0)

# Load training and test features
X_train = pd.read_csv('dataset/X_train.txt', sep=r'\s+', header=None)
X_test = pd.read_csv('dataset/X_test.txt', sep=r'\s+', header=None)

# Load training and test labels
y_train = pd.read_csv('dataset/y_train.txt', header=None)
y_test = pd.read_csv('dataset/y_test.txt', header=None)

# Map activity IDs to names
y_train_mapped = y_train[0].map(activity_labels[1])
y_test_mapped = y_test[0].map(activity_labels[1])

print("Data loaded. Classes:", activity_labels[1].unique())
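
The ID-to-name mapping above relies on `Series.map`, which returns `NaN` for any ID that has no entry in the label table. The sketch below illustrates this on tiny in-memory stand-ins for the two files; the file contents and the `_toy` names are made up for illustration and are not part of the dataset.

```python
import io
import pandas as pd

# Tiny stand-ins for activity_labels.txt and y_train.txt (contents made up for illustration).
labels_file = io.StringIO("1 WALKING\n2 SITTING\n3 LAYING\n")
y_file = io.StringIO("1\n3\n2\n4\n")

activity_labels_toy = pd.read_csv(labels_file, sep=r"\s+", header=None, index_col=0)
y_toy = pd.read_csv(y_file, header=None)

# Series.map looks each ID up in the label Series; IDs with no match become NaN.
y_toy_mapped = y_toy[0].map(activity_labels_toy[1])
n_unmapped = int(y_toy_mapped.isna().sum())

print(y_toy_mapped.tolist())
print("Unmapped labels:", n_unmapped)
```

Running the same `isna().sum()` check on `y_train_mapped` and `y_test_mapped` is a cheap way to confirm that every label was mapped before training.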

## Step 3: Normalize Features
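Standardize each feature to zero mean and unit variance with StandardScaler. The scaler is fitted on the training set only and then applied to the test set, so no test-set statistics leak into training. Standardization typically helps the Logistic Regression solver converge in fewer iterations.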

In [None]:
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)
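
To make the transformation concrete, here is a minimal NumPy sketch of the arithmetic that standardization performs. The toy arrays are invented for illustration; the point is that the mean and standard deviation come from the training rows only and are reused unchanged for the test rows.

```python
import numpy as np

# Toy data (made up for illustration): 4 training rows, 2 test rows, 2 features.
X_train_toy = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0], [4.0, 40.0]])
X_test_toy = np.array([[2.5, 25.0], [5.0, 50.0]])

# Statistics come from the training set only.
mean = X_train_toy.mean(axis=0)
std = X_train_toy.std(axis=0)  # population std (ddof=0), the same convention StandardScaler uses

X_train_z = (X_train_toy - mean) / std
X_test_z = (X_test_toy - mean) / std  # reuse the training mean/std; never refit on test data

print(X_train_z.mean(axis=0))  # approximately 0 per feature
print(X_train_z.std(axis=0))   # approximately 1 per feature
print(X_test_z)
```

Note that the scaled test set generally does not have exactly zero mean or unit variance, because it is measured against the training statistics.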

## Step 4: Train Logistic Regression Model
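We train a multi-class Logistic Regression classifier. Because `y_train_mapped` is already a one-dimensional Series, it can be passed to `fit` directly. Flattening (for example `y_train.values.ravel()`) is only needed when the target is a single-column DataFrame such as `y_train`, which would trigger a DataConversionWarning.

In [None]:
# With the default lbfgs solver, multi-class targets are fitted with a multinomial
# (softmax) loss, so the deprecated multi_class argument is omitted.
clf_model = LogisticRegression(max_iter=1000, random_state=42)

# y_train_mapped is a 1D Series, so no reshaping is required.
clf_model.fit(X_train_scaled, y_train_mapped)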
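
For intuition about what a multinomial model computes at prediction time, the following sketch turns per-class linear scores into probabilities with a softmax. The weights `W`, biases `b`, and input `x` are hypothetical values chosen for illustration, not parameters of the trained model; in the fitted classifier the analogous arrays are `clf_model.coef_` and `clf_model.intercept_`.

```python
import numpy as np

def softmax(scores):
    """Convert each row of class scores into probabilities that sum to 1."""
    shifted = scores - scores.max(axis=1, keepdims=True)  # stabilise the exponentials
    exp_scores = np.exp(shifted)
    return exp_scores / exp_scores.sum(axis=1, keepdims=True)

# Hypothetical weights for 3 classes and 2 features (not taken from the trained model).
W = np.array([[1.0, -1.0], [0.0, 0.5], [-1.0, 1.0]])
b = np.array([0.0, 0.1, -0.1])
x = np.array([[2.0, 0.5]])

scores = x @ W.T + b          # one linear score per class
probs = softmax(scores)       # class probabilities
predicted_class = int(probs.argmax(axis=1)[0])

print(probs, predicted_class)
```

The predicted class is simply the one with the highest probability, which is also the one with the highest linear score.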

## Step 5: Make Predictions and Evaluate
Predict activity on the test set and compute overall accuracy.

In [None]:
y_pred = clf_model.predict(X_test_scaled)
accuracy = accuracy_score(y_test_mapped, y_pred)
print(f'Accuracy on test set: {accuracy:.4f}')
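
An accuracy figure is easier to interpret next to a trivial baseline. The sketch below, using made-up labels, computes accuracy by hand and compares it with the score obtained by always predicting the most frequent class; the `_toy` names are illustrative only.

```python
from collections import Counter

# Made-up labels for illustration only.
y_true_toy = ["WALKING", "SITTING", "STANDING", "SITTING", "LAYING", "SITTING"]
y_pred_toy = ["WALKING", "STANDING", "STANDING", "SITTING", "LAYING", "SITTING"]

# Accuracy: fraction of samples whose prediction matches the true label.
accuracy_toy = sum(t == p for t, p in zip(y_true_toy, y_pred_toy)) / len(y_true_toy)

# Baseline: always predict the most frequent true class.
majority_class, majority_count = Counter(y_true_toy).most_common(1)[0]
baseline_toy = majority_count / len(y_true_toy)

print(f"accuracy={accuracy_toy:.3f}, majority baseline={baseline_toy:.3f} ({majority_class})")
```

The same comparison can be made in the notebook by applying `Counter` to `y_test_mapped`.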

## Step 6: Confusion Matrix
Visualize which activities are most often confused by the classifier.

In [None]:
labels = activity_labels[1].tolist()
cm = confusion_matrix(y_test_mapped, y_pred, labels=labels)

plt.figure(figsize=(10, 8))
sns.heatmap(cm, annot=True, fmt='d', xticklabels=labels, yticklabels=labels, cmap='Blues')
plt.xlabel('Predicted Activity')
plt.ylabel('True Activity')
plt.title('Confusion Matrix of Human Activity Prediction')
plt.xticks(rotation=45)
plt.show()
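
Besides reading the heatmap by eye, the confusion matrix can be queried directly. This sketch uses a hypothetical 3-class matrix (the counts are invented) to compute per-class recall and to locate the largest off-diagonal cell, that is, the most frequent misclassification.

```python
import numpy as np

# Hypothetical 3-class confusion matrix (rows = true, columns = predicted).
labels_toy = ["SITTING", "STANDING", "LAYING"]
cm_toy = np.array([[40, 9, 1],
                   [6, 44, 0],
                   [0, 0, 50]])

# Per-class recall: correct predictions divided by the row total.
recall = cm_toy.diagonal() / cm_toy.sum(axis=1)

# Zero the diagonal so only misclassifications remain, then find the largest cell.
errors = cm_toy.copy()
np.fill_diagonal(errors, 0)
true_idx, pred_idx = np.unravel_index(errors.argmax(), errors.shape)

for name, r in zip(labels_toy, recall):
    print(f"{name}: recall={r:.2f}")
print(f"Most frequent error: {labels_toy[true_idx]} predicted as {labels_toy[pred_idx]} "
      f"({errors[true_idx, pred_idx]} samples)")
```

Substituting `cm` and `labels` from the cell above applies the same analysis to the real results.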

## Step 7: Test Robustness with Noise
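Add Gaussian noise (standard deviation 0.1, in standardized units) to the test features and compare the accuracy with the clean test set to check how stable the model is.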

In [None]:
# Add Gaussian noise to the standardized test set (seeded so the result is reproducible)
rng = np.random.default_rng(42)
noise_factor = 0.1
X_test_noisy = X_test_scaled + rng.normal(0, noise_factor, X_test_scaled.shape)

y_pred_noisy = clf_model.predict(X_test_noisy)
accuracy_noisy = accuracy_score(y_test_mapped, y_pred_noisy)

print(f'Accuracy with noise (factor={noise_factor}): {accuracy_noisy:.4f}')
print(f'Drop in accuracy: {(accuracy - accuracy_noisy):.4f}')
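
A single noise level gives only one data point. One possible extension is to sweep several noise levels and record the accuracy at each. The helper `accuracy_under_noise` and the toy threshold classifier below are hypothetical names introduced for this sketch; any function that maps a feature array to predicted labels can be passed as `predict_fn`.

```python
import numpy as np

def accuracy_under_noise(predict_fn, X, y, noise_levels, seed=0):
    """Return {noise_level: accuracy} for Gaussian noise added to X."""
    rng = np.random.default_rng(seed)  # seeded so the sweep is repeatable
    results = {}
    for sigma in noise_levels:
        X_noisy = X + rng.normal(0.0, sigma, size=X.shape)
        results[sigma] = float(np.mean(predict_fn(X_noisy) == y))
    return results

# Toy stand-in for a trained model: class 1 if the first feature is positive.
def toy_predict(X):
    return (X[:, 0] > 0).astype(int)

X_toy = np.array([[-2.0], [-1.0], [1.0], [2.0]])
y_toy = np.array([0, 0, 1, 1])

results = accuracy_under_noise(toy_predict, X_toy, y_toy, noise_levels=[0.0, 0.1, 0.5, 1.0])
print(results)
```

In the notebook, a call such as `accuracy_under_noise(clf_model.predict, X_test_scaled, y_test_mapped.to_numpy(), [0.0, 0.1, 0.5, 1.0])` would apply the same sweep to the trained classifier.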

## Step 8: Optional Mapping to Robot Actions
If integrating with robotics, predicted activities can trigger robot actions:
- WALKING → Move forward
- WALKING_UPSTAIRS → Ascend slope
- WALKING_DOWNSTAIRS → Descend slope
- SITTING / STANDING → Idle / Stop
- LAYING → Sleep mode
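
The mapping listed above could be expressed as a simple lookup table. The action strings and the `action_for` helper are hypothetical placeholders; a real robot interface would define its own command names.

```python
# Hypothetical action names; replace them with whatever commands your robot accepts.
ACTIVITY_TO_ACTION = {
    "WALKING": "move_forward",
    "WALKING_UPSTAIRS": "ascend_slope",
    "WALKING_DOWNSTAIRS": "descend_slope",
    "SITTING": "idle",
    "STANDING": "idle",
    "LAYING": "sleep_mode",
}

def action_for(activity, default="idle"):
    """Look up the robot action for a predicted activity, falling back to a safe default."""
    return ACTIVITY_TO_ACTION.get(activity, default)

predictions = ["WALKING", "SITTING", "LAYING", "UNKNOWN"]
actions = [action_for(p) for p in predictions]
print(actions)
```

Falling back to an idle action for unrecognized labels is a design choice made here for safety; other defaults may suit other robots.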

## Step 9: Reflection Questions
1. Which activities are most often confused? (Hint: Look at SITTING vs STANDING in the matrix)
2. How does scaling affect the classifier?
3. How can you improve model robustness and accuracy?