# Exercise 1: Sensor Data Exploration

**Objective:** Understand the structure of the Human Activity Recognition (HAR) dataset, visualize features, detect inconsistencies, and normalize the data.

Students will learn how to load real sensor data, explore feature distributions, and prepare it for ML.

In [ ]:
# Load libraries
import pandas as pd
import matplotlib.pyplot as plt

# Load feature and activity info (whitespace-separated files, no header row).
# sep=r'\s+' replaces delim_whitespace=True, which is deprecated since pandas 2.2.
features = pd.read_csv('dataset/features.txt', sep=r'\s+', header=None)
activity_labels = pd.read_csv('dataset/activity_labels.txt', sep=r'\s+', header=None, index_col=0)

# Load training and test data (one row per window, one column per feature)
X_train = pd.read_csv('dataset/X_train.txt', sep=r'\s+', header=None)
X_test = pd.read_csv('dataset/X_test.txt', sep=r'\s+', header=None)

# Load the activity IDs (a single integer column)
y_train = pd.read_csv('dataset/y_train.txt', header=None)
y_test = pd.read_csv('dataset/y_test.txt', header=None)

# Assign feature names
X_train.columns = features[1]
X_test.columns = features[1]

# Map activity labels
y_train_mapped = y_train[0].map(activity_labels[1])
y_test_mapped = y_test[0].map(activity_labels[1])
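
Assigning column names from `features.txt` assumes every name is unique. If a name is repeated, `X_train['name']` returns several columns instead of one. The following is a minimal sketch of a check-and-fix. The helper `make_unique` and the demo names are hypothetical and not taken from the dataset files.

```python
from collections import Counter

def make_unique(names):
    """Append a running suffix to every name that occurs more than once.

    Hypothetical helper: it does not guard against a suffixed name
    colliding with a name that already exists in the list.
    """
    totals = Counter(names)   # how often each name occurs overall
    seen = Counter()          # how often each name has been seen so far
    unique = []
    for name in names:
        if totals[name] > 1:
            seen[name] += 1
            unique.append(f"{name}_{seen[name]}")
        else:
            unique.append(name)
    return unique

# Made-up names for illustration only
demo_names = ["accX-mean()", "accY-mean()", "energy()-1,8", "energy()-1,8"]
demo_unique = make_unique(demo_names)
print(demo_unique)
```

With the notebook's variables, the same idea would read `X_train.columns = make_unique(features[1].tolist())`, applied identically to `X_test`.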

## Inspect Dataset
- Check dimensions and basic stats
- Look for class balance

In [ ]:
print('X_train shape:', X_train.shape)
print('y_train shape:', y_train.shape)

print('\nActivity distribution:')
print(y_train_mapped.value_counts())
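
The objective also mentions detecting inconsistencies. One possible set of checks is sketched below on a small made-up frame. The helper name `check_consistency` is hypothetical, and the `[-1, 1]` bounds are an assumption about the expected feature range, so adjust them to the data at hand.

```python
import numpy as np
import pandas as pd

def check_consistency(X, y, lower=-1.0, upper=1.0):
    """Count common data problems; lower/upper are an assumed valid range."""
    return {
        "row_mismatch": len(X) != len(y),                    # features and labels differ in length
        "missing_values": int(X.isna().sum().sum()),         # NaN cells in the features
        "duplicate_columns": int(X.columns.duplicated().sum()),
        "duplicate_rows": int(X.duplicated().sum()),         # exact repeats of an earlier row
        "out_of_range": int(((X < lower) | (X > upper)).sum().sum()),
        "unmapped_labels": int(y.isna().sum()),              # IDs with no activity name
    }

# Made-up data: one NaN, one value above 1, one repeated row, one missing label
demo_X = pd.DataFrame({"f1": [0.1, np.nan, 0.3, 0.1],
                       "f2": [-0.5, 0.2, 1.7, -0.5]})
demo_y = pd.Series(["WALKING", "SITTING", None, "WALKING"])

report = check_consistency(demo_X, demo_y)
print(report)

# Class balance as proportions rather than raw counts
demo_share = demo_y.value_counts(normalize=True)
print(demo_share)
```

In the notebook this would be called as `check_consistency(X_train, y_train_mapped)`. Proportions from `value_counts(normalize=True)` are easier to compare between the training and test sets than raw counts.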

## Visualize Some Features
- Histogram of one feature to see distribution

In [ ]:
# Example: visualize mean body acceleration magnitude
plt.hist(X_train['tBodyAccMag-mean()'], bins=50)
plt.title('Distribution of tBodyAccMag-mean()')
plt.xlabel('Value')
plt.ylabel('Frequency')
plt.show()
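
A single histogram hides how the feature differs between activities. The sketch below overlays one histogram per activity. The helper `plot_feature_by_activity` is hypothetical, and the data is synthetic, generated only to make the example self-contained.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

def plot_feature_by_activity(values, labels, ax, bins=30):
    """Overlay one semi-transparent histogram per activity on the given axes."""
    for activity, group in values.groupby(labels):
        ax.hist(group, bins=bins, alpha=0.5, label=str(activity))
    ax.set_xlabel(values.name)
    ax.set_ylabel("Frequency")
    ax.legend(title="Activity")
    return ax

# Synthetic feature: a narrow cluster and a wider one (not real HAR values)
rng = np.random.default_rng(0)
demo_values = pd.Series(
    np.concatenate([rng.normal(-0.8, 0.05, 100), rng.normal(-0.2, 0.2, 100)]),
    name="demo-feature",
)
demo_labels = pd.Series(["SITTING"] * 100 + ["WALKING"] * 100)

fig, ax = plt.subplots()
plot_feature_by_activity(demo_values, demo_labels, ax)
```

With the notebook's variables the call would be `plot_feature_by_activity(X_train['tBodyAccMag-mean()'], y_train_mapped, ax)`. In a notebook the figure renders when the cell finishes. In a script, follow it with `plt.show()`.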

## Feature Normalization
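Scaling features matters for both regression and classification because many ML models are sensitive to feature magnitude. Distance-based methods such as k-nearest neighbours and SVMs, and models trained by gradient descent, let large-valued features dominate, whereas tree-based models are largely unaffected. The scaler is fitted on the training set only and then reused on the test set, so that no test-set statistics leak into training.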

In [ ]:
from sklearn.preprocessing import StandardScaler

scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)
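
To confirm that scaling behaved as intended, it can help to inspect the result. The sketch below uses synthetic data and hypothetical `demo_*` names. It wraps the scaled arrays back into DataFrames, because `fit_transform` returns a plain NumPy array here and the column names would otherwise be lost.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Synthetic stand-ins for X_train and X_test (not real HAR values)
rng = np.random.default_rng(0)
demo_train = pd.DataFrame(rng.normal(5.0, 2.0, size=(200, 3)), columns=["f1", "f2", "f3"])
demo_test = pd.DataFrame(rng.normal(5.0, 2.0, size=(50, 3)), columns=demo_train.columns)

demo_scaler = StandardScaler()

# Fit on the training data only, then apply the same transform to the test data
demo_train_scaled = pd.DataFrame(
    demo_scaler.fit_transform(demo_train),
    columns=demo_train.columns, index=demo_train.index,
)
demo_test_scaled = pd.DataFrame(
    demo_scaler.transform(demo_test),
    columns=demo_test.columns, index=demo_test.index,
)

# Training columns should have mean ~0 and population std ~1;
# test columns will be close to, but not exactly, those values.
train_means = demo_train_scaled.mean()
train_stds = demo_train_scaled.std(ddof=0)  # StandardScaler uses the population std
print(train_means.round(6))
print(train_stds.round(6))
print(demo_test_scaled.mean().round(3))
```

The test-set means deviate slightly from zero because the scaler's parameters come from the training set. That is expected and is not a sign of an error.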

## Reflection Questions
1. Why is it important to keep the target (`y_train`) out of the feature matrix before scaling?
2. Why scale the test set using the same scaler as the training set?
3. How does class balance affect model training?