# K-Nearest Neighbours – Practice Exercise: Human Activity Recognition

## Overview

This notebook challenges you to build a **multi-class classifier** that can identify different physical activities from smartphone sensor data.

### Your Mission
Create a k-NN model that can distinguish between different physical activities (cycling, walking, jogging, etc.) using only smartphone sensor readings.

**Goal Question:** Can we accurately identify what physical activity a person is doing based solely on their smartphone's motion sensors?

## About the Dataset

**Data Source:** Smartphone sensor data collected during various physical activities

This dataset contains real smartphone sensor readings from people performing different physical activities with their phone in their pocket. It's a great example of **time-series classification** and **Internet of Things (IoT)** applications.

### Dataset Details:
- **Multiple activity types:** Cycling, Walking, Jogging, Swimming, Tennis, etc.
- **Sensor readings:** 12 different sensor measurements per timestamp
- **Real-world data:** Collected from actual smartphone sensors
- **Multi-class problem:** More complex than binary classification!
- **Applications:** Fitness tracking, health monitoring, activity recommendation

### Available Activity Files:
- `Cycling.csv` - Bicycle riding data  
- `Walking.csv` - Normal walking data
- `Jogging.csv` - Jogging/running data
- `Swimming.csv` - Swimming activity data
- `Tennis.csv` - Tennis playing data
- `Football.csv` - Football/soccer data
- `JumpRope.csv` - Jump rope exercise data
- `Pushups.csv` - Push-up exercise data
- `Sitting.csv` - Sitting/stationary data
- Plus more activities...

### Sensor Features (12 total):

| Sensor Type | Features | Description |
|-------------|----------|-------------|
| **Accelerometer** | X, Y, Z (m/s²) | Measures acceleration forces |
| **Gravity** | X, Y, Z (m/s²) | Gravitational component of acceleration |
| **Linear Acceleration** | X, Y, Z (m/s²) | Acceleration minus gravity |
| **Gyroscope** | X, Y, Z (rad/s) | Measures rotation rates |

**Additional columns:** Timestamp and datetime information

## What Makes This Challenging?

1. **Multi-class classification:** More than 2 categories to predict
2. **Time-series data:** Sensor readings change over time
3. **Similar activities:** Some activities might have similar sensor patterns
4. **Real-world noise:** Smartphone sensors can be noisy
5. **Feature engineering:** You might need to create new features from raw sensor data

## What You'll Learn

Through this exercise, you will:
- Combine multiple CSV files into a single dataset
- Handle time-series sensor data
- Visualize multi-dimensional sensor patterns
- Build a multi-class k-NN classifier
- Analyze confusion matrices for multiple classes
- Understand which activities are easily confused

## Instructions

🔍 **Reference Material:** Look at `Glass Classification.ipynb` for coding examples, but note this problem is more complex!

💡 **Key Reminders:**
- This is **multi-class classification** (more than 2 classes)
- You'll need to **combine multiple CSV files** 
- **Feature scaling** is still crucial for k-NN
- **Confusion matrix** will be larger (activities × activities)
- Consider creating **additional features** from raw sensor data

**Ready for the challenge? Let's build an activity recognition system**

## Step 1: Import Required Libraries

You'll need Python libraries for handling multiple files, sensor data analysis, and multi-class classification.

### Essential Imports:
- **File handling:** os, glob (to work with multiple CSV files)
- **Data manipulation:** pandas, numpy
- **Visualisation:** matplotlib, seaborn
- **Machine learning:** scikit-learn modules
- **k-NN and evaluation:** KNeighborsClassifier, confusion_matrix, classification_report

### Understanding Smartphone Sensors:

**Accelerometer:** Measures acceleration forces in 3 dimensions
- Detects device orientation and movement
- Key for detecting walking, running, cycling patterns

**Gyroscope:** Measures rotation rates around 3 axes  
- Detects spinning and rotational movements
- Useful for activities like tennis, football

**Gravity vs Linear Acceleration:** 
- Gravity: The constant downward force (helps determine orientation)
- Linear Acceleration: Total measured acceleration minus gravity (the motion caused by the user)
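The relationship between the three acceleration-type sensors can be checked numerically. The following is a minimal sketch, assuming made-up readings for a single timestamp; the values are illustrative and are not taken from the dataset.

```python
import numpy as np

# Hypothetical readings for one timestamp (m/s^2); not from the dataset.
gravity = np.array([0.0, 6.94, 6.94])      # orientation-dependent share of g
linear_acc = np.array([1.2, -0.5, 0.3])    # motion caused by the user

# The accelerometer reports the sum of both components.
accelerometer = gravity + linear_acc

# Subtracting gravity recovers the motion-only signal.
recovered_linear = accelerometer - gravity

# The gravity vector keeps a magnitude near 9.81 m/s^2 however the phone is tilted.
gravity_magnitude = np.linalg.norm(gravity)
print(recovered_linear, round(gravity_magnitude, 2))
```

Because gravity mostly encodes how the phone is oriented, it can help separate postures such as sitting, while linear acceleration captures the movement itself.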

In [None]:
# TODO: Import all required libraries here
# Refer to Glass Classification.ipynb for the exact imports needed

# File handling for multiple CSV files
# Data manipulation and analysis
# Visualisation
# Machine learning (KNeighborsClassifier, StandardScaler, etc.)
# Evaluation metrics

## Step 2: Load and Combine Multiple Activity Files

Your biggest challenge: combine multiple CSV files into a single dataset with activity labels!

### Strategy:
1. **Find all CSV files** in the Data/ folder (except processed ones)
2. **Load each file** and add an 'activity' column
3. **Combine all dataframes** into one master dataset
4. **Clean the data** and standardize column names

### Activity Label Mapping:
You'll need to create labels for each activity:
- Cycling → 0
- Walking → 1
- Jogging → 2
- Swimming → 3
- Tennis → 4
- Football → 5
- etc.

### Key Challenges:
- Different files might have different numbers of samples
- Sensor data might be noisy or have outliers
- Some files might have slightly different column formats
- You may need to balance the number of samples per activity so that large files do not dominate

In [None]:
# TODO: Load and combine multiple activity CSV files
# Find all CSV files in Data/ folder (exclude processed files)
# Load each file and add activity labels
# Combine all dataframes into one master dataset
# Clean column names and display basic info
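One possible way to approach the loading step is sketched below. `load_activities` is a hypothetical helper, not part of the reference notebook. It assumes that processed files contain "processed" in their file name and that the activity name is the file name without its extension. The integer labels follow alphabetical order, so they will differ from the example mapping above unless you supply your own dictionary.

```python
from pathlib import Path

import pandas as pd


def load_activities(data_dir="Data"):
    """Read every CSV in data_dir and tag each row with its activity name."""
    frames = []
    for csv_path in sorted(Path(data_dir).glob("*.csv")):
        # Assumption: processed files carry "processed" in their name.
        if "processed" in csv_path.stem.lower():
            continue
        frame = pd.read_csv(csv_path)
        # Normalise headers: strip whitespace, lower-case, spaces -> underscores.
        frame.columns = (
            frame.columns.str.strip().str.lower().str.replace(" ", "_", regex=False)
        )
        frame["activity"] = csv_path.stem  # e.g. "Walking"
        frames.append(frame)
    if not frames:
        raise FileNotFoundError(f"No activity CSV files found in {data_dir!r}")
    combined = pd.concat(frames, ignore_index=True)
    # Integer codes follow alphabetical order of the activity names.
    combined["label"] = combined["activity"].astype("category").cat.codes
    return combined
```

A call such as `df = load_activities("Data")` followed by `df["activity"].value_counts()` shows how many samples each file contributed.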

## Step 3: Visualise Sensor Data Patterns

Understanding sensor patterns is crucial for activity recognition! Create visualisations to see how different activities create different sensor signatures.

### Recommended Visualisations:

1. **Activity Distribution:** Bar chart showing sample counts per activity
2. **Sensor Time Series:** Line plots showing how sensors change during different activities
3. **Sensor Magnitude:** Calculate and plot the magnitude of acceleration vectors
4. **Activity Comparison:** Box plots comparing sensor values across activities  
5. **Correlation Analysis:** Heatmap of sensor correlations

### Key Questions to Explore:
- Which sensors show the biggest differences between activities?
- Do similar activities (walking vs jogging) have similar sensor patterns?
- Can you see periodic patterns (like steps) in the time series?
- Which activities have the most/least sensor variation?

In [None]:
# TODO: Create sensor data visualisations
# Activity distribution bar chart
# Calculate sensor magnitudes (derived features)
# Box plots comparing sensor values across activities
# Time series plots for sample activities
# Correlation heatmap of sensor features
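The numbers behind most of these plots can be computed before any chart is drawn. This is a rough sketch on synthetic data; the column names `acc_x`, `acc_y`, `acc_z` and the three activities are assumptions standing in for the real combined dataset.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the combined dataset; column names are assumptions.
rng = np.random.default_rng(0)
n = 300
df = pd.DataFrame({
    "activity": np.repeat(["Sitting", "Walking", "Jogging"], n // 3),
    "acc_x": rng.normal(0, 1, n),
    "acc_y": rng.normal(0, 1, n),
    "acc_z": rng.normal(0, 1, n),
})
# Make the more vigorous activities noisier so the summary shows a difference.
scale = df["activity"].map({"Sitting": 0.1, "Walking": 1.0, "Jogging": 3.0})
df[["acc_x", "acc_y", "acc_z"]] = df[["acc_x", "acc_y", "acc_z"]].mul(scale, axis=0)

# 1. Sample counts per activity (the numbers behind the bar chart).
counts = df["activity"].value_counts()

# 2. Acceleration magnitude and its spread per activity (the numbers behind a box plot).
df["acc_mag"] = np.sqrt(df["acc_x"] ** 2 + df["acc_y"] ** 2 + df["acc_z"] ** 2)
summary = df.groupby("activity")["acc_mag"].describe()[["mean", "std", "min", "max"]]

# 3. Correlation matrix of the numeric sensor columns (input to a heatmap).
corr = df[["acc_x", "acc_y", "acc_z", "acc_mag"]].corr()

print(counts, summary.round(2), corr.round(2), sep="\n\n")
```

From here, `counts.plot(kind="bar")`, `sns.boxplot(data=df, x="activity", y="acc_mag")` and `sns.heatmap(corr, annot=True)` would turn the three tables into the corresponding charts.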

## Step 4: Feature Engineering and Data Preprocessing

Sensor data often benefits from feature engineering! You might want to create new features that capture important patterns.

### Feature Engineering Options:

1. **Magnitude Features:** √(x² + y² + z²) for each sensor type
2. **Statistical Features:** Mean, std, min, max over time windows
3. **Frequency Features:** Extract frequency domain features using FFT
4. **Ratio Features:** Ratios between different sensor magnitudes
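Magnitude and ratio features could be built along the following lines. `add_magnitude_features` is a hypothetical helper, and the `<prefix>_x`, `<prefix>_y`, `<prefix>_z` column naming is an assumption; adapt the prefixes to the headers in your files.

```python
import numpy as np
import pandas as pd


def add_magnitude_features(df, sensors=("acc", "gyro")):
    """Append a <sensor>_mag column for each sensor prefix.

    Assumes columns are named <prefix>_x, <prefix>_y, <prefix>_z (hypothetical names).
    """
    out = df.copy()
    for prefix in sensors:
        axes = out[[f"{prefix}_x", f"{prefix}_y", f"{prefix}_z"]].to_numpy()
        out[f"{prefix}_mag"] = np.linalg.norm(axes, axis=1)
    return out


sample = pd.DataFrame({
    "acc_x": [3.0, 0.0], "acc_y": [4.0, 0.0], "acc_z": [0.0, 2.0],
    "gyro_x": [1.0, 0.0], "gyro_y": [2.0, 0.0], "gyro_z": [2.0, 1.0],
})
features = add_magnitude_features(sample)

# Ratio feature: rotation relative to acceleration; epsilon avoids division by zero.
features["gyro_acc_ratio"] = features["gyro_mag"] / (features["acc_mag"] + 1e-9)
print(features[["acc_mag", "gyro_mag", "gyro_acc_ratio"]])
```

A magnitude does not change when the phone is rotated in the pocket, which is one reason it is often tried alongside the raw axis values.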

### Standard Preprocessing Steps:

1. **Select Features:** Choose which sensor readings and derived features to use
2. **Handle Missing Values:** Check for and handle any NaN values
3. **Train/Test Split:** Split data while maintaining activity balance (stratified split)
4. **Feature Scaling:** Standardize all features, fitting the scaler on the training set only (CRUCIAL for k-NN!)

### Time Series Considerations:

- **Option 1:** Use individual sensor readings as features (simpler)
- **Option 2:** Create sliding windows and extract features from each window (more advanced)
- **Option 3:** Sample data points to reduce dataset size while maintaining patterns

For this exercise, we'll start with Option 1 (individual readings) but you can experiment with others!
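If you want to try Option 2 later, a sliding-window extractor might look like the sketch below. `window_features` is a hypothetical helper, and the 50 Hz sampling rate and window sizes are assumptions; check the timestamps in your files for the real rate. The signal is synthetic.

```python
import numpy as np


def window_features(signal, window_size, step, sample_rate_hz):
    """Return one row of [mean, std, min, max, dominant_freq_hz] per window."""
    signal = np.asarray(signal, dtype=float)
    rows = []
    for start in range(0, len(signal) - window_size + 1, step):
        window = signal[start:start + window_size]
        # Remove the mean so the zero-frequency bin does not dominate the spectrum.
        spectrum = np.abs(np.fft.rfft(window - window.mean()))
        freqs = np.fft.rfftfreq(window_size, d=1.0 / sample_rate_hz)
        dominant = freqs[np.argmax(spectrum)]
        rows.append([window.mean(), window.std(), window.min(), window.max(), dominant])
    return np.array(rows)


# Synthetic 2 Hz "step" signal sampled at 50 Hz for 10 seconds (illustrative values).
rate = 50
t = np.arange(500) / rate
acc_mag = 9.81 + 2.0 * np.sin(2 * np.pi * 2.0 * t)

# 2-second windows with 50% overlap.
feats = window_features(acc_mag, window_size=100, step=50, sample_rate_hz=rate)
print(feats.shape)
print(feats[0].round(2))
```

Each window becomes one sample for k-NN, so the window must not span two different activities; computing windows per file before combining is one way to guarantee that.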

In [None]:
# TODO: Feature Engineering and Preprocessing Steps
# Create magnitude features from sensor readings
# Select features for k-NN (individual readings vs magnitude features)
# Prepare features (X) and target (y)
# Handle any missing values
# Optional: Sample data to reduce size if dataset is very large
# Train/test split with stratification
# Scale features for k-NN (fit the scaler on the training set only)
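The split-and-scale step can be sketched as follows, using synthetic data in place of the sensor features. The 80/20 split and `random_state=42` are arbitrary choices for illustration.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in: 3 activities, 4 features on very different scales.
rng = np.random.default_rng(42)
X = rng.normal(size=(300, 4)) * np.array([1.0, 10.0, 100.0, 0.1])
y = np.repeat([0, 1, 2], 100)

# Split first, stratifying so each activity keeps its share in both sets.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

# Fit the scaler on training data only, then apply the same transform to the test set.
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

print(np.bincount(y_train), np.bincount(y_test))
print(X_train_scaled.mean(axis=0).round(3), X_train_scaled.std(axis=0).round(3))
```

Without scaling, the third feature here would dominate every Euclidean distance simply because its numbers are larger, which is why k-NN is so sensitive to this step.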

In [None]:
# TODO: Hyperparameter Tuning for Multi-Class k-NN
# Test different k values using cross-validation
# Find best k value for multi-class classification
# Plot k vs accuracy
# Note optimal k value for final model
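A tuning loop could be sketched like this. The data comes from `make_classification` as a stand-in for the sensor features, and the range of odd k values from 1 to 21 is an arbitrary starting point rather than a recommendation.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic 4-class data standing in for the sensor features.
X, y = make_classification(
    n_samples=400, n_features=12, n_informative=6, n_classes=4, random_state=0
)

k_values = list(range(1, 22, 2))  # odd k values from 1 to 21
mean_scores = []
for k in k_values:
    # The pipeline re-fits the scaler inside every fold, which avoids leakage.
    model = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=k))
    scores = cross_val_score(model, X, y, cv=5, scoring="accuracy")
    mean_scores.append(scores.mean())

best_k = k_values[int(np.argmax(mean_scores))]
print(f"best k = {best_k}, CV accuracy = {max(mean_scores):.3f}")
```

Plotting `k_values` against `mean_scores` with `plt.plot` gives the k vs accuracy curve; run the search on the training portion only so the test set stays untouched.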

## Step 5: Train Final Model and Evaluate Multi-Class Performance

Multi-class classification evaluation is more complex than binary classification. You'll need to analyse performance for each activity class.

### Multi-Class Evaluation Metrics:

- **Overall Accuracy:** Percentage of correct predictions across all activities
- **Per-Class Precision/Recall:** How well does the model perform for each specific activity?
- **Macro Average:** Average metrics across all classes (treats each activity equally)  
- **Weighted Average:** Average metrics weighted by class frequency
- **Confusion Matrix:** Shows which activities are confused with each other
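The difference between macro and weighted averages shows up most clearly on an imbalanced toy example. The labels below are invented for illustration.

```python
from sklearn.metrics import accuracy_score, recall_score

# Hypothetical labels: 8 Walking samples, 2 Jogging samples (imbalanced on purpose).
y_true = ["Walking"] * 8 + ["Jogging"] * 2
y_pred = ["Walking"] * 8 + ["Walking", "Jogging"]

accuracy = accuracy_score(y_true, y_pred)
# Per-class recall, in the order given by `labels`.
per_class = recall_score(y_true, y_pred, labels=["Jogging", "Walking"], average=None)
macro = recall_score(y_true, y_pred, average="macro")        # plain mean of class recalls
weighted = recall_score(y_true, y_pred, average="weighted")  # mean weighted by class size

print(accuracy, per_class, macro, weighted)
```

Here half of the Jogging samples are missed, yet accuracy and the weighted average stay high because Walking dominates. The macro average drops, which makes it the more revealing number when rare activities matter.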

### Important Questions:
- Which activities are easiest/hardest to classify?
- Are similar activities (walking/jogging) often confused?
- Does the model have bias toward more frequent activities?
- How does performance compare to random guessing?

In [None]:
# TODO: Train final k-NN model and evaluate multi-class performance
# Train final model with best k
# Make predictions on test set
# Print detailed classification report with activity names
# Calculate overall accuracy
# Compare to random baseline accuracy
# Show per-class accuracy for each activity
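The final training and evaluation step might be sketched as below. The data is synthetic, `activity_names` is an illustrative subset, and `best_k = 5` is a placeholder for whatever value your tuning step produced. `DummyClassifier` with `strategy="uniform"` plays the role of random guessing.

```python
from sklearn.datasets import make_classification
from sklearn.dummy import DummyClassifier
from sklearn.metrics import accuracy_score, classification_report
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

activity_names = ["Cycling", "Walking", "Jogging", "Swimming"]  # illustrative subset

# Synthetic stand-in for the sensor features and integer activity labels.
X, y = make_classification(
    n_samples=600, n_features=12, n_informative=6, n_classes=4, random_state=1
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=1
)

best_k = 5  # placeholder: use the value found during tuning
knn = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=best_k))
knn.fit(X_train, y_train)
y_pred = knn.predict(X_test)

# Baseline that guesses labels uniformly at random (about 1 / n_classes accuracy).
baseline = DummyClassifier(strategy="uniform", random_state=1).fit(X_train, y_train)

knn_acc = accuracy_score(y_test, y_pred)
base_acc = accuracy_score(y_test, baseline.predict(X_test))
print(f"k-NN accuracy: {knn_acc:.3f} | random baseline: {base_acc:.3f}")
print(classification_report(y_test, y_pred, target_names=activity_names))
```

The recall column of the report is the per-class accuracy asked for above: the share of each activity's test samples that were labelled correctly.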

## Step 6: Multi-Class Confusion Matrix Analysis

The confusion matrix for a multi-class problem is much more informative than in the binary case. It shows exactly which activities are confused with each other.

### How to Read a Multi-Class Confusion Matrix:

- **Diagonal elements:** Correct predictions for each activity
- **Off-diagonal elements:** Confusion between different activities
- **Row sums:** Total actual samples for each activity  
- **Column sums:** Total predicted samples for each activity
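These four quantities are enough to derive per-activity recall and precision by hand. The matrix below is hypothetical and follows the scikit-learn convention of actual activities in rows and predicted activities in columns.

```python
import numpy as np

activities = ["Walking", "Jogging", "Sitting"]
# Hypothetical confusion matrix: rows = actual activity, columns = predicted activity.
cm = np.array([
    [40,  8,  2],   # actual Walking
    [10, 35,  5],   # actual Jogging
    [ 0,  0, 50],   # actual Sitting
])

correct = np.diag(cm)            # diagonal: correct predictions
actual_totals = cm.sum(axis=1)   # row sums: actual samples per activity
pred_totals = cm.sum(axis=0)     # column sums: predicted samples per activity

recall = correct / actual_totals     # share of each activity that was found
precision = correct / pred_totals    # share of each prediction that was right

for name, r, p in zip(activities, recall, precision):
    print(f"{name:8s} recall={r:.2f} precision={p:.2f}")
```

In this example Sitting has perfect recall but imperfect precision, because some Walking and Jogging samples land in its column.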

### Key Analysis Questions:

- **Which activities are never confused?** Look for activities with high diagonal values and low off-diagonal values
- **Which activities are most similar?** Activities that are frequently confused might have similar sensor patterns
- **Is there systematic bias?** Does the model favour predicting certain activities over others?
- **What are the most common errors?** Which activity pairs are most frequently confused?

This analysis can help you understand the physical similarities between activities and guide future feature engineering!

In [None]:
# TODO: Create and analyze the multi-class confusion matrix
# Compute confusion matrix for all activities
# Create large heatmap visualization with activity names
# Analyze confusion patterns between activities
# Find most confused activity pairs
# Calculate and display detailed per-activity metrics
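Finding the most confused pairs amounts to ranking the off-diagonal cells. The sketch below uses a handful of invented labels; in the exercise you would pass your own `y_test` and predictions instead.

```python
import numpy as np
from sklearn.metrics import confusion_matrix

labels = ["Cycling", "Jogging", "Walking"]
# Hypothetical true and predicted activities for a handful of test samples.
y_true = ["Walking"] * 5 + ["Jogging"] * 5 + ["Cycling"] * 5
y_pred = (["Walking"] * 3 + ["Jogging"] * 2      # 2 Walking samples predicted as Jogging
          + ["Jogging"] * 4 + ["Walking"] * 1    # 1 Jogging sample predicted as Walking
          + ["Cycling"] * 5)                     # Cycling always correct

cm = confusion_matrix(y_true, y_pred, labels=labels)

# Zero the diagonal so only the errors remain, then rank the off-diagonal cells.
errors = cm.copy()
np.fill_diagonal(errors, 0)
pairs = [
    (labels[i], labels[j], int(errors[i, j]))
    for i, j in zip(*np.nonzero(errors))
]
pairs.sort(key=lambda p: p[2], reverse=True)

for actual, predicted, count in pairs:
    print(f"{actual} predicted as {predicted}: {count}")
```

For the visual version, `sns.heatmap(cm, annot=True, fmt="d", xticklabels=labels, yticklabels=labels)` draws the same matrix with activity names on both axes.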

## Reflection and Advanced Challenges

### Questions to Consider:
1. **How did your multi-class k-NN perform?** Compare accuracy to random guessing and note which activities were hardest to classify.
2. **What patterns did you discover?** Which activities are most easily confused and why might that be?
3. **How did this compare to the other exercises?** Was multi-class harder than binary classification?
4. **What role did feature engineering play?** Did magnitude features help? What other features might work?

### Key Learnings from Activity Recognition:
- **Sensor fusion is powerful** - combining multiple sensor types can improve performance
- **Similar activities are harder to distinguish** - walking vs jogging might be challenging
- **Feature engineering matters** - raw sensor readings vs derived features
- **Class imbalance affects performance** - activities with fewer samples are often harder to classify
- **Real-world applications** - this is how fitness trackers and smartphones work!