# Report: Activity Recognition Using Sensor Data

## 1. Main Objective of the Analysis

The main objective of this analysis is to predict the type of activity performed by individuals based on sensor data. The focus is on leveraging various classification models to identify key factors influencing activity recognition. This analysis provides insights into the accuracy and reliability of different models, aiding in the development of efficient monitoring systems for health and activity tracking.

## 2. Description of the Data Set

The dataset consists of sensor readings collected from individuals performing various activities. The dataset includes the following columns:
- **Time**: Time in seconds starting from 0 rounded to the closest 0.025s
- **Frontal Accel**: Acceleration reading in G for the frontal axis
- **Vertical Accel**: Acceleration reading in G for the vertical axis
- **Lateral Accel**: Acceleration reading in G for the lateral axis
- **Antenna ID**: ID of the antenna reading sensor
- **RSSI**: Received signal strength indicator
- **Phase**: Phase of the signal
- **Frequency**: Frequency of the signal
- **Activity Label**: Label of activity (1: sit on bed, 2: sit on chair, 3: lying, 4: ambulating)

## 3. Data Exploration and Cleaning

Initial data exploration involved loading the dataset and checking for missing values and anomalies. The dataset was cleaned by removing unnecessary columns and handling missing values. Feature engineering consisted of two steps: calculating the overall acceleration from the frontal, vertical, and lateral acceleration components, and encoding categorical variables such as patient ID and gender, which are recorded in addition to the sensor columns listed in Section 2.
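
The report does not state how the overall acceleration was computed. A common choice, assumed here, is the Euclidean norm of the three axis readings. The sketch below uses a hypothetical helper `overall_acceleration` and made-up readings purely for illustration.

```python
import math


def overall_acceleration(frontal, vertical, lateral):
    """Magnitude (in g) of the three-axis acceleration vector.

    Assumption: 'overall acceleration' means the Euclidean norm.
    """
    return math.sqrt(frontal**2 + vertical**2 + lateral**2)


# Hypothetical readings in g: (frontal, vertical, lateral).
readings = [
    (0.0, 1.0, 0.0),
    (0.3, 0.4, 0.0),
    (1.0, 2.0, 2.0),
]

magnitudes = [overall_acceleration(f, v, l) for f, v, l in readings]
```

A magnitude feature does not depend on sensor orientation, which may be why it helps separate activities that differ mainly in how much the wearer moves.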

## 4. Training Classifier Models

Three different classifier models were trained:
1. **Random Forest Classifier**: This model was chosen for its robustness and ability to handle complex data structures.
2. **Logistic Regression**: Used as a baseline model for comparison.
3. **Support Vector Machine (SVM)**: Employed to explore its performance in high-dimensional spaces.

All models were trained using the same training and test splits (80% training, 20% testing). Cross-validation was performed to evaluate the generalizability of the models.
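
The setup described above can be sketched with scikit-learn as follows. This is an illustration only, not the original training code: the data is synthetic (`make_classification`), and the stratified split, the scaling step for Logistic Regression and the SVM, the five folds, and all hyperparameters are assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for the sensor features and the four activity labels.
X, y = make_classification(
    n_samples=600, n_features=8, n_informative=5,
    n_classes=4, random_state=0,
)

# One shared 80/20 split so every model sees the same train and test rows.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

models = {
    "random_forest": RandomForestClassifier(n_estimators=100, random_state=0),
    # Scaling is assumed here; it usually helps linear models and SVMs.
    "logistic_regression": make_pipeline(
        StandardScaler(), LogisticRegression(max_iter=1000)
    ),
    "svm": make_pipeline(StandardScaler(), SVC()),
}

results = {}
for name, model in models.items():
    # 5-fold cross-validation on the training portion only.
    cv_scores = cross_val_score(model, X_train, y_train, cv=5)
    model.fit(X_train, y_train)
    results[name] = {
        "test_accuracy": model.score(X_test, y_test),
        "cv_mean": cv_scores.mean(),
    }
```

Putting the scaler inside the pipeline means it is refit within each fold, so no statistics from the validation fold leak into training.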

### Random Forest Classifier Results:
- **Accuracy**: 99.29%
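- **Classification Report**:

| Class | Precision | Recall | F1-score | Support |
|---|---|---|---|---|
| 1 | 0.98 | 0.99 | 0.99 | 3313 |
| 2 | 0.98 | 0.99 | 0.98 | 950 |
| 3 | 1.00 | 1.00 | 1.00 | 10307 |
| 4 | 0.96 | 0.84 | 0.90 | 456 |
| Accuracy | | | 0.99 | 15026 |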
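
To make the report easier to read, the sketch below recomputes the F1-score (the harmonic mean of precision and recall) from the per-class values reported above, along with the unweighted (macro) average recall. The helper name `f1_from` is hypothetical. Because the inputs are already rounded to two decimals, a recomputed value can differ from the reported one in the last digit.

```python
def f1_from(precision, recall):
    """F1-score: harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Per-class (precision, recall) as reported for the Random Forest.
report = {
    1: (0.98, 0.99),
    2: (0.98, 0.99),
    3: (1.00, 1.00),
    4: (0.96, 0.84),
}

f1_by_class = {label: f1_from(p, r) for label, (p, r) in report.items()}

# Macro average: every class counts equally, regardless of support.
macro_recall = sum(r for _, r in report.values()) / len(report)
```

The macro recall is lower than the overall accuracy because class 4 (ambulating) has the weakest recall but only 456 of the 15026 test samples, so it barely affects accuracy.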


### Cross-Validation Scores:
- **Cross-validation scores**: [0.9541, 0.9615, 0.9029, 0.9330, 0.8943]
- **Mean cross-validation score**: 92.92%
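
The mean can be checked directly from the fold scores, and the spread across folds is worth reporting alongside it. A minimal sketch using only the standard library:

```python
from statistics import mean, stdev

# Fold scores as reported above.
cv_scores = [0.9541, 0.9615, 0.9029, 0.9330, 0.8943]

cv_mean = mean(cv_scores)                      # average accuracy across folds
cv_std = stdev(cv_scores)                      # sample standard deviation
cv_range = max(cv_scores) - min(cv_scores)     # best fold minus worst fold

print(f"mean={cv_mean:.4f} std={cv_std:.4f} range={cv_range:.4f}")
```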

### Feature Selection:
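Feature importances were computed with a Random Forest classifier, and a second model was trained on the most important features only. The reported selection was 'Time', 'Frontal Accel', 'Vertical Accel', 'Lateral Accel', 'Antenna ID', 'RSSI', and 'Activity Label'. 'Activity Label' is the prediction target rather than an input, so it presumably appears in this list only as the retained target column; if it were used as a predictor, the accuracy would be inflated by label leakage. This model achieved an accuracy of 99.15%, compared with 99.29% for the model trained on all features.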
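
One way to perform importance-based selection is scikit-learn's `SelectFromModel`. The sketch below is an assumption about the approach, not the original code: it uses synthetic data, borrows the column names from Section 2 only as labels, and picks a mean-importance threshold arbitrarily. The target column is kept out of the feature matrix.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectFromModel

# Input columns only; the target ('Activity Label') is deliberately excluded.
feature_names = np.array([
    "Time", "Frontal Accel", "Vertical Accel", "Lateral Accel",
    "Antenna ID", "RSSI", "Phase", "Frequency",
])

# Synthetic stand-in data: 8 features, 4 classes. The names above are
# labels only and have no relation to the generated columns.
X, y = make_classification(
    n_samples=600, n_features=len(feature_names), n_informative=5,
    n_classes=4, random_state=0,
)

forest = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)

# Keep features whose importance is at least the mean importance
# (an assumed threshold; the report does not state the one used).
selector = SelectFromModel(forest, threshold="mean", prefit=True)
selected = feature_names[selector.get_support()]
X_selected = selector.transform(X)
```

For an unbiased accuracy estimate, the importances should be computed on the training split only, and the reduced model then evaluated on the untouched test split.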

### Random Forest Classifier with Selected Features:
- **Accuracy**: 99.15%
- **Classification Report**:

| Class | Precision | Recall | F1-score | Support |
|---|---|---|---|---|
| 1 | 0.98 | 0.99 | 0.98 | 3313 |
| 2 | 0.97 | 0.98 | 0.98 | 950 |
| 3 | 1.00 | 1.00 | 1.00 | 10307 |
| 4 | 0.95 | 0.84 | 0.89 | 456 |
| Accuracy | | | 0.99 | 15026 |

- **Selected Features**: 'Time', 'Frontal Accel', 'Vertical Accel', 'Lateral Accel', 'Antenna ID', 'RSSI', 'Activity Label'

## 5. Model Selection

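The **Random Forest Classifier** is recommended because it achieved the highest accuracy of the three models: 99.29% on the held-out test set, with a mean cross-validation score of 92.92%. The gap between these two figures, together with the spread of the fold scores (0.8943 to 0.9615), indicates that performance varies across subsets of the data and should be examined before deployment. The model also provides feature importances, which help explain the key drivers of activity recognition.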

## 6. Key Findings and Insights

- **Main Drivers**: The most significant predictors of activity were found to be the acceleration readings across different axes and the RSSI values. The overall acceleration was a critical factor in distinguishing between different activities.
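- **Activity Patterns**: Lying down and sitting had distinct acceleration profiles, making them easier to classify. Lying (class 3) reached precision and recall of 1.00, and the two sitting classes reached F1-scores of 0.98 to 0.99. Ambulating (class 4) was the hardest activity to recognize, with a recall of 0.84 on only 456 test samples.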

Visualizations (not included here) such as feature importance plots and confusion matrices supported these findings, illustrating the effectiveness of the Random Forest model in differentiating between activities.
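
Since the confusion matrices are not reproduced here, the following sketch shows how one is built and how per-class recall falls out of it. The labels are hypothetical and chosen only to illustrate the calculation; they are not the study's predictions.

```python
from collections import Counter

# 1: sit on bed, 2: sit on chair, 3: lying, 4: ambulating
LABELS = [1, 2, 3, 4]


def confusion_matrix(y_true, y_pred, labels):
    """Rows are true labels, columns are predicted labels."""
    counts = Counter(zip(y_true, y_pred))
    return [[counts[(t, p)] for p in labels] for t in labels]


# Hypothetical true and predicted labels for ten samples.
y_true = [1, 1, 2, 3, 3, 3, 4, 4, 4, 4]
y_pred = [1, 1, 2, 3, 3, 3, 4, 4, 1, 3]

matrix = confusion_matrix(y_true, y_pred, LABELS)

# Recall per class: diagonal entry divided by the row total.
recall = {
    label: matrix[i][i] / sum(matrix[i]) for i, label in enumerate(LABELS)
}
```

Reading along a row shows which activities a given class is mistaken for, which is more informative than the recall figure alone.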

## 7. Suggestions for Next Steps

- **Model Refinement**: Further analysis can include fine-tuning the Random Forest model parameters and exploring other ensemble methods to enhance prediction accuracy.
- **Additional Data Features**: Incorporating more contextual data, such as environmental factors or physiological measurements, could improve model performance.
- **Real-time Implementation**: A real-time monitoring system could be built on the trained model to provide immediate activity recognition and feedback.
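
As an illustration of the model refinement step, a small grid search over Random Forest hyperparameters might look like the sketch below. The grid values, the three folds, and the synthetic data are all assumptions chosen to keep the example quick to run; a real search would use the sensor data and likely a wider grid.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

# Synthetic stand-in for the sensor features and four activity labels.
X, y = make_classification(
    n_samples=400, n_features=8, n_informative=5,
    n_classes=4, random_state=0,
)

# Hypothetical grid: 2 x 2 x 2 = 8 candidate settings.
param_grid = {
    "n_estimators": [50, 100],
    "max_depth": [None, 10],
    "min_samples_leaf": [1, 5],
}

search = GridSearchCV(
    RandomForestClassifier(random_state=0),
    param_grid,
    cv=3,                 # 3-fold cross-validation per candidate
    scoring="accuracy",
)
search.fit(X, y)

best_params = search.best_params_   # setting with the best mean CV accuracy
best_score = search.best_score_     # that mean CV accuracy
```

Given that ambulating is the minority class with the lowest recall, a class-sensitive metric such as `scoring="f1_macro"` may be a better tuning target than accuracy.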

By addressing these next steps, the analysis can be further refined, leading to more accurate and actionable insights for health and activity monitoring systems.
