# Human Activity Recognition - Theory Answers

## Task 1: Exploratory Data Analysis (EDA)

### Q1: Waveform plots for each activity class

In [None]:
# Insert your plotting code here


**Answer:**
- Static activities (laying, sitting, standing) show nearly flat and stable signals.  
- Dynamic activities (walking, upstairs, downstairs) show periodic fluctuations representing step cycles.  
- The waveforms differ visibly across classes, most clearly between the static and dynamic groups, so a model should be able to classify the activities.


### Q2: Static vs Dynamic differentiation using linear acceleration

In [None]:
# Insert calculation and plotting of acc = sqrt(acc_x^2 + acc_y^2 + acc_z^2)


**Answer:**
- Static activities → nearly constant acceleration magnitude.  
- Dynamic activities → higher variation and periodic patterns.  
- A threshold-based rule could separate static vs dynamic without ML.  
- However, to distinguish between *different* dynamic activities (walking vs upstairs vs downstairs), ML is needed.
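The threshold idea above can be sketched roughly as follows. The signals are synthetic stand-ins rather than UCI-HAR data, and the helper names and the 0.1 g threshold are assumptions chosen for illustration; a usable threshold would have to be tuned on the real windows.

```python
import numpy as np

def acc_magnitude(acc_x, acc_y, acc_z):
    """Per-sample magnitude sqrt(acc_x^2 + acc_y^2 + acc_z^2)."""
    return np.sqrt(acc_x**2 + acc_y**2 + acc_z**2)

def is_dynamic(acc_x, acc_y, acc_z, threshold=0.1):
    """Call a window dynamic if the std of its magnitude exceeds threshold.

    The threshold (in g) is a hypothetical value, not taken from the dataset.
    """
    return bool(np.std(acc_magnitude(acc_x, acc_y, acc_z)) > threshold)

# Synthetic stand-ins for one 10 s window sampled at 50 Hz (not real data).
rng = np.random.default_rng(0)
t = np.arange(500) / 50.0

def noise():
    """Small Gaussian sensor noise, one value per sample."""
    return 0.01 * rng.standard_normal(t.size)

# "Static": gravity along one axis plus sensor noise.
static = (noise(), noise(), 1.0 + noise())
# "Dynamic": a 2 Hz step-like oscillation added on top of gravity.
dynamic = (noise(), noise(), 1.0 + 0.5 * np.sin(2 * np.pi * 2.0 * t) + noise())

print("static window dynamic? ", is_dynamic(*static))
print("dynamic window dynamic?", is_dynamic(*dynamic))
```

A single statistic like this only separates the two groups; it carries no information for telling walking apart from climbing stairs.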


### Q3: PCA Visualization

In [None]:
# Insert PCA code + scatter plots for raw, TSFEL, dataset features


**Answer:**
- PCA on raw total acceleration: weak separation.  
- PCA with TSFEL features: better separation since features capture statistical properties.  
- PCA with provided dataset features: best separation.  
- Conclusion: dataset-provided features are the best for visualization.
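A minimal sketch of the projection step, assuming scikit-learn is available. The feature matrix here is generated with `make_classification` as a placeholder for any of the three feature sets (raw, TSFEL, dataset-provided); the sample and feature counts are arbitrary.

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Placeholder feature matrix: 300 windows, 20 features, 3 classes (synthetic).
X, y = make_classification(
    n_samples=300, n_features=20, n_informative=5,
    n_classes=3, random_state=0,
)

# Standardize first so features on large scales do not dominate the components.
X_scaled = StandardScaler().fit_transform(X)

# Project onto the first two principal components.
pca = PCA(n_components=2)
X_2d = pca.fit_transform(X_scaled)

print("projected shape:", X_2d.shape)
print("variance explained by 2 components:", pca.explained_variance_ratio_.sum())
```

The two columns of `X_2d` can then be passed to `plt.scatter` with `c=y` to produce one scatter plot per feature set.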


### Q4: Correlation Matrix

In [None]:
# Insert correlation heatmap code here


**Answer:**
- Many features are highly correlated (e.g., mean & median, variance & energy).  
- Some features are redundant.  
- Feature selection can reduce dimensionality without major information loss.
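One simple way to act on this is to drop one feature from every highly correlated pair. The sketch below uses synthetic columns; the feature names, the helper `redundant_features`, and the 0.9 cutoff are assumptions for illustration, not values from the dataset.

```python
import numpy as np

def redundant_features(X, names, threshold=0.9):
    """Return names of features highly correlated with an earlier kept feature."""
    corr = np.corrcoef(X, rowvar=False)  # columns are features
    to_drop = []
    for j in range(len(names)):
        for i in range(j):
            # Compare only against earlier features that are still kept.
            if names[i] not in to_drop and abs(corr[i, j]) > threshold:
                to_drop.append(names[j])
                break
    return to_drop

# Synthetic example: "median" is "mean" plus small noise, "unrelated" is independent.
rng = np.random.default_rng(0)
base = rng.standard_normal(200)
X = np.column_stack([
    base,
    base + 0.05 * rng.standard_normal(200),
    rng.standard_normal(200),
])
names = ["mean", "median", "unrelated"]

print(redundant_features(X, names))
```

Here the second column is flagged because it is almost a copy of the first, while the independent column is kept.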


## Task 2: Decision Trees for HAR

### Q1: Decision Trees with raw, TSFEL, dataset features


**Answer:**
- Raw data → lowest accuracy (high dimensional & noisy).  
- TSFEL features → moderate performance.  
- Provided dataset features → best accuracy, precision, and recall, and the cleanest confusion matrix (fewest off-diagonal entries).


### Q2: Varying tree depth

In [None]:
# Insert accuracy vs depth plot here


**Answer:**
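- Small depth → underfitting (both training and test accuracy are low).  
- Increasing depth → accuracy improves.  
- Too large depth → overfitting (training accuracy keeps rising while test accuracy plateaus or drops).  
- Optimal depth ~ 4–6.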
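A depth sweep of this kind can be sketched as below. It runs on synthetic data from `make_classification`, not on UCI-HAR, so the depth at which test accuracy peaks will not match the 4–6 range reported above; the split ratio and depth range are assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for a feature matrix with three activity classes.
X, y = make_classification(
    n_samples=400, n_features=10, n_informative=4,
    n_classes=3, random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y
)

depths = list(range(1, 11))
train_acc, test_acc = [], []
for depth in depths:
    clf = DecisionTreeClassifier(max_depth=depth, random_state=0)
    clf.fit(X_train, y_train)
    train_acc.append(clf.score(X_train, y_train))  # accuracy on training data
    test_acc.append(clf.score(X_test, y_test))     # accuracy on held-out data

for depth, tr, te in zip(depths, train_acc, test_acc):
    print(f"depth={depth:2d}  train={tr:.3f}  test={te:.3f}")
```

Plotting `train_acc` and `test_acc` against `depths` gives the accuracy-vs-depth curve; a widening gap between the two is the usual sign of overfitting.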


### Q3: Poor performance participants/activities


**Answer:**
- Yes, confusion occurs between walking_upstairs and walking_downstairs due to similar signals.  
- Variations in participant movement style or phone placement also reduce accuracy.


## Task 3: Data Collection in the Wild

### Q1: Model on self-collected data using UCI-HAR trained tree


**Answer:**
- Accuracy is lower than on the UCI dataset because of different phone placement and noisy real-world conditions.  
- The model is sensitive to sensor axis alignment and to how consistently the data is collected.


### Q2: Using personal dataset


**Answer:**
- With preprocessing (normalization, TSFEL features), performance improves.  
- Without preprocessing, performance drops.  
- Real-world performance is lower than in controlled datasets.


# Decision Tree Implementation - Theory Answers

## Part A: Classification Dataset

In [None]:
# Example dataset generation
from sklearn.datasets import make_classification
import matplotlib.pyplot as plt

X, y = make_classification(
    n_features=2, n_redundant=0, n_informative=2, 
    random_state=1, n_clusters_per_class=2, class_sep=0.5
)

plt.scatter(X[:, 0], X[:, 1], c=y)
plt.title("Synthetic Classification Dataset")
plt.show()


**Answer (Theory):**
- Train on 70% of the data, test on 30%.
- Report metrics: accuracy, per-class precision, and per-class recall.
- The decision tree should achieve reasonable separation given 2 informative features.
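The evaluation protocol above can be sketched as follows. sklearn's `DecisionTreeClassifier` stands in for the custom tree here, and `max_depth=5`, the random seeds, and the stratified split are assumptions rather than part of the assignment.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score, precision_score, recall_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Same synthetic dataset as in the cell above (100 samples by default).
X, y = make_classification(
    n_features=2, n_redundant=0, n_informative=2,
    random_state=1, n_clusters_per_class=2, class_sep=0.5,
)

# 70% train / 30% test; stratify keeps the class balance in both parts.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y
)

# Stand-in model: replace with the custom tree when it is available.
clf = DecisionTreeClassifier(max_depth=5, random_state=0)
clf.fit(X_train, y_train)
y_pred = clf.predict(X_test)

accuracy = accuracy_score(y_test, y_pred)
# average=None returns one value per class instead of a single average.
precision = precision_score(y_test, y_pred, labels=[0, 1], average=None)
recall = recall_score(y_test, y_pred, labels=[0, 1], average=None)

print("accuracy:", accuracy)
print("per-class precision:", precision)
print("per-class recall:", recall)
```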


### 5-Fold Cross Validation & Optimum Depth

In [None]:
# Insert nested cross-validation code here


**Answer (Theory):**
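- Use nested cross-validation: an outer 5-fold loop estimates generalization performance, and an inner loop on each outer training fold selects the depth.
- As depth increases → better fit to the training data, but higher risk of overfitting.
- Optimum depth balances bias and variance (likely small, e.g. 3–6 for a simple 2D dataset).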
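One possible way to set up nested cross-validation with scikit-learn is shown below. It uses sklearn's tree instead of the custom implementation, and the inner fold count (5) and the depth grid (1 to 10) are assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, KFold, cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(
    n_features=2, n_redundant=0, n_informative=2,
    random_state=1, n_clusters_per_class=2, class_sep=0.5,
)

# Inner loop picks the depth; outer loop scores the whole tuning procedure.
inner_cv = KFold(n_splits=5, shuffle=True, random_state=0)
outer_cv = KFold(n_splits=5, shuffle=True, random_state=1)

search = GridSearchCV(
    DecisionTreeClassifier(random_state=0),
    param_grid={"max_depth": list(range(1, 11))},
    cv=inner_cv,
    scoring="accuracy",
)

# Each outer fold refits the grid search on its own training portion.
outer_scores = cross_val_score(search, X, y, cv=outer_cv, scoring="accuracy")
print("outer-fold accuracies:", outer_scores)
print("mean accuracy:", outer_scores.mean())

# Refit the search on all data to report a single chosen depth.
search.fit(X, y)
best_depth = search.best_params_["max_depth"]
print("depth selected on full data:", best_depth)
```

The outer scores estimate how well the tuned model generalizes; the depth printed at the end comes from a separate fit on all the data and should not be reported together with its own inner-loop accuracy as a test result.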


## Part B: Auto Efficiency Dataset

In [None]:
# Insert training & evaluation on Auto-Efficiency dataset


**Answer (Theory):**
- The custom decision tree performs comparably to sklearn's `DecisionTreeClassifier` / `DecisionTreeRegressor` on structured data.
- Sklearn is typically more optimized, so it may be slightly faster or more accurate with default settings.


## Part C: Runtime Complexity Experiments

In [None]:
# Insert experiment code for varying N (samples) and M (features)


**Answer (Theory):**
- **Learning Time Complexity:** O(N * M * log N) for decision tree construction, assuming a reasonably balanced tree whose depth grows like log N.

- **Prediction Time Complexity:** O(depth of tree) per sample, which is O(log N) for a balanced tree and up to O(N) for a degenerate one.

- Experiments should confirm theory:

  - As N grows → training time increases slightly faster than linearly (N log N).

  - As M grows → training time grows linearly with M.

  - Prediction time stays low, scaling with depth.

- Across four cases (discrete/real inputs & outputs), trends hold, though regression may take slightly more time due to MSE splits.
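A timing experiment along these lines might look like the sketch below. It times sklearn's `DecisionTreeClassifier` on random real-valued inputs with random binary labels as a stand-in for the custom tree and covers only one of the four cases; the helper `time_fit`, the sizes, and the repeat count are assumptions.

```python
import time

import numpy as np
from sklearn.tree import DecisionTreeClassifier

def time_fit(n_samples, n_features, repeats=3, seed=0):
    """Median wall-clock seconds to fit a tree on random real-valued data."""
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((n_samples, n_features))
    y = rng.integers(0, 2, size=n_samples)  # random labels force a deep tree
    durations = []
    for _ in range(repeats):
        start = time.perf_counter()
        DecisionTreeClassifier(random_state=0).fit(X, y)
        durations.append(time.perf_counter() - start)
    return float(np.median(durations))

sample_sizes = [200, 400, 800, 1600]
feature_counts = [5, 10, 20, 40]

# Vary N with M fixed, then vary M with N fixed.
time_vs_n = {n: time_fit(n, 10) for n in sample_sizes}
time_vs_m = {m: time_fit(500, m) for m in feature_counts}

for n, seconds in time_vs_n.items():
    print(f"N={n:5d}, M=10: {seconds:.4f} s")
for m, seconds in time_vs_m.items():
    print(f"N=500, M={m:3d}: {seconds:.4f} s")
```

At these small sizes the measurements are noisy, so larger N and M and more repeats are needed before the curves can be compared against the theoretical growth rates.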
