# Conclusion

## Methods for outlier detection

- **Univariate methods**: Rely on evaluating each variable or principal component separately.  
  - The z-score is a typical example: it is simple and interpretable, but it may miss anomalies that only become visible when several variables are considered together.
    
- **Multivariate methods**: Evaluate samples based on the joint distribution of variables.  
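  - Distance-based approaches include the score distance (Hotelling’s T², Mahalanobis distance), which measures how far a sample lies from the center within the PCA model, and the reconstruction error, which measures how much of the sample the PCA model fails to explain.  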
  - Model-based approaches, such as Isolation Forest, capture irregular patterns directly without relying on covariance estimation.  
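
The two distance-based measures can be sketched with NumPy alone. This is a minimal illustration under stated assumptions, not a reference implementation: the helpers `fit_pca`, `score_distance`, and `reconstruction_error` are hypothetical names, the data are synthetic, and the choice of two retained components is arbitrary. Two points are planted: one that breaks the relation between features while staying in range on each of them, and one that is extreme but consistent with that relation.

```python
import numpy as np

def fit_pca(X, k):
    """Fit a k-component PCA on standardized data via SVD (hypothetical helper)."""
    mean, std = X.mean(axis=0), X.std(axis=0)
    Xs = (X - mean) / std
    _, S, Vt = np.linalg.svd(Xs, full_matrices=False)
    var = S[:k] ** 2 / (len(X) - 1)  # variance of each retained component
    return mean, std, Vt[:k], var

def score_distance(X, model):
    """Hotelling's T²: squared Mahalanobis distance inside the PCA subspace."""
    mean, std, components, var = model
    scores = ((X - mean) / std) @ components.T
    return np.sum(scores ** 2 / var, axis=1)

def reconstruction_error(X, model):
    """Squared residual left after projecting onto the retained components."""
    mean, std, components, _ = model
    Xs = (X - mean) / std
    recon = (Xs @ components.T) @ components
    return np.sum((Xs - recon) ** 2, axis=1)

# Synthetic data: the third feature is roughly the sum of the first two.
rng = np.random.default_rng(0)
latent = rng.normal(size=(200, 2))
mixing = np.array([[1.0, 0.0, 1.0],
                   [0.0, 1.0, 1.0]])
X = latent @ mixing + 0.05 * rng.normal(size=(200, 3))
X = np.vstack([X, [1.0, 1.0, -2.0]])   # index 200: in range per feature, breaks the relation
X = np.vstack([X, [6.0, 6.0, 12.0]])   # index 201: extreme, but consistent with the relation

model = fit_pca(X, k=2)
t2 = score_distance(X, model)
spe = reconstruction_error(X, model)

print("largest reconstruction error at index", int(np.argmax(spe)))
print("largest score distance at index", int(np.argmax(t2)))
```

In this setup the two measures pick out different points, which is why they are usually read together: the reconstruction error reacts to samples that leave the PCA subspace, while the score distance reacts to samples that are far from the center inside it.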

## Is PCA necessary for outlier detection?

- **For univariate methods**  
  - Outliers in raw space may remain hidden when features are correlated.  
  - After PCA, the variance is concentrated in a few components and the component scores are uncorrelated, so a per-component test can expose points that break the correlation structure.  
  - PCA is **highly beneficial** for univariate detection.  

- **For multivariate distance-based methods**  
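  - Strongly correlated features distort Euclidean-type distances and can make the covariance matrix ill-conditioned, which destabilizes Mahalanobis-type distances.  
  - PCA reduces this effect by orthogonalizing the components and, when low-variance components are dropped, by improving conditioning, which stabilizes detection.  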
  - PCA is **recommended** for distance-based detection.  

- **For multivariate model-based methods**  
  - Advanced models (e.g., Isolation Forest) already account for interactions between variables. 
  - Their detection performance changes little with PCA transformation.  
  - PCA is **optional** for model-based methods.  
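
The univariate case can be illustrated with a small synthetic example. This is a sketch under assumptions: two strongly correlated features, one planted point that is unremarkable on each feature but contradicts their correlation, and the common |z| > 3 rule as the cut-off. PCA is computed by SVD of the standardized data rather than through a library class.

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(size=300)
X = np.column_stack([x, x + 0.1 * rng.normal(size=300)])
X = np.vstack([X, [1.5, -1.5]])  # last row: each value is ordinary, the pair is not

# Standardize, then take the principal directions from the SVD (rows of Vt).
Xs = (X - X.mean(axis=0)) / X.std(axis=0)
_, _, Vt = np.linalg.svd(Xs, full_matrices=False)
scores = Xs @ Vt.T

z_raw = np.abs(Xs)                          # z-scores of the raw features
z_pc = np.abs(scores / scores.std(axis=0))  # z-scores of the PCA scores

print("planted point, max |z| on raw features:", z_raw[-1].max().round(2))
print("planted point, max |z| on components:  ", z_pc[-1].max().round(2))
```

The planted point stays below the cut-off on both raw features, but it dominates the low-variance second component, so the same z-score rule flags it after the transformation.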

## Limitations of PCA for outlier detection

- **Linear assumption**: PCA only captures linear correlations; outliers in non-linear structures may remain hidden.  
- **Sensitivity to scaling and noise**: Without proper preprocessing, principal components may be distorted.  
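- **Threshold choice**: Cut-offs (e.g., the 98% quantile) are somewhat arbitrary and may vary across datasets; a quantile cut-off also flags a fixed share of samples whether or not true outliers are present.  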
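
The threshold issue can be made concrete with a short sketch. It assumes outlier scores drawn from a chi-square distribution with two degrees of freedom as a stand-in for score distances on clean data; no outliers are planted, and the quantile levels other than 98% are illustrative.

```python
import numpy as np

rng = np.random.default_rng(3)
clean_scores = rng.chisquare(df=2, size=1000)  # stand-in for scores on outlier-free data

counts = {}
for q in (0.95, 0.98, 0.99):
    threshold = np.quantile(clean_scores, q)
    counts[q] = int((clean_scores > threshold).sum())
    print(f"quantile {q:.2f}: threshold {threshold:.2f}, flagged {counts[q]} of 1000")
```

Each cut-off flags its nominal share of the samples even though the data contain no planted outliers, so the flagged count reflects the chosen quantile rather than the data.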


---


## Future directions
- **Robust PCA**: More resilient to noise and outliers during training, providing stable detection.  
- **Hybrid methods**: Combine PCA with clustering, kernel methods, or ensemble techniques to capture non-linear outliers.  
- **Better preprocessing**: Standardization, noise reduction, and feature engineering can improve detection quality.  
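
The effect of standardization on the principal components can be sketched as follows. The example is hypothetical: two correlated synthetic features where the second is recorded in units a thousand times larger, and a helper `first_pc` that returns the first principal direction from an SVD of the centered data.

```python
import numpy as np

def first_pc(M):
    """First principal direction of M, from the SVD of the centered data."""
    Mc = M - M.mean(axis=0)
    _, _, Vt = np.linalg.svd(Mc, full_matrices=False)
    return Vt[0]

rng = np.random.default_rng(2)
a = rng.normal(size=500)
b = 1000.0 * (a + 0.5 * rng.normal(size=500))  # correlated with a, much larger units
X = np.column_stack([a, b])

pc_raw = first_pc(X)                                     # no scaling
pc_std = first_pc((X - X.mean(axis=0)) / X.std(axis=0))  # standardized first

print("first component, raw data:         ", np.abs(pc_raw).round(3))
print("first component, standardized data:", np.abs(pc_std).round(3))
```

Without scaling, the first component is almost entirely the large-unit feature; after standardization both features contribute equally, so distances computed from the components are no longer driven by the choice of units.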