This project delivers a Supervised Learning solution that predicts a user's Stress_Level from daily behavioral and physiological indicators. Using a dataset of 55,000 samples and 18 features (including sleep duration, caffeine intake, caloric consumption, daily steps, and workout metrics), the model maps raw daily habits directly to stress levels instead of relying on clinical assumptions.
To maximize predictive performance and reduce the risk of overfitting, the system implements a 2-Tier Stacking Ensemble architecture:
- Tier 1 (Base Models): Trains three independent regressors from different algorithmic families: XGBoost (Boosting), Random Forest (Bagging), and SVR (Kernel Methods).
- Tier 2 (Meta-Model): Uses Ridge Regression to learn how to weight the Tier 1 predictions, so that errors made by one base model can be offset by the others, and outputs the final Stress_Level.
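Written out, and assuming the meta-model receives only the three Tier 1 predictions as inputs (no raw features) and that those predictions are generated out-of-fold, the Ridge meta-model fits:

$$
\hat{y}_i = w_0 + w_{\mathrm{XGB}}\,\hat{y}_i^{\mathrm{XGB}} + w_{\mathrm{RF}}\,\hat{y}_i^{\mathrm{RF}} + w_{\mathrm{SVR}}\,\hat{y}_i^{\mathrm{SVR}}
$$

$$
(w_0, \mathbf{w}) = \underset{w_0,\,\mathbf{w}}{\arg\min} \sum_{i=1}^{n} \left( y_i - \hat{y}_i \right)^2 + \alpha \lVert \mathbf{w} \rVert_2^2, \qquad \mathbf{w} = (w_{\mathrm{XGB}}, w_{\mathrm{RF}}, w_{\mathrm{SVR}})
$$

Here $y_i$ is the observed Stress_Level of sample $i$, $\hat{y}_i^{k}$ is the prediction of base model $k$ for that sample, and $\alpha > 0$ is the Ridge penalty; the intercept $w_0$ is not penalized, as in scikit-learn's Ridge. The weights are not constrained to be non-negative or to sum to 1; the penalty shrinks them toward zero, which keeps them stable when the three base predictions are strongly correlated.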
- Language: Python
- Machine Learning: Scikit-learn, XGBoost
- Data Processing & Visualization: Pandas, NumPy, Matplotlib, Seaborn
- Deployment / Interactive Demo: Streamlit / Gradio
- Member 1 (Team Leader / Data Engineer): Exploratory Data Analysis (EDA), automated preprocessing pipelines (StandardScaler), missing-value handling, and version control management.
- Member 2 (ML Engineer - Tier 1): Train/Test splitting, K-Fold Cross-Validation setup, and hyperparameter tuning (GridSearchCV) for the three Base Models.
- Member 3 (ML Engineer - Tier 2): Meta-feature extraction, Tier 2 Ridge Regression configuration, baseline-vs-stacking performance evaluation ($RMSE$, $MAE$, $R^2$), and serving as lead presenter.
- Member 4 (Full-stack & UI Engineer): Model serialization (.pkl packaging), building the interactive web application demo, and designing the academic slide deck.
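The preprocessing and tuning duties listed for Members 1 and 2 can be combined into a single scikit-learn Pipeline that GridSearchCV searches over K-Fold splits. This is a sketch under assumptions: median imputation is only one possible missing-value strategy, the SVR grid values are illustrative, tune_svr is a hypothetical helper name, and the demo data is synthetic with randomly injected NaNs.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.impute import SimpleImputer
from sklearn.model_selection import GridSearchCV, KFold, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR


def tune_svr(X_train, y_train, random_state: int = 42) -> GridSearchCV:
    """Hypothetical helper: impute -> scale -> SVR, tuned with K-Fold grid search."""
    pipeline = Pipeline([
        ("impute", SimpleImputer(strategy="median")),  # assumption: median imputation
        ("scale", StandardScaler()),                    # re-fit on each training fold
        ("svr", SVR()),
    ])
    param_grid = {
        "svr__C": [0.1, 1.0, 10.0],      # illustrative values only
        "svr__epsilon": [0.1, 0.5],
    }
    search = GridSearchCV(
        pipeline,
        param_grid,
        cv=KFold(n_splits=5, shuffle=True, random_state=random_state),
        scoring="neg_root_mean_squared_error",  # higher is better, so RMSE is negated
    )
    return search.fit(X_train, y_train)


if __name__ == "__main__":
    # Synthetic stand-in data with ~5% missing values (NOT the project dataset).
    rng = np.random.default_rng(0)
    X, y = make_regression(n_samples=600, n_features=18, noise=10.0, random_state=0)
    X[rng.random(X.shape) < 0.05] = np.nan
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=42)
    search = tune_svr(X_train, y_train)
    print("best params:", search.best_params_)
    print("CV RMSE:", -search.best_score_)
```

Keeping the imputer and StandardScaler inside the pipeline means they are re-fit on the training folds of each split, so statistics from the validation fold never leak into preprocessing. The same pattern would apply to the Random Forest and XGBoost models with their own parameter grids.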
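For the .pkl packaging step, joblib (installed alongside scikit-learn) can write a fitted estimator to disk and reload it inside the demo app. The sketch assumes a hypothetical file name stress_model.pkl and uses a small Ridge pipeline as a stand-in for the trained ensemble; the hypothetical save_model and load_model helpers would work unchanged for a fitted StackingRegressor.

```python
import tempfile
from pathlib import Path

import joblib
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def save_model(model, path) -> Path:
    """Hypothetical helper: serialize a fitted estimator to a .pkl file."""
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    joblib.dump(model, path)
    return path


def load_model(path):
    """Hypothetical helper: only load .pkl files you trust (unpickling can run code)."""
    return joblib.load(path)


if __name__ == "__main__":
    # Stand-in model on synthetic data (NOT the trained stacking ensemble).
    X, y = make_regression(n_samples=200, n_features=18, random_state=0)
    model = make_pipeline(StandardScaler(), Ridge()).fit(X, y)
    with tempfile.TemporaryDirectory() as tmp:
        path = save_model(model, Path(tmp) / "stress_model.pkl")
        restored = load_model(path)
        # The reloaded model should reproduce the original predictions.
        assert np.allclose(model.predict(X[:5]), restored.predict(X[:5]))
        print("reloaded OK from", path.name)
```

A .pkl file should be loaded with the same scikit-learn (and xgboost) versions it was saved with, and only from a trusted source. In a Streamlit or Gradio app, the model would typically be loaded once at startup rather than on every prediction request.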