In [None]:
# Mental Health Depression Risk Prediction – Final Report

This notebook summarizes the entire project: data, model, evaluation, debiasing, and deployment.

In [None]:
## 1. Project Summary

- **Goal:** Build a machine learning model to estimate depression risk for students based on survey responses.
- **Usage:** Early screening / awareness tool, **not** a diagnostic device.
- **Pipeline:**
  1. Data loading and cleaning  
  2. Exploratory data analysis (01_eda.ipynb)  
  3. Preprocessing & feature engineering (02_data_cleaning.ipynb)  
  4. Model training & debiasing (03_model_training.ipynb)  
  5. Web app deployment (Streamlit)

In [1]:
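import json
import os

import joblib
import pandas as pd

# Run from the project root so the relative paths below resolve.
# Checking the directory name avoids matching unrelated paths that merely contain 'notebooks'.
if os.path.basename(os.getcwd()) == 'notebooks':
    os.chdir('..')

# Model comparison metrics and the feature list saved during training.
results_df = pd.read_csv('docs/model_comparison.csv', index_col=0)
with open('models/selected_features.json') as f:
    selected_features = json.load(f)

results_df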

| Model | F1-Score | Accuracy | Precision | Recall | Specificity | ROC-AUC | Bias Gap |
|---|---|---|---|---|---|---|---|
| Random Forest (Debiased) | 0.863882 | 0.861362 | 0.848333 | 0.880012 | 0.842717 | 0.935241 | 0.037295 |
| Logistic Regression (Debiased) | 0.862179 | 0.861209 | 0.856065 | 0.868381 | 0.854039 | 0.934755 | 0.014342 |


In [None]:
## 2. Data & Cleaning

- **Dataset:** Student depression / mental health survey  
- **Rows:** ~27,900  
- **Features:** ~18 original features, plus engineered features  
- **Target:** Binary depression label  

Cleaning steps:
- Dropped rows with missing target.
- Imputed numeric features with median, categorical with most frequent value.
- Encoded categorical variables using label encoding.
- Applied light feature engineering (e.g., interaction terms).
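
The cleaning steps above can be sketched with pandas as follows. This is a minimal illustration on a toy frame, not the project's actual cleaning code: the rows, the target column name `Depression`, and the choice of `cat.codes` for label encoding are assumptions made for the example.

```python
import pandas as pd

# Toy frame standing in for the survey data (values are illustrative only).
df = pd.DataFrame({
    "Age": [21.0, None, 25.0, 30.0, 28.0],
    "Dietary Habits": ["Healthy", None, "Unhealthy", "Healthy", "Moderate"],
    "Depression": [1, 0, 1, 0, None],  # hypothetical target column name
})

# 1. Drop rows with a missing target.
df = df.dropna(subset=["Depression"]).copy()

# 2. Impute numeric columns with the median, categorical columns with the mode.
numeric_cols = ["Age"]
categorical_cols = ["Dietary Habits"]
for col in numeric_cols:
    df[col] = df[col].fillna(df[col].median())
for col in categorical_cols:
    df[col] = df[col].fillna(df[col].mode().iloc[0])

# 3. Label-encode categoricals as integer codes (categories are sorted alphabetically).
for col in categorical_cols:
    df[col] = df[col].astype("category").cat.codes

print(df)
```

In a real pipeline the median and mode should be computed on the training split only and reused on validation and test data, so that imputation does not leak information across splits.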

In [None]:
## 3. Model & Performance

We trained Logistic Regression and Random Forest models and then added debiasing:

- Used `class_weight='balanced'` to handle class imbalance.
- Optionally used SMOTE to balance the training data.
- Measured not just accuracy/F1 but also sensitivity (recall), specificity, and the bias gap between them.
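
To make the effect of `class_weight='balanced'` concrete, the sketch below reproduces the weighting heuristic that scikit-learn documents for this option, `n_samples / (n_classes * class_count)`, in plain Python. The helper name `balanced_class_weights` and the 6/4 label split are hypothetical and do not reflect the project's real class counts.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Return one weight per class: n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in counts.items()}


# Hypothetical label split: 6 positive and 4 negative examples.
y = [1] * 6 + [0] * 4
weights = balanced_class_weights(y)
print(weights)
```

The rarer class receives the larger weight, so its errors cost more during training. SMOTE takes a different route by synthesizing new minority-class samples and is not shown here.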

In [2]:
len(selected_features), selected_features[:10]

(15,
 ['Have you ever had suicidal thoughts ?',
  'Academic Pressure',
  'Financial Stress',
  'Work/Study Hours',
  'Dietary Habits',
  'Age',
  'Study Satisfaction',
  'Family History of Mental Illness',
  'Degree',
  'CGPA'])

In [None]:
The model uses the 15 most predictive features (the first 10 are listed above).  
Only these features are required as input in the web app.
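
Because the model expects exactly these features, inputs have to be put in the same order as the saved list before prediction. The helper below is one possible way to do that; `build_feature_row` is a hypothetical name, and the three-feature list and input values are made up for the example.

```python
def build_feature_row(user_input, feature_names):
    """Return the input values ordered like feature_names; fail on missing features."""
    missing = [name for name in feature_names if name not in user_input]
    if missing:
        raise ValueError(f"Missing required features: {missing}")
    # Extra keys in user_input are ignored; only the selected features are kept.
    return [user_input[name] for name in feature_names]


# Illustrative subset of the selected features and a made-up set of answers.
features = ["Academic Pressure", "Financial Stress", "Age"]
user_input = {"Age": 22, "Financial Stress": 2, "Academic Pressure": 4, "City": "X"}

row = build_feature_row(user_input, features)
print(row)
```

Failing loudly on a missing feature is a deliberate choice in this sketch: silently filling a default could produce a confident prediction from incomplete answers.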

In [None]:
## 4. Debiasing & Sanity Checks

- Original model tended to predict "depressed" too often because the dataset was imbalanced.
- Fixes:
  - Applied class balancing (`class_weight='balanced'`).
  - Evaluated sensitivity and specificity separately.
  - Computed a **bias gap** (difference between sensitivity and specificity; for the debiased Random Forest, 0.880 − 0.843 ≈ 0.037).
  - Tested with a manually constructed "healthy" profile (low stress, high satisfaction) and confirmed it is usually predicted as **Not Depressed**.
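
The sensitivity, specificity, and bias gap checks can be written out as a small function. This is a sketch under two assumptions: the gap is taken as an absolute difference, and the confusion-matrix counts are invented for illustration. The last lines recompute the gap from the Recall and Specificity values reported in the model comparison table.

```python
def sensitivity_specificity_gap(tp, fn, tn, fp):
    """Compute sensitivity, specificity, and their absolute difference."""
    sensitivity = tp / (tp + fn)  # share of depressed cases correctly flagged
    specificity = tn / (tn + fp)  # share of non-depressed cases correctly cleared
    return sensitivity, specificity, abs(sensitivity - specificity)


# Hypothetical confusion-matrix counts, not taken from the project.
sens, spec, gap = sensitivity_specificity_gap(tp=88, fn=12, tn=84, fp=16)
print(f"sensitivity={sens:.3f} specificity={spec:.3f} bias gap={gap:.3f}")

# Recomputing the gap from the reported Recall and Specificity columns.
rf_gap = abs(0.880012 - 0.842717)  # Random Forest (Debiased)
lr_gap = abs(0.868381 - 0.854039)  # Logistic Regression (Debiased)
print(f"RF gap={rf_gap:.6f} LR gap={lr_gap:.6f}")
```

Both recomputed values agree with the Bias Gap column (0.037295 and 0.014342), which supports reading the metric as recall minus specificity.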

In [None]:
## 5. Web Application

The model is exposed through a Streamlit app (`app/app_streamlit.py`):

- Loads the trained model, scaler, and feature list.
- Asks the user to fill in numeric sliders/inputs for each feature.
- Returns:
  - High / Low risk label.
  - Probability of depression.
  - Visual bar chart of probabilities.
  - Suggested next steps and a medical disclaimer.
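
The mapping from predicted probability to the High / Low label could look like the sketch below. The function name `risk_label` and the 0.5 cut-off are assumptions; the threshold actually used in `app/app_streamlit.py` is not stated in this report.

```python
def risk_label(probability, threshold=0.5):
    """Map a predicted depression probability to a coarse risk label."""
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must be between 0 and 1")
    # threshold=0.5 is an assumed default, not the app's confirmed setting.
    return "High risk" if probability >= threshold else "Low risk"


print(risk_label(0.72))
print(risk_label(0.20))
```

For a screening tool, lowering the threshold trades more false alarms for fewer missed cases, so the cut-off is worth choosing deliberately rather than leaving at a default.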

To run locally, execute `streamlit run app/app_streamlit.py` from the project root.

In [None]:
## 6. Limitations & Ethical Considerations

- Based on self‑reported survey data; answers may be noisy or biased.
- Dataset may not generalize to all populations.
- Model is a screening aid, not a diagnostic tool.
- Predictions must be interpreted by qualified professionals and never used on their own for serious decisions.

In [None]:
## 7. Conclusion

We built and deployed a full depression‑risk prediction pipeline:

- Cleaned and processed a large student mental health dataset.
- Trained and debiased a Logistic Regression model (F1 ≈ 0.862, ROC-AUC ≈ 0.935, bias gap ≈ 0.014), alongside a Random Forest with comparable scores.
- Deployed the model as a Streamlit web app for interactive use.
- Documented the workflow and limitations to support safe, responsible use.

Future work:
- Collect more diverse data.
- Add more robust debiasing and calibration.
- Integrate with real counseling workflows.