AI-powered depression detection using NLP and machine learning. Includes text analysis and PHQ-9 questionnaire scoring.
This tool is for educational and research purposes only. It is NOT a substitute for professional medical advice, diagnosis, or treatment. If you or someone you know is experiencing mental health difficulties, please consult a licensed mental health professional.
- Text Analysis: Detects depression indicators from free-form text
- PHQ-9 Questionnaire: Standard clinical depression screening tool
- Two Model Options:
  - Baseline: Fast TF-IDF + Logistic Regression
  - Advanced: Fine-tuned DistilBERT transformer
- Risk Assessment: Categorizes risk levels (low, moderate, high, very high)
- Explainability: Shows key features contributing to predictions
- Web Interface: User-friendly Flask web application
- REST API: Easy integration with other applications
mental-health-ml/
├── data/
│ ├── raw/ # Place downloaded datasets here
│ └── processed/ # Processed data (auto-generated)
├── src/
│ ├── preprocess.py # Data cleaning and preprocessing
│ ├── train_baseline.py # Train TF-IDF + LR model
│ ├── train_transformer.py # Train DistilBERT model
│ └── inference.py # Prediction wrapper
├── notebooks/
│ └── 01_eda.py # Exploratory data analysis
├── models/ # Saved models (auto-generated)
├── outputs/
│ └── figures/ # Visualizations (auto-generated)
├── app.py # Flask web application
├── requirements.txt # Python dependencies
└── README.md # This file
# Create project directory
mkdir mental-health-ml
cd mental-health-ml
# Install Python dependencies
pip install -r requirements.txt
Kaggle Reddit Mental Health Dataset
- Go to: https://www.kaggle.com/datasets/suchintikasarkar/sentiment-analysis-for-mental-health
- Click "Download" (requires free Kaggle account)
- Extract the CSV file
- Place it in: data/raw/mental_health.csv
# Create necessary directories
mkdir -p data/raw
mkdir -p data/processed
mkdir -p models
mkdir -p outputs/figures
mkdir -p notebooks
mkdir -p src
python src/preprocess.py
This will:
- Load the raw dataset
- Clean and anonymize text
- Remove noise (URLs, usernames, etc.)
- Save processed data to data/processed/
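The cleaning steps listed above can be sketched roughly as follows. This is a minimal illustration, not the contents of src/preprocess.py: the function name `clean_text` and every regular expression here are assumptions chosen for the example, and real anonymization usually needs broader rules than these.

```python
import re

# Illustrative patterns only (assumptions, not the project's actual rules).
URL_RE = re.compile(r"https?://\S+|www\.\S+")
EMAIL_RE = re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b")
# Reddit-style u/name or /u/name mentions, plus @name handles.
USER_RE = re.compile(r"(?<!\w)(?:/?u/|@)[\w-]+")
WHITESPACE_RE = re.compile(r"\s+")


def clean_text(text: str) -> str:
    """Strip URLs, emails, and user mentions, then normalize whitespace and case."""
    text = URL_RE.sub(" ", text)
    # Emails are removed before mentions so "@domain" is not stripped on its own.
    text = EMAIL_RE.sub(" ", text)
    text = USER_RE.sub(" ", text)
    return WHITESPACE_RE.sub(" ", text).strip().lower()


if __name__ == "__main__":
    print(clean_text("Thanks u/sad_user see https://example.com or mail me@example.com  NOW"))
```

Removing emails before mentions is a deliberate ordering choice in this sketch; reversing it would leave the local part of each address behind.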
python notebooks/01_eda.py
This generates visualizations in outputs/figures/:
- Label distribution
- Text length analysis
- Top n-grams per class
- Word frequency analysis
Option A: Train Baseline Model (Fast, CPU-friendly)
python src/train_baseline.py
Training time: ~5-10 minutes on CPU
Option B: Train Transformer Model (Better accuracy, needs GPU)
python src/train_transformer.py
Training time: ~30-60 minutes on GPU, several hours on CPU
python src/inference.py
This will run test predictions on sample texts.
python app.py
Then open your browser to: http://localhost:5000
- Size: ~10,000+ samples
- Labels: Binary (depression indicators vs normal)
- Source: Reddit posts from mental health subreddits
- License: Check Kaggle page for current terms
- Size: 189 clinical interviews
- Labels: PHQ-8 scores (0-24)
- Source: USC Institute for Creative Technologies
- License: Requires application and agreement
- Training Time: 5-10 minutes (CPU)
- Inference Speed: <10ms per text
- Expected Metrics:
  - Precision: ~0.75-0.85
  - Recall: ~0.70-0.80
  - F1-Score: ~0.72-0.82
  - PR-AUC: ~0.80-0.88
- Training Time: 30-60 minutes (GPU)
- Inference Speed: ~50-100ms per text
- Expected Metrics:
  - Precision: ~0.80-0.90
  - Recall: ~0.75-0.85
  - F1-Score: ~0.78-0.87
  - PR-AUC: ~0.85-0.92
curl -X POST http://localhost:5000/api/predict \
-H "Content-Type: application/json" \
-d '{"text": "I feel so alone and worthless", "return_explanation": true}'
Response:
{
  "label": 1,
  "probability": 0.873,
  "risk_level": "very high",
  "model_type": "baseline",
  "explanation": [
    {"word": "alone", "contribution": 0.234},
    {"word": "worthless", "contribution": 0.198}
  ]
}
curl -X POST http://localhost:5000/api/phq9 \
-H "Content-Type: application/json" \
-d '{"responses": [2, 2, 1, 2, 1, 2, 1, 1, 0]}'
Response:
{
  "total_score": 12,
  "severity": "moderate",
  "emergency_flag": false,
  "q9_score": 0
}
Edit app.py line 12:
MODEL_TYPE = 'baseline' # or 'transformer'
Edit src/inference.py in the __init__ method:
self.threshold = 0.5 # Lower threshold = higher recall; higher threshold = higher precision
Edit hyperparameters in:
- src/train_baseline.py: lines 38-42 (param_grid)
- src/train_transformer.py: lines 63-75 (TrainingArguments)
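To see why the decision threshold in src/inference.py trades recall against precision, here is a small self-contained sketch. The scores and labels are made up for illustration, the helper `precision_recall` is hypothetical, and treating `probability >= threshold` as a positive prediction is an assumption about how the threshold is applied.

```python
def precision_recall(probs, labels, threshold):
    """Return (precision, recall) when scores >= threshold count as positive."""
    preds = [1 if p >= threshold else 0 for p in probs]
    tp = sum(1 for p, y in zip(preds, labels) if p == 1 and y == 1)
    fp = sum(1 for p, y in zip(preds, labels) if p == 1 and y == 0)
    fn = sum(1 for p, y in zip(preds, labels) if p == 0 and y == 1)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Made-up model scores and true labels, for illustration only.
probs = [0.95, 0.80, 0.65, 0.55, 0.40, 0.35, 0.20, 0.10]
labels = [1, 1, 0, 1, 1, 0, 0, 0]

if __name__ == "__main__":
    for t in (0.3, 0.5, 0.7):
        p, r = precision_recall(probs, labels, t)
        print(f"threshold={t}: precision={p:.2f}, recall={r:.2f}")
```

On this toy data, a threshold of 0.3 catches every positive example at the cost of two false alarms, while 0.7 produces no false alarms but misses half of the positives. For a screening tool, a lower threshold is often the safer default, since a missed case is usually costlier than an extra referral.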
- Low (0-30%): Minimal indicators of depression
- Moderate (30-60%): Some indicators present
- High (60-80%): Significant indicators present
- Very High (80-100%): Strong indicators present
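The bands above can be expressed as a simple lookup. This is a sketch under assumptions: the function name `risk_level` is hypothetical, and since the listed ranges share their endpoints (30%, 60%, 80%), this version assigns each boundary value to the higher band. The project's own code may resolve boundaries differently.

```python
def risk_level(probability: float) -> str:
    """Map a predicted probability in [0, 1] to a risk band.

    Boundary values (0.30, 0.60, 0.80) go to the higher band; that choice
    is an assumption, because the listed ranges overlap at their edges.
    """
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must be between 0 and 1")
    if probability < 0.30:
        return "low"
    if probability < 0.60:
        return "moderate"
    if probability < 0.80:
        return "high"
    return "very high"


if __name__ == "__main__":
    for p in (0.12, 0.30, 0.75, 0.873):
        print(p, risk_level(p))
```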
- Minimal (0-4): Little to no depression
- Mild (5-9): Mild depression symptoms
- Moderate (10-14): Moderate depression
- Moderately Severe (15-19): Moderately severe depression
- Severe (20-27): Severe depression
If PHQ-9 Question 9 (thoughts of self-harm) scores > 0, an emergency flag is raised and crisis resources are displayed prominently.
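The PHQ-9 scoring and emergency-flag rules described above can be sketched as follows. The function name `score_phq9` and the exact severity strings are assumptions for illustration; the returned keys mirror the example /api/phq9 response, and each of the nine items is scored 0-3 as in the standard questionnaire.

```python
def score_phq9(responses):
    """Score nine PHQ-9 item responses (each 0-3) and flag Question 9."""
    if len(responses) != 9 or any(r not in (0, 1, 2, 3) for r in responses):
        raise ValueError("expected nine integer responses, each between 0 and 3")

    total = sum(responses)
    if total <= 4:
        severity = "minimal"
    elif total <= 9:
        severity = "mild"
    elif total <= 14:
        severity = "moderate"
    elif total <= 19:
        severity = "moderately severe"
    else:
        severity = "severe"

    q9 = responses[8]  # Question 9: thoughts of self-harm
    return {
        "total_score": total,
        "severity": severity,
        "emergency_flag": q9 > 0,  # any non-zero answer raises the flag
        "q9_score": q9,
    }


if __name__ == "__main__":
    print(score_phq9([2, 2, 1, 2, 1, 2, 1, 1, 0]))
```

Note that the emergency flag depends only on Question 9, so it can be raised even when the total score falls in the minimal or mild range.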
- Anonymization: All usernames, emails, and personal identifiers are removed
- No Storage: Web app doesn't store user inputs by default
- Consent: Always get explicit consent before storing any text data
- Encryption: Use HTTPS in production
- Not a Diagnosis: Emphasize this is a screening tool, not diagnostic
- Professional Referral: Always recommend consulting licensed professionals
- Crisis Response: Implement immediate crisis resource display for high-risk cases
- Bias Awareness: Monitor for demographic disparities in predictions
- Transparency: Explain how the model works to users
pip install transformers datasets torch
Reduce batch size in src/train_transformer.py:
per_device_train_batch_size=8, # instead of 16
Make sure you've trained a model first:
python src/train_baseline.py
Check that your CSV is in the correct location:
- For Kaggle: data/raw/mental_health.csv
- PHQ-9 Information: https://www.hiv.uw.edu/page/mental-health-screening/phq-9
- Transformers Library: https://huggingface.co/docs/transformers
- Scikit-learn: https://scikit-learn.org/stable/
- Kaggle Datasets: https://www.kaggle.com/datasets
Contributions are welcome! Areas for improvement:
- Add LIME/SHAP explainability visualizations
- Implement model calibration
- Add demographic bias analysis
- Add unit tests
- Implement A/B testing framework
- Add model monitoring dashboard
This project is for educational purposes. Please check individual dataset licenses before use.
Built with ❤️ for mental health awareness and education.
Remember: It's okay to not be okay. Help is available.
# MindScan_-Mental-Health-Detection-System