# Health Event Prediction for Older Adults with Dementia

## Overview
A machine learning project that predicts health events in older adults with dementia using activity patterns, sleep data, and vital signs. The model achieves a **ROC-AUC of 0.871** in identifying patients at risk of health complications.

## Key Results
- **ROC-AUC Score: 0.871** (area under the ROC curve, not classification accuracy)
- **Recall: 51%** - Catches about half of health events
- **Precision: 84%** - 84% of predicted events are actual events
- **Top Predictors:** Blood pressure, heart rate, and daily activity patterns

## Dataset
- **Activity Data:** 1,030,559 timestamped location/activity records
- **Sleep Data:** 461,423 sleep state and vital sign measurements
- **Physiology Data:** 17,679 vital sign readings (temperature, blood pressure, weight, etc.)
- **Health Labels:** 608 timestamped health events across 6 event types

## Top Features for Prediction
1. Diastolic Blood Pressure (10.7%)
2. Systolic Blood Pressure (10.5%)
3. Heart Rate (9.8%)
4. Body Weight (7.3%)
5. Daily Activity Count (6.3%)

## Project Workflow

### 1. Data Exploration (EDA)
- Analyzed activity patterns across 60 patients over 6+ months
- Identified activity ranges: 1-912 activities per day
- Found 17.3% of days have health events

### 2. Feature Engineering
- **Activity Features:** 7-day averages, deviation from baseline, activity changes
- **Sleep Features:** Sleep duration, heart rate, respiratory rate, snoring episodes
- **Physiology Features:** Blood pressure, heart rate, body temperature, weight
- **Total Features:** 21 engineered features
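
The activity features above can be sketched with pandas. This is a minimal illustration, not the notebook's actual code: the column names (`patient_id`, `date`, `activity_count`) and the choice of each patient's overall mean as the baseline are assumptions.

```python
import pandas as pd


def add_activity_features(daily: pd.DataFrame) -> pd.DataFrame:
    """Add 7-day average, deviation-from-baseline, and daily-change columns.

    Expects one row per patient per day with the hypothetical columns
    patient_id, date, and activity_count.
    """
    out = daily.sort_values(["patient_id", "date"]).copy()
    grouped = out.groupby("patient_id")["activity_count"]

    # Trailing mean over the last 7 rows for each patient. This equals a
    # 7-day window only if there is one row per day with no gaps.
    out["activity_7d_avg"] = grouped.transform(
        lambda s: s.rolling(7, min_periods=1).mean()
    )

    # Assumed baseline: the patient's mean over the whole observation period.
    # A baseline built only from past days would avoid using future data.
    out["activity_baseline"] = grouped.transform("mean")
    out["activity_dev_from_baseline"] = (
        out["activity_count"] - out["activity_baseline"]
    )

    # Day-over-day change; the first day of each patient is NaN.
    out["activity_change"] = grouped.diff()
    return out
```

Computing every feature within a `groupby("patient_id")` keeps one patient's history from bleeding into another's rolling window.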

### 3. Model Development
- **Algorithm:** Random Forest Classifier (200 trees, max depth 15)
- **Training Data:** 2,718 daily records
- **Train/Test Split:** 80/20
- **Cross-validation:** Yes
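
A setup matching the settings listed above might look like the following sketch. The data here is synthetic stand-in data, and the stratified split, 5 folds, and `random_state` values are assumptions; the notebook may differ.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import cross_val_score, train_test_split

# Synthetic placeholder data with 21 features (NOT the project's dataset).
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 21))
y = (X[:, 0] + X[:, 1] + rng.normal(size=500) > 1.0).astype(int)

# 80/20 split; stratifying on the label is an assumption.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

# 200 trees with max depth 15, as listed above.
model = RandomForestClassifier(n_estimators=200, max_depth=15, random_state=42)

# 5-fold cross-validated ROC-AUC on the training portion (fold count assumed).
cv_auc = cross_val_score(model, X_train, y_train, cv=5, scoring="roc_auc")

model.fit(X_train, y_train)
test_auc = roc_auc_score(y_test, model.predict_proba(X_test)[:, 1])

print(f"CV ROC-AUC: {cv_auc.mean():.3f} +/- {cv_auc.std():.3f}")
print(f"Test ROC-AUC: {test_auc:.3f}")
```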

### 4. Results & Insights
- Combined activity + sleep + vitals improved ROC-AUC from 0.57 to 0.87
- Blood pressure is the strongest predictor (combined 21.2% importance)
- Model misses 49% of events but is highly precise (84%)
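
For reference, the metrics quoted above relate as follows: recall is TP / (TP + FN), so 51% recall implies 49% of events are missed, and precision is TP / (TP + FP). The toy labels and scores below are made up purely to show how scikit-learn computes these values; the 0.5 threshold is an assumption.

```python
from sklearn.metrics import precision_score, recall_score, roc_auc_score

# Toy example (invented values, not project data).
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_prob = [0.9, 0.8, 0.4, 0.3, 0.6, 0.2, 0.2, 0.1, 0.1, 0.05]

# Turn probabilities into hard predictions at an assumed 0.5 threshold.
y_pred = [int(p >= 0.5) for p in y_prob]

# Here TP = 2, FP = 1, FN = 2.
precision = precision_score(y_true, y_pred)  # TP / (TP + FP)
recall = recall_score(y_true, y_pred)        # TP / (TP + FN)
miss_rate = 1 - recall                       # share of events not flagged

# ROC-AUC uses the probabilities directly, so it does not depend on the threshold.
auc = roc_auc_score(y_true, y_prob)

print(f"precision={precision:.2f} recall={recall:.2f} "
      f"miss_rate={miss_rate:.2f} roc_auc={auc:.2f}")
```

Lowering the threshold would typically raise recall at the cost of precision, which is one way to trade missed events against false alarms.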

## Files in This Repository
- `aging_project_eda.ipynb` - Complete Jupyter notebook with all analysis and model training
- `health_event_prediction_model.pkl` - Trained Random Forest model (pickle format)
- `feature_importance.csv` - Feature importance scores
- `complete_health_dataset.csv` - Merged dataset with all engineered features
- `feature_importance.png` - Bar chart of top 10 features
- `model_performance.png` - Confusion matrix and ROC curve
- `activity_trends.png` - Activity patterns for sample patients

## Data Files
- `Activity.csv` - Patient activity/location data
- `Sleep.csv` - Sleep state and vital signs
- `Physiology.csv` - Daily vital sign measurements
- `Labels.csv` - Health event labels

## Technologies Used
- **Python 3.x**
- **Pandas** - Data manipulation
- **NumPy** - Numerical computations
- **Scikit-learn** - Machine learning
- **Matplotlib & Seaborn** - Visualizations

## How to Use the Model



## Key Insights
1. **Blood pressure is critical** - Systolic and diastolic BP are the strongest predictors
2. **Activity patterns matter** - Baseline activity and deviations are important signals
3. **Multi-modal data improves predictions** - Combining activity, sleep, and vitals raised ROC-AUC from 0.57 to 0.87
4. **Early detection is possible** - 51% recall means we can catch about half of health events proactively

## Future Improvements
- Incorporate additional patient demographics
- Test with LSTM/GRU models for time-series analysis
- Implement SHAP for model explainability
- Develop real-time alerting system
- Collect more data for improved generalization

## Author
Your Name (Graduate Student - Social Work/Research)

## License
MIT License - Feel free to use and modify

## Contact
- GitHub: https://github.com/EMMA7300
- Email: emmanueleugene20@gmail.com
