# Model Comparison: Random Forest vs. XGBoost

## **Overview**
This project evaluates the performance of **Random Forest** and **XGBoost** on a diabetes prediction dataset. The focus is on predictive accuracy and feature importance, without hyperparameter tuning.

---

## **Dataset**
- **Source**: National Institute of Diabetes and Digestive and Kidney Diseases.
- **Instances**: 768.
- **Features**: 8 independent variables (e.g., Glucose, BMI, Age) and 1 target (`Outcome`: 1 for diabetic, 0 for non-diabetic).
- **Quality**: No missing (null) values in any column.
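
The quality claim can be checked with a short helper. This is a minimal sketch, not the project's own code: `quality_report` is a hypothetical name, and the zero check is an added assumption, included because some public copies of NIDDK diabetes data record unmeasured values as `0` rather than as nulls.

```python
import pandas as pd


def quality_report(df, zero_check_columns):
    """Count nulls per column and zeros in columns where 0 is implausible."""
    missing = df.isna().sum()
    # A zero in a measurement such as Glucose or BMI may be a placeholder
    # for "not recorded" (assumption; verify against the data source).
    zeros = (df[zero_check_columns] == 0).sum()
    return missing, zeros
```

For example, `quality_report(df, ["Glucose", "BMI"])` returns two Series; the columns passed to the zero check should be adjusted to match the actual file.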

---

## **Workflow**
1. **Data Preprocessing**: Split into training (80%) and testing (20%) sets.
2. **Model Training**:
   - Random Forest and XGBoost were trained with default parameters.
3. **Evaluation**:
   - Metrics: Accuracy, Precision, Recall, F1 Score, and ROC-AUC.
   - Feature importance analyzed using built-in methods (Random Forest) and SHAP (XGBoost).
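
The workflow above can be sketched as a single function. This is an illustration under stated assumptions rather than the project's actual script: `compare_models` is a hypothetical name, and `random_state=42` and the stratified split are choices made here, since only the 80/20 ratio is documented.

```python
from sklearn.metrics import (
    accuracy_score,
    f1_score,
    precision_score,
    recall_score,
    roc_auc_score,
)
from sklearn.model_selection import train_test_split


def compare_models(models, X, y, test_size=0.2, random_state=42):
    """Fit each model on the same split and return its test-set metrics."""
    # Assumption: a fixed seed and stratification, so that both models are
    # scored on an identical test set with the original class balance.
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=test_size, random_state=random_state, stratify=y
    )
    results = {}
    for name, model in models.items():
        model.fit(X_train, y_train)
        y_pred = model.predict(X_test)
        # ROC-AUC is computed from positive-class probabilities, not labels.
        y_score = model.predict_proba(X_test)[:, 1]
        results[name] = {
            "Accuracy": accuracy_score(y_test, y_pred),
            "Precision": precision_score(y_test, y_pred),
            "Recall": recall_score(y_test, y_pred),
            "F1 Score": f1_score(y_test, y_pred),
            "ROC-AUC": roc_auc_score(y_test, y_score),
        }
    return results
```

Any estimator with `fit`, `predict`, and `predict_proba` can be passed in, for example `{"Random Forest": RandomForestClassifier(), "XGBoost": XGBClassifier()}`, with `X` holding the eight feature columns and `y` the `Outcome` column.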

---

## **Results**
| **Metric**       | **Random Forest** | **XGBoost**   |
|-------------------|-------------------|---------------|
| **Accuracy**      | 0.7208            | 0.7078        |
| **Precision**     | 0.6071            | 0.5806        |
| **Recall**        | 0.6182            | 0.6545        |
| **F1 Score**      | 0.6126            | 0.6154        |
| **ROC-AUC**       | 0.8120            | 0.7666        |

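- **Random Forest**: Higher accuracy (0.7208 vs. 0.7078), precision (0.6071 vs. 0.5806), and ROC-AUC (0.8120 vs. 0.7666).
- **XGBoost**: Higher recall (0.6545 vs. 0.6182), so it identifies more of the actual diabetic cases. The F1 scores are nearly identical (0.6154 vs. 0.6126).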
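
As a quick sanity check on the table, F1 can be recomputed from the reported precision and recall, since it is their harmonic mean. The helper name `f1_from` is hypothetical; the inputs are the rounded values from the table.

```python
def f1_from(precision, recall):
    """F1 is the harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Precision and recall copied from the results table.
reported = {
    "Random Forest": (0.6071, 0.6182),
    "XGBoost": (0.5806, 0.6545),
}

for name, (precision, recall) in reported.items():
    print(f"{name}: F1 = {f1_from(precision, recall):.4f}")
```

Because the inputs are already rounded to four decimals, the recomputed values can differ from the table in the last digit.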

---

## **Explainability**
- **Random Forest**: The built-in feature importances rank Glucose, BMI, and Age as the top features.
- **XGBoost**: SHAP analysis also identifies Glucose as the most influential predictor, followed by BMI and Age.
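
Both rankings reduce to sorting one score per feature. The sketch below shows one way to do that; the function names are hypothetical, and ranking SHAP output by mean absolute value is a common convention assumed here, not something the project states.

```python
import numpy as np


def rank_features(feature_names, scores):
    """Return (name, score) pairs sorted from most to least important."""
    order = np.argsort(scores)[::-1]
    return [(feature_names[i], float(scores[i])) for i in order]


def rank_from_forest(model, feature_names):
    # Fitted scikit-learn forests expose impurity-based importances.
    return rank_features(feature_names, model.feature_importances_)


def rank_from_shap(shap_values, feature_names):
    # shap_values: array of shape (n_samples, n_features), for example from
    # shap.TreeExplainer(xgb_model).shap_values(X_test).
    # Assumption: global importance = mean absolute SHAP value per feature.
    mean_abs = np.abs(np.asarray(shap_values)).mean(axis=0)
    return rank_features(feature_names, mean_abs)
```

The two scores are on different scales (impurity reduction vs. contribution to the model output), so only the orderings are comparable, not the magnitudes.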

---

## **Future Enhancements**
1. Optimize models using hyperparameter tuning (e.g., GridSearchCV).
2. Experiment with additional algorithms (e.g., Neural Networks, SVMs).
3. Develop a dashboard for interactive data visualization and prediction analysis.
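
For the first enhancement, a grid search over Random Forest settings might look like the sketch below. The grid values, the `roc_auc` scoring choice, and the name `tune_random_forest` are all illustrative assumptions; none of them come from the current project.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

# Hypothetical search space; adjust to the time budget available.
PARAM_GRID = {
    "n_estimators": [100, 300],
    "max_depth": [None, 5, 10],
    "min_samples_leaf": [1, 5],
}


def tune_random_forest(X_train, y_train, param_grid=PARAM_GRID, cv=5,
                       scoring="roc_auc"):
    """Cross-validated grid search; returns the refit best model and its score."""
    search = GridSearchCV(
        RandomForestClassifier(random_state=42),
        param_grid=param_grid,
        scoring=scoring,
        cv=cv,
    )
    # Search on the training split only, so the test set stays untouched.
    search.fit(X_train, y_train)
    return search.best_estimator_, search.best_params_, search.best_score_
```

The same pattern applies to XGBoost by swapping in `XGBClassifier` and a grid over its own parameters.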

---

## **How to Use**
1. Install dependencies:
   ```bash
   pip install scikit-learn xgboost shap pandas numpy matplotlib seaborn
   ```
