# XGBoost Model Documentation for Customer Classification

## 1. Objective
The goal of this project is to build a machine learning model that classifies customers into two categories based on behavioral and transactional features. The model identifies customers who are more likely to engage or convert, which supports more targeted marketing strategies.

## 2. Data Overview
The dataset includes both raw and engineered features, with a balanced class distribution achieved using SMOTE. The key features used in the final model are:

- `vehicle_condition_score`
- `trade_in_history`
- `engagement_to_age_ratio`
- `incentive_received`
- `customer_engagement_score`

These features were selected using the `feature_importances_` attribute of a fitted `RandomForestClassifier`.
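A minimal sketch of importance-based selection is shown here. It assumes synthetic data from `make_classification` with hypothetical column names `f0` to `f9`, and a cutoff of the top five features (chosen only because five features are listed above; the actual selection threshold is not documented).

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic stand-in for the customer dataset (hypothetical columns f0..f9).
X_arr, y = make_classification(
    n_samples=500, n_features=10, n_informative=5, random_state=42
)
X = pd.DataFrame(X_arr, columns=[f"f{i}" for i in range(X_arr.shape[1])])

forest = RandomForestClassifier(n_estimators=200, random_state=42)
forest.fit(X, y)

# feature_importances_ is an attribute populated by fit(); the values sum to 1.
importances = pd.Series(forest.feature_importances_, index=X.columns)

# Assumed cutoff: keep the five highest-ranked features.
top_features = importances.sort_values(ascending=False).head(5).index.tolist()
print(top_features)
```

Impurity-based importances can favor high-cardinality features, so permutation importance may be worth checking as a cross-reference.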

## 3. Data Preprocessing
- Removed unnecessary columns and duplicates.
- Filled missing values with the median or the most frequent value.
- Scaled numerical features using `StandardScaler`.
- Applied SMOTE to balance class distribution.
- Split the dataset into 80% training and 20% testing.
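The steps above can be sketched roughly as follows. This is an illustration under stated assumptions, not the project's actual pipeline: the toy columns and the `preprocess` and `oversample` helpers are hypothetical, only numeric columns with median imputation are handled, and the sketch splits before imputing, scaling, and oversampling so that the test split does not influence the fitted transformers. That order differs from the list above and is a deliberate substitution.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler


def preprocess(df: pd.DataFrame, target: str, seed: int = 42):
    """Deduplicate, split 80/20, then impute and scale numeric features."""
    df = df.drop_duplicates()
    X = df.drop(columns=[target]).select_dtypes(include="number")
    y = df[target]

    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, stratify=y, random_state=seed
    )

    # Fit the imputer and scaler on the training split only.
    imputer = SimpleImputer(strategy="median")
    scaler = StandardScaler()
    X_train = scaler.fit_transform(imputer.fit_transform(X_train))
    X_test = scaler.transform(imputer.transform(X_test))
    return X_train, X_test, y_train.to_numpy(), y_test.to_numpy()


def oversample(X_train, y_train, seed: int = 42):
    """Balance the training split with SMOTE (requires imbalanced-learn)."""
    from imblearn.over_sampling import SMOTE  # lazy import: optional dependency

    return SMOTE(random_state=seed).fit_resample(X_train, y_train)


# Toy data with hypothetical columns, including a few missing ages.
rng = np.random.default_rng(0)
toy = pd.DataFrame({
    "age": rng.integers(18, 80, size=200).astype(float),
    "engagement_score": rng.random(200) * 10,
    "converted": (rng.random(200) < 0.3).astype(int),
})
toy.loc[::25, "age"] = np.nan

X_train, X_test, y_train, y_test = preprocess(toy, target="converted")
print(X_train.shape, X_test.shape)
```

With imbalanced-learn installed, `X_bal, y_bal = oversample(X_train, y_train)` would return a class-balanced training set while leaving the test split untouched.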

## 4. Feature Engineering
Additional features were created to enhance model performance:
- `engagement_to_age_ratio` = `engagement_score / (age + 1)`
- `customer_engagement_score` = `log1p(engagement_score * incentive_received)`
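The two formulas above could be computed with pandas and NumPy as in the sketch below. It assumes the raw columns are named `engagement_score`, `age`, and `incentive_received` as in the formulas, and that `incentive_received` is non-negative; the helper name `add_engineered_features` and the sample rows are hypothetical.

```python
import numpy as np
import pandas as pd


def add_engineered_features(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df with the two engineered columns added."""
    out = df.copy()
    # The +1 keeps the denominator nonzero when age is 0.
    out["engagement_to_age_ratio"] = out["engagement_score"] / (out["age"] + 1)
    # log1p(x) = log(1 + x), which is defined at x = 0.
    out["customer_engagement_score"] = np.log1p(
        out["engagement_score"] * out["incentive_received"]
    )
    return out


# Hypothetical sample rows for illustration only.
sample = pd.DataFrame({
    "age": [29, 49],
    "engagement_score": [6.0, 10.0],
    "incentive_received": [1, 0],
})
features = add_engineered_features(sample)
print(features[["engagement_to_age_ratio", "customer_engagement_score"]])
```

Note that `customer_engagement_score` is 0 for any customer with `incentive_received` equal to 0, regardless of engagement.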

## 5. Model Training
The final model selected was XGBoost's **`XGBClassifier`**, which outperformed the other models evaluated in terms of accuracy and F1-score.

### Model Parameters
- `learning_rate`: 0.1
- `max_depth`: 6
- `n_estimators`: 100
- `random_state`: 42
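As a sketch, the parameters above map directly onto the `XGBClassifier` constructor from the `xgboost` package. The names `XGB_PARAMS` and `build_model` are hypothetical, and any parameter not listed is assumed to stay at the library default.

```python
# Hyperparameters as listed above; everything else is left at the library default.
XGB_PARAMS = {
    "learning_rate": 0.1,
    "max_depth": 6,
    "n_estimators": 100,
    "random_state": 42,
}


def build_model(params=None):
    """Create an unfitted XGBClassifier from a parameter dict."""
    # Imported lazily so the parameter dict can be reused without xgboost installed.
    from xgboost import XGBClassifier

    return XGBClassifier(**(params or XGB_PARAMS))
```

Typical usage would be `model = build_model()` followed by `model.fit(X_train, y_train)` and `model.predict(X_test)`, where `X_train`, `y_train`, and `X_test` are the preprocessed splits.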

## 6. Evaluation Metrics
**Confusion Matrix**:
```
[[27587  1202]
 [ 2056  9155]]
```

**Classification Report**:
```
              precision    recall  f1-score   support

           0       0.93      0.96      0.94     28789
           1       0.88      0.82      0.85     11211

    accuracy                           0.92     40000
   macro avg       0.91      0.89      0.90     40000
weighted avg       0.92      0.92      0.92     40000
```
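The figures in the classification report follow directly from the confusion matrix. The sketch below recomputes them in plain Python, assuming the scikit-learn convention that rows are true classes and columns are predicted classes (the row sums match the reported supports of 28789 and 11211). The helper name `per_class_metrics` is hypothetical.

```python
# Confusion matrix from above; rows = true class, columns = predicted class.
cm = [[27587, 1202],
      [2056, 9155]]


def per_class_metrics(cm, k):
    """Return (precision, recall, f1) for class k."""
    tp = cm[k][k]
    fp = sum(cm[i][k] for i in range(len(cm)) if i != k)  # predicted k, true other
    fn = sum(cm[k][j] for j in range(len(cm)) if j != k)  # true k, predicted other
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


total = sum(sum(row) for row in cm)
accuracy = sum(cm[k][k] for k in range(len(cm))) / total

for k in range(len(cm)):
    p, r, f1 = per_class_metrics(cm, k)
    print(f"class {k}: precision={p:.2f} recall={r:.2f} f1={f1:.2f}")
print(f"accuracy={accuracy:.2f}")
```

For class 1, for example, recall is 9155 / (9155 + 2056), so about 18% of actual class-1 customers are missed by the model.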

## 7. Model Comparison
XGBoost performed best among the models evaluated in terms of:
- Overall accuracy: **92%**
- Balanced precision and recall across both classes (0.93 / 0.96 for class 0, 0.88 / 0.82 for class 1)

## 8. Recommendations
- Use this XGBoost model in production for real-time classification.
- Continuously monitor model performance and retrain periodically with fresh data.
- Explore additional behavioral features to further improve performance.

## 9. Tools and Libraries Used
- Python
- Pandas, NumPy
- Scikit-learn
- XGBoost
- imbalanced-learn (SMOTE)

## 10. Author
This model and documentation were prepared as part of a proof of concept to evaluate machine learning techniques for customer behavior classification.