Telco Customer Churn Classification with XGBoost

An end-to-end machine learning pipeline for customer churn prediction using the WA_Fn-UseC_-Telco-Customer-Churn.csv dataset and XGBoost.

Features

Clean preprocessing pipeline for mixed tabular data
Handling of TotalCharges: conversion to numeric, with unparseable entries treated as missing values
One-hot encoding for categorical features
Train/test split with reproducible random seed
XGBoost classifier training and evaluation
Classification metrics: Accuracy, Precision, Recall, F1-score
Confusion matrix and ROC-AUC with ROC curve visualization
Top feature importance extraction
Optional hyperparameter tuning via RandomizedSearchCV
Ready-to-run Python script and Jupyter notebooks

Project Structure

.
├── WA_Fn-UseC_-Telco-Customer-Churn.csv
├── xgboost_churn_pipeline.py
├── xgboost_churn_pipeline.ipynb
├── xgboost_churn_pipeline.executed.ipynb
├── churn_quickstart.ipynb
├── churn_quickstart.executed.ipynb
└── Readme.md

Requirements

Python 3.12+
Virtual environment (venv)

Core dependencies:

pandas
scikit-learn
xgboost
matplotlib
notebook / jupyter

Installation

python3.12 -m venv .venv
source .venv/bin/activate
python -m pip install --upgrade pip
python -m pip install pandas scikit-learn xgboost matplotlib notebook

Usage

1) Run the Python script

source .venv/bin/activate
python xgboost_churn_pipeline.py

Optional arguments:

# Specify dataset path explicitly
python xgboost_churn_pipeline.py --data-path "WA_Fn-UseC_-Telco-Customer-Churn.csv"

# Enable hyperparameter tuning (slower)
python xgboost_churn_pipeline.py --tune
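
The two flags above could be wired up with argparse roughly as follows. This is a minimal sketch, not the script's actual code: the function name build_parser, the defaults, and the help strings are assumptions for illustration; only the flag names --data-path and --tune come from this README.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser for the two documented flags (hypothetical layout)."""
    parser = argparse.ArgumentParser(description="Telco churn pipeline (sketch)")
    # Assumption: None means "auto-detect a CSV in the current directory".
    parser.add_argument(
        "--data-path",
        default=None,
        help="Path to the Telco churn CSV; auto-detected when omitted.",
    )
    # A boolean switch: absent -> False, present -> True.
    parser.add_argument(
        "--tune",
        action="store_true",
        help="Enable hyperparameter tuning (slower).",
    )
    return parser


# Parse an explicit argument list instead of sys.argv so the sketch
# behaves the same wherever it is run.
args = build_parser().parse_args(["--tune"])
print(args.data_path, args.tune)
```

argparse exposes --data-path as args.data_path, replacing the hyphen with an underscore.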

2) Run the Jupyter Notebook

source .venv/bin/activate
jupyter notebook

Open:

xgboost_churn_pipeline.ipynb for full pipeline
churn_quickstart.ipynb for a quick dataset sanity check

Data Preprocessing Logic

The pipeline includes:

Drop customerID if present
Convert TotalCharges to numeric, coercing unparseable entries to missing values
Fill missing numeric values with the column median
Fill missing categorical values with the column mode
Encode target Churn: Yes -> 1, No -> 0
One-hot encode categorical predictors
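
The steps above can be sketched with pandas as follows. This is an illustration under stated assumptions rather than the script's actual code: the function name preprocess and the tiny example frame are made up, and only the column names customerID, TotalCharges, and Churn come from this README.

```python
import pandas as pd


def preprocess(df: pd.DataFrame) -> tuple[pd.DataFrame, pd.Series]:
    """Apply the listed preprocessing steps (hypothetical helper)."""
    # Drop the identifier column if it is present.
    df = df.drop(columns=["customerID"], errors="ignore").copy()

    # Unparseable entries (for example blank strings) become NaN.
    df["TotalCharges"] = pd.to_numeric(df["TotalCharges"], errors="coerce")

    numeric_cols = df.select_dtypes(include="number").columns
    categorical_cols = df.columns.difference(numeric_cols)

    # Median for numeric columns, most frequent value for the rest.
    for col in numeric_cols:
        df[col] = df[col].fillna(df[col].median())
    for col in categorical_cols:
        df[col] = df[col].fillna(df[col].mode().iloc[0])

    # Encode the target, then one-hot encode the remaining categoricals.
    y = df.pop("Churn").map({"Yes": 1, "No": 0})
    X = pd.get_dummies(df, dtype=int)
    return X, y


# Made-up rows, only to exercise the function.
raw = pd.DataFrame(
    {
        "customerID": ["a", "b", "c"],
        "tenure": [1, 0, 5],
        "Contract": ["Month-to-month", "Two year", None],
        "TotalCharges": ["10.5", " ", "30.5"],
        "Churn": ["Yes", "No", "No"],
    }
)
X, y = preprocess(raw)
print(X.columns.tolist())
```

Note that the sketch computes the median and mode on the full frame. Computing them on the training split only would avoid leaking test-set statistics, but this README does not say which approach the script takes.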

Evaluation Outputs

After training, the pipeline prints:

Accuracy
Precision
Recall
F1-score
Confusion Matrix
ROC-AUC score

It also displays:

ROC Curve
Top important features from XGBoost
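
The printed metrics could be computed with scikit-learn along these lines. This is a sketch only: the function name evaluate, the 0.5 decision threshold, and the labels and probabilities are assumptions for illustration, not output from the real model. Plotting the ROC curve and reading feature importances are left out.

```python
from sklearn.metrics import (
    accuracy_score,
    confusion_matrix,
    f1_score,
    precision_score,
    recall_score,
    roc_auc_score,
)


def evaluate(y_true, y_prob, threshold=0.5):
    """Return the printed metrics as a dict (hypothetical helper)."""
    # Turn churn probabilities into hard 0/1 predictions.
    y_pred = [int(p >= threshold) for p in y_prob]
    return {
        "accuracy": accuracy_score(y_true, y_pred),
        "precision": precision_score(y_true, y_pred),
        "recall": recall_score(y_true, y_pred),
        "f1": f1_score(y_true, y_pred),
        # Rows are true classes, columns are predicted classes.
        "confusion_matrix": confusion_matrix(y_true, y_pred).tolist(),
        # ROC-AUC uses the probabilities, not the thresholded labels.
        "roc_auc": roc_auc_score(y_true, y_prob),
    }


# Made-up labels and predicted churn probabilities.
y_true = [0, 0, 1, 1, 0, 1]
y_prob = [0.1, 0.6, 0.8, 0.4, 0.2, 0.9]
metrics = evaluate(y_true, y_prob)
for name, value in metrics.items():
    print(name, value)
```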

Reproducibility

Train/test split uses random_state=42
The XGBoost model is also seeded with random_state=42 by default
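
A quick way to see what the fixed seed buys is to split the same data twice and compare. This is a minimal sketch on made-up arrays; the test_size of 0.2 and the helper name split are assumptions, since this README only states the seed value.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Made-up feature matrix and binary labels.
X = np.arange(40).reshape(20, 2)
y = np.array([0, 1] * 10)


def split(seed: int):
    """Split X and y with a given seed (hypothetical helper)."""
    # Assumption: a 20% hold-out; the README does not state the ratio.
    return train_test_split(X, y, test_size=0.2, random_state=seed)


first = split(42)
second = split(42)

# The same seed yields identical train/test partitions on every call.
same = all(np.array_equal(a, b) for a, b in zip(first, second))
print("identical splits:", same)
```

The same random_state value can be passed to the classifier so that any row or column subsampling it performs is repeatable as well.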

Notes

The script auto-detects a Telco churn CSV in the current directory if --data-path is not provided.
Hyperparameter tuning can significantly increase runtime.
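
The auto-detection mentioned above might look something like the sketch below. The matching rule here (a .csv file whose name contains both "telco" and "churn", case-insensitive) and the function name find_telco_csv are assumptions; this README does not describe the rule the script actually uses.

```python
from pathlib import Path


def find_telco_csv(directory="."):
    """Return the first matching CSV in directory, or None (hypothetical helper)."""
    candidates = sorted(
        path
        for path in Path(directory).glob("*.csv")
        # Assumed rule: both keywords must appear in the file name.
        if "telco" in path.name.lower() and "churn" in path.name.lower()
    )
    return candidates[0] if candidates else None


found = find_telco_csv(".")
print(found if found is not None else "no Telco churn CSV found")
```

Sorting the candidates keeps the choice deterministic when more than one file matches.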

Contributing

Contributions are welcome.

Fork the repository
Create a feature branch
Commit your changes with clear messages
Open a pull request describing motivation and impact

License

This project is available under the MIT License.
If you reuse this code, please keep attribution in your repository.
