A machine learning classification project that predicts which telecom customers are likely to churn. It compares Logistic Regression, Random Forest and Gradient Boosting on 7,043 real customer records.
- 🎯 Project Overview
- 📊 Key Questions Answered
- 📈 Visualizations
- 🛠️ Technologies Used
- 📁 Project Structure
- 🚀 How to Run
- 💡 Key Findings
- 👨‍💻 Author
This project applies binary classification to predict customer churn for a telecom company. The dataset contains 7,043 customers with features covering demographics, account information, subscribed services and billing details.
The analysis covers:
- Exploratory data analysis of churn patterns
- Data cleaning, feature engineering and scaling
- Model training: Logistic Regression, Random Forest and Gradient Boosting
- Model comparison using accuracy and 5-fold cross-validation
- Feature importance analysis to identify key churn drivers
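The model comparison step could look roughly like the sketch below. It is not taken from the notebook: the helper name `compare_models`, the hyperparameters and the synthetic stand-in data from `make_classification` are all assumptions chosen so the snippet runs without the CSV.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def compare_models(X, y, cv=5, seed=42):
    """Return the mean cross-validated accuracy for each candidate model.

    Hypothetical helper: the model settings here are illustrative defaults,
    not the ones used in analysis.ipynb.
    """
    models = {
        # Scaling lives inside the pipeline so it is re-fit on each training fold.
        "Logistic Regression": make_pipeline(
            StandardScaler(), LogisticRegression(max_iter=1000)
        ),
        "Random Forest": RandomForestClassifier(n_estimators=50, random_state=seed),
        "Gradient Boosting": GradientBoostingClassifier(random_state=seed),
    }
    return {
        name: cross_val_score(model, X, y, cv=cv, scoring="accuracy").mean()
        for name, model in models.items()
    }


# Synthetic stand-in data with an imbalanced target (an assumption for the demo).
X_demo, y_demo = make_classification(
    n_samples=300, n_features=8, weights=[0.74, 0.26], random_state=0
)
scores = compare_models(X_demo, y_demo)
for name, score in scores.items():
    print(f"{name}: {score:.3f}")
```

Putting the scaler inside the pipeline keeps the test folds out of the scaling statistics, which matters when the scores are used to choose between models.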
- What is the overall churn rate?
- How does contract type affect churn?
- Do monthly charges and tenure influence churn behaviour?
- Which model best predicts customer churn?
- What are the most important features in predicting churn?
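The first two questions come down to a grouped mean over the churn flag. Here is a minimal sketch, assuming the `Churn` column holds `Yes`/`No` strings and the contract type is in a `Contract` column, as in the public Telco CSV. The helper `churn_rate_by` and the toy rows are made up for illustration.

```python
import pandas as pd


def churn_rate_by(df, column):
    """Share of churned customers per category of `column`, highest first."""
    churned = df["Churn"].eq("Yes")  # boolean flag, so the mean is the churn rate
    return churned.groupby(df[column]).mean().sort_values(ascending=False)


# Toy rows invented for the example; they are not taken from the dataset.
toy = pd.DataFrame(
    {
        "Contract": [
            "Month-to-month", "Month-to-month", "Month-to-month", "Month-to-month",
            "One year", "One year", "Two year", "Two year",
        ],
        "Churn": ["Yes", "Yes", "No", "Yes", "No", "Yes", "No", "No"],
    }
)

overall = toy["Churn"].eq("Yes").mean()
by_contract = churn_rate_by(toy, "Contract")
print(f"Overall churn rate: {overall:.1%}")
print(by_contract)
```

The same helper can be pointed at any categorical column, for example the internet service type, once the real data is loaded.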
- Language: Python 3.12
- Data Manipulation: Pandas, NumPy
- Machine Learning: Scikit-learn
- Visualization: Matplotlib, Seaborn
- Environment: Jupyter Notebook
ChurnGuard/
├── analysis.ipynb                              ← Main analysis notebook
├── requirements.txt
├── LICENSE
├── README.md
├── data/
│   └── WA_Fn-UseC_-Telco-Customer-Churn.csv    ← Raw dataset (not tracked by git)
└── outputs/
    ├── churn_overview.png
    ├── tenure_charges.png
    ├── model_comparison.png
    ├── confusion_matrix.png
    └── feature_importance.png
1. Install dependencies:
   pip install -r requirements.txt
2. Download the dataset:
   Get the CSV from Kaggle and place it inside the data/ folder.
3. Run the notebook:
   jupyter notebook analysis.ipynb
   Run all cells top to bottom. Charts will be saved automatically to outputs/.
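Before running the notebook, it can help to check that the CSV is where the project structure expects it. The sketch below is not part of the repository. `load_telco` is a hypothetical helper, and the `TotalCharges` handling assumes that column may contain blank strings, which would otherwise leave it as text.

```python
from pathlib import Path

import pandas as pd

# Path taken from the project structure in this README.
DATA_PATH = Path("data") / "WA_Fn-UseC_-Telco-Customer-Churn.csv"


def load_telco(path=DATA_PATH):
    """Load the raw CSV and coerce TotalCharges to a numeric column."""
    path = Path(path)
    if not path.exists():
        raise FileNotFoundError(
            f"Dataset not found at {path}; download it and place it in data/"
        )
    df = pd.read_csv(path)
    # Assumption: blank TotalCharges entries become NaN instead of raising.
    if "TotalCharges" in df.columns:
        df["TotalCharges"] = pd.to_numeric(df["TotalCharges"], errors="coerce")
    return df


if DATA_PATH.exists():
    print(load_telco().shape)
else:
    print(f"{DATA_PATH} is missing; see step 2 above.")
```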
- Overall churn rate is approximately 26% across all customers
- Month-to-month contracts have a significantly higher churn rate than one or two-year contracts
- Fibre optic internet service customers churn at a higher rate than DSL customers
- Customers with shorter tenure and higher monthly charges are more likely to churn
- Tenure, monthly charges and contract type are the top predictors of churn
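A ranking like the one in the last finding can be produced in several ways. This README does not say which method the notebook uses, so the sketch below assumes impurity-based importances from a Random Forest. The helper `rank_features` and the synthetic `tenure` and `noise` columns are illustrative only.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier


def rank_features(X, y, seed=42):
    """Fit a Random Forest and return feature importances, highest first."""
    model = RandomForestClassifier(n_estimators=100, random_state=seed)
    model.fit(X, y)
    return pd.Series(model.feature_importances_, index=X.columns).sort_values(
        ascending=False
    )


# Synthetic data: churn depends only on short tenure, "noise" is irrelevant.
rng = np.random.default_rng(0)
n = 400
toy = pd.DataFrame(
    {
        "tenure": rng.integers(0, 72, size=n),
        "noise": rng.normal(size=n),
    }
)
churned = (toy["tenure"] < 12).astype(int)

ranking = rank_features(toy, churned)
print(ranking)
```

Impurity-based importances tend to favour high-cardinality numeric features, so permutation importance on a held-out split is a reasonable cross-check before drawing business conclusions.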
Berke Arda Turk
Data Science & AI Enthusiast | Computer Science (B.ASc)
🌐 Portfolio · 💼 LinkedIn · 🐙 GitHub




