This project builds an end-to-end machine learning solution that predicts customer churn for a telecom company. The objective is to identify customers who are likely to discontinue their services, so that the business can act with retention measures before they leave.
The system uses FastAPI for the backend API, Streamlit for an interactive dashboard, and Docker for containerization, so both services can be built and started together with Docker Compose.
Customer churn is a major challenge in the telecom industry. Retaining customers is more cost-effective than acquiring new ones. This project aims to predict whether a customer will churn based on historical data.
Exploratory Data Analysis was performed to understand customer behavior and churn patterns.
Key analyses performed:
- Churn distribution analysis (class imbalance)
- Relationship between churn and contract type
- Impact of tenure on churn
- Monthly charges vs churn behavior
- Service-based churn analysis (OnlineSecurity, TechSupport, etc.)
- Data cleaning and missing value handling
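A few of the analyses listed above could be sketched with pandas as follows. This is a minimal illustration on a tiny hand-made sample, not the notebook's actual code; the column names (`Contract`, `tenure`, `MonthlyCharges`, `Churn`) are assumed to match the public Telco churn CSV in `datasets/`.

```python
import pandas as pd

# Tiny hand-made sample (values invented for illustration). The real data
# would be loaded from datasets/WA_Fn-UseC_-Telco-Customer-Churn.csv.
df = pd.DataFrame({
    "Contract": ["Month-to-month", "Month-to-month", "One year", "Two year",
                 "Month-to-month", "Two year"],
    "tenure": [1, 5, 24, 60, 2, 48],
    "MonthlyCharges": [70.0, 85.5, 55.0, 20.0, 95.0, 25.0],
    "Churn": ["Yes", "Yes", "No", "No", "No", "No"],
})

# Class balance: share of churned vs. retained customers.
churn_share = df["Churn"].value_counts(normalize=True)

# Churn rate per contract type (mean of a boolean Series).
churned = df["Churn"].eq("Yes")
churn_by_contract = churned.groupby(df["Contract"]).mean()

# Average tenure and monthly charges for each churn class.
profile = df.groupby("Churn")[["tenure", "MonthlyCharges"]].mean()

print(churn_share, churn_by_contract, profile, sep="\n\n")
```

On the full dataset, the same three tables give a quick view of class imbalance and of how contract type, tenure, and charges relate to churn.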
To improve model performance, several transformations were applied:
- Removed the irrelevant column (customerID)
- Converted TotalCharges to numeric and handled missing values
- Created a new feature: MonthlyCharges_category
- Converted categorical variables into numerical format
- Applied encoding for multi-category features
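The transformations above might look roughly like this in pandas. It is a sketch under stated assumptions: the bin edges and labels for `MonthlyCharges_category`, the median fill for missing `TotalCharges`, and the helper name `engineer_features` are all illustrative choices, not taken from the project's `src/featureengineering.py`.

```python
import pandas as pd

# Small invented sample; a blank string stands in for a missing TotalCharges.
raw = pd.DataFrame({
    "customerID": ["A1", "B2", "C3", "D4"],
    "Contract": ["Month-to-month", "One year", "Two year", "Month-to-month"],
    "MonthlyCharges": [29.85, 56.95, 104.80, 70.70],
    "TotalCharges": ["29.85", "1889.5", " ", "151.65"],
    "Churn": ["No", "No", "No", "Yes"],
})


def engineer_features(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper mirroring the listed transformation steps."""
    out = df.drop(columns=["customerID"])  # identifier carries no signal
    # Non-numeric strings become NaN, then are filled (median is assumed).
    out["TotalCharges"] = pd.to_numeric(out["TotalCharges"], errors="coerce")
    out["TotalCharges"] = out["TotalCharges"].fillna(out["TotalCharges"].median())
    # Bin edges and labels are illustrative only.
    out["MonthlyCharges_category"] = pd.cut(
        out["MonthlyCharges"],
        bins=[0, 35, 70, float("inf")],
        labels=["low", "medium", "high"],
    )
    # Binary target as 0/1.
    out["Churn"] = out["Churn"].map({"No": 0, "Yes": 1})
    # One-hot encode the multi-category columns.
    return pd.get_dummies(out, columns=["Contract", "MonthlyCharges_category"])


features = engineer_features(raw)
print(features.head())
```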
Multiple models were trained and evaluated:
- Logistic Regression
- Decision Tree
- Random Forest
- Gradient Boosting
- K-Nearest Neighbors (KNN)
- Naive Bayes
- Stacking Classifier
The models were evaluated using classification metrics:
- Accuracy
- Precision
- Recall
- ROC-AUC Score
Note: Recall was prioritized so that as few churning customers as possible go undetected, even if that means flagging some customers who would have stayed.
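To make the metrics concrete, the sketch below computes them with scikit-learn on a handful of made-up labels and probabilities (not project results). It also shows the trade-off behind prioritizing recall: lowering the decision threshold catches more churners but lowers precision. The thresholds 0.5 and 0.25 are arbitrary illustrative values.

```python
from sklearn.metrics import (accuracy_score, precision_score,
                             recall_score, roc_auc_score)

# Toy data: 1 = churn, 0 = stays. Values are invented for illustration.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_prob = [0.9, 0.8, 0.6, 0.3, 0.7, 0.4, 0.2, 0.2, 0.1, 0.1]

# Default decision threshold of 0.5.
y_pred = [int(p >= 0.5) for p in y_prob]
accuracy = accuracy_score(y_true, y_pred)
precision = precision_score(y_true, y_pred)
recall = recall_score(y_true, y_pred)
auc = roc_auc_score(y_true, y_prob)  # threshold-independent

# A lower threshold trades precision for recall.
y_pred_low = [int(p >= 0.25) for p in y_prob]
recall_low = recall_score(y_true, y_pred_low)
precision_low = precision_score(y_true, y_pred_low)

print(f"threshold 0.50: precision={precision:.2f} recall={recall:.2f}")
print(f"threshold 0.25: precision={precision_low:.2f} recall={recall_low:.2f}")
print(f"accuracy={accuracy:.2f} roc_auc={auc:.3f}")
```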
Random Forest (GridSearchCV tuned) was selected as the final model because:
- It achieved higher recall, which is crucial in churn prediction
- It effectively captured customer behavior patterns
- It provided stable and reliable performance
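A recall-oriented grid search over a Random Forest could be set up along these lines. The parameter grid, the synthetic data from `make_classification`, and the 3-fold setting are assumptions for illustration; the project's actual search space is not documented here.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import recall_score
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic, imbalanced stand-in for the churn data (about 25% positives).
X, y = make_classification(n_samples=400, n_features=8,
                           weights=[0.75, 0.25], random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=42)

# Hypothetical grid, kept small so the example runs quickly.
param_grid = {
    "n_estimators": [50, 100],
    "max_depth": [4, 8],
    "class_weight": [None, "balanced"],
}

search = GridSearchCV(
    RandomForestClassifier(random_state=42),
    param_grid,
    scoring="recall",  # pick the candidate with the best recall on churners
    cv=3,
)
search.fit(X_train, y_train)

# GridSearchCV refits the best candidate on the full training split.
test_recall = recall_score(y_test, search.predict(X_test))
print(search.best_params_, round(test_recall, 3))
```

Setting `scoring="recall"` is what ties the search to the stated priority; with the default scorer the search would optimize accuracy instead.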
Model explainability:
- Feature importance was extracted using Random Forest
- LIME was used for local interpretability
- SHAP was used to understand feature contribution
Top influencing features:
- Contract type (Month-to-month)
- Tenure
- Online Security
- Tech Support
- Monthly Charges
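A feature ranking like the one above can be read from a fitted Random Forest through its `feature_importances_` attribute. The sketch below uses synthetic data in which a single hypothetical flag, `is_month_to_month`, fully determines the label, so it should rank first; it does not reproduce the project's results, and SHAP and LIME are left out because they need extra packages.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
n = 500

# Synthetic stand-in: the other two columns are pure noise with respect
# to the label, which is copied from `is_month_to_month` below.
X = pd.DataFrame({
    "is_month_to_month": rng.integers(0, 2, n),
    "tenure": rng.integers(0, 72, n),
    "MonthlyCharges": rng.uniform(20, 120, n),
})
y = X["is_month_to_month"]

model = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)

# Impurity-based importances, normalized by scikit-learn to sum to 1.
importances = pd.Series(model.feature_importances_, index=X.columns)
ranking = importances.sort_values(ascending=False)
print(ranking)
```

Impurity-based importances describe the model globally; per-prediction explanations are what LIME and SHAP add on top.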
A complete machine learning pipeline was built to:
- Automate preprocessing
- Prevent data leakage
- Ensure consistent predictions
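One way to get these three properties is a scikit-learn `Pipeline` that bundles preprocessing with the model. The following is a minimal sketch on invented data; the column split, the imputation strategy, and the step names are assumptions and may differ from `src/pipeline.py`.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestClassifier
from sklearn.impute import SimpleImputer
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder

# Invented, perfectly separable sample repeated to get enough rows.
df = pd.DataFrame({
    "Contract": ["Month-to-month", "One year", "Two year", "Month-to-month"] * 10,
    "tenure": [1, 24, 60, 3] * 10,
    "MonthlyCharges": [80.0, 55.0, 20.0, 95.0] * 10,
    "Churn": [1, 0, 0, 1] * 10,
})
X, y = df.drop(columns=["Churn"]), df["Churn"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0)

# Numeric columns are imputed, categorical columns are one-hot encoded.
preprocess = ColumnTransformer([
    ("num", SimpleImputer(strategy="median"), ["tenure", "MonthlyCharges"]),
    ("cat", OneHotEncoder(handle_unknown="ignore"), ["Contract"]),
])
pipeline = Pipeline([
    ("preprocess", preprocess),
    ("model", RandomForestClassifier(n_estimators=50, random_state=0)),
])

# Fitting on the training split only keeps test-set statistics out of the
# imputer and encoder, which is how the pipeline guards against leakage.
pipeline.fit(X_train, y_train)
predictions = pipeline.predict(X_test)
print(predictions)
```

Because the fitted object carries its own preprocessing, the same transformations are applied at prediction time as at training time. A fitted pipeline like this could be persisted with `joblib.dump`, which is presumably how a file such as `models/churn_pipeline.pkl` is produced.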
User → Streamlit Dashboard → FastAPI API → ML Model → Prediction
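The hop from the dashboard to the API in this flow is an HTTP call carrying one customer's features as JSON. The sketch below uses only the standard library; the `/predict` route, the field names, and the helper names are hypothetical, since the README does not list the endpoints in `api/main.py`. Only the host and port come from the access URLs given in this document.

```python
import json
import urllib.request

# Host and port match the FastAPI URL in this README; the "/predict"
# route is an assumption, not a documented endpoint.
API_URL = "http://localhost:8000/predict"


def build_predict_request(customer: dict) -> urllib.request.Request:
    """Wrap one customer's features in a JSON POST request."""
    body = json.dumps(customer).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=body,
        method="POST",
        headers={"Content-Type": "application/json"},
    )


def predict(customer: dict, timeout: float = 5.0) -> dict:
    """Send the request and decode the JSON reply (needs the API running)."""
    request = build_predict_request(customer)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


# Field names are assumed to follow the Telco dataset columns.
sample = {"tenure": 2, "Contract": "Month-to-month", "MonthlyCharges": 85.5}
request = build_predict_request(sample)
print(request.get_method(), request.full_url)
```

Calling `predict(sample)` only works while the containers are up; building the request separately makes the payload easy to inspect without a running server.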
Programming:
- Python
Data Processing & Analysis:
- Pandas
- NumPy
Machine Learning:
- Scikit-learn
- SMOTE (oversampling for class imbalance)
Model Explainability:
- SHAP
- LIME
API Development:
- FastAPI
- Pydantic
Frontend Dashboard:
- Streamlit
Data Visualization:
- Matplotlib
- Seaborn
Containerization & Deployment:
- Docker
Development Tools:
- Jupyter Notebook
- Git
- GitHub
PROJECT-2-TELECOM_CUSTOMER_CHURN_ANALYSIS
|
+---api
| |___ main.py
+---apps
| |___ streamlit.py
|
+---datasets
| |___ WA_Fn-UseC_-Telco-Customer-Churn.csv
|
+---DockerFile
| |___ .dockerignore
| |___ docker-compose.yml
| |___ Dockerfile.fastapi
| |___ Dockerfile.streamlit
| |___ requirements_fastapi.txt
| |___ requirements_streamlit.txt
|
+---DockerFile_run_directly_by_image
| |___ docker-compose.yml
| |___ start.bat
| |___ stop.bat
|
+---models
| |___ churn_pipeline.pkl
|
+---notebooks
| |___ create_joblib_file.ipynb
| |___ customer_churn_analysis.ipynb
| |___ ml_pipeline_final_model.ipynb
|
+---src
| |___ featureengineering.py
| |___ pipeline.py
| |___ preprocessor.py
| |___ pydantic_model.py
| |___ shap.py
|
+--- .gitignore
+--- README.md
+--- requirements.txt
+--- requirements_exactly.txt
+--- screenshot.png
Run with Docker (build from source):
- Install Docker Desktop
- Clone the repository:
  git clone https://github.com/chetansgode/Telecom_Customer_Churn_Analysis.git
- Navigate to the project folder that contains the Docker files:
  +---DockerFile
  | |___ .dockerignore
  | |___ docker-compose.yml
  | |___ Dockerfile.fastapi
  | |___ Dockerfile.streamlit
  | |___ requirements_fastapi.txt
  | |___ requirements_streamlit.txt
- Run:
  docker compose up --build
- Access:
  - Streamlit → http://localhost:8501
  - FastAPI Docs → http://localhost:8000/docs
- Stop:
  docker compose down
Run directly from the Docker image:
- Install Docker Desktop and start it
- Download only this folder: docker_run_directly_by_image
- Open the folder and double-click start.bat
- Access:
  - Streamlit → http://localhost:8501
  - FastAPI → http://localhost:8000/docs
- To stop, double-click stop.bat
Results:
- The model achieved strong performance with a focus on high recall
- Successfully identifies most customers likely to churn
- Helps businesses take proactive retention actions
Limitations:
- Model performance depends on dataset quality
- Class imbalance may still affect predictions
- Model requires retraining for new data
- Not deployed on cloud infrastructure
Future improvements:
- Implement advanced models like XGBoost
- Deploy on cloud platforms (AWS/Azure)
- Add real-time prediction pipeline
- Implement model monitoring and retraining
Random Forest was selected as the final model due to its higher recall, making it more effective in identifying customers likely to churn. This system can help telecom companies reduce customer loss and improve retention strategies.
Name: Chetan S. Gode
