MasterSoft - Customer Churn Prediction

End-to-End Machine Learning Project

Overview

This project aims to predict customer churn using machine learning models. The dataset contains information about customer demographics, service plans, billing, and churn status, which is analyzed and used to train various classification models. The best-performing model is deployed as an API for real-time predictions.

Dataset Description

Rows: 360
Columns: 21
Target Column: Churn (Yes/No)

This dataset contains customer data, including demographic information, service details, and billing information. The data is used to understand the reasons for customer churn and create machine learning models for churn prediction.

Column Information

The dataset consists of the following columns:

customerID (object): Unique identifier for the customer.
gender, Partner, Dependents, PhoneService, etc. (object): Categorical columns representing various customer features.
SeniorCitizen, tenure (int64): Numeric columns representing customer attributes.
MonthlyCharges, TotalCharges (float64): Numeric columns related to billing information.
Churn (object): Target column with one missing value that was removed.

Data Preprocessing

Handling Missing Values:
- Removed one missing value in the Churn column (at index 359).
Handling Non-Numeric Data in TotalCharges:
- Converted TotalCharges to float64 and checked for non-numeric entries.
- No non-numeric values were found, so no imputation was needed. Had any been present, the resulting missing values would have been imputed with the mean, median, or mode.
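The two preprocessing steps above can be sketched in pandas as follows. This is a minimal illustration, not the notebook's code. It assumes the CSV has already been loaded into a DataFrame with Churn and TotalCharges columns. The function name clean_churn_data and the choice of median imputation are assumptions made for the example.

```python
import pandas as pd

def clean_churn_data(df: pd.DataFrame) -> pd.DataFrame:
    """Drop rows with a missing target and coerce TotalCharges to float (illustrative helper)."""
    # Remove rows whose target label is missing (one row in this dataset).
    df = df.dropna(subset=["Churn"]).copy()
    # Convert TotalCharges to numeric; non-numeric strings such as " " become NaN.
    df["TotalCharges"] = pd.to_numeric(df["TotalCharges"], errors="coerce")
    n_bad = int(df["TotalCharges"].isna().sum())
    if n_bad:
        # Assumed strategy: median imputation (mean or mode would work the same way).
        df["TotalCharges"] = df["TotalCharges"].fillna(df["TotalCharges"].median())
    return df

# Tiny made-up frame, not taken from the dataset.
sample = pd.DataFrame({
    "customerID": ["a", "b", "c", "d"],
    "TotalCharges": ["29.85", " ", "108.15", "1840.75"],
    "Churn": ["No", "Yes", None, "No"],
})
cleaned = clean_churn_data(sample)
print(cleaned)
```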

Exploratory Data Analysis

Tenure vs. Churn

Customers with tenures below 20 months are highly prone to churn. This indicates the importance of early retention strategies.
After 60 months, the churn rate drops significantly, suggesting that long-tenured customers are more loyal.
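A tenure-versus-churn comparison like this one can be made by bucketing tenure and taking the churn rate per bucket. The sketch below assumes Churn is stored as "Yes"/"No". The bucket edges are illustrative choices, not the ones used in the notebook.

```python
import pandas as pd

def churn_rate_by_tenure(df, edges=(0, 20, 40, 60, float("inf"))):
    # Assign each customer to a tenure bucket: [0, 20], (20, 40], (40, 60], (60, inf).
    buckets = pd.cut(df["tenure"], bins=list(edges), include_lowest=True)
    # The mean of a boolean "is churner" column is the churn rate within each bucket.
    return df["Churn"].eq("Yes").groupby(buckets, observed=False).mean()

# Made-up sample rows for illustration only.
sample = pd.DataFrame({
    "tenure": [1, 5, 12, 30, 45, 65, 70],
    "Churn": ["Yes", "Yes", "No", "No", "Yes", "No", "No"],
})
rates = churn_rate_by_tenure(sample)
print(rates)
```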

Gender vs. Churn, TotalCharges vs. Churn, and More

Univariate and bivariate analyses were conducted to explore relationships between customer attributes and churn behavior.

Feature Engineering and Scaling

Feature Encoding

Categorical Features: Converted categorical variables (e.g., gender, Partner) into numerical format using One-Hot Encoding or Label Encoding.
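The README does not say which columns received which encoding. The sketch below shows one common split as an assumption: two-level columns (such as gender and Partner) are mapped to 0/1, and multi-level columns (such as Contract) are one-hot encoded with pandas. The helper name encode_categoricals is hypothetical.

```python
import pandas as pd

def encode_categoricals(df, binary_cols, multi_cols):
    out = df.copy()
    for col in binary_cols:
        levels = sorted(out[col].dropna().unique())
        if len(levels) != 2:
            raise ValueError(f"{col} is not a two-level column: {levels}")
        # Label-encode two-level columns to 0/1 in alphabetical order, as sklearn's LabelEncoder would.
        out[col] = out[col].map({levels[0]: 0, levels[1]: 1})
    # One-hot encode multi-level columns; drop_first removes one redundant dummy per column.
    return pd.get_dummies(out, columns=list(multi_cols), drop_first=True, dtype=int)

sample = pd.DataFrame({
    "gender": ["Female", "Male", "Male"],
    "Partner": ["Yes", "No", "Yes"],
    "Contract": ["Month-to-month", "One year", "Two year"],
})
encoded = encode_categoricals(sample, binary_cols=["gender", "Partner"], multi_cols=["Contract"])
print(encoded)
```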

Feature Scaling

Scaled the numerical features (tenure, MonthlyCharges, TotalCharges) with StandardScaler or MinMaxScaler so that they share a comparable range.
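A minimal sketch of this step follows. It fits one scikit-learn StandardScaler per numeric column on the training rows only, then reuses the fitted scalers on new data. Keeping one scaler per column, so that each can be saved and reloaded separately, is an assumption. The helper names fit_scalers and apply_scalers are hypothetical. MinMaxScaler can be swapped in without other changes.

```python
import pandas as pd
from sklearn.preprocessing import StandardScaler

NUMERIC_COLS = ["tenure", "MonthlyCharges", "TotalCharges"]

def fit_scalers(train_df):
    # Fit on the training split only, so test rows cannot leak into the mean/std statistics.
    return {col: StandardScaler().fit(train_df[[col]]) for col in NUMERIC_COLS}

def apply_scalers(df, scalers):
    out = df.copy()
    for col, scaler in scalers.items():
        # transform() returns an (n_rows, 1) array; ravel() turns it back into a flat column.
        out[col] = scaler.transform(out[[col]]).ravel()
    return out

# Made-up rows for illustration.
train = pd.DataFrame({"tenure": [1, 12, 24, 60],
                      "MonthlyCharges": [20.0, 55.5, 70.0, 99.9],
                      "TotalCharges": [20.0, 666.0, 1680.0, 5994.0]})
test = pd.DataFrame({"tenure": [6], "MonthlyCharges": [45.0], "TotalCharges": [270.0]})
scalers = fit_scalers(train)
train_scaled, test_scaled = apply_scalers(train, scalers), apply_scalers(test, scalers)
print(test_scaled)
```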

New Feature Creation

Created a feature combining StreamingTV and StreamingMovies.
Created an average_monthly_charge feature (TotalCharges / tenure).
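The two engineered features could look like the sketch below. The README does not say how StreamingTV and StreamingMovies were combined. Here they are counted (0, 1, or 2 services) under the hypothetical name streaming_count. The sketch also guards average_monthly_charge against customers with tenure 0, who would otherwise cause a division by zero. Falling back to MonthlyCharges for those customers is an assumption.

```python
import pandas as pd

def add_features(df):
    out = df.copy()
    # Hypothetical combination: number of streaming services the customer subscribes to.
    out["streaming_count"] = (out["StreamingTV"].eq("Yes").astype(int)
                              + out["StreamingMovies"].eq("Yes").astype(int))
    # TotalCharges / tenure, falling back to MonthlyCharges where tenure is 0.
    ratio = out["TotalCharges"] / out["tenure"]
    out["average_monthly_charge"] = ratio.where(out["tenure"] > 0, out["MonthlyCharges"])
    return out

sample = pd.DataFrame({
    "tenure": [0, 10, 24],
    "TotalCharges": [0.0, 500.0, 2400.0],
    "MonthlyCharges": [20.0, 55.0, 99.0],
    "StreamingTV": ["Yes", "No", "Yes"],
    "StreamingMovies": ["Yes", "No internet service", "No"],
})
featured = add_features(sample)
print(featured[["streaming_count", "average_monthly_charge"]])
```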

Model Training

Train-Test Split

Split the dataset into training (80%) and testing (20%) sets. Fixing the random seed (random_state) keeps this split reproducible across runs.
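A sketch of the split with scikit-learn's train_test_split follows. The arrays are placeholders standing in for the 359 rows left after cleaning. The seed value 42 and the use of stratify=y, which keeps the churn ratio similar in both parts, are assumptions rather than documented settings. With test_size=0.2, scikit-learn rounds the test size up, giving 72 test rows and 287 training rows.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Placeholder data: 359 rows of features X and binary churn labels y.
X = np.arange(359 * 3).reshape(359, 3)
y = (np.arange(359) % 4 == 0).astype(int)

def split(X, y, seed=42):
    # test_size=0.2 gives the 80-20 split; the fixed random_state makes it repeatable.
    return train_test_split(X, y, test_size=0.2, random_state=seed, stratify=y)

X_train, X_test, y_train, y_test = split(X, y)
print(len(X_train), len(X_test))  # 287 72
```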

Model Selection

The following models were trained to predict customer churn:

Logistic Regression
Random Forest
Gradient Boosting
Support Vector Machine (SVM)
K-Nearest Neighbors (KNN)
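Training and scoring the five models can be written as a single loop. In the sketch below, synthetic data from make_classification stands in for the churn features. The model settings shown (for example max_iter=1000 and the random seeds) are assumptions, not the notebook's configuration. ROC-AUC is computed from the predicted probability of the positive (churn) class.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score, roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

MODELS = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Random Forest": RandomForestClassifier(random_state=42),
    "Gradient Boosting": GradientBoostingClassifier(random_state=42),
    "Support Vector Machine (SVM)": SVC(probability=True, random_state=42),
    "K-Nearest Neighbors": KNeighborsClassifier(),
}

def evaluate_models(X_train, X_test, y_train, y_test):
    results = {}
    for name, model in MODELS.items():
        model.fit(X_train, y_train)
        y_pred = model.predict(X_test)
        y_score = model.predict_proba(X_test)[:, 1]  # probability of the positive class
        results[name] = {
            "accuracy": accuracy_score(y_test, y_pred),
            "precision": precision_score(y_test, y_pred, zero_division=0),
            "recall": recall_score(y_test, y_pred),
            "f1": f1_score(y_test, y_pred),
            "roc_auc": roc_auc_score(y_test, y_score),
        }
    return results

# Synthetic stand-in data (not the churn dataset).
X, y = make_classification(n_samples=359, n_features=8, weights=[0.73], random_state=0)
splits = train_test_split(X, y, test_size=0.2, random_state=42, stratify=y)
results = evaluate_models(*splits)
for name, scores in results.items():
    print(name, {k: round(v, 3) for k, v in scores.items()})
```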

Model Performance Evaluation

Model	Accuracy	Precision	Recall	F1-score	ROC-AUC Score
Logistic Regression	77.78%	61.54%	42.11%	50.00%	75.47%
Random Forest	76.39%	60.00%	31.58%	41.38%	73.98%
Gradient Boosting	76.39%	60.00%	31.58%	41.38%	66.33%
Support Vector Machine (SVM)	80.56%	66.67%	52.63%	58.82%	76.17%
K-Nearest Neighbors	77.78%	61.54%	42.11%	50.00%	74.68%

Best Model: The SVM model performed the best of the five, achieving the highest accuracy (80.56%), precision (66.67%), recall (52.63%), F1-score (58.82%), and ROC-AUC (76.17%).
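To show how the table's metrics relate to each other, the sketch below computes them from confusion-matrix counts. The counts are hypothetical. They describe one 72-row test set (20% of the 359 cleaned rows) with 19 churners that reproduces the SVM row: 10 churners caught, 9 missed, 5 false alarms, and 48 correct non-churn predictions. Under these counts, the SVM's 52.63% recall would mean roughly half of the churners are missed.

```python
def metrics_from_counts(tp, fp, fn, tn):
    # Precision: share of predicted churners who really churned.
    precision = tp / (tp + fp)
    # Recall: share of actual churners the model caught.
    recall = tp / (tp + fn)
    return {
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
        "precision": precision,
        "recall": recall,
        "f1": 2 * precision * recall / (precision + recall),
    }

# Hypothetical counts consistent with the SVM row of the table.
m = metrics_from_counts(tp=10, fp=5, fn=9, tn=48)
print({k: f"{v:.2%}" for k, v in m.items()})
# {'accuracy': '80.56%', 'precision': '66.67%', 'recall': '52.63%', 'f1': '58.82%'}
```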

Feature Importance

The most important features contributing to churn prediction (using Random Forest and Gradient Boosting) are:

Contract
MonthlyCharges
TechSupport
average_monthly_charge
TotalCharges
tenure
PaymentMethod
OnlineSecurity
SeniorCitizen
PaperlessBilling
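A ranking like the list above can be read from a fitted tree ensemble's feature_importances_ attribute. The sketch uses synthetic data in which only column "f0" determines the label, so that column should rank first. The helper rank_importances is a hypothetical name. Impurity-based importances tend to favour continuous, high-cardinality features. If a categorical feature is one-hot encoded, its importance is also spread across its dummy columns.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

def rank_importances(model, feature_names, top_n=10):
    # Impurity-based importances of a fitted tree ensemble, highest first.
    return (pd.Series(model.feature_importances_, index=feature_names)
            .sort_values(ascending=False)
            .head(top_n))

# Synthetic example: the label depends only on column "f0".
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 4))
y = (X[:, 0] > 0).astype(int)
forest = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)
ranking = rank_importances(forest, ["f0", "f1", "f2", "f3"])
print(ranking)
```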

Hyperparameter Tuning

Method: GridSearchCV was used for hyperparameter tuning to improve model performance.

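Tuned SVM Hyperparameters:

C = 10: Weaker regularization than scikit-learn's default (C = 1.0), so misclassified training points are penalized more heavily.
gamma = 1: scikit-learn uses gamma only for the 'rbf', 'poly', and 'sigmoid' kernels, so this value has no effect with the selected linear kernel.
kernel = 'linear': Separates the classes with a linear hyperplane.
probability = True: Enables probability estimates (predict_proba), so the model can output a churn probability rather than only a Yes/No label.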
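A GridSearchCV run over SVC could look like the sketch below. The grid values, scoring="f1", and cv=5 are assumptions; the grid actually searched is not documented here. Using a list of two grids keeps gamma paired only with the RBF kernel, which avoids redundant linear-kernel fits. Because probability=True makes every fit run an internal 5-fold cross-validation, the search runs without it and only the winning configuration is refit with it.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

# Illustrative search space: gamma is only searched for the RBF kernel.
param_grid = [
    {"kernel": ["linear"], "C": [0.1, 1, 10]},
    {"kernel": ["rbf"], "C": [0.1, 1, 10], "gamma": ["scale", 0.1, 1]},
]

X, y = make_classification(n_samples=200, n_features=6, random_state=0)  # synthetic stand-in

search = GridSearchCV(SVC(), param_grid, scoring="f1", cv=5)
search.fit(X, y)
# Refit the best configuration with probability estimates enabled.
final_model = SVC(probability=True, random_state=42, **search.best_params_).fit(X, y)
print(search.best_params_, round(search.best_score_, 3))
```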

Business Insights and Recommendations

Business Usefulness

The SVM model can help predict potential churners, allowing the business to take proactive actions like offering incentives or personalized support.
Features like Contract, MonthlyCharges, TechSupport, and TotalCharges highlight areas needing improvement to reduce churn.
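One way to turn predictions into a retention list is to rank customers by predicted churn probability and flag those above a threshold. The helper and the probabilities below are made up; in practice the probabilities would come from model.predict_proba(X)[:, 1]. Lowering the threshold flags more true churners, which raises recall, at the cost of more false alarms, which lowers precision. This trade-off may matter given the model's moderate recall.

```python
def flag_at_risk(customer_ids, churn_proba, threshold=0.5):
    """Return (customer_id, probability) pairs at or above the threshold, riskiest first."""
    flagged = [(cid, float(p)) for cid, p in zip(customer_ids, churn_proba) if p >= threshold]
    return sorted(flagged, key=lambda pair: pair[1], reverse=True)

# Made-up IDs and probabilities for illustration.
at_risk = flag_at_risk(["C001", "C002", "C003"], [0.2, 0.9, 0.6])
print(at_risk)  # [('C002', 0.9), ('C003', 0.6)]
```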

Recommendations

Contract Type: Offer discounts to secure longer contracts, as customers with longer contracts are less likely to churn.
Monthly Charges: Consider discounted bundles or individualized pricing for at-risk customers.
TechSupport and OnlineSecurity: Provide free trials or discounts to enhance customer satisfaction.
Loyalty Programs: Reward loyal customers to improve retention.
Payment Options: Market PaperlessBilling and offer flexible payment options to customers with higher TotalCharges.

Model Deployment

Saving the Trained Model

The tuned SVM model was saved as svc_model.pkl for future use, together with the fitted scalers (scaler_t.pkl, scaler_mc.pkl, scaler_tc.pkl).
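Saving and reloading a fitted model as a .pkl file can be done with Python's standard pickle module, as sketched below. joblib.dump and joblib.load are a common alternative. The helper names and the tiny training set are illustrative. Only unpickle files from a trusted source, because loading a pickle can execute arbitrary code.

```python
import os
import pickle
import tempfile

from sklearn.svm import SVC

def save_model(model, path):
    with open(path, "wb") as f:
        pickle.dump(model, f)

def load_model(path):
    # Loading a pickle can run arbitrary code, so only load trusted files.
    with open(path, "rb") as f:
        return pickle.load(f)

# Toy model trained on made-up 1-D data.
model = SVC(kernel="linear").fit([[0.0], [1.0], [2.0], [3.0]], [0, 0, 1, 1])
path = os.path.join(tempfile.mkdtemp(), "svc_model.pkl")
save_model(model, path)
restored = load_model(path)
print(restored.predict([[0.0], [3.0]]))
```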

Flask API for Model Deployment

A simple Flask application was built to accept input data through an API and return churn predictions.

Route: /predict
Deployment Platform: The Flask app was deployed on Render.com.
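A minimal /predict route might look like the sketch below. This is not the repository's app.py. The JSON field names (FEATURES) are a hypothetical subset of the model's inputs, and the real app may read HTML form fields instead. A real deployment would also apply the saved scalers and encoders before calling predict; that step is omitted here. The factory create_app takes any object with a predict method, which makes the route easy to test.

```python
from flask import Flask, jsonify, request

# Hypothetical input fields; the real model expects the full engineered feature set.
FEATURES = ["tenure", "MonthlyCharges", "TotalCharges"]

def create_app(model):
    app = Flask(__name__)

    @app.route("/predict", methods=["POST"])
    def predict():
        payload = request.get_json(silent=True)
        if payload is None:
            return jsonify({"error": "expected a JSON body"}), 400
        missing = [f for f in FEATURES if f not in payload]
        if missing:
            return jsonify({"error": f"missing fields: {missing}"}), 400
        # Scaling/encoding of the raw values would happen here in a real deployment.
        row = [[float(payload[f]) for f in FEATURES]]
        label = model.predict(row)[0]
        # Assumes the positive (churn) class is labelled 1 or "Yes".
        return jsonify({"churn": "Yes" if label in (1, "Yes") else "No"})

    return app
```

Locally, the app could be started by loading svc_model.pkl with pickle and calling create_app(model).run().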

Deployed Link: https://mastersoft.onrender.com/

Usage

Clone the repository.
Install the required packages from requirements.txt.
Run the Flask app locally using python app.py.
Use the deployed API for predictions by sending POST requests to /predict with the necessary input data.
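A POST request to the deployed /predict route can be built with the standard library, as sketched below. The README does not document the expected payload. The JSON body and its field names are assumptions, and the deployed app may expect form data instead. The function builds the request without sending it; urllib.request.urlopen(req) would send it.

```python
import json
import urllib.request

def build_predict_request(base_url, features):
    """Build (but do not send) a JSON POST request to the /predict route."""
    body = json.dumps(features).encode("utf-8")
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/predict",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

# Hypothetical feature values; send with: urllib.request.urlopen(req)
req = build_predict_request(
    "https://mastersoft.onrender.com/",
    {"tenure": 5, "MonthlyCharges": 70.35, "TotalCharges": 351.75},
)
print(req.get_method(), req.full_url)
```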
