SyriaTel Customer Churn

README Outline

Within this README.md file you will find:

Introduction
Repository Contents
Project Objectives
Overview of the Process
Findings & Recommendations
Conclusions & Summary

Introduction

This project builds a classifier to identify whether a customer will "soon" churn, that is, stop doing business with SyriaTel. The ultimate goal is to label at-risk customers so that the Company can "save" them via promotions or other outreach measures.

Repository Contents

README.md
telecom_churn_classifier.ipynb - clean Jupyter notebook containing all code and models
telecom_churn.csv - dataset used
backup_files - directory containing rough, working, and in-process code
SyriaTel Customer Data Analysis.pdf - non-technical presentation
directory.pdf - PDF of the GitHub directory
jup_notebook.pdf - PDF of the Jupyter notebook

Project Objectives

The objective is to build a classifier that predicts whether a customer will soon stop doing business with SyriaTel. The project follows the CRISP-DM process to explore the dataset, prepare the data for modeling, train models, and evaluate them. We also focus on identifying which performance metrics best capture our ability to correctly identify churning customers. The final output, provided to the Company, is a list of the customers most likely to churn according to our best model.

Overview of the Process

Following CRISP-DM, the process outlined within telecom_churn_classifier.ipynb consists of six key steps:

Business Understanding: Outlines the facts and requirements of the project. Specifically, a classifier is built and trained on SyriaTel customer data to predict whether a customer will be labeled 1 (churn) or 0 (no churn). Understanding which customers are likely to churn, along with the patterns in the data, should enable SyriaTel to perform more targeted customer outreach and to relate customer features to the strength of the customer relationship going forward.
Data Understanding: Unpacks the data used in this classification problem (again, primarily SyriaTel customer data). This section focuses on the distribution of the data, any imbalance in the target variable, and the identification of features likely to be associated with churn.
Data Preparation: Further preprocessing to prepare the data for modeling. This includes splitting into training and test sets, encoding the necessary columns, and handling any other processing required before modeling. This is also the section in which synthetic training data is created via SMOTE to help with class imbalance.
Modeling: Trains and evaluates a number of machine learning models, primarily decision trees, random forests, and XGBoost.
Evaluation: The final model is selected and its performance metrics are discussed and evaluated, with a focus on F1 score, recall, and accuracy.
Deployment: Generates predictions on all data to provide SyriaTel with a list of the customers at highest risk of churning according to the final classifier.
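
To make the SMOTE step in Data Preparation more concrete, the sketch below shows the core idea in plain Python: each synthetic churn row is an interpolation between a real churn row and one of its nearest churn neighbors. This is a simplified illustration and not the notebook's code; the function name, the two feature columns, and the sample values are all hypothetical.

```python
import math
import random


def smote_like_oversample(minority, n_new, k=2, seed=0):
    """Create n_new synthetic rows by interpolating between a minority
    row and one of its k nearest minority-class neighbors."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Nearest neighbors of the base row among the other minority rows
        others = [row for j, row in enumerate(minority) if j != i]
        neighbors = sorted(others, key=lambda row: math.dist(base, row))[:k]
        neighbor = rng.choice(neighbors)
        gap = rng.random()  # position along the segment base -> neighbor
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbor))
        )
    return synthetic


# Hypothetical churn rows from the TRAINING split only:
# (customer service calls, total day minutes)
churn_train = [(4.0, 250.0), (5.0, 270.0), (6.0, 300.0), (4.0, 310.0)]
new_rows = smote_like_oversample(churn_train, n_new=6)
```

Oversampling is applied to the training data only, as described above, so that the test set keeps the original class balance and the reported metrics reflect real customers.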

Findings & Recommendations

The best-performing model was our tuned XGBoost classifier, with an AUC of 0.865, an F1 score of 80%, recall of 80%, and overall accuracy of 94%. According to the final model's feature importances, the most important features appear to be whether a customer is on an international plan, whether a customer is on a voice mail plan, and the number of customer service calls to date. Additionally, the model produced a list of 431 customers deemed "at-risk" of churning. While these customers have already churned, the model can be used going forward to generate a similar list of existing customers at risk of churn. We recommend that the Company begin targeted outreach and customer-saving measures with this list of customers first. Additionally, customers identified by the model as low-risk may be candidates for price increases or other revenue-raising measures.
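
As a rough illustration of how such an at-risk list could be produced from model output, the sketch below ranks customers by predicted churn probability and keeps those above a cutoff. The customer IDs, the probabilities, and both thresholds (0.5 for at-risk, 0.2 for low-risk) are hypothetical; in practice the probabilities would come from the trained classifier.

```python
def at_risk_customers(churn_probabilities, threshold=0.5):
    """Return (customer_id, probability) pairs at or above the threshold,
    ordered from highest to lowest risk."""
    flagged = [
        (cid, p) for cid, p in churn_probabilities.items() if p >= threshold
    ]
    return sorted(flagged, key=lambda pair: pair[1], reverse=True)


# Hypothetical model output: customer id -> predicted churn probability
scores = {"C001": 0.91, "C002": 0.12, "C003": 0.67, "C004": 0.48, "C005": 0.05}

# Customers to contact first, highest risk at the top
outreach_list = at_risk_customers(scores)

# Customers the model considers low-risk (assumed cutoff of 0.2)
low_risk = sorted(cid for cid, p in scores.items() if p < 0.2)
```

Lowering the at-risk threshold widens the outreach list and raises recall at the cost of more false positives, a trade-off that this project's choice of metrics accepts.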

Conclusions & Summary

Through an iterative modeling and data preparation process, we tuned a model with 80% recall and an overall accuracy of 94%. Throughout this process, recall and F1 score were favored over other metrics because the Company is likely less concerned with false positives: customer-saving measures aimed at mislabeled customers likely cost little compared with losing a customer to churn.
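
The reason for favoring recall and F1 score over accuracy can be seen with a small worked example. The confusion-matrix counts below are hypothetical, chosen only to illustrate the point, and are not results from this project: two models with nearly identical accuracy on an imbalanced test set can differ sharply in how many churners they actually catch.

```python
def classification_metrics(tp, fp, fn, tn):
    """Compute accuracy, precision, recall, and F1 from confusion-matrix counts."""
    total = tp + fp + fn + tn
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return {
        "accuracy": (tp + tn) / total,
        "precision": precision,
        "recall": recall,
        "f1": 2 * precision * recall / (precision + recall),
    }


# Hypothetical test set: 1,000 customers, of whom 150 churn.
# A cautious model rarely predicts churn; an eager model flags more customers.
cautious = classification_metrics(tp=60, fp=10, fn=90, tn=840)
eager = classification_metrics(tp=120, fp=60, fn=30, tn=790)

for name, m in (("cautious", cautious), ("eager", eager)):
    print(name, {metric: round(value, 3) for metric, value in m.items()})
```

In this example both models score roughly 90% accuracy, yet the cautious model finds 40% of churners while the eager one finds 80%. The extra false positives of the eager model correspond to outreach sent to customers who would have stayed anyway, which is the cheaper kind of error under the reasoning above.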

akaigraham/Predicting-Customer-Churn
