Customer Churn Prediction

Installation

The code was tested with Python 3.9 and PySpark 3.1.2. Install the other required libraries from requirements.txt:

pip install -r requirements.txt

Project Motivation

Sparkify is a fictitious music service similar to Spotify, where users can have either a free account with advertisements or a paid account without them. The log contains information about user interaction with the service, e.g., the number of items per session, the artists and songs users listened to, and information about the device.

This project analyzes a dataset of user churn in order to identify customers who are dissatisfied with the service before they cancel their subscription.

The project investigates two questions:

Which factors have an impact on cancelling the subscription?
How well can we predict customer churn?

The analysis, the data exploration, and the building and tuning of the machine learning models are all done with PySpark.

File Descriptions

The repository contains one notebook, Sparkify.ipynb, which covers all necessary steps:

Load Data into Spark and Clean Dataset
Exploratory Data Analysis
Feature Engineering
Modeling
Churn Prediction

The analysis was limited to a subset of the data. The data was provided by Udacity and unfortunately could not be uploaded due to its size.

Results

As a baseline, different ML algorithms were compared using their default settings, except that the column classWeightCol was included to re-balance the class weights.

Model	f1_score	Area Under ROC	accuracy	time (ms)
LogisticRegression	0.52	0.43	0.5	309
LogisticRegression + Threshold	0.37	0.43	0.37	-
LinearSVC	0.52	0.39	0.5	3257
RandomForest	0.6	0.59	0.67	79
Gradient-boosted Tree	0.66	0.51	0.7	852
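
The class re-balancing used for the baseline can be illustrated with a small plain-Python sketch. The notebook's actual weighting formula is not stated here, so the inverse-frequency scheme, the function name balanced_class_weights, and the toy labels below are assumptions for illustration only.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Return {label: weight} so that each class has the same total weight.

    Assumed scheme (not confirmed by the notebook):
    weight = n_samples / (n_classes * class_count).
    """
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {label: n_samples / (n_classes * count) for label, count in counts.items()}


# Hypothetical toy labels: 1 = churned, 0 = stayed (churners are the minority).
labels = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
weights = balanced_class_weights(labels)

# One weight per row: the kind of values a column such as classWeightCol would hold.
row_weights = [weights[y] for y in labels]

# After weighting, both classes contribute the same total weight.
total_stayed = sum(w for w, y in zip(row_weights, labels) if y == 0)
total_churned = sum(w for w, y in zip(row_weights, labels) if y == 1)
print(weights, total_stayed, total_churned)
```

In PySpark, a column holding such per-row weights can be passed to an estimator through its weightCol parameter, for example LogisticRegression(weightCol="classWeightCol").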

Parameter tuning was done for all of the above models except LinearSVC, because that algorithm achieved the same results as LogisticRegression while its training took about 10 times longer.
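
The grids and tools used for tuning are not listed in this README. As a rough illustration of the idea only, the following plain-Python sketch runs an exhaustive grid search; the function grid_search, the parameter values, and the scoring function toy_score are hypothetical stand-ins for "train a model with these settings and return its validation score".

```python
from itertools import product


def grid_search(param_grid, score_fn):
    """Try every combination in param_grid and return (best_params, best_score)."""
    names = list(param_grid)
    best_params, best_score = None, float("-inf")
    for values in product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        score = score_fn(params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


def toy_score(params):
    """Hypothetical score that peaks at maxDepth=5, numTrees=50 (not real model output)."""
    return -((params["maxDepth"] - 5) ** 2) - ((params["numTrees"] - 50) ** 2) / 100


# Hypothetical grid; the names mirror common random forest settings.
param_grid = {"maxDepth": [3, 5, 7], "numTrees": [20, 50, 100]}
best_params, best_score = grid_search(param_grid, toy_score)
print(best_params, best_score)
```

In PySpark, the same pattern is usually expressed with ParamGridBuilder and CrossValidator from pyspark.ml.tuning, which also take care of splitting the data into folds.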

Model	f1_score	Area Under ROC	accuracy
LogisticRegression Test Set	0.49	0.46	0.47
LogisticRegression Validation Set	0.72	0.53	0.7
RandomForest Test Set	0.56	0.5	0.67
RandomForest Validation Set	0.74	0.52	0.8
Gradient-boosted Tree Test Set	0.71	0.48	0.73
Gradient-boosted Tree Validation Set	0.58	0.41	0.57
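
To make the accuracy and f1_score columns and the effect of a decision threshold more concrete, here is a minimal plain-Python sketch. It computes accuracy and the F1 score of the positive (churn) class from predicted probabilities. The function name evaluate_at_threshold and the toy data are assumptions, and the tables above may use a different F1 averaging (PySpark's MulticlassClassificationEvaluator, for instance, reports a weighted F1).

```python
def evaluate_at_threshold(probabilities, labels, threshold=0.5):
    """Return accuracy and positive-class F1 when predicting 1 if p > threshold."""
    predictions = [1 if p > threshold else 0 for p in probabilities]
    pairs = list(zip(predictions, labels))
    tp = sum(1 for pred, y in pairs if pred == 1 and y == 1)
    fp = sum(1 for pred, y in pairs if pred == 1 and y == 0)
    fn = sum(1 for pred, y in pairs if pred == 0 and y == 1)
    tn = sum(1 for pred, y in pairs if pred == 0 and y == 0)

    accuracy = (tp + tn) / len(labels)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    return {"accuracy": accuracy, "f1": f1}


# Hypothetical churn probabilities and true labels (1 = churned).
probabilities = [0.9, 0.6, 0.4, 0.3, 0.2, 0.1]
labels = [1, 0, 1, 0, 0, 0]

default_metrics = evaluate_at_threshold(probabilities, labels, threshold=0.5)
lowered_metrics = evaluate_at_threshold(probabilities, labels, threshold=0.35)
print(default_metrics, lowered_metrics)
```

On this toy data, lowering the threshold catches the second churner and improves both metrics. On real data a changed threshold can just as well lower them, which is what the "LogisticRegression + Threshold" row of the baseline table shows.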

The main findings of the code can be found at the post available here.

Licensing, Authors, Acknowledgements

Many thanks to Udacity for their contribution during the process.

TO THE FULLEST EXTENT PERMITTED UNDER APPLICABLE LAW, THE CODE COMPONENTS ARE PROVIDED BY THE AUTHORS, COPYRIGHT HOLDERS, CONTRIBUTORS, LICENSORS, “AS IS”.

DISCLAIMED ARE ANY REPRESENTATIONS OR WARRANTIES OF ANY KIND, WHETHER ORAL OR WRITTEN, WHETHER EXPRESS, IMPLIED, OR ARISING BY STATUTE, CUSTOM, COURSE OF DEALING, OR TRADE USAGE, INCLUDING WITHOUT LIMITATION THE IMPLIED WARRANTIES OF TITLE, MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE, AND NON-INFRINGEMENT.

IN NO EVENT WILL THE COPYRIGHT OWNER, CONTRIBUTORS, LICENSORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THE CODE COMPONENTS, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

Copyright (C) 2021 August
