churn-prediction-sparkify

Machine learning project modeling user churn of a hypothetical music streaming service

Data Scientist Nanodegree - Capstone Project: PySpark Customer Churn Prediction for the Sparkify Music Streaming Service

Head over to Medium to read my blog post:

https://davidweisspost.medium.com/churn-prediction-with-pyspark-52ddece92ba4

This is the capstone project for Udacity's Data Scientist Nanodegree. It predicts customer churn for Sparkify, a hypothetical music streaming app, using Spark's MLlib to engineer features and build a classification model. The dataset used here is a medium-sized subset (248 MB, 544,000 rows) of the full 12 GB dataset.
The project was developed on IBM Cloud's Watson Studio, with the data uploaded to the cluster, in a Jupyter Notebook running Python 3.7 and Spark 3.0.

Using PySpark, the project broadly involves the following steps:

Loading and cleaning the data
Exploratory Data Analysis
Feature Engineering - appropriate features are selected based on the EDA
Modelling - two different classification models are tested and evaluated
Model Tuning - Hyperparameter tuning using grid search
Concluding Remarks

Installation

Python 3.6+
pyspark.*
Jupyter, or IBM Watson Studio (Lite)
