Sparkify Customer Churn Prediction

Installation

The following libraries must be installed to run the notebook:

- pyspark
- pandas
- matplotlib
- seaborn
- user_agents (pip install pyyaml ua-parser user-agents)

Project Motivation and Description

This project aims to predict customer churn for the fictional music streaming service Sparkify using PySpark.

More specifically, the goal is to identify users at high risk of churning by finding patterns in the behaviour of users who have already churned.

For this project, a small subset of the user log data was provided so that a PySpark model can be built locally and then scaled to a much larger dataset on a distributed Spark cluster.

The key parts of the project are:

- Exploratory data analysis to get familiar with the data and identify potential features
- Feature engineering, based on that analysis, that provides suitable input for the machine learning model and leverages Spark's pipeline concept
- Selection, training, evaluation and optimization of a classification model
- A blog post on Medium that documents the project for a technical audience

File Descriptions

- Sparkify.ipynb: The Jupyter notebook containing the exploratory data analysis, feature engineering and model development
- mini_sparkify_event_data.json: The provided subset of the Sparkify user log data

Authors and Acknowledgements

This project was implemented as part of the Udacity Data Scientist Nanodegree program. The data was provided by Udacity.
