ETL Pipeline to identify customer churn for music streaming service using pyspark library

DevMahmoud10/Sparkify

Sparkify

Motivation

In this project, I analyzed data from Sparkify, a virtual music streaming service, to predict which users are likely to churn, either by downgrading from premium to free or by cancelling their subscription altogether. Knowing this in advance lets us act on it, for example by sending offers that encourage those users to stay with the service.

I followed this process:

  • Data exploration to learn more about the data's features and statistics.
  • Defining churn and labeling the data based on that definition.
  • Feature engineering to tailor the inputs to the model.
  • Data transformation and splitting.
  • Model training, refinement, and evaluation.
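The churn definition and labeling step can be illustrated with a minimal sketch. It is written in plain Python rather than PySpark so it runs without a Spark session, and it is not the notebook's actual code. The `page` field and the page names `Cancellation Confirmation` and `Submit Downgrade` are assumptions about how the event log marks cancellations and downgrades; adjust them to match the real data.

```python
from collections import defaultdict

# Assumed page names that signal churn (cancelling or downgrading).
# These are illustrative; the README does not list them.
CHURN_PAGES = {"Cancellation Confirmation", "Submit Downgrade"}


def label_churn(events):
    """Return {userId: 1 if the user ever visited a churn page, else 0}."""
    labels = defaultdict(int)
    for event in events:
        # Once a user is flagged as churned, the flag stays set.
        labels[event["userId"]] |= int(event["page"] in CHURN_PAGES)
    return dict(labels)


# Hypothetical event log with one row per user action.
events = [
    {"userId": "10", "page": "NextSong"},
    {"userId": "10", "page": "Submit Downgrade"},
    {"userId": "20", "page": "NextSong"},
    {"userId": "30", "page": "Cancellation Confirmation"},
]

labels = label_churn(events)
print(labels)
```

In PySpark the same idea would usually be expressed as a per-user aggregation over a flag column, but the labeling rule itself is the same.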

Insights

In this notebook, we implemented a model to predict customer churn. We removed rows with no userId, converted timestamps to a human-readable format, and converted gender to a binary numeric column. We then engineered features suitable for our model.
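The cleaning steps described above might look roughly like the following plain-Python sketch (again, not the notebook's PySpark code). It assumes a `ts` field holding epoch milliseconds and a `gender` field with values `"M"` and `"F"`; both are assumptions, since the README names only `userId`. The mapping of `"M"` to 1 is an arbitrary choice for illustration.

```python
from datetime import datetime, timezone


def clean_events(rows):
    """Drop rows without a userId, add a readable time, encode gender as 0/1."""
    cleaned = []
    for row in rows:
        # Skip rows whose userId is missing or an empty string.
        if not row.get("userId"):
            continue
        readable = datetime.fromtimestamp(row["ts"] / 1000, tz=timezone.utc)
        cleaned.append({
            **row,
            # Assumes ts is in epoch milliseconds; formatted in UTC.
            "time": readable.strftime("%Y-%m-%d %H:%M:%S"),
            # Assumed encoding: "M" -> 1, anything else -> 0.
            "gender": 1 if row["gender"] == "M" else 0,
        })
    return cleaned


# Hypothetical raw rows; the second one has no userId and is dropped.
rows = [
    {"userId": "10", "ts": 1538352000000, "gender": "F"},
    {"userId": "", "ts": 1538352060000, "gender": None},
    {"userId": "20", "ts": 1538352120000, "gender": "M"},
]

cleaned = clean_events(rows)
for row in cleaned:
    print(row["userId"], row["time"], row["gender"])
```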

  • We compared these models:
    • Logistic Regression
    • Decision Tree
    • Gradient Boosted Trees (GBM)
    • Random Forest

We selected GBM as the final model for predicting the final result and used cross-validation and grid search to fine-tune it. We achieved an F1-score of about 0.91 on the validation set, roughly a 35% improvement over our naive logistic regression baseline.
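For readers unfamiliar with the metric, here is a small sketch of how an F1-score and a relative improvement over a baseline are computed. The counts and scores below are made up for illustration and are not the project's results; the helper names are hypothetical. It also treats F1 as the simple binary F1, which may differ from the exact variant the notebook's evaluator reports.

```python
def f1_score(tp, fp, fn):
    """Binary F1: harmonic mean of precision and recall."""
    if tp == 0:
        # No true positives means precision and recall are both zero.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def relative_improvement(new_score, baseline_score):
    """Fractional gain of new_score over baseline_score."""
    return (new_score - baseline_score) / baseline_score


# Made-up confusion counts: 8 true positives, 2 false positives, 2 false negatives.
f1 = f1_score(tp=8, fp=2, fn=2)

# Made-up scores: a model at 0.9 against a baseline at 0.6.
gain = relative_improvement(0.9, 0.6)

print(f"F1 = {f1:.2f}, relative improvement = {gain:.0%}")
```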

Requirements

You can find all the libraries required for this project in requirements.txt.

Links

Author

Mahmoud Ahmed
