sparkify

Udacity Data Science Nanodegree capstone project

Goal

Predict user churn for a music streaming company.

Business Context

The dataset appears to come from a digital music streaming platform. Whenever a user interacts with the service, a log entry is generated containing the timestamp and the page or activity involved (e.g. logout, like, login). There are two types of users: 1. free-tier and 2. subscription-based. Based on each user's activity, we need to identify which users are likely to churn, so that discounts can be rolled out to those specific users.
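As a rough illustration of how a churn label could be derived from such logs, here is a minimal sketch in plain Python. The field names (`userId`, `ts`, `page`, `level`) and the page value `Cancellation Confirmation` are assumptions made for this example; this README does not state which event the notebook treats as churn, and the notebook itself works on Spark DataFrames rather than Python lists.

```python
from collections import defaultdict

# Assumed page value that marks a churn event (hypothetical, not confirmed by this README).
CHURN_PAGE = "Cancellation Confirmation"


def label_churn(events, churn_page=CHURN_PAGE):
    """Return {user_id: 1 if the user ever visited churn_page, else 0}."""
    labels = defaultdict(int)
    for event in events:
        # Once a user is flagged as churned, the flag stays set.
        labels[event["userId"]] |= int(event["page"] == churn_page)
    return dict(labels)


# Tiny made-up log sample, for illustration only.
events = [
    {"userId": "u1", "ts": 1, "page": "NextSong", "level": "free"},
    {"userId": "u2", "ts": 2, "page": "NextSong", "level": "paid"},
    {"userId": "u2", "ts": 3, "page": CHURN_PAGE, "level": "paid"},
    {"userId": "u1", "ts": 4, "page": "Logout", "level": "free"},
]

labels = label_churn(events)
print(labels)
```

The resulting per-user label is the kind of target column a classifier can then be trained on.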

Problem Statement

In this project, we estimate the likelihood that a user will continue to use the music streaming app, based on past user activity, so that the risk of losing users can be minimized.

Prerequisite

This project uses the following software and Python libraries:

Python 3
Spark
PySpark
pandas
Matplotlib
Seaborn

Project Folder Description

Sparkify.ipynb: This file contains all the code for this project.
*.PNG: These files contain the plots from data exploration.
final_dataset.CSV: This file contains the cleaned data, with the features and the target column named label.

Findings

The Gradient Boosted Tree model performed best, with an F1 score of approximately 0.85. Because the dataset was highly imbalanced, we also evaluated the models on an undersampled dataset, but this did not meaningfully improve performance.

Gradient Boosted Tree evaluation metrics

metric	value
Precision	0.8333
Recall	0.8695
F1	0.8510
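As a quick sanity check on the table above, the F1 score is the harmonic mean of precision and recall. The short sketch below recomputes it from the reported precision and recall; the helper name `f1_score` is introduced here for illustration and is not taken from the notebook.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Values reported in the metrics table.
precision, recall = 0.8333, 0.8695
f1 = f1_score(precision, recall)
print(f"F1 = {f1:.4f}")  # F1 = 0.8510
```

The recomputed value agrees with the reported F1 to four decimal places.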

Blog

I have written a blog post explaining the techniques and approach used to solve this challenge (blog).

NOTE: The dataset mini_sparkify_event_data.json is too large for GitHub's file size limits, so I was unable to upload it to the repo.
