Capstone-Project-Sparkify-Udacity-Data-Scientist-Nanodegree

Installations

- NumPy
- Pandas
- Seaborn
- Matplotlib
- PySpark SQL
- PySpark ML

Project Motivation

In this project, we use PySpark to analyze customer churn on a music website: we estimate the churn rate and look for the reasons customers leave. Churn rate is the rate at which customers stop using a business's products. It is usually expressed as the percentage of subscribers who discontinue their subscriptions within a given timeframe.

Problem Statement

We are given user data such as gender, the number of artists the user listened to, the user's subscription type, the number of songs they listened to, the number of songs they liked or disliked, the number of advertisements they were shown, and so on. Using this information, we need to predict whether a given user will cancel their subscription.

Metrics

We use the F1 score and accuracy to evaluate the prediction model. Accuracy is the fraction of predictions the model got right. The F1 score is the harmonic mean of precision and recall. Precision is the number of true positives divided by the sum of true positives and false positives, whereas recall is the number of true positives divided by the sum of true positives and false negatives.
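As a minimal sketch of the metric definitions above, the following plain-Python function computes accuracy, precision, recall, and F1 from two label lists. The function name `classification_metrics` and the toy labels are illustrative assumptions, not code or data from the project notebook.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Compute accuracy, precision, recall and F1 for a binary classifier.

    `positive` is the label treated as the positive class (here: churned).
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)

    accuracy = (tp + tn) / len(pairs)
    # Guard against division by zero when a class is never predicted or never present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Toy labels for illustration only: 1 = churned, 0 = stayed.
y_true = [1, 0, 0, 1, 0, 0, 0, 1]
y_pred = [1, 0, 0, 0, 0, 1, 0, 1]
metrics = classification_metrics(y_true, y_pred)
print(metrics)
```

In this toy case there are 2 true positives, 1 false positive, 1 false negative, and 4 true negatives, so accuracy is 0.75 while precision, recall, and F1 are each 2/3.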

The project involved:

- Loading and cleaning a small subset (128MB) of the full dataset (12GB)
- Conducting Exploratory Data Analysis to understand the data and identify which features are useful for predicting churn
- Feature Engineering to create the features used in the modelling process
- Modelling using machine learning algorithms such as Logistic Regression, Random Forest, and Gradient Boosted Trees

File Descriptions

The repository contains one exploratory notebook and an HTML export of that notebook.

Medium Blog Post: https://medium.com/@ehsan81181/sparkify-project-51c859556c73
