This project was designed to predict churn among users of Sparkify, a fictional music streaming platform. The data for this project (12 GB) is provided by Udacity.
Installation
This project requires Python 3.x and the following Python libraries:
- PySpark
- Pandas
- NumPy
- Matplotlib
- datetime (Python standard library)
- time (Python standard library)
Summary: The project contains the following tasks:
- Exploratory Data Analysis
- Modeling: predicting user churn with a Gradient-Boosted Tree (GBT) classifier
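Before a classifier can be trained, each user needs a churn label and some per-user features. This README does not define churn; the sketch below assumes the convention commonly used with the Sparkify data, namely that a user has churned once they reach the "Cancellation Confirmation" page. The event fields `userId` and `page` and the helper functions `label_churn` and `count_pages` are illustrative assumptions, and the sketch uses plain Python on a few in-memory events rather than the project's PySpark code.

```python
from collections import defaultdict

# Assumption: churn is marked by a visit to this page (not defined in the README).
CHURN_PAGE = "Cancellation Confirmation"


def label_churn(events):
    """Return {userId: 1 if the user churned, else 0} from a list of event dicts.

    Hypothetical schema: each event is a dict with 'userId' and 'page' keys.
    """
    labels = {}
    for event in events:
        user = event["userId"]
        churned = int(event["page"] == CHURN_PAGE)
        # Once a user is labeled as churned, keep the label at 1.
        labels[user] = max(labels.get(user, 0), churned)
    return labels


def count_pages(events, page):
    """Count how often each user visited `page`, a simple per-user feature.

    Users who never visited `page` are omitted from the result.
    """
    counts = defaultdict(int)
    for event in events:
        if event["page"] == page:
            counts[event["userId"]] += 1
    return dict(counts)


if __name__ == "__main__":
    sample = [
        {"userId": "u1", "page": "NextSong"},
        {"userId": "u1", "page": "Thumbs Down"},
        {"userId": "u1", "page": "Cancellation Confirmation"},
        {"userId": "u2", "page": "NextSong"},
        {"userId": "u2", "page": "NextSong"},
    ]
    print(label_churn(sample))              # {'u1': 1, 'u2': 0}
    print(count_pages(sample, "NextSong"))  # {'u1': 1, 'u2': 2}
```

On the full 12 GB dataset the same aggregation would be expressed with PySpark DataFrame operations such as `groupBy` and `agg`. The resulting feature columns would typically be combined with `VectorAssembler` before fitting `GBTClassifier` from `pyspark.ml.classification`.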
Data
- The data is available from Udacity. It is not included in this repository because the datasets exceed GitHub's file size limits.
Acknowledgments
Thanks to Udacity for providing the datasets and for this project, part of the DS Nanodegree.