9 June to 1 September 2023
Waze is a community-driven navigation app that helps millions of users get to where they’re going through real-time road alerts and an up-to-the-moment map. Typically, high user retention rates indicate satisfied users who repeatedly use the Waze app over time. This project develops a churn prediction model to help prevent churn, improve user retention, and grow Waze’s business. The project addressed three questions: Who are the users most likely to churn? Why do users churn? When do users churn?
This project was part of my Google Advanced Data Analytics Certificate.
FINDINGS
The available data proved insufficient for reliably predicting user churn; more granular data on app usage and geography is needed. Within the data available, the strongest predictors of whether a user will churn or be retained were whether the user is a professional driver and how much the user used the app in a month.
PROJECT OVERVIEW
This was a five-stage project, in which I was involved after the first stage. The Jupyter Notebooks of code I wrote for stages 2 through 5 are in this repository.
- Data was imported and explored for useful user churn information
- A project proposal was accepted by Waze for an in-depth EDA (stage 2), statistical testing (stage 3), and predictive modelling (stages 4 & 5)
- Churn rate is highest for users who didn’t drive using the app much in the last month
- Device types had similar churn rates
- Key conclusion: Statistical tests need to be run on variable classes (e.g., device used) to determine significant relationships with churn
- Calculations show that iPhone users have a higher average use of the app compared to Android users
- However, this difference is not statistically significant
- Key conclusion: More marketing-relevant data is needed for statistically examining churn by device use and other variables.
- A binomial logistic regression achieved slightly better-than-benchmark precision but very low recall
- Contrary to what was expected from EDA findings, the amount of driving was the second-least-important variable for predicting churn
- Features of interest were extracted, and a random forest model and a gradient boosting machine (GBM) model were developed to predict user churn and their performance was compared
- The GBM outperformed the random forest model, and it had similar levels of precision and accuracy to the logistic regression, with a much better (though still unsatisfactory) recall score
- The models confirmed the insufficiency of the data and the need to collect drive-level data (e.g., drive times and geographic information) and data on user interaction with the app (e.g., reporting a road hazard).
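
The modelling results above are reported as accuracy, precision, and recall. As a minimal sketch of how those three metrics can diverge when churned users are a minority, the Python below computes them from scratch. The labels are made up for illustration and are not Waze data or results from this project, and the function name `classification_scores` is hypothetical rather than taken from the notebooks.

```python
def classification_scores(y_true, y_pred):
    """Return accuracy, precision, and recall for binary labels (1 = churned)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # churners correctly flagged
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # retained users wrongly flagged
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # churners the model missed
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # retained users correctly passed
    accuracy = (tp + tn) / len(pairs)
    # Guard against division by zero when nothing is flagged or nobody churned.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall}


# Made-up example (assumption): 100 users, of whom 20 churned.
y_true = [1] * 20 + [0] * 80
# A cautious hypothetical model flags only 4 users as churners; 3 of them are right.
y_pred = [1] * 3 + [0] * 17 + [1] * 1 + [0] * 79

scores = classification_scores(y_true, y_pred)
for name, value in scores.items():
    print(f"{name}: {value:.2f}")
```

In this made-up case, accuracy and precision look respectable while recall shows that most churners go undetected, which is the same pattern as the low recall noted for the models above.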