Churn forecasting at Buffer.
Churn occurs when customers or subscribers stop doing business with a company or service. Predicting these events helps us learn more about our service and how customers benefit from it.
That said, there is no single correct way to do churn prediction. This repository contains our approach to churn prediction with machine learning!
To run and develop churnado, you'll need the following environment variables in a .env file:
REDSHIFT_ENDPOINT
REDSHIFT_DB_PORT
REDSHIFT_USER
REDSHIFT_PASSWORD
REDSHIFT_DB_NAME
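As a sketch of how these variables might be consumed, here is a small helper that assembles a Redshift connection URL from them. The helper name and the postgres-style URL scheme are assumptions, not part of the repo; real values would come from the .env file rather than the dummy dictionary below.

```python
import os

# Hypothetical helper: build a Redshift connection URL from the environment
# variables listed above (Redshift speaks the postgres wire protocol).
def redshift_url(env=os.environ):
    return "postgresql://{user}:{password}@{host}:{port}/{db}".format(
        user=env["REDSHIFT_USER"],
        password=env["REDSHIFT_PASSWORD"],
        host=env["REDSHIFT_ENDPOINT"],
        port=env["REDSHIFT_DB_PORT"],
        db=env["REDSHIFT_DB_NAME"],
    )

# Dummy values for illustration only -- real values live in the .env file.
demo = {
    "REDSHIFT_USER": "analyst",
    "REDSHIFT_PASSWORD": "secret",
    "REDSHIFT_ENDPOINT": "example.redshift.amazonaws.com",
    "REDSHIFT_DB_PORT": "5439",
    "REDSHIFT_DB_NAME": "buffer",
}
print(redshift_url(demo))
```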
The goal of Churnado is to predict whether or not a customer will cancel their subscription within a given time period. This makes it a binary classification problem (churn or not churn).
Once we feel confident in our binary classification model, we may move on to more complex models that try to predict the amount of time until a churn event. In that case, we are no longer dealing with a classification model, and it isn't a regression model either: predicting time-to-event is a survival analysis problem.
Initially, with the binary classification model, we will use the area under the receiver operating characteristic curve (AUC) as the success metric. We could use model accuracy (the number of users classified correctly divided by the total number of users) instead, but imbalanced classes make accuracy an insufficient measure of success: a model that predicts that nobody churns could still have an accuracy of over 90%.
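The accuracy pitfall can be shown with a toy example (the 90/10 class split below is made up for illustration): a model that always predicts "no churn" scores 90% accuracy but an AUC of only 0.5, no better than chance.

```python
# 9 non-churners and 1 churner; the "model" scores everyone identically.
def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def auc(y_true, scores):
    # Mann-Whitney formulation of AUC, with average ranks for tied scores.
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average 1-based rank of the tied group
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    pos = [r for r, t in zip(ranks, y_true) if t == 1]
    n_pos, n_neg = len(pos), len(y_true) - len(pos)
    return (sum(pos) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)

y_true = [0] * 9 + [1]        # imbalanced classes: 10% churn
constant_scores = [0.0] * 10  # "nobody churns"
print(accuracy(y_true, [0] * 10))    # 0.9
print(auc(y_true, constant_scores))  # 0.5
```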
The receiver operating characteristic (ROC) curve works like this: it plots sensitivity, the probability of predicting that a real positive will be a positive, against 1 - specificity, the probability of predicting a false positive. This curve represents every possible trade-off between sensitivity and specificity that is available for the classifier.
When the area under this curve is maximized, the false positive rate increases much more slowly than the true positive rate, meaning that we are accurately predicting positives (churns) without incorrectly labeling many negatives (non-churns).
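The curve itself is traced by sweeping a decision threshold over the model's scores; here is a minimal sketch (the churn probabilities below are made-up numbers, not model output):

```python
# Hypothetical churn scores for six users, sorted for readability.
y_true = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.3, 0.1]

def roc_points(y_true, scores):
    """Return (false positive rate, true positive rate) pairs, one per
    distinct threshold, from the strictest threshold to the loosest."""
    n_pos = sum(y_true)
    n_neg = len(y_true) - n_pos
    points = []
    for thresh in sorted(set(scores), reverse=True):
        preds = [1 if s >= thresh else 0 for s in scores]
        tpr = sum(p and t for p, t in zip(preds, y_true)) / n_pos
        fpr = sum(p and not t for p, t in zip(preds, y_true)) / n_neg
        points.append((fpr, tpr))
    return points

for fpr, tpr in roc_points(y_true, scores):
    print(fpr, tpr)
```

Each point is one achievable sensitivity/specificity trade-off; plotting them and integrating gives the AUC.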
To evaluate our models, we will maintain a hold-out validation set that is not used to train the model. Notice that we will then have three separate datasets: a training set, a testing set, and a validation set.
The reason that we need the hold-out validation set is that information from the testing set "leaks" into the model each time we use the testing set to score our model's performance during training.
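A three-way split might look like the sketch below. The 60/20/20 proportions and the function name are assumptions for illustration, not a convention from this repo.

```python
import random

def three_way_split(records, seed=42):
    """Shuffle records and split them 60/20/20 into train, test, and
    hold-out validation sets. The seed keeps the split reproducible."""
    rng = random.Random(seed)
    shuffled = records[:]
    rng.shuffle(shuffled)
    n = len(shuffled)
    n_train = int(0.6 * n)
    n_test = int(0.2 * n)
    train = shuffled[:n_train]
    test = shuffled[n_train:n_train + n_test]
    validation = shuffled[n_train + n_test:]  # never touched during training
    return train, test, validation

train, test, validation = three_way_split(list(range(100)))
print(len(train), len(test), len(validation))  # 60 20 20
```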
Our predictive models must beat the performance of two models:
- A "dumb" model that uses the average churn rate to randomly assign users a value of "churned" or "not churned".
- A simple logistic regression model.
Remember that these models must be out-performed on the hold-out validation set.
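The "dumb" baseline is simple enough to sketch directly (the 5% churn rate below is a placeholder, not our actual rate; the logistic regression baseline would come from a standard library and isn't shown here):

```python
import random

def dumb_baseline(n_users, churn_rate, seed=0):
    """Randomly label each user as churned (1) with probability equal to
    the observed average churn rate, otherwise not churned (0)."""
    rng = random.Random(seed)
    return [1 if rng.random() < churn_rate else 0 for _ in range(n_users)]

preds = dumb_baseline(n_users=1000, churn_rate=0.05)
print(sum(preds) / len(preds))  # roughly 0.05
```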
We define a customer as churned if they cancel their subscription. Our inputs consist of snapshot data (billing information) and time-series data (detailed usage information). We will use 8 weeks of snapshot and time-series data to build our feature sets, and we will try to predict whether or not a customer will churn in the next 4 weeks.
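The labeling scheme above can be sketched as follows. The cutoff date, the helper name, and the example dates are all hypothetical; only the 4-week prediction window comes from the definition above.

```python
from datetime import date, timedelta

def label_churn(cancel_date, cutoff):
    """Label a customer 1 (churned) if they cancelled within the 4 weeks
    after the feature-window cutoff, else 0. cancel_date is None for
    customers who are still active."""
    window_end = cutoff + timedelta(weeks=4)
    return int(cancel_date is not None and cutoff < cancel_date <= window_end)

cutoff = date(2018, 6, 1)  # features come from the 8 weeks before this date
print(label_churn(date(2018, 6, 15), cutoff))  # 1: cancelled in the window
print(label_churn(date(2018, 8, 1), cutoff))   # 0: cancelled after the window
print(label_churn(None, cutoff))               # 0: still active
```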