# Forecasting customer churn 

Churn prediction is the task of identifying users who are likely to stop using a service, product, or website. In this notebook, you will learn how to:

#### Train & consume a model to forecast user churn
* Define the boundary at which churn happens.
* Define a churn period.
* Train a model using data from the past.
* Make predictions for probability of churn for each user.

### Let's get started!

In [44]:
import graphlab as gl
import datetime
gl.canvas.set_target('ipynb') # make sure plots appear inline

### Load previously saved data

In the previous notebook, we saved the data in a binary format. Let us load it back.

In [45]:
interactions_ts = gl.TimeSeries("data/user_activity_data_rocket_2.ts/")
users = gl.SFrame("data/users_rocket_2.sf/")

## Training a churn predictor

We define churn to be **no activity** within a period of time (called the `churn_period`). Hence,
a user/customer is said to have churned if a period of activity is followed
by no activity for the duration of a `churn_period` (for example, 30 days).

<img src="https://dato.com/learn/userguide/churn_prediction/images/churn-illustration.png" align="left">
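
To make the definition concrete, here is a minimal sketch in plain Python (it does not use the churn toolkit). The function name `has_churned` and the two sample users are hypothetical; the 30-day period is the example value mentioned above.

```python
import datetime

def has_churned(event_times, time_boundary, churn_period):
    """Return True if a previously active user has no activity in
    the window [time_boundary, time_boundary + churn_period)."""
    window_end = time_boundary + churn_period
    was_active = any(t < time_boundary for t in event_times)
    active_in_window = any(time_boundary <= t < window_end for t in event_times)
    return was_active and not active_in_window

boundary = datetime.datetime(2016, 4, 1)
period = datetime.timedelta(days=30)

# Hypothetical event timestamps for two users.
loyal_user = [datetime.datetime(2016, 3, 20), datetime.datetime(2016, 4, 10)]
lapsed_user = [datetime.datetime(2016, 3, 5), datetime.datetime(2016, 3, 28)]

loyal_churned = has_churned(loyal_user, boundary, period)    # False: active on April 10
lapsed_churned = has_churned(lapsed_user, boundary, period)  # True: silent after March 28
```

A user with no activity before the boundary is not counted as churned in this sketch, since there is no prior activity to stop.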

In [46]:
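# Time boundary at which churn is evaluated: April 1, 2016.
# Note: despite its name, this variable holds a boundary (a point in time),
# not a churn period (a duration).
churn_period_apr = datetime.datetime(year=2016, month=4, day=1)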


### Making a train-validation split

Next, we perform a **train-validation** split at the user level: one split contains all data for a random `fraction` of the users, while the second split contains all data for the remaining users. No user appears in both splits.

In [47]:
(train, valid) = gl.churn_predictor.random_split(interactions_ts, user_id = 'user_id', fraction = 0.9, seed = 12)

In [48]:
print "Users in the training dataset   : %s" % len(train['user_id'].unique())
print "Users in the validation dataset : %s" % len(valid['user_id'].unique())

Users in the training dataset   : 15512
Users in the validation dataset : 1722
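
The idea behind a user-level split can be sketched in plain Python. This is only an illustration of the concept, not necessarily how `random_split` is implemented; the helper `split_by_user` and the toy data are hypothetical.

```python
import random

def split_by_user(rows, user_key, fraction, seed):
    """Split rows so that every row of a given user lands in the same split."""
    user_ids = sorted(set(row[user_key] for row in rows))
    rng = random.Random(seed)
    rng.shuffle(user_ids)
    n_train = int(round(fraction * len(user_ids)))
    train_users = set(user_ids[:n_train])
    train = [row for row in rows if row[user_key] in train_users]
    valid = [row for row in rows if row[user_key] not in train_users]
    return train, valid

# Hypothetical toy data: 10 users with 3 events each.
rows = [{'user_id': 'u%d' % u, 'event': e} for u in range(10) for e in range(3)]
train_rows, valid_rows = split_by_user(rows, 'user_id', fraction=0.9, seed=12)

train_ids = set(row['user_id'] for row in train_rows)
valid_ids = set(row['user_id'] for row in valid_rows)
```

Splitting by user rather than by row keeps a user's history from leaking between the training and validation sets.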


### Training a churn predictor model

In [65]:
model = gl.churn_predictor.create(
    train,
    user_id='user_id',
    user_data=users,
    churn_period=datetime.timedelta(days=7),
    time_boundaries=[churn_period_apr])

PROGRESS: Grouping observation_data by user.
PROGRESS: Resampling grouped observation_data by time-period 1 day, 0:00:00.


PROGRESS: Generating features at time-boundaries.
PROGRESS: --------------------------------------------------
PROGRESS: Features for 2016-03-31 17:00:00
PROGRESS: Joining user_data with aggregated features.
PROGRESS: --------------------------------------------------
PROGRESS: Training a classifier model.


PROGRESS: --------------------------------------------------
PROGRESS: Model training complete: Next steps
PROGRESS: --------------------------------------------------
PROGRESS: (1) Evaluate the model at various timestamps in the past:
PROGRESS:       metrics = model.evaluate(data, time_in_past)
PROGRESS: (2) Make a churn forecast for a timestamp in the future:
PROGRESS:       predictions = model.predict(data, time_in_future)
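
The progress log mentions two preprocessing steps: grouping the observations by user and resampling them to a time period of 1 day. A rough sketch of what such a step could look like in plain Python follows. This is an assumption for illustration, not the toolkit's implementation; `daily_event_counts` and the sample events are hypothetical.

```python
import datetime
from collections import defaultdict

def daily_event_counts(events):
    """Group (user_id, timestamp) events by user and count them per calendar day."""
    counts = defaultdict(lambda: defaultdict(int))
    for user_id, ts in events:
        counts[user_id][ts.date()] += 1
    # Convert nested defaultdicts to plain dicts for easier inspection.
    return {user: dict(days) for user, days in counts.items()}

# Hypothetical raw events.
events = [
    ('u1', datetime.datetime(2016, 3, 30, 9, 15)),
    ('u1', datetime.datetime(2016, 3, 30, 18, 40)),
    ('u1', datetime.datetime(2016, 3, 31, 7, 5)),
    ('u2', datetime.datetime(2016, 3, 31, 23, 59)),
]
per_user_daily = daily_event_counts(events)
```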


In [66]:
model

Class                          : ChurnPredictor

Schema
------
Number of observations         : 216357
Number of users                : 15512
Number of feature columns      : 26
Features used                  : ['date', 'rev', 'e_purchaseamount', 'e_purchaseprice', 'hasemail', 'e_viptier', 'xrate', 'e_source', 'e_vip_boost', 'e_vip_points', 'e_creditsbeforepurchase', 'e_level', 'e_machine', 'u_playertenure', 'u_fbstatus', 'u_totalcredits', 'credits', 'rn', 'rank', 'txns', 'txns_on_day', 'rank_desc', 'next_event_time', 'previous_event_time', 'last_event_time', 'first_event_time']

Parameters
----------
Lookback periods               : [7, 14, 21, 60, 90]
Number of time boundaries      : 1
Time period                    : 1 day, 0:00:00
Churn period                   : 7 days, 0:00:00
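
The summary lists lookback periods of `[7, 14, 21, 60, 90]`. One plausible reading, assuming these are expressed in units of the 1-day time period, is that features are aggregated over windows of that many days ending at the time boundary. The sketch below counts events per window; the real model may compute different or additional aggregates, and `lookback_counts` is a hypothetical helper.

```python
import datetime

LOOKBACK_DAYS = [7, 14, 21, 60, 90]  # values from the model summary above

def lookback_counts(event_times, time_boundary, lookback_days=LOOKBACK_DAYS):
    """Count events in each window [time_boundary - d days, time_boundary)."""
    counts = {}
    for days in lookback_days:
        window_start = time_boundary - datetime.timedelta(days=days)
        counts[days] = sum(1 for t in event_times
                           if window_start <= t < time_boundary)
    return counts

boundary = datetime.datetime(2016, 4, 1)
# Hypothetical activity: events 3, 10, 40 and 100 days before the boundary.
event_times = [boundary - datetime.timedelta(days=d) for d in (3, 10, 40, 100)]
features = lookback_counts(event_times, boundary)
# features == {7: 1, 14: 2, 21: 2, 60: 3, 90: 3}
```

The event 100 days before the boundary falls outside every window, so it does not contribute to any count.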

## Consuming predictions made by the model

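The question to ask here is whether each user will churn within the `churn_period` that follows the prediction time. To validate a prediction, we can check whether the user was active during that period. Note that churn here means no activity during the period; it should not be confused with an expiration time (customer churn as opposed to usage churn).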

In [67]:
predictions = model.predict(valid, user_data=users)
predictions

PROGRESS: Making a churn forecast for the time window:
PROGRESS: --------------------------------------------------
PROGRESS:  Start : 2016-04-30 23:54:56
PROGRESS:  End   : 2016-05-07 23:54:56
PROGRESS: --------------------------------------------------
PROGRESS: Grouping dataset by user.
PROGRESS: Resampling grouped observation_data by time-period 1 day, 0:00:00.


PROGRESS: Generating features for boundary 2016-04-30 23:54:56.
PROGRESS: Joining user_data with aggregated features.


user_id,probability
0040aec2-4192-3439-b555-b46f6430ec8c ...,0.724074065685
005d7707-7454-3a2f-901e-238608d333a6 ...,0.590564727783
0062ff02-0134-3e82-ac89-85168e08eba9 ...,0.906486093998
0066edd1-9176-36a2-b272-1ee9a17897f0 ...,0.977488160133
006d83eb-98d2-3b19-b5b0-f6cb02f2f3c2 ...,0.421454906464
008b43e1-d02b-32b7-b903-572bfada771b ...,0.977488160133
00be7585-9013-3267-952f-9ba3474c2d56 ...,0.845390200615
00e7f57b-893e-3ff6-bfb9-48b64247e6f1 ...,0.856807112694
00f0ed6c-c637-3a88-a7b3-2a95b41f2884 ...,0.977488160133
01105fe4-66ee-3fea-8f33-15a5bfad88ba ...,0.676837623119


In [69]:
predictions['probability'].show()
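
One common way to consume these probabilities is to rank users and flag those above a cut-off. The sketch below uses plain Python on the ten rows shown in the forecast (user ids abbreviated to their first block); the `0.9` threshold and the helper `at_risk` are hypothetical choices for illustration.

```python
# (user_id prefix, churn probability) pairs copied from the forecast above.
predictions_head = [
    ('0040aec2', 0.724074065685),
    ('005d7707', 0.590564727783),
    ('0062ff02', 0.906486093998),
    ('0066edd1', 0.977488160133),
    ('006d83eb', 0.421454906464),
    ('008b43e1', 0.977488160133),
    ('00be7585', 0.845390200615),
    ('00e7f57b', 0.856807112694),
    ('00f0ed6c', 0.977488160133),
    ('01105fe4', 0.676837623119),
]

def at_risk(rows, threshold):
    """Return rows with probability >= threshold, highest risk first."""
    flagged = [row for row in rows if row[1] >= threshold]
    return sorted(flagged, key=lambda row: row[1], reverse=True)

THRESHOLD = 0.9  # hypothetical cut-off; tune it for your use case
high_risk = at_risk(predictions_head, THRESHOLD)
high_risk_ids = [user for user, _ in high_risk]
# Four of the ten users shown are at or above the threshold.
```

In practice the threshold would depend on the cost of a retention action relative to the cost of losing a user.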

## Evaluating the model

In [70]:
metrics = model.evaluate(valid, user_data=users, time_boundary=churn_period_apr)
metrics

PROGRESS: Making a churn forecast for the time window:
PROGRESS: --------------------------------------------------
PROGRESS:  Start : 2016-04-01 00:00:00
PROGRESS:  End   : 2016-04-08 00:00:00
PROGRESS: --------------------------------------------------
PROGRESS: Grouping dataset by user.
PROGRESS: Resampling grouped observation_data by time-period 1 day, 0:00:00.


PROGRESS: Generating features for boundary 2016-04-01 00:00:00.
PROGRESS: Joining user_data with aggregated features.
PROGRESS: Not enough data to make predictions for 517 user(s). 


{'auc': 0.9148537473964322, 'evaluation_data': Columns:
 	user_id	str
 	probability	float
 	label	int
 
 Rows: 1205
 
 Data:
 +-------------------------------+----------------+-------+
 |            user_id            |  probability   | label |
 +-------------------------------+----------------+-------+
 | 0040aec2-4192-3439-b555-b4... | 0.435962349176 |   0   |
 | 005d7707-7454-3a2f-901e-23... | 0.417634397745 |   1   |
 | 0062ff02-0134-3e82-ac89-85... | 0.906486093998 |   1   |
 | 0066edd1-9176-36a2-b272-1e... | 0.785240352154 |   1   |
 | 006d83eb-98d2-3b19-b5b0-f6... | 0.413374304771 |   1   |
 | 008b43e1-d02b-32b7-b903-57... | 0.977488160133 |   1   |
 | 00be7585-9013-3267-952f-9b... | 0.810217738152 |   1   |
 | 00f0ed6c-c637-3a88-a7b3-2a... | 0.977488160133 |   1   |
 | 01105fe4-66ee-3fea-8f33-15... | 0.553018927574 |   0   |
 | 011451c1-5f54-3cbf-85a1-12... | 0.734498620033 |   1   |
 +-------------------------------+----------------+-------+
 [1205 rows x 3 columns]
 Note: Onl
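
Note that the 1205 evaluated rows are the 1722 validation users minus the 517 users without enough data. To show what the `auc` metric measures, the sketch below computes it by hand on the ten rows printed above: it is the fraction of (churned, not churned) pairs in which the churned user received the higher probability. This is a plain-Python illustration, not the toolkit's code, and the value on ten rows naturally differs from the `auc` reported on all 1205 rows.

```python
def auc(scores, labels):
    """Fraction of (positive, negative) pairs ranked correctly; ties count half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one example of each class")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

# (probability, label) pairs copied from the ten rows printed above.
head = [
    (0.435962349176, 0), (0.417634397745, 1), (0.906486093998, 1),
    (0.785240352154, 1), (0.413374304771, 1), (0.977488160133, 1),
    (0.810217738152, 1), (0.977488160133, 1), (0.553018927574, 0),
    (0.734498620033, 1),
]
head_auc = auc([p for p, _ in head], [y for _, y in head])
# head_auc == 0.75 (12 of the 16 pairs are ranked correctly)
```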

In [71]:
model.save('data/churn_model_rocket_2.mdl')