 #  <p style="text-align: center;">Predicting Customer Lifetime Value</p> 

**Objective:** Use the past purchase history of your customers to build a model that predicts the Customer Lifetime Value (CLV) of new customers from historical spending data.

**Process**

1. Load and clean the data
2. Run a correlation analysis
3. Build the model
4. Run the model on test data and on a new customer

In [3]:
# Load packages (only the ones this notebook actually uses)
import numpy as np
import pandas as pd
import sklearn.metrics
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Import data
raw_data = pd.read_csv("history.csv")

raw_data.dtypes


CUST_ID    int64
MONTH_1    int64
MONTH_2    int64
MONTH_3    int64
MONTH_4    int64
MONTH_5    int64
MONTH_6    int64
CLV        int64
dtype: object

The dataset contains each customer's ID, the amount the customer spent on your website in each of the first six months of their relationship with the business (MONTH_1 to MONTH_6), and their eventual lifetime value (CLV, estimated over roughly three years).
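The "clean data" step in this notebook only drops the ID column. Before modelling, it can be worth checking the raw table for problems a linear model will not catch on its own. The sketch below assumes the column names shown above; `check_history` is a hypothetical helper, and the duplicated, negative and missing values in the sample are made up to exercise each check.

```python
import numpy as np
import pandas as pd

MONTH_COLS = [f"MONTH_{i}" for i in range(1, 7)]

def check_history(df: pd.DataFrame) -> dict:
    """Hypothetical helper: basic sanity checks on the purchase-history table."""
    return {
        # Total number of NaN cells anywhere in the table
        "n_missing_values": int(df.isna().sum().sum()),
        # Customer IDs that appear more than once (later copies only)
        "duplicate_ids": df.loc[df["CUST_ID"].duplicated(), "CUST_ID"].tolist(),
        # Row labels with a negative spend in any month
        "negative_spend_rows": df.index[(df[MONTH_COLS] < 0).any(axis=1)].tolist(),
    }

# First two rows come from the table below; the last two are made-up bad rows
sample = pd.DataFrame(
    [
        [1001, 150, 75, 200, 100, 175, 75, 13125],
        [1002, 25, 50, 150, 200, 175, 200, 9375],
        [1002, 25, 50, 150, 200, 175, 200, 9375],       # duplicated ID (made up)
        [1004, 200, -5, 25, 100, np.nan, 150, 11756],   # negative + missing (made up)
    ],
    columns=["CUST_ID", *MONTH_COLS, "CLV"],
)

report = check_history(sample)
print(report)
```

On the real data this would be called as `check_history(raw_data)`; empty lists and a zero count mean none of these problems were found.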

In [19]:
raw_data

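|     | CUST_ID | MONTH_1 | MONTH_2 | MONTH_3 | MONTH_4 | MONTH_5 | MONTH_6 | CLV |
|----:|--------:|--------:|--------:|--------:|--------:|--------:|--------:|----:|
| 0 | 1001 | 150 | 75 | 200 | 100 | 175 | 75 | 13125 |
| 1 | 1002 | 25 | 50 | 150 | 200 | 175 | 200 | 9375 |
| 2 | 1003 | 75 | 150 | 0 | 25 | 75 | 25 | 5156 |
| 3 | 1004 | 200 | 200 | 25 | 100 | 75 | 150 | 11756 |
| 4 | 1005 | 200 | 200 | 125 | 75 | 175 | 200 | 15525 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 95 | 1096 | 150 | 200 | 25 | 125 | 50 | 75 | 9763 |
| 96 | 1097 | 100 | 100 | 125 | 150 | 100 | 125 | 9625 |
| 97 | 1098 | 100 | 75 | 200 | 200 | 100 | 50 | 9750 |
| 98 | 1099 | 25 | 150 | 150 | 125 | 100 | 175 | 8113 |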


## Do Correlation Analysis

In [9]:
# Remove the customer ID column: it is an identifier, not a predictor of CLV

clean_data = raw_data.drop('CUST_ID', axis=1)

# Pearson correlation of every column with CLV
clean_data.corr()['CLV']

MONTH_1    0.734122
MONTH_2    0.250397
MONTH_3    0.371742
MONTH_4    0.297408
MONTH_5    0.376775
MONTH_6    0.327064
CLV        1.000000
Name: CLV, dtype: float64

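Spending in every month is positively correlated with CLV: month 1 strongly (0.73) and months 2 to 6 moderately (0.25 to 0.38). This suggests that a linear model on the six monthly spend values is worth building to predict CLV.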
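`DataFrame.corr()` uses the Pearson coefficient by default: the covariance of two columns divided by the product of their standard deviations. The sketch below computes it by hand and checks it against pandas. It uses only the nine MONTH_1/CLV pairs visible in the table above, so the value will not match the 0.73 computed on the full dataset; `pearson` is a hypothetical helper name.

```python
import numpy as np
import pandas as pd

def pearson(x, y) -> float:
    """Pearson r: sum of centred products over the root of the centred sums of squares."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    xc, yc = x - x.mean(), y - y.mean()
    return float((xc * yc).sum() / np.sqrt((xc ** 2).sum() * (yc ** 2).sum()))

# MONTH_1 and CLV for the nine customers shown in the table (rows 0-4 and 95-98)
month_1 = [150, 25, 75, 200, 200, 150, 100, 100, 25]
clv = [13125, 9375, 5156, 11756, 15525, 9763, 9625, 9750, 8113]

r_manual = pearson(month_1, clv)
r_pandas = pd.Series(month_1).corr(pd.Series(clv))  # method="pearson" is the default
print(round(r_manual, 4), round(r_pandas, 4))
```

Pearson r only measures linear association, which is also the kind of relationship the linear regression later in this notebook can exploit.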

In [32]:
clean_data

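|     | MONTH_1 | MONTH_2 | MONTH_3 | MONTH_4 | MONTH_5 | MONTH_6 | CLV |
|----:|--------:|--------:|--------:|--------:|--------:|--------:|----:|
| 0 | 150 | 75 | 200 | 100 | 175 | 75 | 13125 |
| 1 | 25 | 50 | 150 | 200 | 175 | 200 | 9375 |
| 2 | 75 | 150 | 0 | 25 | 75 | 25 | 5156 |
| 3 | 200 | 200 | 25 | 100 | 75 | 150 | 11756 |
| 4 | 200 | 200 | 125 | 75 | 175 | 200 | 15525 |
| ... | ... | ... | ... | ... | ... | ... | ... |
| 95 | 150 | 200 | 25 | 125 | 50 | 75 | 9763 |
| 96 | 100 | 100 | 125 | 150 | 100 | 125 | 9625 |
| 97 | 100 | 75 | 200 | 200 | 100 | 50 | 9750 |
| 98 | 25 | 150 | 150 | 125 | 100 | 175 | 8113 |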


## Do Training and Testing Split

Let us split the data into training and testing datasets in the ratio 90:10.
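The split in the cell below does not set `random_state`, so every rerun shuffles the customers differently and produces slightly different coefficients and test scores. A minimal sketch of a repeatable split, assuming synthetic stand-in data of the same shape (100 customers, 6 monthly columns) rather than `history.csv`; the seed value 42 is arbitrary.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the 100-customer table: 6 monthly spend columns + CLV
rng = np.random.default_rng(0)
X = rng.integers(0, 201, size=(100, 6))
y = rng.integers(5000, 16000, size=100)

# Fixing random_state makes the shuffle, and therefore the split, repeatable
a = train_test_split(X, y, test_size=0.1, random_state=42)
b = train_test_split(X, y, test_size=0.1, random_state=42)

X_train, X_test, y_train, y_test = a
same_split = all(np.array_equal(p, q) for p, q in zip(a, b))
print(X_train.shape, X_test.shape)  # (90, 6) (10, 6)
print(same_split)                   # True
```

With `test_size=0.1` and 100 rows, scikit-learn places 10 rows in the test set and 90 in the training set, matching the shapes printed in this notebook.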

In [25]:
# Separate the monthly spend columns (predictors) from the CLV column (target)
predictors = clean_data.drop('CLV', axis=1)
targets = clean_data['CLV']

# Split into training and test sets in a 90:10 ratio and print the shapes to confirm
# (no random_state is set, so each run produces a different split)
pred_train, pred_test, tar_train, tar_test = train_test_split(predictors, targets, test_size=0.1)
print("Predictor - Training :", pred_train.shape, "Predictor - Testing : ", pred_test.shape)


Predictor - Training : (90, 6) Predictor - Testing :  (10, 6)


## Build and Test Model
We fit a linear regression model that predicts CLV from the six monthly spend columns, then check how well it generalizes by predicting CLV for the test dataset and computing the R² score.
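`LinearRegression` fits ordinary least squares: it finds the intercept and coefficients that minimise the sum of squared errors. The sketch below solves the same problem directly with `np.linalg.lstsq` (after adding a column of ones for the intercept) and checks that both give the same answer. The data and the "true" weights are synthetic and made up for illustration; they are not taken from `history.csv`.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(1)
X = rng.integers(0, 201, size=(90, 6)).astype(float)          # synthetic monthly spend
true_coef = np.array([35.0, 11.0, 15.0, 12.0, 8.0, 5.0])      # made-up weights
y = X @ true_coef - 80.0 + rng.normal(0, 50, size=90)         # synthetic CLV with noise

# Ordinary least squares by hand: prepend a column of ones for the intercept
X1 = np.column_stack([np.ones(len(X)), X])
beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
intercept, coef = beta[0], beta[1:]

model = LinearRegression().fit(X, y)
print(np.allclose(coef, model.coef_), np.isclose(intercept, model.intercept_))  # True True
```

Each coefficient is the estimated change in CLV per extra dollar spent in that month, holding the other months fixed.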

In [29]:
# Build the model on the training data
model = LinearRegression()
model.fit(pred_train, tar_train)
print("Coefficients: \n", model.coef_)
print("Intercept:", model.intercept_)

# Test on the testing data: predict CLV for the held-out customers and score with R^2
predictions = model.predict(pred_test)
sklearn.metrics.r2_score(tar_test, predictions)

Coefficients: 
 [34.67111066 11.0386445  15.51927571 11.74383426  7.79659343  5.26242818]
Intercept: -81.0331764100556


0.9844958145826597

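The R² score on the test set is about 0.98, meaning the model explains roughly 98% of the variance in CLV for the held-out customers. R² is not the same as accuracy, and with only 10 test customers and an unseeded split the figure will vary between runs, but it indicates a strong linear fit for predicting CLV.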
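R² is defined as 1 − SS_res / SS_tot, the share of the target's variance around its mean that the predictions account for. The sketch below computes it by hand, checks it against `r2_score`, and then uses 10-fold cross-validation, a swapped-in technique not used in this notebook, to get ten R² scores instead of one from a single 10-customer test set. All data here is synthetic; `r_squared` is a hypothetical helper name.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import cross_val_score

def r_squared(y_true, y_pred) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = ((y_true - y_pred) ** 2).sum()
    ss_tot = ((y_true - y_true.mean()) ** 2).sum()
    return float(1.0 - ss_res / ss_tot)

# Synthetic stand-in data (not the notebook's history.csv)
rng = np.random.default_rng(2)
X = rng.integers(0, 201, size=(100, 6)).astype(float)
y = X @ np.array([35.0, 11.0, 15.0, 12.0, 8.0, 5.0]) + rng.normal(0, 300, size=100)

# Single 90:10 split, scored by hand and by scikit-learn
model = LinearRegression().fit(X[:90], y[:90])
pred = model.predict(X[90:])
print(r_squared(y[90:], pred), r2_score(y[90:], pred))

# 10-fold cross-validation: each fold holds out 10 rows, giving 10 R^2 scores
scores = cross_val_score(LinearRegression(), X, y, cv=10, scoring="r2")
print(scores.mean(), scores.std())
```

The spread of the ten fold scores gives a rough idea of how much a single test-set R² can move depending on which customers happen to be held out.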

## Predicting for a new Customer
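Suppose a new customer has spent $100, $0 and $50 on your website in their first three months. Because the model expects six months of spending, months 4 to 6 are filled with 0, which treats them as months with no spending rather than as unknown. Let us use the model to predict this customer's CLV.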

In [31]:
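# One row with the same column names as the training data (months 4-6 set to 0);
# matching the names avoids scikit-learn's missing-feature-names warning
new_data = pd.DataFrame([[100, 0, 50, 0, 0, 0]], columns=predictors.columns)
new_pred = model.predict(new_data)
print("The CLV for the new customer is : $", new_pred[0])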

The CLV for the new customer is : $ 4162.0416755046535
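A linear regression prediction is just the intercept plus the dot product of the coefficients with the input. The sketch below reproduces the figure above from the coefficients and intercept printed earlier in this notebook (the coefficients as printed are rounded, and a rerun with a different split will produce different values). Since months 2 and 4 to 6 are 0, only months 1 and 3 contribute.

```python
import numpy as np

# Coefficients and intercept as printed by the run above (MONTH_1 ... MONTH_6)
coef = np.array([34.67111066, 11.0386445, 15.51927571, 11.74383426, 7.79659343, 5.26242818])
intercept = -81.0331764100556

new_customer = np.array([100, 0, 50, 0, 0, 0])  # months 4 to 6 padded with 0
clv = intercept + new_customer @ coef           # -81.03 + 34.67*100 + 15.52*50
print(round(clv, 2))  # 4162.04
```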
