# KKBox Customer Lifetime Value Analysis

---

# Part I: <font color=green>*Extraction, Transformation, and Loading*</font>

---

In [None]:
# General Imports
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
import seaborn as sns
from scipy import stats
import datetime 

## Import and Prep Data

In [None]:
# Import Transaction Files and concat all five into one DataFrame
transactions = pd.concat(
    [pd.read_csv(f'D:/J-5 Local/KKBox Sources/transaction{i}.csv') for i in range(5)]
)

# Import Churn Files
DRV_Feb2016 = pd.read_csv('D:/J-5 Local/DRV_Feb2016_With_Cluster')

In [None]:
# Convert Date columns into DateTime Object
transactions['transaction_date'] = pd.to_datetime(transactions['transaction_date'])
transactions['membership_expire_date'] = pd.to_datetime(transactions['membership_expire_date'])
DRV_Feb2016['membership_expire_date'] = pd.to_datetime(DRV_Feb2016['membership_expire_date'])
DRV_Feb2016['registration_init_time'] = pd.to_datetime(DRV_Feb2016['registration_init_time'])
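The conversions above leave pandas to infer the date format. As a minimal sketch, assuming the raw columns hold `YYYYMMDD` integers such as `20160228` (an assumption about the CSV contents, not something confirmed in this notebook), an explicit `format` keeps the integers from being read as nanosecond offsets from 1970. The toy column below is hypothetical.

```python
import pandas as pd

# Hypothetical toy data: dates stored as YYYYMMDD integers (assumed layout)
raw = pd.DataFrame({'transaction_date': [20150101, 20160228]})

# Without a format, integers are interpreted as nanoseconds since the epoch
naive = pd.to_datetime(raw['transaction_date'])

# With an explicit format, the digits are parsed as year, month, day
parsed = pd.to_datetime(raw['transaction_date'].astype(str), format='%Y%m%d')

print(naive.dt.year.tolist())   # years from the naive parse
print(parsed.dt.year.tolist())  # years from the explicit parse
```

If the columns are already ISO-formatted strings, the plain `pd.to_datetime` call is sufficient and this step is unnecessary.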

As this is the third project with this dataset, we will only explore the data with respect to the use case of Survival Analysis and Customer Lifetime Value. Please refer to the previous projects if you wish to know more about the dataset as a whole.

The goal of this section is to prepare and format the dataset for our Survival Analysis.

### <font color=purple>Filter DF With Necessary Columns</font>

In [None]:
# Calculate Average Price Paid (from KKBOX)
DRV_Feb2016['avg_paid_per_day'] = (9.50/30)

In [None]:
# Create Master Dataset
DRV_Feb2016 = DRV_Feb2016[['msno','payment_plan_days','registration_init_time','membership_expire_date','avg_paid_per_day','registered_via','city_agg','Cluster','is_churn']]

### <font color=purple>Inspect Payment Plan Days</font>

In contractual settings, how often a customer pays is critical in determining lifetime value. Let's look at our current use case to see which payment plan periods our users are on. As this project is taking place at the same time as our Churn and initial Customer Segmentation projects, we will be observing all transactions dated before February 28th 2016.

In [None]:
# Payment plan days distribution
transactions[transactions['transaction_date'] < datetime.datetime(2016,2,28)]['payment_plan_days'].value_counts().head(10)

Before 2016, KKBox switched from 31-day payments to 30-day payments. For simplicity, we will combine these values by treating 31-day plans as 30-day plans.

In [None]:
# Keep only transactions dated before February 28th 2016
# (.copy() avoids a SettingWithCopyWarning on the assignment below)
transactions = transactions[transactions['transaction_date'] < datetime.datetime(2016,2,28)].copy()

# Convert 31 to 30
transactions['payment_plan_days'] = transactions['payment_plan_days'].replace(31, 30)
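The cutoff filter and the 31-to-30 conversion can be checked on a small frame before running them on the full transaction history. The sketch below uses invented rows and `Series.replace` as one way to express the conversion; the frame `toy` and its values are hypothetical.

```python
import datetime
import pandas as pd

# Hypothetical toy transactions (values invented for illustration)
toy = pd.DataFrame({
    'msno': ['a', 'a', 'b', 'c', 'd'],
    'payment_plan_days': [31, 30, 31, 7, 30],
    'transaction_date': pd.to_datetime(
        ['2015-03-01', '2015-12-01', '2016-01-15', '2016-01-20', '2016-03-05']),
})

# Keep rows dated before the cutoff; .copy() avoids SettingWithCopyWarning
cutoff = datetime.datetime(2016, 2, 28)
toy = toy[toy['transaction_date'] < cutoff].copy()

# Fold 31-day plans into 30-day plans; other plan lengths are unchanged
toy['payment_plan_days'] = toy['payment_plan_days'].replace(31, 30)

print(toy['payment_plan_days'].value_counts())
```

The row for user `d` falls after the cutoff and is dropped, while the 7-day plan passes through untouched.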

## Feature Engineering

### <font color=purple>*Do all users have a single unique Payment Plan Period?*</font>

Next we want to determine whether each user has kept a single recurring payment plan period throughout their lifetime. Aside from comparing unique payment plan periods to each other, it would also be interesting to determine whether users who have had multiple payment plans have a higher LTV than those who have not.

In [None]:
# Members vs # of Unique payment_plan_days
temp = transactions.groupby('msno')['payment_plan_days'].nunique().reset_index()
temp['payment_plan_days'].value_counts()

Here we see that some users do not have an exclusive payment plan and have switched from plan to plan over their lifetime. To keep the analysis accurate, we will segment users with a single plan separately from users with multiple plans. Let's add these values as a new feature.

In [None]:
# Add unique_payment_plan_days to Master DF
temp.columns = ['msno', 'unique_payment_plans']
DRV_Feb2016 = pd.merge(DRV_Feb2016, temp, on='msno', how='inner')
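Because the merge above uses `how='inner'`, any user in the master frame without a matching transaction row is silently dropped. A minimal sketch on invented data shows the per-user `nunique` count and one way to see which users an inner merge would lose, using the `indicator` option of `DataFrame.merge`. The frames `tx` and `master` and their values are hypothetical.

```python
import pandas as pd

# Hypothetical toy data (identifiers and values invented for illustration)
tx = pd.DataFrame({
    'msno': ['a', 'a', 'b', 'b', 'c'],
    'payment_plan_days': [30, 30, 30, 90, 7],
})
master = pd.DataFrame({'msno': ['a', 'b', 'd'], 'is_churn': [0, 1, 0]})

# Count distinct plan lengths per user
plans = (tx.groupby('msno')['payment_plan_days']
           .nunique()
           .reset_index(name='unique_payment_plans'))

# A left merge with indicator=True shows which master rows have no transactions
check = master.merge(plans, on='msno', how='left', indicator=True)
unmatched = check.loc[check['_merge'] == 'left_only', 'msno'].tolist()

# The inner merge keeps only users present in both frames
merged = master.merge(plans, on='msno', how='inner')
print(unmatched)
print(merged)
```

Comparing row counts before and after the real merge gives the same information at lower cost when only the number of dropped users matters.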

### <font color=purple>*Calculate Tenure*</font>

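Now we will calculate membership tenure. Our dataset runs from January 1st 2015 to February 28th 2016. The cell below computes tenure in days as ***Membership Expire Date - Registration Init Time***; each user's earliest transaction date is also merged in, but it is not used in the calculation.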

In [None]:
# Earliest transaction date per user, merged into the Master DF
temp = transactions.groupby('msno')['transaction_date'].min().reset_index()
DRV_Feb2016 = pd.merge(DRV_Feb2016, temp, on='msno')

# Calculate tenure in days: membership expiry minus registration date
DRV_Feb2016['tenure'] = (DRV_Feb2016['membership_expire_date'] - DRV_Feb2016['registration_init_time']).dt.days
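The tenure line in the cell above subtracts two datetime columns and takes `.dt.days`. A minimal sketch on invented dates shows what that expression produces; the frame `toy` and its values are hypothetical.

```python
import pandas as pd

# Hypothetical toy master frame (dates invented for illustration)
toy = pd.DataFrame({
    'msno': ['a', 'b'],
    'registration_init_time': pd.to_datetime(['2015-01-01', '2015-06-15']),
    'membership_expire_date': pd.to_datetime(['2015-01-31', '2016-02-28']),
})

# Subtracting two datetime columns yields a Timedelta column;
# .dt.days extracts whole days as integers
toy['tenure'] = (toy['membership_expire_date']
                 - toy['registration_init_time']).dt.days

print(toy[['msno', 'tenure']])
```

A quick check such as `(toy['tenure'] < 0).sum()` on the real frame would flag any rows where the expiry date precedes registration.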

In [None]:
DRV_Feb2016 = DRV_Feb2016.drop(['membership_expire_date','transaction_date'], axis=1)

## Export Data

In [None]:
DRV_Feb2016.to_csv('D:/J-5 Local/CLV Data/CLV_Feb2016.csv')