 #  <p style="text-align: center;">Discovering customer attrition patterns</p> 

In this example, we analyze customer attrition data to discover patterns. These patterns give us starting points for deeper investigation and root cause analysis of why customers leave. We use association rule mining for this purpose.

## Load the Dataset and Transform

We first load the data and view it.


In [7]:
from pandas import Series, DataFrame
import pandas as pd
import numpy as np
import os
import matplotlib.pylab as plt
from apyori import apriori

#Load the warranty contract attrition dataset
raw_data = pd.read_csv("Data-06-05-warranty-contract-attrition.csv")

raw_data.head()

Unnamed: 0,LIFETIME,TYPE,REASON,AGE_GROUP,EMP_STATUS,MARITAL_STATUS,RENEWALS,PROBLEMS,OFFERS
0,1 - 3 M,CANCEL,BETTER DEALS,< 20,STUDENT,SINGLE,0,0 to 5,0 to 2
1,1 - 3 M,CANCEL,BETTER DEALS,< 20,STUDENT,SINGLE,0,0 to 5,0 to 2
2,1Y - 2Y,CANCEL,NOT HAPPY,30 - 50,EMPLOYED,MARRIED,1,10 plus,0 to 2
3,1Y - 2Y,EXPIRY,BETTER DEALS,30 - 50,EMPLOYED,MARRIED,1,0 to 5,2 to 5
4,1Y - 2Y,CANCEL,NOT HAPPY,30 - 50,UNEMPLOYED,SINGLE,1,10 plus,0 to 2


The CSV contains information about each customer who has left the business. It contains attributes such as the LIFETIME of the customer, how the customer left (TYPE), the REASON, the number of PROBLEMS, and demographics.

Association rule mining needs the data in a specific format: each line should be a transaction with a list of items for that transaction. We convert each value in the CSV data into an item of the form "name=value" to create this data structure.

In [8]:


basket_lines = []
for rowNum, row in raw_data.iterrows():
    #Add the rowid as the first column
    fields = [str(rowNum)]
    #Add each column as a quoted "name=value" item
    #(Series.items() replaces iteritems(), which was removed in pandas 2.0)
    for colName, col in row.items():
        fields.append("\"" + colName + "=" + str(col) + "\"")
    basket_lines.append(",".join(fields))

#Write one transaction per line; the with block closes the file
with open("warranty_basket.csv", "w") as basket_file:
    basket_file.write("\n".join(basket_lines))
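
Writing the basket to disk and reading it back is one way to obtain the transactions. As a minimal alternative sketch, assuming the records are available as plain dictionaries (for instance from `raw_data.to_dict("records")`), the same "name=value" items can be built in memory. The helper name `row_to_items` is hypothetical, and the two sample records are copied, with only four of their columns, from the `head()` output above.

```python
def row_to_items(row):
    """Turn one record (a dict of column -> value) into a list of 'name=value' items."""
    return [f"{name}={value}" for name, value in row.items()]

# Two sample records copied from the head() output above (four columns only).
sample_records = [
    {"LIFETIME": "1 - 3 M", "TYPE": "CANCEL", "REASON": "BETTER DEALS", "AGE_GROUP": "< 20"},
    {"LIFETIME": "1Y - 2Y", "TYPE": "CANCEL", "REASON": "NOT HAPPY", "AGE_GROUP": "30 - 50"},
]

# One list of items per customer, i.e. one transaction per row.
transactions = [row_to_items(record) for record in sample_records]
for transaction in transactions:
    print(transaction)
```

A list of lists such as `transactions` has the same shape as the data this notebook passes to `apriori`: one entry per transaction, each holding that transaction's items.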


## Build Association Rules

We now use the apriori algorithm to build association rules. We then extract the results and populate a data frame for later use. For each frequent itemset, apriori returns rules for the different ways of splitting the items into an LHS and an RHS. For each rule, we capture the number of items in the LHS (COUNT) along with the confidence and lift.

In [9]:
#read back
basket_data=pd.read_csv("warranty_basket.csv",header=None)
filt_data = basket_data.drop(basket_data.columns[[0]], axis=1)
results= list(apriori(filt_data.values))

rulesList= pd.DataFrame(columns=('LHS', 'RHS', 'COUNT', 'CONFIDENCE','LIFT'))
rowCount=0

#Convert results into a Data Frame, one row per rule
for row in results:
    #row[2] holds the ordered statistics (the rules) for this itemset
    for affinity in row[2]:
        #COUNT is the number of items in the LHS
        rulesList.loc[rowCount] = [', '.join(affinity.items_base),
                                   affinity.items_add,
                                   len(affinity.items_base),
                                   affinity.confidence,
                                   affinity.lift]
        rowCount += 1
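
To make the confidence and lift columns concrete, here is a minimal sketch that computes them by hand using the standard definitions: the support of an itemset is the fraction of transactions containing it, the confidence of LHS -> RHS is support(LHS and RHS) / support(LHS), and lift is confidence / support(RHS). The function names and the toy transactions are invented for illustration; this is not how the apyori library is implemented.

```python
def support(transactions, items):
    """Fraction of transactions that contain every item in `items`."""
    items = set(items)
    return sum(items <= set(t) for t in transactions) / len(transactions)

def rule_metrics(transactions, lhs, rhs):
    """Return (confidence, lift) for the rule lhs -> rhs."""
    support_lhs = support(transactions, lhs)
    support_rhs = support(transactions, rhs)
    support_both = support(transactions, set(lhs) | set(rhs))
    confidence = support_both / support_lhs
    lift = confidence / support_rhs
    return confidence, lift

# Toy transactions, invented for illustration only.
toy = [
    ["AGE=YOUNG", "TYPE=CANCEL"],
    ["AGE=YOUNG", "TYPE=CANCEL"],
    ["AGE=OLD", "TYPE=CANCEL"],
    ["AGE=OLD", "TYPE=EXPIRY"],
]

confidence, lift = rule_metrics(toy, ["AGE=YOUNG"], ["TYPE=CANCEL"])
print(confidence, lift)

# With an empty LHS every transaction matches, so confidence equals the
# support of the RHS and lift is exactly 1.
print(rule_metrics(toy, [], ["TYPE=CANCEL"]))
```

A lift above 1 means the LHS and RHS occur together more often than they would if they were independent. A rule with an empty LHS always has a lift of 1, because its confidence is simply the support of the RHS.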


## Using the Rules

We can take a look at the first few rules by calling head().

In [10]:
rulesList.head()

Unnamed: 0,LHS,RHS,COUNT,CONFIDENCE,LIFT
0,,(AGE_GROUP=20 - 30),0,0.34,1.0
1,,(AGE_GROUP=30 - 50),0,0.32,1.0
2,,(AGE_GROUP=50PLUS ),0,0.16,1.0
3,,(AGE_GROUP=< 20),0,0.18,1.0
4,,(EMP_STATUS=EMPLOYED),0,0.54,1.0


We can also filter for rules where the LHS has at most one item and the confidence is greater than 70%.

In [11]:
rulesList[(rulesList.COUNT <= 1) & (rulesList.CONFIDENCE > 0.7)].head(5)

Unnamed: 0,LHS,RHS,COUNT,CONFIDENCE,LIFT
38,LIFETIME=3M to 1Y,(AGE_GROUP=20 - 30),1,1.0,2.941176
70,AGE_GROUP=20 - 30,(TYPE=CANCEL),1,0.941176,1.568627
79,AGE_GROUP=30 - 50,(LIFETIME=1Y - 2Y),1,1.0,3.125
80,LIFETIME=1Y - 2Y,(AGE_GROUP=30 - 50),1,1.0,3.125
83,MARITAL_STATUS=MARRIED,(AGE_GROUP=30 - 50),1,0.833333,2.604167


Looking at the rules, we can easily see some patterns. Customers who left the business between 3 months and 1 year are always in the age group 20-30 (confidence 1.0). Similarly, customers in the age group 20-30 cancelled the service about 94% of the time (confidence 0.94). These are interesting facts that the business can analyze further.
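
One way to take such a pattern further is to check it directly against the raw records. A minimal sketch, assuming only the five rows shown by `head()` earlier (so the shares describe that small sample, not the full dataset), cross-tabulates AGE_GROUP against TYPE. The variable names `sample` and `share` are illustrative.

```python
import pandas as pd

# The five rows shown by raw_data.head() earlier, restricted to two columns.
sample = pd.DataFrame({
    "AGE_GROUP": ["< 20", "< 20", "30 - 50", "30 - 50", "30 - 50"],
    "TYPE": ["CANCEL", "CANCEL", "CANCEL", "EXPIRY", "CANCEL"],
})

# Row-normalized cross-tabulation: each row sums to 1, so a cell is the
# share of that age group that left in the given way.
share = pd.crosstab(sample["AGE_GROUP"], sample["TYPE"], normalize="index")
print(share)
```

Each cell corresponds to the confidence of a single-item rule AGE_GROUP -> TYPE, so running the same cross-tabulation on the full `raw_data` frame gives a quick sanity check on the mined rules.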