# Introduction
For subscription-based businesses, customer churn happens when a customer unsubscribes. In the section below, we identify the churned customers and set a promotion strategy. Here, a customer counts as churned if their last transaction falls before a cutoff date (16 October 2019).

In [2]:
import pandas as pd
import datetime as dt

In [4]:
data = pd.read_csv("/Users/emma/Downloads/rfm_xmas19.txt", parse_dates=["trans_date"])
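# Last transaction date per customer
group_by_customer = data.groupby("customer_id")
last_transaction = group_by_customer["trans_date"].max()
best_churn = pd.DataFrame(last_transaction)
# Cutoff: 16 October 2019 (ISO format avoids day/month ambiguity)
cutoff_day = pd.to_datetime('2019-10-16')
# Flag customers whose last transaction predates the cutoff as churned
best_churn['churned']=0
best_churn.loc[best_churn['trans_date']<cutoff_day,'churned']=1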
best_churn.head(10)

customer_id,trans_date,churned
FM1112,2019-10-14,1
FM1113,2019-11-09,0
FM1114,2019-11-12,0
FM1115,2019-12-05,0
FM1116,2019-05-25,1
FM1117,2019-04-02,1
FM1118,2019-12-14,0
FM1119,2019-12-05,0
FM1120,2019-12-06,0
FM1121,2019-11-03,0
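
The churn flag can also be derived with a single vectorized comparison instead of a default value followed by a `.loc` assignment. The following is a minimal sketch on a made-up frame: the customer IDs and dates are illustrative and not taken from the dataset, while the column names and cutoff mirror the cell above.

```python
import pandas as pd

# Toy transactions; IDs and dates are made up for illustration
toy = pd.DataFrame({
    "customer_id": ["A", "A", "B", "C"],
    "trans_date": pd.to_datetime(
        ["2019-09-01", "2019-10-14", "2019-11-09", "2019-05-25"]
    ),
})
cutoff_day = pd.Timestamp("2019-10-16")

# Last transaction date per customer
last_transaction = toy.groupby("customer_id")["trans_date"].max()
# Boolean comparison cast to 0/1 gives the churn flag directly
churned = (last_transaction < cutoff_day).astype(int)
print(churned.to_dict())  # {'A': 1, 'B': 0, 'C': 1}
```

Customer A's latest purchase (14 October) falls before the cutoff, so A is flagged even though it has two transactions.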


In the following section, we focus on finding the best customers. This is a two-part problem: (1) find a ranking mechanism, and (2) determine a threshold that identifies the best customers. For the ranking, we use a weighted sum model with two criteria: the amount spent and the number of purchases made. Both criteria carry the same weight, so the score is (1/2 × number of purchases) + (1/2 × amount spent). Because the two criteria are on different scales, we first apply min-max feature scaling, which maps each one to the range 0 to 1 so that they can be compared in a meaningful way.
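
To make the scoring concrete, here is a small sketch of min-max scaling and the equal-weight sum on made-up numbers. The helper name `min_max` and the toy values are assumptions for illustration; the column names and the 0 to 100 scaling follow the scoring cell in this notebook.

```python
import pandas as pd

def min_max(series: pd.Series) -> pd.Series:
    """Rescale a numeric series linearly to the range [0, 1]."""
    return (series - series.min()) / (series.max() - series.min())

# Toy values, made up for illustration
toy = pd.DataFrame(
    {"nr_of_transactions": [10, 20, 30], "amount_spent": [500, 2500, 1500]},
    index=["A", "B", "C"],
)

# Equal weights of 0.5 each, multiplied by 100 for a 0-100 score
toy["score"] = 100 * (
    0.5 * min_max(toy["nr_of_transactions"])
    + 0.5 * min_max(toy["amount_spent"])
)
print(toy["score"].tolist())  # [0.0, 75.0, 75.0]
```

Customers B and C tie at 75 even though one buys more often and the other spends more, which is what equal weighting implies. Note that this helper divides by zero if every value in a column is identical.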

In [5]:
best_churn["nr_of_transactions"] = group_by_customer.size()
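# Sum only the amount column; summing the whole group would also include trans_date
best_churn["amount_spent"] = group_by_customer["tran_amount"].sum()
# Positional axis arguments are no longer accepted by recent pandas versions
best_churn = best_churn.drop(columns='trans_date')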
best_churn.head()

customer_id,churned,nr_of_transactions,amount_spent
FM1112,1,15,1012
FM1113,0,20,1490
FM1114,0,19,1432
FM1115,0,22,1659
FM1116,1,13,857


In [7]:
#min-max feature scaling
best_churn['scaled_tran']=(best_churn['nr_of_transactions']-best_churn['nr_of_transactions'].min()) / (best_churn['nr_of_transactions'].max() - best_churn['nr_of_transactions'].min())
best_churn['scaled_amount']=(best_churn['amount_spent']-best_churn['amount_spent'].min()) / (best_churn['amount_spent'].max() - best_churn['amount_spent'].min())
best_churn['score']=100*(0.5*best_churn['scaled_tran']+0.5*best_churn['scaled_amount'])
best_churn = best_churn.sort_values(by='score',ascending=False)
best_churn.head()

customer_id,churned,nr_of_transactions,amount_spent,scaled_tran,scaled_amount,score
FM4424,0,39,2933,1.0,1.0,100.0
FM4320,1,38,2647,0.971429,0.89727,93.434934
FM3799,1,36,2513,0.914286,0.849138,88.171182
FM5109,0,35,2506,0.885714,0.846624,86.616892
FM3805,1,35,2453,0.885714,0.827586,85.665025


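In the following section, we need to decide on a threshold to determine which customers are "the best." Should it be the first 20 customers? The first 40 customers? Here we derive the threshold from the budget: with 1000 to spend and a coupon worth 30% of the average transaction amount, we compute how many customers we can afford to target.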



In [13]:
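# Coupon value: 30% of the average transaction amount
coupon = data['tran_amount'].mean()*0.3
# Number of coupons that 1000 covers at that value
nr_of_customers = 1000/coupon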
print(coupon)
print(nr_of_customers)

19.4975736
51.28843314123969


Based on the result above, we round the coupon value to $20 and send the coupon to the top 50 churned customers.
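
The arithmetic behind the figure of 50 can be checked directly. This is a short sketch that assumes the 1000 in the cell above is the total promotion budget; the names `budget` and `coupon_value` are hypothetical and do not appear in the notebook.

```python
budget = 1000       # assumed total budget, taken from 1000/coupon in the cell above
coupon_value = 20   # coupon value rounded up from about 19.50

# Whole number of coupons the budget covers at the rounded value
nr_of_coupons = budget // coupon_value
print(nr_of_coupons)  # 50
```

Rounding the coupon up from about 19.50 to 20 lowers the affordable count from roughly 51 to exactly 50.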

In [14]:
top_50_churned = best_churn[best_churn['churned']==1].head(50)
top_50_churned

customer_id,churned,nr_of_transactions,amount_spent,scaled_tran,scaled_amount,score
FM4320,1,38,2647,0.971429,0.89727,93.434934
FM3799,1,36,2513,0.914286,0.849138,88.171182
FM3805,1,35,2453,0.885714,0.827586,85.665025
FM5752,1,33,2612,0.828571,0.884698,85.663485
FM4074,1,34,2462,0.857143,0.830819,84.398091
FM1215,1,35,2362,0.885714,0.794899,84.030686
FM2620,1,35,2360,0.885714,0.794181,83.994766
FM1580,1,33,2329,0.828571,0.783046,80.58087
FM2951,1,32,2382,0.8,0.802083,80.104167
FM3163,1,31,2413,0.771429,0.813218,79.232348
