**What is the RFM model?**

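The RFM model is a widely used customer relationship analysis model that segments customers mainly by their behavior. The three letters stand for: R (Recency), the time elapsed since the customer's most recent purchase; F (Frequency), how often the customer buys, i.e. the number of purchases; and M (Monetary), the amount the customer has spent.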
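To make the three definitions concrete, here is a minimal sketch that derives raw R, F and M values from a toy transaction table. The table, its column names (`customer`, `when`, `amount`) and the reference date `now` are assumptions made up for illustration; they are not part of the case study data.

```python
import pandas as pd

# Hypothetical toy transactions; names and values are illustrative only
tx = pd.DataFrame({
    "customer": ["a", "a", "b", "b", "b"],
    "when": pd.to_datetime(
        ["2010-01-01", "2010-03-01", "2010-02-01", "2010-02-15", "2010-02-20"]
    ),
    "amount": [100.0, 50.0, 20.0, 30.0, 10.0],
})
now = pd.Timestamp("2010-04-01")  # assumed reference date for recency

rfm = tx.groupby("customer").agg(
    last_purchase=("when", "max"),   # most recent purchase per customer
    frequency=("when", "count"),     # F: number of purchases
    monetary=("amount", "sum"),      # M: total amount spent
)
# R: whole days elapsed since the most recent purchase
rfm["recency_days"] = (now - rfm["last_purchase"]).dt.days
print(rfm[["recency_days", "frequency", "monetary"]])
```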

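**How is the RFM model used for prediction?**

Splitting each of the three metrics into a high and a low level divides customers into 2 × 2 × 2 = 8 groups across the three dimensions, as follows: ![RFM-customers](./images/RFM-customers.jpg)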
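The eight groups are simply every combination of a binary flag on each dimension. A small sketch, assuming 0 stands for "low" and 1 for "high" (an encoding chosen here for illustration), enumerates them:

```python
from itertools import product

# Each dimension is reduced to a binary flag: 0 = low, 1 = high (assumed encoding)
dimensions = ("R", "F", "M")
segments = list(product((0, 1), repeat=len(dimensions)))

for seg in segments:
    print(dict(zip(dimensions, seg)))
```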

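**Case background**

The data is a set of customer transactions containing the transaction ID, customer ID, transaction time, transaction amount, and transaction type. The merchant wants to use it to segment customers and inform decisions about promotional campaigns.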

In [1]:
import pandas as pd
import numpy as np

In [101]:
trad_flow = pd.read_csv('./dataset/RFM_TRAD_FLOW.csv',encoding='gbk')
trad_flow.head()

,transID,cumid,time,amount,type_label,type
0,9407,10001,14JUN09:17:58:34,199.0,正常,Normal
1,9625,10001,16JUN09:15:09:13,369.0,正常,Normal
2,11837,10001,01JUL09:14:50:36,369.0,正常,Normal
3,26629,10001,14DEC09:18:05:32,359.0,正常,Normal
4,30850,10001,12APR10:13:02:20,399.0,正常,Normal


In [102]:
trad_flow.shape  # 26,662 transaction records in total

(26662, 6)

In [103]:
trad_flow.cumid.nunique()  # 1,200 distinct customer IDs

1200

In [104]:
from datetime import datetime
trad_flow['new_time'] = trad_flow.time.apply(lambda x:datetime.strptime(x,'%d%b%y:%H:%M:%S'))
trad_flow.head()

,transID,cumid,time,amount,type_label,type,new_time
0,9407,10001,14JUN09:17:58:34,199.0,正常,Normal,2009-06-14 17:58:34
1,9625,10001,16JUN09:15:09:13,369.0,正常,Normal,2009-06-16 15:09:13
2,11837,10001,01JUL09:14:50:36,369.0,正常,Normal,2009-07-01 14:50:36
3,26629,10001,14DEC09:18:05:32,359.0,正常,Normal,2009-12-14 18:05:32
4,30850,10001,12APR10:13:02:20,399.0,正常,Normal,2010-04-12 13:02:20
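As an alternative to the row-by-row `datetime.strptime` call, the same format string can be handed to `pd.to_datetime`, which parses a whole column at once. A minimal sketch on two sample strings from the table above, assuming an English locale for the `%b` month abbreviations:

```python
import pandas as pd

# Two raw time strings copied from the sample rows above
raw = pd.Series(["14JUN09:17:58:34", "12APR10:13:02:20"])

# Same format string as the strptime call, applied to the whole Series
parsed = pd.to_datetime(raw, format="%d%b%y:%H:%M:%S")
print(parsed)
```

On the full data this would read `pd.to_datetime(trad_flow.time, format='%d%b%y:%H:%M:%S')`.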


The earliest transaction is on 2009-05-14 and the latest on 2010-09-25, a span of roughly 500 days.

In [69]:
print(trad_flow.new_time.min())
print(trad_flow.new_time.max())
print(trad_flow.new_time.max()-trad_flow.new_time.min())

2009-05-14 17:20:38
2010-09-25 21:17:30
499 days 03:56:52


Build the R feature: take each customer's most recent transaction time and convert it to a Unix timestamp.

In [106]:
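import time
# Latest transaction time per customer
r_trans = trad_flow[['cumid','new_time']].groupby(['cumid']).max()
# Convert to Unix timestamps in seconds; time.mktime interprets the naive datetime in the local time zone
r_trans.new_time = r_trans.new_time.apply(lambda x:time.mktime(x.timetuple()))
r_trans.head()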

cumid,new_time
10001,1284699000.0
10002,1278129000.0
10003,1282983000.0
10004,1283057000.0
10005,1282127000.0
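The timestamps above grow with how recently a customer bought, and because `time.mktime` uses the local time zone their absolute values can differ between machines. A possible alternative, sketched here on made-up last-purchase times, is to express recency as whole days before a reference point. Using the latest transaction in the data as that reference is an assumption of this sketch, not something the notebook does.

```python
import pandas as pd

# Hypothetical last-purchase times per customer (toy values, not from the dataset)
last_purchase = pd.Series(
    pd.to_datetime(
        ["2010-09-17 04:50:00", "2010-07-03 03:50:00", "2010-09-25 21:17:30"]
    ),
    index=pd.Index([1, 2, 3], name="cumid"),
)

snapshot = last_purchase.max()  # assumed reference point: latest transaction seen
recency_days = (snapshot - last_purchase).dt.days  # smaller = more recent
print(recency_days)
```

With this encoding a small value means an active customer, so the direction of any later threshold comparison would have to be flipped.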


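Build the F feature: the share of a customer's past purchases (Normal plus Special_offer) that were special-offer items characterizes the customer's preference for discounted products.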

In [107]:
# Count the transactions of each type for each customer
f_trans = trad_flow[['cumid','type','transID']].groupby(['cumid','type']).count()
f_trans = pd.pivot_table(f_trans,index='cumid',columns='type',values='transID')
f_trans.fillna(0,inplace=True)
f_trans['interest'] = f_trans.Special_offer/(f_trans.Normal+f_trans.Special_offer)
f_trans.head()

cumid,Normal,Presented,Special_offer,returned_goods,interest
10001,15.0,8.0,2.0,2.0,0.117647
10002,12.0,5.0,0.0,1.0,0.0
10003,15.0,8.0,1.0,1.0,0.0625
10004,15.0,12.0,2.0,1.0,0.117647
10005,8.0,5.0,0.0,1.0,0.0
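The same ratio can also be obtained with `pd.crosstab`, which counts and pivots in one step. The sketch below runs on a made-up table that only borrows the `cumid` and `type` column names; the `reindex` call is an added guard for a type that never occurs.

```python
import pandas as pd

# Hypothetical toy records; only the column names mirror the dataset
toy = pd.DataFrame({
    "cumid": [1, 1, 1, 2, 2],
    "type": ["Normal", "Special_offer", "Normal", "Normal", "Presented"],
})

# Rows: customers, columns: transaction types, cells: counts (0 where absent)
counts = pd.crosstab(toy["cumid"], toy["type"])
# Keep only the two purchase types; add a zero column if one never occurs
counts = counts.reindex(columns=["Normal", "Special_offer"], fill_value=0)

# Share of special-offer purchases; NaN if a customer has neither type
interest = counts["Special_offer"] / (counts["Normal"] + counts["Special_offer"])
print(interest)
```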


Build the M feature: the customer's total transaction amount over the period characterizes the customer's value.

In [108]:
m_trans = trad_flow[['cumid','amount','type']].groupby(['cumid','type']).sum()
m_trans = pd.pivot_table(m_trans,index='cumid',columns='type',values='amount')
m_trans.fillna(0,inplace=True)
# Net amount actually spent: Normal + Special_offer + returned_goods (returned_goods amounts are negative)
m_trans['real_value'] = m_trans.Normal+m_trans.Special_offer+m_trans.returned_goods
m_trans.head()

cumid,Normal,Presented,Special_offer,returned_goods,real_value
10001,3608.0,0.0,420.0,-694.0,3334.0
10002,1894.0,0.0,0.0,-242.0,1652.0
10003,3503.0,0.0,156.0,-224.0,3435.0
10004,2979.0,0.0,373.0,-40.0,3312.0
10005,2368.0,0.0,0.0,-249.0,2119.0
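The group-then-pivot steps can be collapsed into a single `pivot_table` call with `aggfunc="sum"` and `fill_value=0`. A minimal sketch on invented amounts (the numbers are illustrative, with returns stored as negative values as in the output above):

```python
import pandas as pd

# Hypothetical toy amounts; returns are negative, as in the dataset's output
toy = pd.DataFrame({
    "cumid": [1, 1, 1, 2],
    "type": ["Normal", "returned_goods", "Special_offer", "Normal"],
    "amount": [300.0, -50.0, 80.0, 120.0],
})

# Sum amounts per customer and type; missing combinations become 0
m = toy.pivot_table(index="cumid", columns="type", values="amount",
                    aggfunc="sum", fill_value=0)
# Net spend: purchases plus (negative) returns
m["real_value"] = m["Normal"] + m["Special_offer"] + m["returned_goods"]
print(m)
```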


Build the model: classify the customers and select the target customers.

In [112]:
# Equal-frequency binning of the R feature: split at the median into two bins
from sklearn.preprocessing import Binarizer
r_threshold = pd.qcut(r_trans.new_time,2,retbins=True)[1][1]
binarizer = Binarizer(threshold=r_threshold)
r_trans_q = binarizer.transform(r_trans.new_time.values.reshape(-1,1))
r_trans_q = pd.DataFrame(r_trans_q,index=r_trans.index,columns=['time'])
r_trans_q.head()

cumid,time
10001,1.0
10002,0.0
10003,0.0
10004,0.0
10005,0.0
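With two bins, the middle edge returned by `pd.qcut(..., retbins=True)` is the median, and `Binarizer` maps values strictly above the threshold to 1 and everything else to 0. The sketch below reproduces that rule with a plain pandas comparison on made-up values, which may help when checking the threshold logic by hand:

```python
import pandas as pd

values = pd.Series([10.0, 40.0, 20.0, 30.0, 50.0])  # hypothetical feature values

# Bin edges for two equal-frequency bins are [min, median, max]
threshold = pd.qcut(values, 2, retbins=True)[1][1]

# Same rule as Binarizer(threshold=...): 1 if strictly above, else 0
flags = (values > threshold).astype(int)
print(threshold, flags.tolist())
```

Note that a value exactly equal to the median lands in the 0 group.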


In [113]:
# Equal-frequency binning of the F feature: split at the median into two bins
f_threshold = pd.qcut(f_trans.interest,2,retbins=True)[1][1]
binarizer = Binarizer(threshold=f_threshold)
f_trans_q = binarizer.transform(f_trans.interest.values.reshape(-1,1))
f_trans_q = pd.DataFrame(f_trans_q,index=f_trans.index,columns=['interest'])
f_trans_q.head()

cumid,interest
10001,1.0
10002,0.0
10003,0.0
10004,1.0
10005,0.0


In [114]:
# Equal-frequency binning of the M feature: split at the median into two bins
m_threshold = pd.qcut(m_trans.real_value,2,retbins=True)[1][1]
binarizer = Binarizer(threshold=m_threshold)
m_trans_q = binarizer.transform(m_trans.real_value.values.reshape(-1,1))
m_trans_q = pd.DataFrame(m_trans_q,index=m_trans.index,columns=['value'])
m_trans_q.head()

cumid,value
10001,1.0
10002,0.0
10003,1.0
10004,1.0
10005,0.0


In [115]:
# Assemble the RFM table
trans_rfm = pd.concat([r_trans_q,f_trans_q,m_trans_q],axis=1)
trans_rfm.head()

cumid,time,interest,value
10001,1.0,1.0,1.0
10002,0.0,0.0,0.0
10003,0.0,0.0,1.0
10004,0.0,1.0,1.0
10005,0.0,0.0,0.0


In [116]:
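# Define the segment labels, keyed by (interest, value, time)
label = {
    (0,0,0):'not interested-low value-dormant',
    (1,0,0):'interested-low value-dormant',
    (1,0,1):'interested-low value-active',
    (0,0,1):'not interested-low value-active',
    (0,1,0):'not interested-high value-dormant',
    (1,1,0):'interested-high value-dormant',
    (1,1,1):'interested-high value-active',
    (0,1,1):'not interested-high value-active'
}
# tuple(x) builds the lookup key without positional indexing into a labelled Series
trans_rfm['label'] = trans_rfm[['interest','value','time']].apply(lambda x:label[tuple(x)],axis=1)
trans_rfm.head()

cumid,time,interest,value,label
10001,1.0,1.0,1.0,interested-high value-active
10002,0.0,0.0,0.0,not interested-low value-dormant
10003,0.0,0.0,1.0,not interested-high value-dormant
10004,0.0,1.0,1.0,interested-high value-dormant
10005,0.0,0.0,0.0,not interested-low value-dormant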
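Once every customer carries the three flags, picking target customers comes down to counting and filtering. The sketch below uses a made-up flag table in the same layout as `trans_rfm`; treating customers with all three flags set as the promotion target is an assumption for illustration, since the notebook does not say which segment the merchant would choose.

```python
import pandas as pd

# Hypothetical binary flags in the same column layout as trans_rfm
rfm_flags = pd.DataFrame(
    {"time": [1, 0, 0, 0, 1], "interest": [1, 0, 0, 1, 1], "value": [1, 0, 1, 1, 0]},
    index=pd.Index([1, 2, 3, 4, 5], name="cumid"),
)

# Number of customers in each (interest, value, time) segment
segment_sizes = rfm_flags.groupby(["interest", "value", "time"]).size()
print(segment_sizes)

# Assumed target: interested, high-value and recently active customers
target = rfm_flags[(rfm_flags[["time", "interest", "value"]] == 1).all(axis=1)]
print(target.index.tolist())
```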
