Download the dataset and run an RFM analysis. In each subsegment (R, F, M), divide users into 4 classes. Measure recency as the number of days between a user's last purchase and the latest purchase date in the dataset.

**You need to answer the following questions:**

What is the maximum number of purchases made by one user?

What is the upper limit of the purchase amount for users with class 4 in subsegment M? (In other words, users with a purchase amount from 0 to X fall into class 4 in subsegment M.)

What is the lower limit for the number of purchases for users with class 1 in subsegment F?

What is the upper limit for the number of purchases for users with class 2 in subsegment R?

How many users are in segment 111?

How many users are in segment 311?

Which RFM segment has the largest number of users?

Which RFM segment has the smallest number of users? How many users fall into the smallest segment?

In [1]:
import pandas as pd
import numpy as np

In [2]:
df = pd.read_csv('RFM_ht_data.csv', parse_dates=['InvoiceDate'], low_memory=False)
#low_memory=False makes pandas infer each column type from the whole file; without it, a mixed data type warning was raised
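
An alternative to `low_memory=False` is to state the identifier column types explicitly. The sketch below reads a small in-memory CSV with made-up rows in the same layout as the dataset. It assumes that `CustomerCode` is the column that triggered the warning, which the notebook does not confirm.

```python
import io
import pandas as pd

# Hypothetical sample in the same layout as RFM_ht_data.csv (values are made up)
sample_csv = io.StringIO(
    "InvoiceNo,CustomerCode,InvoiceDate,Amount\n"
    "C001,02213019,2020-09-01,100.0\n"
    "C002,99099936,2020-09-30,250.5\n"
)

# Assumption: the identifier columns are the ones with mixed types.
# Reading them as str skips type inference and keeps leading zeros.
sample = pd.read_csv(
    sample_csv,
    dtype={"InvoiceNo": str, "CustomerCode": str},
    parse_dates=["InvoiceDate"],
)
print(sample.dtypes)
```

Reading the codes as strings also keeps values such as `02213019` from being turned into the integer `2213019`.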

In [3]:
df.head()

Unnamed: 0,InvoiceNo,CustomerCode,InvoiceDate,Amount
0,C0011810010001,19067290,2020-09-01,1716.0
1,C0011810010017,13233933,2020-09-01,1489.74
2,C0011810010020,99057968,2020-09-01,151.47
3,C0011810010021,80007276,2020-09-01,146.72
4,C0011810010024,13164076,2020-09-01,104.0


In [4]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 332730 entries, 0 to 332729
Data columns (total 4 columns):
 #   Column        Non-Null Count   Dtype         
---  ------        --------------   -----         
 0   InvoiceNo     332730 non-null  object        
 1   CustomerCode  332730 non-null  object        
 2   InvoiceDate   332730 non-null  datetime64[ns]
 3   Amount        332730 non-null  float64       
dtypes: datetime64[ns](1), float64(1), object(2)
memory usage: 10.2+ MB


In [5]:
#Calculating the maximum invoice date in the dataset
last_date = df.InvoiceDate.max()

In [6]:
last_date

Timestamp('2020-09-30 00:00:00')

In [7]:
#Creating the dataframe with the RFM metrics for each user
RFM_data = df.groupby('CustomerCode').agg(
    Recency=('InvoiceDate', lambda x: (last_date - x.max()).days),
    Frequency=('InvoiceNo', 'nunique'),
    Monetary=('Amount', 'sum'),
).reset_index()

In [8]:
RFM_data

Unnamed: 0,CustomerCode,Recency,Frequency,Monetary
0,02213019,19,1,1609.20
1,02213042,22,3,9685.48
2,02213071,29,1,415.00
3,02213088,23,1,305.00
4,02213092,25,1,1412.88
...,...,...,...,...
123728,99099927,10,1,961.10
123729,99099936,0,1,1521.78
123730,99099959,8,2,1444.56
123731,99099963,19,1,3018.91
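
To make the three metrics concrete, here is a minimal sketch of the same aggregation on a made-up table with two hypothetical customers. The names `toy`, `toy_last_date` and `toy_rfm` are illustrative only and are not part of the notebook.

```python
import pandas as pd

# Made-up invoices for two hypothetical customers
toy = pd.DataFrame({
    "InvoiceNo": ["A1", "A2", "B1"],
    "CustomerCode": ["001", "001", "002"],
    "InvoiceDate": pd.to_datetime(["2020-09-10", "2020-09-28", "2020-09-30"]),
    "Amount": [100.0, 50.0, 20.0],
})

# Reference date: the latest invoice date in the toy data (2020-09-30)
toy_last_date = toy["InvoiceDate"].max()

toy_rfm = toy.groupby("CustomerCode").agg(
    Recency=("InvoiceDate", lambda x: (toy_last_date - x.max()).days),  # days since last purchase
    Frequency=("InvoiceNo", "nunique"),                                 # number of distinct invoices
    Monetary=("Amount", "sum"),                                         # total amount spent
).reset_index()
print(toy_rfm)
```

Customer `001` last bought on 2020-09-28, so its Recency is 2 days, its Frequency is 2 invoices and its Monetary value is 150.0. Customer `002` bought on the reference date itself, so its Recency is 0.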


**What is the maximum number of purchases made by one user?**

In [9]:
print(f' The maximum number of purchases is {RFM_data.Frequency.max()}')

 The maximum number of purchases is 204


In [10]:
#Let's continue with the RFM segments
#First, calculate the quartile boundaries for each metric
rfm_quantiles = RFM_data[['Recency', 'Monetary', 'Frequency']].quantile(q=(0.25, 0.5, 0.75))

In [11]:
rfm_quantiles

Unnamed: 0,Recency,Monetary,Frequency
0.25,2.0,765.0,1.0
0.5,8.0,1834.48,2.0
0.75,16.0,4008.84,3.0


In [12]:
#Let's write functions that assign a class (1 to 4) depending on the quartile a value falls into
#According to the task, better values get a lower class: low Recency is better, high Frequency and Monetary are better
def RClass(value,parameter_name, quantiles_table):
    if value <= quantiles_table[parameter_name][0.25]:
        return 1
    elif value <= quantiles_table[parameter_name][0.50]:
        return 2
    elif value <= quantiles_table[parameter_name][0.75]: 
        return 3
    else:
        return 4


def FMClass(value, parameter_name,quantiles_table):
    if value <= quantiles_table[parameter_name][0.25]:
        return 4
    elif value <= quantiles_table[parameter_name][0.50]:
        return 3
    elif value <= quantiles_table[parameter_name][0.75]: 
        return 2
    else:
        return 1
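
The same class rules can also be applied to a whole column at once. The following is a sketch of one possible vectorized variant using `np.searchsorted`, not the notebook's method. The thresholds are copied from the `rfm_quantiles` output above, and the input values are made up to hit each boundary.

```python
import numpy as np
import pandas as pd

# Quartile thresholds copied from the rfm_quantiles output
recency_q = np.array([2.0, 8.0, 16.0])
frequency_q = np.array([1.0, 2.0, 3.0])

# Made-up values chosen to sit on and around the thresholds
recency = pd.Series([0, 2, 3, 8, 16, 17])
frequency = pd.Series([1, 2, 3, 4, 204])

# side="left" counts the thresholds strictly below each value,
# which reproduces the chain of "<=" comparisons in RClass and FMClass
r_class = np.searchsorted(recency_q, recency.to_numpy(), side="left") + 1
f_class = 4 - np.searchsorted(frequency_q, frequency.to_numpy(), side="left")

print(r_class.tolist())
print(f_class.tolist())
```

A value equal to a threshold stays in the class below it, exactly as with `value <= quantiles_table[...]` in the functions above.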

In [13]:
#Adding the class columns to the dataset using the functions
RFM_data['R_Quartile'] = RFM_data['Recency'].apply(lambda x: RClass(x, 'Recency', rfm_quantiles))

RFM_data['F_Quartile'] = RFM_data['Frequency'].apply(lambda x: FMClass(x, 'Frequency', rfm_quantiles))

RFM_data['M_Quartile'] = RFM_data['Monetary'].apply(lambda x: FMClass(x, 'Monetary', rfm_quantiles))

RFM_data['RFMClass'] = RFM_data[['R_Quartile', 'F_Quartile', 'M_Quartile']].astype(str).apply(''.join, axis=1)

In [14]:
RFM_data.head()

Unnamed: 0,CustomerCode,Recency,Frequency,Monetary,R_Quartile,F_Quartile,M_Quartile,RFMClass
0,2213019,19,1,1609.2,4,4,3,443
1,2213042,22,3,9685.48,4,2,1,421
2,2213071,29,1,415.0,4,4,4,444
3,2213088,23,1,305.0,4,4,4,444
4,2213092,25,1,1412.88,4,4,3,443


**What is the upper bound of the purchase amount for users with class 4 in subsegment M? (In other words, users with a purchase amount from 0 to X fall into class 4 in subsegment M.)**

In [17]:
print(f'Answer: {rfm_quantiles.Monetary[0.25]}')

Answer: 765.0


**What is the lower bound of the number of purchases for users with class 1 in subsegment F?**

In [16]:
print(f'Answer: {rfm_quantiles.Frequency[0.75]+1}') #the 75th percentile itself still falls into class 2, and Frequency is a whole number, so class 1 starts one purchase higher

Answer: 4.0


**What is the upper bound for users with class 2 in subsegment R? (Subsegment R is measured in days since the last purchase.)**

In [18]:
print(f'Answer: {rfm_quantiles.Recency[0.50]}')

Answer: 8.0
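
The boundary answers above are read from the quantile table. One way to cross-check them is to look at the observed minimum and maximum of a metric inside each class. The sketch below does this on made-up Frequency values, with classes assigned by the same rule as `FMClass` and the thresholds 1, 2, 3 from `rfm_quantiles`. The names `toy` and `bounds` are illustrative.

```python
import pandas as pd

# Made-up Frequency values; classes follow the FMClass rule with thresholds 1, 2, 3
toy = pd.DataFrame({"Frequency": [1, 1, 2, 3, 4, 7, 204]})
toy["F_Quartile"] = toy["Frequency"].apply(
    lambda v: 4 if v <= 1 else 3 if v <= 2 else 2 if v <= 3 else 1
)

# Observed range of Frequency inside each class
bounds = toy.groupby("F_Quartile")["Frequency"].agg(["min", "max"])
print(bounds)
```

On the real table the equivalent call would be `RFM_data.groupby('F_Quartile')['Frequency'].agg(['min', 'max'])`. The observed minimum of class 1 equals the theoretical lower bound only if some user actually has that many purchases.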


**How many users are there in segment 111?**

In [19]:
one_one_one = RFM_data.query("RFMClass == '111'").RFMClass.size

In [20]:
print(f'Answer: {one_one_one}')

Answer: 9705


**How many users are there in segment 311?**

In [21]:
three_one_one = RFM_data.query("RFMClass == '311'").RFMClass.size

In [22]:
print(f'Answer: {three_one_one}')

Answer: 1609


**Which RFM-segment includes the largest number of users?**

In [23]:
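#rename_axis + reset_index(name=...) gives the same column names regardless of how the pandas version labels value_counts output
RFM_number = RFM_data.RFMClass.value_counts().rename_axis('RFMClass').reset_index(name='Number')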

In [24]:
RFM_descending = RFM_number.sort_values('Number', ascending = False)

In [25]:
print(f'Answer: {RFM_descending.RFMClass.iloc[0]}') #iloc[0] takes the first row after sorting, whatever its index label

Answer: 444


**Which RFM-segment includes the lowest number of users?**

In [27]:
RFM_ascending = RFM_number.sort_values('Number').reset_index(drop=True)

In [28]:
print(f'Answer: {RFM_ascending.RFMClass[0]}') 

Answer: 414


**How many users fall into the smallest segment?**

In [29]:
print(f'Answer: {RFM_ascending.Number[0]}') 

Answer: 2
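
Taking the first row of a sorted table returns a single segment even when several segments share the same count. The sketch below, on made-up segment labels, shows one way to surface such ties with `nlargest` and `nsmallest` and `keep="all"`. Whether the real data contains ties for the smallest segment is not checked in the notebook.

```python
import pandas as pd

# Made-up segment labels; "414" and "141" are tied for the smallest count
labels = pd.Series(["444", "444", "444", "111", "111", "414", "141"])
counts = labels.value_counts()

# keep="all" returns every segment that shares the extreme count
largest = counts.nlargest(1, keep="all")
smallest = counts.nsmallest(1, keep="all")

print(largest)
print(smallest)
```

If `smallest` has more than one row, the question about the smallest segment has several equally valid answers, while the number of users in it is still unambiguous.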
