## 1. Recency, Frequency, Monetary Value analysis
In this chapter, we will dive into a very popular technique called RFM segmentation, which stands for Recency, Frequency and Monetary value segmentation.

#### What is RFM segmentation?
RFM segmentation relies on three customer behavior metrics: Recency, which measures how recently each customer made their last purchase; Frequency, which measures how many purchases the customer has made in the last 12 months; and MonetaryValue, which measures how much the customer has spent in the last 12 months.

We will use these values to assign customers to RFM segments.

#### Grouping RFM values
Once we have calculated these numbers, the next step is to group them into categories such as `high`, `medium` and `low`.

In [1]:
import pandas as pd
import numpy as np

df = pd.read_csv(r"../input/chap-2/datamart_rfm_scores_named_segment.csv")
df.head()

Unnamed: 0,CustomerID,Recency,Frequency,MonetaryValue,R,F,M,RFM_Segment,RFM_Score,RFM_Level
0,12747,3,25,948.7,4,4,4,444,12.0,Top
1,12748,1,888,7046.16,4,4,4,444,12.0,Top
2,12749,4,37,813.45,4,4,4,444,12.0,Top
3,12820,4,17,268.02,4,3,3,433,10.0,Top
4,12822,71,9,146.15,2,2,3,223,7.0,Middle


There are multiple ways to do that. 
- We can break customers into groups of equal size based on `percentile` values of each metric.
- We can assign either high or low value to each metric based on a `80/20% - Pareto split` 
- Or we can use existing knowledge from previous `business insights` about certain threshold values for each metric.
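
As a rough illustration of the second option, the sketch below labels customers as `high` or `low` around the 80th percentile of a metric. The spend values and the names `threshold` and `pareto_label` are made up for this example, and reading the Pareto split as "top 20% versus the rest" is an assumption rather than the only possible interpretation.

```python
import pandas as pd

# Hypothetical spend values, invented for illustration only
spend = pd.Series([137, 335, 172, 355, 303, 233, 244, 229, 90, 410], name='Spend')

# Value at the 80th percentile: roughly the top 20% of customers lie above it
threshold = spend.quantile(0.8)

# Customers above the threshold are labeled 'high', everyone else 'low'
pareto_label = spend.gt(threshold).map({True: 'high', False: 'low'})

print(pareto_label.value_counts())
```
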

In the next section you will learn how to assign a percentile to a metric, and then create a label to be used for segmentation.

#### Short review of percentiles
The process of calculating percentiles is fairly simple: 
- First, you sort the customers based on that metric.
- Then, you break the customers into a number of groups that you think is relevant. The groups are equal in size.
- Finally, you assign a label to each group.

Luckily, pandas already has a built-in function for calculating percentile groups, called `qcut()`.
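
To make the three steps concrete, here is a minimal by-hand sketch on eight made-up spend values. It is only meant to show the idea of sorting, splitting into equal groups and labeling; it is not how `qcut()` is implemented internally, and the names `ordered`, `group_index` and `manual_quartile` are assumptions for this example.

```python
import numpy as np
import pandas as pd

# Eight illustrative spend values, one per customer
spend = pd.Series([137, 335, 172, 355, 303, 233, 244, 229])

# Step 1: sort the customers by the metric
ordered = spend.sort_values()

# Step 2: break the sorted customers into 4 groups of equal size
n_groups = 4
group_index = np.arange(len(ordered)) * n_groups // len(ordered)

# Step 3: label the groups from 1 (lowest spend) to 4 (highest spend)
manual_quartile = pd.Series(group_index + 1, index=ordered.index).sort_index()

print(manual_quartile)
```
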

#### Calculate percentiles with Python
To understand the concepts behind percentile calculations, we have created a simple dataset with 8 CustomerIDs and random Spend values representing their total spend with the company.

In [2]:
data = pd.DataFrame({'CustomerID': range(8), 
                     'Spend': np.array([137, 335, 172, 355, 303, 233, 244, 229])})
data

Unnamed: 0,CustomerID,Spend
0,0,137
1,1,335
2,2,172
3,3,355
4,4,303
5,5,233
6,6,244
7,7,229


We will now assign a quartile value to each of these customers. 
- First, we will use the `qcut()` function on the `Spend` column, and specify that we want 4 groups of equal size, called quartiles.
- We will also pass a `range()` object to the `labels` argument so our groups have integer names, with the highest-value quartile labeled as 4 and the lowest as 1.
- Next, we add the result as a new column to our dataframe.
- And then we display it, sorted by the `Spend` value.

In [3]:
spend_quartiles = pd.qcut(data['Spend'], 
                          q = 4,
                          labels = range(1, 5))
data['Spend_Quartiles'] = spend_quartiles
data.sort_values('Spend')

Unnamed: 0,CustomerID,Spend,Spend_Quartiles
0,0,137,1
2,2,172,1
7,7,229,2
5,5,233,2
6,6,244,3
4,4,303,3
1,1,335,4
3,3,355,4
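
If you want to see where the quartile boundaries fall, `qcut()` can also return the bin edges through its `retbins` argument. The sketch below reuses the same eight spend values; the variable names `quartiles` and `edges` are chosen for this example only.

```python
import pandas as pd

spend = pd.Series([137, 335, 172, 355, 303, 233, 244, 229])

# retbins=True makes qcut return the bin edges alongside the labels
quartiles, edges = pd.qcut(spend, q=4, labels=range(1, 5), retbins=True)

# Four groups need five edges: the minimum, three cut points and the maximum
print(edges)
```
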


#### Assigning labels
- When assigning labels, we want them to reflect which percentile groups are at the top and which are at the bottom, but the **highest value** of the metric **is not always the best**.

>> For example, the `recency metric`, which counts the days since the last purchase, is better when it's low rather than high.

- For this example, we have created a sample dataset with 8 CustomerIDs and their Recency in days.

Let's create a list of labels - only this time the values are reversed as lower recency is rated higher. 

- We will use the `qcut()` function on the `Recency_Days` column, and specify that we want 4 groups of equal size.
- We will pass the list of labels we created above.
- Next, we add the result as a new column to our dataset.
- And then we display it, sorted by the `Recency_Days` value.

In [4]:
# add Recency_Days to data
data = pd.DataFrame({'CustomerID': range(8), 
                     'Recency_Days': np.array([ 37, 235, 396,  72, 255, 393, 203, 133])
                    })

# store labels from 4 to 1 in a decreasing order
r_labels = list(range(4, 0, -1))

# divide into group based on quartiles
recency_quartiles = pd.qcut(data['Recency_Days'], q=4, labels = r_labels)

# add new column
data['Recency_Quartile'] = recency_quartiles

# sort
data.sort_values('Recency_Days')

Unnamed: 0,CustomerID,Recency_Days,Recency_Quartile
0,0,37,4
3,3,72,4
7,7,133,3
6,6,203,3
1,1,235,2
4,4,255,2
5,5,393,1
2,2,396,1


As you can see, the lower the recency, the higher the quartile value. When assigning labels, you should always consider whether higher or lower values of the metric deserve the higher rank.
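
One way to make that decision explicit is to wrap it in a small helper. The function `quantile_score()` and its `higher_is_better` flag below are hypothetical names introduced for this sketch; they are not part of pandas.

```python
import pandas as pd

def quantile_score(values, q=4, higher_is_better=True):
    """Hypothetical helper: score a metric from 1 (worst) to q (best)."""
    labels = list(range(1, q + 1))
    if not higher_is_better:
        # Reverse the labels so that the lowest values get the highest score
        labels = labels[::-1]
    return pd.qcut(values, q=q, labels=labels)

# Same recency values as in the example above: lower is better
recency = pd.Series([37, 235, 396, 72, 255, 393, 203, 133])
r_score = quantile_score(recency, higher_is_better=False)

print(r_score.tolist())
```
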

#### Custom labels
We can also create custom named labels. 
- First, we create the labels as strings, ordered from best to worst (`Active` first, `Churned` last). We use this order because `qcut()` assigns the first label to the lowest `Recency_Days` values, and low recency is best.
- Then we run everything as before and get a new recency label based on the names we defined.

In [5]:
data = pd.DataFrame({'CustomerID': range(8), 
                     'Recency_Days': np.array([ 37, 235, 396,  72, 255, 393, 203, 133])
                    })
# create string labels
r_labels = ['Active', 'Lapsed', 'InActive', 'Churned']

# divide into groups based on quantiles
recency_quartiles = pd.qcut(data['Recency_Days'], q=4, labels = r_labels)

# add new_column
data['Recency_Quartile'] = recency_quartiles

# sort
data.sort_values('Recency_Days')

Unnamed: 0,CustomerID,Recency_Days,Recency_Quartile
0,0,37,Active
3,3,72,Active
7,7,133,Lapsed
6,6,203,Lapsed
1,1,235,InActive
4,4,255,InActive
5,5,393,Churned
2,2,396,Churned


Although this is a small sample, it does show the main concepts of how to use percentiles to group customers based on their usage behavior.

## 2. Calculating RFM metrics
In this lesson we will learn how to calculate `Recency, Frequency and Monetary Value` for each customer.

#### Definitions
- First, let's nail down the definitions of the RFM values. `Recency` is the number of days since the customer's last transaction. The lower it is, the better, since every company wants its customers to be recent and active.
- `Frequency` is the number of transactions in the last 12 months, although there are variations, such as average monthly transactions, that capture the essence of this metric as well.
- Third, `MonetaryValue` is the total amount the customer has spent with the company in the last 12 months.

In [6]:
datamart = pd.read_csv(r"../input/chap-2/datamart_rfm_scores_named_segment.csv")
datamart.head()

Unnamed: 0,CustomerID,Recency,Frequency,MonetaryValue,R,F,M,RFM_Segment,RFM_Score,RFM_Level
0,12747,3,25,948.7,4,4,4,444,12.0,Top
1,12748,1,888,7046.16,4,4,4,444,12.0,Top
2,12749,4,37,813.45,4,4,4,444,12.0,Top
3,12820,4,17,268.02,4,3,3,433,10.0,Top
4,12822,71,9,146.15,2,2,3,223,7.0,Middle


One comment though: 12 months is a common choice for the window, but it can be adjusted depending on the business model and the lifecycle of the products and customers.
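
If you need a different window, one possible approach is to filter the transactions relative to the snapshot date before aggregating. The sketch below uses a tiny made-up table; `window_days`, `window_start` and `recent` are names assumed for this example.

```python
import pandas as pd

# Tiny made-up transaction table for illustration
tx = pd.DataFrame({
    'CustomerID': [1, 1, 2],
    'InvoiceDate': pd.to_datetime(['2011-01-05', '2011-11-30', '2010-06-01']),
})

snapshot_date = pd.Timestamp('2011-12-10')
window_days = 365  # assumption: change this to match your business cycle

# Keep only transactions that fall inside the chosen window
window_start = snapshot_date - pd.Timedelta(days=window_days)
recent = tx[tx['InvoiceDate'] >= window_start]

print(recent)
```
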

#### Dataset and preparations
As in the [previous lessons](https://github.com/Nhan121/Lectures_notes-teaching-in-VN-/blob/master/Statistics/Machine%20Learning/Clustering%20%26%20Segmentation/Customer%20Segmentation/Cohort_Analysis.ipynb), we will use the same [online dataset](https://www.kaggle.com/dovannhan/chapter-1-dataset?select=online.csv). 

In [7]:
online_path = r"../input/chap-2/online12M.csv"
online = pd.read_csv(online_path)
online = online.iloc[:, 1:]
online.head()

Unnamed: 0,InvoiceNo,StockCode,Description,Quantity,InvoiceDate,UnitPrice,CustomerID,Country
0,572558,22745,POPPY'S PLAYHOUSE BEDROOM,6,2011-10-25,2.1,14286,United Kingdom
1,577485,23196,VINTAGE LEAF MAGNETIC NOTEPAD,1,2011-11-20,1.45,16360,United Kingdom
2,560034,23299,FOOD COVER WITH BEADS SET 2,6,2011-07-14,3.75,13933,United Kingdom
3,578307,72349B,SET/6 PURPLE BUTTERFLY T-LIGHTS,1,2011-11-23,2.1,17290,United Kingdom
4,554656,21756,BATH BUILDING BLOCK WORD,3,2011-05-25,5.95,17663,United Kingdom


Now, we will do some data preparation before calculating the RFM values.

In [8]:
online['Total_Sum'] = online['Quantity'] * online['UnitPrice']
online.head()

Unnamed: 0,InvoiceNo,StockCode,Description,Quantity,InvoiceDate,UnitPrice,CustomerID,Country,Total_Sum
0,572558,22745,POPPY'S PLAYHOUSE BEDROOM,6,2011-10-25,2.1,14286,United Kingdom,12.6
1,577485,23196,VINTAGE LEAF MAGNETIC NOTEPAD,1,2011-11-20,1.45,16360,United Kingdom,1.45
2,560034,23299,FOOD COVER WITH BEADS SET 2,6,2011-07-14,3.75,13933,United Kingdom,22.5
3,578307,72349B,SET/6 PURPLE BUTTERFLY T-LIGHTS,1,2011-11-23,2.1,17290,United Kingdom,2.1
4,554656,21756,BATH BUILDING BLOCK WORD,3,2011-05-25,5.95,17663,United Kingdom,17.85


#### Data preparation steps
The online dataset has already been pre-processed and only includes the most recent 12 months of data.

We can confirm that by viewing the `min()` and `max()` of `InvoiceDate`, which together cover a full year.

In [9]:
print('Min: {}, Max: {}'.format(min(online.InvoiceDate), max(online.InvoiceDate)))

Min: 2010-12-10, Max: 2011-12-09


In the real world, we would be working with the most **recent snapshot** of the data, from today or yesterday. In this case the data comes from 2010 and 2011, so we have to create a hypothetical snapshot date to use as the reference point, as if we were doing the analysis on the most recent data.

In [10]:
import datetime as dt
online['InvoiceDate'] = pd.to_datetime(online['InvoiceDate'])
snapshot_date = max(online.InvoiceDate) + dt.timedelta(days = 1)
snapshot_date

Timestamp('2011-12-10 00:00:00')

So what we do is take the last `InvoiceDate` from the dataset and add one more day using the `timedelta()` function from the `datetime` library. With the `days=1` argument we create a period of 1 day, which we can then add to our date.

#### Calculate RFM metrics
Now that we're done with preparations, we can finally calculate the RFM metrics. 
- First, we aggregate the data at the customer level and calculate three metrics. For recency, we pass `InvoiceDate` to a `lambda` function that takes the difference, in days, between our `snapshot_date` (which would be today in the real world) and the customer's most recent, or `max()`, invoice date.

- Then we count the `InvoiceNo` entries (one per row of the dataset) for our frequency metric, and sum all the spend recorded in the `Total_Sum` column.

In [11]:
data_mart = online.groupby(['CustomerID']).agg({'InvoiceDate': lambda x: (snapshot_date - x.max()).days,
                                                'InvoiceNo': 'count',
                                                'Total_Sum': 'sum'
                                               })

The `lambda` gives us the number of days between the hypothetical today and each customer's last transaction.
 
- Next, we rename the columns in the new dataframe for easier interpretation.

In [12]:
data_mart = data_mart.rename(columns = {'InvoiceDate': 'Recency',
                                        'InvoiceNo' : 'Frequency',
                                        'Total_Sum' : 'Monetary_Value'
                                     },
                             inplace = False
                            )

And finally, let's view the result!

In [13]:
data_mart.head()

Unnamed: 0_level_0,Recency,Frequency,Monetary_Value
CustomerID,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1
12747,3,25,948.7
12748,1,888,7046.16
12749,4,37,813.45
12820,4,17,268.02
12822,71,9,146.15
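
As a variation, pandas named aggregation can produce the final column names in one step, and `nunique` can be used if frequency should count distinct invoices rather than rows. This is only a sketch on a made-up table; whether distinct invoices are the right definition of frequency for your data is an assumption to check, and the names `toy`, `snapshot` and `rfm` are chosen for this example.

```python
import pandas as pd

# Made-up data: customer 1 has two rows on invoice A1 and one row on invoice A2
toy = pd.DataFrame({
    'CustomerID': [1, 1, 1, 2],
    'InvoiceNo': ['A1', 'A1', 'A2', 'B1'],
    'InvoiceDate': pd.to_datetime(['2011-12-01', '2011-12-01',
                                   '2011-12-08', '2011-11-10']),
    'Total_Sum': [10.0, 5.0, 20.0, 7.5],
})

# Hypothetical snapshot date: one day after the last invoice
snapshot = toy['InvoiceDate'].max() + pd.Timedelta(days=1)

# Named aggregation: output column = (input column, aggregation)
rfm = toy.groupby('CustomerID').agg(
    Recency=('InvoiceDate', lambda x: (snapshot - x.max()).days),
    Frequency=('InvoiceNo', 'nunique'),  # distinct invoices, not rows
    Monetary_Value=('Total_Sum', 'sum'),
)

print(rfm)
```
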




#### Final RFM values
The result is a table with one row per customer, holding their recency, frequency and monetary value as of the snapshot date. It is as if we were running the analysis the day after the data was pulled from the retailer's website, which is what would happen in the real world.

This is all we need for our next steps in building powerful and intuitive RFM segments.

### Comments
- The average `Frequency` value across all customers is:

In [14]:
data_mart.Frequency.mean()

18.71424650013725

## 3. Building RFM segments
We will now group customers into 4 segments of the same size for each `RFM` value. We will use the same approach as with the dummy data in the first lesson.

#### Data
We have previously created a dataset containing `Recency, Frequency` and `Monetary_Value` metrics for each customer. 

Now our next step will be to use them to assign quartiles of these metrics for each customer.

In [15]:
data_mart.head()

Unnamed: 0_level_0,Recency,Frequency,Monetary_Value
CustomerID,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1
12747,3,25,948.7
12748,1,888,7046.16
12749,4,37,813.45
12820,4,17,268.02
12822,71,9,146.15


#### Recency quartile
To do that:
- We will first create a list of label values with the `range()` function.

In [16]:
r_labels = list(range(4, 0, -1))

When generating labels, we have to decide whether a high value of the metric is good or bad, and define the order of the labels accordingly. Since recency measures the days since the last transaction, we will rate customers who have been active more recently higher than the less recent customers.

We pass these labels to the `qcut()` function, which sorts the customers by their recency values in increasing order and then assigns values from 4 to 1 based on the quartile they fall into.

In [17]:
recency_quartiles = pd.qcut(data_mart['Recency'], q=4, labels = r_labels)

Finally, we assign these values to a new column called R.

In [18]:
data_mart = data_mart.assign(R = recency_quartiles.values)
data_mart.head()

Unnamed: 0_level_0,Recency,Frequency,Monetary_Value,R
CustomerID,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1
12747,3,25,948.7,4
12748,1,888,7046.16,4
12749,4,37,813.45,4
12820,4,17,268.02,4
12822,71,9,146.15,2


#### Frequency and monetary quartiles
Now, we do the same thing for Frequency and Monetary Value. 
- The first difference is that the `labels` have a different order than recency, because frequency and monetary values are considered better when they are higher: we want customers to spend more and visit more often. Hence, we assign higher labels to higher values. 

In [19]:
f_labels, m_labels = range(1, 5), range(1, 5)

- Then - as in the previous use case - we pass the column, together with the number of equally sized groups and the labels to the `qcut() function`. 

In [20]:
f_quartiles = pd.qcut(data_mart['Frequency'], q=4, labels = f_labels)
m_quartiles = pd.qcut(data_mart['Monetary_Value'], q=4, labels = m_labels)

- Finally, we create two more columns, `F` and `M`, for the `Frequency` and `Monetary_Value` quartiles. And there we have it: all customers have a quartile value assigned that we will use to build the segmentation.

In [21]:
data_mart = data_mart.assign(F = f_quartiles.values)
data_mart = data_mart.assign(M = m_quartiles.values)
data_mart.head()

Unnamed: 0_level_0,Recency,Frequency,Monetary_Value,R,F,M
CustomerID,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
12747,3,25,948.7,4,4,4
12748,1,888,7046.16,4,4,4
12749,4,37,813.45,4,4,4
12820,4,17,268.02,4,3,3
12822,71,9,146.15,2,2,3
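
On other datasets, a metric with many repeated values (frequency is a typical case) can make `qcut()` fail because several quantile edges coincide. One possible workaround, sketched below on made-up values, is to rank the values first so that ties are broken by order of appearance. Whether breaking ties this way is acceptable is a judgment call for your analysis, and the names `freq`, `ranked` and `f_quartile` are assumptions for this example.

```python
import pandas as pd

# Made-up frequency values with many ties at 1
freq = pd.Series([1, 1, 1, 1, 1, 2, 3, 10])

# pd.qcut(freq, q=4, labels=range(1, 5)) would raise a ValueError here,
# because the lower quartile edges are all equal to 1.

# Ranking with method='first' gives every customer a distinct rank
ranked = freq.rank(method='first')
f_quartile = pd.qcut(ranked, q=4, labels=range(1, 5))

print(f_quartile.tolist())
```
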


#### Build RFM segment and RFM score
Now the final step is to create the RFM Segment, which is just a concatenated string of the R, F and M values, and the RFM Score, which is the sum of those values.
- First, we will define a `join_rfm()` function that converts the R, F and M values to strings and concatenates them.

In [22]:
def join_rfm(x):
    return str(x['R']) + str(x['F']) + str(x['M'])

- Then, we will create `RFM_Segment` by applying this function to the dataset across the columns.

In [23]:
data_mart['RFM_Segment'] = data_mart.apply(join_rfm, axis = 1)

- Finally we will create the `RFM_Score` by summing RFM values across the columns.

In [24]:
data_mart['RFM_Score'] = data_mart[['R', 'F', 'M']].sum(axis = 1)

#### Final result
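This is the result we get. Each customer now has two new variables: an RFM segment built from the three individual RFM values, and an RFM score that sums up those values and indicates relative customer value. Note that `RFM_Segment` reads `4.04.04.0` rather than `444` in this output: the row-wise `apply()` apparently passes the R, F and M values to `join_rfm()` as floats, so `str()` keeps the decimal part.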

In [25]:
data_mart.head()

Unnamed: 0_level_0,Recency,Frequency,Monetary_Value,R,F,M,RFM_Segment,RFM_Score
CustomerID,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1
12747,3,25,948.7,4,4,4,4.04.04.0,12
12748,1,888,7046.16,4,4,4,4.04.04.0,12
12749,4,37,813.45,4,4,4,4.04.04.0,12
12820,4,17,268.02,4,3,3,4.03.03.0,10
12822,71,9,146.15,2,2,3,2.02.03.0,7
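
If compact segment strings such as `444` are preferred, one option is to cast the quartile columns to integers and concatenate them column-wise instead of row by row. The sketch below uses a small made-up frame with categorical `R`, `F` and `M` columns, similar in spirit to what `qcut()` returns; it is an alternative under those assumptions, not the method used above.

```python
import pandas as pd

# Made-up quartile values stored as categoricals
scores = pd.DataFrame({'R': [4, 2], 'F': [4, 2], 'M': [4, 3]}).astype('category')

# Cast to int first so that str() yields '4' instead of '4.0'
as_int = scores[['R', 'F', 'M']].astype(int)

# Column-wise string concatenation builds the segment code
rfm_segment = (as_int['R'].astype(str)
               + as_int['F'].astype(str)
               + as_int['M'].astype(str))

# The score is the plain sum of the three integer values
rfm_score = as_int.sum(axis=1)

print(rfm_segment.tolist(), rfm_score.tolist())
```
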


### 3.2. PRACTICES
#### Exercise 3.2.1. Calculate 3 groups for recency and frequency
You will now group the customers into three separate groups based on `Recency` and `Frequency`.

The dataset has been loaded as `datamart`; you can use the console to view its top rows. Also, `pandas` has been loaded as `pd`.

We will use the result from this exercise in the next one, where you will group customers based on `MonetaryValue` and finally calculate an `RFM_Score`.

#### SOLUTION

In [26]:
# Create labels for Recency and Frequency
r_labels = range(3, 0, -1)
f_labels = range(1, 4)

# Assign these labels to three equal percentile groups 
r_groups = pd.qcut(datamart['Recency'], q=3, labels=r_labels)

# Assign these labels to three equal percentile groups 
f_groups = pd.qcut(datamart['Frequency'], q=3, labels=f_labels)

# Create new columns R and F 
datamart = datamart.assign(R=r_groups.values, F=f_groups.values)
datamart.head()

Unnamed: 0,CustomerID,Recency,Frequency,MonetaryValue,R,F,M,RFM_Segment,RFM_Score,RFM_Level
0,12747,3,25,948.7,3,3,4,444,12.0,Top
1,12748,1,888,7046.16,3,3,4,444,12.0,Top
2,12749,4,37,813.45,3,3,4,444,12.0,Top
3,12820,4,17,268.02,3,3,3,433,10.0,Top
4,12822,71,9,146.15,2,2,3,223,7.0,Middle


#### Exercise 3.2.2. Calculate RFM Score
Great work! You will now finish the job by assigning customers to three groups based on `MonetaryValue` percentiles and then calculating an `RFM_Score`, which is the sum of the `R`, `F`, and `M` values.

The `datamart` has been loaded with the `R` and `F` values you created in the previous exercise.
#### SOLUTION.

In [27]:
# Create labels for MonetaryValue
m_labels = range(1, 4)

# Assign these labels to three equal percentile groups 
m_groups = pd.qcut(datamart['MonetaryValue'], q=3, labels=m_labels)

# Create new column M
datamart = datamart.assign(M=m_groups.values)

# Calculate RFM_Score
datamart['RFM_Score'] = datamart[['R','F','M']].sum(axis=1)
print(datamart['RFM_Score'].head())

0    9
1    9
2    9
3    9
4    6
Name: RFM_Score, dtype: int64


## 4. Analyzing RFM segments
We are now going to use the segmentation we created previously and analyze it.

#### Largest RFM segments
Let's first review the 10 largest RFM segments.

As we can see, the lowest-rated and the highest-rated RFM segments are among the largest ones. It is good practice to investigate the size of the segments before you use them for targeting or other business applications.

In [28]:
datamart.groupby(['RFM_Segment']).size().sort_values(ascending = False)[:10]

RFM_Segment
444    372
111    345
211    169
344    156
233    129
222    128
333    120
122    117
311    114
433    113
dtype: int64
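
Segment sizes are often easier to compare as shares of the customer base. A minimal sketch on made-up segment codes, using `value_counts(normalize=True)`:

```python
import pandas as pd

# Made-up segment codes for six customers
segments = pd.Series(['444', '111', '444', '211', '111', '444'],
                     name='RFM_Segment')

# normalize=True returns each segment's fraction of all customers
share = segments.value_counts(normalize=True).round(2)

print(share)
```
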

#### Filtering on RFM segments
Another practical aspect of this segmentation is that it allows a simple selection of customers based on their `RFM segment`.

In this case we select the bottom RFM segment, with the lowest segment value of 111.

In [29]:
datamart[datamart['RFM_Segment'] == 111][:5]

Unnamed: 0,CustomerID,Recency,Frequency,MonetaryValue,R,F,M,RFM_Segment,RFM_Score,RFM_Level
16,12837,174,2,10.55,1,1,1,111,3,Low
28,12852,295,2,32.55,1,1,1,111,3,Low
51,12902,265,4,42.03,1,1,1,111,3,Low
58,12915,149,2,35.9,1,1,1,111,3,Low
63,12922,162,4,57.24,1,1,1,111,3,Low
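
The same idea extends to several segments at once with `isin()`. The sketch below works on a made-up frame; which segments belong together in one selection is an assumption made purely for illustration.

```python
import pandas as pd

# Made-up customers with integer segment codes
toy = pd.DataFrame({
    'CustomerID': [1, 2, 3, 4],
    'RFM_Segment': [111, 444, 211, 344],
})

# Select every customer whose segment is in the chosen list
low_value = toy[toy['RFM_Segment'].isin([111, 211])]

print(low_value)
```
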


#### Summary metrics per RFM score
The RFM Score has a smaller number of unique values, so we will analyze some metrics for these groups.

Let's calculate the average recency, frequency and monetary value, and then the count of customers in each group.

As you can see, the sizes are fairly similar across the `RFM Score` groups, and each of the RFM values improves as the `RFM Score` increases.

In [30]:
data_mart.groupby('RFM_Score').agg({'Recency': 'mean' ,
                                   'Frequency': 'mean',
                                   'Monetary_Value': ['mean', 'count']
                                  }).round(1)

Unnamed: 0_level_0,Recency,Frequency,Monetary_Value,Monetary_Value
Unnamed: 0_level_1,mean,mean,mean,count
RFM_Score,Unnamed: 1_level_2,Unnamed: 2_level_2,Unnamed: 3_level_2,Unnamed: 4_level_2
3,246.9,2.1,28.4,345
4,162.2,3.1,47.8,337
5,138.9,4.3,78.2,393
6,101.0,6.3,146.3,444
7,78.0,8.5,160.2,382
8,62.6,12.8,196.3,376
9,46.8,16.7,330.3,345
10,31.9,24.0,443.1,355
11,21.8,38.9,705.3,294
12,8.0,75.6,1653.9,372


#### Grouping into named segments
Although intuitive, this segmentation is still hard to interpret fully. Often, we will group customers by their RFM scores into an even smaller number of segments.
- First we will create a `segment_me()` function that receives one row of the dataframe and returns a segment name, either Gold, Silver or Bronze, based on its `RFM_Score` value.

In [31]:
def segment_me(df):
    if df['RFM_Score'] > 9:
        return 'Gold'
    elif (df['RFM_Score'] > 5) and (df['RFM_Score'] <= 9):
        return 'Silver'
    else:
        return 'Bronze'

- Then, we will apply this function to our datamart and create a column called `General_Segment`. Great, let's analyze the `RFM values` and group sizes across these 3 segments!

In [32]:
data_mart['General_Segment'] = data_mart.apply(segment_me, axis = 1)
data_mart.groupby('General_Segment').agg({'Recency': 'mean',
                                          'Frequency': 'mean',
                                         'Monetary_Value': ['mean', 'count']
                                        }).round(1)

Unnamed: 0_level_0,Recency,Frequency,Monetary_Value,Monetary_Value
Unnamed: 0_level_1,mean,mean,mean,count
General_Segment,Unnamed: 1_level_2,Unnamed: 2_level_2,Unnamed: 3_level_2,Unnamed: 4_level_2
Bronze,180.8,3.2,52.7,1075
Gold,20.3,47.1,959.7,1021
Silver,73.9,10.7,202.9,1547
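
For larger datasets, a vectorized alternative to a row-wise function is `pd.cut()` with explicit score boundaries. The sketch below assumes the cut-offs 3 to 5 for Bronze, 6 to 9 for Silver and 10 to 12 for Gold; these boundaries and the variable names are assumptions for this example and should be adapted to your own thresholds.

```python
import pandas as pd

# Made-up RFM scores covering the boundaries between segments
scores = pd.Series([3, 5, 6, 9, 10, 12], name='RFM_Score')

# Bins are right-inclusive: (2, 5] -> Bronze, (5, 9] -> Silver, (9, 12] -> Gold
general_segment = pd.cut(scores,
                         bins=[2, 5, 9, 12],
                         labels=['Bronze', 'Silver', 'Gold'])

print(general_segment.tolist())
```
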


#### New segments and their values
At first glance, our segmentation does make sense. In practice, it can take several rounds of trial and error to find the right cut-offs.

### 4.2. PRACTICES
#### Question 4.2.1. Find average value for RFM score segment
What is the average `Monetary_Value` for the segment with `RFM_Score` of 9 (nine)?
#### Answers

In [33]:
# Select the numeric column before averaging so that text columns are not aggregated
data_mart.groupby('RFM_Score')['Monetary_Value'].mean().at[9]

330.26971304347813

#### Exercise 4.2.2. Creating custom segments
It's your turn to create a custom segmentation based on `RFM_Score` values. You will create a function that builds the segmentation and then apply it to each customer.

The dataset with the RFM values, RFM Segment and RFM Score has been loaded as `datamart`, together with the `pandas` and `numpy` libraries. Feel free to explore the data in the console.
#### SOLUTION

In [34]:
# Define rfm_level function
def rfm_level(df):
    if df['RFM_Score'] >= 10:
        return 'Top'
    elif ((df['RFM_Score'] >= 6) and (df['RFM_Score'] < 10)):
        return 'Middle'
    else:
        return 'Low'

# Create a new variable RFM_Level
data_mart['RFM_Level'] = data_mart.apply(rfm_level, axis=1)

data_mart.head()

Unnamed: 0_level_0,Recency,Frequency,Monetary_Value,R,F,M,RFM_Segment,RFM_Score,General_Segment,RFM_Level
CustomerID,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1
12747,3,25,948.7,4,4,4,4.04.04.0,12,Gold,Top
12748,1,888,7046.16,4,4,4,4.04.04.0,12,Gold,Top
12749,4,37,813.45,4,4,4,4.04.04.0,12,Gold,Top
12820,4,17,268.02,4,3,3,4.03.03.0,10,Gold,Top
12822,71,9,146.15,2,2,3,2.02.03.0,7,Silver,Middle


#### Exercise 4.2.3. Analyzing custom segments
As a final step, you will analyze the average values of Recency, Frequency and MonetaryValue for the custom segments you've created.

We have loaded the `datamart` dataset with the segment values you calculated in the previous exercise. Feel free to explore it in the console. The `pandas` library is also loaded as `pd`.
#### SOLUTION

In [35]:
# Calculate average values for each RFM_Level, and return a size of each segment 
rfm_level_agg = data_mart.groupby('RFM_Level').agg({
                                                 'Recency': 'mean',
                                                 'Frequency': 'mean',
                                                 'Monetary_Value': ['mean', 'count']
                                                }).round(1)

# Print the aggregated dataset
rfm_level_agg

Unnamed: 0_level_0,Recency,Frequency,Monetary_Value,Monetary_Value
Unnamed: 0_level_1,mean,mean,mean,count
RFM_Level,Unnamed: 1_level_2,Unnamed: 2_level_2,Unnamed: 3_level_2,Unnamed: 4_level_2
Low,180.8,3.2,52.7,1075
Middle,73.9,10.7,202.9,1547
Top,20.3,47.1,959.7,1021
