
### Separate Into Quantiles

We will split each metric into four quantiles (quartiles), a common choice for RFM analyses. Other quantile splits can be tested later to see whether they yield further insights or better-defined customer segments.

In [2]:
# Split into 4 quantiles (quartiles), the usual choice for RFM analyses;
# other splits can be tested later
quantiles = rfm_scale_log.quantile(q=[0.25, 0.5, 0.75])
quantiles = quantiles.to_dict()
quantiles
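
To make the shape of the `quantiles` dictionary concrete, here is a minimal sketch on a toy DataFrame. The data and the names `toy` and `toy_quantiles` are hypothetical and only illustrate the nested structure (column name first, then quantile level) that the scoring functions later rely on.

```python
import pandas as pd

# Hypothetical toy data, not the notebook's rfm_scale_log
toy = pd.DataFrame({
    "recency": [1, 2, 3, 4, 5],
    "frequency": [10, 20, 30, 40, 50],
})

# Same call pattern as above: one row per cut point, one column per metric
toy_quantiles = toy.quantile(q=[0.25, 0.5, 0.75]).to_dict()

# Nested lookup: metric name first, then the quantile level
print(toy_quantiles["recency"][0.25])    # 2.0
print(toy_quantiles["frequency"][0.75])  # 40.0
```

Three cut points are enough to define four buckets: values at or below the 0.25 cut point, between 0.25 and 0.5, between 0.5 and 0.75, and above 0.75.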

In [None]:
# Create a copy to add quantiles to, so rfm_scale_log is left unchanged
segmented_rfm_scores = rfm_scale_log.copy()
segmented_rfm_scores.head()

We will now assign the RFM scores on a scale of 1 to 4, where 4 is the best. For recency, lower values are better, so customers in the lowest recency quartile (the most recent purchasers) receive a 4. For frequency and monetary value, higher values are better, so customers in the top quartile receive a 4.

In [None]:
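# Define functions to calculate R, F & M scores
# x = value to score
# p = column name ('recency', 'frequency', 'monetary_value' or 'aov')
# d = quantiles dictionary

def r_score(x, p, d):
    """Score recency: lower is better, so the lowest quartile gets 4."""
    if x <= d[p][0.25]:
        return 4
    elif x <= d[p][0.5]:
        return 3
    elif x <= d[p][0.75]:
        return 2
    else:
        return 1

def fm_score(x, p, d):
    """Score frequency and monetary metrics: higher is better, so the top quartile gets 4."""
    if x <= d[p][0.25]:
        return 1
    elif x <= d[p][0.5]:
        return 2
    elif x <= d[p][0.75]:
        return 3
    else:
        return 4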
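
As a quick sanity check of the two scoring functions above, the sketch below repeats them and applies them to a hand-written dictionary of cut points. The thresholds in `toy_quantiles` are hypothetical values chosen for illustration, not quantiles from the actual data.

```python
# Hypothetical cut points standing in for the notebook's `quantiles` dict
toy_quantiles = {
    "recency": {0.25: 10, 0.5: 30, 0.75: 90},
    "frequency": {0.25: 1, 0.5: 3, 0.75: 8},
}

def r_score(x, p, d):
    # Lower recency is better: lowest quartile scores 4
    if x <= d[p][0.25]:
        return 4
    elif x <= d[p][0.5]:
        return 3
    elif x <= d[p][0.75]:
        return 2
    else:
        return 1

def fm_score(x, p, d):
    # Higher frequency/monetary value is better: top quartile scores 4
    if x <= d[p][0.25]:
        return 1
    elif x <= d[p][0.5]:
        return 2
    elif x <= d[p][0.75]:
        return 3
    else:
        return 4

print(r_score(5, "recency", toy_quantiles))     # 4: at or below the 0.25 cut point
print(r_score(200, "recency", toy_quantiles))   # 1: above the 0.75 cut point
print(fm_score(5, "frequency", toy_quantiles))  # 3: between the 0.5 and 0.75 cut points
print(fm_score(1, "frequency", toy_quantiles))  # 1: boundary values fall in the lower bucket
```

Because every comparison uses `<=`, a value exactly equal to a cut point is placed in the lower bucket for both functions.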

In [None]:
# Apply functions

# Recency
segmented_rfm_scores['r_quantile'] = segmented_rfm_scores['recency'].apply(
    r_score, args = ('recency', quantiles,)
)

# Frequency
segmented_rfm_scores['f_quantile'] = segmented_rfm_scores['frequency'].apply(
    fm_score, args = ('frequency', quantiles,)
)

# Monetary Value
segmented_rfm_scores['m_quantile'] = segmented_rfm_scores['monetary_value'].apply(
    fm_score, args = ('monetary_value', quantiles,)
)

# Average Order Value
segmented_rfm_scores['aov_quantile'] = segmented_rfm_scores['aov'].apply(
    fm_score, args = ('aov', quantiles,)
)

In [None]:
# Preview new df
segmented_rfm_scores.sort_values(by='m_quantile').head()

In [None]:
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.cluster import KMeans

# Within-cluster sum of squares (WCSS) for k = 1 to 11 clusters
cluster_counts = list(range(1, 12))
wcss = []
for k in cluster_counts:
    kmeans = KMeans(n_clusters=k, random_state=44).fit(segmented_rfm_scores.iloc[:, 3:])
    wcss.append(kmeans.inertia_)

# Elbow plot, with the x-axis showing the actual number of clusters
sns.pointplot(x=cluster_counts, y=wcss)
plt.title('Elbow Plot - RFM')
plt.xlabel('Number of Clusters')
plt.ylabel('WCSS')
plt.show()
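
For readers unfamiliar with `inertia_`, the sketch below recomputes WCSS by hand and compares it to the value scikit-learn reports. It uses hypothetical synthetic data (two 2-D blobs) rather than the RFM scores, and `n_init=10` is set explicitly only to keep behavior consistent across scikit-learn versions.

```python
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical data: two well-separated blobs of 2-D points
rng = np.random.default_rng(0)
X = np.vstack([
    rng.normal(loc=0.0, scale=0.5, size=(50, 2)),
    rng.normal(loc=5.0, scale=0.5, size=(50, 2)),
])

km = KMeans(n_clusters=2, n_init=10, random_state=44).fit(X)

# Squared distance from each point to the center of its assigned cluster
assigned_centers = km.cluster_centers_[km.labels_]
manual_wcss = ((X - assigned_centers) ** 2).sum()

# Compare the hand-computed value with scikit-learn's inertia_
print(manual_wcss, km.inertia_)
print(np.isclose(manual_wcss, km.inertia_))
```

The two values should agree up to floating-point error. Since WCSS can only shrink as clusters are added, the elbow plot is read by looking for the point where the decrease levels off rather than for a minimum.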