<a href="https://www.kaggle.com/code/alpy18/customer-segmentation-with-rfm-flo?scriptVersionId=113819139" target="_blank"><img align="left" alt="Kaggle" title="Open in Kaggle" src="https://kaggle.com/static/images/open-in-kaggle.svg"></a>

# Business Problem
FLO, an online shoe store, wants to divide its customers into segments and define marketing strategies for each segment. To that end, customer behaviors will be defined and groups will be formed based on the clusters that appear in those behaviors.


# Data Set Story
The data set consists of information derived from the past shopping behavior of customers who made their most recent purchases from FLO in 2020-2021 as OmniChannel shoppers (shopping both online and offline).

Variable | Description
-|-
master_id | Unique customer number
order_channel | Channel of the shopping platform that was used (Android, ios, Desktop, Mobile)
last_order_channel | Channel on which the most recent purchase was made
first_order_date | Date of the customer's first purchase
last_order_date | Date of the customer's most recent purchase
last_order_date_online | Date of the customer's most recent purchase on the online platform
last_order_date_offline | Date of the customer's most recent purchase on the offline platform
order_num_total_ever_online | Total number of purchases the customer made on the online platform
order_num_total_ever_offline | Total number of purchases the customer made offline
customer_value_total_ever_offline | Total amount the customer paid for offline purchases
customer_value_total_ever_online | Total amount the customer paid for online purchases
interested_in_categories_12 | List of categories the customer shopped in during the last 12 months

In [1]:
import pandas as pd
import datetime as dt

In [2]:
df = pd.read_csv("/kaggle/input/flo-20k/flo_data_20k.csv")
df.head()

Unnamed: 0,master_id,order_channel,last_order_channel,first_order_date,last_order_date,last_order_date_online,last_order_date_offline,order_num_total_ever_online,order_num_total_ever_offline,customer_value_total_ever_offline,customer_value_total_ever_online,interested_in_categories_12
0,cc294636-19f0-11eb-8d74-000d3a38a36f,Android App,Offline,2020-10-30,2021-02-26,2021-02-21,2021-02-26,4.0,1.0,139.99,799.38,[KADIN]
1,f431bd5a-ab7b-11e9-a2fc-000d3a38a36f,Android App,Mobile,2017-02-08,2021-02-16,2021-02-16,2020-01-10,19.0,2.0,159.97,1853.58,"[ERKEK, COCUK, KADIN, AKTIFSPOR]"
2,69b69676-1a40-11ea-941b-000d3a38a36f,Android App,Android App,2019-11-27,2020-11-27,2020-11-27,2019-12-01,3.0,2.0,189.97,395.35,"[ERKEK, KADIN]"
3,1854e56c-491f-11eb-806e-000d3a38a36f,Android App,Android App,2021-01-06,2021-01-17,2021-01-17,2021-01-06,1.0,1.0,39.99,81.98,"[AKTIFCOCUK, COCUK]"
4,d6ea1074-f1f5-11e9-9346-000d3a38a36f,Desktop,Desktop,2019-08-03,2021-03-07,2021-03-07,2019-08-03,1.0,1.0,49.99,159.99,[AKTIFSPOR]


In [3]:
df.describe().T

Unnamed: 0,count,mean,std,min,25%,50%,75%,max
order_num_total_ever_online,19945.0,3.110855,4.225647,1.0,1.0,2.0,4.0,200.0
order_num_total_ever_offline,19945.0,1.913913,2.06288,1.0,1.0,1.0,2.0,109.0
customer_value_total_ever_offline,19945.0,253.922597,301.532853,10.0,99.99,179.98,319.97,18119.14
customer_value_total_ever_online,19945.0,497.32169,832.601886,12.99,149.98,286.46,578.44,45220.13


In [4]:
df.isnull().sum()

master_id                            0
order_channel                        0
last_order_channel                   0
first_order_date                     0
last_order_date                      0
last_order_date_online               0
last_order_date_offline              0
order_num_total_ever_online          0
order_num_total_ever_offline         0
customer_value_total_ever_offline    0
customer_value_total_ever_online     0
interested_in_categories_12          0
dtype: int64

In [5]:
df.dtypes

master_id                             object
order_channel                         object
last_order_channel                    object
first_order_date                      object
last_order_date                       object
last_order_date_online                object
last_order_date_offline               object
order_num_total_ever_online          float64
order_num_total_ever_offline         float64
customer_value_total_ever_offline    float64
customer_value_total_ever_online     float64
interested_in_categories_12           object
dtype: object

In [6]:
df["omnichannel_total_order"] = df["order_num_total_ever_online"] + df["order_num_total_ever_offline"]
df["omnichannel_total_expense"] = df["customer_value_total_ever_offline"] + df["customer_value_total_ever_online"]
df.head()

Unnamed: 0,master_id,order_channel,last_order_channel,first_order_date,last_order_date,last_order_date_online,last_order_date_offline,order_num_total_ever_online,order_num_total_ever_offline,customer_value_total_ever_offline,customer_value_total_ever_online,interested_in_categories_12,omnichannel_total_order,omnichannel_total_expense
0,cc294636-19f0-11eb-8d74-000d3a38a36f,Android App,Offline,2020-10-30,2021-02-26,2021-02-21,2021-02-26,4.0,1.0,139.99,799.38,[KADIN],5.0,939.37
1,f431bd5a-ab7b-11e9-a2fc-000d3a38a36f,Android App,Mobile,2017-02-08,2021-02-16,2021-02-16,2020-01-10,19.0,2.0,159.97,1853.58,"[ERKEK, COCUK, KADIN, AKTIFSPOR]",21.0,2013.55
2,69b69676-1a40-11ea-941b-000d3a38a36f,Android App,Android App,2019-11-27,2020-11-27,2020-11-27,2019-12-01,3.0,2.0,189.97,395.35,"[ERKEK, KADIN]",5.0,585.32
3,1854e56c-491f-11eb-806e-000d3a38a36f,Android App,Android App,2021-01-06,2021-01-17,2021-01-17,2021-01-06,1.0,1.0,39.99,81.98,"[AKTIFCOCUK, COCUK]",2.0,121.97
4,d6ea1074-f1f5-11e9-9346-000d3a38a36f,Desktop,Desktop,2019-08-03,2021-03-07,2021-03-07,2019-08-03,1.0,1.0,49.99,159.99,[AKTIFSPOR],2.0,209.98


In [7]:
date_columns = df.columns[df.columns.str.contains("date")]
df[date_columns] = df[date_columns].apply(pd.to_datetime)
df.dtypes

master_id                                    object
order_channel                                object
last_order_channel                           object
first_order_date                     datetime64[ns]
last_order_date                      datetime64[ns]
last_order_date_online               datetime64[ns]
last_order_date_offline              datetime64[ns]
order_num_total_ever_online                 float64
order_num_total_ever_offline                float64
customer_value_total_ever_offline           float64
customer_value_total_ever_online            float64
interested_in_categories_12                  object
omnichannel_total_order                     float64
omnichannel_total_expense                   float64
dtype: object

In [8]:
df.groupby("order_channel").agg({"master_id": "count",
                                 "omnichannel_total_order": "sum",
                                 "omnichannel_total_expense": "sum"})

Unnamed: 0_level_0,master_id,omnichannel_total_order,omnichannel_total_expense
order_channel,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1
Android App,9495,52269.0,7819062.76
Desktop,2735,10920.0,1610321.46
Ios App,2833,15351.0,2525999.93
Mobile,4882,21679.0,3028183.16


In [9]:
df["last_order_date"].max()

Timestamp('2021-05-30 00:00:00')

In [10]:
analysis_date = dt.datetime(2021, 6, 1)

In [11]:
rfm = pd.DataFrame()
rfm["customer_id"] = df["master_id"]
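# Days since the last order. .dt.days is used because astype("timedelta64[D]")
# is rejected by pandas 2.x; the float cast keeps the original float64 column.
rfm["recency"] = (analysis_date - df["last_order_date"]).dt.days.astype("float64")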
rfm["frequency"] = df["omnichannel_total_order"]
rfm["monetary"] = df["omnichannel_total_expense"]
rfm.head()

Unnamed: 0,customer_id,recency,frequency,monetary
0,cc294636-19f0-11eb-8d74-000d3a38a36f,95.0,5.0,939.37
1,f431bd5a-ab7b-11e9-a2fc-000d3a38a36f,105.0,21.0,2013.55
2,69b69676-1a40-11ea-941b-000d3a38a36f,186.0,5.0,585.32
3,1854e56c-491f-11eb-806e-000d3a38a36f,135.0,2.0,121.97
4,d6ea1074-f1f5-11e9-9346-000d3a38a36f,86.0,2.0,209.98
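
The recency values above can be checked by hand. The sketch below is only an illustration: it assumes the analysis date is taken two days after the latest `last_order_date`, which is what the 2021-06-01 used in the notebook implies. The toy series `last_orders` reuses three dates from the output above plus the reported maximum and is not part of the original notebook.

```python
import pandas as pd

# Toy sample: three last_order_date values from the table above,
# plus the maximum date reported by the notebook (2021-05-30).
last_orders = pd.to_datetime(
    pd.Series(["2021-02-26", "2021-02-16", "2020-11-27", "2021-05-30"])
)

# Assumption: analysis date = latest order date + 2 days.
analysis_date = last_orders.max() + pd.Timedelta(days=2)

# Recency = whole days between the analysis date and the last order.
recency = (analysis_date - last_orders).dt.days

print(analysis_date.date())
print(recency.tolist())
```

Deriving the date from the data rather than hard-coding it keeps recency strictly positive for every customer if the file is refreshed.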


In [12]:
rfm["recency_score"] = pd.qcut(rfm["recency"], 5, labels=[5, 4, 3, 2, 1])
rfm["frequency_score"] = pd.qcut(rfm["frequency"].rank(method="first"), 5, labels=[1, 2, 3, 4, 5])
rfm["monetary_score"] = pd.qcut(rfm["monetary"], 5, labels=[1, 2, 3, 4, 5])
rfm.head()

Unnamed: 0,customer_id,recency,frequency,monetary,recency_score,frequency_score,monetary_score
0,cc294636-19f0-11eb-8d74-000d3a38a36f,95.0,5.0,939.37,3,4,4
1,f431bd5a-ab7b-11e9-a2fc-000d3a38a36f,105.0,21.0,2013.55,3,5,5
2,69b69676-1a40-11ea-941b-000d3a38a36f,186.0,5.0,585.32,2,4,3
3,1854e56c-491f-11eb-806e-000d3a38a36f,135.0,2.0,121.97,3,1,1
4,d6ea1074-f1f5-11e9-9346-000d3a38a36f,86.0,2.0,209.98,3,1,1
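
Only the frequency score is computed on ranks. A likely reason, sketched below on toy numbers, is that order counts are heavily tied at low values, so several quantile edges coincide and `pd.qcut` refuses to build the bins. The series `frequency` here is made up for illustration and is not taken from the data set.

```python
import pandas as pd

# Toy frequency values with many ties at the low end (hypothetical data).
frequency = pd.Series([1, 1, 1, 1, 1, 2, 3, 4, 5, 6])

# Direct quantile binning: the 0%, 20% and 40% edges are all 1,
# so pandas raises because bin edges must be unique.
try:
    pd.qcut(frequency, 5, labels=[1, 2, 3, 4, 5])
    raised = False
except ValueError:
    raised = True

# Ranking with method="first" gives every row a distinct rank (ties are
# broken by row order), so five equally sized bins can always be formed.
scores = pd.qcut(frequency.rank(method="first"), 5, labels=[1, 2, 3, 4, 5])

print(raised)
print(scores.astype(int).tolist())
```

The trade-off is that customers with the same frequency can land in different score bins, depending only on their position in the table.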


In [13]:
rfm["RF_SCORE"] = (rfm["recency_score"].astype(str) + rfm["frequency_score"].astype(str))
rfm.head()

Unnamed: 0,customer_id,recency,frequency,monetary,recency_score,frequency_score,monetary_score,RF_SCORE
0,cc294636-19f0-11eb-8d74-000d3a38a36f,95.0,5.0,939.37,3,4,4,34
1,f431bd5a-ab7b-11e9-a2fc-000d3a38a36f,105.0,21.0,2013.55,3,5,5,35
2,69b69676-1a40-11ea-941b-000d3a38a36f,186.0,5.0,585.32,2,4,3,24
3,1854e56c-491f-11eb-806e-000d3a38a36f,135.0,2.0,121.97,3,1,1,31
4,d6ea1074-f1f5-11e9-9346-000d3a38a36f,86.0,2.0,209.98,3,1,1,31


In [14]:
seg_map = {
    r'[1-2][1-2]': 'hibernating',
    r'[1-2][3-4]': 'at_risk',
    r'[1-2]5': 'cant_loose',
    r'3[1-2]': 'about_to_sleep',
    r'33': 'need_attention',
    r'[3-4][4-5]': 'loyal_customers',
    r'41': 'promising',
    r'51': 'new_customers',
    r'[4-5][2-3]': 'potential_loyalists',
    r'5[4-5]': 'champions'
}

rfm["segment"] = rfm["RF_SCORE"].replace(seg_map, regex=True)
rfm.head()

Unnamed: 0,customer_id,recency,frequency,monetary,recency_score,frequency_score,monetary_score,RF_SCORE,segment
0,cc294636-19f0-11eb-8d74-000d3a38a36f,95.0,5.0,939.37,3,4,4,34,loyal_customers
1,f431bd5a-ab7b-11e9-a2fc-000d3a38a36f,105.0,21.0,2013.55,3,5,5,35,loyal_customers
2,69b69676-1a40-11ea-941b-000d3a38a36f,186.0,5.0,585.32,2,4,3,24,at_risk
3,1854e56c-491f-11eb-806e-000d3a38a36f,135.0,2.0,121.97,3,1,1,31,about_to_sleep
4,d6ea1074-f1f5-11e9-9346-000d3a38a36f,86.0,2.0,209.98,3,1,1,31,about_to_sleep
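
Because the segment map is a set of regular expressions over a two-digit code, it is worth confirming that every recency/frequency combination falls into exactly one segment. The sketch below does that with `re.fullmatch` instead of `Series.replace`; for two-character codes the two approaches should agree, but the helper `segments_for` is a hypothetical name introduced here for illustration.

```python
import re

# Same patterns as in the notebook cell above.
seg_map = {
    r'[1-2][1-2]': 'hibernating',
    r'[1-2][3-4]': 'at_risk',
    r'[1-2]5': 'cant_loose',
    r'3[1-2]': 'about_to_sleep',
    r'33': 'need_attention',
    r'[3-4][4-5]': 'loyal_customers',
    r'41': 'promising',
    r'51': 'new_customers',
    r'[4-5][2-3]': 'potential_loyalists',
    r'5[4-5]': 'champions'
}

def segments_for(rf_score):
    """Return every segment whose pattern matches the whole RF code."""
    return [name for pattern, name in seg_map.items()
            if re.fullmatch(pattern, rf_score)]

# All 25 possible codes: recency score 1-5 followed by frequency score 1-5.
all_codes = [f"{r}{f}" for r in range(1, 6) for f in range(1, 6)]
coverage = {code: segments_for(code) for code in all_codes}

# Codes that match zero or several patterns would show up here.
ambiguous = {code: names for code, names in coverage.items() if len(names) != 1}
print(ambiguous)
print(coverage["34"], coverage["24"], coverage["31"])
```

An empty `ambiguous` dictionary means the patterns partition the 5x5 grid, so the order of entries in `seg_map` does not affect the result.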


In [15]:
rfm.groupby("segment")[["recency", "frequency", "monetary"]].agg(["mean", "count"])

Unnamed: 0_level_0,recency,recency,frequency,frequency,monetary,monetary
Unnamed: 0_level_1,mean,count,mean,count,mean,count
segment,Unnamed: 1_level_2,Unnamed: 2_level_2,Unnamed: 3_level_2,Unnamed: 4_level_2,Unnamed: 5_level_2,Unnamed: 6_level_2
about_to_sleep,113.785144,1629,2.401473,1629,359.008963,1629
at_risk,241.606835,3131,4.472373,3131,646.610236,3131
cant_loose,235.444167,1200,10.698333,1200,1474.468217,1200
champions,17.106625,1932,8.934265,1932,1406.625109,1932
hibernating,247.949501,3604,2.394007,3604,366.267062,3604
loyal_customers,82.594763,3361,8.374591,3361,1216.818616,3361
need_attention,113.828676,823,3.727825,823,562.142965,823
new_customers,17.917647,680,2.0,680,339.955544,680
potential_loyalists,37.155888,2938,3.304289,2938,533.184466,2938
promising,58.921175,647,2.0,647,335.672705,647


FLO is adding a new women's shoe brand to its portfolio. The brand's product prices are above what customers generally prefer to pay. For this reason, FLO wants to contact customers with a matching profile directly to promote the brand and drive its sales. The customers to be contacted are the loyal ones (champions, loyal_customers) who also shop in the women's category. Save the id numbers of these customers to a csv file.

In [16]:
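# Loyal segments that also shopped in the women's (KADIN) category.
# rfm was built from df row by row, so both masks share the same index.
is_loyal = rfm["segment"].isin(["champions", "loyal_customers"])
likes_women = df["interested_in_categories_12"].str.contains("KADIN")
tanitim = rfm.loc[is_loyal & likes_women, "customer_id"]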
tanitim.to_csv("tanitim.csv", index=False)
tanitim.head()

0     cc294636-19f0-11eb-8d74-000d3a38a36f
1     f431bd5a-ab7b-11e9-a2fc-000d3a38a36f
8     cfbda69e-5b4f-11ea-aca7-000d3a38a36f
11    c2e15af2-9eed-11e9-9897-000d3a38a36f
33    fb840306-1219-11ea-a001-000d3a38a36f
Name: customer_id, dtype: object

A discount of nearly 40% is planned for Men's and Children's products. The campaign should specifically target customers interested in these categories who were good customers in the past but have not shopped for a long time and must not be lost (cant_loose), those who are hibernating, and new customers. Save the ids of the customers matching this profile to a csv file.

In [17]:
# Target segments, limited to customers interested in men's (ERKEK) or children's (COCUK) products.
target_segments = ["cant_loose", "hibernating", "new_customers"]
likes_men_or_kids = df["interested_in_categories_12"].str.contains("ERKEK|COCUK")
indirim = rfm.loc[rfm["segment"].isin(target_segments) & likes_men_or_kids, "customer_id"]
indirim.to_csv("indirim.csv", index=False)
indirim.head()

7     3f1b4dc8-8a7d-11ea-8ec0-000d3a38a36f
10    ae608ece-c9d8-11ea-a31e-000d3a38a36f
15    13ed97a4-b167-11e9-89fa-000d3a38a36f
19    2730793e-3908-11ea-85d6-000d3a38a36f
21    7b289956-d691-11e9-93bc-000d3a38a36f
Name: customer_id, dtype: object