# CLTV Prediction with BG-NBD and Gamma-Gamma


## Business Problem

A UK-based retail company wants to define a roadmap for its sales and marketing activities. To plan for the medium and long term, the company needs an estimate of the potential value its existing customers will generate in the future.


## Dataset Story

The Online Retail II dataset contains the online sales transactions of a UK-based retail company between 01/12/2009 and 09/12/2011. The company's product catalog consists of gift items, and most of its customers are known to be wholesalers.


## Task 2: CLTV Analysis over Different Time Periods

### Step 1: Calculate the 1-month and 12-month CLTV for 2010-2011 UK customers.

### Step 2: Analyze the 10 customers with the highest 1-month CLTV and the 10 customers with the highest 12-month CLTV.

### Step 3: Is there a difference? If so, what might explain it?

## Task 3: Segmentation and Action Recommendations

### Step 1: Based on the 6-month CLTV for 2010-2011 UK customers, divide all customers into 4 groups (segments) and add the group names to the dataset.

### Step 2: For 2 of the 4 groups, give management brief 6-month action recommendations.


In [1]:
import pandas as pd
import datetime as dt
!pip install lifetimes
from lifetimes import BetaGeoFitter
from lifetimes import GammaGammaFitter
from sklearn.preprocessing import MinMaxScaler
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)
pd.set_option('display.float_format', lambda x: '%.2f' % x)
pd.options.mode.chained_assignment = None



Define the `outlier_thresholds` and `replace_with_thresholds` functions needed to cap outliers.

Note: When calculating CLTV, the frequency values must be integers, so round the lower and upper limits with round().

In [2]:
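def outlier_thresholds(dataframe, variable):
  # Despite the variable names, these are the 1st and 99th percentiles, not quartiles.
  quartile1 = dataframe[variable].quantile(0.01)
  quartile3 = dataframe[variable].quantile(0.99)
  interquantile_range = quartile3 - quartile1
  # The limits sit 1.5 ranges beyond the percentiles, so only extreme values are capped.
  up_limit = quartile3 + 1.5*interquantile_range
  low_limit = quartile1 - 1.5*interquantile_range
  return low_limit, up_limit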

In [3]:
# Cap the outliers at the rounded lower and upper limits.
def replace_with_thresholds(dataframe, variable):
  low_limit, up_limit = outlier_thresholds(dataframe, variable)
  dataframe.loc[(dataframe[variable] < low_limit), variable] = round(low_limit,0)
  dataframe.loc[(dataframe[variable] > up_limit), variable] = round(up_limit,0)
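
To see what the capping rule does numerically, here is a dependency-free sketch of the same idea on a small made-up list. The function name `cap_outliers` and the sample values are illustrative assumptions, and `statistics.quantiles(..., method="inclusive")` stands in for the pandas `quantile` call used in the notebook.

```python
from statistics import quantiles

def cap_outliers(values, lower_q=1, upper_q=99):
    """Cap values beyond 1.5 ranges outside the given percentiles (hypothetical helper)."""
    # 99 cut points: index 0 is the 1st percentile, index 98 the 99th.
    cuts = quantiles(values, n=100, method="inclusive")
    q_low, q_high = cuts[lower_q - 1], cuts[upper_q - 1]
    spread = q_high - q_low
    # Round the limits so that integer columns stay integer after capping.
    low_limit = round(q_low - 1.5 * spread)
    up_limit = round(q_high + 1.5 * spread)
    capped = [min(max(v, low_limit), up_limit) for v in values]
    return capped, low_limit, up_limit

# Made-up data: quantities 1..100 plus one extreme order.
quantities = list(range(1, 101)) + [10000]
capped, low_limit, up_limit = cap_outliers(quantities)
print(low_limit, up_limit, capped[-1])
```

In this toy data the limits are -145 and 247, so the order of 10000 is pulled down to 247 while the ordinary quantities are left untouched.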

### Reading the Data

In [4]:
df_ = pd.read_excel("online_retail_II.xlsx",
                    sheet_name="Year 2010-2011")

In [5]:
df = df_.copy()

In [6]:
df.describe().T

| | count | mean | std | min | 25% | 50% | 75% | max |
|---|---|---|---|---|---|---|---|---|
| Quantity | 541910.0 | 9.55 | 218.08 | -80995.0 | 1.0 | 3.0 | 10.0 | 80995.0 |
| Price | 541910.0 | 4.61 | 96.76 | -11062.06 | 1.25 | 2.08 | 4.13 | 38970.0 |
| Customer ID | 406830.0 | 15287.68 | 1713.6 | 12346.0 | 13953.0 | 15152.0 | 16791.0 | 18287.0 |


In [7]:
df.isnull().sum()

Invoice             0
StockCode           0
Description      1454
Quantity            0
InvoiceDate         0
Price               0
Customer ID    135080
Country             0
dtype: int64

In [8]:
df.head()

| | Invoice | StockCode | Description | Quantity | InvoiceDate | Price | Customer ID | Country |
|---|---|---|---|---|---|---|---|---|
| 0 | 536365 | 85123A | WHITE HANGING HEART T-LIGHT HOLDER | 6 | 2010-12-01 08:26:00 | 2.55 | 17850.0 | United Kingdom |
| 1 | 536365 | 71053 | WHITE METAL LANTERN | 6 | 2010-12-01 08:26:00 | 3.39 | 17850.0 | United Kingdom |
| 2 | 536365 | 84406B | CREAM CUPID HEARTS COAT HANGER | 8 | 2010-12-01 08:26:00 | 2.75 | 17850.0 | United Kingdom |
| 3 | 536365 | 84029G | KNITTED UNION FLAG HOT WATER BOTTLE | 6 | 2010-12-01 08:26:00 | 3.39 | 17850.0 | United Kingdom |
| 4 | 536365 | 84029E | RED WOOLLY HOTTIE WHITE HEART. | 6 | 2010-12-01 08:26:00 | 3.39 | 17850.0 | United Kingdom |


### Data Preprocessing

In [9]:
df.dropna(inplace=True)

In [10]:
df = df[~df["Invoice"].str.contains("C", na=False)]

In [11]:
df = df[df["Quantity"] > 0]
df = df[df["Price"] > 0]

In [12]:
replace_with_thresholds(df, "Quantity")
replace_with_thresholds(df, "Price")

In [13]:
df["TotalPrice"] = df["Quantity"] * df["Price"]

In [14]:
df["InvoiceDate"].max()

Timestamp('2011-12-09 12:50:00')

In [15]:
today_date = dt.datetime(2011, 12, 11)

### Preparing the Lifetime Data Structure

In [16]:
cltv_df = df.groupby('Customer ID').agg(
    {'InvoiceDate': [lambda InvoiceDate: (InvoiceDate.max() - InvoiceDate.min()).days,
                     lambda InvoiceDate: (today_date - InvoiceDate.min()).days],
     'Invoice': lambda Invoice: Invoice.nunique(),
     'TotalPrice': lambda TotalPrice: TotalPrice.sum()})

In [17]:
cltv_df.columns = cltv_df.columns.droplevel(0)

In [18]:
cltv_df.columns = ['recency', 'T', 'frequency', 'monetary']

In [19]:
cltv_df["avg_monetary"] = cltv_df["monetary"] / cltv_df["frequency"]

In [20]:
cltv_df.describe().T

| | count | mean | std | min | 25% | 50% | 75% | max |
|---|---|---|---|---|---|---|---|---|
| recency | 4338.0 | 130.45 | 132.04 | 0.0 | 0.0 | 92.5 | 251.75 | 373.0 |
| T | 4338.0 | 223.83 | 117.85 | 1.0 | 113.0 | 249.0 | 327.0 | 374.0 |
| frequency | 4338.0 | 4.27 | 7.7 | 1.0 | 1.0 | 2.0 | 5.0 | 209.0 |
| monetary | 4338.0 | 1892.01 | 7704.11 | 3.75 | 303.31 | 663.1 | 1631.11 | 266136.72 |
| avg_monetary | 4338.0 | 364.1 | 367.21 | 3.45 | 176.85 | 288.23 | 422.03 | 6207.67 |


In [21]:
cltv_df = cltv_df[(cltv_df['frequency'] > 1)]

In [22]:
cltv_df["recency"] = cltv_df["recency"] / 7

In [23]:
cltv_df["T"] = cltv_df["T"] / 7
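
The three inputs built above can be illustrated for a single made-up customer. This sketch follows the notebook's own definitions (recency is the span between first and last purchase, T is the customer's age at the analysis date, both in weeks, and frequency is the number of distinct purchases). The function name `lifetime_summary` and the purchase dates are hypothetical, and one invoice per date is assumed.

```python
from datetime import date

def lifetime_summary(purchase_dates, analysis_date):
    """Return recency and T in weeks plus purchase count (hypothetical helper)."""
    first, last = min(purchase_dates), max(purchase_dates)
    return {
        # Time between the customer's first and last purchase, in weeks.
        "recency": (last - first).days / 7,
        # Time between the first purchase and the analysis date, in weeks.
        "T": (analysis_date - first).days / 7,
        # Number of distinct purchase dates (assumed one invoice per date).
        "frequency": len(set(purchase_dates)),
    }

# Made-up purchase history, evaluated at the notebook's analysis date.
purchases = [date(2011, 1, 10), date(2011, 3, 21), date(2011, 6, 6)]
summary = lifetime_summary(purchases, date(2011, 12, 11))
print(summary)
```

For this customer recency is exactly 21 weeks, while T is just under 48 weeks because no purchase has been observed since June.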

## Task 1: 6-Month CLTV Prediction with BG-NBD and Gamma-Gamma Models

### Step 1: Using the 2010-2011 data, predict the 6-month CLTV for customers in the United Kingdom.

### Step 2: Interpret and evaluate the results.


In [24]:
bgf = BetaGeoFitter(penalizer_coef=0.001)

bgf.fit(cltv_df['frequency'],
        cltv_df['recency'],
        cltv_df['T'])

<lifetimes.BetaGeoFitter: fitted with 2845 subjects, a: 0.12, alpha: 11.41, b: 2.49, r: 2.18>

In [25]:
cltv_df["bgf_expected_purc_6_month"] = bgf.predict(4*6,
                                              cltv_df['frequency'],
                                              cltv_df['recency'],
                                              cltv_df['T'])

In [26]:
cltv_df.head()

| Customer ID | recency | T | frequency | monetary | avg_monetary | bgf_expected_purc_6_month |
|---|---|---|---|---|---|---|
| 12347.0 | 52.14 | 52.57 | 7 | 4310.0 | 615.71 | 3.32 |
| 12348.0 | 40.29 | 51.29 | 4 | 1770.24 | 442.56 | 2.16 |
| 12352.0 | 37.14 | 42.43 | 8 | 1755.74 | 219.47 | 4.28 |
| 12356.0 | 43.14 | 46.57 | 3 | 2811.43 | 937.14 | 2.02 |
| 12358.0 | 21.29 | 21.57 | 2 | 1150.06 | 575.03 | 2.83 |
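
The expected purchase count can be reproduced by hand from the fitted parameters. The sketch below implements the BG/NBD conditional expectation in the form published by Fader, Hardie and Lee, with a plain power series for the Gauss hypergeometric function. The function names are illustrative, the parameters are the rounded values printed by the fitted model, and it is an assumption that `bgf.predict` evaluates this same expression.

```python
def hyp2f1(a, b, c, z, tol=1e-12, max_terms=10_000):
    """Gauss hypergeometric function via its power series (valid for |z| < 1)."""
    term, total = 1.0, 1.0
    for n in range(max_terms):
        term *= (a + n) * (b + n) / ((c + n) * (n + 1)) * z
        total += term
        if abs(term) < tol:
            break
    return total

def expected_purchases(t, x, t_x, T, r, alpha, a, b):
    """BG/NBD expected purchases in the next t periods for a customer with x > 0."""
    z = t / (alpha + T + t)
    hyp = hyp2f1(r + x, b + x, a + b + x - 1, z)
    numerator = (a + b + x - 1) / (a - 1) * (
        1 - ((alpha + T) / (alpha + T + t)) ** (r + x) * hyp
    )
    # The denominator down-weights customers who may already have dropped out.
    denominator = 1 + a / (b + x - 1) * ((alpha + T) / (alpha + t_x)) ** (r + x)
    return numerator / denominator

# Rounded model parameters and customer 12347's row (time unit: weeks).
params = dict(r=2.18, alpha=11.41, a=0.12, b=2.49)
pred = expected_purchases(t=24, x=7, t_x=52.14, T=52.57, **params)
print(round(pred, 2))
```

With these rounded inputs the result should land close to the 3.32 reported for customer 12347. Note that `t=24` matches the `4*6` weeks passed to `bgf.predict`, since recency and T were converted to weeks.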


In [27]:
ggf = GammaGammaFitter(penalizer_coef=0.01)

ggf.fit(cltv_df['frequency'],
        cltv_df['avg_monetary'])

<lifetimes.GammaGammaFitter: fitted with 2845 subjects, p: 3.79, q: 0.34, v: 3.73>

In [28]:
cltv_df["exp_average_value"] = ggf.conditional_expected_average_profit(cltv_df['frequency'],
                                                                cltv_df['avg_monetary'])
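
The Gamma-Gamma estimate is a weighted average of the customer's own average order value and a population-level mean, where the weight on the customer's own data grows with the number of purchases. The sketch below writes that formula out in plain Python. The function name is illustrative, the parameters are the rounded values printed by the fitted model, and the inputs are those of customer 12347 (frequency 7, avg_monetary 615.71).

```python
def expected_average_value(frequency, avg_monetary, p, q, v):
    """Gamma-Gamma conditional expected average transaction value."""
    # Weight placed on the customer's own observed average.
    individual_weight = p * frequency / (p * frequency + q - 1)
    # Mean implied by the fitted parameters for the whole population.
    population_mean = v * p / (q - 1)
    return (1 - individual_weight) * population_mean + individual_weight * avg_monetary

value = expected_average_value(frequency=7, avg_monetary=615.71, p=3.79, q=0.34, v=3.73)
print(round(value, 2))
```

The result should come out near 632, in line with the `exp_average_value` reported for this customer; the small gap is due to the rounded parameters.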

In [30]:
cltv = ggf.customer_lifetime_value(bgf,
                                   cltv_df['frequency'],
                                   cltv_df['recency'],
                                   cltv_df['T'],
                                   cltv_df['avg_monetary'],
                                   time=6,
                                   freq="W",
                                   discount_rate=0.01)
cltv_df["cltv"] = cltv

In [31]:
cltv_df.head()

| Customer ID | recency | T | frequency | monetary | avg_monetary | bgf_expected_purc_6_month | exp_average_value | cltv |
|---|---|---|---|---|---|---|---|---|
| 12347.0 | 52.14 | 52.57 | 7 | 4310.0 | 615.71 | 3.32 | 631.91 | 2200.73 |
| 12348.0 | 40.29 | 51.29 | 4 | 1770.24 | 442.56 | 2.16 | 463.6 | 1050.05 |
| 12352.0 | 37.14 | 42.43 | 8 | 1755.74 | 219.47 | 4.28 | 224.81 | 1007.37 |
| 12356.0 | 43.14 | 46.57 | 3 | 2811.43 | 937.14 | 2.02 | 996.0 | 2109.62 |
| 12358.0 | 21.29 | 21.57 | 2 | 1150.06 | 575.03 | 2.83 | 631.7 | 1869.96 |
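
The CLTV call combines expected purchases with expected order value and applies `discount_rate=0.01` over `time=6` months. As a rough illustration of what a monthly discount rate does, the sketch below sums discounted monthly revenue for a made-up customer. The function name, the per-month purchase counts and the per-purchase value are all hypothetical, and applying the discount once per month is a simplifying assumption rather than a description of the library's internals.

```python
def discounted_cltv(purchases_per_month, expected_avg_value, monthly_discount_rate=0.01):
    """Sum expected monthly revenue, discounting month k by (1 + rate) ** k."""
    total = 0.0
    for month, purchases in enumerate(purchases_per_month, start=1):
        revenue = purchases * expected_avg_value
        total += revenue / (1 + monthly_discount_rate) ** month
    return total

# Hypothetical customer: 0.5 expected purchases per month, 200 per purchase.
cltv_6m = discounted_cltv([0.5] * 6, 200.0)
undiscounted = discounted_cltv([0.5] * 6, 200.0, monthly_discount_rate=0.0)
print(round(cltv_6m, 2), undiscounted)
```

Without discounting the six months add up to 600; with a 1% monthly rate the total drops to roughly 579.55, which shows why later months contribute slightly less to the CLTV.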


## Task 2: CLTV Analysis over Different Time Periods

### Step 1: Calculate the 1-month and 12-month CLTV for 2010-2011 UK customers.

### Step 2: Analyze the 10 customers with the highest 1-month CLTV and the 10 customers with the highest 12-month CLTV.

In [32]:
uk_customer_ids = (df[df["Country"] == "United Kingdom"]["Customer ID"]).unique()

In [33]:
uk_cltv_df = cltv_df[cltv_df.index.isin(uk_customer_ids)].copy()

In [34]:
bgf.fit(uk_cltv_df["frequency"], uk_cltv_df["recency"], uk_cltv_df["T"])

<lifetimes.BetaGeoFitter: fitted with 2570 subjects, a: 0.12, alpha: 11.66, b: 2.51, r: 2.21>

In [35]:
ggf.fit(uk_cltv_df["frequency"], uk_cltv_df["avg_monetary"])

<lifetimes.GammaGammaFitter: fitted with 2570 subjects, p: 3.81, q: 0.35, v: 3.75>

In [36]:
uk_cltv_df["cltv_1_month"] = ggf.customer_lifetime_value(bgf,
                                                         uk_cltv_df["frequency"],
                                                         uk_cltv_df["recency"],
                                                         uk_cltv_df["T"],
                                                         uk_cltv_df["avg_monetary"],
                                                         time=1,
                                                         discount_rate=0.01,
                                                         freq="W")

In [37]:
uk_cltv_df.sort_values("cltv_1_month", ascending=False).head(10)

| Customer ID | recency | T | frequency | monetary | avg_monetary | bgf_expected_purc_6_month | exp_average_value | cltv | cltv_1_month |
|---|---|---|---|---|---|---|---|---|---|
| 18102.0 | 52.29 | 52.57 | 60 | 231420.72 | 3857.01 | 22.81 | 3868.26 | 92439.05 | 16014.38 |
| 14096.0 | 13.86 | 14.57 | 17 | 53780.22 | 3163.54 | 16.78 | 3196.39 | 56126.68 | 9869.16 |
| 17450.0 | 51.29 | 52.57 | 46 | 131609.17 | 2861.07 | 17.6 | 2871.98 | 52960.41 | 9177.2 |
| 17511.0 | 52.86 | 53.43 | 31 | 90949.92 | 2933.87 | 11.98 | 2950.5 | 37044.6 | 6420.25 |
| 16684.0 | 50.43 | 51.29 | 28 | 61858.62 | 2209.24 | 11.25 | 2223.15 | 26207.69 | 4544.4 |
| 13694.0 | 52.71 | 53.43 | 50 | 63782.15 | 1275.64 | 18.86 | 1280.16 | 25298.89 | 4382.8 |
| 14088.0 | 44.57 | 46.14 | 13 | 50238.91 | 3864.53 | 6.11 | 3917.11 | 25092.09 | 4360.98 |
| 16000.0 | 0.0 | 0.43 | 3 | 6996.98 | 2332.33 | 9.38 | 2476.84 | 24270.21 | 4360.33 |
| 15311.0 | 53.29 | 53.43 | 91 | 60767.9 | 667.78 | 33.77 | 669.1 | 23673.92 | 4099.86 |
| 13089.0 | 52.29 | 52.86 | 97 | 58816.98 | 606.36 | 36.21 | 607.49 | 23046.82 | 3991.66 |


In [38]:
uk_cltv_df["cltv_12_month"] = ggf.customer_lifetime_value(bgf,
                                                         uk_cltv_df["frequency"],
                                                         uk_cltv_df["recency"],
                                                         uk_cltv_df["T"],
                                                         uk_cltv_df["avg_monetary"],
                                                         time=12,
                                                         discount_rate=0.01,
                                                         freq="W")

In [39]:
uk_cltv_df.sort_values("cltv_12_month", ascending=False).head(10)

| Customer ID | recency | T | frequency | monetary | avg_monetary | bgf_expected_purc_6_month | exp_average_value | cltv | cltv_1_month | cltv_12_month |
|---|---|---|---|---|---|---|---|---|---|---|
| 18102.0 | 52.29 | 52.57 | 60 | 231420.72 | 3857.01 | 22.81 | 3868.26 | 92439.05 | 16014.38 | 176004.59 |
| 14096.0 | 13.86 | 14.57 | 17 | 53780.22 | 3163.54 | 16.78 | 3196.39 | 56126.68 | 9869.16 | 105042.91 |
| 17450.0 | 51.29 | 52.57 | 46 | 131609.17 | 2861.07 | 17.6 | 2871.98 | 52960.41 | 9177.2 | 100853.81 |
| 17511.0 | 52.86 | 53.43 | 31 | 90949.92 | 2933.87 | 11.98 | 2950.5 | 37044.6 | 6420.25 | 70570.91 |
| 16684.0 | 50.43 | 51.29 | 28 | 61858.62 | 2209.24 | 11.25 | 2223.15 | 26207.69 | 4544.4 | 49904.0 |
| 13694.0 | 52.71 | 53.43 | 50 | 63782.15 | 1275.64 | 18.86 | 1280.16 | 25298.89 | 4382.8 | 48183.55 |
| 14088.0 | 44.57 | 46.14 | 13 | 50238.91 | 3864.53 | 6.11 | 3917.11 | 25092.09 | 4360.98 | 47749.29 |
| 15311.0 | 53.29 | 53.43 | 91 | 60767.9 | 667.78 | 33.77 | 669.1 | 23673.92 | 4099.86 | 45078.95 |
| 16000.0 | 0.0 | 0.43 | 3 | 6996.98 | 2332.33 | 9.38 | 2476.84 | 24270.21 | 4360.33 | 44509.86 |
| 13089.0 | 52.29 | 52.86 | 97 | 58816.98 | 606.36 | 36.21 | 607.49 | 23046.82 | 3991.66 | 43879.28 |
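
To compare the two rankings, the customer IDs from the two top-10 outputs can be checked directly. The sketch below hard-codes the IDs and a few CLTV values copied from those outputs; the variable names are illustrative.

```python
# Customer IDs in rank order, copied from the two top-10 outputs.
top10_1_month = [18102, 14096, 17450, 17511, 16684, 13694, 14088, 16000, 15311, 13089]
top10_12_month = [18102, 14096, 17450, 17511, 16684, 13694, 14088, 15311, 16000, 13089]

# Do both horizons select the same customers?
same_customers = set(top10_1_month) == set(top10_12_month)

# Customers whose rank differs: {id: (rank at 1 month, rank at 12 months)}.
rank_changes = {
    cid: (top10_1_month.index(cid) + 1, top10_12_month.index(cid) + 1)
    for cid in top10_1_month
    if top10_1_month.index(cid) != top10_12_month.index(cid)
}

# Ratio of 12-month to 1-month CLTV for the two customers that swap places.
cltv_pairs = {16000: (4360.33, 44509.86), 15311: (4099.86, 45078.95)}
growth = {cid: round(v12 / v1, 2) for cid, (v1, v12) in cltv_pairs.items()}

print(same_customers, rank_changes, growth)
```

Both lists contain the same ten customers; only 16000 and 15311 swap ranks 8 and 9. Customer 16000's CLTV grows by a factor of about 10.2 from 1 to 12 months, against about 11.0 for 15311. One plausible reading is that the model is less confident about the long-run activity of 16000, whose entire history spans under half a week (T = 0.43), than about a customer observed for over a year.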


## Task 3: Segmentation and Action Recommendations

### Step 1: Based on the 6-month CLTV for 2010-2011 UK customers, divide all customers into 4 groups (segments) and add the group names to the dataset.

### Step 2: For 2 of the 4 groups, give management brief 6-month action recommendations.


In [40]:
uk_cltv_df["cltv_6_month"] = ggf.customer_lifetime_value(bgf,
                                                         uk_cltv_df["frequency"],
                                                         uk_cltv_df["recency"],
                                                         uk_cltv_df["T"],
                                                         uk_cltv_df["avg_monetary"],
                                                         time=6,
                                                         discount_rate=0.01,
                                                         freq="W")

In [41]:
uk_cltv_df["segment"] = pd.qcut(uk_cltv_df["cltv_6_month"], 4, labels=["D", "C", "B", "A"])

In [42]:
uk_cltv_df.groupby("segment").agg(["mean", "sum", "count"])

| segment | recency mean | recency sum | recency count | T mean | T sum | T count | frequency mean | frequency sum | frequency count | monetary mean | monetary sum | monetary count | avg_monetary mean | avg_monetary sum | avg_monetary count | bgf_expected_purc_6_month mean | bgf_expected_purc_6_month sum | bgf_expected_purc_6_month count | exp_average_value mean | exp_average_value sum | exp_average_value count | cltv mean | cltv sum | cltv count | cltv_1_month mean | cltv_1_month sum | cltv_1_month count | cltv_12_month mean | cltv_12_month sum | cltv_12_month count | cltv_6_month mean | cltv_6_month sum | cltv_6_month count |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| D | 22.07 | 14191.0 | 643 | 40.56 | 26077.43 | 643 | 3.07 | 1973 | 643 | 530.58 | 341165.52 | 643 | 178.88 | 115021.17 | 643 | 1.65 | 1058.88 | 643 | 194.01 | 124748.64 | 643 | 269.16 | 173070.54 | 643 | 47.33 | 30432.08 | 643 | 514.23 | 330651.08 | 643 | 270.54 | 173959.85 | 643 |
| C | 30.83 | 19795.14 | 642 | 38.12 | 24472.14 | 642 | 3.99 | 2561 | 642 | 953.07 | 611873.27 | 642 | 260.73 | 167388.65 | 642 | 2.79 | 1789.03 | 642 | 278.35 | 178701.93 | 642 | 711.75 | 456944.06 | 642 | 124.74 | 80084.03 | 642 | 1353.76 | 869112.52 | 642 | 712.54 | 457452.48 | 642 |
| B | 29.85 | 19166.29 | 642 | 35.13 | 22551.57 | 642 | 5.46 | 3504 | 642 | 1727.41 | 1108999.92 | 642 | 352.8 | 226498.11 | 642 | 3.76 | 2410.96 | 642 | 371.62 | 238582.47 | 642 | 1274.85 | 818452.45 | 642 | 223.56 | 143524.29 | 642 | 2417.84 | 1552251.37 | 642 | 1274.25 | 818065.8 | 642 |
| A | 31.45 | 20219.43 | 643 | 34.51 | 22189.43 | 643 | 11.29 | 7258 | 643 | 6517.84 | 4190972.59 | 643 | 591.52 | 380346.5 | 643 | 6.36 | 4091.64 | 643 | 614.43 | 395077.76 | 643 | 3863.28 | 2484090.7 | 643 | 675.6 | 434408.15 | 643 | 7315.8 | 4704061.61 | 643 | 3852.97 | 2477457.79 | 643 |
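
Quartile-based segmentation can also be sketched without pandas to make the binning rule explicit. The example below assumes the labels run from the lowest to the highest value group as D, C, B, A, uses right-inclusive bins, and takes the cut points from `statistics.quantiles` with `method="inclusive"`. The function name and the sample values are illustrative, and results may differ from `pd.qcut` in edge cases such as ties.

```python
from bisect import bisect_left
from statistics import quantiles

def assign_segments(values, labels=("D", "C", "B", "A")):
    """Label each value by its quantile bin, lowest bin first (hypothetical helper)."""
    # len(labels) - 1 cut points split the data into equally sized groups.
    cuts = quantiles(values, n=len(labels), method="inclusive")
    # bisect_left puts a value equal to a cut point into the lower bin.
    return [labels[bisect_left(cuts, v)] for v in values]

# Made-up 6-month CLTV values for eight customers.
cltv_values = [10, 20, 30, 40, 50, 60, 70, 80]
segments = assign_segments(cltv_values)
print(list(zip(cltv_values, segments)))
```

Here the cut points are 27.5, 45 and 62.5, so each segment receives two customers, with D holding the lowest values and A the highest.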
