## Calculating Potential Customer Revenue with Rule-Based Classification

### Business Problem

A game company wants to use some of its customers' attributes to create new level-based customer definitions (personas), group these personas into segments, and use the segments to estimate how much prospective customers are likely to bring in on average.

For example: the company wants to determine how much a 25-year-old male iOS user from Turkey is likely to bring in on average.

In [78]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

# Load the dataset and preview the first rows
df = pd.read_csv("persona.csv")
df.head()


|   | PRICE | SOURCE | SEX | COUNTRY | AGE |
|---|---|---|---|---|---|
| 0 | 39 | android | male | bra | 17 |
| 1 | 39 | android | male | bra | 17 |
| 2 | 49 | android | male | bra | 17 |
| 3 | 29 | android | male | tur | 17 |
| 4 | 49 | android | male | tur | 17 |


In [79]:
df["SOURCE"].nunique()

2

In [80]:
df["PRICE"].nunique()

6

In [81]:
df["PRICE"].value_counts()

29    1305
39    1260
49    1031
19     992
59     212
9      200
Name: PRICE, dtype: int64

In [82]:
df["COUNTRY"].value_counts()

usa    2065
bra    1496
deu     455
tur     451
fra     303
can     230
Name: COUNTRY, dtype: int64

In [83]:
df.groupby("COUNTRY").agg({"PRICE":["sum"]}).T

| COUNTRY | bra | can | deu | fra | tur | usa |
|---|---|---|---|---|---|---|
| PRICE (sum) | 51354 | 7730 | 15485 | 10177 | 15689 | 70225 |


In [84]:
df["SOURCE"].value_counts()

android    2974
ios        2026
Name: SOURCE, dtype: int64

In [85]:
df.groupby("COUNTRY")["PRICE"].mean()

COUNTRY
bra    34.327540
can    33.608696
deu    34.032967
fra    33.587459
tur    34.787140
usa    34.007264
Name: PRICE, dtype: float64

In [86]:
df.groupby("SOURCE")["PRICE"].mean()

SOURCE
android    34.174849
ios        34.069102
Name: PRICE, dtype: float64

In [87]:
df.groupby(["COUNTRY","SOURCE"])["PRICE"].mean()


COUNTRY  SOURCE 
bra      android    34.387029
         ios        34.222222
can      android    33.330709
         ios        33.951456
deu      android    33.869888
         ios        34.268817
fra      android    34.312500
         ios        32.776224
tur      android    36.229437
         ios        33.272727
usa      android    33.760357
         ios        34.371703
Name: PRICE, dtype: float64

### Task 2: What are the average earnings broken down by COUNTRY, SOURCE, SEX, and AGE?

In [88]:
ort_kazanc = df.groupby(["COUNTRY","SEX","AGE","SOURCE"])["PRICE"].mean()
ort_kazanc

COUNTRY  SEX     AGE  SOURCE 
bra      female  15   android    38.714286
                      ios        36.777778
                 16   android    35.944444
                      ios        33.687500
                 17   android    35.666667
                                   ...    
usa      male    53   ios        34.000000
                 55   ios        29.000000
                 57   android    29.000000
                 59   ios        46.500000
                 65   android    25.666667
Name: PRICE, Length: 348, dtype: float64

### Task 3: Sort the output by PRICE

In [89]:
agg_df = ort_kazanc.sort_values(ascending=False)
agg_df

COUNTRY  SEX     AGE  SOURCE 
fra      female  24   android    59.0
usa      male    36   android    59.0
bra      male    46   android    59.0
usa      male    32   ios        54.0
fra      male    20   ios        49.0
                                 ... 
usa      female  38   ios        19.0
tur      male    21   android    19.0
usa      female  30   ios        19.0
bra      female  34   ios        19.0
deu      male    26   android     9.0
Name: PRICE, Length: 348, dtype: float64

### Task 4: Convert the names in the index into variable (column) names

In [111]:
agg_df = agg_df.reset_index()
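
Tasks 2 to 4 form one small pipeline: group, average, sort, then flatten the index. The sketch below replays those steps on a hypothetical five-row frame (`toy`, with made-up values) that only borrows the column names of `persona.csv`, so the effect of `reset_index()` is easy to see.

```python
import pandas as pd

# Hypothetical toy data; only the column names match persona.csv
toy = pd.DataFrame({
    "PRICE":   [10, 20, 30, 50, 40],
    "SOURCE":  ["android", "android", "ios", "ios", "ios"],
    "SEX":     ["male", "male", "female", "female", "male"],
    "COUNTRY": ["tur", "tur", "tur", "usa", "usa"],
    "AGE":     [17, 17, 25, 25, 40],
})

keys = ["COUNTRY", "SOURCE", "SEX", "AGE"]

# groupby(...).mean() returns a Series whose index is a MultiIndex of the keys
mean_price = toy.groupby(keys)["PRICE"].mean()

# Sort by the aggregated value, then turn the index levels into ordinary columns
toy_agg = mean_price.sort_values(ascending=False).reset_index()

print(toy_agg)
```

After `reset_index()` the four keys are regular columns next to `PRICE`, which is what later steps need in order to read `AGE` or build labels from the key columns.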

### Task 5: Convert the AGE variable to a categorical variable and add it to agg_df

In [91]:
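# Age bins are right-inclusive: (0, 18], (18, 23], (23, 30], (30, 40], (40, 70]
bins = [0, 18, 23, 30, 40, 70]
labels = ["0_18", "19_23", "24_30", "31_40", "41_70"]
# The category is added to the raw frame df here, not to agg_df
df["AGE_CAT"] = pd.cut(df["AGE"], bins=bins, labels=labels)
df.head()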

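|   | PRICE | SOURCE | SEX | COUNTRY | AGE | AGE_CAT |
|---|---|---|---|---|---|---|
| 0 | 39 | android | male | bra | 17 | 0_18 |
| 1 | 39 | android | male | bra | 17 | 0_18 |
| 2 | 49 | android | male | bra | 17 | 0_18 |
| 3 | 29 | android | male | tur | 17 | 0_18 |
| 4 | 49 | android | male | tur | 17 | 0_18 |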
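
To check where the bin edges fall, the sketch below runs `pd.cut` with the same `bins` and `labels` on a few hand-picked ages. The `ages` series is hypothetical and chosen to sit on both sides of each boundary.

```python
import pandas as pd

# Hypothetical ages placed on both sides of every bin edge
ages = pd.Series([15, 18, 19, 23, 24, 30, 31, 40, 41, 66], name="AGE")

bins = [0, 18, 23, 30, 40, 70]
labels = ["0_18", "19_23", "24_30", "31_40", "41_70"]

# With the default right=True every interval is (left, right],
# so an age equal to an upper edge stays in the lower category
age_cat = pd.cut(ages, bins=bins, labels=labels)

result = pd.DataFrame({"AGE": ages, "AGE_CAT": age_cat})
print(result)
```

The same call can be applied to any frame that has an `AGE` column, for example `agg_df` after `reset_index()`.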


### Task 6: Define the new level-based customers (personas).

In [99]:
# Build the persona label by joining the upper-cased key columns with "_"
df["customers_level_based"] = ""

conditions = ["COUNTRY", "SOURCE", "SEX", "AGE_CAT"]

for col in conditions:
    df["customers_level_based"] += df[col].astype(str).str.upper() + "_"

# Drop the trailing "_" left by the loop
df["customers_level_based"] = df["customers_level_based"].str.rstrip("_")
df["customers_level_based"]

0          BRA_ANDROID_MALE_0_18
1          BRA_ANDROID_MALE_0_18
2          BRA_ANDROID_MALE_0_18
3          TUR_ANDROID_MALE_0_18
4          TUR_ANDROID_MALE_0_18
                  ...           
4995    BRA_ANDROID_FEMALE_31_40
4996    BRA_ANDROID_FEMALE_31_40
4997    BRA_ANDROID_FEMALE_31_40
4998    BRA_ANDROID_FEMALE_31_40
4999    BRA_ANDROID_FEMALE_31_40
Name: customers_level_based, Length: 5000, dtype: object

In [113]:
df.groupby("customers_level_based")["PRICE"].mean()


customers_level_based
BRA_ANDROID_FEMALE_0_18     35.439394
BRA_ANDROID_FEMALE_19_23    34.114943
BRA_ANDROID_FEMALE_24_30    34.540541
BRA_ANDROID_FEMALE_31_40    34.696203
BRA_ANDROID_FEMALE_41_70    35.086957
                              ...    
USA_IOS_MALE_0_18           34.054348
USA_IOS_MALE_19_23          35.304348
USA_IOS_MALE_24_30          36.096774
USA_IOS_MALE_31_40          32.333333
USA_IOS_MALE_41_70          35.842105
Name: PRICE, Length: 109, dtype: float64

In [117]:
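# Assignment aligns on the row index, so agg_df (348 rows) receives the labels of df's first 348 rows
agg_df["customers_level_based"] = df["customers_level_based"]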
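
Because pandas aligns column assignment on the index, copying the label column from `df` gives each aggregated row the label of whichever raw row shares its index number. One alternative, sketched here on a hypothetical three-row aggregated table (`toy_agg`, made-up prices), is to derive the label from the aggregated table's own columns and then average once more per label. This is a sketch under those assumptions, not the notebook's method.

```python
import pandas as pd

# Hypothetical aggregated table: one row per COUNTRY/SOURCE/SEX/AGE combination
toy_agg = pd.DataFrame({
    "COUNTRY": ["tur", "tur", "usa"],
    "SOURCE":  ["android", "android", "ios"],
    "SEX":     ["male", "male", "female"],
    "AGE":     [17, 18, 35],
    "PRICE":   [30.0, 40.0, 50.0],
})

# Same age bins as in Task 5, applied to the aggregated rows
toy_agg["AGE_CAT"] = pd.cut(
    toy_agg["AGE"],
    bins=[0, 18, 23, 30, 40, 70],
    labels=["0_18", "19_23", "24_30", "31_40", "41_70"],
)

# Join the key columns of each row into one upper-cased label
cols = ["COUNTRY", "SOURCE", "SEX", "AGE_CAT"]
toy_agg["customers_level_based"] = (
    toy_agg[cols].astype(str).apply("_".join, axis=1).str.upper()
)

# Several ages can share one persona, so average again per label
persona_df = (
    toy_agg.groupby("customers_level_based", as_index=False)["PRICE"].mean()
)
print(persona_df)
```

Here ages 17 and 18 collapse into a single `TUR_ANDROID_MALE_0_18` row, so each persona appears exactly once in `persona_df`.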

In [114]:
# Quartile-based segments: D is the lowest-priced quarter, A the highest
agg_df["SEGMENT"] = pd.qcut(agg_df["PRICE"], 4, labels=["D", "C", "B", "A"])

In [120]:
agg_df.groupby("SEGMENT").agg({"PRICE":["sum","max","min","mean"]})

| SEGMENT | PRICE sum | PRICE max | PRICE min | PRICE mean |
|---|---|---|---|---|
| D | 2375.32585 | 31.105263 | 9.0 | 27.302596 |
| C | 3128.667165 | 34.0 | 31.173913 | 32.933339 |
| B | 2870.329792 | 37.0 | 34.185185 | 35.43617 |
| A | 3521.952577 | 59.0 | 37.095238 | 41.434736 |
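
The segment labels come from `pd.qcut`, which splits the values into equally populated quantile bins and assigns the labels from the lowest bin to the highest. A minimal sketch on eight hypothetical prices shows how the order `["D", "C", "B", "A"]` maps onto the quartiles.

```python
import pandas as pd

# Hypothetical prices, already sorted to make the bins easy to read
toy = pd.DataFrame({"PRICE": [10.0, 20.0, 30.0, 40.0, 50.0, 60.0, 70.0, 80.0]})

# Four quantile bins; the first label goes to the lowest quarter
toy["SEGMENT"] = pd.qcut(toy["PRICE"], 4, labels=["D", "C", "B", "A"])

# Summarise each segment, keeping the D -> A category order
summary = toy.groupby("SEGMENT", observed=False)["PRICE"].agg(["min", "max", "mean"])
print(summary)
```

With eight values each segment receives two rows, and the minimum of each segment lies above the maximum of the one before it, the same pattern as in the segment table above.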


In [139]:
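# "TR" and "ADNROID" match no label (labels use "TUR" and "ANDROID"), so the filter is empty and the mean is NaN
new_user = "TR_ADNROID_FEMALE_31_40"
agg_df[agg_df["customers_level_based"] == new_user]["PRICE"].mean()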

nan

In [135]:
new_user1 = "FRA_IOS_FEMALE_31_40"
agg_df[agg_df["customers_level_based"] == new_user1]["PRICE"].mean()

34.77471066589176
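
A lookup on a label that matches nothing returns `nan` without any warning, which makes a typo easy to miss. The sketch below wraps the lookup in a small helper that normalises the input and raises an error when no persona matches. The function name `lookup_persona` and the table `persona_df`, including its prices and segments, are hypothetical.

```python
import pandas as pd

# Hypothetical persona table; the prices and segments are made up
persona_df = pd.DataFrame({
    "customers_level_based": ["TUR_ANDROID_FEMALE_31_40", "FRA_IOS_FEMALE_31_40"],
    "PRICE": [41.0, 33.0],
    "SEGMENT": ["A", "C"],
})


def lookup_persona(table, persona):
    """Return the rows matching a persona label, or raise KeyError if none match."""
    key = persona.strip().upper()
    match = table[table["customers_level_based"] == key]
    if match.empty:
        raise KeyError(f"unknown persona: {key!r}")
    return match


# Lower-case input is normalised before the comparison
print(lookup_persona(persona_df, "fra_ios_female_31_40"))

# A misspelled label fails loudly instead of yielding NaN
try:
    lookup_persona(persona_df, "TR_ADNROID_FEMALE_31_40")
except KeyError as err:
    print(err)
```

Raising on an empty match is one design choice among several; returning `None` or a default segment would also work, as long as the caller can tell a missing persona from a real value.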