## MARKET BASKET ANALYSIS with Apriori

![](https://miro.medium.com/max/2880/1*DHfQvlMVBaJCHpYmj1kmCw.png)

In [1]:
import numpy as np 
import pandas as pd
import os
for dirname, _, filenames in os.walk('/kaggle/input'):
    for filename in filenames:
        print(os.path.join(dirname, filename))
        
from mlxtend.preprocessing import TransactionEncoder
from mlxtend.frequent_patterns import apriori
from mlxtend.frequent_patterns import association_rules

import mlxtend as ml
print('MLxtend Version: %s' % ml.__version__)
print('Pandas Version: %s' % pd.__version__)
print('Numpy Version: %s' % np.__version__)

/kaggle/input/datasets-for-appiori/basket_analysis.csv
MLxtend Version: 0.18.0
Pandas Version: 1.2.3
Numpy Version: 1.19.5


In [2]:
df = pd.read_csv('../input/datasets-for-appiori/basket_analysis.csv')
df.head()

Unnamed: 0.1,Unnamed: 0,Apple,Bread,Butter,Cheese,Corn,Dill,Eggs,Ice cream,Kidney Beans,Milk,Nutmeg,Onion,Sugar,Unicorn,Yogurt,chocolate
0,0,False,True,False,False,True,True,False,True,False,False,False,False,True,False,True,True
1,1,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False
2,2,True,False,True,False,False,True,False,True,False,True,False,False,False,False,True,True
3,3,False,False,True,True,False,True,False,False,False,True,True,True,False,False,False,False
4,4,True,True,False,False,False,False,False,False,False,False,False,False,False,False,False,False


In [3]:
df.drop('Unnamed: 0',axis=1,inplace=True)

In [4]:
df.head()

Unnamed: 0,Apple,Bread,Butter,Cheese,Corn,Dill,Eggs,Ice cream,Kidney Beans,Milk,Nutmeg,Onion,Sugar,Unicorn,Yogurt,chocolate
0,False,True,False,False,True,True,False,True,False,False,False,False,True,False,True,True
1,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False
2,True,False,True,False,False,True,False,True,False,True,False,False,False,False,True,True
3,False,False,True,True,False,True,False,False,False,True,True,True,False,False,False,False
4,True,True,False,False,False,False,False,False,False,False,False,False,False,False,False,False


**Requirements for Apriori Analysis**
* The dataset must have a tabular or transactional structure.
* The data must be categorical.
* The direction of each variable in the data must be defined as in, out, or both.

* **Note:** If the imported dataset is a nested list, it must be converted to a tabular structure. You can use the TransactionEncoder class from mlxtend's preprocessing module for this. We do not need this step for this dataset.

<code>from mlxtend.preprocessing import TransactionEncoder
te = TransactionEncoder()
te_ary = te.fit(df).transform(df)
df = pd.DataFrame(te_ary, columns=te.columns_)</code>

*For more information about TransactionEncoder: http://rasbt.github.io/mlxtend/user_guide/preprocessing/TransactionEncoder/*
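
To make the note above concrete, here is a minimal plain-Python sketch of what it means to turn a nested list of baskets into a boolean table. It is not mlxtend's implementation; the `transactions` list and the `one_hot_encode` helper are hypothetical names used only for illustration.

```python
# Hypothetical transactions: each inner list is one basket (not taken from the dataset).
transactions = [
    ["Milk", "Bread"],
    ["Bread", "Butter", "Eggs"],
    ["Milk"],
]

def one_hot_encode(baskets):
    """Return (columns, rows): sorted item names and one boolean row per basket."""
    columns = sorted({item for basket in baskets for item in basket})
    rows = [[item in set(basket) for item in columns] for basket in baskets]
    return columns, rows

columns, rows = one_hot_encode(transactions)
print(columns)
for row in rows:
    print(row)
```

Each row then has one True/False value per product, which is the same shape as the `df` used in this notebook.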

In [5]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 999 entries, 0 to 998
Data columns (total 16 columns):
 #   Column        Non-Null Count  Dtype
---  ------        --------------  -----
 0   Apple         999 non-null    bool 
 1   Bread         999 non-null    bool 
 2   Butter        999 non-null    bool 
 3   Cheese        999 non-null    bool 
 4   Corn          999 non-null    bool 
 5   Dill          999 non-null    bool 
 6   Eggs          999 non-null    bool 
 7   Ice cream     999 non-null    bool 
 8   Kidney Beans  999 non-null    bool 
 9   Milk          999 non-null    bool 
 10  Nutmeg        999 non-null    bool 
 11  Onion         999 non-null    bool 
 12  Sugar         999 non-null    bool 
 13  Unicorn       999 non-null    bool 
 14  Yogurt        999 non-null    bool 
 15  chocolate     999 non-null    bool 
dtypes: bool(16)
memory usage: 15.7 KB


**Building the Model**<br>

<code>from mlxtend.frequent_patterns import apriori</code>

*For more information about Apriori: http://rasbt.github.io/mlxtend/user_guide/frequent_patterns/apriori/*
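
As a rough illustration of what "frequent itemsets above a minimum support" means, the sketch below counts support by brute force over all candidate itemsets up to a given size. This is deliberately not the Apriori algorithm itself (it does no candidate pruning) and not mlxtend's code; the `baskets` data and the `frequent_itemsets` helper are hypothetical.

```python
from itertools import combinations

# Hypothetical baskets, each a set of product names (not taken from the dataset).
baskets = [
    {"Bread", "Milk"},
    {"Bread", "Butter"},
    {"Bread", "Milk", "Butter"},
    {"Milk"},
]

def frequent_itemsets(baskets, min_support, max_len=2):
    """Brute-force support counting: keep every itemset whose support >= min_support."""
    items = sorted(set().union(*baskets))
    n = len(baskets)
    result = {}
    for k in range(1, max_len + 1):
        for candidate in combinations(items, k):
            # Support = share of baskets that contain every item of the candidate.
            support = sum(set(candidate) <= basket for basket in baskets) / n
            if support >= min_support:
                result[frozenset(candidate)] = support
    return result

found = frequent_itemsets(baskets, min_support=0.5)
for itemset, support in found.items():
    print(sorted(itemset), support)
```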

In [6]:
apriori(df, min_support=0.15)[1:25]

Unnamed: 0,support,itemsets
1,0.384384,(1)
2,0.42042,(2)
3,0.404404,(3)
4,0.407407,(4)
5,0.398398,(5)
6,0.384384,(6)
7,0.41041,(7)
8,0.408408,(8)
9,0.405405,(9)
10,0.401401,(10)


The numbers in the itemsets column of the table are the column indices of the products (0-15). Item 0 is Apple, item 1 is Bread, and item 14 is Yogurt.
<br><br>
 0   Apple         999 non-null    bool<br> <br>
 1   Bread         999 non-null    bool<br>  
 2   Butter        999 non-null    bool<br>  
...
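
The index-to-name mapping can be sketched in plain Python. The column order below is copied from the `df.info()` output above; the `names_for` helper is a hypothetical name for illustration.

```python
# Column order as reported by df.info() above.
columns = [
    "Apple", "Bread", "Butter", "Cheese", "Corn", "Dill", "Eggs", "Ice cream",
    "Kidney Beans", "Milk", "Nutmeg", "Onion", "Sugar", "Unicorn", "Yogurt", "chocolate",
]

def names_for(itemset_indices):
    """Translate a collection of column indices into product names."""
    return tuple(columns[i] for i in sorted(itemset_indices))

print(names_for({1}))
print(names_for({0, 14}))
```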

In [7]:
print("Number of frequent itemsets:", len(apriori(df, min_support=0.15)))

Number of frequent itemsets: 136
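
As a quick sanity check on this count, note that 16 products give 16 single-item itemsets and 120 possible pairs. The total of 136 is consistent with every single item and every pair clearing the 0.15 threshold, although only inspecting the itemsets themselves would confirm that; the arithmetic is sketched below.

```python
from math import comb

n_items = 16                  # number of product columns reported by df.info()
n_singles = comb(n_items, 1)  # itemsets with one product
n_pairs = comb(n_items, 2)    # itemsets with two products
print(n_singles, n_pairs, n_singles + n_pairs)
```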


Now we pass use_colnames=True to apriori so that the itemsets show product names instead of column indices.

In [8]:
apriori(df, min_support=0.15, use_colnames=True)[1:25]

Unnamed: 0,support,itemsets
1,0.384384,(Bread)
2,0.42042,(Butter)
3,0.404404,(Cheese)
4,0.407407,(Corn)
5,0.398398,(Dill)
6,0.384384,(Eggs)
7,0.41041,(Ice cream)
8,0.408408,(Kidney Beans)
9,0.405405,(Milk)
10,0.401401,(Nutmeg)


The full result contains single-item and multi-item itemsets; only the first rows are shown above. After setting min_support to 0.15 and generating the frequent itemsets, we build the association rules table using the metric we are interested in (confidence, lift, conviction, etc.). Here we chose confidence as the metric, with a minimum threshold of 0.3 (30%).

In [9]:
frequent_itemsets = apriori(df, min_support=0.15, use_colnames=True)
rules1 = association_rules(frequent_itemsets, metric="confidence", min_threshold=0.30)

**Number of Rules**

<code>from mlxtend.frequent_patterns import association_rules</code>

*For more information about association_rules: http://rasbt.github.io/mlxtend/user_guide/frequent_patterns/association_rules/*
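
To show how a frequent pair turns into rules under a confidence threshold, here is a small plain-Python sketch. It is not mlxtend's implementation; the supports are copied from the tables in this notebook, and the `rules_from_pair` helper is a hypothetical name.

```python
# Supports copied from the frequent itemset and rule tables in this notebook.
support = {
    frozenset({"Ice cream"}): 0.41041,
    frozenset({"Butter"}): 0.42042,
    frozenset({"Ice cream", "Butter"}): 0.207207,
}

def rules_from_pair(pair, support, min_confidence):
    """Build both single-item rules from a pair and keep those above the threshold."""
    rules = []
    for item in sorted(pair):
        antecedent = frozenset({item})
        consequent = pair - antecedent
        # confidence(A -> B) = support(A and B) / support(A)
        confidence = support[pair] / support[antecedent]
        if confidence >= min_confidence:
            rules.append((antecedent, consequent, confidence))
    return rules

pair_rules = rules_from_pair(frozenset({"Ice cream", "Butter"}), support, 0.30)
for antecedent, consequent, confidence in pair_rules:
    print(sorted(antecedent), "->", sorted(consequent), round(confidence, 6))
```

Both directions pass the 0.30 threshold here, which is why each pair can contribute two rules.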

In [10]:
print("Number of rules generated:", len(rules1))

Number of rules generated: 240


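**10 rules by confidence (descending):**

In [11]:
rules1 = rules1.sort_values(['confidence'], ascending=False)

# Note: [1:11] starts at the second row, so the highest-confidence rule is not shown;
# rules1.head(10) would return the first ten rows instead.
rules1[1:11]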

Unnamed: 0,antecedents,consequents,antecedent support,consequent support,support,confidence,lift,leverage,conviction
67,(Ice cream),(Butter),0.41041,0.42042,0.207207,0.504878,1.200889,0.034662,1.170579
54,(Bread),(Yogurt),0.384384,0.42042,0.193193,0.502604,1.19548,0.03159,1.165228
208,(chocolate),(Milk),0.421421,0.405405,0.211211,0.501188,1.236263,0.040365,1.192021
149,(Dill),(chocolate),0.398398,0.421421,0.199199,0.5,1.186461,0.031306,1.157157
69,(Kidney Beans),(Butter),0.408408,0.42042,0.202202,0.495098,1.177626,0.030499,1.147905
93,(Cheese),(Kidney Beans),0.404404,0.408408,0.2002,0.49505,1.212143,0.035038,1.171583
73,(Nutmeg),(Butter),0.401401,0.42042,0.198198,0.493766,1.174457,0.029441,1.144884
66,(Butter),(Ice cream),0.42042,0.41041,0.207207,0.492857,1.200889,0.034662,1.162571
182,(Ice cream),(chocolate),0.41041,0.421421,0.202202,0.492683,1.169098,0.029246,1.140467
185,(Milk),(Kidney Beans),0.405405,0.408408,0.199199,0.491358,1.203105,0.033628,1.163081
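
The table above comes from the slice `[1:11]`, and Python slices are zero-based, so the row in first position is left out. A tiny sketch with a hypothetical, already sorted list shows the difference between `[1:11]` and `[:10]`:

```python
# Hypothetical ranked list, best rule first (names are placeholders).
ranked = ["rule_%d" % i for i in range(1, 13)]

skipping_first = ranked[1:11]  # positions 1..10: starts at the second element
first_ten = ranked[:10]        # positions 0..9: starts at the first element

print(skipping_first[0], len(skipping_first))
print(first_ten[0], len(first_ten))
```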


**Interpretation 1:** Looking at the row with ID 67:
* Ice cream and Butter appear together in about 21% of baskets (support = 0.207).
* About 50% of the customers who buy Ice cream also buy Butter (confidence = 0.504878).
* Baskets that contain Ice cream are 1.20 times as likely to contain Butter as baskets in general (lift = 1.20).
* Ice cream and Butter appear together in about 3 percentage points more baskets than would be expected if they were bought independently (leverage = 0.03).
* The conviction of 1.17 is above 1, which indicates that Butter is positively associated with Ice cream; a value of 1 would mean the two are independent.
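
The metrics for this row can be reproduced from its three support values using the standard definitions of confidence, lift, leverage, and conviction. The sketch below uses the numbers from the row with ID 67; the variable names are chosen for illustration.

```python
# Values copied from the row with ID 67: (Ice cream) -> (Butter).
antecedent_support = 0.41041  # share of baskets with Ice cream
consequent_support = 0.42042  # share of baskets with Butter
support = 0.207207            # share of baskets with both

confidence = support / antecedent_support
lift = confidence / consequent_support
leverage = support - antecedent_support * consequent_support
conviction = (1 - consequent_support) / (1 - confidence)

print(round(confidence, 6), round(lift, 6), round(leverage, 6), round(conviction, 6))
```

Up to rounding of the inputs, these agree with the confidence, lift, leverage, and conviction columns of the table.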

Now let's count the number of items in the antecedents and consequents of each rule and look at five rows:

In [12]:
rules1["antecedent_len"] = rules1["antecedents"].apply(len)
rules1["consequents_len"] = rules1["consequents"].apply(len)
# [1:6] shows rows 2 to 6; use rules1.head() for the first five rows.
rules1[1:6]

Unnamed: 0,antecedents,consequents,antecedent support,consequent support,support,confidence,lift,leverage,conviction,antecedent_len,consequents_len
67,(Ice cream),(Butter),0.41041,0.42042,0.207207,0.504878,1.200889,0.034662,1.170579,1,1
54,(Bread),(Yogurt),0.384384,0.42042,0.193193,0.502604,1.19548,0.03159,1.165228,1,1
208,(chocolate),(Milk),0.421421,0.405405,0.211211,0.501188,1.236263,0.040365,1.192021,1,1
149,(Dill),(chocolate),0.398398,0.421421,0.199199,0.5,1.186461,0.031306,1.157157,1,1
69,(Kidney Beans),(Butter),0.408408,0.42042,0.202202,0.495098,1.177626,0.030499,1.147905,1,1


What we did above for the confidence metric can also be done for the other metrics. As an example, for the lift metric:

In [13]:
rules2 = association_rules(frequent_itemsets, metric="lift", min_threshold=1)
rules2 = rules2.sort_values(['lift'], ascending=False)
rules2[1:6]

Unnamed: 0,antecedents,consequents,antecedent support,consequent support,support,confidence,lift,leverage,conviction
206,(chocolate),(Milk),0.421421,0.405405,0.211211,0.501188,1.236263,0.040365,1.192021
93,(Cheese),(Kidney Beans),0.404404,0.408408,0.2002,0.49505,1.212143,0.035038,1.171583
92,(Kidney Beans),(Cheese),0.408408,0.404404,0.2002,0.490196,1.212143,0.035038,1.168284
208,(Onion),(Nutmeg),0.403403,0.401401,0.195195,0.483871,1.205454,0.033269,1.159785
209,(Nutmeg),(Onion),0.401401,0.403403,0.195195,0.486284,1.205454,0.033269,1.161336


In [14]:
rules2["antecedent_len"] = rules2["antecedents"].apply(len)
rules2["consequents_len"] = rules2["consequents"].apply(len)
rules2[1:6]

Unnamed: 0,antecedents,consequents,antecedent support,consequent support,support,confidence,lift,leverage,conviction,antecedent_len,consequents_len
206,(chocolate),(Milk),0.421421,0.405405,0.211211,0.501188,1.236263,0.040365,1.192021,1,1
93,(Cheese),(Kidney Beans),0.404404,0.408408,0.2002,0.49505,1.212143,0.035038,1.171583,1,1
92,(Kidney Beans),(Cheese),0.408408,0.404404,0.2002,0.490196,1.212143,0.035038,1.168284,1,1
208,(Onion),(Nutmeg),0.403403,0.401401,0.195195,0.483871,1.205454,0.033269,1.159785,1,1
209,(Nutmeg),(Onion),0.401401,0.403403,0.195195,0.486284,1.205454,0.033269,1.161336,1,1
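
The table shows each pair twice with the same lift but different confidence. That follows from the definitions: lift is symmetric in the two items, while confidence depends on which item is the antecedent. A short check with the Cheese and Kidney Beans values from the table (variable names chosen for illustration):

```python
# Values copied from the rows with ID 93 and 92.
s_cheese = 0.404404  # support of Cheese
s_beans = 0.408408   # support of Kidney Beans
s_both = 0.2002      # support of the pair

conf_cheese_to_beans = s_both / s_cheese
conf_beans_to_cheese = s_both / s_beans
lift_value = s_both / (s_cheese * s_beans)  # same for both directions

print(round(conf_cheese_to_beans, 4), round(conf_beans_to_cheese, 4), round(lift_value, 4))
```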


**Filtering the Generated Rule Sets**

Filter 1: rules with an antecedent length of at least 1, a confidence of at least 0.20, and a lift greater than 1, sorted by confidence in descending order. The slice <code>[1:10]</code> skips the first row, so nine rows are shown.

In [15]:
rules1[(rules1['antecedent_len'] >= 1) &
       (rules1['confidence'] >= 0.20) &
       (rules1['lift'] > 1) ].sort_values(['confidence'], ascending=False)[1:10]

Unnamed: 0,antecedents,consequents,antecedent support,consequent support,support,confidence,lift,leverage,conviction,antecedent_len,consequents_len
67,(Ice cream),(Butter),0.41041,0.42042,0.207207,0.504878,1.200889,0.034662,1.170579,1,1
54,(Bread),(Yogurt),0.384384,0.42042,0.193193,0.502604,1.19548,0.03159,1.165228,1,1
208,(chocolate),(Milk),0.421421,0.405405,0.211211,0.501188,1.236263,0.040365,1.192021,1,1
149,(Dill),(chocolate),0.398398,0.421421,0.199199,0.5,1.186461,0.031306,1.157157,1,1
69,(Kidney Beans),(Butter),0.408408,0.42042,0.202202,0.495098,1.177626,0.030499,1.147905,1,1
93,(Cheese),(Kidney Beans),0.404404,0.408408,0.2002,0.49505,1.212143,0.035038,1.171583,1,1
73,(Nutmeg),(Butter),0.401401,0.42042,0.198198,0.493766,1.174457,0.029441,1.144884,1,1
66,(Butter),(Ice cream),0.42042,0.41041,0.207207,0.492857,1.200889,0.034662,1.162571,1,1
182,(Ice cream),(chocolate),0.41041,0.421421,0.202202,0.492683,1.169098,0.029246,1.140467,1,1


Filter 2: Similarly, rules whose antecedent is Bread, sorted by confidence in descending order (the slice <code>[1:10]</code> again skips the first row):

In [16]:
rules1[rules1['antecedents'] == {'Bread'}].sort_values(['confidence'], ascending=False)[1:10]

Unnamed: 0,antecedents,consequents,antecedent support,consequent support,support,confidence,lift,leverage,conviction,antecedent_len,consequents_len
56,(Bread),(chocolate),0.384384,0.421421,0.185185,0.481771,1.143204,0.023197,1.116453,1,1
40,(Bread),(Ice cream),0.384384,0.41041,0.181181,0.471354,1.148495,0.023426,1.115283,1,1
31,(Bread),(Butter),0.384384,0.42042,0.18018,0.46875,1.114955,0.018577,1.090973,1,1
50,(Bread),(Sugar),0.384384,0.409409,0.179179,0.466146,1.138581,0.021809,1.106277,1,1
48,(Bread),(Onion),0.384384,0.403403,0.178178,0.463542,1.149077,0.023116,1.112102,1,1
44,(Bread),(Milk),0.384384,0.405405,0.174174,0.453125,1.117708,0.018343,1.087259,1,1
34,(Bread),(Corn),0.384384,0.407407,0.174174,0.453125,1.112216,0.017573,1.083598,1,1
32,(Bread),(Cheese),0.384384,0.404404,0.173173,0.450521,1.114035,0.017726,1.083928,1,1
46,(Bread),(Nutmeg),0.384384,0.401401,0.171171,0.445312,1.109394,0.016879,1.079164,1,1
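
The filter above compares the antecedents with `{'Bread'}`. In plain Python a frozenset compares equal to a set with the same elements, which is the idea behind this kind of filter. The sketch below uses a few rows copied from the tables in a hypothetical list-of-dicts layout rather than a DataFrame.

```python
# A few rules copied from the tables above, stored as plain dicts for illustration.
rules = [
    {"antecedents": frozenset({"Bread"}), "consequents": frozenset({"Yogurt"}), "confidence": 0.502604},
    {"antecedents": frozenset({"Bread"}), "consequents": frozenset({"chocolate"}), "confidence": 0.481771},
    {"antecedents": frozenset({"Ice cream"}), "consequents": frozenset({"Butter"}), "confidence": 0.504878},
]

# frozenset({"Bread"}) == {"Bread"} evaluates to True, so a set literal works as the filter value.
bread_rules = sorted(
    (rule for rule in rules if rule["antecedents"] == {"Bread"}),
    key=lambda rule: rule["confidence"],
    reverse=True,
)

for rule in bread_rules:
    print(sorted(rule["consequents"]), rule["confidence"])
```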


We export the association rules generated with the given parameter values as <code>.json</code> files.

In [17]:
rules1.to_json('./rules1.json')
rules2.to_json('./rules2.json')
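
If the rules are ever handled outside pandas, note that the standard-library `json` module cannot serialize a frozenset directly. A minimal sketch of one way around this, assuming a hypothetical list-of-dicts layout and a hypothetical `to_serializable` helper, is to convert each frozenset to a sorted list first:

```python
import json

# One rule copied from the tables above, stored as a plain dict for illustration.
rules = [
    {"antecedents": frozenset({"Ice cream"}), "consequents": frozenset({"Butter"}), "confidence": 0.504878},
]

def to_serializable(rule):
    """Replace frozenset values with sorted lists so json.dumps accepts them."""
    return {key: sorted(value) if isinstance(value, frozenset) else value
            for key, value in rule.items()}

payload = json.dumps([to_serializable(rule) for rule in rules])
restored = json.loads(payload)
print(restored)
```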

http://rasbt.github.io/mlxtend/<br>
https://github.com/rasbt/mlxtend<br>
https://pandas.pydata.org/<br>
https://www.veribilimiokulu.com/python-ile-birliktelik-kurallari-analizi-association-rules-analysis-with-python/<br>