## MARKET BASKET ANALYSIS with Apriori



In [20]:
# ![](https://miro.medium.com/max/2880/1*DHfQvlMVBaJCHpYmj1kmCw.png)

In [1]:
import numpy as np 
import pandas as pd
import os
        
from mlxtend.preprocessing import TransactionEncoder
from mlxtend.frequent_patterns import apriori
from mlxtend.frequent_patterns import association_rules

import mlxtend as ml
print('MLxtend Version: %s' % ml.__version__)
print('Pandas Version: %s' % pd.__version__)
print('Numpy Version: %s' % np.__version__)

/kaggle/input/datasets-for-appiori/basket_analysis.csv
MLxtend Version: 0.18.0
Pandas Version: 1.2.3
Numpy Version: 1.19.5


In [2]:
df = pd.read_csv('../input/datasets-for-appiori/basket_analysis.csv')
df.head()

Unnamed: 0.1,Unnamed: 0,Apple,Bread,Butter,Cheese,Corn,Dill,Eggs,Ice cream,Kidney Beans,Milk,Nutmeg,Onion,Sugar,Unicorn,Yogurt,chocolate
0,0,False,True,False,False,True,True,False,True,False,False,False,False,True,False,True,True
1,1,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False
2,2,True,False,True,False,False,True,False,True,False,True,False,False,False,False,True,True
3,3,False,False,True,True,False,True,False,False,False,True,True,True,False,False,False,False
4,4,True,True,False,False,False,False,False,False,False,False,False,False,False,False,False,False


In [3]:
df.drop('Unnamed: 0',axis=1,inplace=True)

In [4]:
df.head()

Unnamed: 0,Apple,Bread,Butter,Cheese,Corn,Dill,Eggs,Ice cream,Kidney Beans,Milk,Nutmeg,Onion,Sugar,Unicorn,Yogurt,chocolate
0,False,True,False,False,True,True,False,True,False,False,False,False,True,False,True,True
1,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False
2,True,False,True,False,False,True,False,True,False,True,False,False,False,False,True,True
3,False,False,True,True,False,True,False,False,False,True,True,True,False,False,False,False
4,True,True,False,False,False,False,False,False,False,False,False,False,False,False,False,False


**Requirements for Apriori Analysis**
* The dataset must be tabular or transactional.
* The data must be categorical.
* The role (direction) of each variable in the data must be defined as input, output, or both.

* **Note:** After loading the data, a dataset stored as a nested list must be converted into a tabular structure. For this, you can use the TransactionEncoder class in the mlxtend.preprocessing module. This dataset is already tabular, so the step is not needed here.

<code>from mlxtend.preprocessing import TransactionEncoder
te = TransactionEncoder()
te_ary = te.fit(transactions).transform(transactions)  # transactions: nested list, one inner list of item names per basket
df = pd.DataFrame(te_ary, columns=te.columns_)</code>

*For more information about TransactionEncoder: http://rasbt.github.io/mlxtend/user_guide/preprocessing/TransactionEncoder/*

In [5]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 999 entries, 0 to 998
Data columns (total 16 columns):
 #   Column        Non-Null Count  Dtype
---  ------        --------------  -----
 0   Apple         999 non-null    bool 
 1   Bread         999 non-null    bool 
 2   Butter        999 non-null    bool 
 3   Cheese        999 non-null    bool 
 4   Corn          999 non-null    bool 
 5   Dill          999 non-null    bool 
 6   Eggs          999 non-null    bool 
 7   Ice cream     999 non-null    bool 
 8   Kidney Beans  999 non-null    bool 
 9   Milk          999 non-null    bool 
 10  Nutmeg        999 non-null    bool 
 11  Onion         999 non-null    bool 
 12  Sugar         999 non-null    bool 
 13  Unicorn       999 non-null    bool 
 14  Yogurt        999 non-null    bool 
 15  chocolate     999 non-null    bool 
dtypes: bool(16)
memory usage: 15.7 KB


**Model Creation**<br>

<code>from mlxtend.frequent_patterns import apriori</code>

*For more information about Apriori: http://rasbt.github.io/mlxtend/user_guide/frequent_patterns/apriori/*

In [6]:
apriori(df, min_support=0.15)[1:25]

Unnamed: 0,support,itemsets
1,0.384384,(1)
2,0.42042,(2)
3,0.404404,(3)
4,0.407407,(4)
5,0.398398,(5)
6,0.384384,(6)
7,0.41041,(7)
8,0.408408,(8)
9,0.405405,(9)
10,0.401401,(10)


The numbers in the itemsets column are the positional column indices of the products (0-15), in the order listed by df.info() above. Product 0 is Apple, product 1 is Bread, and product 14 is Yogurt.
<br><br>
 0 Apple<br>
 1 Bread<br>
 2 Butter<br>
...
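
The index-to-name mapping can be sketched in plain Python. The helper `itemset_names` is hypothetical (it is not an mlxtend function); the column order is copied from the df.info() output above.

```python
# Column order as reported by df.info() above.
columns = [
    "Apple", "Bread", "Butter", "Cheese", "Corn", "Dill", "Eggs", "Ice cream",
    "Kidney Beans", "Milk", "Nutmeg", "Onion", "Sugar", "Unicorn", "Yogurt",
    "chocolate",
]

def itemset_names(itemset):
    """Translate positional column indices into product names (hypothetical helper)."""
    return frozenset(columns[i] for i in itemset)

print(itemset_names({1}))      # the single-item set shown as (1) in the table
print(itemset_names({0, 14}))  # a two-item set made of columns 0 and 14
```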

In [18]:
print("Number of Frequent Itemsets:", len(apriori(df, min_support=0.15)))

Number of Frequent Itemsets: 136


Now, by passing use_colnames=True to the apriori function, we switch from item (product) numbers to item (product) names.

In [8]:
apriori(df, min_support=0.15, use_colnames=True)[1:25]

Unnamed: 0,support,itemsets
1,0.384384,(Bread)
2,0.42042,(Butter)
3,0.404404,(Cheese)
4,0.407407,(Corn)
5,0.398398,(Dill)
6,0.384384,(Eggs)
7,0.41041,(Ice cream)
8,0.408408,(Kidney Beans)
9,0.405405,(Milk)
10,0.401401,(Nutmeg)


The full result contains multi-item itemsets in addition to the single-item sets shown in the table above. After setting the min_support value (0.15) and generating the frequent itemsets, we build the association rules table according to the metric we are interested in (confidence, lift, conviction, etc.). Here, we chose confidence as the metric, with a minimum threshold of 0.3 (30%).
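
To make the two thresholds concrete, here is a minimal pure-Python sketch of the idea. It is a brute-force enumeration limited to single items and pairs, without Apriori's candidate pruning, and it is not how mlxtend implements it. The `baskets` data and both threshold values are hypothetical.

```python
from itertools import combinations

# Hypothetical toy baskets, for illustration only.
baskets = [
    {"Bread", "Milk"},
    {"Bread", "Butter", "Milk"},
    {"Butter", "Milk"},
    {"Bread", "Butter"},
]
min_support = 0.5
min_confidence = 0.6

def support(itemset):
    """Fraction of baskets that contain every item in `itemset`."""
    return sum(itemset <= basket for basket in baskets) / len(baskets)

items = sorted(set().union(*baskets))

# Step 1: keep the single items and pairs that reach min_support.
frequent = {}
for size in (1, 2):
    for combo in combinations(items, size):
        s = support(set(combo))
        if s >= min_support:
            frequent[frozenset(combo)] = s

# Step 2: turn each frequent pair into rules A -> B and keep those whose
# confidence = support(A and B) / support(A) reaches min_confidence.
rules = []
for itemset, s in frequent.items():
    if len(itemset) == 2:
        for a in itemset:
            (b,) = itemset - {a}
            confidence = s / frequent[frozenset([a])]
            if confidence >= min_confidence:
                rules.append((a, b, round(confidence, 3)))

print(frequent)
print(rules)
```

Each frequent pair yields up to two rules, one per direction, which is why a rules table is typically larger than the list of frequent pairs.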

In [9]:
frequent_itemsets = apriori(df, min_support=0.15, use_colnames=True)
rules1 = association_rules(frequent_itemsets, metric="confidence", min_threshold=0.30)

**Number of Rules**

<code>from mlxtend.frequent_patterns import association_rules</code>

*For more information about association_rules: http://rasbt.github.io/mlxtend/user_guide/frequent_patterns/association_rules/*

In [19]:
print("Number of Rules:", len(rules1))

Number of Rules: 240


**Rules sorted by confidence in descending order (rows 2 to 11):**

In [11]:
rules1 = rules1.sort_values(['confidence'], ascending=False)

rules1[1:11]

Unnamed: 0,antecedents,consequents,antecedent support,consequent support,support,confidence,lift,leverage,conviction
66,(Ice cream),(Butter),0.41041,0.42042,0.207207,0.504878,1.200889,0.034662,1.170579
55,(Bread),(Yogurt),0.384384,0.42042,0.193193,0.502604,1.19548,0.03159,1.165228
209,(chocolate),(Milk),0.421421,0.405405,0.211211,0.501188,1.236263,0.040365,1.192021
148,(Dill),(chocolate),0.398398,0.421421,0.199199,0.5,1.186461,0.031306,1.157157
68,(Kidney Beans),(Butter),0.408408,0.42042,0.202202,0.495098,1.177626,0.030499,1.147905
92,(Cheese),(Kidney Beans),0.404404,0.408408,0.2002,0.49505,1.212143,0.035038,1.171583
72,(Nutmeg),(Butter),0.401401,0.42042,0.198198,0.493766,1.174457,0.029441,1.144884
67,(Butter),(Ice cream),0.42042,0.41041,0.207207,0.492857,1.200889,0.034662,1.162571
183,(Ice cream),(chocolate),0.41041,0.421421,0.202202,0.492683,1.169098,0.029246,1.140467
184,(Milk),(Kidney Beans),0.405405,0.408408,0.199199,0.491358,1.203105,0.033628,1.163081

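A note on the slice used in the cell above: a positional slice such as `[1:11]` starts at the second row of the sorted table, whereas `.head(10)` includes the first. The sketch below shows the difference on a hypothetical four-row frame; the `toy` data is invented for illustration.

```python
import pandas as pd

# Hypothetical confidence values, for illustration only.
toy = pd.DataFrame({"rule": ["r1", "r2", "r3", "r4"],
                    "confidence": [0.40, 0.55, 0.50, 0.45]})
ranked = toy.sort_values("confidence", ascending=False)

top_two = ranked.head(2)  # includes the highest-confidence row
sliced = ranked[1:3]      # positional slice: skips the first row

print(top_two["rule"].tolist())
print(sliced["rule"].tolist())
```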

**Comment 1:** If we examine the row with ID 66 (Ice cream -> Butter):
* Ice cream and Butter appear together in about 21% of baskets (support = 0.207),
* About 50% (0.504878) of the baskets containing Ice cream also contain Butter (confidence),
* Butter is 1.20 times as likely to appear in baskets containing Ice cream as in baskets overall (lift),
* Ice cream and Butter are purchased together about 0.03 (0.034662) more often, as a share of all baskets, than would be expected if the two items were independent (leverage),
* The conviction of 1.17 is above 1, which indicates that Butter depends on Ice cream to some degree.
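
The metrics in this comment can be re-derived from the three support values in the same row. The sketch below uses the standard definitions of confidence, lift, leverage, and conviction; the variable names are chosen for illustration.

```python
# Support values copied from the Ice cream -> Butter row of the table above.
antecedent_support = 0.41041   # share of baskets containing Ice cream
consequent_support = 0.42042   # share of baskets containing Butter
support = 0.207207             # share of baskets containing both

confidence = support / antecedent_support
lift = confidence / consequent_support
leverage = support - antecedent_support * consequent_support
conviction = (1 - consequent_support) / (1 - confidence)

print(f"confidence={confidence:.6f} lift={lift:.6f} "
      f"leverage={leverage:.6f} conviction={conviction:.6f}")
```

Because the inputs are rounded to six decimals, the results agree with the table only up to small rounding differences.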

Now let's add the number of items in the antecedents and consequents as new columns and display five rows:

In [12]:
rules1["antecedent_len"] = rules1["antecedents"].apply(lambda x: len(x))
rules1["consequents_len"] = rules1["consequents"].apply(lambda x: len(x))
rules1[1:6]

Unnamed: 0,antecedents,consequents,antecedent support,consequent support,support,confidence,lift,leverage,conviction,antecedent_len,consequents_len
66,(Ice cream),(Butter),0.41041,0.42042,0.207207,0.504878,1.200889,0.034662,1.170579,1,1
55,(Bread),(Yogurt),0.384384,0.42042,0.193193,0.502604,1.19548,0.03159,1.165228,1,1
209,(chocolate),(Milk),0.421421,0.405405,0.211211,0.501188,1.236263,0.040365,1.192021,1,1
148,(Dill),(chocolate),0.398398,0.421421,0.199199,0.5,1.186461,0.031306,1.157157,1,1
68,(Kidney Beans),(Butter),0.408408,0.42042,0.202202,0.495098,1.177626,0.030499,1.147905,1,1


The same steps apply to other metrics. For example, with the lift metric:

In [13]:
rules2 = association_rules(frequent_itemsets, metric="lift", min_threshold=1)
rules2 = rules2.sort_values(['lift'], ascending=False)
rules2[1:6]

Unnamed: 0,antecedents,consequents,antecedent support,consequent support,support,confidence,lift,leverage,conviction
207,(chocolate),(Milk),0.421421,0.405405,0.211211,0.501188,1.236263,0.040365,1.192021
93,(Kidney Beans),(Cheese),0.408408,0.404404,0.2002,0.490196,1.212143,0.035038,1.168284
92,(Cheese),(Kidney Beans),0.404404,0.408408,0.2002,0.49505,1.212143,0.035038,1.171583
208,(Onion),(Nutmeg),0.403403,0.401401,0.195195,0.483871,1.205454,0.033269,1.159785
209,(Nutmeg),(Onion),0.401401,0.403403,0.195195,0.486284,1.205454,0.033269,1.161336


In [14]:
rules2["antecedent_len"] = rules2["antecedents"].apply(lambda x: len(x))
rules2["consequents_len"] = rules2["consequents"].apply(lambda x: len(x))
rules2[1:6]

Unnamed: 0,antecedents,consequents,antecedent support,consequent support,support,confidence,lift,leverage,conviction,antecedent_len,consequents_len
207,(chocolate),(Milk),0.421421,0.405405,0.211211,0.501188,1.236263,0.040365,1.192021,1,1
93,(Kidney Beans),(Cheese),0.408408,0.404404,0.2002,0.490196,1.212143,0.035038,1.168284,1,1
92,(Cheese),(Kidney Beans),0.404404,0.408408,0.2002,0.49505,1.212143,0.035038,1.171583,1,1
208,(Onion),(Nutmeg),0.403403,0.401401,0.195195,0.483871,1.205454,0.033269,1.159785,1,1
209,(Nutmeg),(Onion),0.401401,0.403403,0.195195,0.486284,1.205454,0.033269,1.161336,1,1


**Filtering for Generated Rule Sets**

Filter 1: Let's see the rules with an antecedent length of at least 1, a confidence of at least 0.20, and a lift greater than 1, sorted by confidence in descending order. The slice `[1:10]` returns rows 2 to 10 of the filtered table.

In [15]:
rules1[(rules1['antecedent_len'] >= 1) &
       (rules1['confidence'] >= 0.20) &
       (rules1['lift'] > 1) ].sort_values(['confidence'], ascending=False)[1:10]

Unnamed: 0,antecedents,consequents,antecedent support,consequent support,support,confidence,lift,leverage,conviction,antecedent_len,consequents_len
66,(Ice cream),(Butter),0.41041,0.42042,0.207207,0.504878,1.200889,0.034662,1.170579,1,1
55,(Bread),(Yogurt),0.384384,0.42042,0.193193,0.502604,1.19548,0.03159,1.165228,1,1
209,(chocolate),(Milk),0.421421,0.405405,0.211211,0.501188,1.236263,0.040365,1.192021,1,1
148,(Dill),(chocolate),0.398398,0.421421,0.199199,0.5,1.186461,0.031306,1.157157,1,1
68,(Kidney Beans),(Butter),0.408408,0.42042,0.202202,0.495098,1.177626,0.030499,1.147905,1,1
92,(Cheese),(Kidney Beans),0.404404,0.408408,0.2002,0.49505,1.212143,0.035038,1.171583,1,1
72,(Nutmeg),(Butter),0.401401,0.42042,0.198198,0.493766,1.174457,0.029441,1.144884,1,1
67,(Butter),(Ice cream),0.42042,0.41041,0.207207,0.492857,1.200889,0.034662,1.162571,1,1
183,(Ice cream),(chocolate),0.41041,0.421421,0.202202,0.492683,1.169098,0.029246,1.140467,1,1


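Filter 2: Similarly, the rules whose antecedent is Bread, sorted by confidence in descending order. Because the slice `[1:10]` starts at the second row, the highest-confidence Bread rule (Bread -> Yogurt, 0.502604, listed in the earlier tables) does not appear in this output: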

In [16]:
rules1[rules1['antecedents'] == {'Bread'}].sort_values(['confidence'], ascending=False)[1:10]

Unnamed: 0,antecedents,consequents,antecedent support,consequent support,support,confidence,lift,leverage,conviction,antecedent_len,consequents_len
57,(Bread),(chocolate),0.384384,0.421421,0.185185,0.481771,1.143204,0.023197,1.116453,1,1
40,(Bread),(Ice cream),0.384384,0.41041,0.181181,0.471354,1.148495,0.023426,1.115283,1,1
30,(Bread),(Butter),0.384384,0.42042,0.18018,0.46875,1.114955,0.018577,1.090973,1,1
51,(Bread),(Sugar),0.384384,0.409409,0.179179,0.466146,1.138581,0.021809,1.106277,1,1
49,(Bread),(Onion),0.384384,0.403403,0.178178,0.463542,1.149077,0.023116,1.112102,1,1
34,(Bread),(Corn),0.384384,0.407407,0.174174,0.453125,1.112216,0.017573,1.083598,1,1
45,(Bread),(Milk),0.384384,0.405405,0.174174,0.453125,1.117708,0.018343,1.087259,1,1
33,(Bread),(Cheese),0.384384,0.404404,0.173173,0.450521,1.114035,0.017726,1.083928,1,1
46,(Bread),(Nutmeg),0.384384,0.401401,0.171171,0.445312,1.109394,0.016879,1.079164,1,1

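The comparison in the cell above relies on the antecedents being stored as frozensets, which compare equal to a plain set with the same items. As a sketch on a hypothetical `toy_rules` frame (invented data), the example below contrasts an exact match with a membership test, which would also catch multi-item antecedents.

```python
import pandas as pd

# Hypothetical rules table mimicking the frozenset columns of association_rules.
toy_rules = pd.DataFrame({
    "antecedents": [frozenset({"Bread"}), frozenset({"Bread", "Milk"}), frozenset({"Milk"})],
    "consequents": [frozenset({"Yogurt"}), frozenset({"Butter"}), frozenset({"Bread"})],
})

# Exact match: the antecedent is precisely {Bread}.
exact = toy_rules[toy_rules["antecedents"].apply(lambda s: s == {"Bread"})]

# Membership: Bread appears anywhere in the antecedent.
contains = toy_rules[toy_rules["antecedents"].apply(lambda s: "Bread" in s)]

print(len(exact), len(contains))
```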

We export the rule sets generated with the parameter values above as <code>.json</code> files.

In [17]:
rules1.to_json('./rules1.json')
rules2.to_json('./rules2.json')

http://rasbt.github.io/mlxtend/<br>
https://github.com/rasbt/mlxtend<br>
https://pandas.pydata.org/<br>
https://www.veribilimiokulu.com/python-ile-birliktelik-kurallari-analizi-association-rules-analysis-with-python/<br>