# Association Rules

Association rules describe relationships and dependencies among items in large data sets.
A typical example of association rule discovery is "shopping cart analysis."<br> The items that customers put in their shopping carts are analyzed to find which products tend to be bought together, which reveals recurring patterns in buying behavior.
<br>

Three important parameters:
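* Support is the fraction of transactions that contain a given set of items; it measures how popular the itemset is.
* Confidence of a rule x -> y is the estimated probability of buying item y given that item x is bought: support(x, y) / support(x).
* Lift of a rule x -> y is its confidence divided by the support of y. A lift above 1 means x and y are bought together more often than would be expected if they were independent.<br>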
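To make the three measures concrete, here is a minimal sketch that computes them by hand. The transactions and the helper names `support`, `confidence`, and `lift` are hypothetical and chosen for illustration only; they are not part of the project's code or of mlxtend.

```python
# Toy transactions (hypothetical data, not taken from the hypermarket dataset).
transactions = [
    {"milk", "bread"},
    {"milk", "eggs"},
    {"bread", "eggs"},
    {"milk", "bread", "eggs"},
    {"milk"},
]

def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(x, y, transactions):
    """Estimated probability of y given x: support(x and y) / support(x)."""
    return support(set(x) | set(y), transactions) / support(x, transactions)

def lift(x, y, transactions):
    """Confidence of x -> y divided by the support of y."""
    return confidence(x, y, transactions) / support(y, transactions)

print("support(milk, bread):", support({"milk", "bread"}, transactions))
print("confidence(milk -> bread):", confidence({"milk"}, {"bread"}, transactions))
print("lift(milk -> bread):", lift({"milk"}, {"bread"}, transactions))
```

In this toy data the lift of milk -> bread is below 1, so buying milk makes bread slightly less likely than its overall frequency would suggest.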

To implement association rules in this project, we use the Apriori algorithm, one of the most widely used algorithms for mining frequent itemsets.


## Apriori Algorithm 
The algorithm takes a minimum support value and builds frequent itemsets iteratively, growing them by one item in each pass.<br> An itemset whose support is lower than the threshold is removed, together with all of its supersets, because a superset can never be more frequent than any of its subsets. This process continues until no new frequent itemsets can be generated.
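The level-wise idea can be sketched in plain Python. This is a simplified illustration under the assumption of a small in-memory list of transactions, not the mlxtend implementation used later; the function name `apriori_sketch` and the toy data are hypothetical.

```python
from itertools import combinations

def apriori_sketch(transactions, min_support):
    """Return a dict mapping each frequent itemset (frozenset) to its support."""
    n = len(transactions)

    def support(itemset):
        return sum(itemset <= t for t in transactions) / n

    # Pass 1: frequent single items.
    items = {item for t in transactions for item in t}
    current = {frozenset([i]) for i in items}
    current = {s for s in current if support(s) >= min_support}

    frequent = {}
    k = 1
    while current:
        for s in current:
            frequent[s] = support(s)
        k += 1
        # Join step: merge frequent (k-1)-itemsets into candidate k-itemsets.
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(sub) in current for sub in combinations(c, k - 1))
        }
        # Keep only the candidates that reach the support threshold.
        current = {c for c in candidates if support(c) >= min_support}
    return frequent

# Toy transactions (hypothetical data).
toy = [
    {"milk", "bread"},
    {"milk", "eggs"},
    {"bread", "eggs"},
    {"milk", "bread", "eggs"},
    {"milk"},
]
result = apriori_sketch(toy, min_support=0.4)
for itemset, sup in sorted(result.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), sup)
```

With a threshold of 0.4, all single items and all pairs survive, while the three-item set is dropped because it appears in only one of the five transactions.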




## Data Preparation

In [5]:
import numpy as np
import pandas as pd

import matplotlib.pyplot as plt
 
from mlxtend.frequent_patterns import apriori
from mlxtend.frequent_patterns import association_rules
from mlxtend.preprocessing import TransactionEncoder
import warnings
warnings.filterwarnings("ignore")

In [2]:
data_set = pd.read_csv("../data/Hypermarket_dataset.csv")
data_set = data_set.drop('Date', axis=1)
data_set.head()

Unnamed: 0,Member_number,itemDescription
0,1808,tropical fruit
1,2552,whole milk
2,2300,pip fruit
3,1187,other vegetables
4,3037,whole milk


In [3]:
# Collect all items bought by each member into one list (one transaction per member).
unified_data_set = data_set.groupby('Member_number').agg(lambda x: x.tolist())
transactions_list = unified_data_set['itemDescription'].tolist()
# One-hot encode the transactions: one boolean column per distinct item.
encoder = TransactionEncoder()
transactions = encoder.fit(transactions_list).transform(transactions_list)
transaction_df = pd.DataFrame(transactions, columns=encoder.columns_)
# Convert the boolean flags to 0/1.
transaction_df = transaction_df.astype(int)
transaction_df.head()

Unnamed: 0,Instant food products,UHT-milk,abrasive cleaner,artif. sweetener,baby cosmetics,bags,baking powder,bathroom cleaner,beef,berries,...,turkey,vinegar,waffles,whipped/sour cream,whisky,white bread,white wine,whole milk,yogurt,zwieback
0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,1,1,0
1,0,0,0,0,0,0,0,0,1,0,...,0,0,0,1,0,1,0,1,0,0
2,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,1,0,0
3,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,1,0,0
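The encoded table above has one row per transaction and one 0/1 column per distinct item. As a rough illustration of what this encoding step produces, the sketch below builds the same kind of table with plain Python; the toy transactions are hypothetical, and the sorted column order is an assumption made here for readability.

```python
# Toy transactions (hypothetical data); repeated items are allowed.
toy_transactions = [
    ["whole milk", "yogurt"],
    ["beef", "whole milk", "whole milk"],
    ["yogurt"],
]

# One column per distinct item, in sorted order.
columns = sorted({item for t in toy_transactions for item in t})

# One row per transaction: 1 if the item occurs in it, otherwise 0.
rows = [[int(item in t) for item in columns] for t in toy_transactions]

print(columns)
for row in rows:
    print(row)
```

Repeated purchases of the same item collapse into a single 1, since the encoding records only whether an item is present.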


## Identifying Recurring Patterns

We apply the Apriori algorithm with min_support = 0.07 to generate all frequent itemsets.





In [6]:
frequent_itemsets  = apriori(transaction_df, min_support=0.07, use_colnames=True)
frequent_itemsets

Unnamed: 0,support,itemsets
0,0.078502,(UHT-milk)
1,0.119548,(beef)
2,0.079785,(berries)
3,0.158799,(bottled beer)
4,0.213699,(bottled water)
...,...,...
78,0.075680,"(tropical fruit, yogurt)"
79,0.079785,"(whole milk, whipped/sour cream)"
80,0.150590,"(whole milk, yogurt)"
81,0.082093,"(whole milk, other vegetables, rolls/buns)"



## Extracting Association Rules



The function below takes two inputs, a minimum confidence and a minimum lift, and returns the association rules that satisfy both thresholds.



In [48]:
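def ARM(confidence, lift):
    # Generate the rules whose confidence is at least the given threshold.
    filtered_by_confidence = association_rules(frequent_itemsets, metric="confidence", min_threshold=confidence)
    # Keep only the rules whose lift is strictly greater than the given threshold.
    rules = filtered_by_confidence[filtered_by_confidence['lift'] > lift]
    return rules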
    

In [49]:
ARM(0.45, 1.1)

Unnamed: 0,antecedents,consequents,antecedent support,consequent support,support,confidence,lift,leverage,conviction
0,(bottled beer),(whole milk),0.158799,0.458184,0.085428,0.537964,1.174124,0.012669,1.172672
1,(bottled water),(whole milk),0.213699,0.458184,0.112365,0.52581,1.147597,0.014452,1.142615
2,(canned beer),(whole milk),0.165213,0.458184,0.087224,0.52795,1.152268,0.011526,1.147795
4,(domestic eggs),(whole milk),0.133145,0.458184,0.070292,0.527938,1.152242,0.009287,1.147766
5,(newspapers),(whole milk),0.139815,0.458184,0.072345,0.517431,1.12931,0.008284,1.122775
6,(sausage),(other vegetables),0.206003,0.376603,0.092868,0.450809,1.19704,0.015287,1.135119
7,(other vegetables),(whole milk),0.376603,0.458184,0.19138,0.508174,1.109106,0.018827,1.101643
8,(pastry),(whole milk),0.177527,0.458184,0.091072,0.513006,1.119651,0.009732,1.112572
9,(pip fruit),(whole milk),0.1706,0.458184,0.086968,0.509774,1.112598,0.008801,1.105239
10,(rolls/buns),(whole milk),0.349666,0.458184,0.178553,0.510638,1.114484,0.018342,1.10719


In [50]:
ARM(0.5, 1.5)

Unnamed: 0,antecedents,consequents,antecedent support,consequent support,support,confidence,lift,leverage,conviction


In [51]:
ARM(0.5, 0.8)

Unnamed: 0,antecedents,consequents,antecedent support,consequent support,support,confidence,lift,leverage,conviction
0,(bottled beer),(whole milk),0.158799,0.458184,0.085428,0.537964,1.174124,0.012669,1.172672
1,(bottled water),(whole milk),0.213699,0.458184,0.112365,0.52581,1.147597,0.014452,1.142615
2,(canned beer),(whole milk),0.165213,0.458184,0.087224,0.52795,1.152268,0.011526,1.147795
3,(domestic eggs),(whole milk),0.133145,0.458184,0.070292,0.527938,1.152242,0.009287,1.147766
4,(newspapers),(whole milk),0.139815,0.458184,0.072345,0.517431,1.12931,0.008284,1.122775
5,(other vegetables),(whole milk),0.376603,0.458184,0.19138,0.508174,1.109106,0.018827,1.101643
6,(pastry),(whole milk),0.177527,0.458184,0.091072,0.513006,1.119651,0.009732,1.112572
7,(pip fruit),(whole milk),0.1706,0.458184,0.086968,0.509774,1.112598,0.008801,1.105239
8,(rolls/buns),(whole milk),0.349666,0.458184,0.178553,0.510638,1.114484,0.018342,1.10719
9,(sausage),(whole milk),0.206003,0.458184,0.106978,0.519303,1.133394,0.012591,1.127146
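The metric columns in the tables above can be cross-checked by hand from the three support values. The sketch below does this for the rule (bottled beer) -> (whole milk), using the standard definitions of confidence, lift, leverage, and conviction; the variable names are chosen here for illustration.

```python
# Support values copied from the first row of the rule tables above.
antecedent_support = 0.158799   # support(bottled beer)
consequent_support = 0.458184   # support(whole milk)
rule_support = 0.085428         # support(bottled beer, whole milk)

# Standard definitions of the rule metrics.
confidence = rule_support / antecedent_support
lift = confidence / consequent_support
leverage = rule_support - antecedent_support * consequent_support
conviction = (1 - consequent_support) / (1 - confidence)

print(f"confidence = {confidence:.6f}")
print(f"lift       = {lift:.6f}")
print(f"leverage   = {leverage:.6f}")
print(f"conviction = {conviction:.6f}")
```

Up to rounding of the six-digit inputs, the results agree with the confidence, lift, leverage, and conviction columns reported for this rule.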
