
<div>
<h1>Association rules</h1>

Association rules describe relationships and dependencies among items in large transactional datasets.

A common example of association rule mining is "shopping cart" (market basket) analysis. It studies customers' buying habits through the combinations of items they place in their shopping carts. By identifying relationships between products, it reveals purchase patterns that recur across shopping trips.

Three measures are central to evaluating association rules:
- Support indicates how popular an itemset is. It is the fraction of all transactions that contain the itemset.
- Confidence, for a rule X → Y, is the conditional probability that a transaction contains item Y given that it contains item X.
- Lift combines the two. It is the confidence of X → Y divided by the support of Y.
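The following is a minimal plain-Python sketch of the first two definitions. It assumes each transaction is a set of item names. The toy transactions and the helper names `support` and `confidence` are hypothetical and are used only for illustration.

```python
# Hypothetical toy data: each transaction is the set of items in one cart
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"milk", "butter"},
    {"bread", "milk", "butter"},
]

def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)

def confidence(x, y, transactions):
    """Estimate of P(Y | X) for the rule X -> Y, i.e. s(X u Y) / s(X)."""
    return support(set(x) | set(y), transactions) / support(x, transactions)

print(support({"bread"}, transactions))  # 0.75
print(confidence({"bread"}, {"milk"}, transactions))
```

Bread appears in 3 of the 4 carts, so its support is 0.75. Milk appears in 2 of those 3 carts, so the confidence of {bread} → {milk} is about 0.67.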

Consider the rule X→Y in the shopping cart problem. Confidence is the conditional probability of {Y} given {X}. Lift additionally accounts for how common {Y} is on its own. In other words, lift is the ratio of two probabilities: the probability that {Y} is in the shopping cart when we know {X} is there, and the probability that {Y} is in the cart when we know nothing about {X}. Mathematically:

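$$\mathrm{Lift}(X\to Y)=\frac{s(X\cup Y)/s(X)}{s(Y)}=\frac{s(X\cup Y)}{s(X)\,s(Y)}=\frac{\mathrm{confidence}(X\to Y)}{s(Y)}$$

Here $s(\cdot)$ denotes support, and $s(X\cup Y)$ is the fraction of transactions that contain both {X} and {Y}.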
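As a worked check of these identities, the sketch below uses the rounded values that the rule-mining output later in this notebook reports for the rule (bottled beer) → (whole milk): antecedent support 0.158799, consequent support 0.458184, and joint support 0.085428. Because these inputs are rounded to six decimals, the results match the reported confidence (0.537964) and lift (1.174124) only up to rounding.

```python
# Rounded values from the notebook's rule table for (bottled beer) -> (whole milk)
s_x = 0.158799   # s(X): support of {bottled beer}
s_y = 0.458184   # s(Y): support of {whole milk}
s_xy = 0.085428  # s(X u Y): support of {bottled beer, whole milk}

conf = s_xy / s_x                    # confidence(X -> Y) = s(X u Y) / s(X)
lift_from_conf = conf / s_y          # confidence / s(Y)
lift_from_sup = s_xy / (s_x * s_y)   # s(X u Y) / (s(X) s(Y))

print(round(conf, 4), round(lift_from_conf, 4), round(lift_from_sup, 4))
```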

A lift greater than 1 means that {X} and {Y} occur together more often than they would if they were independent. Knowing that {X} is in the cart raises the probability that {Y} is also there, and the higher the lift, the stronger this effect. A lift of exactly 1 indicates independence. A lift below 1 means that having {X} in the cart actually makes {Y} less likely. This can happen even when the rule's confidence looks high, for example when {Y} is simply a very popular item. Store managers can use lift to inform decisions such as product placement in the aisles.
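The following example shows how a rule can have high confidence and still a lift below 1. The counts are hypothetical, invented for illustration and not taken from the dataset: 10 carts, with {Y} in 9 of them, {X} in 5, and both together in 4.

```python
# Hypothetical counts, chosen only to illustrate the point
n_total = 10  # number of carts
n_x = 5       # carts containing X
n_y = 9       # carts containing Y (a very popular item)
n_xy = 4      # carts containing both X and Y

confidence = n_xy / n_x        # P(Y | X) = 0.8, which looks strong
support_y = n_y / n_total      # P(Y) = 0.9, the baseline without knowing X
lift = confidence / support_y  # 0.8 / 0.9, which is below 1

print(f"confidence={confidence:.2f}, lift={lift:.2f}")  # confidence=0.80, lift=0.89
```

Although 80% of carts with {X} also contain {Y}, carts in general contain {Y} 90% of the time. Knowing about {X} therefore slightly lowers the chance of {Y}.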

In this exercise, we mine the association rules with the Apriori algorithm, which is one of the best-known and most widely used algorithms for this task.

</div>


<div>
<h1>Apriori</h1>

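Apriori starts from a chosen minimum support threshold and searches for frequent itemsets level by level:

1. It finds the frequent single items.
2. It combines the frequent itemsets of size k into candidate itemsets of size k+1.
3. It prunes any candidate that has an infrequent subset without counting that candidate's support, because every subset of a frequent itemset must itself be frequent.
4. It discards the remaining candidates whose support falls below the threshold.

Steps 2 to 4 repeat until no new frequent itemsets can be generated.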
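The level-wise search described above can be sketched in plain Python. This is a minimal, unoptimized illustration that assumes transactions are sets of item names. `apriori_sketch` is a hypothetical helper, not the mlxtend implementation used later, and it rescans every transaction for each candidate.

```python
from itertools import combinations

def apriori_sketch(transactions, min_support):
    """Return {frozenset(itemset): support} for every itemset whose
    support is at least `min_support` (minimal, unoptimized sketch)."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)

    def support(itemset):
        # Fraction of transactions containing every item of `itemset`
        return sum(1 for t in transactions if itemset <= t) / n

    # Level 1: count single items and keep the frequent ones
    items = {item for t in transactions for item in t}
    counted = {frozenset([i]): support(frozenset([i])) for i in items}
    current = {c for c, s in counted.items() if s >= min_support}
    frequent = {c: counted[c] for c in current}

    k = 2
    while current:
        # Join step: merge pairs of frequent (k-1)-itemsets into k-itemsets
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # Prune step: drop candidates that have an infrequent (k-1)-subset
        candidates = {
            c for c in candidates
            if all(frozenset(sub) in current for sub in combinations(c, k - 1))
        }
        # Count step: keep the candidates that meet the support threshold
        counted = {c: support(c) for c in candidates}
        current = {c for c, s in counted.items() if s >= min_support}
        frequent.update({c: counted[c] for c in current})
        k += 1
    return frequent


# Usage on hypothetical toy transactions
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"milk", "butter"},
    {"bread", "milk", "butter"},
]
result = apriori_sketch(transactions, min_support=0.5)
for itemset, s in sorted(result.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), s)
```

With `min_support=0.5`, the three single items and the three pairs are frequent. The triple appears in only one of the four carts, so it is not.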

In this part of the exercise, we apply the Apriori algorithm to the Hypermarket_dataset dataset, which contains customers' purchase orders from grocery stores.

</div>


<div>
<h1>Data preparation</h1>

First, the dataset must be converted into a sparse binary matrix with one column per purchased product and one row per purchase order. For convenience, each cell is encoded as 1 (True) if the product was purchased in that order and 0 (False) otherwise.
Sample output matrix:

<img src="https://drive.google.com/uc?id=1eD0jan1ZbeYqSklgK--ks7oeY-MyTA3p"></img>

An inverted index, which maps each product to the orders that contain it, would be more efficient, but a binary matrix is acceptable for demonstration purposes.
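The following is a minimal sketch of the inverted-index alternative. It assumes each order is a set of product names; the toy orders and the helper name `itemset_support` are hypothetical. Each product maps to the set of order ids that contain it. The support of an itemset is then the size of the intersection of its products' order sets divided by the number of orders.

```python
from collections import defaultdict

# Hypothetical toy orders: order id -> set of purchased products
orders = {
    101: {"whole milk", "yogurt"},
    102: {"whole milk", "rolls/buns"},
    103: {"yogurt", "rolls/buns"},
    104: {"whole milk", "yogurt", "rolls/buns"},
}

# Build the inverted index: product -> set of order ids that contain it
index = defaultdict(set)
for order_id, products in orders.items():
    for product in products:
        index[product].add(order_id)

def itemset_support(itemset):
    """Support = |orders containing every item| / |all orders|."""
    if not itemset:
        return 1.0  # the empty itemset is contained in every order
    # .get() avoids inserting unknown products into the defaultdict
    order_sets = [index.get(item, set()) for item in itemset]
    common = set.intersection(*order_sets)
    return len(common) / len(orders)

print(itemset_support({"whole milk", "yogurt"}))  # 0.5
```

Only orders 101 and 104 contain both whole milk and yogurt, so the pair's support is 2/4. Note that the scan touches only the orders listed for the queried products rather than every row of a binary matrix.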

</div>

In [1]:
import numpy as np
import pandas as pd

import matplotlib.pyplot as plt
 
from mlxtend.frequent_patterns import apriori
from mlxtend.frequent_patterns import association_rules


In [2]:
from google.colab import drive
from mlxtend.preprocessing import TransactionEncoder

drive.mount('/content/gdrive')
dataset_path = r'/content/gdrive/MyDrive/transactions/Hypermarket_dataset.csv'
raw = pd.read_csv(dataset_path)

# Collect the purchased items (third column) of each order (first column)
# into one list per order; sort=False keeps the orders in first-seen order
order_col, item_col = raw.columns[0], raw.columns[2]
transactions = raw.groupby(order_col, sort=False)[item_col].apply(list).tolist()

# One-hot encode: one boolean column per product, one row per order
te = TransactionEncoder()
te_ary = te.fit(transactions).transform(transactions)
df = pd.DataFrame(te_ary, columns=te.columns_)
display(df)


Mounted at /content/gdrive


,Instant food products,UHT-milk,abrasive cleaner,artif. sweetener,baby cosmetics,bags,baking powder,bathroom cleaner,beef,berries,...,turkey,vinegar,waffles,whipped/sour cream,whisky,white bread,white wine,whole milk,yogurt,zwieback
0,False,False,False,False,False,False,False,False,False,False,...,False,False,False,False,False,False,False,True,False,False
1,False,False,False,False,False,False,False,False,False,False,...,False,False,False,False,False,False,False,True,False,False
2,False,False,False,False,False,False,False,False,False,False,...,False,False,False,False,False,False,True,False,False,False
3,False,False,False,False,False,False,False,False,False,False,...,False,False,False,False,False,False,False,False,False,False
4,False,False,False,False,False,False,False,False,False,False,...,False,False,False,False,False,True,False,True,False,False
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
3893,False,False,False,False,False,False,False,False,False,False,...,False,False,False,False,False,False,False,False,False,False
3894,False,False,False,False,False,False,False,False,False,False,...,False,False,False,False,False,False,False,False,True,False
3895,False,False,False,False,False,False,False,False,False,False,...,False,False,False,False,False,False,False,False,False,False
3896,False,False,False,False,False,False,False,False,False,False,...,False,False,False,False,False,False,False,False,False,False





<div>
<h1>Identifying recurring patterns</h1>

We apply the Apriori algorithm with min_support = 0.07 to generate all frequent itemsets, that is, all itemsets that appear in at least 7% of the orders.

</div>

In [3]:
patterns=apriori(df, min_support=0.07,use_colnames=True)
display(patterns)

,support,itemsets
0,0.078502,(UHT-milk)
1,0.119548,(beef)
2,0.079785,(berries)
3,0.158799,(bottled beer)
4,0.213699,(bottled water)
...,...,...
78,0.075680,"(tropical fruit, yogurt)"
79,0.079785,"(whipped/sour cream, whole milk)"
80,0.150590,"(whole milk, yogurt)"
81,0.082093,"(other vegetables, whole milk, rolls/buns)"



<div>
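<h1>Mining association rules</h1>
Below is a function that takes a lift threshold and a support threshold and returns the union of two rule sets: the rules whose support is at least the given support value, and the rules whose lift is at least the given lift value. We display the outputs for two different sets of input parameters. The complete outputs can be seen in the files rule1.csv and rule2.csv, respectively. Support can never exceed 1, so in the second call, with a support threshold of 1.1, only the lift criterion selects rules.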
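
For comparison, below is a minimal plain-Python sketch of rule generation that keeps only rules meeting *both* thresholds, rather than the union returned by the function in the next cell. It assumes the frequent itemsets are given as a dictionary that maps each `frozenset` of items to its support and contains every subset of each itemset. The function name `rules_with_min_support_and_lift` is hypothetical. The sample supports are the rounded values reported in this notebook's outputs.

```python
from itertools import combinations

def rules_with_min_support_and_lift(frequent, min_support, min_lift):
    """Generate rules X -> Y from frequent itemsets, keeping those whose
    support AND lift both reach the given thresholds.

    `frequent` maps frozenset(itemset) -> support and must contain every
    subset of each itemset (the Apriori output guarantees this)."""
    rules = []
    for itemset, s_xy in frequent.items():
        if len(itemset) < 2 or s_xy < min_support:
            continue
        # Try every non-empty proper subset of the itemset as the antecedent
        for r in range(1, len(itemset)):
            for antecedent in combinations(sorted(itemset), r):
                x = frozenset(antecedent)
                y = itemset - x
                conf = s_xy / frequent[x]
                lift = conf / frequent[y]
                if lift >= min_lift:
                    rules.append((x, y, s_xy, conf, lift))
    return rules

# Rounded supports taken from this notebook's outputs
frequent = {
    frozenset({"bottled beer"}): 0.158799,
    frozenset({"whole milk"}): 0.458184,
    frozenset({"bottled water"}): 0.213699,
    frozenset({"other vegetables"}): 0.376603,
    frozenset({"bottled beer", "whole milk"}): 0.085428,
    frozenset({"bottled water", "other vegetables"}): 0.093894,
}

for x, y, s, conf, lift in rules_with_min_support_and_lift(frequent, 0.09, 1.1):
    print(set(x), "->", set(y), round(conf, 3), round(lift, 3))
```

With a support threshold of 0.09 and a lift threshold of 1.1, only the two rules between bottled water and other vegetables survive. The bottled beer and whole milk pair has support 0.085428, which is below 0.09.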
</div>

In [5]:
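def associationRulesWithLiftAndSup(patterns, lift, sup):
    # Rules whose support is at least `sup`
    r1 = association_rules(patterns, metric='support', min_threshold=sup)
    # Rules whose lift is at least `lift`
    r2 = association_rules(patterns, metric='lift', min_threshold=lift)
    # Union of both rule sets; drop_duplicates() returns a new DataFrame,
    # so its result must be assigned for the duplicates to be removed
    rules = pd.concat([r1, r2]).drop_duplicates()
    return rules

rules = associationRulesWithLiftAndSup(patterns, 0.2, 0.2)
display(rules)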




,antecedents,consequents,antecedent support,consequent support,support,confidence,lift,leverage,conviction
0,(whole milk),(bottled beer),0.458184,0.158799,0.085428,0.186450,1.174124,0.012669,1.033988
1,(bottled beer),(whole milk),0.158799,0.458184,0.085428,0.537964,1.174124,0.012669,1.172672
2,(other vegetables),(bottled water),0.376603,0.213699,0.093894,0.249319,1.166680,0.013414,1.047450
3,(bottled water),(other vegetables),0.213699,0.376603,0.093894,0.439376,1.166680,0.013414,1.111969
4,(rolls/buns),(bottled water),0.349666,0.213699,0.079271,0.226706,1.060863,0.004548,1.016820
...,...,...,...,...,...,...,...,...,...
93,"(other vegetables, yogurt)",(whole milk),0.120318,0.458184,0.071832,0.597015,1.303003,0.016704,1.344507
94,"(whole milk, yogurt)",(other vegetables),0.150590,0.376603,0.071832,0.477002,1.266589,0.015119,1.191967
95,(other vegetables),"(whole milk, yogurt)",0.376603,0.150590,0.071832,0.190736,1.266589,0.015119,1.049608
96,(whole milk),"(other vegetables, yogurt)",0.458184,0.120318,0.071832,0.156775,1.303003,0.016704,1.043235


In [14]:
rules=associationRulesWithLiftAndSup(patterns,1.1,1.1)
display(rules)




,antecedents,consequents,antecedent support,consequent support,support,confidence,lift,leverage,conviction
0,(whole milk),(bottled beer),0.458184,0.158799,0.085428,0.186450,1.174124,0.012669,1.033988
1,(bottled beer),(whole milk),0.158799,0.458184,0.085428,0.537964,1.174124,0.012669,1.172672
2,(other vegetables),(bottled water),0.376603,0.213699,0.093894,0.249319,1.166680,0.013414,1.047450
3,(bottled water),(other vegetables),0.213699,0.376603,0.093894,0.439376,1.166680,0.013414,1.111969
4,(soda),(bottled water),0.313494,0.213699,0.076193,0.243044,1.137318,0.009199,1.038767
...,...,...,...,...,...,...,...,...,...
67,"(other vegetables, yogurt)",(whole milk),0.120318,0.458184,0.071832,0.597015,1.303003,0.016704,1.344507
68,"(whole milk, yogurt)",(other vegetables),0.150590,0.376603,0.071832,0.477002,1.266589,0.015119,1.191967
69,(other vegetables),"(whole milk, yogurt)",0.376603,0.150590,0.071832,0.190736,1.266589,0.015119,1.049608
70,(whole milk),"(other vegetables, yogurt)",0.458184,0.120318,0.071832,0.156775,1.303003,0.016704,1.043235
