## Apriori Situation

Let’s say you are a machine learning engineer working for a supermarket.
Your objective is to explore the data and extract the most valuable information you can find.
Your employer gave you a dataset that looks like this:

In [1]:
import mlxtend
import pandas as pd
from mlxtend.preprocessing import TransactionEncoder

In [2]:
dataset = [["Milk", "Onion", "Nutmeg", "Kidney Beans", "Eggs", "Yogurt"],
          ["Dill", "Onion", "Nutmeg","Kidney Beans", "Eggs", "Yogurt"],
          ["Milk", "Apple", "Kidney Beans", "Eggs"],
          ["Milk", "Unicorn", "Corn", "Kidney Beans", "Yogurt"],
          ["Corn", "Onion", "Onion", "Kidney Beans", "Ice cream", "Eggs"]]

Notice that every inner list represents one transaction, that is, one purchase made by a customer.

To use the apriori function, we need to transform our dataset into a one-hot-encoded DataFrame.
TransactionEncoder turns the list of transactions into a NumPy array and one-hot encodes it, using True/False values rather than 0/1.

In [3]:
te = TransactionEncoder()

In [4]:
te_ary = te.fit_transform(dataset)  # Apply one-hot encoding to our dataset

In [5]:
te_ary

array([[False, False, False,  True, False,  True,  True,  True,  True,
        False,  True],
       [False, False,  True,  True, False,  True, False,  True,  True,
        False,  True],
       [ True, False, False,  True, False,  True,  True, False, False,
        False, False],
       [False,  True, False, False, False,  True,  True, False, False,
         True,  True],
       [False,  True, False,  True,  True,  True, False, False,  True,
        False, False]])

In [9]:
df = pd.DataFrame(te_ary, columns=te.columns_)  # Create a new DataFrame from our NumPy array

In [10]:
df

|   | Apple | Corn | Dill | Eggs | Ice cream | Kidney Beans | Milk | Nutmeg | Onion | Unicorn | Yogurt |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | False | False | False | True | False | True | True | True | True | False | True |
| 1 | False | False | True | True | False | True | False | True | True | False | True |
| 2 | True | False | False | True | False | True | True | False | False | False | False |
| 3 | False | True | False | False | False | True | True | False | False | True | True |
| 4 | False | True | False | True | True | True | False | False | True | False | False |
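
To make the encoding step less of a black box, here is a minimal pure-Python sketch of the same idea. It is not how TransactionEncoder is implemented; the names `columns` and `encoded` are illustrative, and sorting the item names alphabetically is an assumption chosen because it reproduces the column order of the DataFrame above.

```python
# Same transactions as in the walkthrough.
dataset = [["Milk", "Onion", "Nutmeg", "Kidney Beans", "Eggs", "Yogurt"],
           ["Dill", "Onion", "Nutmeg", "Kidney Beans", "Eggs", "Yogurt"],
           ["Milk", "Apple", "Kidney Beans", "Eggs"],
           ["Milk", "Unicorn", "Corn", "Kidney Beans", "Yogurt"],
           ["Corn", "Onion", "Onion", "Kidney Beans", "Ice cream", "Eggs"]]

# One column per distinct item, sorted alphabetically (assumption).
columns = sorted({item for transaction in dataset for item in transaction})

# One row per transaction: True if the item was bought, False otherwise.
encoded = []
for transaction in dataset:
    bought = set(transaction)  # duplicates such as the two "Onion" entries collapse
    encoded.append([item in bought for item in columns])

print(columns)
for row in encoded:
    print(row)
```

Because the columns are sorted, an index such as 3 can be mapped back to its item name with `columns[3]`.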


### Support Code
Let’s select the itemsets with a minimum support of 60%.

By default, apriori returns the column index of each item rather than its name. For example, (3) means Eggs.

In [12]:
from mlxtend.frequent_patterns import apriori

apriori(df, min_support=0.6)

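|   | support | itemsets |
|---|---|---|
| 0 | 0.8 | (3) |
| 1 | 1.0 | (5) |
| 2 | 0.6 | (6) |
| 3 | 0.6 | (8) |
| 4 | 0.6 | (10) |
| 5 | 0.8 | (3, 5) |
| 6 | 0.6 | (8, 3) |
| 7 | 0.6 | (5, 6) |
| 8 | 0.6 | (8, 5) |
| 9 | 0.6 | (10, 5) |
| 10 | 0.6 | (8, 3, 5) |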


In [13]:
# Instead of column indices, we can use column names.
frequent_items = apriori(df, min_support=0.6, use_colnames=True)

In [14]:
frequent_items

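|   | support | itemsets |
|---|---|---|
| 0 | 0.8 | (Eggs) |
| 1 | 1.0 | (Kidney Beans) |
| 2 | 0.6 | (Milk) |
| 3 | 0.6 | (Onion) |
| 4 | 0.6 | (Yogurt) |
| 5 | 0.8 | (Kidney Beans, Eggs) |
| 6 | 0.6 | (Onion, Eggs) |
| 7 | 0.6 | (Kidney Beans, Milk) |
| 8 | 0.6 | (Onion, Kidney Beans) |
| 9 | 0.6 | (Kidney Beans, Yogurt) |
| 10 | 0.6 | (Onion, Kidney Beans, Eggs) |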
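
The support of an itemset is the fraction of transactions that contain every item in it. The sketch below recomputes the supports by hand in plain Python. It is a brute-force enumeration for illustration only, not the Apriori algorithm (there is no candidate pruning), and `support` and `frequent_itemsets` are hypothetical helper names, not mlxtend functions.

```python
from itertools import combinations

dataset = [["Milk", "Onion", "Nutmeg", "Kidney Beans", "Eggs", "Yogurt"],
           ["Dill", "Onion", "Nutmeg", "Kidney Beans", "Eggs", "Yogurt"],
           ["Milk", "Apple", "Kidney Beans", "Eggs"],
           ["Milk", "Unicorn", "Corn", "Kidney Beans", "Yogurt"],
           ["Corn", "Onion", "Onion", "Kidney Beans", "Ice cream", "Eggs"]]
transactions = [set(t) for t in dataset]


def support(itemset):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)


def frequent_itemsets(min_support):
    """Try every combination of items (feasible only for tiny datasets)."""
    items = sorted(set().union(*transactions))
    result = {}
    for size in range(1, len(items) + 1):
        for candidate in combinations(items, size):
            s = support(candidate)
            if s >= min_support:
                result[frozenset(candidate)] = s
    return result


frequent = frequent_itemsets(0.6)
for itemset, s in frequent.items():
    print(sorted(itemset), s)
```

Apriori reaches the same itemsets far more efficiently by only extending itemsets that are already frequent, which matters as soon as the number of distinct items grows.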


### Confidence Code
To extract rules based on other metrics, such as confidence, we can use association_rules from the mlxtend.frequent_patterns module.

In [15]:
from mlxtend.frequent_patterns import association_rules
association_rules(frequent_items, metric="confidence", min_threshold=0.7)  # Keep rules with confidence >= 0.7

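|   | antecedents | consequents | antecedent support | consequent support | support | confidence | lift | leverage | conviction | zhangs_metric |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | (Kidney Beans) | (Eggs) | 1.0 | 0.8 | 0.8 | 0.8 | 1.0 | 0.0 | 1.0 | 0.0 |
| 1 | (Eggs) | (Kidney Beans) | 0.8 | 1.0 | 0.8 | 1.0 | 1.0 | 0.0 | inf | 0.0 |
| 2 | (Onion) | (Eggs) | 0.6 | 0.8 | 0.6 | 1.0 | 1.25 | 0.12 | inf | 0.5 |
| 3 | (Eggs) | (Onion) | 0.8 | 0.6 | 0.6 | 0.75 | 1.25 | 0.12 | 1.6 | 1.0 |
| 4 | (Milk) | (Kidney Beans) | 0.6 | 1.0 | 0.6 | 1.0 | 1.0 | 0.0 | inf | 0.0 |
| 5 | (Onion) | (Kidney Beans) | 0.6 | 1.0 | 0.6 | 1.0 | 1.0 | 0.0 | inf | 0.0 |
| 6 | (Yogurt) | (Kidney Beans) | 0.6 | 1.0 | 0.6 | 1.0 | 1.0 | 0.0 | inf | 0.0 |
| 7 | (Onion, Kidney Beans) | (Eggs) | 0.6 | 0.8 | 0.6 | 1.0 | 1.25 | 0.12 | inf | 0.5 |
| 8 | (Onion, Eggs) | (Kidney Beans) | 0.6 | 1.0 | 0.6 | 1.0 | 1.0 | 0.0 | inf | 0.0 |
| 9 | (Kidney Beans, Eggs) | (Onion) | 0.8 | 0.6 | 0.6 | 0.75 | 1.25 | 0.12 | 1.6 | 1.0 |
| 10 | (Onion) | (Kidney Beans, Eggs) | 0.6 | 0.8 | 0.6 | 1.0 | 1.25 | 0.12 | inf | 0.5 |
| 11 | (Eggs) | (Onion, Kidney Beans) | 0.8 | 0.6 | 0.6 | 0.75 | 1.25 | 0.12 | 1.6 | 1.0 |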
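
For a rule A → C, confidence is support(A ∪ C) / support(A), and conviction is (1 - support(C)) / (1 - confidence), which becomes infinite when the confidence is exactly 1. The plain-Python sketch below recomputes both columns for two of the rules. The helper names `support`, `confidence` and `conviction` are hypothetical and written for this example; they are not part of mlxtend.

```python
dataset = [["Milk", "Onion", "Nutmeg", "Kidney Beans", "Eggs", "Yogurt"],
           ["Dill", "Onion", "Nutmeg", "Kidney Beans", "Eggs", "Yogurt"],
           ["Milk", "Apple", "Kidney Beans", "Eggs"],
           ["Milk", "Unicorn", "Corn", "Kidney Beans", "Yogurt"],
           ["Corn", "Onion", "Onion", "Kidney Beans", "Ice cream", "Eggs"]]
transactions = [set(t) for t in dataset]


def support(itemset):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(1 for t in transactions if itemset <= t) / len(transactions)


def confidence(antecedent, consequent):
    """How often the consequent appears in transactions containing the antecedent."""
    a, c = set(antecedent), set(consequent)
    return support(a | c) / support(a)


def conviction(antecedent, consequent):
    """(1 - support(C)) / (1 - confidence); infinite for a rule that never fails."""
    conf = confidence(antecedent, consequent)
    if conf == 1.0:
        return float("inf")
    return (1 - support(consequent)) / (1 - conf)


for a, c in [({"Onion"}, {"Eggs"}), ({"Eggs"}, {"Onion"})]:
    print(sorted(a), "->", sorted(c),
          "confidence:", round(confidence(a, c), 2),
          "conviction:", round(conviction(a, c), 2))
```

The two rules involve the same pair of items but score differently: confidence is directional, so swapping antecedent and consequent changes the denominator.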


### Lift Code
We can also filter rules by lift, here keeping only the rules with a lift of at least 1.25.

In [17]:
from mlxtend.frequent_patterns import association_rules

association_rules(frequent_items, metric="lift", min_threshold=1.25)

|   | antecedents | consequents | antecedent support | consequent support | support | confidence | lift | leverage | conviction | zhangs_metric |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | (Onion) | (Eggs) | 0.6 | 0.8 | 0.6 | 1.0 | 1.25 | 0.12 | inf | 0.5 |
| 1 | (Onion, Kidney Beans) | (Eggs) | 0.6 | 0.8 | 0.6 | 1.0 | 1.25 | 0.12 | inf | 0.5 |
| 2 | (Onion) | (Kidney Beans, Eggs) | 0.6 | 0.8 | 0.6 | 1.0 | 1.25 | 0.12 | inf | 0.5 |
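
Lift compares how often A and C occur together with how often they would if they were independent: support(A ∪ C) / (support(A) × support(C)). Leverage is the corresponding difference, support(A ∪ C) - support(A) × support(C). The sketch below recomputes both in plain Python using these standard definitions; `support`, `lift` and `leverage` are hypothetical helper names, and the code makes no claim about how mlxtend computes the values internally.

```python
dataset = [["Milk", "Onion", "Nutmeg", "Kidney Beans", "Eggs", "Yogurt"],
           ["Dill", "Onion", "Nutmeg", "Kidney Beans", "Eggs", "Yogurt"],
           ["Milk", "Apple", "Kidney Beans", "Eggs"],
           ["Milk", "Unicorn", "Corn", "Kidney Beans", "Yogurt"],
           ["Corn", "Onion", "Onion", "Kidney Beans", "Ice cream", "Eggs"]]
transactions = [set(t) for t in dataset]


def support(itemset):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(1 for t in transactions if itemset <= t) / len(transactions)


def lift(antecedent, consequent):
    """Ratio of observed joint support to the support expected under independence."""
    a, c = set(antecedent), set(consequent)
    return support(a | c) / (support(a) * support(c))


def leverage(antecedent, consequent):
    """Difference between observed joint support and the independent expectation."""
    a, c = set(antecedent), set(consequent)
    return support(a | c) - support(a) * support(c)


for a, c in [({"Onion"}, {"Eggs"}), ({"Milk"}, {"Kidney Beans"})]:
    # Values are rounded for display because floating-point division
    # can land slightly above or below the exact fraction.
    print(sorted(a), "->", sorted(c),
          "lift:", round(lift(a, c), 2),
          "leverage:", round(leverage(a, c), 2))
```

A lift of 1 (and a leverage of 0) means the antecedent tells us nothing about the consequent, which is the case for every rule whose consequent is Kidney Beans, since that item is in every transaction.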
