# Apriori vs FPGrowth

Apriori is a popular algorithm for extracting frequent itemsets, with applications in association rule learning. It is designed to operate on databases of transactions, such as purchases by customers of a store.

http://rasbt.github.io/mlxtend/user_guide/frequent_patterns/apriori/
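
To make the level-wise idea behind Apriori concrete, here is a minimal pure-Python sketch. It is not mlxtend's implementation. The function name `apriori_sketch` and the small `baskets` list are made up for illustration. It relies on the fact that every subset of a frequent itemset must itself be frequent, so size-k candidates are built only from frequent (k-1)-itemsets.

```python
from itertools import combinations


def apriori_sketch(transactions, min_support):
    """Return {frozenset(itemset): support} for all frequent itemsets."""
    tx_sets = [set(t) for t in transactions]
    n = len(tx_sets)

    def support(itemset):
        # Fraction of transactions that contain every item of the itemset
        return sum(itemset <= t for t in tx_sets) / n

    # Level 1: frequent single items
    items = sorted({item for t in tx_sets for item in t})
    current = {}
    for item in items:
        s = support(frozenset([item]))
        if s >= min_support:
            current[frozenset([item])] = s

    frequent = dict(current)
    k = 2
    while current:
        # Join step: unions of two frequent (k-1)-itemsets that have size k
        candidates = {a | b for a, b in combinations(list(current), 2)
                      if len(a | b) == k}
        # Prune step: drop candidates with an infrequent (k-1)-subset
        candidates = {c for c in candidates
                      if all(frozenset(sub) in current
                             for sub in combinations(c, k - 1))}
        # Count step: keep candidates that reach the support threshold
        current = {}
        for c in candidates:
            s = support(c)
            if s >= min_support:
                current[c] = s
        frequent.update(current)
        k += 1
    return frequent


# Hypothetical toy baskets, not the dataset used in this notebook
baskets = [["bread", "milk"],
           ["bread", "eggs"],
           ["bread", "milk", "eggs"],
           ["milk"]]
result = apriori_sketch(baskets, min_support=0.5)
for itemset, s in sorted(result.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), s)
```

The candidate `{bread, milk, eggs}` is pruned without ever being counted, because its subset `{milk, eggs}` is not frequent. This is the candidate generation and pruning work that FP-Growth avoids.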

FP-Growth is a frequent pattern mining algorithm that does not require candidate generation. Internally, it uses a so-called FP-tree (frequent pattern tree) data structure instead of generating candidate sets explicitly, which makes it particularly attractive for large datasets.

http://rasbt.github.io/mlxtend/user_guide/frequent_patterns/fpgrowth/
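
The FP-tree mentioned above is essentially a prefix tree over transactions whose items have been sorted by descending frequency. The sketch below shows only the tree construction, under simplifying assumptions: it omits the header table and the recursive mining step, and it is not how mlxtend implements it. `FPNode`, `build_fp_tree`, and the `baskets` data are hypothetical names chosen for illustration.

```python
from collections import Counter


class FPNode:
    """One node of the prefix tree: an item plus how many transactions pass through it."""

    def __init__(self, item, parent=None):
        self.item = item
        self.count = 0
        self.parent = parent
        self.children = {}


def build_fp_tree(transactions, min_count):
    # Pass 1: count in how many transactions each item occurs
    counts = Counter(item for t in transactions for item in set(t))
    frequent = {item: c for item, c in counts.items() if c >= min_count}

    root = FPNode(None)
    # Pass 2: insert each transaction with infrequent items dropped and
    # the rest ordered by descending count (ties broken alphabetically)
    for t in transactions:
        ordered = sorted((i for i in set(t) if i in frequent),
                         key=lambda i: (-frequent[i], i))
        node = root
        for item in ordered:
            if item not in node.children:
                node.children[item] = FPNode(item, parent=node)
            node = node.children[item]
            node.count += 1
    return root, frequent


# Hypothetical toy baskets, not the dataset used in this notebook
baskets = [["bread", "milk"],
           ["bread", "eggs"],
           ["bread", "milk", "eggs"],
           ["milk"]]
root, frequent = build_fp_tree(baskets, min_count=2)


def show(node, depth=0):
    # Print the tree with indentation to make shared prefixes visible
    for child in node.children.values():
        print("  " * depth + f"{child.item}: {child.count}")
        show(child, depth + 1)


show(root)
```

Transactions that share a prefix share nodes, so the three baskets that start with `bread` are stored along one path whose counters are incremented. The data is scanned only twice, and itemset counts can then be read off the tree rather than recomputed for each candidate.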

In [1]:
#!pip install mlxtend

### Apriori

In [2]:
import pandas as pd
from mlxtend.preprocessing import TransactionEncoder
from mlxtend.frequent_patterns import apriori, association_rules
from mlxtend.frequent_patterns import fpgrowth

In [3]:
dataset = [['Milk','Onion', 'Bread', 'Kidney Beans','Eggs','Yoghurt'],
           ['Fish','Onion','Bread','Kidney Beans','Eggs','Yoghurt'],
           ['Milk', 'Apples', 'Kidney Beans', 'Eggs'],
           ['Milk', 'Sugar', 'Tea Leaves', 'Kidney Beans', 'Yoghurt'],
           ['Tea Leaves','Onion','Kidney Beans', 'Ice cream', 'Eggs'],
]

In [4]:
tr = TransactionEncoder()
tr_arr = tr.fit(dataset).transform(dataset)
df = pd.DataFrame(tr_arr, columns=tr.columns_)
df

,Apples,Bread,Eggs,Fish,Ice cream,Kidney Beans,Milk,Onion,Sugar,Tea Leaves,Yoghurt
0,False,True,True,False,False,True,True,True,False,False,True
1,False,True,True,True,False,True,False,True,False,False,True
2,True,False,True,False,False,True,True,False,False,False,False
3,False,False,False,False,False,True,True,False,True,True,True
4,False,False,True,False,True,True,False,True,False,True,False
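
The boolean table above is a one-hot encoding of the transactions: one column per distinct item, one row per transaction. As a rough pure-Python equivalent, the sketch below rebuilds the same table without mlxtend. It assumes the columns are the sorted distinct items, which matches the output shown, and is not a description of the library's internals.

```python
dataset = [['Milk', 'Onion', 'Bread', 'Kidney Beans', 'Eggs', 'Yoghurt'],
           ['Fish', 'Onion', 'Bread', 'Kidney Beans', 'Eggs', 'Yoghurt'],
           ['Milk', 'Apples', 'Kidney Beans', 'Eggs'],
           ['Milk', 'Sugar', 'Tea Leaves', 'Kidney Beans', 'Yoghurt'],
           ['Tea Leaves', 'Onion', 'Kidney Beans', 'Ice cream', 'Eggs']]

# One column per distinct item, in sorted order
columns = sorted({item for transaction in dataset for item in transaction})

# One boolean row per transaction: True where the item was bought
rows = [[item in set(transaction) for item in columns]
        for transaction in dataset]

print(columns)
for row in rows:
    print(row)
```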


In [5]:
# Keep itemsets that occur in at least 60% of the transactions
frequent_itemsets1 = apriori(df, min_support=0.6, use_colnames=True)
# Number of items in each itemset, used for filtering below
frequent_itemsets1['length'] = frequent_itemsets1['itemsets'].apply(len)
frequent_itemsets1

,support,itemsets,length
0,0.8,(Eggs),1
1,1.0,(Kidney Beans),1
2,0.6,(Milk),1
3,0.6,(Onion),1
4,0.6,(Yoghurt),1
5,0.8,"(Kidney Beans, Eggs)",2
6,0.6,"(Eggs, Onion)",2
7,0.6,"(Milk, Kidney Beans)",2
8,0.6,"(Kidney Beans, Onion)",2
9,0.6,"(Kidney Beans, Yoghurt)",2


In [6]:
frequent_itemsets1[(frequent_itemsets1['length'] >= 2) &
                   (frequent_itemsets1['support'] >= 0.8) ]

,support,itemsets,length
5,0.8,"(Kidney Beans, Eggs)",2
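
The support value in this row can be checked by hand: support is the fraction of transactions that contain every item of the itemset. The sketch below recomputes it in plain Python. It also derives the confidence of the two rules that this pair gives rise to, using the standard definition confidence(A -> B) = support(A and B) / support(A). The helper name `support` is chosen for illustration only.

```python
dataset = [['Milk', 'Onion', 'Bread', 'Kidney Beans', 'Eggs', 'Yoghurt'],
           ['Fish', 'Onion', 'Bread', 'Kidney Beans', 'Eggs', 'Yoghurt'],
           ['Milk', 'Apples', 'Kidney Beans', 'Eggs'],
           ['Milk', 'Sugar', 'Tea Leaves', 'Kidney Beans', 'Yoghurt'],
           ['Tea Leaves', 'Onion', 'Kidney Beans', 'Ice cream', 'Eggs']]


def support(itemset):
    # Fraction of transactions that contain all items of the itemset
    return sum(set(itemset) <= set(t) for t in dataset) / len(dataset)


pair_support = support({'Kidney Beans', 'Eggs'})

# confidence(A -> B) = support(A and B) / support(A)
conf_eggs_to_beans = pair_support / support({'Eggs'})
conf_beans_to_eggs = pair_support / support({'Kidney Beans'})

print(pair_support, conf_eggs_to_beans, conf_beans_to_eggs)
```

Every transaction that contains Eggs also contains Kidney Beans, so the rule Eggs -> Kidney Beans has a higher confidence than the reverse rule.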


### FPGrowth

Since FP-Growth doesn't require creating candidate sets explicitly, it can be orders of magnitude faster than the alternative Apriori algorithm.

For instance, the cells in the Performance section below time both algorithms on the same input. Even on this very simple toy dataset, FP-Growth is about 3 times faster (roughly 310 µs versus 937 µs per call).

In [7]:
# Same support threshold as the Apriori run above
frequent_itemsets2 = fpgrowth(df, min_support=0.6, use_colnames=True)
frequent_itemsets2['length'] = frequent_itemsets2['itemsets'].apply(len)
frequent_itemsets2

,support,itemsets,length
0,1.0,(Kidney Beans),1
1,0.8,(Eggs),1
2,0.6,(Yoghurt),1
3,0.6,(Onion),1
4,0.6,(Milk),1
5,0.8,"(Kidney Beans, Eggs)",2
6,0.6,"(Kidney Beans, Yoghurt)",2
7,0.6,"(Eggs, Onion)",2
8,0.6,"(Kidney Beans, Onion)",2
9,0.6,"(Kidney Beans, Eggs, Onion)",3


In [8]:
frequent_itemsets2[(frequent_itemsets2['length'] == 2) &
                   (frequent_itemsets2['support'] >= 0.8) ]

,support,itemsets,length
5,0.8,"(Kidney Beans, Eggs)",2


### Performance

In [9]:
%timeit -n 100 -r 10 apriori(df, min_support=0.6)

937 µs ± 20.4 µs per loop (mean ± std. dev. of 10 runs, 100 loops each)


In [10]:
%timeit -n 100 -r 10 fpgrowth(df, min_support=0.6)

310 µs ± 12.3 µs per loop (mean ± std. dev. of 10 runs, 100 loops each)
