# Theory:

**What is the Apriori algorithm?**

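The Apriori algorithm is a classic data-mining algorithm for finding frequent itemsets (groups of items that often appear together in the same transaction) and deriving association rules from them. It works level by level. First it finds the frequent single items, then combines them into candidate pairs, triples, and so on. Any candidate with an infrequent subset is pruned, because a superset of an infrequent itemset can never be frequent.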

Example: recommending products based on the items a customer has already bought, as many e-commerce websites do (a recommendation system).
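To make the level-wise idea concrete, here is a minimal from-scratch sketch in plain Python. It is not mlxtend's implementation. `apriori_sketch` is a hypothetical helper, the keyboard/mouse baskets are made-up toy data, and "support" means the fraction of transactions that contain an itemset (defined below).

```python
from itertools import combinations


def apriori_sketch(transactions, min_support):
    """Return {frozenset(itemset): support} for every itemset with support >= min_support."""
    baskets = [frozenset(t) for t in transactions]
    n = len(baskets)

    def support(itemset):
        # Fraction of baskets that contain every item of the itemset
        return sum(1 for b in baskets if itemset <= b) / n

    # Level 1: frequent single items
    items = sorted({item for b in baskets for item in b})
    level = [frozenset([i]) for i in items if support(frozenset([i])) >= min_support]
    frequent = {s: support(s) for s in level}

    k = 2
    while level:
        # Join: union pairs of frequent (k-1)-itemsets that produce a k-itemset
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # Prune (Apriori property): every (k-1)-subset must already be frequent
        candidates = {
            c for c in candidates
            if all(frozenset(s) in frequent for s in combinations(c, k - 1))
        }
        # Count support only for the surviving candidates
        level = [c for c in candidates if support(c) >= min_support]
        frequent.update({c: support(c) for c in level})
        k += 1
    return frequent


# Hypothetical toy baskets
baskets = [
    ["KEYBOARD", "MOUSE"],
    ["KEYBOARD", "MOUSE", "MONITOR"],
    ["KEYBOARD"],
    ["MONITOR"],
]
result = apriori_sketch(baskets, min_support=0.5)
for itemset, supp in sorted(result.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), supp)
```

On these four baskets the sketch keeps KEYBOARD (0.75), MONITOR and MOUSE (0.5 each) and the pair {KEYBOARD, MOUSE} (0.5). The pair {KEYBOARD, MONITOR} appears in only one basket (0.25) and is dropped, so no three-item candidate is ever generated.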


**Concepts to know before implementing it:**

**Association rule:**
An association rule A → B states that transactions containing itemset A also tend to contain itemset B. For example, a customer who buys a keyboard is likely to buy a mouse as well, so a store might place the two items next to each other to increase sales.
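Every association rule comes from splitting a frequent itemset into a non-empty antecedent and a non-empty consequent. The sketch below enumerates those splits; `rules_from_itemset` is a hypothetical helper and MOUSEPAD is a made-up item.

```python
from itertools import combinations


def rules_from_itemset(itemset):
    """Yield every (antecedent, consequent) split of an itemset, both sides non-empty."""
    items = sorted(itemset)
    for size in range(1, len(items)):
        for antecedent in combinations(items, size):
            consequent = tuple(i for i in items if i not in antecedent)
            yield antecedent, consequent


rules = list(rules_from_itemset({"KEYBOARD", "MOUSE", "MOUSEPAD"}))
for antecedent, consequent in rules:
    print(antecedent, "->", consequent)
```

An itemset of k items yields 2^k - 2 candidate rules (6 for three items, 14 for four). A metric threshold such as confidence or lift then decides which candidates are kept.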

**Support:**
Support measures how popular an item (or itemset) is: the number of transactions containing it divided by the total number of transactions.
```
Support(Keyboard) = (Transactions containing Keyboard) / (Total transactions)
```
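The support formula translates directly into a small function. This is a sketch on made-up toy transactions; the `support` helper is hypothetical, not a library call.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in `itemset`."""
    wanted = set(itemset)
    hits = sum(1 for t in transactions if wanted <= set(t))
    return hits / len(transactions)


# Hypothetical toy transactions
transactions = [
    ["KEYBOARD", "MOUSE"],
    ["KEYBOARD", "MOUSE", "MONITOR"],
    ["KEYBOARD"],
    ["MOUSE"],
    ["MONITOR"],
]
print(support(transactions, ["KEYBOARD"]))           # 3 of 5 transactions -> 0.6
print(support(transactions, ["KEYBOARD", "MOUSE"]))  # 2 of 5 transactions -> 0.4
```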
**Confidence:**
Confidence is the likelihood that item B (mouse) is bought given that item A (keyboard) is bought, i.e. an estimate of the conditional probability P(Mouse | Keyboard).

`Confidence(Keyboard → Mouse) = (Transactions containing both Keyboard and Mouse) / (Transactions containing Keyboard)`
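Dividing both counts in the formula by the total number of transactions shows that confidence is Support(A and B) / Support(A). A sketch on the same kind of made-up toy data, with hypothetical helper names:

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in `itemset`."""
    wanted = set(itemset)
    return sum(1 for t in transactions if wanted <= set(t)) / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Estimate P(consequent | antecedent) as support(antecedent + consequent) / support(antecedent)."""
    both = set(antecedent) | set(consequent)
    return support(transactions, both) / support(transactions, antecedent)


# Hypothetical toy transactions
transactions = [
    ["KEYBOARD", "MOUSE"],
    ["KEYBOARD", "MOUSE", "MONITOR"],
    ["KEYBOARD"],
    ["MOUSE"],
    ["MONITOR"],
]
print(confidence(transactions, ["MONITOR"], ["KEYBOARD"]))  # 1 of the 2 MONITOR baskets -> 0.5
print(confidence(transactions, ["KEYBOARD"], ["MONITOR"]))  # 1 of the 3 KEYBOARD baskets -> about 0.333
```

The two calls show that confidence is not symmetric: the direction of the rule matters.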

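**Lift:**
Lift(Keyboard → Mouse) measures how much more likely a mouse is to be bought when a keyboard is bought, compared with how often a mouse is bought overall. It is calculated by dividing Confidence(Keyboard → Mouse) by Support(Mouse). A lift above 1 means the items are bought together more often than they would be if independent, a lift of 1 means no association, and a lift below 1 means a negative association.

```
Lift(Keyboard → Mouse) = Confidence(Keyboard → Mouse) / Support(Mouse)
```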
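A sketch of the lift formula on made-up toy data (helper names are hypothetical), showing one pair with positive and one with negative association:

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in `itemset`."""
    wanted = set(itemset)
    return sum(1 for t in transactions if wanted <= set(t)) / len(transactions)


def lift(transactions, antecedent, consequent):
    """Confidence(A -> B) divided by Support(B)."""
    both = set(antecedent) | set(consequent)
    conf = support(transactions, both) / support(transactions, antecedent)
    return conf / support(transactions, consequent)


# Hypothetical toy transactions
transactions = [
    ["KEYBOARD", "MOUSE"],
    ["KEYBOARD", "MOUSE", "MONITOR"],
    ["KEYBOARD"],
    ["MOUSE"],
    ["MONITOR"],
]
print(round(lift(transactions, ["KEYBOARD"], ["MOUSE"]), 3))    # (2/3) / (3/5) = 10/9 -> 1.111
print(round(lift(transactions, ["KEYBOARD"], ["MONITOR"]), 3))  # (1/3) / (2/5) = 5/6 -> 0.833
```

Because lift equals Support(A and B) / (Support(A) × Support(B)), it is symmetric: Lift(Mouse → Keyboard) gives the same value as Lift(Keyboard → Mouse).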


# Code 

```python
import numpy as np
import pandas as pd
from mlxtend.frequent_patterns import apriori, association_rules
```

```python
# Each line of the CSV is one transaction, read into a single 'Products' column
data = pd.read_csv('GroceryStoreDataSet.csv', names=['Products'], header=None)
```

```python
data.head()
```

|   | Products |
|---|---|
| 0 | MILK,BREAD,BISCUIT |
| 1 | BREAD,MILK,BISCUIT,CORNFLAKES |
| 2 | BREAD,TEA,BOURNVITA |
| 3 | JAM,MAGGI,BREAD,MILK |
| 4 | MAGGI,TEA,BISCUIT |


```python
data.values
```

```
array([['MILK,BREAD,BISCUIT'],
       ['BREAD,MILK,BISCUIT,CORNFLAKES'],
       ['BREAD,TEA,BOURNVITA'],
       ['JAM,MAGGI,BREAD,MILK'],
       ['MAGGI,TEA,BISCUIT'],
       ['BREAD,TEA,BOURNVITA'],
       ['MAGGI,TEA,CORNFLAKES'],
       ['MAGGI,BREAD,TEA,BISCUIT'],
       ['JAM,MAGGI,BREAD,TEA'],
       ['BREAD,MILK'],
       ['COFFEE,COCK,BISCUIT,CORNFLAKES'],
       ['COFFEE,COCK,BISCUIT,CORNFLAKES'],
       ['COFFEE,SUGER,BOURNVITA'],
       ['BREAD,COFFEE,COCK'],
       ['BREAD,SUGER,BISCUIT'],
       ['COFFEE,SUGER,CORNFLAKES'],
       ['BREAD,SUGER,BOURNVITA'],
       ['BREAD,COFFEE,SUGER'],
       ['BREAD,COFFEE,SUGER'],
       ['TEA,MILK,COFFEE,CORNFLAKES']], dtype=object)
```

```python
# Split each comma-separated string into a list of items (one list per transaction)
data = list(data["Products"].apply(lambda x: x.split(',')))
data
```

```
[['MILK', 'BREAD', 'BISCUIT'],
 ['BREAD', 'MILK', 'BISCUIT', 'CORNFLAKES'],
 ['BREAD', 'TEA', 'BOURNVITA'],
 ['JAM', 'MAGGI', 'BREAD', 'MILK'],
 ['MAGGI', 'TEA', 'BISCUIT'],
 ['BREAD', 'TEA', 'BOURNVITA'],
 ['MAGGI', 'TEA', 'CORNFLAKES'],
 ['MAGGI', 'BREAD', 'TEA', 'BISCUIT'],
 ['JAM', 'MAGGI', 'BREAD', 'TEA'],
 ['BREAD', 'MILK'],
 ['COFFEE', 'COCK', 'BISCUIT', 'CORNFLAKES'],
 ['COFFEE', 'COCK', 'BISCUIT', 'CORNFLAKES'],
 ['COFFEE', 'SUGER', 'BOURNVITA'],
 ['BREAD', 'COFFEE', 'COCK'],
 ['BREAD', 'SUGER', 'BISCUIT'],
 ['COFFEE', 'SUGER', 'CORNFLAKES'],
 ['BREAD', 'SUGER', 'BOURNVITA'],
 ['BREAD', 'COFFEE', 'SUGER'],
 ['BREAD', 'COFFEE', 'SUGER'],
 ['TEA', 'MILK', 'COFFEE', 'CORNFLAKES']]
```
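The read-and-split steps can be mirrored with Python's standard `csv` module. This sketch assumes each line of `GroceryStoreDataSet.csv` is a single quoted field such as `"MILK,BREAD,BISCUIT"` (consistent with pandas reading it into one column); instead of opening the file, only the first three rows are inlined.

```python
import csv
import io

# Assumed file layout: one quoted, comma-separated transaction per line (first three rows only)
sample_csv = (
    '"MILK,BREAD,BISCUIT"\n'
    '"BREAD,MILK,BISCUIT,CORNFLAKES"\n'
    '"BREAD,TEA,BOURNVITA"\n'
)

# csv.reader keeps each quoted line as one field; splitting that field on ',' gives the items
transactions = [row[0].split(",") for row in csv.reader(io.StringIO(sample_csv)) if row]
print(transactions)
```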

```python
from mlxtend.preprocessing import TransactionEncoder

# One-hot encode: one boolean column per distinct item, one row per transaction
te = TransactionEncoder()
te_data = te.fit(data).transform(data)
```

```python
te_data
```

```
array([[ True, False,  True, False, False, False, False, False,  True,
        False, False],
       [ True, False,  True, False, False,  True, False, False,  True,
        False, False],
       [False,  True,  True, False, False, False, False, False, False,
        False,  True],
       [False, False,  True, False, False, False,  True,  True,  True,
        False, False],
       [ True, False, False, False, False, False, False,  True, False,
        False,  True],
       [False,  True,  True, False, False, False, False, False, False,
        False,  True],
       [False, False, False, False, False,  True, False,  True, False,
        False,  True],
       [ True, False,  True, False, False, False, False,  True, False,
        False,  True],
       [False, False,  True, False, False, False,  True,  True, False,
        False,  True],
       [False, False,  True, False, False, False, False, False,  True,
        False, False],
       [ True, False, False,  True,  True,  True, False, ...
```
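Conceptually, `TransactionEncoder` produces one column per distinct item, in the sorted order listed in `te.columns_`, and marks each transaction `True` where the item is present. A plain-Python sketch of that layout (the `one_hot` helper is hypothetical and is not mlxtend's code):

```python
def one_hot(transactions):
    """Return (columns, rows): sorted unique items and one boolean row per transaction."""
    columns = sorted({item for t in transactions for item in t})
    rows = []
    for t in transactions:
        present = set(t)
        # One True/False flag per column, in column order
        rows.append([item in present for item in columns])
    return columns, rows


columns, rows = one_hot([["MILK", "BREAD", "BISCUIT"], ["BREAD", "TEA", "BOURNVITA"]])
print(columns)  # ['BISCUIT', 'BOURNVITA', 'BREAD', 'MILK', 'TEA']
print(rows[0])  # [True, False, True, True, False]
```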

```python
te.columns_
```

```
['BISCUIT',
 'BOURNVITA',
 'BREAD',
 'COCK',
 'COFFEE',
 'CORNFLAKES',
 'JAM',
 'MAGGI',
 'MILK',
 'SUGER',
 'TEA']
```

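```python
# Wrap the boolean matrix in a DataFrame, using the item names as column labels
df = pd.DataFrame(te_data, columns=te.columns_)
df.head()
```

|   | BISCUIT | BOURNVITA | BREAD | COCK | COFFEE | CORNFLAKES | JAM | MAGGI | MILK | SUGER | TEA |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | True | False | True | False | False | False | False | False | True | False | False |
| 1 | True | False | True | False | False | True | False | False | True | False | False |
| 2 | False | True | True | False | False | False | False | False | False | False | True |
| 3 | False | False | True | False | False | False | True | True | True | False | False |
| 4 | True | False | False | False | False | False | False | True | False | False | True |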


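```python
# Keep itemsets that appear in at least 10% of the transactions (2 of 20)
frq_items = apriori(df, min_support=0.1, use_colnames=True)
frq_items
```

|   | support | itemsets |
|---|---|---|
| 0 | 0.35 | (BISCUIT) |
| 1 | 0.2 | (BOURNVITA) |
| 2 | 0.65 | (BREAD) |
| 3 | 0.15 | (COCK) |
| 4 | 0.4 | (COFFEE) |
| 5 | 0.3 | (CORNFLAKES) |
| 6 | 0.1 | (JAM) |
| 7 | 0.25 | (MAGGI) |
| 8 | 0.25 | (MILK) |
| 9 | 0.3 | (SUGER) |
| ... | ... | ... |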
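As a cross-check, the single-item supports can be recomputed by counting how many of the 20 transactions (copied from the list above) contain each item. This plain-Python sketch reproduces the values in the table and also shows that TEA (7 of 20 baskets, support 0.35) clears the 0.1 threshold, although it falls outside the rows displayed.

```python
from collections import Counter

transactions = [
    ["MILK", "BREAD", "BISCUIT"],
    ["BREAD", "MILK", "BISCUIT", "CORNFLAKES"],
    ["BREAD", "TEA", "BOURNVITA"],
    ["JAM", "MAGGI", "BREAD", "MILK"],
    ["MAGGI", "TEA", "BISCUIT"],
    ["BREAD", "TEA", "BOURNVITA"],
    ["MAGGI", "TEA", "CORNFLAKES"],
    ["MAGGI", "BREAD", "TEA", "BISCUIT"],
    ["JAM", "MAGGI", "BREAD", "TEA"],
    ["BREAD", "MILK"],
    ["COFFEE", "COCK", "BISCUIT", "CORNFLAKES"],
    ["COFFEE", "COCK", "BISCUIT", "CORNFLAKES"],
    ["COFFEE", "SUGER", "BOURNVITA"],
    ["BREAD", "COFFEE", "COCK"],
    ["BREAD", "SUGER", "BISCUIT"],
    ["COFFEE", "SUGER", "CORNFLAKES"],
    ["BREAD", "SUGER", "BOURNVITA"],
    ["BREAD", "COFFEE", "SUGER"],
    ["BREAD", "COFFEE", "SUGER"],
    ["TEA", "MILK", "COFFEE", "CORNFLAKES"],
]

# Count each item once per basket, then divide by the number of baskets
counts = Counter(item for basket in transactions for item in set(basket))
supports = {item: count / len(transactions) for item, count in sorted(counts.items())}

for item, supp in supports.items():
    print(f"{item:<11}{supp:.2f}")
```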


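```python
# Alternative threshold. In these 20 transactions every itemset that occurs at least
# once has support >= 1/20 = 0.05, so min_support=0.01 would keep all of them.
# The rule tables below match the min_support=0.1 result (their lowest support is 0.10),
# so this alternative is left commented out.
# frq_items = apriori(df, min_support=0.01, use_colnames=True)
# frq_items
```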

```python
# Generate rules from the frequent itemsets, keeping those with lift >= 1
rules = association_rules(frq_items, metric="lift", min_threshold=1)
rules
```

|   | antecedents | consequents | antecedent support | consequent support | support | confidence | lift | leverage | conviction |
|---|---|---|---|---|---|---|---|---|---|
| 0 | (BISCUIT) | (COCK) | 0.35 | 0.15 | 0.10 | 0.285714 | 1.904762 | 0.0475 | 1.190 |
| 1 | (COCK) | (BISCUIT) | 0.15 | 0.35 | 0.10 | 0.666667 | 1.904762 | 0.0475 | 1.950 |
| 2 | (BISCUIT) | (CORNFLAKES) | 0.35 | 0.30 | 0.15 | 0.428571 | 1.428571 | 0.0450 | 1.225 |
| 3 | (CORNFLAKES) | (BISCUIT) | 0.30 | 0.35 | 0.15 | 0.500000 | 1.428571 | 0.0450 | 1.300 |
| 4 | (BISCUIT) | (MAGGI) | 0.35 | 0.25 | 0.10 | 0.285714 | 1.142857 | 0.0125 | 1.050 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 99 | (COFFEE, CORNFLAKES) | (BISCUIT, COCK) | 0.20 | 0.10 | 0.10 | 0.500000 | 5.000000 | 0.0800 | 1.800 |
| 100 | (BISCUIT) | (COCK, COFFEE, CORNFLAKES) | 0.35 | 0.10 | 0.10 | 0.285714 | 2.857143 | 0.0650 | 1.260 |
| 101 | (COCK) | (BISCUIT, COFFEE, CORNFLAKES) | 0.15 | 0.10 | 0.10 | 0.666667 | 6.666667 | 0.0850 | 2.700 |
| 102 | (COFFEE) | (BISCUIT, COCK, CORNFLAKES) | 0.40 | 0.10 | 0.10 | 0.250000 | 2.500000 | 0.0600 | 1.200 |
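The `leverage` and `conviction` columns are two further standard metrics. Leverage is Support(A and C) - Support(A) × Support(C), and conviction is (1 - Support(C)) / (1 - Confidence(A → C)), which becomes infinite when the confidence is 1. The sketch below recomputes row 0, (BISCUIT) → (COCK), from its three support values; `rule_metrics` is a hypothetical helper, not an mlxtend function.

```python
import math


def rule_metrics(supp_a, supp_c, supp_ac):
    """Compute rule metrics from Support(A), Support(C) and Support(A and C)."""
    confidence = supp_ac / supp_a
    lift = confidence / supp_c
    leverage = supp_ac - supp_a * supp_c
    # Conviction is infinite when the rule never fails (confidence of 1)
    conviction = math.inf if confidence >= 1 else (1 - supp_c) / (1 - confidence)
    return {"confidence": confidence, "lift": lift, "leverage": leverage, "conviction": conviction}


m = rule_metrics(0.35, 0.15, 0.10)  # row 0: (BISCUIT) -> (COCK)
print({name: round(value, 6) for name, value in m.items()})

# A rule whose antecedent always co-occurs with its consequent
print(rule_metrics(0.10, 0.10, 0.10))
```

The first print matches row 0 of the table (confidence 0.285714, lift 1.904762, leverage 0.0475, conviction 1.19). The second case has confidence 1, which is why some rules report a conviction of `inf`.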


```python
# Sort by confidence, breaking ties by lift (both descending)
rules = rules.sort_values(['confidence', 'lift'], ascending=[False, False])
rules
```

|   | antecedents | consequents | antecedent support | consequent support | support | confidence | lift | leverage | conviction |
|---|---|---|---|---|---|---|---|---|---|
| 95 | (BISCUIT, COFFEE) | (COCK, CORNFLAKES) | 0.10 | 0.10 | 0.10 | 1.000000 | 10.000000 | 0.090 | inf |
| 98 | (COCK, CORNFLAKES) | (BISCUIT, COFFEE) | 0.10 | 0.10 | 0.10 | 1.000000 | 10.000000 | 0.090 | inf |
| 41 | (BISCUIT, COFFEE) | (COCK) | 0.10 | 0.15 | 0.10 | 1.000000 | 6.666667 | 0.085 | inf |
| 78 | (JAM) | (BREAD, MAGGI) | 0.10 | 0.15 | 0.10 | 1.000000 | 6.666667 | 0.085 | inf |
| 92 | (BISCUIT, COFFEE, CORNFLAKES) | (COCK) | 0.10 | 0.15 | 0.10 | 1.000000 | 6.666667 | 0.085 | inf |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 8 | (BREAD) | (BOURNVITA) | 0.65 | 0.20 | 0.15 | 0.230769 | 1.153846 | 0.020 | 1.040000 |
| 14 | (BREAD) | (JAM) | 0.65 | 0.10 | 0.10 | 0.153846 | 1.538462 | 0.035 | 1.063636 |
| 39 | (BREAD) | (BISCUIT, MILK) | 0.65 | 0.10 | 0.10 | 0.153846 | 1.538462 | 0.035 | 1.063636 |
| 67 | (BREAD) | (BOURNVITA, TEA) | 0.65 | 0.10 | 0.10 | 0.153846 | 1.538462 | 0.035 | 1.063636 |
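The rules at the top of this ranking rest on very little evidence: each has support 0.10, i.e. 2 of the 20 transactions. The plain-Python check below (data copied from the transaction list above) finds the baskets behind (BISCUIT, COFFEE) → (COCK, CORNFLAKES).

```python
transactions = [
    ["MILK", "BREAD", "BISCUIT"],
    ["BREAD", "MILK", "BISCUIT", "CORNFLAKES"],
    ["BREAD", "TEA", "BOURNVITA"],
    ["JAM", "MAGGI", "BREAD", "MILK"],
    ["MAGGI", "TEA", "BISCUIT"],
    ["BREAD", "TEA", "BOURNVITA"],
    ["MAGGI", "TEA", "CORNFLAKES"],
    ["MAGGI", "BREAD", "TEA", "BISCUIT"],
    ["JAM", "MAGGI", "BREAD", "TEA"],
    ["BREAD", "MILK"],
    ["COFFEE", "COCK", "BISCUIT", "CORNFLAKES"],
    ["COFFEE", "COCK", "BISCUIT", "CORNFLAKES"],
    ["COFFEE", "SUGER", "BOURNVITA"],
    ["BREAD", "COFFEE", "COCK"],
    ["BREAD", "SUGER", "BISCUIT"],
    ["COFFEE", "SUGER", "CORNFLAKES"],
    ["BREAD", "SUGER", "BOURNVITA"],
    ["BREAD", "COFFEE", "SUGER"],
    ["BREAD", "COFFEE", "SUGER"],
    ["TEA", "MILK", "COFFEE", "CORNFLAKES"],
]

antecedent = {"BISCUIT", "COFFEE"}
consequent = {"COCK", "CORNFLAKES"}

# Indices of baskets containing the antecedent, and of those that also contain the consequent
with_antecedent = [i for i, t in enumerate(transactions) if antecedent <= set(t)]
with_both = [i for i in with_antecedent if consequent <= set(transactions[i])]

print(with_antecedent, with_both)  # [10, 11] [10, 11]
```

Both supporting baskets are the two identical transactions at indices 10 and 11, so the confidence of 1.0 largely reflects that single repeated basket. On a dataset this small, checking the absolute counts behind a high-confidence rule is a cheap sanity check.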


```python
print(rules.head())
```

```
                      antecedents         consequents  antecedent support  \
95              (BISCUIT, COFFEE)  (COCK, CORNFLAKES)                 0.1   
98             (COCK, CORNFLAKES)   (BISCUIT, COFFEE)                 0.1   
41              (BISCUIT, COFFEE)              (COCK)                 0.1   
78                          (JAM)      (BREAD, MAGGI)                 0.1   
92  (BISCUIT, COFFEE, CORNFLAKES)              (COCK)                 0.1   

    consequent support  support  confidence       lift  leverage  conviction  
95                0.10      0.1         1.0  10.000000     0.090         inf  
98                0.10      0.1         1.0  10.000000     0.090         inf  
41                0.15      0.1         1.0   6.666667     0.085         inf  
78                0.15      0.1         1.0   6.666667     0.085         inf  
92                0.15      0.1         1.0   6.666667     0.085         inf  
```
