#**DEMO 1: Implementing the Apriori Algorithm with Python**

###**Problem Definition**

Apply the Apriori algorithm to a simple dataset and mine the association rules that may exist between itemsets, using confidence and lift as the metrics.



###**Tasks to be performed**


>*   Importing Required Libraries
>*   Creating a simple dataset
>*   Transaction Encoding
>*   Understanding Apriori Algorithm
>*   Applying Apriori Algorithm
>*   Mining Association Rules


**Importing Required Libraries**

In [0]:
import pandas as pd
from mlxtend.preprocessing import TransactionEncoder
from mlxtend.frequent_patterns import apriori
from mlxtend.frequent_patterns import association_rules
import warnings
warnings.filterwarnings("ignore")

**Creating a simple Dataset**

In [0]:
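# Note: 'Jam' 'Yogurt' and 'Snacks' 'Eggs' have no comma between them, so Python
# concatenates each pair into a single item ('JamYogurt' and 'SnacksEggs').
# The outputs in this demo were produced with the data exactly as written here.
dataset = [['Milk', 'Snacks', 'Bread', 'Jam', 'Eggs', 'Banana', 'Beer', 'Diaper'],
           ['Milk','Bread','Beer', 'Diaper', 'Snacks','Banana'],
           ['Milk','Bread','Jam','Eggs','Banana'],
           ['Milk', 'Bread','Jam' 'Yogurt', 'Eggs','Banana'],
           ['Bread', 'Eggs', 'Diaper', 'Beer', 'Snacks'],
           ['Beer','Diaper','Snacks' 'Eggs','Jam']]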

**Transaction Encoding**



In [0]:
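# Learn the unique items, then one-hot encode every transaction as a boolean row
te = TransactionEncoder()
te_ary = te.fit(dataset).transform(dataset)
# Wrap the boolean array in a DataFrame with one column per item
df = pd.DataFrame(te_ary, columns=te.columns_)
df.head()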

Unnamed: 0,Banana,Beer,Bread,Diaper,Eggs,Jam,JamYogurt,Milk,Snacks,SnacksEggs
0,True,True,True,True,True,True,False,True,True,False
1,True,True,True,True,False,False,False,True,True,False
2,True,False,True,False,True,True,False,True,False,False
3,True,False,True,False,True,False,True,True,False,False
4,False,True,True,True,True,False,False,False,True,False
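
To make the encoding step concrete, here is a minimal pure-Python sketch of what a one-hot transaction encoding produces. It only illustrates the idea and is not mlxtend's implementation; the helper name `one_hot_encode` and the two-transaction `sample` are hypothetical.

```python
def one_hot_encode(transactions):
    """Return (sorted item names, boolean rows), with one row per transaction."""
    # Collect every distinct item and sort it so the column order is stable.
    columns = sorted({item for transaction in transactions for item in transaction})
    # For each transaction, mark which of the known items it contains.
    rows = [[item in set(transaction) for item in columns]
            for transaction in transactions]
    return columns, rows


# Hypothetical two-transaction sample, not the demo dataset.
sample = [['Milk', 'Bread', 'Banana'], ['Beer', 'Diaper', 'Bread']]
columns, rows = one_hot_encode(sample)
print(columns)
for row in rows:
    print(row)
```

Each column corresponds to one item and each row to one transaction, which is the same layout as the DataFrame shown above.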


**Understanding Apriori Algorithm**

Apriori is a classical data mining algorithm used to find frequent itemsets and to mine the association rules that may exist between them. It works level by level: it first finds the frequent single items, then extends them to larger itemsets, relying on the fact that every subset of a frequent itemset must itself be frequent. It is intended for datasets containing a large number of transactions, and it is easy both to understand and to implement.
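
The level-wise search can be sketched in plain Python as follows. This is a simplified illustration under stated assumptions, not the implementation used by mlxtend: the function name `apriori_sketch`, its helpers, and the four-transaction `sample` are all hypothetical.

```python
from itertools import combinations


def apriori_sketch(transactions, min_support):
    """Return {frozenset(itemset): support} for every frequent itemset."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)

    def support(itemset):
        # Fraction of transactions that contain every item of the itemset.
        return sum(itemset <= t for t in transactions) / n

    def keep_frequent(candidates):
        scored = {c: support(c) for c in candidates}
        return {c: s for c, s in scored.items() if s >= min_support}

    # Level 1: frequent single items.
    items = {item for t in transactions for item in t}
    current = keep_frequent({frozenset([item]) for item in items})
    frequent = dict(current)

    k = 1
    while current:
        k += 1
        # Join step: combine frequent (k-1)-itemsets into candidate k-itemsets.
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # Prune step: drop candidates that have an infrequent (k-1)-subset.
        candidates = {c for c in candidates
                      if all(frozenset(s) in current for s in combinations(c, k - 1))}
        current = keep_frequent(candidates)
        frequent.update(current)
    return frequent


# Hypothetical sample, not the demo dataset.
sample = [{'Milk', 'Bread', 'Banana'}, {'Milk', 'Bread'},
          {'Bread', 'Banana'}, {'Milk', 'Bread', 'Banana'}]
frequent = apriori_sketch(sample, min_support=0.75)
for itemset, s in sorted(frequent.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), s)
```

In this sample the three-item candidate is never counted, because its subset (Milk, Banana) is already known to be infrequent. That pruning is what keeps the number of candidates manageable.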


There are three major components of the Apriori algorithm:

>* **Support:** The popularity of an item (or itemset). It is calculated as the number of transactions involving that item divided by the total number of transactions.

>* **Confidence:** The likelihood of item Y being purchased when item X is purchased. It is calculated as support(X and Y) divided by support(X).

>* **Lift:** The likelihood of item Y being purchased when item X is purchased, while accounting for the popularity of Y. It is calculated as confidence(X → Y) divided by support(Y); a lift above 1 means X and Y occur together more often than would be expected if they were independent.
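
The three measures can be checked by hand with a few lines of plain Python. The sketch below assumes a reduced copy of the demo's six transactions, restricted to five items for readability; the helper names `support`, `confidence` and `lift` are illustrative and are not mlxtend functions.

```python
# Reduced, illustrative copy of the six transactions (only five items kept).
transactions = [
    {'Milk', 'Bread', 'Banana', 'Beer', 'Diaper'},
    {'Milk', 'Bread', 'Banana', 'Beer', 'Diaper'},
    {'Milk', 'Bread', 'Banana'},
    {'Milk', 'Bread', 'Banana'},
    {'Bread', 'Beer', 'Diaper'},
    {'Beer', 'Diaper'},
]


def support(itemset):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)


def confidence(x, y):
    """How often Y appears in the transactions that contain X."""
    return support(set(x) | set(y)) / support(x)


def lift(x, y):
    """Confidence of X -> Y relative to the overall popularity of Y."""
    return confidence(x, y) / support(y)


print(round(support({'Bread'}), 6))
print(round(confidence({'Bread'}, {'Banana'}), 6))
print(round(lift({'Bread'}, {'Banana'}), 6))
```

Bread appears in 5 of 6 transactions and Bread with Banana in 4 of 6, so the confidence of Bread → Banana is 0.8 and its lift is 0.8 divided by the support of Banana (4/6), which is 1.2.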

**Applying Apriori Algorithm**

In [0]:
# Keep only the itemsets that appear in at least 60% of the transactions
frequent_itemsets = apriori(df, min_support=0.6, use_colnames=True)

frequent_itemsets

Unnamed: 0,support,itemsets
0,0.666667,(Banana)
1,0.666667,(Beer)
2,0.833333,(Bread)
3,0.666667,(Diaper)
4,0.666667,(Eggs)
5,0.666667,(Milk)
6,0.666667,"(Bread, Banana)"
7,0.666667,"(Milk, Banana)"
8,0.666667,"(Diaper, Beer)"
9,0.666667,"(Bread, Eggs)"


**From above, you can see that the result is a dataframe with the support for each itemset.**

**Mining Association Rules**

In [0]:
# Keep only the rules whose confidence is at least 0.8
association_rules(frequent_itemsets, metric="confidence", min_threshold=0.8)

Unnamed: 0,antecedents,consequents,antecedent support,consequent support,support,confidence,lift,leverage,conviction
0,(Banana),(Bread),0.666667,0.833333,0.666667,1.0,1.2,0.111111,inf
1,(Milk),(Banana),0.666667,0.666667,0.666667,1.0,1.5,0.222222,inf
2,(Banana),(Milk),0.666667,0.666667,0.666667,1.0,1.5,0.222222,inf
3,(Diaper),(Beer),0.666667,0.666667,0.666667,1.0,1.5,0.222222,inf
4,(Beer),(Diaper),0.666667,0.666667,0.666667,1.0,1.5,0.222222,inf
5,(Eggs),(Bread),0.666667,0.833333,0.666667,1.0,1.2,0.111111,inf
6,(Milk),(Bread),0.666667,0.833333,0.666667,1.0,1.2,0.111111,inf
7,"(Milk, Bread)",(Banana),0.666667,0.666667,0.666667,1.0,1.5,0.222222,inf
8,"(Bread, Banana)",(Milk),0.666667,0.666667,0.666667,1.0,1.5,0.222222,inf
9,"(Milk, Banana)",(Bread),0.666667,0.833333,0.666667,1.0,1.2,0.111111,inf


**From above, you can see the result of the association analysis, showing which items are frequently purchased together with other items.**
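
The table above also reports leverage and conviction. As a rough check, the sketch below recomputes both from the support values, assuming the standard definitions: leverage is support(X and Y) minus support(X) times support(Y), and conviction is (1 - support(Y)) divided by (1 - confidence). The function names are hypothetical helpers, not part of mlxtend.

```python
import math


def leverage(support_xy, support_x, support_y):
    """Observed joint support minus the support expected under independence."""
    return support_xy - support_x * support_y


def conviction(support_xy, support_x, support_y):
    """(1 - support(Y)) / (1 - confidence); infinite when confidence is 1."""
    confidence = support_xy / support_x
    if math.isclose(confidence, 1.0):
        return math.inf
    return (1 - support_y) / (1 - confidence)


# Supports taken from the demo: Banana = 4/6, Bread = 5/6, both together = 4/6.
banana, bread, both = 4 / 6, 5 / 6, 4 / 6

# Rule (Banana) -> (Bread)
print(round(leverage(both, banana, bread), 6), conviction(both, banana, bread))
# Rule (Bread) -> (Banana)
print(round(leverage(both, bread, banana), 6), round(conviction(both, bread, banana), 6))
```

A conviction of `inf` simply reflects a confidence of exactly 1: in this dataset the consequent was never absent when the antecedent was present.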

In [0]:
# Keep only the rules whose lift is at least 1.2
rules = association_rules(frequent_itemsets, metric="lift", min_threshold=1.2)
rules

Unnamed: 0,antecedents,consequents,antecedent support,consequent support,support,confidence,lift,leverage,conviction
0,(Bread),(Banana),0.833333,0.666667,0.666667,0.8,1.2,0.111111,1.666667
1,(Banana),(Bread),0.666667,0.833333,0.666667,1.0,1.2,0.111111,inf
2,(Milk),(Banana),0.666667,0.666667,0.666667,1.0,1.5,0.222222,inf
3,(Banana),(Milk),0.666667,0.666667,0.666667,1.0,1.5,0.222222,inf
4,(Diaper),(Beer),0.666667,0.666667,0.666667,1.0,1.5,0.222222,inf
5,(Beer),(Diaper),0.666667,0.666667,0.666667,1.0,1.5,0.222222,inf
6,(Bread),(Eggs),0.833333,0.666667,0.666667,0.8,1.2,0.111111,1.666667
7,(Eggs),(Bread),0.666667,0.833333,0.666667,1.0,1.2,0.111111,inf
8,(Milk),(Bread),0.666667,0.833333,0.666667,1.0,1.2,0.111111,inf
9,(Bread),(Milk),0.833333,0.666667,0.666667,0.8,1.2,0.111111,1.666667


In [0]:
# Add a column with the number of items in each rule's antecedent
rules["antecedent_len"] = rules["antecedents"].apply(len)
rules

Unnamed: 0,antecedents,consequents,antecedent support,consequent support,support,confidence,lift,leverage,conviction,antecedent_len
0,(Bread),(Banana),0.833333,0.666667,0.666667,0.8,1.2,0.111111,1.666667,1
1,(Banana),(Bread),0.666667,0.833333,0.666667,1.0,1.2,0.111111,inf,1
2,(Milk),(Banana),0.666667,0.666667,0.666667,1.0,1.5,0.222222,inf,1
3,(Banana),(Milk),0.666667,0.666667,0.666667,1.0,1.5,0.222222,inf,1
4,(Diaper),(Beer),0.666667,0.666667,0.666667,1.0,1.5,0.222222,inf,1
5,(Beer),(Diaper),0.666667,0.666667,0.666667,1.0,1.5,0.222222,inf,1
6,(Bread),(Eggs),0.833333,0.666667,0.666667,0.8,1.2,0.111111,1.666667,1
7,(Eggs),(Bread),0.666667,0.833333,0.666667,1.0,1.2,0.111111,inf,1
8,(Milk),(Bread),0.666667,0.833333,0.666667,1.0,1.2,0.111111,inf,1
9,(Bread),(Milk),0.833333,0.666667,0.666667,0.8,1.2,0.111111,1.666667,1


In [0]:
# Antecedent lengths are whole numbers, so >= 1 keeps the same rows as >= 0.5
rules[ (rules['antecedent_len'] >= 1) &
       (rules['confidence'] > 0.75) &
       (rules['lift'] > 1.2) ]

Unnamed: 0,antecedents,consequents,antecedent support,consequent support,support,confidence,lift,leverage,conviction,antecedent_len
2,(Milk),(Banana),0.666667,0.666667,0.666667,1.0,1.5,0.222222,inf,1
3,(Banana),(Milk),0.666667,0.666667,0.666667,1.0,1.5,0.222222,inf,1
4,(Diaper),(Beer),0.666667,0.666667,0.666667,1.0,1.5,0.222222,inf,1
5,(Beer),(Diaper),0.666667,0.666667,0.666667,1.0,1.5,0.222222,inf,1
10,"(Milk, Bread)",(Banana),0.666667,0.666667,0.666667,1.0,1.5,0.222222,inf,2
11,"(Bread, Banana)",(Milk),0.666667,0.666667,0.666667,1.0,1.5,0.222222,inf,2
14,(Milk),"(Bread, Banana)",0.666667,0.666667,0.666667,1.0,1.5,0.222222,inf,1
15,(Banana),"(Milk, Bread)",0.666667,0.666667,0.666667,1.0,1.5,0.222222,inf,1


In [0]:
rules[rules['antecedents'] == {'Milk', 'Bread'}]

Unnamed: 0,antecedents,consequents,antecedent support,consequent support,support,confidence,lift,leverage,conviction,antecedent_len
10,"(Milk, Bread)",(Banana),0.666667,0.666667,0.666667,1.0,1.5,0.222222,inf,2
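
The comparison above works because mlxtend stores antecedents and consequents as frozensets, and a frozenset compares equal to a plain set with the same members regardless of the order in which the items are written. A small stand-alone illustration, using hypothetical sample values:

```python
# Hypothetical antecedents, stored as frozensets like the entries in the rules table.
antecedents = [
    frozenset({'Milk'}),
    frozenset({'Milk', 'Bread'}),
    frozenset({'Bread', 'Banana'}),
]

# The items are written in a different order than above; set equality ignores order.
target = {'Bread', 'Milk'}

matches = [a == target for a in antecedents]
print(matches)
```

Only the second entry matches, since equality requires exactly the same items and not just an overlap.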


**From above, we can say that when a customer buys milk and bread together, they are also likely to buy bananas. So, to increase sales, we can place these items together.**