   
 __Association Rules__ is an unsupervised technique for extracting patterns or relations between items. A rule A => B describes an association between A and B, i.e., if A is purchased, B is likely to be purchased as well.

An association rule consists of an antecedent and a consequent.

$$\{\text{Pen}, \text{Pencil}\} \to \{\text{Paper}\}$$
$$\text{antecedent} \to \text{consequent}$$

For a given rule, the `itemset` is the set of all items in the antecedent and the consequent combined.

$$\text{itemset} = \{\text{Pen}, \text{Pencil}, \text{Paper}\}$$


### Measuring the strength of a rule

**Support**

Support is the fraction of the total number of transactions in which the itemset occurs.

$$
\text{Support}(\{A\} \to \{B\}) = \frac{\text{Transactions containing both } A \text{ and } B}{\text{Total number of transactions}}
$$
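A minimal sketch of the support formula in plain Python, assuming each transaction is stored as a set of item names. The ten baskets are read off the one-hot `basket` table further down in this notebook (IDs 1001 to 1010); `support` is a hypothetical helper for illustration, not part of mlxtend.

```python
# Ten baskets (IDs 1001-1010) read off the one-hot `basket` table in this notebook.
transactions = [
    {"Choclates", "Marker", "Pencil"}, {"Choclates", "Pencil"},
    {"Coke", "Eraser", "Pencil"}, {"Choclates", "Cookies", "Pencil"},
    {"Marker"}, {"Marker", "Pencil"}, {"Choclates", "Pencil"},
    {"Choclates", "Cookies", "Pencil"}, {"Marker", "Pencil"}, {"Coke", "Marker"},
]


def support(antecedent, consequent, transactions):
    """Hypothetical helper: fraction of transactions containing the whole itemset
    (the union of antecedent and consequent)."""
    itemset = set(antecedent) | set(consequent)
    hits = sum(1 for basket in transactions if itemset <= basket)
    return hits / len(transactions)


print(support({"Cookies"}, {"Choclates"}, transactions))  # 0.2
print(support({"Pencil"}, {"Choclates"}, transactions))   # 0.5
```

Because support only looks at the combined itemset, swapping antecedent and consequent gives the same value.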

**Confidence**

Confidence is the conditional probability of the consequent occurring, given that the antecedent occurs.

$$
\text{Confidence}(\{A\} \to \{B\}) = \frac{\text{Transactions containing both } A \text{ and } B}{\text{Transactions containing } A}
$$
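A minimal sketch of confidence computed from raw transaction counts, using the same ten baskets from this notebook's `basket` table; `count` and `confidence` are hypothetical helpers. Unlike support, confidence divides by the antecedent's count, so it is not symmetric: {Cookies} -> {Choclates} and {Choclates} -> {Cookies} give the 1.0 and 0.4 that appear in the rules table later on.

```python
# Ten baskets (IDs 1001-1010) read off the one-hot `basket` table in this notebook.
transactions = [
    {"Choclates", "Marker", "Pencil"}, {"Choclates", "Pencil"},
    {"Coke", "Eraser", "Pencil"}, {"Choclates", "Cookies", "Pencil"},
    {"Marker"}, {"Marker", "Pencil"}, {"Choclates", "Pencil"},
    {"Choclates", "Cookies", "Pencil"}, {"Marker", "Pencil"}, {"Coke", "Marker"},
]


def count(itemset, transactions):
    """Number of transactions that contain every item in `itemset`."""
    return sum(1 for basket in transactions if set(itemset) <= basket)


def confidence(antecedent, consequent, transactions):
    """Hypothetical helper: P(consequent | antecedent) estimated from counts.
    Assumes the antecedent occurs in at least one transaction."""
    both = count(set(antecedent) | set(consequent), transactions)
    return both / count(antecedent, transactions)


print(confidence({"Cookies"}, {"Choclates"}, transactions))  # 1.0
print(confidence({"Choclates"}, {"Cookies"}, transactions))  # 0.4
```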

**Lift**

The name **`lift`** is meant literally: it measures how much knowing that {A} is in the cart lifts our confidence that {B} is in the cart too. To rephrase, lift is the ratio of the probability of having {B} in the cart given that {A} is present to the probability of having {B} in the cart without any knowledge about {A}. A lift above 1 means A and B appear together more often than they would if they were independent, a lift of exactly 1 means no association, and a lift below 1 means that A makes B less likely. Mathematically,

$$
\text{Lift}(\{A\} \to \{B\}) = \frac{\left( \dfrac{\text{Transactions containing both } A \text{ and } B}{\text{Transactions containing } A} \right)}{\text{Fraction of transactions containing } B}
$$
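A minimal sketch of the lift formula, again over the ten baskets from this notebook's `basket` table; `count` and `lift` are hypothetical helpers. The numerator is the confidence and the denominator the support of {B}; the two fractions are rearranged into a single division so the arithmetic stays in integers until the last step. The results should agree with the lift column in the rules table later on (5.0 for {Eraser} -> {Coke}, 1.25 for {Pencil} -> {Choclates}).

```python
# Ten baskets (IDs 1001-1010) read off the one-hot `basket` table in this notebook.
transactions = [
    {"Choclates", "Marker", "Pencil"}, {"Choclates", "Pencil"},
    {"Coke", "Eraser", "Pencil"}, {"Choclates", "Cookies", "Pencil"},
    {"Marker"}, {"Marker", "Pencil"}, {"Choclates", "Pencil"},
    {"Choclates", "Cookies", "Pencil"}, {"Marker", "Pencil"}, {"Coke", "Marker"},
]


def count(itemset, transactions):
    """Number of transactions that contain every item in `itemset`."""
    return sum(1 for basket in transactions if set(itemset) <= basket)


def lift(antecedent, consequent, transactions):
    """Hypothetical helper: (c_ab / c_a) / (c_b / n), i.e. confidence / support(B)."""
    n = len(transactions)
    a, b = set(antecedent), set(consequent)
    c_ab = count(a | b, transactions)
    c_a, c_b = count(a, transactions), count(b, transactions)
    return c_ab * n / (c_a * c_b)


print(lift({"Eraser"}, {"Coke"}, transactions))       # 5.0
print(lift({"Pencil"}, {"Choclates"}, transactions))  # 1.25
print(lift({"Marker"}, {"Choclates"}, transactions))  # 0.4
```

Written this way it is clear that lift is symmetric in A and B, which is why each pair of mirrored rules in the table shares the same lift.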

#### Frequent itemsets

An itemset is considered `frequent` if it meets a user-specified __support threshold__.

For instance, if the support threshold is 0.05 (5%), a frequent itemset is a set of items that occur together in at least 5% of all transactions in the database.
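In terms of counts: with $N$ transactions, an itemset that appears in $c$ of them has support $c/N$, so a support threshold $s$ requires $c \ge sN$, i.e. at least $\lceil sN \rceil$ transactions. For the `apriori` call later in this notebook ($s = 0.07$ over $N = 10$ baskets):

$$
c \ge \lceil 0.07 \times 10 \rceil = \lceil 0.7 \rceil = 1
$$

so any combination of items that appears together in even a single basket counts as frequent.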

In [1]:
!pip install mlxtend


In [2]:
import warnings
warnings.filterwarnings("ignore")

In [4]:
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


In [9]:
import os
import pandas as pd
import numpy as np
from mlxtend.frequent_patterns import apriori, association_rules
import xlrd
import random
import matplotlib.pyplot as plt
%matplotlib inline

In [12]:
df = pd.read_csv('/content/drive/MyDrive/ML/Transactions.csv', header=None)  # no header row, so columns are numbered 0 and 1

In [13]:
df.head()


|   | 0 | 1 |
|---|---|---|
| 0 | 1001 | Choclates |
| 1 | 1001 | Pencil |
| 2 | 1001 | Marker |
| 3 | 1002 | Pencil |
| 4 | 1002 | Choclates |


In [14]:
df.columns = ['ID', 'Items']  # assign column names
df.head()

|   | ID | Items |
|---|---|---|
| 0 | 1001 | Choclates |
| 1 | 1001 | Pencil |
| 2 | 1001 | Marker |
| 3 | 1002 | Pencil |
| 4 | 1002 | Choclates |


In [17]:
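# Boolean alternative (mlxtend recommends bool columns; repeated items also collapse to True):
# basket1 = pd.crosstab(df.ID, df.Items).astype(bool)
basket = pd.crosstab(df.ID, df.Items)  # rows: transaction IDs, columns: items, values: counts
basket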

| ID | Choclates | Coke | Cookies | Eraser | Marker | Pencil |
|---|---|---|---|---|---|---|
| 1001 | 1 | 0 | 0 | 0 | 1 | 1 |
| 1002 | 1 | 0 | 0 | 0 | 0 | 1 |
| 1003 | 0 | 1 | 0 | 1 | 0 | 1 |
| 1004 | 1 | 0 | 1 | 0 | 0 | 1 |
| 1005 | 0 | 0 | 0 | 0 | 1 | 0 |
| 1006 | 0 | 0 | 0 | 0 | 1 | 1 |
| 1007 | 1 | 0 | 0 | 0 | 0 | 1 |
| 1008 | 1 | 0 | 1 | 0 | 0 | 1 |
| 1009 | 0 | 0 | 0 | 0 | 1 | 1 |
| 1010 | 0 | 1 | 0 | 0 | 1 | 0 |
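`pd.crosstab` turns the long (ID, item) rows into a wide table with one row per transaction and one column per item. A minimal pure-Python sketch of the same reshaping, using only the five rows printed by `df.head()` above; `one_hot` is a hypothetical helper. Because it collects items into a set per ID, a repeated item still gives 1, which corresponds to `crosstab(...).astype(bool)` rather than to the raw counts.

```python
# The five (ID, item) rows shown by df.head() above.
pairs = [(1001, "Choclates"), (1001, "Pencil"), (1001, "Marker"),
         (1002, "Pencil"), (1002, "Choclates")]


def one_hot(pairs):
    """Hypothetical helper: sorted item names plus one 0/1 row per transaction ID."""
    items = sorted({item for _, item in pairs})  # crosstab also sorts its columns
    baskets = {}
    for tid, item in pairs:
        baskets.setdefault(tid, set()).add(item)
    table = {tid: [1 if item in basket else 0 for item in items]
             for tid, basket in sorted(baskets.items())}
    return items, table


items, table = one_hot(pairs)
print(items)  # ['Choclates', 'Marker', 'Pencil']
print(table)  # {1001: [1, 1, 1], 1002: [1, 0, 1]}
```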


In [19]:
basket.columns.name= None
basket.index.name=None
basket

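|   | Choclates | Coke | Cookies | Eraser | Marker | Pencil |
|---|---|---|---|---|---|---|
| 1001 | 1 | 0 | 0 | 0 | 1 | 1 |
| 1002 | 1 | 0 | 0 | 0 | 0 | 1 |
| 1003 | 0 | 1 | 0 | 1 | 0 | 1 |
| 1004 | 1 | 0 | 1 | 0 | 0 | 1 |
| 1005 | 0 | 0 | 0 | 0 | 1 | 0 |
| 1006 | 0 | 0 | 0 | 0 | 1 | 1 |
| 1007 | 1 | 0 | 0 | 0 | 0 | 1 |
| 1008 | 1 | 0 | 1 | 0 | 0 | 1 |
| 1009 | 0 | 0 | 0 | 0 | 1 | 1 |
| 1010 | 0 | 1 | 0 | 0 | 1 | 0 |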


In [20]:
frequent_itemsets=apriori(basket,min_support=0.07,use_colnames=True)
print(frequent_itemsets)

    support                      itemsets
0       0.5                   (Choclates)
1       0.2                        (Coke)
2       0.2                     (Cookies)
3       0.1                      (Eraser)
4       0.5                      (Marker)
5       0.8                      (Pencil)
6       0.2          (Cookies, Choclates)
7       0.1           (Marker, Choclates)
8       0.5           (Pencil, Choclates)
9       0.1                (Eraser, Coke)
10      0.1                (Coke, Marker)
11      0.1                (Pencil, Coke)
12      0.2             (Pencil, Cookies)
13      0.1              (Eraser, Pencil)
14      0.3              (Pencil, Marker)
15      0.2  (Pencil, Cookies, Choclates)
16      0.1   (Pencil, Marker, Choclates)
17      0.1        (Eraser, Coke, Pencil)
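mlxtend's `apriori` hides the search itself. A minimal level-wise sketch of the Apriori idea (join frequent (k-1)-itemsets into k-item candidates, prune any candidate that has an infrequent subset, then count support), run over the ten baskets from the `basket` table; `apriori_sketch` is a hypothetical helper, not mlxtend's implementation. With `min_support=0.07` it finds the same 18 itemsets and supports listed above.

```python
from itertools import combinations

# Ten baskets (IDs 1001-1010) read off the one-hot `basket` table above.
transactions = [
    {"Choclates", "Marker", "Pencil"}, {"Choclates", "Pencil"},
    {"Coke", "Eraser", "Pencil"}, {"Choclates", "Cookies", "Pencil"},
    {"Marker"}, {"Marker", "Pencil"}, {"Choclates", "Pencil"},
    {"Choclates", "Cookies", "Pencil"}, {"Marker", "Pencil"}, {"Coke", "Marker"},
]


def apriori_sketch(transactions, min_support):
    """Hypothetical level-wise Apriori returning {frozenset(itemset): support}."""
    n = len(transactions)

    def support(itemset):
        return sum(1 for basket in transactions if itemset <= basket) / n

    # Level 1: frequent single items.
    items = sorted({item for basket in transactions for item in basket})
    level = [frozenset([i]) for i in items if support(frozenset([i])) >= min_support]
    frequent = {s: support(s) for s in level}

    k = 2
    while level:
        # Join: merge pairs of frequent (k-1)-itemsets into k-item candidates.
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # Prune: every (k-1)-subset of a candidate must already be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(sub) in frequent for sub in combinations(c, k - 1))
        }
        # Count: keep the candidates that meet the support threshold.
        level = [c for c in candidates if support(c) >= min_support]
        frequent.update({c: support(c) for c in level})
        k += 1
    return frequent


freq = apriori_sketch(transactions, min_support=0.07)
print(len(freq))                                 # 18
print(freq[frozenset({"Pencil", "Choclates"})])  # 0.5
```

The prune step is what makes Apriori cheaper than checking every combination: a superset of an infrequent itemset can never be frequent, so it is never counted.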


In [21]:
rules=association_rules(frequent_itemsets,metric="lift",min_threshold=1)
rules=rules.loc[:,["antecedents","consequents","support","confidence","lift"]]
rules

|   | antecedents | consequents | support | confidence | lift |
|---|---|---|---|---|---|
| 0 | (Cookies) | (Choclates) | 0.2 | 1.0 | 2.0 |
| 1 | (Choclates) | (Cookies) | 0.2 | 0.4 | 2.0 |
| 2 | (Pencil) | (Choclates) | 0.5 | 0.625 | 1.25 |
| 3 | (Choclates) | (Pencil) | 0.5 | 1.0 | 1.25 |
| 4 | (Eraser) | (Coke) | 0.1 | 1.0 | 5.0 |
| 5 | (Coke) | (Eraser) | 0.1 | 0.5 | 5.0 |
| 6 | (Coke) | (Marker) | 0.1 | 0.5 | 1.0 |
| 7 | (Marker) | (Coke) | 0.1 | 0.2 | 1.0 |
| 8 | (Pencil) | (Cookies) | 0.2 | 0.25 | 1.25 |
| 9 | (Cookies) | (Pencil) | 0.2 | 1.0 | 1.25 |
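A minimal sketch of what `association_rules(..., metric="lift", min_threshold=1)` does conceptually: split every frequent itemset into all antecedent/consequent pairs, compute support, confidence and lift from counts, and keep the rules with lift of at least 1. `rules_with_min_lift` is a hypothetical helper, not mlxtend's code. Because it enumerates every qualifying rule, it may list more rules than the rows shown above; the rows it shares with the table should carry the same numbers.

```python
from itertools import combinations

# Ten baskets (IDs 1001-1010) read off the one-hot `basket` table above.
transactions = [
    {"Choclates", "Marker", "Pencil"}, {"Choclates", "Pencil"},
    {"Coke", "Eraser", "Pencil"}, {"Choclates", "Cookies", "Pencil"},
    {"Marker"}, {"Marker", "Pencil"}, {"Choclates", "Pencil"},
    {"Choclates", "Cookies", "Pencil"}, {"Marker", "Pencil"}, {"Coke", "Marker"},
]


def count(itemset, transactions):
    """Number of transactions that contain every item in `itemset`."""
    return sum(1 for basket in transactions if itemset <= basket)


def rules_with_min_lift(transactions, min_lift=1.0):
    """Hypothetical helper: all rules A -> B with lift >= min_lift.

    Candidate itemsets are every combination of size >= 2 that occurs in at
    least one basket; with 10 baskets and min_support=0.07 these are exactly
    the frequent itemsets of size >= 2.
    """
    n = len(transactions)
    itemsets = {
        frozenset(combo)
        for basket in transactions
        for k in range(2, len(basket) + 1)
        for combo in combinations(sorted(basket), k)
    }
    rules = []
    for itemset in itemsets:
        c_ab = count(itemset, transactions)
        # Every non-empty proper subset is tried as the antecedent.
        for k in range(1, len(itemset)):
            for ante in combinations(sorted(itemset), k):
                a = frozenset(ante)
                b = itemset - a
                c_a, c_b = count(a, transactions), count(b, transactions)
                lift = c_ab * n / (c_a * c_b)
                if lift >= min_lift:
                    # (antecedent, consequent, support, confidence, lift)
                    rules.append((a, b, c_ab / n, c_ab / c_a, lift))
    return rules


rules = rules_with_min_lift(transactions, min_lift=1.0)
# Index by (sorted antecedent, sorted consequent) for easy lookup.
by_rule = {(tuple(sorted(a)), tuple(sorted(b))): (s, c, l) for a, b, s, c, l in rules}
print(by_rule[(("Eraser",), ("Coke",))])        # (0.1, 1.0, 5.0)
print(by_rule[(("Cookies",), ("Choclates",))])  # (0.2, 1.0, 2.0)
```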


In [23]:
import ipywidgets as widgets
from ipywidgets import interact, interact_manual

In [26]:
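@interact
def thresholds(lift=(0, 1.5, 0.1), confidence=(0, 1, 0.1), support=(0, 1, 0.05)):
    # Rules must reach the lift and confidence sliders; support acts as an
    # upper bound here (strictly below the slider value).
    print(rules[(rules['lift'] >= lift) &
                (rules['confidence'] >= confidence) &
                (rules['support'] < support)])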

In [None]:
# Rules with lift >= 1.5 and confidence >= 0.8 whose support is below 0.2
rules[(rules['lift'] >= 1.5) &
      (rules['confidence'] >= 0.8) &
      (rules['support'] < 0.2)]