# Association Rules Generation from Frequent Itemsets
* Function to generate association rules from frequent itemsets
* Reference: https://rasbt.github.io/mlxtend/user_guide/frequent_patterns/association_rules/

## Metrics

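- 'support':
    $\text{support}(A \rightarrow C) = \text{support}(A \cup C)$, range: $[0, 1]$

- 'confidence':
    $\text{confidence}(A \rightarrow C) = \dfrac{\text{support}(A \rightarrow C)}{\text{support}(A)}$, range: $[0, 1]$

- 'lift':
    $\text{lift}(A \rightarrow C) = \dfrac{\text{confidence}(A \rightarrow C)}{\text{support}(C)}$, range: $[0, \infty]$

- 'leverage':
    $\text{leverage}(A \rightarrow C) = \text{support}(A \rightarrow C) - \text{support}(A) \times \text{support}(C)$, range: $[-1, 1]$

- 'conviction':
    $\text{conviction}(A \rightarrow C) = \dfrac{1 - \text{support}(C)}{1 - \text{confidence}(A \rightarrow C)}$, range: $[0, \infty]$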
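
As a rough illustration of the five definitions above, the sketch below computes them by hand for a single rule using only the standard library. The helper names `support` and `rule_metrics` and the variable `transactions` are hypothetical, chosen for this example; they are not part of mlxtend. The transactions are the same toy baskets used in the cells that follow. Conviction is set to infinity when confidence equals 1, which is an assumption made here to avoid division by zero.

```python
import math

# Toy transactions (hypothetical variable name; same baskets as the dataset used later).
transactions = [
    ['Milk', 'Onion', 'Nutmeg', 'Kidney Beans', 'Eggs', 'Yogurt'],
    ['Dill', 'Onion', 'Nutmeg', 'Kidney Beans', 'Eggs', 'Yogurt'],
    ['Milk', 'Apple', 'Kidney Beans', 'Eggs'],
    ['Milk', 'Unicorn', 'Corn', 'Kidney Beans', 'Yogurt'],
    ['Corn', 'Onion', 'Onion', 'Kidney Beans', 'Ice cream', 'Eggs'],
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    hits = sum(itemset <= set(t) for t in transactions)
    return hits / len(transactions)


def rule_metrics(antecedent, consequent, transactions):
    """Compute the five metrics for the rule antecedent -> consequent."""
    sup_a = support(antecedent, transactions)
    sup_c = support(consequent, transactions)
    sup_ac = support(set(antecedent) | set(consequent), transactions)
    confidence = sup_ac / sup_a
    # Assumption: treat conviction as infinite when confidence is exactly 1.
    if confidence == 1:
        conviction = math.inf
    else:
        conviction = (1 - sup_c) / (1 - confidence)
    return {
        'support': sup_ac,
        'confidence': confidence,
        'lift': confidence / sup_c,
        'leverage': sup_ac - sup_a * sup_c,
        'conviction': conviction,
    }


onion_to_eggs = rule_metrics({'Onion'}, {'Eggs'}, transactions)
eggs_to_onion = rule_metrics({'Eggs'}, {'Onion'}, transactions)
print(onion_to_eggs)
print(eggs_to_onion)
```

Because floating-point division is inexact, values such as a confidence of 0.75 may print with a tiny rounding error; compare with a tolerance rather than with `==`.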
    
    

In [2]:
import pandas as pd
from mlxtend.preprocessing import TransactionEncoder
from mlxtend.frequent_patterns import apriori
from mlxtend.frequent_patterns import association_rules

In [3]:
dataset = [['Milk', 'Onion', 'Nutmeg', 'Kidney Beans', 'Eggs', 'Yogurt'],
           ['Dill', 'Onion', 'Nutmeg', 'Kidney Beans', 'Eggs', 'Yogurt'],
           ['Milk', 'Apple', 'Kidney Beans', 'Eggs'],
           ['Milk', 'Unicorn', 'Corn', 'Kidney Beans', 'Yogurt'],
           ['Corn', 'Onion', 'Onion', 'Kidney Beans', 'Ice cream', 'Eggs']]

te = TransactionEncoder()
te_ary = te.fit(dataset).transform(dataset)
df = pd.DataFrame(te_ary, columns=te.columns_)
frequent_itemsets = apriori(df, min_support=0.6, use_colnames=True)
print(frequent_itemsets)

    support                     itemsets
0       0.8                       (Eggs)
1       1.0               (Kidney Beans)
2       0.6                       (Milk)
3       0.6                      (Onion)
4       0.6                     (Yogurt)
5       0.8         (Eggs, Kidney Beans)
6       0.6                (Onion, Eggs)
7       0.6         (Milk, Kidney Beans)
8       0.6        (Onion, Kidney Beans)
9       0.6       (Yogurt, Kidney Beans)
10      0.6  (Onion, Eggs, Kidney Beans)


# Example 1 -- Generating Association Rules from Frequent Itemsets


In [4]:
rules_by_conf = association_rules(frequent_itemsets, metric="confidence", min_threshold=0.7)
print(rules_by_conf)

              antecedents            consequents  antecedent support  \
0           (Onion, Eggs)         (Kidney Beans)                 0.6   
1   (Onion, Kidney Beans)                 (Eggs)                 0.6   
2    (Eggs, Kidney Beans)                (Onion)                 0.8   
3                 (Onion)   (Eggs, Kidney Beans)                 0.6   
4                  (Eggs)  (Onion, Kidney Beans)                 0.8   
5                 (Onion)                 (Eggs)                 0.6   
6                  (Eggs)                (Onion)                 0.8   
7                  (Eggs)         (Kidney Beans)                 0.8   
8          (Kidney Beans)                 (Eggs)                 1.0   
9                  (Milk)         (Kidney Beans)                 0.6   
10               (Yogurt)         (Kidney Beans)                 0.6   
11                (Onion)         (Kidney Beans)                 0.6   

    consequent support  support  confidence  lift  leverage  co
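
To make the rule generation step more concrete, here is a minimal sketch of one way to enumerate candidate rules from frequent itemsets: split each itemset into every non-empty antecedent and its complement, then keep the splits whose confidence meets the threshold. This is not mlxtend's implementation; the function `generate_rules` and the dictionary `supports` are hypothetical names, and the support values are copied by hand from the `apriori` output above.

```python
from itertools import combinations

# Supports copied from the apriori output above (min_support=0.6).
supports = {
    frozenset({'Eggs'}): 0.8,
    frozenset({'Kidney Beans'}): 1.0,
    frozenset({'Milk'}): 0.6,
    frozenset({'Onion'}): 0.6,
    frozenset({'Yogurt'}): 0.6,
    frozenset({'Eggs', 'Kidney Beans'}): 0.8,
    frozenset({'Onion', 'Eggs'}): 0.6,
    frozenset({'Milk', 'Kidney Beans'}): 0.6,
    frozenset({'Onion', 'Kidney Beans'}): 0.6,
    frozenset({'Yogurt', 'Kidney Beans'}): 0.6,
    frozenset({'Onion', 'Eggs', 'Kidney Beans'}): 0.6,
}


def generate_rules(supports, min_confidence):
    """Return (antecedent, consequent, confidence) tuples above the threshold."""
    rules = []
    for itemset, itemset_support in supports.items():
        if len(itemset) < 2:
            continue  # a rule needs a non-empty antecedent and consequent
        for size in range(1, len(itemset)):
            for items in combinations(sorted(itemset), size):
                antecedent = frozenset(items)
                consequent = itemset - antecedent
                # Every subset of a frequent itemset is itself frequent,
                # so its support is present in the dictionary.
                confidence = itemset_support / supports[antecedent]
                if confidence >= min_confidence:
                    rules.append((antecedent, consequent, confidence))
    return rules


candidate_rules = generate_rules(supports, min_confidence=0.7)
for antecedent, consequent, confidence in candidate_rules:
    print(sorted(antecedent), '->', sorted(consequent), round(confidence, 2))
```

With a confidence threshold of 0.7 this sketch yields the same number of rules as the 12 rows listed above, although the row order may differ.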

# Example 2 -- Rule Generation and Selection Criteria


In [5]:
rules_by_lift = association_rules(frequent_itemsets, metric="lift", min_threshold=1.2)
print(rules_by_lift)

             antecedents            consequents  antecedent support  \
0  (Onion, Kidney Beans)                 (Eggs)                 0.6   
1   (Eggs, Kidney Beans)                (Onion)                 0.8   
2                (Onion)   (Eggs, Kidney Beans)                 0.6   
3                 (Eggs)  (Onion, Kidney Beans)                 0.8   
4                (Onion)                 (Eggs)                 0.6   
5                 (Eggs)                (Onion)                 0.8   

   consequent support  support  confidence  lift  leverage  conviction  
0                 0.8      0.6        1.00  1.25      0.12         inf  
1                 0.6      0.6        0.75  1.25      0.12    1.600000  
2                 0.8      0.6        1.00  1.25      0.12         inf  
3                 0.6      0.6        0.75  1.25      0.12    1.600000  
4                 0.8      0.6        1.00  1.25      0.12         inf  
5                 0.6      0.6        0.75  1.25      0.12    1.

## Filter the results
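Let's say we are only interested in rules that satisfy the following criteria:

- at least 2 antecedents
- a confidence > 0.75
- a lift score > 1.2

Each antecedent is a set of items, so its length gives the number of items on the left-hand side of the rule. We can compute it and filter as follows: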

In [6]:
# Number of items in each rule's antecedent set
rules_by_lift["antecedent_len"] = rules_by_lift["antecedents"].apply(len)

# Keep only rules that meet all three criteria
rules_by_lift[(rules_by_lift["antecedent_len"] >= 2) &
              (rules_by_lift["confidence"] > 0.75) &
              (rules_by_lift["lift"] > 1.2)]

             antecedents consequents  antecedent support  consequent support  \
0  (Onion, Kidney Beans)      (Eggs)                 0.6                 0.8   

   support  confidence  lift  leverage  conviction  antecedent_len  
0      0.6         1.0  1.25      0.12         inf               2  
