# 14. Apriori Algorithm  
**Author**: Your Name  
**Date**: June 9, 2025  

## Introduction

The Apriori algorithm is a classic unsupervised learning technique for **Association Rule Mining**. It is widely used for **Market Basket Analysis** to discover relationships between items in transactional datasets.

- **Type**: Unsupervised Learning  
- **Task**: Association Rule Mining  
- **Goal**: Find rules that satisfy minimum support and confidence thresholds

## Core Idea

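Apriori finds **frequent itemsets** (sets of items that often appear together) and derives **association rules** from them. It relies on the **Apriori Principle**:  
> "If an itemset is frequent, all of its subsets must also be frequent."

Equivalently, if an itemset is infrequent, every superset of it is infrequent too, so those supersets can be discarded without counting them.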

## Key Metrics

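- **Support**: Proportion of transactions that contain the itemset  
- **Confidence**: Estimated probability that a transaction contains Y given that it contains X, for the rule X → Y  
- **Lift**: Confidence of X → Y divided by the support of Y; values above 1 mean Y is more likely when X is present  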
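
For reference, the three metrics can be written out as follows. The notation is introduced here for illustration: $N$ is the number of transactions and $\operatorname{count}(X)$ is the number of transactions containing every item in $X$.

```latex
\begin{align*}
\operatorname{support}(X) &= \frac{\operatorname{count}(X)}{N} \\
\operatorname{confidence}(X \to Y) &= \frac{\operatorname{support}(X \cup Y)}{\operatorname{support}(X)} \\
\operatorname{lift}(X \to Y) &= \frac{\operatorname{confidence}(X \to Y)}{\operatorname{support}(Y)}
\end{align*}
```

A lift of 1 means X and Y occur together as often as expected if they were independent; values below 1 suggest a negative association.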

## Steps of Apriori

1. **Frequent Itemset Generation**  
   - Generate all itemsets that meet `min_support`
2. **Association Rule Generation**  
   - For each frequent itemset, generate rules that meet `min_confidence`
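
The first step can be sketched in plain Python to show where the Apriori Principle does its pruning. This is a minimal illustration, not mlxtend's implementation: the function name `frequent_itemsets` and the toy `baskets` data are assumptions made up for this example, and no attention is paid to performance.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Return {itemset: support} for every itemset meeting min_support."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)

    def support(itemset):
        # Fraction of transactions that contain every item in the itemset
        return sum(itemset <= t for t in transactions) / n

    # Level 1: frequent single items
    items = {item for t in transactions for item in t}
    current = {}
    for item in items:
        s = support(frozenset([item]))
        if s >= min_support:
            current[frozenset([item])] = s
    result = dict(current)

    k = 2
    while current:
        # Join step: combine frequent (k-1)-itemsets into candidate k-itemsets
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # Prune step (Apriori Principle): every (k-1)-subset must be frequent
        candidates = {
            c for c in candidates
            if all(frozenset(sub) in current for sub in combinations(c, k - 1))
        }
        # Count support only for the surviving candidates
        current = {}
        for c in candidates:
            s = support(c)
            if s >= min_support:
                current[c] = s
        result.update(current)
        k += 1
    return result


# Hypothetical toy data, separate from the dataset used later in this notebook
baskets = [
    ["bread", "milk"],
    ["bread", "butter"],
    ["bread", "milk", "butter"],
    ["milk"],
]
found = frequent_itemsets(baskets, min_support=0.5)
```

With these baskets, `{milk, butter}` appears in only one of four transactions, so the candidate `{bread, milk, butter}` is pruned before its support is ever counted.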

## Pros  
- Interpretable, easy to implement  
- Parallelizable  

## Cons  
- Computationally expensive on large datasets, since each itemset size requires another pass over the transactions  
- Many candidate itemsets and rules may be generated  

## Use Cases  
- Market basket analysis  
- Web usage mining  
- Gene association in bioinformatics  
- Recommendation systems


In [2]:
import pandas as pd

# Install mlxtend if needed:
!pip install mlxtend

from mlxtend.frequent_patterns import apriori, association_rules
from mlxtend.preprocessing import TransactionEncoder

# Optional: Display formatting
pd.set_option('display.max_columns', None)
pd.set_option('display.width', 200)


Collecting mlxtend
  Downloading mlxtend-0.23.4-py3-none-any.whl.metadata (7.3 kB)
Downloading mlxtend-0.23.4-py3-none-any.whl (1.4 MB)
Installing collected packages: mlxtend
Successfully installed mlxtend-0.23.4


In [3]:
# Sample transactions
dataset = [
    ['Milk', 'Onion', 'Nutmeg', 'Kidney Beans', 'Eggs', 'Yogurt'],
    ['Dill', 'Onion', 'Nutmeg', 'Kidney Beans', 'Eggs', 'Yogurt'],
    ['Milk', 'Apple', 'Kidney Beans', 'Eggs'],
    ['Milk', 'Unicorn', 'Corn', 'Kidney Beans', 'Yogurt'],
    ['Corn', 'Onion', 'Onion', 'Kidney Beans', 'Ice cream', 'Eggs']
]


In [4]:
# Transform to one-hot encoded DataFrame
te = TransactionEncoder()
te_ary = te.fit(dataset).transform(dataset)

df = pd.DataFrame(te_ary, columns=te.columns_)

print("One-Hot Encoded Transactional Data:")
print(df)


One-Hot Encoded Transactional Data:
   Apple   Corn   Dill   Eggs  Ice cream  Kidney Beans   Milk  Nutmeg  Onion  Unicorn  Yogurt
0  False  False  False   True      False          True   True    True   True    False    True
1  False  False   True   True      False          True  False    True   True    False    True
2   True  False  False   True      False          True   True   False  False    False   False
3  False   True  False  False      False          True   True   False  False     True    True
4  False   True  False   True       True          True  False   False   True    False   False
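
To make the encoding step concrete, here is a minimal pure-Python sketch of the same kind of transformation. It is not how mlxtend implements `TransactionEncoder`; the `one_hot` helper and the two short transactions are hypothetical and used only for illustration.

```python
def one_hot(transactions):
    """Return (columns, rows): sorted item names and one boolean row per transaction."""
    columns = sorted({item for t in transactions for item in t})
    rows = []
    for t in transactions:
        present = set(t)  # duplicates within a transaction collapse here
        rows.append([item in present for item in columns])
    return columns, rows


columns, rows = one_hot([
    ["Milk", "Eggs"],
    ["Eggs", "Onion", "Onion"],  # a repeated item still yields a single True
])
print(columns)  # ['Eggs', 'Milk', 'Onion']
print(rows)     # [[True, True, False], [True, False, True]]
```

Each column answers one yes/no question per transaction, which is why the repeated `'Onion'` in the last transaction of the dataset above shows up as a single `True`.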


In [5]:
# Use Apriori with min_support = 0.6 (60%)
frequent_itemsets = apriori(df, min_support=0.6, use_colnames=True)

print("\nFrequent Itemsets (min_support = 0.6):")
print(frequent_itemsets)



Frequent Itemsets (min_support = 0.6):
    support                     itemsets
0       0.8                       (Eggs)
1       1.0               (Kidney Beans)
2       0.6                       (Milk)
3       0.6                      (Onion)
4       0.6                     (Yogurt)
5       0.8         (Kidney Beans, Eggs)
6       0.6                (Eggs, Onion)
7       0.6         (Kidney Beans, Milk)
8       0.6        (Kidney Beans, Onion)
9       0.6       (Kidney Beans, Yogurt)
10      0.6  (Kidney Beans, Eggs, Onion)


### Interpretation:
Only itemsets that appear in at least 60% of the transactions (i.e. 3 of 5) are retained.


In [6]:
# Generate rules with min_confidence = 0.7
rules = association_rules(frequent_itemsets, metric="confidence", min_threshold=0.7)

# Sort by lift for stronger relationships
rules = rules.sort_values(by='lift', ascending=False)

print("\nGenerated Association Rules (min_confidence = 0.7):")
print(rules)



Generated Association Rules (min_confidence = 0.7):
              antecedents            consequents  antecedent support  consequent support  support  confidence  lift  representativity  leverage  conviction  zhangs_metric  jaccard  certainty  \
3                 (Onion)                 (Eggs)                 0.6                 0.8      0.6        1.00  1.25               1.0      0.12         inf            0.5     0.75      1.000   
8   (Kidney Beans, Onion)                 (Eggs)                 0.6                 0.8      0.6        1.00  1.25               1.0      0.12         inf            0.5     0.75      1.000   
11                (Onion)   (Kidney Beans, Eggs)                 0.6                 0.8      0.6        1.00  1.25               1.0      0.12         inf            0.5     0.75      1.000   
2                  (Eggs)                (Onion)                 0.8                 0.6      0.6        0.75  1.25               1.0      0.12         1.6            1.0 



### Rule Interpretation Example:

Rule: {Kidney Beans, Eggs} → {Onion}  
- **Support**: 0.6 → Found in 60% of transactions  
- **Confidence**: 0.75 → In 75% of the transactions containing {Kidney Beans, Eggs}, Onion is also present  
- **Lift**: 1.25 → Onion appears 1.25x as often alongside Kidney Beans & Eggs as its overall 60% rate would predict  

Because Lift > 1, this is a **positive association**.
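
These numbers can be checked by hand. The sketch below recomputes them directly from the five sample transactions without mlxtend; the `support` helper is a hypothetical name defined only for this check.

```python
# The five sample transactions, as sets (the repeated 'Onion' collapses)
transactions = [
    {"Milk", "Onion", "Nutmeg", "Kidney Beans", "Eggs", "Yogurt"},
    {"Dill", "Onion", "Nutmeg", "Kidney Beans", "Eggs", "Yogurt"},
    {"Milk", "Apple", "Kidney Beans", "Eggs"},
    {"Milk", "Unicorn", "Corn", "Kidney Beans", "Yogurt"},
    {"Corn", "Onion", "Kidney Beans", "Ice cream", "Eggs"},
]


def support(itemset):
    # Fraction of transactions that contain every item in the itemset
    return sum(itemset <= t for t in transactions) / len(transactions)


antecedent = {"Kidney Beans", "Eggs"}
consequent = {"Onion"}

rule_support = support(antecedent | consequent)  # 3 of 5 transactions
confidence = rule_support / support(antecedent)  # 0.6 / 0.8
lift = confidence / support(consequent)          # 0.75 / 0.6

print(round(rule_support, 2), round(confidence, 2), round(lift, 2))
# prints: 0.6 0.75 1.25
```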


## Conclusion & Key Takeaways

- The Apriori algorithm discovers frequent itemsets and strong association rules.
- It uses the Apriori Principle to prune the search space efficiently.
- Rules are evaluated using **support**, **confidence**, and **lift**.
- mlxtend's `apriori` expects the transactions as a **one-hot encoded** (boolean) DataFrame.
- While useful, Apriori can be slow on large datasets; alternatives include **FP-Growth**.

## Further Reading

- [mlxtend Apriori Documentation](http://rasbt.github.io/mlxtend/)
- Agrawal & Srikant (1994), “Fast Algorithms for Mining Association Rules”
- [StatQuest: Association Rules](https://www.youtube.com/watch?v=ZSPG7P4T0oE)
