# Association Rule Learning: Apriori

This is **Market Basket Analysis**. Unlike supervised learning, we aren't predicting a target ($y$). Unlike clustering, we aren't grouping points into $k$ clusters.
We are searching for **Rules**: "If you buy X, you are likely to buy Y."

### The 3 Key Metrics
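1.  **Support:** How common is an itemset? $\text{support}(X) = \dfrac{\text{transactions containing } X}{\text{total transactions}}$
2.  **Confidence:** Given that a basket contains X, how likely is it to also contain Y? $\text{confidence}(X \Rightarrow Y) = \dfrac{\text{support}(X \cup Y)}{\text{support}(X)}$
3.  **Lift:** Is the association real or just coincidence? $\text{lift}(X \Rightarrow Y) = \dfrac{\text{confidence}(X \Rightarrow Y)}{\text{support}(Y)}$
    - **Lift > 1:** Positively associated (buying X makes Y more likely than its baseline rate).
    - **Lift = 1:** Independent (buying X tells you nothing about Y).
    - **Lift < 1:** Negatively associated (buying X makes Y *less* likely than its baseline rate).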
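To make the formulas concrete, here is a minimal pure-Python sketch that computes each metric directly from a list of baskets. It reuses the ten baskets from the dataset below, written as sets (so the duplicate `'Onion'` counts once); the helper names `support`, `confidence` and `lift` are illustrative and not part of any library.

```python
# Minimal sketch: the three metrics computed by hand on the same ten baskets
# used later in this notebook (as sets, so repeated items count once).
baskets = [
    {'Milk', 'Onion', 'Nutmeg', 'Kidney Beans', 'Eggs', 'Yogurt'},
    {'Dill', 'Onion', 'Nutmeg', 'Kidney Beans', 'Eggs', 'Yogurt'},
    {'Milk', 'Apple', 'Kidney Beans', 'Eggs'},
    {'Milk', 'Unicorn', 'Corn', 'Kidney Beans', 'Yogurt'},
    {'Corn', 'Onion', 'Kidney Beans', 'Ice cream', 'Eggs'},
    {'Milk', 'Unicorn', 'Corn', 'Yogurt'},
    {'Milk', 'Apple', 'Kidney Beans', 'Eggs'},
    {'Milk', 'Onion', 'Eggs', 'Yogurt'},
    {'Corn', 'Onion', 'Kidney Beans', 'Ice cream', 'Eggs'},
    {'Milk', 'Onion', 'Nutmeg', 'Ice cream', 'Yogurt'},
]


def support(itemset):
    """Fraction of baskets that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= basket for basket in baskets) / len(baskets)


def confidence(x, y):
    """support(X together with Y) / support(X)."""
    return support(set(x) | set(y)) / support(x)


def lift(x, y):
    """confidence(X => Y) / support(Y); 1.0 means independence."""
    return confidence(x, y) / support(y)


print(round(support({'Eggs'}), 3))                        # 0.7
print(round(confidence({'Eggs'}, {'Kidney Beans'}), 3))   # 0.857
print(round(lift({'Eggs'}, {'Kidney Beans'}), 3))         # 1.224
print(round(lift({'Eggs'}, {'Yogurt'}), 3))               # 0.714
```

`Eggs => Yogurt` lands below 1: only 3 of the 7 Eggs baskets contain Yogurt, against Yogurt's 60% baseline, so in this data the two items are mildly negatively associated.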

```python
# Install mlxtend if needed (the usual library for this; it is not part of the Python standard library)
# !pip install mlxtend
```

## Create a "Supermarket" Dataset

```python
import pandas as pd
from mlxtend.preprocessing import TransactionEncoder
from mlxtend.frequent_patterns import apriori, association_rules

# A list of lists (each inner list is one customer's basket)
dataset = [
    ['Milk', 'Onion', 'Nutmeg', 'Kidney Beans', 'Eggs', 'Yogurt'],
    ['Dill', 'Onion', 'Nutmeg', 'Kidney Beans', 'Eggs', 'Yogurt'],
    ['Milk', 'Apple', 'Kidney Beans', 'Eggs'],
    ['Milk', 'Unicorn', 'Corn', 'Kidney Beans', 'Yogurt'],
    ['Corn', 'Onion', 'Onion', 'Kidney Beans', 'Ice cream', 'Eggs'],
    ['Milk', 'Unicorn', 'Corn', 'Yogurt'],
    ['Milk', 'Apple', 'Kidney Beans', 'Eggs'],
    ['Milk', 'Onion', 'Eggs', 'Yogurt'],
    ['Corn', 'Onion', 'Kidney Beans', 'Ice cream', 'Eggs'],
    ['Milk', 'Onion', 'Nutmeg', 'Ice cream', 'Yogurt']
]

# Convert to a one-hot encoded DataFrame: one row per basket, one boolean
# column per item. A repeated item in a basket (e.g. 'Onion' twice) is just True.
te = TransactionEncoder()
te_ary = te.fit(dataset).transform(dataset)
df = pd.DataFrame(te_ary, columns=te.columns_)

print("Transaction Matrix:")
print(df.head())
```

```
Transaction Matrix:
   Apple   Corn   Dill   Eggs  Ice cream  Kidney Beans   Milk  Nutmeg  Onion  \
0  False  False  False   True      False          True   True    True   True   
1  False  False   True   True      False          True  False    True   True   
2   True  False  False   True      False          True   True   False  False   
3  False   True  False  False      False          True   True   False  False   
4  False   True  False   True       True          True  False   False   True   

   Unicorn  Yogurt  
0    False    True  
1    False    True  
2    False   False  
3     True    True  
4    False   False  
```
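The encoding step itself is simple. Below is a minimal pure-Python sketch of the same transformation, assuming (as the output above shows) that the columns are the distinct items in alphabetical order. The helper `one_hot` is hypothetical and only mimics the shape of `TransactionEncoder`'s result, not its API.

```python
# Hypothetical helper mimicking the one-hot matrix printed above:
# one boolean column per distinct item, columns in alphabetical order.
dataset = [
    ['Milk', 'Onion', 'Nutmeg', 'Kidney Beans', 'Eggs', 'Yogurt'],
    ['Dill', 'Onion', 'Nutmeg', 'Kidney Beans', 'Eggs', 'Yogurt'],
    ['Milk', 'Apple', 'Kidney Beans', 'Eggs'],
    ['Milk', 'Unicorn', 'Corn', 'Kidney Beans', 'Yogurt'],
    ['Corn', 'Onion', 'Onion', 'Kidney Beans', 'Ice cream', 'Eggs'],
    ['Milk', 'Unicorn', 'Corn', 'Yogurt'],
    ['Milk', 'Apple', 'Kidney Beans', 'Eggs'],
    ['Milk', 'Onion', 'Eggs', 'Yogurt'],
    ['Corn', 'Onion', 'Kidney Beans', 'Ice cream', 'Eggs'],
    ['Milk', 'Onion', 'Nutmeg', 'Ice cream', 'Yogurt'],
]


def one_hot(transactions):
    """Return (columns, rows) where rows[i][j] is True if basket i contains columns[j]."""
    columns = sorted({item for basket in transactions for item in basket})
    # Each basket becomes a set first, so a duplicated item counts only once.
    rows = [[col in items for col in columns] for items in map(set, transactions)]
    return columns, rows


columns, rows = one_hot(dataset)
print(columns)
print(rows[4])  # the basket that lists 'Onion' twice
```

Row 4 should match row 4 of the DataFrame above: `True` for Corn, Eggs, Ice cream, Kidney Beans and Onion, `False` elsewhere.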


## Run Apriori Algorithm
We want to find itemsets that appear in at least 50% of transactions (`min_support=0.5`).

```python
# Find frequent itemsets
frequent_itemsets = apriori(df, min_support=0.5, use_colnames=True)

print("Frequent Itemsets:")
print(frequent_itemsets)
```

```
Frequent Itemsets:
   support              itemsets
0      0.7                (Eggs)
1      0.7        (Kidney Beans)
2      0.7                (Milk)
3      0.6               (Onion)
4      0.6              (Yogurt)
5      0.6  (Eggs, Kidney Beans)
6      0.5         (Eggs, Onion)
7      0.5        (Yogurt, Milk)
```
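The Apriori algorithm relies on the *downward-closure* property of support: if an itemset is infrequent, every superset of it is infrequent too. That lets it search level by level, building size-$k$ candidates only from frequent size-$(k-1)$ itemsets and discarding any candidate with an infrequent subset before counting it. Below is a simplified pure-Python sketch of that idea; `apriori_sketch` is a hypothetical function, much less optimized than mlxtend's `apriori`, but it should reproduce the eight itemsets above. Here the only size-3 candidate, {Eggs, Kidney Beans, Onion}, is pruned because its subset {Kidney Beans, Onion} has support 0.4.

```python
from itertools import combinations

# Same ten baskets as the notebook dataset, as sets.
baskets = [
    {'Milk', 'Onion', 'Nutmeg', 'Kidney Beans', 'Eggs', 'Yogurt'},
    {'Dill', 'Onion', 'Nutmeg', 'Kidney Beans', 'Eggs', 'Yogurt'},
    {'Milk', 'Apple', 'Kidney Beans', 'Eggs'},
    {'Milk', 'Unicorn', 'Corn', 'Kidney Beans', 'Yogurt'},
    {'Corn', 'Onion', 'Kidney Beans', 'Ice cream', 'Eggs'},
    {'Milk', 'Unicorn', 'Corn', 'Yogurt'},
    {'Milk', 'Apple', 'Kidney Beans', 'Eggs'},
    {'Milk', 'Onion', 'Eggs', 'Yogurt'},
    {'Corn', 'Onion', 'Kidney Beans', 'Ice cream', 'Eggs'},
    {'Milk', 'Onion', 'Nutmeg', 'Ice cream', 'Yogurt'},
]


def apriori_sketch(baskets, min_support):
    """Level-wise Apriori: returns {frozenset(itemset): support}."""
    n = len(baskets)

    def support(itemset):
        return sum(itemset <= b for b in baskets) / n

    # Level 1: single items that clear the threshold.
    items = sorted({i for b in baskets for i in b})
    level = [frozenset([i]) for i in items if support(frozenset([i])) >= min_support]
    frequent = {s: support(s) for s in level}

    k = 2
    while level:
        # Join step: union pairs of frequent (k-1)-itemsets into k-itemsets.
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step (downward closure): every (k-1)-subset must be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(s) in frequent for s in combinations(c, k - 1))
        }
        # Count only the surviving candidates.
        level = [c for c in candidates if support(c) >= min_support]
        frequent.update({c: support(c) for c in level})
        k += 1
    return frequent


result = apriori_sketch(baskets, min_support=0.5)
for itemset, sup in sorted(result.items(), key=lambda kv: (len(kv[0]), -kv[1], sorted(kv[0]))):
    print(round(sup, 2), sorted(itemset))
```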


## Generate Rules
Now we convert those itemsets into "If -> Then" rules.
Let's look for rules with at least **70% Confidence**.

```python
rules = association_rules(frequent_itemsets, metric="confidence", min_threshold=0.7)

# Keep only the columns we need
rules = rules[['antecedents', 'consequents', 'support', 'confidence', 'lift']]

print("The Rules:")
print(rules.sort_values(by='lift', ascending=False))
```

```
The Rules:
      antecedents     consequents  support  confidence      lift
0          (Eggs)  (Kidney Beans)      0.6    0.857143  1.224490
1  (Kidney Beans)          (Eggs)      0.6    0.857143  1.224490
3         (Onion)          (Eggs)      0.5    0.833333  1.190476
4        (Yogurt)          (Milk)      0.5    0.833333  1.190476
2          (Eggs)         (Onion)      0.5    0.714286  1.190476
5          (Milk)        (Yogurt)      0.5    0.714286  1.190476
```
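Rule generation is a separate pass over the frequent itemsets: every way of splitting an itemset into a non-empty antecedent and consequent is a candidate rule, and its confidence only needs supports that Apriori already computed. The sketch below hardcodes the supports from the frequent-itemset table above; `generate_rules` is a hypothetical helper, and mlxtend's `association_rules` returns more columns (e.g. leverage and conviction) than it does. It should yield the same six rules.

```python
from itertools import combinations

# Supports copied from the frequent-itemset table above.
frequent = {
    frozenset({'Eggs'}): 0.7,
    frozenset({'Kidney Beans'}): 0.7,
    frozenset({'Milk'}): 0.7,
    frozenset({'Onion'}): 0.6,
    frozenset({'Yogurt'}): 0.6,
    frozenset({'Eggs', 'Kidney Beans'}): 0.6,
    frozenset({'Eggs', 'Onion'}): 0.5,
    frozenset({'Milk', 'Yogurt'}): 0.5,
}


def generate_rules(frequent, min_confidence):
    """Return (antecedent, consequent, support, confidence, lift), sorted by lift."""
    rules = []
    for itemset, sup in frequent.items():
        if len(itemset) < 2:
            continue  # a rule needs a non-empty antecedent and consequent
        # Every non-empty proper subset can serve as the antecedent.
        for r in range(1, len(itemset)):
            for ante in map(frozenset, combinations(sorted(itemset), r)):
                cons = itemset - ante
                # Subsets of a frequent itemset are frequent, so both lookups succeed.
                conf = sup / frequent[ante]
                if conf >= min_confidence:
                    lift = conf / frequent[cons]
                    rules.append((sorted(ante), sorted(cons), sup, conf, lift))
    return sorted(rules, key=lambda rule: -rule[4])


for ante, cons, sup, conf, lift in generate_rules(frequent, min_confidence=0.7):
    print(ante, '->', cons, f'support={sup:.1f} confidence={conf:.3f} lift={lift:.3f}')
```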


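### Interpretation
Look at the rules with **Lift > 1** (in this run, all six of them).
- Example: `If {Eggs} -> Then {Kidney Beans}`.
- **Confidence (0.857):** How often is it true? 6 of the 7 baskets that contain Eggs also contain Kidney Beans.
- **Lift (1.22):** Is it a stronger connection than random chance? Yes: Kidney Beans appears in 85.7% of Eggs baskets versus 70% of all baskets, about 22% above its baseline.
- A three-item rule such as `If {Onion, Eggs} -> Then {Kidney Beans}` does not appear: {Eggs, Kidney Beans, Onion} is in only 4 of 10 baskets (support 0.4), below `min_support=0.5`.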
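Equivalently, lift compares how often X and Y actually co-occur with how often they would co-occur if they were independent. Plugging in the supports from the tables above for the top rule: under independence, Eggs and Kidney Beans would share about $0.7 \times 0.7 = 0.49$ of baskets (about 4.9 of 10), but they actually share 6.

$$
\text{lift}(\text{Eggs} \Rightarrow \text{Kidney Beans})
= \frac{\text{support}(\{\text{Eggs}, \text{Kidney Beans}\})}{\text{support}(\text{Eggs})\,\text{support}(\text{Kidney Beans})}
= \frac{0.6}{0.7 \times 0.7}
= \frac{0.6}{0.49}
\approx 1.224
$$

This form is symmetric in X and Y, which is why `{Eggs} -> {Kidney Beans}` and `{Kidney Beans} -> {Eggs}` share the same lift.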