# Association Rule Mining

### What is it?
Association Rule Mining is an unsupervised learning technique used to find relationships between items in large datasets.
It answers questions like: “If people buy X, what else are they likely to buy?”

### How it works

Step 1: Find frequent itemsets (items that appear together often).

Step 2: Generate rules from those itemsets.

Step 3: Evaluate rules using metrics:

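Support → the fraction of transactions in the dataset that contain the itemset.

Confidence → of the transactions that contain the left-hand side of the rule, the fraction that also contain the right-hand side.

Lift → confidence divided by the support of the right-hand side; it shows how much more (or less) often the items occur together than they would if they were independent.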
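Written out for a rule A → B, the three metrics take their standard form. The symbols A, B, T (the set of transactions) and N (the number of transactions) are notation introduced here for illustration.

$$
\mathrm{support}(X) = \frac{|\{\, t \in T : X \subseteq t \,\}|}{N}
$$

$$
\mathrm{confidence}(A \to B) = \frac{\mathrm{support}(A \cup B)}{\mathrm{support}(A)}
$$

$$
\mathrm{lift}(A \to B) = \frac{\mathrm{confidence}(A \to B)}{\mathrm{support}(B)}
$$

A lift of 1 means A and B occur together exactly as often as independence would predict.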

#### Simple Example

```
T1: Milk, Bread, Butter  
T2: Bread, Butter  
T3: Milk, Bread  
T4: Milk, Bread, Butter  
```

Rule: {Milk, Bread} → {Butter}

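- Support = {Milk, Bread, Butter} appears in 2 of 4 transactions = 2/4 = 50%

- Confidence = (Milk & Bread & Butter) / (Milk & Bread) = 2/3 ≈ 67%

- Lift = Confidence / Support(Butter) = 0.667 / 0.75 ≈ 0.89. Lift > 1 means positive association; here it is slightly below 1, so buying Milk and Bread does not make Butter more likely than its 75% baseline.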
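The arithmetic above can be checked with a few lines of plain Python. This is a minimal sketch: the helper name `support` and the variable names are chosen here for illustration and do not come from any library.

```python
# The four example transactions, each stored as a set of items.
transactions = [
    {"Milk", "Bread", "Butter"},
    {"Bread", "Butter"},
    {"Milk", "Bread"},
    {"Milk", "Bread", "Butter"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)


antecedent = {"Milk", "Bread"}
consequent = {"Butter"}

rule_support = support(antecedent | consequent, transactions)
confidence = rule_support / support(antecedent, transactions)
lift = confidence / support(consequent, transactions)

print(f"support={rule_support:.3f} confidence={confidence:.3f} lift={lift:.3f}")
# prints: support=0.500 confidence=0.667 lift=0.889
```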

#### Real World Applications
Retail → Market Basket Analysis (Amazon, Walmart).

Recommendation Systems → “Users who listen to this song also like…”

Web Mining → Which web pages are visited together.

Healthcare → Which symptoms appear together.

#### Algorithms in Association Rule Mining
1. Apriori Algorithm (Most Common, Easy to Start With)

- Works by generating frequent itemsets step by step (breadth-first).

- Uses the Apriori property: “If an itemset is frequent, all its subsets must also be frequent.”

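- Downside → slow on large datasets, because it rescans the data for every itemset size and can generate a very large number of candidates.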
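The level-by-level search can be sketched in plain Python as follows. This is a simplified illustration under the assumption that the data fits in memory, not the implementation used by any particular library; the function name `apriori_itemsets` is hypothetical.

```python
from itertools import combinations


def apriori_itemsets(transactions, min_support):
    """Return {frozenset: support} for every itemset meeting min_support."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)

    def support(itemset):
        return sum(itemset <= t for t in transactions) / n

    # Level 1: frequent single items.
    current = {}
    for item in {i for t in transactions for i in t}:
        s = support(frozenset([item]))
        if s >= min_support:
            current[frozenset([item])] = s

    frequent = dict(current)
    k = 2
    while current:
        # Join step: combine frequent (k-1)-itemsets into k-itemset candidates.
        candidates = {a | b for a, b in combinations(list(current), 2)
                      if len(a | b) == k}
        # Prune step (Apriori property): drop any candidate that has an
        # infrequent (k-1)-subset, without counting its support.
        candidates = {c for c in candidates
                      if all(frozenset(sub) in current
                             for sub in combinations(c, k - 1))}
        # Count step: one pass over the data for the surviving candidates.
        current = {}
        for c in candidates:
            s = support(c)
            if s >= min_support:
                current[c] = s
        frequent.update(current)
        k += 1
    return frequent


example = [
    ["Milk", "Bread", "Butter"],
    ["Bread", "Butter"],
    ["Milk", "Bread"],
    ["Milk", "Bread", "Butter"],
]
apriori_result = apriori_itemsets(example, min_support=0.5)
for itemset, s in sorted(apriori_result.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), s)
```

On the four example transactions with a minimum support of 0.5, this finds seven frequent itemsets.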

2. Eclat Algorithm

- Uses vertical data format (items with transaction IDs).

- Finds frequent itemsets using set intersections.

- Faster than Apriori in some cases, since supports come from intersecting transaction ID sets instead of rescanning the dataset.
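A minimal sketch of the vertical-format idea, assuming everything fits in memory. The function name `eclat` and its helper `extend` are illustrative only and do not refer to a library API.

```python
def eclat(transactions, min_support):
    """Return {frozenset: support} using transaction-ID set intersections."""
    n = len(transactions)
    min_count = min_support * n

    # Vertical format: map each item to the IDs of transactions containing it.
    tidsets = {}
    for tid, transaction in enumerate(transactions):
        for item in transaction:
            tidsets.setdefault(item, set()).add(tid)

    frequent = {}

    def extend(prefix, candidates):
        # `candidates` is a list of (item, tidset) pairs, where each tidset
        # already holds the transactions containing prefix + item.
        for i, (item, tids) in enumerate(candidates):
            if len(tids) < min_count:
                continue
            itemset = prefix | {item}
            frequent[itemset] = len(tids) / n
            # Intersect only with later items so each itemset is visited once.
            suffix = [(other, tids & other_tids)
                      for other, other_tids in candidates[i + 1:]]
            extend(itemset, suffix)

    extend(frozenset(), sorted(tidsets.items(), key=lambda kv: kv[0]))
    return frequent


example = [
    ["Milk", "Bread", "Butter"],
    ["Bread", "Butter"],
    ["Milk", "Bread"],
    ["Milk", "Bread", "Butter"],
]
eclat_result = eclat(example, min_support=0.5)
for itemset, s in eclat_result.items():
    print(sorted(itemset), s)
```

The support of an itemset is simply the size of its intersected ID set divided by the number of transactions, so the original data is read only once.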

3. FP-Growth (Frequent Pattern Growth)

- Builds a tree structure (FP-tree) to represent transactions.

- Often much faster than Apriori because it mines the tree directly instead of generating and testing candidate itemsets.

- Used in big data frameworks like Apache Spark MLlib.
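To make the tree structure concrete, here is a small sketch that only builds an FP-tree; it does not perform the mining step that extracts frequent itemsets from it. The class `FPNode` and the function `build_fp_tree` are hypothetical names used for illustration.

```python
from collections import Counter


class FPNode:
    """One node of the FP-tree: an item, a count, and child nodes."""

    def __init__(self, item, parent=None):
        self.item = item
        self.count = 0
        self.parent = parent
        self.children = {}


def build_fp_tree(transactions, min_count):
    # Pass 1: count each item once per transaction and drop infrequent items.
    counts = Counter(item for t in transactions for item in set(t))
    frequent = {item: c for item, c in counts.items() if c >= min_count}

    root = FPNode(None)
    # Pass 2: insert each transaction with its items ordered by descending
    # frequency (ties broken alphabetically) so common prefixes share nodes.
    for t in transactions:
        items = sorted((i for i in set(t) if i in frequent),
                       key=lambda i: (-frequent[i], i))
        node = root
        for item in items:
            if item not in node.children:
                node.children[item] = FPNode(item, node)
            node = node.children[item]
            node.count += 1
    return root


def show(node, depth=0):
    """Print the tree, one node per line, indented by depth."""
    for child in node.children.values():
        print("  " * depth + f"{child.item}: {child.count}")
        show(child, depth + 1)


example = [
    ["Milk", "Bread", "Butter"],
    ["Bread", "Butter"],
    ["Milk", "Bread"],
    ["Milk", "Bread", "Butter"],
]
tree = build_fp_tree(example, min_count=2)
show(tree)
```

For the four example transactions, every path starts at a single Bread node with count 4, which is how the tree compresses repeated prefixes.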

In [1]:
!pip install mlxtend

Collecting mlxtend
  Downloading mlxtend-0.23.4-py3-none-any.whl.metadata (7.3 kB)
Downloading mlxtend-0.23.4-py3-none-any.whl (1.4 MB)
Installing collected packages: mlxtend
Successfully installed mlxtend-0.23.4


In [2]:
import pandas as pd
from mlxtend.frequent_patterns import apriori, association_rules

In [3]:
# Sample dataset: a list of transactions, each a list of item names (not yet one-hot encoded)
dataset = [
    ['Milk', 'Bread', 'Butter'],
    ['Bread', 'Butter'],
    ['Milk', 'Bread'],
    ['Milk', 'Bread', 'Butter']
]

In [4]:
# Convert dataset into DataFrame (one-hot encoding)
from mlxtend.preprocessing import TransactionEncoder
te = TransactionEncoder()
te_ary = te.fit(dataset).transform(dataset)
df = pd.DataFrame(te_ary, columns=te.columns_)

In [5]:
df

| | Bread | Butter | Milk |
|---|---|---|---|
| 0 | True | True | True |
| 1 | True | True | False |
| 2 | True | False | True |
| 3 | True | True | True |


In [7]:
# Step 1: Find frequent itemsets
frequent_itemsets = apriori(df, min_support=0.5, use_colnames=True)
frequent_itemsets

| | support | itemsets |
|---|---|---|
| 0 | 1.0 | (Bread) |
| 1 | 0.75 | (Butter) |
| 2 | 0.75 | (Milk) |
| 3 | 0.75 | (Bread, Butter) |
| 4 | 0.75 | (Bread, Milk) |
| 5 | 0.5 | (Milk, Butter) |
| 6 | 0.5 | (Bread, Milk, Butter) |


In [8]:
# Step 2: Generate rules
rules = association_rules(frequent_itemsets, metric="confidence", min_threshold=0.5)
rules


| | antecedents | consequents | antecedent support | consequent support | support | confidence | lift | representativity | leverage | conviction | zhangs_metric | jaccard | certainty | kulczynski |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | (Bread) | (Butter) | 1.0 | 0.75 | 0.75 | 0.75 | 1.0 | 1.0 | 0.0 | 1.0 | 0.0 | 0.75 | 0.0 | 0.875 |
| 1 | (Butter) | (Bread) | 0.75 | 1.0 | 0.75 | 1.0 | 1.0 | 1.0 | 0.0 | inf | 0.0 | 0.75 | 0.0 | 0.875 |
| 2 | (Bread) | (Milk) | 1.0 | 0.75 | 0.75 | 0.75 | 1.0 | 1.0 | 0.0 | 1.0 | 0.0 | 0.75 | 0.0 | 0.875 |
| 3 | (Milk) | (Bread) | 0.75 | 1.0 | 0.75 | 1.0 | 1.0 | 1.0 | 0.0 | inf | 0.0 | 0.75 | 0.0 | 0.875 |
| 4 | (Milk) | (Butter) | 0.75 | 0.75 | 0.5 | 0.666667 | 0.888889 | 1.0 | -0.0625 | 0.75 | -0.333333 | 0.5 | -0.333333 | 0.666667 |
| 5 | (Butter) | (Milk) | 0.75 | 0.75 | 0.5 | 0.666667 | 0.888889 | 1.0 | -0.0625 | 0.75 | -0.333333 | 0.5 | -0.333333 | 0.666667 |
| 6 | (Bread, Milk) | (Butter) | 0.75 | 0.75 | 0.5 | 0.666667 | 0.888889 | 1.0 | -0.0625 | 0.75 | -0.333333 | 0.5 | -0.333333 | 0.666667 |
| 7 | (Bread, Butter) | (Milk) | 0.75 | 0.75 | 0.5 | 0.666667 | 0.888889 | 1.0 | -0.0625 | 0.75 | -0.333333 | 0.5 | -0.333333 | 0.666667 |
| 8 | (Milk, Butter) | (Bread) | 0.5 | 1.0 | 0.5 | 1.0 | 1.0 | 1.0 | 0.0 | inf | 0.0 | 0.5 | 0.0 | 0.75 |
| 9 | (Bread) | (Milk, Butter) | 1.0 | 0.5 | 0.5 | 0.5 | 1.0 | 1.0 | 0.0 | 1.0 | 0.0 | 0.5 | 0.0 | 0.75 |


In [9]:
print("\nAssociation Rules:\n", rules[['antecedents','consequents','support','confidence','lift']])


Association Rules:
         antecedents      consequents  support  confidence      lift
0           (Bread)         (Butter)     0.75    0.750000  1.000000
1          (Butter)          (Bread)     0.75    1.000000  1.000000
2           (Bread)           (Milk)     0.75    0.750000  1.000000
3            (Milk)          (Bread)     0.75    1.000000  1.000000
4            (Milk)         (Butter)     0.50    0.666667  0.888889
5          (Butter)           (Milk)     0.50    0.666667  0.888889
6     (Bread, Milk)         (Butter)     0.50    0.666667  0.888889
7   (Bread, Butter)           (Milk)     0.50    0.666667  0.888889
8    (Milk, Butter)          (Bread)     0.50    1.000000  1.000000
9           (Bread)   (Milk, Butter)     0.50    0.500000  1.000000
10           (Milk)  (Bread, Butter)     0.50    0.666667  0.888889
11         (Butter)    (Bread, Milk)     0.50    0.666667  0.888889
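As a cross-check on the printed rules, the rule-generation step can be reproduced without mlxtend. The sketch below starts from the frequent itemsets and supports reported by `apriori` earlier; `generate_rules` and `rules_manual` are illustrative names, not part of mlxtend.

```python
from itertools import combinations

# Frequent itemsets and their supports, copied from the apriori output.
supports = {
    frozenset({"Bread"}): 1.0,
    frozenset({"Butter"}): 0.75,
    frozenset({"Milk"}): 0.75,
    frozenset({"Bread", "Butter"}): 0.75,
    frozenset({"Bread", "Milk"}): 0.75,
    frozenset({"Milk", "Butter"}): 0.5,
    frozenset({"Bread", "Milk", "Butter"}): 0.5,
}


def generate_rules(supports, min_confidence):
    """Split each frequent itemset into antecedent -> consequent rules."""
    rules = []
    for itemset, sup in supports.items():
        if len(itemset) < 2:
            continue  # a rule needs at least one item on each side
        for r in range(1, len(itemset)):
            for antecedent in map(frozenset, combinations(sorted(itemset), r)):
                consequent = itemset - antecedent
                confidence = sup / supports[antecedent]
                if confidence >= min_confidence:
                    lift = confidence / supports[consequent]
                    rules.append((antecedent, consequent, sup, confidence, lift))
    return rules


rules_manual = generate_rules(supports, min_confidence=0.5)
for antecedent, consequent, sup, conf, lift in rules_manual:
    print(sorted(antecedent), "->", sorted(consequent),
          f"support={sup:.2f} confidence={conf:.3f} lift={lift:.3f}")
```

With a minimum confidence of 0.5 this yields 12 rules, and none of them has a lift above 1, so this toy dataset contains no positively associated item pairs.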
