# C6: APRIORI-BASED AND HYBRID RECOMMENDATION SYSTEM

## Market Basket Analysis

Market basket analysis is a technique for finding relationships between items that are frequently bought together.

- **Definition:** An unsupervised learning method that identifies associations or co-occurrences among items in transactional data
- **Goal:** Discover patterns such as "If a customer buys X, they are likely to buy Y"
- **Algorithms:** Commonly used association rule learning algorithms include:
  - Apriori
  - FP-Growth
  - Eclat
- **Applications:**
  - Product recommendation
  - Cross-selling strategies
  - Store layout optimization
  - Fraud detection

## Association Rules

- **Support:** How often items appear together  
  - $\mathrm{Support(X \Rightarrow Y)} = \frac{\text{Transactions with X and Y}}{\text{Total transactions}}$

- **Confidence:** Likelihood of buying Y when X is bought  
  - $\mathrm{Confidence(X \Rightarrow Y)} = \frac{\text{Transactions with X and Y}}{\text{Transactions with X}}$

- **Lift:** Strength of association compared to random chance  
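  - $\mathrm{Lift(X \Rightarrow Y)} = \frac{\mathrm{Confidence(X \Rightarrow Y)}}{\mathrm{Support(Y)}}$
  - Lift > 1: positive association (X and Y occur together more often than expected if independent)  
  - Lift = 1: independent  
  - Lift < 1: negative association (X and Y occur together less often than expected if independent)  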
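
The three formulas above can be checked by hand on a tiny dataset. The following is a minimal sketch in plain Python; the five baskets and the helper names `support`, `confidence`, and `lift` are illustrative assumptions, not part of any library.

```python
# Five illustrative baskets (toy data assumed for this sketch).
transactions = [
    {"Milk", "Bread", "Butter"},
    {"Beer", "Bread"},
    {"Milk", "Bread", "Butter", "Beer"},
    {"Milk", "Bread"},
    {"Bread", "Butter"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)


def confidence(x, y, transactions):
    """Support of X and Y together, divided by the support of X."""
    return support(set(x) | set(y), transactions) / support(x, transactions)


def lift(x, y, transactions):
    """Confidence of X => Y, divided by the support of Y."""
    return confidence(x, y, transactions) / support(y, transactions)


x, y = {"Butter"}, {"Milk"}
print(f"support    = {support(x | y, transactions):.3f}")   # 0.400
print(f"confidence = {confidence(x, y, transactions):.3f}")  # 0.667
print(f"lift       = {lift(x, y, transactions):.3f}")        # 1.111
```

Butter and Milk appear together in 2 of 5 baskets, Butter appears in 3, and Milk appears in 3, so the lift is slightly above 1: a mild positive association.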

## Apriori Algorithm

- **Purpose:** Find frequent itemsets and generate association rules
- **Steps:**
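  1. Set minimum support and confidence thresholds
  2. Generate candidate itemsets of length 1
  3. Keep only the itemsets that meet minimum support
  4. Join the surviving itemsets into candidates one item longer, discarding any candidate with an infrequent subset (every subset of a frequent itemset must itself be frequent)
  5. Repeat steps 3 and 4 until no more frequent itemsets can be generated
  6. Generate association rules from the frequent itemsets, keeping those that meet minimum confidence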
- **Advantages:**
  - Simple and easy to implement
  - Works well for small to medium datasets
- **Disadvantages:**
  - Can be slow on large datasets
  - Requires multiple scans of the database
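
To make the steps above concrete, here is a minimal from-scratch sketch in plain Python. It is written for readability rather than speed, and the function names `apriori_frequent_itemsets` and `generate_rules` are hypothetical helpers for this illustration, not a library API.

```python
from itertools import combinations


def apriori_frequent_itemsets(transactions, min_support):
    """Return {frozenset: support} for every itemset meeting min_support."""
    baskets = [frozenset(t) for t in transactions]
    n = len(baskets)

    def frequent_among(candidates):
        # One scan of the data per level: count baskets containing each candidate.
        counts = {c: 0 for c in candidates}
        for basket in baskets:
            for c in candidates:
                if c <= basket:
                    counts[c] += 1
        return {c: cnt / n for c, cnt in counts.items() if cnt / n >= min_support}

    # Steps 2-3: candidate 1-itemsets, filtered by minimum support.
    items = {item for basket in baskets for item in basket}
    current = frequent_among({frozenset([i]) for i in items})
    frequent = dict(current)

    k = 2
    while current:
        prev = set(current)
        # Step 4 (join): unions of frequent (k-1)-itemsets that have exactly k items.
        candidates = {a | b for a in prev for b in prev if len(a | b) == k}
        # Step 4 (prune): every (k-1)-subset must itself be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(s) in prev for s in combinations(c, k - 1))
        }
        current = frequent_among(candidates)  # Step 5: repeat until empty
        frequent.update(current)
        k += 1
    return frequent


def generate_rules(frequent, min_confidence):
    """Step 6: split each frequent itemset into antecedent => consequent."""
    rules = []
    for itemset, supp in frequent.items():
        for r in range(1, len(itemset)):
            for antecedent in map(frozenset, combinations(itemset, r)):
                conf = supp / frequent[antecedent]
                if conf >= min_confidence:
                    rules.append((antecedent, itemset - antecedent, supp, conf))
    return rules


transactions = [
    ["Milk", "Bread", "Butter"],
    ["Beer", "Bread"],
    ["Milk", "Bread", "Butter", "Beer"],
    ["Milk", "Bread"],
    ["Bread", "Butter"],
]
frequent = apriori_frequent_itemsets(transactions, min_support=0.4)
rules = generate_rules(frequent, min_confidence=0.9)
for itemset, supp in sorted(frequent.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), supp)
```

The nested counting loop is where the repeated database scans come from: each level of itemset length costs one full pass over the baskets.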

In [3]:
import warnings

import pandas as pd
from mlxtend.frequent_patterns import apriori, association_rules
from mlxtend.preprocessing import TransactionEncoder

# Suppress RuntimeWarning messages in the cell output
warnings.filterwarnings("ignore", category=RuntimeWarning)

# Sample transaction dataset
dataset = [
    ['Milk', 'Bread', 'Butter'],
    ['Beer', 'Bread'],
    ['Milk', 'Bread', 'Butter', 'Beer'],
    ['Milk', 'Bread'],
    ['Bread', 'Butter']
]

# Convert dataset into one-hot encoded DataFrame (one boolean column per item)
te = TransactionEncoder()
te_ary = te.fit(dataset).transform(dataset)
df = pd.DataFrame(te_ary, columns=te.columns_)

# Step 1: Find frequent itemsets
frequent_itemsets = apriori(df, min_support=0.4, use_colnames=True)
print("Frequent Itemsets:\n", frequent_itemsets)

# Step 2: Generate association rules
rules = association_rules(frequent_itemsets, metric="lift", min_threshold=1.0)
print("\nAssociation Rules:\n", rules[['antecedents','consequents','support','confidence','lift']])


Frequent Itemsets:
    support               itemsets
0      0.4                 (Beer)
1      1.0                (Bread)
2      0.6               (Butter)
3      0.6                 (Milk)
4      0.4          (Beer, Bread)
5      0.6        (Butter, Bread)
6      0.6          (Bread, Milk)
7      0.4         (Butter, Milk)
8      0.4  (Butter, Bread, Milk)

Association Rules:
         antecedents      consequents  support  confidence      lift
0            (Beer)          (Bread)      0.4    1.000000  1.000000
1           (Bread)           (Beer)      0.4    0.400000  1.000000
2          (Butter)          (Bread)      0.6    1.000000  1.000000
3           (Bread)         (Butter)      0.6    0.600000  1.000000
4           (Bread)           (Milk)      0.6    0.600000  1.000000
5            (Milk)          (Bread)      0.6    1.000000  1.000000
6          (Butter)           (Milk)      0.4    0.666667  1.111111
7            (Milk)         (Butter)      0.4    0.666667  1.111111
8   (Bu

## Hybrid Methods

- **Definition:** Techniques that combine two or more algorithms to overcome limitations of individual methods  
- **Goal:** Improve accuracy, scalability, or interpretability by leveraging the strengths of different approaches  

### Types of Hybridization

1. **Weighted:** Combine scores from multiple recommenders  
2. **Switching:** Switch between methods depending on context  
3. **Cascade:** Use one method first, then refine with another  
4. **Feature Augmentation:** Output of one method becomes input features for another  
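
As a rough illustration of the first three strategies, the sketch below combines two score dictionaries. It assumes each recommender outputs a mapping from item to score; the function names, the `min_history` cutoff, and the example scores are all hypothetical.

```python
def weighted_hybrid(scores_a, scores_b, weight_a=0.5):
    """Weighted: blend two recommenders' scores item by item."""
    items = set(scores_a) | set(scores_b)
    return {
        item: weight_a * scores_a.get(item, 0.0)
        + (1 - weight_a) * scores_b.get(item, 0.0)
        for item in items
    }


def switching_hybrid(history_length, scores_rules, scores_cf, min_history=5):
    """Switching: use rule-based scores for users with little history."""
    return scores_rules if history_length < min_history else scores_cf


def cascade_hybrid(scores_first, scores_second, shortlist_size=2):
    """Cascade: shortlist with the first method, re-rank with the second."""
    shortlist = sorted(scores_first, key=scores_first.get, reverse=True)[:shortlist_size]
    return sorted(shortlist, key=lambda item: scores_second.get(item, 0.0), reverse=True)


# Made-up scores from a rule-based and a collaborative recommender.
rule_scores = {"Butter": 0.9, "Beer": 0.2, "Jam": 0.6}
cf_scores = {"Butter": 0.4, "Jam": 0.8, "Eggs": 0.7}

blended = weighted_hybrid(rule_scores, cf_scores, weight_a=0.5)
print(max(blended, key=blended.get))                  # Jam
print(switching_hybrid(2, rule_scores, cf_scores))    # new user: rule-based scores
print(cascade_hybrid(rule_scores, cf_scores))         # ['Jam', 'Butter']
```

Feature augmentation is omitted here because it depends on the downstream model: the scores or fired rules from one method would simply be added as input columns for the other.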

### Common Hybrid Methods

1. **Hybrid Association Rule Mining**  
   - Combine Apriori / FP-Growth with clustering or classification  
   - Example: Cluster customers first, then run Apriori in each cluster for more personalized rules  

2. **Association + Prediction Models**  
   - Use association rules to generate features, then feed them into ML models  
   - Helpful in recommendation systems  

3. **Apriori + Genetic Algorithms**  
   - Genetic algorithms optimize rule selection and reduce irrelevant rules  

4. **Hybrid Collaborative Filtering**  
   - In recommender systems: Combine collaborative or content-based filtering with association rule mining  

5. **Deep Learning Hybrids**  
   - Use embeddings with association rule learning to capture nonlinear relationships more effectively  
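
The "cluster first, then mine rules in each cluster" idea from the first method can be sketched as follows. This assumes the segment labels have already been produced by some clustering step; here they are hard-coded, and the baskets, labels, thresholds, and helper names are all hypothetical. For brevity the miner only looks at item pairs rather than running full Apriori.

```python
from collections import Counter, defaultdict
from itertools import combinations


def pair_rules(transactions, min_support=0.5, min_confidence=0.6):
    """Mine single-item rules x => y from item pairs in one group of baskets."""
    n = len(transactions)
    item_counts, pair_counts = Counter(), Counter()
    for t in transactions:
        items = sorted(set(t))
        item_counts.update(items)
        pair_counts.update(combinations(items, 2))

    rules = []
    for (a, b), count in pair_counts.items():
        if count / n < min_support:
            continue
        for x, y in ((a, b), (b, a)):
            conf = count / item_counts[x]
            if conf >= min_confidence:
                rules.append((x, y, count / n, conf))
    return rules


def rules_per_segment(transactions, segment_labels, **thresholds):
    """Group baskets by segment label, then mine rules inside each group."""
    by_segment = defaultdict(list)
    for t, label in zip(transactions, segment_labels):
        by_segment[label].append(t)
    return {label: pair_rules(ts, **thresholds) for label, ts in by_segment.items()}


# Hypothetical baskets with labels standing in for a clustering result.
transactions = [
    ["Milk", "Bread", "Butter"],
    ["Milk", "Bread"],
    ["Beer", "Chips"],
    ["Beer", "Chips", "Bread"],
    ["Milk", "Butter"],
    ["Beer", "Bread"],
]
labels = ["family", "family", "party", "party", "family", "party"]

segment_rules = rules_per_segment(transactions, labels)
for label, rules in segment_rules.items():
    for x, y, supp, conf in rules:
        print(f"{label}: {x} => {y} (support={supp:.2f}, confidence={conf:.2f})")
```

Mining per segment lets a rule such as Chips => Beer surface in the group where it holds, even if it would fall below the support threshold across all customers.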

### Advantages

- Often more accurate than any single method  
- Can handle large-scale or complex datasets better  
- Improves personalization in recommendations  

### Disadvantages

- More complex to implement and tune  
- Higher computational cost  

## Evaluation Metrics

- **Support**  
- **Confidence**  
- **Lift**  
- **Leverage:**  
  - Difference between the observed co-occurrence frequency and the frequency expected if X and Y were independent  
  - $\mathrm{Leverage(X \Rightarrow Y)} = Support(X \cup Y) - Support(X) \cdot Support(Y)$  
  - Range: within $[-1, 1]$; a value of 0 indicates independence  
  - Use: Identifies how much better the rule is compared to chance  

- **Conviction:**  
  - Ratio of how often X would occur without Y if X and Y were independent to how often X actually occurs without Y  
  - $\mathrm{Conviction(X \Rightarrow Y)} = \dfrac{1 - Support(Y)}{1 - Confidence(X \Rightarrow Y)}$  
  - Higher conviction means stronger implication  
  - If $Confidence = 1$, then $Conviction = \infty$  
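
The leverage and conviction formulas above can be evaluated directly from three support values. The helper `rule_metrics` below is a hypothetical name for this sketch; the inputs are supports for Butter => Milk (0.4 together, 0.6 each) and Butter => Bread (0.4 together, 0.4 and 0.8) taken from the two example datasets in this section.

```python
import math


def rule_metrics(support_xy, support_x, support_y):
    """Compute confidence, lift, leverage and conviction for a rule X => Y."""
    confidence = support_xy / support_x
    lift = confidence / support_y
    leverage = support_xy - support_x * support_y
    # Conviction is infinite when the rule never fails (confidence = 1).
    if math.isclose(confidence, 1.0):
        conviction = math.inf
    else:
        conviction = (1 - support_y) / (1 - confidence)
    return {
        "confidence": confidence,
        "lift": lift,
        "leverage": leverage,
        "conviction": conviction,
    }


butter_milk = rule_metrics(support_xy=0.4, support_x=0.6, support_y=0.6)
butter_bread = rule_metrics(support_xy=0.4, support_x=0.4, support_y=0.8)

print(f"leverage   = {butter_milk['leverage']:.2f}")    # 0.04
print(f"conviction = {butter_milk['conviction']:.2f}")  # 1.20
print(f"conviction = {butter_bread['conviction']}")     # inf
```

A conviction of 1.2 means the rule Butter => Milk would fail 1.2 times as often if the two items were independent.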


In [1]:
# Example: Association Rules Filtered by Lift

import pandas as pd
from mlxtend.frequent_patterns import apriori, association_rules
from mlxtend.preprocessing import TransactionEncoder

# Transactions
dataset = [
    ['Milk', 'Bread', 'Butter'],
    ['Milk', 'Beer'],
    ['Milk', 'Bread'],
    ['Beer', 'Bread'],
    ['Milk', 'Bread', 'Beer', 'Butter']
]

# One-hot encode
te = TransactionEncoder()
te_ary = te.fit(dataset).transform(dataset)
df = pd.DataFrame(te_ary, columns=te.columns_)

# Frequent itemsets
freq_items = apriori(df, min_support=0.4, use_colnames=True)

# Association rules with lift filter
rules = association_rules(freq_items, metric="lift", min_threshold=1.1)
print(rules[['antecedents','consequents','support','confidence','lift']])


       antecedents      consequents  support  confidence      lift
0         (Butter)          (Bread)      0.4    1.000000  1.250000
1          (Bread)         (Butter)      0.4    0.500000  1.250000
2         (Butter)           (Milk)      0.4    1.000000  1.250000
3           (Milk)         (Butter)      0.4    0.500000  1.250000
4  (Bread, Butter)           (Milk)      0.4    1.000000  1.250000
5   (Butter, Milk)          (Bread)      0.4    1.000000  1.250000
6    (Bread, Milk)         (Butter)      0.4    0.666667  1.666667
7         (Butter)    (Bread, Milk)      0.4    1.000000  1.666667
8          (Bread)   (Butter, Milk)      0.4    0.500000  1.250000
9           (Milk)  (Bread, Butter)      0.4    0.500000  1.250000
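
One way to turn the rules above into recommendations is to fire every rule whose antecedent is contained in the customer's basket and suggest the consequent items that are not already there. The sketch below hard-codes the ten rules from the printed table so it runs without mlxtend; the `recommend` function and its ranking by confidence, then lift, are assumptions made for illustration.

```python
# (antecedent, consequent, confidence, lift), copied from the output table above.
rules = [
    ({"Butter"}, {"Bread"}, 1.0, 1.25),
    ({"Bread"}, {"Butter"}, 0.5, 1.25),
    ({"Butter"}, {"Milk"}, 1.0, 1.25),
    ({"Milk"}, {"Butter"}, 0.5, 1.25),
    ({"Bread", "Butter"}, {"Milk"}, 1.0, 1.25),
    ({"Butter", "Milk"}, {"Bread"}, 1.0, 1.25),
    ({"Bread", "Milk"}, {"Butter"}, 0.666667, 1.666667),
    ({"Butter"}, {"Bread", "Milk"}, 1.0, 1.666667),
    ({"Bread"}, {"Butter", "Milk"}, 0.5, 1.25),
    ({"Milk"}, {"Bread", "Butter"}, 0.5, 1.25),
]


def recommend(basket, rules, top_n=3):
    """Suggest items not yet in the basket, ranked by confidence, then lift."""
    basket = set(basket)
    best = {}  # item -> (confidence, lift) of the strongest rule suggesting it
    for antecedent, consequent, confidence, lift in rules:
        if antecedent <= basket:  # the rule fires only if the basket covers X
            for item in consequent - basket:
                best[item] = max(best.get(item, (0.0, 0.0)), (confidence, lift))
    ranked = sorted(best, key=lambda item: (-best[item][0], -best[item][1], item))
    return ranked[:top_n]


print(recommend({"Bread", "Milk"}, rules))  # ['Butter']
print(recommend({"Butter"}, rules))         # ['Bread', 'Milk']
print(recommend({"Beer"}, rules))           # []
```

The empty result for a Beer-only basket shows a typical limitation of rule-based recommenders: items with no rule above the thresholds get no suggestions, which is one motivation for the hybrid methods described earlier.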
