### ASSOCIATION RULE LEARNING

### **What is Association Rule Learning?**

Association Rule Learning is a machine learning technique used to **identify relationships between variables** (typically items) in **transaction datasets**.

It is widely used in:

* Market basket analysis
* Recommender systems
* Inventory management

The idea is to find **frequent patterns**, **correlations**, or **associations** among sets of items in large databases.

---

#### **Terminology**

* **Itemset**: A collection of one or more items (e.g., {milk, bread}).
* **Transaction**: A single occurrence (e.g., one customer’s purchase).
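* **Support**: The fraction of transactions that contain an itemset.
* **Confidence**: The probability that a transaction contains B given that it contains A, i.e. Support(A ∪ B) / Support(A).
* **Lift**: How much more likely B is to be purchased when A is purchased than it is overall, i.e. Confidence(A → B) / Support(B). A lift above 1 indicates a positive association.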

---

#### **Mathematical Example**

Consider the following 5 transactions in a grocery store:

| Transaction ID | Items Bought        |
| -------------- | ------------------- |
| T1             | Milk, Bread, Butter |
| T2             | Bread, Butter       |
| T3             | Milk, Bread         |
| T4             | Milk, Butter        |
| T5             | Bread, Butter       |

Now, we want to generate the rule:
**{Milk} → {Butter}**

---

### Step 1: Support

Support is the proportion of transactions where **both Milk and Butter** are bought.

$$
\text{Support}(\{Milk, Butter\}) = \frac{\text{Count of transactions with Milk and Butter}}{\text{Total transactions}} = \frac{2}{5} = 0.4
$$

T1 and T4 include both Milk and Butter.
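As a quick check, the same support value can be computed in Python. This is a minimal sketch that assumes each transaction is stored as a set of item names; `support` is an illustrative helper, not a library function.

```python
# The five example transactions, one set of items per transaction
transactions = [
    {"Milk", "Bread", "Butter"},  # T1
    {"Bread", "Butter"},          # T2
    {"Milk", "Bread"},            # T3
    {"Milk", "Butter"},           # T4
    {"Bread", "Butter"},          # T5
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)


print(support({"Milk", "Butter"}, transactions))  # 0.4
```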

---

### Step 2: Confidence

Confidence is the proportion of transactions with **Milk** that also contain **Butter**.

$$
\text{Confidence}(\{Milk\} \rightarrow \{Butter\}) = \frac{\text{Support}(\{Milk, Butter\})}{\text{Support}(\{Milk\})}
$$

* Transactions with Milk: T1, T3, T4 → 3 transactions
* Transactions with Milk and Butter: T1, T4 → 2 transactions

$$
\text{Confidence} = \frac{2}{3} \approx 0.667
$$

So, when Milk is bought, Butter is also bought about 66.7% of the time.
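The confidence calculation can be sketched the same way, as a ratio of two supports. The helpers `support` and `confidence` are illustrative names (not from any library), and transactions are again assumed to be Python sets.

```python
transactions = [
    {"Milk", "Bread", "Butter"},  # T1
    {"Bread", "Butter"},          # T2
    {"Milk", "Bread"},            # T3
    {"Milk", "Butter"},           # T4
    {"Bread", "Butter"},          # T5
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(1 for t in transactions if itemset <= t) / len(transactions)


def confidence(antecedent, consequent, transactions):
    """Support(A ∪ B) / Support(A): share of A-transactions that also contain B."""
    both = set(antecedent) | set(consequent)
    return support(both, transactions) / support(antecedent, transactions)


print(round(confidence({"Milk"}, {"Butter"}, transactions), 3))  # 0.667
```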

---

### Step 3: Lift

Lift measures whether Milk and Butter are bought together **more often than expected** if they were independent.

$$
\text{Lift}(\{Milk\} \rightarrow \{Butter\}) = \frac{\text{Confidence}(\{Milk\} \rightarrow \{Butter\})}{\text{Support}(\{Butter\})}
$$

* Transactions with Butter: T1, T2, T4, T5 → 4 transactions

$$
\text{Support}(\{Butter\}) = \frac{4}{5} = 0.8
$$

$$
\text{Lift} = \frac{2/3}{0.8} = \frac{5}{6} \approx 0.833
$$

Since Lift < 1, Milk and Butter are bought together **less often than would be expected if they were independent**, so the rule does **not** indicate a positive association.
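Lift can also be computed directly from raw transaction counts. The plain-Python sketch below (transactions assumed to be sets) uses the equivalent form Support(A ∪ B) / (Support(A) × Support(B)), which also shows that lift is symmetric: Lift(Milk → Butter) equals Lift(Butter → Milk).

```python
transactions = [
    {"Milk", "Bread", "Butter"},  # T1
    {"Bread", "Butter"},          # T2
    {"Milk", "Bread"},            # T3
    {"Milk", "Butter"},           # T4
    {"Bread", "Butter"},          # T5
]

n = len(transactions)
n_milk = sum(1 for t in transactions if "Milk" in t)              # 3
n_butter = sum(1 for t in transactions if "Butter" in t)          # 4
n_both = sum(1 for t in transactions if {"Milk", "Butter"} <= t)  # 2

# Lift = Support(A ∪ B) / (Support(A) * Support(B)); swapping A and B gives the same value
lift = (n_both / n) / ((n_milk / n) * (n_butter / n))
print(round(lift, 3))  # 0.833
```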

---

### **How Rules Are Chosen**

Rules are filtered using **minimum thresholds** for:

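* **Support**: To eliminate rare itemsets
* **Confidence**: To keep only rules whose consequent usually appears when the antecedent does
* **Lift** (or other interestingness measures): To keep rules whose items co-occur more often than independence would predict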
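The sketch below scores every single-item rule A → B on the five grocery transactions and then applies the thresholds. The threshold values (support 0.4, confidence 0.6, lift 1.0) are illustrative assumptions, not recommendations from the text, and `count` is a hypothetical helper.

```python
from itertools import permutations

transactions = [
    {"Milk", "Bread", "Butter"},  # T1
    {"Bread", "Butter"},          # T2
    {"Milk", "Bread"},            # T3
    {"Milk", "Butter"},           # T4
    {"Bread", "Butter"},          # T5
]
n = len(transactions)
items = sorted(set().union(*transactions))  # ['Bread', 'Butter', 'Milk']


def count(itemset):
    """Number of transactions containing every item in `itemset`."""
    return sum(1 for t in transactions if itemset <= t)


# Score every rule with one item on each side, A -> B
rules = []
for a, b in permutations(items, 2):
    supp = count({a, b}) / n
    conf = count({a, b}) / count({a})
    lift = conf / (count({b}) / n)
    rules.append((a, b, supp, conf, lift))

# Illustrative thresholds (assumed values)
MIN_SUPPORT, MIN_CONFIDENCE, MIN_LIFT = 0.4, 0.6, 1.0

kept = [r for r in rules if r[2] >= MIN_SUPPORT and r[3] >= MIN_CONFIDENCE]
strong = [r for r in kept if r[4] >= MIN_LIFT]

for a, b, supp, conf, lift in kept:
    print(f"{a} -> {b}: support={supp:.2f}, confidence={conf:.3f}, lift={lift:.3f}")
print(f"Rules that also pass the lift threshold: {len(strong)}")
```

On this toy dataset four rules pass the support and confidence thresholds, but all of them have lift below 1, so a lift threshold of 1 removes every one. A rule can have high confidence simply because its consequent is a popular item, which is why lift is worth checking alongside confidence.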

---

### **Common Algorithms**

1. **Apriori Algorithm**

   * Iteratively generates larger frequent itemsets from smaller ones.
   * Prunes itemsets that don’t meet minimum support.

2. **FP-Growth (Frequent Pattern Growth)**

   * Usually more efficient than Apriori on large datasets.
   * Uses a compact tree structure (FP-tree) to avoid candidate generation.

3. **ECLAT (Equivalence Class Transformation)**

   * Uses a vertical data format and set intersection.
   * Often faster for dense datasets.
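To illustrate the vertical format, the sketch below converts the grocery transactions from the example above into per-item transaction-ID sets (TID sets). The support count of any itemset is then the size of the intersection of its items' TID sets. This shows only the counting idea; a full ECLAT implementation would also extend itemsets depth-first, intersecting TID sets as it goes. The name `support_count` is illustrative.

```python
from functools import reduce

# Horizontal format: transaction ID -> items bought
horizontal = {
    "T1": {"Milk", "Bread", "Butter"},
    "T2": {"Bread", "Butter"},
    "T3": {"Milk", "Bread"},
    "T4": {"Milk", "Butter"},
    "T5": {"Bread", "Butter"},
}

# Vertical format: item -> set of transaction IDs that contain it (its TID set)
vertical = {}
for tid, items in horizontal.items():
    for item in items:
        vertical.setdefault(item, set()).add(tid)


def support_count(itemset):
    """Support count = size of the intersection of the items' TID sets."""
    return len(reduce(set.intersection, (vertical[i] for i in itemset)))


print(sorted(vertical["Milk"]))                    # ['T1', 'T3', 'T4']
print(support_count({"Milk", "Butter"}))           # 2
print(support_count({"Milk", "Bread", "Butter"}))  # 1
```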

---

### **Conclusion**

Association Rule Learning is valuable for uncovering hidden patterns in transactional data. By analyzing itemsets using **support**, **confidence**, and **lift**, you can discover actionable insights like cross-sell opportunities, bundling strategies, and customer behavior.

If you want, I can show you how to implement this in Python using `mlxtend` or `apyori`.


### **Apriori Algorithm**

The **Apriori Algorithm** is a classic **association rule learning algorithm** used to identify **frequent itemsets** in transactional datasets and generate **association rules** from them.

It is widely used in **market basket analysis** and other tasks where the goal is to discover which items frequently appear together.

---

### **Key Idea**

Apriori is based on the **“Apriori Property”**:

> *If an itemset is frequent, then all of its subsets must also be frequent.*

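Equivalently, if an itemset is infrequent, all of its supersets are infrequent too. Apriori uses this to **reduce the number of candidate itemsets**: any candidate that contains an infrequent subset can be discarded without counting its support.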
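The property can be checked numerically on the five grocery transactions used throughout this document (copied here so the snippet runs on its own). In this plain-Python sketch, no subset of {Bread, Butter, Milk} has lower support than the full itemset. Because support can only stay the same or drop as items are added, an infrequent subset rules out all of its supersets. The `support` helper is illustrative.

```python
from itertools import combinations

transactions = [
    {"Milk", "Bread", "Butter"},  # T1
    {"Bread", "Butter"},          # T2
    {"Milk", "Bread"},            # T3
    {"Milk", "Butter"},           # T4
    {"Bread", "Butter"},          # T5
]


def support(itemset):
    """Fraction of transactions containing every item in `itemset`."""
    return sum(1 for t in transactions if set(itemset) <= t) / len(transactions)


full = ("Bread", "Butter", "Milk")
# Support of every non-empty proper subset of the 3-itemset
subset_supports = {
    sub: support(sub) for k in range(1, len(full)) for sub in combinations(full, k)
}
# Anti-monotonicity: no subset may be less frequent than the itemset containing it
violations = [sub for sub, s in subset_supports.items() if s < support(full)]

print(support(full))  # 0.2
print(violations)     # []
```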

---

### **Steps of the Apriori Algorithm**

#### **1. Set Minimum Thresholds**

* **Minimum Support**: The minimum support an itemset must have to be considered frequent.
* **Minimum Confidence**: The minimum confidence a rule must have to be kept.

---

#### **2. Generate Frequent Itemsets**

Start with itemsets of size 1 and grow them iteratively:

* **Step 1:** Find all 1-itemsets that satisfy minimum support.
* **Step 2:** Use those to generate 2-itemsets.
* **Step 3:** Keep generating k-itemsets by joining frequent (k−1)-itemsets.
* At each step, **prune** itemsets that contain **infrequent subsets**.

This is known as a **breadth-first search** through the itemset lattice.
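A compact sketch of this level-wise search in plain Python follows. It is an illustrative implementation, not the original Apriori pseudocode: the join step simply unions pairs of frequent (k−1)-itemsets whose union has k items, rather than using the classic sorted-prefix join, and support is counted with a simple scan over the transactions. The function name `apriori_frequent_itemsets` is hypothetical.

```python
from itertools import combinations


def apriori_frequent_itemsets(transactions, min_support):
    """Return {frozenset(itemset): support} for every frequent itemset (level-wise search)."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)

    def support(itemset):
        # Fraction of transactions containing every item of `itemset`
        return sum(1 for t in transactions if itemset <= t) / n

    # Level 1: frequent single items
    items = {item for t in transactions for item in t}
    current = {}
    for item in items:
        s = support(frozenset([item]))
        if s >= min_support:
            current[frozenset([item])] = s
    frequent = dict(current)

    k = 2
    while current:
        # Join: combine frequent (k-1)-itemsets whose union has exactly k items
        candidates = {a | b for a, b in combinations(current, 2) if len(a | b) == k}
        # Prune: drop any candidate with an infrequent (k-1)-subset (Apriori property)
        candidates = {
            c for c in candidates
            if all(frozenset(sub) in current for sub in combinations(c, k - 1))
        }
        # Count support of the surviving candidates and keep only the frequent ones
        counted = {c: support(c) for c in candidates}
        current = {c: s for c, s in counted.items() if s >= min_support}
        frequent.update(current)
        k += 1
    return frequent


transactions = [
    {"Milk", "Bread", "Butter"},  # T1
    {"Bread", "Butter"},          # T2
    {"Milk", "Bread"},            # T3
    {"Milk", "Butter"},           # T4
    {"Bread", "Butter"},          # T5
]
result = apriori_frequent_itemsets(transactions, min_support=0.4)
for itemset, s in sorted(result.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), s)
```

On these five transactions with a minimum support of 0.4, the sketch returns the three single items and the three pairs. The only 3-itemset candidate, {Bread, Butter, Milk}, survives pruning (all its pairs are frequent) but fails the support check at 0.2.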

---

#### **3. Generate Association Rules**

For each frequent itemset:

* Generate rules of the form $A \rightarrow B$
* Compute **confidence** and **lift**
* Keep rules that meet the thresholds
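Rule generation can be sketched as follows, assuming the frequent itemsets and their supports are already available (for example, from a level-wise search as described above). `generate_rules` is a hypothetical helper; the support values in the usage example are the ones computed in the worked example below.

```python
from itertools import combinations


def generate_rules(frequent, min_confidence):
    """Return (antecedent, consequent, support, confidence, lift) for rules meeting min_confidence.

    `frequent` maps frozenset itemsets to their support and must include every
    subset of each itemset, which the Apriori property guarantees.
    """
    rules = []
    for itemset, supp in frequent.items():
        if len(itemset) < 2:
            continue  # a rule needs a non-empty antecedent and consequent
        # Every non-empty proper subset can serve as the antecedent A
        for r in range(1, len(itemset)):
            for antecedent in map(frozenset, combinations(itemset, r)):
                consequent = itemset - antecedent
                conf = supp / frequent[antecedent]
                lift = conf / frequent[consequent]
                if conf >= min_confidence:
                    rules.append((antecedent, consequent, supp, conf, lift))
    return rules


# Supports of the frequent itemsets from the worked example (minimum support 0.4)
frequent = {
    frozenset({"Milk"}): 0.6,
    frozenset({"Bread"}): 0.8,
    frozenset({"Butter"}): 0.8,
    frozenset({"Milk", "Bread"}): 0.4,
    frozenset({"Milk", "Butter"}): 0.4,
    frozenset({"Bread", "Butter"}): 0.6,
}

for a, b, supp, conf, lift in generate_rules(frequent, min_confidence=0.6):
    print(f"{set(a)} -> {set(b)}: confidence={conf:.3f}, lift={lift:.3f}")
```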

---

### **Example**

Let’s consider a dataset of 5 transactions:

| Transaction ID | Items Bought        |
| -------------- | ------------------- |
| T1             | Milk, Bread, Butter |
| T2             | Bread, Butter       |
| T3             | Milk, Bread         |
| T4             | Milk, Butter        |
| T5             | Bread, Butter       |

Assume:

* Minimum Support = 0.4 (40%)
* Minimum Confidence = 0.6 (60%)

---

#### **Step 1: Find Frequent 1-itemsets**

Count how often each item appears:

* Milk: 3/5 = 0.6 → Frequent
* Bread: 4/5 = 0.8 → Frequent
* Butter: 4/5 = 0.8 → Frequent

All 1-itemsets are frequent.
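Step 1 amounts to counting item occurrences. A minimal Python sketch, assuming each transaction is a set so that an item is counted at most once per transaction:

```python
from collections import Counter

transactions = [
    {"Milk", "Bread", "Butter"},  # T1
    {"Bread", "Butter"},          # T2
    {"Milk", "Bread"},            # T3
    {"Milk", "Butter"},           # T4
    {"Bread", "Butter"},          # T5
]
min_support = 0.4
n = len(transactions)

# Count the transactions in which each item appears
item_counts = Counter(item for t in transactions for item in t)
frequent_1 = {item: c / n for item, c in item_counts.items() if c / n >= min_support}

print(sorted(frequent_1.items()))  # [('Bread', 0.8), ('Butter', 0.8), ('Milk', 0.6)]
```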

---

#### **Step 2: Generate and Filter 2-itemsets**

Possible 2-itemsets:

* {Milk, Bread} → Support = 2/5 = 0.4 ✔
* {Milk, Butter} → Support = 2/5 = 0.4 ✔
* {Bread, Butter} → Support = 3/5 = 0.6 ✔

All are frequent.
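Step 2 can be sketched by enumerating every pair of frequent items with `itertools.combinations` and counting how many transactions contain each pair. The list of frequent items is taken from Step 1; everything else is plain Python.

```python
from itertools import combinations

transactions = [
    {"Milk", "Bread", "Butter"},  # T1
    {"Bread", "Butter"},          # T2
    {"Milk", "Bread"},            # T3
    {"Milk", "Butter"},           # T4
    {"Bread", "Butter"},          # T5
]
frequent_items = ["Bread", "Butter", "Milk"]  # every 1-itemset passed Step 1
n = len(transactions)

# Candidate 2-itemsets are all pairs of frequent items;
# support = share of transactions that contain both items of the pair
pair_support = {
    pair: sum(1 for t in transactions if set(pair) <= t) / n
    for pair in combinations(frequent_items, 2)
}

print(pair_support)
# {('Bread', 'Butter'): 0.6, ('Bread', 'Milk'): 0.4, ('Butter', 'Milk'): 0.4}
```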

---

#### **Step 3: Generate and Filter 3-itemsets**

Possible 3-itemset:

* {Milk, Bread, Butter} → Support = 1/5 = 0.2 ✘ (not frequent)

Since no 3-itemset is frequent, no 4-itemset candidates can be formed, so the search stops here.

---

#### **Step 4: Generate Rules**

From {Milk, Bread}, generate:

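* Rule: Milk → Bread, Confidence = 2/3 ≈ 0.667 ✔
* Rule: Bread → Milk, Confidence = 2/4 = 0.5 ✘ (less than 0.6)

Keep only rules that meet minimum confidence. The same procedure applies to the other frequent 2-itemsets: from {Milk, Butter}, Milk → Butter is kept (2/3 ≈ 0.667) and Butter → Milk is dropped (2/4 = 0.5); from {Bread, Butter}, both directions are kept (3/4 = 0.75).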

---

### **Advantages**

* Simple and easy to implement
* Guarantees completeness (will find all frequent itemsets)

---

### **Disadvantages**

* Computationally expensive on large datasets
* Requires multiple scans of the dataset
* Can generate a large number of candidate itemsets

### Importing the libraries

```python
!pip install apyori
```

```python
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
```

### Data Preprocessing

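```python
df = pd.read_csv('../datasets/Market_Basket_Optimisation.csv', header = None)
# apyori expects a list of transactions, each a list of item names.
# Shorter rows are padded with NaN by pandas; skip those cells so that
# the string 'nan' is not counted as a purchased item.
transactions = [[str(item) for item in row if pd.notna(item)] for row in df.values]
```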

### Training the Apriori model on the dataset

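```python
from apyori import apriori
# apriori returns a lazy generator of RelationRecord objects.
# max_length = 2 limits itemsets to pairs, i.e. one item on each side of a rule;
# min_length is not one of apyori's documented keyword arguments and has no effect.
rules = apriori(transactions = transactions, min_support = 0.003, min_confidence = 0.2, min_lift = 3, min_length = 2, max_length = 2)
```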

### Visualising the results

#### **Displaying the raw output of the apriori function**

```python
results = list(rules)
results
```

```
[RelationRecord(items=frozenset({'light cream', 'chicken'}), support=0.004532728969470737, ordered_statistics=[OrderedStatistic(items_base=frozenset({'light cream'}), items_add=frozenset({'chicken'}), confidence=0.29059829059829057, lift=4.84395061728395)]),
 RelationRecord(items=frozenset({'mushroom cream sauce', 'escalope'}), support=0.005732568990801226, ordered_statistics=[OrderedStatistic(items_base=frozenset({'mushroom cream sauce'}), items_add=frozenset({'escalope'}), confidence=0.3006993006993007, lift=3.790832696715049)]),
 RelationRecord(items=frozenset({'pasta', 'escalope'}), support=0.005865884548726837, ordered_statistics=[OrderedStatistic(items_base=frozenset({'pasta'}), items_add=frozenset({'escalope'}), confidence=0.3728813559322034, lift=4.700811850163794)]),
 RelationRecord(items=frozenset({'fromage blanc', 'honey'}), support=0.003332888948140248, ordered_statistics=[OrderedStatistic(items_base=frozenset({'fromage blanc'}), items_add=frozenset({'honey'}), confidence=0
```

#### **Organising the results into a Pandas DataFrame**

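```python
def inspect(results):
    # For each RelationRecord, keep only its first OrderedStatistic (one rule per record).
    # With max_length = 2, items_base and items_add each hold exactly one item.
    lhs         = [tuple(result.ordered_statistics[0].items_base)[0] for result in results]
    rhs         = [tuple(result.ordered_statistics[0].items_add)[0] for result in results]
    supports    = [result.support for result in results]
    confidences = [result.ordered_statistics[0].confidence for result in results]
    lifts       = [result.ordered_statistics[0].lift for result in results]
    return list(zip(lhs, rhs, supports, confidences, lifts))
resultsinDataFrame = pd.DataFrame(inspect(results), columns = ['Left Hand Side', 'Right Hand Side', 'Support', 'Confidence', 'Lift'])
```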

#### **Displaying the results unsorted**

```python
resultsinDataFrame
```

|   | Left Hand Side       | Right Hand Side | Support  | Confidence | Lift     |
| - | -------------------- | --------------- | -------- | ---------- | -------- |
| 0 | light cream          | chicken         | 0.004533 | 0.290598   | 4.843951 |
| 1 | mushroom cream sauce | escalope        | 0.005733 | 0.300699   | 3.790833 |
| 2 | pasta                | escalope        | 0.005866 | 0.372881   | 4.700812 |
| 3 | fromage blanc        | honey           | 0.003333 | 0.245098   | 5.164271 |
| 4 | herb & pepper        | ground beef     | 0.015998 | 0.32345    | 3.291994 |
| 5 | tomato sauce         | ground beef     | 0.005333 | 0.377358   | 3.840659 |
| 6 | light cream          | olive oil       | 0.0032   | 0.205128   | 3.11471  |
| 7 | whole wheat pasta    | olive oil       | 0.007999 | 0.271493   | 4.12241  |
| 8 | pasta                | shrimp          | 0.005066 | 0.322034   | 4.506672 |


#### **Displaying the results sorted by descending lifts**

```python
resultsinDataFrame.nlargest(n = 10, columns = 'Lift')
```

|   | Left Hand Side       | Right Hand Side | Support  | Confidence | Lift     |
| - | -------------------- | --------------- | -------- | ---------- | -------- |
| 3 | fromage blanc        | honey           | 0.003333 | 0.245098   | 5.164271 |
| 0 | light cream          | chicken         | 0.004533 | 0.290598   | 4.843951 |
| 2 | pasta                | escalope        | 0.005866 | 0.372881   | 4.700812 |
| 8 | pasta                | shrimp          | 0.005066 | 0.322034   | 4.506672 |
| 7 | whole wheat pasta    | olive oil       | 0.007999 | 0.271493   | 4.12241  |
| 5 | tomato sauce         | ground beef     | 0.005333 | 0.377358   | 3.840659 |
| 1 | mushroom cream sauce | escalope        | 0.005733 | 0.300699   | 3.790833 |
| 4 | herb & pepper        | ground beef     | 0.015998 | 0.32345    | 3.291994 |
| 6 | light cream          | olive oil       | 0.0032   | 0.205128   | 3.11471  |
