# 📘 **1.4 Performance Metrics in Association Rule Mining – Detailed Notes**

---

## 🔰 **1. Introduction to Association Rule Mining**

Association Rule Mining is a technique used to identify relationships or patterns among a set of items in **transaction databases**, relational databases, or other data repositories.  
It’s mainly used in **market basket analysis**, where we analyze which items are frequently bought together.

A typical **association rule** has the form:

> **X ⇒ Y**  
> (meaning: *If itemset X is bought, itemset Y is also likely to be bought*)

Where:
- **X** = Antecedent (Left-hand side, condition)
- **Y** = Consequent (Right-hand side, result)

---

## 🧮 **2. Key Terminologies**

| Term         | Description |
|--------------|-------------|
| **Item**     | A single product or entity (e.g., bread) |
| **Itemset**  | A group/set of items (e.g., {bread, butter}) |
| **Transaction** | A record of items bought together (e.g., in a receipt) |
| **Support Count (σ)** | Number of transactions that contain the itemset |
| **Rule**     | An implication in the form of X ⇒ Y |
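To make these terms concrete, here is a minimal Python sketch that stores each transaction as a `frozenset` of items and computes the support count σ. It reuses the five transactions from the market-basket example in Section 6. The names `TRANSACTIONS` and `support_count` are illustrative and do not come from any library.

```python
# Each transaction is a frozenset of item names (the five baskets from Section 6).
TRANSACTIONS = [
    frozenset({"Milk", "Bread"}),
    frozenset({"Milk", "Butter"}),
    frozenset({"Bread", "Butter"}),
    frozenset({"Milk", "Bread", "Butter"}),
    frozenset({"Milk", "Bread"}),
]


def support_count(itemset, transactions):
    """Support count (sigma): how many transactions contain every item of `itemset`."""
    target = frozenset(itemset)
    # `target <= t` is a subset test: the transaction holds all items of the itemset.
    return sum(1 for t in transactions if target <= t)


print(support_count({"Milk"}, TRANSACTIONS))           # 4
print(support_count({"Milk", "Bread"}, TRANSACTIONS))  # 3
```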

---

## 📏 **3. Performance Metrics (Evaluation Measures)**

The quality of an association rule is measured using the following metrics:

---

### 🔹 3.1 **Support**

**Definition**:  
Support indicates how frequently an itemset appears in the entire dataset. It helps to filter out infrequent itemsets.

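**Formula**:  
$$
\text{Support}(X ⇒ Y) = \frac{\text{Number of transactions containing both } X \text{ and } Y}{\text{Total number of transactions}} = P(X ∪ Y)
$$

Here $X ∪ Y$ is the union of the two **itemsets**, so $P(X ∪ Y)$ is the probability that a transaction contains every item of both $X$ and $Y$ (not the probability of "$X$ or $Y$").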

**Purpose**:
- Measures the **popularity** of an itemset.
- Helps in finding relevant rules that occur frequently.

**Example**:
- Out of 100 transactions, 20 contain both milk and bread.  
- Support = $20 / 100 = \mathbf{0.20}$ (or 20%)
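The support formula translates directly into code. This is a minimal sketch, assuming transactions are Python sets. Because the 100 transactions behind the examples are not listed, it builds a hypothetical dataset that only reproduces the counts used in this section (100 transactions: 40 with milk, 30 with bread, 20 with both); the filler item `"eggs"` is made up.

```python
def support(itemset, transactions):
    """Support = (transactions containing every item of `itemset`) / (all transactions)."""
    target = frozenset(itemset)
    hits = sum(1 for t in transactions if target <= t)
    return hits / len(transactions)


# Hypothetical data: 20 milk+bread, 20 milk-only, 10 bread-only, 50 filler baskets.
transactions = (
    [frozenset({"milk", "bread"})] * 20
    + [frozenset({"milk"})] * 20
    + [frozenset({"bread"})] * 10
    + [frozenset({"eggs"})] * 50
)

# For a rule X => Y, pass the itemset union X | Y.
print(support({"milk", "bread"}, transactions))  # 0.2
```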

---

### 🔹 3.2 **Confidence**

**Definition**:  
Confidence is the conditional probability that a transaction contains itemset Y, given that it contains itemset X.

**Formula**:  
$$
\text{Confidence}(X ⇒ Y) = \frac{\text{Support}(X ∪ Y)}{\text{Support}(X)} = P(Y|X)
$$

**Purpose**:
- Measures the **reliability** (predictive accuracy) of the rule.
- Reflects **how often** the rule holds among the transactions that contain X.

**Example**:
- Of the same 100 transactions, 40 contain milk, and 20 contain both milk and bread.  
- Confidence = $20 / 40 = \mathbf{0.50}$ (or 50%)
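Confidence can be computed from raw counts, because the total number of transactions cancels in Support(X ∪ Y) / Support(X). A minimal sketch, again on hypothetical data built only to match the example's counts (40 milk baskets, 30 bread baskets, 20 with both); `confidence` is an illustrative helper, not a library function.

```python
def confidence(antecedent, consequent, transactions):
    """Confidence(X => Y) = count(X u Y) / count(X), an estimate of P(Y | X)."""
    x = frozenset(antecedent)
    xy = x | frozenset(consequent)  # itemset union: the basket must hold X and Y
    count_x = sum(1 for t in transactions if x <= t)
    count_xy = sum(1 for t in transactions if xy <= t)
    if count_x == 0:
        raise ValueError("antecedent never occurs, so confidence is undefined")
    return count_xy / count_x


# Hypothetical data: 20 milk+bread, 20 milk-only, 10 bread-only, 50 filler baskets.
transactions = (
    [frozenset({"milk", "bread"})] * 20
    + [frozenset({"milk"})] * 20
    + [frozenset({"bread"})] * 10
    + [frozenset({"eggs"})] * 50
)

print(confidence({"milk"}, {"bread"}, transactions))            # 0.5
print(round(confidence({"bread"}, {"milk"}, transactions), 2))  # 0.67
```

The second call shows that confidence is not symmetric: Bread ⇒ Milk divides by the 30 bread baskets instead of the 40 milk baskets.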

---

### 🔹 3.3 **Lift**

**Definition**:  
Lift shows how much more likely Y is to occur with X than by random chance.

**Formula**:  
$$
\text{Lift}(X ⇒ Y) = \frac{\text{Confidence}(X ⇒ Y)}{\text{Support}(Y)} = \frac{P(Y|X)}{P(Y)}
$$

**Purpose**:
- Measures **correlation** between X and Y.
- Shows how much more likely Y is purchased when X is purchased, compared to random probability.

**Interpretation**:
- **Lift > 1**: Positive correlation  
- **Lift = 1**: No correlation  
- **Lift < 1**: Negative correlation

**Example**:
- Confidence(Milk ⇒ Bread) = 0.50, Support(Bread) = 0.30  
- Lift = $0.50 / 0.30 \approx \mathbf{1.67}$
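Lift only needs the rule's confidence and the consequent's support. Below is a minimal sketch of the formula plus a helper that applies the interpretation thresholds above. The function names and the small tolerance `tol` (used so that values within rounding error of 1 count as "no correlation") are assumptions for illustration.

```python
def lift(confidence_xy, support_y):
    """Lift(X => Y) = Confidence(X => Y) / Support(Y) = P(Y | X) / P(Y)."""
    if support_y <= 0:
        raise ValueError("Support(Y) must be positive")
    return confidence_xy / support_y


def interpret_lift(value, tol=1e-9):
    """Map a lift value to the interpretation listed above."""
    if value > 1 + tol:
        return "positive correlation"
    if value < 1 - tol:
        return "negative correlation"
    return "no correlation"


value = lift(0.50, 0.30)
print(round(value, 2), interpret_lift(value))  # 1.67 positive correlation
```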

---

### 🔹 3.4 **Leverage**

**Definition**:  
Leverage measures the difference between the observed frequency of X and Y occurring together and the expected frequency if they were independent.

**Formula**:  
$$
\text{Leverage}(X ⇒ Y) = \text{Support}(X ∪ Y) - (\text{Support}(X) \times \text{Support}(Y))
$$

**Purpose**:
- Measures the **statistical dependence** of X and Y.

**Example**:
- Support(X ∪ Y) = 0.20  
- Support(X) = 0.40, Support(Y) = 0.30  
- Leverage = $0.20 - (0.4 × 0.3) = \mathbf{0.08}$
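Leverage is a plain difference between observed and expected support. This minimal sketch (the function name is illustrative) reproduces the example and also evaluates the Milk ⇒ Bread rule from the five-transaction example in Section 6, where leverage turns negative.

```python
def leverage(support_xy, support_x, support_y):
    """Leverage(X => Y) = Support(X u Y) - Support(X) * Support(Y)."""
    return support_xy - support_x * support_y


# Example from the text: observed co-occurrence 0.20 vs. expected 0.4 * 0.3 = 0.12.
print(round(leverage(0.20, 0.40, 0.30), 2))  # 0.08

# Milk => Bread on the Section 6 data: observed 0.6 vs. expected 0.8 * 0.8 = 0.64.
print(round(leverage(0.60, 0.80, 0.80), 2))  # -0.04
```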

---

### 🔹 3.5 **Conviction**

**Definition**:  
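Conviction is the ratio of the expected frequency of X appearing **without** Y (if X and Y were independent) to the observed frequency of X appearing without Y, i.e., how often the rule X ⇒ Y would be wrong by chance compared with how often it is actually wrong.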

**Formula**:  
$$
\text{Conviction}(X ⇒ Y) = \frac{1 - \text{Support}(Y)}{1 - \text{Confidence}(X ⇒ Y)}
$$

**Purpose**:
- Measures the **strength and direction** of implication.

**Interpretation**:
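- Conviction = 1 → X and Y are independent; the rule adds no information  
- Conviction > 1 → Y follows X more often than chance; larger values mean a stronger dependency (conviction is infinite when confidence = 1)  
- Conviction < 1 → X is associated with the **absence** of Y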

**Example**:
- Support(Y) = 0.30, Confidence = 0.50  
- Conviction = $(1 - 0.3) / (1 - 0.5) = 0.7 / 0.5 = \mathbf{1.4}$
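A minimal sketch of conviction, assuming supports are given as fractions. It also computes the algebraically equivalent "expected vs. observed" form, P(X)·P(not Y) / P(X and not Y), from the counts used in this section (100 transactions, 40 with milk, 30 with bread, 20 with both) to show that both give 1.4. The function names are illustrative.

```python
def conviction(support_y, confidence_xy):
    """Conviction(X => Y) = (1 - Support(Y)) / (1 - Confidence(X => Y))."""
    if confidence_xy >= 1:
        # X never occurs without Y, so the rule never fails: conviction is infinite.
        return float("inf")
    return (1 - support_y) / (1 - confidence_xy)


def conviction_from_counts(n, count_x, count_y, count_xy):
    """Same measure written as expected / observed frequency of 'X without Y'."""
    expected = (count_x / n) * (1 - count_y / n)  # P(X) * P(not Y) under independence
    observed = (count_x - count_xy) / n           # P(X and not Y) in the data
    return float("inf") if observed == 0 else expected / observed


print(round(conviction(0.30, 0.50), 2))                   # 1.4
print(round(conviction_from_counts(100, 40, 30, 20), 2))  # 1.4
```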

---

## 🧠 **4. Summary Table**

| Metric     | Value Range | Good Indicator Of | Best Use |
|------------|-------------|-------------------|----------|
| **Support**    | 0 to 1       | Frequency           | Rule relevance |
| **Confidence** | 0 to 1       | Accuracy             | Rule reliability |
| **Lift**       | 0 to ∞       | Correlation          | Pattern strength |
| **Leverage**   | -0.25 to 0.25 | Statistical dependence | Independence check |
| **Conviction** | > 0 (∞ when confidence = 1) | Rule implication     | Rule direction |

---

## ✅ **5. Practical Guidelines**

- Use **support** to eliminate rare itemsets early.  
- Use **confidence** to check the accuracy of rules.  
- Use **lift > 1** to identify interesting patterns.  
- Use **leverage** and **conviction** to catch misleading rules: a high-confidence rule whose consequent is simply popular shows leverage ≤ 0 and conviction ≤ 1.
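These guidelines can be sketched as a simple threshold filter. The code below is a deliberate simplification, not the Apriori algorithm: it enumerates every one-item ⇒ one-item rule over the five transactions from Section 6 and applies hypothetical thresholds (`min_support`, `min_confidence`, and an optional `min_lift`). The helper names are illustrative.

```python
from itertools import permutations

# The five baskets from the market-basket example in Section 6.
TRANSACTIONS = [
    {"Milk", "Bread"},
    {"Milk", "Butter"},
    {"Bread", "Butter"},
    {"Milk", "Bread", "Butter"},
    {"Milk", "Bread"},
]
ITEMS = sorted(set().union(*TRANSACTIONS))


def supp(itemset):
    """Fraction of transactions containing every item in `itemset`."""
    return sum(1 for t in TRANSACTIONS if itemset <= t) / len(TRANSACTIONS)


def candidate_rules():
    """Yield (X, Y, support, confidence, lift) for each single-item rule X => Y."""
    for x, y in permutations(ITEMS, 2):
        s_xy = supp({x, y})
        if s_xy == 0:
            continue
        conf = s_xy / supp({x})
        yield x, y, s_xy, conf, conf / supp({y})


def prune(min_support, min_confidence, min_lift=None):
    """Keep rules meeting the support/confidence thresholds and, optionally, lift > min_lift."""
    kept = []
    for x, y, s, c, l in candidate_rules():
        if s < min_support or c < min_confidence:
            continue
        if min_lift is not None and l <= min_lift:
            continue
        kept.append((x, y))
    return kept


print(prune(min_support=0.4, min_confidence=0.6))
print(prune(min_support=0.4, min_confidence=0.6, min_lift=1.0))
```

With these thresholds, four rules pass support and confidence, but requiring lift > 1 removes all of them: every item pair in this tiny dataset has lift below 1, so the high confidences only reflect how common each item is.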

---

## 🧪 **6. Real-Life Example: Market Basket**

**Transactions:**
1. Milk, Bread  
2. Milk, Butter  
3. Bread, Butter  
4. Milk, Bread, Butter  
5. Milk, Bread

**Rule: Milk ⇒ Bread**

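- Support = $3 / 5 = 0.6$  
- Support(Milk) = $4 / 5 = 0.8$  
- Confidence = $3 / 4 = 0.75$  
- Support(Bread) = $4 / 5 = 0.8$  
- Lift = $0.75 / 0.8 = 0.9375$  
- Leverage = $0.6 - (0.8 \times 0.8) = -0.04$  
- Conviction = $(1 - 0.8) / (1 - 0.75) = 0.2 / 0.25 = 0.8$

Although the confidence is high (75%), lift < 1, leverage < 0, and conviction < 1 all show that milk buyers are slightly **less** likely than an average customer (80%) to buy bread, so the rule reflects bread's overall popularity rather than a genuine association.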
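A minimal end-to-end sketch that recomputes every metric for Milk ⇒ Bread directly from the five transactions above. The helper `rule_metrics` is hypothetical; supports are computed as fractions and confidence as a ratio of counts.

```python
TRANSACTIONS = [
    {"Milk", "Bread"},
    {"Milk", "Butter"},
    {"Bread", "Butter"},
    {"Milk", "Bread", "Butter"},
    {"Milk", "Bread"},
]


def rule_metrics(antecedent, consequent, transactions):
    """Return support, confidence, lift, leverage and conviction for X => Y."""
    n = len(transactions)
    x, y = set(antecedent), set(consequent)

    def count(items):
        # Number of transactions that contain every item in `items`.
        return sum(1 for t in transactions if items <= t)

    c_x, c_y, c_xy = count(x), count(y), count(x | y)
    s_x, s_y, s_xy = c_x / n, c_y / n, c_xy / n
    conf = c_xy / c_x
    return {
        "support": s_xy,
        "confidence": conf,
        "lift": conf / s_y,
        "leverage": s_xy - s_x * s_y,
        "conviction": float("inf") if c_xy == c_x else (1 - s_y) / (1 - conf),
    }


for name, value in rule_metrics({"Milk"}, {"Bread"}, TRANSACTIONS).items():
    print(f"{name:>10}: {value:.4f}")
```

Running it should print support 0.6000, confidence 0.7500, lift 0.9375, leverage -0.0400 and conviction 0.8000.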

---

## 🛠️ **7. Tools for Association Rule Mining**

- **Python Libraries**:
  - `mlxtend` (frequent_patterns module)
  - `apyori`
  - `Orange3` (association rules via the Orange3-Associate add-on)

- **R Packages**:
  - `arules`
  - `arulesViz`

---

# 📊 **1.4 Performance Metrics in Association Rule Mining**

Association Rule Mining is a key technique in data mining used to discover interesting relationships, patterns, or associations among a set of items in large datasets — especially in **market basket analysis**.

To evaluate and select **strong and interesting rules**, several performance metrics are used:

---

## 🔑 **Basic Terms**

Before jumping to the metrics, here are some basic definitions:

- **Itemset**: A collection of one or more items.  
- **Transaction**: A record of items bought together.  
- **Rule**: An implication of the form $X \Rightarrow Y$, where $X$ and $Y$ are itemsets, and $X \cap Y = \emptyset$.

---

## 📏 **Key Performance Metrics**

| **Metric**     | **Definition** | **Formula** | **Interpretation** |
|----------------|----------------|-------------|---------------------|
| **Support**    | How frequently an itemset occurs in the dataset | $\text{Support}(X \Rightarrow Y) = \frac{\text{Transactions with } X \cup Y}{\text{Total Transactions}}$ | Measures **relevance** of the rule |
| **Confidence** | How often items in $Y$ appear in transactions that contain $X$ | $\text{Confidence}(X \Rightarrow Y) = \frac{\text{Support}(X \cup Y)}{\text{Support}(X)}$ | Measures the **reliability** of the rule |
| **Lift**       | How much more likely $Y$ is to occur with $X$ than by chance | $\text{Lift}(X \Rightarrow Y) = \frac{\text{Confidence}(X \Rightarrow Y)}{\text{Support}(Y)}$ | Lift $>$ 1: Positive correlation, Lift = 1: No correlation, Lift $<$ 1: Negative correlation |
| **Leverage**   | Difference between observed and expected co-occurrence of $X$ and $Y$ | $\text{Leverage}(X \Rightarrow Y) = \text{Support}(X \cup Y) - \text{Support}(X) \times \text{Support}(Y)$ | Helps identify **statistical dependence** |
| **Conviction** | Ratio of how often $X$ would occur without $Y$ under independence to how often it actually does | $\text{Conviction}(X \Rightarrow Y) = \frac{1 - \text{Support}(Y)}{1 - \text{Confidence}(X \Rightarrow Y)}$ | Conviction = 1: independence; Conviction $>$ 1: stronger association (infinite when confidence = 1) |

---

## 🧠 **Understanding with an Example**

Let’s say we have 100 transactions and:

- 40 transactions include milk ($X$)  
- 30 include bread ($Y$)  
- 20 include both milk and bread  

Then:

- $$\text{Support}(X \Rightarrow Y) = \frac{20}{100} = 0.20$$  
- $$\text{Confidence}(X \Rightarrow Y) = \frac{20}{40} = 0.50$$  
- $$\text{Lift}(X \Rightarrow Y) = \frac{0.50}{0.30} \approx 1.67$$  
- $$\text{Leverage}(X \Rightarrow Y) = 0.20 - (0.4 \times 0.3) = 0.08$$  
- $$\text{Conviction}(X \Rightarrow Y) = \frac{1 - 0.30}{1 - 0.50} = \frac{0.7}{0.5} = 1.4$$
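The same five numbers can be computed from the four counts in this example. A minimal sketch; `metrics_from_counts` is a hypothetical helper, and conviction is reported as infinity when every transaction containing $X$ also contains $Y$.

```python
def metrics_from_counts(n, count_x, count_y, count_xy):
    """All five rule metrics for X => Y from transaction counts."""
    s_x, s_y, s_xy = count_x / n, count_y / n, count_xy / n
    conf = count_xy / count_x
    return {
        "support": s_xy,
        "confidence": conf,
        "lift": conf / s_y,
        "leverage": s_xy - s_x * s_y,
        "conviction": float("inf") if count_xy == count_x else (1 - s_y) / (1 - conf),
    }


m = metrics_from_counts(n=100, count_x=40, count_y=30, count_xy=20)
print({name: round(value, 2) for name, value in m.items()})
# {'support': 0.2, 'confidence': 0.5, 'lift': 1.67, 'leverage': 0.08, 'conviction': 1.4}
```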

---

## 🧾 **How to Use These Metrics in Rule Evaluation**

| Use Case                        | Recommended Metric         |
|----------------------------------|----------------------------|
| Want frequently occurring rules | **Support**                |
| Want accurate predictions       | **Confidence**, **Lift**   |
| Want statistical strength       | **Lift**, **Leverage**     |
| Want directional rules          | **Conviction**             |

---

## ⚠️ **Important Notes**

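- A high minimum **support** threshold keeps frequent, relevant rules but can discard rare yet interesting ones.  
- High **confidence** can be misleading when $Y$ is popular on its own; **lift** and **leverage** correct for this by comparing against what independence would predict. None of these metrics implies causality.  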
- Use **minimum threshold values** to prune weak or redundant rules.
