![](images/associ.png)

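- Minimum support = 33%
- Minimum confidence = 50%

- Rule: $X \Rightarrow Y$
  - $\text{Support} = \frac{\text{freq}(X, Y)}{N}$
  - $\text{Confidence} = \frac{\text{freq}(X, Y)}{\text{freq}(X)}$

Here $\text{freq}(\cdot)$ is the number of transactions containing the given items and $N$ is the total number of transactions.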
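
As a quick check of the two formulas against the 33% support and 50% confidence levels listed above, the following Python sketch tests the rule {bread} ⇒ {butter}. The counts (bread in 10 of 12 transactions, bread and butter together in 9) are taken from the itemset counts listed at the end of these notes; the function name `passes_thresholds` is hypothetical and only for illustration.

```python
MIN_SUPPORT = 0.33     # support level stated above
MIN_CONFIDENCE = 0.50  # confidence level stated above


def passes_thresholds(freq_xy: int, freq_x: int, n: int) -> bool:
    """Return True if X => Y meets both the support and confidence levels."""
    support = freq_xy / n          # freq(X, Y) / N
    confidence = freq_xy / freq_x  # freq(X, Y) / freq(X)
    return support >= MIN_SUPPORT and confidence >= MIN_CONFIDENCE


# {bread} => {butter}: support = 9/12 = 0.75, confidence = 9/10 = 0.90
print(passes_thresholds(freq_xy=9, freq_x=10, n=12))  # True
```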

## Apriori Algorithm: Association Rule and Support

### Rule $ X \Rightarrow Y $ in Apriori Algorithm

In the **Apriori algorithm**, the rule $ X \Rightarrow Y $ represents an **association rule** where:

- **$ X $** is the **antecedent** (left-hand side), which is a set of items or attributes.
- **$ Y $** is the **consequent** (right-hand side), which is another set of items or attributes.

This rule means that **if the items in $ X $ are present in a transaction, then the items in $ Y $ are likely to also be present** in the same transaction. The goal of the Apriori algorithm is to find such association rules in a dataset of transactions.

### Example:
If $ X = \{ \text{Bread} \} $ and $ Y = \{ \text{Butter} \} $, the rule $ \{ \text{Bread} \} \Rightarrow \{ \text{Butter} \} $ means:

- If a customer buys bread (antecedent), then they are likely to buy butter as well (consequent).

### Key Metrics Used in Apriori:
- **Support**: The proportion of transactions in the dataset that contain both $ X $ and $ Y $. It shows how frequently the itemset appears in the dataset.
- **Confidence**: The probability that a transaction containing $ X $ also contains $ Y $. It measures the strength of the rule.
- **Lift**: The ratio of the confidence of $ X \Rightarrow Y $ to the overall support of $ Y $, that is, how much more likely $ Y $ is when $ X $ is present than it is across all transactions. A lift greater than 1 indicates a positive association.

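The Apriori algorithm first uses a minimum support threshold to find frequent itemsets, then keeps the rules derived from them that meet a minimum confidence threshold. Lift is commonly used afterwards to rank or filter those rules.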

---

### Support Definition

In the context of the **Apriori algorithm** and **association rule mining**, **support** is a measure of how frequently a particular itemset (a set of items) appears in the dataset. It helps to identify the significance of the itemset in the context of all transactions.

#### **Definition of Support:**

The support of an itemset $ X $ is defined as the proportion of transactions in the dataset that contain $ X $. 

Mathematically, it is calculated as:

$$
\text{Support}(X) = \frac{\text{Number of transactions that contain } X}{\text{Total number of transactions}}
$$

#### **Example:**
Suppose we have a dataset of 100 transactions, and itemset $ X = \{ \text{Bread} \} $ appears in 30 of those transactions. The support of the itemset $ \{ \text{Bread} \} $ would be:

$$
\text{Support}(\{ \text{Bread} \}) = \frac{30}{100} = 0.30
$$

This means that 30% of the transactions contain "Bread".

#### **Support for an Association Rule:**
For an association rule $ X \Rightarrow Y $, the **support** is the proportion of transactions that contain both $ X $ and $ Y $. In this case, it is calculated as:

$$
\text{Support}(X \Rightarrow Y) = \frac{\text{Number of transactions that contain both } X \text{ and } Y}{\text{Total number of transactions}}
$$

This gives a sense of how common or frequent the rule is in the entire dataset. High support means the rule applies to a large portion of the transactions.
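
The two support definitions above can be sketched in Python as a single function, since the support of a rule is just the support of the combined itemset. The transactions below are a hypothetical toy dataset invented for illustration, and the function name `support` is an assumption, not part of any library.

```python
from __future__ import annotations

from typing import Iterable

# Hypothetical toy transactions, invented purely for illustration.
transactions = [
    {"Bread", "Butter"},
    {"Bread", "Milk"},
    {"Butter", "Milk"},
    {"Bread", "Butter", "Milk"},
    {"Milk"},
]


def support(itemset: Iterable[str], transactions: list[set[str]]) -> float:
    """Fraction of transactions that contain every item in `itemset`."""
    items = set(itemset)
    hits = sum(1 for t in transactions if items <= t)  # subset test
    return hits / len(transactions)


bread_support = support({"Bread"}, transactions)           # 3 of 5 transactions
rule_support = support({"Bread", "Butter"}, transactions)  # 2 of 5 transactions
print(bread_support, rule_support)
```

Representing each transaction as a `set` keeps the containment check to a single subset comparison.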


___

## Confidence in Association Rule Mining

In association rule mining, **confidence** is a metric that measures the strength of an association rule. It represents the probability that the consequent \( Y \) occurs given that the antecedent \( X \) has occurred. In other words, it quantifies how likely \( Y \) is to appear in transactions where \( X \) is present.

### **Definition of Confidence:**

The **confidence** of an association rule \( X \Rightarrow Y \) is defined as the conditional probability that \( Y \) occurs given \( X \). Mathematically, it is calculated as:

$$
\text{Confidence}(X \Rightarrow Y) = P(Y | X) = \frac{P(X \cap Y)}{P(X)}
$$

Where:
- \( P(X \cap Y) \) is the **joint probability** (or support) of both \( X \) and \( Y \) occurring together.
- \( P(X) \) is the **probability** (or support) of \( X \) occurring.

### **Interpretation of Confidence:**
- **Confidence = 1**: This means that whenever \( X \) occurs, \( Y \) always occurs as well. There is a perfect association between \( X \) and \( Y \).
- **0 < Confidence < 1**: This means that \( Y \) occurs in some, but not all, of the transactions where \( X \) occurs. The rule is not perfect.
- **Confidence = 0**: This means that \( Y \) never occurs when \( X \) occurs.

### **Example:**

If the rule is \( \{ \text{Bread} \} \Rightarrow \{ \text{Butter} \} \), and:

- The **support** of \( \{ \text{Bread} \} \) is \( P(X) = 0.30 \) (30% of transactions contain "Bread").
- The **support** of \( \{ \text{Bread, Butter} \} \) is \( P(X \cap Y) = 0.15 \) (15% of transactions contain both "Bread" and "Butter").

The **confidence** of the rule \( \{ \text{Bread} \} \Rightarrow \{ \text{Butter} \} \) is:

$$
\text{Confidence}(\{ \text{Bread} \} \Rightarrow \{ \text{Butter} \}) = \frac{P(\{ \text{Bread, Butter} \})}{P(\{ \text{Bread} \})} = \frac{0.15}{0.30} = 0.50
$$

This means that, given that a customer buys "Bread", there is a 50% chance that they will also buy "Butter".
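
The same calculation can be done directly from transaction counts. The sketch below assumes a dataset of 100 transactions, so that the 30% and 15% supports become counts of 30 and 15; that total and the function name `confidence` are assumptions made for illustration.

```python
def confidence(count_xy: int, count_x: int) -> float:
    """Confidence of X => Y from raw transaction counts."""
    if count_x == 0:
        # The rule cannot be evaluated if the antecedent never occurs.
        raise ValueError("antecedent never occurs; confidence is undefined")
    return count_xy / count_x


# Assumed: 100 transactions, 30 with Bread, 15 with both Bread and Butter.
conf_bread_butter = confidence(count_xy=15, count_x=30)
print(conf_bread_butter)  # 0.5
```

Working with counts rather than probabilities avoids dividing by the total number of transactions twice, since it cancels out of the ratio.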


____


## Lift in Association Rule Mining

In association rule mining, **Lift** is a metric that measures the strength of a rule by comparing the **observed** co-occurrence of items \( X \) and \( Y \) with the co-occurrence that would be **expected** if they were **independent**. It shows whether the occurrence of \( X \) makes \( Y \) more or less likely than it would be under independence.

Equivalently, Lift compares the probability of the consequent \( Y \) given that the antecedent \( X \) has occurred with the overall probability of \( Y \), regardless of whether \( X \) is present.

### **Definition of Lift**:

The **Lift** of an association rule \( X \Rightarrow Y \) is calculated by comparing the **joint probability** of \( X \) and \( Y \) occurring together with the joint probability that would be **expected** if \( X \) and \( Y \) were independent, which is \( P(X) \cdot P(Y) \).

Mathematically, the **Lift** is given by:

$$
\text{Lift}(X \Rightarrow Y) = \frac{P(X \cap Y)}{P(X) \cdot P(Y)}
$$

Where:
- \( P(X \cap Y) \) is the **joint probability** (or **support**) of both \( X \) and \( Y \) occurring together.
- \( P(X) \) is the **probability** (or **support**) of \( X \) occurring.
- \( P(Y) \) is the **probability** (or **support**) of \( Y \) occurring.
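
One way to see what this ratio measures is to rewrite it using the definition of confidence. The short derivation below assumes \( P(X) > 0 \) and \( P(Y) > 0 \):

$$
\text{Lift}(X \Rightarrow Y) = \frac{P(X \cap Y)}{P(X) \cdot P(Y)} = \frac{P(Y \mid X)}{P(Y)} = \frac{\text{Confidence}(X \Rightarrow Y)}{\text{Support}(Y)}
$$

So Lift is the confidence of the rule divided by the baseline frequency of the consequent. Because the first expression is symmetric in \( X \) and \( Y \), the rules \( X \Rightarrow Y \) and \( Y \Rightarrow X \) always have the same Lift.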

### **Interpretation of Lift**:
- **Lift = 1**: If \( X \) and \( Y \) are independent, the occurrence of \( X \) does not affect the occurrence of \( Y \), so the Lift is 1. This means \( P(X \cap Y) = P(X) \cdot P(Y) \).
- **Lift > 1**: If Lift is greater than 1, it indicates a **positive association** between \( X \) and \( Y \). The occurrence of \( X \) increases the probability of \( Y \) occurring. The higher the Lift, the stronger the association.
- **Lift < 1**: If Lift is less than 1, it indicates a **negative association** between \( X \) and \( Y \). The occurrence of \( X \) decreases the probability of \( Y \) occurring.

### **Example of Lift Calculation**:

Consider the following scenario for the association rule \( X = \{\text{Bread}\} \Rightarrow Y = \{\text{Butter}\} \):

- The **support** of \( X = \{\text{Bread}\} \) is \( P(X) = 0.30 \) (30% of the transactions contain "Bread").
- The **support** of \( Y = \{\text{Butter}\} \) is \( P(Y) = 0.20 \) (20% of the transactions contain "Butter").
- The **support** of the combined itemset \( X \cup Y = \{\text{Bread, Butter}\} \) is \( P(X \cap Y) = 0.15 \) (15% of the transactions contain both "Bread" and "Butter").

Now, we can calculate the **Lift** of the rule \( X \Rightarrow Y \) as:

$$
\text{Lift}(X \Rightarrow Y) = \frac{P(X \cap Y)}{P(X) \cdot P(Y)} = \frac{0.15}{0.30 \cdot 0.20} = \frac{0.15}{0.06} = 2.5
$$

This means that the rule \( X = \{\text{Bread}\} \Rightarrow Y = \{\text{Butter}\} \) has a **positive association**. Customers who buy "Bread" are 2.5 times as likely to buy "Butter" as customers overall (50% versus 20%), which is 2.5 times the co-occurrence expected if "Bread" and "Butter" were independent.
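
The calculation and its interpretation can be sketched in Python as follows. The function names `lift` and `describe` and the tolerance used to decide when a value counts as "equal to 1" are assumptions chosen for illustration.

```python
import math


def lift(p_xy: float, p_x: float, p_y: float) -> float:
    """Lift of X => Y from the joint support and the two individual supports."""
    return p_xy / (p_x * p_y)


def describe(lift_value: float, tol: float = 1e-9) -> str:
    """Map a lift value to the three cases: equal to, above, or below 1."""
    if math.isclose(lift_value, 1.0, abs_tol=tol):
        return "independent"
    return "positive association" if lift_value > 1.0 else "negative association"


# Supports from the Bread/Butter example: P(X)=0.30, P(Y)=0.20, P(X and Y)=0.15
value = lift(p_xy=0.15, p_x=0.30, p_y=0.20)
print(round(value, 2), describe(value))  # 2.5 positive association
```

The comparison with 1 uses a tolerance because floating-point division rarely yields exactly 1.0 even for independent items.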

### **Key Insights from Lift**:
- **Lift = 1**: No association (independent items).
- **Lift > 1**: Positive association (the presence of \( X \) increases the likelihood of \( Y \) occurring).
- **Lift < 1**: Negative association (the presence of \( X \) decreases the likelihood of \( Y \) occurring).

### **Usefulness of Lift**:
- **Lift** is a useful metric because it takes into account the **overall frequencies** of \( X \) and \( Y \) in the dataset. It shows how far the observed co-occurrence departs from what would be expected if the two items were independent. It is not itself a test of **statistical significance**, so a high Lift on a rule with very low support should be treated with caution.
- A higher **Lift** indicates a stronger and more meaningful rule, which can be especially useful when identifying important relationships in large datasets.

### **Conclusion**:

Lift is an important measure in association rule mining, helping to understand the strength of the relationship between two items. It corrects for the frequency of individual items, providing a more accurate picture of how much one item influences the occurrence of another.


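#### 1-item sets
Each item's support is its frequency divided by the 12 transactions; items below the 33% minimum support are not taken.

- milk - 9 (freq) | support = 9/12 = 75% | taken
- bread - 10 (freq) | support = 10/12 ≈ 83% | taken
- butter - 10 (freq) | support = 10/12 ≈ 83% | taken
- egg - 3 (freq) | support = 3/12 = 25% | not taken
- ketchup - 3 (freq) | support = 3/12 = 25% | not taken
- cookies - 5 (freq) | support = 5/12 ≈ 42% | taken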
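
The filtering step for single items can be sketched in Python as below, using the frequencies listed above, the total of 12 transactions, and the 33% minimum support. The variable names are hypothetical.

```python
N = 12              # total number of transactions (from the 9/12 and 3/12 figures)
MIN_SUPPORT = 0.33  # minimum support of 33%

# Item frequencies as listed in these notes.
item_counts = {
    "milk": 9,
    "bread": 10,
    "butter": 10,
    "egg": 3,
    "ketchup": 3,
    "cookies": 5,
}

# Keep only the items whose support reaches the threshold.
frequent_1 = {
    item: count / N
    for item, count in item_counts.items()
    if count / N >= MIN_SUPPORT
}
print(sorted(frequent_1))  # ['bread', 'butter', 'cookies', 'milk']
```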


___

#### Frequent 1-item sets
- milk
- bread
- butter
- cookies
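
The candidates for the next level are all pairs that can be formed from these four frequent items. A minimal Python sketch of that step, using `itertools.combinations` from the standard library:

```python
from itertools import combinations

# Frequent single items from the list above.
frequent_1 = ["milk", "bread", "butter", "cookies"]

# Every unordered pair of frequent items is a candidate 2-item set.
candidate_pairs = list(combinations(frequent_1, 2))
for pair in candidate_pairs:
    print(pair)
print(len(candidate_pairs))  # 6
```

Items that were not frequent on their own (egg, ketchup) are never paired, because any itemset containing them cannot reach the minimum support either.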

___
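#### 2-item sets
- milk, bread - 7 (freq) | support = 7/12 ≈ 58% | taken
- milk, butter - 7 (freq) | support = 7/12 ≈ 58% | taken
- milk, cookies - 3 (freq) | support = 3/12 = 25% | not taken
- bread, butter - 9 (freq) | support = 9/12 = 75% | taken
- butter, cookies - 3 (freq) | support = 3/12 = 25% | not taken
- bread, cookies - 4 (freq) | support = 4/12 ≈ 33.3% | taken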
___

#### Frequent 2-item sets
- milk, bread - 7
- milk, butter - 7
- bread, butter - 9
- bread, cookies - 4
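
The step from 2-item sets to 3-item candidates can be sketched in Python as follows, assuming the same 12 transactions and 33% minimum support as for the single items. The pair counts come from the 2-item list above; the variable names are hypothetical. A 3-item candidate is kept only if every one of its 2-item subsets is frequent, which is the pruning rule that gives Apriori its name.

```python
from itertools import combinations

N = 12              # total number of transactions
MIN_SUPPORT = 0.33  # minimum support of 33%

# Pair frequencies as listed in these notes.
pair_counts = {
    frozenset({"milk", "bread"}): 7,
    frozenset({"milk", "butter"}): 7,
    frozenset({"milk", "cookies"}): 3,
    frozenset({"bread", "butter"}): 9,
    frozenset({"butter", "cookies"}): 3,
    frozenset({"bread", "cookies"}): 4,
}

# Step 1: keep the pairs that reach the minimum support.
frequent_2 = {pair for pair, count in pair_counts.items() if count / N >= MIN_SUPPORT}

# Step 2: build 3-item candidates and prune any with an infrequent 2-item subset.
items = sorted(set().union(*frequent_2))
candidates_3 = [
    frozenset(triple)
    for triple in combinations(items, 3)
    if all(frozenset(pair) in frequent_2 for pair in combinations(triple, 2))
]
print([sorted(c) for c in candidates_3])  # [['bread', 'butter', 'milk']]
```

The surviving candidate still has to be counted against the transactions before it can be called frequent; that count is not given in the text, so the sketch stops at candidate generation.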

____

![](images/3_item.png)