# Association Rules

## Introduction to Association Rules
Imagine a rule like this: **“If a customer buys bread, they also buy butter.”** This is an Association Rule. We use it to find patterns in data, like which products are often bought together in shopping carts.

For example, if people who buy "diapers" also often buy "beer," this insight can help stores arrange items or plan promotions.

### Important Terms
- **Support**: The percentage that shows how often both items (like bread and butter) are bought together across all transactions. For example, if bread and butter are bought together in 10 out of 100 transactions, the support is 10%.
- **Confidence**: Measures how often the second item (like butter) is bought when the first item (like bread) is bought. If this happens in 8 out of 10 cases, then the confidence is 80%.
- **Lift**: The ratio of a rule’s confidence to the consequent’s overall support, which tells us how much better the rule does than random chance. For example, if lift is 2, a customer who buys bread is twice as likely to buy butter as a randomly chosen customer. A lift of 1 means the two items are independent.
- **Antecedent**: The condition part (the first item, like bread) in an association rule.
- **Consequent**: The outcome or the item we are predicting (like butter).
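To make these definitions concrete, here is a minimal sketch that computes support, confidence, and lift by hand for the rule diaper → beer. The five baskets are the sample data from Step 1; the helper names (`count`, `support`, `confidence`, `lift`) are illustrative and not part of any library.

```python
# Sample shopping baskets (same as the sample data in Step 1).
transactions = [
    ['bread', 'butter'],
    ['bread', 'diaper', 'beer', 'eggs'],
    ['milk', 'diaper', 'beer', 'coke'],
    ['bread', 'milk', 'diaper', 'beer'],
    ['bread', 'milk', 'diaper', 'coke'],
]

def count(itemset, baskets):
    """Number of baskets that contain every item in `itemset`."""
    items = set(itemset)
    return sum(items <= set(basket) for basket in baskets)

def support(itemset, baskets):
    """Fraction of all baskets that contain the whole itemset."""
    return count(itemset, baskets) / len(baskets)

def confidence(antecedent, consequent, baskets):
    """Fraction of baskets with the antecedent that also contain the consequent."""
    both = count(set(antecedent) | set(consequent), baskets)
    return both / count(antecedent, baskets)

def lift(antecedent, consequent, baskets):
    """Confidence divided by the consequent's overall support."""
    return confidence(antecedent, consequent, baskets) / support(consequent, baskets)

print("support   :", round(support({'diaper', 'beer'}, transactions), 2))
print("confidence:", round(confidence({'diaper'}, {'beer'}, transactions), 2))
print("lift      :", round(lift({'diaper'}, {'beer'}, transactions), 2))
```

Diapers and beer appear together in 3 of 5 baskets (support 0.6), beer appears in 3 of the 4 baskets that contain diapers (confidence 0.75), and lift is 0.75 / 0.6 = 1.25.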

## How to Mine Association Rules Using Python
In Python, finding association rules is easy using the `mlxtend` library's `apriori()` function. Let’s say we’re working with a dataset called `transactions`.

### Step 1: Prepare the Data
First, load the data and convert it into a one-hot encoded DataFrame.

```python
import pandas as pd
from mlxtend.preprocessing import TransactionEncoder

# Sample data
transactions = [
    ['bread', 'butter'],
    ['bread', 'diaper', 'beer', 'eggs'],
    ['milk', 'diaper', 'beer', 'coke'],
    ['bread', 'milk', 'diaper', 'beer'],
    ['bread', 'milk', 'diaper', 'coke']
]

te = TransactionEncoder()
te_array = te.fit(transactions).transform(transactions)
df = pd.DataFrame(te_array, columns=te.columns_)
```
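As a rough illustration of what a one-hot encoded table looks like (this is not how `TransactionEncoder` is implemented internally), the sketch below builds the same kind of structure in plain Python: one column per distinct item, one row per transaction, and `True` wherever the basket contains the item. The function name `one_hot` is hypothetical, and only the first two sample baskets are used to keep the output short.

```python
def one_hot(baskets):
    """Return (columns, rows): sorted item names and one boolean row per basket."""
    columns = sorted({item for basket in baskets for item in basket})
    rows = [[item in basket for item in columns] for basket in baskets]
    return columns, rows

# First two baskets of the sample data
baskets = [
    ['bread', 'butter'],
    ['bread', 'diaper', 'beer', 'eggs'],
]

columns, rows = one_hot(baskets)
print(columns)
for row in rows:
    print(row)
```

Each row has the same length as the list of columns, which is what lets the mining step count how often any combination of items occurs.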

### Step 2: Mine the Rules
Now, use the `apriori()` function to find itemsets that meet a minimum support, then pass them to `association_rules()` to generate rules that meet a minimum confidence.

```python
from mlxtend.frequent_patterns import apriori, association_rules

# Set minimum support to 0.01 (1%)
frequent_itemsets = apriori(df, min_support=0.01, use_colnames=True)
rules = association_rules(frequent_itemsets, metric="confidence", min_threshold=0.4)
```
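To show what the `min_support` threshold does, here is a simplified level-wise search in the spirit of Apriori, written in plain Python. It is a teaching sketch and not mlxtend's implementation; the function name `frequent_itemsets` is hypothetical, and the threshold of 0.6 is chosen only so that the pruning is visible on five baskets.

```python
from itertools import combinations

def frequent_itemsets(baskets, min_support):
    """Grow itemsets one item at a time, keeping only those that are frequent."""
    baskets = [frozenset(b) for b in baskets]
    n = len(baskets)

    def support(itemset):
        return sum(itemset <= b for b in baskets) / n

    # Level 1: frequent single items
    items = {item for b in baskets for item in b}
    current = {frozenset([i]) for i in items if support(frozenset([i])) >= min_support}

    result = {}
    k = 1
    while current:
        for itemset in current:
            result[itemset] = support(itemset)
        k += 1
        # Candidates of size k are unions of two frequent (k-1)-itemsets
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # Pruning: a candidate survives only if all its (k-1)-subsets are frequent
        candidates = {c for c in candidates
                      if all(frozenset(s) in current for s in combinations(c, k - 1))}
        current = {c for c in candidates if support(c) >= min_support}
    return result

transactions = [
    ['bread', 'butter'],
    ['bread', 'diaper', 'beer', 'eggs'],
    ['milk', 'diaper', 'beer', 'coke'],
    ['bread', 'milk', 'diaper', 'beer'],
    ['bread', 'milk', 'diaper', 'coke'],
]

result = frequent_itemsets(transactions, min_support=0.6)
for itemset, s in sorted(result.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), s)
```

With this threshold, rare items such as butter and eggs are dropped at the first level, so no larger itemset containing them is ever counted. A low threshold like 0.01 keeps nearly everything on a dataset this small.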

### Step 3: Check the Rules
To see the top 10 rules, sort by lift.

```python
# Sort rules by lift to see strongest associations
rules = rules.sort_values(by="lift", ascending=False)
print(rules.head(10))  # Show top 10 rules by lift
```

---

## Confidence Difference Criterion
The **Confidence Difference Criterion** measures how far a rule’s confidence is from the consequent’s baseline support, that is, the probability of seeing the consequent in any transaction regardless of the antecedent. We keep only rules where the absolute difference reaches a chosen threshold.
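As a small worked example, assuming the five sample transactions from Step 1, the sketch below computes the confidence difference by hand for two rules. The helpers `count` and `confidence_difference` are hypothetical names used only for illustration.

```python
transactions = [
    ['bread', 'butter'],
    ['bread', 'diaper', 'beer', 'eggs'],
    ['milk', 'diaper', 'beer', 'coke'],
    ['bread', 'milk', 'diaper', 'beer'],
    ['bread', 'milk', 'diaper', 'coke'],
]

def count(itemset, baskets):
    """Number of baskets that contain every item in `itemset`."""
    return sum(set(itemset) <= set(b) for b in baskets)

def confidence_difference(antecedent, consequent, baskets):
    """|confidence of the rule - baseline support of the consequent|."""
    conf = count(antecedent | consequent, baskets) / count(antecedent, baskets)
    baseline = count(consequent, baskets) / len(baskets)
    return abs(conf - baseline)

examples = [
    ({'diaper'}, {'beer'}),
    ({'eggs'}, {'bread', 'beer'}),
]
for antecedent, consequent in examples:
    diff = confidence_difference(antecedent, consequent, transactions)
    print(sorted(antecedent), '->', sorted(consequent), round(diff, 2))
```

For diaper → beer the confidence is 0.75 against a baseline of 0.6, a difference of 0.15. For eggs → {bread, beer} the confidence is 1.0 against a baseline of 0.4, a difference of 0.6, so with a threshold of 0.4 only the second rule would be kept.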

### How to Apply Confidence Difference Criterion in Python

1. **Calculate Confidence Difference**: Add a new column showing the difference between the rule’s confidence and the consequent’s overall probability.

```python
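# mlxtend already reports the baseline probability of each rule's consequent
# in the 'consequent support' column (consequents are frozensets and may hold
# several items, so a per-item lookup would not work). Copy it to a new column.
rules['consequent_support'] = rules['consequent support']
# Absolute gap between the rule's confidence and that baseline
rules['confidence_difference'] = (rules['confidence'] - rules['consequent_support']).abs()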
```

2. **Filter by Confidence Difference**: Set a threshold (e.g., 0.4) to keep rules where the confidence difference is high.

```python
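# Keep rules whose confidence difference meets the threshold, largest first
confidence_diff_rules = rules[rules['confidence_difference'] >= 0.4]
confidence_diff_rules = confidence_diff_rules.sort_values(by='confidence_difference', ascending=False)
print(confidence_diff_rules.head(10))  # Show top 10 rules by confidence difference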
```

---

## Confidence Quotient Criterion
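The **Confidence Quotient Criterion** looks at the ratio of the rule’s confidence to the consequent’s baseline support. Computed this way, the ratio is the same quantity as lift: values above 1 mean the consequent is more likely when the antecedent is present, and values below 1 mean it is less likely. This section uses 0.4 as its example threshold; a threshold above 1 is needed to keep only rules that beat random chance.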
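The following sketch, again assuming the five sample transactions from Step 1, computes the quotient by hand for two rules and compares it with lift calculated from supports. The helper names (`support`, `confidence_quotient`, `lift_from_supports`) are hypothetical.

```python
transactions = [
    ['bread', 'butter'],
    ['bread', 'diaper', 'beer', 'eggs'],
    ['milk', 'diaper', 'beer', 'coke'],
    ['bread', 'milk', 'diaper', 'beer'],
    ['bread', 'milk', 'diaper', 'coke'],
]

def support(itemset, baskets):
    """Fraction of baskets that contain every item in `itemset`."""
    return sum(set(itemset) <= set(b) for b in baskets) / len(baskets)

def confidence_quotient(antecedent, consequent, baskets):
    """Rule confidence divided by the consequent's baseline support."""
    conf = support(antecedent | consequent, baskets) / support(antecedent, baskets)
    return conf / support(consequent, baskets)

def lift_from_supports(antecedent, consequent, baskets):
    """Lift written as support(A and C) / (support(A) * support(C))."""
    joint = support(antecedent | consequent, baskets)
    return joint / (support(antecedent, baskets) * support(consequent, baskets))

for antecedent, consequent in [({'diaper'}, {'beer'}), ({'bread'}, {'milk'})]:
    q = confidence_quotient(antecedent, consequent, transactions)
    l = lift_from_supports(antecedent, consequent, transactions)
    print(sorted(antecedent), '->', sorted(consequent), round(q, 2), round(l, 2))
```

The two numbers agree for each rule. Note that bread → milk has a quotient of about 0.83: it would pass a 0.4 threshold even though milk is less likely in baskets with bread than in baskets overall.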

### How to Apply Confidence Quotient Criterion in Python

1. **Calculate Confidence Quotient**: Add a column showing the confidence quotient for each rule.

```python
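# Confidence quotient: rule confidence divided by the consequent's baseline
# support ('consequent support' is the column name mlxtend produces)
rules['confidence_quotient'] = rules['confidence'] / rules['consequent support']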
```

2. **Filter by Confidence Quotient**: Set a threshold (e.g., 0.4) to keep rules with a high confidence-to-random-chance ratio.

```python
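# Keep rules whose confidence quotient meets the threshold, largest first
confidence_quotient_rules = rules[rules['confidence_quotient'] >= 0.4]
confidence_quotient_rules = confidence_quotient_rules.sort_values(by='confidence_quotient', ascending=False)
print(confidence_quotient_rules.head(10))  # Show top 10 rules by confidence quotient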
```

---

## Summary
- **Association Rules** help find patterns, like which products are bought together.
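- **Confidence Difference** keeps only rules whose confidence differs from the consequent’s baseline support by at least a chosen threshold.
- **Confidence Quotient** keeps rules based on the ratio of confidence to the consequent’s baseline support, which is the same quantity as lift.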
