# Lab 2: Apriori Algorithm and Its Details

## Objective:
The Apriori Algorithm is used for mining frequent itemsets and generating association rules in a transactional dataset. It is a fundamental algorithm in the field of **Market Basket Analysis**, where it is used to find patterns and relationships between products purchased together.

---

## Theory

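The **Apriori Algorithm** finds frequent itemsets level by level. It first finds the frequent single items, then combines them into candidate pairs, then into candidate triples, and so on. Frequent itemsets are sets of items that appear together in at least a minimum fraction of the transactions. The search stays manageable because of the **Apriori property**: every subset of a frequent itemset must also be frequent, so any candidate that contains an infrequent subset can be discarded without counting its support. **Support** measures how often an itemset occurs, and **confidence** measures how strong an association rule derived from a frequent itemset is.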


### Key Concepts:

1. **Transaction**: A record that consists of a set of items purchased by a customer.
2. **Itemset**: A collection of items from a transaction.
3. **Frequent Itemset**: An itemset whose support is greater than or equal to a predefined threshold, called **Minimum Support**.
4. **Support**: The proportion of transactions that contain a given itemset. The raw number of such transactions is called the **support count**.

   The formula for support is:

   $$ 
   \text{Support}(A) = \frac{\text{Count of transactions containing A}}{\text{Total number of transactions}} 
   $$

5. **Confidence**: For a rule A → B, the proportion of transactions containing A that also contain B, i.e. an estimate of the conditional probability P(B | A). It is used to select strong **association rules**.

   The formula for confidence is:

   $$ 
   \text{Confidence}(A \to B) = \frac{\text{Support}(A \cup B)}{\text{Support}(A)} 
   $$
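
To make the two formulas concrete, here is a minimal sketch that computes them directly from a list of transactions. It is separate from the lab code further down: the helper names `support` and `confidence` are illustrative, the data is the five-transaction table from the Example section, and `fractions.Fraction` is used (a choice made here for readability) so the ratios print exactly instead of as rounded floats.

```python
from fractions import Fraction

# Transactions from the Example table, one set of items per transaction
transactions = [
    {"Milk", "Bread", "Butter"},
    {"Bread", "Butter"},
    {"Milk", "Bread"},
    {"Milk", "Bread", "Butter", "Eggs"},
    {"Bread", "Eggs"},
]

def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    count = sum(1 for t in transactions if itemset <= t)
    return Fraction(count, len(transactions))

def confidence(antecedent, consequent, transactions):
    """Support(A ∪ B) / Support(A) for the rule A -> B.

    Raises ZeroDivisionError if the antecedent never occurs.
    """
    both = set(antecedent) | set(consequent)
    return support(both, transactions) / support(antecedent, transactions)

print(support({"Milk"}, transactions))                 # 3/5
print(confidence({"Milk"}, {"Butter"}, transactions))  # 2/3
```

Here {Milk} → {Butter} holds in 2 of the 3 transactions that contain Milk, which gives a confidence of 2/3.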
   
---

## Algorithm

The Apriori algorithm proceeds through the following steps:

### Step-by-Step Procedure:

1. **Generate Candidate Itemsets**:
   - Start by identifying **1-itemsets** (single items in the dataset).
   - Then, iteratively combine frequent itemsets of size `k` to generate candidate itemsets of size `k+1`, discarding any candidate that has an infrequent subset of size `k` (the Apriori property).

2. **Calculate Support for Itemsets**:
   - For each candidate itemset, calculate the **support** by counting how many transactions contain that itemset.

3. **Prune Itemsets**:
   - Remove itemsets that do not meet the minimum support threshold.
   
4. **Repeat**:
   - Repeat steps 1-3 for larger itemsets until no more frequent itemsets are found.

5. **Generate Association Rules**:
   - For each frequent itemset, split it into a non-empty antecedent and a non-empty consequent, and keep the **association rules** whose confidence meets the minimum confidence threshold.
   - The rules are of the form **{Item A} → {Item B}**, meaning that customers who buy **Item A** are likely to also buy **Item B**.
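
Step 1 is where the Apriori property saves work. One common way to implement it, sketched below under the assumption that each itemset is stored as an alphabetically sorted tuple, is to join two frequent `k`-itemsets only when they share their first `k-1` items, and then drop any `(k+1)`-candidate that has an infrequent `k`-subset. The function name `apriori_gen` and the items `A` to `D` are hypothetical.

```python
from itertools import combinations

def apriori_gen(frequent_k):
    """Build (k+1)-candidates from frequent k-itemsets given as sorted tuples."""
    frequent_set = set(frequent_k)
    ordered = sorted(frequent_set)
    candidates = []
    for i in range(len(ordered)):
        for j in range(i + 1, len(ordered)):
            a, b = ordered[i], ordered[j]
            # Join step: merge only itemsets that agree on all but the last item
            if a[:-1] != b[:-1]:
                continue
            candidate = a + (b[-1],)  # still sorted, because a < b
            # Prune step: every k-subset of the candidate must be frequent
            if all(sub in frequent_set for sub in combinations(candidate, len(a))):
                candidates.append(candidate)
    return candidates

frequent_2 = [("A", "B"), ("A", "C"), ("B", "C"), ("B", "D")]
print(apriori_gen(frequent_2))  # [('A', 'B', 'C')]
```

`('B', 'C', 'D')` is produced by the join but pruned, because its subset `('C', 'D')` is not frequent, so its support never has to be counted.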

---

## Example

**Transactions**:

| Transaction ID | Items Purchased           |
|----------------|---------------------------|
| 1              | Milk, Bread, Butter       |
| 2              | Bread, Butter             |
| 3              | Milk, Bread               |
| 4              | Milk, Bread, Butter, Eggs |
| 5              | Bread, Eggs               |

**Support Calculation**:
- **Support for {Milk}** = 3/5 = 0.6 (60% of transactions contain Milk)
- **Support for {Bread}** = 5/5 = 1.0 (every transaction contains Bread)

---

## Applying the Steps to the Example:

1. **Generate Candidate Itemsets**:
   - Start with **1-itemsets**: {Milk}, {Bread}, {Butter}, {Eggs}.
   - Then, generate **2-itemsets** from frequent 1-itemsets: {Milk, Bread}, {Milk, Butter}, {Bread, Butter}, etc.

2. **Calculate Support**:
   - Calculate the support of each candidate itemset by counting how many transactions contain it.

3. **Prune Itemsets**:
   - Discard itemsets that do not meet the minimum support threshold.

4. **Generate Association Rules**:
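   - From frequent itemsets, generate association rules. For example, {Milk} → {Bread} has confidence Support({Milk, Bread}) / Support({Milk}) = 0.6 / 0.6 = 1.0, because every transaction that contains Milk also contains Bread.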
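
The example does not state any thresholds, so the walkthrough below assumes a minimum support of 60% and a minimum confidence of 80% (both hypothetical choices). This self-contained sketch runs the steps above on the five transactions: {Eggs} and {Milk, Butter} (support 0.4 each) are pruned, the 3-itemset {Milk, Bread, Butter} is discarded without being counted because {Milk, Butter} is infrequent, and two rules survive.

```python
from fractions import Fraction
from itertools import combinations

# Transactions from the Example table
transactions = [
    {"Milk", "Bread", "Butter"},
    {"Bread", "Butter"},
    {"Milk", "Bread"},
    {"Milk", "Bread", "Butter", "Eggs"},
    {"Bread", "Eggs"},
]
MIN_SUPPORT = Fraction(3, 5)     # assumed threshold: 60%
MIN_CONFIDENCE = Fraction(4, 5)  # assumed threshold: 80%

def support(itemset):
    """Exact fraction of transactions containing every item in `itemset`."""
    return Fraction(sum(itemset <= t for t in transactions), len(transactions))

# Steps 1-3, level by level: count support, prune, then build larger candidates
frequent = {}
level = {frozenset([item]) for t in transactions for item in t}
k = 1
while level:
    kept = {s: support(s) for s in level if support(s) >= MIN_SUPPORT}
    frequent.update(kept)
    k += 1
    # Join frequent itemsets into size-k candidates, keeping only those whose
    # (k-1)-subsets are all frequent (the Apriori property)
    level = {
        a | b for a in kept for b in kept
        if len(a | b) == k
        and all(frozenset(sub) in kept for sub in combinations(a | b, k - 1))
    }

# Step 4: keep rules A -> B whose confidence reaches MIN_CONFIDENCE
rules = []
for itemset, sup in frequent.items():
    for size in range(1, len(itemset)):
        for antecedent in map(frozenset, combinations(itemset, size)):
            conf = sup / frequent[antecedent]
            if conf >= MIN_CONFIDENCE:
                rules.append((sorted(antecedent), sorted(itemset - antecedent), conf))

for antecedent, consequent, conf in sorted(rules):
    print(antecedent, "->", consequent, "confidence", conf)
```

Under these assumptions it prints `['Butter'] -> ['Bread'] confidence 1` and `['Milk'] -> ['Bread'] confidence 1`. The reverse rule {Bread} → {Milk} is rejected because its confidence is only 0.6 / 1.0 = 0.6.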

---


The Apriori Algorithm is a foundational technique for discovering frequent patterns and associations in transactional data. It is widely used in **Market Basket Analysis** to help retailers understand customer purchasing behavior, and the rules it produces can inform marketing strategies, product placement, and inventory management. Because it rescans the transactions once for every itemset size, it can become slow on very large datasets or at low support thresholds.

---

**Lab 2** gives hands-on practice with generating frequent itemsets and deriving association rules using the Apriori algorithm.


```python
# Importing libraries
from itertools import combinations
```


```python
# Calculating support for each candidate itemset in the transactions.

def calculate_support(transactions, candidates):
    support_count = {itemset: 0 for itemset in candidates}
    for transaction in transactions:
        for itemset in candidates:
            if itemset.issubset(transaction):
                support_count[itemset] += 1
    total_transactions = len(transactions)
    return {itemset: count / total_transactions for itemset, count in support_count.items()}
```


```python
# Remove itemsets that do not meet the minimum support threshold.

def prune_itemsets(support_count, min_support):
    return {itemset: support for itemset, support in support_count.items() if support >= min_support}
```


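```python
# Generate candidate k-itemsets from the current frequent (k-1)-itemsets.
# Join step: union pairs of frequent (k-1)-itemsets that yield a k-itemset.
# Prune step (Apriori property): drop any candidate with an infrequent
# (k-1)-subset, since it cannot be frequent and need not be counted.

def generate_candidates(frequent_itemsets, k):
    frequent_items = list(frequent_itemsets.keys())
    candidates = set()
    for i in range(len(frequent_items)):
        for j in range(i + 1, len(frequent_items)):
            union_set = frequent_items[i].union(frequent_items[j])
            if len(union_set) != k:  # Keep only candidates of size k
                continue
            # Every (k-1)-subset of the candidate must itself be frequent
            if all(frozenset(subset) in frequent_itemsets
                   for subset in combinations(union_set, k - 1)):
                candidates.add(union_set)
    return candidates
```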


```python
# Apriori algorithm to find all frequent itemsets.

def apriori(transactions, min_support):
    transactions = [set(transaction) for transaction in transactions]  # Convert transactions to sets
    single_items = {frozenset([item]) for transaction in transactions for item in transaction}  # 1-itemsets
    frequent_itemsets = {}
    current_itemsets = single_items
    k = 1

    while current_itemsets:
        # Calculate support for current itemsets
        support_count = calculate_support(transactions, current_itemsets)

        # Prune itemsets that don't meet the minimum support threshold
        current_itemsets = prune_itemsets(support_count, min_support)

        # Add frequent itemsets to the final result
        frequent_itemsets.update(current_itemsets)

        # Generate candidates for the next level
        k += 1
        current_itemsets = generate_candidates(current_itemsets, k)

    return frequent_itemsets
```


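```python
# Generate association rules from frequent itemsets.
# Every non-empty proper subset of a frequent itemset is tried as the antecedent,
# and the remaining items form the consequent.

def generate_association_rules(frequent_itemsets, min_confidence):
    rules = []
    for itemset, itemset_support in frequent_itemsets.items():
        if len(itemset) < 2:
            continue  # A rule needs a non-empty antecedent and consequent
        for size in range(1, len(itemset)):
            for antecedent in map(frozenset, combinations(itemset, size)):
                consequent = itemset - antecedent
                # Subsets of a frequent itemset are frequent, so the antecedent is a key here
                if frequent_itemsets[antecedent] > 0:
                    confidence = itemset_support / frequent_itemsets[antecedent]
                    if confidence >= min_confidence:
                        rules.append((antecedent, consequent, confidence))
    return rules
```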
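
In general, a frequent itemset with k items can be split into a non-empty antecedent and a non-empty consequent in $2^k - 2$ ways, and each split is a candidate rule. In the seven-transaction dataset used below, every frequent itemset has at most two items, so only single-item antecedents occur there. The sketch below (the helper `all_rules` is hypothetical, and the data is the five-transaction Example table) lists all six candidate rules for {Milk, Bread, Butter}:

```python
from fractions import Fraction
from itertools import combinations

# Transactions from the Example table
transactions = [
    {"Milk", "Bread", "Butter"},
    {"Bread", "Butter"},
    {"Milk", "Bread"},
    {"Milk", "Bread", "Butter", "Eggs"},
    {"Bread", "Eggs"},
]

def support(itemset):
    """Exact fraction of transactions containing every item in `itemset`."""
    return Fraction(sum(set(itemset) <= t for t in transactions), len(transactions))

def all_rules(itemset):
    """Yield (antecedent, consequent, confidence) for every split of `itemset`."""
    items = sorted(itemset)
    for size in range(1, len(items)):  # antecedent sizes 1 .. k-1
        for antecedent in combinations(items, size):
            consequent = tuple(i for i in items if i not in antecedent)
            yield antecedent, consequent, support(items) / support(antecedent)

rules = list(all_rules({"Milk", "Bread", "Butter"}))
print(len(rules))  # 6, i.e. 2**3 - 2
for antecedent, consequent, conf in rules:
    print(antecedent, "->", consequent, conf)
```

The confidences range from 2/5 for {Bread} → {Butter, Milk} to 1 for {Butter, Milk} → {Bread}, which is why each split has to be scored separately.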
    

```python
# Example dataset
transactions = [
    ['Milk', 'Bread', 'Eggs'],      # Transaction 1
    ['Milk', 'Bread', 'Butter'],    # Transaction 2
    ['Bread', 'Butter', 'Cheese'],  # Transaction 3
    ['Milk', 'Bread', 'Butter', 'Cheese', 'Eggs'],  # Transaction 4
    ['Cheese', 'Eggs'],             # Transaction 5
    ['Milk', 'Eggs'],               # Transaction 6
    ['Milk', 'Bread', 'Cheese'],    # Transaction 7
]

# Parameters
min_support = 0.4  # Minimum support threshold (e.g., 40%)
min_confidence = 0.6  # Minimum confidence threshold (e.g., 60%)

# Run Apriori algorithm
frequent_itemsets = apriori(transactions, min_support)

# Display frequent itemsets
print("Frequent Itemsets:")
for itemset, support in frequent_itemsets.items():
    print(f"Itemset: {set(itemset)}, Support: {support:.2f}")

# Generate and display association rules
rules = generate_association_rules(frequent_itemsets, min_confidence)
print("\nAssociation Rules:")
for antecedent, consequent, confidence in rules:
    print(f"If {set(antecedent)} -> {set(consequent)}, Confidence: {confidence:.2f}")
```


```text
Frequent Itemsets:
Itemset: {'Cheese'}, Support: 0.57
Itemset: {'Butter'}, Support: 0.43
Itemset: {'Eggs'}, Support: 0.57
Itemset: {'Bread'}, Support: 0.71
Itemset: {'Milk'}, Support: 0.71
Itemset: {'Eggs', 'Milk'}, Support: 0.43
Itemset: {'Bread', 'Cheese'}, Support: 0.43
Itemset: {'Bread', 'Milk'}, Support: 0.57
Itemset: {'Butter', 'Bread'}, Support: 0.43

Association Rules:
If {'Eggs'} -> {'Milk'}, Confidence: 0.75
If {'Milk'} -> {'Eggs'}, Confidence: 0.60
If {'Bread'} -> {'Cheese'}, Confidence: 0.60
If {'Cheese'} -> {'Bread'}, Confidence: 0.75
If {'Bread'} -> {'Milk'}, Confidence: 0.80
If {'Milk'} -> {'Bread'}, Confidence: 0.80
If {'Butter'} -> {'Bread'}, Confidence: 1.00
If {'Bread'} -> {'Butter'}, Confidence: 0.60
```
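
The order of the lines above, and the order of items inside each printed set, can change from run to run. Python randomizes string hashing per process by default (controlled by the `PYTHONHASHSEED` environment variable), so iteration over sets and frozensets of strings is not stable. A small helper that sorts before printing makes the report reproducible. The names `format_itemset` and `report_lines` are hypothetical, and the demo input reuses supports and confidences from the five-transaction Example table.

```python
def format_itemset(itemset):
    """Render items in alphabetical order, e.g. {Bread, Milk}."""
    return "{" + ", ".join(sorted(itemset)) + "}"

def report_lines(frequent_itemsets, rules):
    """Build the report in a fixed order: itemsets by size then name, rules by confidence."""
    lines = ["Frequent Itemsets:"]
    for itemset in sorted(frequent_itemsets, key=lambda s: (len(s), sorted(s))):
        lines.append(f"Itemset: {format_itemset(itemset)}, "
                     f"Support: {frequent_itemsets[itemset]:.2f}")
    lines.append("")
    lines.append("Association Rules:")
    # Highest confidence first; ties broken alphabetically for a stable order
    for antecedent, consequent, confidence in sorted(
        rules, key=lambda r: (-r[2], sorted(r[0]), sorted(r[1]))
    ):
        lines.append(f"If {format_itemset(antecedent)} -> {format_itemset(consequent)}, "
                     f"Confidence: {confidence:.2f}")
    return lines

# Hypothetical demo input taken from the Example table
demo_itemsets = {
    frozenset({"Milk"}): 0.6,
    frozenset({"Bread"}): 1.0,
    frozenset({"Bread", "Milk"}): 0.6,
}
demo_rules = [
    (frozenset({"Bread"}), frozenset({"Milk"}), 0.6),
    (frozenset({"Milk"}), frozenset({"Bread"}), 1.0),
]
print("\n".join(report_lines(demo_itemsets, demo_rules)))
```

In the notebook, `print("\n".join(report_lines(frequent_itemsets, rules)))` could replace the two printing loops; the item names then appear without quotes.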
