# Lab 2: Apriori Algorithm and Its Details

## Objective:
The Apriori Algorithm is used for mining frequent itemsets and generating association rules in a transactional dataset. It is a fundamental algorithm in the field of **Market Basket Analysis**, where it is used to find patterns and relationships between products purchased together.

---

## Theory

The **Apriori Algorithm** follows an iterative, level-wise approach to discover frequent itemsets in a dataset. Frequent itemsets are sets of items that appear together in at least a minimum fraction of transactions. The algorithm uses **support** to measure how often an itemset occurs and **confidence** to measure the strength of an association rule.


### Key Concepts:

1. **Transaction**: A record that consists of a set of items purchased by a customer.
2. **Itemset**: A set of one or more items. An itemset containing `k` items is called a **k-itemset**.
3. **Frequent Itemset**: An itemset whose support meets or exceeds a predefined threshold, called **Minimum Support**.
4. **Support**: The frequency or proportion of transactions that contain a specific itemset.

   The formula for support is:

   $$ 
   \text{Support}(A) = \frac{\text{Count of transactions containing A}}{\text{Total number of transactions}} 
   $$

5. **Confidence**: For a rule A → B, the estimated likelihood that a transaction contains itemset B, given that it contains itemset A. It is used when generating **association rules**.

   The formula for confidence is:

   $$ 
   \text{Confidence}(A \to B) = \frac{\text{Support}(A \cup B)}{\text{Support}(A)} 
   $$
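
The two formulas above translate almost directly into code. The following is a minimal sketch, not a reference implementation: the function names `support` and `confidence` and the four-transaction `baskets` dataset are hypothetical and chosen only for illustration.

```python
def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= set(t))
    return hits / len(transactions)


def confidence(antecedent, consequent, transactions):
    """Support(A ∪ B) / Support(A) for the rule A -> B.

    Assumes the antecedent occurs at least once; otherwise the
    division raises ZeroDivisionError.
    """
    a = set(antecedent)
    both = a | set(consequent)
    return support(both, transactions) / support(a, transactions)


# Hypothetical dataset, used only to exercise the two functions.
baskets = [
    {"A", "B"},
    {"A", "C"},
    {"A", "B", "C"},
    {"B"},
]

print(support({"A"}, baskets))             # 3 of 4 transactions -> 0.75
print(support({"A", "B"}, baskets))        # 2 of 4 transactions -> 0.5
print(confidence({"A"}, {"B"}, baskets))   # 0.5 / 0.75, roughly 0.667
```

Note that the numerator of the confidence formula is the support of the union of the two itemsets, that is, the transactions containing all items of A and all items of B.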
   
---

## Algorithm

The Apriori algorithm proceeds through the following steps:

### Step-by-Step Procedure:

1. **Generate Candidate Itemsets**:
   - Start by identifying **1-itemsets** (single items in the dataset).
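   - Then, iteratively combine frequent itemsets of size `k` to generate candidate itemsets of size `k+1`.
   - A candidate can be discarded before counting if any of its `k`-item subsets is infrequent, because every subset of a frequent itemset must itself be frequent (the **Apriori property**).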

2. **Calculate Support for Itemsets**:
   - For each candidate itemset, calculate the **support** by counting how many transactions contain that itemset.

3. **Prune Itemsets**:
   - Remove itemsets that do not meet the minimum support threshold.
   
4. **Repeat**:
   - Repeat steps 1-3 for larger itemsets until no more frequent itemsets are found.

5. **Generate Association Rules**:
   - From the frequent itemsets, generate **association rules** with a minimum confidence threshold.
   - The rules are of the form: **{Item A} → {Item B}**, meaning if **Item A** is bought, **Item B** is likely to be bought.
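
Steps 1 to 4 above can be sketched as a short level-wise loop. This is one possible way to write it, under stated assumptions: the function name `apriori_frequent_itemsets`, the `baskets` data, and the threshold of 0.5 are all hypothetical, and support is recounted by scanning every transaction, which is simple but not efficient.

```python
from itertools import combinations


def apriori_frequent_itemsets(transactions, min_support):
    """Return {frozenset: support} for every itemset with support >= min_support."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)

    def support(itemset):
        return sum(1 for t in transactions if itemset <= t) / n

    # Level 1: every distinct item is a candidate 1-itemset.
    candidates = {frozenset([item]) for t in transactions for item in t}
    frequent = {}
    k = 1
    while candidates:
        # Count support and keep only the candidates that meet the threshold.
        level = {}
        for c in candidates:
            s = support(c)
            if s >= min_support:
                level[c] = s
        frequent.update(level)

        # Join frequent k-itemsets into (k+1)-candidates, keeping a candidate
        # only if all of its k-item subsets are frequent.
        k += 1
        candidates = set()
        for a, b in combinations(level, 2):
            union = a | b
            if len(union) == k and all(
                frozenset(sub) in level for sub in combinations(union, k - 1)
            ):
                candidates.add(union)
    return frequent


# Hypothetical dataset and threshold, for illustration only.
baskets = [{"A", "B"}, {"A", "C"}, {"A", "B", "C"}, {"B"}]
result = apriori_frequent_itemsets(baskets, min_support=0.5)
for itemset, s in sorted(result.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), s)
```

In this small run, {A, B, C} is never counted: it is dropped at the candidate stage because its subset {B, C} is not frequent.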

---

## Example

**Transactions**:

| Transaction ID | Items Purchased           |
|----------------|---------------------------|
| 1              | Milk, Bread, Butter       |
| 2              | Bread, Butter             |
| 3              | Milk, Bread               |
| 4              | Milk, Bread, Butter, Eggs |
| 5              | Bread, Eggs               |

**Support Calculation**:
- **Support for {Milk}** = 3/5 = 0.6 (60% of transactions contain Milk)
- **Support for {Bread}** = 5/5 = 1.0 (every transaction contains Bread)
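
The same calculation can be repeated for every single item in the table. The snippet below is a small sketch using `collections.Counter`; the variable names `counts` and `supports` are illustrative choices, not part of the lab.

```python
from collections import Counter

transactions = [
    ["Milk", "Bread", "Butter"],
    ["Bread", "Butter"],
    ["Milk", "Bread"],
    ["Milk", "Bread", "Butter", "Eggs"],
    ["Bread", "Eggs"],
]

# Count each item at most once per transaction, then divide by the
# total number of transactions to get its support.
counts = Counter(item for t in transactions for item in set(t))
supports = {item: count / len(transactions) for item, count in counts.items()}

for item in sorted(supports):
    print(f"Support({{{item}}}) = {counts[item]}/{len(transactions)} = {supports[item]}")
```

With these five transactions, Bread has support 1.0, Milk and Butter have 0.6, and Eggs has 0.4.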

---

## Applying the Steps to the Example:

1. **Generate Candidate Itemsets**:
   - Start with **1-itemsets**: {Milk}, {Bread}, {Butter}, {Eggs}.
   - Then, generate **2-itemsets** from frequent 1-itemsets: {Milk, Bread}, {Milk, Butter}, {Bread, Butter}, etc.

2. **Calculate Support**:
   - Calculate support for each itemset by counting how many transactions contain them.

3. **Prune Itemsets**:
   - Discard itemsets that do not meet the minimum support threshold.

4. **Generate Association Rules**:
   - From frequent itemsets, generate association rules, e.g., {Milk} → {Bread}, which holds in every transaction that contains Milk.
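
As a worked check of the confidence formula on the five example transactions: Milk appears in transactions 1, 3, and 4, and Bread appears alongside it in all three, so

$$
\text{Confidence}(\{\text{Milk}\} \to \{\text{Bread}\}) = \frac{\text{Support}(\{\text{Milk}, \text{Bread}\})}{\text{Support}(\{\text{Milk}\})} = \frac{3/5}{3/5} = 1.0
$$

The reverse rule is weaker, because Bread appears in all five transactions while Milk accompanies it in only three:

$$
\text{Confidence}(\{\text{Bread}\} \to \{\text{Milk}\}) = \frac{\text{Support}(\{\text{Milk}, \text{Bread}\})}{\text{Support}(\{\text{Bread}\})} = \frac{3/5}{5/5} = 0.6
$$

Confidence is therefore not symmetric: A → B and B → A generally have different values.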

---


The Apriori Algorithm discovers frequent patterns and associations in large transactional datasets. It is widely used in **Market Basket Analysis** to help retailers understand customer purchasing behavior and, in turn, improve marketing strategies, product placement, and inventory management.

---

**Lab 2** provides hands-on practice with generating frequent itemsets and deriving association rules using the Apriori algorithm.


```python
# Sample transaction dataset
transactions = [
    ['Milk', 'Bread', 'Butter'],
    ['Bread', 'Butter'],
    ['Milk', 'Bread'],
    ['Milk', 'Bread', 'Butter', 'Eggs'],
    ['Bread', 'Eggs']
]

# Display the dataset
for i, transaction in enumerate(transactions, start=1):
    print(f"Transaction {i}: {transaction}")
```

```text
Transaction 1: ['Milk', 'Bread', 'Butter']
Transaction 2: ['Bread', 'Butter']
Transaction 3: ['Milk', 'Bread']
Transaction 4: ['Milk', 'Bread', 'Butter', 'Eggs']
Transaction 5: ['Bread', 'Eggs']
```
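
A possible next cell could derive association rules from this dataset. The sketch below is an assumption about how that might look, not part of the lab as given: the thresholds `MIN_SUPPORT = 0.6` and `MIN_CONFIDENCE = 0.8` are invented for illustration, and itemsets are enumerated by brute force rather than level-wise as Apriori does, which is feasible here only because there are four distinct items.

```python
from itertools import combinations

transactions = [
    ['Milk', 'Bread', 'Butter'],
    ['Bread', 'Butter'],
    ['Milk', 'Bread'],
    ['Milk', 'Bread', 'Butter', 'Eggs'],
    ['Bread', 'Eggs'],
]
baskets = [frozenset(t) for t in transactions]
items = sorted(set().union(*baskets))

MIN_SUPPORT = 0.6      # assumed threshold, not specified in the lab text
MIN_CONFIDENCE = 0.8   # assumed threshold, not specified in the lab text


def support(itemset):
    """Fraction of baskets that contain every item in `itemset`."""
    return sum(1 for b in baskets if itemset <= b) / len(baskets)


# Brute-force enumeration of every possible itemset (not Apriori's level-wise search).
frequent = {}
for k in range(1, len(items) + 1):
    for combo in combinations(items, k):
        s = support(frozenset(combo))
        if s >= MIN_SUPPORT:
            frequent[frozenset(combo)] = s

# Split each frequent itemset of size >= 2 into antecedent -> consequent.
rules = []
for itemset, s in frequent.items():
    for r in range(1, len(itemset)):
        for antecedent in combinations(sorted(itemset), r):
            a = frozenset(antecedent)
            # Every subset of a frequent itemset is frequent, so the lookup succeeds.
            conf = s / frequent[a]
            if conf >= MIN_CONFIDENCE:
                rules.append((sorted(a), sorted(itemset - a), s, conf))

for a, c, s, conf in rules:
    print(f"{a} -> {c}  support={s:.2f}  confidence={conf:.2f}")
```

Under these assumed thresholds, two rules survive: {Butter} → {Bread} and {Milk} → {Bread}, each with support 0.6 and confidence 1.0. The reverse rules have confidence 0.6 and are filtered out.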
