In [5]:
from mlxtend.frequent_patterns import apriori, association_rules
import pandas as pd

# Sample dataset
data = {
    'Bread': [1, 1, 0, 1, 1],
    'Milk': [1, 1, 1, 0, 1],
    'Cheese': [0, 1, 1, 1, 0],
    'Butter': [0, 1, 0, 1, 1],
    'Apples': [1, 0, 1, 0, 1],
}

# Convert data into a DataFrame
df = pd.DataFrame(data)

# Step 1: Generate frequent itemsets
frequent_itemsets = apriori(df, min_support=0.5, use_colnames=True)
print("Frequent Itemsets:")
print(frequent_itemsets)

# Step 2: Generate association rules
rules = association_rules(frequent_itemsets, metric="lift", num_itemsets=1, min_threshold=1.0)
print("\nAssociation Rules:")
print(rules)


Frequent Itemsets:
   support         itemsets
0      0.8          (Bread)
1      0.8           (Milk)
2      0.6         (Cheese)
3      0.6         (Butter)
4      0.6         (Apples)
5      0.6    (Bread, Milk)
6      0.6  (Bread, Butter)
7      0.6   (Apples, Milk)

Association Rules:
  antecedents consequents  antecedent support  consequent support  support  \
0     (Bread)    (Butter)                 0.8                 0.6      0.6   
1    (Butter)     (Bread)                 0.6                 0.8      0.6   
2    (Apples)      (Milk)                 0.6                 0.8      0.6   
3      (Milk)    (Apples)                 0.8                 0.6      0.6   

   confidence  lift  representativity  leverage  conviction  zhangs_metric  \
0        0.75  1.25               1.0      0.12         1.6            1.0   
1        1.00  1.25               1.0      0.12         inf            0.5   
2        1.00  1.25               1.0      0.12         inf            0.5   
3     



In [6]:
import pandas as pd
from mlxtend.preprocessing import TransactionEncoder
from mlxtend.frequent_patterns import apriori, fpmax, fpgrowth


dataset = [['Milk', 'Onion', 'Nutmeg', 'Kidney Beans', 'Eggs', 'Yogurt'],
           ['Dill', 'Onion', 'Nutmeg', 'Kidney Beans', 'Eggs', 'Yogurt'],
           ['Milk', 'Apple', 'Kidney Beans', 'Eggs'],
           ['Milk', 'Unicorn', 'Corn', 'Kidney Beans', 'Yogurt'],
           ['Corn', 'Onion', 'Onion', 'Kidney Beans', 'Ice cream', 'Eggs']]

te = TransactionEncoder()
te_ary = te.fit(dataset).transform(dataset)
df = pd.DataFrame(te_ary, columns=te.columns_)

frequent_itemsets = fpgrowth(df, min_support=0.6, use_colnames=True)
### alternatively:
#frequent_itemsets = apriori(df, min_support=0.6, use_colnames=True)
#frequent_itemsets = fpmax(df, min_support=0.6, use_colnames=True)

frequent_itemsets

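|    | support | itemsets                    |
|----|---------|-----------------------------|
| 0  | 1.0     | (Kidney Beans)              |
| 1  | 0.8     | (Eggs)                      |
| 2  | 0.6     | (Yogurt)                    |
| 3  | 0.6     | (Onion)                     |
| 4  | 0.6     | (Milk)                      |
| 5  | 0.8     | (Kidney Beans, Eggs)        |
| 6  | 0.6     | (Yogurt, Kidney Beans)      |
| 7  | 0.6     | (Eggs, Onion)               |
| 8  | 0.6     | (Kidney Beans, Onion)       |
| 9  | 0.6     | (Kidney Beans, Eggs, Onion) |
| 10 | 0.6     | (Kidney Beans, Milk)        |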


In [8]:
frequent_itemsets = apriori(df, min_support=0.6, use_colnames=True)
frequent_itemsets

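|    | support | itemsets                    |
|----|---------|-----------------------------|
| 0  | 0.8     | (Eggs)                      |
| 1  | 1.0     | (Kidney Beans)              |
| 2  | 0.6     | (Milk)                      |
| 3  | 0.6     | (Onion)                     |
| 4  | 0.6     | (Yogurt)                    |
| 5  | 0.8     | (Kidney Beans, Eggs)        |
| 6  | 0.6     | (Eggs, Onion)               |
| 7  | 0.6     | (Kidney Beans, Milk)        |
| 8  | 0.6     | (Kidney Beans, Onion)       |
| 9  | 0.6     | (Yogurt, Kidney Beans)      |
| 10 | 0.6     | (Kidney Beans, Eggs, Onion) |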


In [9]:
frequent_itemsets = fpmax(df, min_support=0.6, use_colnames=True)
frequent_itemsets

|    | support | itemsets                    |
|----|---------|-----------------------------|
| 0  | 0.6     | (Kidney Beans, Milk)        |
| 1  | 0.6     | (Kidney Beans, Eggs, Onion) |
| 2  | 0.6     | (Yogurt, Kidney Beans)      |


# Comparison of Apriori, FP-Growth, and FP-Max Algorithms

## 1. Apriori Algorithm
**Description**:  
The Apriori algorithm is a classic approach for finding frequent itemsets, from which association rules are then generated. It iteratively generates candidate itemsets and prunes those that cannot be frequent, using the **Apriori property**: *If an itemset is frequent, all its subsets must also be frequent.* Equivalently, any itemset with an infrequent subset can be discarded without counting its support.

### Steps:
1. Generate candidate itemsets of size `k` from frequent itemsets of size `k-1`.
2. Scan the database to calculate the support of each candidate.
3. Prune candidates that do not meet the minimum support threshold.
4. Repeat until no more frequent itemsets can be generated.
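
The four steps above can be sketched in plain Python. This is a minimal illustration, not the `mlxtend` implementation: the function name `apriori_sketch` and its helpers are hypothetical, candidates are formed by taking unions of frequent itemsets, and the transactions are the ones from the `TransactionEncoder` cell earlier in this notebook.

```python
from itertools import combinations


def apriori_sketch(transactions, min_support):
    """Return {itemset: support} for all frequent itemsets (illustrative only)."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)

    def support(itemset):
        # Fraction of transactions that contain every item of `itemset`
        return sum(itemset <= t for t in transactions) / n

    def keep_frequent(candidates):
        # Steps 2-3: count support, drop candidates below the threshold
        scored = {c: support(c) for c in candidates}
        return {c: s for c, s in scored.items() if s >= min_support}

    # Level 1: every single item is a candidate
    current = keep_frequent({frozenset([i]) for t in transactions for i in t})
    frequent = {}
    k = 2
    while current:  # Step 4: stop when a level yields no frequent itemsets
        frequent.update(current)
        # Step 1: unions of two frequent (k-1)-itemsets that have exactly k items
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # Apriori property: every (k-1)-subset of a candidate must be frequent
        candidates = {
            c for c in candidates
            if all(frozenset(s) in current for s in combinations(c, k - 1))
        }
        current = keep_frequent(candidates)
        k += 1
    return frequent


dataset = [
    ['Milk', 'Onion', 'Nutmeg', 'Kidney Beans', 'Eggs', 'Yogurt'],
    ['Dill', 'Onion', 'Nutmeg', 'Kidney Beans', 'Eggs', 'Yogurt'],
    ['Milk', 'Apple', 'Kidney Beans', 'Eggs'],
    ['Milk', 'Unicorn', 'Corn', 'Kidney Beans', 'Yogurt'],
    ['Corn', 'Onion', 'Onion', 'Kidney Beans', 'Ice cream', 'Eggs'],
]
frequent = apriori_sketch(dataset, min_support=0.6)
for itemset, s in sorted(frequent.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), s)
```

Each pass of the `while` loop rescans all transactions to count support, which is the repeated-scan cost that makes Apriori slow on large datasets.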

**Advantages**:
- Simple and intuitive for small datasets.  

**Disadvantages**:
- Computationally expensive due to repeated database scans.
- High memory usage from generating many candidate itemsets.

---

## 2. FP-Growth Algorithm
**Description**:  
FP-Growth is a more efficient alternative to Apriori. It avoids generating candidate itemsets explicitly by compressing the transaction database into a **Frequent Pattern Tree (FP-tree)**. Frequent itemsets are extracted from this tree recursively.

### Steps:
1. **Construct FP-tree**:
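   - Scan the database once to calculate item frequencies, discard infrequent items, and sort the remaining items by descending frequency.
   - Scan the database a second time and insert each transaction, with its items in that order, as a path in the tree. Transactions that share a prefix share nodes, and each node keeps a count.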
2. **Extract Frequent Patterns**:
   - Recursively mine the FP-tree to find frequent itemsets.
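
The tree construction can be sketched as follows. This is an illustrative sketch only: `FPNode`, `build_fp_tree`, and `show` are hypothetical names, ties in frequency are broken alphabetically (an assumption made here so the result is deterministic), and the recursive mining step is left out.

```python
from collections import Counter


class FPNode:
    """One tree node: an item, a count, and links to parent and children."""

    def __init__(self, item, parent=None):
        self.item = item
        self.count = 0
        self.parent = parent
        self.children = {}


def build_fp_tree(transactions, min_count):
    # Scan 1: count in how many transactions each item occurs
    counts = Counter(item for t in transactions for item in set(t))
    frequent = {i: c for i, c in counts.items() if c >= min_count}

    root = FPNode(None)
    # Scan 2: insert each transaction as a path of its frequent items
    for t in transactions:
        ordered = sorted(
            (i for i in set(t) if i in frequent),
            key=lambda i: (-frequent[i], i),  # descending count, ties by name
        )
        node = root
        for item in ordered:
            if item not in node.children:
                node.children[item] = FPNode(item, parent=node)
            node = node.children[item]
            node.count += 1  # shared prefixes accumulate counts
    return root, frequent


def show(node, depth=0):
    # Print the tree with two spaces of indentation per level
    for child in sorted(node.children.values(), key=lambda c: c.item):
        print('  ' * depth + f'{child.item}: {child.count}')
        show(child, depth + 1)


dataset = [
    ['Milk', 'Onion', 'Nutmeg', 'Kidney Beans', 'Eggs', 'Yogurt'],
    ['Dill', 'Onion', 'Nutmeg', 'Kidney Beans', 'Eggs', 'Yogurt'],
    ['Milk', 'Apple', 'Kidney Beans', 'Eggs'],
    ['Milk', 'Unicorn', 'Corn', 'Kidney Beans', 'Yogurt'],
    ['Corn', 'Onion', 'Onion', 'Kidney Beans', 'Ice cream', 'Eggs'],
]
root, item_counts = build_fp_tree(dataset, min_count=3)  # 3 of 5 = support 0.6
show(root)
```

Because every transaction contains `Kidney Beans`, all five paths start at the same node, which shows how the tree compresses the database.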

**Advantages**:
- Faster than Apriori, especially for large datasets.
- Requires only two database scans.

**Disadvantages**:
- Complex tree construction and recursive processing.
- May require significant memory for dense datasets.

---

## 3. FP-Max Algorithm
**Description**:  
FP-Max focuses on finding **maximal frequent itemsets**, which are frequent itemsets that do not have any frequent supersets. This reduces redundancy by only outputting the largest frequent patterns.

### Steps:
1. Build the FP-tree (as in FP-Growth).
2. Mine the tree recursively, but skip any itemset that is a subset of a maximal frequent itemset already found, so that only itemsets with no frequent superset are reported.
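
To make the definition of a maximal frequent itemset concrete, the sketch below filters a list of frequent itemsets after the fact. This is not how FP-Max works internally (it avoids enumerating the non-maximal itemsets in the first place); `maximal_itemsets` is a hypothetical helper, and the input list is the set of frequent itemsets of the example dataset at `min_support=0.6`.

```python
def maximal_itemsets(frequent_itemsets):
    """Keep only itemsets that have no frequent proper superset."""
    sets = [frozenset(s) for s in frequent_itemsets]
    # `s < other` is True when s is a proper subset of other
    return [s for s in sets if not any(s < other for other in sets)]


frequent_itemsets = [
    ['Kidney Beans'], ['Eggs'], ['Yogurt'], ['Onion'], ['Milk'],
    ['Kidney Beans', 'Eggs'], ['Kidney Beans', 'Yogurt'], ['Eggs', 'Onion'],
    ['Kidney Beans', 'Onion'], ['Kidney Beans', 'Milk'],
    ['Kidney Beans', 'Eggs', 'Onion'],
]
maximal = maximal_itemsets(frequent_itemsets)
for itemset in maximal:
    print(sorted(itemset))
```

The three itemsets that survive the filter are the same three that `fpmax` returns for this dataset.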

**Advantages**:
- Produces a concise result, focusing only on maximal patterns.
- Reduces the number of patterns generated.

**Disadvantages**:
- Does not output all frequent itemsets, which might be required in some applications.

---

## Comparison Table

| Feature           | Apriori                  | FP-Growth               | FP-Max                    |
|--------------------|--------------------------|--------------------------|---------------------------|
| **Output**         | All frequent itemsets    | All frequent itemsets    | Maximal frequent itemsets |
| **Efficiency**     | Slower for large datasets| Faster than Apriori      | Faster than Apriori       |
| **Memory Usage**   | High (many candidates)   | Moderate (FP-tree)       | Lower (keeps maximal sets)|
| **Database Scans** | Multiple                 | Two                      | Two                       |

---

### Support Calculation
For all three algorithms, **support** is calculated as:
\[
\text{Support of an itemset} = \frac{\text{Number of transactions containing the itemset}}{\text{Total number of transactions}}
\]
- Example: If an itemset `{Bread, Milk}` appears in 3 out of 5 transactions, its support is:
  \[
  \text{Support} = \frac{3}{5} = 0.6
  \]
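
The formula translates directly into code. In this small sketch the function name `support` is chosen here for illustration, and the five transactions are the rows of the one-hot sample data from the first code cell, written out as item lists.

```python
def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= set(t) for t in transactions) / len(transactions)


transactions = [
    ['Bread', 'Milk', 'Apples'],
    ['Bread', 'Milk', 'Cheese', 'Butter'],
    ['Milk', 'Cheese', 'Apples'],
    ['Bread', 'Cheese', 'Butter'],
    ['Bread', 'Milk', 'Butter', 'Apples'],
]
# {Bread, Milk} occurs in 3 of the 5 transactions
print(support({'Bread', 'Milk'}, transactions))
```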

---

### When to Use:
- **Apriori**: For small datasets or educational purposes.
- **FP-Growth**: For large datasets with many items and transactions.
- **FP-Max**: When you need only the largest frequent itemsets to reduce redundancy or for concise results.


____
____

# Apriori Algorithm Example with Dataset
### Generation Rule:
- Combine two frequent itemsets of size k−1 if they share k−2 items.
- For example, with k−1 = 2, the itemsets {A, B} and {A, C} share the item A and combine into the candidate {A, B, C}.

#### Pruning Step:

- After generating candidate itemsets of size k, remove every candidate that has a subset of size k−1 that is not frequent. This follows from the Apriori property: all subsets of a frequent itemset must also be frequent.
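
The generation and pruning rules can be sketched as two small functions. The names `join` and `prune` are hypothetical, and the join uses the common "shared sorted prefix" variant (two sorted (k−1)-itemsets are merged when their first k−2 items agree), which is one way, assumed here, to implement "share k−2 items" without producing duplicates.

```python
from itertools import combinations


def join(frequent_prev):
    """Merge (k-1)-itemsets whose first k-2 items (in sorted order) agree."""
    ordered = sorted(tuple(sorted(s)) for s in frequent_prev)
    candidates = set()
    for a, b in combinations(ordered, 2):
        if a[:-1] == b[:-1]:
            candidates.add(frozenset(a + b))
    return candidates


def prune(candidates, frequent_prev):
    """Drop candidates that have any infrequent (k-1)-subset."""
    return {
        c for c in candidates
        if all(frozenset(sub) in frequent_prev
               for sub in combinations(c, len(c) - 1))
    }


# {A,B} and {A,C} join into {A,B,C}, but {B,C} is not frequent, so it is pruned
pairs = {frozenset('AB'), frozenset('AC')}
print(join(pairs), prune(join(pairs), pairs))

# Once {B,C} is frequent as well, the candidate survives pruning
pairs_full = pairs | {frozenset('BC')}
print(prune(join(pairs_full), pairs_full))
```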
___
### Dataset: Transactions
We will use the following dataset of transactions:

| Transaction ID | Items Bought             |
|----------------|---------------------------|
| 1              | Bread, Milk, Egg         |
| 2              | Bread, Milk              |
| 3              | Milk, Egg, Butter        |
| 4              | Bread, Egg, Butter       |
| 5              | Bread, Milk, Egg, Butter |

---

### Parameters:
- **Minimum Support** = 0.4 (40%)
- **Minimum Confidence** = 0.6 (60%)

---

### Step-by-Step Execution of the Apriori Algorithm

#### Step 1: Calculate Support for Single Items
The algorithm begins by calculating the support for individual items. 

| Item    | Support Count | Support  |
|---------|---------------|----------|
| Bread   | 4             | 0.8      |
| Milk    | 4             | 0.8      |
| Egg     | 4             | 0.8      |
| Butter  | 3             | 0.6      |

- All items meet the **minimum support threshold** of 0.4 and are considered frequent.
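
Step 1 can be reproduced with a few lines of Python. This is a sketch in which the variable names are chosen for illustration and the transactions are copied from the dataset table above.

```python
from collections import Counter

transactions = [
    {'Bread', 'Milk', 'Egg'},
    {'Bread', 'Milk'},
    {'Milk', 'Egg', 'Butter'},
    {'Bread', 'Egg', 'Butter'},
    {'Bread', 'Milk', 'Egg', 'Butter'},
]
min_support = 0.4
n = len(transactions)

# Count in how many transactions each single item occurs
item_counts = Counter(item for t in transactions for item in t)

# Keep the items whose support reaches the threshold
frequent_items = {
    item: count / n
    for item, count in item_counts.items()
    if count / n >= min_support
}
for item in sorted(frequent_items):
    print(item, item_counts[item], frequent_items[item])
```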

---

#### Step 2: Generate Candidate Itemsets of Size 2
Next, single items are combined into pairs, and the support for each pair is calculated.

| Itemset        | Support Count | Support  |
|----------------|---------------|----------|
| {Bread, Milk}  | 3             | 0.6      |
| {Bread, Egg}   | 3             | 0.6      |
| {Bread, Butter}| 2             | 0.4      |
| {Milk, Egg}    | 3             | 0.6      |
| {Milk, Butter} | 2             | 0.4      |
| {Egg, Butter}  | 3             | 0.6      |

- All itemsets of size 2 meet the **minimum support threshold** of 0.4 and are retained as frequent itemsets.

---

#### Step 3: Generate Candidate Itemsets of Size 3
Frequent itemsets of size 2 are combined to form candidates of size 3, and their support is calculated.

| Itemset               | Support Count | Support  |
|-----------------------|---------------|----------|
| {Bread, Milk, Egg}    | 2             | 0.4      |
| {Bread, Milk, Butter} | 1             | 0.2      |
| {Bread, Egg, Butter}  | 2             | 0.4      |
| {Milk, Egg, Butter}   | 2             | 0.4      |

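- Only `{Bread, Milk, Egg}`, `{Bread, Egg, Butter}`, and `{Milk, Egg, Butter}` meet the **minimum support threshold** of 0.4.
- The only possible candidate of size 4, `{Bread, Milk, Egg, Butter}`, is pruned without counting because its subset `{Bread, Milk, Butter}` is not frequent, so the search stops here.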
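
The support counts in the Step 2 and Step 3 tables can be checked with a short script. This sketch simply counts every pair and triple of items; for this dataset that gives the same candidates as the Apriori join, because all six pairs are frequent. The variable names are chosen here for illustration.

```python
from itertools import combinations

transactions = [
    {'Bread', 'Milk', 'Egg'},
    {'Bread', 'Milk'},
    {'Milk', 'Egg', 'Butter'},
    {'Bread', 'Egg', 'Butter'},
    {'Bread', 'Milk', 'Egg', 'Butter'},
]
min_count = 2  # minimum support 0.4 x 5 transactions

items = sorted({item for t in transactions for item in t})
support_counts = {}
for k in (2, 3):
    for combo in combinations(items, k):
        itemset = frozenset(combo)
        # Number of transactions that contain the whole itemset
        support_counts[itemset] = sum(itemset <= t for t in transactions)

frequent = {s: c for s, c in support_counts.items() if c >= min_count}
for itemset, c in sorted(support_counts.items(),
                         key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    status = 'frequent' if c >= min_count else 'pruned'
    print(sorted(itemset), c, c / len(transactions), status)
```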

---

#### Step 4: Generate Association Rules
The algorithm generates rules from frequent itemsets and calculates the confidence for each rule.

##### Example Calculations:
- Rule: `{Bread, Milk} → Egg`
  - **Support** = Support({Bread, Milk, Egg}) = 2/5 = 0.4
  - **Confidence** = Support({Bread, Milk, Egg}) / Support({Bread, Milk}) = 0.4 / 0.6 ≈ 0.67
- Rule: `{Milk, Egg} → Butter`
  - **Support** = Support({Milk, Egg, Butter}) = 2/5 = 0.4
  - **Confidence** = Support({Milk, Egg, Butter}) / Support({Milk, Egg}) = 0.4 / 0.6 ≈ 0.67

Only rules with **confidence ≥ 0.6** are retained, so both example rules are kept.
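
The confidence calculation can be sketched as below. The helper names `count` and `confidence` are hypothetical, and the third rule, `{Egg} → {Bread, Milk}`, is an extra illustration added here to show a rule that falls below the threshold.

```python
transactions = [
    {'Bread', 'Milk', 'Egg'},
    {'Bread', 'Milk'},
    {'Milk', 'Egg', 'Butter'},
    {'Bread', 'Egg', 'Butter'},
    {'Bread', 'Milk', 'Egg', 'Butter'},
]
min_confidence = 0.6


def count(itemset):
    # Number of transactions containing every item of `itemset`
    return sum(itemset <= t for t in transactions)


def confidence(antecedent, consequent):
    # Support(antecedent and consequent together) / Support(antecedent)
    return count(antecedent | consequent) / count(antecedent)


rules = [
    ({'Bread', 'Milk'}, {'Egg'}),
    ({'Milk', 'Egg'}, {'Butter'}),
    ({'Egg'}, {'Bread', 'Milk'}),  # illustrative rule that gets dropped
]
kept = []
for antecedent, consequent in rules:
    conf = confidence(antecedent, consequent)
    if conf >= min_confidence:
        kept.append((antecedent, consequent))
    print(sorted(antecedent), '->', sorted(consequent), round(conf, 2))
```

Confidence depends on the direction of the rule: `{Bread, Milk} → Egg` passes the threshold, while `{Egg} → {Bread, Milk}`, built from the same itemset, does not.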

---

### Final Frequent Itemsets and Rules

#### Frequent Itemsets:
- **Size 1**: `{Bread}`, `{Milk}`, `{Egg}`, `{Butter}`
- **Size 2**: `{Bread, Milk}`, `{Bread, Egg}`, `{Bread, Butter}`, `{Milk, Egg}`, `{Milk, Butter}`, `{Egg, Butter}`
- **Size 3**: `{Bread, Milk, Egg}`, `{Bread, Egg, Butter}`, `{Milk, Egg, Butter}`


___