# Apriori Algorithm Implementation Challenge

**Objective:** Implement the Apriori algorithm for association rule mining from scratch.

The Apriori algorithm works in three main phases:
1. **Candidate Generation**: Create potential itemsets of increasing size
2. **Support Counting**: Determine which itemsets meet minimum support
3. **Rule Generation**: Create association rules from frequent itemsets

### Apriori Principle
*"All subsets of a frequent itemset must also be frequent."*
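Equivalently, if an itemset is infrequent, none of its supersets can be frequent, so they can be discarded without counting their support. This is what keeps the search space manageable.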
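
To make the principle concrete, the sketch below checks that support can only stay equal or drop as items are added to an itemset. The helper name `support` and the four-transaction dataset are illustrative assumptions, not part of the challenge code.

```python
# Illustrative only: a tiny dataset and a hypothetical helper named `support`.
demo_transactions = [
    {'milk', 'bread', 'eggs'},
    {'milk', 'bread'},
    {'bread', 'eggs'},
    {'cheese'},
]

def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)

s_bread = support(['bread'], demo_transactions)                 # 0.75
s_bread_milk = support(['bread', 'milk'], demo_transactions)    # 0.5
s_all = support(['bread', 'milk', 'eggs'], demo_transactions)   # 0.25

# Adding items never raises support (anti-monotonicity).
print(s_bread >= s_bread_milk >= s_all)  # True
```

With a minimum support of 0.6 in this toy example, `['bread', 'milk']` is already infrequent, so `['bread', 'milk', 'eggs']` would never need to be counted.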

In [1]:
import numpy as np
from itertools import combinations
from collections import defaultdict
import pandas as pd

# Sample dataset
transactions = [
    ['milk', 'bread', 'eggs'],
    ['milk', 'bread'],
    ['bread', 'eggs'],
    ['milk', 'eggs'],
    ['milk', 'bread', 'eggs', 'cheese'],
    ['cheese', 'eggs'],
    ['cheese'],
    ['milk', 'cheese', 'eggs']
]

# Preprocessing function (given)
def preprocess_transactions(transactions):
    return [list(set(t)) for t in transactions]

transactions = preprocess_transactions(transactions)
unique_items = list(set(item for t in transactions for item in t))

### Step 1: Implement Support Calculation

Complete the function below to calculate the support of an itemset. (You should already have this.)

In [2]:
def calculate_support(itemset, transactions):
    """
    Calculate the support of an itemset.
    
    Parameters:
    itemset (list): A list of items
    transactions (list): List of all transactions
    
    Returns:
    float: Support as a fraction of all transactions (between 0 and 1)
    """
    if not transactions:
        return 0.0  # avoid division by zero on an empty dataset

    # Count the transactions that contain every item of the itemset
    counter = 0
    for transaction in transactions:
        if all(item in transaction for item in itemset):
            counter += 1
    return counter / len(transactions)

# Test your function
print("Test support for ['milk']:", calculate_support(['milk'], transactions))

Test support for ['milk']: 0.625
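
As an alternative formulation, support can be written as a subset test per transaction. The sketch below is one possible variant, not the required solution; the name `calculate_support_sets` is hypothetical, and the dataset is repeated so the block runs on its own.

```python
def calculate_support_sets(itemset, transactions):
    """Set-based variant: fraction of transactions containing all items."""
    if not transactions:
        return 0.0  # assumption: treat an empty dataset as support 0
    target = frozenset(itemset)
    # issubset accepts any iterable, so list-based transactions work too
    hits = sum(1 for t in transactions if target.issubset(t))
    return hits / len(transactions)

# Same sample data as in the first cell, repeated for self-containment
sample_transactions = [
    ['milk', 'bread', 'eggs'],
    ['milk', 'bread'],
    ['bread', 'eggs'],
    ['milk', 'eggs'],
    ['milk', 'bread', 'eggs', 'cheese'],
    ['cheese', 'eggs'],
    ['cheese'],
    ['milk', 'cheese', 'eggs'],
]

print(calculate_support_sets(['milk'], sample_transactions))          # 0.625
print(calculate_support_sets(['milk', 'eggs'], sample_transactions))  # 0.5
```

If the transactions are converted to sets once up front, each subset test avoids repeated linear scans of a list.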


### Step 2: Generate Candidate Itemsets

Implement the function to generate candidate itemsets of size k from frequent itemsets of size k-1.

In [3]:
def generate_candidates(prev_freq_itemsets, k):
    """
    Generate candidate itemsets of size k using frequent itemsets of size k-1.
    
    Parameters:
    prev_freq_itemsets (list): Frequent itemsets of size k-1
    k (int): The size of candidates to generate
    
    Returns:
    list: Candidate itemsets of size k
    """
    candidates = []
    n = len(prev_freq_itemsets)
    
    for i in range(n):
        for j in range(i + 1, n):
            # Merge two itemsets only if their first k-2 (sorted) items match.
            # This limits redundant merges; each candidate is added only once.
            l1 = sorted(prev_freq_itemsets[i])
            l2 = sorted(prev_freq_itemsets[j])
            
            if l1[:k-2] == l2[:k-2]:
                candidate = sorted(list(set(l1) | set(l2)))
                if candidate not in candidates:
                    candidates.append(candidate)
    
    return candidates

# Test with k=2 (should print combinations of size 2)
print("Test candidates:", generate_candidates([['milk'], ['bread'], ['eggs']], 2))


Test candidates: [['bread', 'milk'], ['eggs', 'milk'], ['bread', 'eggs']]
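
The k=2 test cannot show the prefix condition at work, because the shared prefix is empty. The sketch below restates the join compactly and applies it at k=3. The function name `join_step` is hypothetical, and it assumes the input itemsets are distinct and all of size k-1.

```python
def join_step(prev_freq_itemsets, k):
    """Merge pairs of sorted (k-1)-itemsets that share their first k-2 items."""
    prev = sorted(sorted(s) for s in prev_freq_itemsets)
    out = []
    for i in range(len(prev)):
        for j in range(i + 1, len(prev)):
            if prev[i][:k - 2] == prev[j][:k - 2]:
                # Shared prefix plus the two differing last items
                out.append(prev[i][:k - 2] + sorted([prev[i][-1], prev[j][-1]]))
    return out

pairs = [['bread', 'milk'], ['eggs', 'milk'], ['bread', 'eggs']]
print(join_step(pairs, 3))  # [['bread', 'eggs', 'milk']]
```

Only `['bread', 'eggs']` and `['bread', 'milk']` share the prefix `['bread']`, so the triple is produced exactly once. `['eggs', 'milk']` is not joined with either of them here, because its first item differs.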


### Step 3: Implement Pruning

Complete the pruning function using the Apriori principle.

In [4]:
def prune_candidates(candidates, prev_freq_itemsets, k):
    """
    Prune candidates whose subsets are not all frequent.
    
    Parameters:
    candidates (list): Candidate itemsets
    prev_freq_itemsets (list): Frequent itemsets of size k-1
    k (int): Size of the candidates
    
    Returns:
    list: Pruned candidate itemsets
    """
    pruned = []
    
    # Convert to set of tuples for faster lookup
    prev_freq_set = set(tuple(sorted(itemset)) for itemset in prev_freq_itemsets)
    
    for candidate in candidates:
        all_subsets_frequent = True
        
        # Generate all (k-1)-sized subsets
        for subset in combinations(candidate, k-1):
            if tuple(sorted(subset)) not in prev_freq_set:
                all_subsets_frequent = False
                break
        
        if all_subsets_frequent:
            pruned.append(candidate)
    
    return pruned

# Test pruning (should remove ['milk', 'cheese'] because its subset ['cheese'] is not in prev_freq_itemsets)
print("Test pruning:", prune_candidates([['milk', 'bread'], ['milk', 'cheese']], 
                                        [['milk'], ['bread'], ['eggs']], 2))


Test pruning: [['milk', 'bread']]
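
Pruning does more work at k=3, where each candidate has three 2-item subsets to check. The sketch below illustrates that check on its own; the helper name `has_infrequent_subset` and the list `freq_pairs` are assumptions made for this example.

```python
from itertools import combinations

def has_infrequent_subset(candidate, prev_freq_itemsets):
    """True if any (k-1)-subset of the candidate is missing from the frequent sets."""
    prev = {frozenset(s) for s in prev_freq_itemsets}
    return any(
        frozenset(sub) not in prev
        for sub in combinations(candidate, len(candidate) - 1)
    )

# Hypothetical frequent 2-itemsets for illustration
freq_pairs = [['bread', 'milk'], ['bread', 'eggs'], ['eggs', 'milk'], ['cheese', 'eggs']]

# All three pairs are frequent, so the candidate is kept
print(has_infrequent_subset(['bread', 'eggs', 'milk'], freq_pairs))    # False
# ['bread', 'cheese'] is not frequent, so the candidate is pruned
print(has_infrequent_subset(['bread', 'cheese', 'eggs'], freq_pairs))  # True
```

Using `frozenset` for the lookup makes the check independent of item order within each itemset.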


### Step 4: Complete the Apriori Algorithm

Implement the main function that ties all steps together.

In [5]:
def apriori(transactions, min_support=0.2):
    """
    The complete Apriori algorithm implementation.
    
    Parameters:
    transactions (list): List of transactions
    min_support (float): Minimum support threshold
    
    Returns:
    dict: Maps each itemset size k to the list of frequent itemsets of that size
    """

    # Generate initial 1-item frequent itemsets
    item_counts = defaultdict(int)
    for transaction in transactions:
        for item in transaction:
            item_counts[item] += 1
    
    num_transactions = len(transactions)
    freq_itemsets = []
    
    for item, count in item_counts.items():
        support = count / num_transactions
        if support >= min_support:
            freq_itemsets.append([item])

    result = {1: freq_itemsets}
    k = 2  # Size of itemsets to generate next

    while True:
        candidates = generate_candidates(result[k - 1], k)
        candidates = prune_candidates(candidates, result[k - 1], k)
        # Count support and collect frequent itemsets
        valid_itemsets = []
        for candidate in candidates:
            support = calculate_support(candidate, transactions)
            if support >= min_support:
                valid_itemsets.append(candidate)
        
        if not valid_itemsets:
            break
        
        result[k] = valid_itemsets
        k += 1

    return result

# Test the full algorithm
print("Apriori results:", apriori(transactions, 0.2))


Apriori results: {1: [['eggs'], ['milk'], ['bread'], ['cheese']], 2: [['eggs', 'milk'], ['bread', 'eggs'], ['cheese', 'eggs'], ['bread', 'milk'], ['cheese', 'milk']], 3: [['bread', 'eggs', 'milk'], ['cheese', 'eggs', 'milk']]}
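
One way to sanity-check this output is to recompute the support of every reported itemset and confirm that it meets the threshold. The sketch below does that with the result copied from the output above; the helper `support` and the variable names are illustrative assumptions.

```python
# Sample data and result copied from the cells above, for self-containment
transactions = [
    ['milk', 'bread', 'eggs'], ['milk', 'bread'], ['bread', 'eggs'],
    ['milk', 'eggs'], ['milk', 'bread', 'eggs', 'cheese'],
    ['cheese', 'eggs'], ['cheese'], ['milk', 'cheese', 'eggs'],
]
result = {
    1: [['eggs'], ['milk'], ['bread'], ['cheese']],
    2: [['eggs', 'milk'], ['bread', 'eggs'], ['cheese', 'eggs'],
        ['bread', 'milk'], ['cheese', 'milk']],
    3: [['bread', 'eggs', 'milk'], ['cheese', 'eggs', 'milk']],
}

def support(itemset, transactions):
    """Fraction of transactions containing every item of the itemset."""
    target = set(itemset)
    return sum(target.issubset(t) for t in transactions) / len(transactions)

# Flatten the result into (itemset, support) pairs, ordered by size
table = [(tuple(s), support(s, transactions))
         for k in sorted(result) for s in result[k]]

for itemset, s in table:
    print(itemset, s)

# Every reported itemset should meet the 0.2 threshold
print(all(s >= 0.2 for _, s in table))  # True
```

The pair `['bread', 'cheese']` is absent from the result because it occurs in only one of the eight transactions (support 0.125).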


### Step 5: Association Rule Generation

Implement rule generation from frequent itemsets.

In [6]:
def generate_rules(freq_itemsets, transactions, min_confidence=0.5):
    """
    Generate association rules from frequent itemsets.
    
    Parameters:
    freq_itemsets (dict): Frequent itemsets from apriori
    transactions (list): List of all transactions
    min_confidence (float): Minimum confidence threshold
    
    Returns:
    list: Association rules as dicts with keys 'antecedent', 'consequent', 'support', 'confidence'
    """
    rules = []

    for k, itemsets in freq_itemsets.items():
        if k < 2:
            continue
        
        for itemset in itemsets:
            itemset_support = calculate_support(itemset, transactions)
            
            # Generate all non-empty proper subsets (antecedents)
            for i in range(1, len(itemset)):
                for antecedent in combinations(itemset, i):
                    antecedent = list(antecedent)
                    consequent = [item for item in itemset if item not in antecedent]
                    
                    antecedent_support = calculate_support(antecedent, transactions)
                    confidence = itemset_support / antecedent_support if antecedent_support > 0 else 0
                    
                    if confidence >= min_confidence:
                        rules.append({
                            'antecedent': antecedent,
                            'consequent': consequent,
                            'support': itemset_support,
                            'confidence': confidence
                        })
    
    return rules

# Example test with your existing transactions and freq_itemsets:
freq_itemsets = {
    1: [['milk'], ['bread'], ['eggs']], 
    2: [['milk', 'bread'], ['bread', 'eggs']]
}
print("Test rules:", generate_rules(freq_itemsets, transactions, 0.5))


Test rules: [{'antecedent': ['milk'], 'consequent': ['bread'], 'support': 0.375, 'confidence': 0.6}, {'antecedent': ['bread'], 'consequent': ['milk'], 'support': 0.375, 'confidence': 0.75}, {'antecedent': ['bread'], 'consequent': ['eggs'], 'support': 0.375, 'confidence': 0.75}, {'antecedent': ['eggs'], 'consequent': ['bread'], 'support': 0.375, 'confidence': 0.5}]
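
The confidence values above follow from confidence(A => B) = support(A and B together) / support(A). The sketch below recomputes two of them by hand; the helper names `support` and `confidence` are illustrative and separate from the functions defined in the challenge.

```python
transactions = [
    ['milk', 'bread', 'eggs'], ['milk', 'bread'], ['bread', 'eggs'],
    ['milk', 'eggs'], ['milk', 'bread', 'eggs', 'cheese'],
    ['cheese', 'eggs'], ['cheese'], ['milk', 'cheese', 'eggs'],
]

def support(itemset, transactions):
    """Fraction of transactions containing every item of the itemset."""
    target = set(itemset)
    return sum(target.issubset(t) for t in transactions) / len(transactions)

def confidence(antecedent, consequent, transactions):
    """support(antecedent and consequent together) / support(antecedent)."""
    joint = support(set(antecedent) | set(consequent), transactions)
    base = support(antecedent, transactions)
    return joint / base if base > 0 else 0.0

# milk and bread co-occur in 3 of 8 transactions; milk appears in 5, bread in 4
print(confidence(['milk'], ['bread'], transactions))  # 0.375 / 0.625 = 0.6
print(confidence(['bread'], ['milk'], transactions))  # 0.375 / 0.5 = 0.75
```

Both rules share the same support, but the confidence differs because the antecedents have different supports. This is why rules must be evaluated in both directions.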


## Final Testing

Run your complete implementation on the sample dataset and analyze the results.

In [7]:
# Test the complete workflow
freq_itemsets = apriori(transactions, 0.2)
rules = generate_rules(freq_itemsets, transactions, 0.5)

# Display results
print("Frequent Itemsets:")
for size, itemsets in freq_itemsets.items():
    print(f"Size {size}: {itemsets}")

print("\nAssociation Rules:")
for rule in rules:
    print(f"{rule['antecedent']} => {rule['consequent']} "
          f"(supp: {rule['support']:.2f}, conf: {rule['confidence']:.2f})")

Frequent Itemsets:
Size 1: [['eggs'], ['milk'], ['bread'], ['cheese']]
Size 2: [['eggs', 'milk'], ['bread', 'eggs'], ['cheese', 'eggs'], ['bread', 'milk'], ['cheese', 'milk']]
Size 3: [['bread', 'eggs', 'milk'], ['cheese', 'eggs', 'milk']]

Association Rules:
['eggs'] => ['milk'] (supp: 0.50, conf: 0.67)
['milk'] => ['eggs'] (supp: 0.50, conf: 0.80)
['bread'] => ['eggs'] (supp: 0.38, conf: 0.75)
['eggs'] => ['bread'] (supp: 0.38, conf: 0.50)
['cheese'] => ['eggs'] (supp: 0.38, conf: 0.75)
['eggs'] => ['cheese'] (supp: 0.38, conf: 0.50)
['bread'] => ['milk'] (supp: 0.38, conf: 0.75)
['milk'] => ['bread'] (supp: 0.38, conf: 0.60)
['cheese'] => ['milk'] (supp: 0.25, conf: 0.50)
['bread'] => ['eggs', 'milk'] (supp: 0.25, conf: 0.50)
['bread', 'eggs'] => ['milk'] (supp: 0.25, conf: 0.67)
['bread', 'milk'] => ['eggs'] (supp: 0.25, conf: 0.67)
['eggs', 'milk'] => ['bread'] (supp: 0.25, conf: 0.50)
['cheese'] => ['eggs', 'milk'] (supp: 0.25, conf: 0.50)
['cheese', 'eggs'] => ['milk'] (supp: 0.