# Step 4 - Association Rule Mining with Apriori Algorithm

This code requires the following file to run:
- St3_fatal_accident_clusters.parquet

-------------------------

This is the code for the fourth step of our pipeline.
In summary:
* Load clustered driver data from previous step
* Transform data into transaction format for association rule mining
* Implement Apriori algorithm from scratch for finding frequent itemsets
* Compare scratch implementation with mlxtend library implementation
* Generate association rules with support, confidence, and lift metrics
* Analyze rules for each cluster to identify patterns
* Display top rules ranked by lift to find strongest associations

The Apriori algorithm discovers association rules: relationships between features that frequently occur together in the data. For example, it can surface a rule such as "dark conditions" → "nighttime accident" that holds with high confidence. This helps show which combinations of features tend to co-occur in fatal accidents.
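
To make the three rule metrics concrete, here is a minimal sketch that computes support, confidence, and lift for a single rule. The five transactions and the `TIME_OF_DAY=Night` item are made up for illustration and are not taken from the dataset; `support` is a hypothetical helper, not one of the functions defined in this notebook.

```python
# Toy transactions (hypothetical, not from the accident data)
transactions = [
    {"DARK_CONDITIONS", "TIME_OF_DAY=Night", "GENDER=MALE"},
    {"DARK_CONDITIONS", "TIME_OF_DAY=Night"},
    {"TIME_OF_DAY=Day", "GENDER=MALE"},
    {"DARK_CONDITIONS", "TIME_OF_DAY=Day"},
    {"TIME_OF_DAY=Night", "GENDER=MALE"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = frozenset(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)


antecedent = {"DARK_CONDITIONS"}
consequent = {"TIME_OF_DAY=Night"}

# support(A and C): how often both sides occur together
rule_support = support(antecedent | consequent, transactions)
# confidence(A -> C) = support(A and C) / support(A)
confidence = rule_support / support(antecedent, transactions)
# lift(A -> C) = confidence(A -> C) / support(C)
lift = confidence / support(consequent, transactions)

print(f"support={rule_support:.3f} confidence={confidence:.3f} lift={lift:.3f}")
```

A lift above 1 means the consequent appears more often alongside the antecedent than it does overall; a lift near 1 suggests the two sides are close to independent.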

Both a scratch implementation and the mlxtend library are used for validation and comparison.
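
One possible way to turn the comparison into an explicit check is sketched below. It assumes each rule set is available as a list of dicts with the keys `antecedents`, `consequents`, `support`, `confidence`, and `lift`; `rules_match` is a hypothetical helper and the sample rules are invented.

```python
import math


def rules_match(rules_a, rules_b, tol=1e-9):
    """Return True if both lists hold the same rules with matching metrics."""
    def index(rules):
        # Key each rule by its (antecedents, consequents) pair
        return {(r["antecedents"], r["consequents"]): r for r in rules}

    a, b = index(rules_a), index(rules_b)
    if a.keys() != b.keys():
        return False
    return all(
        math.isclose(a[key][metric], b[key][metric], abs_tol=tol)
        for key in a
        for metric in ("support", "confidence", "lift")
    )


# Invented rules: same content, different order
scratch_rules = [
    {"antecedents": "X", "consequents": "Y", "support": 0.4, "confidence": 0.8, "lift": 1.2},
    {"antecedents": "Y", "consequents": "Z", "support": 0.3, "confidence": 0.7, "lift": 1.1},
]
library_rules = list(reversed(scratch_rules))

same = rules_match(scratch_rules, library_rules)
print(same)
```

If the rules are held in pandas DataFrames, `df.to_dict("records")` produces this list-of-dicts shape.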

A limitation of this approach is that the minimum support and confidence thresholds must be tuned carefully: thresholds that are too high miss interesting rules, while thresholds that are too low produce many trivial ones.
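
The effect of the support threshold can be seen on a tiny example. The sketch below counts frequent itemsets by brute force for several thresholds; the transactions and item names `A`, `B`, `C` are invented, and `count_frequent_itemsets` is a hypothetical helper that is only practical for a handful of items.

```python
from itertools import combinations

# Toy transactions (hypothetical)
transactions = [
    {"A", "B", "C"},
    {"A", "B"},
    {"A", "C"},
    {"B", "C"},
    {"A"},
]


def count_frequent_itemsets(transactions, min_support):
    """Brute-force count of itemsets whose support is at least min_support."""
    items = sorted(set().union(*transactions))
    n = len(transactions)
    count = 0
    for size in range(1, len(items) + 1):
        for candidate in combinations(items, size):
            candidate = set(candidate)
            if sum(candidate <= t for t in transactions) / n >= min_support:
                count += 1
    return count


# Lower thresholds admit more itemsets
counts = {s: count_frequent_itemsets(transactions, s) for s in (0.8, 0.6, 0.4, 0.2)}
for threshold, n_itemsets in counts.items():
    print(f"min_support={threshold}: {n_itemsets} frequent itemsets")
```

On real data with many items the growth is much steeper, which is why a sweep over a few thresholds is a reasonable first step before fixing one.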

### How to run the code:

1) Run the imports cell
2) Run all the sections in order (top to bottom)
3) Run the USE section
4) Optional: review the code of each section

In [1]:
import pandas as pd
import numpy as np
import itertools
from pathlib import Path
from typing import List, Dict, Set, Tuple, Optional
from mlxtend.preprocessing import TransactionEncoder
from mlxtend.frequent_patterns import apriori, association_rules
import warnings
warnings.filterwarnings("ignore", category=DeprecationWarning)

### Step 4.1: Transaction Preparation

Convert clustered data into transaction format for association rule mining
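
As an illustration of the transaction format, the sketch below encodes a single row using the same three conventions as the function in this step: `Column=Value` for categorical columns, a label for both values of selected binary columns, and the bare column name for the remaining binary flags that equal 1. The row values and the helper `row_to_transaction` are hypothetical.

```python
def row_to_transaction(row, categorical_cols, both_valued, flag_cols):
    """Encode one row (a dict) as a list of item strings."""
    items = []
    # Categorical columns become "Column=Value"
    for col in categorical_cols:
        if row.get(col) is not None:
            items.append(f"{col}={row[col]}")
    # Binary columns where both 0 and 1 carry meaning get a label for each value
    for col, labels in both_valued.items():
        if row.get(col) is not None:
            items.append(labels[int(row[col])])
    # Remaining binary flags are added only when set
    for col in flag_cols:
        if row.get(col) == 1 and col not in both_valued:
            items.append(col)
    return items


# Hypothetical row
row = {
    "TIME_OF_DAY": "Night",
    "SEASON": "Winter",
    "MALE": 1,
    "URBAN": 0,
    "DARK_CONDITIONS": 1,
    "FIRE": 0,
}

transaction = row_to_transaction(
    row,
    categorical_cols=["TIME_OF_DAY", "SEASON"],
    both_valued={
        "MALE": {1: "GENDER=MALE", 0: "GENDER=FEMALE"},
        "URBAN": {1: "LOCATION=URBAN", 0: "LOCATION=RURAL"},
    },
    flag_cols=["MALE", "URBAN", "DARK_CONDITIONS", "FIRE"],
)
print(transaction)
```

Encoding both values of a binary column lets rules mention the "0" case (for example `LOCATION=RURAL`), which a flag-only encoding would never produce.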

In [20]:
def get_transactions_for_cluster(
    df: pd.DataFrame, 
    cluster_id: int,
    categorical_cols: List[str],
    binary_cols: List[str]
) -> List[List[str]]:
    """Build one transaction (a list of item strings) per row of the given cluster."""

    df_c = df[df['cluster'] == cluster_id].copy()

    binary_both_values = [
        ('MALE', {1: 'GENDER=MALE', 0: 'GENDER=FEMALE'}),
        ('OLD_VEHICLE', {1: 'VEHICLE=OLD', 0: 'VEHICLE=NEW'}),
        ('URBAN', {1: 'LOCATION=URBAN', 0: 'LOCATION=RURAL'}),
    ]
    binary_both_cols = ['MALE', 'OLD_VEHICLE', 'URBAN']
    
    transactions = []
    for _, row in df_c.iterrows():
        transaction = []
        
        # Add categorical attributes (Column=Value format), can be changed in USE
        for col in categorical_cols:
            if pd.notna(row[col]):
                transaction.append(f"{col}={row[col]}")

        for col, value_map in binary_both_values:
            if pd.notna(row[col]):
                value = int(row[col])
                transaction.append(value_map[value])
         
        # Add binary attributes (Column name if value is 1), can be changed in USE
        for col in binary_cols:
            if row[col] == 1 and col not in binary_both_cols:
                transaction.append(col)
        
        transactions.append(transaction)
    
    return transactions

### Step 4.2: Apriori Algorithm - Scratch Implementation

Core functions for Apriori algorithm implemented from scratch
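
The join step is the part that is easiest to get subtly wrong, so here is a small standalone sketch of it. Two frequent (k-1)-itemsets are merged when their first k-2 items agree in sorted order. Because frozensets have no guaranteed iteration order, the sketch sorts each itemset before taking the prefix. `join_candidates` and the itemsets are hypothetical.

```python
def join_candidates(frequent, k):
    """Join (k-1)-itemsets that share their first k-2 items in sorted order."""
    ordered = [sorted(itemset) for itemset in frequent]
    candidates = []
    for i in range(len(ordered)):
        for j in range(i + 1, len(ordered)):
            if ordered[i][:k - 2] == ordered[j][:k - 2]:
                candidates.append(frozenset(ordered[i]) | frozenset(ordered[j]))
    return candidates


# Hypothetical frequent 2-itemsets
L2 = [
    frozenset({"A", "B"}),
    frozenset({"A", "C"}),
    frozenset({"B", "C"}),
    frozenset({"B", "D"}),
]

C3 = join_candidates(L2, 3)
print([sorted(c) for c in C3])
```

Joining on a sorted prefix produces each candidate at most once. Note that `{B, C, D}` is generated even though `{C, D}` is not frequent; in this sketch such candidates are only eliminated later, when supports are counted.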

In [21]:
def create_C1(dataset: List[List[str]]) -> List[frozenset]:
    """Return the sorted candidate 1-itemsets, one frozenset per distinct item."""
    C1 = []
    for transaction in dataset:
        for item in transaction:
            if [item] not in C1:
                C1.append([item])
    C1.sort()
    return list(map(frozenset, C1))


def scan_D(
    dataset: List[List[str]], 
    candidates: List[frozenset], 
    min_support: float
) -> Tuple[List[frozenset], Dict[frozenset, float]]:
    """Count each candidate; return the frequent ones and the support of every counted candidate."""

    ss_cnt = {}
    
    # Count occurrences of each candidate
    for tid in dataset:
        for can in candidates:
            if can.issubset(tid):
                if can not in ss_cnt:
                    ss_cnt[can] = 1
                else:
                    ss_cnt[can] += 1
    
    # Calculate support and filter by min_support
    num_items = float(len(dataset))
    ret_list = []
    support_data = {}
    
    for key in ss_cnt:
        support = ss_cnt[key] / num_items
        if support >= min_support:
            ret_list.insert(0, key)
        support_data[key] = support
    
    return ret_list, support_data


def apriori_gen(Lk: List[frozenset], k: int) -> List[frozenset]:
    """Generate candidate k-itemsets by joining frequent (k-1)-itemsets that share k-2 items."""

    ret_list = []
    len_Lk = len(Lk)
    
    for i in range(len_Lk):
        for j in range(i + 1, len_Lk):
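            # Join step: merge itemsets whose first k-2 items match in sorted order.
            # Sort before slicing: frozenset iteration order is arbitrary, so
            # slicing first can miss valid candidates.
            L1 = sorted(Lk[i])[:k-2]
            L2 = sorted(Lk[j])[:k-2]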
            
            if L1 == L2: # Join step
                ret_list.append(Lk[i] | Lk[j])
    
    return ret_list

### Step 4.3: Apriori Main Algorithm (Scratch)

Main Apriori algorithm to find all frequent itemsets
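
The level-wise search relies on the downward-closure property: every subset of a frequent itemset is itself frequent, so the search can stop at the first level that yields no frequent itemsets. The sketch below is a compact variant on toy data. Unlike the function in this step, it also applies the textbook subset-pruning check before counting; `levelwise_frequent` and the transactions are hypothetical.

```python
from itertools import combinations


def levelwise_frequent(transactions, min_support):
    """Return a list of levels; levels[i] holds the frequent (i+1)-itemsets."""
    n = len(transactions)

    def sup(itemset):
        return sum(itemset <= t for t in transactions) / n

    items = sorted(set().union(*transactions))
    level = [frozenset([i]) for i in items if sup(frozenset([i])) >= min_support]
    levels = []
    k = 1
    while level:
        levels.append(level)
        k += 1
        # Join: unions of two frequent (k-1)-itemsets that contain exactly k items
        candidates = {a | b for a, b in combinations(level, 2) if len(a | b) == k}
        # Prune: every (k-1)-subset of a candidate must already be frequent
        previous = set(level)
        candidates = {c for c in candidates if all(c - {x} in previous for x in c)}
        # Count: keep candidates that meet the support threshold
        level = sorted((c for c in candidates if sup(c) >= min_support), key=sorted)
    return levels


# Toy transactions (hypothetical)
transactions = [
    {"A", "B", "C"},
    {"A", "B"},
    {"A", "C"},
    {"B", "C"},
    {"A"},
]

levels = levelwise_frequent(transactions, min_support=0.4)
for size, itemsets in enumerate(levels, start=1):
    print(size, [sorted(s) for s in itemsets])
```

Here `{A, B, C}` passes the pruning check but has support 0.2, so the third level is empty and the loop ends after two levels.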

In [22]:
def apriori_scratch(
    dataset: List[List[str]], 
    min_support: float = 0.4
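) -> Tuple[List[List[frozenset]], Dict[frozenset, float]]:
    """Find frequent itemsets level by level.

    Returns L, where L[i] holds the frequent (i+1)-itemsets (the last entry
    is empty), and a dict mapping each counted itemset to its support.
    """

    # Convert each transaction to a set for efficient subset checks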
    dataset_frozen = list(map(set, dataset))
    
    # Generate initial candidates (size 1)
    C1 = create_C1(dataset)
    
    # Find frequent 1-itemsets
    L1, support_data = scan_D(dataset_frozen, C1, min_support)
    
    # Store all frequent itemsets
    L = [L1]
    k = 2
    
    # Iteratively find larger itemsets
    while len(L[k-2]) > 0:
        # Generate candidates of size k
        Ck = apriori_gen(L[k-2], k)
        
        # Find frequent k-itemsets
        Lk, supK = scan_D(dataset_frozen, Ck, min_support)
        
        support_data.update(supK)
        L.append(Lk)
        k += 1
    
    return L, support_data

### Step 4.4: Association Rule Generation (Scratch)

Generate association rules from frequent itemsets
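
For a single frequent itemset, rule generation amounts to trying each way of splitting it into an antecedent and a consequent and keeping the splits that meet the confidence threshold. The sketch below enumerates the splits exhaustively rather than pruning consequents recursively, so it is only meant for small itemsets. The support values and the helper `rules_for_itemset` are hypothetical.

```python
from itertools import combinations


def rules_for_itemset(itemset, support, min_conf):
    """Return (antecedent, consequent, confidence, lift) for each qualifying split."""
    full = frozenset(itemset)
    items = sorted(full)
    rules = []
    for size in range(1, len(items)):
        for conseq in combinations(items, size):
            consequent = frozenset(conseq)
            antecedent = full - consequent
            # confidence = support(itemset) / support(antecedent)
            conf = support[full] / support[antecedent]
            if conf >= min_conf:
                # lift = confidence / support(consequent)
                rules.append((antecedent, consequent, conf, conf / support[consequent]))
    return rules


# Hypothetical supports for items A and B
support = {
    frozenset({"A"}): 0.8,
    frozenset({"B"}): 0.6,
    frozenset({"A", "B"}): 0.4,
}

rules = rules_for_itemset({"A", "B"}, support, min_conf=0.6)
for antecedent, consequent, conf, lift in rules:
    print(sorted(antecedent), "->", sorted(consequent), f"conf={conf:.3f} lift={lift:.3f}")
```

Only `B -> A` survives: its confidence is 0.4 / 0.6, while `A -> B` has confidence 0.4 / 0.8 = 0.5 and is dropped. The surviving rule still has a lift below 1, which shows why rules are ranked by lift rather than by confidence alone.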

In [23]:
def calc_conf(
    freq_set: frozenset,
    H: List[frozenset],
    support_data: Dict[frozenset, float],
    brl: List[Tuple],
    min_conf: float
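) -> List[frozenset]:
    """Append rules (freq_set - conseq) -> conseq that meet min_conf to brl; return the consequents that passed."""
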
    pruned_H = []
    
    for conseq in H:
        antecedent = freq_set - conseq

        # Calculate confidence: support(freq_set) / support(antecedent)
        conf = support_data[freq_set] / support_data[antecedent]
        
        if conf >= min_conf:
            # Calculate lift: confidence / support(consequent)
            lift = conf / support_data[conseq]
            
            brl.append((antecedent, conseq, conf, lift, support_data[freq_set]))
            pruned_H.append(conseq)
    
    return pruned_H


def rules_from_conseq(
    freq_set: frozenset,
    H: List[frozenset],
    support_data: Dict[frozenset, float],
    brl: List[Tuple],
    min_conf: float
) -> None:
    """Recursively grow the consequents by one item and test the resulting rules."""

    m = len(H[0])  # Size of current consequents
    
    if len(freq_set) > (m + 1):
        Hmp1 = apriori_gen(H, m + 1)
    
        # Test these candidates and prune
        Hmp1 = calc_conf(freq_set, Hmp1, support_data, brl, min_conf)
        if len(Hmp1) > 1:
            rules_from_conseq(freq_set, Hmp1, support_data, brl, min_conf)


def generate_rules_scratch(
    L: List[List[frozenset]], 
    support_data: Dict[frozenset, float],
    min_confidence: float = 0.6
) -> pd.DataFrame:
    """Generate rules from the frequent itemsets; return a DataFrame sorted by lift (descending)."""

    big_rule_list = []
    
    # Start from L[1] (2-itemsets) since we need at least 2 items for a rule
    for i in range(1, len(L)):
        for freq_set in L[i]:
            H1 = [frozenset([item]) for item in freq_set]
            
            # 1. Always calculate confidence for 1-item consequents first
            H1 = calc_conf(freq_set, H1, support_data, big_rule_list, min_confidence)
            
            # 2. If itemset has more than 2 items AND we have valid consequents, recurse
            if i > 1 and len(H1) > 0:
                rules_from_conseq(freq_set, H1, support_data, big_rule_list, min_confidence)
    
    # Convert to DataFrame
    rules = []
    for (antecedents, consequents, confidence, lift, support) in big_rule_list:
        rules.append({
            'antecedents': ', '.join(sorted(antecedents)),
            'consequents': ', '.join(sorted(consequents)),
            'support': support,
            'confidence': confidence,
            'lift': lift
        })
    
    df_rules = pd.DataFrame(rules)
    
    if len(df_rules) > 0:
        df_rules = df_rules.sort_values('lift', ascending=False)
    
    return df_rules

### Step 4.5: Apriori with mlxtend Library

Use mlxtend library for comparison and validation
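
Before mlxtend searches for frequent itemsets, the transactions are turned into a one-hot boolean table with one column per item. The pure-Python sketch below shows what such a table looks like for three invented transactions. It is an illustration of the encoding only, not mlxtend's implementation, and `one_hot_encode` is a hypothetical helper.

```python
def one_hot_encode(transactions):
    """Return (columns, rows): sorted item names and one boolean row per transaction."""
    columns = sorted({item for t in transactions for item in t})
    rows = [[item in set(t) for item in columns] for t in transactions]
    return columns, rows


# Toy transactions (hypothetical)
transactions = [
    ["GENDER=MALE", "DARK_CONDITIONS"],
    ["GENDER=MALE"],
    ["DARK_CONDITIONS", "LOCATION=URBAN"],
]

columns, rows = one_hot_encode(transactions)

# In this layout the support of a single item is the mean of its column
item_support = {
    col: sum(row[i] for row in rows) / len(rows)
    for i, col in enumerate(columns)
}

print(columns)
for row in rows:
    print(row)
print(item_support)
```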

In [24]:
def apriori_mlxtend(
    transactions: List[List[str]],
    min_support: float = 0.4,
    min_confidence: float = 0.6
) -> Tuple[pd.DataFrame, pd.DataFrame]:

    # Transform transactions to one-hot encoded DataFrame
    te = TransactionEncoder()
    te_ary = te.fit(transactions).transform(transactions)
    df_encoded = pd.DataFrame(te_ary, columns=te.columns_)
    
    # Find frequent itemsets
    frequent_itemsets = apriori(
        df_encoded, 
        min_support=min_support, 
        use_colnames=True
    )
    
    # Generate rules
    if len(frequent_itemsets) > 0:
        rules = association_rules(
            frequent_itemsets,
            metric="confidence",
            min_threshold=min_confidence
        )
        
        # Format for consistency with scratch implementation
        if len(rules) > 0:
            rules['antecedents'] = rules['antecedents'].apply(
                lambda x: ', '.join(sorted(list(x)))
            )
            rules['consequents'] = rules['consequents'].apply(
                lambda x: ', '.join(sorted(list(x)))
            )
            rules = rules[['antecedents', 'consequents', 'support', 
                          'confidence', 'lift']].sort_values('lift', ascending=False)
    else:
        rules = pd.DataFrame()
    
    return frequent_itemsets, rules

### Step 4.6: Cluster Analysis Function

Analyze association rules for a specific cluster

In [25]:
def analyze_cluster_rules(
    df: pd.DataFrame,
    cluster_id: int,
    categorical_cols: List[str],
    binary_cols: List[str],
    min_support: float = 0.4,
    min_confidence: float = 0.6,
    top_n: int = 15
) -> Dict[str, pd.DataFrame]:

    # Get transactions for this cluster
    transactions = get_transactions_for_cluster(
        df, cluster_id, categorical_cols, binary_cols
    )
    print(f"Transactions: {len(transactions)}")
    
    # Method 1: Scratch implementation
    print("\n--- [SCRATCH] Running Apriori from scratch...")
    L_scratch, support_data_scratch = apriori_scratch(transactions, min_support)
    rules_scratch = generate_rules_scratch(L_scratch, support_data_scratch, min_confidence)
    
    print(f"\n--- [SCRATCH] Top {top_n} Rules ---\n")
    if len(rules_scratch) > 0:
        display_df = rules_scratch.head(top_n).copy()
        display_df.columns = [col.capitalize() for col in display_df.columns]
        # Format numeric columns to 3 decimals
        display_df['Support'] = display_df['Support'].map('{:.3f}'.format)
        display_df['Confidence'] = display_df['Confidence'].map('{:.3f}'.format)
        display_df['Lift'] = display_df['Lift'].map('{:.3f}'.format)
        print(display_df.to_string(index=False))
    else:
        print("No rules found with current thresholds.")
    
    # Method 2: mlxtend library
    print("\n--- [MLXTEND] Running Apriori with mlxtend...")
    freq_itemsets_mlx, rules_mlxtend = apriori_mlxtend(
        transactions, min_support, min_confidence
    )
    
    print(f"\n--- [MLXTEND] Top {top_n} Rules ---\n")
    if len(rules_mlxtend) > 0:
        display_df = rules_mlxtend.head(top_n).copy()
        display_df.columns = [col.capitalize() for col in display_df.columns]
        # Format numeric columns to 3 decimals
        display_df['Support'] = display_df['Support'].map('{:.3f}'.format)
        display_df['Confidence'] = display_df['Confidence'].map('{:.3f}'.format)
        display_df['Lift'] = display_df['Lift'].map('{:.3f}'.format)
        print(display_df.to_string(index=False))
    else:
        print("No rules found with current thresholds.")
    
    print("\n")
    
    return {
        'scratch': rules_scratch,
        'mlxtend': rules_mlxtend
    }

### Pipeline

Run association rule mining for all clusters

In [26]:
def run_apriori_pipeline(
    input_file: Path,
    categorical_cols: List[str],
    binary_cols: List[str],
    min_support: float = 0.4,
    min_confidence: float = 0.6,
    clusters_to_analyze: Optional[List[int]] = None,
    top_n: int = 15,
    save_results: bool = True,
    output_dir: Optional[Path] = None
) -> Dict[int, Dict[str, pd.DataFrame]]:

    print("ASSOCIATION RULE MINING WITH APRIORI")
    print(f"\nConfiguration:")
    print(f"   Min Support: {min_support}")
    print(f"   Min Confidence: {min_confidence}")
    print(f"   Binary features: {len(binary_cols)}")
    print(f"   Categorical features: {len(categorical_cols)}")
    print()
    
    # Load data
    print(f"Loading data from: {input_file}")
    df = pd.read_parquet(input_file)
    print(f"Loaded: {len(df):,} drivers")
    
    # Determine clusters to analyze
    if clusters_to_analyze is None:
        clusters_to_analyze = sorted(df['cluster'].unique())
    
    print(f"Analyzing {len(clusters_to_analyze)} clusters: {clusters_to_analyze}\n")
    
    # Analyze each cluster
    all_results = {}
    
    for cluster_id in clusters_to_analyze:
        results = analyze_cluster_rules(
            df=df,
            cluster_id=cluster_id,
            categorical_cols=categorical_cols,
            binary_cols=binary_cols,
            min_support=min_support,
            min_confidence=min_confidence,
            top_n=top_n
        )
        all_results[cluster_id] = results
        
        # Save results if requested
        if save_results and output_dir is not None:
            output_path = Path(output_dir)
            output_path.mkdir(parents=True, exist_ok=True)
            
            # Save scratch rules
            if len(results['scratch']) > 0:
                scratch_file = output_path / f"cluster_{cluster_id}_rules_scratch.csv"
                results['scratch'].to_csv(scratch_file, index=False)
            
            # Save mlxtend rules
            if len(results['mlxtend']) > 0:
                mlxtend_file = output_path / f"cluster_{cluster_id}_rules_mlxtend.csv"
                results['mlxtend'].to_csv(mlxtend_file, index=False)
    
    # Summary
    print(f"\nSummary:")
    print(f"   • Clusters analyzed: {len(clusters_to_analyze)}")
    print(f"   • Min support: {min_support}")
    print(f"   • Min confidence: {min_confidence}")
    
    # Count total rules found
    total_rules_scratch = sum(len(r['scratch']) for r in all_results.values())
    total_rules_mlxtend = sum(len(r['mlxtend']) for r in all_results.values())
    print(f"   • Total rules found (scratch): {total_rules_scratch}")
    print(f"   • Total rules found (mlxtend): {total_rules_mlxtend}")
    
    if save_results and output_dir is not None:
        print(f"\nResults saved to: {output_dir}")
    
    return all_results

### USE

Configure INPUT path, features, and Apriori parameters

In [27]:
# ============================================================================
# CONFIGURATION
# ============================================================================

# INPUT Configuration
INPUT_FILE = Path("Dataset/St3_fatal_accident_clusters.parquet")
OUTPUT_DIR = Path("Results/St4_apriori_results")

# Features Configuration (must match Step 3)
CATEGORICAL_COLS = [
    'TIME_OF_DAY',
    'WEEKEND_FLAG',
    'SEASON',
    'AGE_GROUP'
]

BINARY_COLS = [
    'RUSH_HOUR',
    'MALE',
    'ADVERSE_WEATHER',
    'DARK_CONDITIONS',
    'OLD_VEHICLE',
    'PASSENGER_CAR',
    'LARGE_TRUCK',
    'MOTORCYCLE',
    'URBAN',
    'INTERSTATE',
    'INTERSECTION',
    'WORK_ZONE_CRASH',
    'ROLLOVER_CRASH',
    'FIRE'
]

# Apriori Parameters
MIN_SUPPORT = 0.2  # Minimum support threshold (20%)
MIN_CONFIDENCE = 0.6  # Minimum confidence threshold (60%)
TOP_N_RULES = 15  # Number of top rules to display per cluster

# Analysis Configuration
CLUSTERS_TO_ANALYZE = None  # None = analyze all clusters, or specify list [0, 1, 2]
SAVE_RESULTS = True  # Save results to CSV files

# ============================================================================
# RUN PIPELINE
# ============================================================================

if not INPUT_FILE.exists():
    print(f"\nInput file not found: {INPUT_FILE}")
    print("Run Step 3 (Clustering) first")
else:
    results = run_apriori_pipeline(
        input_file=INPUT_FILE,
        categorical_cols=CATEGORICAL_COLS,
        binary_cols=BINARY_COLS,
        min_support=MIN_SUPPORT,
        min_confidence=MIN_CONFIDENCE,
        clusters_to_analyze=CLUSTERS_TO_ANALYZE,
        top_n=TOP_N_RULES,
        save_results=SAVE_RESULTS,
        output_dir=OUTPUT_DIR
    )

ASSOCIATION RULE MINING WITH APRIORI

Configuration:
   Min Support: 0.2
   Min Confidence: 0.6
   Binary features: 14
   Categorical features: 4

Loading data from: Dataset/St3_fatal_accident_clusters.parquet
Loaded: 55,725 drivers
Analyzing 6 clusters: [np.uint16(0), np.uint16(1), np.uint16(2), np.uint16(3), np.uint16(4), np.uint16(5)]

Transactions: 26348

--- [SCRATCH] Running Apriori from scratch...

--- [SCRATCH] Top 15 Rules ---

                          Antecedents                          Consequents Support Confidence  Lift
                      DARK_CONDITIONS           GENDER=MALE, PASSENGER_CAR   0.200      0.741 1.513
                      DARK_CONDITIONS  PASSENGER_CAR, WEEKEND_FLAG=Weekday   0.213      0.786 1.374
                            RUSH_HOUR    VEHICLE=NEW, WEEKEND_FLAG=Weekday   0.202      0.804 1.272
                      DARK_CONDITIONS    GENDER=MALE, WEEKEND_FLAG=Weekday   0.201      0.743 1.256
                      DARK_CONDITIONS                      

### Optional: Analyze Specific Cluster

Run this cell to analyze a single cluster with custom parameters

In [10]:
# Example: Analyze cluster 0 with different parameters
# Uncomment and modify as needed

# df = pd.read_parquet(INPUT_FILE)
# 
# cluster_results = analyze_cluster_rules(
#     df=df,
#     cluster_id=0,
#     categorical_cols=CATEGORICAL_COLS,
#     binary_cols=BINARY_COLS,
#     min_support=0.3,  # Higher than MIN_SUPPORT (0.2), so fewer itemsets pass
#     min_confidence=0.7,  # Higher confidence threshold
#     top_n=20  # Show more rules
# )

#### Function to create the text files for frequent itemset patterns only

In [28]:
def analyze_cluster_patterns(
    df: pd.DataFrame,
    cluster_id: int,
    categorical_cols: List[str],
    binary_cols: List[str],
    min_support: float = 0.2,
    output_dir: str = 'Results/frequent_item_patterns'
) -> Dict[str, pd.DataFrame]:
    """
    Analyze frequent itemset patterns for a specific cluster.
    
    Parameters:
    -----------
    df : pd.DataFrame
        The input dataframe containing cluster assignments
    cluster_id : int
        The cluster to analyze
    categorical_cols : List[str]
        List of categorical column names
    binary_cols : List[str]
        List of binary column names
    min_support : float
        Minimum support threshold (default: 0.2)
    output_dir : str
        Directory to save output files
        
    Returns:
    --------
    Dict[str, pd.DataFrame]
        Dictionary containing 'scratch' and 'mlxtend' frequent itemsets
    """
    print('='*60)
    print(f"ANALYZING CLUSTER: {cluster_id}")
    print('='*60)
    
    # Get transactions for this cluster
    transactions = get_transactions_for_cluster(
        df, cluster_id, categorical_cols, binary_cols
    )
    print(f"Transactions: {len(transactions)}")
    
    # --- 1. Scratch Implementation ---
    print(f"\n--- [SCRATCH] Running Apriori from scratch...")
    L_scratch, support_data_scratch = apriori_scratch(transactions, min_support)
    
    print(f"\n--- [SCRATCH] Frequent Itemsets (Support >= {min_support}) ---\n")
    
    # Collect all frequent itemsets
    frequent_itemsets_scratch = []
    for itemset, support in support_data_scratch.items():
        if support >= min_support:
            frequent_itemsets_scratch.append({
                'Itemset': ", ".join(sorted(list(itemset))),
                'Support': support
            })
    
    # Create DataFrame and sort
    df_patterns_scratch = pd.DataFrame(frequent_itemsets_scratch)
    if not df_patterns_scratch.empty:
        df_patterns_scratch = df_patterns_scratch.sort_values('Support', ascending=False)
        
        # Display formatted output
        display_df = df_patterns_scratch.copy()
        display_df['Support'] = display_df['Support'].apply(lambda x: f'{x:.6f}')
        print(display_df.to_string(index=False))
    else:
        print("No frequent itemsets found with the specified support threshold.")
    
    # --- 2. Mlxtend Implementation ---
    # TransactionEncoder and apriori are already imported at the top of the notebook
    print(f"\n--- [MLXTEND] Running Apriori with mlxtend...")
    
    # Convert transactions to one-hot encoded format
    te = TransactionEncoder()
    te_ary = te.fit(transactions).transform(transactions)
    df_encoded = pd.DataFrame(te_ary, columns=te.columns_)
    
    # Get frequent itemsets
    frequent_itemsets_mlx = apriori(df_encoded, min_support=min_support, use_colnames=True)
    
    print(f"\n--- [MLXTEND] Frequent Itemsets (Support >= {min_support}) ---\n")
    
    # Initialize df_patterns_mlx
    df_patterns_mlx = pd.DataFrame()
    
    if not frequent_itemsets_mlx.empty:
        # Sort by support descending
        frequent_itemsets_mlx = frequent_itemsets_mlx.sort_values('support', ascending=False).reset_index(drop=True)
        
        # Format for display
        display_df = frequent_itemsets_mlx.copy()
        display_df['itemsets'] = display_df['itemsets'].apply(
            lambda x: ", ".join(sorted(list(x)))
        )
        display_df.columns = ['Support', 'Itemset']  # Fix: correct column order
        display_df = display_df[['Itemset', 'Support']]  # Reorder columns
        display_df['Support'] = display_df['Support'].apply(lambda x: f'{x:.6f}')
        print(display_df.to_string(index=False))
        
        # Prepare mlxtend dataframe for saving (without formatting)
        df_patterns_mlx = frequent_itemsets_mlx.copy()
        df_patterns_mlx['itemsets'] = df_patterns_mlx['itemsets'].apply(
            lambda x: ", ".join(sorted(list(x)))
        )
        df_patterns_mlx.columns = ['Support', 'Itemset']
        df_patterns_mlx = df_patterns_mlx[['Itemset', 'Support']]  # Reorder columns
    else:
        print("No frequent itemsets found (Mlxtend).")
    
    # --- 3. Save Results to Files ---
    Path(output_dir).mkdir(parents=True, exist_ok=True)
    
    # Save scratch results
    if not df_patterns_scratch.empty:
        filename_scratch = f'{output_dir}/cluster_{cluster_id}_patterns_scratch.txt'
        with open(filename_scratch, 'w') as f:
            f.write(f"CLUSTER {cluster_id} - Frequent Itemsets [SCRATCH] (Support >= {min_support})\n")
            f.write(f"Total Transactions: {len(transactions)}\n")
            f.write("="*60 + "\n\n")
            f.write(df_patterns_scratch.to_string(index=False))
        print(f"\nScratch patterns written to: {filename_scratch}")
    
    # Save mlxtend results
    if not df_patterns_mlx.empty:
        filename_mlx = f'{output_dir}/cluster_{cluster_id}_patterns_mlxtend.txt'
        with open(filename_mlx, 'w') as f:
            f.write(f"CLUSTER {cluster_id} - Frequent Itemsets [MLXTEND] (Support >= {min_support})\n")
            f.write(f"Total Transactions: {len(transactions)}\n")
            f.write("="*60 + "\n\n")
            f.write(df_patterns_mlx.to_string(index=False))
        print(f"\nMLXTEND patterns written to: {filename_mlx}")
    
    print("\n")
    
    return {
        'scratch': df_patterns_scratch,
        'mlxtend': df_patterns_mlx
    }

In [29]:
df = pd.read_parquet(INPUT_FILE)

# Write the frequent-itemset files for clusters 0 to 5
pattern_results = {}
for cluster_id in range(6):
    pattern_results[cluster_id] = analyze_cluster_patterns(
        df=df,
        cluster_id=cluster_id,
        categorical_cols=CATEGORICAL_COLS,
        binary_cols=BINARY_COLS,
        min_support=0.2,  # Same value as MIN_SUPPORT
    )

# Display the returned itemsets for the last cluster
pattern_results[5]

ANALYZING CLUSTER: 0
Transactions: 26348

--- [SCRATCH] Running Apriori from scratch...

--- [SCRATCH] Frequent Itemsets (Support >= 0.2) ---

                                                                        Itemset  Support
                                                           WEEKEND_FLAG=Weekday 0.810802
                                                                    VEHICLE=NEW 0.783703
                                                                    GENDER=MALE 0.743092
                                                                  PASSENGER_CAR 0.717702
                                                                 LOCATION=URBAN 0.663162
                                                                AGE_GROUP=Adult 0.640049
                                              VEHICLE=NEW, WEEKEND_FLAG=Weekday 0.631737
                                              GENDER=MALE, WEEKEND_FLAG=Weekday 0.591278
                                                       G

{'scratch':                                                Itemset   Support
 7                                 WEEKEND_FLAG=Weekday  0.942927
 6                                          VEHICLE=NEW  0.902311
 2                                          GENDER=MALE  0.892857
 13                   VEHICLE=NEW, WEEKEND_FLAG=Weekday  0.845238
 17                   GENDER=MALE, WEEKEND_FLAG=Weekday  0.836835
 ..                                                 ...       ...
 267  AGE_GROUP=Adult, GENDER=MALE, LOCATION=RURAL, ...  0.202381
 353  DARK_CONDITIONS, LOCATION=RURAL, SEASON=Fall, ...  0.201681
 321  AGE_GROUP=Adult, DARK_CONDITIONS, LOCATION=URB...  0.201681
 131   SEASON=Winter, VEHICLE=NEW, WEEKEND_FLAG=Weekday  0.201331
 183           GENDER=MALE, LARGE_TRUCK, LOCATION=URBAN  0.200980
 
 [429 rows x 2 columns],
 'mlxtend':                                                Itemset   Support
 0                                 WEEKEND_FLAG=Weekday  0.942927
 1                         