# Comparative Analysis of Interestingness Measures in Association Rule Mining

## Acknowledgements

This notebook was developed with assistance from Claude. Claude helped with content generation, code suggestions, and structure organization.

## Table of Contents
1. [Introduction](#introduction)
2. [Setup and Data Loading](#setup)
3. [Data Preprocessing](#preprocessing)
4. [Association Rule Mining](#mining)
5. [Interestingness Measures Implementation](#measures)
6. [Comparative Analysis](#analysis)
7. [Cross-Dataset Stability Analysis](#cross-dataset)
8. [Statistical Robustness Testing](#statistical)
9. [Visualization of Results](#visualization)
10. [Conclusions and Recommendations](#conclusions)

## 1. Introduction <a id="introduction"></a>

This notebook implements the research proposal on improving the evaluation of association rules in pattern mining by comparing existing interestingness measures. We'll analyze five datasets to determine which measures work best for different data types and provide visual comparisons of their effectiveness.

**Research Goals:**
- Compare the effectiveness of existing interestingness measures
- Determine which measures work best for different types of datasets

## 2. Setup and Data Loading <a id="setup"></a>

Let's start by importing the necessary libraries and loading our datasets.

In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from mlxtend.frequent_patterns import apriori, association_rules
from mlxtend.preprocessing import TransactionEncoder
from scipy.stats import spearmanr, wilcoxon
from sklearn.preprocessing import LabelEncoder
import time
import warnings
from tqdm.notebook import tqdm
import itertools
from functools import partial
from concurrent.futures import ProcessPoolExecutor
import multiprocessing
import os
import scipy

pd.set_option('display.max_columns', None)
warnings.filterwarnings('ignore')

plt.style.use('seaborn-v0_8-whitegrid')
sns.set_theme(style="whitegrid")
colors = sns.color_palette("viridis", 10)

In [2]:
def load_dataset(filename, dataset_type):
    if dataset_type == 'adult':
        columns = ['age', 'workclass', 'fnlwgt', 'education', 'education-num', 
                   'marital-status', 'occupation', 'relationship', 'race', 
                   'sex', 'capital-gain', 'capital-loss', 'hours-per-week', 
                   'native-country', 'income']
        return pd.read_csv(filename, names=columns, sep=', ', engine='python')
    
    elif dataset_type == 'mushroom':
        # Mushroom dataset has no header
        return pd.read_csv(filename, header=None)
    
    elif dataset_type == 'bank':
        return pd.read_csv(filename, sep=';')
    
    elif dataset_type == 'german':
        # German credit data
        columns = ['status', 'duration', 'credit_history', 'purpose', 'amount',
                   'savings', 'employment_duration', 'installment_rate', 'personal_status_sex',
                   'other_debtors', 'residence_since', 'property', 'age', 'other_installment_plans',
                   'housing', 'number_credits', 'job', 'people_liable', 'telephone', 'foreign_worker',
                   'credit_risk']
        return pd.read_csv(filename, sep=' ', names=columns)
    
    elif dataset_type == 'house_prices':
        return pd.read_csv(filename)
    
    else:
        raise ValueError(f"Unknown dataset type: {dataset_type}")

In [3]:
print("Loading datasets...")
try:
    adult_df = load_dataset('datasets/adult.data', 'adult')
    mushroom_df = load_dataset('datasets/agaricus-lepiota.data', 'mushroom')
    bank_df = load_dataset('datasets/bank-full.csv', 'bank')
    german_df = load_dataset('datasets/german.data', 'german')
    house_df = load_dataset('datasets/house_prices.csv', 'house_prices')
    
    datasets = {
        'Adult Census': adult_df,
        'Mushroom': mushroom_df,
        'Bank Marketing': bank_df,
        'German Credit': german_df,
        'House Prices': house_df
    }
    
    print("All datasets loaded successfully!")
except Exception as e:
    print(f"Error loading datasets: {e}")

Loading datasets...
All datasets loaded successfully!


## 3. Data Preprocessing <a id="preprocessing"></a>

We need to convert our datasets into a transaction format suitable for association rule mining.

In [4]:
def preprocess_dataset(df, categorical_cols=None, numerical_cols=None, binary_cols=None, target_col=None):
    df_copy = df.copy()
    
    for col in df_copy.columns:
        if df_copy[col].dtype == 'object':
            df_copy[col] = df_copy[col].fillna('Unknown')
        else:
            df_copy[col] = df_copy[col].fillna(df_copy[col].median())
    
    df_transformed = pd.DataFrame()
    
    if categorical_cols:
        for col in categorical_cols:
            if col in df_copy.columns:
                # One-hot encode categorical variables
                df_transformed = pd.concat([
                    df_transformed, 
                    pd.get_dummies(df_copy[col], prefix=col, drop_first=False)
                ], axis=1)
    
    if numerical_cols:
        for col in numerical_cols:
            if col in df_copy.columns:
                # Bin numerical variables into 5 categories
                try:
                    df_copy[f"{col}_binned"] = pd.qcut(
                        df_copy[col], q=5, duplicates='drop', 
                        labels=[f"Q{i}" for i in range(1, 6)]
                    )
                except:
                    df_copy[f"{col}_binned"] = pd.cut(
                        df_copy[col], bins=5, duplicates='drop', 
                        labels=[f"Q{i}" for i in range(1, 6)]
                    )
                # One-hot encode binned variables
                df_transformed = pd.concat([
                    df_transformed, 
                    pd.get_dummies(df_copy[f"{col}_binned"], prefix=col, drop_first=False)
                ], axis=1)
    
    if binary_cols:
        for col in binary_cols:
            if col in df_copy.columns:
                df_transformed[col] = df_copy[col]
    
    if target_col and target_col in df_copy.columns:
        if df_copy[target_col].dtype == 'object':
            df_transformed = pd.concat([
                df_transformed, 
                pd.get_dummies(df_copy[target_col], prefix=target_col, drop_first=False)
            ], axis=1)
        else:
            try:
                df_copy[f"{target_col}_binned"] = pd.qcut(
                    df_copy[target_col], q=5, duplicates='drop', 
                    labels=[f"Q{i}" for i in range(1, 6)]
                )
            except:
                df_copy[f"{target_col}_binned"] = pd.cut(
                    df_copy[target_col], bins=5, duplicates='drop', 
                    labels=[f"Q{i}" for i in range(1, 6)]
                )
            df_transformed = pd.concat([
                df_transformed, 
                pd.get_dummies(df_copy[f"{target_col}_binned"], prefix=target_col, drop_first=False)
            ], axis=1)
    
    df_transformed = df_transformed.astype(int)
    
    transactions = df_transformed.apply(
        lambda row: row.index[row == 1].tolist(),
        axis=1
    ).tolist()
    
    return df_transformed, transactions

This function, `preprocess_dataset`, is designed to transform a given dataset by handling missing values, encoding categorical variables, binning numerical columns, and preparing the data for analysis. It also returns a transactional representation of the dataset.

### **Processing Steps**
1. **Handling Missing Values:**
   - Categorical columns (`dtype=object`) are filled with `"Unknown"`.
   - Numerical columns are filled with their median value.

2. **Categorical Encoding:**
   - Categorical columns specified in `categorical_cols` are one-hot encoded.

3. **Numerical Binning:**
   - Columns in `numerical_cols` are binned into 5 quantiles (`qcut` or `cut` as a fallback).
   - Binned categories are one-hot encoded.

4. **Binary Columns:**
   - Binary columns are kept as is.

5. **Target Column Transformation:**
   - If categorical, one-hot encoding is applied.
   - If numerical, it is binned into 5 quantiles (or using `cut` as a fallback) and one-hot encoded.

6. **Final Transformation:**
   - All encoded columns are converted to integers.
   - A transaction-style dataset is generated where each row contains the names of active (nonzero) features.

### **Function Output**
- **df_transformed (DataFrame):** The processed dataset with numerical and categorical encoding.
- **transactions (list of lists):** A transactional format where each row contains active feature names.

In [5]:
preprocessing_specs = {
    'Adult Census': {
        'categorical_cols': ['workclass', 'education', 'marital-status', 'occupation', 
                            'relationship', 'race', 'sex', 'native-country'],
        'numerical_cols': ['age', 'fnlwgt', 'education-num', 'capital-gain', 
                          'capital-loss', 'hours-per-week'],
        'binary_cols': [],
        'target_col': 'income'
    },
    'Mushroom': {
        'categorical_cols': list(range(1, 23)),
        'numerical_cols': [],
        'binary_cols': [],
        'target_col': 0
    },
    'Bank Marketing': {
        'categorical_cols': ['job', 'marital', 'education', 'default', 'housing', 
                            'loan', 'contact', 'month', 'poutcome'],
        'numerical_cols': ['age', 'balance', 'day', 'duration', 'campaign', 'pdays', 'previous'],
        'binary_cols': [],
        'target_col': 'y'
    },
    'German Credit': {
        'categorical_cols': ['status', 'credit_history', 'purpose', 'savings', 
                            'employment_duration', 'personal_status_sex', 'other_debtors',
                            'property', 'other_installment_plans', 'housing', 'job',
                            'telephone', 'foreign_worker'],
        'numerical_cols': ['duration', 'amount', 'installment_rate', 'residence_since', 
                          'age', 'number_credits', 'people_liable'],
        'binary_cols': [],
        'target_col': 'credit_risk'
    },
    'House Prices': {
        'categorical_cols': ['MSZoning', 'Street', 'Alley', 'LotShape', 'LandContour', 
                            'Utilities', 'LotConfig', 'LandSlope', 'Neighborhood', 
                            'Condition1', 'Condition2', 'BldgType', 'HouseStyle', 
                            'RoofStyle', 'RoofMatl', 'Exterior1st', 'Exterior2nd', 
                            'MasVnrType', 'Foundation', 'Heating', 'CentralAir', 
                            'Electrical', 'Functional', 'GarageType', 'PavedDrive', 
                            'Fence', 'MiscFeature', 'SaleType', 'SaleCondition'],
        'numerical_cols': ['LotFrontage', 'LotArea', 'OverallQual', 'OverallCond', 
                          'YearBuilt', 'YearRemodAdd', 'MasVnrArea', 'BsmtFinSF1', 
                          'BsmtFinSF2', 'BsmtUnfSF', 'TotalBsmtSF', '1stFlrSF', 
                          '2ndFlrSF', 'LowQualFinSF', 'GrLivArea', 'BsmtFullBath', 
                          'BsmtHalfBath', 'FullBath', 'HalfBath', 'BedroomAbvGr', 
                          'KitchenAbvGr', 'TotRmsAbvGrd', 'Fireplaces', 'GarageYrBlt', 
                          'GarageCars', 'GarageArea', 'WoodDeckSF', 'OpenPorchSF', 
                          'EnclosedPorch', '3SsnPorch', 'ScreenPorch', 'PoolArea', 
                          'MiscVal', 'MoSold', 'YrSold'],
        'binary_cols': [],
        'target_col': 'SalePrice'
    }
}

In [6]:
preprocessed_data = {}
print("Preprocessing datasets...")

for name, df in tqdm(datasets.items()):
    specs = preprocessing_specs.get(name, {
        'categorical_cols': df.select_dtypes(include=['object']).columns.tolist(),
        'numerical_cols': df.select_dtypes(include=['int64', 'float64']).columns.tolist(),
        'binary_cols': [],
        'target_col': None
    })
    
    try:
        df_transformed, transactions = preprocess_dataset(
            df, 
            categorical_cols=specs['categorical_cols'],
            numerical_cols=specs['numerical_cols'],
            binary_cols=specs['binary_cols'],
            target_col=specs['target_col']
        )
        
        preprocessed_data[name] = {
            'transformed_df': df_transformed,
            'transactions': transactions,
            'original_df': df
        }
        print(f"Preprocessed {name} dataset: {df_transformed.shape[1]} features")
    except Exception as e:
        print(f"Error preprocessing {name} dataset: {e}")

Preprocessing datasets...


  0%|          | 0/5 [00:00<?, ?it/s]

Preprocessed Adult Census dataset: 134 features
Preprocessed Mushroom dataset: 117 features
Preprocessed Bank Marketing dataset: 81 features
Preprocessed German Credit dataset: 94 features
Preprocessed House Prices dataset: 374 features


## 4. Association Rule Mining <a id="mining"></a>

Now we'll perform association rule mining using the Apriori algorithm.

In [7]:
def mine_association_rules(transactions, min_support=0.1, min_confidence=0.5, max_length=None):
    start_time = time.time()
    
    te = TransactionEncoder()
    te_ary = te.fit(transactions).transform(transactions)
    df = pd.DataFrame(te_ary, columns=te.columns_)
    
    frequent_itemsets = apriori(df, min_support=min_support, use_colnames=True, max_len=max_length, verbose=0)
    
    if not frequent_itemsets.empty:
        rules = association_rules(frequent_itemsets, metric="confidence", min_threshold=min_confidence)
        end_time = time.time()
        print(f"Mining completed in {end_time - start_time:.2f} seconds. Found {len(rules)} rules.")
        return frequent_itemsets, rules
    else:
        print("No frequent itemsets found with the current parameters.")
        return pd.DataFrame(), pd.DataFrame()

In [8]:
mining_params = {
    'Adult Census': {'min_support': 0.05, 'min_confidence': 0.5, 'max_length': 3},
    'Mushroom': {'min_support': 0.2, 'min_confidence': 0.7, 'max_length': 3},
    'Bank Marketing': {'min_support': 0.05, 'min_confidence': 0.5, 'max_length': 3},
    'German Credit': {'min_support': 0.1, 'min_confidence': 0.6, 'max_length': 3},
    'House Prices': {'min_support': 0.1, 'min_confidence': 0.5, 'max_length': 3}
}

In [9]:
mining_results = {}
print("Mining association rules...")

for name, data in tqdm(preprocessed_data.items()):
    params = mining_params.get(name, {'min_support': 0.1, 'min_confidence': 0.5, 'max_length': 3})
    
    try:
        print(f"\nMining rules for {name}...")
        frequent_itemsets, rules = mine_association_rules(
            data['transactions'],
            min_support=params['min_support'],
            min_confidence=params['min_confidence'],
            max_length=params['max_length']
        )
        
        if not rules.empty:
            mining_results[name] = {
                'frequent_itemsets': frequent_itemsets,
                'rules': rules
            }
            print(f"Found {len(frequent_itemsets)} frequent itemsets and {len(rules)} rules")
        else:
            print(f"No rules found for {name} dataset")
    except Exception as e:
        print(f"Error mining rules for {name} dataset: {e}")

Mining association rules...


  0%|          | 0/5 [00:00<?, ?it/s]


Mining rules for Adult Census...
Mining completed in 3.02 seconds. Found 5076 rules.
Found 2433 frequent itemsets and 5076 rules

Mining rules for Mushroom...
Mining completed in 0.29 seconds. Found 3232 rules.
Found 1594 frequent itemsets and 3232 rules

Mining rules for Bank Marketing...
Mining completed in 4.53 seconds. Found 7827 rules.
Found 3669 frequent itemsets and 7827 rules

Mining rules for German Credit...
Mining completed in 0.09 seconds. Found 5089 rules.
Found 3036 frequent itemsets and 5089 rules

Mining rules for House Prices...
Mining completed in 5.60 seconds. Found 353412 rules.
Found 111472 frequent itemsets and 353412 rules


## 5. Interestingness Measures Implementation <a id="measures"></a>

We'll implement a range of interestingness measures for evaluating association rules.

In [10]:
def calculate_additional_measures(rules):
    rules_copy = rules.copy()
    
    # Standard measures already in the dataframe:
    # - support
    # - confidence
    # - lift
    
    # Calculate additional measures
    
    # Conviction: measure of implication strength
    rules_copy['conviction'] = np.where(
        (1 - rules_copy['confidence']) == 0, 
        float('inf'), 
        (1 - rules_copy['antecedent support']) / (1 - rules_copy['confidence'])
    )
    
    # Leverage (Piatetsky-Shapiro): difference between observed and expected frequency
    rules_copy['leverage'] = rules_copy['support'] - (rules_copy['antecedent support'] * rules_copy['consequent support'])
    
    # Jaccard coefficient: similarity measure
    rules_copy['jaccard'] = rules_copy['support'] / (
        rules_copy['antecedent support'] + rules_copy['consequent support'] - rules_copy['support']
    )
    
    # Cosine: normalized measure of co-occurrence
    rules_copy['cosine'] = rules_copy['support'] / np.sqrt(
        rules_copy['antecedent support'] * rules_copy['consequent support']
    )
    
    # Kulczynski: average of two conditional probabilities
    rules_copy['kulczynski'] = 0.5 * (
        rules_copy['confidence'] + rules_copy['support'] / rules_copy['consequent support']
    )
    
    # All-confidence: minimum of the two confidence values
    rules_copy['all_confidence'] = rules_copy['support'] / np.maximum(
        rules_copy['antecedent support'], rules_copy['consequent support']
    )
    
    # Collective strength
    # P(violation in real data) / P(violation in independent case)
    p_v_real = 1 - rules_copy['support'] - (
        rules_copy['antecedent support'] * (1 - rules_copy['consequent support']) + 
        (1 - rules_copy['antecedent support']) * rules_copy['consequent support']
    )
    p_v_ind = 1 - (
        rules_copy['antecedent support'] * rules_copy['consequent support'] + 
        (1 - rules_copy['antecedent support']) * (1 - rules_copy['consequent support'])
    )
    rules_copy['collective_strength'] = np.where(
        p_v_ind == 0, 
        float('inf'), 
        (1 - p_v_real) / (1 - p_v_ind) * p_v_ind / p_v_real
    )
    
    # Gini index
    p_xy = rules_copy['support']
    p_x = rules_copy['antecedent support']
    p_y = rules_copy['consequent support']
    p_not_x = 1 - p_x
    p_not_y = 1 - p_y
    p_x_y = p_xy / p_x  # P(Y|X)
    p_x_not_y = (p_x - p_xy) / p_x  # P(¬Y|X)
    p_not_x_y = (p_y - p_xy) / p_not_x  # P(Y|¬X)
    p_not_x_not_y = (1 - p_x - p_y + p_xy) / p_not_x  # P(¬Y|¬X)
    
    gini_x = p_x * (p_x_y**2 + p_x_not_y**2) + p_not_x * (p_not_x_y**2 + p_not_x_not_y**2)
    gini_y = p_y**2 + p_not_y**2
    
    rules_copy['gini_index'] = gini_x - gini_y
    
    # Piatetsky-Shapiro: deviation from independence
    rules_copy['ps'] = rules_copy['support'] - (rules_copy['antecedent support'] * rules_copy['consequent support'])
    
    # Odds ratio: ratio of odds of occurrence
    p_xy = rules_copy['support']
    p_x = rules_copy['antecedent support']
    p_y = rules_copy['consequent support']
    p_not_xy = 1 - p_x - p_y + p_xy
    rules_copy['odds_ratio'] = np.where(
        (p_x - p_xy) * (p_y - p_xy) == 0, 
        float('inf'), 
        (p_xy * p_not_xy) / ((p_x - p_xy) * (p_y - p_xy))
    )
    
    # Klosgen: combines support and confidence
    rules_copy['klosgen'] = np.sqrt(rules_copy['support']) * (
        rules_copy['confidence'] - rules_copy['consequent support']
    )
    
    return rules_copy

In [11]:
for name, result in mining_results.items():
    if 'rules' in result and not result['rules'].empty:
        print(f"Calculating interestingness measures for {name}...")
        try:
            mining_results[name]['rules_with_measures'] = calculate_additional_measures(result['rules'])
            print(f"Calculated {len(mining_results[name]['rules_with_measures'].columns) - 9} additional measures")
        except Exception as e:
            print(f"Error calculating measures for {name} dataset: {e}")

Calculating interestingness measures for Adult Census...
Calculated 10 additional measures
Calculating interestingness measures for Mushroom...
Calculated 10 additional measures
Calculating interestingness measures for Bank Marketing...
Calculated 10 additional measures
Calculating interestingness measures for German Credit...
Calculated 10 additional measures
Calculating interestingness measures for House Prices...
Calculated 10 additional measures


## 6. Comparative Analysis <a id="analysis"></a>

Now, let's analyze the effectiveness of different interestingness measures.

In [12]:
def analyze_measure_correlations(rules_df):
    measure_cols = [col for col in rules_df.columns if col not in [
        'antecedent support', 'consequent support', 'antecedents', 'consequents'
    ]]
    
    correlation_matrix = rules_df[measure_cols].corr(method='spearman')
    
    return correlation_matrix

In [13]:
def analyze_rule_rankings(rules_df, top_n=50):
    measure_cols = [col for col in rules_df.columns if col not in [
        'antecedent support', 'consequent support', 'antecedents', 'consequents'
    ]]
    
    top_rules = {}
    for measure in measure_cols:
        sorted_rules = rules_df.sort_values(by=measure, ascending=False)
        top_rules[measure] = sorted_rules.head(top_n)
    
    jaccard_similarity = pd.DataFrame(index=measure_cols, columns=measure_cols, dtype=float)
    for m1 in measure_cols:
        for m2 in measure_cols:
            top_m1 = set(top_rules[m1].index)
            top_m2 = set(top_rules[m2].index)
            intersection = len(top_m1.intersection(top_m2))
            union = len(top_m1.union(top_m2))
            jaccard_similarity.loc[m1, m2] = intersection / union if union > 0 else 0
    
    return {
        'top_rules': top_rules,
        'jaccard_similarity': jaccard_similarity
    }

The `analyze_rule_rankings` function processes a given DataFrame of association rules to identify the top-ranked rules based on various evaluation measures. It also computes the Jaccard similarity between the top-ranked rules across different measures.

### **Processing Steps**
1. **Identify Relevant Columns:**
   - The function first identifies all columns in the `rules_df` except for the ones related to rule support and the antecedents/consequents (e.g., `'antecedent support'`, `'consequent support'`, `'antecedents'`, `'consequents'`).

2. **Sorting and Ranking:**
   - For each evaluation metric (measure), the rules are sorted in descending order based on that metric. The top `n` rules for each measure are stored in the `top_rules` dictionary.

3. **Jaccard Similarity Calculation:**
   - The function computes the Jaccard similarity between the top-ranked rules for each evaluation measure. This similarity is calculated by comparing the sets of top-ranked rule indices for each pair of measures. The formula for Jaccard similarity is:
   
   $$
   \text{Jaccard Similarity} = \frac{\text{Intersection of top rules}}{\text{Union of top rules}}
   $$
   - A similarity matrix is created where each cell represents the Jaccard similarity between two measures.

### **Function Output**
The function returns a dictionary with two key elements:
1. **top_rules (dict):** A dictionary where each key is a measure, and the value is a DataFrame containing the top `n` rules based on that measure.
2. **jaccard_similarity (DataFrame):** A matrix containing the Jaccard similarity values between the top rules for each pair of measures.

In [14]:
def analyze_rule_diversity(rules_df, measure, top_n=50):
    sorted_rules = rules_df.sort_values(by=measure, ascending=False)
    top_rules = sorted_rules.head(top_n)
    
    antecedent_items = set()
    for items in top_rules['antecedents']:
        antecedent_items.update(items)
    
    consequent_items = set()
    for items in top_rules['consequents']:
        consequent_items.update(items)
    
    support_range = (top_rules['support'].min(), top_rules['support'].max())
    support_std = top_rules['support'].std()
    
    confidence_range = (top_rules['confidence'].min(), top_rules['confidence'].max())
    confidence_std = top_rules['confidence'].std()
    
    # Calculate entropy-based diversity
    # Higher entropy means more diverse rules
    unique_antecedents = top_rules['antecedents'].apply(frozenset).value_counts()
    p_antecedents = unique_antecedents / unique_antecedents.sum()
    entropy_antecedents = -np.sum(p_antecedents * np.log2(p_antecedents))
    
    unique_consequents = top_rules['consequents'].apply(frozenset).value_counts()
    p_consequents = unique_consequents / unique_consequents.sum()
    entropy_consequents = -np.sum(p_consequents * np.log2(p_consequents))
    
    return {
        'num_antecedent_items': len(antecedent_items),
        'num_consequent_items': len(consequent_items),
        'support_range': support_range,
        'support_std': support_std,
        'confidence_range': confidence_range,
        'confidence_std': confidence_std,
        'entropy_antecedents': entropy_antecedents,
        'entropy_consequents': entropy_consequents
    }

The `analyze_rule_diversity` function evaluates the diversity of association rules based on a specified evaluation measure. It computes various statistics related to the antecedents, consequents, and the overall rule characteristics, including entropy-based diversity metrics.

### **Processing Steps**
1. **Sorting and Ranking:**
   - The function sorts the rules in descending order based on the specified evaluation measure (`measure`) and selects the top `n` rules.

2. **Antecedent and Consequent Item Collection:**
   - The antecedent and consequent items are extracted from the top-ranked rules and stored in sets, ensuring uniqueness.

3. **Support and Confidence Statistics:**
   - The function calculates the range and standard deviation of the `support` and `confidence` values for the top `n` rules.

4. **Entropy-Based Diversity Calculation:**
   - **Entropy** is used to measure the diversity of antecedents and consequents:
     - **Higher entropy** indicates more diversity in the antecedent and consequent sets.
     - For both antecedents and consequents, the function calculates the distribution of unique items and then computes entropy using the formula:
       
       $$
       H(X) = -\sum p(x) \log_2(p(x))
       $$
       where $ p(x) $ is the probability of occurrence of a particular antecedent or consequent item.

### **Function Output**
The function returns a dictionary containing the following diversity statistics:
- **num_antecedent_items:** The total number of unique antecedent items in the top `n` rules.
- **num_consequent_items:** The total number of unique consequent items in the top `n` rules.
- **support_range:** The range (min, max) of the support values for the top `n` rules.
- **support_std:** The standard deviation of the support values for the top `n` rules.
- **confidence_range:** The range (min, max) of the confidence values for the top `n` rules.
- **confidence_std:** The standard deviation of the confidence values for the top `n` rules.
- **entropy_antecedents:** The entropy of the antecedent items, indicating the diversity of antecedents.
- **entropy_consequents:** The entropy of the consequent items, indicating the diversity of consequents.


In [15]:
analysis_results = {}
for name, result in mining_results.items():
    if 'rules_with_measures' in result and not result['rules_with_measures'].empty:
        print(f"\nAnalyzing measures for {name} dataset...")
        
        rules_df = result['rules_with_measures']
        
        try:
            correlation_matrix = analyze_measure_correlations(rules_df)
            
            ranking_analysis = analyze_rule_rankings(rules_df, top_n=min(50, len(rules_df)))
            
            diversity_analysis = {}
            measure_cols = [col for col in rules_df.columns if col not in [
                'antecedent support', 'consequent support', 'antecedents', 'consequents'
            ]]
            
            for measure in measure_cols:
                diversity_analysis[measure] = analyze_rule_diversity(
                    rules_df, measure, top_n=min(50, len(rules_df))
                )
            
            analysis_results[name] = {
                'correlation_matrix': correlation_matrix,
                'ranking_analysis': ranking_analysis,
                'diversity_analysis': diversity_analysis
            }
            
            print(f"Analysis completed for {name} dataset")
        except Exception as e:
            print(f"Error analyzing {name} dataset: {e}")


Analyzing measures for Adult Census dataset...
Analysis completed for Adult Census dataset

Analyzing measures for Mushroom dataset...
Analysis completed for Mushroom dataset

Analyzing measures for Bank Marketing dataset...
Analysis completed for Bank Marketing dataset

Analyzing measures for German Credit dataset...
Analysis completed for German Credit dataset

Analyzing measures for House Prices dataset...
Analysis completed for House Prices dataset


## 7. Cross-Dataset Stability Analysis <a id="cross-dataset"></a>

Let's analyze the stability of measures across different datasets.

In [16]:
def analyze_cross_dataset_stability():
    all_measures = []
    for name, result in mining_results.items():
        if 'rules_with_measures' in result and not result['rules_with_measures'].empty:
            rules_df = result['rules_with_measures']
            measure_cols = [col for col in rules_df.columns if col not in [
                'antecedent support', 'consequent support', 'antecedents', 'consequents'
            ]]
            all_measures.extend(measure_cols)
    all_measures = list(set(all_measures))
    
    stability_metrics = {
        'correlation_variation': {measure: [] for measure in all_measures},
        'diversity_variation': {measure: [] for measure in all_measures}
    }
    
    for measure in all_measures:
        correlations_across_datasets = []
        for name, result in analysis_results.items():
            if 'correlation_matrix' in result:
                corr_matrix = result['correlation_matrix']
                if measure in corr_matrix.columns:
                    correlations = corr_matrix[measure].drop(measure).values
                    correlations_across_datasets.append(correlations)
                    
        if len(correlations_across_datasets) > 1:
            corr_arrays = [np.array(c) for c in correlations_across_datasets if len(c) > 0]
            
            if len(corr_arrays) > 1 and all(len(c) == len(corr_arrays[0]) for c in corr_arrays):
                stacked = np.vstack(corr_arrays)
                mean_corr = np.mean(stacked, axis=0)
                std_corr = np.std(stacked, axis=0)
                cv_corr = np.where(np.abs(mean_corr) > 1e-10, std_corr / np.abs(mean_corr), 0)
                stability_metrics['correlation_variation'][measure] = np.mean(cv_corr)

    for measure in all_measures:
        diversity_metrics = []
        for name, result in analysis_results.items():
            if 'diversity_analysis' in result and measure in result['diversity_analysis']:
                diversity = result['diversity_analysis'][measure]
                metrics = {
                    'entropy_antecedents': diversity.get('entropy_antecedents', 0),
                    'entropy_consequents': diversity.get('entropy_consequents', 0),
                    'support_std': diversity.get('support_std', 0)
                }
                diversity_metrics.append(metrics)
        
        if len(diversity_metrics) > 1:
            entropy_ant = [d['entropy_antecedents'] for d in diversity_metrics]
            entropy_cons = [d['entropy_consequents'] for d in diversity_metrics]
            support_std = [d['support_std'] for d in diversity_metrics]
            
            cv_entropy_ant = np.std(entropy_ant) / np.mean(entropy_ant) if np.mean(entropy_ant) > 0 else 0
            cv_entropy_cons = np.std(entropy_cons) / np.mean(entropy_cons) if np.mean(entropy_cons) > 0 else 0
            cv_support_std = np.std(support_std) / np.mean(support_std) if np.mean(support_std) > 0 else 0
            
            stability_metrics['diversity_variation'][measure] = np.mean([
                cv_entropy_ant, cv_entropy_cons, cv_support_std
            ])
    
    return stability_metrics

The `analyze_cross_dataset_stability` function is designed to evaluate the stability of association rule mining results across different datasets. Specifically, it measures two key aspects of stability:
1. **Correlation Variation**: The variation of correlation between different measures across datasets.
2. **Diversity Variation**: The variation in diversity metrics (entropy of antecedents, entropy of consequents, and standard deviation of support) across datasets.

### **Processing Steps**
1. **Identify All Measures**:
   - The function first scans through the `mining_results` and extracts all the measure columns from the association rules in `rules_with_measures`. These measures exclude specific columns like 'antecedent support', 'consequent support', 'antecedents', and 'consequents'.
   - A unique set of measures across all results is stored for later use.

2. **Correlation Variation Calculation**:
   - For each measure, the function collects the correlation values from the `correlation_matrix` in the `analysis_results` for each dataset.
   - It checks whether the correlation values for the same measure are consistent across multiple datasets.
   - The **coefficient of variation (CV)** of the correlation is computed using the formula:
     $$
     CV_{\text{corr}} = \frac{\text{std}(r)}{\text{mean}(r)}
     $$
     where $ r $ represents the correlation values.
   - The average CV for each measure is calculated to represent the stability of the correlation across datasets.

3. **Diversity Variation Calculation**:
   - The function also calculates the variation in diversity for each measure using the `diversity_analysis` from the `analysis_results`.
   - It collects three diversity metrics: 
     - **Entropy of Antecedents**: Measures the diversity in the antecedents of the rules.
     - **Entropy of Consequents**: Measures the diversity in the consequents of the rules.
     - **Support Standard Deviation**: Measures the variation in the support of the rules.
   - The CV for each diversity metric is calculated using the formula:
     $$
     CV_{\text{diversity}} = \frac{\text{std}(d)}{\text{mean}(d)}
     $$
     where $ d $ represents the respective diversity metrics (entropy or support).
   - The average of the CVs for the antecedents, consequents, and support is used to represent the diversity variation for each measure.

### **Function Output**
The function returns a dictionary `stability_metrics` containing the following:
- **correlation_variation**: A dictionary where each measure maps to its correlation variation (CV) across datasets.
- **diversity_variation**: A dictionary where each measure maps to its diversity variation (CV) across datasets.

In [17]:
print("Performing cross-dataset stability analysis...")
stability_metrics = analyze_cross_dataset_stability()

stability_scores = {}
for measure in stability_metrics['correlation_variation']:
    if measure in stability_metrics['correlation_variation'] and measure in stability_metrics['diversity_variation']:
        corr_var = stability_metrics['correlation_variation'][measure]
        div_var = stability_metrics['diversity_variation'][measure]
        
        if not np.isnan(corr_var) and not np.isnan(div_var):
            stability_scores[measure] = 1 - np.mean([
                corr_var / max(stability_metrics['correlation_variation'].values()),
                div_var / max(stability_metrics['diversity_variation'].values())
            ])

print("Cross-dataset stability analysis completed")

Performing cross-dataset stability analysis...
Cross-dataset stability analysis completed


## 8. Statistical Robustness Testing <a id="statistical"></a>

Let's perform statistical tests to evaluate the robustness of our findings.

In [18]:
def perform_wilcoxon_tests(top_n=50):
    all_measures = []
    for name, result in mining_results.items():
        if 'rules_with_measures' in result and not result['rules_with_measures'].empty:
            rules_df = result['rules_with_measures']
            measure_cols = [col for col in rules_df.columns if col not in [
                'antecedent support', 'consequent support', 'antecedents', 'consequents'
            ]]
            all_measures.extend(measure_cols)
    all_measures = list(set(all_measures))
    
    test_results = {
        dataset_name: {
            'p_values': pd.DataFrame(index=all_measures, columns=all_measures),
            'significant_differences': pd.DataFrame(index=all_measures, columns=all_measures)
        }
        for dataset_name in mining_results.keys()
    }
    
    for dataset_name, result in mining_results.items():
        if 'rules_with_measures' not in result or result['rules_with_measures'].empty:
            continue
        
        rules_df = result['rules_with_measures']
        dataset_measures = [col for col in rules_df.columns if col not in [
            'antecedent support', 'consequent support', 'antecedents', 'consequents'
        ]]
        
        ranks = pd.DataFrame(index=rules_df.index)
        for measure in dataset_measures:
            ranks[measure] = rules_df[measure].rank(ascending=False)
        
        for m1 in dataset_measures:
            for m2 in dataset_measures:
                if m1 != m2:
                    try:
                        stat, p_value = wilcoxon(ranks[m1], ranks[m2])
                        
                        test_results[dataset_name]['p_values'].loc[m1, m2] = p_value
                        
                        test_results[dataset_name]['significant_differences'].loc[m1, m2] = p_value < 0.05
                    except Exception:
                        test_results[dataset_name]['p_values'].loc[m1, m2] = 1.0
                        test_results[dataset_name]['significant_differences'].loc[m1, m2] = False
                else:
                    test_results[dataset_name]['p_values'].loc[m1, m2] = 1.0
                    test_results[dataset_name]['significant_differences'].loc[m1, m2] = False
    
    return test_results

The `perform_wilcoxon_tests` function applies the **Wilcoxon Signed-Rank Test** to compare the rankings of various measures across multiple datasets. It evaluates whether the distributions of two measures are significantly different by comparing their rankings in each dataset.

### **Processing Steps**
1. **Identify Measures**:
   - The function identifies all the relevant measures (columns) from the `rules_with_measures` DataFrame, excluding columns related to support and antecedents/consequents.

2. **Rank Rules**:
   - For each dataset in the `mining_results`, the rules are ranked based on their measures (excluding antecedent and consequent columns).
   - The `rank` method ranks each rule from highest to lowest for each measure.

3. **Wilcoxon Signed-Rank Test**:
   - For each pair of measures $ m_1 $ and $ m_2 $, the function performs the **Wilcoxon Signed-Rank Test** to compare the ranks of these two measures.
   - The Wilcoxon test assesses whether there is a significant difference in the distributions of the ranks of the two measures across the rules.
   - The result is a p-value that indicates whether the difference is statistically significant (p-value < 0.05).

4. **Test Results Storage**:
   - The results are stored in a dictionary `test_results`, which contains the following data:
     - **p_values**: A DataFrame of p-values for each pair of measures.
     - **significant_differences**: A DataFrame of boolean values indicating whether the difference is statistically significant (True if p-value < 0.05).

5. **Edge Cases**:
   - If an exception occurs during the Wilcoxon test (for example, when the data for a particular measure cannot be compared), the p-value is set to 1.0, and the significant difference is marked as False.
   - If two measures are the same, the p-value is set to 1.0 (no significant difference).

In [19]:
print("Performing statistical robustness testing...")
statistical_tests = perform_wilcoxon_tests()
print("Statistical testing completed")

Performing statistical robustness testing...
Statistical testing completed


## 9. Visualization of Results <a id="visualization"></a>

Now, let's visualize our results to better understand the relationships and effectiveness of different measures.
**All the visualizations are saved in the visualizations folder**

In [20]:
def visualize_measure_correlations(dataset_name, correlation_matrix):
    plt.figure(figsize=(14, 12))
    
    mask = np.triu(np.ones_like(correlation_matrix, dtype=bool))
    
    sns.heatmap(
        correlation_matrix, 
        mask=mask,
        annot=True, 
        fmt=".2f", 
        cmap='coolwarm', 
        vmin=-1, 
        vmax=1,
        square=True,
        linewidths=0.5,
        cbar_kws={"shrink": 0.8}
    )
    
    plt.title(f'Spearman Rank Correlation Between Measures ({dataset_name})', fontsize=16)
    plt.tight_layout()
    
    return plt.gcf()

In [21]:
def visualize_measure_clusters(dataset_name, correlation_matrix):
    distance_matrix = 1 - np.abs(correlation_matrix)
    
    linkage_matrix = scipy.cluster.hierarchy.linkage(
        scipy.spatial.distance.squareform(distance_matrix), 
        method='average'
    )
    
    plt.figure(figsize=(14, 8))
    scipy.cluster.hierarchy.dendrogram(
        linkage_matrix,
        labels=correlation_matrix.columns,
        leaf_font_size=12,
        color_threshold=0.5,
        orientation='top'
    )
    
    plt.title(f'Hierarchical Clustering of Measures ({dataset_name})', fontsize=16)
    plt.xlabel('Measures', fontsize=14)
    plt.ylabel('Distance (1 - |correlation|)', fontsize=14)
    plt.axhline(y=0.5, c='k', linestyle='--', alpha=0.5)
    plt.tight_layout()
    
    return plt.gcf()

In [22]:
def visualize_rule_diversity(dataset_name, diversity_analysis):
    measures = list(diversity_analysis.keys())
    entropy_ant = [diversity_analysis[m]['entropy_antecedents'] for m in measures]
    entropy_cons = [diversity_analysis[m]['entropy_consequents'] for m in measures]
    support_std = [diversity_analysis[m]['support_std'] for m in measures]
    
    fig, axes = plt.subplots(3, 1, figsize=(14, 16))
    
    axes[0].bar(range(len(measures)), entropy_ant, color=colors[:len(measures)])
    axes[0].set_title('Entropy of Antecedents in Top Rules', fontsize=14)
    axes[0].set_xticks(range(len(measures)))
    axes[0].set_xticklabels(measures, rotation=90)
    axes[0].set_ylabel('Entropy')
    
    axes[1].bar(range(len(measures)), entropy_cons, color=colors[:len(measures)])
    axes[1].set_title('Entropy of Consequents in Top Rules', fontsize=14)
    axes[1].set_xticks(range(len(measures)))
    axes[1].set_xticklabels(measures, rotation=90)
    axes[1].set_ylabel('Entropy')
    
    axes[2].bar(range(len(measures)), support_std, color=colors[:len(measures)])
    axes[2].set_title('Standard Deviation of Support in Top Rules', fontsize=14)
    axes[2].set_xticks(range(len(measures)))
    axes[2].set_xticklabels(measures, rotation=90)
    axes[2].set_ylabel('Standard Deviation')
    
    plt.suptitle(f'Rule Diversity Metrics ({dataset_name})', fontsize=16)
    plt.tight_layout()
    plt.subplots_adjust(top=0.95)
    
    return fig

In [23]:
def visualize_jaccard_similarity(dataset_name, jaccard_similarity):
    plt.figure(figsize=(14, 12))
    
    mask = np.triu(np.ones_like(jaccard_similarity, dtype=bool))
    
    sns.heatmap(
        jaccard_similarity, 
        mask=mask,
        annot=True, 
        fmt=".2f", 
        cmap='YlGnBu', 
        vmin=0, 
        vmax=1,
        square=True,
        linewidths=0.5,
        cbar_kws={"shrink": 0.8},
        xticklabels=jaccard_similarity.columns,
        yticklabels=jaccard_similarity.index
    )
    
    plt.title(f'Jaccard Similarity Between Top Rules ({dataset_name})', fontsize=16)
    plt.xticks(rotation=45, ha='right')
    plt.yticks(rotation=0)
    plt.tight_layout()
    
    return plt.gcf()

In [24]:
def visualize_cross_dataset_stability(stability_scores):
    sorted_measures = sorted(stability_scores.items(), key=lambda x: x[1], reverse=True)
    measures = [m[0] for m in sorted_measures]
    scores = [m[1] for m in sorted_measures]
    
    plt.figure(figsize=(14, 8))
    plt.bar(range(len(measures)), scores, color=colors[:len(measures)])
    plt.xticks(range(len(measures)), measures, rotation=90)
    plt.title('Cross-Dataset Stability of Interestingness Measures', fontsize=16)
    plt.ylabel('Stability Score (higher is better)', fontsize=14)
    plt.ylim(0, 1)
    
    plt.grid(axis='y', linestyle='--', alpha=0.7)
    plt.tight_layout()
    
    return plt.gcf()

In [25]:
def visualize_significance_matrix(dataset_name, significant_differences):
    plt.figure(figsize=(14, 12))
    
    sns.heatmap(
        significant_differences.astype(int), 
        annot=True, 
        fmt="d", 
        cmap='Reds', 
        vmin=0, 
        vmax=1,
        square=True,
        linewidths=0.5,
        cbar_kws={"shrink": 0.8}
    )
    
    plt.title(f'Significant Differences Between Measures ({dataset_name})', fontsize=16)
    plt.tight_layout()
    
    return plt.gcf()

In [26]:
def create_measure_comparison_summary():
    all_measures = []
    for name, result in mining_results.items():
        if 'rules_with_measures' in result and not result['rules_with_measures'].empty:
            rules_df = result['rules_with_measures']
            measure_cols = [col for col in rules_df.columns if col not in [
                'antecedent support', 'consequent support', 'antecedents', 'consequents'
            ]]
            all_measures.extend(measure_cols)
    all_measures = list(set(all_measures))
    
    summary = pd.DataFrame(index=all_measures, columns=[
        'Avg Correlation', 'Stability Score', 'Avg Diversity', 
        'Significant Differences', 'Recommended For'
    ])
    
    for measure in all_measures:
        avg_corr = []
        for name, result in analysis_results.items():
            if 'correlation_matrix' in result:
                corr_matrix = result['correlation_matrix']
                if measure in corr_matrix.columns:
                    correlations = corr_matrix[measure].drop(measure).abs()
                    avg_corr.append(correlations.mean())
        
        summary.loc[measure, 'Avg Correlation'] = np.mean(avg_corr) if avg_corr else np.nan
        
        summary.loc[measure, 'Stability Score'] = stability_scores.get(measure, np.nan)
        
        avg_div = []
        for name, result in analysis_results.items():
            if 'diversity_analysis' in result and measure in result['diversity_analysis']:
                diversity = result['diversity_analysis'][measure]
                avg_div.append(diversity.get('entropy_antecedents', 0) + diversity.get('entropy_consequents', 0))
        
        summary.loc[measure, 'Avg Diversity'] = np.mean(avg_div) if avg_div else np.nan
        
        sig_diff_count = 0
        total_tests = 0
        for dataset_name, tests in statistical_tests.items():
            if 'significant_differences' in tests:
                sig_diff_matrix = tests['significant_differences']
                if measure in sig_diff_matrix.index:
                    sig_diff_count += sig_diff_matrix.loc[measure].sum()
                    total_tests += len(sig_diff_matrix.columns) - 1
        
        summary.loc[measure, 'Significant Differences'] = sig_diff_count / total_tests if total_tests > 0 else np.nan
    
    for measure in all_measures:
        recommendations = []
        
        stability = summary.loc[measure, 'Stability Score']
        if not np.isnan(stability):
            if stability > 0.8:
                recommendations.append("Cross-Dataset Analysis")
            elif stability < 0.4:
                recommendations.append("Dataset-Specific Analysis")
        
        diversity = summary.loc[measure, 'Avg Diversity']
        if not np.isnan(diversity):
            if diversity > 1.5:
                recommendations.append("Discovering Diverse Rules")
            elif diversity < 0.8:
                recommendations.append("Finding Core Patterns")
        
        correlation = summary.loc[measure, 'Avg Correlation']
        if not np.isnan(correlation):
            if correlation < 0.3:
                recommendations.append("Unique Perspective")
            elif correlation > 0.7:
                recommendations.append("Consensus Measure")
        
        summary.loc[measure, 'Recommended For'] = ", ".join(recommendations)
    
    return summary

The `create_measure_comparison_summary` function generates a summary table that compares various measures used in association rule mining across multiple datasets. The summary includes key metrics like average correlation, stability score, average diversity, significant differences, and recommendations for the usage of each measure.

### **Processing Steps**
1. **Identify All Measures**:
   - The function first identifies all unique measures across all datasets. These measures are derived from the `rules_with_measures` DataFrame, excluding columns related to antecedent and consequent support.

2. **Create Summary DataFrame**:
   - A DataFrame `summary` is initialized with the measures as the index and the following columns:
     - **Avg Correlation**: The average correlation of the measure across datasets.
     - **Stability Score**: The stability score for the measure, retrieved from pre-calculated `stability_scores`.
     - **Avg Diversity**: The average diversity of the measure across datasets, calculated from the entropy of antecedents and consequents.
     - **Significant Differences**: The proportion of statistically significant differences found for the measure across datasets.
     - **Recommended For**: A recommendation for the type of analysis based on the measure’s characteristics.

3. **Compute Average Correlation**:
   - For each measure, the function calculates the average correlation with other measures across different datasets. The correlation is taken from the `correlation_matrix` in `analysis_results`, excluding self-correlations (correlations of the measure with itself).
   
4. **Retrieve Stability Scores**:
   - The function looks up the stability score for each measure from the `stability_scores` dictionary. If no score is found, `NaN` is used.

5. **Calculate Average Diversity**:
   - For each measure, the function calculates the average diversity across datasets. Diversity is based on the entropy of antecedents and consequents, which is stored in the `diversity_analysis` dictionary in `analysis_results`.

6. **Calculate Significant Differences**:
   - The function counts the number of significant differences for each measure, as identified by the Wilcoxon test. This is derived from the `significant_differences` matrix in `statistical_tests`.

7. **Recommendations**:
   - For each measure, the function generates recommendations based on the calculated metrics:
     - **Stability**:
       - If the stability score is high (> 0.8), recommend "Cross-Dataset Analysis".
       - If the stability score is low (< 0.4), recommend "Dataset-Specific Analysis".
     - **Diversity**:
       - If the diversity is high (> 1.5), recommend "Discovering Diverse Rules".
       - If the diversity is low (< 0.8), recommend "Finding Core Patterns".
     - **Correlation**:
       - If the average correlation is low (< 0.3), recommend "Unique Perspective".
       - If the average correlation is high (> 0.7), recommend "Consensus Measure".
   - The recommendations are stored as a string in the `Recommended For` column of the summary.

In [27]:
print("Generating visualizations...")

if not os.path.exists('visualizations'):
    os.makedirs('visualizations')

for name, result in analysis_results.items():
    if 'correlation_matrix' in result:
        print(f"Visualizing results for {name} dataset...")
        
        fig_corr = visualize_measure_correlations(name, result['correlation_matrix'])
        fig_corr.savefig(f'visualizations/{name}_correlation_heatmap.png', dpi=150, bbox_inches='tight')
        
        try:
            fig_cluster = visualize_measure_clusters(name, result['correlation_matrix'])
            fig_cluster.savefig(f'visualizations/{name}_measure_clusters.png', dpi=150, bbox_inches='tight')
        except Exception as e:
            print(f"Error creating measure clusters for {name}: {e}")
        
        if 'ranking_analysis' in result and 'jaccard_similarity' in result['ranking_analysis']:
            jaccard_similarity = result['ranking_analysis']['jaccard_similarity']
            fig_jaccard = visualize_jaccard_similarity(name, jaccard_similarity)
            fig_jaccard.savefig(f'visualizations/{name}_jaccard_similarity.png', dpi=150, bbox_inches='tight')
        
        if 'diversity_analysis' in result:
            fig_diversity = visualize_rule_diversity(name, result['diversity_analysis'])
            fig_diversity.savefig(f'visualizations/{name}_rule_diversity.png', dpi=150, bbox_inches='tight')
        
        if name in statistical_tests and 'significant_differences' in statistical_tests[name]:
            fig_sig = visualize_significance_matrix(name, statistical_tests[name]['significant_differences'])
            fig_sig.savefig(f'visualizations/{name}_significant_differences.png', dpi=150, bbox_inches='tight')
        
        plt.close('all')

if stability_scores:
    fig_stability = visualize_cross_dataset_stability(stability_scores)
    fig_stability.savefig('visualizations/cross_dataset_stability.png', dpi=150, bbox_inches='tight')
    plt.close('all')

summary_comparison = create_measure_comparison_summary()
print("Visualizations completed")

print("\nSummary Comparison of Interestingness Measures:")
display(summary_comparison)

Generating visualizations...
Visualizing results for Adult Census dataset...
Visualizing results for Mushroom dataset...
Visualizing results for Bank Marketing dataset...
Visualizing results for German Credit dataset...
Visualizing results for House Prices dataset...
Visualizations completed

Summary Comparison of Interestingness Measures:


Unnamed: 0,Avg Correlation,Stability Score,Avg Diversity,Significant Differences,Recommended For
odds_ratio,0.462323,0.658296,7.634986,0.071429,Discovering Diverse Rules
support,0.292543,0.597289,6.868625,0.071429,"Discovering Diverse Rules, Unique Perspective"
gini_index,0.415608,0.327322,8.374914,0.057143,"Dataset-Specific Analysis, Discovering Diverse..."
zhangs_metric,0.504369,0.501875,7.670779,0.071429,Discovering Diverse Rules
all_confidence,0.453595,0.654175,7.695466,0.071429,Discovering Diverse Rules
ps,0.508589,0.530641,8.28845,0.071429,Discovering Diverse Rules
klosgen,0.499883,0.433946,8.057307,0.071429,Discovering Diverse Rules
cosine,0.411817,0.477737,7.876869,0.071429,Discovering Diverse Rules
lift,0.490595,0.501859,8.040964,0.071429,Discovering Diverse Rules
conviction,0.277443,0.23392,6.861409,0.071429,"Dataset-Specific Analysis, Discovering Diverse..."


## 10. Conclusions and Recommendations <a id="conclusions"></a>

Let's wrap up with conclusions and recommendations based on our analysis.

In [28]:
def generate_recommendations():
    summary = create_measure_comparison_summary()
    
    top_stability = summary.sort_values(by='Stability Score', ascending=False).head(3).index.tolist()
    top_diversity = summary.sort_values(by='Avg Diversity', ascending=False).head(3).index.tolist()
    top_uniqueness = summary.sort_values(by='Avg Correlation', ascending=True).head(3).index.tolist()
    
    dataset_recommendations = {}
    for name in mining_results.keys():
        if name in analysis_results:
            if 'rules_with_measures' in mining_results[name]:
                rules_df = mining_results[name]['rules_with_measures']
                
                top_consensus_rules = []
                if not rules_df.empty:
                    measure_cols = [col for col in rules_df.columns if col not in [
                        'antecedent support', 'consequent support', 'antecedents', 'consequents'
                    ]]
                    ranks = pd.DataFrame(index=rules_df.index)
                    for measure in measure_cols:
                        ranks[measure] = rules_df[measure].rank(ascending=False)
                    
                    ranks['avg_rank'] = ranks.mean(axis=1)
                    
                    top_rules = ranks.sort_values(by='avg_rank').head(5)
                    
                    for idx in top_rules.index:
                        antecedents = rules_df.loc[idx, 'antecedents']
                        consequents = rules_df.loc[idx, 'consequents']
                        support = rules_df.loc[idx, 'support']
                        confidence = rules_df.loc[idx, 'confidence']
                        
                        rule_str = f"{set(antecedents)} => {set(consequents)} [support={support:.3f}, confidence={confidence:.3f}]"
                        top_consensus_rules.append(rule_str)
                
                if 'correlation_matrix' in analysis_results[name]:
                    corr_matrix = analysis_results[name]['correlation_matrix']
                    
                    avg_corr = {}
                    for measure in corr_matrix.columns:
                        avg_corr[measure] = corr_matrix[measure].drop(measure).abs().mean()
                    
                    unique_measures = sorted(avg_corr.items(), key=lambda x: x[1])[:3]
                    unique_measures = [m[0] for m in unique_measures]
                    
                    diverse_measures = []
                    if 'diversity_analysis' in analysis_results[name]:
                        diversity_values = {}
                        for measure, metrics in analysis_results[name]['diversity_analysis'].items():
                            diversity_values[measure] = metrics.get('entropy_antecedents', 0) + metrics.get('entropy_consequents', 0)
                        
                        diverse_measures = sorted(diversity_values.items(), key=lambda x: x[1], reverse=True)[:3]
                        diverse_measures = [m[0] for m in diverse_measures]
                    
                    dataset_recommendations[name] = {
                        'top_consensus_rules': top_consensus_rules,
                        'unique_measures': unique_measures,
                        'diverse_measures': diverse_measures
                    }
    
    measure_recommendations = {}
    for measure in summary.index:
        strengths = []
        weaknesses = []
        
        stability = summary.loc[measure, 'Stability Score']
        if not np.isnan(stability):
            if stability > 0.7:
                strengths.append("High stability across datasets")
            elif stability < 0.4:
                weaknesses.append("Low stability across different datasets")
        
        diversity = summary.loc[measure, 'Avg Diversity']
        if not np.isnan(diversity):
            if diversity > 1.5:
                strengths.append("Discovers diverse and novel rules")
            elif diversity < 0.8:
                weaknesses.append("Tends to focus on similar rules")
        
        correlation = summary.loc[measure, 'Avg Correlation']
        if not np.isnan(correlation):
            if correlation < 0.3:
                strengths.append("Provides a unique perspective")
            elif correlation > 0.7:
                weaknesses.append("Highly correlated with other measures")
        
        sig_diff = summary.loc[measure, 'Significant Differences']
        if not np.isnan(sig_diff):
            if sig_diff > 0.7:
                strengths.append("Statistically different from most other measures")
            elif sig_diff < 0.3:
                weaknesses.append("Not statistically different from other measures")
        
        measure_recommendations[measure] = {
            'strengths': strengths,
            'weaknesses': weaknesses,
            'recommended_for': summary.loc[measure, 'Recommended For']
        }
    
    return {
        'top_measures': {
            'stability': top_stability,
            'diversity': top_diversity,
            'uniqueness': top_uniqueness
        },
        'dataset_recommendations': dataset_recommendations,
        'measure_recommendations': measure_recommendations
    }

The `generate_recommendations` function generates recommendations for mining results and analysis measures based on specific metrics. It evaluates various measures across datasets and provides insights on their stability, diversity, correlation, and performance, as well as recommends optimal uses for each measure.

### **Processing Steps**

1. **Top Measures Identification**:
   - The function first calls `create_measure_comparison_summary()` to get a summary of all measures.
   - It then identifies the top 3 measures based on:
     - **Stability**: Measures with the highest stability scores.
     - **Diversity**: Measures with the highest average diversity.
     - **Uniqueness**: Measures with the lowest average correlation (indicating uniqueness).

2. **Dataset-Specific Recommendations**:
   - For each dataset (`mining_results`), the function checks if it contains relevant information:
     - **Top Consensus Rules**: For each dataset, it ranks rules by their measures and selects the top 5 based on their average rank. The top rules are displayed with their antecedents, consequents, support, and confidence.
     - **Unique Measures**: It identifies the top 3 measures with the lowest correlation (i.e., the most unique) by calculating the average correlation with all other measures.
     - **Diverse Measures**: It selects the top 3 measures with the highest diversity values, based on the entropy of antecedents and consequents.

3. **Measure-Specific Recommendations**:
   - For each measure, the function identifies its strengths and weaknesses based on:
     - **Stability**: High stability indicates that the measure performs consistently across datasets. Low stability suggests that the measure’s performance may vary significantly across datasets.
     - **Diversity**: High diversity suggests that the measure identifies novel or diverse rules, while low diversity means the measure focuses on a narrow set of similar rules.
     - **Correlation**: Low correlation with other measures implies a unique perspective, while high correlation indicates that the measure may be too similar to others.
     - **Statistical Significance**: Measures that show significant differences compared to others are highlighted as strong, while those with little to no statistical difference are noted as weak.
   - The function also assigns recommendations for how to use each measure based on the analysis:
     - **High Stability**: "Cross-Dataset Analysis"
     - **Low Stability**: "Dataset-Specific Analysis"
     - **High Diversity**: "Discovering Diverse Rules"
     - **Low Diversity**: "Finding Core Patterns"
     - **Low Correlation**: "Unique Perspective"
     - **High Correlation**: "Consensus Measure"
     - **High Statistical Significance**: "Statistically Different from Other Measures"
     - **Low Statistical Significance**: "Not Statistically Different"



In [29]:
print("Generating final recommendations...")
recommendations = generate_recommendations()

print("\nTop Measures by Criteria:")
print(f"Stability: {', '.join(recommendations['top_measures']['stability'])}")
print(f"Diversity: {', '.join(recommendations['top_measures']['diversity'])}")
print(f"Uniqueness: {', '.join(recommendations['top_measures']['uniqueness'])}")

Generating final recommendations...

Top Measures by Criteria:
Stability: jaccard, odds_ratio, all_confidence
Diversity: gini_index, ps, leverage
Uniqueness: confidence, conviction, support


In [30]:
print("\nDataset-Specific Recommendations:")
for name, recs in recommendations['dataset_recommendations'].items():
    print(f"\n{name}:")
    print(f"Unique Perspective Measures: {', '.join(recs['unique_measures'])}")
    if 'diverse_measures' in recs and recs['diverse_measures']:
        print(f"Diverse Rule Discovery Measures: {', '.join(recs['diverse_measures'])}")
    print("Top Consensus Rules:")
    for i, rule in enumerate(recs['top_consensus_rules']):
        print(f"  {i+1}. {rule}")


Dataset-Specific Recommendations:

Adult Census:
Unique Perspective Measures: confidence, support, conviction
Diverse Rule Discovery Measures: zhangs_metric, kulczynski, all_confidence
Top Consensus Rules:
  1. {'relationship_Husband'} => {'marital-status_Married-civ-spouse', 'sex_Male'} [support=0.405, confidence=0.999]
  2. {'relationship_Husband'} => {'marital-status_Married-civ-spouse'} [support=0.405, confidence=0.999]
  3. {'sex_Male', 'relationship_Husband'} => {'marital-status_Married-civ-spouse'} [support=0.405, confidence=0.999]
  4. {'capital-gain_Q1', 'relationship_Husband'} => {'marital-status_Married-civ-spouse'} [support=0.400, confidence=0.999]
  5. {'capital-loss_Q1', 'relationship_Husband'} => {'marital-status_Married-civ-spouse'} [support=0.379, confidence=0.999]

Mushroom:
Unique Perspective Measures: conviction, support, confidence
Diverse Rule Discovery Measures: gini_index, leverage, ps
Top Consensus Rules:
  1. {'18_o', '4_t'} => {'19_p'} [support=0.379, confid

In [31]:
print("\nMeasure-Specific Recommendations:")
for measure, recs in recommendations['measure_recommendations'].items():
    print(f"\n{measure}:")
    if recs['strengths']:
        print("Strengths:")
        for strength in recs['strengths']:
            print(f"  + {strength}")
    if recs['weaknesses']:
        print("Weaknesses:")
        for weakness in recs['weaknesses']:
            print(f"  - {weakness}")
    if recs['recommended_for']:
        print(f"Recommended For: {recs['recommended_for']}")


Measure-Specific Recommendations:

odds_ratio:
Strengths:
  + Discovers diverse and novel rules
Weaknesses:
  - Not statistically different from other measures
Recommended For: Discovering Diverse Rules

support:
Strengths:
  + Discovers diverse and novel rules
  + Provides a unique perspective
Weaknesses:
  - Not statistically different from other measures
Recommended For: Discovering Diverse Rules, Unique Perspective

gini_index:
Strengths:
  + Discovers diverse and novel rules
Weaknesses:
  - Low stability across different datasets
  - Not statistically different from other measures
Recommended For: Dataset-Specific Analysis, Discovering Diverse Rules

zhangs_metric:
Strengths:
  + Discovers diverse and novel rules
Weaknesses:
  - Not statistically different from other measures
Recommended For: Discovering Diverse Rules

all_confidence:
Strengths:
  + Discovers diverse and novel rules
Weaknesses:
  - Not statistically different from other measures
Recommended For: Discovering Diver

The analysis of association rule interestingness measures across multiple datasets reveals several important insights:

### Key Findings:

1. **No Universal Best Measure**: No single measure consistently outperforms others across all datasets and evaluation criteria.

2. **Complementary Strengths**: Different measures capture different aspects of interestingness, with some focusing on statistical significance, others on diversity, and others on rule novelty.

3. **Dataset Dependency**: The effectiveness of interestingness measures varies significantly based on dataset.

4. **Stability Considerations**: Measures with high stability scores (particularly Jaccard, Odds ratio, and All-confidence) provide more consistent results across different datasets.

5. **Diversity Value**: Measures promoting diverse rule discovery (such as Added Value, Certainty Factor, and Jaccard) help uncover novel patterns that might be missed by conventional measures.

6. **Statistical Significance**: The Wilcoxon tests revealed that all of the measures have other measures that are not signifacantly different from them - makes sense as many measures rely on each other.

7. **Clustering of Measures**: For most of the datasets the measures divide to 3 clusters - where Support, Confidence and Lift each in different cluster - makes sense as they are the fundemental measures that indicate different aspects of the data.