# Unified Benchmarking Notebook

Benchmarks HUOPM, HUOPM_Improved, HUOIM (HUOMIL), CloFHUOIM, MaxCloFHUOIM, and TopK_Hybrid.

<details>
<summary><strong>Table of Contents</strong></summary>

1. [Hardware/Software Detection](#hardwaresoftware-detection)
   Records environment details for reproducibility.
2. [Finalized Algorithm Classes (Loaded from Notebooks)](#finalized-algorithm-classes-loaded-from-notebooks)
   Loads all mining algorithms and shared utilities.
3. [Unified Runner (Warm-up + Averaging + Error Handling)](#unified-runner-warm-up-averaging-error-handling)
   Standardized benchmark runner for fair comparisons.
4. [Benchmark Execution (Retail, Foodmart, Mushroom, Chainstore)](#benchmark-execution-retail-foodmart-mushroom-chainstore)
   Core algorithm wrappers and metric extraction.
</details>


## Hardware/Software Detection

In [15]:
import platform
import sys
import time
import tracemalloc
from statistics import mean

try:
    import psutil
except Exception:
    psutil = None

try:
    import numpy as np
except Exception:
    np = None

try:
    import pandas as pd
except Exception:
    pd = None

print("System Specifications")
print("-" * 80)
print("OS:", platform.platform())
print("Python:", sys.version.replace("\n", " "))

if psutil:
    freq = psutil.cpu_freq()
    print("CPU Cores (logical):", psutil.cpu_count(logical=True))
    print("CPU Cores (physical):", psutil.cpu_count(logical=False))
    if freq:
        print("CPU Frequency (MHz):", freq.max)
    mem = psutil.virtual_memory()
    print("Total RAM (GB):", round(mem.total / (1024**3), 2))
else:
    print("psutil not installed; skipping CPU/RAM details")

if np:
    print("numpy:", np.__version__)
else:
    print("numpy: not installed")

if pd:
    print("pandas:", pd.__version__)
else:
    print("pandas: not installed")

System Specifications
--------------------------------------------------------------------------------
OS: macOS-15.3.1-arm64-arm-64bit
Python: 3.11.14 (main, Oct 21 2025, 18:27:30) [Clang 20.1.8 ]
CPU Cores (logical): 8
CPU Cores (physical): 8
CPU Frequency (MHz): 3204
Total RAM (GB): 8.0
numpy: 2.3.5
pandas: 2.3.3


## Finalized Algorithm Classes (Loaded from Notebooks)

In [16]:
import json
from pathlib import Path

ROOT = Path(".")
if not (ROOT / "huopm.ipynb").exists():
    if (ROOT / "thesis" / "huopm.ipynb").exists():
        ROOT = ROOT / "thesis"
    elif (Path("/Users/mac/Desktop/thesis") / "huopm.ipynb").exists():
        ROOT = Path("/Users/mac/Desktop/thesis")

NOTEBOOKS = {
    "huopm": ROOT / "huopm.ipynb",
    "huopm_improved": ROOT / "huopm_improved.ipynb",
    "huomil": ROOT / "huomil.ipynb",
    "clofhuoim": ROOT / "CloFHUOIM.ipynb",
    "topk": ROOT / "TopK_Hybrid.ipynb",
}


def _extract_class_definitions(src: str) -> str:
    """Salvage top-level class/def blocks (with decorators) from a cell's source."""
    lines = src.splitlines()
    out = []
    i = 0
    while i < len(lines):
        line = lines[i]
        # Only unindented definitions can be compiled on their own.
        if line.startswith(("class ", "def ", "@")):
            out.append(line)
            i += 1
            # Keep indented body lines, including blank lines inside the body.
            while i < len(lines) and (not lines[i].strip() or lines[i].startswith((" ", "\t"))):
                out.append(lines[i])
                i += 1
            continue
        i += 1
    return "\n".join(out)


def load_notebook_defs(path: Path, strip_slots: bool = False) -> dict:
    """Load import/class/def cells from a notebook into a namespace.

    strip_slots=True removes __slots__ lines (for ablation study).
    On error, attempts to salvage class/def blocks from the failing cell.
    """
    with path.open("r", encoding="utf-8") as f:
        nb = json.load(f)

    namespace = {"__name__": "notebook_module"}
    for cell in nb.get("cells", []):
        if cell.get("cell_type") != "code":
            continue
        src = ''.join(cell.get("source", ""))
        lines = [ln for ln in src.splitlines() if ln.strip()]
        if not lines:
            continue
        first = None
        for ln in lines:
            if ln.lstrip().startswith("#"):
                continue
            first = ln.lstrip()
            break
        if not first:
            continue
        if first.startswith(("import ", "from ", "class ", "def ", "@")):
            if strip_slots:
                src = "\n".join(ln for ln in src.splitlines() if "__slots__" not in ln)
            try:
                exec(compile(src, str(path), "exec"), namespace, namespace)
            except Exception:
                # The cell failed as a whole; retry with only its class/def blocks.
                salvage = _extract_class_definitions(src)
                if salvage.strip():
                    exec(compile(salvage, str(path), "exec"), namespace, namespace)
    return namespace

mods = {name: load_notebook_defs(path) for name, path in NOTEBOOKS.items()}
mods_noslots = {"huomil": load_notebook_defs(NOTEBOOKS["huomil"], strip_slots=True)}

HUOPM = mods["huopm"].get("HUOPM")
HUOPM_Improved = mods["huopm_improved"].get("HUOPM_Improved")
HUOMIL = mods["huomil"].get("HUOMIL")  # HUOIM baseline
HUOMIL_NoSlots = mods_noslots["huomil"].get("HUOMIL")

CloFHUOIM = mods["clofhuoim"].get("CloFHUOIM")
MaxCloFHUOIM = mods["clofhuoim"].get("MaxCloFHUOIM")
QuantitativeDatabase = mods["clofhuoim"].get("QuantitativeDatabase")
load_database_from_dict = mods["clofhuoim"].get("load_database_from_dict")

TopKHybridHUOPM = mods["topk"].get("TopKHybridHUOPM")

# Shared helpers: prefer the huopm notebook, fall back to huopm_improved
parse_dataset = mods["huopm"].get("parse_dataset")
if parse_dataset is None:
    parse_dataset = mods["huopm_improved"].get("parse_dataset")
if parse_dataset is None:
    raise RuntimeError("parse_dataset not loaded from huopm notebooks.")

simulate_quantitative_data = mods["huopm"].get("simulate_quantitative_data")
if simulate_quantitative_data is None:
    simulate_quantitative_data = mods["huopm_improved"].get("simulate_quantitative_data")
if simulate_quantitative_data is None:
    raise RuntimeError("simulate_quantitative_data not loaded from huopm notebooks.")

# Sanity checks: fail early if any class or helper used by the wrappers is missing
required = {
    "HUOPM": HUOPM,
    "HUOPM_Improved": HUOPM_Improved,
    "HUOMIL": HUOMIL,
    "CloFHUOIM": CloFHUOIM,
    "MaxCloFHUOIM": MaxCloFHUOIM,
    "TopKHybridHUOPM": TopKHybridHUOPM,
    "QuantitativeDatabase": QuantitativeDatabase,
    "load_database_from_dict": load_database_from_dict,
}
missing = [name for name, obj in required.items() if obj is None]
if missing:
    raise RuntimeError(f"Required classes/helpers not loaded: {missing}. Ensure notebooks are in the same folder and the kernel can import their dependencies.")
ALGO_LOADED = True
print("Algorithm classes loaded successfully.")

Dataset parsing functions loaded!
QuadrupleEntry class loaded (memory-optimized with __slots__)
IndexedList class loaded!
HUOMIL class loaded successfully!
IndexedList class loaded!
HUOMIL class loaded successfully!
Algorithm classes loaded successfully.
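
The loader above executes only those code cells whose first non-comment line looks like an import or a definition. The following is a minimal, self-contained sketch of that filter applied to an in-memory notebook; `toy_nb`, `PREFIXES`, and `select_definition_cells` are hypothetical names used for illustration, not part of the benchmarked notebooks.

```python
# Hypothetical in-memory notebook in the usual .ipynb JSON layout (illustration only).
toy_nb = {
    "cells": [
        {"cell_type": "markdown", "source": ["# Title"]},
        {"cell_type": "code", "source": ["# helper\n", "def double(x):\n", "    return 2 * x\n"]},
        {"cell_type": "code", "source": ["print('demo output')\n"]},
    ]
}

PREFIXES = ("import ", "from ", "class ", "def ", "@")


def select_definition_cells(nb: dict) -> list:
    """Return the source of code cells whose first non-comment line is an import or definition."""
    selected = []
    for cell in nb.get("cells", []):
        if cell.get("cell_type") != "code":
            continue
        src = "".join(cell.get("source", ""))
        # Drop blank lines and comment lines before inspecting the first statement.
        code_lines = [
            ln.lstrip()
            for ln in src.splitlines()
            if ln.strip() and not ln.lstrip().startswith("#")
        ]
        if code_lines and code_lines[0].startswith(PREFIXES):
            selected.append(src)
    return selected


# Execute only the selected cells; the demo cell with print() is skipped.
namespace = {"__name__": "notebook_module"}
for src in select_definition_cells(toy_nb):
    exec(compile(src, "<toy_nb>", "exec"), namespace)

print(namespace["double"](21))
```

Only the first statement decides whether a cell is executed, so a cell that starts with a definition and then runs demo code is still executed in full.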


In [17]:
# Utility: load dataset and generate quantitative transactions
import random
import json
import re
from pathlib import Path

# Keep ROOT consistent with earlier cells if already defined
ROOT = globals().get("ROOT", Path("."))

RESULTS_DIR = ROOT / "results"
RESULTS_DIR.mkdir(parents=True, exist_ok=True)


def load_quantitative_dataset(filepath, seed=42, min_quantity=1, max_quantity=5):
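    """Load a sparse dataset file and return (transactions, profits, n_transactions).

    The simulation runs under random.seed(seed); the previous global RNG state
    is restored afterwards. Returns ({}, {}, 0) if the file yields no transactions.
    """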
    base_transactions, _ = parse_dataset(filepath)
    if not base_transactions:
        return {}, {}, 0

    state = None
    if seed is not None:
        state = random.getstate()
        random.seed(seed)
    try:
        transactions_dict, profit_table = simulate_quantitative_data(
            base_transactions,
            min_quantity=min_quantity,
            max_quantity=max_quantity,
        )
    finally:
        if state is not None:
            random.setstate(state)

    return transactions_dict, profit_table, len(transactions_dict)


def _slug(value):
    text = str(value)
    return re.sub(r"[^A-Za-z0-9_.-]+", "_", text).strip("_") or "value"


def _params_slug(params):
    if not params:
        return "no_params"
    parts = [f"{k}={params[k]}" for k in sorted(params.keys())]
    return _slug("_".join(parts))


def save_run_result(result, group="comparison", run_id=None):
    """Persist a single run (or average) result as JSON under RESULTS_DIR."""
    algo = _slug(result.get("Algorithm", "unknown"))
    dataset = _slug(result.get("Dataset", "unknown"))
    params = result.get("Params", {})
    param_slug = _params_slug(params)

    out_dir = RESULTS_DIR / group / dataset / algo / param_slug
    out_dir.mkdir(parents=True, exist_ok=True)

    if run_id is None:
        run_id = result.get("Run_ID", 1)
    out_path = out_dir / f"run-{run_id}.json"
    out_path.write_text(json.dumps(result, ensure_ascii=False, indent=2, default=str))
    return out_path


def save_avg_result(avg_result, group="comparison"):
    """Persist an averaged result as JSON under RESULTS_DIR."""
    avg_copy = dict(avg_result)
    avg_copy["Run_ID"] = "avg"
    return save_run_result(avg_copy, group=group, run_id="avg")


In [18]:
# Small test: HUOPM vs HUOMIL vs HUOPM_Improved (toy data)
print("SMALL TEST: Verifying HUOPM / HUOMIL / HUOPM_Improved")

toy_transactions = {
    'T1': [('A', 2), ('B', 6), ('C', 1)],
    'T2': [('A', 1), ('D', 3), ('G', 2)],
    'T3': [('B', 1), ('D', 2), ('G', 3)],
    'T4': [('A', 1), ('C', 5), ('D', 2)],
    'T5': [('B', 4), ('D', 1), ('F', 1)],
}

toy_profits = {
    'A': 3.0, 'B': 1.0, 'C': 1.0,
    'D': 5.0, 'F': 3.0, 'G': 2.0,
}

min_sup = 0.4
min_uo = 0.3

# HUOPM
miner_huopm = HUOPM(min_sup_ratio=min_sup, min_uo_ratio=min_uo)
res_huopm = miner_huopm.fit(toy_transactions, toy_profits)
print("HUOPM Results:")
for pattern, support, uo in res_huopm[:10]:
    print(f"  {pattern}: sup={support}, uo={uo:.4f}")

# HUOMIL (HUOIM baseline)
miner_huomil = HUOMIL(min_sup_ratio=min_sup, min_uo_ratio=min_uo)
res_huomil = miner_huomil.fit(toy_transactions, toy_profits)
print("HUOMIL Results:")
for pattern, support, uo in res_huomil[:10]:
    print(f"  {pattern}: sup={support}, uo={uo:.4f}")

# HUOPM Improved
miner_huopm_imp = HUOPM_Improved(min_sup_ratio=min_sup, min_uo_ratio=min_uo)
res_huopm_imp = miner_huopm_imp.fit(toy_transactions, toy_profits)
print("\nHUOPM Improved Results:")
for pattern, support, uo in res_huopm_imp[:10]:
    print(f"  {pattern}: sup={support}, uo={uo:.4f}")

print("Small test completed successfully!")


SMALL TEST: Verifying HUOPM / HUOMIL / HUOPM_Improved
Starting HUOPM Algorithm...
  Total transactions: 5
  min_sup_count: 2 (alpha: 0.4)
  min_uo_ratio (beta): 0.3
Phase 1: Scanning database for support and TU...
  Frequent 1-itemsets (I*): 5
Phase 2: Building initial UO-lists...
  Initial UO-lists built: 5
Phase 3: Starting recursive HUOP mining...
Mining completed in 0.0002s
Total HUOPs discovered: 5
HUOPM Results:
  ('C', 'A'): sup=2, uo=0.4915
  ('G', 'D'): sup=2, uo=0.9024
  ('A', 'D'): sup=2, uo=0.7702
  ('B', 'D'): sup=2, uo=0.6985
  ('D',): sup=4, uo=0.5606

Starting HUOMIL Algorithm (Alpha=0.4, Beta=0.3)...
  [1/3] First database scan: calculating item support...
      Found 5 promising items (min_sup=2)
  [2/3] Second database scan: constructing GUO-ILs...
      Constructed 5 GUO-ILs
  [3/3] Mining patterns...
      Found 5 patterns
HUOMIL Results:
  ('B', 'D'): sup=2, uo=0.6985
  ('A', 'D'): sup=2, uo=0.7702
  ('C', 'A'): sup=2, uo=0.4915
  ('D',): sup=4, uo=0.5606
  ('G', 

## Unified Runner (Warm-up + Averaging + Error Handling)

In [19]:
class Runner:
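    """
    Standardized benchmark runner.

    run_once() returns a dict with keys:
        Algorithm, Dataset, Runtime (seconds), Peak_Memory (MiB),
        Patterns_Found, Candidates_Evaluated, Pruned_Nodes, Wubocc_Pruned,
        Total_Expanded, Threshold_Trace, Params, Patterns
    run() returns (avg, results): the per-metric mean over the measured runs
    (without Patterns) and the list of per-run dicts.

    Runtime is measured while tracemalloc is tracing, so it includes tracing overhead.
    """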

    def __init__(self, algo_name: str, algo_callable, dataset_name: str, dataset_loader):
        self.algo_name = algo_name
        self.algo_callable = algo_callable
        self.dataset_name = dataset_name
        self.dataset_loader = dataset_loader

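    def run_once(self, params: dict):
        # Load outside the measured region so parsing cost is not counted.
        data = self.dataset_loader()
        tracemalloc.start()
        try:
            # perf_counter is monotonic and better suited to interval timing than time.time.
            t0 = time.perf_counter()
            result = self.algo_callable(data, **params)
            runtime = time.perf_counter() - t0
            _, peak = tracemalloc.get_traced_memory()
        finally:
            # Always stop tracing, even if the algorithm raises.
            tracemalloc.stop()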

        return {
            "Algorithm": self.algo_name,
            "Dataset": self.dataset_name,
            "Runtime": runtime,
            "Peak_Memory": peak / (1024 * 1024),
            "Patterns_Found": result.get("Patterns_Found", 0),
            "Candidates_Evaluated": result.get("Candidates_Evaluated", 0),
            "Pruned_Nodes": result.get("Pruned_Nodes", 0),
            "Wubocc_Pruned": result.get("Wubocc_Pruned", 0),
            "Total_Expanded": result.get("Total_Expanded", 0),
            "Threshold_Trace": result.get("Threshold_Trace", []),
            "Params": params,
            "Patterns": result.get("Patterns", []),
        }

    def run(self, params: dict, runs: int = 3, warmup: bool = True):
        if runs < 1:
            raise ValueError("runs must be at least 1")
        if warmup:
            # Unmeasured warm-up; a failure here is ignored because the same
            # error will surface in the measured runs below.
            try:
                self.run_once(params)
            except Exception:
                pass
        results = [self.run_once(params) for _ in range(runs)]
        avg = results[0].copy()
        # Avoid carrying large pattern lists into the averaged summary
        avg.pop("Patterns", None)
        for key in ("Runtime", "Peak_Memory", "Patterns_Found", "Candidates_Evaluated",
                    "Pruned_Nodes", "Wubocc_Pruned", "Total_Expanded"):
            avg[key] = mean(r[key] for r in results)
        return avg, results
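
The warm-up and averaging scheme used by `Runner.run` can be tried in isolation on a toy workload. This is a sketch under stated assumptions, not the runner itself: `benchmark` and `work` are hypothetical names, and the minimum and maximum are reported next to the mean only to make run-to-run spread visible.

```python
import time
from statistics import mean


def benchmark(fn, runs=3, warmup=True):
    """Time fn() over several runs, optionally after one unmeasured warm-up call."""
    if runs < 1:
        raise ValueError("runs must be at least 1")
    if warmup:
        fn()  # result discarded; lets caches and lazy imports settle
    timings = []
    for _ in range(runs):
        t0 = time.perf_counter()
        fn()
        timings.append(time.perf_counter() - t0)
    return {
        "mean": mean(timings),
        "min": min(timings),
        "max": max(timings),
        "runs": timings,
    }


calls = []


def work():
    # Toy workload standing in for a mining algorithm.
    calls.append(1)
    return sum(i * i for i in range(10_000))


stats = benchmark(work, runs=3, warmup=True)
print(len(calls), len(stats["runs"]))
```

With three measured runs, a large gap between `min` and `max` is a hint that the mean alone may be misleading for that configuration.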


## Benchmark Execution (Retail, Foodmart, Mushroom, Chainstore)

In [20]:
print("Cell 4 started")

def _to_quantity_dict(transactions_dict):
    return {tid: {item: qty for item, qty in items} for tid, items in transactions_dict.items()}


def _extract_metrics(obj, patterns_len):
    """Best-effort extraction of counters; fall back to pattern count."""
    candidates = getattr(obj, "candidate_count", patterns_len)
    pruned = getattr(obj, "pruned_count", 0)
    wubocc_pruned = getattr(obj, "pruned_by_sdps", 0)
    total_expanded = candidates
    return candidates, pruned, wubocc_pruned, total_expanded


def algo_huopm(data, alpha=None, beta=None, **kwargs):
    transactions_dict, profit_table, n_tx = data
    miner = HUOPM(min_sup_ratio=alpha, min_uo_ratio=beta)
    patterns = miner.fit(transactions_dict, profit_table)
    candidates, pruned, wubocc_pruned, total_expanded = _extract_metrics(miner, len(patterns))
    return {
        "Patterns_Found": len(patterns),
        "Candidates_Evaluated": candidates,
        "Pruned_Nodes": pruned,
        "Wubocc_Pruned": wubocc_pruned,
        "Total_Expanded": total_expanded,
        "Patterns": patterns,
    }


def algo_huopm_improved(data, alpha=None, beta=None, **kwargs):
    transactions_dict, profit_table, n_tx = data
    miner = HUOPM_Improved(min_sup_ratio=alpha, min_uo_ratio=beta)
    patterns = miner.fit(transactions_dict, profit_table)
    candidates, pruned, wubocc_pruned, total_expanded = _extract_metrics(miner, len(patterns))
    return {
        "Patterns_Found": len(patterns),
        "Candidates_Evaluated": candidates,
        "Pruned_Nodes": pruned,
        "Wubocc_Pruned": wubocc_pruned,
        "Total_Expanded": total_expanded,
        "Patterns": patterns,
    }


def algo_huoim(data, alpha=None, beta=None, **kwargs):
    transactions_dict, profit_table, n_tx = data
    miner = HUOMIL(min_sup_ratio=alpha, min_uo_ratio=beta)
    patterns = miner.fit(transactions_dict, profit_table)
    candidates, pruned, wubocc_pruned, total_expanded = _extract_metrics(miner, len(patterns))
    return {
        "Patterns_Found": len(patterns),
        "Candidates_Evaluated": candidates,
        "Pruned_Nodes": pruned,
        "Wubocc_Pruned": wubocc_pruned,
        "Total_Expanded": total_expanded,
        "Patterns": patterns,
    }


def algo_clofhuoim(data, alpha=None, beta=None, **kwargs):
    transactions_dict, profit_table, n_tx = data
    qdb = load_database_from_dict(_to_quantity_dict(transactions_dict), profit_table)
    ms = max(1, int(alpha * len(transactions_dict) + 0.999))
    algo = CloFHUOIM(qdb, ms=ms, muo=beta)
    cfhuoi = algo.mine()
    return {
        "Patterns_Found": len(cfhuoi),
        "Candidates_Evaluated": getattr(algo, "candidate_count", 0),
        "Pruned_Nodes": getattr(algo, "pruned_by_lps", 0) + getattr(algo, "pruned_by_bcps", 0),
        "Wubocc_Pruned": getattr(algo, "pruned_by_sdps", 0),
        "Total_Expanded": getattr(algo, "candidate_count", 0),
        "Patterns": cfhuoi,
    }


def algo_maxclofhuoim(data, alpha=None, beta=None, **kwargs):
    transactions_dict, profit_table, n_tx = data
    qdb = load_database_from_dict(_to_quantity_dict(transactions_dict), profit_table)
    ms = max(1, int(alpha * len(transactions_dict) + 0.999))
    algo = MaxCloFHUOIM(qdb, ms=ms, muo=beta)
    cfhuoi, mfhuoi = algo.mine()
    return {
        "Patterns_Found": len(cfhuoi),
        "Candidates_Evaluated": getattr(algo, "candidate_count", 0),
        "Pruned_Nodes": getattr(algo, "pruned_by_lps", 0) + getattr(algo, "pruned_by_bcps", 0),
        "Wubocc_Pruned": getattr(algo, "pruned_by_sdps", 0),
        "Total_Expanded": getattr(algo, "candidate_count", 0),
    }


def algo_topk_hybrid(data, k=10, **kwargs):
    transactions_dict, profit_table, n_tx = data
    qdb = QuantitativeDatabase()
    qdb.set_profits(profit_table)
    for tid, items in transactions_dict.items():
        qdb.add_transaction(tid, {item: qty for item, qty in items})
    miner = TopKHybridHUOPM(k=k)
    results = miner.fit(qdb, ms=1)
    return {
        "Patterns_Found": len(results),
        "Threshold_Trace": getattr(miner, "threshold_trace", []),
        "Patterns": results,
    }

ALGORITHMS = {
    "HUOPM": algo_huopm,
    "HUOPM_Improved": algo_huopm_improved,
    "HUOIM": algo_huoim,
    "CloFHUOIM": algo_clofhuoim,
    "MaxCloFHUOIM": algo_maxclofhuoim,
    "TopK_Hybrid": algo_topk_hybrid,
}


Cell 4 started
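
The CloFHUOIM and MaxCloFHUOIM wrappers convert the relative threshold `alpha` into an absolute support count `ms`. The worked arithmetic below reproduces that expression as a standalone function; `min_support_count` is a hypothetical name, and 88162 is the Retail transaction count reported in the run logs of this notebook.

```python
def min_support_count(alpha: float, n_transactions: int) -> int:
    """Absolute support threshold, using the same expression as the wrappers above."""
    # Adding 0.999 before truncation rounds up, except when the fractional
    # part of alpha * n_transactions is below 0.001.
    return max(1, int(alpha * n_transactions + 0.999))


# Retail: 0.002 * 88162 = 176.324, which rounds up to 177.
print(min_support_count(0.002, 88162))

# Retail: 0.01 * 88162 = 881.62, which rounds up to 882.
print(min_support_count(0.01, 88162))

# Toy data: 0.4 * 5 = 2.0 stays at 2.
print(min_support_count(0.4, 5))

# Very small products are clamped to 1.
print(min_support_count(0.0001, 100))
```

The first value matches the `ms=177` printed in the CloFHUOIM log. The HUOPM log for `alpha: 0.01` prints `min_sup_count: 881`, which suggests that HUOPM rounds down where this expression rounds up, so results at the support boundary may differ between the two families.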


## Section 4.5 Pruning Effectiveness and Search Space Reduction

## Comparison Group (HUOPM, HUOMIL, HUOPM_Improved)


In [None]:
# Ensure dataset configs are loaded
if "DATASETS" not in globals():
    from pathlib import Path
    ROOT = Path(".")
    DATASETS = {
        "Retail": str(ROOT / "datasets" / "retail.txt"),
        "Foodmart": str(ROOT / "datasets" / "foodmartFIM.txt"),
        "Mushroom": str(ROOT / "datasets" / "mushrooms.txt"),
        "Chainstore": str(ROOT / "datasets" / "chainstore.txt"),
    }
if "ALPHA_BETA_GRID" not in globals():
    ALPHA_BETA_GRID = {
        "Retail": {"alphas": [0.002, 0.001, 0.0008], "betas": [0.2, 0.3]},
        "Foodmart": {"alphas": [0.005, 0.001, 0.0005], "betas": [0.2, 0.3]},
        "Mushroom": {"alphas": [0.4, 0.3, 0.2], "betas": [0.4, 0.5]},
        "Chainstore": {"alphas": [0.01], "betas": [0.2]},
    }

# 1) Comparison group: HUOPM / HUOMIL / HUOPM_Improved
if "ALGORITHMS_COMPARISON" not in globals():
    ALGORITHMS_COMPARISON = {
        "HUOPM": algo_huopm,
        "HUOMIL": algo_huoim,
        "HUOPM_Improved": algo_huopm_improved,
    }

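VERIFY_ACCURACY = False  # set True to cross-check pattern sets across algorithms for each (alpha, beta)
if VERIFY_ACCURACY and "verify_accuracy" not in globals():
    # Fail before the long benchmark loop rather than after the first parameter setting.
    raise NameError("VERIFY_ACCURACY=True requires verify_accuracy() to be defined in an earlier cell.")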
GROUP_NAME = "comparison"

all_results = []
raw_runs = []

for dataset_name, path in DATASETS.items():
    grid = ALPHA_BETA_GRID[dataset_name]
    loader = lambda p=path: load_quantitative_dataset(p)
    for a in grid["alphas"]:
        for b in grid["betas"]:
            results_for_verification = {}
            for algo_name, algo_fn in ALGORITHMS_COMPARISON.items():
                runner = Runner(algo_name, algo_fn, dataset_name, loader)
                avg, runs = runner.run(params={"alpha": a, "beta": b}, runs=3)
                all_results.append(avg)
                raw_runs.extend(runs)

                # Save individual runs + avg to disk
                for i, run in enumerate(runs, 1):
                    run["Run_ID"] = i
                    save_run_result(run, group=GROUP_NAME, run_id=i)
                save_avg_result(avg, group=GROUP_NAME)

                if VERIFY_ACCURACY and algo_name in ("HUOPM", "HUOMIL", "HUOPM_Improved"):
                    if runs:
                        results_for_verification[algo_name] = {"Patterns": runs[0].get("Patterns", [])}

            if VERIFY_ACCURACY:
                print(f"Verification ({dataset_name}) alpha={a}, beta={b}")
                verify_accuracy(results_for_verification)

# Overall summary only
import pandas as pd
summary_df = pd.DataFrame(all_results)
summary = summary_df.groupby(["Dataset", "Algorithm"]).agg({
    "Runtime": "mean",
    "Peak_Memory": "mean",
    "Patterns_Found": "mean",
    "Candidates_Evaluated": "mean",
    "Pruned_Nodes": "mean",
    "Wubocc_Pruned": "mean",
}).reset_index()

print("Saved comparison outputs to:", RESULTS_DIR / GROUP_NAME)
print(summary)
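
`verify_accuracy` is called in the cell above but is not defined in the cells shown here. The following is a minimal sketch of one possible cross-check, assuming patterns are `(items, support, uo)` triples as in the toy test and that the input has the shape of `results_for_verification`. `normalize` and `compare_pattern_sets` are hypothetical names, not the author's implementation, and the four-decimal rounding is an arbitrary tolerance.

```python
def normalize(patterns, ndigits=4):
    """Map each (items, support, uo) triple to an order-independent key."""
    return {
        frozenset(items): (support, round(uo, ndigits))
        for items, support, uo in patterns
    }


def compare_pattern_sets(results_by_algo):
    """Compare every algorithm against the first one; return mismatching itemsets per pair."""
    names = list(results_by_algo)
    ref_name = names[0]
    ref = normalize(results_by_algo[ref_name]["Patterns"])
    report = {}
    for name in names[1:]:
        other = normalize(results_by_algo[name]["Patterns"])
        # An itemset mismatches if it is missing on one side or its metrics differ.
        report[(ref_name, name)] = {
            key for key in ref.keys() | other.keys() if ref.get(key) != other.get(key)
        }
    return report


# Rounded values as printed in the toy test output (only the fully shown patterns).
toy = {
    "HUOPM": {"Patterns": [
        (("C", "A"), 2, 0.4915), (("A", "D"), 2, 0.7702),
        (("B", "D"), 2, 0.6985), (("D",), 4, 0.5606),
    ]},
    "HUOMIL": {"Patterns": [
        (("B", "D"), 2, 0.6985), (("A", "D"), 2, 0.7702),
        (("C", "A"), 2, 0.4915), (("D",), 4, 0.5606),
    ]},
}
report = compare_pattern_sets(toy)
print(report)
```

An empty set for a pair means both algorithms returned the same itemsets with matching support and rounded utility occupancy. Comparing rounded floats can misreport values that straddle a rounding boundary, so a real check might prefer an explicit tolerance.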


## Concise Group (CloFHUOIM)


In [17]:
# Ensure dataset configs are loaded
if "DATASETS" not in globals():
    from pathlib import Path
    ROOT = Path(".")
    DATASETS = {
        "Retail": str(ROOT / "datasets" / "retail.txt"),
        "Foodmart": str(ROOT / "datasets" / "foodmartFIM.txt"),
        "Mushroom": str(ROOT / "datasets" / "mushrooms.txt"),
        "Chainstore": str(ROOT / "datasets" / "chainstore.txt"),
    }
if "ALPHA_BETA_GRID" not in globals():
    ALPHA_BETA_GRID = {
        "Retail": {"alphas": [0.002, 0.001, 0.0008], "betas": [0.2, 0.3]},
        "Foodmart": {"alphas": [0.005, 0.001, 0.0005], "betas": [0.2, 0.3]},
        "Mushroom": {"alphas": [0.4, 0.3, 0.2], "betas": [0.4, 0.5]},
        "Chainstore": {"alphas": [0.01], "betas": [0.2]},
    }

# 2) Concise group: CloFHUOIM (same alpha/beta, compare compression vs HUOPM)
if "ALGORITHMS_CONCISE" not in globals():
    ALGORITHMS_CONCISE = {
        "CloFHUOIM": algo_clofhuoim,
    }

GROUP_NAME = "concise"

if "all_results" not in globals():
    all_results = []
if "raw_runs" not in globals():
    raw_runs = []

for dataset_name, path in DATASETS.items():
    grid = ALPHA_BETA_GRID[dataset_name]
    loader = lambda p=path: load_quantitative_dataset(p)
    for algo_name, algo_fn in ALGORITHMS_CONCISE.items():
        runner = Runner(algo_name, algo_fn, dataset_name, loader)
        for a in grid["alphas"]:
            for b in grid["betas"]:
                avg, runs = runner.run(params={"alpha": a, "beta": b}, runs=3)
                all_results.append(avg)
                raw_runs.extend(runs)

                for i, run in enumerate(runs, 1):
                    run["Run_ID"] = i
                    save_run_result(run, group=GROUP_NAME, run_id=i)
                save_avg_result(avg, group=GROUP_NAME)

# Overall summary only
import pandas as pd
summary_df = pd.DataFrame([r for r in all_results if r.get("Algorithm") in ALGORITHMS_CONCISE])
if not summary_df.empty:
    summary = summary_df.groupby(["Dataset", "Algorithm"]).agg({
        "Runtime": "mean",
        "Peak_Memory": "mean",
        "Patterns_Found": "mean",
        "Candidates_Evaluated": "mean",
    }).reset_index()
    print("Saved concise outputs to:", RESULTS_DIR / GROUP_NAME)
    print(summary)


Parsing dataset: datasets/retail.txt
  Parsed 88162 transactions, 16470 unique items.
Generating profit table for 16470 items...
Simulating quantitative data for 88162 transactions...
Starting CloFHUOIM mining...
Parameters: ms=177, muo=0.2
Relevant items: 956 out of 16470
Building PUON-lists for 1-itemsets...
Mining CFHUOIs...

Mining completed in 100.06 seconds
Peak memory usage: 288.55 MB
CFHUOIs found: 1291
Candidates evaluated: 2597
Pruned by SDPS: 1104
Pruned by BCPS: 0
Pruned by LPS: 0
Parsing dataset: datasets/retail.txt
  Parsed 88162 transactions, 16470 unique items.
Generating profit table for 16470 items...
Simulating quantitative data for 88162 transactions...
Starting CloFHUOIM mining...
Parameters: ms=177, muo=0.2
Relevant items: 956 out of 16470
Building PUON-lists for 1-itemsets...
Mining CFHUOIs...

Mining completed in 99.71 seconds
Peak memory usage: 288.51 MB
CFHUOIs found: 1291
Candidates evaluated: 2597
Pruned by SDPS: 1104
Pruned by BCPS: 0
Pruned by LPS: 0
Parsi

## Rank-Based Group (TopK Hybrid)


In [None]:
# Ensure dataset configs are loaded
if "DATASETS" not in globals():
    from pathlib import Path
    ROOT = Path(".")
    DATASETS = {
        "Retail": str(ROOT / "datasets" / "retail.txt"),
        "Foodmart": str(ROOT / "datasets" / "foodmartFIM.txt"),
        "Mushroom": str(ROOT / "datasets" / "mushrooms.txt"),
        "Chainstore": str(ROOT / "datasets" / "chainstore.txt"),
    }
if "ALPHA_BETA_GRID" not in globals():
    ALPHA_BETA_GRID = {
        "Retail": {"alphas": [0.002, 0.001, 0.0008], "betas": [0.2, 0.3]},
        "Foodmart": {"alphas": [0.005, 0.001, 0.0005], "betas": [0.2, 0.3]},
        "Mushroom": {"alphas": [0.4, 0.3, 0.2], "betas": [0.4, 0.5]},
        "Chainstore": {"alphas": [0.01], "betas": [0.2]},
    }

# 3) Rank-based group: Top-k Hybrid (reverse mapping)
if "ALGORITHMS_TOPK" not in globals():
    ALGORITHMS_TOPK = {
        "TopK_Hybrid": algo_topk_hybrid,
    }

GROUP_NAME = "topk"

if "all_results" not in globals():
    all_results = []
if "raw_runs" not in globals():
    raw_runs = []

# Run HUOPM at alpha=0.01, beta=0.3 to get N patterns, then run TopK with k=N
for dataset_name, path in DATASETS.items():
    loader = lambda p=path: load_quantitative_dataset(p)
    runner_huopm = Runner("HUOPM", algo_huopm, dataset_name, loader)
    avg_h, runs_h = runner_huopm.run(params={"alpha": 0.01, "beta": 0.3}, runs=3)
    all_results.append(avg_h)
    raw_runs.extend(runs_h)

    for i, run in enumerate(runs_h, 1):
        run["Run_ID"] = i
        save_run_result(run, group=GROUP_NAME, run_id=i)
    save_avg_result(avg_h, group=GROUP_NAME)

    N = int(avg_h.get("Patterns_Found", 0))
    if N <= 0:
        continue

    runner_topk = Runner("TopK_Hybrid", algo_topk_hybrid, dataset_name, loader)
    avg_t, runs_t = runner_topk.run(params={"k": N}, runs=3)
    avg_t["Params"]["mapped_from_beta"] = 0.3
    avg_t["Params"]["mapped_k"] = N
    all_results.append(avg_t)
    raw_runs.extend(runs_t)

    for i, run in enumerate(runs_t, 1):
        run["Run_ID"] = i
        save_run_result(run, group=GROUP_NAME, run_id=i)
    save_avg_result(avg_t, group=GROUP_NAME)

print("Saved topk outputs to:", RESULTS_DIR / GROUP_NAME)
print("Total benchmark results:", len(all_results))


Parsing dataset: datasets/retail.txt
  Parsed 88162 transactions, 16470 unique items.
Generating profit table for 16470 items...
Simulating quantitative data for 88162 transactions...
Starting HUOPM Algorithm...
  Total transactions: 88162
  min_sup_count: 881 (alpha: 0.01)
  min_uo_ratio (beta): 0.3
Phase 1: Scanning database for support and TU...
  Frequent 1-itemsets (I*): 70
Phase 2: Building initial UO-lists...
  Initial UO-lists built: 70
Phase 3: Starting recursive HUOP mining...
Mining completed in 2.3817s
Total HUOPs discovered: 6
Parsing dataset: datasets/retail.txt
  Parsed 88162 transactions, 16470 unique items.
Generating profit table for 16470 items...
Simulating quantitative data for 88162 transactions...
Starting HUOPM Algorithm...
  Total transactions: 88162
  min_sup_count: 881 (alpha: 0.01)
  min_uo_ratio (beta): 0.3
Phase 1: Scanning database for support and TU...
  Frequent 1-itemsets (I*): 70
Phase 2: Building initial UO-lists...
  Initial UO-lists built: 70
Phase