# CloFHUOIM: Closed Frequent High Utility Occupancy Itemset Mining
This notebook implements CloFHUOIM, which mines closed frequent high utility occupancy itemsets (CFHUOIs), and MaxCloFHUOIM, a variant that also extracts the maximal ones (MFHUOIs).


<details>
<summary><strong>Table of Contents</strong></summary>

1. [Imports and Setup](#imports-and-setup)
   Libraries and shared configuration.
2. [PUONElement (PUON-list Element)](#puonelement-puon-list-element)
   Element structure for PUON-lists.
3. [PUONList (PUON-list Structure)](#puonlist-puon-list-structure)
   PUON-list representation and statistics.
4. [QuantitativeDatabase](#quantitativedatabase)
   Quantitative database structures and utilities.
5. [CloFHUOIM Algorithm](#clofhuoim-algorithm)
   Core concise mining algorithm.
6. [MaxCloFHUOIM Variant](#maxclofhuoim-variant)
   Maximal closed HUO mining variant.
7. [Utility Functions (Data Loading)](#utility-functions-data-loading)
   Dataset loading and QDB builders.
8. [Example Run](#example-run)
   Quick sanity check / demo.
</details>


## Imports and Setup


In [1]:
import time
import tracemalloc
from collections import defaultdict
from typing import Dict, List, Tuple, Set, Optional

## PUONElement (PUON-list Element)


In [2]:
class PUONElement:
    """
    Element in a PUON-list (Prefix Utility Occupancy Node list).

    Stores per-transaction statistics for an itemset.

    Attributes:
        nid (str): Transaction identifier.
        nu (float): Utility of the itemset in the transaction.
        nru (float): Remaining utility after the last item in the itemset.
        npu (float): Prefix utility (kept for completeness).
        tu (float): Transaction utility.
        nsup (int): Support counter for this element (default 1).
        ubrem_rel (float): Upper-bound remaining relative utility (nu+nru)/tu.

    Example:
        >>> e = PUONElement('t1', nu=10, nru=5, npu=0, tu=30)
        >>> e.ubrem_rel = (e.nu + e.nru) / e.tu
        >>> print(e.nid, e.ubrem_rel)
        t1 0.5
    """
    def __init__(self, nid: str, nu: float, nru: float, npu: float, tu: float):
        """
        Initialize a PUONElement.

        Args:
            nid (str): Transaction identifier.
            nu (float): Utility of the itemset in this transaction.
            nru (float): Remaining utility after last item of itemset.
            npu (float): Prefix utility.
            tu (float): Transaction utility.

        Returns:
            None

        Example:
            >>> PUONElement('t2', nu=6, nru=9, npu=0, tu=30)
        """
        self.nid = nid
        self.nu = nu
        self.nru = nru
        self.npu = npu
        self.tu = tu
        self.nsup = 1
        self.ubrem_rel = 0.0
    
    def __repr__(self):
        """
        Return a readable string representation.

        Returns:
            str: Human-friendly representation.

        Example:
            >>> e = PUONElement('t1', 10, 5, 0, 30)
            >>> repr(e)
            'PUONElement(nid=t1, nu=10, nru=5, tu=30)'
        """
        return f"PUONElement(nid={self.nid}, nu={self.nu}, nru={self.nru}, tu={self.tu})"
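
The per-transaction quantities stored in a `PUONElement` can be illustrated with a small standalone sketch. The helper name `element_stats` is hypothetical and not part of the notebook; it assumes items are ranked by a global order (here simply `a` to `f`) and that "remaining utility" means the utility of items ranked after the last item of the itemset, as the docstring above describes. The utilities for `t2` come from the example run (quantities `a: 2`, `e: 3` with profits 2 and 11).

```python
from typing import Dict, Sequence, Tuple


def element_stats(
    utilities: Dict[str, float],
    itemset: Sequence[str],
    order: Dict[str, int],
) -> Tuple[float, float, float, float]:
    """Return (nu, nru, tu, ubrem_rel) for `itemset` in one transaction.

    Hypothetical helper for illustration only.
    """
    tu = sum(utilities.values())                    # transaction utility
    nu = sum(utilities[item] for item in itemset)   # utility of the itemset
    last_rank = order[itemset[-1]]
    # Remaining utility: items ranked after the last item of the itemset.
    nru = sum(u for item, u in utilities.items() if order[item] > last_rank)
    ubrem_rel = (nu + nru) / tu if tu > 0 else 0.0
    return nu, nru, tu, ubrem_rel


# Assumed global order a < b < c < d < e < f.
order = {item: rank for rank, item in enumerate("abcdef")}
t2 = {"a": 4.0, "e": 33.0}  # 2 * 2 and 3 * 11

nu, nru, tu, ubrem_rel = element_stats(t2, ("a",), order)
print(nu, nru, tu, ubrem_rel)
```

For `('a',)` in `t2`, everything outside the itemset is ranked after `a`, so `nu + nru` equals `tu` and the relative upper bound is 1.0.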

## PUONList (PUON-list Structure)


In [3]:
class PUONList:
    """
    PUON-list structure for an itemset.

    Stores PUONElement objects and aggregate statistics such as support,
    utility occupancy (occ), and weak upper bound (wubocc).

    Attributes:
        itemset (tuple): The itemset represented by this list.
        elements (list): List of PUONElement objects.
        f_occ (float): Average utility occupancy of the itemset.
        f_wubocc (float): Weak upper bound of utility occupancy.
        list_ub_rem_rel (list): Cached (nu+nru)/tu values.

    Example:
        >>> pl = PUONList(('a',))
        >>> pl.add_element(PUONElement('t1', 10, 5, 0, 30))
        >>> pl.compute_occ()
        0.3333333333333333
    """
    def __init__(self, itemset: Tuple[str, ...]):
        """
        Initialize a PUON-list for an itemset.

        Args:
            itemset (tuple): The itemset represented.

        Returns:
            None

        Example:
            >>> PUONList(('a', 'b'))
        """
        self.itemset = tuple(itemset)
        self.elements: List[PUONElement] = []
        self.f_occ = 0.0
        self.f_wubocc = 0.0
        self.list_ub_rem_rel: List[float] = []
    
    def add_element(self, element: PUONElement):
        """
        Add a PUONElement to the list.

        Args:
            element (PUONElement): Element to append.

        Returns:
            None

        Example:
            >>> pl.add_element(PUONElement('t1', 10, 5, 0, 30))
        """
        self.elements.append(element)
    
    def get_support(self) -> int:
        """
        Get support (number of transactions containing the itemset).

        Returns:
            int: Support count.

        Example:
            >>> pl.get_support()
            1
        """
        return len(self.elements)
    
    def compute_occ(self) -> float:
        """
        Compute utility occupancy of the itemset.

        occ(A) = (1/supp(A)) * sum_{t in rho(A)} (u(A,t)/tu(t))

        Returns:
            float: Utility occupancy value.

        Example:
            >>> pl.compute_occ()
            0.3333333333333333
        """
        if not self.elements:
            return 0.0
        
        sum_urel = sum(elem.nu / elem.tu for elem in self.elements)
        self.f_occ = sum_urel / len(self.elements)
        return self.f_occ
    
    def compute_wubocc(self, ms: int) -> float:
        """
        Compute weak upper bound of utility occupancy (wubocc).

        Uses TS(A), the transactions in which A can still be extended (nru > 0),
        sums the top-ms values of (nu+nru)/tu, and divides the sum by ms.

        Args:
            ms (int): Minimum support threshold.

        Returns:
            float: Weak upper bound on utility occupancy.

        Example:
            >>> pl.compute_wubocc(ms=2)
            0.25
        """
        if not self.elements:
            return 0.0
        
        # Get transactions with remaining items (proper forward extensions exist)
        ts_values = []
        for elem in self.elements:
            if elem.nru > 0:  # Has items that can extend A
                ubrem_rel = (elem.nu + elem.nru) / elem.tu
                ts_values.append(ubrem_rel)
        
        if not ts_values:
            return 0.0
        
        # Sort descending and take top ms values
        ts_values.sort(reverse=True)
        top_values = ts_values[:min(ms, len(ts_values))]
        
        self.f_wubocc = sum(top_values) / ms
        return self.f_wubocc
    
    def get_TWU(self) -> float:
        """
        Compute Transaction Weighted Utilization (TWU).

        TWU(A) = sum_{t in rho(A)} tu(t)

        Returns:
            float: TWU value.

        Example:
            >>> pl.get_TWU()
            30
        """
        return sum(elem.tu for elem in self.elements)
    
    def __repr__(self):
        """
        Return a readable string representation.

        Returns:
            str: Human-friendly representation.

        Example:
            >>> repr(pl)
            "PUONList(('a',), sup=1, occ=0.3333)"
        """
        return f"PUONList({self.itemset}, sup={len(self.elements)}, occ={self.f_occ:.4f})"
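
As a worked example of the two aggregates above, the following standalone sketch recomputes `occ` and `wubocc` from plain `(nu, nru, tu)` triples. The function names `occupancy` and `weak_upper_bound` are hypothetical, and the three triples are made-up numbers chosen for easy arithmetic; the formulas follow `compute_occ` and `compute_wubocc` as written in this cell.

```python
import math
from typing import List, Tuple

Triple = Tuple[float, float, float]  # (nu, nru, tu) for one transaction


def occupancy(elements: List[Triple]) -> float:
    """Average of nu/tu over the supporting transactions."""
    if not elements:
        return 0.0
    return sum(nu / tu for nu, _, tu in elements) / len(elements)


def weak_upper_bound(elements: List[Triple], ms: int) -> float:
    """Sum of the top-ms (nu+nru)/tu values with nru > 0, divided by ms."""
    values = sorted(
        ((nu + nru) / tu for nu, nru, tu in elements if nru > 0),
        reverse=True,
    )
    return sum(values[:ms]) / ms if values else 0.0


# Illustrative data: the third transaction has no remaining utility,
# so it does not contribute to the upper bound.
elements = [(10, 5, 30), (6, 9, 30), (3, 0, 12)]

occ = occupancy(elements)               # (10/30 + 6/30 + 3/12) / 3
wub = weak_upper_bound(elements, ms=2)  # (0.5 + 0.5) / 2
print(round(occ, 4), wub)
```

Because the sum is always divided by `ms`, a list with fewer than `ms` extendable transactions gets a smaller bound: a single element with ratio 0.5 and `ms=2` yields 0.25.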

## QuantitativeDatabase


In [4]:
class QuantitativeDatabase:
    """
    Quantitative database (QDB) for utility occupancy mining.

    Stores per-transaction item utilities, item profits, and transaction utilities.

    Attributes:
        transactions (dict): {tid: {item: utility}}
        item_profits (dict): {item: profit}
        transaction_utilities (dict): {tid: TU}

    Example:
        >>> db = QuantitativeDatabase()
        >>> db.set_profits({'a': 2})
        >>> db.add_transaction('t1', {'a': 3})
        >>> db.transaction_utilities['t1']
        6.0
    """
    def __init__(self):
        """
        Initialize an empty quantitative database.

        Returns:
            None

        Example:
            >>> db = QuantitativeDatabase()
        """
        self.transactions: Dict[str, Dict[str, float]] = {}  # tid -> {item: utility}
        self.item_profits: Dict[str, float] = {}  # item -> profit
        self.transaction_utilities: Dict[str, float] = {}  # tid -> total utility
    
    def add_transaction(self, tid: str, items_quantities: Dict[str, int]):
        """
        Add a transaction to the database.

        Args:
            tid (str): Transaction identifier.
            items_quantities (dict): {item: quantity}

        Returns:
            None

        Example:
            >>> db.add_transaction('t1', {'a': 2, 'b': 1})
        """
        transaction = {}
        total_utility = 0.0
        
        for item, quantity in items_quantities.items():
            utility = quantity * self.item_profits.get(item, 1.0)
            transaction[item] = utility
            total_utility += utility
        
        self.transactions[tid] = transaction
        self.transaction_utilities[tid] = total_utility
    
    def set_profits(self, profits: Dict[str, float]):
        """
        Set external utilities (profits) for items.

        Args:
            profits (dict): {item: profit}

        Returns:
            None

        Example:
            >>> db.set_profits({'a': 2.0, 'b': 3.0})
        """
        self.item_profits = profits
    
    def get_all_items(self) -> Set[str]:
        """
        Return all distinct items in the database.

        Returns:
            set: Unique item identifiers.

        Example:
            >>> sorted(db.get_all_items())
            ['a', 'b']
        """
        items = set()
        for transaction in self.transactions.values():
            items.update(transaction.keys())
        return items
    
    def reduce_database(self, relevant_items: Set[str]):
        """
        Remove irrelevant items from the database (Strategy 2: RedS).

        Args:
            relevant_items (set): Items with support >= ms.

        Returns:
            None

        Example:
            >>> db.reduce_database({'a', 'c'})
        """
        for tid in self.transactions:
            self.transactions[tid] = {
                item: utility 
                for item, utility in self.transactions[tid].items()
                if item in relevant_items
            }
            # Recalculate transaction utility
            self.transaction_utilities[tid] = sum(self.transactions[tid].values())
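
The effect of the reduction step can be seen on plain dictionaries. This is a minimal sketch under stated assumptions: the function name `reduce_transactions` and the two-transaction dataset are hypothetical, and the logic mirrors `reduce_database` above, which drops items below the support threshold and then recomputes each transaction utility from what is left.

```python
from collections import Counter
from typing import Dict, Tuple

Transactions = Dict[str, Dict[str, float]]  # tid -> {item: utility}


def reduce_transactions(
    transactions: Transactions, ms: int
) -> Tuple[Transactions, Dict[str, float]]:
    """Drop items with support < ms and recompute transaction utilities."""
    support = Counter(item for t in transactions.values() for item in t)
    keep = {item for item, count in support.items() if count >= ms}
    reduced = {
        tid: {item: u for item, u in t.items() if item in keep}
        for tid, t in transactions.items()
    }
    tus = {tid: sum(t.values()) for tid, t in reduced.items()}
    return reduced, tus


# Made-up utilities: only 'a' appears in at least two transactions.
transactions = {
    "t1": {"a": 4.0, "b": 9.0},
    "t2": {"a": 6.0, "c": 5.0},
}
reduced, tus = reduce_transactions(transactions, ms=2)
print(reduced, tus)
```

Note that recomputing TU changes the denominator of every occupancy ratio: in this toy data the share of `a` in `t1` moves from 4/13 to 4/4 once `b` is removed.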


## CloFHUOIM Algorithm


In [5]:
class CloFHUOIM:
    """
    CloFHUOIM algorithm for mining Closed Frequent High Utility Occupancy Itemsets.

    Implements strategies from the paper: RedS, TS (CMAP tightening), SDPS,
    BCPS, LPS, and jumping closure.

    Attributes:
        database (QuantitativeDatabase): Input quantitative database.
        ms (int): Minimum support threshold.
        muo (float): Minimum utility occupancy threshold.
        cfhuoi (set): Set of closed FHUOIs.
        puon_lists (dict): Cached PUON-lists for itemsets.

    Example:
        >>> algo = CloFHUOIM(db, ms=1, muo=0.7)
        >>> cfhuoi = algo.mine()
    """
    def __init__(self, database: QuantitativeDatabase, ms: int, muo: float):
        """
        Initialize CloFHUOIM.

        Args:
            database (QuantitativeDatabase): Input database.
            ms (int): Minimum support threshold.
            muo (float): Minimum utility occupancy threshold (0 < muo <= 1).

        Returns:
            None

        Example:
            >>> algo = CloFHUOIM(db, ms=1, muo=0.7)
        """
        self.database = database
        self.ms = ms
        self.muo = muo
        self.cfhuoi: Set[Tuple[str, ...]] = set()
        self.cfhuoi_by_twu: Dict[float, List[Tuple[str, ...]]] = defaultdict(list)
        self.puon_lists: Dict[Tuple[str, ...], PUONList] = {}
        self.cmap: Dict[str, Set[str]] = {}  # Co-occurrence map
        self.candidate_count = 0
        self.pruned_by_sdps = 0
        self.pruned_by_bcps = 0
        self.pruned_by_lps = 0
        self.lps_pruned: Set[Tuple[str, ...]] = set()
    
    def mine(self) -> Set[Tuple[str, ...]]:
        """
        Execute the mining process and return CFHUOIs.

        Returns:
            set: Set of closed FHUOI itemsets.

        Example:
            >>> cfhuoi = algo.mine()
        """
        start_time = time.time()
        tracemalloc.start()
        
        print("Starting CloFHUOIM mining...")
        print(f"Parameters: ms={self.ms}, muo={self.muo}")
        
        # Step 1: Scan database and compute item supports
        item_supports = self._compute_item_supports()
        
        # Step 2: Identify relevant items (Strategy 2: RedS)
        relevant_items = {
            item for item, sup in item_supports.items() 
            if sup >= self.ms
        }
        print(f"Relevant items: {len(relevant_items)} out of {len(item_supports)}")
        
        # Step 3: Reduce database
        self.database.reduce_database(relevant_items)
        
        # Step 4: Sort items by support (ascending), then alphabetically to break ties
        sorted_items = sorted(
            relevant_items, 
            key=lambda x: (item_supports[x], x)
        )
        # Create a mapping for quick lookup during remaining utility calculation
        self.item_order = {item: idx for idx, item in enumerate(sorted_items)}
        
        # Step 5: Build CMAP (Strategy 3: TS)
        self._build_cmap(sorted_items, item_supports)
        
        # Step 6: Construct PUON-lists for 1-itemsets
        print("Building PUON-lists for 1-itemsets...")
        for item in sorted_items:
            self._construct_puon_list((item,))
        
        # Step 7: Mine CFHUOIs using depth-first search
        print("Mining CFHUOIs...")
        for item in sorted_items:
            self._find_cfhuoi((item,), sorted_items)
        
        runtime = time.time() - start_time
        current, peak = tracemalloc.get_traced_memory()
        tracemalloc.stop()
        
        print(f"\nMining completed in {runtime:.2f} seconds")
        print(f"Peak memory usage: {peak / (1024*1024):.2f} MB")
        print(f"CFHUOIs found: {len(self.cfhuoi)}")
        print(f"Candidates evaluated: {self.candidate_count}")
        print(f"Pruned by SDPS: {self.pruned_by_sdps}")
        print(f"Pruned by BCPS: {self.pruned_by_bcps}")
        print(f"Pruned by LPS: {self.pruned_by_lps}")
        
        return self.cfhuoi
    
    def _compute_item_supports(self) -> Dict[str, int]:
        """
        Compute support for each item in the database.

        Returns:
            dict: {item: support}

        Example:
            >>> supports = self._compute_item_supports()
        """
        supports = defaultdict(int)
        for transaction in self.database.transactions.values():
            for item in transaction.keys():
                supports[item] += 1
        return dict(supports)
    
    def _build_cmap(self, sorted_items: List[str], item_supports: Dict[str, int]):
        """
        Build the CMAP co-occurrence map (Strategy 3: TS).

        Args:
            sorted_items (list): Items in global order.
            item_supports (dict): {item: support}

        Returns:
            None

        Example:
            >>> self._build_cmap(sorted_items, item_supports)
        """
        for i, item in enumerate(sorted_items):
            self.cmap[item] = set()
            
            # Find co-occurring items
            for tid, transaction in self.database.transactions.items():
                if item in transaction:
                    # Add all items that come after in the order
                    for j in range(i + 1, len(sorted_items)):
                        other_item = sorted_items[j]
                        if other_item in transaction:
                            self.cmap[item].add(other_item)
        
        # Filter by support
        for item in sorted_items:
            cooc_supports = defaultdict(int)
            for tid, transaction in self.database.transactions.items():
                if item in transaction:
                    for other_item in self.cmap[item]:
                        if other_item in transaction:
                            cooc_supports[other_item] += 1
            
            self.cmap[item] = {
                other for other, sup in cooc_supports.items()
                if sup >= self.ms
            }
    
    def _construct_puon_list(self, itemset: Tuple[str, ...]) -> PUONList:
        """
        Construct a PUON-list for an itemset by scanning the database.

        Args:
            itemset (tuple): Itemset to build.

        Returns:
            PUONList: Constructed list (cached).

        Example:
            >>> pl = self._construct_puon_list(('a', 'b'))
        """
        if itemset in self.puon_lists:
            return self.puon_lists[itemset]
        
        puon_list = PUONList(itemset)
        
        # Scan database to find supporting transactions
        for tid, transaction in self.database.transactions.items():
            # Check if all items in itemset are in transaction
            if all(item in transaction for item in itemset):
                # Compute utilities
                nu = sum(transaction[item] for item in itemset)
                
                # Compute remaining utility (items after last item in itemset)
                # These are items that could potentially extend this itemset
                last_item = itemset[-1]
                nru = sum(
                    utility for item, utility in transaction.items()
                    if self.item_order[item] > self.item_order[last_item]
                )
                
                # Compute prefix utility (not used in current implementation)
                npu = 0.0
                
                tu = self.database.transaction_utilities[tid]
                
                element = PUONElement(tid, nu, nru, npu, tu)
                
                # Compute ubrem_rel for wubocc
                ubrem = nu + nru
                element.ubrem_rel = ubrem / tu if tu > 0 else 0.0
                
                puon_list.add_element(element)
                puon_list.list_ub_rem_rel.append(element.ubrem_rel)
        
        # Compute metrics
        puon_list.compute_occ()
        puon_list.compute_wubocc(self.ms)
        
        self.puon_lists[itemset] = puon_list
        return puon_list
    
    def _find_cfhuoi(self, itemset: Tuple[str, ...], E: List[str]):
        """
        Recursive search procedure (FindMaxCloFHUOIM in the paper).

        Args:
            itemset (tuple): Current itemset A.
            E (list): Candidate extension items.

        Returns:
            None

        Example:
            >>> self._find_cfhuoi(('a',), E)
        """
        self.candidate_count += 1

        puon_list = self.puon_lists.get(itemset)
        if not puon_list:
            return

        sup = puon_list.get_support()
        occ = puon_list.f_occ
        wubocc = puon_list.f_wubocc

        # Strategy 1 (WPS): Prune if infrequent
        if sup < self.ms:
            return

        # Strategy 6 (LPS): Prune if marked by local pruning
        if itemset in self.lps_pruned:
            return

        # Strategy 5 (BCPS): Backward checking pruning
        if self._has_backward_extension(itemset, puon_list):
            self.pruned_by_bcps += 1
            return

        # Strategy 4 (SDPS): Strict depth pruning using wubocc
        if wubocc < self.muo:
            if occ >= self.muo:
                self._add_to_cfhuoi(itemset, puon_list)
            self.pruned_by_sdps += 1
            return

        # Build newE using Strategy 3 (TS) via CMAP
        last_item = itemset[-1]
        newE = []
        ext_lists = {}
        for y in E:
            if self.item_order[y] <= self.item_order[last_item]:
                continue
            if last_item in self.cmap and y in self.cmap[last_item]:
                ext_puon = self._construct_extension(itemset, y, puon_list)
                if ext_puon and ext_puon.get_support() >= self.ms:
                    ext_itemset = itemset + (y,)
                    self.puon_lists[ext_itemset] = ext_puon
                    newE.append(y)
                    ext_lists[y] = ext_puon

        # Grow extensions and apply LPS marking
        list_exts = []
        num_1_forward = 0
        prefix = itemset[:-1]

        for y in newE:
            ext_itemset = itemset + (y,)
            ext_puon = ext_lists.get(y)
            if not ext_puon:
                ext_puon = self._construct_extension(itemset, y, puon_list)
                if not ext_puon:
                    continue
                self.puon_lists[ext_itemset] = ext_puon

            list_exts.append((ext_itemset, ext_puon))

            # LPS: if supp(Ay) == supp(Py), mark Py as pruned
            if prefix:
                py_itemset = prefix + (y,)
                py_puon = self.puon_lists.get(py_itemset)
                if not py_puon:
                    py_puon = self._construct_puon_list(py_itemset)
                if py_puon and ext_puon.get_support() == py_puon.get_support():
                    self.lps_pruned.add(py_itemset)
                    self.pruned_by_lps += 1
            else:
                # P is empty; Py is just (y,)
                py_itemset = (y,)
                py_puon = self.puon_lists.get(py_itemset)
                if py_puon and ext_puon.get_support() == py_puon.get_support():
                    self.lps_pruned.add(py_itemset)
                    self.pruned_by_lps += 1

            # Check 1-forward
            if ext_puon.get_support() == sup:
                num_1_forward += 1

        # Jumping closure optimization
        if num_1_forward == len(newE) and len(newE) > 0:
            closure_itemset = itemset + tuple(newE)
            closure_puon = self._construct_puon_list(closure_itemset)
            if closure_puon and closure_puon.f_occ >= self.muo:
                self._add_to_cfhuoi(closure_itemset, closure_puon)
            return

        # Add current itemset if it is closed and FHUOI
        if num_1_forward == 0 and occ >= self.muo:
            self._add_to_cfhuoi(itemset, puon_list)

        # Recursive exploration
        for ext_itemset, ext_puon in list_exts:
            self._find_cfhuoi(ext_itemset, newE)

    def _construct_extension(self, itemset: Tuple[str, ...], ext_item: str, parent_puon: PUONList) -> Optional[PUONList]:
        """
        Construct PUON-list for extension A + {y} by joining parent list.

        Args:
            itemset (tuple): Parent itemset A.
            ext_item (str): Extension item y.
            parent_puon (PUONList): PUON-list of A.

        Returns:
            PUONList or None: PUON-list for A ∪ {y} if support > 0.

        Example:
            >>> ext = self._construct_extension(('a',), 'b', puon_list)
        """
        ext_itemset = itemset + (ext_item,)
        ext_puon = PUONList(ext_itemset)

        for elem in parent_puon.elements:
            tid = elem.nid
            transaction = self.database.transactions.get(tid, {})

            # Check if extension item is in transaction
            if ext_item in transaction:
                # Compute new utilities
                nu = elem.nu + transaction[ext_item]

                # Remaining utility excludes extension item
                nru = sum(
                    utility for item, utility in transaction.items()
                    if self.item_order[item] > self.item_order[ext_item]
                )

                npu = elem.npu
                tu = elem.tu

                new_elem = PUONElement(tid, nu, nru, npu, tu)

                # Compute ubrem_rel
                ubrem = nu + nru
                new_elem.ubrem_rel = ubrem / tu if tu > 0 else 0.0

                ext_puon.add_element(new_elem)
                ext_puon.list_ub_rem_rel.append(new_elem.ubrem_rel)

        if ext_puon.elements:
            ext_puon.compute_occ()
            ext_puon.compute_wubocc(self.ms)
            return ext_puon

        return None

    def _has_backward_extension(self, itemset: Tuple[str, ...], puon_list: PUONList) -> bool:
        """
        Check if itemset has a backward extension in CFHUOI (BCPS).

        Args:
            itemset (tuple): Itemset A.
            puon_list (PUONList): PUON-list of A.

        Returns:
            bool: True if a superset with same support exists.

        Example:
            >>> self._has_backward_extension(('a',), puon_list)
            False
        """
        twu = puon_list.get_TWU()
        sup = puon_list.get_support()

        # Check itemsets with same TWU
        for candidate in self.cfhuoi_by_twu.get(twu, []):
            # Check if candidate is superset with same support
            if (len(candidate) > len(itemset) and
                set(itemset).issubset(set(candidate))):
                # Verify support is same
                candidate_puon = self.puon_lists.get(candidate)
                if candidate_puon and candidate_puon.get_support() == sup:
                    return True

        return False

    def _add_to_cfhuoi(self, itemset: Tuple[str, ...], puon_list: PUONList):
        """
        Add an itemset to the CFHUOI set and index by TWU.

        Args:
            itemset (tuple): Itemset to add.
            puon_list (PUONList): PUON-list of itemset.

        Returns:
            None

        Example:
            >>> self._add_to_cfhuoi(('a','e'), puon_list)
        """
        self.cfhuoi.add(itemset)
        twu = puon_list.get_TWU()
        self.cfhuoi_by_twu[twu].append(itemset)
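
The co-occurrence map used for extension filtering can also be built in a single pass by counting ordered item pairs. The sketch below is an alternative formulation, not the notebook's `_build_cmap`: the name `build_cmap` and its signature are hypothetical, and unlike the method above it omits keys for items that have no qualifying successor. The item sets are those of the example database, with the assumed order `a` to `f`.

```python
from collections import Counter, defaultdict
from itertools import combinations
from typing import Dict, Iterable, Set


def build_cmap(
    transactions: Dict[str, Iterable[str]],
    order: Dict[str, int],
    ms: int,
) -> Dict[str, Set[str]]:
    """Map each item x to the later items y with supp({x, y}) >= ms."""
    pair_support: Counter = Counter()
    for items in transactions.values():
        ranked = sorted(items, key=order.__getitem__)
        # combinations() preserves input order, so x always precedes y.
        pair_support.update(combinations(ranked, 2))

    cmap: Dict[str, Set[str]] = defaultdict(set)
    for (x, y), count in pair_support.items():
        if count >= ms:
            cmap[x].add(y)
    return dict(cmap)


order = {item: rank for rank, item in enumerate("abcdef")}
transactions = {
    "t1": ["b", "c", "d", "e", "f"],
    "t2": ["a", "e"],
    "t3": ["c", "e", "f"],
    "t4": ["d", "f"],
    "t5": ["a", "b", "c", "d", "e", "f"],
}
cmap = build_cmap(transactions, order, ms=2)
print(cmap)
```

With `ms=2`, item `a` keeps only `e` as a candidate extension, because every other pair involving `a` occurs in `t5` alone.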

## MaxCloFHUOIM Variant


In [6]:
class MaxCloFHUOIM(CloFHUOIM):
    """
    MaxCloFHUOIM algorithm for mining CFHUOIs and MFHUOIs simultaneously.

    Extends CloFHUOIM and maintains a maximal set during mining.

    Attributes:
        mfhuoi (set): Set of maximal FHUOIs.

    Example:
        >>> algo = MaxCloFHUOIM(db, ms=1, muo=0.7)
        >>> cfhuoi, mfhuoi = algo.mine()
    """
    def __init__(self, database: QuantitativeDatabase, ms: int, muo: float):
        """
        Initialize MaxCloFHUOIM.

        Args:
            database (QuantitativeDatabase): Input database.
            ms (int): Minimum support threshold.
            muo (float): Minimum utility occupancy threshold.

        Returns:
            None

        Example:
            >>> algo = MaxCloFHUOIM(db, ms=1, muo=0.7)
        """
        super().__init__(database, ms, muo)
        self.mfhuoi: Set[Tuple[str, ...]] = set()
    
    def mine(self) -> Tuple[Set[Tuple[str, ...]], Set[Tuple[str, ...]]]:
        """
        Execute mining and return both CFHUOIs and MFHUOIs.

        Returns:
            tuple: (cfhuoi_set, mfhuoi_set)

        Example:
            >>> cfhuoi, mfhuoi = algo.mine()
        """
        # Call parent mine method
        cfhuoi = super().mine()
        
        # Extract maximal FHUOIs from CFHUOIs (Proposition 4)
        print("\nExtracting maximal FHUOIs...")
        self._extract_mfhuoi()
        
        print(f"MFHUOIs found: {len(self.mfhuoi)}")
        
        return cfhuoi, self.mfhuoi
    
    def _add_to_cfhuoi(self, itemset: Tuple[str, ...], puon_list: PUONList):
        """
        Add itemset to CFHUOI and update MFHUOI (UpdateMaxFHUOI).

        Args:
            itemset (tuple): Itemset to add.
            puon_list (PUONList): PUON-list of itemset.

        Returns:
            None

        Example:
            >>> self._add_to_cfhuoi(('a','e'), puon_list)
        """
        # Add to CFHUOI
        super()._add_to_cfhuoi(itemset, puon_list)
        
        # Update MFHUOI
        self._update_mfhuoi(itemset)
    
    def _update_mfhuoi(self, itemset: Tuple[str, ...]):
        """
        Update MFHUOI set when a new CFHUOI is added.

        Args:
            itemset (tuple): Newly added CFHUOI.

        Returns:
            None

        Example:
            >>> self._update_mfhuoi(('a','e'))
        """
        # Remove any existing MFHUOIs that are subsets of new itemset
        to_remove = set()
        for mfhuoi in self.mfhuoi:
            if set(mfhuoi).issubset(set(itemset)) and mfhuoi != itemset:
                to_remove.add(mfhuoi)
        
        self.mfhuoi -= to_remove
        
        # Check if new itemset should be added to MFHUOI
        # It should be added if no existing MFHUOI is its superset
        is_maximal = True
        for mfhuoi in self.mfhuoi:
            if set(itemset).issubset(set(mfhuoi)):
                is_maximal = False
                break
        
        if is_maximal:
            self.mfhuoi.add(itemset)
    
    def _extract_mfhuoi(self):
        """
        Extract MFHUOIs from the CFHUOI set (post-processing).

        Returns:
            None

        Example:
            >>> self._extract_mfhuoi()
        """
        for itemset in self.cfhuoi:
            is_maximal = True
            for other in self.cfhuoi:
                if other != itemset and set(itemset).issubset(set(other)):
                    is_maximal = False
                    break
            
            if is_maximal:
                self.mfhuoi.add(itemset)
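
The post-processing idea, keeping only itemsets that have no proper superset in the collection, can be sketched independently of the miner. The function name `maximal_itemsets` is hypothetical; it sorts by size so that each itemset only needs to be compared against the maximal sets found so far, which is a small variation on the pairwise loop in `_extract_mfhuoi`.

```python
from typing import FrozenSet, Iterable, List, Tuple


def maximal_itemsets(itemsets: Iterable[Tuple[str, ...]]) -> List[FrozenSet[str]]:
    """Return the itemsets that have no proper superset in the input."""
    unique = {frozenset(s) for s in itemsets}
    maximal: List[FrozenSet[str]] = []
    # Largest first: any proper superset of s is processed before s, and
    # every non-maximal superset is itself contained in a maximal one.
    for s in sorted(unique, key=len, reverse=True):
        if not any(s < m for m in maximal):
            maximal.append(s)
    return maximal


# The three closed itemsets printed by the example run for ms=1, muo=0.7.
closed = [
    ("a", "e"),
    ("b", "c", "d", "e", "f"),
    ("a", "b", "c", "d", "e", "f"),
]
result = maximal_itemsets(closed)
print([tuple(sorted(s)) for s in result])
```

On these three itemsets only `abcdef` survives, which matches the single MFHUOI reported in the example output.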

## Utility Functions (Data Loading)


In [7]:
# Utility functions for loading data and creating database

def load_database_from_dict(transactions_dict: Dict[str, Dict[str, int]], profits: Dict[str, float]) -> QuantitativeDatabase:
    """
    Create a QuantitativeDatabase from dictionary inputs.

    Args:
        transactions_dict (dict): {tid: {item: quantity}}
        profits (dict): {item: profit}

    Returns:
        QuantitativeDatabase: Built database.

    Example:
        >>> db = load_database_from_dict({'t1': {'a': 2}}, {'a': 3})
    """
    db = QuantitativeDatabase()
    db.set_profits(profits)

    for tid, items in transactions_dict.items():
        db.add_transaction(tid, items)

    return db


def load_database_from_file(filepath: str, profit_filepath: Optional[str] = None) -> QuantitativeDatabase:
    """
    Load a QuantitativeDatabase from transaction and profit files.

    Args:
        filepath (str): Transaction file path.
        profit_filepath (str, optional): Profit file path.

    Returns:
        QuantitativeDatabase: Built database.

    Example:
        >>> db = load_database_from_file('transactions.txt', 'profits.txt')
    """
    db = QuantitativeDatabase()
    transactions = {}

    # Load transactions
    with open(filepath, 'r') as f:
        for tid, line in enumerate(f):
            items = {}
            for pair in line.strip().split():
                if ':' in pair:
                    item, quantity = pair.rsplit(':', 1)
                    items[item] = int(quantity)
            transactions[f"t{tid+1}"] = items

    # Load profits if provided
    if profit_filepath:
        profits = {}
        with open(profit_filepath, 'r') as f:
            for line in f:
                parts = line.strip().split()
                if len(parts) >= 2:
                    item, profit = parts[0], float(parts[1])
                    profits[item] = profit
        db.set_profits(profits)
    else:
        # Default profit of 1 for all items
        all_items = set()
        for items in transactions.values():
            all_items.update(items.keys())
        db.set_profits({item: 1.0 for item in all_items})

    # Add transactions
    for tid, items in transactions.items():
        db.add_transaction(tid, items)

    return db
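
The transaction file format expected by `load_database_from_file` is one transaction per line, written as whitespace-separated `item:quantity` pairs. The sketch below parses two such lines from an in-memory buffer; `parse_transaction_line` is a hypothetical helper, and the sample lines reuse `t2` and `t3` from the example database. It splits on the last colon and skips malformed pairs, which is an assumption about how bad input should be handled rather than something the loader above guarantees.

```python
import io
from typing import Dict


def parse_transaction_line(line: str) -> Dict[str, int]:
    """Parse 'item:quantity' pairs separated by whitespace."""
    items: Dict[str, int] = {}
    for pair in line.split():
        # Split on the last colon so item names may contain ':' themselves.
        item, sep, quantity = pair.rpartition(":")
        if sep and item and quantity:
            items[item] = int(quantity)
    return items


# In-memory stand-in for a transaction file.
sample = io.StringIO("a:2 e:3\nc:2 e:1 f:3\n")
transactions = {
    f"t{n}": parse_transaction_line(line)
    for n, line in enumerate(sample, start=1)
}
print(transactions)
```

The resulting dictionary has the `{tid: {item: quantity}}` shape that `load_database_from_dict` accepts.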


## Example Run


In [8]:
if __name__ == "__main__":
    # Example from the paper (Tables 1, 2, and 3)
    print("=" * 80)
    print("RUNNING EXAMPLE FROM PAPER")
    print("=" * 80)
    
    # Example database from paper
    transactions = {
        't1': {'b': 8, 'c': 7, 'd': 5, 'e': 4, 'f': 2},
        't2': {'a': 2, 'e': 3},
        't3': {'c': 2, 'e': 1, 'f': 3},
        't4': {'d': 3, 'f': 5},
        't5': {'a': 3, 'b': 3, 'c': 5, 'd': 4, 'e': 2, 'f': 1}
    }
    
    profits = {
        'a': 2, 'b': 3, 'c': 1, 
        'd': 5, 'e': 11, 'f': 7
    }
    
    # Create database
    db = load_database_from_dict(transactions, profits)
    
    # Test with parameters from Example 4 in paper
    print("\nTest 1: ms=1, muo=0.7")
    print("-" * 80)
    
    max_algo = MaxCloFHUOIM(db, ms=1, muo=0.7)
    cfhuoi, mfhuoi = max_algo.mine()
    
    print("\nResults:")
    print(f"CFHUOIs: {sorted(cfhuoi)}")
    print(f"MFHUOIs: {sorted(mfhuoi)}")
    print("\nExpected from paper (Example 4):")
    print("CFHUOIs: {ae, bcdef, abcdef}")
    print("MFHUOIs: {abcdef}")
    
    # Test with lower thresholds
    print("\n" + "=" * 80)
    print("Test 2: ms=1, muo=0.1")
    print("-" * 80)
    
    db2 = load_database_from_dict(transactions, profits)
    algo2 = CloFHUOIM(db2, ms=1, muo=0.1)
    cfhuoi2 = algo2.mine()
    
    print(f"\nCFHUOIs found: {len(cfhuoi2)}")
    print("Expected from paper: 7 CFHUOIs (11.5% of 61 FHUOIs)")

RUNNING EXAMPLE FROM PAPER

Test 1: ms=1, muo=0.7
--------------------------------------------------------------------------------
Starting CloFHUOIM mining...
Parameters: ms=1, muo=0.7
Relevant items: 6 out of 6
Building PUON-lists for 1-itemsets...
Mining CFHUOIs...

Mining completed in 0.00 seconds
Peak memory usage: 0.03 MB
CFHUOIs found: 3
Candidates evaluated: 17
Pruned by SDPS: 4
Pruned by BCPS: 2
Pruned by LPS: 4

Extracting maximal FHUOIs...
MFHUOIs found: 1

Results:
CFHUOIs: [('a', 'b', 'c', 'd', 'e', 'f'), ('a', 'e'), ('b', 'c', 'd', 'e', 'f')]
MFHUOIs: [('a', 'b', 'c', 'd', 'e', 'f')]

Expected from paper (Example 4):
CFHUOIs: {ae, bcdef, abcdef}
MFHUOIs: {abcdef}

Test 2: ms=1, muo=0.1
--------------------------------------------------------------------------------
Starting CloFHUOIM mining...
Parameters: ms=1, muo=0.1
Relevant items: 6 out of 6
Building PUON-lists for 1-itemsets...
Mining CFHUOIs...

Mining completed in 0.00 seconds
Peak memory usage: 0.03 MB
CFHUOIs fou