# ECLAT Algorithm

ECLAT stands for _Equivalence Class Clustering and bottom-up Lattice Traversal_. It is a popular method for association rule mining, and is generally regarded as a more efficient and scalable alternative to the Apriori algorithm [^1].

[^1]: https://www.geeksforgeeks.org/ml-eclat-algorithm/#:~:text=The%20ECLAT%20algorithm%20stands%20for,version%20of%20the%20Apriori%20algorithm.
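
To make the idea concrete before the full implementation, here is a minimal sketch of the vertical layout that ECLAT relies on: each item maps to the set of transaction ids it appears in, and the support of a combination is the size of the intersection of those sets. The `toy_txs` data and the `tidsets` name are made up for illustration and are not part of the notebook's dataset.

```python
# Toy transactions, made up for illustration only.
toy_txs = [
    ["beer", "chips"],
    ["beer", "wine"],
    ["beer", "chips", "wine"],
]

# Vertical layout: item -> set of ids of the transactions containing it.
tidsets: dict[str, set[int]] = {}
for txid, tx in enumerate(toy_txs):
    for item in tx:
        tidsets.setdefault(item, set()).add(txid)

# The support of {beer, chips} is the size of the intersection of both id sets.
beer_and_chips = tidsets["beer"] & tidsets["chips"]
print(beer_and_chips)       # {0, 2}
print(len(beer_and_chips))  # 2
```

Because support is computed by set intersection, the transactions only need to be scanned once to build the id sets.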

In [1]:
from collections import defaultdict
from itertools import chain, combinations

`txs` is a list of transactions, where each transaction is a list of the items purchased.

For example, in the first transaction, the items `beer`, `wine` and `cheese` were purchased.

In [2]:
txs = [
    ["beer", "wine", "cheese"],
    ["beer", "potato chips"],
    ["eggs", "flower", "butter", "cheese"],
    ["eggs", "flower", "butter", "beer", "potato chips"],
    ["wine", "cheese"],
    ["potato chips"],
    ["eggs", "flower", "butter", "wine", "cheese"],
    ["eggs", "flower", "butter", "beer", "potato chips"],
    ["wine", "beer"],
    ["beer", "potato chips"],
    ["butter", "eggs"],
    ["beer", "potato chips"],
    ["flower", "eggs"],
    ["beer", "potato chips"],
    ["eggs", "flower", "butter", "wine", "cheese"],
    ["beer", "wine", "potato chips", "cheese"],
    ["wine", "cheese"],
    ["beer", "potato chips"],
    ["wine", "cheese"],
    ["beer", "potato chips"],
]

In [3]:
def eclat(txs: list[list[str]], *, minsup: int = 2) -> defaultdict[frozenset[str], set[int]]:
    # For each item, keep the set of ids of the transactions the item appears in.
    ids_by_item = defaultdict(set)

    # Invert the mapping, so that the keys are items and the values are transaction ids.
    for i, tx in enumerate(txs):
        for item in tx:
            ids_by_item[frozenset([item])].add(i)

    # Exclude items that are below min support.
    for item, ids in ids_by_item.copy().items():
        if len(ids) < minsup:
            ids_by_item.pop(item)

    result = ids_by_item
    while len(ids_by_item) > 0:
        tmp = defaultdict(set)

        # Combine each pair of itemsets and intersect their transaction ids.
        items = combinations(ids_by_item.keys(), r=2)
        for item1, item2 in items:
            ids1 = ids_by_item.get(item1)
            ids2 = ids_by_item.get(item2)

            ids = ids1 & ids2
            if len(ids) < minsup:
                continue

            tmp[item1 | item2] = ids

        result.update(tmp)
        ids_by_item = tmp

    return result

In [4]:
txids_by_items = eclat(txs, minsup=2)
txids_by_items

defaultdict(set,
            {frozenset({'beer'}): {0, 1, 3, 7, 8, 9, 11, 13, 15, 17, 19},
             frozenset({'wine'}): {0, 4, 6, 8, 14, 15, 16, 18},
             frozenset({'cheese'}): {0, 2, 4, 6, 14, 15, 16, 18},
             frozenset({'potato chips'}): {1, 3, 5, 7, 9, 11, 13, 15, 17, 19},
             frozenset({'eggs'}): {2, 3, 6, 7, 10, 12, 14},
             frozenset({'flower'}): {2, 3, 6, 7, 12, 14},
             frozenset({'butter'}): {2, 3, 6, 7, 10, 14},
             frozenset({'beer', 'wine'}): {0, 8, 15},
             frozenset({'beer', 'cheese'}): {0, 15},
             frozenset({'beer', 'potato chips'}): {1,
              3,
              7,
              9,
              11,
              13,
              15,
              17,
              19},
             frozenset({'beer', 'eggs'}): {3, 7},
             frozenset({'beer', 'flower'}): {3, 7},
             frozenset({'beer', 'butter'}): {3, 7},
             frozenset({'cheese', 'wine'}): {0, 4, 6, 14, 15, 16, 1

The result maps each frequent itemset (a single item, or a combination of items that meets the minimum support) to the ids of the transactions it appears in. To recommend items that are frequently purchased together with a given item, we can take the frequent itemsets that contain that item and remove the item itself from each of them.

In [5]:
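def recommend(txs: list[list[str]], item: str, *, minsup: int = 7) -> list[frozenset[str]]:
    target = frozenset([item])
    result = eclat(txs, minsup=minsup)

    # Keep every frequent itemset that contains the target item plus at least
    # one other item, and return it with the target item removed.
    recommendations = []
    for items in result.keys():
        if target & items and items - target:
            recommendations.append(items - target)

    return recommendations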

Recommend something to go with "wine", requiring at least 3 transactions in which the items were purchased together with wine.

In [6]:
recommend(txs, "wine", minsup=3)

[frozenset({'beer'}), frozenset({'cheese'})]

`minsup` is the minimum number of transactions in which a combination of items must appear. One potential issue is that for large datasets it is hard to pick the right absolute `minsup`. We can use a percentage instead. A `minsup` of 3 for the list of 20 transactions above is just 15%. For more relevant recommendations, a higher percentage of recurring transactions is preferable.

This, however, depends on the domain. Say we are running an online business like Amazon, where purchases span books, toys, electronics and so on. A minsup based on all transactions is then not meaningful. Instead, we want to group the items by category and compute the minsup from the purchases within each category.
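
One way the per-category grouping could look is sketched below. The `category_by_item` catalogue and the `orders` data are hypothetical and exist only for this illustration. The sketch assumes every item belongs to exactly one known category, and splits each order into one sub-basket per category.

```python
from collections import defaultdict

# Hypothetical catalogue: which category each item belongs to.
category_by_item = {
    "novel": "books",
    "cookbook": "books",
    "robot": "toys",
    "puzzle": "toys",
}

# Made-up orders that mix categories.
orders = [
    ["novel", "cookbook", "robot"],
    ["novel", "cookbook"],
    ["robot", "puzzle"],
    ["novel"],
]

# Split every order into one sub-basket per category.
baskets_by_category: dict[str, list[list[str]]] = defaultdict(list)
for order in orders:
    per_category: dict[str, list[str]] = defaultdict(list)
    for item in order:
        per_category[category_by_item[item]].append(item)
    for category, items in per_category.items():
        baskets_by_category[category].append(items)

# A relative minsup would now be measured against the per-category basket count.
for category, baskets in baskets_by_category.items():
    print(category, len(baskets), baskets)
```

Each list in `baskets_by_category` could then be mined on its own, so that the support of a book combination is measured against orders containing books rather than against all orders.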

In [7]:
3 / len(txs)

0.15
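
The conversion between an absolute count and a fraction can be wrapped in two small helpers. This is only a sketch; `relative_support` and `min_count` are names introduced here, not part of the notebook.

```python
import math


def relative_support(count: int, n_transactions: int) -> float:
    """Fraction of all transactions that contain the itemset."""
    return count / n_transactions


def min_count(min_ratio: float, n_transactions: int) -> int:
    """Smallest absolute count that satisfies a relative minimum support."""
    return math.ceil(min_ratio * n_transactions)


print(relative_support(3, 20))  # 0.15
print(min_count(0.2, 20))       # 4
```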

In [8]:
def eclat(txs: list[list[str]], *, k: int = 1, minsup: float = 0.2) -> defaultdict[frozenset[str], float]:
    # `minsup` is now a fraction of all transactions; `k` is the minimum itemset size to report.
    # For each item, keep the set of ids of the transactions the item appears in.
    ids_by_item = defaultdict(set)

    # Invert the mapping, so that the keys are items and the values are transaction ids.
    for i, tx in enumerate(txs):
        for item in tx:
            ids_by_item[frozenset([item])].add(i)

    # Exclude items that are below min support.
    for item, ids in ids_by_item.copy().items():
        if len(ids) / len(txs) < minsup:
            ids_by_item.pop(item)

    result = ids_by_item
    while len(ids_by_item) > 0:
        tmp = defaultdict(set)

        # Combine each pair of itemsets and intersect their transaction ids.
        items = combinations(ids_by_item.keys(), r=2)
        for item1, item2 in items:
            ids1 = ids_by_item.get(item1)
            ids2 = ids_by_item.get(item2)

            ids = ids1 & ids2
            if len(ids) / len(txs) < minsup:
                continue

            tmp[item1 | item2] = ids

        result.update(tmp)
        ids_by_item = tmp

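    # Note: this value is the relative support of each itemset
    # (share of all transactions containing it), not a rule confidence.
    support_by_items = defaultdict(int)

    for items in result.keys():
        # Skip itemsets smaller than the requested minimum size.
        if len(items) < k:
            continue

        support_by_items[items] = len(result[items]) / len(txs)

    return support_by_items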

In [9]:
result = eclat(txs, k=2, minsup=0.2)
result

defaultdict(int,
            {frozenset({'beer', 'potato chips'}): 0.45,
             frozenset({'cheese', 'wine'}): 0.35,
             frozenset({'eggs', 'flower'}): 0.3,
             frozenset({'butter', 'eggs'}): 0.3,
             frozenset({'butter', 'flower'}): 0.25,
             frozenset({'butter', 'eggs', 'flower'}): 0.25})

In [10]:
item_to_recommend = "eggs"

for items in result.keys():
    if item_to_recommend in items:
        recommended_items = list(items - frozenset([item_to_recommend]))
        print(
            "If you buy {}, then you would buy {} too".format(
                item_to_recommend,
                ", ".join(recommended_items),
            )
        )

If you buy eggs, then you would buy flower too
If you buy eggs, then you would buy butter too
If you buy eggs, then you would buy flower, butter too


Since we only want to know which items are frequently purchased together with a target item, we do not need to generate the frequent itemsets over all transactions, only to discard most of them later.

We can simply filter transactions containing the target item first.

In [11]:
txs_with_eggs = [tx for tx in txs if item_to_recommend in tx]
print(len(txs_with_eggs))
txs_with_eggs

7

[['eggs', 'flower', 'butter', 'cheese'],
 ['eggs', 'flower', 'butter', 'beer', 'potato chips'],
 ['eggs', 'flower', 'butter', 'wine', 'cheese'],
 ['eggs', 'flower', 'butter', 'beer', 'potato chips'],
 ['butter', 'eggs'],
 ['flower', 'eggs'],
 ['eggs', 'flower', 'butter', 'wine', 'cheese']]

In [20]:
result = eclat(txs_with_eggs, k=3, minsup=0.4)
result

defaultdict(int,
            {frozenset({'butter', 'eggs', 'flower'}): 0.7142857142857143,
             frozenset({'cheese', 'eggs', 'flower'}): 0.42857142857142855,
             frozenset({'butter',
                        'cheese',
                        'eggs',
                        'flower'}): 0.42857142857142855,
             frozenset({'butter', 'cheese', 'eggs'}): 0.42857142857142855,
             frozenset({'butter', 'cheese', 'flower'}): 0.42857142857142855})

Note that the result may also contain combinations of items that do not include `eggs`. We still need to filter those itemsets out.

In [19]:
for items in result.keys():
    if item_to_recommend in items:
        recommended_items = list(items - frozenset([item_to_recommend]))
        print(
            "If you buy {}, then you would buy {} too".format(
                item_to_recommend,
                ", ".join(recommended_items),
            )
        )

If you buy eggs, then you would buy flower, butter too
If you buy eggs, then you would buy flower, cheese too
If you buy eggs, then you would buy flower, cheese, butter too
If you buy eggs, then you would buy cheese, butter too


In [21]:
# At least 80% of the transactions that include eggs also include flower, and
# at least 80% also include butter (not necessarily both in the same transaction).
eclat(txs_with_eggs, k=2, minsup=0.8)

defaultdict(int,
            {frozenset({'eggs', 'flower'}): 0.8571428571428571,
             frozenset({'butter', 'eggs'}): 0.8571428571428571})
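
The relative support measured on the filtered transactions is the same quantity as the confidence of the rule "target item → other items" over the full dataset: the number of transactions containing both, divided by the number containing the target. Below is a minimal sketch of that calculation done directly, without filtering first. The `confidence` function and the `toy_txs` data are illustrative assumptions, not part of the notebook.

```python
def confidence(txs: list[list[str]], antecedent: list[str], consequent: list[str]) -> float:
    """Share of the transactions containing `antecedent` that also contain `consequent`."""
    lhs = frozenset(antecedent)
    both = lhs | frozenset(consequent)

    n_lhs = sum(1 for tx in txs if lhs <= set(tx))
    n_both = sum(1 for tx in txs if both <= set(tx))

    # Avoid dividing by zero when the antecedent never occurs.
    return n_both / n_lhs if n_lhs else 0.0


# Made-up transactions for illustration only.
toy_txs = [
    ["eggs", "butter"],
    ["eggs", "butter", "flower"],
    ["eggs"],
    ["beer"],
]

# 2 of the 3 transactions with eggs also contain butter.
print(confidence(toy_txs, ["eggs"], ["butter"]))
```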