# 3. Frequent Patterns

This Jupyter notebook is part of an exercise series titled *Frequent Patterns*. The series is based on the lecture *6. Mining Frequent Patterns, Associations and Correlations*.

There are two parts:

- Part One: Implementing A Priori and FP-Growth
- Part Two: Mining Frequent Patterns on a real dataset

Recall that we have two exercise groups. Depending on how each group progresses, some parts of these exercises may not be discussed in their entirety. If questions arise, ask them in your study group or in our StudOn forum.

## Part One: Implementing A Priori and FP-Growth

In this part we take a closer look at A Priori and FP-Growth, both of which are known from the lecture. You will first implement both methods yourself, step by step, and then compare your implementation with that of a common library.


In [None]:
# Import the required libraries
import itertools
import pandas as pd
from dataclasses import dataclass, field

We use a very small data set in this part. It was already used in the lecture, so you should be able to validate your code yourself without a sample solution.

In [None]:
# A very small data set in the form of a list (transactions) of sets (items)
dataset = [
    {"Beer", "Nuts", "Diapers"},
    {"Beer", "Coffee", "Diapers"},
    {"Beer", "Diapers", "Eggs"},
    {"Nuts", "Eggs", "Milk"},
    {"Nuts", "Coffee", "Diapers", "Eggs"},
]
dataset

### A Priori

The first method we consider is A Priori. It is a very basic approach, which requires many accesses to the data set under consideration.

#### Implementation

For our implementation, we first define a (data)class `Itemset`, which can be used to store a set of items together with the count of occurrences of these items in our data set.

In [None]:
# The (data)class Itemset
@dataclass
class Itemset:
    # Attributes
    items: set
    occurrence_count: int = 0


# Example of usage (might be a hint for later tasks)
# Create an example Itemset
example_itemset = Itemset({"Beer", "Nuts"})

# Increase the occurrence_count
example_itemset.occurrence_count += 1

# Check whether this itemset is a subset of a bigger set of items
example_itemset.items.issubset({"Beer", "Nuts", "Diapers"})
example_itemset

We also define a class `ItemsetList`, which is a list of `Itemset`s providing some functions you might want to use in later tasks.

In [None]:
# The class ItemsetList
class ItemsetList:
    # Constructor
    def __init__(self, itemsets: list[Itemset]):
        self.itemsets = itemsets

    # Functions
    # Return all Itemsets that contain exactly the passed items
    def get_itemsets_with_items(self, items: set):
        return [x for x in self.itemsets if x.items == items]

    # Check whether there is at least one Itemset containing exactly the passed items
    def contains_itemset_with_items(self, items: set):
        return len(self.get_itemsets_with_items(items)) > 0

    # Return all Itemsets whose items are a superset of the passed items
    def get_itemsets_with_superset_of_items(self, items: set):
        return [x for x in self.itemsets if x.items.issuperset(items)]

    # Check whether there is at least one Itemset whose items are a superset of the passed items
    def contains_itemset_with_superset_of_items(self, items: set):
        return len(self.get_itemsets_with_superset_of_items(items)) > 0

    # Return all Itemsets whose items are a subset of the passed items
    def get_itemsets_with_subset_of_items(self, items: set):
        return [x for x in self.itemsets if x.items.issubset(items)]

    # Check whether there is at least one Itemset whose items are a subset of the passed items
    def contains_itemset_with_subset_of_items(self, items: set):
        return len(self.get_itemsets_with_subset_of_items(items)) > 0


# Example of usage (might be a hint for later tasks)
# Create an example ItemsetList
example_itemset_list = ItemsetList([])

# Add our example itemset to the list
example_itemset_list.itemsets.append(example_itemset)

# Check if there is an itemset with exactly the items Beer and Nuts
example_itemset_list.contains_itemset_with_items({"Beer", "Nuts"})

# Get the itemsets whose items are a subset of Beer, Nuts and Diapers
example_itemset_list.get_itemsets_with_subset_of_items({"Beer", "Nuts", "Diapers"})
example_itemset_list.itemsets

The first step in A Priori is to scan the dataset once to get all 1-itemsets. To avoid scanning the dataset again when searching for frequent 1-itemsets, the occurrence count of each item is determined during that same scan.
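As a rough illustration of this single pass, independent of the `Itemset` classes used in this notebook, the counts can also be collected with `collections.Counter`. The names `transactions` and `item_counts` are chosen for this sketch only.

```python
from collections import Counter

# The lecture data set from above, repeated so that the sketch is self-contained
transactions = [
    {"Beer", "Nuts", "Diapers"},
    {"Beer", "Coffee", "Diapers"},
    {"Beer", "Diapers", "Eggs"},
    {"Nuts", "Eggs", "Milk"},
    {"Nuts", "Coffee", "Diapers", "Eggs"},
]

# One pass over the data: every item of every transaction is counted once
item_counts = Counter(item for transaction in transactions for item in transaction)

# Show the items from most to least frequent
for item, count in item_counts.most_common():
    print(item, count)
```

Because each transaction is a set, an item is counted at most once per transaction, which is exactly the occurrence count needed here.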

<div class="alert alert-block alert-info">

**Task:** Complete the function below, which is intended to generate all 1-itemsets and their occurrence count based on a given dataset.
    
</div>

In [None]:
# Implement a function to generate all 1-itemsets
def generate_one_itemsets(dataset):
    # Initialize an ItemsetList
    itemsets = ItemsetList([])

    # ...

    # Return the itemsets
    return itemsets


# Get all 1-itemsets (and their occurrence count) within our dataset
one_itemsets = generate_one_itemsets(dataset)
one_itemsets.itemsets

In [None]:
# Implement a function to generate all 1-itemsets
def generate_one_itemsets(dataset):
    # Initialize an ItemsetList
    itemsets = ItemsetList([])

    # Iterate over all transactions
    for transaction in dataset:
        # Iterate over all items contained in that transaction
        for item in transaction:
            # Check whether the itemset already exists in itemsets
            if itemsets.contains_itemset_with_items({item}):
                # If yes, just increment the item's count
                itemsets.get_itemsets_with_items({item})[0].occurrence_count += 1
            else:
                # If no, add the item to itemsets (occurrence_count has to be 1, as it is the first occurrence)
                itemsets.itemsets.append(Itemset({item}, 1))

    # Return the itemsets
    return itemsets


# Get all 1-itemsets (and their occurrence count) within our dataset
one_itemsets = generate_one_itemsets(dataset)
one_itemsets.itemsets

Only itemsets that occur at least `minimal_support_count` times are frequent 1-itemsets. The next step is therefore to prune all itemsets that occur fewer times than this value.
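The following sketch shows how the absolute threshold relates to a relative one and which items pass it. It uses plain Python sets instead of the notebook's classes, and the helper name `occurrence_count` is an assumption made for this example.

```python
# The lecture data set from above, repeated so that the sketch is self-contained
transactions = [
    {"Beer", "Nuts", "Diapers"},
    {"Beer", "Coffee", "Diapers"},
    {"Beer", "Diapers", "Eggs"},
    {"Nuts", "Eggs", "Milk"},
    {"Nuts", "Coffee", "Diapers", "Eggs"},
]
minimal_support_count = 3

# The same threshold expressed as a fraction of all transactions
minimal_relative_support = minimal_support_count / len(transactions)


def occurrence_count(item, transactions):
    # Number of transactions that contain the item
    return sum(1 for transaction in transactions if item in transaction)


# Keep an item if it occurs at least minimal_support_count times (>=, not >)
all_items = sorted(set().union(*transactions))
frequent_items = [
    item
    for item in all_items
    if occurrence_count(item, transactions) >= minimal_support_count
]

print(minimal_relative_support)  # 0.6
print(frequent_items)  # ['Beer', 'Diapers', 'Eggs', 'Nuts']
```

Note that items occurring exactly three times are kept, since the threshold is inclusive.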


<div class="alert alert-block alert-info">

**Task:** Create the function `prune_itemsets` that removes itemsets that do not satisfy `minimal_support_count` (we use a `minimal_support_count` of 3 in this example).
    
</div>

In [None]:
# Implement a function to prune itemsets that occurred fewer than minimal_support_count times
def prune_itemsets(itemsets, minimal_support_count):
    # Initialize an ItemsetList
    frequent_itemsets = ItemsetList([])

    # ...

    # Return the itemsets
    return frequent_itemsets


# Prune every itemset occurring fewer than three times
frequent_one_itemsets = prune_itemsets(one_itemsets, 3)
frequent_one_itemsets.itemsets

In [None]:
# Implement a function to prune itemsets that occurred fewer than minimal_support_count times
def prune_itemsets(itemsets, minimal_support_count):
    # Initialize an ItemsetList
    frequent_itemsets = ItemsetList([])

    # Get all itemsets that occurred at least minimal_support_count times
    # This is very similar to the functions given in ItemsetList,
    # but it is deliberately not included there to provide a little challenge
    frequent_itemsets.itemsets = [
        x for x in itemsets.itemsets if x.occurrence_count >= minimal_support_count
    ]

    # Return the itemsets
    return frequent_itemsets


# Prune every itemset occurring fewer than three times
frequent_one_itemsets = prune_itemsets(one_itemsets, 3)
frequent_one_itemsets.itemsets

One of the most important principles that A Priori uses is that an itemset can only be frequent if all of its subsets are frequent. So, to find candidates for frequent 2-itemsets, only the frequent 1-itemsets found so far have to be combined into 2-itemsets.
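This principle can be checked by brute force on the small lecture data set. The sketch below is only a sanity check under the assumption of a `minimal_support_count` of 3; the helper names `support_count` and `all_itemsets` are made up for this example.

```python
from itertools import combinations

# The lecture data set from above, repeated so that the sketch is self-contained
transactions = [
    {"Beer", "Nuts", "Diapers"},
    {"Beer", "Coffee", "Diapers"},
    {"Beer", "Diapers", "Eggs"},
    {"Nuts", "Eggs", "Milk"},
    {"Nuts", "Coffee", "Diapers", "Eggs"},
]
MINIMAL_SUPPORT_COUNT = 3


def support_count(items, transactions):
    # Number of transactions that contain all of the given items
    return sum(1 for transaction in transactions if items <= transaction)


def all_itemsets(items):
    # Every non-empty combination of the given items
    for size in range(1, len(items) + 1):
        for combination in combinations(items, size):
            yield frozenset(combination)


all_items = sorted(set().union(*transactions))

# Look for a frequent itemset that has an infrequent subset
violations = [
    (subset, superset)
    for superset in all_itemsets(all_items)
    if support_count(superset, transactions) >= MINIMAL_SUPPORT_COUNT
    for subset in all_itemsets(sorted(superset))
    if support_count(subset, transactions) < MINIMAL_SUPPORT_COUNT
]

print(violations)  # [] because no frequent itemset has an infrequent subset
```

The list stays empty for any data set and threshold, because every transaction that contains a superset also contains each of its subsets.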

<div class="alert alert-block alert-info">
    
**Task:** Implement `generate_candidates` so that it can be used to generate length-(k+1) candidate itemsets from length-k frequent itemsets. You are allowed to use `itertools`.
    
</div>

In [None]:
# Implement a function to generate length-(k+1) candidate itemsets from length-k frequent itemsets
def generate_candidates(frequent_k_itemsets):
    # Initialize an ItemsetList
    candidates = ItemsetList([])

    # ...

    # Return the candidates
    return candidates


# Generate the candidates of the second level
two_candidates = generate_candidates(frequent_one_itemsets)
two_candidates.itemsets

In [None]:
# Implement a function to generate length-(k+1) candidate itemsets from length-k frequent itemsets
def generate_candidates(frequent_k_itemsets):
    # Initialize an ItemsetList
    candidates = ItemsetList([])

    # Get k
    k = len(frequent_k_itemsets.itemsets[0].items)

    # Iterate over the frequent_k_itemsets to get all items contained in at least one frequent k-itemset
    items = set()
    for itemset in frequent_k_itemsets.itemsets:
        # Add the items of the itemset to items
        items = items.union(itemset.items)

    # Find all combinations with length k+1
    for combination in itertools.combinations(items, k + 1):
        # Check that all subsets with length k are part of frequent_k_itemsets
        all_k_subsets_are_part_of_frequent_k_itemsets = True

        for i in range(k + 1):
            # Convert combination into a list
            # (== copy of the combination)
            k_subset = list(combination)

            # Remove the i-th element
            k_subset.pop(i)

            # Convert the list into set
            k_subset = set(k_subset)

            # Check if k_subset is contained in frequent_k_itemsets
            if not frequent_k_itemsets.contains_itemset_with_items(k_subset):
                # A k_subset is not part of frequent_k_itemsets
                # => The combination is no candidate for k+1
                all_k_subsets_are_part_of_frequent_k_itemsets = False

                # Of course, we can skip further checking now
                break

        # If all are part of frequent_k_itemsets the combination is a candidate
        if all_k_subsets_are_part_of_frequent_k_itemsets:
            candidates.itemsets.append(Itemset(set(combination), 0))

    # Return the candidates
    return candidates


# Generate the candidates of the second level
two_candidates = generate_candidates(frequent_one_itemsets)
two_candidates.itemsets

In [None]:
# You might want to check the function with another example as well
extra_frequent_two_itemsets = ItemsetList(
    [
        Itemset({"Football", "Shoes"}),
        Itemset({"Football", "Glasses"}),
        Itemset({"Shoes", "Glasses"}),
        Itemset({"Glasses", "Tissues"}),
    ]
)

# Generate the candidates of the third level
extra_three_candidates = generate_candidates(extra_frequent_two_itemsets)
extra_three_candidates.itemsets

After generating candidates, the next step is to scan the dataset to find out how often each candidate occurs.

<div class="alert alert-block alert-info">
    
**Task:** Finalize the `scan_candidates` function, which is used to determine how often each candidate Itemset occurs in the dataset.
    
</div>

In [None]:
# Implement a function to determine how often each candidate Itemset occurs in the dataset
def scan_candidates(dataset, candidates):

    # ...

    # Return the candidates
    return candidates


# Determine how often each candidate Itemset occurs
two_candidates = scan_candidates(dataset, two_candidates)
two_candidates.itemsets

In [None]:
# Implement a function to determine how often each candidate Itemset occurs in the dataset
def scan_candidates(dataset, candidates):
    # Iterate over all transactions
    for transaction in dataset:
        # Get all candidate itemsets that are a subset of the items occurring in the transaction
        subset_candidates = candidates.get_itemsets_with_subset_of_items(transaction)

        # Increase the occurrence count of each candidate Itemset that is a subset of the items in the transaction
        for subset_candidate in subset_candidates:
            subset_candidate.occurrence_count += 1

    # Return the candidates
    return candidates


# Determine how often each candidate Itemset occurs
two_candidates = scan_candidates(dataset, two_candidates)
two_candidates.itemsets

Once the number of occurrences of each candidate has been determined, `prune_itemsets` can be used again to remove the candidates that do not reach the `minimal_support_count` of 3.

In [None]:
# Prune all itemsets below the minimal_support_count of 3
frequent_two_itemsets = prune_itemsets(two_candidates, 3)
frequent_two_itemsets.itemsets

After the initial execution of `generate_one_itemsets`, the functions `prune_itemsets`, `generate_candidates` and `scan_candidates` are executed in a loop until no further candidates or frequent itemsets can be found.
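To validate the result of such a loop on a tiny data set, a brute-force reference can be useful. The sketch below is not A Priori: it simply tests every possible itemset, which is exponential in the number of items and only feasible for very small inputs. The function name `brute_force_frequent_itemsets` is hypothetical.

```python
from itertools import combinations


def brute_force_frequent_itemsets(transactions, minimal_support_count):
    # Collect every item that occurs in at least one transaction
    items = sorted(set().union(*transactions))

    frequent = {}
    # Test every non-empty combination of items (no pruning at all)
    for size in range(1, len(items) + 1):
        for combination in combinations(items, size):
            candidate = frozenset(combination)
            # Count the transactions that contain the whole candidate
            count = sum(1 for transaction in transactions if candidate <= transaction)
            if count >= minimal_support_count:
                frequent[candidate] = count
    return frequent


# The lecture data set from above, repeated so that the sketch is self-contained
transactions = [
    {"Beer", "Nuts", "Diapers"},
    {"Beer", "Coffee", "Diapers"},
    {"Beer", "Diapers", "Eggs"},
    {"Nuts", "Eggs", "Milk"},
    {"Nuts", "Coffee", "Diapers", "Eggs"},
]

reference = brute_force_frequent_itemsets(transactions, 3)
for itemset, count in reference.items():
    print(sorted(itemset), count)
```

For the lecture data set and a threshold of 3, this yields the four frequent 1-itemsets plus the single frequent 2-itemset containing Beer and Diapers, which a correct `a_priori` should reproduce.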

<div class="alert alert-block alert-info">
    
**Task:** Write the function `a_priori` that uses the functions `generate_one_itemsets`, `prune_itemsets`, `generate_candidates` and `scan_candidates` to perform a complete run of A Priori for an arbitrarily large data set.
    
</div>

In [None]:
# Implement an A Priori wrapper
def a_priori(dataset, minimal_support_count):
    # Initialize an ItemsetList
    frequent_itemsets = ItemsetList([])

    # ...

    # Return the frequent_itemsets
    return frequent_itemsets


# Get all frequent itemsets within our dataset satisfying the minimal_support_count of 3
frequent_itemsets = a_priori(dataset, 3)
frequent_itemsets.itemsets

In [None]:
# Implement an A Priori wrapper
def a_priori(dataset, minimal_support_count):
    # Initialize an ItemsetList
    frequent_itemsets = ItemsetList([])

    # Start by generating all 1-itemsets; they are the first candidates for becoming frequent itemsets
    candidate_itemsets = generate_one_itemsets(dataset)

    # Start the loop that will run as long as there are candidate_itemsets
    while len(candidate_itemsets.itemsets) > 0:
        # Prune the candidate itemsets not satisfying the minimal_support_count
        frequent_k_itemsets = prune_itemsets(candidate_itemsets, minimal_support_count)

        # If there are no frequent k-itemsets, we can also break the loop (second termination criterion)
        if len(frequent_k_itemsets.itemsets) == 0:
            break

        # Otherwise we should add the found frequent k-itemsets to the main list of frequent_itemsets
        frequent_itemsets.itemsets.extend(frequent_k_itemsets.itemsets)

        # Prepare the next loop run
        # Generate possible candidates
        candidate_itemsets = generate_candidates(frequent_k_itemsets)

        # Determine how often each candidate occurs
        candidate_itemsets = scan_candidates(dataset, candidate_itemsets)

    # Return the frequent_itemsets
    return frequent_itemsets


# Get all frequent itemsets within our dataset satisfying the minimal_support_count of 3
frequent_itemsets = a_priori(dataset, 3)
frequent_itemsets.itemsets

#### Library: Mlxtend

Of course, it's tedious to program A Priori yourself every time you need it. For this reason, there are already some libraries that contain appropriate methods. On this worksheet we use `mlxtend`.

In [None]:
# Import the required packages of mlxtend
from mlxtend.preprocessing import TransactionEncoder
from mlxtend.frequent_patterns import apriori

To be able to use the function `apriori` from `mlxtend` to get the frequent itemsets contained in our dataset, we first have to transform it into a suitable format.
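To get a feeling for the target layout before consulting the documentation, the sketch below builds a one-hot table by hand with plain Python lists. It does not use `mlxtend`; the alphabetical column order and the names `columns` and `rows` are assumptions of this example.

```python
# The lecture data set from above, repeated so that the sketch is self-contained
transactions = [
    {"Beer", "Nuts", "Diapers"},
    {"Beer", "Coffee", "Diapers"},
    {"Beer", "Diapers", "Eggs"},
    {"Nuts", "Eggs", "Milk"},
    {"Nuts", "Coffee", "Diapers", "Eggs"},
]

# One column per distinct item (alphabetical order is an assumption of this sketch)
columns = sorted(set().union(*transactions))

# One row per transaction: True if the item is part of the transaction, else False
rows = [[item in transaction for item in columns] for transaction in transactions]

print(columns)
for row in rows:
    print(row)
```

Each row has one boolean per item, so the table has as many rows as there are transactions and as many columns as there are distinct items.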

<div class="alert alert-block alert-info">
    
**Task:** Take a look at the `mlxtend` [documentation](http://rasbt.github.io/mlxtend/USER_GUIDE_INDEX/) for information on how the dataset must be structured for `apriori` and preprocess our `dataset` accordingly.
    
</div>

In [None]:
# Preprocess the dataset

In [None]:
# Preprocess the dataset
# Create a TransactionEncoder
transaction_encoder = TransactionEncoder()

# Use the TransactionEncoder to transform the dataset into a one-hot encoded NumPy boolean array
one_hot_encoded_dataset = transaction_encoder.fit(dataset).transform(dataset)

# Transform the one-hot encoded array into a pandas DataFrame
preprocessed_dataset = pd.DataFrame(
    one_hot_encoded_dataset, columns=transaction_encoder.columns_
)
preprocessed_dataset

After this preparation, the frequent itemsets in our dataset can be determined using `apriori`.

<div class="alert alert-block alert-info">
    
**Task:** Using `apriori` from `mlxtend`, determine the frequent itemsets in our dataset. Use a `min_support` comparable to the value we used in the previous section (`minimal_support_count` of 3).
    
</div>

In [None]:
# Use apriori from mlxtend to determine the frequent itemsets in our dataset

In [None]:
# Use apriori from mlxtend to determine the frequent itemsets in our dataset
# min_support has to be 0.6, as there are 5 transactions in our dataset
# (min_support of 0.6 == minimal_support_count of 3 for 5 transactions)
apriori(preprocessed_dataset, min_support=0.6, use_colnames=True)

There are several differences between your own implementation and mlxtend's. 

<div class="alert alert-block alert-info">
    
**Task:** Consider what differences there are between your implementation and `mlxtend`'s implementation of `apriori` for the user of these functions. 
    
</div>

Write down your solution here:

Of course, the individual details also depend on your specific implementation. However, based on our specifications, it is to be expected that at least the following things will differ:

- <b>The format of the input:</b><br /> 
Both `mlxtend`'s implementation and your own require a specific input format: `mlxtend` expects a one-hot encoded pandas DataFrame, whereas your implementation takes a list of sets.

- <b>The format of the output:</b><br /> 
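The output format also differs: `mlxtend` returns a pandas DataFrame with relative support values, whereas your implementation returns an `ItemsetList` with absolute occurrence counts.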

### FP-Growth

<div class="alert alert-block alert-warning">
TODO
</div>

#### Implementation

<div class="alert alert-block alert-warning">
TODO
</div>
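The cells below find the frequent 1-itemsets, sort them into the f-list and then insert every transaction into the FP-tree in f-list order. As a minimal sketch of that ordering step alone, without the `Node` and `Root` classes, the following example reorders each transaction by the f-list. Breaking ties alphabetically is an assumption of this sketch to keep the output deterministic; the notebook's own sort keeps tied items in their existing order.

```python
from collections import Counter

# The lecture data set from above, repeated so that the sketch is self-contained
transactions = [
    {"Beer", "Nuts", "Diapers"},
    {"Beer", "Coffee", "Diapers"},
    {"Beer", "Diapers", "Eggs"},
    {"Nuts", "Eggs", "Milk"},
    {"Nuts", "Coffee", "Diapers", "Eggs"},
]
minimal_support_count = 3

# Count how many transactions contain each item
item_counts = Counter(item for transaction in transactions for item in transaction)

# f-list: frequent items in frequency-descending order
# (assumption of this sketch: ties are broken alphabetically)
f_list = sorted(
    (item for item, count in item_counts.items() if count >= minimal_support_count),
    key=lambda item: (-item_counts[item], item),
)

# Drop infrequent items and bring each transaction into f-list order;
# these ordered lists are the paths that get inserted into the FP-tree
ordered_transactions = [
    [item for item in f_list if item in transaction] for transaction in transactions
]

print(f_list)  # ['Diapers', 'Beer', 'Eggs', 'Nuts']
for ordered_transaction in ordered_transactions:
    print(ordered_transaction)
```

Transactions that share a prefix in this order, such as the three starting with Diapers and Beer, end up sharing nodes in the tree.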

In [None]:
# Class definitions for FP-tree

# The class Node
class Node:
    # Constructor
    def __init__(self, item: str, parent, occurrence_count: int):
        # Save the arguments
        self.item = item
        self.parent = parent
        self.occurrence_count = occurrence_count

        # Set the other parameters used later in the lifespan
        self.childs = list()
        self.node_link = None

    # Output the node
    def print_node(self, level):
        # Print the node itself
        print(
            (" " * (level - 1) * 2)
            + "├── "
            + self.item
            + ": "
            + str(self.occurrence_count)
            + " - Node link: "
            + str(self.node_link)
        )

        # Print the childs
        for child in self.childs:
            child.print_node(level + 1)


# The class Root
class Root:
    # Constructor
    def __init__(self):
        # Set some member variables
        self.item = "Root-Node"
        self.childs = list()

    # Print the tree
    def print_tree(self):
        # Print the root itself
        print(self.item)

        # Print the childs
        for child in self.childs:
            child.print_node(1)


# Override the (data)class Itemset, as we want to be able to save a link to a Node right next to our itemsets
@dataclass
class Itemset:
    # Attributes
    items: set
    occurrence_count: int = 0
    node_link: Node = None

In [None]:
# First step: Find frequent 1-itemsets (we can reuse the method implemented for A Priori)
one_itemsets = generate_one_itemsets(dataset)
frequent_one_itemsets = prune_itemsets(one_itemsets, 3)
frequent_one_itemsets.itemsets

In [None]:
# Second step: Sort the items in frequency-descending order (create the f-list)
frequent_one_itemsets.itemsets.sort(key=lambda x: x.occurrence_count, reverse=True)
frequent_one_itemsets.itemsets

In [None]:
# Third step: Construct the basic FP-tree
def construct_fp_tree(dataset, frequent_one_itemsets):
    # Initialize the root node of the FP-tree
    root = Root()

    # Iterate over all transactions
    for transaction in dataset:
        # Set the root node as current node
        current_node = root

        # Iterate through the sorted frequent_one_itemsets
        for itemset in frequent_one_itemsets.itemsets:
            # Check if the itemset is part of the transaction
            if itemset.items.issubset(transaction):
                # Check if the item is already present
                if len([x for x in current_node.childs if x.item in itemset.items]) > 0:
                    # Set the node with the item to be the current_node
                    current_node = [
                        x for x in current_node.childs if x.item in itemset.items
                    ][0]

                    # Increase the occurrence count of that node
                    current_node.occurrence_count += 1
                else:
                    # Create a new node
                    new_node = Node(list(itemset.items)[0], current_node, 1)

                    # Save the node_link to the last element of the node-link chain
                    if itemset.node_link is None:
                        # If the itemset is not yet linked to a node then set it there
                        itemset.node_link = new_node
                    else:
                        # If it is linked to a node, then follow the node_links until there is no other link
                        node = itemset.node_link
                        while node.node_link is not None:
                            node = node.node_link

                        # Save the node_link
                        node.node_link = new_node

                    # Set it as child of the current_node
                    current_node.childs.append(new_node)

                    # Set the new_node as current_node
                    current_node = new_node

    # Return the root node and therefore the whole tree
    return root


# Construct the FP-Tree
fp_tree = construct_fp_tree(dataset, frequent_one_itemsets)

# Print the FP-Tree
print("FP-Tree:")
fp_tree.print_tree()

# Display the header table
print("\nHeader table:")
for itemset in frequent_one_itemsets.itemsets:
    print(
        str(itemset.items)
        + ": "
        + str(itemset.occurrence_count)
        + " - Node link: "
        + str(itemset.node_link)
    )

#### Library: Mlxtend

Just like with A Priori, there is also a corresponding function for FP-Growth included in `mlxtend`.

In [None]:
# Import the required packages of mlxtend
from mlxtend.frequent_patterns import fpgrowth

Again `mlxtend` expects a certain input format.

<div class="alert alert-block alert-info">
    
**Task:** Take a look at the `mlxtend` [documentation](http://rasbt.github.io/mlxtend/USER_GUIDE_INDEX/) for information on how the dataset must be structured for `fpgrowth` and preprocess our `dataset` accordingly.
    
</div>

In [None]:
# Preprocess the dataset

In [None]:
# Preprocess the dataset
# Create a TransactionEncoder
transaction_encoder = TransactionEncoder()

# Use the TransactionEncoder to transform the dataset into a one-hot encoded NumPy boolean array
one_hot_encoded_dataset = transaction_encoder.fit(dataset).transform(dataset)

# Transform the one-hot encoded array into a pandas DataFrame
preprocessed_dataset = pd.DataFrame(
    one_hot_encoded_dataset, columns=transaction_encoder.columns_
)
preprocessed_dataset

After this preparation, the frequent itemsets in our dataset can be determined using `fpgrowth`.

<div class="alert alert-block alert-info">
    
**Task:** Using `fpgrowth` from `mlxtend`, determine the frequent itemsets in our dataset. Use a `min_support` comparable to the value we used in the previous section (`minimal_support_count` of 3).
    
</div>

In [None]:
# Use fpgrowth from mlxtend to determine the frequent itemsets in our dataset

In [None]:
# Use fpgrowth from mlxtend to determine the frequent itemsets in our dataset
fpgrowth(preprocessed_dataset, min_support=0.6, use_colnames=True)

There are several differences between your own implementation and mlxtend's. 

<div class="alert alert-block alert-info">
    
**Task:** Consider what differences there are between your implementation and `mlxtend`'s implementation of `fpgrowth` for the user of these functions. 
    
</div>

Write down your solution here:

Of course, the individual details also depend on your specific implementation. However, based on our specifications, it is to be expected that at least the following things will differ:

- <b>The format of the input:</b><br /> 
Both `mlxtend`'s implementation and your own require a specific input format: `mlxtend` expects a one-hot encoded pandas DataFrame, whereas your implementation takes a list of sets.

- <b>The format of the output:</b><br /> 
The output format also differs between the two variants.

## Part Two: Mining Frequent Patterns on a real dataset

<div class="alert alert-block alert-warning">
TODO
</div>

## Temporary notes

<div class="alert alert-block alert-warning">
TODO: This section should be removed before the final publication of the exercise  
</div>


In [None]:
# Import required libraries
import os
import tempfile
import sqlite3
import urllib.request
import pandas as pd

from mlxtend.preprocessing import TransactionEncoder
from mlxtend.frequent_patterns import apriori
from mlxtend.frequent_patterns import fpgrowth
from mlxtend.frequent_patterns import association_rules
import mlxtend

In [None]:
# Create a temporary directory
dataset_folder = tempfile.mkdtemp()

# Build path to database
database_path = os.path.join(dataset_folder, "adventure-works.db")

# Get the database
urllib.request.urlretrieve(
    "https://github.com/FAU-CS6/KDD-Databases/raw/main/AdventureWorks/adventure-works.db",
    database_path,
)

# Open connection to the adventure-works.db
connection = sqlite3.connect(database_path)

In [None]:
# Create the clean DataFrame(s)
# Order DataFrame
order_df = pd.read_sql_query(
    "SELECT ReferenceOrderID, COUNT(*) "
    "FROM TransactionHistory h "
    "GROUP BY ReferenceOrderID "
    "ORDER BY COUNT(*)",
    connection,
)

order_df_2 = pd.read_sql_query(
    "SELECT * " "FROM TransactionHistory h",
    connection,
)

order_df_3 = pd.read_sql_query(
    "SELECT ReferenceOrderID, GROUP_CONCAT(ProductID) "
    "FROM TransactionHistory h "
    "GROUP BY ReferenceOrderID ",
    connection,
    index_col="ReferenceOrderID",
)

In [None]:
order_df

In [None]:
order_df_2

In [None]:
list_1 = (
    order_df_2.groupby("ReferenceOrderID")["ProductID"]
    .apply(list)
    .reset_index(name="new")
)
list_1["new"].to_list()

In [None]:
list_1

In [None]:
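# Each transaction has to be a list of items; a plain string would be split into single characters
dataset = [
    ["Beer", "Nuts", "Diapers"],
    ["Beer", "Coffee", "Diapers"],
    ["Beer", "Diapers", "Eggs"],
    ["Nuts", "Eggs", "Milk"],
    ["Nuts", "Coffee", "Diapers", "Eggs"],
]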

te = TransactionEncoder()
te_ary = te.fit(dataset).transform(dataset)
df = pd.DataFrame(te_ary, columns=te.columns_)
df

In [None]:
te = TransactionEncoder()
te_ary = te.fit(list_1["new"].to_list()).transform(list_1["new"].to_list())
df = pd.DataFrame(te_ary, columns=te.columns_)
df

In [None]:
apriori(df, min_support=0.01, use_colnames=True)

In [None]:
frequent_itemsets = fpgrowth(df, min_support=0.01, use_colnames=True)
frequent_itemsets

In [None]:
association_rules(frequent_itemsets, metric="confidence", min_threshold=0.7)

In [None]:
order_df_3

In [None]:
list_2 = order_df_3.values.tolist()
list_2

In [None]:
te = TransactionEncoder()
te_ary = te.fit(list_2).transform(list_2)
df = pd.DataFrame(te_ary, columns=te.columns_)
df

In [None]:
dataset = [
    ["Milk", "Onion", "Nutmeg", "Kidney Beans", "Eggs", "Yogurt"],
    ["Dill", "Onion", "Nutmeg", "Kidney Beans", "Eggs", "Yogurt"],
    ["Milk", "Apple", "Eggs"],
    ["Milk", "Unicorn", "Corn", "Kidney Beans", "Yogurt"],
    ["Corn", "Onion", "Onion", "Kidney Beans", "Ice cream", "Eggs"],
]


te = TransactionEncoder()
te_ary = te.fit(dataset).transform(dataset)
df = pd.DataFrame(te_ary, columns=te.columns_)
df

apriori(df, min_support=0.6, use_colnames=True)

In [None]:
from treelib import Tree, Node
from inspect import getmembers, isfunction

tree = Tree()

node = tree.create_node("Test1", data=1)
node3 = tree.create_node("Test2", data=2, parent=node)
node2 = tree.create_node("Test3", data=3, parent=node)

tree.show()


node.successors(tree.identifier)