# **CPT_S 315 HW2-Coding: Apriori Algorithm (35 points)**
In this homework, you are required to implement the Apriori algorithm and return valid association rules based on the frequent itemsets output by your algorithm.

**Product Recommendations:**

Selling additional products or services to existing customers is called cross-selling. Product recommendation is one form of cross-selling that online retailers use frequently. One simple way to make product recommendations is to recommend products that customers frequently browse together.

Suppose we want to recommend new products to a customer based on the products they have already browsed on the website. Write a program that uses the A-priori algorithm to find products which are frequently browsed together. Fix the support to s = 100 (i.e., product pairs need to occur together at least 100 times to be considered frequent) and find itemsets of size 2, 3 and 4.
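To make the support threshold concrete, here is a minimal sketch of the two-pass idea on a toy set of sessions. The function name `pair_support` and the toy baskets are hypothetical and not part of the assignment files, and the threshold is lowered to 2 so that the tiny example produces a result.

```python
from collections import Counter
from itertools import combinations


def pair_support(baskets, min_sup):
    """Return {pair: count} for item pairs found in at least min_sup baskets."""
    # Pass 1: count single items, each at most once per basket.
    item_counts = Counter(item for basket in baskets for item in set(basket))
    frequent = {item for item, n in item_counts.items() if n >= min_sup}

    # Pass 2: count only pairs whose members are both frequent,
    # since a pair cannot be frequent unless both of its items are.
    pair_counts = Counter()
    for basket in baskets:
        kept = sorted(set(basket) & frequent)
        pair_counts.update(combinations(kept, 2))
    return {pair: n for pair, n in pair_counts.items() if n >= min_sup}


# Hypothetical toy sessions, not taken from the homework datasets.
toy_baskets = [["A", "B", "C"], ["A", "B"], ["A", "C"], ["B", "D"]]
print(pair_support(toy_baskets, min_sup=2))
# prints {('A', 'B'): 2, ('A', 'C'): 2}
```

Item `D` is dropped after the first pass, so no pair containing it is ever counted. The same pruning is what keeps the search for itemsets of size 3 and 4 manageable.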

Use the online browsing behavior dataset provided with this homework. Each line represents a browsing session of a customer. On each line, each string of 8 characters represents the id of an item browsed during that session. The items are separated by spaces.
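As a rough illustration of this format, the sketch below turns text lines into a list of baskets. The helper name `parse_sessions` is an assumption made for this example, and the sample lines are made up (they only reuse item ids that appear in the test cells); the notebook's own testing cells load the data in a slightly different way.

```python
def parse_sessions(lines):
    """Turn an iterable of text lines into a list of baskets (lists of item ids)."""
    baskets = []
    for line in lines:
        # split() with no argument splits on any whitespace, so trailing
        # spaces and the newline character never end up inside an item id.
        items = line.split()
        if items:  # skip blank lines
            baskets.append(items)
    return baskets


# Made-up sample lines in the described format.
sample_lines = ["DAI62779 ELE17451 \n", "FRO40251 SNA80324\n", "\n"]
print(parse_sessions(sample_lines))
# prints [['DAI62779', 'ELE17451'], ['FRO40251', 'SNA80324']]
```

Because a file object iterates over its lines, the same function can be called as `parse_sessions(f)` inside a `with open(...) as f:` block.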

Our aim is to:

**a)** Identify pairs of items (X, Y) such that the support of {X,Y} is at least 100. For all such pairs, compute the confidence scores of the corresponding association rules: X => Y and Y => X. Sort the rules in decreasing order of confidence scores and list the top 5 rules in the writeup. Break ties, if any, by lexicographically increasing order on the left hand side of the rule.

**b)** Identify item triples (X, Y, Z) such that the support of {X,Y,Z} is at least 100. For all such triples, compute the confidence scores of the corresponding association rules: (X,Y) => Z, (X,Z) => Y and (Y,Z) => X. Sort the rules in decreasing order of confidence scores and list the top 5 rules in the writeup. Break ties, if any, by lexicographically increasing order of the first then second item pair.
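The confidence of a rule X => Y is the support of {X,Y} divided by the support of {X}. The sketch below shows one way to rank pair rules under that definition, with the tie-break from part a). The function name `rank_pair_rules` and the support counts are made up for illustration; they are not results from the homework datasets.

```python
def rank_pair_rules(pair_counts, item_counts, top_n=5):
    """Rank rules X => Y by confidence = support({X, Y}) / support({X})."""
    rules = []
    for (x, y), sup_xy in pair_counts.items():
        # Every frequent pair gives two candidate rules.
        rules.append((x, y, sup_xy / item_counts[x]))
        rules.append((y, x, sup_xy / item_counts[y]))
    # Decreasing confidence first, then increasing left-hand side for ties.
    rules.sort(key=lambda rule: (-rule[2], rule[0]))
    return rules[:top_n]


# Made-up support counts for three items and two pairs.
item_counts = {"A": 4, "B": 2, "C": 2}
pair_counts = {("A", "B"): 2, ("A", "C"): 2}
for lhs, rhs, conf in rank_pair_rules(pair_counts, item_counts):
    print(f"{lhs} => {rhs}: {conf:.2f}")
# prints:
# B => A: 1.00
# C => A: 1.00
# A => B: 0.50
# A => C: 0.50
```

Rules for triples can be ranked the same way by dividing the support of {X,Y,Z} by the support of the pair on the left hand side and sorting on the pair for ties.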

### **Requirements:**
Please download the files ``hw_2.ipynb``, ``test-dataset.txt``, and ``browsing-dataset.txt`` from Canvas and complete the code in ``hw_2.ipynb`` following the instructions in the file. You can run and debug your code using Google Colab. After completing the tasks, save ``hw_2.ipynb`` as ``hw_2.py`` by clicking File -> Download -> Download .py. Then upload ``hw_2.py`` to Gradescope, where the autograder will grade your coding homework automatically.

In [537]:
# We provide two sample datasets for debugging.
# The testing dataset used on Gradescope will be different but has the same structure.
# Upload the files "browsing-dataset.txt" and "test-data.txt".
# Select both files and upload them together, or run this cell twice to upload them one by one.
# These two datasets will be used for testing the output of your Apriori algorithm.
# Both files can be downloaded from Canvas.
try:
    from google.colab import files
    uploaded = files.upload()
except ImportError as e:
    pass

Your current working directory should include the following filenames from the assignment:

```
['browsing-dataset.txt', 'test-data.txt', 'Apriori_HW2_v3_solution']
```

# Introduction

In some cells and files you will see code blocks that look like this:

```python
##############################################################################
#                    TODO: Write the equation for a line                     #
##############################################################################
raise NotImplementedError()
##############################################################################
#                              END OF YOUR CODE                              #
##############################################################################
```

You should replace the "raise NotImplementedError()"  with your own code and leave the blocks intact, like this:

```python
##############################################################################
#                    TODO: Write the equation for a line                     #
##############################################################################
y = m * x + b
##############################################################################
#                              END OF YOUR CODE                              #
##############################################################################
```

# **When completing the notebook, please adhere to the following rules:**
## - **Do not write or modify any code outside of code blocks**
## - **Do not add or delete any cells from the notebook. You may add new cells to perform scratch work, but delete them before submitting.**
## - **Make sure you pass at least all the test cases before submitting.**



# **You will only get credit for code that has been run!**.

# **We will not re-run your notebook -- you will get your credits automatically on Gradescope**

In [538]:
from itertools import combinations

**Your Answer is needed in the following cell:**

In [539]:
def generatecandidates(Lk_prev, k):
    """Join step: build size-k candidates from the sorted frequent (k-1)-itemsets.

    Two itemsets are joined when their first k-2 items are identical.
    """
    candidates = []
    len_Lk = len(Lk_prev)
    for i in range(len_Lk):
        for j in range(i + 1, len_Lk):
            # Join only itemsets that share the same (k-2)-item prefix.
            if Lk_prev[i][:k - 2] == Lk_prev[j][:k - 2]:
                # The union has exactly k items because the last items differ.
                candidates.append(sorted(set(Lk_prev[i]) | set(Lk_prev[j])))
    return candidates

def apriori(trans_data: list, min_sup: int, K_max: int) -> list:
    """
    Please include your own implementation of the Apriori algorithm.
    frequent_items is the output, which is a list of frequent itemsets.
    The input of the function:
    trans_data: (type: list) contains the transaction dataset. For example, if we use ''test-data.txt'' as input, then trans_data is:
    [['I1', 'I2', 'I5'], ['I2', 'I4'], ['I2', 'I4'], ['I1', 'I2', 'I4'], ['I1', 'I3'], ['I2', 'I3'], ['I1', 'I3'], ['I1', 'I2', 'I3', 'I5'], ['I1', 'I2', 'I3']]
    You will find how to transform the 'txt' data into a list in the testing cell.
    min_sup: minimum support count. Note that this is the support count, not the (relative) support.

    K_max: the largest size of frequent itemsets in the output. For example, if K_max = 3, then your algorithm will output a list of frequent itemsets:
    [[L_1],[L_2],[L_3]], where L_k represents the set of frequent itemsets of size k.
    For example, for ''test-data.txt'' with min_sup = 2 and K_max = 3, the output is:
    frequent_items = [['I2', 'I1', 'I5', 'I4', 'I3'],[['I1', 'I2'], ['I1', 'I3'], ['I1', 'I5'], ['I2', 'I3'], ['I2', 'I4'], ['I2', 'I5']],[['I1', 'I2', 'I3'], ['I1', 'I2', 'I5']]]
    """
    ##########################################################################
    #                     TODO: Implement this function                      #
    ##########################################################################
    # Pass 1: count every item once per transaction, in first-seen order.
    item_counts = {}
    for transaction in trans_data:
        for item in dict.fromkeys(transaction):
            item_counts[item] = item_counts.get(item, 0) + 1

    # L_1 is reported as a flat list of items.
    frequent_singletons = [item for item, count in item_counts.items() if count >= min_sup]
    frequent_set = set(frequent_singletons)
    frequent_items = [[] for _ in range(K_max)]
    frequent_items[0] = frequent_singletons

    # Candidate generation expects a sorted list of itemsets, so wrap the singletons.
    Lk_prev = sorted([item] for item in frequent_singletons)
    for k in range(2, K_max + 1):
        Ck = generatecandidates(Lk_prev, k)
        candidate_counts = {tuple(candidate): 0 for candidate in Ck}
        for transaction in trans_data:
            # Only frequent items can be part of a frequent k-itemset.
            items = sorted(frequent_set.intersection(transaction))
            for subset in combinations(items, k):
                if subset in candidate_counts:
                    candidate_counts[subset] += 1
        Lk_prev = sorted(list(c) for c, count in candidate_counts.items() if count >= min_sup)
        frequent_items[k - 1] = Lk_prev
    ###########################################################################
    #                            END OF YOUR CODE                             #
    ###########################################################################

    return frequent_items


**Your Answer is needed in the following cell:**



In [540]:
def count_support(trans_data: list, itemset: set) -> int:
    count = 0
    for basket in trans_data:
        if itemset.issubset(set(basket)):
            count += 1
    return count

def gen_confidencePair_TOP5(trans_data : list, min_sup: int)-> list:
    """
    Generates confidence pairs.
    You are supposed to generate the top 5 association rules discovered in the dataset.
    The ranking is based on their confidences.

    This function finds the frequent pairs and
    returns a list of pairs ranked by the corresponding confidence.
    The input of the function:
    trans_data: (type: list) contains the transaction dataset. For example, if we use ''test-data.txt'' as input, then trans_data is:
    [['I1', 'I2', 'I5'], ['I2', 'I4'], ['I2', 'I4'], ['I1', 'I2', 'I4'], ['I1', 'I3'], ['I2', 'I3'], ['I1', 'I3'], ['I1', 'I2', 'I3', 'I5'], ['I1', 'I2', 'I3']]
    You will find how to transform the 'txt' data into a list in the testing cell.
    min_sup: minimum support count. Note that this is the support count, not the (relative) support.

    For the output, if you use the dataset ''test-dataset.txt'' with min_sup = 100, then the output looks like:
    [['FRO40251','GRO85051'], ['FRO40251','FRO92469'],['DAI62779','ELE21353'],['ELE88583','SNA24799'],['SNA53220','SNA93860']]
    where the list ['FRO40251', 'GRO85051'] represents the association rule {'FRO40251'->'GRO85051'}
    """
    ##########################################################################
    #                     TODO: Implement this function                      #
    ##########################################################################
    frequent_items = apriori(trans_data, min_sup, K_max=2)
    rules = []

    for X, Y in frequent_items[1]:
        # X and Y are item ids (strings), so build the sets explicitly;
        # set(X + Y) would produce a set of characters instead.
        support_XY = count_support(trans_data, {X, Y})
        # Each frequent pair yields two candidate rules: X => Y and Y => X.
        for lhs, rhs in ((X, Y), (Y, X)):
            confidence = support_XY / count_support(trans_data, {lhs})
            rules.append((lhs, rhs, confidence))

    # Sort by decreasing confidence; break ties by increasing left-hand side.
    rules.sort(key=lambda rule: (-rule[2], rule[0]))
    confidencePair = [[lhs, rhs] for lhs, rhs, _ in rules[:5]]
    ###########################################################################
    #                            END OF YOUR CODE                             #
    ###########################################################################

    return confidencePair


### Test Case

In [541]:
"""
The following code will be used for testing your results.
"""
#1. Test the frequent itemsets for the test data

#Loading dataset and convert the data into a list of transaction data
file_path = './test-dataset.txt'
file = open(file_path, "r")
trans_data = []
while True:
  line = file.readline()
  # END OF FILE IS REACHED
  if not line:
    file.close()
    break

  # SPLITTING THE ITEMS
  basket = line.split(" ")
  basket.pop()    #FOR REMOVING '\n'
  trans_data.append(list(basket))
# Begin Test
# The ground truth
frequent_items_testing = [['I1', 'I2', 'I3', 'I4', 'I5'],[['I1', 'I2'], ['I1', 'I3'], ['I1', 'I5'], ['I2', 'I3'], ['I2', 'I4'], ['I2', 'I5']],[['I1', 'I2', 'I3'], ['I1', 'I2', 'I5']]]
min_sup = 2
K_max = 3
# min_sup = 15
# K_max = 5
frequent_items = apriori(trans_data,min_sup,K_max)

# Compare converted generated frequent itemsets with expected frequent itemsets
for k, itemset_list in enumerate(frequent_items_testing):

#for k in range(min(K_max, len(frequent_items_testing))):
    itemset_list = frequent_items_testing[k]
    print("K test: ", k + 1)
    print(f"Expected frequent itemsets for size {k + 1}: {itemset_list}")
    print(f"Converted generated frequent itemsets for size {k + 1}: {frequent_items[k]}")
    if itemset_list != frequent_items[k]:
        print("Discrepancy found!")
    else:
        print("Itemsets match.")

for k in range(K_max):
  Lk = frequent_items[k]
  Lktest = frequent_items_testing[k]
  if k == 0:
    assert sorted(Lk)==sorted(Lktest), f"Frequent Itemset with size {k+1} is not correct"
  else:
    for item in Lk:
      if sorted(item) in Lktest:
        Lktest.remove(sorted(item))
    assert len(Lktest)==0, f"Frequent Itemset with size {k+1} is not correct"
print("YOU PASSED THE TEST!")

K test:  1
Expected frequent itemsets for size 1: ['I1', 'I2', 'I3', 'I4', 'I5']
Converted generated frequent itemsets for size 1: ['I1', 'I2', 'I5', 'I4', 'I3']
Discrepancy found!
K test:  2
Expected frequent itemsets for size 2: [['I1', 'I2'], ['I1', 'I3'], ['I1', 'I5'], ['I2', 'I3'], ['I2', 'I4'], ['I2', 'I5']]
Converted generated frequent itemsets for size 2: [['I1', 'I2'], ['I1', 'I3'], ['I1', 'I5'], ['I2', 'I3'], ['I2', 'I4'], ['I2', 'I5']]
Itemsets match.
K test:  3
Expected frequent itemsets for size 3: [['I1', 'I2', 'I3'], ['I1', 'I2', 'I5']]
Converted generated frequent itemsets for size 3: [['I1', 'I2', 'I3'], ['I1', 'I2', 'I5']]
Itemsets match.
YOU PASSED THE TEST!


In [543]:
#2. Test the frequent itemsets for the browsing-dataset
#Loading dataset and convert the data into a list of transaction data
file_path = './browsing-dataset.txt'
file = open(file_path, "r")
trans_data = []
while True:
  line = file.readline()
  # END OF FILE IS REACHED
  if not line:
    file.close()
    break
  # SPLITTING THE ITEMS
  basket = line.split(" ")
  basket.pop()    #FOR REMOVING '\n'
  trans_data.append(list(basket))
# Begin Test
# The ground truth
frequent_items_testing = [['DAI62779', 'DAI75645', 'ELE17451', 'FRO40251'], ['DAI62779', 'DAI75645', 'ELE17451', 'SNA80324'], ['DAI62779', 'DAI75645', 'FRO40251', 'SNA80324'], ['DAI62779', 'ELE17451', 'FRO40251', 'GRO85051'], ['DAI62779', 'ELE17451', 'FRO40251', 'SNA80324'], ['DAI62779', 'ELE17451', 'GRO85051', 'SNA80324'], ['DAI62779', 'FRO19221', 'SNA53220', 'SNA93860'], ['DAI62779', 'FRO40251', 'GRO85051', 'SNA80324'], ['DAI75645', 'ELE17451', 'FRO40251', 'SNA80324'], ['DAI75645', 'FRO40251', 'GRO85051', 'SNA80324'], ['ELE17451', 'FRO40251', 'GRO85051', 'SNA80324']]
# min_sup = 15
# K_max = 5
min_sup = 100
K_max = 4
frequent_items = apriori(trans_data,min_sup,K_max)
        
# Compare converted generated frequent itemsets with expected frequent itemsets
for k, itemset_list in enumerate(frequent_items_testing):

#for k in range(min(K_max, len(frequent_items_testing))):
    itemset_list = frequent_items_testing[k]
    print("K test: ", k + 1)
    print(f"Expected frequent itemsets for size {k + 1}: {itemset_list}")
    print(f"Converted generated frequent itemsets for size {k + 1}: {frequent_items[k]}")
    if itemset_list != frequent_items[k]:
        print("Discrepancy found!")
    else:
        print("Itemsets match.")
        
# For simplicity, we only check the case for L4
Lk = frequent_items[K_max-1]
Lktest = frequent_items_testing
for item in Lk:
  if sorted(item) in Lktest:
    Lktest.remove(sorted(item))
assert len(Lktest)==0, f"Frequent Itemset with size {K_max} is not correct"
print("YOU PASSED THE TEST!")

K test:  1
Expected frequent itemsets for size 1: ['DAI62779', 'DAI75645', 'ELE17451', 'FRO40251']
Converted generated frequent itemsets for size 1: []
Discrepancy found!
K test:  2
Expected frequent itemsets for size 2: ['DAI62779', 'DAI75645', 'ELE17451', 'SNA80324']
Converted generated frequent itemsets for size 2: []
Discrepancy found!
K test:  3
Expected frequent itemsets for size 3: ['DAI62779', 'DAI75645', 'FRO40251', 'SNA80324']
Converted generated frequent itemsets for size 3: []
Discrepancy found!
K test:  4
Expected frequent itemsets for size 4: ['DAI62779', 'ELE17451', 'FRO40251', 'GRO85051']
Converted generated frequent itemsets for size 4: []
Discrepancy found!
K test:  5
Expected frequent itemsets for size 5: ['DAI62779', 'ELE17451', 'FRO40251', 'SNA80324']


IndexError: list index out of range

In [None]:
#3. Test the TOP 5 association rule
#Loading dataset and convert the data into a list of transaction data
file_path = './browsing-dataset.txt'
file = open(file_path, "r")
trans_data = []
while True:
  line = file.readline()
  # END OF FILE IS REACHED
  if not line:
    file.close()
    break
  # SPLITTING THE ITEMS
  basket = line.split(" ")
  basket.pop()    #FOR REMOVING '\n'
  trans_data.append(list(basket))
# Begin Test
# The ground truth
association_rules_testing = [['DAI93865', 'FRO40251'], ['FRO40251', 'GRO85051'], ['FRO40251', 'GRO38636'], ['ELE12951', 'FRO40251'], ['DAI88079', 'FRO40251']]

min_sup = 100
association_rules = gen_confidencePair_TOP5(trans_data,min_sup)
#print(f'association_rules: {association_rules}')

for item in association_rules:
  if sorted(item) in association_rules_testing:
    association_rules_testing.remove(sorted(item))
assert len(association_rules_testing)==0, f"The TOP5 Associations Rules You Discovered are not correct!"
print("YOU PASSED THE TEST!")

AssertionError: The TOP5 Associations Rules You Discovered are not correct!