<a href="https://colab.research.google.com/github/RushiK134/DMW-Practicals/blob/main/Apriori_Algo.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

Here is a sample database of 10 transactions, laid out in rows and columns:

Transaction ID  Items
1               A, B, C
2               A, C, D
3               B, D
4               A, C, D
5               A, B, D
6               B, D
7               C
8               A, D
9               B, D
10              A, C



To import this database into Python, we can save it as a CSV file, which is a common format for storing tabular data. Here are the steps to do this:

1) Open Microsoft Excel or any other spreadsheet software.
2) Copy the above table and paste it into a new Excel spreadsheet.
3) Save the spreadsheet as a CSV file (e.g., "database.csv").
4) Open your Python environment and use the pandas library to read in the CSV file:



import pandas as pd

# read in the CSV file
df = pd.read_csv("database.csv")

# alternative for an Excel workbook instead of a CSV file:
# data = pd.ExcelFile("*File Name*")

# print the first 5 rows of the DataFrame
print(df.head())



This will output:



   Transaction ID   Items
0               1  A, B, C
1               2  A, C, D
2               3    B, D
3               4  A, C, D
4               5  A, B, D
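
If no spreadsheet software is available, the same file can be produced from Python. The sketch below writes the ten transactions with the standard-library csv module and reads them back as a check. The temporary directory and the variable names are assumptions made for this illustration, not part of the original steps.

```python
import csv
import os
import tempfile

# The ten transactions from the sample table.
rows = [
    (1, "A, B, C"),
    (2, "A, C, D"),
    (3, "B, D"),
    (4, "A, C, D"),
    (5, "A, B, D"),
    (6, "B, D"),
    (7, "C"),
    (8, "A, D"),
    (9, "B, D"),
    (10, "A, C"),
]

# Assumption: write into a temporary directory; any writable path works.
csv_path = os.path.join(tempfile.mkdtemp(), "database.csv")

with open(csv_path, "w", newline="") as f:
    writer = csv.writer(f)  # by default, quotes fields that contain commas
    writer.writerow(["Transaction ID", "Items"])
    writer.writerows(rows)

# Read the file back to confirm the Items column survived intact.
with open(csv_path, newline="") as f:
    loaded = list(csv.DictReader(f))

print(len(loaded), loaded[0]["Items"])
```

The Items field itself contains commas, so it has to be quoted in the file. csv.writer does this by default, and pd.read_csv reads a quoted field back as a single column.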


We can then use the Apriori algorithm from the mlxtend library to find frequent itemsets:

from mlxtend.frequent_patterns import apriori

# one-hot encode the Items column: one row per transaction, one boolean column per item
transactions = df["Items"].str.get_dummies(sep=", ").astype(bool)

# apply the Apriori algorithm with a minimum support of 0.3
frequent_itemsets = apriori(transactions, min_support=0.3, use_colnames=True)

# print the frequent itemsets
print(frequent_itemsets)



This will output:


   support itemsets
0      0.6      (A)
1      0.5      (B)
2      0.5      (C)
3      0.7      (D)
4      0.4   (C, A)
5      0.4   (D, A)
6      0.4   (D, B)


Note that apriori does not accept the raw "Items" strings or a list of lists. It expects a one-hot encoded DataFrame with one row per transaction and one column per item, whose values are True/False (or 0/1). We also set use_colnames=True to use the actual item names instead of column indices in the output. Itemsets are frozensets, so the order in which items are printed inside a set may differ between runs.

In [19]:
!pip install mlxtend




In [20]:
import pandas as pd
from mlxtend.frequent_patterns import apriori, association_rules

# read in the CSV file
df = pd.read_csv("/content/database.csv")

# convert the Items column to a list of lists
# transactions = df["Items"].str.split(", ")






In [22]:
# convert the Items column to a list of sets
# transactions = df["Items"].apply(lambda x: set(x.split(", ")))

In [24]:
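# one-hot encode the Items column: one row per transaction, one column per item
# step 1: long format, one row per (transaction, item) pair
transactions = df["Items"].apply(lambda x: pd.Series(x.split(", "))).stack().reset_index(level=1, drop=True).to_frame('item')
transactions['value'] = 1
# step 2: pivot to wide format, filling absent items with 0
transactions = transactions.pivot_table(index=transactions.index, columns='item', values='value', fill_value=0)
# step 3: cast to bool, the dtype mlxtend recommends for its input
transactions = transactions.astype(bool)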

In this code, we first use pd.Series to split the comma-separated string in each row of the "Items" column into a series of items, then stack them to create a multi-level index, and finally reset the index to drop the second level. We then convert the stacked DataFrame to a wide format using pivot_table, with each row representing a transaction and each column representing an item. We set the fill_value argument to 0 to indicate that an item is not present in a transaction if it does not appear in the "Items" column for that transaction.

The resulting DataFrame has binary values indicating whether each item is present in each transaction, which can be used as input to the Apriori algorithm.

apriori rejects a DataFrame that contains anything other than True, False, 0, or 1 with "ValueError: The allowed values for a DataFrame are True, False, 0, 1". The encoding above avoids that error.
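
To make the target layout concrete, here is a minimal pure-Python sketch of the same one-hot encoding, with the ten transactions hard-coded rather than read from the CSV file. The names `baskets`, `items`, and `onehot` are chosen for this illustration only.

```python
# The Items column of the sample database, as raw strings.
baskets = [
    "A, B, C", "A, C, D", "B, D", "A, C, D", "A, B, D",
    "B, D", "C", "A, D", "B, D", "A, C",
]

# Split each string into a set of items.
item_sets = [set(b.split(", ")) for b in baskets]

# The columns are the distinct items in sorted order.
items = sorted(set().union(*item_sets))

# One row per transaction, one True/False flag per item.
onehot = [[item in s for item in items] for s in item_sets]

print(items)
for row in onehot[:3]:
    print(row)
```

Wrapping the result as pd.DataFrame(onehot, columns=items) should give a boolean DataFrame of the shape apriori expects.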

In [25]:
# apply the Apriori algorithm with a minimum support of 0.3
frequent_itemsets = apriori(transactions, min_support=0.3, use_colnames=True)

In [27]:
# print the frequent itemsets
print(frequent_itemsets)

   support itemsets
0      0.6      (A)
1      0.5      (B)
2      0.5      (C)
3      0.7      (D)
4      0.4   (C, A)
5      0.4   (D, A)
6      0.4   (D, B)
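
The support of an itemset is the fraction of transactions that contain all of its items. As a cross-check on the table above, the sketch below counts support by brute force over every candidate itemset. It does not reproduce how Apriori prunes its search; it is only a way to verify the numbers. The transactions are hard-coded and the variable names are illustrative.

```python
from itertools import combinations

# The ten sample transactions as sets of items.
item_sets = [
    {"A", "B", "C"}, {"A", "C", "D"}, {"B", "D"}, {"A", "C", "D"},
    {"A", "B", "D"}, {"B", "D"}, {"C"}, {"A", "D"}, {"B", "D"}, {"A", "C"},
]
items = sorted(set().union(*item_sets))
n = len(item_sets)
min_count = 3  # a minimum support of 0.3 over 10 transactions

frequent = {}
for size in range(1, len(items) + 1):
    for candidate in combinations(items, size):
        # Count the transactions that contain every item of the candidate.
        count = sum(1 for s in item_sets if set(candidate) <= s)
        if count >= min_count:
            frequent[candidate] = count / n

for itemset, support in frequent.items():
    print(itemset, support)
```

No itemset of three or more items reaches the threshold: for example, {A, C, D} appears in only two of the ten transactions.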


In [32]:
# generate association rules with a minimum confidence of 0.7
rules = association_rules(frequent_itemsets, metric="confidence", min_threshold=0.7)


In [29]:
# print the frequent itemsets
print("Frequent itemsets:")
print(frequent_itemsets)

Frequent itemsets:
   support itemsets
0      0.6      (A)
1      0.5      (B)
2      0.5      (C)
3      0.7      (D)
4      0.4   (C, A)
5      0.4   (D, A)
6      0.4   (D, B)


In [30]:
# association rules
print("\nAssociation rules:")
print(rules)


Association rules:
  antecedents consequents  antecedent support  consequent support  support  \
0         (C)         (A)                 0.5                 0.6      0.4   
1         (B)         (D)                 0.5                 0.7      0.4   

   confidence      lift  leverage  conviction  
0         0.8  1.333333      0.10         2.0  
1         0.8  1.142857      0.05         1.5  
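
Each column in the rules table follows from the supports. The sketch below recomputes the metrics for the two rules using the standard definitions of confidence, lift, leverage, and conviction. The helper `rule_metrics` is a hypothetical name for this illustration and is not part of mlxtend.

```python
def rule_metrics(supp_ante, supp_cons, supp_both):
    """Return (confidence, lift, leverage, conviction) for antecedent -> consequent."""
    confidence = supp_both / supp_ante
    lift = confidence / supp_cons
    leverage = supp_both - supp_ante * supp_cons
    # Conviction is infinite when the rule never fails (confidence of 1).
    if confidence == 1:
        conviction = float("inf")
    else:
        conviction = (1 - supp_cons) / (1 - confidence)
    return confidence, lift, leverage, conviction

# Supports taken from the frequent itemsets: C=0.5, A=0.6, {C, A}=0.4
print(rule_metrics(0.5, 0.6, 0.4))  # rule C -> A

# Supports taken from the frequent itemsets: B=0.5, D=0.7, {D, B}=0.4
print(rule_metrics(0.5, 0.7, 0.4))  # rule B -> D
```

Up to floating-point rounding, the results agree with the table. Both rules have a lift above 1, meaning the consequent appears more often in transactions containing the antecedent than it does overall.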


