# Exploring Frequent Itemsets: Closed vs Maximal

This notebook simulates transaction data for a supermarket scenario and applies frequent pattern mining with the Apriori algorithm.
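For orientation, here is a minimal pure-Python sketch of Apriori's level-wise search. It counts single items, then repeatedly joins frequent (k-1)-itemsets into k-item candidates and prunes any candidate that has an infrequent subset before counting its support. The function name `apriori`, the toy baskets, and the 0.5 threshold are illustrative assumptions, not the implementation this notebook uses later.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {frozenset(itemset): support} for every itemset with support >= min_support.

    Support is the fraction of transactions that contain the whole itemset.
    """
    baskets = [frozenset(t) for t in transactions]
    n = len(baskets)

    def support(itemset):
        # Fraction of baskets containing every item of `itemset`
        return sum(1 for b in baskets if itemset <= b) / n

    # Level 1: frequent single items
    current = {}
    for item in sorted({i for b in baskets for i in b}):
        s = support(frozenset([item]))
        if s >= min_support:
            current[frozenset([item])] = s

    frequent = dict(current)
    k = 2
    while current:
        prev = list(current)
        # Join step: unions of two frequent (k-1)-itemsets that have exactly k items
        candidates = {a | b for a, b in combinations(prev, 2) if len(a | b) == k}
        # Prune step (Apriori property): every (k-1)-subset must itself be frequent
        candidates = {
            c for c in candidates
            if all(frozenset(sub) in current for sub in combinations(c, k - 1))
        }
        # Count step: keep candidates that meet the threshold
        current = {}
        for c in candidates:
            s = support(c)
            if s >= min_support:
                current[c] = s
        frequent.update(current)
        k += 1
    return frequent


# Hypothetical toy baskets for illustration
toy = [["milk", "bread"], ["milk", "bread", "eggs"], ["bread", "eggs"], ["milk", "eggs"]]
result = apriori(toy, min_support=0.5)
for itemset, s in sorted(result.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), s)
```

Counting support by rescanning every basket keeps the sketch short; library implementations typically use more efficient counting structures.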

In this section, we simulate transaction data that will later be used to identify:
- Frequent Itemsets
- Closed Frequent Itemsets
- Maximal Frequent Itemsets
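The three categories differ only in how an itemset relates to its supersets. An itemset is *closed* if no proper superset has the same support, and *maximal* if no proper superset is frequent at all, so every maximal itemset is also closed. The sketch below illustrates this on a hypothetical four-transaction dataset using brute-force enumeration, which is feasible only for a handful of items; it is not the notebook's own method.

```python
from itertools import combinations

# Hypothetical toy data, for illustration only
transactions = [
    {"milk", "bread"},
    {"milk", "bread", "eggs"},
    {"milk", "bread", "eggs"},
    {"eggs"},
]
min_support = 0.5
n = len(transactions)

# Brute-force enumeration of all frequent itemsets (only viable for tiny item sets)
items = sorted(set().union(*transactions))
frequent = {}
for k in range(1, len(items) + 1):
    for combo in combinations(items, k):
        itemset = frozenset(combo)
        support = sum(itemset <= t for t in transactions) / n
        if support >= min_support:
            frequent[itemset] = support

# Closed: no proper superset has the same support
closed = {
    s for s in frequent
    if not any(s < t and frequent[t] == frequent[s] for t in frequent)
}

# Maximal: no proper superset is frequent at all
maximal = {s for s in frequent if not any(s < t for t in frequent)}

print("closed:", [sorted(s) for s in closed])
print("maximal:", [sorted(s) for s in maximal])
```

Here `{eggs}` is closed but not maximal, `{milk}` is frequent but neither closed nor maximal, and `{milk, bread, eggs}` is the only maximal itemset. Checking closedness against frequent supersets alone is sufficient, because any superset with the same support as a frequent itemset is itself frequent.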


## Task 1: Simulate Supermarket Transaction Data

In this section, we simulate 3,000 supermarket transactions.  
Each transaction contains between 2 and 7 distinct items (inclusive), sampled without replacement from a pool of 30 unique items.  
The resulting dataset is saved to `supermarket_transactions.csv` for the frequent itemset mining in later tasks.
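Because every item is drawn uniformly and independently of the others, the simulated data has no built-in associations, and expected supports can be worked out in advance, which helps when choosing a minimum-support threshold. A transaction of length k contains a fixed m-itemset with probability C(30 - m, k - m) / C(30, k), averaged over the six equally likely lengths 2 to 7. The helper name `expected_support` is hypothetical; this is a back-of-the-envelope check assuming exactly the sampling scheme described above.

```python
from fractions import Fraction
from math import comb

POOL_SIZE = 30           # unique items in the item pool
LENGTHS = range(2, 8)    # random.randint(2, 7) includes both endpoints
NUM_TRANSACTIONS = 3000


def expected_support(itemset_size):
    """Exact probability that a fixed itemset of this size appears in one transaction."""
    total = Fraction(0)
    for k in LENGTHS:
        if k < itemset_size:
            continue  # a k-item basket cannot contain a larger itemset
        # random.sample draws k distinct items uniformly, so a fixed m-itemset
        # is included with probability C(30 - m, k - m) / C(30, k)
        total += Fraction(comb(POOL_SIZE - itemset_size, k - itemset_size),
                          comb(POOL_SIZE, k))
    return total / len(LENGTHS)  # each length is equally likely


for m in (1, 2, 3):
    p = expected_support(m)
    print(f"size {m}: support {p} ~ {float(p):.4f}, "
          f"about {float(p * NUM_TRANSACTIONS):.1f} of {NUM_TRANSACTIONS} transactions")
```

This works out to 3/20 = 0.15 for a single item (about 450 of 3,000 transactions), 28/1305 ≈ 0.021 for a pair (about 64), and 1/348 ≈ 0.0029 for a triple (about 9). A threshold well above about 0.03 would therefore likely leave only single items frequent.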

---

### Student Responsible: Selmah Tzindori

```python
# [Student: Selmah Tzindori] Simulation of 3,000 supermarket transactions and export to CSV

# Import the random module to help us randomly select items for each transaction
import random

# Import pandas for working with structured data like tables and CSV files
import pandas as pd

# Step 1: Define a list (pool) of 30 unique supermarket items
# These will be randomly picked to form each transaction
item_pool = [
    'milk', 'bread', 'eggs', 'cheese', 'butter', 'juice', 'apples', 'bananas', 'oranges', 'grapes',
    'cereal', 'chocolate', 'yogurt', 'chicken', 'beef', 'pasta', 'rice', 'tomatoes', 'onions', 'potatoes',
    'carrots', 'lettuce', 'beans', 'soda', 'water', 'coffee', 'tea', 'cookies', 'ice cream', 'toilet paper'
]

# Step 2: Set the number of transactions to simulate
num_transactions = 3000  # Total number of customers or baskets

# Create an empty list that will hold each simulated transaction
transactions = []

# Step 3: Loop 3,000 times to create each transaction
for _ in range(num_transactions):
    # Randomly choose how many items this transaction holds (randint includes both 2 and 7)
    transaction_length = random.randint(2, 7)

    # Randomly select 'transaction_length' distinct items (sampling without replacement)
    transaction = random.sample(item_pool, transaction_length)

    # Add the generated transaction (a list of items) to our list of all transactions
    transactions.append(transaction)

# Step 4: Convert the list of transactions into a format suitable for saving to CSV
# Each transaction will become one string, with items separated by commas
transaction_strings = [', '.join(t) for t in transactions]

# Create a pandas DataFrame with one column called 'Transaction'
# Each row in the DataFrame represents a customer transaction
transactions_df = pd.DataFrame({'Transaction': transaction_strings})

# Step 5: Save the DataFrame to a CSV file
# This file will be used in the next steps of the project (frequent itemset mining)
transactions_df.to_csv('supermarket_transactions.csv', index=False)

# Step 6: Show the first 5 transactions to check the output looks correct
transactions_df.head()
```


|   | Transaction |
|---|---|
| 0 | rice, water, bananas, cheese, potatoes, grapes |
| 1 | toilet paper, onions, beef, potatoes, lettuce |
| 2 | pasta, carrots |
| 3 | onions, soda |
| 4 | bananas, toilet paper |
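Later mining steps need each row split back into a list of items. A minimal sketch, assuming the `", "` separator used when exporting: `load_transactions` and `one_hot` are hypothetical helper names, and an in-memory copy of three rows from the output above stands in for the real file. The boolean one-hot layout is the input format expected by, for example, mlxtend's `apriori`, if that library is used later.

```python
import io

import pandas as pd


def load_transactions(source):
    """Read the exported CSV and split each row back into a list of items."""
    df = pd.read_csv(source)
    # The export joined items with ", ", so split on exactly that separator;
    # multi-word items such as "toilet paper" stay intact
    return df["Transaction"].str.split(", ").tolist()


def one_hot(transactions):
    """Return a boolean DataFrame: one row per transaction, one column per item."""
    items = sorted({item for basket in transactions for item in basket})
    rows = [{item: item in basket for item in items} for basket in map(set, transactions)]
    return pd.DataFrame(rows, columns=items)


# In-memory stand-in for supermarket_transactions.csv (rows 2-4 of the output above)
sample_csv = io.StringIO(
    "Transaction\n"
    '"pasta, carrots"\n'
    '"onions, soda"\n'
    '"bananas, toilet paper"\n'
)
baskets = load_transactions(sample_csv)

# Sanity check mirroring the simulation: 2-7 distinct items per basket
assert all(2 <= len(b) <= 7 and len(set(b)) == len(b) for b in baskets)

encoded = one_hot(baskets)
print(encoded)
```

On the real export, `load_transactions("supermarket_transactions.csv")` should return 3,000 baskets. No random seed is set in the simulation cell, so rerunning it produces different transactions from those shown; calling `random.seed(...)` before the loop would make the run reproducible.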
