## Simulating Data
In this section, we simulate transaction data: 20 records in which a customer walks into the store and buys a minimum of 2 and a maximum of 5 items.

The purpose of this project is to predict which items are likely to appear together when a customer is shopping. This will help the store make decisions about store layout, product placement, and promotions.

In [None]:
# Importing necessary libraries
import pandas as pd
import random
from mlxtend.frequent_patterns import apriori, association_rules

# Creating a pool of goods
goods = ['Milk', 'Bread', 'Beer', 'Diaper', 'Eggs', 'Butter', 'Coffee', 'Cereal', 'Salt']

# Set seed
random.seed(123)

# Simulating 20 transactions
transactions = []
for _ in range(20):
    num_goods = random.randint(2, 5)
    transaction = random.sample(goods, num_goods)
    transactions.append(transaction)

# Display results
df = pd.DataFrame(transactions)
df

Unnamed: 0,0,1,2,3,4
0,Eggs,Bread,,,
1,Eggs,Bread,Coffee,Milk,Diaper
2,Butter,Milk,Bread,Coffee,
3,Salt,Butter,Cereal,Bread,
4,Milk,Coffee,Cereal,,
5,Coffee,Bread,,,
6,Butter,Cereal,,,
7,Milk,Bread,,,
8,Beer,Milk,Salt,,
9,Cereal,Eggs,Diaper,Milk,Beer


## Converting Data to One-Hot Encoded Data
In this next step, we convert the data to a one-hot encoded format in preparation for the Apriori algorithm.

In [134]:
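# One-hot encode using pandas
# Note: on this frame, get_dummies creates one column per position/item pair (e.g. 0_Beer), not one per item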
encoded_df = pd.get_dummies(df)
print("\nOne-Hot Encoded DataFrame: ")
encoded_df


One-Hot Encoded DataFrame: 


Unnamed: 0,0_Beer,0_Bread,0_Butter,0_Cereal,0_Coffee,0_Diaper,0_Eggs,0_Milk,0_Salt,1_Bread,...,2_Eggs,2_Salt,3_Bread,3_Butter,3_Coffee,3_Diaper,3_Eggs,3_Milk,4_Beer,4_Diaper
0,False,False,False,False,False,False,True,False,False,True,...,False,False,False,False,False,False,False,False,False,False
1,False,False,False,False,False,False,True,False,False,True,...,False,False,False,False,False,False,False,True,False,True
2,False,False,True,False,False,False,False,False,False,False,...,False,False,False,False,True,False,False,False,False,False
3,False,False,False,False,False,False,False,False,True,False,...,False,False,True,False,False,False,False,False,False,False
4,False,False,False,False,False,False,False,True,False,False,...,False,False,False,False,False,False,False,False,False,False
5,False,False,False,False,True,False,False,False,False,True,...,False,False,False,False,False,False,False,False,False,False
6,False,False,True,False,False,False,False,False,False,False,...,False,False,False,False,False,False,False,False,False,False
7,False,False,False,False,False,False,False,True,False,True,...,False,False,False,False,False,False,False,False,False,False
8,True,False,False,False,False,False,False,False,False,False,...,False,True,False,False,False,False,False,False,False,False
9,False,False,False,True,False,False,False,False,False,False,...,False,False,False,False,False,False,False,True,True,False
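
Note that the column names above, such as `0_Beer` and `1_Bread`, combine a column position with an item, so the same item appearing in different positions is counted in separate columns. A minimal sketch of an item-level encoding, with one boolean entry per item, is shown below. It uses only the first three transactions displayed earlier, and the names `sample_transactions`, `items`, and `rows` are illustrative rather than part of the original notebook.

```python
# First three transactions from the simulated output above (illustrative subset)
sample_transactions = [
    ['Eggs', 'Bread'],
    ['Eggs', 'Bread', 'Coffee', 'Milk', 'Diaper'],
    ['Butter', 'Milk', 'Bread', 'Coffee'],
]

# Collect every distinct item, sorted for a stable column order
items = sorted({item for t in sample_transactions for item in t})

# One row per transaction, one True/False entry per item
rows = [{item: (item in t) for item in items} for t in sample_transactions]

for row in rows:
    print(row)
```

Passing `rows` to `pd.DataFrame` should give a boolean frame with one column per item, which is the layout `apriori` expects.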


## Finding Frequent Itemsets Using the Apriori Algorithm
We will set the minimum support threshold to 30%.

In [135]:
# Find frequent itemsets using Apriori with a minimum support threshold
frequent_itemsets = apriori(encoded_df, min_support = 0.3, use_colnames=True)

print("\nFrequent Itemsets (min_support = 0.3): ")
frequent_itemsets


Frequent Itemsets (min_support = 0.3): 


Unnamed: 0,support,itemsets
0,0.3,(1_Cereal)
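
The support of an itemset is the fraction of transactions that contain every item in it. As a rough sketch, the hypothetical helper `support` below computes this by hand for the ten transactions displayed in the simulated output. Because the full simulation has 20 transactions, these values are for illustration only and will not match the `apriori` output.

```python
# The ten transactions shown in the simulated output (a subset of the 20)
shown = [
    {'Eggs', 'Bread'},
    {'Eggs', 'Bread', 'Coffee', 'Milk', 'Diaper'},
    {'Butter', 'Milk', 'Bread', 'Coffee'},
    {'Salt', 'Butter', 'Cereal', 'Bread'},
    {'Milk', 'Coffee', 'Cereal'},
    {'Coffee', 'Bread'},
    {'Butter', 'Cereal'},
    {'Milk', 'Bread'},
    {'Beer', 'Milk', 'Salt'},
    {'Cereal', 'Eggs', 'Diaper', 'Milk', 'Beer'},
]

def support(itemset, transactions):
    """Fraction of transactions containing every item in itemset (hypothetical helper)."""
    hits = sum(1 for t in transactions if set(itemset) <= t)
    return hits / len(transactions)

print(support({'Bread'}, shown))          # 6 of the 10 transactions contain Bread
print(support({'Milk', 'Bread'}, shown))  # 3 of the 10 contain both Milk and Bread
```

With a 30% threshold, an itemset is kept only if it appears in at least 3 of every 10 transactions.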


## Generating Rules
In this stage, we will generate rules from the frequent itemsets. We will set the metric to `confidence` and the minimum threshold to 0.7.

In [136]:
# Generate association rules for a minimum confidence threshold
rules = association_rules(frequent_itemsets, metric="confidence", min_threshold=0.7)

print("\nAssociation Rules (metric='confidence', min_threshold=0.7):")

# Display up to 2 rules
if not rules.empty:
    print("\nExample Rules:")
    if len(rules) >= 1:
        rule1 = rules.iloc[0]
        print(f"\nRule 1: If customers buy {list(rule1['antecedents'])}, they are likely to buy {list(rule1['consequents'])}")
        print(f"   Confidence: {rule1['confidence']:.2f}")

    if len(rules) >= 2:
        rule2 = rules.iloc[1]
        print(f"\nRule 2: If customers buy {list(rule2['antecedents'])}, they are likely to buy {list(rule2['consequents'])}")
        print(f"   Confidence: {rule2['confidence']:.2f}")
else:
    print("\nNo association rules found with the given thresholds. Try lowering min_support or min_threshold.")
rules


Association Rules (metric='confidence', min_threshold=0.7):

No association rules found with the given thresholds. Try lowering min_support or min_threshold.


Unnamed: 0,antecedents,consequents,antecedent support,consequent support,support,confidence,lift,representativity,leverage,conviction,zhangs_metric,jaccard,certainty,kulczynski


## Explanation of a Rule

Rule Example: {Bread} -> {Beer}
Confidence: 0.7

What this rule means in real life:

This illustrative rule suggests that about 70% of customers who buy Bread also buy Beer. In a grocery store context, this would be a valuable insight: it indicates a strong co-occurrence of these two items in shopping baskets. A store manager could use this information to, for example, place Bread and Beer closer together to make shopping more convenient, or place related items (like butter or jam) near both to encourage impulse purchases. This kind of pattern helps in optimizing store layout, product placement, and promotional strategies.
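
The confidence of a rule A -> B is the number of transactions containing both A and B divided by the number containing A. A minimal sketch follows, using the ten transactions displayed in the simulated output (not the full 20) and a hypothetical helper named `confidence`; the results are for illustration and are not the notebook's own output.

```python
# The ten transactions shown in the simulated output (a subset of the 20)
shown = [
    {'Eggs', 'Bread'},
    {'Eggs', 'Bread', 'Coffee', 'Milk', 'Diaper'},
    {'Butter', 'Milk', 'Bread', 'Coffee'},
    {'Salt', 'Butter', 'Cereal', 'Bread'},
    {'Milk', 'Coffee', 'Cereal'},
    {'Coffee', 'Bread'},
    {'Butter', 'Cereal'},
    {'Milk', 'Bread'},
    {'Beer', 'Milk', 'Salt'},
    {'Cereal', 'Eggs', 'Diaper', 'Milk', 'Beer'},
]

def confidence(antecedent, consequent, transactions):
    """Share of antecedent baskets that also contain the consequent (hypothetical helper)."""
    a = set(antecedent)
    both = a | set(consequent)
    n_a = sum(1 for t in transactions if a <= t)
    n_both = sum(1 for t in transactions if both <= t)
    return n_both / n_a if n_a else 0.0

print(confidence({'Bread'}, {'Milk'}, shown))  # 3 of the 6 Bread baskets contain Milk
print(confidence({'Bread'}, {'Beer'}, shown))  # no Bread basket in this subset contains Beer
```

A rule passes the 0.7 threshold only when at least 70% of the baskets containing the antecedent also contain the consequent.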