### Importing the necessary libraries

In [84]:
import pandas as pd
import random
from mlxtend.frequent_patterns import apriori, association_rules
from mlxtend.preprocessing import TransactionEncoder

- This code defines a pool of 8 grocery items and generates 10 fake shopping transactions. Each transaction contains 2 to 5 randomly selected items. The results are stored in a list called transactions and printed out with clear labels like "Transaction 1", "Transaction 2", etc. A random seed is set to make the output reproducible.

In [None]:
# Define a pool of items
item_pool = ['Milk', 'Eggs', 'Butter', 'Cheese', 'Bananas', 'Bread', 'Apples', 'Chicken']

# Generate 10 fake transactions, each with 2–5 random items
random.seed(42)  # for reproducibility
transactions = []
for _ in range(10):
    num_items = random.randint(2, 5)
    transaction = random.sample(item_pool, num_items)
    transactions.append(transaction)

# Display transactions
for i, t in enumerate(transactions, 1):
    print(f"Transaction {i}: {t}")


Transaction 1: ['Apples', 'Bananas']
Transaction 2: ['Bananas', 'Milk', 'Bread']
Transaction 3: ['Milk', 'Bananas', 'Apples', 'Eggs', 'Butter']
Transaction 4: ['Eggs', 'Bread', 'Butter', 'Bananas', 'Apples']
Transaction 5: ['Bananas', 'Chicken']
Transaction 6: ['Apples', 'Butter', 'Cheese', 'Chicken']
Transaction 7: ['Cheese', 'Chicken', 'Bread']
Transaction 8: ['Butter', 'Bananas', 'Apples', 'Chicken', 'Cheese']
Transaction 9: ['Bananas', 'Butter']
Transaction 10: ['Apples', 'Chicken', 'Bananas']


- This code extracts all unique items from the transaction list, encodes each transaction by marking whether each item is present (True) or not (False), and creates a one-hot encoded DataFrame where rows represent transactions and columns represent items.
- Encoding the transactions is important because mlxtend's apriori function expects a one-hot encoded DataFrame (True/False or 1/0 values). From this format it can compute item frequencies, identify frequent itemsets, and provide the input for generating association rules.

In [86]:
# Step 1: Extract all unique items from the transactions
all_items = sorted(set(item for transaction in transactions for item in transaction))

# Step 2: Manually one-hot encode
Encoded_data = []
for transaction in transactions:
    Encoded_data.append({item: (item in transaction) for item in all_items})

# Step 3: Create a DataFrame
df_new = pd.DataFrame(Encoded_data)

- The section of code below applies the Apriori algorithm to the one-hot encoded transaction DataFrame df_new, using a minimum support threshold of 0.3. An itemset must therefore appear in at least 30% of the transactions (3 of the 10 here) to be considered frequent.
- The use_colnames=True argument ensures that the item names (e.g., 'Apples', 'Bananas') are shown instead of their column indices.
- The result, stored in frequent_itemsets, is printed and shows all item combinations (itemsets) that meet or exceed the support threshold.

In [None]:
frequent_itemsets = apriori(df_new, min_support=0.3, use_colnames=True)
print(frequent_itemsets)


    support                   itemsets
0       0.6                   (Apples)
1       0.8                  (Bananas)
2       0.3                    (Bread)
3       0.5                   (Butter)
4       0.3                   (Cheese)
5       0.5                  (Chicken)
6       0.5          (Bananas, Apples)
7       0.4           (Apples, Butter)
8       0.3          (Apples, Chicken)
9       0.4          (Bananas, Butter)
10      0.3         (Bananas, Chicken)
11      0.3          (Cheese, Chicken)
12      0.3  (Bananas, Apples, Butter)
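
- To see where the support values above come from, the sketch below recomputes them by hand in plain Python. It uses brute-force enumeration of every item combination rather than Apriori's level-wise pruning, so it is only practical for a tiny dataset like this one. The helper names support and frequent are hypothetical and not part of mlxtend.

```python
from itertools import combinations

# The 10 transactions printed earlier, hard-coded so this sketch is self-contained
transactions = [
    ['Apples', 'Bananas'],
    ['Bananas', 'Milk', 'Bread'],
    ['Milk', 'Bananas', 'Apples', 'Eggs', 'Butter'],
    ['Eggs', 'Bread', 'Butter', 'Bananas', 'Apples'],
    ['Bananas', 'Chicken'],
    ['Apples', 'Butter', 'Cheese', 'Chicken'],
    ['Cheese', 'Chicken', 'Bread'],
    ['Butter', 'Bananas', 'Apples', 'Chicken', 'Cheese'],
    ['Bananas', 'Butter'],
    ['Apples', 'Chicken', 'Bananas'],
]

def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= set(t))
    return hits / len(transactions)

# Brute force (not Apriori): test every combination of every size
items = sorted({item for t in transactions for item in t})
frequent = {
    combo: support(combo, transactions)
    for size in range(1, len(items) + 1)
    for combo in combinations(items, size)
    if support(combo, transactions) >= 0.3
}

print(support({'Apples'}, transactions))             # 0.6
print(support({'Apples', 'Bananas'}, transactions))  # 0.5
print(support({'Milk'}, transactions))               # 0.2, below the 0.3 threshold
print(len(frequent))                                 # 13 itemsets, as in the table above
```

Milk and Eggs each appear in only 2 of the 10 transactions, which is why neither shows up in the frequent itemsets.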


- Association rules are generated below using confidence as the evaluation metric and a minimum threshold of 0.7, meaning a rule is kept only if at least 70% of the transactions that contain the antecedent also contain the consequent. The rules DataFrame contains key metrics such as support, confidence, and lift, which help in understanding the strength and relevance of each rule. The rules are then sorted by confidence, and the two with the highest confidence are displayed.

In [90]:

# Generate rules using confidence >= 0.7
rules = association_rules(frequent_itemsets, metric="confidence", min_threshold=0.7)
# Sort rules by confidence in descending order and select the top 2
top_rules = rules.sort_values(by='confidence', ascending=False).head(2)

# Display the top 2 rules (top_rules already holds exactly two rows)
print("\nAssociation Rules:\n", top_rules[['antecedents', 'consequents', 'support', 'confidence', 'lift']])



Association Rules:
   antecedents consequents  support  confidence      lift
3    (Cheese)   (Chicken)      0.3    1.000000  2.000000
0    (Apples)   (Bananas)      0.5    0.833333  1.041667
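
- The confidence and lift values in the table above can be checked by hand from the standard definitions: confidence(A → B) = support(A and B) / support(A), and lift(A → B) = confidence(A → B) / support(B). The sketch below is plain Python with hypothetical helper names (support, confidence, lift); it does not call mlxtend.

```python
# The 10 transactions printed earlier, hard-coded so this sketch is self-contained
transactions = [
    ['Apples', 'Bananas'],
    ['Bananas', 'Milk', 'Bread'],
    ['Milk', 'Bananas', 'Apples', 'Eggs', 'Butter'],
    ['Eggs', 'Bread', 'Butter', 'Bananas', 'Apples'],
    ['Bananas', 'Chicken'],
    ['Apples', 'Butter', 'Cheese', 'Chicken'],
    ['Cheese', 'Chicken', 'Bread'],
    ['Butter', 'Bananas', 'Apples', 'Chicken', 'Cheese'],
    ['Bananas', 'Butter'],
    ['Apples', 'Chicken', 'Bananas'],
]

def support(itemset):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    return sum(1 for t in transactions if itemset <= set(t)) / len(transactions)

def confidence(antecedent, consequent):
    """Share of antecedent transactions that also contain the consequent."""
    return support(set(antecedent) | set(consequent)) / support(antecedent)

def lift(antecedent, consequent):
    """Confidence relative to how common the consequent is overall."""
    return confidence(antecedent, consequent) / support(consequent)

for a, c in [({'Cheese'}, {'Chicken'}), ({'Apples'}, {'Bananas'})]:
    print(f"{a} -> {c}: confidence={confidence(a, c):.3f}, lift={lift(a, c):.3f}")
```

A lift near 1 means the antecedent barely changes how likely the consequent is, while a lift of 2 means the consequent is twice as likely as its overall frequency would suggest.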


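## Explanation of the Apples → Bananas rule in everyday life:
If someone buys Apples, there is an 83.3% chance they will also buy Bananas, so a store could place these items closer together or bundle them for promotions. Note that the lift of about 1.04 is close to 1: Bananas already appear in 80% of all transactions, so buying Apples raises the chance of buying Bananas only slightly. The Cheese → Chicken rule, with a confidence of 1.0 and a lift of 2.0, is the stronger association in this dataset.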