## Association Rule Mining with Simulated Data

Objective: Simulate basic transactional data and use association rules to uncover shopping patterns.

### 1. Simulate Transaction Data

- Create at least **10 fake transactions** in Python.

- Each transaction should have **2-5 items** selected from a pool of **at least 8 unique items** (e.g., Bread, Milk, Eggs, etc.)


In [2]:
# Loading necessary libraries
import pandas as pd
import random
from mlxtend.frequent_patterns import apriori, association_rules

In [3]:
# Set seed for reproducibility
random.seed(123)

# Creating a product pool 
products = ['chevdo', 'shrikhand', 'chakri', 'sev', 'ladoo', 
           'rasmalai', 'barfi', 'gathiya', 'bhusu', 'farsipuri', 'khoya']

# Generating 10 random transactions
dataset = []
for _ in range(10):
    # Randomly choose 2-5 items per transaction
    transaction = random.sample(products, k=random.randint(2,5))
    dataset.append(transaction)

# Displaying the simulated data
print("Generated Transactions:")
for i, t in enumerate(dataset, 1):
    print(f"Transaction {i}: {t}")


Generated Transactions:
Transaction 1: ['ladoo', 'shrikhand']
Transaction 2: ['ladoo', 'shrikhand', 'chevdo', 'barfi', 'khoya']
Transaction 3: ['rasmalai', 'chevdo', 'chakri', 'bhusu']
Transaction 4: ['bhusu', 'rasmalai', 'sev', 'chakri']
Transaction 5: ['barfi', 'shrikhand']
Transaction 6: ['shrikhand', 'chevdo', 'rasmalai', 'gathiya', 'farsipuri']
Transaction 7: ['shrikhand', 'chakri']
Transaction 8: ['chevdo', 'ladoo', 'barfi']
Transaction 9: ['ladoo', 'gathiya', 'chevdo', 'khoya', 'chakri']
Transaction 10: ['sev', 'farsipuri', 'bhusu', 'rasmalai', 'chevdo']
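Before mining, it can help to confirm that the simulated data actually meets the task's constraints. The sketch below is a minimal check, assuming the ten transactions printed above are hard-coded (so it runs without re-simulating); `check_requirements` is a hypothetical helper, not part of any library.

```python
# Hard-coded copy of the ten simulated transactions printed above.
transactions = [
    ['ladoo', 'shrikhand'],
    ['ladoo', 'shrikhand', 'chevdo', 'barfi', 'khoya'],
    ['rasmalai', 'chevdo', 'chakri', 'bhusu'],
    ['bhusu', 'rasmalai', 'sev', 'chakri'],
    ['barfi', 'shrikhand'],
    ['shrikhand', 'chevdo', 'rasmalai', 'gathiya', 'farsipuri'],
    ['shrikhand', 'chakri'],
    ['chevdo', 'ladoo', 'barfi'],
    ['ladoo', 'gathiya', 'chevdo', 'khoya', 'chakri'],
    ['sev', 'farsipuri', 'bhusu', 'rasmalai', 'chevdo'],
]

products = ['chevdo', 'shrikhand', 'chakri', 'sev', 'ladoo',
            'rasmalai', 'barfi', 'gathiya', 'bhusu', 'farsipuri', 'khoya']


def check_requirements(transactions, pool, min_transactions=10,
                       min_items=2, max_items=5, min_pool=8):
    """Hypothetical helper: return a list of problems (empty means all checks pass)."""
    problems = []
    if len(set(pool)) < min_pool:
        problems.append(f"pool has only {len(set(pool))} unique items")
    if len(transactions) < min_transactions:
        problems.append(f"only {len(transactions)} transactions")
    for i, t in enumerate(transactions, 1):
        # Each transaction must have 2-5 distinct items drawn from the pool.
        if not min_items <= len(t) <= max_items:
            problems.append(f"transaction {i} has {len(t)} items")
        if len(set(t)) != len(t):
            problems.append(f"transaction {i} repeats an item")
        unknown = set(t) - set(pool)
        if unknown:
            problems.append(f"transaction {i} has unknown items {sorted(unknown)}")
    return problems


print(check_requirements(transactions, products))  # → []
```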


### 2. Analyze with Apriori

- Convert the data into a **one-hot encoded format** using pandas.

- Use the Apriori algorithm (mlxtend) to find frequent itemsets.

- Set minimum support to **0.3** (30%).

In [4]:
# Converting the data to one-hot encoded DataFrame
all_items = sorted(set(item for transaction in dataset for item in transaction))
encoded_data = []

for transaction in dataset:
    encoded_data.append({item: (item in transaction) for item in all_items})

df = pd.DataFrame(encoded_data)

# using the Apriori algorithm to find frequent itemsets with a minimum support of 0.3
frequent_itemsets = apriori(df, min_support=0.3, use_colnames=True)

# Frequent itemset: a group of items that appears together in at least a
# specified fraction of the transactions (here, 30%).
# min_support=0.3: keep only itemsets present in at least 30% of the
# transactions, i.e. at least 3 of the 10 simulated transactions.

# Displaying the results
print("Frequent Itemsets:\n", frequent_itemsets)


Frequent Itemsets:
    support            itemsets
0      0.3             (barfi)
1      0.3             (bhusu)
2      0.4            (chakri)
3      0.6            (chevdo)
4      0.4             (ladoo)
5      0.4          (rasmalai)
6      0.5         (shrikhand)
7      0.3   (bhusu, rasmalai)
8      0.3     (ladoo, chevdo)
9      0.3  (chevdo, rasmalai)
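To see roughly what `apriori` does under the hood, here is a minimal pure-Python sketch of the level-wise Apriori idea: join frequent (k-1)-itemsets into k-item candidates, prune any candidate that has an infrequent subset, then count support. It is an illustration, not mlxtend's implementation; `apriori_sketch` is a hypothetical name, and the transactions are hard-coded from the output above. With `min_support=0.3` it should find the same ten itemsets as the table.

```python
from itertools import combinations

# Hard-coded copy of the ten simulated transactions printed earlier.
transactions = [
    ['ladoo', 'shrikhand'],
    ['ladoo', 'shrikhand', 'chevdo', 'barfi', 'khoya'],
    ['rasmalai', 'chevdo', 'chakri', 'bhusu'],
    ['bhusu', 'rasmalai', 'sev', 'chakri'],
    ['barfi', 'shrikhand'],
    ['shrikhand', 'chevdo', 'rasmalai', 'gathiya', 'farsipuri'],
    ['shrikhand', 'chakri'],
    ['chevdo', 'ladoo', 'barfi'],
    ['ladoo', 'gathiya', 'chevdo', 'khoya', 'chakri'],
    ['sev', 'farsipuri', 'bhusu', 'rasmalai', 'chevdo'],
]


def apriori_sketch(transactions, min_support):
    """Return {frozenset(itemset): support} for every frequent itemset."""
    baskets = [frozenset(t) for t in transactions]
    n = len(baskets)

    def support(itemset):
        # Fraction of baskets that contain every item in `itemset`.
        return sum(itemset <= basket for basket in baskets) / n

    # Level 1: frequent single items.
    items = {item for basket in baskets for item in basket}
    current = {frozenset([item]) for item in items
               if support(frozenset([item])) >= min_support}
    frequent = {s: support(s) for s in current}

    k = 2
    while current:
        # Join: merge frequent (k-1)-itemsets whose union has exactly k items.
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # Prune: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {c for c in candidates
                      if all(frozenset(sub) in current
                             for sub in combinations(c, k - 1))}
        # Count: keep the candidates that meet the support threshold.
        current = {c for c in candidates if support(c) >= min_support}
        frequent.update({s: support(s) for s in current})
        k += 1
    return frequent


result = apriori_sketch(transactions, min_support=0.3)
for itemset, sup in sorted(result.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(f"{sup:.1f}  {sorted(itemset)}")
```

The prune step is what makes Apriori cheaper than brute force: a candidate such as {bhusu, chevdo, rasmalai} is discarded without counting, because its subset {bhusu, chevdo} is not frequent.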


### 3. Generate Rules 

- Generate association rules with:
    
    - Metric: confidence

    - Minimum threshold: 0.7
    
- Show **at least 2 rules** and briefly explain **what one rule means** in everyday language.
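For reference, the three metrics reported below can be written as follows for a rule $X \Rightarrow Y$ (antecedents $X$, consequents $Y$) over $N$ transactions. These are the standard definitions; the symbols are chosen here for illustration.

$$\operatorname{support}(X \Rightarrow Y) = \operatorname{support}(X \cup Y) = \frac{\#\{\text{transactions containing } X \cup Y\}}{N}$$

$$\operatorname{confidence}(X \Rightarrow Y) = \frac{\operatorname{support}(X \cup Y)}{\operatorname{support}(X)}$$

$$\operatorname{lift}(X \Rightarrow Y) = \frac{\operatorname{confidence}(X \Rightarrow Y)}{\operatorname{support}(Y)}$$

A lift above 1 means $Y$ appears more often in transactions containing $X$ than in the dataset as a whole.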

In [None]:
# Generate association rules with a minimum confidence threshold of 0.7
rules = association_rules(frequent_itemsets, metric='confidence', min_threshold=0.7)

# Displaying the results
print("\nAssociation Rules:\n", rules[['antecedents', 'consequents', 'support', 'confidence', 'lift']])

# Column meanings:
# - 'antecedents': the "if" side of the rule (items already in the basket).
# - 'consequents': the "then" side (items likely to be bought as well).
# - 'support': the fraction of all transactions that contain the antecedents
#   and consequents together.
# - 'confidence': the fraction of transactions containing the antecedents that
#   also contain the consequents.
# - 'lift': confidence divided by the consequents' own support; values above 1
#   mean the consequents are bought more often alongside the antecedents than
#   in transactions overall.



Association Rules:
   antecedents consequents  support  confidence  lift
0     (bhusu)  (rasmalai)      0.3        1.00  2.50
1  (rasmalai)     (bhusu)      0.3        0.75  2.50
2     (ladoo)    (chevdo)      0.3        0.75  1.25
3  (rasmalai)    (chevdo)      0.3        0.75  1.25
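The rule-generation step can also be sketched by hand: for each frequent itemset, try every non-empty proper subset as the antecedent, keep the rule if its confidence reaches the threshold, and compute its lift. This is a simplified illustration rather than mlxtend's code; `generate_rules` is a hypothetical helper, and the supports are copied from the frequent-itemset table above.

```python
from itertools import combinations

# Supports copied from the frequent-itemset table above (min_support=0.3).
supports = {
    frozenset({'barfi'}): 0.3,
    frozenset({'bhusu'}): 0.3,
    frozenset({'chakri'}): 0.4,
    frozenset({'chevdo'}): 0.6,
    frozenset({'ladoo'}): 0.4,
    frozenset({'rasmalai'}): 0.4,
    frozenset({'shrikhand'}): 0.5,
    frozenset({'bhusu', 'rasmalai'}): 0.3,
    frozenset({'ladoo', 'chevdo'}): 0.3,
    frozenset({'chevdo', 'rasmalai'}): 0.3,
}


def generate_rules(supports, min_confidence):
    """Return (antecedent, consequent, support, confidence, lift) tuples."""
    rules = []
    for itemset, itemset_support in supports.items():
        if len(itemset) < 2:
            continue  # a rule needs at least one item on each side
        # Every non-empty proper subset can serve as the antecedent.
        for size in range(1, len(itemset)):
            for combo in combinations(sorted(itemset), size):
                antecedent = frozenset(combo)
                consequent = itemset - antecedent
                confidence = itemset_support / supports[antecedent]
                if confidence >= min_confidence:
                    lift = confidence / supports[consequent]
                    rules.append((sorted(antecedent), sorted(consequent),
                                  itemset_support, round(confidence, 2), round(lift, 2)))
    return rules


for rule in generate_rules(supports, min_confidence=0.7):
    print(rule)
```

With a threshold of 0.7 it keeps the same four rules as the table; chevdo → ladoo and chevdo → rasmalai are dropped because their confidence is only 0.3 / 0.6 = 0.5.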


### Briefly describing at least 2 rules


#### Rule 1:

antecedents: (bhusu)

consequents: (rasmalai)

support: 0.3

confidence: 1.00

lift: 2.50

**Explanation:**

- antecedents (bhusu): the "if" part of the rule; it applies to transactions that contain 'bhusu'.

- consequents (rasmalai): the "then" part; when a customer buys 'bhusu', they also tend to buy 'rasmalai'.

- support 0.3: 30% of all transactions in the dataset (3 of 10) include both 'bhusu' and 'rasmalai'.

- confidence 1.00: every transaction that contains 'bhusu' also contains 'rasmalai'. Simply put, in this dataset, everyone who bought 'bhusu' also bought 'rasmalai'.

- lift 2.50: customers who buy 'bhusu' are 2.5 times as likely to buy 'rasmalai' as the average customer (100% versus a baseline of 40%), an increase of 150%.
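These numbers can be checked by counting transactions: 'bhusu' appears in transactions 3, 4 and 10, all of which also contain 'rasmalai', and 'rasmalai' appears in 4 of the 10 transactions.

$$\text{support} = \frac{3}{10} = 0.3, \qquad \text{confidence} = \frac{0.3}{0.3} = 1.00, \qquad \text{lift} = \frac{1.00}{0.4} = 2.50$$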

**Meaning:**

If a customer buys bhusu, we can be highly confident that they will also buy rasmalai, although the rule rests on only three transactions. This may suggest that they like spicy-sweet food.


#### Rule 2:

antecedents: (ladoo)

consequents: (chevdo)

support: 0.3

confidence: 0.75

lift: 1.25

**Explanation:**

- antecedents (ladoo): the "if" part of the rule; it applies to transactions that contain 'ladoo'.

- consequents (chevdo): the "then" part; when a customer buys 'ladoo', they also tend to buy 'chevdo'.

- support 0.3: 30% of all transactions in the dataset (3 of 10) include both 'ladoo' and 'chevdo'.

- confidence 0.75: 75% of the transactions that contain 'ladoo' (3 of 4) also contain 'chevdo'. Simply put, three out of four customers who bought 'ladoo' also bought 'chevdo'.

- lift 1.25: customers who buy 'ladoo' are 1.25 times as likely to buy 'chevdo' as the average customer (75% versus a baseline of 60%), an increase of 25%.
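The same counting check works here: 'ladoo' appears in transactions 1, 2, 8 and 9, three of which (2, 8 and 9) also contain 'chevdo', while 'chevdo' appears in 6 of the 10 transactions overall.

$$\text{support} = \frac{3}{10} = 0.3, \qquad \text{confidence} = \frac{0.3}{0.4} = 0.75, \qquad \text{lift} = \frac{0.75}{0.6} = 1.25$$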

**Meaning:**

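If a customer buys ladoo, they are fairly likely to also buy chevdo. The link is weaker than in Rule 1 (lift 1.25 versus 2.50), partly because chevdo is already the most popular item, appearing in 60% of transactions. This may suggest that they like sweet-spicy food.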