# **IMPORTING THE NECESSARY LIBRARIES**



In [3]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from mlxtend.frequent_patterns import apriori,association_rules
from mlxtend.preprocessing import TransactionEncoder
import warnings
warnings.filterwarnings('ignore')

# **Loading and Data preprocessing**

In [4]:
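# --- 1. Load the Dataset ---
# Note: the file has no header row, so with the default settings pandas uses the
# first transaction as the column name (visible in the head() output below).
# Passing header=None to read_excel would keep that transaction as data.
data = pd.read_excel("Online retail.xlsx")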
print("Dataset loaded successfully!")
print(data.head())

# --- 2. Check Missing Values ---
print('Missing values:', data.isnull().sum())

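# --- 3. Remove Duplicates ---
# Caution: identical rows may be separate purchases of the same basket, so
# dropping them lowers the support of the most common baskets.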
print(f"Original transaction count: {len(data)}")
data.drop_duplicates(inplace=True)
print(f"Transaction count after removing duplicates: {len(data)}")

# --- 4. Prepare Data for Analysis ---
# Each row holds one transaction as a single comma-separated string of items.
# Apriori needs each transaction as a list of items, so split every row on
# commas and strip surrounding whitespace. Rows with null values are skipped here.
transactions = [
    [item.strip() for item in str(row).split(',')]
    for row in data.iloc[:, 0] if pd.notnull(row)  # Skip rows with null values
]

# --- 5. One-Hot Encode the Transactions ---
# This converts your 'transactions' list into a True/False DataFrame
print("Encoding transactions...")
te = TransactionEncoder()
te.fit(transactions)  # Learns the unique items
te_arr = te.transform(transactions)  # Encodes transactions
te_encoded = pd.DataFrame(te_arr, columns=te.columns_)

print("Encoding complete. Here is the one-hot encoded data:")
print(te_encoded.head())

# --- 6. Run the Apriori Algorithm ---
print('---Running the Apriori algorithm---')
print('**Finding frequent itemsets**')

# Minimum support threshold = 0.003 (0.3%).
# use_colnames=True returns item names in the itemsets instead of column indices.
frequent_itemsets = apriori(te_encoded, min_support=0.003, use_colnames=True)

# Worked example of the threshold, assuming roughly 7,500 transactions:
# 7,500 × 0.003 = 22.5, so an itemset must appear in at least 23 transactions
# to count as frequent. The exact cutoff depends on len(te_encoded), which is
# smaller than the raw row count once duplicates and nulls are removed.

print('===Successfully applied Apriori===')
print('*Top 10 Frequent Itemsets*')
print(frequent_itemsets.nlargest(10, 'support'))


Dataset loaded successfully!
  shrimp,almonds,avocado,vegetables mix,green grapes,whole weat flour,yams,cottage cheese,energy drink,tomato juice,low fat yogurt,green tea,honey,salad,mineral water,salmon,antioxydant juice,frozen smoothie,spinach,olive oil
0                             burgers,meatballs,eggs                                                                                                                                                                             
1                                            chutney                                                                                                                                                                             
2                                     turkey,avocado                                                                                                                                                                             
3  mineral water,milk,energy bar,whole wheat rice...               
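
The parsing, one-hot encoding, and minimum-support steps above can be sketched in plain Python to show what each one produces. This is only an illustration on invented rows, not the mlxtend implementation; the names `raw_rows` and `min_count` are hypothetical.

```python
import math

# Toy rows in the same comma-separated format as the spreadsheet
# (illustrative data, not taken from the dataset)
raw_rows = [
    "burgers, meatballs,eggs",
    "chutney",
    "turkey,avocado",
    "eggs,avocado",
]

# Split each row into a list of whitespace-stripped item names
transactions = [[item.strip() for item in row.split(",")] for row in raw_rows]

# Sorted vocabulary of unique items; this plays the role of te.columns_
columns = sorted({item for basket in transactions for item in basket})

# One boolean per (transaction, item) pair, like the array from te.transform
encoded = [[item in basket for item in columns] for basket in transactions]

# Support of a single item = fraction of transactions that contain it
support = {
    item: sum(row[i] for row in encoded) / len(encoded)
    for i, item in enumerate(columns)
}

def min_count(n_transactions, min_support):
    """Smallest whole number of transactions that satisfies min_support."""
    return math.ceil(n_transactions * min_support)

print(columns)
print(support)
print(min_count(7500, 0.003))
```

Each row of `encoded` corresponds to one transaction and each column to one item, which is the shape Apriori expects. The `min_count` helper reproduces the 22.5 → 23 rounding described in the comments of the cell above.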

# **Association Rule Mining:**

In [5]:
# Generate association rules
print('---Generating Association rules---')
# Keep rules with at least 20% confidence, then filter to lift > 3
CONFIDENCE_THRESHOLD = 0.2
LIFT_THRESHOLD = 3
rules = association_rules(frequent_itemsets, metric='confidence', min_threshold=CONFIDENCE_THRESHOLD)

print(f"Found {len(rules)} rules with > {CONFIDENCE_THRESHOLD*100}% confidence.")
meaningful_rules = rules[rules['lift'] > LIFT_THRESHOLD]
print(f"Found {len(meaningful_rules)} meaningful rules with lift > {LIFT_THRESHOLD}.")

print('rules generated successfully!')

print('---Top 10 strongest rules---')
print(rules.sort_values(by='lift', ascending=False).head(10))


---Generating Association rules---
Found 2666 rules with > 20.0% confidence.
Found 85 meaningful rules with lift > 3.
rules generated successfully!
---Top 10 strongest rules---
                                    antecedents  \
2555                      (olive oil, tomatoes)   
2515                  (frozen vegetables, soup)   
2501                      (shrimp, ground beef)   
1691                (parmesan cheese, tomatoes)   
2030         (whole wheat pasta, mineral water)   
2506             (frozen vegetables, olive oil)   
2301                 (herb & pepper, chocolate)   
1690       (frozen vegetables, parmesan cheese)   
2298  (herb & pepper, mineral water, chocolate)   
395                                     (pasta)   

                         consequents  antecedent support  consequent support  \
2555  (frozen vegetables, spaghetti)            0.010435            0.039034   
2515           (mineral water, milk)            0.011594            0.067826   
2501  (frozen vegetabl

# **Analysis and interpretation:**

After applying the Apriori algorithm with a minimum support of 0.003 and generating rules with a confidence threshold of 0.2, the analysis found 2,666 rules, 85 of which have a lift above 3. These rules show which items tend to be bought together and offer insight into customer purchasing behavior.

Items such as mineral water, eggs, and spaghetti had the highest support, which marks them as staples that appear in many transactions. The rules included patterns such as customers who bought spaghetti also buying mineral water, and customers who bought chocolate tending to buy milk or eggs.

Confidence measures how reliable a rule is: a confidence of 0.7 for A → B means that 70% of the transactions containing A also contain B. Lift measures the strength of the relationship by comparing that confidence with how often B is bought overall. A lift above 1 indicates a positive association (the further above 1, the stronger), a lift near 1 suggests the items are bought independently, and a lift below 1 suggests that buying one item makes the other less likely. For example, a lift of 1.5 for “(spaghetti) → (mineral water)” means customers who buy spaghetti are 1.5 times as likely to buy mineral water as customers in general.
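
To make the definitions concrete, here is a minimal sketch that computes support, confidence, and lift by hand. The baskets are invented for illustration and the helper names are hypothetical; mlxtend's `association_rules` reports the same quantities in its `support`, `confidence`, and `lift` columns.

```python
# Toy baskets, invented for illustration (not taken from the dataset)
baskets = [
    {"spaghetti", "mineral water"},
    {"spaghetti", "mineral water", "eggs"},
    {"spaghetti", "eggs"},
    {"mineral water"},
    {"eggs", "milk"},
    {"milk"},
]

def support(itemset, baskets):
    """Fraction of baskets that contain every item in itemset."""
    itemset = set(itemset)
    return sum(itemset <= basket for basket in baskets) / len(baskets)

def confidence(antecedent, consequent, baskets):
    """Of the baskets containing the antecedent, the share that also contain the consequent."""
    both = set(antecedent) | set(consequent)
    return support(both, baskets) / support(antecedent, baskets)

def lift(antecedent, consequent, baskets):
    """Confidence relative to how often the consequent is bought overall."""
    return confidence(antecedent, consequent, baskets) / support(consequent, baskets)

a, c = {"spaghetti"}, {"mineral water"}
print(round(support(a | c, baskets), 2))
print(round(confidence(a, c, baskets), 2))
print(round(lift(a, c, baskets), 2))
```

In this toy data spaghetti appears in 3 of 6 baskets and 2 of those also contain mineral water, so the confidence is 2/3. Mineral water appears in half of all baskets, so the lift is (2/3) / (1/2) = 4/3: spaghetti buyers are somewhat more likely than average to buy mineral water.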

Insights:

Mineral water appears alongside many items. Because it is also one of the most frequent items overall, its pairings are best judged by lift rather than by co-occurrence alone.

Spaghetti and eggs act as core ingredients, often paired with milk or chocolate.
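
When reading insights like these, it helps to check whether a pairing reflects a real link or only the popularity of one item. The sketch below uses hypothetical counts (not taken from the dataset) and a hypothetical helper, `rule_metrics`, to contrast the two cases.

```python
def rule_metrics(n_total, n_antecedent, n_consequent, n_both):
    """Confidence and lift of antecedent -> consequent from raw transaction counts."""
    confidence = n_both / n_antecedent
    lift = confidence / (n_consequent / n_total)
    return confidence, lift

# Hypothetical counts: the consequent is in 80% of all baskets
popular = rule_metrics(n_total=1000, n_antecedent=100, n_consequent=800, n_both=80)

# Hypothetical counts: the consequent is in only 5% of all baskets
specific = rule_metrics(n_total=1000, n_antecedent=100, n_consequent=50, n_both=30)

for name, (conf, lift) in [("popular", popular), ("specific", specific)]:
    print(f"{name}: confidence={conf:.2f}, lift={lift:.2f}")
```

The first rule has high confidence (0.80) but a lift of 1, because the consequent is bought that often regardless of the antecedent. The second has lower confidence (0.30) but a lift of about 6, which points to a much stronger association. This is why the rules in this notebook are filtered by lift as well as by confidence.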

These insights help retailers design combo offers, improve store layout, and optimize product recommendations to boost cross-selling.

Overall, the Apriori results highlight strong, actionable purchasing patterns that can guide marketing and sales strategies.