
## Market Basket Analysis / Association Rule Mining Using the Apriori and FP-Growth Algorithms

**Load the dataset:** We load the Online Retail dataset from the UCI Machine Learning Repository.

**Data preprocessing:** We clean the data by stripping whitespace from item descriptions, dropping rows with missing invoice numbers, converting invoice numbers to strings, and removing canceled transactions (those with 'C' in the invoice number).

**Basket analysis:** We create a basket of transactions for customers in France, where each row represents a transaction (invoice) and each column represents a product. Each cell holds the total quantity of that product in that transaction; products absent from a transaction are filled with 0.

**Encoding the data:** We encode the quantities as binary flags: 1 (True) if the product was bought in that transaction (quantity of at least 1), and 0 (False) otherwise.
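
To make the pivot and encoding steps concrete, here is a minimal sketch on a hypothetical three-invoice table. The `toy` data and its product names are invented for illustration and are not part of the Online Retail dataset; only the column names are assumed to match it. It shows how duplicate invoice lines are summed, how missing products become 0, and how quantities collapse to 1/0 flags.

```python
import pandas as pd

# Hypothetical toy data (not from the Online Retail dataset).
toy = pd.DataFrame({
    "InvoiceNo":   ["1001", "1001", "1001", "1002", "1003", "1003"],
    "Description": ["TEA", "MUG", "TEA", "MUG", "TEA", "SPOON"],
    "Quantity":    [2, 1, 3, 4, 1, 6],
})

# One row per invoice, one column per product, values = summed quantities.
toy_basket = (toy.groupby(["InvoiceNo", "Description"])["Quantity"]
                 .sum().unstack().fillna(0))

# Any quantity of at least 1 becomes 1; everything else becomes 0.
toy_sets = (toy_basket >= 1).astype(int)

print(toy_basket)
print(toy_sets)
```

Invoice 1001 lists TEA twice (2 and 3 units), so its basket cell is 5 before encoding and 1 afterwards.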

**Apriori algorithm:** We apply the Apriori algorithm with a minimum support of 0.07 to find frequent itemsets, then generate association rules from those itemsets, keeping rules with a lift of at least 1.
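
The level-wise idea behind Apriori can be sketched in plain Python. This is an illustrative sketch, not the mlxtend implementation: the function name `apriori_sketch` and the toy transactions are hypothetical. It relies on the property that every subset of a frequent itemset must itself be frequent, so candidates with an infrequent subset are pruned before their support is counted.

```python
from itertools import combinations

def apriori_sketch(transactions, min_support):
    """Return {frozenset(itemset): support} for itemsets meeting min_support."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)

    def support(itemset):
        # Fraction of transactions that contain every item in the itemset.
        return sum(itemset <= t for t in transactions) / n

    # Level 1: frequent single items.
    items = {item for t in transactions for item in t}
    current = {}
    for item in items:
        s = support(frozenset([item]))
        if s >= min_support:
            current[frozenset([item])] = s

    frequent = {}
    k = 1
    while current:
        frequent.update(current)
        k += 1
        # Join step: union pairs of frequent (k-1)-itemsets into k-item candidates.
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must already be frequent.
        candidates = {c for c in candidates
                      if all(frozenset(sub) in current
                             for sub in combinations(c, k - 1))}
        # Count support only for the surviving candidates.
        current = {}
        for c in candidates:
            s = support(c)
            if s >= min_support:
                current[c] = s
    return frequent

# Hypothetical toy transactions, invented for illustration.
toy_transactions = [
    {"TEA", "MUG"},
    {"TEA", "MUG", "SPOON"},
    {"TEA", "SPOON"},
    {"MUG"},
]
result = apriori_sketch(toy_transactions, min_support=0.5)
for itemset, s in sorted(result.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), s)
```

With a minimum support of 0.5, the pair {MUG, SPOON} appears in only one of four transactions and is dropped, so the three-item candidate is pruned without ever being counted.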

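**FP-Growth algorithm:** We apply the FP-Growth algorithm with the same minimum support to find frequent itemsets and generate association rules. Given the same data and threshold, both algorithms should return the same frequent itemsets; they differ only in how they search for them.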
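
Whichever algorithm finds the itemsets, each rule is scored with the same metrics. The sketch below computes support, confidence, and lift by hand for a single rule using the standard definitions; the helper `rule_metrics` and the toy transactions are hypothetical and only illustrate the quantities, not how mlxtend computes them internally. It assumes the antecedent and consequent each occur in at least one transaction.

```python
def rule_metrics(transactions, antecedent, consequent):
    """Support, confidence, and lift for the rule antecedent -> consequent."""
    a, c = set(antecedent), set(consequent)
    n = len(transactions)
    support_a = sum(a <= t for t in transactions) / n         # P(A)
    support_c = sum(c <= t for t in transactions) / n         # P(C)
    support_ac = sum((a | c) <= t for t in transactions) / n  # P(A and C)
    confidence = support_ac / support_a                       # P(C | A)
    lift = confidence / support_c                             # P(C | A) / P(C)
    return {"support": support_ac, "confidence": confidence, "lift": lift}

# Hypothetical toy transactions, invented for illustration.
toy_transactions = [
    {"TEA", "MUG"},
    {"TEA", "MUG", "SPOON"},
    {"TEA", "SPOON"},
    {"MUG"},
]
metrics = rule_metrics(toy_transactions, {"SPOON"}, {"TEA"})
print(metrics)
```

A lift above 1 means the consequent is more likely when the antecedent is present than it is overall; a lift of exactly 1 indicates independence.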

In [None]:
import pandas as pd
from mlxtend.frequent_patterns import apriori, association_rules, fpgrowth

# Load the dataset (reading .xlsx files with pandas requires the openpyxl package)
url = 'https://archive.ics.uci.edu/ml/machine-learning-databases/00352/Online%20Retail.xlsx'
df = pd.read_excel(url)

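# Data preprocessing
df['Description'] = df['Description'].str.strip()       # trim whitespace from item descriptions
df.dropna(axis=0, subset=['InvoiceNo'], inplace=True)   # drop rows with a missing invoice number
df['InvoiceNo'] = df['InvoiceNo'].astype('str')         # treat invoice numbers as strings
df = df[~df['InvoiceNo'].str.contains('C')]             # remove canceled transactions ('C' in the invoice number)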

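# Basket analysis: one row per French invoice, one column per product,
# values = total quantity (0 where the product is not on the invoice)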
basket = (df[df['Country'] == 'France']
          .groupby(['InvoiceNo', 'Description'])['Quantity']
          .sum().unstack().reset_index().fillna(0)
          .set_index('InvoiceNo'))

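# Encode quantities as flags: True (1) if the product was bought, False (0) otherwise.
# A vectorized comparison replaces the element-wise DataFrame.applymap call, which is
# deprecated since pandas 2.1; recent mlxtend versions also prefer boolean input.
basket_sets = basket >= 1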

# Apriori Algorithm
frequent_itemsets_apriori = apriori(basket_sets, min_support=0.07, use_colnames=True)
rules_apriori = association_rules(frequent_itemsets_apriori, metric="lift", min_threshold=1)

# FP-Growth Algorithm
frequent_itemsets_fpgrowth = fpgrowth(basket_sets, min_support=0.07, use_colnames=True)
rules_fpgrowth = association_rules(frequent_itemsets_fpgrowth, metric="lift", min_threshold=1)

# Display results
print("Frequent Itemsets using Apriori:\n", frequent_itemsets_apriori)
print("\nAssociation Rules using Apriori:\n", rules_apriori)
print("\nFrequent Itemsets using FP-Growth:\n", frequent_itemsets_fpgrowth)
print("\nAssociation Rules using FP-Growth:\n", rules_fpgrowth)


  InvoiceNo StockCode                          Description  Quantity  \
0    536365    85123A   WHITE HANGING HEART T-LIGHT HOLDER         6   
1    536365     71053                  WHITE METAL LANTERN         6   
2    536365    84406B       CREAM CUPID HEARTS COAT HANGER         8   
3    536365    84029G  KNITTED UNION FLAG HOT WATER BOTTLE         6   
4    536365    84029E       RED WOOLLY HOTTIE WHITE HEART.         6   

          InvoiceDate  UnitPrice  CustomerID         Country  
0 2010-12-01 08:26:00       2.55     17850.0  United Kingdom  
1 2010-12-01 08:26:00       3.39     17850.0  United Kingdom  
2 2010-12-01 08:26:00       2.75     17850.0  United Kingdom  
3 2010-12-01 08:26:00       3.39     17850.0  United Kingdom  
4 2010-12-01 08:26:00       3.39     17850.0  United Kingdom  
