# Contents

- [About](#About)
- [Setup](#Setup)
- [Implementation](#Implementation)
- [Implementation Frequent Pattern Tree](#Implementation-Frequent-Pattern-Tree)
- [98 Archive](#98-Archive)



# About

`Market Basket Analysis` (MBA) is a technique used to discover relationships between items within large sets of data in shopping baskets. The goal is to determine what products are frequently bought together, often to support marketing or merchandising decisions. Here's a breakdown of how to approach learning and implementing Market Basket Analysis:

1. `Concepts`  
Association Rules: Market Basket Analysis is primarily done through association rule mining, where you look for patterns where the presence of one set of items in a transaction (or basket) implies the presence of other items.
Metrics: Key metrics include:
Support: The frequency of when items appear together in transactions.
Confidence: Measures how often items B are purchased when item A is purchased.
Lift: The ratio of the observed support to that expected if A and B were independent.

2. Techniques  
`Apriori Algorithm`: This is the most common algorithm used for MBA. It reduces the number of itemsets you need to check by eliminating those that have support less than the user-specified 'minimum support threshold'.
`FP-Growth Algorithm`: A faster alternative to Apriori that uses a tree structure to store the transaction records, reducing the number of database scans.

[Back to the top](#Contents)

# Setup 

In [42]:
import pandas as pd
import numpy as np
from mlxtend.preprocessing import TransactionEncoder
from mlxtend.frequent_patterns import apriori, association_rules

# IPython configuration for enhanced interactivity
from IPython.core.interactiveshell import InteractiveShell
InteractiveShell.ast_node_interactivity = "all"

#### Data 

In [43]:
# Seed for reproducibility
np.random.seed(42)

# Product details
products = [
    ("Apples", "Fruit", "Food"),
    ("Bananas", "Fruit", "Food"),
    ("Bread", "Bakery", "Food"),
    ("Butter", "Dairy", "Food"),
    ("Milk", "Dairy", "Food"),
    ("Eggs", "Dairy", "Food"),
    ("Cheese", "Dairy", "Food"),
    ("Chicken", "Meat", "Food"),
    ("Beef", "Meat", "Food"),
    ("Fish", "Seafood", "Food"),
    ("Shrimp", "Seafood", "Food"),
    ("Rice", "Grains", "Food"),
    ("Pasta", "Grains", "Food"),
    ("Water", "Beverages", "Drinks"),
    ("Beer", "Alcoholic Beverages", "Drinks")
]

# Number of transactions
num_transactions = 100

# Simulating data
data = []
for i in range(1, num_transactions + 1):
    num_products = np.random.randint(1, 6)  # Each transaction has 1-5 products
    products_sampled = np.random.choice([p[0] for p in products], size=num_products, replace=False)
    for product in products_sampled:
        product_details = next(p for p in products if p[0] == product)
        units = np.random.randint(1, 10)  # 1-9 units per product
        price_per_unit = np.random.uniform(1, 10)  # Price between $1 and $10
        total_price = units * price_per_unit
        data.append({
            'transaction_id': i,
            'transaction_date': pd.Timestamp('2024-01-01') + pd.to_timedelta(np.random.randint(0, 30), unit='D'),
            'product': product_details[0],
            'product_sub_category': product_details[1],
            'product_category': product_details[2],
            'units': units,
            'total_price': total_price
        })

# Convert to DataFrame
df = pd.DataFrame(data)
df.to_csv('data_Market_Basket_Analysis.csv', index=True)
df.head(10)  # Display the first 10 rows of the dataframe


Unnamed: 0,transaction_id,transaction_date,product,product_sub_category,product_category,units,total_price
0,1,2024-01-12,Apples,Fruit,Food,5,48.645943
1,1,2024-01-28,Bananas,Fruit,Food,6,6.042053
2,1,2024-01-26,Eggs,Dairy,Food,5,32.786668
3,1,2024-01-25,Butter,Dairy,Food,6,6.38158
4,2,2024-01-04,Bananas,Fruit,Food,9,14.269179
5,3,2024-01-04,Shrimp,Seafood,Food,3,27.551651
6,4,2024-01-16,Apples,Fruit,Food,4,25.524399
7,4,2024-01-28,Bread,Bakery,Food,8,49.071966
8,5,2024-01-09,Rice,Grains,Food,1,1.140728
9,6,2024-01-09,Milk,Dairy,Food,9,83.111734


Data Transformation

In [44]:
# Transform data into the format needed for MBA
basket = (df
          .groupby(['transaction_id', 'product'])['units']
          .sum().unstack().reset_index().fillna(0)
          .set_index('transaction_id'))
basket.head()
type(basket)

# Convert the units sold to 1s and 0s, to disregard the magnitude
def encode_units(x):
    if x <= 0:
        return 0
    if x >= 1:
        return 1

basket_sets = basket.applymap(encode_units)
basket_sets.head()

product,Apples,Bananas,Beef,Beer,Bread,Butter,Cheese,Chicken,Eggs,Fish,Milk,Pasta,Rice,Shrimp,Water
transaction_id,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1
1,5.0,6.0,0.0,0.0,0.0,6.0,0.0,0.0,5.0,0.0,0.0,0.0,0.0,0.0,0.0
2,0.0,9.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,3.0,0.0
4,4.0,0.0,0.0,0.0,8.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
5,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0


pandas.core.frame.DataFrame

product,Apples,Bananas,Beef,Beer,Bread,Butter,Cheese,Chicken,Eggs,Fish,Milk,Pasta,Rice,Shrimp,Water
transaction_id,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1
1,1,1,0,0,0,1,0,0,1,0,0,0,0,0,0
2,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0
3,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0
4,1,0,0,0,1,0,0,0,0,0,0,0,0,0,0
5,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0


EDA

In [45]:
basket_sets.sum(axis=0).sort_values(ascending=False)

product
Bananas    25
Water      24
Shrimp     22
Eggs       20
Beef       19
Milk       19
Beer       18
Chicken    18
Pasta      18
Apples     17
Cheese     17
Fish       17
Butter     16
Rice       15
Bread      14
dtype: int64

[Back to the top](#Contents)

# Implementation

#### Applying Apriori Algorithm
- `Support` measures how frequently an itemset appears in the dataset. For an itemset A, this is calculated as the proportion of transactions that contain A out of all transactions.

In [46]:
frequent_itemsets = apriori(basket_sets, min_support=0.001, use_colnames=True)

type(frequent_itemsets)
frequent_itemsets.head(20)



pandas.core.frame.DataFrame

Unnamed: 0,support,itemsets
0,0.17,(Apples)
1,0.25,(Bananas)
2,0.19,(Beef)
3,0.18,(Beer)
4,0.14,(Bread)
5,0.16,(Butter)
6,0.17,(Cheese)
7,0.18,(Chicken)
8,0.2,(Eggs)
9,0.17,(Fish)


#### Generating rules
`Confidence` for a rule measures how often a rule holds true. For a rule A→B, where A and B are different itemsets, confidence is defined as the proportion of transactions with A that also contain B.

Confidence(A→B)= Support(A∪B) / Support(A)

`Lift` for a rule measures how much more often A and B occur together than would be expected if they were statistically independent of each other. The lift for a rule A→B is defined as:

Lift(A→B)= Confidence(A→B) / Support(B) OR Support(A∪B) / Support(A) Support(B)

Lift = 1: A and B are independent, and there is no association between them.  
Lift > 1: A and B are positively correlated, and B is likely to be bought when A is bought, more so than would be expected by chance. This indicates a strong rule.  
Lift < 1: A and B are negatively correlated, and the presence of A in a transaction implies the absence of B, or B occurs less frequently than would be expected by chance.

`Leverage(A→B)` = Support(A∪B)−(Support(A)×Support(B))

A→B. if A (antecedent), then B (consequents).

In [47]:
# Applying Apriori to find frequent itemsets with a minimum support threshold

rules = association_rules(frequent_itemsets, metric="lift", min_threshold=1) # 0.40, 0.70
type(rules)
# rules
rules[['antecedents', 'consequents', 'support', 'confidence', 'lift']]

pandas.core.frame.DataFrame

Unnamed: 0,antecedents,consequents,support,confidence,lift
0,(Beef),(Apples),0.05,0.263158,1.547988
1,(Apples),(Beef),0.05,0.294118,1.547988
2,(Beer),(Apples),0.04,0.222222,1.307190
3,(Apples),(Beer),0.04,0.235294,1.307190
4,(Apples),(Butter),0.04,0.235294,1.470588
...,...,...,...,...,...
2513,(Water),"(Shrimp, Milk, Pasta, Beef)",0.01,0.041667,4.166667
2514,(Beef),"(Milk, Water, Shrimp, Pasta)",0.01,0.052632,5.263158
2515,(Milk),"(Shrimp, Water, Beef, Pasta)",0.01,0.052632,2.631579
2516,(Pasta),"(Shrimp, Milk, Water, Beef)",0.01,0.055556,5.555556


[Back to the top](#Contents)

# Implementation Frequent Pattern Tree

Generate Data

In [48]:
# Seed for reproducibility
np.random.seed(42)

# Product details
products = [
    ("Apples", "Fruit", "Food"),
    ("Bananas", "Fruit", "Food"),
    ("Bread", "Bakery", "Food"),
    ("Butter", "Dairy", "Food"),
    ("Milk", "Dairy", "Food"),
    ("Eggs", "Dairy", "Food"),
    ("Cheese", "Dairy", "Food"),
    ("Chicken", "Meat", "Food"),
    ("Beef", "Meat", "Food"),
    ("Fish", "Seafood", "Food"),
    ("Shrimp", "Seafood", "Food"),
    ("Rice", "Grains", "Food"),
    ("Pasta", "Grains", "Food"),
    ("Water", "Beverages", "Drinks"),
    ("Beer", "Alcoholic Beverages", "Drinks")
]

# Number of transactions
num_transactions = 100

# Simulating data
data = []
for i in range(1, num_transactions + 1):
    num_products = np.random.randint(1, 6)  # Each transaction has 1-5 products
    products_sampled = np.random.choice([p[0] for p in products], size=num_products, replace=False)
    for product in products_sampled:
        product_details = next(p for p in products if p[0] == product)
        units = np.random.randint(1, 10)  # 1-9 units per product
        price_per_unit = np.random.uniform(1, 10)  # Price between $1 and $10
        total_price = units * price_per_unit
        data.append({
            'transaction_id': i,
            'transaction_date': pd.Timestamp('2024-01-01') + pd.to_timedelta(np.random.randint(0, 30), unit='D'),
            'product': product_details[0],
            'product_sub_category': product_details[1],
            'product_category': product_details[2],
            'units': units,
            'total_price': total_price
        })

# Convert to DataFrame
df = pd.DataFrame(data)
df.to_csv('data_Market_Basket_Analysis.csv', index=True)
df.head(10)  # Display the first 10 rows of the dataframe

Unnamed: 0,transaction_id,transaction_date,product,product_sub_category,product_category,units,total_price
0,1,2024-01-12,Apples,Fruit,Food,5,48.645943
1,1,2024-01-28,Bananas,Fruit,Food,6,6.042053
2,1,2024-01-26,Eggs,Dairy,Food,5,32.786668
3,1,2024-01-25,Butter,Dairy,Food,6,6.38158
4,2,2024-01-04,Bananas,Fruit,Food,9,14.269179
5,3,2024-01-04,Shrimp,Seafood,Food,3,27.551651
6,4,2024-01-16,Apples,Fruit,Food,4,25.524399
7,4,2024-01-28,Bread,Bakery,Food,8,49.071966
8,5,2024-01-09,Rice,Grains,Food,1,1.140728
9,6,2024-01-09,Milk,Dairy,Food,9,83.111734


Process data for MBA

In [49]:
# Transform data into the format needed for MBA
basket = (df
          .groupby(['transaction_id', 'product'])['units']
          .sum().unstack().reset_index().fillna(0)
          .set_index('transaction_id'))
basket.head()
type(basket)

# Convert the units sold to 1s and 0s, to disregard the magnitude
def encode_units(x):
    if x <= 0:
        return 0
    if x >= 1:
        return 1

basket_sets = basket.applymap(encode_units)
basket_sets.head()

product,Apples,Bananas,Beef,Beer,Bread,Butter,Cheese,Chicken,Eggs,Fish,Milk,Pasta,Rice,Shrimp,Water
transaction_id,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1
1,5.0,6.0,0.0,0.0,0.0,6.0,0.0,0.0,5.0,0.0,0.0,0.0,0.0,0.0,0.0
2,0.0,9.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,3.0,0.0
4,4.0,0.0,0.0,0.0,8.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
5,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0


pandas.core.frame.DataFrame

product,Apples,Bananas,Beef,Beer,Bread,Butter,Cheese,Chicken,Eggs,Fish,Milk,Pasta,Rice,Shrimp,Water
transaction_id,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1
1,1,1,0,0,0,1,0,0,1,0,0,0,0,0,0
2,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0
3,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0
4,1,0,0,0,1,0,0,0,0,0,0,0,0,0,0
5,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0


Process data for FP MBA

In [50]:
# Using TransactionEncoder to reformat for fpgrowth
transaction_list = basket_sets.apply(lambda x: list(basket_sets.columns[x.astype(bool)]), axis=1).tolist()
transaction_list # list of lists

# Encode the data
te = TransactionEncoder()
te_ary = te.fit(transaction_list).transform(transaction_list)
df_encoded = pd.DataFrame(te_ary, columns=te.columns_)
df_encoded

[['Apples', 'Bananas', 'Butter', 'Eggs'],
 ['Bananas'],
 ['Shrimp'],
 ['Apples', 'Bread'],
 ['Rice'],
 ['Milk'],
 ['Beer', 'Fish', 'Shrimp', 'Water'],
 ['Apples'],
 ['Bananas', 'Bread', 'Milk', 'Pasta'],
 ['Cheese', 'Milk', 'Water'],
 ['Chicken'],
 ['Bananas', 'Milk', 'Water'],
 ['Eggs', 'Milk', 'Pasta'],
 ['Cheese', 'Milk', 'Rice'],
 ['Bananas', 'Beer', 'Eggs', 'Fish', 'Shrimp'],
 ['Bananas', 'Beef', 'Chicken'],
 ['Apples', 'Milk'],
 ['Apples', 'Eggs', 'Pasta', 'Rice', 'Water'],
 ['Bananas'],
 ['Beef', 'Eggs', 'Milk'],
 ['Rice'],
 ['Bananas', 'Beef', 'Bread', 'Chicken', 'Eggs'],
 ['Water'],
 ['Beef', 'Butter', 'Chicken', 'Eggs'],
 ['Fish'],
 ['Bananas'],
 ['Bananas', 'Chicken', 'Eggs', 'Pasta', 'Water'],
 ['Beer', 'Cheese', 'Water'],
 ['Rice', 'Shrimp'],
 ['Apples', 'Bananas', 'Beef', 'Chicken', 'Shrimp'],
 ['Cheese', 'Pasta', 'Shrimp', 'Water'],
 ['Butter', 'Water'],
 ['Bananas'],
 ['Chicken', 'Eggs', 'Shrimp'],
 ['Apples', 'Beef', 'Beer', 'Fish', 'Rice'],
 ['Apples', 'Beef', 'Beer',

Unnamed: 0,Apples,Bananas,Beef,Beer,Bread,Butter,Cheese,Chicken,Eggs,Fish,Milk,Pasta,Rice,Shrimp,Water
0,True,True,False,False,False,True,False,False,True,False,False,False,False,False,False
1,False,True,False,False,False,False,False,False,False,False,False,False,False,False,False
2,False,False,False,False,False,False,False,False,False,False,False,False,False,True,False
3,True,False,False,False,True,False,False,False,False,False,False,False,False,False,False
4,False,False,False,False,False,False,False,False,False,False,False,False,True,False,False
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
95,False,False,False,True,False,False,False,False,True,False,False,False,False,False,False
96,True,False,False,False,False,True,False,True,False,False,False,False,False,False,False
97,False,False,True,False,False,False,False,False,False,False,False,False,False,True,False
98,True,False,False,False,False,False,True,False,False,False,False,False,True,False,False


Applying FP-Growth algorithm

In [51]:
frequent_itemsets = fpgrowth(df_encoded, min_support=0.05, use_colnames=True)
frequent_itemsets

Unnamed: 0,support,itemsets
0,0.25,(Bananas)
1,0.2,(Eggs)
2,0.17,(Apples)
3,0.16,(Butter)
4,0.22,(Shrimp)
5,0.14,(Bread)
6,0.15,(Rice)
7,0.19,(Milk)
8,0.24,(Water)
9,0.18,(Beer)


Generating rules

In [52]:
rules = association_rules(frequent_itemsets, metric="confidence", min_threshold=0.3)
rules

Unnamed: 0,antecedents,consequents,antecedent support,consequent support,support,confidence,lift,leverage,conviction,zhangs_metric
0,(Bananas),(Eggs),0.25,0.2,0.09,0.36,1.8,0.04,1.25,0.592593
1,(Eggs),(Bananas),0.2,0.25,0.09,0.45,1.8,0.04,1.363636,0.555556
2,(Water),(Shrimp),0.24,0.22,0.08,0.333333,1.515152,0.0272,1.17,0.447368
3,(Shrimp),(Water),0.22,0.24,0.08,0.363636,1.515152,0.0272,1.194286,0.435897
4,(Bread),(Bananas),0.14,0.25,0.06,0.428571,1.714286,0.025,1.3125,0.484496
5,(Bread),(Eggs),0.14,0.2,0.05,0.357143,1.785714,0.022,1.244444,0.511628
6,(Rice),(Apples),0.15,0.17,0.05,0.333333,1.960784,0.0245,1.245,0.576471
7,(Milk),(Water),0.19,0.24,0.06,0.315789,1.315789,0.0144,1.110769,0.296296
8,(Beer),(Water),0.18,0.24,0.06,0.333333,1.388889,0.0168,1.14,0.341463
9,(Pasta),(Bananas),0.18,0.25,0.06,0.333333,1.333333,0.015,1.125,0.304878


Display frequent itemsets and rules

In [53]:
rules[['antecedents', 'consequents', 'support', 'confidence', 'lift']]

Unnamed: 0,antecedents,consequents,support,confidence,lift
0,(Bananas),(Eggs),0.09,0.36,1.8
1,(Eggs),(Bananas),0.09,0.45,1.8
2,(Water),(Shrimp),0.08,0.333333,1.515152
3,(Shrimp),(Water),0.08,0.363636,1.515152
4,(Bread),(Bananas),0.06,0.428571,1.714286
5,(Bread),(Eggs),0.05,0.357143,1.785714
6,(Rice),(Apples),0.05,0.333333,1.960784
7,(Milk),(Water),0.06,0.315789,1.315789
8,(Beer),(Water),0.06,0.333333,1.388889
9,(Pasta),(Bananas),0.06,0.333333,1.333333


[Back to the top](#Contents)

# 98 Archive

`Applications`

1. Product Placement and Store Layout
2. Cross-Selling and Up-Selling
3. Targeted Promotions
4. Inventory Management
5. Pricing Strategies
6. Loyalty Programs
7. New Product Development
8. Seasonal Adjustments
9. Online Recommendations
10. Strategic Supplier Relationships

[Back to the top](#Contents)