# Market Basket Analysis in Python

Course Description
 
What do Amazon product recommendations and Netflix movie suggestions have in common? They both rely on Market Basket Analysis, which is a powerful tool for translating vast amounts of customer transaction and viewing data into simple rules for product promotion and recommendation. In this course, you’ll learn how to perform Market Basket Analysis using the Apriori algorithm, standard and custom metrics, association rules, aggregation and pruning, and visualization. You’ll then reinforce your new skills through interactive exercises, building recommendations for a small grocery store, a library, an e-book seller, a novelty gift retailer, and a movie streaming service. In the process, you’ll uncover hidden insights to improve recommendations for customers.


## 1. Introduction to Market Basket Analysis


In [1]:
# Installing libraries
!pip install mlxtend
!pip install pandas
!pip install numpy

Collecting mlxtend
  Downloading mlxtend-0.22.0-py2.py3-none-any.whl (1.4 MB)
Installing collected packages: mlxtend
Successfully installed mlxtend-0.22.0
You should consider upgrading via the '/root/venv/bin/python -m pip install --upgrade pip' command.

### Loading the data

In [2]:
import pandas as pd 
 
# Load transactions with pandas.
books = pd.read_csv("bookstore_transactions.csv")
groceries = pd.read_csv("online_retail.csv")
 
# Print the header 
print(books.head(2)) 

        Transaction
0  History,Bookmark
1  History,Bookmark


### Building transactions

In [3]:
# Split each transaction string into a list of items.
transactions = books['Transaction'].apply(lambda t: t.split(','))

# Convert the resulting Series into a list of lists.
transactions = list(transactions)

### Counting the itemsets

In [4]:
# Print the first transaction
print(transactions[0]) 

['History', 'Bookmark']


### Generating rules with itertools

In [5]:
from itertools import permutations 

# Extract unique items. 
flattened = [item for transaction in transactions for item in transaction] 
items = list(set(flattened)) 

# Compute and print rules. 
rules = list(permutations(items, 2)) 
print(rules) 

[('History', 'Biography'), ('History', 'Fiction'), ('History', 'Poetry'), ('History', 'Bookmark'), ('Biography', 'History'), ('Biography', 'Fiction'), ('Biography', 'Poetry'), ('Biography', 'Bookmark'), ('Fiction', 'History'), ('Fiction', 'Biography'), ('Fiction', 'Poetry'), ('Fiction', 'Bookmark'), ('Poetry', 'History'), ('Poetry', 'Biography'), ('Poetry', 'Fiction'), ('Poetry', 'Bookmark'), ('Bookmark', 'History'), ('Bookmark', 'Biography'), ('Bookmark', 'Fiction'), ('Bookmark', 'Poetry')]


### Counting the rules

In [6]:
# Print the number of rules 
print(len(rules)) 

20
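The count of 20 is the number of ordered pairs that can be drawn from five distinct items, n(n - 1). The sketch below checks this with the standard-library `math.perm`, using the five item names printed above, and shows how fast the number of candidate rules grows. The catalogue sizes 50 and 500 are hypothetical.

```python
from itertools import permutations
from math import perm

# The five unique items found in the bookstore transactions
items = ['History', 'Biography', 'Fiction', 'Poetry', 'Bookmark']

# Each ordered pair (antecedent, consequent) is one candidate rule
n_rules = len(list(permutations(items, 2)))
print(n_rules)

# Closed form n * (n - 1) for a few hypothetical catalogue sizes
for n in (5, 50, 500):
    print(n, perm(n, 2))
```

Even two-item rules grow quadratically with the number of items, which is why a pruning step such as Apriori becomes necessary for realistic catalogues.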


### Looking ahead

In [7]:
# Import TransactionEncoder
from mlxtend.preprocessing import TransactionEncoder 

# Fit TransactionEncoder to transactions
encoder = TransactionEncoder().fit(transactions)

# One-hot encode itemsets by applying transform to the fitted encoder
onehot = encoder.transform(transactions)

# Convert one-hot encoded data to DataFrame
onehot = pd.DataFrame(onehot, columns=encoder.columns_)

# Import the Apriori and association rules functions
from mlxtend.frequent_patterns import association_rules
from mlxtend.frequent_patterns import apriori

# Compute frequent itemsets using the Apriori algorithm
frequent_itemsets = apriori(onehot, min_support=0.001,
                            max_len=2, use_colnames=True)

# Compute all association rules for frequent_itemsets
rules = association_rules(frequent_itemsets,
                          metric="lift",
                          min_threshold=1.0)
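To make the one-hot encoding step less opaque, here is a minimal standard-library sketch of the idea: one Boolean column per item, one row per transaction. It is not mlxtend's implementation. The three toy transactions and the names `toy_transactions`, `columns`, and `onehot_rows` are hypothetical.

```python
# Hypothetical toy transactions, in the same list-of-lists shape as above
toy_transactions = [
    ['History', 'Bookmark'],
    ['Fiction', 'Bookmark'],
    ['Biography', 'Poetry'],
]

# One column per unique item, in sorted order
columns = sorted({item for t in toy_transactions for item in t})

# One row per transaction: True where the item is in the basket
onehot_rows = [[item in t for item in columns] for t in toy_transactions]

print(columns)
for row in onehot_rows:
    print(row)
```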

### Computing support for single items

In [8]:
# Compute support for each item as the mean of its one-hot column
support = onehot.mean()
print(support)

Biography    0.404040
Bookmark     1.000000
Fiction      0.252525
History      0.252525
Poetry       0.090909
dtype: float64


### Computing support for multiple items

In [9]:
import numpy as np 
 
# Compute the support of each two-item itemset and store it as a constant column
onehot['Fiction+Poetry'] = np.logical_and(onehot['Fiction'], onehot['Poetry']).mean()
onehot['Biography+History'] = np.logical_and(onehot['Biography'], onehot['History']).mean()
onehot['Biography+Bookmark'] = np.logical_and(onehot['Biography'], onehot['Bookmark']).mean()

# Print the support of every column
print(onehot.mean())

Biography             0.404040
Bookmark              1.000000
Fiction               0.252525
History               0.252525
Poetry                0.090909
Fiction+Poetry        0.000000
Biography+History     0.000000
Biography+Bookmark    0.404040
dtype: float64
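Support can also be computed directly from the transaction lists, without one-hot columns. The sketch below does so with a hypothetical helper `itemset_support`, run on four made-up transactions (not the bookstore data): support is the share of transactions that contain every item in the itemset.

```python
def itemset_support(itemset, transactions):
    """Share of transactions that contain every item in itemset."""
    wanted = set(itemset)
    hits = sum(1 for t in transactions if wanted <= set(t))
    return hits / len(transactions)

# Hypothetical toy data for illustration only
toy_transactions = [
    ['History', 'Bookmark'],
    ['Fiction', 'Bookmark'],
    ['Biography', 'Bookmark'],
    ['Biography', 'Poetry'],
]

print(itemset_support(['Bookmark'], toy_transactions))
print(itemset_support(['Biography', 'Bookmark'], toy_transactions))
print(itemset_support(['Fiction', 'Poetry'], toy_transactions))
```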


## 2. Association Rules

### Computing confidence and lift

In [10]:
# Compute confidence and lift for the rule Biography -> Bookmark.
confidence = onehot['Biography+Bookmark'].mean() / onehot['Biography'].mean()
lift = onehot['Biography+Bookmark'].mean() / (onehot['Biography'].mean() * onehot['Bookmark'].mean())

# Print the support of Bookmark, then confidence and lift.
print(onehot['Bookmark'].mean(), confidence, lift)

1.0 0.9999999999999999 0.9999999999999999
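Both metrics depend on only three supports, so they can be written as small functions. In this sketch the helper names are hypothetical. The first call uses the rounded supports printed earlier for Biography and Bookmark; the second uses invented supports to show a rule with lift above 1.

```python
def confidence(support_ab, support_a):
    """Confidence of A -> B: P(B | A)."""
    return support_ab / support_a

def lift(support_ab, support_a, support_b):
    """Lift of A -> B: observed joint support over the independent expectation."""
    return support_ab / (support_a * support_b)

# Biography -> Bookmark, using the supports printed above
print(confidence(0.404040, 0.404040), lift(0.404040, 0.404040, 1.0))

# Hypothetical rule where the items co-occur more often than chance
print(confidence(0.2, 0.25), lift(0.2, 0.25, 0.4))
```

A lift of 1 means the antecedent tells us nothing about the consequent. That is the case here: every transaction contains a bookmark.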


### Computing leverage

In [11]:
# Compute support for Biography & Bookmark
support_BK = np.logical_and(onehot['Biography'], onehot['Bookmark']).mean()

# Compute support for Biography
support_B = onehot['Biography'].mean()

# Compute support for Bookmark
support_K = onehot['Bookmark'].mean()

# Compute and print leverage as a single number
leverage = support_BK - support_B * support_K
print(leverage)

0.0
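Leverage is the gap between the observed joint support and the support expected if the two items were independent. As a cross-check, this sketch applies a hypothetical helper `leverage_of` to the rounded supports of the three rules printed at the end of these notes, plus one invented rule with positive leverage.

```python
def leverage_of(support_ab, support_a, support_b):
    """Leverage of A -> B: joint support minus the independent expectation."""
    return support_ab - support_a * support_b

# (rule, joint support, antecedent support, consequent support)
printed_rules = [
    ('Bookmark -> Fiction', 0.252525, 1.0, 0.252525),
    ('Bookmark -> History', 0.252525, 1.0, 0.252525),
    ('Bookmark -> Poetry', 0.090909, 1.0, 0.090909),
]
for name, s_ab, s_a, s_b in printed_rules:
    print(name, leverage_of(s_ab, s_a, s_b))

# Hypothetical rule whose items co-occur more often than chance
print(leverage_of(0.2, 0.25, 0.4))
```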


### Computing conviction

In [12]:
# Compute support for Biography+Bookmark and Biography
support_BK = np.logical_and(onehot['Biography'], onehot['Bookmark']).mean()
support_B = onehot['Biography'].mean()

# Compute support for NOT Bookmark
support_nK = 1.0 - onehot['Bookmark'].mean()

# Compute support for Biography and NOT Bookmark
support_BnK = support_B - support_BK

# Compute conviction for Biography -> Bookmark; it is infinite when
# Biography never appears without Bookmark (zero denominator)
conviction = support_B * support_nK / support_BnK if support_BnK > 0 else np.inf
print(conviction)

inf
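Conviction can also be written in terms of confidence, as (1 - Support(B)) / (1 - Confidence(A -> B)). The sketch below uses a hypothetical helper `conviction_of` that returns infinity when the rule is never violated, a common convention and an assumption here. The inputs are the rounded supports printed in these notes.

```python
import math

def conviction_of(support_ab, support_a, support_b):
    """Conviction of A -> B; infinite when confidence equals 1."""
    conf = support_ab / support_a
    if conf == 1.0:
        return math.inf
    return (1.0 - support_b) / (1.0 - conf)

# Biography -> Bookmark: the rule is never violated
print(conviction_of(0.404040, 0.404040, 1.0))

# Bookmark -> Fiction: the bookmark carries no information about fiction
print(conviction_of(0.252525, 1.0, 0.252525))
```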


### Computing Zhang's Metric

![Picture title](image-20230710-132128.png)

### Alternative Expression for Zhang's Metric

![Picture title](image-20230710-132307.png)
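For readers who cannot see the images, the two expressions are written out below. This assumes the images show the standard definition of Zhang's metric; the second form matches the numerator and denominator computed in the next cell.

$$
Z(A \rightarrow B) = \frac{\text{Confidence}(A \rightarrow B) - \text{Confidence}(\bar{A} \rightarrow B)}{\max\left[\text{Confidence}(A \rightarrow B),\ \text{Confidence}(\bar{A} \rightarrow B)\right]}
$$

$$
Z(A \rightarrow B) = \frac{\text{Support}(A \,\&\, B) - \text{Support}(A)\,\text{Support}(B)}{\max\left[\text{Support}(A \,\&\, B)\,\bigl(1 - \text{Support}(A)\bigr),\ \text{Support}(A)\,\bigl(\text{Support}(B) - \text{Support}(A \,\&\, B)\bigr)\right]}
$$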

In [13]:
# Compute the support of Biography and Bookmark individually
support_B = onehot['Biography'].mean()
support_K = onehot['Bookmark'].mean()

# Compute the support of both Biography and Bookmark
support_BK = np.logical_and(onehot['Biography'], onehot['Bookmark']).mean()

# Compute the numerator
numerator = support_BK - support_B * support_K

# Compute the denominator
denominator = np.maximum(support_BK * (1 - support_B),
                         support_B * (support_K - support_BK))

# Compute Zhang's metric
Zhang = numerator / denominator
print(Zhang)

0.0
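The support-based expression for Zhang's metric can be wrapped in a function. In this sketch the helper name `zhang_of` is hypothetical, and returning 0.0 when the denominator is zero is an assumption chosen to agree with the `zhangs_metric` column printed later. The last call uses invented supports.

```python
def zhang_of(support_ab, support_a, support_b):
    """Zhang's metric for A -> B, bounded between -1 and 1."""
    numerator = support_ab - support_a * support_b
    denominator = max(support_ab * (1 - support_a),
                      support_a * (support_b - support_ab))
    # Assumption: treat the degenerate 0/0 case as no association
    if denominator == 0:
        return 0.0
    return numerator / denominator

# Biography -> Bookmark and Bookmark -> Fiction, from the printed supports
print(zhang_of(0.404040, 0.404040, 1.0))
print(zhang_of(0.252525, 1.0, 0.252525))

# Hypothetical rule with a positive association
print(zhang_of(0.2, 0.25, 0.4))
```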


## 3. Aggregation (Pending) and Pruning

In [14]:
# Set minimum antecedent support to 0.35
rules = rules[rules['antecedent support'] > 0.35]

# Set maximum consequent support to 0.35
rules = rules[rules['consequent support'] < 0.35]

# Print the remaining rules
print(rules)


  antecedents consequents  antecedent support  consequent support   support  \
3  (Bookmark)   (Fiction)                 1.0            0.252525  0.252525   
5  (Bookmark)   (History)                 1.0            0.252525  0.252525   
7  (Bookmark)    (Poetry)                 1.0            0.090909  0.090909   

   confidence  lift  leverage  conviction  zhangs_metric  
3    0.252525   1.0       0.0         1.0            0.0  
5    0.252525   1.0       0.0         1.0            0.0  
7    0.090909   1.0       0.0         1.0            0.0  
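The two pruning steps can be combined into a single pass. This standard-library sketch represents each rule as a dictionary, a hypothetical stand-in for rows of the `rules` DataFrame. The first three rows reuse the supports printed above; the fourth is an illustrative row built from the single-item supports, included so that one rule fails the filter.

```python
# Hypothetical stand-in for rows of the rules DataFrame
toy_rules = [
    {'antecedents': 'Bookmark', 'consequents': 'Fiction',
     'antecedent support': 1.0, 'consequent support': 0.252525},
    {'antecedents': 'Bookmark', 'consequents': 'History',
     'antecedent support': 1.0, 'consequent support': 0.252525},
    {'antecedents': 'Bookmark', 'consequents': 'Poetry',
     'antecedent support': 1.0, 'consequent support': 0.090909},
    {'antecedents': 'Biography', 'consequents': 'Bookmark',
     'antecedent support': 0.404040, 'consequent support': 1.0},
]

# Keep rules with a common antecedent and a less common consequent
pruned = [
    r for r in toy_rules
    if r['antecedent support'] > 0.35 and r['consequent support'] < 0.35
]

for r in pruned:
    print(r['antecedents'], '->', r['consequents'])
```

With a pandas DataFrame, the same effect comes from joining the two Boolean masks with `&` inside one indexing expression.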

