# Practice Session 04: Basket analysis

<font size="+2" color="blue">Additional results: experiments on cross-department association rules</font>

Author: <font color="blue">Luca Franceschi</font>

E-mail: <font color="blue">luca.franceschi01@estudiant.upf.edu</font>

Date: <font color="blue">16/10/2024</font>

In [59]:
import numpy as np  
import matplotlib.pyplot as plt
import pandas as pd  
import csv
import gzip
                     
from apyori import apriori
from itertools import combinations

# 1. Playing with apyori

In [2]:
# LEAVE AS-IS

def print_apyori_output (association_results, info=False, info_key=False):
    for relation_record in association_results:
        itemset = list(relation_record.items)
        
        # Consider only itemsets of two elements
        if len(itemset) > 1: 
        
            print("Rules involving itemset %s" % itemset)
            support = relation_record.support

            for rules in relation_record.ordered_statistics:
                antecedent = list(rules.items_base)
                consequent = list(rules.items_add)
                
                if info_key:
                    antecedent = [info.loc[x][info_key] for x in antecedent]
                    consequent = [info.loc[x][info_key] for x in consequent]
                
                confidence = rules.confidence
                lift = rules.lift

                print("%s => %s (support=%.4f, confidence=%.2f, lift=%.2f)" %
                      (antecedent, consequent, support, confidence, lift))
            print()

<font size="+1" color="red">Replace this cell with your own example of transactions (at least 20 transactions). Execute the apriori algorithm, in which you should obtain at least <strong>two</strong> rules of the form ['A', 'B'] => ['C'], i.e., at least two rules having a 2-itemset in the antecedent and a 1-itemset in the consequent. Modify the transactions until you obtain such rules.</font>

In [3]:
transactions = [
    ['Margarita', 'Pepperoni', 'Veggie Supreme', 'Spicy Sausage'],
    ['BBQ Chicken', 'Meat Lovers'],
    ['Pesto', 'Four-Cheese', 'Sicilian', 'Pepperoni'],
    ['White Pizza', 'Buffalo Chicken', 'Hawaiian', 'Pesto', 'BBQ Chicken'],
    ['Sicilian', 'Veggie Supreme', 'Pepperoni'],
    ['Margarita', 'Spicy Sausage', 'Pesto', 'Buffalo Chicken'],
    ['Meat Lovers', 'White Pizza', 'BBQ Chicken', 'Pepperoni'],
    ['Sicilian', 'Four-Cheese', 'White Pizza', 'Pepperoni', 'Margarita'],
    ['Hawaiian', 'Spicy Sausage', 'Four-Cheese'],
    ['BBQ Chicken', 'Sicilian', 'Pepperoni'],
    ['Veggie Supreme', 'Meat Lovers', 'Four-Cheese'],
    ['White Pizza', 'Four-Cheese'],
    ['Pepperoni', 'BBQ Chicken', 'White Pizza', 'Hawaiian'],
    ['Margarita', 'Veggie Supreme', 'Buffalo Chicken', 'Pesto'],
    ['Spicy Sausage', 'Meat Lovers', 'Four-Cheese'],
    ['Veggie Supreme', 'Sicilian', 'Buffalo Chicken'],
    ['Pesto', 'Hawaiian'],
    ['Four-Cheese', 'Buffalo Chicken', 'Sicilian', 'White Pizza'],
    ['Pepperoni', 'Hawaiian'],
    ['Spicy Sausage', 'Sicilian', 'Veggie Supreme', 'Meat Lovers'],
]

results = list(apriori(transactions, min_support=0.1, min_confidence=0.9, min_lift=1.0))
print_apyori_output(results)

Rules involving itemset ['Hawaiian', 'BBQ Chicken', 'White Pizza']
['Hawaiian', 'BBQ Chicken'] => ['White Pizza'] (support=0.1000, confidence=1.00, lift=3.33)
['Hawaiian', 'White Pizza'] => ['BBQ Chicken'] (support=0.1000, confidence=1.00, lift=4.00)

Rules involving itemset ['Pesto', 'Margarita', 'Buffalo Chicken']
['Margarita', 'Buffalo Chicken'] => ['Pesto'] (support=0.1000, confidence=1.00, lift=4.00)
['Pesto', 'Margarita'] => ['Buffalo Chicken'] (support=0.1000, confidence=1.00, lift=4.00)

Rules involving itemset ['Pepperoni', 'Sicilian', 'Four-Cheese']
['Pepperoni', 'Four-Cheese'] => ['Sicilian'] (support=0.1000, confidence=1.00, lift=2.86)

Rules involving itemset ['Sicilian', 'White Pizza', 'Four-Cheese']
['Sicilian', 'White Pizza'] => ['Four-Cheese'] (support=0.1000, confidence=1.00, lift=2.86)



<font size="+1" color="red">Replace this cell with a markdown cell containing (1) a printout of the rules you have obtained, and (2) for each of those rules, indicate clearly how the support, confidence, and lift is calculated. Do not merely repeat the formula: indicate how each number is computed based on the transactions you provided, as if you were trying to verify that the numbers are correct.</font>

We first have to construct the table containing the 1-itemsets along with their count and support (count / len(transactions)).

| Pizza Type          | Count | Support |
| ------------------- | ----- | ------- |
| {Margarita}         | 4     | 0.2     |
| {Pepperoni}         | 8     | 0.4     |
| {Veggie Supreme}    | 6     | 0.3     |
| {Spicy Sausage}     | 5     | 0.25    |
| {BBQ Chicken}       | 5     | 0.25    |
| {Meat Lovers}       | 5     | 0.25    |
| {Pesto}             | 5     | 0.25    |
| {Four-Cheese}       | 7     | 0.35    |
| {Sicilian}          | 7     | 0.35    |
| {White Pizza}       | 6     | 0.3     |
| {Buffalo Chicken}   | 5     | 0.25    |
| {Hawaiian}          | 5     | 0.25    |

Now we have to discard the ones that have less than 0.1 support (none). After that, construct the same table but with 2-itemsets in a similar way (note that there are no entries with count=0, I skipped them):

I am starting to believe that this exercise was supposed to have like 3 elements but I chose 12... It is getting out of hand... In my defense I will say that for 20 transactions only 3 elements seemed too little. The little support of 0.1 does not help, too.

| Pizza Type                        | Count | Support |
| --------------------------------- | ----- | ------- |
| {Margarita, Pepperoni}            | 2     | 0.10    |
| {Margarita, Veggie Supreme}       | 2     | 0.10    |
| {Margarita, Spicy Sausage}        | 2     | 0.10    |
| {Pepperoni, Veggie Supreme}       | 2     | 0.10    |
| {Pepperoni, Spicy Sausage}        | 1     | 0.05    |
| {Veggie Supreme, Spicy Sausage}   | 2     | 0.10    |
| {BBQ Chicken, Meat Lovers}        | 2     | 0.10    |
| {Pepperoni, Pesto}                | 1     | 0.05    |
| {Pepperoni, Four-Cheese}          | 2     | 0.10    |
| {Pepperoni, Sicilian}             | 4     | 0.20    |
| {Pesto, Four-Cheese}              | 1     | 0.05    |
| {Pesto, Sicilian}                 | 1     | 0.05    |
| {Four-Cheese, Sicilian}           | 3     | 0.15    |
| {BBQ Chicken, Pesto}              | 1     | 0.05    |
| {BBQ Chicken, White Pizza}        | 3     | 0.15    |
| {BBQ Chicken, Buffalo Chicken}    | 1     | 0.05    |
| {BBQ Chicken, Hawaiian}           | 2     | 0.10    |
| {Pesto, White Pizza}              | 1     | 0.05    |
| {Pesto, Buffalo Chicken}          | 3     | 0.15    |
| {Pesto, Hawaiian}                 | 2     | 0.10    |
| {White Pizza, Buffalo Chicken}    | 2     | 0.10    |
| {White Pizza, Hawaiian}           | 2     | 0.10    |
| {Buffalo Chicken, Hawaiian}       | 1     | 0.05    |
| {Veggie Supreme, Sicilian}        | 3     | 0.15    |
| {Margarita, Pesto}                | 2     | 0.10    |
| {Margarita, Buffalo Chicken}      | 2     | 0.10    |
| {Spicy Sausage, Pesto}            | 1     | 0.05    |
| {Spicy Sausage, Buffalo Chicken}  | 1     | 0.05    |
| {Pepperoni, BBQ Chicken}          | 3     | 0.15    |
| {Pepperoni, Meat Lovers}          | 1     | 0.05    |
| {Pepperoni, White Pizza}          | 3     | 0.15    |
| {Meat Lovers, White Pizza}        | 1     | 0.05    |
| {Margarita, Four-Cheese}          | 1     | 0.05    |
| {Margarita, Sicilian}             | 1     | 0.05    |
| {Margarita, White Pizza}          | 1     | 0.05    |
| {Four-Cheese, White Pizza}        | 3     | 0.15    |
| {Sicilian, White Pizza}           | 2     | 0.10    |
| {Spicy Sausage, Four-Cheese}      | 2     | 0.10    |
| {Spicy Sausage, Hawaiian}         | 1     | 0.05    |
| {Four-Cheese, Hawaiian}           | 1     | 0.05    |
| {BBQ Chicken, Sicilian}           | 1     | 0.05    |
| {Veggie Supreme, Meat Lovers}     | 2     | 0.10    |
| {Veggie Supreme, Four-Cheese}     | 1     | 0.05    |
| {Meat Lovers, Four-Cheese}        | 2     | 0.10    |
| {Pepperoni, Hawaiian}             | 2     | 0.10    |
| {Veggie Supreme, Pesto}           | 1     | 0.05    |
| {Veggie Supreme, Buffalo Chicken} | 2     | 0.10    |
| {Spicy Sausage, Meat Lovers}      | 2     | 0.10    |
| {Sicilian, Buffalo Chicken}       | 2     | 0.10    |
| {Four-Cheese, Buffalo Chicken}    | 1     | 0.05    |
| {Spicy Sausage, Sicilian}         | 1     | 0.05    |
| {Meat Lovers, Sicilian}           | 1     | 0.05    |

Similarly, now we have to discard the ones that have less than 0.1 support, below the simplified table.

| Pizza Type                        | Count | Support |
| --------------------------------- | ----- | ------- |
| {Margarita, Pepperoni}            | 2     | 0.10    |
| {Margarita, Veggie Supreme}       | 2     | 0.10    |
| {Margarita, Spicy Sausage}        | 2     | 0.10    |
| {Pepperoni, Veggie Supreme}       | 2     | 0.10    |
| {Veggie Supreme, Spicy Sausage}   | 2     | 0.10    |
| {BBQ Chicken, Meat Lovers}        | 2     | 0.10    |
| {Pepperoni, Four-Cheese}          | 2     | 0.10    |
| {Pepperoni, Sicilian}             | 4     | 0.20    |
| {Four-Cheese, Sicilian}           | 3     | 0.15    |
| {BBQ Chicken, White Pizza}        | 3     | 0.15    |
| {BBQ Chicken, Hawaiian}           | 2     | 0.10    |
| {Pesto, Buffalo Chicken}          | 3     | 0.15    |
| {Pesto, Hawaiian}                 | 2     | 0.10    |
| {White Pizza, Buffalo Chicken}    | 2     | 0.10    |
| {White Pizza, Hawaiian}           | 2     | 0.10    |
| {Veggie Supreme, Sicilian}        | 3     | 0.15    |
| {Margarita, Pesto}                | 2     | 0.10    |
| {Margarita, Buffalo Chicken}      | 2     | 0.10    |
| {Pepperoni, BBQ Chicken}          | 3     | 0.15    |
| {Pepperoni, White Pizza}          | 3     | 0.15    |
| {Four-Cheese, White Pizza}        | 3     | 0.15    |
| {Sicilian, White Pizza}           | 2     | 0.10    |
| {Spicy Sausage, Four-Cheese}      | 2     | 0.10    |
| {Veggie Supreme, Meat Lovers}     | 2     | 0.10    |
| {Meat Lovers, Four-Cheese}        | 2     | 0.10    |
| {Pepperoni, Hawaiian}             | 2     | 0.10    |
| {Veggie Supreme, Buffalo Chicken} | 2     | 0.10    |
| {Spicy Sausage, Meat Lovers}      | 2     | 0.10    |
| {Sicilian, Buffalo Chicken}       | 2     | 0.10    |

Build a table and simplify again,  for 3-itemsets

| Pizza Type                            | Count | Support |
| ------------------------------------- | ----- | ------- |
| {Pepperoni, Four-Cheese, Sicilian}    | 2     | 0.10    |
| {BBQ Chicken, White Pizza, Hawaiian}  | 2     | 0.10    |
| {Margarita, Pesto, Buffalo Chicken}   | 2     | 0.10    |
| {Pepperoni, BBQ Chicken, White Pizza} | 2     | 0.10    |
| {Four-Cheese, Sicilian, White Pizza}  | 2     | 0.10    |

Now calculate the confidence and lift for all the rules remaining in the simplified 3-itemsets. The confidence is the 3-itemset support divided by the antecedent support, and the lift is the 3-itemset support divided by the multiplication of the antecedent and consequent support.

| Pizza Type                               | 3-itemset Support | Antecedent support | Consequent support | Confidence  | Lift |
| ---------------------------------------- | ----------------- | ------------------ | ------------------ | ----------- | ---- |
| {Pepperoni, Four-Cheese} → {Sicilian}    | 0.10              | 0.10               | 0.35               | 1.00        | 2.86 |
| {Pepperoni, Sicilian} → {Four-Cheese}    | 0.10              | 0.20               | 0.35               | 0.50        | 1.43 |
| {Four-Cheese, Sicilian} → {Pepperoni}    | 0.10              | 0.15               | 0.40               | 0.67        | 1.67 |
| {BBQ Chicken, White Pizza} → {Hawaiian}  | 0.10              | 0.15               | 0.25               | 0.67        | 2.67 |
| {BBQ Chicken, Hawaiian} → {White Pizza}  | 0.10              | 0.10               | 0.30               | 1.00        | 3.33 |
| {White Pizza, Hawaiian} → {BBQ Chicken}  | 0.10              | 0.10               | 0.25               | 1.00        | 4.00 |
| {Margarita, Pesto} → {Buffalo Chicken}   | 0.10              | 0.10               | 0.25               | 1.00        | 4.00 |
| {Margarita, Buffalo Chicken} → {Pesto}   | 0.10              | 0.10               | 0.25               | 1.00        | 4.00 |
| {Pesto, Buffalo Chicken} → {Margarita}   | 0.10              | 0.15               | 0.20               | 0.67        | 3.33 |
| {Pepperoni, BBQ Chicken} → {White Pizza} | 0.10              | 0.15               | 0.30               | 0.67        | 2.22 |
| {Pepperoni, White Pizza} → {BBQ Chicken} | 0.10              | 0.15               | 0.25               | 0.67        | 2.67 |
| {BBQ Chicken, White Pizza} → {Pepperoni} | 0.10              | 0.15               | 0.40               | 0.67        | 1.67 |
| {Four-Cheese, Sicilian} → {White Pizza}  | 0.10              | 0.15               | 0.30               | 0.67        | 2.22 |
| {Four-Cheese, White Pizza} → {Sicilian}  | 0.10              | 0.15               | 0.35               | 0.67        | 1.90 |
| {Sicilian, White Pizza} → {Four-Cheese}  | 0.10              | 0.10               | 0.35               | 1.00        | 2.86 |

We can now simplify the table removing all the entries that have confidence smaller than 0.9 or lift smaller than 1.0:

| Pizza Type                               | 3-itemset Support | Antecedent support | Consequent support | Confidence  | Lift |
| ---------------------------------------- | ----------------- | ------------------ | ------------------ | ----------- | ---- |
| {Pepperoni, Four-Cheese} → {Sicilian}    | 0.10              | 0.10               | 0.35               | 1.00        | 2.86 |
| {BBQ Chicken, Hawaiian} → {White Pizza}  | 0.10              | 0.10               | 0.30               | 1.00        | 3.33 |
| {White Pizza, Hawaiian} → {BBQ Chicken}  | 0.10              | 0.10               | 0.25               | 1.00        | 4.00 |
| {Margarita, Pesto} → {Buffalo Chicken}   | 0.10              | 0.10               | 0.25               | 1.00        | 4.00 |
| {Margarita, Buffalo Chicken} → {Pesto}   | 0.10              | 0.10               | 0.25               | 1.00        | 4.00 |
| {Sicilian, White Pizza} → {Four-Cheese}  | 0.10              | 0.10               | 0.35               | 1.00        | 2.86 |

# 2. Load and prepare the shopping baskets

In [4]:
# LEAVE AS-IS

# File names
INPUT_PRODUCTS = "instacart-products.csv"
INPUT_TRANSACTIONS = "instacart-transactions.csv.gz"

# Read into a dataframe
products = pd.read_csv(INPUT_PRODUCTS, delimiter=",")

# Set product_id as index, and drop column aisle_id
products = products.set_index('product_id').drop(columns=['aisle_id'])

products.head(100)

Unnamed: 0_level_0,product_name,department_id
product_id,Unnamed: 1_level_1,Unnamed: 2_level_1
1,Chocolate Sandwich Cookies,19
2,All-Seasons Salt,13
3,Robust Golden Unsweetened Oolong Tea,7
4,Smart Ones Classic Favorites Mini Rigatoni Wit...,1
5,Green Chile Anytime Sauce,13
...,...,...
96,Sprinklez Confetti Fun Organic Toppings,13
97,Organic Chamomile Lemon Tea,7
98,2% Yellow American Cheese,16
99,Local Living Butter Lettuce,4


## 2.1. Select by department

In [5]:
# LEAVE AS-IS

DEPT_BAKERY = 3
DEPT_VEGGIES = 4
DEPT_ALCOHOL = 5
DEPT_WORLD = 6
DEPT_DRINKS = 7
DEPT_PETS = 8
DEPT_PHARMACY = 11
DEPT_CLEANING = 17
DEPT_BABIES = 18

<font size="+1" color="red">Replace this cell with your code for *select_from_departments*.</font>

In [6]:
def select_from_departments(df:pd.DataFrame, prod_IDs:list, dep_IDs:list=None):
    prod_IDs = np.array(prod_IDs) - 1
    df2 = df.iloc[prod_IDs]
    if dep_IDs != None:
        return df2[df2.department_id.isin(dep_IDs)]
    return df2

<font size="+1" color="red">Replace this cell with code to test your function with three different test cases. Each test case is a list of items and a list of 1, 2, or 3 departments.</font>

In [7]:
def show_output(df:pd.DataFrame):
    for i, row in df.iterrows():
        print(f'(id={i:4}) {row.product_name} (dept {row.department_id})')

def test_select(df:pd.DataFrame, prod_IDs:list, dep_IDs:list):
    print(f'Test case: \nProducts: {prod_IDs}\nDepartments:{dep_IDs}')
    
    print(f'\nInput products:')
    input_prods = select_from_departments(df, prod_IDs)
    show_output(input_prods)
    
    print(f'\nOutput products:')
    output_prods = select_from_departments(df, prod_IDs, dep_IDs)
    show_output(output_prods)

In [8]:
tests = [
    [[22, 26, 45, 54, 57, 71, 111, 112], [DEPT_BAKERY, DEPT_CLEANING]],
    [[2158, 5474, 6632, 5828, 4794, 7129, 3125, 1685], [DEPT_VEGGIES]],
    [[786, 7049, 6068, 4458, 1150, 902, 7349, 2028], [DEPT_ALCOHOL, DEPT_DRINKS, DEPT_BABIES]]
]

for t in tests:
    print('======================================================')
    test_select(products, t[0], t[1])

Test case: 
Products: [22, 26, 45, 54, 57, 71, 111, 112]
Departments:[3, 17]

Input products:
(id=  22) Fresh Breath Oral Rinse Mild Mint (dept 11)
(id=  26) Fancy Feast Trout Feast Flaked Wet Cat Food (dept 8)
(id=  45) European Cucumber (dept 4)
(id=  54) 24/7 Performance Cat Litter (dept 8)
(id=  57) Flat Toothpicks (dept 17)
(id=  71) Ultra 7 Inch Polypropylene Traditional Plates (dept 17)
(id= 111) Fabric Softener, Geranium Scent (dept 17)
(id= 112) Hot Tomatillo Salsa (dept 13)

Output products:
(id=  57) Flat Toothpicks (dept 17)
(id=  71) Ultra 7 Inch Polypropylene Traditional Plates (dept 17)
(id= 111) Fabric Softener, Geranium Scent (dept 17)
Test case: 
Products: [2158, 5474, 6632, 5828, 4794, 7129, 3125, 1685]
Departments:[4]

Input products:
(id=2158) #2 Cone White Coffee Filters (dept 7)
(id=5474) Crackers, Puffed, Lightly Salted Corn (dept 19)
(id=6632) Brown Rice Salmon Avocado Roll (dept 20)
(id=5828) Powder Fresh Roll-On Antiperspirant Deodorant (dept 11)
(id=4794) Pu

## 2.2. Read and filter transactions

<font size="+1" color="red">Replace this cell with your code to read transactions, keeping only items in DEPT_PHARMACY. Remember to stop after storing 5000 of the transactions read.</font>

In [9]:
def extract_transactions(filename, dept_IDs):
    transactions = []

    # Open a compressed file
    with gzip.open(filename, "rt") as inputfile:
        
        # Create a CSV reader
        reader = csv.reader(inputfile, delimiter=",")
        
        # Iterate through the CSV file
        for row in reader:
            
            # Convert to integers
            items = [int(x) for x in row]
            
            index = select_from_departments(products, items, dept_IDs).index.to_list()
            if len(index) != 0:
                transactions.append(index)

            if len(transactions)>5000:
                break
    
    return transactions

In [10]:
transactions = extract_transactions(INPUT_TRANSACTIONS, [DEPT_PHARMACY])

## 2.3. Extract association rules and comment on them

<font size="+1" color="red">Replace this cell with your code to extract association rules from the read transactions.</font>

In [11]:
results = list(apriori(transactions, min_support=0.0002, min_confidence=0.9, min_lift=1.0))
print_apyori_output(results, products, 'product_name')

Rules involving itemset [5584, 4898]
['Vitamin C 250 mg 60 Gummies'] => ['Vitamin D3 Gummies, 1000 IU, Great Wild Berry Taste!'] (support=0.0004, confidence=1.00, lift=2500.50)
['Vitamin D3 Gummies, 1000 IU, Great Wild Berry Taste!'] => ['Vitamin C 250 mg 60 Gummies'] (support=0.0004, confidence=1.00, lift=2500.50)

Rules involving itemset [23425, 5019]
['Nourish & Moisturize Shampoo'] => ['Nourish+ Moisturize Conditioner'] (support=0.0006, confidence=1.00, lift=1250.25)

Rules involving itemset [11007, 5663]
['Chocolate Energy Supplement'] => ['Chocolate Calming Supplement'] (support=0.0006, confidence=1.00, lift=1250.25)

Rules involving itemset [10979, 6876]
['Sheer Blonde Highlight Activating Conditioner'] => ['Sheer Blonde Highlight Activating Brightening Shampoo'] (support=0.0004, confidence=1.00, lift=1250.25)

Rules involving itemset [13899, 9951]
['Outlast Long Lasting Mint Mouthwash'] => ['Mint Glide Floss Picks'] (support=0.0004, confidence=1.00, lift=555.67)

Rules involvin

<font size="+1" color="red">Replace this cell with a brief commentary on what you would recommend to the shopping app considering the extracted association rules.</font>

I would recommend to the shopping app to promote, for instance: `Nourish+ Moisturize Conditioner` to the users that buy `Nourish & Moisturize Shampoo`, or `Chocolate Calming Supplement` to the users that buy both `Chocolate Sleep Supplement` and `Chocolate Energy Supplement`. Similarly I would do the same with all the extracted association rules (10 in total at `min_support=0.0002`, `min_confidence=0.9`, `min_lift=1.0`)

## 2.4. Extract association rules and comment on them (other departments)

<font size="+1" color="red">Replace this cell with code to select a different set of departments (at least two, not DEPT_PHARMACY) and extract transactions again. Avoid replicating code when possible.</font>

In [76]:
transactions = extract_transactions(INPUT_TRANSACTIONS, [DEPT_VEGGIES, DEPT_BAKERY])
results = list(apriori(transactions, min_support=0.0006, min_confidence=0.9, min_lift=1.0))
print_apyori_output(results, products, 'product_name')

Rules involving itemset [47626, 38988]
['White English Muffins'] => ['Large Lemon'] (support=0.0008, confidence=1.00, lift=17.36)

Rules involving itemset [1529, 24852, 47766]
['Parsley, Italian (Flat), New England Grown', 'Organic Avocado'] => ['Banana'] (support=0.0008, confidence=1.00, lift=5.24)

Rules involving itemset [21137, 40706, 5077]
['Organic Grape Tomatoes', '100% Whole Wheat Bread'] => ['Organic Strawberries'] (support=0.0008, confidence=1.00, lift=9.54)

Rules involving itemset [26209, 5450, 31717]
['Small Hass Avocado', 'Organic Cilantro'] => ['Limes'] (support=0.0008, confidence=1.00, lift=21.19)

Rules involving itemset [13176, 5876, 26940]
['Organic Lemon', 'Organic Large Green Asparagus'] => ['Bag of Organic Bananas'] (support=0.0008, confidence=1.00, lift=6.62)

Rules involving itemset [47209, 5876, 41220]
['Organic Lemon', 'Organic Romaine Lettuce'] => ['Organic Hass Avocado'] (support=0.0008, confidence=1.00, lift=11.77)

Rules involving itemset [13176, 31915, 81

<font size="+1" color="red">Replace this cell with your commentary on the obtained rules.</font>

In this extracted rules, we can see that there are many that are extracted exclusively from the VEGGIE department or the BAKERY department, however there are a few, such as:

`['White English Muffins'] => ['Large Lemon']`\
`['Organic Grape Tomatoes', '100% Whole Wheat Bread'] => ['Organic Strawberries']`

that are cross-department. The recommendations that are cross-department, such as the stated above might be a good idea to recommend in some cases, such as this VEGGIE/BAKERY case. However I would not recommend any application to recommend alcoholic beverages to users that buy diapers for their kids (would not recommend risky recommendations such as BABIES/ALCOHOL cross-department recommendations, if there were any).

# Extra exercise

For more learning and extra points, copy the function `print_apyori_output` to `print_apyori_output_diff_dept` and modify it to filter the obtained association rules so that you print only the ones involving products in different departments.

To be precise, this means rules in which there is at least a product in the *consequence* that belongs to a department that none of the products in the *antecedent* belongs to. Experiment with different combinations of departments, and try to discover interesting groups of products in different departments that are related to each other.

In [73]:
depts = {
    3: 'DEPT_BAKERY',
    4: 'DEPT_VEGGIES',
    5: 'DEPT_ALCOHOL',
    6: 'DEPT_WORLD',
    7: 'DEPT_DRINKS',
    8: 'DEPT_PETS',
    11: 'DEPT_PHARMACY',
    17: 'DEPT_CLEANING',
    18: 'DEPT_BABIES'
}

def print_apyori_output_diff_dept(association_results, info:pd.DataFrame=False, info_key=False, dept_dict=None):
    for relation_record in association_results:
        itemset = list(relation_record.items)
        
        # Check all parameters are set
        if info_key and info_key and dept_dict:
            # Check if there is more than one department_id for each itemset
            depts = info.loc[itemset].department_id.unique()
            if len(itemset) > 1 and len(depts) > 1:
                print("Rules involving itemset %s" % itemset)
                print(f'The following itemset contains products from: {[dept_dict[k] for k in depts]}')
            
                support = relation_record.support

                for rules in relation_record.ordered_statistics:
                    antecedent = list(rules.items_base)
                    consequent = list(rules.items_add)
                    
                    antecedent = [info.loc[x][info_key] for x in antecedent]
                    consequent = [info.loc[x][info_key] for x in consequent]
                    
                    confidence = rules.confidence
                    lift = rules.lift

                    print("%s => %s (support=%.4f, confidence=%.2f, lift=%.2f)" %
                        (antecedent, consequent, support, confidence, lift))
                print()

In [74]:
for dept_IDs in combinations(depts.keys(), 2):
    transactions = extract_transactions(INPUT_TRANSACTIONS, dept_IDs)
    results = list(apriori(transactions, min_support=0.0006, min_confidence=0.9, min_lift=1.0))
    print_apyori_output_diff_dept(results, products, 'product_name', depts)

Rules involving itemset [47626, 38988]
The following itemset contains products from: ['DEPT_VEGGIES', 'DEPT_BAKERY']
['White English Muffins'] => ['Large Lemon'] (support=0.0008, confidence=1.00, lift=17.36)

Rules involving itemset [21137, 40706, 5077]
The following itemset contains products from: ['DEPT_VEGGIES', 'DEPT_BAKERY']
['Organic Grape Tomatoes', '100% Whole Wheat Bread'] => ['Organic Strawberries'] (support=0.0008, confidence=1.00, lift=9.54)

Rules involving itemset [13176, 48628, 44359]
The following itemset contains products from: ['DEPT_VEGGIES', 'DEPT_BAKERY']
['Organic Whole Wheat Bread', 'Organic Small Bunch Celery'] => ['Bag of Organic Bananas'] (support=0.0008, confidence=1.00, lift=6.62)

Rules involving itemset [13176, 21137, 5077, 21903]
The following itemset contains products from: ['DEPT_VEGGIES', 'DEPT_BAKERY']
['Organic Strawberries', '100% Whole Wheat Bread', 'Organic Baby Spinach'] => ['Bag of Organic Bananas'] (support=0.0010, confidence=1.00, lift=6.62)



Interestingly, there are a ton of correlations between the BABIES and the DRINKS department (in comparison to other cross-department recommendations), apparently if you have a baby you automatically become a tea lover.

I also feel quite personally targeted with the `['Banana', 'Cold Brew Coffee'] => ['Organic Hass Avocado']` rule.

<font size="+2" color="#003300">I hereby declare that, except for the code provided by the course instructors, all of my code, report, and figures were produced by myself.</font>