# Association rules

Association discovery is the search for links between events that occur together. When we study events that follow one or more other events over time, we speak of a sequence.

## Objectives

- Link different products and better understand the cross-selling behavior of customers.
- Quantify the strength of the links between several products.
- Analyze the path customers take through a store, a website, etc.
- Decide which products to highlight and which to remove.

## Background

- The **support** index: the frequency with which A and B appear on the same ticket (number of tickets with A and B / total number of tickets).
- The **confidence** index: the probability that B appears on a ticket that contains A (number of tickets with A and B / number of tickets with A).
- The **lift**: the confidence of the rule divided by the natural frequency of B (confidence of A → B / support of B). A lift above 1 means that A and B appear together more often than they would if they were independent.
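
To make the three indexes concrete, here is a minimal sketch in plain Python. The tickets and product names are made up for illustration and are not taken from the dataset used later in this notebook.

```python
# Hypothetical tickets: each ticket is the set of products it contains.
toy_tickets = [
    {"cups", "plates"},
    {"cups", "plates", "napkins"},
    {"cups"},
    {"napkins"},
    {"plates", "cups"},
]


def count_with(items):
    """Number of tickets that contain every product in `items`."""
    return sum(1 for ticket in toy_tickets if items <= ticket)


n_tickets = len(toy_tickets)
a, b = {"cups"}, {"plates"}

# support(A, B): share of all tickets that contain both A and B
support_ab = count_with(a | b) / n_tickets

# confidence(A -> B): share of the tickets with A that also contain B
confidence_ab = count_with(a | b) / count_with(a)

# lift(A -> B): confidence compared with the natural frequency of B
lift_ab = confidence_ab / (count_with(b) / n_tickets)

print(round(support_ab, 3), round(confidence_ab, 3), round(lift_ab, 3))
```

With these toy tickets the lift comes out above 1, which suggests that cups and plates are bought together more often than chance alone would predict.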

## Steps
1. We will load sample transaction data.
2. We are going to apply an association-rule model in order to find the relevant association rules.
3. The most important rules are saved in order to make them persistent.

![](../images/associations.jpg)


In [None]:
# Installation of mlxtend for association rules
# Documentation : http://rasbt.github.io/mlxtend/user_guide/frequent_patterns/association_rules/
!pip install mlxtend wheel pandas

In [2]:
import sys
import datetime
import pandas as pd
print("You are running Python", sys.version)

You are running Python 3.6.9 |Anaconda, Inc.| (default, Jul 30 2019, 19:07:31) 
[GCC 7.3.0]


In [3]:
from mlxtend.frequent_patterns import apriori
from mlxtend.frequent_patterns import association_rules

In [4]:
import azureml.core
print("You are using Azure ML", azureml.core.VERSION)

You are using Azure ML 1.22.0


In [6]:
import logging
import os
import random

from matplotlib import pyplot as plt
from matplotlib.pyplot import imshow
import numpy as np
import pandas as pd
from sklearn import datasets

## 1. Data Access


In [9]:
transaction_file = "https://git.davewentzel.com/demos/MLOps-E2E/-/raw/master/Lab900/association_transactions.csv"


dfTransactions = pd.read_csv(transaction_file, sep=",", header=0)

In [10]:
dfTransactions.head(10)

Unnamed: 0,OrderID,Date,ClientID,ProductID,Description,Quantity,Price
0,536370,1/12/2010 8:45,12583.0,22728,ALARM CLOCK BAKELIKE PINK,24,3.75
1,536370,1/12/2010 8:45,12583.0,22727,ALARM CLOCK BAKELIKE RED,24,3.75
2,536370,1/12/2010 8:45,12583.0,22726,ALARM CLOCK BAKELIKE GREEN,12,3.75
3,536370,1/12/2010 8:45,12583.0,21724,PANDA AND BUNNIES STICKER SHEET,12,0.85
4,536370,1/12/2010 8:45,12583.0,21883,STARS GIFT TAPE,24,0.65
5,536370,1/12/2010 8:45,12583.0,10002,INFLATABLE POLITICAL GLOBE,48,0.85
6,536370,1/12/2010 8:45,12583.0,21791,VINTAGE HEADS AND TAILS CARD GAME,24,1.25
7,536370,1/12/2010 8:45,12583.0,21035,SET/2 RED RETROSPOT TEA TOWELS,18,2.95
8,536370,1/12/2010 8:45,12583.0,22326,ROUND SNACK BOXES SET OF4 WOODLAND,24,2.95
9,536370,1/12/2010 8:45,12583.0,22629,SPACEBOY LUNCH BOX,24,1.95


In [11]:
dfTransactions.shape

(8556, 7)

## 2. Data Engineering

In [12]:
dfTransactions['Description'] = dfTransactions['Description'].str.strip()
dfTransactions.dropna(axis=0, subset=['OrderID'], inplace=True)
dfTransactions['OrderID'] = dfTransactions['OrderID'].astype('str')
#dfTransactions = dfTransactions[~dfTransactions['OrderID'].str.contains('C')]

In [16]:
basket = (dfTransactions
          .groupby(['OrderID', 'Description'])
          ['Quantity'].sum()
          .unstack()   #this is basically a pivot of the Product
          .reset_index()
          .fillna(0)
          .set_index('OrderID'))

In [17]:
basket.head(10)

Description,10 COLOUR SPACEBOY PEN,12 COLOURED PARTY BALLOONS,12 EGG HOUSE PAINTED WOOD,12 MESSAGE CARDS WITH ENVELOPES,12 PENCIL SMALL TUBE WOODLAND,12 PENCILS SMALL TUBE RED RETROSPOT,12 PENCILS SMALL TUBE SKULL,12 PENCILS TALL TUBE POSY,12 PENCILS TALL TUBE RED RETROSPOT,12 PENCILS TALL TUBE WOODLAND,...,WRAP VINTAGE PETALS DESIGN,YELLOW COAT RACK PARIS FASHION,YELLOW GIANT GARDEN THERMOMETER,YELLOW SHARK HELICOPTER,ZINC STAR T-LIGHT HOLDER,ZINC FOLKART SLEIGH BELLS,ZINC HERB GARDEN CONTAINER,ZINC METAL HEART DECORATION,ZINC T-LIGHT HOLDER STAR LARGE,ZINC T-LIGHT HOLDER STARS SMALL
OrderID,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1
536370,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
536852,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
536974,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
537065,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
537463,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
537468,24.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
537693,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
537897,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
537967,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
538008,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


In [18]:
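# We only need a binary flag (bought or not), not the summed quantity
def encode_units(x):
    # A net quantity of at least 1 counts as "bought"; anything else
    # (zero, negative quantities, values below 1) maps to 0
    return 1 if x >= 1 else 0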

basket_sets = basket.applymap(encode_units)
basket_sets.drop('POSTAGE', inplace=True, axis=1)

In [19]:
basket_sets.head(10)

Description,10 COLOUR SPACEBOY PEN,12 COLOURED PARTY BALLOONS,12 EGG HOUSE PAINTED WOOD,12 MESSAGE CARDS WITH ENVELOPES,12 PENCIL SMALL TUBE WOODLAND,12 PENCILS SMALL TUBE RED RETROSPOT,12 PENCILS SMALL TUBE SKULL,12 PENCILS TALL TUBE POSY,12 PENCILS TALL TUBE RED RETROSPOT,12 PENCILS TALL TUBE WOODLAND,...,WRAP VINTAGE PETALS DESIGN,YELLOW COAT RACK PARIS FASHION,YELLOW GIANT GARDEN THERMOMETER,YELLOW SHARK HELICOPTER,ZINC STAR T-LIGHT HOLDER,ZINC FOLKART SLEIGH BELLS,ZINC HERB GARDEN CONTAINER,ZINC METAL HEART DECORATION,ZINC T-LIGHT HOLDER STAR LARGE,ZINC T-LIGHT HOLDER STARS SMALL
OrderID,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1
536370,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
536852,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
536974,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
537065,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
537463,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
537468,1,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
537693,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
537897,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
537967,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
538008,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
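
To illustrate what the pivot and the binary encoding do together, here is a small sketch in plain Python rather than pandas. The order IDs, products, and quantities are hypothetical and only meant to show the shape of the transformation.

```python
from collections import defaultdict

# Hypothetical (OrderID, Description, Quantity) rows, for illustration only.
toy_rows = [
    ("A1", "CUPS", 6),
    ("A1", "PLATES", 6),
    ("A1", "CUPS", 6),      # same product listed twice on one order
    ("A2", "NAPKINS", 20),
    ("A2", "CUPS", -6),     # a negative quantity, so the net total is not positive
]

# Step 1: sum the quantities per (order, product), like groupby(...).sum().
totals = defaultdict(int)
for order_id, description, quantity in toy_rows:
    totals[(order_id, description)] += quantity

# Step 2: one row per order, one column per product; missing pairs count as 0.
# Step 3: keep only a presence flag (1 if the net quantity is at least 1).
toy_orders = sorted({order_id for order_id, _ in totals})
toy_products = sorted({description for _, description in totals})
toy_basket_flags = {
    order_id: {
        product: int(totals.get((order_id, product), 0) >= 1)
        for product in toy_products
    }
    for order_id in toy_orders
}

for order_id, flags in toy_basket_flags.items():
    print(order_id, flags)
```

The result has the same layout as `basket_sets`: one entry per order, one flag per product.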


## 3. Association rules

In [20]:
%%time
frequent_itemsets = apriori(basket_sets, min_support=0.07, use_colnames=True)

CPU times: user 23.1 ms, sys: 296 µs, total: 23.4 ms
Wall time: 35.5 ms
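
For readers who want to see the idea behind `apriori`, the sketch below shows the level-wise search on a few made-up tickets. It is an illustration only: the function name `toy_apriori` and the data are hypothetical, and mlxtend's actual implementation is more optimized than this.

```python
from itertools import combinations


def toy_apriori(transactions, min_support):
    """Return {itemset: support} for every itemset meeting min_support."""
    n = len(transactions)

    def support(itemset):
        return sum(1 for t in transactions if itemset <= t) / n

    # Level 1: frequent single items.
    items = {item for t in transactions for item in t}
    current = {frozenset([i]): support(frozenset([i])) for i in items}
    current = {s: v for s, v in current.items() if v >= min_support}

    result = {}
    k = 1
    while current:
        result.update(current)
        k += 1
        # Join step: merge frequent (k-1)-itemsets into k-item candidates.
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(sub) in current for sub in combinations(c, k - 1))
        }
        current = {c: support(c) for c in candidates}
        current = {c: v for c, v in current.items() if v >= min_support}
    return result


toy_transactions = [
    {"cups", "plates"},
    {"cups", "plates", "napkins"},
    {"cups"},
    {"napkins"},
    {"plates", "cups"},
]
toy_frequent = toy_apriori(toy_transactions, min_support=0.5)
for itemset, value in sorted(toy_frequent.items(), key=lambda kv: -kv[1]):
    print(sorted(itemset), round(value, 3))
```

Raising `min_support` shrinks the result quickly, because an itemset can only be frequent if all of its subsets are.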


In [21]:
frequent_itemsets.head(20)

Unnamed: 0,support,itemsets
0,0.08243,(ALARM CLOCK BAKELIKE GREEN)
1,0.086768,(ALARM CLOCK BAKELIKE PINK)
2,0.08026,(ALARM CLOCK BAKELIKE RED)
3,0.084599,(DOLLY GIRL LUNCH BOX)
4,0.08243,(JUMBO BAG RED RETROSPOT)
5,0.106291,(LUNCH BAG APPLE DESIGN)
6,0.071584,(LUNCH BAG DOLLY GIRL DESIGN)
7,0.130152,(LUNCH BAG RED RETROSPOT)
8,0.101952,(LUNCH BAG SPACEBOY DESIGN)
9,0.099783,(LUNCH BAG WOODLAND)


In [22]:
frequent_itemsets.sort_values(by = 'support',  ascending = False)

Unnamed: 0,support,itemsets
16,0.160521,(RABBIT NIGHT LIGHT)
19,0.154013,(RED TOADSTOOL LED NIGHT LIGHT)
15,0.145336,(PLASTERS IN TIN WOODLAND ANIMALS)
13,0.143167,(PLASTERS IN TIN CIRCUS PARADE)
23,0.13449,(ROUND SNACK BOXES SET OF4 WOODLAND)
7,0.130152,(LUNCH BAG RED RETROSPOT)
10,0.121475,(LUNCH BOX WITH CUTLERY RETROSPOT)
25,0.117137,(SET/6 RED SPOTTY PAPER CUPS)
14,0.117137,(PLASTERS IN TIN SPACEBOY)
18,0.117137,(RED RETROSPOT MINI CASES)


In [23]:
%%time
rules = association_rules(frequent_itemsets, metric="lift", min_threshold=1)

CPU times: user 3.9 ms, sys: 3.62 ms, total: 7.52 ms
Wall time: 6.43 ms


In [24]:
rules.head(20)

Unnamed: 0,antecedents,consequents,antecedent support,consequent support,support,confidence,lift,leverage,conviction
0,(PLASTERS IN TIN SPACEBOY),(PLASTERS IN TIN CIRCUS PARADE),0.117137,0.143167,0.075922,0.648148,4.527217,0.059152,2.435209
1,(PLASTERS IN TIN CIRCUS PARADE),(PLASTERS IN TIN SPACEBOY),0.143167,0.117137,0.075922,0.530303,4.527217,0.059152,1.879645
2,(PLASTERS IN TIN CIRCUS PARADE),(PLASTERS IN TIN WOODLAND ANIMALS),0.143167,0.145336,0.086768,0.606061,4.170059,0.065961,2.169531
3,(PLASTERS IN TIN WOODLAND ANIMALS),(PLASTERS IN TIN CIRCUS PARADE),0.145336,0.143167,0.086768,0.597015,4.170059,0.065961,2.126215
4,(PLASTERS IN TIN SPACEBOY),(PLASTERS IN TIN WOODLAND ANIMALS),0.117137,0.145336,0.088937,0.759259,5.224157,0.071913,3.550142
5,(PLASTERS IN TIN WOODLAND ANIMALS),(PLASTERS IN TIN SPACEBOY),0.145336,0.117137,0.088937,0.61194,5.224157,0.071913,2.275071
6,(SET/6 RED SPOTTY PAPER CUPS),(SET/20 RED RETROSPOT PAPER NAPKINS),0.117137,0.112798,0.086768,0.740741,6.566952,0.073555,3.422064
7,(SET/20 RED RETROSPOT PAPER NAPKINS),(SET/6 RED SPOTTY PAPER CUPS),0.112798,0.117137,0.086768,0.769231,6.566952,0.073555,3.825741
8,(SET/6 RED SPOTTY PAPER PLATES),(SET/20 RED RETROSPOT PAPER NAPKINS),0.10846,0.112798,0.086768,0.8,7.092308,0.074534,4.436009
9,(SET/20 RED RETROSPOT PAPER NAPKINS),(SET/6 RED SPOTTY PAPER PLATES),0.112798,0.10846,0.086768,0.769231,7.092308,0.074534,3.863341
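
The table also contains `leverage` and `conviction`, which the Background section does not define. As a sanity check, the sketch below recomputes the derived metrics of rule 0 from its three support values, using the standard definitions as documented by mlxtend. The variable names are ours; the numbers are copied from the table above.

```python
# Values copied from rule 0 in the table above:
# (PLASTERS IN TIN SPACEBOY) -> (PLASTERS IN TIN CIRCUS PARADE)
antecedent_support = 0.117137
consequent_support = 0.143167
rule_support = 0.075922

# confidence(A -> B) = support(A, B) / support(A)
rule_confidence = rule_support / antecedent_support

# lift(A -> B) = confidence(A -> B) / support(B)
rule_lift = rule_confidence / consequent_support

# leverage(A -> B) = support(A, B) - support(A) * support(B)
rule_leverage = rule_support - antecedent_support * consequent_support

# conviction(A -> B) = (1 - support(B)) / (1 - confidence(A -> B))
# (undefined when the confidence is exactly 1)
rule_conviction = (1 - consequent_support) / (1 - rule_confidence)

print(round(rule_confidence, 4), round(rule_lift, 4),
      round(rule_leverage, 4), round(rule_conviction, 4))
```

The results should match the `confidence`, `lift`, `leverage`, and `conviction` columns of rule 0 up to rounding of the inputs.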


In [25]:
rules.describe(include='all')

Unnamed: 0,antecedents,consequents,antecedent support,consequent support,support,confidence,lift,leverage,conviction
count,18,18,18.0,18.0,18.0,18.0,18.0,18.0,18.0
unique,9,9,,,,,,,
top,(SET/6 RED SPOTTY PAPER PLATES),(SET/6 RED SPOTTY PAPER CUPS),,,,,,,
freq,3,3,,,,,,,
mean,,,0.116896,0.116896,0.087009,0.760863,6.699165,0.073223,8.156028
std,,,0.017558,0.017558,0.007171,0.131357,1.674692,0.008337,10.937579
min,,,0.086768,0.086768,0.075922,0.530303,4.170059,0.059152,1.879645
25%,,,0.10846,0.10846,0.084599,0.666667,5.224157,0.071913,2.648316
50%,,,0.114967,0.114967,0.086768,0.764245,7.092308,0.073555,3.704628
75%,,,0.117137,0.117137,0.086768,0.809375,8.195556,0.074534,4.657809


In [26]:
rules.sort_values(by = 'lift',  ascending = False)

Unnamed: 0,antecedents,consequents,antecedent support,consequent support,support,confidence,lift,leverage,conviction
16,(SET/6 RED SPOTTY PAPER PLATES),"(SET/6 RED SPOTTY PAPER CUPS, SET/20 RED RETRO...",0.10846,0.086768,0.084599,0.78,8.9895,0.075188,4.151055
13,"(SET/6 RED SPOTTY PAPER CUPS, SET/20 RED RETRO...",(SET/6 RED SPOTTY PAPER PLATES),0.086768,0.10846,0.084599,0.975,8.9895,0.075188,35.661605
15,(SET/6 RED SPOTTY PAPER CUPS),"(SET/6 RED SPOTTY PAPER PLATES, SET/20 RED RET...",0.117137,0.086768,0.084599,0.722222,8.323611,0.074435,3.287636
14,"(SET/6 RED SPOTTY PAPER PLATES, SET/20 RED RET...",(SET/6 RED SPOTTY PAPER CUPS),0.086768,0.117137,0.084599,0.975,8.323611,0.074435,35.314534
11,(SET/6 RED SPOTTY PAPER PLATES),(SET/6 RED SPOTTY PAPER CUPS),0.10846,0.117137,0.104121,0.96,8.195556,0.091417,22.071584
10,(SET/6 RED SPOTTY PAPER CUPS),(SET/6 RED SPOTTY PAPER PLATES),0.117137,0.10846,0.104121,0.888889,8.195556,0.091417,8.023861
17,(SET/20 RED RETROSPOT PAPER NAPKINS),"(SET/6 RED SPOTTY PAPER CUPS, SET/6 RED SPOTTY...",0.112798,0.104121,0.084599,0.75,7.203125,0.072854,3.583514
12,"(SET/6 RED SPOTTY PAPER CUPS, SET/6 RED SPOTTY...",(SET/20 RED RETROSPOT PAPER NAPKINS),0.104121,0.112798,0.084599,0.8125,7.203125,0.072854,4.731743
9,(SET/20 RED RETROSPOT PAPER NAPKINS),(SET/6 RED SPOTTY PAPER PLATES),0.112798,0.10846,0.086768,0.769231,7.092308,0.074534,3.863341
8,(SET/6 RED SPOTTY PAPER PLATES),(SET/20 RED RETROSPOT PAPER NAPKINS),0.10846,0.112798,0.086768,0.8,7.092308,0.074534,4.436009


In [27]:
rules[rules['antecedents'] == {'PLASTERS IN TIN CIRCUS PARADE'}]

Unnamed: 0,antecedents,consequents,antecedent support,consequent support,support,confidence,lift,leverage,conviction
1,(PLASTERS IN TIN CIRCUS PARADE),(PLASTERS IN TIN SPACEBOY),0.143167,0.117137,0.075922,0.530303,4.527217,0.059152,1.879645
2,(PLASTERS IN TIN CIRCUS PARADE),(PLASTERS IN TIN WOODLAND ANIMALS),0.143167,0.145336,0.086768,0.606061,4.170059,0.065961,2.169531


In [None]:
# Number of items in the antecedent of each rule
rules["antecedent_len"] = rules["antecedents"].apply(len)
rules

## 4. Let's identify the most relevant association rules

In [28]:
%%time
rules[ (rules['lift'] >= 1) &
       (rules['confidence'] >= 0.6) ]

CPU times: user 2.29 ms, sys: 235 µs, total: 2.53 ms
Wall time: 2.36 ms


Unnamed: 0,antecedents,consequents,antecedent support,consequent support,support,confidence,lift,leverage,conviction
0,(PLASTERS IN TIN SPACEBOY),(PLASTERS IN TIN CIRCUS PARADE),0.117137,0.143167,0.075922,0.648148,4.527217,0.059152,2.435209
2,(PLASTERS IN TIN CIRCUS PARADE),(PLASTERS IN TIN WOODLAND ANIMALS),0.143167,0.145336,0.086768,0.606061,4.170059,0.065961,2.169531
4,(PLASTERS IN TIN SPACEBOY),(PLASTERS IN TIN WOODLAND ANIMALS),0.117137,0.145336,0.088937,0.759259,5.224157,0.071913,3.550142
5,(PLASTERS IN TIN WOODLAND ANIMALS),(PLASTERS IN TIN SPACEBOY),0.145336,0.117137,0.088937,0.61194,5.224157,0.071913,2.275071
6,(SET/6 RED SPOTTY PAPER CUPS),(SET/20 RED RETROSPOT PAPER NAPKINS),0.117137,0.112798,0.086768,0.740741,6.566952,0.073555,3.422064
7,(SET/20 RED RETROSPOT PAPER NAPKINS),(SET/6 RED SPOTTY PAPER CUPS),0.112798,0.117137,0.086768,0.769231,6.566952,0.073555,3.825741
8,(SET/6 RED SPOTTY PAPER PLATES),(SET/20 RED RETROSPOT PAPER NAPKINS),0.10846,0.112798,0.086768,0.8,7.092308,0.074534,4.436009
9,(SET/20 RED RETROSPOT PAPER NAPKINS),(SET/6 RED SPOTTY PAPER PLATES),0.112798,0.10846,0.086768,0.769231,7.092308,0.074534,3.863341
10,(SET/6 RED SPOTTY PAPER CUPS),(SET/6 RED SPOTTY PAPER PLATES),0.117137,0.10846,0.104121,0.888889,8.195556,0.091417,8.023861
11,(SET/6 RED SPOTTY PAPER PLATES),(SET/6 RED SPOTTY PAPER CUPS),0.10846,0.117137,0.104121,0.96,8.195556,0.091417,22.071584


### Let's take all the rules with a lift of at least 2 and a confidence of at least 0.7

In [29]:
mylift = 2
myconfidence = 0.7

In [30]:
rules[ (rules['lift'] >= mylift) &
       (rules['confidence'] >= myconfidence) ]

Unnamed: 0,antecedents,consequents,antecedent support,consequent support,support,confidence,lift,leverage,conviction
4,(PLASTERS IN TIN SPACEBOY),(PLASTERS IN TIN WOODLAND ANIMALS),0.117137,0.145336,0.088937,0.759259,5.224157,0.071913,3.550142
6,(SET/6 RED SPOTTY PAPER CUPS),(SET/20 RED RETROSPOT PAPER NAPKINS),0.117137,0.112798,0.086768,0.740741,6.566952,0.073555,3.422064
7,(SET/20 RED RETROSPOT PAPER NAPKINS),(SET/6 RED SPOTTY PAPER CUPS),0.112798,0.117137,0.086768,0.769231,6.566952,0.073555,3.825741
8,(SET/6 RED SPOTTY PAPER PLATES),(SET/20 RED RETROSPOT PAPER NAPKINS),0.10846,0.112798,0.086768,0.8,7.092308,0.074534,4.436009
9,(SET/20 RED RETROSPOT PAPER NAPKINS),(SET/6 RED SPOTTY PAPER PLATES),0.112798,0.10846,0.086768,0.769231,7.092308,0.074534,3.863341
10,(SET/6 RED SPOTTY PAPER CUPS),(SET/6 RED SPOTTY PAPER PLATES),0.117137,0.10846,0.104121,0.888889,8.195556,0.091417,8.023861
11,(SET/6 RED SPOTTY PAPER PLATES),(SET/6 RED SPOTTY PAPER CUPS),0.10846,0.117137,0.104121,0.96,8.195556,0.091417,22.071584
12,"(SET/6 RED SPOTTY PAPER CUPS, SET/6 RED SPOTTY...",(SET/20 RED RETROSPOT PAPER NAPKINS),0.104121,0.112798,0.084599,0.8125,7.203125,0.072854,4.731743
13,"(SET/6 RED SPOTTY PAPER CUPS, SET/20 RED RETRO...",(SET/6 RED SPOTTY PAPER PLATES),0.086768,0.10846,0.084599,0.975,8.9895,0.075188,35.661605
14,"(SET/6 RED SPOTTY PAPER PLATES, SET/20 RED RET...",(SET/6 RED SPOTTY PAPER CUPS),0.086768,0.117137,0.084599,0.975,8.323611,0.074435,35.314534


## 5. Exporting the rules

In [None]:
# Keep only the rules that pass both thresholds, then write them to disk
dfrules = rules[(rules['lift'] >= mylift) &
                (rules['confidence'] >= myconfidence)]

dfrules.to_csv(r'myassociationrules.csv')

In [None]:
%ls -l myassociationrules.csv

> The rules are saved into a local CSV file.
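
One thing to keep in mind when reading the file back: the `antecedents` and `consequents` columns hold frozensets, and a CSV can only store their text representation. A possible workaround, sketched below with the standard `csv` module, is to join the items into a delimited string before writing and split it again after reading. The helper names and the `" | "` separator are assumptions made for this example; the rule values are copied from rule 10 above.

```python
import csv
import io

SEP = " | "  # hypothetical separator; choose one that never occurs in product names


def itemset_to_text(itemset):
    """Serialize an itemset as a sorted, delimited string."""
    return SEP.join(sorted(itemset))


def text_to_itemset(text):
    """Rebuild the frozenset from its delimited string."""
    return frozenset(text.split(SEP))


toy_rules = [
    {"antecedents": frozenset({"SET/6 RED SPOTTY PAPER CUPS"}),
     "consequents": frozenset({"SET/6 RED SPOTTY PAPER PLATES"}),
     "confidence": 0.888889,
     "lift": 8.195556},
]

# Write to an in-memory buffer; a real file opened with newline="" works the same way.
buffer = io.StringIO()
writer = csv.DictWriter(
    buffer, fieldnames=["antecedents", "consequents", "confidence", "lift"])
writer.writeheader()
for rule in toy_rules:
    writer.writerow({**rule,
                     "antecedents": itemset_to_text(rule["antecedents"]),
                     "consequents": itemset_to_text(rule["consequents"])})

# Read the rows back and restore the original types.
buffer.seek(0)
restored_rules = [
    {"antecedents": text_to_itemset(row["antecedents"]),
     "consequents": text_to_itemset(row["consequents"]),
     "confidence": float(row["confidence"]),
     "lift": float(row["lift"])}
    for row in csv.DictReader(buffer)
]
print(restored_rules == toy_rules)
```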

In [None]:
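# NOTE: `run` is never created in this notebook. This call only works if an
# Azure ML Run object named `run` was started earlier in the session.
run.complete()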

> End of notebook