# Association Rules and Lift Analysis

## Part I: Research Question

Can we identify key associations among customer purchases through market basket analysis?

The goal is to analyze customer transaction data with market basket analysis, identify the key associations among purchased items, and use them to support better business and strategic decisions.

## Part II: Method Justification

Market Basket Analysis (MBA) is a data mining technique for analyzing what customers buy and, in particular, which items they tend to buy together.

The underlying assumption in market basket analysis is that when two or more products frequently occur together in baskets, they are complements in purchase, so the purchase of one makes the purchase of the others more likely.

One transaction in the dataset is:
Apple Lightning to Digital AV Adapter,TP-Link AC1750 Smart WiFi Router,Apple Pencil
The first item suggests that the customer owns an Apple device, and the Apple Pencil at the end of the same transaction is consistent with that assumption: multiple Apple products tend to be purchased together.
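The idea of joint occurrence can be made concrete by counting how often each pair of items shares a basket. The following is a minimal sketch, not part of the original analysis; the helper `pair_counts` and the three toy baskets are illustrative assumptions.

```python
from collections import Counter
from itertools import combinations

def pair_counts(transactions):
    """Count how many baskets contain each unordered pair of items."""
    counts = Counter()
    for basket in transactions:
        # sorted(set(...)) removes duplicates and gives each pair a stable order
        for pair in combinations(sorted(set(basket)), 2):
            counts[pair] += 1
    return counts

# Toy baskets (assumed for illustration, not taken from the dataset)
toy_baskets = [
    ['Apple Pencil', 'Apple USB-C Charger cable'],
    ['Apple Pencil', 'Apple USB-C Charger cable', 'HP 61 ink'],
    ['HP 61 ink'],
]
counts = pair_counts(toy_baskets)
print(counts.most_common(1))
```

Pairs with high counts relative to the number of baskets are the candidates that support, confidence, and lift later quantify.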

## Part III: Data Preparation

In [53]:
# Importing packages to be used
# numpy for data analysis functionality
import numpy as np
# pandas for dataframes
import pandas as pd
# Apriori algorithm from apyori; the analysis below parses its RelationRecord output
# (importing mlxtend's apriori as well would only be shadowed by this import)
from apyori import apriori
# matplotlib for plotting
import matplotlib.pyplot as plt
%matplotlib inline
# seaborn for extra plotting functionality
import seaborn as sns

In [54]:
# Importing dataset
# Raw string so the backslash in the Windows path is not read as an escape sequence
df = pd.read_csv(r'Churn Data\churn_clean2.csv')

In [55]:
# Showing data
df

Unnamed: 0,Item01,Item02,Item03,Item04,Item05,Item06,Item07,Item08,Item09,Item10,Item11,Item12,Item13,Item14,Item15,Item16,Item17,Item18,Item19,Item20
0,,,,,,,,,,,,,,,,,,,,
1,Logitech M510 Wireless mouse,HP 63 Ink,HP 65 ink,nonda USB C to USB Adapter,10ft iPHone Charger Cable,HP 902XL ink,Creative Pebble 2.0 Speakers,Cleaning Gel Universal Dust Cleaner,Micro Center 32GB Memory card,YUNSONG 3pack 6ft Nylon Lightning Cable,TopMate C5 Laptop Cooler pad,Apple USB-C Charger cable,HyperX Cloud Stinger Headset,TONOR USB Gaming Microphone,Dust-Off Compressed Gas 2 pack,3A USB Type C Cable 3 pack 6FT,HOVAMP iPhone charger,SanDisk Ultra 128GB card,FEEL2NICE 5 pack 10ft Lighning cable,FEIYOLD Blue light Blocking Glasses
2,,,,,,,,,,,,,,,,,,,,
3,Apple Lightning to Digital AV Adapter,TP-Link AC1750 Smart WiFi Router,Apple Pencil,,,,,,,,,,,,,,,,,
4,,,,,,,,,,,,,,,,,,,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
14997,Falcon Dust Off Compressed Gas,,,,,,,,,,,,,,,,,,,
14998,,,,,,,,,,,,,,,,,,,,
14999,HP 63XL Ink,Apple USB-C Charger cable,,,,,,,,,,,,,,,,,,
15000,,,,,,,,,,,,,,,,,,,,


In [56]:
# Checking for null values
df.isna().any()

Item01    True
Item02    True
Item03    True
Item04    True
Item05    True
Item06    True
Item07    True
Item08    True
Item09    True
Item10    True
Item11    True
Item12    True
Item13    True
Item14    True
Item15    True
Item16    True
Item17    True
Item18    True
Item19    True
Item20    True
dtype: bool

Every column returns True, meaning each column contains null values.

In [57]:
# Converting null to 0
newdf = df.fillna(0)

In [58]:
# Checking for null values
newdf.isna().any()

Item01    False
Item02    False
Item03    False
Item04    False
Item05    False
Item06    False
Item07    False
Item08    False
Item09    False
Item10    False
Item11    False
Item12    False
Item13    False
Item14    False
Item15    False
Item16    False
Item17    False
Item18    False
Item19    False
Item20    False
dtype: bool

Every column now returns False, meaning there are no longer any nulls.

In [59]:
# export file to csv
newdf.to_csv('cleaned_data3.csv', index = False)

## Part IV: Analysis

In [60]:
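# Converting dataframe into a list of transactions (one list of item strings per row)
# Note: cells that were filled with 0 become the string '0' and are treated as an item
values = newdf.values
transactions = []
for i in range(len(newdf)):
    transactions.append([str(values[i, j]) for j in range(newdf.shape[1])])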
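Filling nulls with 0 means every short transaction carries the string '0' as if it were a product, and fully blank rows become transactions made of nothing but '0'. A minimal sketch of an alternative drops the missing cells instead. It assumes a small made-up frame `toy` in place of the real data and a hypothetical helper `to_transactions`; neither is part of the original notebook.

```python
import pandas as pd

def to_transactions(frame):
    """Return one list of item names per non-empty row, skipping missing cells."""
    transactions = []
    for row in frame.itertuples(index=False):
        items = [str(item) for item in row if pd.notna(item)]
        if items:  # skip rows that contain no items at all
            transactions.append(items)
    return transactions

# Toy stand-in for the real data: one blank row, one partial row, one full row
toy = pd.DataFrame({
    'Item01': [None, 'Apple Pencil', 'HP 61 ink'],
    'Item02': [None, None, 'USB 2.0 Printer cable'],
})
toy_transactions = to_transactions(toy)
print(toy_transactions)
```

Dropping blank rows changes the number of transactions, so support values computed this way would differ from the ones reported in this notebook.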

In [61]:
# Running apriori algorithm
rules = apriori(transactions, min_support = 0.003, min_confidence = 0.2, min_lift = 3, min_length = 2)

In [62]:
# Turning results from apriori into a list
result = list(rules)
result

[RelationRecord(items=frozenset({'Dust-Off Compressed Gas 2 pack', '10ft iPHone Charger Cable 2 Pack'}), support=0.011532564495700287, ordered_statistics=[OrderedStatistic(items_base=frozenset({'10ft iPHone Charger Cable 2 Pack'}), items_add=frozenset({'Dust-Off Compressed Gas 2 pack'}), confidence=0.45646437994722955, lift=3.829654453908496)]),
 RelationRecord(items=frozenset({'10ft iPHone Charger Cable 2 Pack', 'Screen Mom Screen Cleaner kit'}), support=0.0075994933671088595, ordered_statistics=[OrderedStatistic(items_base=frozenset({'10ft iPHone Charger Cable 2 Pack'}), items_add=frozenset({'Screen Mom Screen Cleaner kit'}), confidence=0.3007915567282322, lift=4.642154467572234)]),
 RelationRecord(items=frozenset({'VIVO Dual LCD Monitor Desk mount', '10ft iPHone Charger Cable 2 Pack'}), support=0.007132857809479368, ordered_statistics=[OrderedStatistic(items_base=frozenset({'10ft iPHone Charger Cable 2 Pack'}), items_add=frozenset({'VIVO Dual LCD Monitor Desk mount'}), confidence=0.

In [63]:
# Turning list into a new dataframe with one row per rule
Support = []
Confidence = []
Lift = []
Items = []
Antecedent = []
Consequent = []

# Each record holds one itemset; each ordered statistic is one rule derived from it
for record in result:
    for ordered_stat in record.ordered_statistics:
        Support.append(record.support)
        Items.append(record.items)
        Antecedent.append(ordered_stat.items_base)
        Consequent.append(ordered_stat.items_add)
        Confidence.append(ordered_stat.confidence)
        Lift.append(ordered_stat.lift)

# Convert the frozensets to plain sets for display
df = pd.DataFrame({
    'Items': list(map(set, Items)),
    'Antecedent': list(map(set, Antecedent)),
    'Consequent': list(map(set, Consequent)),
    'Support': Support,
    'Confidence': Confidence,
    'Lift': Lift,
})

In [64]:
# Showing new dataframe
df

Unnamed: 0,Items,Antecedent,Consequent,Support,Confidence,Lift
0,"{Dust-Off Compressed Gas 2 pack, 10ft iPHone C...",{10ft iPHone Charger Cable 2 Pack},{Dust-Off Compressed Gas 2 pack},0.011533,0.456464,3.829654
1,"{10ft iPHone Charger Cable 2 Pack, Screen Mom ...",{10ft iPHone Charger Cable 2 Pack},{Screen Mom Screen Cleaner kit},0.007599,0.300792,4.642154
2,"{VIVO Dual LCD Monitor Desk mount, 10ft iPHone...",{10ft iPHone Charger Cable 2 Pack},{VIVO Dual LCD Monitor Desk mount},0.007133,0.282322,3.242811
3,"{3A USB Type C Cable 3 pack 6FT, Dust-Off Comp...",{3A USB Type C Cable 3 pack 6FT},{Dust-Off Compressed Gas 2 pack},0.008533,0.401254,3.366449
4,"{HP 61 ink, 3A USB Type C Cable 3 pack 6FT}",{3A USB Type C Cable 3 pack 6FT},{HP 61 ink},0.005333,0.250784,3.061030
...,...,...,...,...,...,...
827,"{SanDisk Ultra 64GB card, Stylus Pen for iPad,...","{Stylus Pen for iPad, VIVO Dual LCD Monitor De...",{SanDisk Ultra 64GB card},0.003200,0.253968,5.169305
828,"{USB 2.0 Printer cable, VIVO Dual LCD Monitor ...","{USB 2.0 Printer cable, Screen Mom Screen Clea...","{VIVO Dual LCD Monitor Desk mount, 0}",0.003200,0.269663,3.097407
829,"{USB 2.0 Printer cable, VIVO Dual LCD Monitor ...","{USB 2.0 Printer cable, VIVO Dual LCD Monitor ...","{Screen Mom Screen Cleaner kit, 0}",0.003200,0.231884,3.578696
830,"{USB 2.0 Printer cable, VIVO Dual LCD Monitor ...","{USB 2.0 Printer cable, Screen Mom Screen Clea...",{VIVO Dual LCD Monitor Desk mount},0.003200,0.269663,3.097407


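Now we have a table of every rule found, with its antecedent and consequent and the values for support, confidence, and lift. The item 0 that appears in some rules is the placeholder used to fill null cells, not a real product.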
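Because the placeholder '0' is treated as an item, several rules in the table are near-duplicates that differ only by a '0' in the antecedent or consequent. A minimal sketch of filtering them out follows. It assumes a toy frame `toy_rules` with the same column names as the table above and a hypothetical helper `drop_placeholder_rules`; it was not run against the real results.

```python
import pandas as pd

def drop_placeholder_rules(rules, placeholder='0'):
    """Keep only rules whose antecedent and consequent exclude the placeholder."""
    has_placeholder = rules.apply(
        lambda row: placeholder in row['Antecedent'] or placeholder in row['Consequent'],
        axis=1,
    )
    return rules[~has_placeholder].reset_index(drop=True)

# Toy rules (assumed values): only the first has no placeholder on either side
toy_rules = pd.DataFrame({
    'Antecedent': [{'A'}, {'A', '0'}, {'A'}],
    'Consequent': [{'B'}, {'B'}, {'B', '0'}],
    'Lift': [3.1, 3.1, 3.2],
})
clean_rules = drop_placeholder_rules(toy_rules)
print(len(clean_rules))
```

Applied to the rule table, a filter like this would leave one row per genuine product rule, which makes the sorted views easier to read.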

In [65]:
# Showing the top rules by Support
df.sort_values(by ='Support', ascending = False, inplace = True)
df.head(10)

Unnamed: 0,Items,Antecedent,Consequent,Support,Confidence,Lift
38,"{Dust-Off Compressed Gas 2 pack, Screen Mom Sc...",{Screen Mom Screen Cleaner kit},{Dust-Off Compressed Gas 2 pack},0.023998,0.37037,3.107341
37,"{Dust-Off Compressed Gas 2 pack, Screen Mom Sc...",{Dust-Off Compressed Gas 2 pack},{Screen Mom Screen Cleaner kit},0.023998,0.201342,3.107341
162,"{Screen Mom Screen Cleaner kit, Dust-Off Compr...","{Screen Mom Screen Cleaner kit, 0}",{Dust-Off Compressed Gas 2 pack},0.023998,0.37037,3.107341
161,"{Screen Mom Screen Cleaner kit, Dust-Off Compr...","{Dust-Off Compressed Gas 2 pack, 0}",{Screen Mom Screen Cleaner kit},0.023998,0.201455,3.10908
160,"{Screen Mom Screen Cleaner kit, Dust-Off Compr...",{Screen Mom Screen Cleaner kit},"{Dust-Off Compressed Gas 2 pack, 0}",0.023998,0.37037,3.10908
159,"{Screen Mom Screen Cleaner kit, Dust-Off Compr...",{Dust-Off Compressed Gas 2 pack},"{Screen Mom Screen Cleaner kit, 0}",0.023998,0.201342,3.107341
36,"{SanDisk Ultra 64GB card, Dust-Off Compressed ...",{SanDisk Ultra 64GB card},{Dust-Off Compressed Gas 2 pack},0.020465,0.416554,3.49481
157,"{SanDisk Ultra 64GB card, Dust-Off Compressed ...",{SanDisk Ultra 64GB card},"{Dust-Off Compressed Gas 2 pack, 0}",0.020465,0.416554,3.496766
158,"{SanDisk Ultra 64GB card, Dust-Off Compressed ...","{SanDisk Ultra 64GB card, 0}",{Dust-Off Compressed Gas 2 pack},0.020465,0.416554,3.49481
237,"{SanDisk Ultra 64GB card, VIVO Dual LCD Monito...",{SanDisk Ultra 64GB card},"{VIVO Dual LCD Monitor Desk mount, 0}",0.019599,0.398915,4.582019


In [66]:
# Showing the top rules by Confidence
df.sort_values(by ='Confidence', ascending = False, inplace = True)
df.head(10)

Unnamed: 0,Items,Antecedent,Consequent,Support,Confidence,Lift
328,"{Nylon Braided Lightning to USB cable, FEIYOLD...","{Nylon Braided Lightning to USB cable, FEIYOLD...",{Dust-Off Compressed Gas 2 pack},0.003266,0.576471,4.836485
589,"{Nylon Braided Lightning to USB cable, FEIYOLD...","{Nylon Braided Lightning to USB cable, FEIYOLD...",{Dust-Off Compressed Gas 2 pack},0.003266,0.576471,4.836485
587,"{Nylon Braided Lightning to USB cable, FEIYOLD...","{Nylon Braided Lightning to USB cable, FEIYOLD...","{Dust-Off Compressed Gas 2 pack, 0}",0.003266,0.576471,4.839192
453,"{Screen Mom Screen Cleaner kit, Dust-Off Compr...","{Screen Mom Screen Cleaner kit, 10ft iPHone Ch...",{Dust-Off Compressed Gas 2 pack},0.004266,0.561404,4.710075
260,"{Dust-Off Compressed Gas 2 pack, 10ft iPHone C...","{10ft iPHone Charger Cable 2 Pack, Screen Mom ...",{Dust-Off Compressed Gas 2 pack},0.004266,0.561404,4.710075
451,"{Screen Mom Screen Cleaner kit, Dust-Off Compr...","{10ft iPHone Charger Cable 2 Pack, Screen Mom ...","{Dust-Off Compressed Gas 2 pack, 0}",0.004266,0.561404,4.712711
288,"{Dust-Off Compressed Gas 2 pack, Apple Pencil,...","{Apple Pencil, Premium Nylon USB Cable}",{Dust-Off Compressed Gas 2 pack},0.0032,0.545455,4.576266
508,"{Premium Nylon USB Cable, Dust-Off Compressed ...","{Apple Pencil, Premium Nylon USB Cable}","{Dust-Off Compressed Gas 2 pack, 0}",0.0032,0.545455,4.578827
510,"{Premium Nylon USB Cable, Dust-Off Compressed ...","{Premium Nylon USB Cable, Apple Pencil, 0}",{Dust-Off Compressed Gas 2 pack},0.0032,0.545455,4.576266
368,"{Nylon Braided Lightning to USB cable, SanDisk...","{Nylon Braided Lightning to USB cable, SanDisk...",{Dust-Off Compressed Gas 2 pack},0.0046,0.543307,4.558249


In [67]:
# Showing the top rules by Lift
df.sort_values(by ='Lift', ascending = False, inplace = True)
df.head(10)

Unnamed: 0,Items,Antecedent,Consequent,Support,Confidence,Lift
103,"{FEIYOLD Blue light Blocking Glasses, Anker 2-...",{Anker 2-in-1 USB Card Reader},"{FEIYOLD Blue light Blocking Glasses, 0}",0.004,0.271493,8.260993
104,"{FEIYOLD Blue light Blocking Glasses, Anker 2-...","{Anker 2-in-1 USB Card Reader, 0}",{FEIYOLD Blue light Blocking Glasses},0.004,0.271493,8.244271
8,"{FEIYOLD Blue light Blocking Glasses, Anker 2-...",{Anker 2-in-1 USB Card Reader},{FEIYOLD Blue light Blocking Glasses},0.004,0.271493,8.244271
817,"{SanDisk Ultra 64GB card, SanDisk 128GB Ultra ...","{SanDisk 128GB Ultra microSDXC card, VIVO Dual...",{SanDisk Ultra 64GB card},0.0032,0.393443,8.008186
815,"{SanDisk Ultra 64GB card, SanDisk 128GB Ultra ...","{VIVO Dual LCD Monitor Desk mount, SanDisk 128...","{SanDisk Ultra 64GB card, 0}",0.0032,0.393443,8.008186
442,"{SanDisk Ultra 64GB card, VIVO Dual LCD Monito...","{VIVO Dual LCD Monitor Desk mount, SanDisk 128...",{SanDisk Ultra 64GB card},0.0032,0.393443,8.008186
694,"{SanDisk Ultra 64GB card, SanDisk 128GB Ultra ...","{SanDisk 128GB Ultra microSDXC card, Dust-Off ...",{SanDisk Ultra 64GB card},0.003333,0.390625,7.950835
380,"{SanDisk Ultra 64GB card, Dust-Off Compressed ...","{Dust-Off Compressed Gas 2 pack, SanDisk 128GB...",{SanDisk Ultra 64GB card},0.003333,0.390625,7.950835
692,"{SanDisk Ultra 64GB card, SanDisk 128GB Ultra ...","{Dust-Off Compressed Gas 2 pack, SanDisk 128GB...","{SanDisk Ultra 64GB card, 0}",0.003333,0.390625,7.950835
472,"{Nylon Braided Lightning to USB cable, Anker U...","{Nylon Braided Lightning to USB cable, VIVO Du...","{Anker USB C to HDMI Adapter, 0}",0.003333,0.239234,6.995626


The top rule for each of the three metrics is:

Support: antecedent {Dust-Off Compressed Gas 2 pack}, consequent {Screen Mom Screen Cleaner kit, 0}, with support 0.023998, confidence 0.201342, and lift 3.107341

Confidence: antecedent {Nylon Braided Lightning to USB cable, FEIYOLD...}, consequent {Dust-Off Compressed Gas 2 pack, 0}, with support 0.003266, confidence 0.576471, and lift 4.839192

Lift: antecedent {Anker 2-in-1 USB Card Reader}, consequent {FEIYOLD Blue light Blocking Glasses, 0}, with support 0.004000, confidence 0.271493, and lift 8.260993

## Part V: Data Summary and Implications

Support measures how popular an itemset is: the proportion of transactions in which the itemset appears. For example, our top support is 0.023998, meaning that itemset appears in about 2.4% of all transactions.

Confidence measures how likely item Y is to be purchased when item X is purchased: the proportion of transactions containing item X in which item Y also appears. Our top confidence is 0.576471, so item Y is purchased about 58% of the time when item X is purchased.

Lift measures how likely item Y is to be purchased when item X is purchased, while controlling for how popular item Y is. A lift greater than 1 means that item Y is more likely to be bought when item X is bought than it is in general, while a lift less than 1 means it is less likely. Our top lift is 8.260993, meaning item Y is about 8.3 times as likely to be bought when item X is in the basket as its overall popularity alone would suggest.
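The three measures can be worked through by hand on a small example. The sketch below uses the standard definitions (confidence is the support of X and Y together divided by the support of X; lift is confidence divided by the support of Y). The helper functions and the eight toy baskets are illustrative assumptions, not the notebook's data.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    hits = sum(1 for basket in transactions if itemset <= set(basket))
    return hits / len(transactions)

def confidence(transactions, x, y):
    """Of the transactions containing x, the fraction that also contain y."""
    return support(transactions, set(x) | set(y)) / support(transactions, x)

def lift(transactions, x, y):
    """Confidence of x -> y relative to the baseline popularity of y."""
    return confidence(transactions, x, y) / support(transactions, y)

# Eight toy baskets: 'cable' and 'gas' each appear in 3, together in 2
toy = [
    ['cable', 'gas'], ['cable', 'gas'], ['cable'], ['gas'],
    ['ink'], ['ink'], ['ink'], ['ink'],
]
s_xy = support(toy, {'cable', 'gas'})       # 2 / 8
c_xy = confidence(toy, {'cable'}, {'gas'})  # (2/8) / (3/8)
l_xy = lift(toy, {'cable'}, {'gas'})        # (2/3) / (3/8)
print(s_xy, c_xy, l_xy)
```

Here the pair appears in a quarter of the baskets, two thirds of cable buyers also buy gas, and the lift is above 1 because that rate is higher than the three-in-eight share of baskets containing gas overall.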

The practical significance of this report is that it shows associations between item purchases can be found using market basket analysis, and that a company could run this type of analysis to find meaningful relationships between purchases and use them to make business decisions.

One course of action an organization could take from this report is to find items that are frequently sold together, such as compressed air and screen cleaner kits, and offer a deal or discount when a customer purchases both. From the report, we already know that buyers of the screen cleaner kit also buy the compressed gas about 37% of the time (a lift of about 3.1), and a discount on the pair would most likely convert even more customers to buy both products rather than just one.