# Teleco Market Basket - Association Rules and Lift Analysis

#### A1:Proposal of Question

Can we predict related and future purchases based on the purchase of 1 item? This will give us insights for loss leaders, bundling or product placement. 

#### A2:Defined Goal

The goal of this effort is to find at least one SKU (item) that has strong relationships to other items, and to identify the strongest of those relationships, so that we can recommend which items are best to pair for bundling, loss leaders, or product placement.

## Part II: Market Basket Justification

Market basket analysis answers the question, "What are the chances a customer will buy one product, given the other products they are purchasing?" For example, a customer walks into a store and buys frozen chicken breasts: how likely are they to also buy tennis shoes, minced garlic, pot stickers, beer, or milk?

By understanding the stronger associations, a store's buyers can plan coupons, loss leaders, and product placement, or experiment with gaps in inventory. They can also look at selling more by bundling items. For example, if butter, sugar, and flour are strongly associated, and condensed milk turns out to be a weaker seller with a weaker association to them, a bundle deal on all four could help sell more condensed milk.


OUTCOME
The ultimate outcome is the relationship between one item and other items. This will be done in 6 steps.
- 1) Identify the ANTECEDENT (the initial purchase).
- 2) Obtain the CONSEQUENT (the SKU purchased along with the antecedent).
- 3) Display FREQUENCY (how often the two items appear together in a transaction).
- 4) Present the SUPPORT or PROBABILITY (the share of all transactions that contain both items).
- 5) Collect the CONFIDENCE score (the SUPPORT of the ANTECEDENT and CONSEQUENT together divided by the SUPPORT of the ANTECEDENT alone, which is the same as dividing the FREQUENCY of the pair by the number of transactions that contain the ANTECEDENT).
- 6) Calculate LIFT (CONFIDENCE divided by the SUPPORT of the CONSEQUENT, that is, how much more often the items appear together than they would if they were purchased independently).

Example of the measures above:


| ANTECEDENT      | CONSEQUENT | FREQUENCY | SUPPORT | CONFIDENCE | LIFT |
| ----------- | ----------- |-----------|-----------|-----------|-----------|
| CHICKEN      | GARLIC       | 8      | .42 |.8|1.3|
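The six steps can be sketched on a handful of made-up baskets. The transactions below are hypothetical and are not taken from the dataset, so the numbers differ from the illustrative row in the table; the point is only to show how frequency, support, confidence, and lift relate to each other.

```python
# Hypothetical toy baskets (not from the teleco dataset)
transactions = [
    {"chicken", "garlic", "milk"},
    {"chicken", "garlic"},
    {"chicken", "beer"},
    {"garlic", "milk"},
    {"beer", "milk"},
]

def support(itemset):
    """Share of transactions that contain every item in `itemset`."""
    return sum(itemset <= basket for basket in transactions) / len(transactions)

antecedent = {"chicken"}
consequent = {"garlic"}
pair = antecedent | consequent

# Step 3: how many baskets contain both items
frequency = sum(pair <= basket for basket in transactions)
# Step 4: support of the pair
support_both = support(pair)
# Step 5: confidence = support(pair) / support(antecedent)
confidence = support_both / support(antecedent)
# Step 6: lift = confidence / support(consequent)
lift = confidence / support(consequent)

print(frequency, round(support_both, 3), round(confidence, 3), round(lift, 3))
```

A lift above 1 in this sketch would indicate that the two items appear together more often than their individual popularity alone would suggest.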



#### B2:Transaction Example

|ITEM ID     | ITEM DESCRIPTION |
| ----------- | ----------- |
|Item01|	Logitech M510 Wireless mouse|
|Item02|	HP 63 Ink|
|Item03|	HP 65 ink|
|Item04|	nonda USB C to USB Adapter|
|Item05|	10ft iPHone Charger Cable|
|Item06|	HP 902XL ink|
|Item07|	Creative Pebble 2.0 Speakers|
|Item08|	Cleaning Gel Universal Dust Cleaner|
|Item09|	Micro Center 32GB Memory card|
|Item10|	YUNSONG 3pack 6ft Nylon Lightning Cable|
|Item11|	TopMate C5 Laptop Cooler pad|
|Item12|	Apple USB-C Charger cable|
|Item13|	HyperX Cloud Stinger Headset|
|Item14|	TONOR USB Gaming Microphone|
|Item15|	Dust-Off Compressed Gas 2 pack|
|Item16|	3A USB Type C Cable 3 pack 6FT|
|Item17|	HOVAMP iPhone charger|
|Item18|	SanDisk Ultra 128GB card|
|Item19|	FEEL2NICE 5 pack 10ft Lighning cable|
|Item20|	FEIYOLD Blue light Blocking Glasses|


#### B3:Market Basket Assumption

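Every subset of a frequent itemset is also frequent. If items a, b, and c are frequently purchased together, then any subset of that itemset, such as a and b together or c on its own, is assumed to be frequently purchased as well. Conversely, if an itemset is infrequent, every larger itemset that contains it is also infrequent, which is what lets the algorithm skip those combinations.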
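A small check of this assumption, using hypothetical baskets rather than the dataset: the support of every subset of an itemset should be at least as large as the support of the full itemset.

```python
from itertools import combinations

# Hypothetical toy baskets (not from the teleco dataset)
transactions = [
    {"chicken", "garlic", "milk"},
    {"chicken", "garlic"},
    {"chicken", "beer"},
    {"garlic", "milk"},
    {"beer", "milk"},
]

def support(itemset):
    """Share of transactions that contain every item in `itemset`."""
    return sum(itemset <= basket for basket in transactions) / len(transactions)

itemset = frozenset({"chicken", "garlic", "milk"})

# Support of every non-empty subset, including the full itemset itself
supports = {}
for size in range(1, len(itemset) + 1):
    for subset in combinations(sorted(itemset), size):
        supports[subset] = support(set(subset))

full_support = supports[tuple(sorted(itemset))]
# No subset can be rarer than the itemset that contains it
closure_holds = all(value >= full_support for value in supports.values())
print(closure_holds)
```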

## Part III: Data Preparation and Analysis 

#### C1:Transforming the Dataset

Import my common libraries

In [1]:
#### LOAD LIBRARIES
##### system
import sys, os
from time import time

##### Data
import pandas as pd
import numpy as np


###### stats
import scipy.stats as stats
import math
import statsmodels.api as sm

##### Visualization
import seaborn as sns
import matplotlib as mpl
import matplotlib.pyplot as plt
from matplotlib import rcParams
plt.rcParams.update({'figure.max_open_warning': 0})

#### PreProcessing Test train split
from sklearn.preprocessing import StandardScaler
from sklearn.preprocessing import MinMaxScaler
from sklearn.model_selection import GridSearchCV
from sklearn.model_selection import cross_validate, train_test_split, StratifiedShuffleSplit, cross_val_score
from sklearn.feature_selection import SelectKBest, SelectPercentile, f_classif
from sklearn.pipeline import Pipeline, make_pipeline, FeatureUnion

#### Evaluation of Models
from sklearn.metrics import accuracy_score, precision_score, f1_score
from sklearn.metrics import mean_squared_error as MSE

Load the data

In [2]:
file = pd.read_csv('C://Users//msmorris//Desktop//CHURN_VIZ//task3//teleco_market_basket.csv')
file.head()

Unnamed: 0,Item01,Item02,Item03,Item04,Item05,Item06,Item07,Item08,Item09,Item10,Item11,Item12,Item13,Item14,Item15,Item16,Item17,Item18,Item19,Item20
0,,,,,,,,,,,,,,,,,,,,
1,Logitech M510 Wireless mouse,HP 63 Ink,HP 65 ink,nonda USB C to USB Adapter,10ft iPHone Charger Cable,HP 902XL ink,Creative Pebble 2.0 Speakers,Cleaning Gel Universal Dust Cleaner,Micro Center 32GB Memory card,YUNSONG 3pack 6ft Nylon Lightning Cable,TopMate C5 Laptop Cooler pad,Apple USB-C Charger cable,HyperX Cloud Stinger Headset,TONOR USB Gaming Microphone,Dust-Off Compressed Gas 2 pack,3A USB Type C Cable 3 pack 6FT,HOVAMP iPhone charger,SanDisk Ultra 128GB card,FEEL2NICE 5 pack 10ft Lighning cable,FEIYOLD Blue light Blocking Glasses
2,,,,,,,,,,,,,,,,,,,,
3,Apple Lightning to Digital AV Adapter,TP-Link AC1750 Smart WiFi Router,Apple Pencil,,,,,,,,,,,,,,,,,
4,,,,,,,,,,,,,,,,,,,,


In [3]:
file.dropna(how='all', inplace=True)
file


Unnamed: 0,Item01,Item02,Item03,Item04,Item05,Item06,Item07,Item08,Item09,Item10,Item11,Item12,Item13,Item14,Item15,Item16,Item17,Item18,Item19,Item20
1,Logitech M510 Wireless mouse,HP 63 Ink,HP 65 ink,nonda USB C to USB Adapter,10ft iPHone Charger Cable,HP 902XL ink,Creative Pebble 2.0 Speakers,Cleaning Gel Universal Dust Cleaner,Micro Center 32GB Memory card,YUNSONG 3pack 6ft Nylon Lightning Cable,TopMate C5 Laptop Cooler pad,Apple USB-C Charger cable,HyperX Cloud Stinger Headset,TONOR USB Gaming Microphone,Dust-Off Compressed Gas 2 pack,3A USB Type C Cable 3 pack 6FT,HOVAMP iPhone charger,SanDisk Ultra 128GB card,FEEL2NICE 5 pack 10ft Lighning cable,FEIYOLD Blue light Blocking Glasses
3,Apple Lightning to Digital AV Adapter,TP-Link AC1750 Smart WiFi Router,Apple Pencil,,,,,,,,,,,,,,,,,
5,UNEN Mfi Certified 5-pack Lightning Cable,,,,,,,,,,,,,,,,,,,
7,Cat8 Ethernet Cable,HP 65 ink,,,,,,,,,,,,,,,,,,
9,Dust-Off Compressed Gas 2 pack,Screen Mom Screen Cleaner kit,Moread HDMI to VGA Adapter,HP 62XL Tri-Color ink,Apple USB-C Charger cable,,,,,,,,,,,,,,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
14993,SanDisk 32GB Ultra SDHC card,Vsco 70 pack stickers,SanDisk 128GB microSDXC card,,,,,,,,,,,,,,,,,
14995,Apple Lightning to Digital AV Adapter,Nylon Braided Lightning to USB cable,Apple Pencil,USB 2.0 Printer cable,ARRIS SURFboard SB8200 Cable Modem,Apple USB-C Charger cable,,,,,,,,,,,,,,
14997,Falcon Dust Off Compressed Gas,,,,,,,,,,,,,,,,,,,
14999,HP 63XL Ink,Apple USB-C Charger cable,,,,,,,,,,,,,,,,,,


Remove blanks and collect each transaction's items into a list

In [4]:
file= file.stack().groupby(level=0).apply(list).tolist()
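To make the reshaping step concrete, here is a minimal sketch on a hypothetical two-row frame (the frame `wide` and its values are made up). The extra `.dropna()` is an assumption added here so that the sketch behaves the same regardless of how the installed pandas version treats missing values in `stack()`.

```python
import numpy as np
import pandas as pd

# Hypothetical wide frame: one row per transaction, blanks as NaN
wide = pd.DataFrame(
    {
        "Item01": ["mouse", "cable"],
        "Item02": ["ink", np.nan],
        "Item03": [np.nan, np.nan],
    },
    index=[1, 3],
)

# stack() turns each row into (row, column) -> value pairs,
# groupby(level=0) regroups the values by original row,
# and apply(list) collects each row's items into a Python list.
baskets = (
    wide.stack()
    .dropna()  # assumption: added for consistent behavior across pandas versions
    .groupby(level=0)
    .apply(list)
    .tolist()
)
print(baskets)
```

The result is a list of lists, one inner list per transaction, which is the input shape that the transaction encoder in the next cell expects.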

In [5]:
from mlxtend.preprocessing import TransactionEncoder

encoder = TransactionEncoder().fit(file)
mba = encoder.transform(file)
mba = pd.DataFrame(mba, columns = encoder.columns_)
mba_clean = mba
mba_clean


Unnamed: 0,10ft iPHone Charger Cable,10ft iPHone Charger Cable 2 Pack,3 pack Nylon Braided Lightning Cable,3A USB Type C Cable 3 pack 6FT,5pack Nylon Braided USB C cables,ARRIS SURFboard SB8200 Cable Modem,Anker 2-in-1 USB Card Reader,Anker 4-port USB hub,Anker USB C to HDMI Adapter,Apple Lightning to Digital AV Adapter,...,hP 65 Tri-color ink,iFixit Pro Tech Toolkit,iPhone 11 case,iPhone 12 Charger cable,iPhone 12 Pro case,iPhone 12 case,iPhone Charger Cable Anker 6ft,iPhone SE case,nonda USB C to USB Adapter,seenda Wireless mouse
0,True,False,False,True,False,False,False,False,False,False,...,False,False,False,False,False,False,False,False,True,False
1,False,False,False,False,False,False,False,False,False,True,...,False,False,False,False,False,False,False,False,False,False
2,False,False,False,False,False,False,False,False,False,False,...,False,False,False,False,False,False,False,False,False,False
3,False,False,False,False,False,False,False,False,False,False,...,False,False,False,False,False,False,False,False,False,False
4,False,False,False,False,False,False,False,False,False,False,...,False,False,False,False,False,False,False,False,False,False
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
7496,False,False,False,False,False,False,False,False,False,False,...,False,False,False,False,False,False,False,False,False,False
7497,False,False,False,False,False,True,False,False,False,True,...,False,False,False,False,False,False,False,False,False,False
7498,False,False,False,False,False,False,False,False,False,False,...,False,False,False,False,False,False,False,False,False,False
7499,False,False,False,False,False,False,False,False,False,False,...,False,False,False,False,False,False,False,False,False,False
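As a rough illustration of what the encoding step produces, the sketch below builds the same kind of True/False table with plain pandas on hypothetical baskets. It is not mlxtend's implementation, only an approximation of the output shape: one row per transaction and one column per distinct item.

```python
import pandas as pd

# Hypothetical baskets (not from the teleco dataset)
baskets = [["mouse", "ink"], ["ink"], ["cable", "mouse"]]

# One column per distinct item, in sorted order
items = sorted({item for basket in baskets for item in basket})

# True where the transaction contains the item, False otherwise
onehot = pd.DataFrame(
    [[item in basket for item in items] for basket in baskets],
    columns=items,
)
print(onehot)
```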


In [6]:
mba_clean.to_csv('d212_t3.csv')

#### C2:Code Execution

In [8]:
from mlxtend.frequent_patterns import apriori, association_rules
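# from apyori import apriori  -- processing time was too long
# min_support=0.001 keeps itemsets found in at least 0.1% of transactions;
# use_colnames is not set, so itemsets are reported as column indices of mba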
mba_a = apriori(mba, min_support=0.001)
mba_a 



Unnamed: 0,support,itemsets
0,0.009065,(0)
1,0.050527,(1)
2,0.005199,(2)
3,0.042528,(3)
4,0.019064,(4)
...,...,...
6773,0.001067,"(96, 67, 102, 90, 28)"
6774,0.001067,"(102, 76, 89, 91, 28)"
6775,0.001067,"(98, 102, 90, 91, 28)"
6776,0.001600,"(67, 36, 102, 89, 90)"
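Because the itemsets above are column positions rather than product names, they need to be mapped back before they can be read. mlxtend's `apriori` accepts `use_colnames=True` for this; the sketch below shows an alternative mapping done after the fact. The `columns` list and the small `itemsets` frame are hypothetical stand-ins for `mba.columns` and `mba_a`.

```python
import pandas as pd

# Hypothetical stand-in for list(mba.columns)
columns = ["cable", "ink", "mouse"]

# Hypothetical stand-in for the apriori output, with index-based itemsets
itemsets = pd.DataFrame(
    {
        "support": [0.6, 0.4],
        "itemsets": [frozenset({1}), frozenset({0, 2})],
    }
)

# Translate each column index into its item name
itemsets["items"] = itemsets["itemsets"].apply(
    lambda indices: tuple(sorted(columns[i] for i in indices))
)
print(itemsets[["support", "items"]])
```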


#### C3:Association Rules Table
The table below reports support, confidence, and lift for each association rule. With mlxtend's default settings, `association_rules` keeps only rules with a confidence of at least 0.8.

In [9]:
mba_ar = association_rules(mba_a)
mba_ar


Unnamed: 0,antecedents,consequents,antecedent support,consequent support,support,confidence,lift,leverage,conviction
0,"(104, 1)",(28),0.002,0.238368,0.001866,0.933333,3.915511,0.00139,11.424477
1,"(4, 111)",(41),0.002666,0.079323,0.002533,0.95,11.976387,0.002321,18.413545
2,"(6, 22)",(28),0.001733,0.238368,0.001466,0.846154,3.549776,0.001053,4.950607
3,"(94, 6)",(90),0.0016,0.129583,0.001333,0.833333,6.430898,0.001126,5.222504
4,"(20, 31)",(102),0.001466,0.17411,0.0012,0.818182,4.69922,0.000945,4.542394
5,"(82, 22)",(28),0.001466,0.238368,0.0012,0.818182,3.432428,0.00085,4.188975
6,"(29, 86)",(28),0.0012,0.238368,0.001067,0.888889,3.729058,0.000781,6.854686
7,"(88, 69)",(28),0.001466,0.238368,0.0012,0.818182,3.432428,0.00085,4.188975
8,"(89, 82)",(28),0.001866,0.238368,0.0016,0.857143,3.595877,0.001155,5.331422
9,"(90, 111)",(54),0.001866,0.071457,0.0016,0.857143,11.995203,0.001466,6.4998
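The columns of the table are tied together by simple arithmetic. As a check, the sketch below recomputes confidence, lift, leverage, and conviction for rule 1, `(4, 111)` to `(41)`, from its three support values. The inputs are the rounded figures shown in the table, so the results agree with the table only to within rounding.

```python
# Values copied from rule 1 of the table above
antecedent_support = 0.002666
consequent_support = 0.079323
support = 0.002533  # support of antecedent and consequent together

# Confidence: share of antecedent baskets that also contain the consequent
confidence = support / antecedent_support
# Lift: confidence relative to the consequent's baseline popularity
lift = confidence / consequent_support
# Leverage: observed joint support minus the support expected under independence
leverage = support - antecedent_support * consequent_support
# Conviction: grows without bound as confidence approaches 1
conviction = (1 - consequent_support) / (1 - confidence)

print(round(confidence, 3), round(lift, 2), round(leverage, 6), round(conviction, 1))
```

The conviction formula divides by `1 - confidence`, which is why rules with a confidence of exactly 1.0 show `inf` in that column.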


#### C4:Top Three Rules

In [10]:
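# NOTE: sort_values sorts ascending by default, so this line selects the three
# rules with the LOWEST support; pass ascending=False (as on the lines below)
# to select the highest-support rules instead
support = mba_ar.sort_values('support')[:3]
confidence = mba_ar.sort_values('confidence', ascending=False)[:3]
lift = mba_ar.sort_values('lift', ascending=False)[:3]
leverage = mba_ar.sort_values('leverage', ascending=False)[:3]
conviction = mba_ar.sort_values('conviction', ascending=False)[:3]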

In [11]:
print('SUPPORT')
support

SUPPORT


Unnamed: 0,antecedents,consequents,antecedent support,consequent support,support,confidence,lift,leverage,conviction
25,"(71, 102, 22)",(12),0.0012,0.179709,0.001067,0.888889,4.946258,0.000851,7.382616
40,"(41, 98, 54)",(36),0.0012,0.163845,0.001067,0.888889,5.425188,0.00087,7.525397
39,"(28, 94, 92)",(90),0.001067,0.129583,0.001067,1.0,7.717078,0.000928,inf


In [12]:
print('CONFIDENCE')
confidence

CONFIDENCE


Unnamed: 0,antecedents,consequents,antecedent support,consequent support,support,confidence,lift,leverage,conviction
17,"(98, 4, 111)",(41),0.001067,0.079323,0.001067,1.0,12.606723,0.000982,inf
39,"(28, 94, 92)",(90),0.001067,0.129583,0.001067,1.0,7.717078,0.000928,inf
31,"(92, 54, 30)",(28),0.0012,0.238368,0.0012,1.0,4.19519,0.000914,inf


In [13]:
print('LIFT')
lift

LIFT


Unnamed: 0,antecedents,consequents,antecedent support,consequent support,support,confidence,lift,leverage,conviction
27,"(12, 28, 111)",(54),0.001466,0.071457,0.001333,0.909091,12.722185,0.001228,10.213971
17,"(98, 4, 111)",(41),0.001067,0.079323,0.001067,1.0,12.606723,0.000982,inf
9,"(90, 111)",(54),0.001866,0.071457,0.0016,0.857143,11.995203,0.001466,6.4998


In [14]:
print('LEVERAGE')
leverage

LEVERAGE


Unnamed: 0,antecedents,consequents,antecedent support,consequent support,support,confidence,lift,leverage,conviction
1,"(4, 111)",(41),0.002666,0.079323,0.002533,0.95,11.976387,0.002321,18.413545
19,"(8, 67, 30)",(102),0.002533,0.17411,0.002133,0.842105,4.836624,0.001692,5.230636
9,"(90, 111)",(54),0.001866,0.071457,0.0016,0.857143,11.995203,0.001466,6.4998


In [15]:
print('CONVICTION')
conviction

CONVICTION


Unnamed: 0,antecedents,consequents,antecedent support,consequent support,support,confidence,lift,leverage,conviction
17,"(98, 4, 111)",(41),0.001067,0.079323,0.001067,1.0,12.606723,0.000982,inf
39,"(28, 94, 92)",(90),0.001067,0.129583,0.001067,1.0,7.717078,0.000928,inf
31,"(92, 54, 30)",(28),0.0012,0.238368,0.0012,1.0,4.19519,0.000914,inf


## Part IV: Data Summary and Implications

#### D1:Significance of Support, Lift, and Confidence Summary

#### SUPPORT
Support measures how popular an item or itemset is, that is, the share of transactions in which it appears. An item that shows up only once in 5,000 transactions carries little weight in our analysis.
Our top 3 itemsets in this case are:
- (SanDisk Ultra 128GB card, FEEL2NICE 5 pack 10.)
- (Brother Genuine High Yield Toner Cartridge, F...)
- (Cat8 Ethernet Cable, SanDisk Extreme 256GB card)

#### LIFT
Lift and confidence are similar in nature, but lift controls for how popular the purchased items are.
Lift tells us how much more likely an item is to be purchased when another item is purchased. In other words, how likely is someone to buy X if they purchased Y, controlling for the popularity of both X and Y? A lift above 1 means the items appear together more often than independence would predict.
Our top 3 lift scores are:
- (Screen Mom Screen Cleaner kit, iPhone 11 case)
- (iPhone 11 case, 5pack Nylon Braided USB C cab..)
- (Anker 2-in-1 USB Card Reader, TP-Link AC1750 ...)

#### CONFIDENCE
Finally, confidence.
This shows how likely it is that a person will buy item X if they have purchased item Y. Our top 3 confidence scores are:
- (iPhone 11 case, 5pack Nylon Braided USB C cab...)
- (iPhone 11 case, 5pack Nylon Braided USB C cab..)
- (SanDisk Ultra 128GB card, FEEL2NICE 5 pack 10...)


#### D2:Practical Significance of Findings

We can see from the above data that some relationships stand out. For example, our top 3 rules by lift are:

- (Screen Mom Screen Cleaner kit, iPhone 11 case)
- (iPhone 11 case, 5pack Nylon Braided USB C cab..)
- (Anker 2-in-1 USB Card Reader, TP-Link AC1750 ...)

This process can be automated and used to review not only the top 3 but also the top 10 or selected percentiles, and to surface hidden relationships among our less popular products. All of it can be fed into a dashboard for ease of use and reporting.
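One possible shape for that automation is sketched below. The helper names `top_rules` and `rules_above_percentile` are hypothetical, and the small `rules` frame uses made-up values; in practice the input would be a rules table such as `mba_ar`.

```python
import pandas as pd

def top_rules(rules, metric, n=3):
    """Return the n rules with the highest value of `metric`."""
    return rules.sort_values(metric, ascending=False).head(n)

def rules_above_percentile(rules, metric, q=0.9):
    """Return the rules whose `metric` is at or above the q-th quantile."""
    cutoff = rules[metric].quantile(q)
    return rules[rules[metric] >= cutoff]

# Hypothetical rules table with made-up values
rules = pd.DataFrame(
    {
        "rule": ["A->B", "C->D", "E->F", "G->H"],
        "support": [0.02, 0.01, 0.03, 0.015],
        "lift": [1.2, 3.5, 0.9, 2.1],
    }
)

print(top_rules(rules, "lift", n=2))
print(rules_above_percentile(rules, "support", q=0.75))
```

Keeping the metric and cutoff as parameters means the same two functions can drive a top 3, a top 10, or a percentile view in a dashboard.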


#### D3:Course of Action

An obvious recommendation is to create a "customers who bought this also bought this" recommendation engine. I think this is also an opportunity, however, for our internal buyers to create bundle packages and loss leaders and to review product placement overall, to intrigue shoppers and give them better value while increasing sales.

This can be done by automating the above process and giving our internal buyers recommendations on products that work well together, along with recommendations, based on subset frequency, on whether a product would be a candidate for a bundle, product placement, or a loss leader.