# MARKET BASKET ANALYSIS
## BY DWI SMARADAHANA INDRALOKA
***

## Import Libraries

In [1]:
import pandas as pd
from mlxtend.preprocessing import OnehotTransactions
from mlxtend.frequent_patterns import apriori
from mlxtend.frequent_patterns import association_rules

## Load Dataset

In [2]:
data = pd.read_csv("Untitled form.csv")
data.head()

Unnamed: 0,Timestamp,Name,Item 1,Item 2,Item 3
0,2019/09/17 8:58:22 AM GMT+7,Firdaus Adi Nugroho,HP,Racket,Watch
1,2019/09/17 8:58:24 AM GMT+7,faizah,HP,Camera,Watch
2,2019/09/17 8:58:30 AM GMT+7,andrem,Watch,Camera,Music Pad
3,2019/09/17 8:58:30 AM GMT+7,laili,Camera,Watch,Mouse
4,2019/09/17 8:58:33 AM GMT+7,Tara,HP,Watch,Music Pad


## Drop Unused Columns

In [3]:
data.drop(["Timestamp", "Name"], axis = 1, inplace = True)
data.head()

Unnamed: 0,Item 1,Item 2,Item 3
0,HP,Racket,Watch
1,HP,Camera,Watch
2,Watch,Camera,Music Pad
3,Camera,Watch,Mouse
4,HP,Watch,Music Pad


In [4]:
data.shape

(24, 3)

## Missing Values

In [5]:
data.isna().sum()

Item 1    0
Item 2    0
Item 3    5
dtype: int64

In [6]:
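# Fill the 5 missing values in "Item 3" with a placeholder string.
# Note: the placeholder is later encoded as if it were an item named "NaN".
data = data.fillna("NaN")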
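
Because the placeholder is an ordinary string, the later encoding step treats "NaN" as if it were a product. One possible alternative, sketched below, is to skip missing entries when building each basket. The helper name `build_baskets` and the sample rows are illustrative assumptions, not part of the original notebook.

```python
import math


def build_baskets(rows):
    """Return one list of items per row, skipping missing entries.

    A missing entry is either None or a float NaN, which is how
    pandas represents empty cells in `DataFrame.values.tolist()`.
    """
    baskets = []
    for row in rows:
        basket = [
            item for item in row
            if item is not None
            and not (isinstance(item, float) and math.isnan(item))
        ]
        baskets.append(basket)
    return baskets


# Hypothetical rows: the second one has no third item.
rows = [
    ["HP", "Racket", "Watch"],
    ["Camera", "Watch", float("nan")],
]
print(build_baskets(rows))
```

Applied to this notebook, the helper would presumably be called on `data[["Item 1", "Item 2", "Item 3"]].values.tolist()` before any filling, so that the placeholder never enters the item list.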

## Combine Item 1, Item 2 and Item 3 into a New List Column

In [7]:
data["New"] = data[["Item 1", "Item 2", "Item 3"]].values.tolist()
data.head()

Unnamed: 0,Item 1,Item 2,Item 3,New
0,HP,Racket,Watch,"[HP, Racket, Watch]"
1,HP,Camera,Watch,"[HP, Camera, Watch]"
2,Watch,Camera,Music Pad,"[Watch, Camera, Music Pad]"
3,Camera,Watch,Mouse,"[Camera, Watch, Mouse]"
4,HP,Watch,Music Pad,"[HP, Watch, Music Pad]"


## Transform Dataset with One Hot Transaction

In [8]:
oht = OnehotTransactions()
oht_ary = oht.fit(data["New"]).transform(data["New"])
data1 = pd.DataFrame(oht_ary, columns = oht.columns_)
data1.head()



Unnamed: 0,Bag,Camera,Guitar,HP,Mouse,Music Pad,NaN,Racket,Router,Soap,Watch
0,False,False,False,True,False,False,False,True,False,False,True
1,False,True,False,True,False,False,False,False,False,False,True
2,False,True,False,False,False,True,False,False,False,False,True
3,False,True,False,False,True,False,False,False,False,False,True
4,False,False,False,True,False,True,False,False,False,False,True
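
Conceptually, the transformation above turns each transaction into a row of booleans, with one column per distinct item. The sketch below reproduces that idea in plain Python on the five transactions shown in the dataset preview. `one_hot` is a hypothetical helper, not mlxtend's implementation. Note also that `OnehotTransactions` was deprecated in later mlxtend releases in favour of `TransactionEncoder`, so the import may need adjusting depending on the installed version.

```python
def one_hot(transactions):
    """Encode transactions as boolean rows, one column per distinct item."""
    # Column order is alphabetical, matching the sorted columns in the output above.
    columns = sorted({item for basket in transactions for item in basket})
    rows = [[col in basket for col in columns] for basket in transactions]
    return columns, rows


# First five transactions from the dataset preview.
transactions = [
    ["HP", "Racket", "Watch"],
    ["HP", "Camera", "Watch"],
    ["Watch", "Camera", "Music Pad"],
    ["Camera", "Watch", "Mouse"],
    ["HP", "Watch", "Music Pad"],
]

columns, rows = one_hot(transactions)
print(columns)
print(rows[0])
```

Since only five transactions are encoded here, the sketch yields fewer columns than the 11 produced from the full dataset.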


In [9]:
data1.shape

(24, 11)

## Modelling

## 1. Min Support = 0.1, Min Confidence = 0.1, Min Lift = 1

* Apriori

In [10]:
frequent_itemsets = apriori(data1, min_support = 0.1, use_colnames = True)
# apriori already returns a DataFrame; only the column order is changed here.
frequent_itemsets = frequent_itemsets[["itemsets", "support"]]
frequent_itemsets

Unnamed: 0,itemsets,support
0,(Bag),0.125
1,(Camera),0.666667
2,(Guitar),0.333333
3,(HP),0.166667
4,(Music Pad),0.291667
5,(NaN),0.208333
6,(Racket),0.25
7,(Soap),0.208333
8,(Watch),0.625
9,"(Camera, Guitar)",0.208333


* Association Rules

In [11]:
rules = association_rules(frequent_itemsets, metric = "lift", min_threshold = 1)
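# Copy the filtered frame so that a column can be added to it later
# without pandas warning about assignment on a slice.
rules = rules[rules["confidence"] >= 0.1].copy()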
rules

Unnamed: 0,antecedents,consequents,antecedent support,consequent support,support,confidence,lift,leverage,conviction
0,(Camera),(Music Pad),0.666667,0.291667,0.208333,0.3125,1.071429,0.01388889,1.030303
1,(Music Pad),(Camera),0.291667,0.666667,0.208333,0.714286,1.071429,0.01388889,1.166667
2,(Camera),(Watch),0.666667,0.625,0.416667,0.625,1.0,5.5511150000000004e-17,1.0
3,(Watch),(Camera),0.625,0.666667,0.416667,0.666667,1.0,5.5511150000000004e-17,1.0
4,(Guitar),(Racket),0.333333,0.25,0.125,0.375,1.5,0.04166667,1.2
5,(Racket),(Guitar),0.25,0.333333,0.125,0.5,1.5,0.04166667,1.333333
6,(HP),(Watch),0.166667,0.625,0.166667,1.0,1.6,0.0625,inf
7,(Watch),(HP),0.625,0.166667,0.166667,0.266667,1.6,0.0625,1.136364
8,(Music Pad),(Watch),0.291667,0.625,0.208333,0.714286,1.142857,0.02604167,1.3125
9,(Watch),(Music Pad),0.625,0.291667,0.208333,0.333333,1.142857,0.02604167,1.0625


### 1.1 Results for Itemsets = 1

In [12]:
frequent_itemsets.iloc[0:9].sort_values("support", ascending = False)

Unnamed: 0,itemsets,support
1,(Camera),0.666667
8,(Watch),0.625
2,(Guitar),0.333333
4,(Music Pad),0.291667
6,(Racket),0.25
5,(NaN),0.208333
7,(Soap),0.208333
3,(HP),0.166667
0,(Bag),0.125


From the results above, the items with a support of at least 0.1 are Camera, Watch, Guitar, Music Pad, Racket, Soap, HP and Bag. The NaN entry (0.208) is not a product: it is the placeholder used earlier to fill the 5 missing values in Item 3. The highest supports belong to Camera (0.667) and Watch (0.625), meaning that about 66.7% of all transactions contained a Camera and 62.5% contained a Watch.
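
Support is the share of transactions that contain an item. A minimal sketch of that calculation follows, using only the five transactions shown in the dataset preview, so the numbers differ from the full 24-transaction results above. `item_support` is a hypothetical helper name.

```python
from collections import Counter


def item_support(transactions):
    """Return {item: fraction of transactions that contain the item}."""
    counts = Counter()
    for basket in transactions:
        counts.update(set(basket))  # count each item once per transaction
    total = len(transactions)
    return {item: count / total for item, count in counts.items()}


# First five transactions from the dataset preview.
transactions = [
    ["HP", "Racket", "Watch"],
    ["HP", "Camera", "Watch"],
    ["Watch", "Camera", "Music Pad"],
    ["Camera", "Watch", "Mouse"],
    ["HP", "Watch", "Music Pad"],
]

support = item_support(transactions)
for item, value in sorted(support.items(), key=lambda kv: -kv[1]):
    print(f"{item}: {value:.3f}")
```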

### 1.2 Results for Itemsets = 2

In [13]:
rules["itemsets"] = rules[["antecedents", "consequents"]].values.tolist()
rules.loc[0:9, ["itemsets", "support", "confidence", "lift"]]

Unnamed: 0,itemsets,support,confidence,lift
0,"[(Camera), (Music Pad)]",0.208333,0.3125,1.071429
1,"[(Music Pad), (Camera)]",0.208333,0.714286,1.071429
2,"[(Camera), (Watch)]",0.416667,0.625,1.0
3,"[(Watch), (Camera)]",0.416667,0.666667,1.0
4,"[(Guitar), (Racket)]",0.125,0.375,1.5
5,"[(Racket), (Guitar)]",0.125,0.5,1.5
6,"[(HP), (Watch)]",0.166667,1.0,1.6
7,"[(Watch), (HP)]",0.166667,0.266667,1.6
8,"[(Music Pad), (Watch)]",0.208333,0.714286,1.142857
9,"[(Watch), (Music Pad)]",0.208333,0.333333,1.142857


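From the results above, the two-item itemsets whose rules pass the thresholds (confidence of at least 0.1 and lift of at least 1) are (Camera, Music Pad), (Camera, Watch), (Guitar, Racket), (HP, Watch) and (Music Pad, Watch). The rule with the highest confidence is HP → Watch (1.00): every transaction in this dataset that contained HP also contained Watch. The reverse rule, Watch → HP, has a confidence of only 0.267, so confidence depends on the direction of the rule. Because HP has a support of only 0.167 (4 of the 24 transactions), this rule rests on very few observations.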
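
The metrics in the rules table can be reproduced by hand from transaction counts. The sketch below uses the standard definitions of support, confidence, lift, leverage and conviction. The counts are an assumption inferred from the rounded supports in the table (0.166667 × 24 = 4 for HP, 0.625 × 24 = 15 for Watch, and 4 transactions containing both), and `rule_metrics` is a hypothetical helper name.

```python
import math


def rule_metrics(n_total, n_antecedent, n_consequent, n_both):
    """Metrics for the rule antecedent -> consequent, from transaction counts."""
    support_a = n_antecedent / n_total
    support_c = n_consequent / n_total
    support = n_both / n_total
    confidence = n_both / n_antecedent
    lift = confidence / support_c
    leverage = support - support_a * support_c
    # Conviction is infinite when the rule never fails (confidence == 1).
    if confidence == 1:
        conviction = math.inf
    else:
        conviction = (1 - support_c) / (1 - confidence)
    return {
        "support": support,
        "confidence": confidence,
        "lift": lift,
        "leverage": leverage,
        "conviction": conviction,
    }


# Assumed counts: HP in 4 of 24 transactions, Watch in 15, both in 4.
hp_to_watch = rule_metrics(24, 4, 15, 4)
watch_to_hp = rule_metrics(24, 15, 4, 4)
print(hp_to_watch)
print(watch_to_hp)
```

Both directions share the same support, lift and leverage. Only confidence and conviction change when the antecedent and consequent are swapped.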

### 1.3 Results for Itemsets = 3

In [14]:
rules["itemsets"] = rules[["antecedents", "consequents"]].values.tolist()
rules.loc[10:11, ["itemsets", "support", "confidence", "lift"]]

Unnamed: 0,itemsets,support,confidence,lift
10,"[(Camera, Watch), (Music Pad)]",0.125,0.3,1.028571
11,"[(Music Pad), (Camera, Watch)]",0.125,0.428571,1.028571


From the results above, the only three-item itemset whose rules pass the thresholds is (Camera, Music Pad, Watch), with a support of 0.125 and a lift of 1.029. A lift this close to 1 indicates only a very weak association.

## 2. Min Support = 0.25, Min Confidence = 0.25, Min Lift = 1

* Apriori

In [15]:
frequent_itemsets1 = apriori(data1, min_support = 0.25, use_colnames = True)
# apriori already returns a DataFrame; only the column order is changed here.
frequent_itemsets1 = frequent_itemsets1[["itemsets", "support"]]
frequent_itemsets1

Unnamed: 0,itemsets,support
0,(Camera),0.666667
1,(Guitar),0.333333
2,(Music Pad),0.291667
3,(Racket),0.25
4,(Watch),0.625
5,"(Camera, Watch)",0.416667


* Association Rules

In [16]:
rules1 = association_rules(frequent_itemsets1, metric = "lift", min_threshold = 1)
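# Copy the filtered frame so that a column can be added to it later
# without pandas warning about assignment on a slice.
rules1 = rules1[rules1["confidence"] >= 0.25].copy()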
rules1

Unnamed: 0,antecedents,consequents,antecedent support,consequent support,support,confidence,lift,leverage,conviction
0,(Camera),(Watch),0.666667,0.625,0.416667,0.625,1.0,5.5511150000000004e-17,1.0
1,(Watch),(Camera),0.625,0.666667,0.416667,0.666667,1.0,5.5511150000000004e-17,1.0


### 2.1 Results for Itemsets = 1

In [17]:
frequent_itemsets1.iloc[0:5].sort_values("support", ascending = False)

Unnamed: 0,itemsets,support
0,(Camera),0.666667
4,(Watch),0.625
1,(Guitar),0.333333
2,(Music Pad),0.291667
3,(Racket),0.25


From the results above, the items with a support of at least 0.25 are Camera, Watch, Guitar, Music Pad and Racket (Racket sits exactly on the 0.25 threshold). The highest supports again belong to Camera (0.667) and Watch (0.625), meaning that about 66.7% of all transactions contained a Camera and 62.5% contained a Watch.

### 2.2 Results for Itemsets = 2

In [18]:
rules1["itemsets"] = rules1[["antecedents", "consequents"]].values.tolist()
rules1.loc[0:1, ["itemsets", "support", "confidence", "lift"]]

Unnamed: 0,itemsets,support,confidence,lift
0,"[(Camera), (Watch)]",0.416667,0.625,1.0
1,"[(Watch), (Camera)]",0.416667,0.666667,1.0


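From the results above, the only two-item itemset whose rules pass the thresholds is (Camera, Watch), with a support of 0.417 and a lift of 1.00. The support means that 41.7% of all transactions contained both items. The confidence is 0.667 for Watch → Camera and 0.625 for Camera → Watch, so when a buyer buys a Watch, the estimated probability that they also buy a Camera is 66.7%. A lift of 1.00, however, means the two items appear together about as often as would be expected if they were bought independently, so buying one does not make the other more likely.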
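
The lift of 1.00 for (Camera, Watch) can be checked by hand. The worked equation below assumes the rounded supports in the table correspond to 16 transactions with Camera, 15 with Watch and 10 with both, out of 24.

```latex
\[
\text{lift}(\text{Camera} \rightarrow \text{Watch})
= \frac{\text{support}(\text{Camera}, \text{Watch})}
       {\text{support}(\text{Camera}) \times \text{support}(\text{Watch})}
= \frac{10/24}{(16/24)(15/24)}
= \frac{0.4167}{0.4167}
= 1
\]
```

The observed joint support equals the product of the individual supports, which is why the leverage for this pair is zero up to floating-point error.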

## Conclusion

Based on the results of the Market Basket Analysis above, several business strategies are possible. Items that are often purchased together can be placed close to each other in the store layout, or sold together as a bundle. To increase sales, we can also offer discounts on the least-selling items to buyers who have bought the best-selling items. For example, Camera is the best-selling item, so a promotion could give a buyer who purchases a Camera a discount on a Router or a Bag, which are among the least-selling items. With this strategy, sales of Router and Bag are expected to increase.

***
# THANK YOU