## Association Rule Mining in a Retail Store

### Problem Statement:
* Which items are frequently purchased together?

### Objective:
* To find out which items are frequently purchased together, so that keeping those items together can help increase sales.


### Introduction
* Association rule mining is an important data mining technique for knowledge discovery.
* It can reveal correlations between items in transaction data.
* Retail store (market basket) analysis is one application area of association rule mining.
* How strongly items co-occur (for example, the percentage of transactions in which they are bought together) yields new knowledge, which helps decision makers take informed decisions.
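The measures used throughout this notebook can be computed by hand on a few baskets. The sketch below uses invented toy transactions (not the Bread Basket data) and the standard definitions: support(X) is the fraction of transactions containing X, confidence(X → Y) = support(X ∪ Y) / support(X), and lift(X → Y) = confidence(X → Y) / support(Y), where a lift of 1.0 means no association. The helper names are illustrative only.

```python
# Toy baskets, invented for illustration (not taken from Bread_Basket.csv).
transactions = [
    {"Coffee", "Toast"},
    {"Coffee", "Cake"},
    {"Bread"},
    {"Coffee", "Toast", "Bread"},
    {"Tea", "Cake"},
]

def support(itemset, baskets):
    """Fraction of baskets that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= basket for basket in baskets) / len(baskets)

def confidence(antecedent, consequent, baskets):
    """Estimated P(consequent | antecedent)."""
    both = set(antecedent) | set(consequent)
    return support(both, baskets) / support(antecedent, baskets)

def lift(antecedent, consequent, baskets):
    """Confidence relative to the consequent's baseline frequency (1.0 = independence)."""
    return confidence(antecedent, consequent, baskets) / support(consequent, baskets)

print(support({"Toast"}, transactions))                 # 0.4
print(confidence({"Toast"}, {"Coffee"}, transactions))  # 1.0
print(lift({"Toast"}, {"Coffee"}, transactions))        # ≈ 1.667
```

Here every basket with Toast also has Coffee (confidence 1.0), and since Coffee appears in only 60% of baskets, the lift is 1.0 / 0.6 ≈ 1.67.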

### Analysis

In [4]:
## Import the required libraries

import pandas as pd
import numpy as np
from mlxtend.frequent_patterns import apriori, association_rules

In [4]:
# Read the CSV file
bread = pd.read_csv('Bread_Basket.csv')

In [5]:
## The data were pivoted in Excel beforehand (one row per transaction, one column per item), so cells for items not in a transaction are empty and are read as NaN.
bread

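|  | Row Labels | Adjustment | Afternoon with the baker | Alfajores | Argentina Night | Art Tray | Bacon | Baguette | Bakewell | Bare Popcorn | ... | Tiffin | Toast | Truffles | Tshirt | Valentine's card | Vegan Feast | Vegan mincepie | Victorian Sponge | (blank) | Grand Total |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 |   |   |   |   |   |   |   |   |   | ... |   |   |   |   |   |   |   |   |   | 1 |
| 1 | 2 |   |   |   |   |   |   |   |   |   | ... |   |   |   |   |   |   |   |   |   | 1 |
| 2 | 3 |   |   |   |   |   |   |   |   |   | ... |   |   |   |   |   |   |   |   |   | 3 |
| 3 | 4 |   |   |   |   |   |   |   |   |   | ... |   |   |   |   |   |   |   |   |   | 1 |
| 4 | 5 |   |   |   |   |   |   |   |   |   | ... |   |   |   |   |   |   |   |   |   | 3 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 9526 | 9680 |   |   |   |   |   |   |   |   |   | ... |   |   |   |   |   |   |   |   |   | 1 |
| 9527 | 9681 |   |   |   |   |   |   |   |   |   | ... |   |   | 1.0 |   |   |   |   |   |   | 4 |
| 9528 | 9682 |   |   |   |   |   |   |   |   |   | ... |   |   |   |   |   |   |   |   |   | 4 |
| 9529 | 9683 |   |   |   |   |   |   |   |   |   | ... |   |   |   |   |   |   |   |   |   | 2 |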
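The Excel pivot step can also be done directly in pandas. A minimal sketch, assuming the raw data is in long format with one row per item sold and columns named `Transaction` and `Item` (an assumption; rename to match your file). `pd.crosstab` counts how often each item occurs in each transaction, and comparing with 0 turns those counts into the presence/absence matrix Apriori works on. The rows below are invented.

```python
import pandas as pd

# Hypothetical long-format data: one row per item sold, tagged with its transaction ID.
# The column names `Transaction` and `Item` are assumptions, not taken from the notebook.
raw = pd.DataFrame({
    "Transaction": [1, 2, 2, 3, 3],
    "Item": ["Bread", "Coffee", "Coffee", "Coffee", "Toast"],
})

# Rows = transactions, columns = items, values = how many times the item was bought.
counts = pd.crosstab(raw["Transaction"], raw["Item"])

# Apriori only needs presence/absence, so collapse counts to True/False.
basket = counts > 0
print(basket)
```

Unlike the Excel pivot, `crosstab` fills absent combinations with 0 rather than leaving them empty, so no NaN handling is needed afterwards.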


In [6]:
## Missing cells mean the item was not in the transaction, so fill NaN with 0.
## Apriori (mlxtend) works on binary data (0/1 or True/False), so NaN values must become 0.
bread = bread.fillna(0)
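`fillna(0)` produces a valid 0/1 matrix only if every non-empty pivot cell holds exactly 1. If the pivot counted items, a transaction with two units of the same item would hold 2, which mlxtend's `apriori` does not accept as a binary value. A minimal sketch of checking for and fixing this, on a small invented stand-in for the pivot:

```python
import numpy as np
import pandas as pd

# Toy stand-in for the Excel pivot (values invented): NaN = item not bought,
# otherwise a count that may exceed 1.
pivot = pd.DataFrame(
    {"Bread": [1.0, np.nan, 2.0], "Coffee": [np.nan, 1.0, 1.0]},
    index=[1, 2, 3],
)

filled = pivot.fillna(0)

# Any cell that is neither 0 nor 1 is not valid binary input.
non_binary = (filled != 0) & (filled != 1)
print("non-binary cells:", int(non_binary.sum().sum()))

# Treat any positive count as "item present" to get a True/False matrix.
basket = filled > 0
print(basket)
```

Converting to booleans keeps the presence information and discards the quantity, which is all Apriori uses.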

In [7]:
## 'Row Labels' (the transaction number), 'Adjustment', 'Grand Total' and '(blank)' should not be part of the model, so drop them.

bread = bread.drop(['Row Labels', 'Adjustment','Grand Total','(blank)'], axis = 1)


In [8]:
bread.head()

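|  | Afternoon with the baker | Alfajores | Argentina Night | Art Tray | Bacon | Baguette | Bakewell | Bare Popcorn | Basket | Bowl Nic Pitt | ... | The BART | The Nomad | Tiffin | Toast | Truffles | Tshirt | Valentine's card | Vegan Feast | Vegan mincepie | Victorian Sponge |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 1 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 2 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 3 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 4 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |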


In [9]:
## Find frequent itemsets: items or item combinations bought in at least 1% of transactions
associate = apriori(df=bread, min_support=0.01, use_colnames=True)
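The call above runs the Apriori search over the basket matrix. To show the idea behind it, here is a simplified pure-Python sketch of classic level-wise Apriori (not mlxtend's actual implementation; `apriori_sketch` is a hypothetical name), run on invented toy baskets. Frequent itemsets of size k-1 are joined into size-k candidates, candidates with any infrequent subset are pruned (every subset of a frequent itemset must itself be frequent), and the survivors are kept if their support is at least `min_support`.

```python
from itertools import combinations

def apriori_sketch(baskets, min_support):
    """Level-wise Apriori search. Returns {frozenset(itemset): support} for every
    itemset whose support is >= min_support. Educational sketch, not mlxtend's code."""
    baskets = [frozenset(b) for b in baskets]
    n = len(baskets)

    def support(itemset):
        # Fraction of baskets containing every item of `itemset`.
        return sum(itemset <= b for b in baskets) / n

    # Level 1: frequent single items.
    current = {}
    for item in sorted({i for b in baskets for i in b}):
        s = support(frozenset([item]))
        if s >= min_support:
            current[frozenset([item])] = s
    frequent = dict(current)

    k = 2
    while current:
        # Join: merge pairs of frequent (k-1)-itemsets into k-item candidates.
        candidates = {a | b for a, b in combinations(current, 2) if len(a | b) == k}
        # Prune: drop candidates that have an infrequent (k-1)-subset.
        candidates = {
            c for c in candidates
            if all(frozenset(sub) in current for sub in combinations(c, k - 1))
        }
        # Count support only for the surviving candidates.
        current = {}
        for c in candidates:
            s = support(c)
            if s >= min_support:
                current[c] = s
        frequent.update(current)
        k += 1
    return frequent

# Invented toy baskets (not from Bread_Basket.csv).
baskets = [
    {"Coffee", "Toast"},
    {"Coffee", "Cake"},
    {"Bread"},
    {"Coffee", "Toast", "Bread"},
    {"Tea", "Cake"},
]
result = apriori_sketch(baskets, min_support=0.4)
for itemset, s in sorted(result.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), s)
```

With `min_support=0.4`, Tea (support 0.2) is dropped at level 1, and {Coffee, Toast} is the only pair that survives; the pruning step is what keeps the search tractable on a matrix as wide as the one above.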

In [10]:
associate.head()

|  | support | itemsets |
|---|---|---|
| 0 | 0.036093 | (Alfajores) |
| 1 | 0.015948 | (Baguette) |
| 2 | 0.32494 | (Bread) |
| 3 | 0.039765 | (Brownie) |
| 4 | 0.103137 | (Cake) |


In [11]:
## Sort the frequent itemsets by support in ascending order (least frequent first)
associate.sort_values(by = 'support').head()

|  | support | itemsets |
|---|---|---|
| 31 | 0.010282 | (Bread, Alfajores) |
| 21 | 0.010387 | (Salad) |
| 11 | 0.010492 | (Hearty & Seasonal) |
| 33 | 0.010702 | (Bread, Brownie) |
| 57 | 0.010807 | (Coffee, Spanish Brunch) |
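Sorting by support mixes single items and item pairs. A common follow-up is to count the items in each itemset and filter on that count. mlxtend stores each itemset as a `frozenset`, so `len` works directly. A minimal sketch on the five rows above, hard-coded so it runs on its own (the DataFrame name `sample` is illustrative):

```python
import pandas as pd

# The five rows shown above, transcribed; itemsets are frozensets as in mlxtend's output.
sample = pd.DataFrame({
    "support": [0.010282, 0.010387, 0.010492, 0.010702, 0.010807],
    "itemsets": [
        frozenset({"Bread", "Alfajores"}),
        frozenset({"Salad"}),
        frozenset({"Hearty & Seasonal"}),
        frozenset({"Bread", "Brownie"}),
        frozenset({"Coffee", "Spanish Brunch"}),
    ],
})

# Number of items in each itemset.
sample["length"] = sample["itemsets"].apply(len)

# Keep only pairs, most frequent first.
pairs = sample[sample["length"] == 2].sort_values("support", ascending=False)
print(pairs)
```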


In [12]:
## Create association rules whose confidence (the conditional probability of the consequent given the antecedent) is at least 50%
asso_rule = association_rules(associate, metric="confidence", min_threshold=0.5)
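Rules are obtained by splitting each frequent itemset into an antecedent X and a consequent Y and keeping the splits whose confidence support(X ∪ Y) / support(X) clears the threshold. The following is a simplified pure-Python sketch of that step (not mlxtend's implementation; `generate_rules` is a hypothetical name). The usage example feeds it the rounded supports for Toast, Coffee and {Toast, Coffee} that appear in this notebook's rules table.

```python
from itertools import combinations

def generate_rules(frequent, min_confidence):
    """Return (antecedent, consequent, support, confidence, lift) tuples for every
    split X -> Y of a frequent itemset whose confidence is >= min_confidence.
    `frequent` maps frozenset itemsets to their support; every subset of a
    frequent itemset is assumed to be present too (the Apriori property)."""
    rules = []
    for itemset, sup in frequent.items():
        if len(itemset) < 2:
            continue  # a rule needs a non-empty antecedent and consequent
        for r in range(1, len(itemset)):
            for ante in combinations(sorted(itemset), r):
                antecedent = frozenset(ante)
                consequent = itemset - antecedent
                conf = sup / frequent[antecedent]  # support(X and Y) / support(X)
                if conf >= min_confidence:
                    lift = conf / frequent[consequent]
                    rules.append((antecedent, consequent, sup, conf, lift))
    return rules

# Rounded supports taken from the Toast -> Coffee row of the rules table.
frequent = {
    frozenset({"Toast"}): 0.033365,
    frozenset({"Coffee"}): 0.475081,
    frozenset({"Toast", "Coffee"}): 0.023502,
}

rules = generate_rules(frequent, min_confidence=0.5)
for ante, cons, sup, conf, lift in rules:
    print(f"{sorted(ante)} -> {sorted(cons)}: confidence={conf:.4f}, lift={lift:.4f}")
```

Only Toast → Coffee is kept: the reverse rule Coffee → Toast has confidence 0.023502 / 0.475081 ≈ 0.049, far below 0.5. Because Coffee is in almost half of all transactions, high-confidence rules tend to point towards it, which is consistent with every rule shown below having Coffee as its consequent.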

In [13]:
asso_rule.head()

|  | antecedents | consequents | antecedent support | consequent support | support | confidence | lift | leverage | conviction |
|---|---|---|---|---|---|---|---|---|---|
| 0 | (Alfajores) | (Coffee) | 0.036093 | 0.475081 | 0.019515 | 0.540698 | 1.138116 | 0.002368 | 1.142861 |
| 1 | (Cake) | (Coffee) | 0.103137 | 0.475081 | 0.054349 | 0.526958 | 1.109196 | 0.00535 | 1.109667 |
| 2 | (Cookies) | (Coffee) | 0.054034 | 0.475081 | 0.028014 | 0.518447 | 1.09128 | 0.002343 | 1.090053 |
| 3 | (Hot chocolate) | (Coffee) | 0.057916 | 0.475081 | 0.029378 | 0.507246 | 1.067704 | 0.001863 | 1.065276 |
| 4 | (Juice) | (Coffee) | 0.038296 | 0.475081 | 0.02046 | 0.534247 | 1.124537 | 0.002266 | 1.127031 |


In [14]:
asso_rule.sort_values(by='lift', ascending= False).head(10)

|  | antecedents | consequents | antecedent support | consequent support | support | confidence | lift | leverage | conviction |
|---|---|---|---|---|---|---|---|---|---|
| 11 | (Toast) | (Coffee) | 0.033365 | 0.475081 | 0.023502 | 0.704403 | 1.482699 | 0.007651 | 1.775789 |
| 10 | (Spanish Brunch) | (Coffee) | 0.018046 | 0.475081 | 0.010807 | 0.598837 | 1.260494 | 0.002233 | 1.308493 |
| 5 | (Medialuna) | (Coffee) | 0.061379 | 0.475081 | 0.034939 | 0.569231 | 1.198175 | 0.005779 | 1.218561 |
| 7 | (Pastry) | (Coffee) | 0.08551 | 0.475081 | 0.047214 | 0.552147 | 1.162216 | 0.00659 | 1.172079 |
| 0 | (Alfajores) | (Coffee) | 0.036093 | 0.475081 | 0.019515 | 0.540698 | 1.138116 | 0.002368 | 1.142861 |
| 4 | (Juice) | (Coffee) | 0.038296 | 0.475081 | 0.02046 | 0.534247 | 1.124537 | 0.002266 | 1.127031 |
| 6 | (NONE) | (Coffee) | 0.079005 | 0.475081 | 0.042073 | 0.532537 | 1.120938 | 0.004539 | 1.122908 |
| 8 | (Sandwich) | (Coffee) | 0.071346 | 0.475081 | 0.037981 | 0.532353 | 1.120551 | 0.004086 | 1.122468 |
| 1 | (Cake) | (Coffee) | 0.103137 | 0.475081 | 0.054349 | 0.526958 | 1.109196 | 0.00535 | 1.109667 |
| 9 | (Scone) | (Coffee) | 0.034309 | 0.475081 | 0.017941 | 0.522936 | 1.100729 | 0.001642 | 1.10031 |
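Sorting by lift ranks the rules, but in practice one often also filters them by several metrics at once and turns the `frozenset` antecedents and consequents into plain strings for reporting. A minimal sketch on three rows transcribed from the table above; the cut-offs (confidence ≥ 0.55, lift ≥ 1.2) are arbitrary illustrations, not recommended values:

```python
import pandas as pd

# Three rules transcribed from the lift-sorted output above; antecedents and
# consequents are frozensets, as in mlxtend's association_rules output.
rules = pd.DataFrame({
    "antecedents": [frozenset({"Toast"}), frozenset({"Spanish Brunch"}), frozenset({"Scone"})],
    "consequents": [frozenset({"Coffee"})] * 3,
    "confidence": [0.704403, 0.598837, 0.522936],
    "lift": [1.482699, 1.260494, 1.100729],
})

# Keep rules that are fairly reliable AND clearly above independence (lift 1.0).
# The thresholds are illustrative assumptions.
strong = rules[(rules["confidence"] >= 0.55) & (rules["lift"] >= 1.2)].copy()

# frozensets print awkwardly, so join them into readable strings.
for col in ["antecedents", "consequents"]:
    strong[col] = strong[col].apply(lambda s: ", ".join(sorted(s)))

print(strong.to_string(index=False))
```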


In [3]:
# lift = support(Toast and Coffee) / (support(Toast) * support(Coffee))
## Manually check the lift of the Toast -> Coffee rule
print(f"Lift : {0.023502/(0.033365*0.475081)}")


Lift : 1.4826752253041542
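The same hand check extends to the other columns of the rules table. Using the standard definitions (leverage = support(A ∪ C) − support(A)·support(C), conviction = (1 − support(C)) / (1 − confidence(A → C)), as documented for mlxtend's `association_rules`), the Toast → Coffee row works out as follows. The small differences from the table come from using the rounded six-decimal values.

$$
\begin{aligned}
\text{conf}(\text{Toast} \Rightarrow \text{Coffee}) &= \frac{\text{supp}(\text{Toast} \cup \text{Coffee})}{\text{supp}(\text{Toast})} = \frac{0.023502}{0.033365} \approx 0.7044 \\
\text{lift} &= \frac{\text{conf}(\text{Toast} \Rightarrow \text{Coffee})}{\text{supp}(\text{Coffee})} = \frac{0.023502}{0.033365 \times 0.475081} \approx 1.4827 \\
\text{leverage} &= \text{supp}(\text{Toast} \cup \text{Coffee}) - \text{supp}(\text{Toast})\,\text{supp}(\text{Coffee}) = 0.023502 - 0.015851 \approx 0.007651 \\
\text{conviction} &= \frac{1 - \text{supp}(\text{Coffee})}{1 - \text{conf}(\text{Toast} \Rightarrow \text{Coffee})} = \frac{1 - 0.475081}{1 - 0.704403} = \frac{0.524919}{0.295597} \approx 1.776
\end{aligned}
$$

A positive leverage and a conviction above 1 point the same way as the lift: Toast and Coffee co-occur more often than they would if purchases were independent.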


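### Conclusion
It was observed that:
 * Toast was bought in 3.3% of all transactions (support 0.033).
 * Coffee was bought in 47.5% of all transactions (support 0.475).
 * 70.4% of the transactions that contain Toast also contain Coffee, i.e. the rule Toast → Coffee has a confidence of about 70%.
 * Toast → Coffee has the highest lift among the generated rules (1.48): a transaction containing Toast is about 1.48 times as likely to contain Coffee as an average transaction.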

### References
* BreadBasket_DMS.csv dataset: https://github.com/viktree/curly-octo-chainsaw/blob/master/BreadBasket_DMS.csv
* http://www.ijarcs.info/index.php/Ijarcs/article/download/4564/4083