# Market Basket Analysis

**Unsupervised ML Algorithm**

Important metrics in association rules:

Support: indicates how frequently an item set appears in the data set. For a rule X⇒Y, it is the fraction of transactions that contain both X and Y.

    supp(T-shirt⇒Trousers) = 3/7 = 43%

Confidence: for a rule X⇒Y, the percentage of transactions containing X that also contain Y.
It indicates how often the rule has been found to be true.

    conf(Trousers⇒Belt) = (4/7)/(5/7) = 80%

Lift: the ratio of the observed support of a rule to the support expected if X and Y were independent.
A lift above 1 means X and Y occur together more often than chance would predict; a lift below 1 means less often.

    lift(T-shirt⇒Trousers) = (3/7)/((4/7)×(5/7)) = 1.05
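
The three formulas can be checked with a few lines of plain Python. This is only a sketch: the seven transactions below are hypothetical, constructed so that their counts match the worked figures above (T-shirt in 4 of 7, Trousers in 5 of 7, and so on), and the helper names `support`, `confidence` and `lift` are illustrative rather than part of any library.

```python
# Hypothetical 7-transaction dataset, built only so the counts match
# the worked figures above; it does not come from a real source.
transactions = [
    {"T-shirt", "Trousers", "Belt"},
    {"T-shirt", "Trousers", "Belt"},
    {"T-shirt", "Trousers"},
    {"T-shirt"},
    {"Trousers", "Belt"},
    {"Trousers", "Belt"},
    {"Shoes"},
]


def support(itemset):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)


def confidence(x, y):
    """Share of the transactions containing X that also contain Y."""
    return support(set(x) | set(y)) / support(x)


def lift(x, y):
    """Observed joint support divided by the support expected under independence."""
    return support(set(x) | set(y)) / (support(x) * support(y))


print(round(support({"T-shirt", "Trousers"}), 2))    # 0.43
print(round(confidence({"Trousers"}, {"Belt"}), 2))  # 0.8
print(round(lift({"T-shirt"}, {"Trousers"}), 2))     # 1.05
```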

In [63]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns


from mlxtend.preprocessing import TransactionEncoder
from mlxtend.frequent_patterns import apriori, association_rules

In [50]:
#get dataset
df=pd.read_csv('Bread Basket.csv')
df.head()

Unnamed: 0,Transaction,Item,date_time,period_day,weekday_weekend
0,1,Bread,30-10-2016 09:58,morning,weekend
1,2,Scandinavian,30-10-2016 10:05,morning,weekend
2,2,Scandinavian,30-10-2016 10:05,morning,weekend
3,3,Hot chocolate,30-10-2016 10:07,morning,weekend
4,3,Jam,30-10-2016 10:07,morning,weekend


In [47]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 20507 entries, 0 to 20506
Data columns (total 5 columns):
 #   Column           Non-Null Count  Dtype 
---  ------           --------------  ----- 
 0   Transaction      20507 non-null  int64 
 1   Item             20507 non-null  object
 2   date_time        20507 non-null  object
 3   period_day       20507 non-null  object
 4   weekday_weekend  20507 non-null  object
dtypes: int64(1), object(4)
memory usage: 801.2+ KB


The dataset has one row per item sold, with the transaction ID and timestamp repeated on each row.
We need to group the rows by transaction so that each transaction becomes a list of items.

In [51]:
df = df.groupby('Transaction')['Item'].apply(list)
df = pd.DataFrame(df)

In [53]:
df.index

Int64Index([   1,    2,    3,    4,    5,    6,    7,    8,    9,   10,
            ...
            9674, 9676, 9677, 9678, 9679, 9680, 9681, 9682, 9683, 9684],
           dtype='int64', name='Transaction', length=9465)

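We get 9465 transactions in the dataset (the transaction IDs run up to 9684, but some IDs are absent, as `length=9465` shows). Each transaction is a list of the items bought together and can be called a basket.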

In [59]:
data=list(df['Item'])
data

[['Bread'],
 ['Scandinavian', 'Scandinavian'],
 ['Hot chocolate', 'Jam', 'Cookies'],
 ['Muffin'],
 ['Coffee', 'Pastry', 'Bread'],
 ['Medialuna', 'Pastry', 'Muffin'],
 ['Medialuna', 'Pastry', 'Coffee', 'Tea'],
 ['Pastry', 'Bread'],
 ['Bread', 'Muffin'],
 ['Scandinavian', 'Medialuna'],
 ['Bread', 'Medialuna', 'Bread'],
 ['Jam', 'Coffee', 'Tartine', 'Pastry', 'Tea'],
 ['Basket', 'Bread', 'Coffee'],
 ['Bread', 'Medialuna', 'Pastry'],
 ['Mineral water', 'Scandinavian'],
 ['Bread', 'Medialuna', 'Coffee'],
 ['Hot chocolate'],
 ['Farm House'],
 ['Farm House', 'Bread'],
 ['Bread', 'Medialuna'],
 ['Coffee', 'Coffee', 'Medialuna', 'Bread'],
 ['Jam'],
 ['Scandinavian', 'Muffin'],
 ['Bread'],
 ['Scandinavian'],
 ['Fudge'],
 ['Scandinavian'],
 ['Coffee', 'Bread'],
 ['Bread', 'Jam'],
 ['Bread'],
 ['Basket'],
 ['Scandinavian', 'Muffin'],
 ['Coffee'],
 ['Coffee', 'Muffin'],
 ['Muffin', 'Scandinavian'],
 ['Tea', 'Bread'],
 ['Coffee', 'Bread'],
 ['Bread', 'Tea'],
 ['Scandinavian'],
 ['Juice', 'Tartine', 

`TransactionEncoder` from `mlxtend.preprocessing` converts the list of baskets into a one-hot encoded table: one row per transaction and one boolean column per item.

In [64]:
te = TransactionEncoder()
te_ary = te.fit(data).transform(data)
df = pd.DataFrame(te_ary, columns=te.columns_)
df.head()

Unnamed: 0,Adjustment,Afternoon with the baker,Alfajores,Argentina Night,Art Tray,Bacon,Baguette,Bakewell,Bare Popcorn,Basket,...,The BART,The Nomad,Tiffin,Toast,Truffles,Tshirt,Valentine's card,Vegan Feast,Vegan mincepie,Victorian Sponge
0,False,False,False,False,False,False,False,False,False,False,...,False,False,False,False,False,False,False,False,False,False
1,False,False,False,False,False,False,False,False,False,False,...,False,False,False,False,False,False,False,False,False,False
2,False,False,False,False,False,False,False,False,False,False,...,False,False,False,False,False,False,False,False,False,False
3,False,False,False,False,False,False,False,False,False,False,...,False,False,False,False,False,False,False,False,False,False
4,False,False,False,False,False,False,False,False,False,False,...,False,False,False,False,False,False,False,False,False,False
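
To make the encoding step concrete, here is a minimal pure-Python sketch of the idea behind one-hot encoding baskets. It is not mlxtend's implementation: `one_hot_encode` is a hypothetical helper, and the three baskets are the first three from the listing above. Note how a repeated item in a basket still yields a single `True`.

```python
def one_hot_encode(baskets):
    """Return (columns, rows): sorted unique items and one boolean row per basket."""
    columns = sorted({item for basket in baskets for item in basket})
    rows = []
    for basket in baskets:
        present = set(basket)  # duplicates within a basket collapse here
        rows.append([item in present for item in columns])
    return columns, rows


# First three baskets from the notebook's listing.
baskets = [
    ["Bread"],
    ["Scandinavian", "Scandinavian"],
    ["Hot chocolate", "Jam", "Cookies"],
]

columns, rows = one_hot_encode(baskets)
print(columns)
for row in rows:
    print(row)
```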


In [70]:
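# TransactionEncoder already returns booleans; this step maps them to 1/0.
# Note: DataFrame.applymap is deprecated since pandas 2.1 in favor of DataFrame.map.
def convert_into_binary(x):
    if x > 0:
        return 1
    else:
        return 0

df = df.applymap(convert_into_binary)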

For the 94 items, we now have a one-hot encoded table of transactions (one row per transaction, one column per item), which is the input format apriori requires.

**Calling Apriori function and Mining Association rules**

In [72]:
# Call apriori with a minimum support of 3% (min_support=0.03):
# an itemset is kept only if it appears in at least 3% of all transactions.

frequent_itemsets = apriori(df, min_support=0.03, use_colnames=True)

In [73]:
frequent_itemsets.head()

Unnamed: 0,support,itemsets
0,0.036344,(Alfajores)
1,0.327205,(Bread)
2,0.040042,(Brownie)
3,0.103856,(Cake)
4,0.478394,(Coffee)
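
What the `min_support` cut-off does can be illustrated with a brute-force count. This sketch is not the Apriori algorithm itself (which prunes candidates level by level) and is not mlxtend's code; `brute_force_itemsets` is a hypothetical helper, and the five baskets are picked from the listing earlier in the notebook.

```python
from collections import Counter
from itertools import combinations


def brute_force_itemsets(baskets, min_support, max_len=2):
    """Count every itemset up to `max_len` items and keep those meeting `min_support`."""
    n = len(baskets)
    counts = Counter()
    for basket in baskets:
        items = sorted(set(basket))  # ignore duplicates, fix the item order
        for k in range(1, max_len + 1):
            counts.update(combinations(items, k))
    return {itemset: c / n for itemset, c in counts.items() if c / n >= min_support}


# Five baskets taken from the notebook's listing.
baskets = [
    ["Coffee", "Pastry", "Bread"],
    ["Medialuna", "Pastry", "Coffee", "Tea"],
    ["Pastry", "Bread"],
    ["Bread", "Muffin"],
    ["Coffee", "Bread"],
]

# Keep itemsets present in at least 40% of baskets (2 of 5).
result = brute_force_itemsets(baskets, min_support=0.4)
for itemset, supp in sorted(result.items(), key=lambda kv: -kv[1]):
    print(itemset, supp)
```

Items such as Tea or Muffin appear in only one of the five baskets, so they fall below the threshold and are dropped, just as rare items are dropped at 3% on the full dataset.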


In [99]:
# Derive association rules from the frequent itemsets.
# Rules are filtered on lift; min_threshold=0.05 keeps every rule with lift >= 0.05,
# so rules with lift below 1 (negative association) are included as well.
rules_mlxtend = association_rules(frequent_itemsets, metric="lift", min_threshold=0.05)
rules_mlxtend.head()

Unnamed: 0,antecedents,consequents,antecedent support,consequent support,support,confidence,lift,leverage,conviction
0,(Coffee),(Bread),0.478394,0.327205,0.090016,0.188163,0.575059,-0.066517,0.828731
1,(Bread),(Coffee),0.327205,0.478394,0.090016,0.275105,0.575059,-0.066517,0.719561
2,(Coffee),(Cake),0.478394,0.103856,0.054728,0.114399,1.101515,0.005044,1.011905
3,(Cake),(Coffee),0.103856,0.478394,0.054728,0.526958,1.101515,0.005044,1.102664
4,(Coffee),(Medialuna),0.478394,0.061807,0.035182,0.073542,1.189878,0.005614,1.012667


In [100]:
print(rules_mlxtend[rules_mlxtend['confidence']==max(rules_mlxtend['confidence'])])

print(rules_mlxtend[rules_mlxtend['lift']==max(rules_mlxtend['lift'])])

   antecedents consequents  antecedent support  consequent support   support  \
5  (Medialuna)    (Coffee)            0.061807            0.478394  0.035182   

   confidence      lift  leverage  conviction  
5    0.569231  1.189878  0.005614    1.210871  
   antecedents  consequents  antecedent support  consequent support   support  \
4     (Coffee)  (Medialuna)            0.478394            0.061807  0.035182   
5  (Medialuna)     (Coffee)            0.061807            0.478394  0.035182   

   confidence      lift  leverage  conviction  
4    0.073542  1.189878  0.005614    1.012667  
5    0.569231  1.189878  0.005614    1.210871  


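Conclusion: the rule with the highest confidence is Medialuna ⇒ Coffee (0.569). The highest lift (1.19) is shared by Coffee ⇒ Medialuna and Medialuna ⇒ Coffee, because lift is symmetric in X and Y. Either metric can be used to filter the rules as required.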
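
One possible way to do that filtering is sketched below with pandas. The small `rules` frame is rebuilt by hand from the six rules printed above (only the columns needed here), and the cut-off of lift >= 1 is an assumption chosen for illustration, not a setting used elsewhere in this notebook.

```python
import pandas as pd

# Rules copied from the outputs above (rows 0-5), reduced to the columns used here.
rules = pd.DataFrame({
    "antecedents": ["Coffee", "Bread", "Coffee", "Cake", "Coffee", "Medialuna"],
    "consequents": ["Bread", "Coffee", "Cake", "Coffee", "Medialuna", "Coffee"],
    "confidence": [0.188163, 0.275105, 0.114399, 0.526958, 0.073542, 0.569231],
    "lift": [0.575059, 0.575059, 1.101515, 1.101515, 1.189878, 1.189878],
})

# Keep positively associated rules only, strongest confidence first.
positive = rules[rules["lift"] >= 1].sort_values("confidence", ascending=False)
print(positive)
```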

In [101]:
# menu = ['Adjustment', 'Afternoon with the baker', 'Alfajores',
#        'Argentina Night', 'Art Tray', 'Bacon', 'Baguette', 'Bakewell',
#        'Bare Popcorn', 'Basket', 'Bowl Nic Pitt', 'Bread', 'Bread Pudding',
#        'Brioche and salami', 'Brownie', 'Cake', 'Caramel bites',
#        'Cherry me Dried fruit', 'Chicken Stew', 'Chicken sand',
#        'Chimichurri Oil', 'Chocolates', 'Christmas common', 'Coffee',
#        'Coffee granules ', 'Coke', 'Cookies', 'Crepes', 'Crisps',
#        'Drinking chocolate spoons ', 'Duck egg', 'Dulce de Leche', 'Eggs',
#        "Ella's Kitchen Pouches", 'Empanadas', 'Extra Salami or Feta',
#        'Fairy Doors', 'Farm House', 'Focaccia', 'Frittata', 'Fudge',
#        'Gift voucher', 'Gingerbread syrup', 'Granola', 'Hack the stack',
#        'Half slice Monster ', 'Hearty & Seasonal', 'Honey', 'Hot chocolate',
#        'Jam', 'Jammie Dodgers', 'Juice', 'Keeping It Local', 'Kids biscuit',
#        'Lemon and coconut', 'Medialuna', 'Mighty Protein', 'Mineral water',
#        'Mortimer', 'Muesli', 'Muffin', 'My-5 Fruit Shoot', 'Nomad bag',
#        'Olum & polenta', 'Panatone', 'Pastry', 'Pick and Mix Bowls', 'Pintxos',
#        'Polenta', 'Postcard', 'Raspberry shortbread sandwich', 'Raw bars',
#        'Salad', 'Sandwich', 'Scandinavian', 'Scone', 'Siblings', 'Smoothies',
#        'Soup', 'Spanish Brunch', 'Spread', 'Tacos/Fajita', 'Tartine', 'Tea',
#        'The BART', 'The Nomad', 'Tiffin', 'Toast', 'Truffles', 'Tshirt',
#        "Valentine's card", 'Vegan Feast', 'Vegan mincepie',
#        'Victorian Sponge'] 

# Convert the frozenset objects into comma-separated strings (sorted for a stable order)
rules_mlxtend["antecedents"] = rules_mlxtend["antecedents"].apply(lambda x: ', '.join(sorted(x))).astype(str)
rules_mlxtend["consequents"] = rules_mlxtend["consequents"].apply(lambda x: ', '.join(sorted(x))).astype(str)



In [116]:
bought_item = "Bread"


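# `&` binds tighter than `>=`, so `A & lift >= 1` parses as `(A & lift) >= 1` and does not
# filter on lift. The lookup therefore matches on the antecedent only; add
# `& (rules_mlxtend['lift'] >= 1)` to the mask to drop rules such as Bread => Coffee (lift 0.575).
mask = rules_mlxtend["antecedents"] == bought_item
recommended_item = ", ".join(rules_mlxtend.loc[mask, "consequents"])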

print("You Have bought",bought_item,"... Are you missing ",recommended_item," !!!")

You Have bought Bread ... Are you missing  Coffee  !!!
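
The lookup can also be wrapped in a small function with an explicit lift filter. This is a sketch under stated assumptions: `recommend` and its `min_lift` parameter are hypothetical names, and the `rules` frame is rebuilt by hand from the six rules printed earlier. Each comparison is parenthesized because `&` binds more tightly than `>=` in Python.

```python
import pandas as pd

# Rules copied from the outputs earlier in the notebook (rows 0-5).
rules = pd.DataFrame({
    "antecedents": ["Coffee", "Bread", "Coffee", "Cake", "Coffee", "Medialuna"],
    "consequents": ["Bread", "Coffee", "Cake", "Coffee", "Medialuna", "Coffee"],
    "confidence": [0.188163, 0.275105, 0.114399, 0.526958, 0.073542, 0.569231],
    "lift": [0.575059, 0.575059, 1.101515, 1.101515, 1.189878, 1.189878],
})


def recommend(rules, bought_item, min_lift=1.0):
    """Consequents of rules with antecedent `bought_item` and lift >= `min_lift`,
    ordered by descending confidence."""
    mask = (rules["antecedents"] == bought_item) & (rules["lift"] >= min_lift)
    matches = rules.loc[mask].sort_values("confidence", ascending=False)
    return matches["consequents"].tolist()


print(recommend(rules, "Cake"))                 # ['Coffee']
print(recommend(rules, "Bread"))                # [] (Bread => Coffee has lift 0.575)
print(recommend(rules, "Bread", min_lift=0.0))  # ['Coffee']
```

Returning a list keeps the empty case explicit, so the caller can decide what to print when no rule passes the lift threshold.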
