# HOW TO USE ML AND THE APRIORI ALGORITHM TO ENHANCE MARKET BASKET ANALYSIS

Market basket analysis is a data mining technique that identifies items customers tend to buy together.

It answers questions such as:
- Which items are bought together?
- If a customer buys item X, which other item are they likely to buy as well?
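At small scale, the first question can be answered by counting how often each pair of items shares a basket. This is a minimal pure-Python sketch; the four baskets are hypothetical toy data, not rows from this dataset:

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy baskets, for illustration only
baskets = [
    {"whole milk", "yogurt", "rolls/buns"},
    {"whole milk", "yogurt"},
    {"soda", "rolls/buns"},
    {"whole milk", "soda"},
]

# Count every unordered pair of items that appears in the same basket;
# sorting each basket makes ("a", "b") and ("b", "a") the same key
pair_counts = Counter(
    pair
    for basket in baskets
    for pair in combinations(sorted(basket), 2)
)

for pair, count in pair_counts.most_common(3):
    print(pair, count)
```

Counting every pair works for a handful of items, but the number of candidate itemsets grows combinatorially with the number of items, which is the problem the Apriori algorithm's support-based pruning addresses.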

## Importing the libraries that will be used

In [93]:
# pandas for loading and reshaping the transactions
import pandas as pd
# mlxtend for frequent-itemset mining and rule generation
from mlxtend.frequent_patterns import apriori, association_rules

## Loading the dataset

In [94]:
df = pd.read_csv('test.csv', encoding='latin1')
df.head()

|   | Member_number | Date | itemDescription |
|---|---|---|---|
| 0 | 1808 | 21-07-2015 | tropical fruit |
| 1 | 2552 | 05-01-2015 | whole milk |
| 2 | 2300 | 19-09-2015 | pip fruit |
| 3 | 1187 | 12-12-2015 | other vegetables |
| 4 | 3037 | 01-02-2015 | whole milk |


In [95]:
df.shape

(38765, 3)

In [96]:
df.columns

Index(['Member_number', 'Date', 'itemDescription'], dtype='object')

In [97]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 38765 entries, 0 to 38764
Data columns (total 3 columns):
 #   Column           Non-Null Count  Dtype 
---  ------           --------------  ----- 
 0   Member_number    38765 non-null  int64 
 1   Date             38765 non-null  object
 2   itemDescription  38765 non-null  object
dtypes: int64(1), object(2)
memory usage: 908.7+ KB


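Association rules are widely used on retail data to find strong patterns in transactions, such as "customers who buy A also tend to buy B". They are scored on baskets, so the rows are first grouped so that each basket is the list of items one member bought on one date.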
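For a rule A → C, the core metrics follow directly from their definitions: support is the share of baskets containing both A and C, confidence is support(A ∪ C) / support(A), and lift is confidence / support(C). This is a minimal pure-Python sketch; the baskets and helper functions are hypothetical, for illustration only:

```python
# Hypothetical toy baskets, for illustration only
baskets = [
    {"whole milk", "yogurt"},
    {"whole milk", "yogurt", "rolls/buns"},
    {"whole milk"},
    {"rolls/buns"},
    {"yogurt", "soda"},
]

def support(itemset, baskets):
    """Fraction of baskets that contain every item in itemset."""
    itemset = set(itemset)
    return sum(itemset <= b for b in baskets) / len(baskets)

def confidence(antecedent, consequent, baskets):
    """Estimated P(consequent in basket | antecedent in basket)."""
    both = set(antecedent) | set(consequent)
    return support(both, baskets) / support(antecedent, baskets)

def lift(antecedent, consequent, baskets):
    """Confidence relative to the consequent's baseline support."""
    return confidence(antecedent, consequent, baskets) / support(consequent, baskets)

print(support({"yogurt", "whole milk"}, baskets))
print(confidence({"yogurt"}, {"whole milk"}, baskets))
print(lift({"yogurt"}, {"whole milk"}, baskets))
```

A lift above 1 means the antecedent raises the chance of the consequent appearing in the same basket; a lift below 1 means it lowers it.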

In [98]:
basket = (df.groupby(['Member_number', 'Date'])['itemDescription']
          .apply(list)
          .reset_index(name='Products'))
basket

|   | Member_number | Date | Products |
|---|---|---|---|
| 0 | 1000 | 15-03-2015 | [sausage, whole milk, semi-finished bread, yog... |
| 1 | 1000 | 24-06-2014 | [whole milk, pastry, salty snack] |
| 2 | 1000 | 24-07-2015 | [canned beer, misc. beverages] |
| 3 | 1000 | 25-11-2015 | [sausage, hygiene articles] |
| 4 | 1000 | 27-05-2015 | [soda, pickled vegetables] |
| ... | ... | ... | ... |
| 14958 | 4999 | 24-01-2015 | [tropical fruit, berries, other vegetables, yo... |
| 14959 | 4999 | 26-12-2015 | [bottled water, herbs] |
| 14960 | 5000 | 09-03-2014 | [fruit/vegetable juice, onions] |
| 14961 | 5000 | 10-02-2015 | [soda, root vegetables, semi-finished bread] |
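Because `Date` is stored as a DD-MM-YYYY string (dtype `object` in `df.info()`), the groups are ordered as text rather than chronologically: member 1000's baskets run 15-03-2015, 24-06-2014, 24-07-2015. The baskets themselves are unaffected, but if order matters the column can be parsed first. A sketch on a hypothetical three-row frame with the same column names (`sample_df` and `sample_basket` are illustrative names, chosen so they do not overwrite `df` or `basket`):

```python
import pandas as pd

# Hypothetical sample rows with the same columns as test.csv (not real data)
sample_df = pd.DataFrame({
    "Member_number": [1000, 1000, 1000],
    "Date": ["15-03-2015", "24-06-2014", "15-03-2015"],
    "itemDescription": ["sausage", "pastry", "whole milk"],
})

# Parse the day-first strings so that the groups sort chronologically
sample_df["Date"] = pd.to_datetime(sample_df["Date"], format="%d-%m-%Y")

# Same grouping as the basket cell: one basket per member per date
sample_basket = (sample_df.groupby(["Member_number", "Date"])["itemDescription"]
                 .apply(list)
                 .reset_index(name="Products"))
print(sample_basket)
```

Applying the same `pd.to_datetime` call to `df` before building `basket` should give the same baskets, listed in date order.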


In [110]:
# Convert the products to a one-hot encoded matrix
basket_encoded = basket['Products'].str.join('|').str.get_dummies()
basket_encoded.tail()

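|   | Instant food products | UHT-milk | abrasive cleaner | artif. sweetener | baby cosmetics | bags | baking powder | bathroom cleaner | beef | berries | ... | turkey | vinegar | waffles | whipped/sour cream | whisky | white bread | white wine | whole milk | yogurt | zwieback |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 14958 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 |
| 14959 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 14960 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 14961 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 14962 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |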
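The `str.join('|').str.get_dummies()` pattern works because `get_dummies` splits each string on `'|'` (its default separator) and creates one 0/1 column per distinct item, with repeated items in a basket collapsing to a single 1. It also means an item name containing `'|'` would be split into two columns; none of the names shown here contain one, but it is worth checking on other data. A small sketch on hypothetical baskets:

```python
import pandas as pd

# Hypothetical baskets, for illustration only (note the repeated "yogurt")
products = pd.Series([["whole milk", "yogurt"], ["soda"], ["yogurt", "yogurt"]])

# Join each list into one '|'-separated string, then split it back into indicator columns
encoded = products.str.join("|").str.get_dummies()
print(encoded)

# Sanity check: no item name contains the separator
all_items = {item for basket in products for item in basket}
assert not any("|" in item for item in all_items)
```

mlxtend's `TransactionEncoder` (in `mlxtend.preprocessing`) is an alternative that builds a boolean matrix directly from the lists, without going through a separator.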


In [103]:
# Recent mlxtend versions recommend boolean input, so cast the 0/1 columns to bool
frequent_itemsets = apriori(basket_encoded.astype(bool), min_support=0.01, use_colnames=True)
frequent_itemsets



|   | support | itemsets |
|---|---|---|
| 0 | 0.021386 | (UHT-milk) |
| 1 | 0.033950 | (beef) |
| 2 | 0.021787 | (berries) |
| 3 | 0.016574 | (beverages) |
| 4 | 0.045312 | (bottled beer) |
| ... | ... | ... |
| 64 | 0.010559 | (rolls/buns, other vegetables) |
| 65 | 0.014837 | (other vegetables, whole milk) |
| 66 | 0.013968 | (rolls/buns, whole milk) |
| 67 | 0.011629 | (soda, whole milk) |
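mlxtend's `apriori` is based on a level-wise search: it keeps the single items whose support meets `min_support`, builds candidates one item larger only from itemsets that are already frequent, and stops when no candidate survives. The pruning is valid because any superset of an infrequent itemset is also infrequent. The following is a simplified pure-Python sketch of that idea on hypothetical baskets, not mlxtend's actual implementation; `apriori_sketch` is an illustrative name:

```python
from itertools import combinations

def apriori_sketch(baskets, min_support):
    """Return {frozenset(itemset): support} for every itemset with support >= min_support."""
    n = len(baskets)
    baskets = [frozenset(b) for b in baskets]

    def support(itemset):
        return sum(itemset <= b for b in baskets) / n

    # Level 1 candidates: every single item
    current = {frozenset([item]) for b in baskets for item in b}
    frequent = {}
    k = 1
    while current:
        # Keep only the candidates that meet the support threshold
        level = {s: support(s) for s in current}
        level = {s: sup for s, sup in level.items() if sup >= min_support}
        frequent.update(level)
        k += 1
        # Join frequent (k-1)-itemsets into k-item candidates...
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # ...and prune any candidate that has an infrequent (k-1)-subset
        current = {
            c for c in candidates
            if all(frozenset(sub) in level for sub in combinations(c, k - 1))
        }
    return frequent

# Hypothetical toy baskets, for illustration only
baskets = [
    {"whole milk", "yogurt"},
    {"whole milk", "yogurt", "rolls/buns"},
    {"whole milk", "rolls/buns"},
    {"soda"},
]
result = apriori_sketch(baskets, min_support=0.5)
for itemset, sup in sorted(result.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), sup)
```

Here the three-item candidate {whole milk, yogurt, rolls/buns} is discarded without being counted, because its subset {yogurt, rolls/buns} is already below the threshold.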


In [104]:
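# Generate rules from the frequent itemsets; lift is never negative, so min_threshold=0.0 keeps every rule
rules = association_rules(frequent_itemsets, metric='lift', min_threshold=0.0)
rules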


|   | antecedents | consequents | antecedent support | consequent support | support | confidence | lift | leverage | conviction | zhangs_metric |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | (rolls/buns) | (other vegetables) | 0.110005 | 0.122101 | 0.010559 | 0.09599 | 0.786154 | -0.002872 | 0.971117 | -0.234091 |
| 1 | (other vegetables) | (rolls/buns) | 0.122101 | 0.110005 | 0.010559 | 0.086481 | 0.786154 | -0.002872 | 0.974249 | -0.236553 |
| 2 | (other vegetables) | (whole milk) | 0.122101 | 0.157923 | 0.014837 | 0.121511 | 0.76943 | -0.004446 | 0.958551 | -0.254477 |
| 3 | (whole milk) | (other vegetables) | 0.157923 | 0.122101 | 0.014837 | 0.093948 | 0.76943 | -0.004446 | 0.968928 | -0.262461 |
| 4 | (rolls/buns) | (whole milk) | 0.110005 | 0.157923 | 0.013968 | 0.126974 | 0.804028 | -0.003404 | 0.96455 | -0.214986 |
| 5 | (whole milk) | (rolls/buns) | 0.157923 | 0.110005 | 0.013968 | 0.088447 | 0.804028 | -0.003404 | 0.97635 | -0.224474 |
| 6 | (soda) | (whole milk) | 0.097106 | 0.157923 | 0.011629 | 0.119752 | 0.758296 | -0.003707 | 0.956636 | -0.260917 |
| 7 | (whole milk) | (soda) | 0.157923 | 0.097106 | 0.011629 | 0.073635 | 0.758296 | -0.003707 | 0.974663 | -0.274587 |
| 8 | (yogurt) | (whole milk) | 0.085879 | 0.157923 | 0.011161 | 0.129961 | 0.82294 | -0.002401 | 0.967861 | -0.190525 |
| 9 | (whole milk) | (yogurt) | 0.157923 | 0.085879 | 0.011161 | 0.070673 | 0.82294 | -0.002401 | 0.983638 | -0.203508 |
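Every metric column is derived from the three support columns. As a check, this sketch recomputes row 8 (yogurt → whole milk) from the supports in the table, using the standard definitions of confidence, lift, leverage, and conviction. Small differences in the last digit come from the table's rounding:

```python
# Support values copied from row 8 of the rules table (yogurt -> whole milk)
antecedent_support = 0.085879  # share of baskets with yogurt
consequent_support = 0.157923  # share of baskets with whole milk
rule_support = 0.011161        # share of baskets with both

confidence = rule_support / antecedent_support
lift = confidence / consequent_support
leverage = rule_support - antecedent_support * consequent_support
conviction = (1 - consequent_support) / (1 - confidence)

print(f"confidence={confidence:.6f} lift={lift:.6f} "
      f"leverage={leverage:.6f} conviction={conviction:.6f}")
```

Because lift equals support(A ∪ C) / (support(A) · support(C)), it is symmetric, which is why rows 8 and 9 share the same lift while their confidence and conviction differ.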


In [108]:
max_scores = rules[['lift', 'leverage']].max()  # largest value in each column, taken independently
max_scores

lift        0.822940
leverage   -0.002401
dtype: float64
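`rules[['lift', 'leverage']].max()` returns the largest value of each column independently, so on its own it does not say which rule those values belong to (here both maxima come from the yogurt/whole milk pair). `idxmax` and `nlargest` return whole rows instead. A sketch on a hypothetical mini rules table whose values are copied from rows 0, 8 and 9 above (`sample_rules` is an illustrative name):

```python
import pandas as pd

# Hypothetical mini rules table with the same column names as `rules`
sample_rules = pd.DataFrame({
    "antecedents": [frozenset({"rolls/buns"}), frozenset({"yogurt"}), frozenset({"whole milk"})],
    "consequents": [frozenset({"other vegetables"}), frozenset({"whole milk"}), frozenset({"yogurt"})],
    "confidence": [0.09599, 0.129961, 0.070673],
    "lift": [0.786154, 0.82294, 0.82294],
})

# Row label of the first rule with the highest lift
best = sample_rules.loc[sample_rules["lift"].idxmax()]
print(set(best["antecedents"]), "->", set(best["consequents"]), best["lift"])

# Top rules by lift, keeping every column
print(sample_rules.nlargest(2, "lift"))
```

On the real table, `rules.loc[rules['lift'].idxmax()]` would return the first of rows 8 and 9, since they tie on lift.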

## Insights derived from this analysis

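Judged by confidence alone, the strongest rule is yogurt → whole milk (row 8), with a confidence of 0.129961: about 13% of baskets that contain yogurt also contain whole milk. The rolls/buns → other vegetables rule (row 0) has a confidence of only 0.09599, or about 9.6%, so it does not show that customers who buy rolls/buns are likely to buy other vegetables.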

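Confidence can be misleading because it ignores how common the consequent already is. Other vegetables appears in 12.2% of all baskets (consequent support 0.122101), which is higher than the 9.6% confidence of rolls/buns → other vegetables. Buying rolls/buns therefore does not raise the chance of buying other vegetables, and bundling the two is unlikely to have a business impact.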

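Lift and leverage are better suited here because they compare how often the items are bought together with how often they would be if they were bought independently: a lift above 1 (or a leverage above 0) indicates a positive association, and a lift below 1 (or a leverage below 0) a negative one.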
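Written in terms of the support columns of the rules table, for a rule A → C the two metrics are:

$$
\operatorname{lift}(A \to C) = \frac{\operatorname{conf}(A \to C)}{\operatorname{supp}(C)} = \frac{\operatorname{supp}(A \cup C)}{\operatorname{supp}(A)\,\operatorname{supp}(C)}
$$

$$
\operatorname{leverage}(A \to C) = \operatorname{supp}(A \cup C) - \operatorname{supp}(A)\,\operatorname{supp}(C)
$$

Both compare the observed joint support with the product supp(A)·supp(C), which is the joint support expected if A and C were bought independently: lift takes their ratio, leverage their difference.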

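In this dataset, the highest lift and leverage are 0.822940 and -0.002401, both from rows 8 and 9 (yogurt and whole milk). Every rule, however, has a lift below 1 and a negative leverage, so at `min_support=0.01` no pair is bought together more often than chance would predict; yogurt and whole milk is simply the least negatively associated pair.

The business decision this analysis supports is therefore a cautious one: promoting yogurt and whole milk together is the strongest candidate, but the data does not show that customers who buy yogurt are more likely than average to buy whole milk, so the promotion's effect on sales and profit should be tested rather than assumed.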
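To act only on rules where the items are bought together more often than chance, the rules table can be filtered on lift. On this run the filter would return an empty table, since every lift is below 1; lowering `min_support` in the `apriori` call surfaces rarer itemsets, which may or may not include positively associated pairs. A sketch on a hypothetical rules table (item names and lift values are made up for illustration):

```python
import pandas as pd

# Hypothetical rules; only the lift column matters for this filter
sample_rules = pd.DataFrame({
    "antecedents": [frozenset({"item A"}), frozenset({"item B"}), frozenset({"item C"})],
    "consequents": [frozenset({"item B"}), frozenset({"item A"}), frozenset({"item D"})],
    "lift": [0.82, 0.82, 1.35],
})

# Keep positively associated rules (lift > 1), strongest first
positive = sample_rules[sample_rules["lift"] > 1].sort_values("lift", ascending=False)
print(positive)
```

The same expression applied to `rules` gives the shortlist of rules worth testing as promotions.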