<a href="https://colab.research.google.com/github/OptimusJet/OptimusJet/blob/main/Market_Basket_Analysis.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Market Basket Analysis

#### Source of this notebook is the following: https://medium.com/analytics-vidhya/association-analysis-in-python-2b955d0180c

#### Market Basket Analysis, or Association Rules mining, is used here to discover associations among items sold in a small grocery store.
#### The data file is available here: https://gist.github.com/Harsh-Git-Hub/2979ec48043928ad9033d8469928e751

In [None]:
# If necessary, install the mlxtend package.
# In a Conda prompt, run:
#   conda install -c conda-forge mlxtend
# or, with pip:
#   pip install mlxtend

import pandas as pd
import numpy as np
from mlxtend.frequent_patterns import apriori, association_rules
import matplotlib.pyplot as plt

In [None]:
df1 = pd.read_csv('store_data.csv', sep=',')
df1.head()


Unnamed: 0,0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19
0,shrimp,almonds,avocado,vegetables mix,green grapes,whole weat flour,yams,cottage cheese,energy drink,tomato juice,low fat yogurt,green tea,honey,salad,mineral water,salmon,antioxydant juice,frozen smoothie,spinach,olive oil
1,burgers,meatballs,eggs,,,,,,,,,,,,,,,,,
2,chutney,,,,,,,,,,,,,,,,,,,
3,turkey,avocado,,,,,,,,,,,,,,,,,,
4,mineral water,milk,energy bar,whole wheat rice,green tea,,,,,,,,,,,,,,,


In [None]:
# Extract the unique values from the first column (column '0').
# Note: an item that never appears first in a transaction would be missed here.
items = (df1['0'].unique())
items

array(['shrimp', 'burgers', 'chutney', 'turkey', 'mineral water',
       'low fat yogurt', 'whole wheat pasta', 'soup', 'frozen vegetables',
       'french fries', 'eggs', 'cookies', 'spaghetti', 'meatballs',
       'red wine', 'rice', 'parmesan cheese', 'ground beef',
       'sparkling water', 'herb & pepper', 'pickles', 'energy bar',
       'fresh tuna', 'escalope', 'avocado', 'tomato sauce',
       'clothes accessories', 'energy drink', 'chocolate',
       'grated cheese', 'yogurt cake', 'mint', 'asparagus', 'champagne',
       'ham', 'muffins', 'french wine', 'chicken', 'pasta', 'tomatoes',
       'pancakes', 'frozen smoothie', 'carrots', 'yams', 'shallot',
       'butter', 'light mayo', 'pepper', 'candy bars', 'cooking oil',
       'milk', 'green tea', 'bug spray', 'oil', 'olive oil', 'salmon',
       'cake', 'almonds', 'salt', 'strong cheese', 'hot dogs', 'pet food',
       'whole wheat rice', 'antioxydant juice', 'honey', 'sandwich',
       'salad', 'magazines', 'protein bar', '

## One-hot encoding the data frame

One-hot encoding transforms categorical data into 0 and 1 values. The following code transforms the initial data frame, which contains categorical values (product names), into a data frame with one column per product, where 1 means the product is in the transaction and 0 means it is not.
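
To make the idea concrete, here is a minimal sketch in plain Python that one-hot encodes three of the transactions shown in the preview above. The variable names (`transactions`, `columns`, `encoded`) are illustrative only and are not used elsewhere in this notebook.

```python
# Three toy transactions taken from the first rows of the data preview.
transactions = [
    ["burgers", "meatballs", "eggs"],
    ["chutney"],
    ["turkey", "avocado"],
]

# Column order: every distinct item, sorted alphabetically.
columns = sorted({item for basket in transactions for item in basket})

# One row per transaction: 1 if the item is in the basket, otherwise 0.
encoded = [
    [1 if item in basket else 0 for item in columns]
    for basket in transactions
]

print(columns)
for row in encoded:
    print(row)
```

Each row has exactly as many 1s as the transaction has items, which is a quick sanity check on any encoding.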

In [None]:
# From df1, build a list of dicts, one per sales transaction,
# mapping every item to 1 (purchased) or 0 (not purchased)

encoded_vals = []
for index, row in df1.iterrows():
    labels = {}
    # Items absent from this transaction
    uncommons = list(set(items) - set(row))
    # Items present in this transaction
    commons = list(set(items).intersection(row))
    for uc in uncommons:
        labels[uc] = 0
    for com in commons:
        labels[com] = 1
    encoded_vals.append(labels)


# one-hot encoded data frame
ohe_df = pd.DataFrame(encoded_vals)

In [None]:
# After the one-hot encoding done in the previous cell,
# the data frame looks like this:

ohe_df.head(8)

Unnamed: 0,rice,grated cheese,tomatoes,energy bar,french fries,melons,magazines,mushroom cream sauce,whole wheat pasta,extra dark chocolate,...,whole weat flour,tomato juice,salmon,antioxydant juice,salad,honey,cottage cheese,vegetables mix,green grapes,green tea
0,0,0,0,0,0,0,0,0,0,0,...,1,1,1,1,1,1,1,1,1,1
1,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,0,0,0,1,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,1
5,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
6,0,0,0,0,1,0,0,0,1,0,...,0,0,0,0,0,0,0,0,0,0
7,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


In [None]:
freq_items = apriori(ohe_df, min_support=0.05, use_colnames=True)
freq_items.head(7)

Unnamed: 0,support,itemsets
0,0.052393,(grated cheese)
1,0.068391,(tomatoes)
2,0.170911,(french fries)
3,0.095321,(frozen vegetables)
4,0.095054,(pancakes)
5,0.087188,(burgers)
6,0.062525,(turkey)


In [None]:
rules = association_rules(freq_items, metric="confidence", min_threshold=0.20)
rules.sort_values('confidence', ascending=False).head(10)

Unnamed: 0,antecedents,consequents,antecedent support,consequent support,support,confidence,lift,leverage,conviction
5,(spaghetti),(mineral water),0.17411,0.238368,0.059725,0.343032,1.439085,0.018223,1.159314
3,(chocolate),(mineral water),0.163845,0.238368,0.05266,0.3214,1.348332,0.013604,1.122357
1,(eggs),(mineral water),0.179709,0.238368,0.050927,0.283383,1.188845,0.00809,1.062815
4,(mineral water),(spaghetti),0.238368,0.17411,0.059725,0.250559,1.439085,0.018223,1.102008
2,(mineral water),(chocolate),0.238368,0.163845,0.05266,0.220917,1.348332,0.013604,1.073256
0,(mineral water),(eggs),0.238368,0.179709,0.050927,0.213647,1.188845,0.00809,1.043158


# Full Name: Nam Do


# Interpret the results from the modified analysis: 


*   If a customer buys mineral water, the store could recommend spaghetti, chocolate, or eggs. Comparing these three products, *the best rule for cross-selling a product with mineral water is*: Mineral Water --> Spaghetti.


Explanation for the rule Mineral Water --> Spaghetti:


*   Support: this rule has the largest support of the three candidates, meaning mineral water and spaghetti are bought together more often than mineral water with chocolate or with eggs (0.060 > 0.053 > 0.051).
*   Confidence: this rule also has the highest confidence, that is, the highest estimated probability that spaghetti is in the basket given that mineral water is, compared with chocolate or eggs (25% > 22% > 21%).
*   Lift: the lift for Mineral Water --> Spaghetti is 1.44, which is greater than one and the highest of the three (1.44 > 1.35 > 1.19), so the rule predicts a purchase of spaghetti, given that the customer has already bought mineral water, better than random chance does.
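
The comparison above can be checked by hand. The following is a minimal sketch in plain Python that recomputes confidence, lift, leverage, and conviction from the support values printed in the rules table. The helper `rule_metrics` is hypothetical (it is not part of mlxtend), and the formulas are the standard textbook definitions, which are assumed here to match what `association_rules` reports.

```python
# Supports copied from the rules table above.
SUPPORT_MINERAL_WATER = 0.238368

# consequent -> (consequent support, joint support with mineral water)
candidates = {
    "spaghetti": (0.17411, 0.059725),
    "chocolate": (0.163845, 0.05266),
    "eggs": (0.179709, 0.050927),
}


def rule_metrics(antecedent_support, consequent_support, joint_support):
    """Hypothetical helper: standard metrics for a rule A --> C."""
    confidence = joint_support / antecedent_support
    return {
        "confidence": confidence,
        "lift": confidence / consequent_support,
        "leverage": joint_support - antecedent_support * consequent_support,
        "conviction": (1 - consequent_support) / (1 - confidence),
    }


metrics = {
    item: rule_metrics(SUPPORT_MINERAL_WATER, cons_support, joint)
    for item, (cons_support, joint) in candidates.items()
}

for item, m in metrics.items():
    summary = ", ".join(f"{name}={value:.3f}" for name, value in m.items())
    print(f"mineral water --> {item}: {summary}")

# Pick the consequent whose rule has the highest lift.
best = max(metrics, key=lambda item: metrics[item]["lift"])
print("Highest lift:", best)
```

The recomputed values should agree with the table up to rounding, since the inputs are themselves rounded to six decimals.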
















