# Market Basket Analysis

- Market basket analysis is a data mining technique used by retailers to increase sales by better understanding customer purchasing patterns. 
- It involves analyzing large data sets, such as purchase history, to reveal product groupings, as well as products that are likely to be purchased together.

In [1]:
import pandas as pd
import numpy as np
from apyori import apriori

In [3]:
st_df=pd.read_csv(r"Market_Basket_Optimisation.csv",header=None)
st_df

Unnamed: 0,0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19
0,shrimp,almonds,avocado,vegetables mix,green grapes,whole weat flour,yams,cottage cheese,energy drink,tomato juice,low fat yogurt,green tea,honey,salad,mineral water,salmon,antioxydant juice,frozen smoothie,spinach,olive oil
1,burgers,meatballs,eggs,,,,,,,,,,,,,,,,,
2,chutney,,,,,,,,,,,,,,,,,,,
3,turkey,avocado,,,,,,,,,,,,,,,,,,
4,mineral water,milk,energy bar,whole wheat rice,green tea,,,,,,,,,,,,,,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
7496,butter,light mayo,fresh bread,,,,,,,,,,,,,,,,,
7497,burgers,frozen vegetables,eggs,french fries,magazines,green tea,,,,,,,,,,,,,,
7498,chicken,,,,,,,,,,,,,,,,,,,
7499,escalope,green tea,,,,,,,,,,,,,,,,,,


In [4]:
st_df.shape

(7501, 20)

In [5]:
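# Convert the DataFrame into a list of transactions (one list of strings per row).
# Note: the loop starts at row 1, so the first transaction (row 0) is left out,
# and str() turns empty cells (NaN) into the literal string 'nan'.
l=[]
for i in range(1,st_df.shape[0]):
    l.append([str(st_df.values[i,j]) for j in range(0,st_df.shape[1])])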
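
Converting every cell with `str()` turns empty cells into the string `'nan'`, which then shows up as an item in some itemsets. The following is a minimal sketch of one way to avoid that, assuming it is acceptable to read the file with the standard-library `csv` module instead of pandas. The helper name `load_transactions` and the inline sample rows are illustrative and not part of the original notebook.

```python
import csv
import io

def load_transactions(lines):
    """Return one list of item names per row, skipping empty cells."""
    transactions = []
    for row in csv.reader(lines):
        items = [cell.strip() for cell in row if cell.strip()]
        if items:  # ignore completely blank rows
            transactions.append(items)
    return transactions

# Illustrative stand-in for a few rows of the CSV file (padded with empty cells).
sample = io.StringIO(
    "burgers,meatballs,eggs,,\n"
    "chutney,,,,\n"
    "turkey,avocado,,,\n"
)
transactions = load_transactions(sample)
print(transactions)
```

With the real file, passing `open("Market_Basket_Optimisation.csv", newline="")` to the same helper would keep every row, including the first one, and no placeholder item would reach the algorithm.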


> Apriori Algorithm:

The Apriori algorithm is a well-known association rule algorithm and a popular choice for market basket analysis. It is generally regarded as accurate and as an improvement over the earlier AIS and SETM algorithms. It finds frequent itemsets in transactions and identifies association rules between the items. Its main limitation is frequent itemset generation: the algorithm must scan the database many times, which is computationally costly and slows it down on a large database. It relies on the concepts of support and confidence.
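
To make the level-wise idea concrete, here is a minimal pure-Python sketch of frequent itemset generation. It is not the `apyori` implementation; the function name `frequent_itemsets` and the toy baskets are assumptions chosen for illustration. Note how every candidate triggers another pass over the transactions, which is the cost described above.

```python
from itertools import combinations

def frequent_itemsets(transactions, min_support):
    """Level-wise search: size-k candidates come from frequent (k-1)-itemsets."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)

    def support(itemset):
        # One pass over all transactions per candidate itemset.
        return sum(itemset <= t for t in transactions) / n

    # Level 1: frequent single items.
    items = {item for t in transactions for item in t}
    current = {frozenset([i]) for i in items}
    current = {c for c in current if support(c) >= min_support}

    frequent = {}
    k = 1
    while current:
        for c in current:
            frequent[c] = support(c)
        k += 1
        # Join step: union pairs of frequent (k-1)-itemsets into k-itemsets.
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(s) in current for s in combinations(c, k - 1))
        }
        current = {c for c in candidates if support(c) >= min_support}
    return frequent

# Toy baskets, made up for illustration.
baskets = [
    ["bread", "milk"],
    ["bread", "diapers", "beer", "eggs"],
    ["milk", "diapers", "beer", "cola"],
    ["bread", "milk", "diapers", "beer"],
    ["bread", "milk", "diapers", "cola"],
]
result = frequent_itemsets(baskets, min_support=0.6)
for itemset, sup in sorted(result.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), sup)
```

The prune step is what lets the search skip most of the possible combinations: an itemset can only be frequent if all of its subsets are.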

In [6]:
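# Apply the Apriori algorithm: keep rules with support >= 0.45%,
# confidence >= 20% and lift >= 3.
# Note: apyori documents max_length but does not appear to use min_length,
# so min_length=2 likely has no effect here.
association_rules = apriori(l, min_support=0.0045, min_confidence=0.2, min_lift=3, min_length=2)
association_results = list(association_rules)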

Association rules are normally written like this: {Diapers} -> {Beer}, which means that customers who purchased diapers also tended to purchase beer in the same transaction.

In the above example, {Diapers} is the antecedent and {Beer} is the consequent. Both antecedents and consequents can have multiple items. In other words, {Diapers, Gum} -> {Beer, Chips} is a valid rule.

Support is the relative frequency with which all the items in a rule appear together, that is, the fraction of transactions that contain them. In many instances, you may want to look for high support to make sure the relationship is useful. However, a low support can still be useful if you are trying to find “hidden” relationships.

Confidence is a measure of the reliability of the rule. A confidence of 0.5 for {Diapers, Gum} -> {Beer, Chips} would mean that in 50% of the transactions where diapers and gum were purchased, the purchase also included beer and chips. For product recommendation, a 50% confidence may be perfectly acceptable, but in a medical situation this level may not be high enough.

Lift is the ratio of the observed support to the support expected if the antecedent and the consequent were independent (see Wikipedia). As a rule of thumb, a lift value close to 1 suggests that the two are independent. Lift values > 1 are generally more “interesting” and could be indicative of a useful rule pattern.
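
The three measures can be computed directly from counts. The sketch below does this for a single rule on a small made-up set of baskets; the function name `rule_metrics` is an assumption for illustration, and it assumes the antecedent and the consequent each occur in at least one transaction.

```python
def rule_metrics(transactions, antecedent, consequent):
    """Return (support, confidence, lift) for the rule antecedent -> consequent."""
    antecedent, consequent = frozenset(antecedent), frozenset(consequent)
    sets = [frozenset(t) for t in transactions]
    n = len(sets)

    count_a = sum(antecedent <= t for t in sets)                    # antecedent present
    count_c = sum(consequent <= t for t in sets)                    # consequent present
    count_both = sum((antecedent | consequent) <= t for t in sets)  # both present

    support = count_both / n
    confidence = count_both / count_a
    # Observed joint frequency divided by the frequency expected under independence.
    lift = (count_both * n) / (count_a * count_c)
    return support, confidence, lift

# Toy baskets, made up for illustration.
baskets = [
    ["bread", "milk"],
    ["bread", "diapers", "beer", "eggs"],
    ["milk", "diapers", "beer", "cola"],
    ["bread", "milk", "diapers", "beer"],
    ["bread", "milk", "diapers", "cola"],
]
support, confidence, lift = rule_metrics(baskets, {"diapers"}, {"beer"})
print(support, confidence, lift)
```

Here diapers appear in 4 of 5 baskets, beer in 3, and both together in 3, so the rule {diapers} -> {beer} has support 3/5, confidence 3/4, and a lift above 1.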

### View the association rules

In [7]:
for result in association_results:
    print(result[0])

frozenset({'chicken', 'light cream'})
frozenset({'mushroom cream sauce', 'escalope'})
frozenset({'pasta', 'escalope'})
frozenset({'herb & pepper', 'ground beef'})
frozenset({'ground beef', 'tomato sauce'})
frozenset({'olive oil', 'whole wheat pasta'})
frozenset({'pasta', 'shrimp'})
frozenset({'nan', 'chicken', 'light cream'})
frozenset({'frozen vegetables', 'shrimp', 'chocolate'})
frozenset({'spaghetti', 'cooking oil', 'ground beef'})
frozenset({'mushroom cream sauce', 'nan', 'escalope'})
frozenset({'nan', 'pasta', 'escalope'})
frozenset({'spaghetti', 'frozen vegetables', 'ground beef'})
frozenset({'frozen vegetables', 'olive oil', 'milk'})
frozenset({'frozen vegetables', 'shrimp', 'mineral water'})
frozenset({'spaghetti', 'frozen vegetables', 'olive oil'})
frozenset({'spaghetti', 'frozen vegetables', 'shrimp'})
frozenset({'spaghetti', 'frozen vegetables', 'tomatoes'})
frozenset({'spaghetti', 'grated cheese', 'ground beef'})
frozenset({'herb & pepper', 'ground beef', 'mineral water'})


In [8]:
Rule=[]
Support=[]
Confidence=[]
lift=[]

for item in association_results:
    # item[0] is the frozenset of items in the itemset.
    # Caveat: a frozenset is unordered, so items[0] -> items[1] is not
    # guaranteed to match the antecedent -> consequent direction, and only
    # the first two items of a larger itemset are shown.
    pair = item[0]
    items = [x for x in pair]
    print("Rule: " + items[0] + " -> " + items[1])
    Rule_str=str(items[0]) + " -> " + str(items[1])
    Rule.append(Rule_str)
    
    # item[1] is the support of the itemset
    print("Support: " + str(item[1]))
    Support_str=str(item[1])
    Support.append(Support_str)
    
    # item[2][0] is the first ordered statistic of the itemset;
    # its third field is the confidence and its fourth field is the lift
    print("Confidence: " + str(item[2][0][2]))
    Confidence_str=str(item[2][0][2])
    Confidence.append(Confidence_str)
    
    print("Lift: " + str(item[2][0][3]))
    Lift_str=str(item[2][0][3])
    lift.append(Lift_str)
    print("-----------------------------------------------------")

Rule: chicken -> light cream
Support: 0.004533333333333334
Confidence: 0.2905982905982906
Lift: 4.843304843304844
-----------------------------------------------------
Rule: mushroom cream sauce -> escalope
Support: 0.005733333333333333
Confidence: 0.30069930069930073
Lift: 3.7903273197390845
-----------------------------------------------------
Rule: pasta -> escalope
Support: 0.005866666666666667
Confidence: 0.37288135593220345
Lift: 4.700185158809287
-----------------------------------------------------
Rule: herb & pepper -> ground beef
Support: 0.016
Confidence: 0.3234501347708895
Lift: 3.2915549671393096
-----------------------------------------------------
Rule: ground beef -> tomato sauce
Support: 0.005333333333333333
Confidence: 0.37735849056603776
Lift: 3.840147461662528
-----------------------------------------------------
Rule: olive oil -> whole wheat pasta
Support: 0.008
Confidence: 0.2714932126696833
Lift: 4.130221288078346
-----------------------------------------------
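
Because the rule text above is built from an unordered `frozenset`, the printed direction may not match the one the confidence and lift refer to. As a hedged sketch, the direction can instead be read from the base and add items stored in each ordered statistic. The two namedtuples below are stand-ins that mirror the record layout `apyori` returns, so the example runs without the library; the helper name `rules_from_records` and the example values are made up for illustration.

```python
from collections import namedtuple

# Stand-ins mirroring apyori's record layout (assumed field names).
OrderedStatistic = namedtuple("OrderedStatistic", "items_base items_add confidence lift")
RelationRecord = namedtuple("RelationRecord", "items support ordered_statistics")

def rules_from_records(records):
    """Flatten records into one row per rule, with an explicit direction."""
    rows = []
    for record in records:
        for stat in record.ordered_statistics:
            antecedent = ", ".join(sorted(stat.items_base))
            consequent = ", ".join(sorted(stat.items_add))
            rows.append({
                "Rule": antecedent + " -> " + consequent,
                "Support": record.support,
                "Confidence": stat.confidence,
                "Lift": stat.lift,
            })
    return rows

# Hand-built record with made-up numbers, standing in for one apriori result.
example = RelationRecord(
    items=frozenset({"diapers", "beer"}),
    support=0.6,
    ordered_statistics=[
        OrderedStatistic(frozenset({"diapers"}), frozenset({"beer"}), 0.75, 1.25),
    ],
)
rows = rules_from_records([example])
print(rows[0]["Rule"])
```

Keeping the numeric values as floats rather than strings also makes it possible to sort or filter the resulting table by support, confidence, or lift.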

In [9]:
results_df=pd.DataFrame(columns=['Rule','Support','Confidence','lift'])

In [10]:
results_df['Rule']=Rule
results_df['Support']=Support
results_df['Confidence']=Confidence
results_df['lift']=lift

In [11]:
results_df['Rule']=results_df['Rule'].str.replace('nan ->','')

In [12]:
results_df['Rule']=results_df['Rule'].str.replace('-> nan','')

In [13]:
results_df

| | Rule | Support | Confidence | lift |
|---|---|---|---|---|
| 0 | chicken -> light cream | 0.0045333333333333 | 0.2905982905982906 | 4.843304843304844 |
| 1 | mushroom cream sauce -> escalope | 0.0057333333333333 | 0.3006993006993007 | 3.790327319739085 |
| 2 | pasta -> escalope | 0.0058666666666666 | 0.3728813559322034 | 4.700185158809287 |
| 3 | herb & pepper -> ground beef | 0.016 | 0.3234501347708895 | 3.2915549671393096 |
| 4 | ground beef -> tomato sauce | 0.0053333333333333 | 0.3773584905660377 | 3.840147461662528 |
| 5 | olive oil -> whole wheat pasta | 0.008 | 0.2714932126696833 | 4.130221288078346 |
| 6 | pasta -> shrimp | 0.0050666666666666 | 0.3220338983050848 | 4.514493901473151 |
| 7 | chicken | 0.0045333333333333 | 0.2905982905982906 | 4.843304843304844 |
| 8 | frozen vegetables -> shrimp | 0.0053333333333333 | 0.2325581395348837 | 3.260160834601174 |
| 9 | spaghetti -> cooking oil | 0.0048 | 0.5714285714285714 | 3.281557646029315 |
