# Market Basket Analysis Using the Apriori Algorithm

## Objective: Understand association rules and their metrics by applying the Apriori algorithm to a dataset of transactions containing various products

## Market Basket Analysis
Market basket analysis is a process for analyzing customer buying habits by finding associations among the items that customers place in their shopping baskets. Discovering such associations helps retailers develop marketing strategies by revealing which items are frequently purchased together. For instance, market basket analysis may help managers optimize store layouts: if customers who purchase milk also tend to buy bread at the same time, then placing the milk close to or opposite the bread may help increase the sales of both items.

Market basket analysis works by identifying associations between products; this technique is called association rule mining.


## Association Rule (Affinity Analysis)

Association rules describe relationships between items that frequently occur together. A rule A → B states that when item A occurs in a transaction, item B is likely to occur as well.

### Applications of Association Rule

1. Classification
2. Cross-merchandising
3. Physical/logical placement of products within categories
4. Promotional programs
5. Recommendations
6. Clickstream Analysis

#### Several algorithms have been developed to implement association rule mining, and Apriori is one of them. In this article we study the theory behind the Apriori algorithm and then implement it in Python.

Apriori uses the following metrics to measure the strength of a rule:

1. Support - The popularity (frequency of occurrence) of an item, calculated as the number of transactions containing the item divided by the total number of transactions.
    Support(A) = (Transactions containing item A) / (Total transactions)
    For example, if item A appears in 75 of 7501 transactions, Support(A) = 75/7501 ≈ 1%.

2. Confidence - The likelihood that item B occurs given that item A occurs (a conditional probability).
    Confidence(A → B) = (Transactions containing both A and B) / (Transactions containing item A)

3. Lift - The factor by which the likelihood of item B changes when item A occurs.
    Lift(A → B) = Confidence(A → B) / Support(B)
    A lift of 1 means there is no association between products A and B. A lift greater than 1 means A and B are bought together more often than would be expected if they were independent. A lift less than 1 means they are bought together less often than expected.
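The three metrics can be checked by hand on a small example. The sketch below uses five made-up transactions (not taken from the dataset used later) and hypothetical helper names `support`, `confidence`, and `lift`; it only illustrates the formulas above.

```python
# Toy transactions, made up for illustration (not from store_data.csv)
transactions = [
    {"milk", "bread"},
    {"milk", "bread", "butter"},
    {"milk"},
    {"bread"},
    {"butter"},
]

def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)

def confidence(antecedent, consequent, transactions):
    """Support(A and B) / Support(A), an estimate of P(B | A)."""
    both = set(antecedent) | set(consequent)
    return support(both, transactions) / support(antecedent, transactions)

def lift(antecedent, consequent, transactions):
    """Confidence(A -> B) / Support(B)."""
    return (confidence(antecedent, consequent, transactions)
            / support(consequent, transactions))

print(support({"milk"}, transactions))               # 3 of 5 transactions
print(confidence({"milk"}, {"bread"}, transactions)) # 2 of the 3 milk transactions
print(lift({"milk"}, {"bread"}, transactions))       # slightly above 1
```

Here milk appears in 3 of 5 transactions and milk with bread in 2 of 5, so the confidence of milk → bread is 2/3 and the lift is (2/3) / (3/5) ≈ 1.11, a weak positive association.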

## Implementation in Python

In [None]:
#Install apyori package before importing
'''
conda install --yes apyori
OR
pip install apyori
'''

## Import packages

In [6]:
# Import necessary libraries

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from apyori import apriori

## Import dataset

In [10]:
# Read the dataset using pandas into a dataframe with name "store_data"
store_data = pd.read_csv("store_data.csv",header = None)

In [11]:
store_data.head()

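| | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 | 16 | 17 | 18 | 19 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | shrimp | almonds | avocado | vegetables mix | green grapes | whole weat flour | yams | cottage cheese | energy drink | tomato juice | low fat yogurt | green tea | honey | salad | mineral water | salmon | antioxydant juice | frozen smoothie | spinach | olive oil |
| 1 | burgers | meatballs | eggs | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 2 | chutney | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 3 | turkey | avocado | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 4 | mineral water | milk | energy bar | whole wheat rice | green tea | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |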


In [38]:
store_data.shape

(7501, 20)

7501 is the total number of transactions, each listing the items bought together.
20 is the number of columns, i.e. the largest number of items in a single transaction; shorter transactions leave the remaining columns empty.

## Data Preprocessing

The apyori library requires our dataset to be a list of lists: the whole dataset is one outer list, and each transaction is an inner list within it.
[
    [transaction1],
    [transaction2],
    .
    .
    [transaction7501]
]
    
   
Convert our pandas dataframe into a list of lists as follows:


In [12]:
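records = []
# Each row becomes one transaction. Note that str() turns empty cells (NaN)
# into the string 'nan', which then counts as an item in that transaction.
for i in range(store_data.shape[0]):
    records.append([str(store_data.values[i, j]) for j in range(store_data.shape[1])])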
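Because the conversion above applies `str()` to every cell, empty cells end up as the item `'nan'` in shorter transactions. As a hedged alternative, the sketch below builds the list of lists with the standard `csv` module and skips empty cells. The three rows are a small stand-in in the style of the preview above, not the full file, and rules mined from records built this way may differ from the results reported in this article.

```python
import csv
import io

# Three rows standing in for store_data.csv (one transaction per line)
sample_csv = "burgers,meatballs,eggs\nchutney,,\nturkey,avocado,\n"

records = []
for row in csv.reader(io.StringIO(sample_csv)):
    # Keep only non-empty cells so that padding never becomes an item
    items = [cell.strip() for cell in row if cell.strip()]
    if items:
        records.append(items)

print(records)
# [['burgers', 'meatballs', 'eggs'], ['chutney'], ['turkey', 'avocado']]
```

To read the real file, `io.StringIO(sample_csv)` would be replaced by an open file handle, for example `open("store_data.csv", newline="")`.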

## Apriori algorithm

Parameters of apriori:

1. **records** : the list of lists built above, passed as the first argument
2. **min_support** : minimum support (a fraction between 0 and 1); itemsets with lower support are discarded
3. **min_confidence** : minimum confidence; rules with lower confidence are filtered out
4. **min_lift** : minimum lift a rule must reach to be kept
5. **min_length** : intended as the minimum number of items in a rule. Note that apyori's documented length option is `max_length`, so `min_length` may be silently ignored.
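To make the effect of the thresholds concrete, here is a minimal sketch that applies the same three filters to rules between pairs of items. It is not apyori's implementation: the function name `pair_rules` and the transactions are made up for illustration, and it only considers itemsets of size two.

```python
from itertools import combinations

# Made-up transactions for illustration only
transactions = [
    {"pasta", "escalope"},
    {"pasta", "escalope", "milk"},
    {"milk", "bread"},
    {"milk"},
    {"bread", "milk"},
    {"escalope"},
]

def pair_rules(transactions, min_support, min_confidence, min_lift):
    """Return (antecedent, consequent, support, confidence, lift) for
    item pairs that pass all three thresholds."""
    n = len(transactions)
    items = sorted(set().union(*transactions))
    item_support = {i: sum(i in t for t in transactions) / n for i in items}
    rules = []
    for a, b in combinations(items, 2):
        pair_support = sum(a in t and b in t for t in transactions) / n
        if pair_support < min_support:
            continue  # the pair is not frequent enough
        # Each frequent pair yields two candidate rules, one per direction
        for base, add in ((a, b), (b, a)):
            conf = pair_support / item_support[base]
            lift_value = conf / item_support[add]
            if conf >= min_confidence and lift_value >= min_lift:
                rules.append((base, add, pair_support, conf, lift_value))
    return rules

rules = pair_rules(transactions, min_support=0.3, min_confidence=0.6, min_lift=1.2)
for base, add, s, c, lift_value in rules:
    print(f"{base} -> {add}: support={s:.2f}, confidence={c:.2f}, lift={lift_value:.2f}")
```

With these toy thresholds, the pair (bread, milk) passes the support filter, but only bread -> milk is kept: milk -> bread has a confidence of 0.5, below the 0.6 minimum. Raising or lowering any threshold changes which rules survive in the same way.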


In [27]:
association_rules = apriori(records, min_support = 0.0045, min_confidence = .2, min_lift = 3, min_length = 2)
# Convert above rules into a list of rules
association_results = list(association_rules)

In [21]:
# Total number of rules generated
print(len(association_results))

48


In [20]:
# First rule
print(association_results[0])

RelationRecord(items=frozenset({'light cream', 'chicken'}), support=0.004532728969470737, ordered_statistics=[OrderedStatistic(items_base=frozenset({'light cream'}), items_add=frozenset({'chicken'}), confidence=0.29059829059829057, lift=4.84395061728395)])
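The printed record shows named fields: `items` and `support` on the record, and `items_base` (the antecedent), `items_add` (the consequent), `confidence`, and `lift` on each ordered statistic. Reading the rule direction from `items_base` and `items_add` is safer than relying on the iteration order of the `items` frozenset, which is not defined. The sketch below assumes only the field names visible in that output; the two namedtuples are stand-ins defined here so the example runs without apyori, and `describe` is a hypothetical helper.

```python
from collections import namedtuple

# Stand-ins that mirror the field names visible in the printed record above;
# defined here only so the sketch runs without apyori installed.
RelationRecord = namedtuple("RelationRecord", ["items", "support", "ordered_statistics"])
OrderedStatistic = namedtuple(
    "OrderedStatistic", ["items_base", "items_add", "confidence", "lift"]
)

record = RelationRecord(
    items=frozenset({"light cream", "chicken"}),
    support=0.004532728969470737,
    ordered_statistics=[
        OrderedStatistic(
            items_base=frozenset({"light cream"}),
            items_add=frozenset({"chicken"}),
            confidence=0.29059829059829057,
            lift=4.84395061728395,
        )
    ],
)

def describe(record):
    """Format every rule in a record as 'antecedent -> consequent' plus metrics."""
    lines = []
    for stat in record.ordered_statistics:
        base = ", ".join(sorted(stat.items_base))
        add = ", ".join(sorted(stat.items_add))
        lines.append(
            f"{base} -> {add} | support={record.support:.4f} "
            f"confidence={stat.confidence:.4f} lift={stat.lift:.4f}"
        )
    return lines

for line in describe(record):
    print(line)
```

The same `describe` function should work on the records in `association_results`, since it touches only those attribute names.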


## Display the list of rules

In [37]:
for item in association_results:
    # item[0] is the frozenset of items in the rule. A frozenset has no defined
    # order, so the arrow below does not necessarily point from antecedent to consequent.
    pair = item[0]
    items = [x for x in pair]
    print("Rule :"+ str(items[0]) + "->" + str(items[1]) )
    print("Support : {}".format(item[1]))           # support of the itemset
    print("Confidence : {}".format(item[2][0][2]))  # confidence of the first ordered statistic
    print("Lift : {}".format(item[2][0][3]))        # lift of the first ordered statistic
    print("\n-------------------------------------------------\n")

Rule :light cream->chicken
Support : 0.004532728969470737
Confidence : 0.29059829059829057
Lift : 4.84395061728395

-------------------------------------------------

Rule :mushroom cream sauce->escalope
Support : 0.005732568990801226
Confidence : 0.3006993006993007
Lift : 3.790832696715049

-------------------------------------------------

Rule :escalope->pasta
Support : 0.005865884548726837
Confidence : 0.3728813559322034
Lift : 4.700811850163794

-------------------------------------------------

Rule :ground beef->herb & pepper
Support : 0.015997866951073192
Confidence : 0.3234501347708895
Lift : 3.2919938411349285

-------------------------------------------------

Rule :ground beef->tomato sauce
Support : 0.005332622317024397
Confidence : 0.3773584905660377
Lift : 3.840659481324083

-------------------------------------------------

Rule :olive oil->whole wheat pasta
Support : 0.007998933475536596
Confidence : 0.2714932126696833
Lift : 4.122410097642296

------------------------