*Market Basket Analysis using the Apriori method*

We first import the required libraries. The Apriori implementation used here comes from apyori, a third-party Python package, so it has to be installed and imported before the algorithm can be run.

In [4]:
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
from apyori import apriori

Next we read the dataset, which was downloaded from Kaggle. The file has no header row and its first row is already the first transaction, so we pass header = None to keep pandas from treating that row as column names.

In [7]:
dataset = pd.read_csv('Market_Basket_Optimisation.csv', header = None)
dataset.head()

,0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19
0,shrimp,almonds,avocado,vegetables mix,green grapes,whole weat flour,yams,cottage cheese,energy drink,tomato juice,low fat yogurt,green tea,honey,salad,mineral water,salmon,antioxydant juice,frozen smoothie,spinach,olive oil
1,burgers,meatballs,eggs,,,,,,,,,,,,,,,,,
2,chutney,,,,,,,,,,,,,,,,,,,
3,turkey,avocado,,,,,,,,,,,,,,,,,,
4,mineral water,milk,energy bar,whole wheat rice,green tea,,,,,,,,,,,,,,,


Once the dataset is loaded, we need the list of items in each transaction, so we use two nested loops: the outer loop runs over the transactions and the inner loop runs over the columns of each transaction. The resulting list of lists serves as the training set from which the association rules are generated.

In [10]:
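# Getting the list of transactions from the dataset (one list of item strings per row)
n_rows, n_cols = dataset.shape  # 7501 transactions and 20 columns in this file
transactions = []
for i in range(n_rows):
    # str() also turns empty (NaN) cells into the string 'nan'
    transactions.append([str(dataset.values[i, j]) for j in range(n_cols)])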
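Because str() is applied to every cell, the empty cells visible in dataset.head() end up in the transactions as the literal string 'nan'. If that is not wanted, one possible variation keeps only the non-missing cells. It is sketched here on a small made-up frame rather than the real file, and the names toy, with_nan, and without_nan are illustrative only. Filtering the real data this way may change the rules and counts reported in this article.

```python
import numpy as np
import pandas as pd

# Hypothetical two-row frame shaped like the CSV: short baskets are padded with NaN
toy = pd.DataFrame([
    ["burgers", "meatballs", "eggs"],
    ["chutney", np.nan, np.nan],
])

# Converting every cell with str() keeps the padding as the item 'nan'
with_nan = [[str(value) for value in row] for row in toy.values]

# Variation: keep only the cells that actually hold an item
without_nan = [[str(value) for value in row if pd.notna(value)] for row in toy.values]

print(with_nan[1])
print(without_nan[1])
```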

With the training set ready, we run the Apriori algorithm to learn association rules from it. Suppose we only care about products that are sold at least 3 times a day. Treating the 7501 transactions as one week of sales (which the calculation assumes), the minimum support is 3 sales per day multiplied by 7 days in a week, divided by the total number of transactions: (3*7)/7501 ≈ 0.0028, which we round to 0.003. We also ask for at least 30% confidence in each rule, so the minimum confidence is 0.3. The minimum lift is set to 3, and the minimum length to 2 because we want associations between at least two items. These hyperparameters can be tuned to suit the business requirements.
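To make the three thresholds concrete, here is a minimal sketch that computes support, confidence, and lift by hand using their standard definitions. The five toy baskets and the helper names support, confidence, and lift are made up for illustration; they are not part of apyori or of the Kaggle dataset.

```python
# Toy baskets (hypothetical, not taken from the Kaggle file)
toy_transactions = [
    {"pasta", "escalope", "mineral water"},
    {"pasta", "escalope"},
    {"pasta", "milk"},
    {"escalope", "milk"},
    {"mineral water", "milk"},
]

def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    return sum(itemset <= basket for basket in transactions) / len(transactions)

def confidence(base, add, transactions):
    """Of the baskets containing base, the share that also contain add."""
    both = set(base) | set(add)
    return support(both, transactions) / support(base, transactions)

def lift(base, add, transactions):
    """Confidence of base -> add relative to how common add is overall."""
    return confidence(base, add, transactions) / support(add, transactions)

# The minimum-support arithmetic from the text: 3 sales a day over 7 days
min_support = (3 * 7) / 7501
print(round(min_support, 4))

# Metrics for the toy rule pasta -> escalope
print(support({"pasta", "escalope"}, toy_transactions))
print(confidence({"pasta"}, {"escalope"}, toy_transactions))
print(lift({"pasta"}, {"escalope"}, toy_transactions))
```

A lift above 1 means the two items appear together more often than they would if they were bought independently, which is why a minimum lift of 3 keeps only fairly strong associations.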

In [20]:
# Training Apriori algorithm on the dataset
rule_list = apriori(transactions, min_support = 0.003, min_confidence = 0.3, min_lift = 3, min_length = 2)

The call above prepares the association rules for the items in the retail dataset. Converting the result to a list, as in the code below, lets us inspect them.

In [21]:
# Displaying the list of rules
results = list(rule_list)
# Keeping only the first five rules for viewing
results = results[:5]
for rule in results:
    print('\n')
    print(rule)
    print('**********')



RelationRecord(items=frozenset({'mushroom cream sauce', 'escalope'}), support=0.005732568990801226, ordered_statistics=[OrderedStatistic(items_base=frozenset({'mushroom cream sauce'}), items_add=frozenset({'escalope'}), confidence=0.3006993006993007, lift=3.790832696715049)])
**********


RelationRecord(items=frozenset({'pasta', 'escalope'}), support=0.005865884548726837, ordered_statistics=[OrderedStatistic(items_base=frozenset({'pasta'}), items_add=frozenset({'escalope'}), confidence=0.3728813559322034, lift=4.700811850163794)])
**********


RelationRecord(items=frozenset({'ground beef', 'herb & pepper'}), support=0.015997866951073192, ordered_statistics=[OrderedStatistic(items_base=frozenset({'herb & pepper'}), items_add=frozenset({'ground beef'}), confidence=0.3234501347708895, lift=3.2919938411349285)])
**********


RelationRecord(items=frozenset({'tomato sauce', 'ground beef'}), support=0.005332622317024397, ordered_statistics=[OrderedStatistic(items_base=frozenset({'tomato sau

As the output shows, each rule is reported with its support, confidence, and lift. The first rule says that customers who buy mushroom cream sauce also buy escalope with a confidence of about 30% (0.3007). The next rule links pasta to escalope with a confidence of 37.29%. In total, 102 rules are generated in this experiment. The number of rules depends on the hyperparameter values; raising the minimum confidence, for example, narrows the list to the stronger rules.
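The raw RelationRecord output is hard to scan, so it can help to flatten each rule into one row. The sketch below assumes only the field names visible in the printed output (items, support, ordered_statistics, items_base, items_add, confidence, lift). To stay runnable without apyori it uses stand-in namedtuples filled with the first two records shown above; the helper flatten_rules and the row keys are hypothetical names chosen for illustration.

```python
from collections import namedtuple

# Stand-ins mirroring the fields in the printed output; with apyori installed,
# the real records from list(rule_list) could be passed to flatten_rules instead.
RelationRecord = namedtuple("RelationRecord", ["items", "support", "ordered_statistics"])
OrderedStatistic = namedtuple("OrderedStatistic", ["items_base", "items_add", "confidence", "lift"])

sample_results = [
    RelationRecord(
        items=frozenset({"mushroom cream sauce", "escalope"}),
        support=0.005732568990801226,
        ordered_statistics=[OrderedStatistic(
            frozenset({"mushroom cream sauce"}), frozenset({"escalope"}),
            0.3006993006993007, 3.790832696715049)],
    ),
    RelationRecord(
        items=frozenset({"pasta", "escalope"}),
        support=0.005865884548726837,
        ordered_statistics=[OrderedStatistic(
            frozenset({"pasta"}), frozenset({"escalope"}),
            0.3728813559322034, 4.700811850163794)],
    ),
]

def flatten_rules(records):
    """Return one dict per rule, sorted by lift from strongest to weakest."""
    rows = []
    for record in records:
        for stat in record.ordered_statistics:
            rows.append({
                "if_bought": ", ".join(sorted(stat.items_base)),
                "then_bought": ", ".join(sorted(stat.items_add)),
                "support": record.support,
                "confidence": stat.confidence,
                "lift": stat.lift,
            })
    return sorted(rows, key=lambda row: row["lift"], reverse=True)

rows = flatten_rules(sample_results)
for row in rows:
    print(f"{row['if_bought']} -> {row['then_bought']}: "
          f"confidence={row['confidence']:.2%}, lift={row['lift']:.2f}")
```

The resulting list of dicts can also be passed to pd.DataFrame(rows) for sorting and filtering in a table.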

So, this is one way to perform market basket analysis with association rule learning. In this experiment we used the Apriori algorithm; other algorithms such as Eclat and FP-Growth can be used for the same purpose.
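For comparison, the idea behind Eclat can be sketched in a few lines of plain Python. Instead of scanning baskets repeatedly, it stores for each item the set of transaction ids that contain it and finds frequent itemsets by intersecting those sets. This is a simplified illustration on made-up baskets, not a library implementation; the function name eclat and its min_support_count parameter are assumptions for this sketch, and it only finds frequent itemsets (it does not compute confidence or lift).

```python
def eclat(transactions, min_support_count):
    """Return {frozenset(itemset): count} for itemsets in at least min_support_count baskets."""
    # Vertical layout: item -> set of ids of the transactions containing it
    tidsets = {}
    for tid, basket in enumerate(transactions):
        for item in basket:
            tidsets.setdefault(item, set()).add(tid)

    frequent = {}

    def extend(prefix, candidates):
        # candidates: (item, tidset) pairs; each tidset is already restricted to prefix
        for i, (item, tids) in enumerate(candidates):
            if len(tids) < min_support_count:
                continue
            itemset = prefix + (item,)
            frequent[frozenset(itemset)] = len(tids)
            # Intersect with the remaining items to grow the itemset by one
            narrower = [(other, tids & other_tids)
                        for other, other_tids in candidates[i + 1:]]
            extend(itemset, narrower)

    extend((), sorted(tidsets.items()))
    return frequent

# Toy baskets (hypothetical)
baskets = [
    {"pasta", "escalope", "mineral water"},
    {"pasta", "escalope"},
    {"pasta", "milk"},
    {"escalope", "milk"},
    {"mineral water", "milk"},
]

frequent_itemsets = eclat(baskets, min_support_count=2)
for itemset, count in frequent_itemsets.items():
    print(sorted(itemset), count)
```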