<h1>Table of Contents<span class="tocSkip"></span></h1>
<div class="toc"><ul class="toc-item"><li><span><a href="#Performing-a-Market-Basket-Analysis-with-apyori" data-toc-modified-id="Performing-a-Market-Basket-Analysis-with-apyori-1"><span class="toc-item-num">1&nbsp;&nbsp;</span>Performing a Market Basket Analysis with apyori</a></span><ul class="toc-item"><li><span><a href="#Installing-the-Modules" data-toc-modified-id="Installing-the-Modules-1.1"><span class="toc-item-num">1.1&nbsp;&nbsp;</span>Installing the Modules</a></span></li><li><span><a href="#Importing-the-required-libraries" data-toc-modified-id="Importing-the-required-libraries-1.2"><span class="toc-item-num">1.2&nbsp;&nbsp;</span>Importing the required libraries</a></span></li><li><span><a href="#Preparing-the-Dataset" data-toc-modified-id="Preparing-the-Dataset-1.3"><span class="toc-item-num">1.3&nbsp;&nbsp;</span>Preparing the Dataset</a></span></li><li><span><a href="#Data-Proprocessing" data-toc-modified-id="Data-Proprocessing-1.4"><span class="toc-item-num">1.4&nbsp;&nbsp;</span>Data Proprocessing</a></span></li><li><span><a href="#Applying-Apriori" data-toc-modified-id="Applying-Apriori-1.5"><span class="toc-item-num">1.5&nbsp;&nbsp;</span>Applying Apriori</a></span></li><li><span><a href="#Viewing-the-Results" data-toc-modified-id="Viewing-the-Results-1.6"><span class="toc-item-num">1.6&nbsp;&nbsp;</span>Viewing the Results</a></span></li></ul></li></ul></div>


# Performing a Market Basket Analysis with apyori 


* [apyori 1.1.2 Documentation](https://pypi.org/project/apyori/)





## Installing the Modules

In [2]:
!pip install apyori

import apyori
from apyori import apriori

Collecting apyori
  Downloading apyori-1.1.2.tar.gz (8.6 kB)
Building wheels for collected packages: apyori
  Building wheel for apyori (setup.py) ... done
  Created wheel for apyori: filename=apyori-1.1.2-py3-none-any.whl size=5974 sha256=165dfd13dd5c334f984b3bcd9dcd3145c1508e4b6ec172a5271bcfb02e0287e0
  Stored in directory: /Users/surekha/Library/Caches/pip/wheels/cb/f6/e1/57973c631d27efd1a2f375bd6a83b2a616c4021f24aab84080
Successfully built apyori
Installing collected packages: apyori
Successfully installed apyori-1.1.2


## Importing the required libraries

In [4]:
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

from apyori import apriori

## Preparing the Dataset

In [2]:
#Now let's import the dataset and see what we're working with. 

store_data = pd.read_csv('store_data.csv')

store_data.head()

Unnamed: 0,shrimp,almonds,avocado,vegetables mix,green grapes,whole weat flour,yams,cottage cheese,energy drink,tomato juice,low fat yogurt,green tea,honey,salad,mineral water,salmon,antioxydant juice,frozen smoothie,spinach,olive oil
0,burgers,meatballs,eggs,,,,,,,,,,,,,,,,,
1,chutney,,,,,,,,,,,,,,,,,,,
2,turkey,avocado,,,,,,,,,,,,,,,,,,
3,mineral water,milk,energy bar,whole wheat rice,green tea,,,,,,,,,,,,,,,
4,low fat yogurt,,,,,,,,,,,,,,,,,,,


The output above shows the first five rows of the dataset as pandas parsed them.

If we look carefully at the data, we can see that what pandas treats as the header is actually the first transaction. Each row corresponds to one transaction, and each column holds one of the items purchased in that transaction.

A NaN means the transaction has no item in that position: the basket held fewer items than the widest row, which has 20.

The dataset has no header row, but by default the pd.read_csv function treats the first row as the header.

To fix this, pass the header=None option to pd.read_csv, as shown below:

In [3]:
store_data = pd.read_csv('store_data.csv', header=None)

store_data.head()

Unnamed: 0,0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19
0,shrimp,almonds,avocado,vegetables mix,green grapes,whole weat flour,yams,cottage cheese,energy drink,tomato juice,low fat yogurt,green tea,honey,salad,mineral water,salmon,antioxydant juice,frozen smoothie,spinach,olive oil
1,burgers,meatballs,eggs,,,,,,,,,,,,,,,,,
2,chutney,,,,,,,,,,,,,,,,,,,
3,turkey,avocado,,,,,,,,,,,,,,,,,,
4,mineral water,milk,energy bar,whole wheat rice,green tea,,,,,,,,,,,,,,,
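
To see the effect of header=None in isolation, here is a minimal sketch that parses a tiny in-memory sample instead of store_data.csv. The two sample rows are made up for illustration.

```python
import io

import pandas as pd

# Two made-up transactions with no header row (hypothetical sample data).
raw = "shrimp,almonds,avocado\nburgers,eggs,\n"

# Default behaviour: the first transaction is consumed as column names.
with_default = pd.read_csv(io.StringIO(raw))

# header=None: every line is kept as data and columns are numbered 0, 1, 2.
without_header = pd.read_csv(io.StringIO(raw), header=None)

print(with_default.shape)      # (1, 3)
print(without_header.shape)    # (2, 3)
print(list(with_default.columns))  # ['shrimp', 'almonds', 'avocado']
```

With the default setting one transaction silently disappears from the data, which is why the row count changes from 1 to 2 here.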


Now we will use the Apriori algorithm to find out which items are commonly sold together, so that the store owner can place related items together or advertise them together in order to increase profit.

In [4]:
store_data.shape

(7501, 20)

## Data Preprocessing

The apyori library we are going to use requires our dataset to be in the form of a list of lists, where the whole dataset is the outer list and each transaction is an inner list.

Currently we have data in the form of a pandas dataframe. 

To convert our pandas dataframe into a list of lists, execute the following script:

In [None]:
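# Convert every row to a list of strings.
# Note: empty cells (NaN) become the literal string 'nan'.
n_rows, n_cols = store_data.shape  # (7501, 20)
records = []
for i in range(n_rows):
    records.append([str(store_data.values[i, j]) for j in range(n_cols)])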

In [13]:
records[:3]

[['shrimp',
  'almonds',
  'avocado',
  'vegetables mix',
  'green grapes',
  'whole weat flour',
  'yams',
  'cottage cheese',
  'energy drink',
  'tomato juice',
  'low fat yogurt',
  'green tea',
  'honey',
  'salad',
  'mineral water',
  'salmon',
  'antioxydant juice',
  'frozen smoothie',
  'spinach',
  'olive oil'],
 ['burgers',
  'meatballs',
  'eggs',
  'nan',
  'nan',
  'nan',
  'nan',
  'nan',
  'nan',
  'nan',
  'nan',
  'nan',
  'nan',
  'nan',
  'nan',
  'nan',
  'nan',
  'nan',
  'nan',
  'nan'],
 ['chutney',
  'nan',
  'nan',
  'nan',
  'nan',
  'nan',
  'nan',
  'nan',
  'nan',
  'nan',
  'nan',
  'nan',
  'nan',
  'nan',
  'nan',
  'nan',
  'nan',
  'nan',
  'nan',
  'nan']]

In [14]:
len(records)

7501
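
As the output above shows, every empty cell ends up in the records as the string 'nan', so 'nan' is treated like any other item. One possible alternative, sketched here on a made-up two-row stand-in for store_data, is to skip missing cells during the conversion. The names sample and clean_records are hypothetical and are not used elsewhere in this notebook.

```python
import pandas as pd

# Hypothetical two-row stand-in for store_data
# (the real frame has 7501 rows and 20 columns).
sample = pd.DataFrame([
    ["burgers", "meatballs", "eggs"],
    ["chutney", None, None],
])

# Keep only the cells that actually hold an item.
clean_records = [
    [str(item) for item in row if pd.notna(item)]
    for row in sample.values
]

print(clean_records)  # [['burgers', 'meatballs', 'eggs'], ['chutney']]
```

Running apriori on records cleaned this way would likely return a somewhat different set of rules from the 48 shown later, because those were mined with the 'nan' strings present.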

## Applying Apriori

The next step is to apply the Apriori algorithm on the dataset. 
To do so, we can use the apriori function that we imported from the apyori library.

```Python
apriori(records, min_support=0.0045, min_confidence=0.2, min_lift=3, min_length=2)
```

The apriori function takes several parameters:

1. The first parameter is the list of lists that you want to extract rules from.
2. The second parameter is the min_support parameter. It keeps only the itemsets whose support is at least the specified value.
3. Next, the min_confidence parameter keeps only the rules whose confidence is at least the specified threshold.
4. Similarly, the min_lift parameter specifies the minimum lift value for the shortlisted rules.
5. Finally, the min_length parameter is meant to specify the minimum number of items that you want in your rules. Note that the apyori 1.1.2 documentation lists max_length but not min_length, so this keyword appears to be ignored by the library.

Let's suppose that we want rules only for items that are purchased at least 5 times a day, or 7 x 5 = 35 times in one week, since our dataset covers a one-week period.

* The support for those items can be calculated as 35/7500 ≈ 0.0047, which we round down to a threshold of 0.0045.
* The minimum confidence for the rules is 20% or 0.2. 


Similarly, we set min_lift to 3, and min_length to 2 since we want at least two products in our rules.

These values are chosen somewhat arbitrarily, so you can experiment with them and see how the rules you get back change.
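
The support threshold can be worked out directly from the assumptions above. This is just the arithmetic from the text, using the row count reported by store_data.shape; the variable names are illustrative.

```python
purchases_per_day = 5    # minimum daily purchases we care about
days = 7                 # the dataset covers one week
n_transactions = 7501    # number of rows in store_data

min_count = purchases_per_day * days
min_support = min_count / n_transactions

print(min_count)               # 35
print(round(min_support, 4))   # 0.0047
```

The notebook uses 0.0045, which is slightly more permissive than the computed value.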


In [15]:
association_rules = apriori(records, min_support=0.0045, min_confidence=0.2, min_lift=3, min_length=2)
association_results = list(association_rules)

## Viewing the Results

In [17]:
# Let's first find the total number of rules mined by the apriori function.

print(len(association_results))

48


There are 48 records. Each record describes one itemset together with the rule statistics derived from it.

In [18]:
# Let's print the first item in the association_results list to see the first rule.

print(association_results[0])

RelationRecord(items=frozenset({'chicken', 'light cream'}), support=0.004532728969470737, ordered_statistics=[OrderedStatistic(items_base=frozenset({'light cream'}), items_add=frozenset({'chicken'}), confidence=0.29059829059829057, lift=4.84395061728395)])


For instance, the first item shows that light cream and chicken are commonly bought together. One possible explanation is that people who purchase light cream are careful about what they eat, and so are more likely to buy white meat such as chicken than red meat such as beef.
Another is that light cream is commonly used in recipes for chicken.


* The **support value** for the first rule is 0.0045. 
This number is calculated by dividing the number of transactions containing both light cream and chicken by the total number of transactions. 


* The **confidence level** for the rule is 0.2906, which shows that out of all the transactions that contain light cream, 29.06% also contain chicken. 


* Finally, the **lift** of 4.84 tells us that chicken is 4.84 times more likely to be bought by the customers who buy light cream compared to the default likelihood of the sale of chicken.
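
To make the three definitions concrete, here is a small sketch that computes them by hand for the rule light cream -> chicken. The eight transactions are made up for illustration and are not taken from store_data.csv, so the numbers differ from the ones above.

```python
# Made-up transactions for illustration only (not from store_data.csv).
transactions = [
    {"light cream", "chicken"},
    {"light cream", "chicken", "eggs"},
    {"light cream", "milk"},
    {"chicken", "milk"},
    {"eggs"},
    {"milk", "eggs"},
    {"chicken"},
    {"milk"},
]


def support(itemset):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)


base, add = {"light cream"}, {"chicken"}

rule_support = support(base | add)         # both items together
confidence = rule_support / support(base)  # P(chicken | light cream)
lift = confidence / support(add)           # confidence vs. baseline P(chicken)

print(round(rule_support, 4))  # 0.25
print(round(confidence, 4))    # 0.6667
print(round(lift, 4))          # 1.3333
```

The same relationship can be run backwards on the first rule above: 0.2906 / 4.84 ≈ 0.06, so the printed values imply that chicken appears in roughly 6% of all transactions.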

The following script displays the rule, the support, the confidence, and the lift for each rule more clearly:

In [19]:
for item in association_results:

    # item[0] is the frozenset of all items in the rule.
    # A frozenset is unordered, so the arrow printed below does not
    # reliably show which item is the base and which is the addition.
    pair = item[0]
    items = [x for x in pair]
    print("Rule: " + items[0] + " -> " + items[1])

    # item[1] is the support of the itemset
    print("Support: " + str(item[1]))

    # item[2][0] is the first OrderedStatistic of the record;
    # its elements 2 and 3 are the confidence and the lift

    print("Confidence: " + str(item[2][0][2]))
    print("Lift: " + str(item[2][0][3]))
    print("=====================================")

Rule: chicken -> light cream
Support: 0.004532728969470737
Confidence: 0.29059829059829057
Lift: 4.84395061728395
Rule: mushroom cream sauce -> escalope
Support: 0.005732568990801226
Confidence: 0.3006993006993007
Lift: 3.790832696715049
Rule: escalope -> pasta
Support: 0.005865884548726837
Confidence: 0.3728813559322034
Lift: 4.700811850163794
Rule: herb & pepper -> ground beef
Support: 0.015997866951073192
Confidence: 0.3234501347708895
Lift: 3.2919938411349285
Rule: tomato sauce -> ground beef
Support: 0.005332622317024397
Confidence: 0.3773584905660377
Lift: 3.840659481324083
Rule: whole wheat pasta -> olive oil
Support: 0.007998933475536596
Confidence: 0.2714932126696833
Lift: 4.122410097642296
Rule: shrimp -> pasta
Support: 0.005065991201173177
Confidence: 0.3220338983050847
Lift: 4.506672147735896
Rule: chicken -> light cream
Support: 0.004532728969470737
Confidence: 0.29059829059829057
Lift: 4.84395061728395
Rule: chocolate -> frozen vegetables
Support: 0.005332622317024397
Con

    
    
Let's now discuss the second rule. 

The second rule states that mushroom cream sauce and escalope are frequently bought together. 

* The support for mushroom cream sauce and escalope together is 0.0057. 

* The confidence for this rule is 0.3007, which means that out of all the transactions containing mushroom cream sauce, 30.07% also contain escalope. 

* Finally, the lift of 3.79 shows that escalope is 3.79 times more likely to be bought by customers who buy mushroom cream sauce, compared to its default sale.
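
The printing loop earlier takes the two item names from an unordered frozenset, so the direction of the arrow in its output is not guaranteed. A possible alternative, sketched below, reads items_base and items_add from each ordered statistic instead. The function name format_rules is hypothetical, and the two namedtuples are stand-ins that mirror the field names in the RelationRecord printed earlier, so the sketch runs without apyori installed. Passing association_results to the function should work the same way, assuming the records expose those attribute names.

```python
from collections import namedtuple

# Stand-ins mirroring the fields of the RelationRecord printed earlier.
# These are NOT apyori's own classes; they only make the sketch self-contained.
OrderedStatistic = namedtuple(
    "OrderedStatistic", "items_base items_add confidence lift"
)
RelationRecord = namedtuple("RelationRecord", "items support ordered_statistics")


def format_rules(results):
    """Return one line per base -> add statistic, keeping the direction."""
    lines = []
    for record in results:
        for stat in record.ordered_statistics:
            base = ", ".join(sorted(stat.items_base))
            add = ", ".join(sorted(stat.items_add))
            lines.append(
                f"{base} -> {add} | support={record.support:.4f} "
                f"confidence={stat.confidence:.4f} lift={stat.lift:.2f}"
            )
    return lines


# The first record from the output above, rebuilt with the stand-in types.
example = [
    RelationRecord(
        items=frozenset({"chicken", "light cream"}),
        support=0.004532728969470737,
        ordered_statistics=[
            OrderedStatistic(
                items_base=frozenset({"light cream"}),
                items_add=frozenset({"chicken"}),
                confidence=0.29059829059829057,
                lift=4.84395061728395,
            )
        ],
    )
]

for line in format_rules(example):
    print(line)
# light cream -> chicken | support=0.0045 confidence=0.2906 lift=4.84
```

Iterating over every ordered statistic also means that records with more than one rule are reported in full, rather than only the first one.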