Association rules analysis is a technique to uncover how items are associated to each other. There are three common ways to measure association.

Support: This says how popular an itemset is. It is the number of times an itemset appears in the database of transactions. In other words, it is the frequency of an itemset.
    
Confidence: This says how likely it is for item Y to be purchased when item X is purchased. It is expressed as {X -> Y}. This is measured by the proportion of transactions with item X, in which item Y also appears.
    
Lift:  It is the ratio of expected confidence to observed confidence. It is described as confidence of Y when item X was already known (x/y) to the confidence of Y when item X is unknown.

**support = occurrance of item / total no of transactions**

**confidence = support ( X Union Y) / support(X)**

**lift = support (X Union Y) / support(X) * support(Y)**
    


In [2]:
# External package needed to be installed for Apriori algorithm
!pip install apyori

Installing collected packages: apyori
Successfully installed apyori-1.1.2


In [None]:
# Import necessary packages
import pandas as pd
import numpy as np
from apyori import apriori

In [None]:
# Load the market basket dataset which is available in Google drive. The dataset does not have a separate header.
from google.colab import files
uploaded=files.upload()

Saving Market_Basket_Optimisation.csv to Market_Basket_Optimisation (1).csv


In [None]:
import io
df=pd.read_csv(io.BytesIO(uploaded['Market_Basket_Optimisation (1).csv']))
print(df)

                 a1                 a2           a3                a4  \
0            shrimp            almonds      avocado    vegetables mix   
1           burgers          meatballs         eggs               NaN   
2           chutney                NaN          NaN               NaN   
3            turkey            avocado          NaN               NaN   
4     mineral water               milk   energy bar  whole wheat rice   
...             ...                ...          ...               ...   
7496         butter         light mayo  fresh bread               NaN   
7497        burgers  frozen vegetables         eggs      french fries   
7498        chicken                NaN          NaN               NaN   
7499       escalope          green tea          NaN               NaN   
7500           eggs    frozen smoothie  yogurt cake    low fat yogurt   

                a5                a6    a7              a8            a9  \
0     green grapes  whole weat flour  yams  cot

In [None]:
# Find the total number of transactions
rc=df.shape[0]
print(rc)

7501


In [None]:
# Find the maximum number of items found in a transaction
max(df.count(axis='columns'))

20

In [None]:
# Display the first 5 rows of the dataset.
df.head(5)

Unnamed: 0,a1,a2,a3,a4,a5,a6,a7,a8,a9,a10,a12,a13,a14,a15,a16,a17,a18,a19,a20,a21
0,shrimp,almonds,avocado,vegetables mix,green grapes,whole weat flour,yams,cottage cheese,energy drink,tomato juice,low fat yogurt,green tea,honey,salad,mineral water,salmon,antioxydant juice,frozen smoothie,spinach,olive oil
1,burgers,meatballs,eggs,,,,,,,,,,,,,,,,,
2,chutney,,,,,,,,,,,,,,,,,,,
3,turkey,avocado,,,,,,,,,,,,,,,,,,
4,mineral water,milk,energy bar,whole wheat rice,green tea,,,,,,,,,,,,,,,


In [None]:
# Replace empty values with 0.
df=df.fillna(0)

In [None]:
# Verify the correctness of the above step.

In [None]:
# For using Aprori agorithm, every transaction must be converted to a list
# Write code to create a nested list called transaction which will contain all the transactions
# An illustration is shown below:
# transaction = [['apple','almonds'],['apple'],['banana','apple']]
df.shape
list1=[]
for i in range(0,7501):
  list1.append([str(df.values[i,j])for j in range (0,20)])
list1

[['shrimp',
  'almonds',
  'avocado',
  'vegetables mix',
  'green grapes',
  'whole weat flour',
  'yams',
  'cottage cheese',
  'energy drink',
  'tomato juice',
  'low fat yogurt',
  'green tea',
  'honey',
  'salad',
  'mineral water',
  'salmon',
  'antioxydant juice',
  'frozen smoothie',
  'spinach',
  'olive oil'],
 ['burgers',
  'meatballs',
  'eggs',
  '0',
  '0',
  '0',
  '0',
  '0',
  '0',
  '0',
  '0',
  '0',
  '0',
  '0',
  '0',
  '0',
  '0',
  '0',
  '0',
  '0'],
 ['chutney',
  '0',
  '0',
  '0',
  '0',
  '0',
  '0',
  '0',
  '0',
  '0',
  '0',
  '0',
  '0',
  '0',
  '0',
  '0',
  '0',
  '0',
  '0',
  '0'],
 ['turkey',
  'avocado',
  '0',
  '0',
  '0',
  '0',
  '0',
  '0',
  '0',
  '0',
  '0',
  '0',
  '0',
  '0',
  '0',
  '0',
  '0',
  '0',
  '0',
  '0'],
 ['mineral water',
  'milk',
  'energy bar',
  'whole wheat rice',
  'green tea',
  '0',
  '0',
  '0',
  '0',
  '0',
  '0',
  '0',
  '0',
  '0',
  '0',
  '0',
  '0',
  '0',
  '0',
  '0'],
 ['low fat yogurt',
  '0',
  '

In [None]:
# Verify the correctness of the above step

In [None]:
# Call apriori function by providing a support threshold of 0.003, a confidence threshold of 0.2 and a lift threshold of 3
# Store the rules generated in a variable called rules
rules=apriori(list1,support=0.003,confidence=0.2,lift=3)
rules=list(rules)
rules

[RelationRecord(items=frozenset({'0'}), support=0.9998666844420744, ordered_statistics=[OrderedStatistic(items_base=frozenset(), items_add=frozenset({'0'}), confidence=0.9998666844420744, lift=1.0)]),
 RelationRecord(items=frozenset({'chocolate'}), support=0.1638448206905746, ordered_statistics=[OrderedStatistic(items_base=frozenset(), items_add=frozenset({'chocolate'}), confidence=0.1638448206905746, lift=1.0)]),
 RelationRecord(items=frozenset({'eggs'}), support=0.17970937208372217, ordered_statistics=[OrderedStatistic(items_base=frozenset(), items_add=frozenset({'eggs'}), confidence=0.17970937208372217, lift=1.0)]),
 RelationRecord(items=frozenset({'french fries'}), support=0.1709105452606319, ordered_statistics=[OrderedStatistic(items_base=frozenset(), items_add=frozenset({'french fries'}), confidence=0.1709105452606319, lift=1.0)]),
 RelationRecord(items=frozenset({'green tea'}), support=0.13211571790427942, ordered_statistics=[OrderedStatistic(items_base=frozenset(), items_add=fr

In [None]:
# Report the data type of the rules variable

In [None]:
# Convert the set of rules to a list

In [None]:
# Convert the results to a pandas dataframe for further operation

In [None]:
# Display the first 5 rows of the dataframe created in the above step

In [None]:
# Keep support in a separate dataframe so we can use later

In [None]:
# Prepare a dataframe of transactions with support, confidence and lift values