Association rule analysis is a technique for uncovering how items are associated with each other. There are three common measures of association.

Support: This says how popular an itemset is. It is the proportion of transactions in the database that contain the itemset, i.e. the relative frequency of the itemset.
    
Confidence: This says how likely item Y is to be purchased when item X is purchased, written as the rule {X -> Y}. It is measured as the proportion of transactions containing item X in which item Y also appears.
    
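Lift: It is the ratio of observed confidence to expected confidence: the confidence of Y given that X is in the basket, divided by the support of Y (how often Y is bought regardless of X). A lift above 1 means X and Y appear together more often than they would if they were independent; a lift below 1 means less often.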

**support(X) = number of transactions containing X / total number of transactions**

**confidence = support ( X Union Y) / support(X)**

**lift = support (X Union Y) / (support(X) * support(Y))**
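
To make the three formulas concrete, here is a minimal sketch that evaluates them on a made-up list of five transactions. The data and the helper names (`support`, `confidence`, `lift`) are illustrative assumptions; they are not part of the market basket dataset or of any library.

```python
# Toy transactions, invented purely for illustration.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
    {"bread", "milk", "eggs"},
]

def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)

def confidence(x, y, transactions):
    """support(X Union Y) / support(X) for the rule {X -> Y}."""
    return support(set(x) | set(y), transactions) / support(x, transactions)

def lift(x, y, transactions):
    """confidence(X -> Y) / support(Y), i.e. support(X Union Y) / (support(X) * support(Y))."""
    return confidence(x, y, transactions) / support(y, transactions)

x, y = {"bread"}, {"milk"}
print("support   :", round(support(x | y, transactions), 4))
print("confidence:", round(confidence(x, y, transactions), 4))
print("lift      :", round(lift(x, y, transactions), 4))
```

In this toy data bread and milk appear together in 3 of 5 baskets, while each appears in 4 of 5, so the lift is slightly below 1: buying bread does not make milk more likely than it already is.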
    


In [None]:
# External package needed to be installed for Apriori algorithm
!pip install apyori

Collecting apyori
  Downloading apyori-1.1.2.tar.gz (8.6 kB)
  Preparing metadata (setup.py) ... done
Building wheels for collected packages: apyori
  Building wheel for apyori (setup.py) ... done
  Created wheel for apyori: filename=apyori-1.1.2-py3-none-any.whl size=5954 sha256=b91d9c342c0d723a9afdb9e61b15f51121a3d0b432e63daa951354c6f1dda677
  Stored in directory: /root/.cache/pip/wheels/7f/49/e3/42c73b19a264de37129fadaa0c52f26cf50e87de08fb9804af
Successfully built apyori
Installing collected packages: apyori
Successfully installed apyori-1.1.2


In [1]:
import pandas as pd
import numpy as np
from apyori import apriori

In [None]:
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


In [7]:
#loading market basket dataset

df = pd.read_csv('Market_Basket_Optimisation.csv', header=None)

In [8]:
df.head()

Unnamed: 0,0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19
0,shrimp,almonds,avocado,vegetables mix,green grapes,whole weat flour,yams,cottage cheese,energy drink,tomato juice,low fat yogurt,green tea,honey,salad,mineral water,salmon,antioxydant juice,frozen smoothie,spinach,olive oil
1,burgers,meatballs,eggs,,,,,,,,,,,,,,,,,
2,chutney,,,,,,,,,,,,,,,,,,,
3,turkey,avocado,,,,,,,,,,,,,,,,,,
4,mineral water,milk,energy bar,whole wheat rice,green tea,,,,,,,,,,,,,,,


In [9]:
# replace missing values (NaN) with 0 so they can be filtered out later
df.fillna(0,inplace=True)

In [10]:
df.head()

Unnamed: 0,0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19
0,shrimp,almonds,avocado,vegetables mix,green grapes,whole weat flour,yams,cottage cheese,energy drink,tomato juice,low fat yogurt,green tea,honey,salad,mineral water,salmon,antioxydant juice,frozen smoothie,spinach,olive oil
1,burgers,meatballs,eggs,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2,chutney,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
3,turkey,avocado,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
4,mineral water,milk,energy bar,whole wheat rice,green tea,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0


In [11]:
# apriori needs the data as a list of transactions, each a list of item names, e.g.
# transactions = [['apple', 'almonds'], ['apple'], ['banana', 'apple']]
transactions = []
for i in range(len(df)):
    # keep only real items; the '0' strings are the placeholders left by fillna(0)
    transactions.append([str(item) for item in df.values[i] if str(item) != '0'])
# Iterating over the whole row covers all 20 columns,
# which is the maximum number of items in a transaction in this file.

In [12]:
transactions[0]

['shrimp',
 'almonds',
 'avocado',
 'vegetables mix',
 'green grapes',
 'whole weat flour',
 'yams',
 'cottage cheese',
 'energy drink',
 'tomato juice',
 'low fat yogurt',
 'green tea',
 'honey',
 'salad',
 'mineral water',
 'salmon',
 'antioxydant juice',
 'frozen smoothie',
 'spinach',
 'olive oil']
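
The same list-of-lists structure can also be built without the `fillna(0)` placeholder step. The sketch below is one possible alternative, not the approach used in this notebook: it reads rows with the standard `csv` module and drops empty cells. The sample rows are copied from the `df.head()` output above; the trailing commas on the last row are an assumption added to show that empty cells are skipped.

```python
import csv
import io

# Stand-in for the CSV file: three rows taken from the head() output above.
# The trailing commas on the last row are added here to simulate empty cells.
sample = io.StringIO(
    "burgers,meatballs,eggs\n"
    "chutney\n"
    "turkey,avocado,,\n"
)

# Each CSV row becomes one transaction; empty cells are dropped.
transactions_from_csv = [
    [item.strip() for item in row if item.strip()]
    for row in csv.reader(sample)
]
print(transactions_from_csv)
```

With the real file, `sample` would be replaced by `open('Market_Basket_Optimisation.csv', newline='')`.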

In [13]:
# apriori takes minimum thresholds for support, confidence and lift.
# Note: min_length is not among the keyword arguments apyori reads (it reads max_length), so it appears to have no effect here.
rules = apriori(transactions, min_support=0.003, min_confidence=0.2, min_lift=3, min_length=2)
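
To show what the three thresholds filter, the sketch below enumerates rules between single items by brute force on a toy dataset. It is not how apyori works internally (Apriori prunes candidate itemsets level by level and handles larger itemsets); the function `mine_pair_rules` and the data are hypothetical and exist only for illustration.

```python
from itertools import permutations

def mine_pair_rules(transactions, min_support, min_confidence, min_lift):
    """Brute-force rules {x} -> {y} between single items (hypothetical helper)."""
    n = len(transactions)
    sets = [set(t) for t in transactions]
    items = sorted(set().union(*sets))

    def support(*wanted):
        # fraction of transactions containing all the wanted items
        return sum(1 for t in sets if set(wanted) <= t) / n

    rules = []
    for x, y in permutations(items, 2):
        s_xy = support(x, y)
        if s_xy < min_support:
            continue  # the pair is too rare to be of interest
        conf = s_xy / support(x)
        lift = conf / support(y)
        if conf >= min_confidence and lift >= min_lift:
            rules.append((x, y, s_xy, conf, lift))
    return rules

# Invented transactions for illustration only.
toy = [
    ["pasta", "escalope"],
    ["pasta", "escalope", "water"],
    ["water"],
    ["water", "milk"],
    ["milk"],
]

for x, y, s, c, l in mine_pair_rules(toy, min_support=0.3, min_confidence=0.5, min_lift=1.5):
    print(f"{x} -> {y}: support={s:.2f}, confidence={c:.2f}, lift={l:.2f}")
```

Raising `min_lift` keeps only rules whose items co-occur much more often than chance, which is why the notebook's `min_lift=3` yields a fairly short list of rules.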

In [14]:
# apriori returns a generator, so materialise the rules as a list
Results = list(rules)

In [15]:
# convert the results to a DataFrame for further processing
df_results = pd.DataFrame(Results)

In [16]:
# ordered_statistics is itself a list, so it needs to be unpacked into separate columns
df_results.head()

Unnamed: 0,items,support,ordered_statistics
0,"(light cream, chicken)",0.004533,"[((light cream), (chicken), 0.2905982905982905..."
1,"(escalope, mushroom cream sauce)",0.005733,"[((mushroom cream sauce), (escalope), 0.300699..."
2,"(escalope, pasta)",0.005866,"[((pasta), (escalope), 0.3728813559322034, 4.7..."
3,"(honey, fromage blanc)",0.003333,"[((fromage blanc), (honey), 0.2450980392156863..."
4,"(herb & pepper, ground beef)",0.015998,"[((herb & pepper), (ground beef), 0.3234501347..."


In [17]:
# the last rows contain larger itemsets; ordered_statistics still needs unpacking
df_results.tail()

Unnamed: 0,items,support,ordered_statistics
75,"(mineral water, spaghetti, olive oil, ground b...",0.003066,"[((olive oil, ground beef), (mineral water, sp..."
76,"(mineral water, spaghetti, pancakes, ground beef)",0.003066,"[((pancakes, ground beef), (mineral water, spa..."
77,"(mineral water, spaghetti, ground beef, tomatoes)",0.003066,"[((ground beef, tomatoes), (mineral water, spa..."
78,"(mineral water, spaghetti, milk, olive oil)",0.003333,"[((mineral water, milk, spaghetti), (olive oil..."
79,"(mineral water, milk, spaghetti, tomatoes)",0.003333,"[((milk, tomatoes), (mineral water, spaghetti)..."


In [18]:
# keep support as a separate Series so it can be joined back in later
support = df_results.support

In [23]:
df_results['ordered_statistics'][0]

[OrderedStatistic(items_base=frozenset({'light cream'}), items_add=frozenset({'chicken'}), confidence=0.29059829059829057, lift=4.84395061728395)]

In [None]:
# four empty lists that will hold lhs, rhs, confidence and lift respectively

first_values = []
second_values = []
third_values = []
fourth_value = []

# Loop over the rows and append the values one by one to the separate lists.
# Only the first OrderedStatistic of each row is used; lhs and rhs are frozensets, so they are converted to lists.
for i in range(df_results.shape[0]):
    single_list = df_results['ordered_statistics'][i][0]
    first_values.append(list(single_list[0]))
    second_values.append(list(single_list[1]))
    third_values.append(single_list[2])
    fourth_value.append(single_list[3])
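
The loop above keeps only the first `OrderedStatistic` of each row. As a sketch of an alternative, the function below emits one row per ordered statistic and joins the frozensets into readable strings. The two namedtuples are stand-ins that mirror the field names visible in the outputs above, defined here only so the example runs without apyori; `flatten` and the second record (items `a`, `b`, `c`) are invented for illustration.

```python
from collections import namedtuple

# Stand-ins mirroring the field names seen in the outputs above (assumption:
# real apyori results expose the same attribute names).
OrderedStatistic = namedtuple("OrderedStatistic", "items_base items_add confidence lift")
Record = namedtuple("Record", "items support ordered_statistics")

records = [
    # Values copied from the first row shown above.
    Record(
        items=frozenset({"light cream", "chicken"}),
        support=0.004533,
        ordered_statistics=[
            OrderedStatistic(frozenset({"light cream"}), frozenset({"chicken"}),
                             0.29059829059829057, 4.84395061728395),
        ],
    ),
    # Invented record with two statistics, to show one output row per rule.
    Record(
        items=frozenset({"a", "b", "c"}),
        support=0.01,
        ordered_statistics=[
            OrderedStatistic(frozenset({"a", "b"}), frozenset({"c"}), 0.5, 3.2),
            OrderedStatistic(frozenset({"c"}), frozenset({"a", "b"}), 0.4, 3.2),
        ],
    ),
]

def flatten(records):
    """Return one dict per rule, with lhs and rhs as comma-separated strings."""
    rows = []
    for record in records:
        for stat in record.ordered_statistics:
            rows.append({
                "lhs": ", ".join(sorted(stat.items_base)),
                "rhs": ", ".join(sorted(stat.items_add)),
                "support": record.support,
                "confidence": stat.confidence,
                "lift": stat.lift,
            })
    return rows

rows = flatten(records)
for row in rows:
    print(row)
```

If the real results expose the same attributes, `pd.DataFrame(flatten(Results))` should give a table with `lhs`, `rhs`, `support`, `confidence` and `lift` columns directly, without the padding and column-merging steps.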

In [None]:
# convert all four lists into DataFrames for further processing
lhs = pd.DataFrame(first_values)
rhs = pd.DataFrame(second_values)
confidence = pd.DataFrame(third_values, columns=['Confidence'])
lift = pd.DataFrame(fourth_value, columns=['lift'])

In [None]:
# concatenate everything side by side into a single DataFrame
df_final = pd.concat([lhs,rhs,support,confidence,lift], axis=1)

In [None]:
df_final.head()

Unnamed: 0,0,1,2,0.1,1.1,support,Confidence,lift
0,light cream,,,chicken,,0.004533,0.290598,4.843951
1,mushroom cream sauce,,,escalope,,0.005733,0.300699,3.790833
2,pasta,,,escalope,,0.005866,0.372881,4.700812
3,fromage blanc,,,honey,,0.003333,0.245098,5.164271
4,herb & pepper,,,ground beef,,0.015998,0.32345,3.291994


In [None]:
'''
 Some rules have only 1 item in the lhs and others have 2 or 3, so the unused columns hold None.
 Replace None with a single space ' ' so that, when the three lhs columns are combined into one,
 a single-item lhs is followed by blanks rather than by None.
 Example: coffee, None, None becomes coffee, ,
'''
df_final.fillna(value=' ', inplace=True)

In [None]:
# set column names: the first three columns are the lhs items, the next two are the rhs items
# (column 3 holds the second rhs item, which is empty for the rows shown below)
df_final.columns = ['lhs',1,2,'rhs',3,'support','confidence','lift']

In [None]:
# join the three lhs columns into one, since together they form the lhs itemset
df_final['lhs'] = df_final['lhs'] + ", " + df_final[1] + ", " + df_final[2]

In [None]:
# drop columns 1 and 2 now that they have been merged into the lhs column
df_final.drop(columns=[1,2],inplace=True)

In [None]:
# this is the final output; you can sort it by support, confidence or lift
df_final.head()

Unnamed: 0,lhs,rhs,3,support,confidence,lift
0,"light cream, ,",chicken,,0.004533,0.290598,4.843951
1,"mushroom cream sauce, ,",escalope,,0.005733,0.300699,3.790833
2,"pasta, ,",escalope,,0.005866,0.372881,4.700812
3,"fromage blanc, ,",honey,,0.003333,0.245098,5.164271
4,"herb & pepper, ,",ground beef,,0.015998,0.32345,3.291994
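
As a small illustration of the sorting mentioned above, the sketch below ranks the five rules shown in the output by lift, strongest first. The rows are copied by hand into plain dicts so the example runs on its own; the name `rules_table` is hypothetical.

```python
# The five rules shown above, copied by hand as plain dicts.
rules_table = [
    {"lhs": "light cream", "rhs": "chicken", "support": 0.004533, "confidence": 0.290598, "lift": 4.843951},
    {"lhs": "mushroom cream sauce", "rhs": "escalope", "support": 0.005733, "confidence": 0.300699, "lift": 3.790833},
    {"lhs": "pasta", "rhs": "escalope", "support": 0.005866, "confidence": 0.372881, "lift": 4.700812},
    {"lhs": "fromage blanc", "rhs": "honey", "support": 0.003333, "confidence": 0.245098, "lift": 5.164271},
    {"lhs": "herb & pepper", "rhs": "ground beef", "support": 0.015998, "confidence": 0.32345, "lift": 3.291994},
]

# Sort by lift, highest first.
by_lift = sorted(rules_table, key=lambda rule: rule["lift"], reverse=True)

for rule in by_lift:
    print(f'{rule["lhs"]} -> {rule["rhs"]}: lift={rule["lift"]:.2f}')
```

On the DataFrame itself the equivalent call is `df_final.sort_values('lift', ascending=False)`.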
