# Apriori Association Rules
The Apriori algorithm finds frequent itemsets and derives rules that express probabilistic relationships between the items in them. For example, a rule derived from frequent itemsets containing A, B, and C might state that if A and B are included in a transaction, then C is likely to be included as well.
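
The "likely" in such a rule is usually quantified with support and confidence. As a brief sketch using the standard definitions (the symbols $T$, $X$, and $Y$ are introduced here for illustration and do not appear in the original notebook), for a set of transactions $T$ and itemsets $X$ and $Y$:

$$
\mathrm{support}(X) = \frac{|\{t \in T : X \subseteq t\}|}{|T|},
\qquad
\mathrm{confidence}(X \Rightarrow Y) = \frac{\mathrm{support}(X \cup Y)}{\mathrm{support}(X)}
$$

For the rule in the example, $X = \{A, B\}$ and $Y = \{C\}$; a confidence of 0.8 would mean that 80% of the transactions containing both A and B also contain C.
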
## Grocery Store Data 

In [16]:
import pandas as pd
df=pd.read_csv('../input/supermarket/GroceryStoreDataSet.csv')
df.head()

Unnamed: 0,"MILK,BREAD,BISCUIT"
0,"BREAD,MILK,BISCUIT,CORNFLAKES"
1,"BREAD,TEA,BOURNVITA"
2,"JAM,MAGGI,BREAD,MILK"
3,"MAGGI,TEA,BISCUIT"
4,"BREAD,TEA,BOURNVITA"


In [13]:
print(df.shape)
df.columns

(19, 1)


Index(['MILK,BREAD,BISCUIT'], dtype='object')

The DataFrame has a single column named 'MILK,BREAD,BISCUIT', which looks like the first transaction in the file read as a header row. Let's rename this column to 'Products'.
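
A column name that looks like a transaction suggests the file has no header line, so the first basket may have been consumed as the header. The following is a minimal sketch of how `header=None` and `names` would keep that row as data. It assumes the file contains one quoted, comma-separated basket per line; the inline `csv_text` is a three-line stand-in for the real file, not its actual contents.

```python
import io
import pandas as pd

# Stand-in for the real file: three baskets, no header line (assumed layout).
csv_text = (
    '"MILK,BREAD,BISCUIT"\n'
    '"BREAD,MILK,BISCUIT,CORNFLAKES"\n'
    '"BREAD,TEA,BOURNVITA"\n'
)

# Default behaviour: the first line becomes the column name.
with_default = pd.read_csv(io.StringIO(csv_text))

# header=None keeps every line as data; names= supplies the column label.
with_names = pd.read_csv(io.StringIO(csv_text), header=None, names=['Products'])

print(with_default.shape, list(with_default.columns))
print(with_names.shape, list(with_names.columns))
```

Reading the file this way would change the row count and every support value computed later, so the outputs in the rest of this notebook correspond to the original `read_csv` call, not to this variant.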

In [17]:
df.rename(columns={'MILK,BREAD,BISCUIT':'Products'},inplace=True)

In [18]:
df

Unnamed: 0,Products
0,"BREAD,MILK,BISCUIT,CORNFLAKES"
1,"BREAD,TEA,BOURNVITA"
2,"JAM,MAGGI,BREAD,MILK"
3,"MAGGI,TEA,BISCUIT"
4,"BREAD,TEA,BOURNVITA"
5,"MAGGI,TEA,CORNFLAKES"
6,"MAGGI,BREAD,TEA,BISCUIT"
7,"JAM,MAGGI,BREAD,TEA"
8,"BREAD,MILK"
9,"COFFEE,COCK,BISCUIT,CORNFLAKES"


In [19]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 19 entries, 0 to 18
Data columns (total 1 columns):
 #   Column    Non-Null Count  Dtype 
---  ------    --------------  ----- 
 0   Products  19 non-null     object
dtypes: object(1)
memory usage: 280.0+ bytes


In [20]:
# Split each transaction string into a list of individual items
data=list(df['Products'].apply(lambda x:x.split(',')))
data

[['BREAD', 'MILK', 'BISCUIT', 'CORNFLAKES'],
 ['BREAD', 'TEA', 'BOURNVITA'],
 ['JAM', 'MAGGI', 'BREAD', 'MILK'],
 ['MAGGI', 'TEA', 'BISCUIT'],
 ['BREAD', 'TEA', 'BOURNVITA'],
 ['MAGGI', 'TEA', 'CORNFLAKES'],
 ['MAGGI', 'BREAD', 'TEA', 'BISCUIT'],
 ['JAM', 'MAGGI', 'BREAD', 'TEA'],
 ['BREAD', 'MILK'],
 ['COFFEE', 'COCK', 'BISCUIT', 'CORNFLAKES'],
 ['COFFEE', 'COCK', 'BISCUIT', 'CORNFLAKES'],
 ['COFFEE', 'SUGER', 'BOURNVITA'],
 ['BREAD', 'COFFEE', 'COCK'],
 ['BREAD', 'SUGER', 'BISCUIT'],
 ['COFFEE', 'SUGER', 'CORNFLAKES'],
 ['BREAD', 'SUGER', 'BOURNVITA'],
 ['BREAD', 'COFFEE', 'SUGER'],
 ['BREAD', 'COFFEE', 'SUGER'],
 ['TEA', 'MILK', 'COFFEE', 'CORNFLAKES']]

In [21]:
# Encode the transactions (a Python list of lists) as a boolean NumPy array
# using mlxtend's TransactionEncoder
from mlxtend.preprocessing import TransactionEncoder

In [28]:
# Fit the encoder, transform the transactions into a boolean array,
# and wrap it in a DataFrame with one column per item
te=TransactionEncoder()
te_data=te.fit(data).transform(data)
df=pd.DataFrame(te_data,columns=te.columns_)
df

Unnamed: 0,BISCUIT,BOURNVITA,BREAD,COCK,COFFEE,CORNFLAKES,JAM,MAGGI,MILK,SUGER,TEA
0,True,False,True,False,False,True,False,False,True,False,False
1,False,True,True,False,False,False,False,False,False,False,True
2,False,False,True,False,False,False,True,True,True,False,False
3,True,False,False,False,False,False,False,True,False,False,True
4,False,True,True,False,False,False,False,False,False,False,True
5,False,False,False,False,False,True,False,True,False,False,True
6,True,False,True,False,False,False,False,True,False,False,True
7,False,False,True,False,False,False,True,True,False,False,True
8,False,False,True,False,False,False,False,False,True,False,False
9,True,False,False,True,True,True,False,False,False,False,False
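
To show what this encoding step produces, here is a small pure-Python sketch of one-hot encoding a list of baskets. It is an illustration of the idea only, not mlxtend's implementation; the names `columns` and `encoded` are made up for the example, and only the first three transactions are used.

```python
transactions = [
    ['BREAD', 'MILK', 'BISCUIT', 'CORNFLAKES'],
    ['BREAD', 'TEA', 'BOURNVITA'],
    ['JAM', 'MAGGI', 'BREAD', 'MILK'],
]

# One column per distinct item, in alphabetical order.
columns = sorted({item for basket in transactions for item in basket})

# One row per transaction: True where the basket contains the item.
encoded = [[item in basket for item in columns] for basket in transactions]

print(columns)
for row in encoded:
    print(row)
```

Each row has the same length regardless of basket size, which is the layout `apriori` expects as input.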


In [29]:
from mlxtend.frequent_patterns import apriori

In [31]:
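# Support of an itemset = fraction of transactions that contain it.
# With 19 transactions the smallest non-zero support is 1/19 ≈ 0.053,
# so min_support=0.01 keeps every itemset that occurs at least once.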
df1=apriori(df, min_support=0.01,use_colnames=True)
df1

Unnamed: 0,support,itemsets
0,0.315789,(BISCUIT)
1,0.210526,(BOURNVITA)
2,0.631579,(BREAD)
3,0.157895,(COCK)
4,0.421053,(COFFEE)
...,...,...
78,0.052632,"(BREAD, MAGGI, TEA, BISCUIT)"
79,0.105263,"(CORNFLAKES, COCK, COFFEE, BISCUIT)"
80,0.052632,"(BREAD, JAM, MAGGI, MILK)"
81,0.052632,"(BREAD, JAM, MAGGI, TEA)"


In [33]:
df1.sort_values(by='support',ascending=False)

Unnamed: 0,support,itemsets
2,0.631579,(BREAD)
4,0.421053,(COFFEE)
10,0.368421,(TEA)
0,0.315789,(BISCUIT)
5,0.315789,(CORNFLAKES)
...,...,...
55,0.052632,"(CORNFLAKES, MILK, BISCUIT)"
57,0.052632,"(BREAD, SUGER, BOURNVITA)"
17,0.052632,"(SUGER, BISCUIT)"
36,0.052632,"(TEA, COFFEE)"


In [34]:
df1['length'] = df1['itemsets'].apply(len)
df1

Unnamed: 0,support,itemsets,length
0,0.315789,(BISCUIT),1
1,0.210526,(BOURNVITA),1
2,0.631579,(BREAD),1
3,0.157895,(COCK),1
4,0.421053,(COFFEE),1
...,...,...,...
78,0.052632,"(BREAD, MAGGI, TEA, BISCUIT)",4
79,0.105263,"(CORNFLAKES, COCK, COFFEE, BISCUIT)",4
80,0.052632,"(BREAD, JAM, MAGGI, MILK)",4
81,0.052632,"(BREAD, JAM, MAGGI, TEA)",4


In [35]:
df1[(df1['length']==2) & (df1['support']>=0.05)]

Unnamed: 0,support,itemsets,length
11,0.157895,"(BREAD, BISCUIT)",2
12,0.105263,"(COCK, BISCUIT)",2
13,0.105263,"(COFFEE, BISCUIT)",2
14,0.157895,"(CORNFLAKES, BISCUIT)",2
15,0.105263,"(MAGGI, BISCUIT)",2
16,0.052632,"(MILK, BISCUIT)",2
17,0.052632,"(SUGER, BISCUIT)",2
18,0.105263,"(TEA, BISCUIT)",2
19,0.157895,"(BREAD, BOURNVITA)",2
20,0.052632,"(COFFEE, BOURNVITA)",2


In [38]:
df1[(df1['length']==3) & (df1['support']>=0.1)]

Unnamed: 0,support,itemsets,length
52,0.105263,"(COCK, COFFEE, BISCUIT)",3
53,0.105263,"(CORNFLAKES, COCK, BISCUIT)",3
54,0.105263,"(CORNFLAKES, COFFEE, BISCUIT)",3
56,0.105263,"(MAGGI, TEA, BISCUIT)",3
58,0.105263,"(BREAD, TEA, BOURNVITA)",3
61,0.105263,"(BREAD, SUGER, COFFEE)",3
63,0.105263,"(BREAD, JAM, MAGGI)",3
67,0.105263,"(BREAD, MAGGI, TEA)",3
68,0.105263,"(CORNFLAKES, COFFEE, COCK)",3


Here we can observe that each of these three-item sets appears in 2 of the 19 transactions (support ≈ 0.105), which makes them among the most frequently co-purchased triples in this small dataset. Placing these items next to each other in the store may encourage customers to buy them together and increase sales, without getting in the way of what the customer came for.
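
Support alone says how often items appear together, not how strongly one implies the other. As a hedged sketch of the step from itemsets to rules, the code below computes confidence (support of both sides divided by support of the antecedent) and lift (confidence divided by support of the consequent) directly from the 19 transactions. The helper names `support` and `rule_metrics` and the three example rules are chosen for illustration; mlxtend also provides an `association_rules` function in `mlxtend.frequent_patterns`, which this sketch does not use.

```python
transactions = [
    ['BREAD', 'MILK', 'BISCUIT', 'CORNFLAKES'],
    ['BREAD', 'TEA', 'BOURNVITA'],
    ['JAM', 'MAGGI', 'BREAD', 'MILK'],
    ['MAGGI', 'TEA', 'BISCUIT'],
    ['BREAD', 'TEA', 'BOURNVITA'],
    ['MAGGI', 'TEA', 'CORNFLAKES'],
    ['MAGGI', 'BREAD', 'TEA', 'BISCUIT'],
    ['JAM', 'MAGGI', 'BREAD', 'TEA'],
    ['BREAD', 'MILK'],
    ['COFFEE', 'COCK', 'BISCUIT', 'CORNFLAKES'],
    ['COFFEE', 'COCK', 'BISCUIT', 'CORNFLAKES'],
    ['COFFEE', 'SUGER', 'BOURNVITA'],
    ['BREAD', 'COFFEE', 'COCK'],
    ['BREAD', 'SUGER', 'BISCUIT'],
    ['COFFEE', 'SUGER', 'CORNFLAKES'],
    ['BREAD', 'SUGER', 'BOURNVITA'],
    ['BREAD', 'COFFEE', 'SUGER'],
    ['BREAD', 'COFFEE', 'SUGER'],
    ['TEA', 'MILK', 'COFFEE', 'CORNFLAKES'],
]

baskets = [frozenset(t) for t in transactions]

def support(items):
    """Fraction of baskets that contain every item in `items`."""
    items = frozenset(items)
    return sum(items <= b for b in baskets) / len(baskets)

def rule_metrics(antecedent, consequent):
    """Return (support, confidence, lift) for the rule antecedent -> consequent."""
    both = support(set(antecedent) | set(consequent))
    confidence = both / support(antecedent)
    lift = confidence / support(consequent)
    return both, confidence, lift

# Hypothetical example rules picked from the frequent pairs.
examples = [({'COCK'}, {'COFFEE'}), ({'COFFEE'}, {'COCK'}), ({'MAGGI'}, {'TEA'})]
for antecedent, consequent in examples:
    sup, conf, lift = rule_metrics(antecedent, consequent)
    print(sorted(antecedent), '->', sorted(consequent),
          f'support={sup:.3f} confidence={conf:.3f} lift={lift:.3f}')
```

On this data, all three transactions containing COCK also contain COFFEE, so COCK -> COFFEE has confidence 1.0, while the reverse rule is much weaker because COFFEE appears in eight transactions. That asymmetry is the kind of detail support alone cannot show.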

Rohan RK                                           
r.rohanrajendra@gmail.com                                                
+91 7975870924                                              
Thank You! :)