# Association Rules
- Association rule analysis is widely used in the retail industry, where it is also known as Market Basket Analysis. It is also used for recommendations: the personalized suggestions in applications such as Spotify, Netflix, and YouTube can be given as examples.

- Association Rules are derived to understand which products go together.

- Once we get these types of association rules between various products, we can solve multiple business problem statements such as:

 1- Products to stock
 
 2- Promotion on various products
 
 3- Implementing strategies to arrange the products in store.

 4- Giving extra offers on products which are not getting sold.
 
 5- Building strategies to improve customer feedback

# Apriori Algorithm
- The Apriori algorithm, used in the first phase of association rule mining, is the most popular classical algorithm for finding frequent itemsets.
- It is a classical approach to finding frequent patterns and highly related products.
- The goal is to find combinations of products that are often bought together, which we call frequent itemsets. The technical term for this field is Frequent Itemset Mining.
 The importance of an association rule is measured with five metrics:

    1. Support: how frequently an itemset appears across all transactions.

    2. Confidence: how likely the consequent is to be in the cart, given that the cart already contains the antecedents.

    3. Lift: how strong the association rule is, measured as the confidence divided by the support of the consequent. A lift above 1 means the items occur together more often than expected if they were independent.

    4. Leverage: how much more often items A and B occur together than would be expected if they were independent.

    5. Conviction: helps to judge whether the rule appears only by chance. Values well above 1 indicate that the consequent depends strongly on the antecedent.
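
As a worked illustration of the five metrics, the sketch below computes them for one rule from three support values. The helper name `rule_metrics` is hypothetical, and the inputs are the supports of Jam, Bread, and {Jam, Bread} as reported in the Supermarket example later in this notebook.

```python
def rule_metrics(support_a, support_b, support_ab):
    """Return the five metrics for the rule A -> B from three support values."""
    confidence = support_ab / support_a
    lift = confidence / support_b
    leverage = support_ab - support_a * support_b
    # Conviction is treated as infinite when the rule never fails (confidence == 1)
    if confidence == 1:
        conviction = float("inf")
    else:
        conviction = (1 - support_b) / (1 - confidence)
    return {
        "support": support_ab,
        "confidence": confidence,
        "lift": lift,
        "leverage": leverage,
        "conviction": conviction,
    }


# Rule Jam -> Bread: support(Jam) = 0.3, support(Bread) = 0.5, support(Jam, Bread) = 0.25
metrics = rule_metrics(support_a=0.3, support_b=0.5, support_ab=0.25)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

The printed values should agree, up to rounding, with the Jam -> Bread row of the rules table in the Supermarket section.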
 

STEPS INVOLVED IN APRIORI ALGORITHM:

1. Compute the support value for each item:
        - The support is the number of transactions in which a specific product (or combination of products) occurs, usually expressed as a fraction of all transactions.
2. Decide the support threshold:
        - The choice of support threshold depends on domain knowledge and the dataset.
3. Select the one-item sets that meet the support threshold.
4. Select the two-item sets:
        - The next step is to do the same analysis, but now using pairs of products instead of individual products.
5. Repeat the same step for larger sets.
6. Generate association rules and calculate confidence.
7. Compute the lift ratio.
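
Steps 1 to 5 above can be sketched in plain Python as follows. This is a minimal illustration under simplifying assumptions, not the implementation used by mlxtend: the function name `frequent_itemsets` and the toy transactions are hypothetical, and rule generation (steps 6 and 7) is left out.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Level-wise search for frequent itemsets; returns {frozenset: support}."""
    n = len(transactions)
    baskets = [frozenset(t) for t in transactions]

    def support(itemset):
        # Fraction of baskets that contain every item of the itemset
        return sum(itemset <= basket for basket in baskets) / n

    def keep_frequent(candidates):
        scored = {c: support(c) for c in candidates}
        return {c: s for c, s in scored.items() if s >= min_support}

    # Steps 1-3: score single items and keep those above the threshold
    singles = {frozenset([item]) for basket in baskets for item in basket}
    current = keep_frequent(singles)
    result = dict(current)

    k = 2
    while current:
        # Steps 4-5: join frequent (k-1)-itemsets into k-item candidates
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # Drop candidates that contain an infrequent (k-1)-subset
        candidates = {
            c for c in candidates
            if all(frozenset(sub) in current for sub in combinations(c, k - 1))
        }
        current = keep_frequent(candidates)
        result.update(current)
        k += 1
    return result


# Hypothetical toy transactions, for illustration only
toy = [
    ["Bread", "Butter", "Jam"],
    ["Bread", "Butter"],
    ["Bread", "Jam"],
    ["Milk", "Coffee"],
    ["Milk", "Bread"],
]
found = frequent_itemsets(toy, min_support=0.4)
for itemset, s in sorted(found.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), s)
```

The pruning step relies on the fact that every subset of a frequent itemset must itself be frequent, which is what keeps the number of candidates manageable as the sets grow.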

In [2]:
#Install required library
!pip install mlxtend

Collecting mlxtend
  Downloading mlxtend-0.22.0-py2.py3-none-any.whl (1.4 MB)
Installing collected packages: mlxtend
Successfully installed mlxtend-0.22.0


In [4]:
#Import libraries
import pandas as pd
import numpy as np
from mlxtend.frequent_patterns import association_rules,apriori

In [8]:
df = pd.read_csv("C:\\Users\\Priyanshu Chauhan\\Downloads\\Titanic.csv")
df

Unnamed: 0,Class,Gender,Age,Survived
0,3rd,Male,Child,No
1,3rd,Male,Child,No
2,3rd,Male,Child,No
3,3rd,Male,Child,No
4,3rd,Male,Child,No
...,...,...,...,...
2196,Crew,Female,Adult,Yes
2197,Crew,Female,Adult,Yes
2198,Crew,Female,Adult,Yes
2199,Crew,Female,Adult,Yes


In [9]:
pd.get_dummies(df)

Unnamed: 0,Class_1st,Class_2nd,Class_3rd,Class_Crew,Gender_Female,Gender_Male,Age_Adult,Age_Child,Survived_No,Survived_Yes
0,0,0,1,0,0,1,0,1,1,0
1,0,0,1,0,0,1,0,1,1,0
2,0,0,1,0,0,1,0,1,1,0
3,0,0,1,0,0,1,0,1,1,0
4,0,0,1,0,0,1,0,1,1,0
...,...,...,...,...,...,...,...,...,...,...
2196,0,0,0,1,1,0,1,0,0,1
2197,0,0,0,1,1,0,1,0,0,1
2198,0,0,0,1,1,0,1,0,0,1
2199,0,0,0,1,1,0,1,0,0,1


In [10]:
data = pd.get_dummies(df)
data

Unnamed: 0,Class_1st,Class_2nd,Class_3rd,Class_Crew,Gender_Female,Gender_Male,Age_Adult,Age_Child,Survived_No,Survived_Yes
0,0,0,1,0,0,1,0,1,1,0
1,0,0,1,0,0,1,0,1,1,0
2,0,0,1,0,0,1,0,1,1,0
3,0,0,1,0,0,1,0,1,1,0
4,0,0,1,0,0,1,0,1,1,0
...,...,...,...,...,...,...,...,...,...,...
2196,0,0,0,1,1,0,1,0,0,1
2197,0,0,0,1,1,0,1,0,0,1
2198,0,0,0,1,1,0,1,0,0,1
2199,0,0,0,1,1,0,1,0,0,1


In [11]:
# Run the Apriori algorithm
apriori(data, min_support = 0.2)

Unnamed: 0,support,itemsets
0,0.320763,(2)
1,0.40209,(3)
2,0.213539,(4)
3,0.786461,(5)
4,0.950477,(6)
5,0.676965,(8)
6,0.323035,(9)
7,0.231713,"(2, 5)"
8,0.284871,"(2, 6)"
9,0.239891,"(8, 2)"


In [12]:
scores = apriori(data, min_support = 0.2,use_colnames = True)
scores

Unnamed: 0,support,itemsets
0,0.320763,(Class_3rd)
1,0.40209,(Class_Crew)
2,0.213539,(Gender_Female)
3,0.786461,(Gender_Male)
4,0.950477,(Age_Adult)
5,0.676965,(Survived_No)
6,0.323035,(Survived_Yes)
7,0.231713,"(Gender_Male, Class_3rd)"
8,0.284871,"(Age_Adult, Class_3rd)"
9,0.239891,"(Class_3rd, Survived_No)"


In [13]:
# Build association rules (by default, only rules with confidence >= 0.8 are kept)

In [14]:
association_rules(scores)

Unnamed: 0,antecedents,consequents,antecedent support,consequent support,support,confidence,lift,leverage,conviction,zhangs_metric
0,(Class_3rd),(Age_Adult),0.320763,0.950477,0.284871,0.888102,0.934375,-0.020008,0.442572,-0.093712
1,(Class_Crew),(Gender_Male),0.40209,0.786461,0.39164,0.974011,1.238474,0.075412,8.216621,0.322047
2,(Class_Crew),(Age_Adult),0.40209,0.950477,0.40209,1.0,1.052103,0.019913,inf,0.082827
3,(Gender_Male),(Age_Adult),0.786461,0.950477,0.757383,0.963027,1.013204,0.00987,1.339441,0.061028
4,(Survived_No),(Gender_Male),0.676965,0.786461,0.619718,0.915436,1.163995,0.087312,2.525187,0.436144
5,(Survived_No),(Age_Adult),0.676965,0.950477,0.653339,0.965101,1.015386,0.0099,1.419023,0.046906
6,(Survived_Yes),(Age_Adult),0.323035,0.950477,0.297138,0.919831,0.967757,-0.0099,0.617734,-0.046906
7,"(Gender_Male, Class_3rd)",(Age_Adult),0.231713,0.950477,0.209905,0.905882,0.953082,-0.010333,0.526181,-0.060217
8,"(Survived_No, Class_3rd)",(Age_Adult),0.239891,0.950477,0.216265,0.901515,0.948487,-0.011746,0.502848,-0.066686
9,"(Age_Adult, Class_Crew)",(Gender_Male),0.40209,0.786461,0.39164,0.974011,1.238474,0.075412,8.216621,0.322047


# Supermarket

In [15]:
df = pd.read_csv("C:\\Users\\Priyanshu Chauhan\\Downloads\\Supermarket.csv",index_col = 0)
df

ID,Products
1,"Milk,Bread,sauce"
2,"Milk,Tea powder,Bread"
3,"Bread,Jam,Butter"
4,"Bread,Butter"
5,"Maggie,Sauce"
6,"Maggie,Cheese,Sauce"
7,"Maggie,Cheese,Sauce"
8,"Peanut butter,Bread"
9,"Coffee,Sugar,Milk"
10,"Coffee,Milk"


In [16]:
df.iloc[0]

Products    Milk,Bread,sauce
Name: 1, dtype: object

In [17]:
df.iloc[[0]]  # for tabular format

ID,Products
1,"Milk,Bread,sauce"


In [18]:
text = 'Milk,Bread,sauce'
text

'Milk,Bread,sauce'

In [19]:
text.split(',')

['Milk', 'Bread', 'sauce']

In [20]:
for i in df['Products']:
    print(i)

Milk,Bread,sauce
Milk,Tea powder,Bread
Bread,Jam,Butter
Bread,Butter
Maggie,Sauce
Maggie,Cheese,Sauce
Maggie,Cheese,Sauce
Peanut butter,Bread
Coffee,Sugar,Milk
Coffee,Milk
Maggie,Cheese,Sauce
Bread,Jam,Butter
Butter,Cheese
Maggie,Cheese,Sauce
Maggie,Bread 
Bread,Torch,Jam
Bread,Jam,Butter
Jam,Butter,Torch
Bread,Jam,Butter
CornFlakes,Milk,Bread


In [21]:
for i in df['Products']:
    print(i.split(',')) #one way

['Milk', 'Bread', 'sauce']
['Milk', 'Tea powder', 'Bread']
['Bread', 'Jam', 'Butter']
['Bread', 'Butter']
['Maggie', 'Sauce']
['Maggie', 'Cheese', 'Sauce']
['Maggie', 'Cheese', 'Sauce']
['Peanut butter', 'Bread']
['Coffee', 'Sugar', 'Milk']
['Coffee', 'Milk']
['Maggie', 'Cheese', 'Sauce']
['Bread', 'Jam', 'Butter']
['Butter', 'Cheese']
['Maggie', 'Cheese', 'Sauce']
['Maggie', 'Bread ']
['Bread', 'Torch', 'Jam']
['Bread', 'Jam', 'Butter']
['Jam', 'Butter', 'Torch']
['Bread', 'Jam', 'Butter']
['CornFlakes', 'Milk', 'Bread']


In [22]:
def txt_split(txt):
    return txt.split(',')

In [24]:
df['Products'].apply(txt_split) #2nd way

ID
1          [Milk, Bread, sauce]
2     [Milk, Tea powder, Bread]
3          [Bread, Jam, Butter]
4               [Bread, Butter]
5               [Maggie, Sauce]
6       [Maggie, Cheese, Sauce]
7       [Maggie, Cheese, Sauce]
8        [Peanut butter, Bread]
9         [Coffee, Sugar, Milk]
10               [Coffee, Milk]
11      [Maggie, Cheese, Sauce]
12         [Bread, Jam, Butter]
13             [Butter, Cheese]
14      [Maggie, Cheese, Sauce]
15             [Maggie, Bread ]
16          [Bread, Torch, Jam]
17         [Bread, Jam, Butter]
18         [Jam, Butter, Torch]
19         [Bread, Jam, Butter]
20    [CornFlakes, Milk, Bread]
Name: Products, dtype: object

In [25]:
#list comprehension 

[i.split(',') for i in df['Products']] # 3rd way 

[['Milk', 'Bread', 'sauce'],
 ['Milk', 'Tea powder', 'Bread'],
 ['Bread', 'Jam', 'Butter'],
 ['Bread', 'Butter'],
 ['Maggie', 'Sauce'],
 ['Maggie', 'Cheese', 'Sauce'],
 ['Maggie', 'Cheese', 'Sauce'],
 ['Peanut butter', 'Bread'],
 ['Coffee', 'Sugar', 'Milk'],
 ['Coffee', 'Milk'],
 ['Maggie', 'Cheese', 'Sauce'],
 ['Bread', 'Jam', 'Butter'],
 ['Butter', 'Cheese'],
 ['Maggie', 'Cheese', 'Sauce'],
 ['Maggie', 'Bread '],
 ['Bread', 'Torch', 'Jam'],
 ['Bread', 'Jam', 'Butter'],
 ['Jam', 'Butter', 'Torch'],
 ['Bread', 'Jam', 'Butter'],
 ['CornFlakes', 'Milk', 'Bread']]

In [26]:
data = [i.split(',') for i in df['Products']] 
data

[['Milk', 'Bread', 'sauce'],
 ['Milk', 'Tea powder', 'Bread'],
 ['Bread', 'Jam', 'Butter'],
 ['Bread', 'Butter'],
 ['Maggie', 'Sauce'],
 ['Maggie', 'Cheese', 'Sauce'],
 ['Maggie', 'Cheese', 'Sauce'],
 ['Peanut butter', 'Bread'],
 ['Coffee', 'Sugar', 'Milk'],
 ['Coffee', 'Milk'],
 ['Maggie', 'Cheese', 'Sauce'],
 ['Bread', 'Jam', 'Butter'],
 ['Butter', 'Cheese'],
 ['Maggie', 'Cheese', 'Sauce'],
 ['Maggie', 'Bread '],
 ['Bread', 'Torch', 'Jam'],
 ['Bread', 'Jam', 'Butter'],
 ['Jam', 'Butter', 'Torch'],
 ['Bread', 'Jam', 'Butter'],
 ['CornFlakes', 'Milk', 'Bread']]

In [27]:
from mlxtend.preprocessing import TransactionEncoder

In [28]:
te = TransactionEncoder()
encoded_df = te.fit_transform(data)  # one-hot encode each transaction into True/False values

In [29]:
te.columns_

['Bread',
 'Bread ',
 'Butter',
 'Cheese',
 'Coffee',
 'CornFlakes',
 'Jam',
 'Maggie',
 'Milk',
 'Peanut butter',
 'Sauce',
 'Sugar',
 'Tea powder',
 'Torch',
 'sauce']
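
The column list above contains both `'Bread'` and `'Bread '` (trailing space) and both `'Sauce'` and `'sauce'`, so the encoder treats them as different products. One possible way to avoid this is to normalize item names before encoding. The sketch below is only an illustration: `normalize_item` is a hypothetical helper, and the results shown in this notebook were computed without this step, so supports for Bread and Sauce would differ slightly if it were applied.

```python
def normalize_item(name):
    """Trim surrounding whitespace and unify capitalisation of an item name."""
    # Note: capitalize() lower-cases everything after the first letter,
    # so 'CornFlakes' would become 'Cornflakes'
    return name.strip().capitalize()


# Three raw rows copied from the Products column
raw = ["Milk,Bread,sauce", "Maggie,Bread ", "Peanut butter,Bread"]
cleaned = [[normalize_item(item) for item in row.split(",")] for row in raw]
print(cleaned)
```

After normalization, `'sauce'` and `'Bread '` map onto the same names as `'Sauce'` and `'Bread'`, so each product gets a single column when the transactions are encoded.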

In [30]:
data = pd.DataFrame(encoded_df,columns = te.columns_)
data

Unnamed: 0,Bread,Bread.1,Butter,Cheese,Coffee,CornFlakes,Jam,Maggie,Milk,Peanut butter,Sauce,Sugar,Tea powder,Torch,sauce
0,True,False,False,False,False,False,False,False,True,False,False,False,False,False,True
1,True,False,False,False,False,False,False,False,True,False,False,False,True,False,False
2,True,False,True,False,False,False,True,False,False,False,False,False,False,False,False
3,True,False,True,False,False,False,False,False,False,False,False,False,False,False,False
4,False,False,False,False,False,False,False,True,False,False,True,False,False,False,False
5,False,False,False,True,False,False,False,True,False,False,True,False,False,False,False
6,False,False,False,True,False,False,False,True,False,False,True,False,False,False,False
7,True,False,False,False,False,False,False,False,False,True,False,False,False,False,False
8,False,False,False,False,True,False,False,False,True,False,False,True,False,False,False
9,False,False,False,False,True,False,False,False,True,False,False,False,False,False,False


In [31]:
# The encoded frame is already boolean, which apriori accepts directly,
# so no conversion to 1/0 is needed
data

Unnamed: 0,Bread,Bread.1,Butter,Cheese,Coffee,CornFlakes,Jam,Maggie,Milk,Peanut butter,Sauce,Sugar,Tea powder,Torch,sauce
0,True,False,False,False,False,False,False,False,True,False,False,False,False,False,True
1,True,False,False,False,False,False,False,False,True,False,False,False,True,False,False
2,True,False,True,False,False,False,True,False,False,False,False,False,False,False,False
3,True,False,True,False,False,False,False,False,False,False,False,False,False,False,False
4,False,False,False,False,False,False,False,True,False,False,True,False,False,False,False
5,False,False,False,True,False,False,False,True,False,False,True,False,False,False,False
6,False,False,False,True,False,False,False,True,False,False,True,False,False,False,False
7,True,False,False,False,False,False,False,False,False,True,False,False,False,False,False
8,False,False,False,False,True,False,False,False,True,False,False,True,False,False,False
9,False,False,False,False,True,False,False,False,True,False,False,False,False,False,False


In [32]:
scores = apriori(data,min_support = 0.2, use_colnames=True)

In [33]:
scores

Unnamed: 0,support,itemsets
0,0.5,(Bread)
1,0.35,(Butter)
2,0.25,(Cheese)
3,0.3,(Jam)
4,0.3,(Maggie)
5,0.25,(Milk)
6,0.25,(Sauce)
7,0.25,"(Butter, Bread)"
8,0.25,"(Bread, Jam)"
9,0.25,"(Butter, Jam)"


In [34]:
association_rules(scores)

Unnamed: 0,antecedents,consequents,antecedent support,consequent support,support,confidence,lift,leverage,conviction,zhangs_metric
0,(Jam),(Bread),0.3,0.5,0.25,0.833333,1.666667,0.1,3.0,0.571429
1,(Jam),(Butter),0.3,0.35,0.25,0.833333,2.380952,0.145,3.9,0.828571
2,(Cheese),(Maggie),0.25,0.3,0.2,0.8,2.666667,0.125,3.5,0.833333
3,(Sauce),(Cheese),0.25,0.25,0.2,0.8,3.2,0.1375,3.75,0.916667
4,(Cheese),(Sauce),0.25,0.25,0.2,0.8,3.2,0.1375,3.75,0.916667
5,(Sauce),(Maggie),0.25,0.3,0.25,1.0,3.333333,0.175,inf,0.933333
6,(Maggie),(Sauce),0.3,0.25,0.25,0.833333,3.333333,0.175,4.5,1.0
7,"(Butter, Bread)",(Jam),0.25,0.3,0.2,0.8,2.666667,0.125,3.5,0.833333
8,"(Butter, Jam)",(Bread),0.25,0.5,0.2,0.8,1.6,0.075,2.5,0.5
9,"(Bread, Jam)",(Butter),0.25,0.35,0.2,0.8,2.285714,0.1125,3.25,0.75
