## Association Rule

### Import Libraries

In [4]:
import pandas as pd
import numpy as np
# mlxtend is installed and imported in the cells below

In [6]:
!pip install mlxtend

Collecting mlxtend
  Downloading mlxtend-0.21.0-py2.py3-none-any.whl (1.3 MB)
Installing collected packages: mlxtend
Successfully installed mlxtend-0.21.0


In [7]:
from mlxtend.frequent_patterns import apriori,association_rules
from mlxtend.preprocessing import TransactionEncoder

In [11]:
import warnings
warnings.filterwarnings("ignore", category=DeprecationWarning)

In [12]:
titanic=pd.read_csv("Titanic.csv")
titanic.head()

Unnamed: 0,Class,Gender,Age,Survived
0,3rd,Male,Child,No
1,3rd,Male,Child,No
2,3rd,Male,Child,No
3,3rd,Male,Child,No
4,3rd,Male,Child,No


### Pre-Processing

In [13]:
df=pd.get_dummies(titanic) # one-hot encode every categorical column into 0/1 indicator columns
df.head()

Unnamed: 0,Class_1st,Class_2nd,Class_3rd,Class_Crew,Gender_Female,Gender_Male,Age_Adult,Age_Child,Survived_No,Survived_Yes
0,0,0,1,0,0,1,0,1,1,0
1,0,0,1,0,0,1,0,1,1,0
2,0,0,1,0,0,1,0,1,1,0
3,0,0,1,0,0,1,0,1,1,0
4,0,0,1,0,0,1,0,1,1,0


#### The "Class" column has 4 values (1st, 2nd, 3rd, and Crew), so 4 indicator columns are created for it. The "Gender" column has 2 values (Male and Female), so 2 columns are created for it. The "Age" and "Survived" columns are encoded the same way.
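
As a minimal sketch of the encoding described above, the snippet below applies `pd.get_dummies` to a tiny made-up table. The `toy` frame and its values are illustrative assumptions, not rows from `Titanic.csv`. Passing `dtype=int` is also an assumption made here so the result shows 0/1 as in the output above; recent pandas versions return True/False by default.

```python
import pandas as pd

# Toy stand-in for the Titanic data (illustrative values, not the real file).
toy = pd.DataFrame({
    "Class": ["1st", "3rd", "Crew", "3rd"],
    "Survived": ["Yes", "No", "No", "No"],
})

# One indicator column is created per distinct value of each categorical column.
# dtype=int forces 0/1 output instead of True/False.
encoded = pd.get_dummies(toy, dtype=int)

print(list(encoded.columns))
print(encoded)
```

"Class" has 3 distinct values in this toy table and "Survived" has 2, so the encoded frame has 5 columns.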

### Apriori Algorithm

#### The output above has 10 columns in total. We have to identify the frequent itemsets among these 10 columns.

#### Calculating the frequent itemsets:

In [16]:
frequent_itemsets = apriori(df,min_support=0.1,use_colnames=True)
frequent_itemsets

Unnamed: 0,support,itemsets
0,0.14766,(Class_1st)
1,0.129487,(Class_2nd)
2,0.320763,(Class_3rd)
3,0.40209,(Class_Crew)
4,0.213539,(Gender_Female)
5,0.786461,(Gender_Male)
6,0.950477,(Age_Adult)
7,0.676965,(Survived_No)
8,0.323035,(Survived_Yes)
9,0.144934,"(Age_Adult, Class_1st)"
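
The support column above is the fraction of rows in which every item of the itemset is present. The sketch below recomputes that quantity by hand on a small made-up one-hot table. The `onehot` frame and the `support` helper are hypothetical names introduced for illustration and are not part of mlxtend.

```python
import pandas as pd

# Toy one-hot table (illustrative values, not the real Titanic data).
onehot = pd.DataFrame({
    "Age_Adult":   [1, 1, 1, 0, 1],
    "Class_1st":   [1, 0, 1, 0, 0],
    "Gender_Male": [1, 1, 0, 1, 1],
})

def support(frame, items):
    """Fraction of rows in which every column in `items` equals 1."""
    return frame[list(items)].all(axis=1).mean()

print(support(onehot, ["Age_Adult"]))               # 4 of the 5 rows
print(support(onehot, ["Age_Adult", "Class_1st"]))  # 2 of the 5 rows
```

An itemset is kept by `apriori` only when this fraction reaches `min_support`, which is 0.1 in the cell above.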


#### Generating the rules whose lift ratio is at least 0.7:

In [21]:
rules=association_rules(frequent_itemsets,metric="lift",min_threshold=0.7)
rules

Unnamed: 0,antecedents,consequents,antecedent support,consequent support,support,confidence,lift,leverage,conviction
0,(Age_Adult),(Class_1st),0.950477,0.147660,0.144934,0.152486,1.032680,0.004587,1.005694
1,(Class_1st),(Age_Adult),0.147660,0.950477,0.144934,0.981538,1.032680,0.004587,2.682493
2,(Age_Adult),(Class_2nd),0.950477,0.129487,0.118582,0.124761,0.963505,-0.004492,0.994601
3,(Class_2nd),(Age_Adult),0.129487,0.950477,0.118582,0.915789,0.963505,-0.004492,0.588085
4,(Class_3rd),(Gender_Male),0.320763,0.786461,0.231713,0.722380,0.918520,-0.020555,0.769177
...,...,...,...,...,...,...,...,...,...
101,"(Class_Crew, Survived_No)","(Age_Adult, Gender_Male)",0.305770,0.757383,0.304407,0.995542,1.314450,0.072822,54.427079
102,(Age_Adult),"(Survived_No, Class_Crew, Gender_Male)",0.950477,0.304407,0.304407,0.320268,1.052103,0.015075,1.023334
103,(Gender_Male),"(Age_Adult, Class_Crew, Survived_No)",0.786461,0.305770,0.304407,0.387060,1.265851,0.063931,1.132622
104,(Class_Crew),"(Age_Adult, Survived_No, Gender_Male)",0.402090,0.603816,0.304407,0.757062,1.253795,0.061619,1.630802
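
The metric columns in this table can all be derived from three support values. The sketch below uses the standard definitions of confidence, lift, leverage, and conviction and checks them against rule 1, (Class_1st) -> (Age_Adult). The `rule_metrics` helper is a hypothetical name introduced here, and the inputs are the rounded values printed above, so the results agree with the table only up to rounding.

```python
def rule_metrics(support_a, support_c, support_ac):
    """Metrics for a rule A -> C, given support(A), support(C), and support(A and C)."""
    confidence = support_ac / support_a            # P(C | A)
    lift = confidence / support_c                  # 1.0 means A and C are independent
    leverage = support_ac - support_a * support_c  # 0.0 means A and C are independent
    # Conviction is infinite when the rule is never violated (confidence == 1).
    conviction = float("inf") if confidence == 1 else (1 - support_c) / (1 - confidence)
    return {
        "confidence": confidence,
        "lift": lift,
        "leverage": leverage,
        "conviction": conviction,
    }

# Rule 1 from the table above: (Class_1st) -> (Age_Adult)
metrics = rule_metrics(support_a=0.147660, support_c=0.950477, support_ac=0.144934)
for name, value in metrics.items():
    print(f"{name}: {value:.6f}")
```

The infinite-conviction branch corresponds to rows such as (Class_Crew) -> (Age_Adult), where confidence is 1.0 and the conviction column shows `inf`.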


#### Sorting the rules above in descending order of lift ratio:

In [35]:
rules.sort_values('lift',ascending=False)[0:50] #displaying only the first 50 records

Unnamed: 0,antecedents,consequents,antecedent support,consequent support,support,confidence,lift,leverage,conviction
64,"(Age_Adult, Gender_Female)",(Survived_Yes),0.193094,0.323035,0.143571,0.743529,2.301699,0.081195,2.639542
69,(Survived_Yes),"(Age_Adult, Gender_Female)",0.323035,0.193094,0.143571,0.444444,2.301699,0.081195,1.452431
19,(Survived_Yes),(Gender_Female),0.323035,0.213539,0.156293,0.483826,2.265745,0.087312,1.523634
18,(Gender_Female),(Survived_Yes),0.213539,0.323035,0.156293,0.731915,2.265745,0.087312,2.525187
68,(Gender_Female),"(Age_Adult, Survived_Yes)",0.213539,0.297138,0.143571,0.67234,2.262724,0.080121,2.145099
65,"(Age_Adult, Survived_Yes)",(Gender_Female),0.297138,0.213539,0.143571,0.48318,2.262724,0.080121,1.521732
96,"(Age_Adult, Gender_Male)","(Class_Crew, Survived_No)",0.757383,0.30577,0.304407,0.40192,1.31445,0.072822,1.160764
101,"(Class_Crew, Survived_No)","(Age_Adult, Gender_Male)",0.30577,0.757383,0.304407,0.995542,1.31445,0.072822,54.427079
47,"(Age_Adult, Gender_Male)",(Class_Crew),0.757383,0.40209,0.39164,0.517097,1.286022,0.087104,1.238157
50,(Class_Crew),"(Age_Adult, Gender_Male)",0.40209,0.757383,0.39164,0.974011,1.286022,0.087104,9.33548


#### From the output above, we want to keep only the actionable rules. A lift ratio greater than 1 means the antecedent and consequent occur together more often than they would if they were independent, so we display only those records.

#### Displaying the records whose lift ratio is greater than 1:

In [36]:
rules[rules.lift>1]

Unnamed: 0,antecedents,consequents,antecedent support,consequent support,support,confidence,lift,leverage,conviction
0,(Age_Adult),(Class_1st),0.950477,0.147660,0.144934,0.152486,1.032680,0.004587,1.005694
1,(Class_1st),(Age_Adult),0.147660,0.950477,0.144934,0.981538,1.032680,0.004587,2.682493
8,(Class_3rd),(Survived_No),0.320763,0.676965,0.239891,0.747875,1.104747,0.022745,1.281251
9,(Survived_No),(Class_3rd),0.676965,0.320763,0.239891,0.354362,1.104747,0.022745,1.052040
10,(Class_Crew),(Gender_Male),0.402090,0.786461,0.391640,0.974011,1.238474,0.075412,8.216621
...,...,...,...,...,...,...,...,...,...
101,"(Class_Crew, Survived_No)","(Age_Adult, Gender_Male)",0.305770,0.757383,0.304407,0.995542,1.314450,0.072822,54.427079
102,(Age_Adult),"(Survived_No, Class_Crew, Gender_Male)",0.950477,0.304407,0.304407,0.320268,1.052103,0.015075,1.023334
103,(Gender_Male),"(Age_Adult, Class_Crew, Survived_No)",0.786461,0.305770,0.304407,0.387060,1.265851,0.063931,1.132622
104,(Class_Crew),"(Age_Adult, Survived_No, Gender_Male)",0.402090,0.603816,0.304407,0.757062,1.253795,0.061619,1.630802


#### So, out of the 106 rows in total (cell In [21]), only 74 have a lift ratio greater than 1.

#### Displaying the records whose lift ratio is greater than 1 and whose confidence is greater than 0.75:

In [37]:
rules[(rules.lift>1) & (rules.confidence>0.75)]

Unnamed: 0,antecedents,consequents,antecedent support,consequent support,support,confidence,lift,leverage,conviction
1,(Class_1st),(Age_Adult),0.14766,0.950477,0.144934,0.981538,1.03268,0.004587,2.682493
10,(Class_Crew),(Gender_Male),0.40209,0.786461,0.39164,0.974011,1.238474,0.075412,8.216621
13,(Class_Crew),(Age_Adult),0.40209,0.950477,0.40209,1.0,1.052103,0.019913,inf
14,(Class_Crew),(Survived_No),0.40209,0.676965,0.30577,0.760452,1.123325,0.033569,1.348519
20,(Age_Adult),(Gender_Male),0.950477,0.786461,0.757383,0.796845,1.013204,0.00987,1.051116
21,(Gender_Male),(Age_Adult),0.786461,0.950477,0.757383,0.963027,1.013204,0.00987,1.339441
22,(Gender_Male),(Survived_No),0.786461,0.676965,0.619718,0.787984,1.163995,0.087312,1.523634
23,(Survived_No),(Gender_Male),0.676965,0.786461,0.619718,0.915436,1.163995,0.087312,2.525187
25,(Survived_No),(Age_Adult),0.676965,0.950477,0.653339,0.965101,1.015386,0.0099,1.419023
34,"(Class_3rd, Gender_Male)",(Survived_No),0.231713,0.676965,0.191731,0.827451,1.222295,0.03487,1.872135


#### So, out of the 74 rows in the previous output, only about 36 have a lift ratio greater than 1 and a confidence greater than 0.75. As cell In [35] shows, most of the lift ratio values are close to 1. In such cases we can use a second parameter, "confidence", to narrow down the rules. Finally, the business decision is made manually on the basis of the remaining rules.
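
One way to support that manual review is to rank the filtered rules by confidence. The sketch below rebuilds a few rows from the outputs above as a small frame and applies the same filter, followed by a sort. The `sample_rules` and `strong_rules` names are assumptions introduced here, and antecedents and consequents are stored as plain strings for brevity, whereas mlxtend stores them as frozensets.

```python
import pandas as pd

# A few rows copied from the rule tables above, indexed by their rule number.
sample_rules = pd.DataFrame(
    {
        "antecedents": ["Age_Adult", "Class_1st", "Class_3rd",
                        "Class_Crew", "Class_3rd, Gender_Male"],
        "consequents": ["Class_1st", "Age_Adult", "Gender_Male",
                        "Gender_Male", "Survived_No"],
        "confidence": [0.152486, 0.981538, 0.722380, 0.974011, 0.827451],
        "lift": [1.032680, 1.032680, 0.918520, 1.238474, 1.222295],
    },
    index=[0, 1, 4, 10, 34],
)

# Same filter as in cell In [37], then ranked so the most reliable rules come first.
strong_rules = sample_rules[
    (sample_rules.lift > 1) & (sample_rules.confidence > 0.75)
].sort_values("confidence", ascending=False)

print(strong_rules)
```

Rule 0 is dropped for its low confidence and rule 4 for its lift below 1, which leaves rules 1, 10, and 34 in that order.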