# Association Rule Mining

## Background

The marketing department of a financial firm keeps records on customers, including demographic information and the type of accounts. A new product, ["Personal Equity Plan" (PEP)](https://www.investopedia.com/terms/p/pep.asp), was launched and advertised by mail to the firm's existing customers, and a record was kept of whether each customer responded and bought the product. To better understand their customer base, the managers decided to use data mining techniques to build customer profiles based on the data the firm already has.

## Mini-Project Task

Apply an association rule mining algorithm to discover patterns in customer behavior (you should aim to get at least 20-30 strong rules after experimenting with the algorithm's parameters).

In addition, select the five most "interesting" rules and briefly write for each:
- an explanation of the pattern and why you believe it is interesting based on the business objectives of the company;
- any recommendations based on the discovered rule that might help the company better understand the behavior of its customers or improve its marketing campaign.

These are not necessarily the top five rules you'd get from the association rules algorithm. In addition to having high support, lift and confidence, they should be rules that provide non-trivial, actionable knowledge for the given business scenario.
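For reference, support, confidence and lift can be computed by hand using their standard definitions. Support(X) is the fraction of transactions that contain X. Confidence(X → Y) = support(X ∪ Y) / support(X). Lift(X → Y) = confidence(X → Y) / support(Y), so a lift above 1 means X and Y occur together more often than they would if they were independent. The sketch below is a minimal illustration on a few made-up transactions, not the bank data, and the helper names are hypothetical.

```python
# Hypothetical toy transactions (not the bank data); each set lists the products one customer holds
transactions = [
    {"save_act", "current_act", "pep"},
    {"save_act", "current_act"},
    {"current_act", "mortgage"},
    {"save_act", "pep"},
    {"save_act", "current_act", "mortgage", "pep"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)


def confidence(antecedent, consequent, transactions):
    """support(antecedent and consequent) / support(antecedent), an estimate of P(consequent | antecedent)."""
    both = set(antecedent) | set(consequent)
    return support(both, transactions) / support(antecedent, transactions)


def lift(antecedent, consequent, transactions):
    """Confidence relative to how common the consequent is on its own."""
    return confidence(antecedent, consequent, transactions) / support(consequent, transactions)


antecedent, consequent = {"save_act"}, {"pep"}
print(f"support    = {support(antecedent | consequent, transactions):.2f}")
print(f"confidence = {confidence(antecedent, consequent, transactions):.2f}")
print(f"lift       = {lift(antecedent, consequent, transactions):.2f}")
```

In this toy data, 4 of 5 customers hold a savings account and 3 of those 4 also hold a PEP, so the confidence is 0.75. Because 3 of 5 customers hold a PEP overall, the lift is 0.75 / 0.6 = 1.25.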

## Dataset Description

| Column       | Description                                                                       |
| ------------ | --------------------------------------------------------------------------------- |
| id           | a unique identification number                                                    |
| age          | age of customer in years (numeric)                                                |
| sex          | MALE / FEMALE                                                                     |
| region       | INNER_CITY / RURAL / SUBURBAN / TOWN                                              |
| income       | income of customer (numeric)                                                      |
| married      | is the customer married (YES/NO)                                                  |
| children     | number of children (numeric)                                                      |
| car          | does the customer own a car (YES/NO)                                              |
| save_act     | does the customer have a savings account (YES/NO)                                 |
| current_act  | does the customer have a current account (YES/NO)                                 |
| mortgage     | does the customer have a mortgage (YES/NO)                                        |
| pep          | did the customer buy a PEP (Personal Equity Plan) after the last mailing (YES/NO) |


**Acknowledgment**: The dataset and description are attributed to Prof. Bamshad Mobasher.

**Note**: the data URL is https://raw.githubusercontent.com/GUC-DM/W2020/main/data/bank_data.csv

## Importing Libraries \& Dataset

```python
# Execute this cell and restart the kernel to install the package needed for association rule mining
!pip install mlxtend --upgrade
```

```
Requirement already up-to-date: mlxtend in /usr/local/lib/python3.6/dist-packages (0.18.0)
```



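```python
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

%matplotlib inline
# Apply seaborn's default plot theme (the bare "seaborn" matplotlib style name
# is no longer available in recent matplotlib releases)
sns.set_theme()

# Read the local copy if it exists, otherwise download the dataset
try:
    df = pd.read_csv('bank_data.csv')
except FileNotFoundError:
    df = pd.read_csv('https://raw.githubusercontent.com/GUC-DM/W2020/main/data/bank_data.csv')
df.head()
```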

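|   | id      | age | sex    | region     | income  | married | children | car | save_act | current_act | mortgage | pep |
| - | ------- | --- | ------ | ---------- | ------- | ------- | -------- | --- | -------- | ----------- | -------- | --- |
| 0 | ID12101 | 48  | FEMALE | INNER_CITY | 17546.0 | NO      | 1        | NO  | NO       | NO          | NO       | YES |
| 1 | ID12102 | 40  | MALE   | TOWN       | 30085.1 | YES     | 3        | YES | NO       | YES         | YES      | NO  |
| 2 | ID12103 | 51  | FEMALE | INNER_CITY | 16575.4 | YES     | 0        | YES | YES      | YES         | NO       | NO  |
| 3 | ID12104 | 23  | FEMALE | TOWN       | 20375.4 | YES     | 3        | NO  | NO       | YES         | NO       | NO  |
| 4 | ID12105 | 57  | FEMALE | RURAL      | 50576.3 | YES     | 0        | NO  | YES      | NO          | NO       | NO  |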


```python
df.info()
```

```
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 600 entries, 0 to 599
Data columns (total 12 columns):
 #   Column       Non-Null Count  Dtype  
---  ------       --------------  -----  
 0   id           600 non-null    object 
 1   age          600 non-null    int64  
 2   sex          600 non-null    object 
 3   region       600 non-null    object 
 4   income       600 non-null    float64
 5   married      600 non-null    object 
 6   children     600 non-null    int64  
 7   car          600 non-null    object 
 8   save_act     600 non-null    object 
 9   current_act  600 non-null    object 
 10  mortgage     600 non-null    object 
 11  pep          600 non-null    object 
dtypes: float64(1), int64(2), object(9)
memory usage: 56.4+ KB
```


## Data Preparation

```python
# Make a copy of the dataframe so the raw data stays untouched
dfcopy = df.copy()

# Change the binary values from YES/NO to 1/0. Assigning the result back avoids
# a chained inplace replace, which does not update the frame under pandas Copy-on-Write
binary_cols = ['married', 'car', 'save_act', 'current_act', 'mortgage', 'pep']
dfcopy[binary_cols] = (dfcopy[binary_cols] == 'YES').astype(int)
dfcopy.head()
```

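|   | id      | age | sex    | region     | income  | married | children | car | save_act | current_act | mortgage | pep |
| - | ------- | --- | ------ | ---------- | ------- | ------- | -------- | --- | -------- | ----------- | -------- | --- |
| 0 | ID12101 | 48  | FEMALE | INNER_CITY | 17546.0 | 0       | 1        | 0   | 0        | 0           | 0        | 1   |
| 1 | ID12102 | 40  | MALE   | TOWN       | 30085.1 | 1       | 3        | 1   | 0        | 1           | 1        | 0   |
| 2 | ID12103 | 51  | FEMALE | INNER_CITY | 16575.4 | 1       | 0        | 1   | 1        | 1           | 0        | 0   |
| 3 | ID12104 | 23  | FEMALE | TOWN       | 20375.4 | 1       | 3        | 0   | 0        | 1           | 0        | 0   |
| 4 | ID12105 | 57  | FEMALE | RURAL      | 50576.3 | 1       | 0        | 0   | 1        | 0           | 0        | 0   |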


## Binning

```python
# Bin income into three equal-frequency groups (terciles) labelled Low, Medium and High
labelsofIncome = ['Low', 'Medium', 'High']
dfcopy['incomeLevel'] = pd.qcut(dfcopy['income'], q=3, labels=labelsofIncome)

# Bin age into five ranges. right=False makes every bin [left, right),
# so the edges agree with the labels (15 <= age < 26 is '15-25', and so on)
ageBins = [15, 26, 37, 48, 59, 70]
labelsofAge = ['15-25', '26-36', '37-47', '48-58', '59-69']
dfcopy['age'] = pd.cut(dfcopy['age'], bins=ageBins, labels=labelsofAge, right=False)

dfcopy.head()
```

|   | id      | age   | sex    | region     | income  | married | children | car | save_act | current_act | mortgage | pep | incomeLevel |
| - | ------- | ----- | ------ | ---------- | ------- | ------- | -------- | --- | -------- | ----------- | -------- | --- | ----------- |
| 0 | ID12101 | 48-58 | FEMALE | INNER_CITY | 17546.0 | 0       | 1        | 0   | 0        | 0           | 0        | 1   | Low         |
| 1 | ID12102 | 37-47 | MALE   | TOWN       | 30085.1 | 1       | 3        | 1   | 0        | 1           | 1        | 0   | Medium      |
| 2 | ID12103 | 48-58 | FEMALE | INNER_CITY | 16575.4 | 1       | 0        | 1   | 1        | 1           | 0        | 0   | Low         |
| 3 | ID12104 | 15-25 | FEMALE | TOWN       | 20375.4 | 1       | 3        | 0   | 0        | 1           | 0        | 0   | Medium      |
| 4 | ID12105 | 48-58 | FEMALE | RURAL      | 50576.3 | 1       | 0        | 0   | 1        | 0           | 0        | 0   | High        |
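The `right` argument of `pd.cut` decides which bin an edge value falls into. With the default `right=True` the intervals are `(15, 26]`, `(26, 37]` and so on, so ages of exactly 26 or 48 land one bin lower than their label text suggests. With `right=False` the intervals become `[15, 26)`, `[26, 37)` and so on, and they match labels such as '15-25'. The quick check below uses a handful of hand-picked ages rather than the dataset.

```python
import pandas as pd

# Hand-picked ages, including values that sit exactly on bin edges
ages = pd.Series([18, 25, 26, 47, 48, 67])
labels = ['15-25', '26-36', '37-47', '48-58', '59-69']

# Default right=True: intervals (15, 26], (26, 37], ... so edge values drop into the lower bin
right_closed = pd.cut(ages, bins=[15, 26, 37, 48, 59, 69], labels=labels)

# right=False: intervals [15, 26), [26, 37), ... which agree with the label text
left_closed = pd.cut(ages, bins=[15, 26, 37, 48, 59, 70], labels=labels, right=False)

print(pd.DataFrame({'age': ages, 'right=True': right_closed, 'right=False': left_closed}))
```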


## Modelling

```python
# Keep only the YES/NO columns (married through pep) for rule mining; children is a count, so drop it
df_copy = dfcopy.loc[:, 'married':'pep']
df_copy = df_copy.drop(columns='children')
df_copy.head()
```

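|   | married | car | save_act | current_act | mortgage | pep |
| - | ------- | --- | -------- | ----------- | -------- | --- |
| 0 | 0       | 0   | 0        | 0           | 0        | 1   |
| 1 | 1       | 1   | 0        | 1           | 1        | 0   |
| 2 | 1       | 1   | 1        | 1           | 0        | 0   |
| 3 | 1       | 0   | 0        | 1           | 0        | 0   |
| 4 | 1       | 0   | 1        | 0           | 0        | 0   |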
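This slice keeps only the binary columns, so the binned `age` and `incomeLevel` columns (as well as `sex` and `region`) never reach the rule miner. If they should, one option is to one-hot encode them with `pd.get_dummies`, which turns each category into its own true/false item. The sketch below shows how one might extend the analysis this way. It is an assumption, not what this notebook does, and it runs on a few made-up rows shaped like `dfcopy`.

```python
import pandas as pd

# Hypothetical rows shaped like dfcopy after binning (values are illustrative only)
sample = pd.DataFrame({
    'age': pd.Categorical(['37-47', '48-58', '15-25']),
    'incomeLevel': pd.Categorical(['Low', 'High', 'Medium']),
    'region': ['INNER_CITY', 'RURAL', 'TOWN'],
    'married': [0, 1, 1],
    'pep': [1, 0, 0],
})

# One-hot encode the categorical columns; each category becomes a boolean item column
items = pd.get_dummies(sample, columns=['age', 'incomeLevel', 'region'], dtype=bool)

# Convert the existing 0/1 columns to booleans too, so every column is a valid item
items[['married', 'pep']] = items[['married', 'pep']].astype(bool)
print(items.columns.tolist())
```

The resulting frame has one column per category, such as `age_48-58` or `incomeLevel_High`, and could be passed to `fpgrowth` in place of `df_copy`.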


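```python
from mlxtend.frequent_patterns import fpgrowth, association_rules

# fpgrowth expects a one-hot frame; passing booleans avoids mlxtend's warning
# about non-bool input (the 0/1 values themselves are unchanged)
freq_items = fpgrowth(df_copy.astype(bool), min_support=0.005, use_colnames=True)
freq_items.sort_values('support', ascending=True)
```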

|     | support  | itemsets                                          |
| --- | -------- | ------------------------------------------------- |
| 61  | 0.023333 | (pep, save_act, current_act, car, married, mor... |
| 60  | 0.030000 | (pep, save_act, current_act, car, mortgage)       |
| 59  | 0.031667 | (pep, save_act, car, married, mortgage)           |
| 57  | 0.040000 | (save_act, pep, car, mortgage)                    |
| 54  | 0.041667 | (pep, save_act, current_act, married, mortgage)   |
| ... | ...      | ...                                               |
| 3   | 0.493333 | (car)                                             |
| 62  | 0.531667 | (save_act, current_act)                           |
| 2   | 0.660000 | (married)                                         |
| 5   | 0.690000 | (save_act)                                        |
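With only six binary items there are 2^6 - 1 = 63 possible non-empty itemsets, and the indices in this output run up to 62. At `min_support=0.005`, every combination is therefore frequent and the threshold prunes nothing. For such a small item set, a brute-force enumeration produces the same kind of table and makes it easy to see how raising `min_support` shrinks it. The sketch below swaps in plain enumeration of every column combination rather than FP-growth, and runs on a tiny made-up boolean frame; the function name is hypothetical.

```python
from itertools import combinations

import pandas as pd


def brute_force_itemsets(onehot, min_support):
    """Return every itemset whose support is at least min_support.

    Enumerates all combinations of columns, so it is only practical for a
    handful of items; FP-growth avoids this exhaustive search.
    """
    rows = []
    cols = list(onehot.columns)
    for k in range(1, len(cols) + 1):
        for combo in combinations(cols, k):
            # A row contains the itemset when all of its columns are True
            support = onehot[list(combo)].all(axis=1).mean()
            if support >= min_support:
                rows.append({'support': support, 'itemsets': frozenset(combo)})
    return pd.DataFrame(rows, columns=['support', 'itemsets'])


# Hypothetical boolean data with three items
toy = pd.DataFrame({
    'save_act':    [True, True, False, True],
    'current_act': [True, True, True, False],
    'pep':         [False, True, False, True],
})

print(len(brute_force_itemsets(toy, 0.005)))  # all 2**3 - 1 = 7 combinations survive
print(brute_force_itemsets(toy, 0.5))
```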


```python
# Generate rules with confidence >= 0.73 and list them from lowest to highest confidence
association_rules(freq_items, metric='confidence', min_threshold=0.73).sort_values('confidence', ascending=True)
```

|    | antecedents                             | consequents   | antecedent support | consequent support | support  | confidence | lift     | leverage  | conviction |
| -- | --------------------------------------- | ------------- | ------------------ | ------------------ | -------- | ---------- | -------- | --------- | ---------- |
| 6  | (married, car, pep)                     | (current_act) | 0.13               | 0.758333           | 0.095    | 0.730769   | 0.963652 | -0.003583 | 0.897619   |
| 4  | (car, pep)                              | (current_act) | 0.23               | 0.758333           | 0.168333 | 0.731884   | 0.965122 | -0.006083 | 0.901351   |
| 11 | (married, car)                          | (current_act) | 0.323333           | 0.758333           | 0.236667 | 0.731959   | 0.96522  | -0.008528 | 0.901603   |
| 7  | (save_act, married, car, pep)           | (current_act) | 0.093333           | 0.758333           | 0.068333 | 0.732143   | 0.965463 | -0.002444 | 0.902222   |
| 16 | (mortgage)                              | (current_act) | 0.348333           | 0.758333           | 0.256667 | 0.736842   | 0.97166  | -0.007486 | 0.918333   |
| 27 | (pep, save_act, car, married, mortgage) | (current_act) | 0.031667           | 0.758333           | 0.023333 | 0.736842   | 0.97166  | -0.000681 | 0.918333   |
| 20 | (mortgage, pep)                         | (current_act) | 0.153333           | 0.758333           | 0.113333 | 0.73913    | 0.974677 | -0.002944 | 0.926389   |
| 8  | (married)                               | (current_act) | 0.66               | 0.758333           | 0.488333 | 0.739899   | 0.975691 | -0.012167 | 0.929126   |
| 12 | (married, car)                          | (save_act)    | 0.323333           | 0.69               | 0.24     | 0.742268   | 1.075751 | 0.0169    | 1.2028     |
| 19 | (save_act, mortgage)                    | (current_act) | 0.24               | 0.758333           | 0.178333 | 0.743056   | 0.979853 | -0.003667 | 0.940541   |
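Every rule listed here has a confidence between about 0.73 and 0.74. However, the baseline share of customers with a current account (the consequent support) is already 0.758, so every rule with `current_act` as the consequent has a lift slightly below 1: holding the antecedent makes a current account marginally *less* likely than average. The only rule in this excerpt with lift above 1 is (married, car) → (save_act). The sketch below recomputes the metrics for one row from its three support values using the standard formulas, then filters a small rules table on lift. The helper name is hypothetical, and the three-row table is copied from the output for illustration.

```python
import pandas as pd


def rule_metrics(antecedent_support, consequent_support, support):
    """Standard association-rule metrics computed from the three supports."""
    confidence = support / antecedent_support
    lift = confidence / consequent_support
    leverage = support - antecedent_support * consequent_support
    conviction = (1 - consequent_support) / (1 - confidence)
    return {'confidence': confidence, 'lift': lift,
            'leverage': leverage, 'conviction': conviction}


# (mortgage) -> (current_act), supports taken from the rules output
metrics = rule_metrics(0.348333, 0.758333, 0.256667)
print(metrics)

# Three rows copied from the rules output; keep only rules that beat the baseline
rules = pd.DataFrame({
    'antecedents': ['mortgage', 'married', 'married, car'],
    'consequents': ['current_act', 'current_act', 'save_act'],
    'confidence': [0.736842, 0.739899, 0.742268],
    'lift': [0.97166, 0.975691, 1.075751],
})
strong = rules[(rules['lift'] > 1) & (rules['confidence'] >= 0.7)]
print(strong)
```

Small differences from the printed table in the last decimal place come from the supports being rounded to six digits.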


## Conclusion


A few interesting association rules:

*   if a person has a savings account, they are likely to have a current account (77% confidence)
*   if a person has a savings account AND a PEP, they are likely to have a current account (78% confidence)
*   if a person has a mortgage, they are likely to have a current account (73.6% confidence)
*   if a person is married, they are likely to have a current account (73.9% confidence)
*   if a person has a car, they are likely to have a current account (74% confidence)
*   if a person has a PEP, they are likely to have a current account (77% confidence)


Recommendations:

*   people who are married and have a car are good candidates to be offered a current account
*   people who have a savings account and a car are good candidates to be offered a current account
*   people who are married and have a mortgage are good candidates to be offered a current account
*   people who have a car, are married and have a current account are good candidates to be offered a savings account
*   people who have a mortgage AND are married AND have a car AND a PEP are good candidates to be offered a current account
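One way to act on recommendations like these is to turn a rule into a mailing list: customers who match the antecedent but do not yet hold the consequent are the cross-sell targets. A minimal sketch follows. It assumes a 0/1-encoded frame with the same column names as `dfcopy`; the customer rows and the helper name `cross_sell_targets` are hypothetical.

```python
import pandas as pd


def cross_sell_targets(customers, antecedent, consequent):
    """Rows that satisfy every antecedent column but lack the consequent product."""
    has_antecedent = customers[antecedent].eq(1).all(axis=1)
    lacks_consequent = customers[consequent].eq(0)
    return customers[has_antecedent & lacks_consequent]


# Hypothetical customers, 0/1 encoded like dfcopy
customers = pd.DataFrame({
    'id': ['C1', 'C2', 'C3', 'C4'],
    'married': [1, 1, 0, 1],
    'car': [1, 1, 1, 0],
    'current_act': [0, 1, 0, 0],
})

# Rule: (married, car) -> current_act
targets = cross_sell_targets(customers, ['married', 'car'], 'current_act')
print(targets['id'].tolist())
```

Only C1 is married, owns a car and has no current account yet. C2 already holds the product, C3 is not married, and C4 has no car.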











