# Application of Association Rule Learning

This notebook shows how to build association rules and is structured as follows:
_______

**1. Import libraries and dataset**

The code blocks import the required standard libraries and load the data from the data source.

**2. Extend the finance_data by new column 'Kind of financial products'**

In this section, the products chosen in each row are collected into a list, and these lists are added to the dataframe as an extra column. This preprocessing step is required for the association rule algorithm to work.

**3. Apply the association rule learning algorithm**

The section consists of two subsections:

*3.1 Preprocess the data*:

The data is one-hot encoded. The apriori algorithm then finds the item sets that appear frequently, which is a precondition for deriving rules.

*3.2 Find the rules*:

By choosing an appropriate metric, such as 'confidence', together with a minimum threshold, rules can be extracted.

**4. Export the new data**

The code block exports the frequent item sets and the rules found as Excel files.

## 1. Import libraries and dataset

In [2]:
# import data analysis libraries
import pandas as pd

# import visualization libraries
import matplotlib.pyplot as plt
import seaborn as sns
from matplotlib import rcParams
%matplotlib inline

In [4]:
# provide the data source --> must be an Excel file
dataset = '../Datasource/data_financeproducts.xlsx'
finance_data = pd.read_excel(dataset)
finance_data.head()

Unnamed: 0,Sex,Age,Net income,Amount of financial products,Stocks,Investment fonds,Bond issues,Savings certificate,Giro account,Day-to-day money account,Credit card,Fixed deposit account,Life insurance,Riester pension,Housing saving
0,male,28,1250-1499 €,3,no,no,no,no,yes,yes,yes,no,no,no,no
1,male,56,5000-5999 €,1,no,no,no,no,no,no,yes,no,no,no,no
2,male,28,1250-1499 €,3,no,no,no,no,yes,yes,yes,no,no,no,no
3,male,29,2000-2499 €,3,no,no,no,no,yes,no,yes,no,no,no,yes
4,male,29,2000-2499 €,2,no,no,no,no,yes,no,yes,no,no,no,no


In [62]:
# remove the non-financial columns
finance_data = finance_data.drop(['Sex', 'Age', 'Net income', 'Amount of financial products'], axis=1)
finance_data.head()

Unnamed: 0,Stocks,Investment fonds,Bond issues,Savings certificate,Giro account,Day-to-day money account,Credit card,Fixed deposit account,Life insurance,Riester pension,Housing saving
0,no,no,no,no,yes,yes,yes,no,no,no,no
1,no,no,no,no,no,no,yes,no,no,no,no
2,no,no,no,no,yes,yes,yes,no,no,no,no
3,no,no,no,no,yes,no,yes,no,no,no,yes
4,no,no,no,no,yes,no,yes,no,no,no,no


## 2. Extend the finance_data by new column 'Kind of financial products'

In [63]:
# list of product columns to check for each row
items = ['Stocks', 'Investment fonds', 'Bond issues', 'Savings certificate',
         'Day-to-day money account', 'Fixed deposit account', 'Life insurance',
         'Credit card', 'Giro account', 'Riester pension', 'Housing saving']

# initialize a list of itemsets, which will become a new column in finance_data
products = []

# iterate over all rows in finance_data
for index, row in finance_data.iterrows():
    
    # initialize an array for each row to capture chosen products 
    individual_item_set = []
    
    # iterate over all items and check if an item was chosen
    for pro in items:
        if row[pro]=='yes':
            individual_item_set.append(pro)
    
    # extend the products array with individual_item_set 
    products.append(individual_item_set)

# assign a new column to finance_data
finance_data['Kind of financial products'] = products


# save the new data depending on the 'save' flag
save = False
if save:
    finance_data.to_excel('data_financeproducts_final.xlsx')
products

[['Day-to-day money account', 'Credit card', 'Giro account'],
 ['Credit card'],
 ['Day-to-day money account', 'Credit card', 'Giro account'],
 ['Credit card', 'Giro account', 'Housing saving'],
 ['Credit card', 'Giro account'],
 ['Credit card', 'Giro account', 'Riester pension'],
 ['Stocks',
  'Day-to-day money account',
  'Fixed deposit account',
  'Credit card',
  'Giro account',
  'Housing saving'],
 ['Giro account'],
 ['Giro account'],
 ['Stocks',
  'Investment fonds',
  'Savings certificate',
  'Day-to-day money account',
  'Fixed deposit account',
  'Life insurance',
  'Credit card',
  'Giro account',
  'Riester pension'],
 ['Investment fonds',
  'Day-to-day money account',
  'Life insurance',
  'Credit card',
  'Giro account'],
 ['Life insurance', 'Credit card', 'Giro account'],
 ['Credit card', 'Giro account', 'Housing saving'],
 ['Day-to-day money account',
  'Fixed deposit account',
  'Credit card',
  'Giro account'],
 ['Investment fonds', 'Credit card', 'Giro account', 'Ries

In [64]:
# transform the itemsets into the format expected by the
# apriori algorithm

# assign the column 'Kind of financial products' to a variable
items = finance_data['Kind of financial products']

# collect all non-empty itemsets in a list
# (Series.iteritems() was removed in pandas 2.0, so iterate over the values directly)
analysis = []
for itemset in items:
    if len(itemset) != 0:
        analysis.append(itemset)
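
The row-by-row construction of the itemsets can also be expressed with a column-wise comparison. The following is a minimal sketch on a small made-up frame: `toy`, `product_columns`, and the rows are hypothetical and not taken from the dataset, and it assumes every product column holds only 'yes' or 'no'.

```python
import pandas as pd

# hypothetical stand-in for finance_data with the same yes/no layout
toy = pd.DataFrame({
    'Credit card': ['yes', 'yes', 'no'],
    'Giro account': ['yes', 'no', 'no'],
    'Stocks': ['no', 'no', 'no'],
})
product_columns = ['Credit card', 'Giro account', 'Stocks']

# boolean frame: True where the product was chosen
owned = toy[product_columns].eq('yes')

# one list of product names per row, in column order
itemsets = [
    [col for col, flag in zip(product_columns, row) if flag]
    for row in owned.to_numpy()
]

# drop rows without any product, as the loop above does
non_empty = [itemset for itemset in itemsets if itemset]
```

Here `itemsets` plays the role of `products` and `non_empty` the role of `analysis`; whether this reads better than the explicit loop is a matter of taste.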

## 3. Apply the association rule learning algorithm

### 3.1 Preprocess the data 

In [65]:
# import rule learning preprocessing libraries
from mlxtend.preprocessing import TransactionEncoder
from mlxtend.frequent_patterns import apriori

# one-hot encode the transactions
transactionencoder = TransactionEncoder()
transactionencoder_ary = transactionencoder.fit(analysis).transform(analysis)

# put the encoded data into a new DataFrame, using the unique item names as column labels
df = pd.DataFrame(transactionencoder_ary, columns=transactionencoder.columns_)

# find itemsets with the apriori algorithm
support = 0.15
frequent_itemsets = apriori(df, min_support=support, use_colnames=True)
frequent_itemsets

Unnamed: 0,support,itemsets
0,0.792308,[Credit card]
1,0.361538,[Day-to-day money account]
2,0.176923,[Fixed deposit account]
3,0.907692,[Giro account]
4,0.369231,[Housing saving]
5,0.161538,[Investment fonds]
6,0.323077,[Life insurance]
7,0.184615,[Riester pension]
8,0.169231,[Stocks]
9,0.353846,"[Credit card, Day-to-day money account]"
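
To make the `support` column more concrete, the following sketch reproduces the idea in plain Python on a few made-up transactions. It is a brute-force illustration under stated assumptions, not the Apriori algorithm itself (there is no candidate pruning) and not mlxtend's implementation; `transactions`, `support`, and `min_support = 0.5` are hypothetical choices.

```python
from itertools import combinations

# hypothetical toy transactions (not taken from the dataset)
transactions = [
    ['Credit card', 'Giro account'],
    ['Credit card'],
    ['Credit card', 'Giro account', 'Stocks'],
    ['Giro account'],
]

# one-hot encoding: sorted item names become columns,
# each transaction becomes one row of booleans
columns = sorted({item for t in transactions for item in t})
encoded = [[item in t for item in columns] for t in transactions]

def support(itemset):
    """Fraction of transactions that contain every item in itemset."""
    hits = sum(all(item in t for item in itemset) for t in transactions)
    return hits / len(transactions)

# keep every combination of items whose support reaches the threshold
min_support = 0.5
frequent = {
    frozenset(combo): support(combo)
    for size in range(1, len(columns) + 1)
    for combo in combinations(columns, size)
    if support(combo) >= min_support
}
```

With these four transactions, 'Credit card' and 'Giro account' each reach a support of 0.75, the pair reaches 0.5, and every itemset containing 'Stocks' falls below the threshold.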


### 3.2 Find the rules

In [66]:
# import rule learning library
from mlxtend.frequent_patterns import association_rules

# find rules with a defined metric and a minimum threshold
rules = association_rules(frequent_itemsets, metric="confidence", min_threshold=0.4)
rules

Unnamed: 0,antecedants,consequents,antecedent support,consequent support,support,confidence,lift,leverage,conviction
0,(Day-to-day money account),(Credit card),0.361538,0.792308,0.353846,0.978723,1.235282,0.067396,9.761538
1,(Credit card),(Day-to-day money account),0.792308,0.361538,0.353846,0.446602,1.235282,0.067396,1.153711
2,(Fixed deposit account),(Credit card),0.176923,0.792308,0.169231,0.956522,1.20726,0.029053,4.776923
3,(Credit card),(Giro account),0.792308,0.907692,0.730769,0.92233,1.016126,0.011598,1.188462
4,(Giro account),(Credit card),0.907692,0.792308,0.730769,0.805085,1.016126,0.011598,1.065552
5,(Housing saving),(Credit card),0.369231,0.792308,0.284615,0.770833,0.972896,-0.007929,0.906294
6,(Life insurance),(Credit card),0.323077,0.792308,0.269231,0.833333,1.05178,0.013254,1.246154
7,(Riester pension),(Credit card),0.184615,0.792308,0.161538,0.875,1.104369,0.015266,1.661538
8,(Stocks),(Credit card),0.169231,0.792308,0.161538,0.954545,1.204766,0.027456,4.569231
9,(Day-to-day money account),(Giro account),0.361538,0.907692,0.353846,0.978723,1.078255,0.02568,4.338462
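
The metric columns are related to each other through the standard definitions of confidence, lift, leverage, and conviction. As a sanity check, the sketch below recomputes them for rule 0 (Day-to-day money account → Credit card) from the three support values printed in the table; the variable names are illustrative, and small deviations come from the rounding of the printed values.

```python
# values copied from rule 0 in the table above
antecedent_support = 0.361538   # support(Day-to-day money account)
consequent_support = 0.792308   # support(Credit card)
rule_support = 0.353846         # support of both items together

# confidence: how often the consequent appears when the antecedent does
confidence = rule_support / antecedent_support

# lift: confidence relative to the baseline frequency of the consequent
lift = confidence / consequent_support

# leverage: observed joint support minus the support expected under independence
leverage = rule_support - antecedent_support * consequent_support

# conviction: ratio of expected to observed frequency of the rule failing
conviction = (1 - consequent_support) / (1 - confidence)
```

The results agree with the table row (confidence about 0.9787, lift about 1.235, leverage about 0.0674, conviction about 9.76). A lift above 1 indicates that the two products occur together more often than independence would suggest.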


## 4. Export the newly built dataset

In [70]:
# export the itemsets and rules to new files
frequent_itemsets.to_excel("Frequent itemsets.xlsx")
rules.to_excel("Computed association rules.xlsx")
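
mlxtend stores itemsets, antecedents, and consequents as `frozenset` objects. Depending on how the Excel writer handles such cells, it may be convenient to turn them into plain strings before exporting. A minimal sketch, where `format_itemset` is a hypothetical helper and not part of mlxtend or pandas:

```python
def format_itemset(itemset):
    """Return the items of a set as one comma-separated, alphabetically sorted string."""
    return ', '.join(sorted(itemset))

# example with a made-up itemset
example = frozenset({'Giro account', 'Credit card'})
label = format_itemset(example)
```

Such a helper could be applied to the relevant columns with `Series.apply` before calling `to_excel`, assuming the columns hold set-like values.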