# Introduction

In this section, I demonstrate how to use association rule mining to extract useful patterns from a transactional data set.

## Business objectives
One interesting business problem is to find which jobs are associated with each other: if a person is willing to take one job, which other jobs are they also likely to take?

Association rule mining (ARM) is a technique for discovering relationships among variables in a large data set. It has been applied across many industries and disciplines; the classic example is market-basket analysis, which finds products that customers frequently buy together.
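The Apriori algorithm used later builds frequent itemsets level by level. It first keeps the single items whose support (the fraction of transactions that contain them) meets a threshold, and then only counts pairs built from those frequent items, because a pair can never occur more often than either of its members. The function below is a minimal sketch of that idea in plain Python, not apyori's implementation; it stops at pairs, which is enough when every transaction holds two job titles. `frequent_itemsets` and the toy job lists are hypothetical.

```python
from collections import Counter
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Return {itemset: support} for frequent 1- and 2-itemsets (Apriori-style sketch)."""
    n = len(transactions)
    baskets = [frozenset(t) for t in transactions]

    # Level 1: count single items and keep those that meet min_support
    item_counts = Counter(item for basket in baskets for item in basket)
    frequent = {
        frozenset([item]): count / n
        for item, count in item_counts.items()
        if count / n >= min_support
    }

    # Apriori pruning: only pairs built from frequent single items can be frequent
    frequent_items = sorted(item for itemset in frequent for item in itemset)
    for pair in combinations(frequent_items, 2):
        candidate = frozenset(pair)
        # Level 2: count the transactions that contain the whole candidate pair
        count = sum(1 for basket in baskets if candidate <= basket)
        if count / n >= min_support:
            frequent[candidate] = count / n
    return frequent


# Hypothetical toy transactions: each row is a pair of jobs one person would take
jobs = [
    ["PC Technician", "Electrician"],
    ["PC Technician", "Field Technician"],
    ["Field Technician", "Electrician"],
    ["Electrician", "Field Technician"],
]
result = frequent_itemsets(jobs, min_support=0.5)
for itemset, support in result.items():
    print(sorted(itemset), support)
```

With `min_support=0.5`, all three single jobs survive level 1, but only the Electrician/Field Technician pair appears in at least half of the rows.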

## Set up evaluation metrics
We use the following measures to evaluate each association rule:

![ar_formula](./ar_formula.png)
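For reference, these are the standard textbook definitions of the three measures reported later (support, confidence and lift), written for a rule $A \Rightarrow B$ over the set of transactions $T$; this assumes the figure uses the usual forms.

$$
\begin{aligned}
\operatorname{support}(X) &= \frac{\lvert \{\, t \in T : X \subseteq t \,\} \rvert}{\lvert T \rvert} \\
\operatorname{support}(A \Rightarrow B) &= \operatorname{support}(A \cup B) \\
\operatorname{confidence}(A \Rightarrow B) &= \frac{\operatorname{support}(A \cup B)}{\operatorname{support}(A)} \\
\operatorname{lift}(A \Rightarrow B) &= \frac{\operatorname{confidence}(A \Rightarrow B)}{\operatorname{support}(B)} = \frac{\operatorname{support}(A \cup B)}{\operatorname{support}(A)\,\operatorname{support}(B)}
\end{aligned}
$$

Confidence is directional (in general $\operatorname{confidence}(A \Rightarrow B) \neq \operatorname{confidence}(B \Rightarrow A)$), while lift is symmetric; a lift above 1 means $A$ and $B$ occur together more often than they would if they were independent.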

# Experimental Environment Setup

## Common Configuration

```python
# Python ≥3.5 is required
import sys
assert sys.version_info >= (3, 5)

# Scikit-Learn ≥0.20 is required
import sklearn
assert sklearn.__version__ >= "0.20"

# Common imports
import numpy as np
import os
# Make the random numbers predictable
np.random.seed(42)
import multiprocessing
cpu_count = multiprocessing.cpu_count()

# To plot pretty figures
%matplotlib inline
import matplotlib as mpl
import matplotlib.pyplot as plt
import seaborn as sns
mpl.rc('axes', labelsize=14)
mpl.rc('xtick', labelsize=12)
mpl.rc('ytick', labelsize=12)

# Set up constants for the root directory and the folder where figures are saved
PROJECT_ROOT_DIR = "."
IMAGES_PATH = os.path.join(PROJECT_ROOT_DIR, "images")
os.makedirs(IMAGES_PATH, exist_ok=True)

# Define a commonly used function to save a plot figure
def save_fig(fig_id, tight_layout=True, fig_extension="png", resolution=300):
    path = os.path.join(IMAGES_PATH, fig_id + "." + fig_extension)
    print("Saving figure", fig_id)
    if tight_layout:
        plt.tight_layout()
    plt.savefig(path, format=fig_extension, dpi=resolution)

# Only log error messages
import logging
logging.getLogger().setLevel(logging.ERROR)
# Suppress warning messages
import warnings
warnings.filterwarnings("ignore")

# Allow multiple outputs/displays from one cell
from IPython.core.interactiveshell import InteractiveShell
InteractiveShell.ast_node_interactivity = "all"
```

## Python Library Import

```python
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
import pandas_profiling
# apyori: a small pure-Python package implementing the Apriori algorithm
from apyori import apriori
# Alternatively, use the mlxtend package
# (note: the variable `association_rules` used below would shadow this import)
# from mlxtend.frequent_patterns import apriori, association_rules
```

# Collecting the Data

## Load the data

```python
aa = pd.read_csv('dummy2.csv', header="infer")
aa.info()
aa.head()
```

```
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 39 entries, 0 to 38
Data columns (total 2 columns):
job1    39 non-null object
job2    39 non-null object
dtypes: object(2)
memory usage: 752.0+ bytes
```


|   | job1 | job2 |
|---|------|------|
| 0 | 15-230 PC Technician | 47-120 Electrician |
| 1 | 15-230 PC Technician | 19-140 Field Technician |
| 2 | 18-300 Maintenance Tech/Mechanic | 51-100 General Laborer/Production |
| 3 | 18-560 Electronics Technician | 51-140 Assembler - Light |
| 4 | 18-560 Electronics Technician | 51-210 Forklift/Heavy Machine Operator |


```python
# Convert each DataFrame row into a list of job titles (one transaction per row);
# iterating over aa.values avoids hard-coding the 39 rows and 2 columns
records = [[str(job) for job in row] for row in aa.values]
```
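`str()` turns a missing cell, which pandas stores as a float NaN, into the string `'nan'`; this is why `display_rules` below filters out `'nan'` items. This data set has no missing values (39 non-null entries in both columns), but a minimal sketch that drops missing cells while the transactions are built, so that `'nan'` never becomes an item, could look like this. `row_to_transaction` is a hypothetical helper; `pd.isna` would work equally well as the test.

```python
import math


def row_to_transaction(row):
    """Turn one DataFrame row into a list of job titles, skipping missing cells."""
    # pandas represents a missing cell in an object column as float NaN
    return [str(value) for value in row
            if not (isinstance(value, float) and math.isnan(value))]


# Example: the second job is missing, so only one item survives
print(row_to_transaction(["15-230 PC Technician", float("nan")]))
```

In the notebook this would be used as `records = [row_to_transaction(row) for row in aa.values]`.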

```python
records
```

```
[['15-230 PC Technician', '47-120 Electrician'],
 ['15-230 PC Technician', '19-140 Field Technician'],
 ['18-300 Maintenance Tech/Mechanic', '51-100 General Laborer/Production'],
 ['18-560 Electronics Technician', '51-140 Assembler - Light'],
 ['18-560 Electronics Technician', '51-210 Forklift/Heavy Machine Operator'],
 ['19-140 Field Technician', '47-120 Electrician'],
 ['19-140 Field Technician', '51-210 Forklift/Heavy Machine Operator'],
 ['47-120 Electrician', '19-140 Field Technician'],
 ['47-150 Drywall Finisher/Carpenter', '51-100 General Laborer/Production'],
 ['47-150 Drywall Finisher/Carpenter', '51-200 Machine Operator'],
 ['47-150 Drywall Finisher/Carpenter', '51-130 Material Handler/Packer'],
 ['51-100 General Laborer/Production', '51-200 Machine Operator'],
 ['51-100 General Laborer/Production', '51-130 Material Handler/Packer'],
 ['51-100 General Laborer/Production', '51-300 Machinist'],
 ['51-100 General Laborer/Production', '51-240 QA Technician/Inspector'],
 ['51-100
 ...
```

# Association Rule Mining

```python
def display_rules(association_results):
    """Print the first rule that passed the filters in each apyori RelationRecord."""
    for record in association_results:
        # record.ordered_statistics holds the directed rules (base -> add) that
        # passed min_confidence and min_lift; show the first one
        stat = record.ordered_statistics[0]
        base = [str(x) for x in stat.items_base]
        add = [str(x) for x in stat.items_add]
        # Skip rules that involve a missing (NaN) job title
        if any(x.lower() == 'nan' for x in base + add):
            continue
        # Take the direction from items_base/items_add rather than from the
        # unordered frozenset record.items, so the arrow matches the confidence shown
        print("Rule: " + ", ".join(base) + " -> " + ", ".join(add))
        print("Support: " + str(record.support))
        print("Confidence: " + str(stat.confidence))
        print("Lift: " + str(stat.lift))
        print("=====================================")
```

```python
# apyori only reads min_support, min_confidence, min_lift and max_length, so a
# min_length argument has no effect and is omitted; single-item sets are dropped
# anyway because their lift is 1.0 < min_lift.
# With 39 transactions, one co-occurrence already has support 1/39 ≈ 0.026, so
# min_support=0.0045 admits every pair that occurs at least once.
association_rules = apriori(records, min_support=0.0045, min_confidence=0.39, min_lift=1.2)
association_results = list(association_rules)
```

```python
print(len(association_results))
```

```
17
```

```python
print(association_results[0])
```

```
RelationRecord(items=frozenset({'19-140 Field Technician', '15-230 PC Technician'}), support=0.02564102564102564, ordered_statistics=[OrderedStatistic(items_base=frozenset({'15-230 PC Technician'}), items_add=frozenset({'19-140 Field Technician'}), confidence=0.5, lift=3.9000000000000004)])
```

Each `RelationRecord` holds the itemset, its support, and the directed rules (`ordered_statistics`) that passed the confidence and lift thresholds.

```python
display_rules(association_results)
```

```
Rule: 15-230 PC Technician -> 19-140 Field Technician
Support: 0.02564102564102564
Confidence: 0.5
Lift: 3.9000000000000004
Rule: 15-230 PC Technician -> 47-120 Electrician
Support: 0.02564102564102564
Confidence: 0.5
Lift: 6.5
Rule: 18-300 Maintenance Tech/Mechanic -> 51-100 General Laborer/Production
Support: 0.02564102564102564
Confidence: 0.5
Lift: 1.5
Rule: 18-300 Maintenance Tech/Mechanic -> 51-140 Assembler - Light
Support: 0.02564102564102564
Confidence: 0.5
Lift: 2.1666666666666665
Rule: 18-560 Electronics Technician -> 51-140 Assembler - Light
Support: 0.02564102564102564
Confidence: 0.5
Lift: 2.1666666666666665
Rule: 18-560 Electronics Technician -> 51-210 Forklift/Heavy Machine Operator
Support: 0.02564102564102564
Confidence: 0.5
Lift: 4.875
Rule: 19-140 Field Technician -> 47-120 Electrician
Support: 0.05128205128205128
Confidence: 0.4
Lift: 5.2
Rule: 51-200 Machine Operator -> 51-100 General Laborer/Production
Support: 0.05128205128205128
Confidence: 0.5
Lift: 1.5
Rule: 
...
```
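Because confidence is directional, the arrow in a printed rule matters. The numbers in a `RelationRecord` can be double-checked by recounting the transactions directly. `rule_metrics` below is a hypothetical helper, a minimal plain-Python sketch that follows the standard definitions of support, confidence and lift; the toy transactions are made up for illustration.

```python
def rule_metrics(transactions, base, add):
    """Return (support, confidence, lift) of the rule base -> add."""
    n = len(transactions)
    baskets = [set(t) for t in transactions]
    base, add = set(base), set(add)
    n_base = sum(1 for b in baskets if base <= b)          # rows containing the base items
    n_add = sum(1 for b in baskets if add <= b)            # rows containing the added items
    n_both = sum(1 for b in baskets if (base | add) <= b)  # rows containing both
    if n_base == 0 or n_add == 0:
        raise ValueError("base or add never occurs in the transactions")
    support = n_both / n
    confidence = n_both / n_base
    lift = confidence / (n_add / n)
    return support, confidence, lift


# Hypothetical toy transactions: confidence differs by direction, lift does not
toy = [["A", "B"], ["A", "C"], ["B", "C"], ["B", "D"]]
print(rule_metrics(toy, {"A"}, {"B"}))  # rule A -> B
print(rule_metrics(toy, {"B"}, {"A"}))  # rule B -> A
```

Applied to the notebook's `records`, `rule_metrics(records, {'15-230 PC Technician'}, {'19-140 Field Technician'})` should reproduce the first `RelationRecord` printed above (support 1/39, confidence 0.5, lift about 3.9), and swapping the last two arguments gives the confidence of the reverse rule.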

```python
# Raise min_support so the search runs faster and yields fewer rules; with
# 39 transactions, 0.05 keeps only pairs that occur at least twice (2/39 ≈ 0.051).
# (min_length is omitted: apyori ignores it)
association_rules = apriori(records, min_support=0.05, min_confidence=0.39, min_lift=1.2)
association_results = list(association_rules)
```
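Because `min_support` is a fraction of all transactions, it can help to translate a threshold into the minimum number of co-occurrences it requires for a data set of a given size. `min_count` is a hypothetical helper; it assumes an itemset is kept when `count / n >= min_support`, which is consistent with the supports printed in this notebook being multiples of 1/39.

```python
def min_count(min_support, n_transactions):
    """Smallest co-occurrence count k such that k / n_transactions >= min_support."""
    # Check each count in turn, mirroring the fraction comparison exactly
    # (avoids floating-point surprises from math.ceil(min_support * n))
    for k in range(1, n_transactions + 1):
        if k / n_transactions >= min_support:
            return k
    return None  # a threshold above 1.0 can never be met


for threshold in (0.0045, 0.05):
    print(threshold, "->", min_count(threshold, 39), "occurrence(s) out of 39")
```

So the first run keeps every pair that occurs at least once, while the second needs at least two occurrences, which is consistent with all three remaining rules having support 2/39 ≈ 0.051.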

```python
print(len(association_results))
```

```
3
```


```python
display_rules(association_results)
```

```
Rule: 19-140 Field Technician -> 47-120 Electrician
Support: 0.05128205128205128
Confidence: 0.4
Lift: 5.2
Rule: 51-200 Machine Operator -> 51-100 General Laborer/Production
Support: 0.05128205128205128
Confidence: 0.5
Lift: 1.5
Rule: 51-300 Machinist -> 51-100 General Laborer/Production
Support: 0.05128205128205128
Confidence: 0.5
Lift: 1.5
```
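To compare rules at a glance, the results can be flattened into one row per directed rule and sorted by lift. As the printed `RelationRecord` above shows, each record has the fields `items`, `support` and `ordered_statistics`, and each `OrderedStatistic` has `items_base`, `items_add`, `confidence` and `lift`. The sketch below defines stand-in namedtuples with those field names so it runs without apyori; `rules_by_lift` is a hypothetical helper, and with real results you would pass `association_results` instead. Unlike `display_rules`, it lists every rule that passed the thresholds, not only the first one per itemset.

```python
from collections import namedtuple

# Stand-ins with the same field names as apyori's result records
OrderedStatistic = namedtuple("OrderedStatistic", ["items_base", "items_add", "confidence", "lift"])
RelationRecord = namedtuple("RelationRecord", ["items", "support", "ordered_statistics"])


def rules_by_lift(results):
    """Flatten records into (base, add, support, confidence, lift) rows, highest lift first."""
    rows = []
    for record in results:
        for stat in record.ordered_statistics:
            rows.append((
                ", ".join(sorted(stat.items_base)),
                ", ".join(sorted(stat.items_add)),
                record.support,
                stat.confidence,
                stat.lift,
            ))
    return sorted(rows, key=lambda row: row[4], reverse=True)


# Two rules copied from the output above, wrapped in the stand-in types
results = [
    RelationRecord(
        frozenset({"51-200 Machine Operator", "51-100 General Laborer/Production"}), 2 / 39,
        [OrderedStatistic(frozenset({"51-200 Machine Operator"}),
                          frozenset({"51-100 General Laborer/Production"}), 0.5, 1.5)]),
    RelationRecord(
        frozenset({"19-140 Field Technician", "47-120 Electrician"}), 2 / 39,
        [OrderedStatistic(frozenset({"19-140 Field Technician"}),
                          frozenset({"47-120 Electrician"}), 0.4, 5.2)]),
]
for base, add, support, confidence, lift in rules_by_lift(results):
    print(f"{base} -> {add}  support={support:.3f}  confidence={confidence:.2f}  lift={lift:.2f}")
```

Sorting by lift puts the Field Technician / Electrician rule first, since those two jobs co-occur far more often than independence would predict.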
