## Association Rule Learning

*Prepared by:*
**Jude Michael Teves**  
Faculty, Software Technology Department  
College of Computer Studies - De La Salle University

This notebook shows how to perform Association Rule Learning using <a href="https://borgelt.net/pyfim.html">PyFIM</a>.

## Preliminaries

### Import libraries

We will use the `PyFIM` library in the succeeding cells to perform association rule learning. According to its documentation, "PyFIM is an extension module that makes several frequent item set mining implementations available as functions in Python 2.7.x & 3.8.x. Currently apriori, eclat, fpgrowth, sam, relim, carpenter, ista, accretion and apriacc are available as functions, although the interfaces do not offer all of the options of the command line program." If it is not already installed in your environment, you may use either of the following commands in your command line:

```conda install -c conda-forge pyfim``` or
```pip install pyfim```

In [None]:
!pip install pyfim

Collecting pyfim
  Downloading pyfim-6.28.tar.gz (357 kB)
  Preparing metadata (setup.py) ... done
Building wheels for collected packages: pyfim
  Building wheel for pyfim (setup.py) ... done
  Created wheel for pyfim: filename=pyfim-6.28-cp310-cp310-linux_x86_64.whl size=637318 sha256=c4719d98d237638ffb3fd580980e4564debf296675a09525c504ce82d409a5c9
  Stored in directory: /root/.cache/pip/wheels/96/0a/b3/c877bfa85c4cfe1baf3de4a89e1949382be09de5eabe49314f
Successfully built pyfim
Installing collected packages: pyfim
Successfully installed pyfim-6.28


In [None]:
import pandas as pd
from fim import arules, apriori, fpgrowth, eclat

### Prepare the dataset

We create a synthetic dataset containing only a few transactions and items for pedagogical purposes.

In [None]:
transactions = [['milk', 'coke', 'beer'],
                ['milk', 'pepsi', 'juice'],
                ['milk', 'beer'],
                ['coke', 'juice'],
                ['milk', 'pepsi', 'beer'],
                ['milk', 'coke', 'juice', 'beer'],
                ['coke', 'juice', 'beer'],
                ['coke', 'beer']]
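
Before mining, it can help to see how often each individual item occurs. The following is a minimal sketch using only the standard library; the names `item_counts` and `n` are illustrative and are not part of PyFIM.

```python
from collections import Counter

# Same toy transactions as in the cell above.
transactions = [['milk', 'coke', 'beer'],
                ['milk', 'pepsi', 'juice'],
                ['milk', 'beer'],
                ['coke', 'juice'],
                ['milk', 'pepsi', 'beer'],
                ['milk', 'coke', 'juice', 'beer'],
                ['coke', 'juice', 'beer'],
                ['coke', 'beer']]

# Count the number of transactions that contain each item.
# set(t) guards against an item being listed twice in one transaction.
item_counts = Counter(item for t in transactions for item in set(t))
n = len(transactions)

# Print absolute support (count) and relative support (fraction) per item.
for item, count in item_counts.most_common():
    print(f"{item:<6} absolute={count} relative={count / n:.3f}")
```

These per-item counts are the absolute supports of the single-item itemsets, which is what the support threshold in the next section is compared against.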

## Frequent Itemset Mining

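Now that we have the data, we can identify the frequent itemsets, i.e., the sets of items that occur together in at least a minimum number of transactions. We will use PyFIM's implementations of several frequent itemset mining (FIM) algorithms.

For more details about the `apriori` function, see its documentation below: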

In [None]:
help(apriori)

Help on built-in function apriori in module fim:

apriori(...)
    apriori (tracts, target='s', supp=10, zmin=1, zmax=None, report='a',
             eval='x', agg='x', thresh=10, prune=None, algo='b', mode='',
             border=None)
    Find frequent item sets with the Apriori algorithm.
    tracts  transaction database to mine (mandatory)
            The database must be an iterable of transactions;
            each transaction must be an iterable of items;
            each item must be a hashable object.
            If the database is a dictionary, the transactions are
            the keys, the values their (integer) multiplicities.
    target  type of frequent item sets to find     (default: s)
            s/a   sets/all   all     frequent item sets
            c     closed     closed  frequent item sets
            m     maximal    maximal frequent item sets
            g     gens       generators
            r     rules      association rules
    supp    minimum support of an i

### Set Threshold Values

In [None]:
supp = -4 # minimum support of an item set (positive: percentage, negative: absolute number of transactions; default: 10)
report = 'as'
# a - absolute item set support (number of transactions)
# s - relative item set support as a fraction
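
The sign convention for `supp` (positive values are percentages, negative values are absolute transaction counts, as stated in the PyFIM help text) can be sketched as below. The helper `min_count_from_supp` is hypothetical and not part of PyFIM, and rounding a percentage up to the next whole transaction is an assumption about how a fractional threshold would be treated.

```python
import math

def min_count_from_supp(supp, n_transactions):
    """Hypothetical helper: translate a PyFIM-style supp value into
    the minimum number of transactions an itemset must appear in."""
    if supp < 0:
        # Negative values are read as an absolute number of transactions.
        return int(-supp)
    # Positive values are read as a percentage of all transactions
    # (rounding up is an assumption made for this sketch).
    return math.ceil(supp / 100 * n_transactions)

print(min_count_from_supp(-4, 8))  # absolute threshold of 4 transactions
print(min_count_from_supp(50, 8))  # 50% of 8 transactions
```

Under these assumptions, `supp = -4` and `supp = 50` describe the same threshold for the eight transactions in this notebook.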

The output table below lists the frequent itemsets that meet the minimum support threshold we set.

In [None]:
result = apriori(transactions, supp=supp, report=report)
colnames = ['itemset'] + ['support_absolute', 'support_relative']
df_result = pd.DataFrame(result, columns=colnames)
df_result = df_result.sort_values('support_absolute', ascending=False)
print(df_result.shape)
df_result

(6, 3)


| | itemset | support_absolute | support_relative |
|---|---|---|---|
| 5 | (beer,) | 6 | 0.75 |
| 1 | (coke,) | 5 | 0.625 |
| 3 | (milk,) | 5 | 0.625 |
| 0 | (juice,) | 4 | 0.5 |
| 2 | (coke, beer) | 4 | 0.5 |
| 4 | (milk, beer) | 4 | 0.5 |


We can also use another FIM algorithm, such as `fpgrowth`, with the same thresholds.

In [None]:
result = fpgrowth(transactions, supp=supp, report=report)
colnames = ['itemset'] + ['support_absolute', 'support_relative']
df_result = pd.DataFrame(result, columns=colnames)
df_result = df_result.sort_values('support_absolute', ascending=False)
print(df_result.shape)
df_result

(6, 3)


| | itemset | support_absolute | support_relative |
|---|---|---|---|
| 0 | (beer,) | 6 | 0.75 |
| 2 | (milk,) | 5 | 0.625 |
| 4 | (coke,) | 5 | 0.625 |
| 1 | (milk, beer) | 4 | 0.5 |
| 3 | (coke, beer) | 4 | 0.5 |
| 5 | (juice,) | 4 | 0.5 |
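
To check mined itemsets by hand on a dataset this small, a brute-force enumeration is enough. The sketch below is independent of PyFIM and is not how Apriori or FP-growth work internally; it simply tests every possible combination of items. The function name `frequent_itemsets` and its `min_count` parameter are illustrative.

```python
from itertools import combinations

transactions = [['milk', 'coke', 'beer'],
                ['milk', 'pepsi', 'juice'],
                ['milk', 'beer'],
                ['coke', 'juice'],
                ['milk', 'pepsi', 'beer'],
                ['milk', 'coke', 'juice', 'beer'],
                ['coke', 'juice', 'beer'],
                ['coke', 'beer']]

def frequent_itemsets(transactions, min_count):
    """Return {itemset tuple: absolute support} for every itemset that
    appears in at least min_count transactions (exhaustive search)."""
    baskets = [frozenset(t) for t in transactions]
    items = sorted(set().union(*baskets))
    found = {}
    for size in range(1, len(items) + 1):
        for candidate in combinations(items, size):
            # Absolute support: number of baskets containing every item.
            count = sum(1 for b in baskets if b.issuperset(candidate))
            if count >= min_count:
                found[candidate] = count
    return found

itemsets = frequent_itemsets(transactions, min_count=4)
for itemset, count in sorted(itemsets.items(), key=lambda kv: -kv[1]):
    print(itemset, count, count / len(transactions))
```

This approach is exponential in the number of distinct items, so it is only suitable as a sanity check on toy data. The pair `('coke', 'juice')` occurs in only three transactions and therefore does not pass a minimum count of 4.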


## Finding Significant Itemsets

The functions above only return itemsets and their support. To obtain association rules together with their confidence, we can use the `arules` function.
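
The confidence of a rule "antecedent → consequent" is the fraction of transactions containing the antecedent that also contain the consequent, i.e., support(antecedent ∪ consequent) / support(antecedent). A minimal sketch of that definition, using hypothetical helpers `support_count` and `confidence` that are not part of PyFIM:

```python
transactions = [['milk', 'coke', 'beer'],
                ['milk', 'pepsi', 'juice'],
                ['milk', 'beer'],
                ['coke', 'juice'],
                ['milk', 'pepsi', 'beer'],
                ['milk', 'coke', 'juice', 'beer'],
                ['coke', 'juice', 'beer'],
                ['coke', 'beer']]

def support_count(itemset, transactions):
    """Number of transactions that contain every item in itemset."""
    items = set(itemset)
    return sum(1 for t in transactions if items.issubset(t))

def confidence(antecedent, consequent, transactions):
    """Confidence of the rule antecedent -> consequent."""
    body = support_count(antecedent, transactions)
    both = support_count(set(antecedent) | set(consequent), transactions)
    return both / body

print(confidence({'milk'}, {'beer'}, transactions))  # 4 of the 5 milk baskets
print(confidence({'beer'}, {'milk'}, transactions))  # 4 of the 6 beer baskets
```

Note that confidence is not symmetric: swapping antecedent and consequent changes the denominator.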

In [None]:
help(arules)

Help on built-in function arules in module fim:

arules(...)
    arules (tracts, supp=10, conf=80, zmin=1, zmax=None, report='aC',
            eval='x', thresh=10, mode='', appear=None)
    Find association rules (simplified interface).
    tracts  transaction database to mine (mandatory)
            The database must be an iterable of transactions;
            each transaction must be an iterable of items;
            each item must be a hashable object.
            If the database is a dictionary, the transactions are
            the keys, the values their (integer) multiplicities.
    supp    minimum support    of an assoc. rule   (default: 10)
            (positive: percentage, negative: absolute number)
    conf    minimum confidence of an assoc. rule   (default: 80%)
    zmin    minimum number of items per rule       (default: 1)
    zmax    maximum number of items per rule       (default: no limit)
    report  values to report with a assoc. rule    (default: aC)
            a   

In [None]:
supp = -4 # minimum support of an assoc. rule   (default: 10)
conf = 50 # minimum confidence of an assoc. rule (default: 80%)
report = 'bxC'
# b - absolute body/antecedent item set  support (number of transactions)
# x - relative body/antecedent item set  support as a fraction
# C - rule confidence as a percentage

In [None]:
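result = arules(transactions, supp=supp, conf=conf, report=report)
# Each result row is (consequent, antecedent, *report values). With report='bxC',
# the two support columns describe the antecedent (body), not the whole rule.
colnames = ['consequent', 'antecedent'] + ['support_absolute', 'support_relative', 'confidence_pct']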
df_result = pd.DataFrame(result, columns=colnames)
df_result = df_result.sort_values('support_absolute', ascending=False)
print(df_result.shape)
df_result

(15, 5)


| | consequent | antecedent | support_absolute | support_relative | confidence_pct |
|---|---|---|---|---|---|
| 0 | beer | () | 8 | 1.0 | 75.0 |
| 3 | milk | () | 8 | 1.0 | 62.5 |
| 8 | coke | () | 8 | 1.0 | 62.5 |
| 14 | juice | () | 8 | 1.0 | 50.0 |
| 2 | milk | (beer,) | 6 | 0.75 | 66.666667 |
| 5 | coke | (beer,) | 6 | 0.75 | 66.666667 |
| 1 | beer | (milk,) | 5 | 0.625 | 80.0 |
| 4 | beer | (coke,) | 5 | 0.625 | 80.0 |
| 13 | juice | (coke,) | 5 | 0.625 | 60.0 |
| 6 | milk | (coke, beer) | 4 | 0.5 | 50.0 |
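
The rules can also be enumerated by brute force as a cross-check. The sketch below is not PyFIM's algorithm. It assumes, based on the support values reported above, that the support threshold is applied to the antecedent (body) of a rule, that rules have a single-item consequent, and that an empty antecedent is allowed. All names in it are illustrative.

```python
from itertools import combinations

transactions = [['milk', 'coke', 'beer'],
                ['milk', 'pepsi', 'juice'],
                ['milk', 'beer'],
                ['coke', 'juice'],
                ['milk', 'pepsi', 'beer'],
                ['milk', 'coke', 'juice', 'beer'],
                ['coke', 'juice', 'beer'],
                ['coke', 'beer']]

baskets = [frozenset(t) for t in transactions]
items = sorted(set().union(*baskets))

def count(itemset):
    """Number of baskets containing every item in itemset."""
    return sum(1 for b in baskets if b.issuperset(itemset))

min_body_count = 4  # assumption: threshold applies to the antecedent
min_conf = 0.5      # minimum confidence as a fraction (50%)

rules = []
for size in range(0, len(items)):
    for body in combinations(items, size):
        body_count = count(body)
        if body_count < min_body_count:
            continue
        for head in items:
            if head in body:
                continue
            # Confidence = support(body + head) / support(body).
            conf = count(body + (head,)) / body_count
            if conf >= min_conf:
                rules.append((head, body, body_count, conf))

for head, body, body_count, conf in sorted(rules, key=lambda r: -r[2]):
    print(f"{body} -> {head}: body support={body_count}, confidence={conf:.1%}")
```

Under these assumptions the sketch yields 15 rules, which matches the `(15, 5)` shape reported for the `arules` result.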


## Exercise

1. Find the `interesting` items in this dataset. Feel free to use any threshold value.
2. What would you recommend to the owner of a grocery store given these association rules?

## End
<sup>made by **Jude Michael Teves**</sup> <br>
<sup>for comments, corrections, suggestions, please email:</sup><sup> <a href="mailto:judemichaelteves@gmail.com">judemichaelteves@gmail.com</a> or <a href="mailto:jude.teves@dlsu.edu.ph">jude.teves@dlsu.edu.ph</a></sup><br>