In [1]:
library(arules)

Loading required package: Matrix

Attaching package: ‘arules’

The following objects are masked from ‘package:base’:

    abbreviate, write



In [2]:
data("AdultUCI")
## remove attributes not used for mining: the census sampling weight (fnlwgt)
## and education-num, a numeric recoding of education
AdultUCI[["fnlwgt"]] <- NULL
AdultUCI[["education-num"]] <- NULL

## map metric attributes to ordered categories
AdultUCI[["age"]] <- ordered(cut(AdultUCI[["age"]], c(15, 25, 45, 65, 100)),
  labels = c("Young", "Middle-aged", "Senior", "Old"))

AdultUCI[["hours-per-week"]] <- ordered(cut(AdultUCI[["hours-per-week"]],
  c(0, 25, 40, 60, 168)),
  labels = c("Part-time", "Full-time", "Over-time", "Workaholic"))

## capital gain/loss: "None" for 0, then split the positive values at their median
AdultUCI[["capital-gain"]] <- ordered(cut(AdultUCI[["capital-gain"]],
  c(-Inf, 0, median(AdultUCI[["capital-gain"]][AdultUCI[["capital-gain"]] > 0]),
  Inf)), labels = c("None", "Low", "High"))

AdultUCI[["capital-loss"]] <- ordered(cut(AdultUCI[["capital-loss"]],
  c(-Inf, 0, median(AdultUCI[["capital-loss"]][AdultUCI[["capital-loss"]] > 0]),
  Inf)), labels = c("None", "Low", "High"))
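The median-based discretization of capital-gain and capital-loss can be checked on a small vector. The sketch below applies the same cut()/ordered() pattern to made-up values (hypothetical, not taken from AdultUCI). cut() intervals are closed on the right by default, so zeros fall into "None" and a value exactly at the median would land in "Low".

```r
# Made-up capital-gain values (hypothetical, not from AdultUCI)
gain <- c(0, 0, 100, 500, 2000, 0, 7000)

# Breaks: (-Inf, 0] -> "None"; (0, median of positives] -> "Low"; above -> "High"
breaks <- c(-Inf, 0, median(gain[gain > 0]), Inf)
gain.cat <- ordered(cut(gain, breaks), labels = c("None", "Low", "High"))

print(breaks)
print(table(gain.cat))
```

Here the median of the positive values is 1250, so 100 and 500 become "Low" and 2000 and 7000 become "High".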

In [21]:
## optional: drop the capital-gain/capital-loss columns (disabled)
#AdultUCI[["capital-gain"]] <- NULL
#AdultUCI[["capital-loss"]] <- NULL

In [3]:
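## apriori() mines rules by default (target = "rules"), so this returns rules, not itemsets
rules.all <- apriori(AdultUCI, parameter = list(supp = 0.2))
## keep the 300 rules with the highest support
rules.top300 <- sort(rules.all, by = "support")[1:300]
inspect(rules.top300)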

Apriori

Parameter specification:
 confidence minval smax arem  aval originalSupport maxtime support minlen
        0.8    0.1    1 none FALSE            TRUE       5     0.2      1
 maxlen target   ext
     10  rules FALSE

Algorithmic control:
 filter tree heap memopt load sort verbose
    0.1 TRUE TRUE  FALSE TRUE    2    TRUE

Absolute minimum support count: 9768 

set item appearances ...[0 item(s)] done [0.00s].
set transactions ...[115 item(s), 48842 transaction(s)] done [0.03s].
sorting and recoding items ... [18 item(s)] done [0.01s].
creating transaction tree ... done [0.03s].
checking subsets of size 1 2 3 4 5 6 7 done [0.01s].
writing ... [1306 rule(s)] done [0.00s].
creating S4 object  ... done [0.01s].
      lhs                                    rhs                                   support confidence      lift
[1]   {}                                  => {capital-loss=None}                 0.9532779  0.9532779 1.0000000
[2]   {}                                  => {capi
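The support, confidence and lift columns follow the usual definitions: supp(X => Y) is the fraction of transactions containing every item of X and Y, conf(X => Y) = supp(X and Y) / supp(X), and lift(X => Y) = conf(X => Y) / supp(Y). An empty left-hand side has support 1, so its confidence equals its support and its lift is 1, which is what rule [1] shows (0.9532779 twice, lift 1.0000000). The base-R sketch below computes these measures by hand on a hypothetical five-transaction data set; it does not call arules.

```r
# Hypothetical toy data: 5 transactions (rows) over 3 items (columns)
trans <- matrix(c(
  TRUE,  TRUE,  FALSE,
  TRUE,  TRUE,  TRUE,
  TRUE,  FALSE, TRUE,
  FALSE, TRUE,  TRUE,
  TRUE,  TRUE,  FALSE
), ncol = 3, byrow = TRUE, dimnames = list(NULL, c("A", "B", "C")))

# support(X): fraction of transactions that contain every item in X
support <- function(items) {
  mean(rowSums(trans[, items, drop = FALSE]) == length(items))
}

# Measures for the rule {A} => {B}
supp.rule <- support(c("A", "B"))        # supp(A and B)
conf.rule <- supp.rule / support("A")    # supp(A and B) / supp(A)
lift.rule <- conf.rule / support("B")    # conf / supp(B)
print(c(support = supp.rule, confidence = conf.rule, lift = lift.rule))

# Empty LHS: supp({}) = 1, so {} => {B} has confidence supp(B) and lift 1
conf.empty <- support("B") / support(character(0))
lift.empty <- conf.empty / support("B")
print(c(confidence = conf.empty, lift = lift.empty))
```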

In [26]:
## Mine frequent itemsets with Eclat.
fsets <- eclat(AdultUCI, parameter = list(supp = 0.2))

## Display the 5 itemsets with the highest support.
fsets.top5 <- sort(fsets)[1:5]
inspect(fsets.top5)

Eclat

parameter specification:
 tidLists support minlen maxlen            target   ext
    FALSE     0.2      1     10 frequent itemsets FALSE

algorithmic control:
 sparse sort verbose
      7   -2    TRUE

Absolute minimum support count: 9768 

create itemset ... 
set transactions ...[109 item(s), 48842 transaction(s)] done [0.04s].
sorting and recoding items ... [16 item(s)] done [0.01s].
creating bit matrix ... [16 row(s), 48842 column(s)] done [0.00s].
writing  ... [179 set(s)] done [0.00s].
Creating S4 object  ... done [0.00s].
    items                                     support  
[1] {native-country=United-States}            0.8974243
[2] {race=White}                              0.8550428
[3] {race=White,native-country=United-States} 0.7881127
[4] {workclass=Private}                       0.6941976
[5] {sex=Male}                                0.6684820
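Eclat counts support on a vertical layout; the log's "creating bit matrix ... [16 row(s), 48842 column(s)]" step builds one row per frequent item. Each item is mapped to the set of transaction IDs that contain it, and the support of an itemset is the size of the intersection of its items' tid-lists divided by the number of transactions. The sketch below is a minimal, unoptimized depth-first version of that idea in base R, using integer tid-lists instead of bit vectors and hypothetical toy transactions; eclat.sketch is an illustrative name, not part of arules, whose eclat() is a compiled implementation.

```r
# Hypothetical toy transactions (not from AdultUCI)
transactions <- list(
  c("A", "B"),
  c("A", "B", "C"),
  c("A", "C"),
  c("B", "C"),
  c("A", "B")
)
n <- length(transactions)
items <- sort(unique(unlist(transactions)))

# Vertical layout: item -> IDs of the transactions that contain it
tidlists <- lapply(setNames(items, items), function(it)
  which(vapply(transactions, function(tr) it %in% tr, logical(1))))

# Depth-first Eclat sketch: extend a prefix with each later item, intersect
# tid-lists, and keep the extension if its relative support reaches minsupp
eclat.sketch <- function(tidlists, n, minsupp) {
  result <- list()
  recurse <- function(prefix, prefix.tids, candidates) {
    for (k in seq_along(candidates)) {
      itemset <- c(prefix, names(candidates)[k])
      tids <- intersect(prefix.tids, candidates[[k]])
      supp <- length(tids) / n
      if (supp >= minsupp) {
        result[[paste(itemset, collapse = ",")]] <<- supp
        # only items after position k may extend this itemset (avoids duplicates)
        recurse(itemset, tids, candidates[-seq_len(k)])
      }
    }
  }
  recurse(character(0), seq_len(n), tidlists)
  unlist(result)
}

fsets.toy <- eclat.sketch(tidlists, n, minsupp = 0.3)
print(fsets.toy)
```

Because support can only shrink as an itemset grows, the search stops extending a branch as soon as an intersection falls below minsupp; here {A, B, C} (support 0.2) is never reported.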


In [27]:

## Get the itemsets as a list
as(items(fsets.top5), "list")

## Get the itemsets as a binary (logical) matrix: one row per itemset, one column per item
as(items(fsets.top5), "matrix")

## Get the itemsets as a sparse matrix (an ngCMatrix from package Matrix).
## Note: for efficiency, this matrix is transposed: one row per item, one column per itemset
as(items(fsets.top5), "ngCMatrix")

age=Young,age=Middle-aged,age=Senior,age=Old,workclass=Federal-gov,workclass=Local-gov,workclass=Never-worked,workclass=Private,workclass=Self-emp-inc,workclass=Self-emp-not-inc,...,native-country=Scotland,native-country=South,native-country=Taiwan,native-country=Thailand,native-country=Trinadad&Tobago,native-country=United-States,native-country=Vietnam,native-country=Yugoslavia,income=small,income=large
False,False,False,False,False,False,False,False,False,False,...,False,False,False,False,False,True,False,False,False,False
False,False,False,False,False,False,False,False,False,False,...,False,False,False,False,False,False,False,False,False,False
False,False,False,False,False,False,False,False,False,False,...,False,False,False,False,False,True,False,False,False,False
False,False,False,False,False,False,False,True,False,False,...,False,False,False,False,False,False,False,False,False,False
False,False,False,False,False,False,False,False,False,False,...,False,False,False,False,False,False,False,False,False,False


109 x 5 sparse Matrix of class "ngCMatrix"
                                                   
age=Young                                 . . . . .
age=Middle-aged                           . . . . .
age=Senior                                . . . . .
age=Old                                   . . . . .
workclass=Federal-gov                     . . . . .
workclass=Local-gov                       . . . . .
workclass=Never-worked                    . . . . .
workclass=Private                         . . . | .
workclass=Self-emp-inc                    . . . . .
workclass=Self-emp-not-inc                . . . . .
workclass=State-gov                       . . . . .
workclass=Without-pay                     . . . . .
education=Preschool                       . . . . .
education=1st-4th                         . . . . .
education=5th-6th                         . . . . .
education=7th-8th                         . . . . .
education=9th                             . . . . .
education=10th       
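Because the sparse form is transposed (items as rows, itemsets as columns), column sums give itemset sizes, row sums count how many itemsets contain each item, and t() restores the itemset-per-row orientation of the dense "matrix" coercion. The sketch builds a small hypothetical ngCMatrix directly with Matrix::sparseMatrix() instead of going through arules, so the item names and itemsets are made up.

```r
library(Matrix)

# Hypothetical stand-in for as(items(...), "ngCMatrix"):
# 4 items (rows) x 3 itemsets (columns); the itemsets are {a}, {b}, {a, c}
m <- sparseMatrix(i = c(1, 2, 1, 3), j = c(1, 2, 3, 3), dims = c(4, 3),
                  dimnames = list(c("a", "b", "c", "d"), NULL))

colSums(m)   # itemset sizes, one value per column
rowSums(m)   # number of itemsets containing each item
t(m)         # itemsets as rows, items as columns
```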

In [11]:
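## rules with at least 2 items in total (so no empty LHS), support >= 0.3, confidence >= 0.8
rules <- apriori(AdultUCI, parameter = list(minlen = 2, supp = 0.3, conf = 0.8))

## sort by lift, highest first
rules.sorted <- sort(rules, by = "lift")
inspect(rules.sorted)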

Apriori

Parameter specification:
 confidence minval smax arem  aval originalSupport maxtime support minlen
        0.8    0.1    1 none FALSE            TRUE       5     0.3      2
 maxlen target   ext
     10  rules FALSE

Algorithmic control:
 filter tree heap memopt load sort verbose
    0.1 TRUE TRUE  FALSE TRUE    2    TRUE

Absolute minimum support count: 14652 

set item appearances ...[0 item(s)] done [0.00s].
set transactions ...[115 item(s), 48842 transaction(s)] done [0.05s].
sorting and recoding items ... [14 item(s)] done [0.01s].
creating transaction tree ... done [0.03s].
checking subsets of size 1 2 3 4 5 6 done [0.00s].
writing ... [504 rule(s)] done [0.00s].
creating S4 object  ... done [0.01s].
      lhs                                    rhs                                   support confidence      lift
[1]   {marital-status=Married-civ-spouse,                                                                      
       race=White,                                  

In [12]:
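## A rule is redundant if an earlier rule (higher or equal lift) is a subset of it.
## is.subset() marks every rule as a subset of itself, so mask the diagonal and lower triangle first.
subset.matrix <- as.matrix(is.subset(rules.sorted, rules.sorted))
subset.matrix[lower.tri(subset.matrix, diag = TRUE)] <- FALSE
redundant <- colSums(subset.matrix) >= 1
rules.pruned <- rules.sorted[!redundant]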
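The pruning step treats a rule as redundant when an earlier, higher-lift rule's items (lhs and rhs together) form a subset of its items. Every rule is trivially a subset of itself, so the diagonal, and to respect the lift ordering the lower triangle, must be masked before testing columns; otherwise apply(subset.matrix, 2, any) flags every rule and the pruned set is empty. The base-R sketch below shows this on four hypothetical rules, each written as the combined item set of its lhs and rhs and assumed to be already sorted by decreasing lift.

```r
# Hypothetical rules, assumed sorted by decreasing lift; items = lhs and rhs combined
rule.items <- list(
  r1 = c("A", "B"),
  r2 = c("A", "B", "C"),
  r3 = c("B", "C"),
  r4 = c("A", "C", "D")
)
k <- length(rule.items)

# subset.matrix[i, j] is TRUE when rule i's items are a subset of rule j's items
subset.matrix <- outer(seq_len(k), seq_len(k), Vectorize(function(i, j)
  all(rule.items[[i]] %in% rule.items[[j]])))
dimnames(subset.matrix) <- list(names(rule.items), names(rule.items))

# Without masking, each column has TRUE on the diagonal, so every rule is flagged
naive <- apply(subset.matrix, 2, any)
print(naive)

# Keep only comparisons with strictly earlier (higher-lift) rules
subset.matrix[lower.tri(subset.matrix, diag = TRUE)] <- FALSE
redundant <- colSums(subset.matrix) >= 1
print(names(rule.items)[!redundant])
```

Only r2 is dropped, because the earlier rule r1 is a subset of it. Recent arules versions also provide an is.redundant() function with its own, confidence-based definition of redundancy; check its documentation before substituting it for this check.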