# Lecture 7.3 - Arules in R the `dplyr` way

### Review - Association Rules

Consider the rule $\{butter\} \rightarrow \{whole.milk\}$

  * $Support(\textrm{butter and milk}) = \frac{\textrm{# butter and milk transactions}}{\textrm{# total transactions}}$ 
  * $Support(\textrm{butter}) = \frac{\textrm{# butter transactions}}{\textrm{# total transactions}}$ 
  * $Confidence= \frac{Support(\textrm{butter and milk})}{Support(\textrm{butter})}$ 
  * $Lift= \frac{Confidence}{Support(\textrm{milk})}$ 
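
As a quick check of these definitions, the sketch below plugs hypothetical counts into the formulas. The counts (8 transactions, 4 with butter, 4 with milk, 3 with both) and all variable names are made up for illustration and are not taken from the Groceries data.

```r
# Hypothetical counts, chosen only to illustrate the formulas
n_total  <- 8   # total transactions
n_butter <- 4   # transactions containing butter
n_milk   <- 4   # transactions containing milk
n_both   <- 3   # transactions containing butter and milk

support_both   <- n_both / n_total
support_butter <- n_butter / n_total
support_milk   <- n_milk / n_total

rule_confidence <- support_both / support_butter   # 0.75
rule_lift       <- rule_confidence / support_milk  # 1.5

c(confidence = rule_confidence, lift = rule_lift)
```

A lift above 1 means milk shows up more often in butter transactions than in transactions overall.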
  

### Small example:  Compute the confidence and lift of {bread} -> {milk} 


<img width="350" src="https://github.com/WSU-DataScience/DSCI210_module_7_association_rules/raw/main/img/small_example.png">

Use `dplyr` to:  

  * mutate to compute joint transactions 
  * summarize to compute counts and percents 
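
The two steps listed above might look like the following sketch. The `toy` data frame is hypothetical (its values are not the ones in the image), and the column names `bread_and_milk`, `n_bread`, and `n_both` are introduced here only for illustration.

```r
library(dplyr)

# Hypothetical 0/1 data; these are NOT the values from the image above
toy <- data.frame(bread = c(1, 1, 0, 1, 0),
                  milk  = c(1, 0, 1, 1, 0))

toy_stats <- toy %>%
  # mutate: 1 only when both items are in the transaction
  mutate(bread_and_milk = bread * milk) %>%
  # summarize: counts first, then percents
  summarize(n_bread = sum(bread),
            n_both = sum(bread_and_milk),
            support_bread = n_bread / n(),
            support_milk = sum(milk) / n(),
            support_both = n_both / n()) %>%
  mutate(confidence = support_both / support_bread,
         lift = confidence / support_milk)

toy_stats
```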
  

### New example: investigate rule {butter} -> {milk} with `dplyr`

In [1]:
library(dplyr)


Attaching package: ‘dplyr’

The following objects are masked from ‘package:stats’:

    filter, lag

The following objects are masked from ‘package:base’:

    intersect, setdiff, setequal, union



In [2]:
groceries <- read.csv('https://github.com/WSU-DataScience/DSCI210_module_7_association_rules/raw/main/data/Groceries.csv')
head(groceries[, 1:4])

frankfurter,sausage,liver.loaf,ham
0,0,0,0
0,0,0,0
0,0,0,0
0,0,0,0
0,0,0,0
0,0,0,0


In [3]:
butter_milk <- groceries %>%
                select(butter, whole.milk)
head(butter_milk)

butter,whole.milk
0,0
0,0
0,1
0,0
0,1
1,1


#### Support(Butter): 2 steps

In [4]:
butter_milk %>%
  # n() must be called inside summarize(); afterwards only one row remains
  summarize(Nbutter = sum(butter), N = n()) %>% 
  mutate(support_butter = Nbutter/N)

Nbutter,N,support_butter
545,9835,0.05541434
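
A point worth checking when the computation is split into steps: `n()` counts the rows of the data frame at that point in the pipe, so after `summarize()` has collapsed the data to one row it returns 1. A minimal sketch with a hypothetical four-row data frame (the names `toy`, `rows_seen`, and `N` are illustrative):

```r
library(dplyr)

toy <- data.frame(butter = c(0, 1, 0, 1))   # hypothetical data, 4 rows

# n() is evaluated after summarize(), so it sees the one-row summary
wrong <- toy %>%
  summarize(Nbutter = sum(butter)) %>%
  mutate(rows_seen = n())

# Store the row count inside summarize(), then divide in the next step
right <- toy %>%
  summarize(Nbutter = sum(butter), N = n()) %>%
  mutate(support_butter = Nbutter / N)

wrong$rows_seen        # 1, not 4
right$support_butter   # 0.5
```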


#### Support(Butter): all at once

In [5]:
butter_milk %>%
  summarize(support_butter = sum(butter)/n())

support_butter
0.05541434
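
Because the item columns are 0/1 indicators, `sum(x)/n()` is the same quantity as `mean(x)`. The sketch below checks this on a small hypothetical data frame; `toy` and the two output column names are illustrative.

```r
library(dplyr)

# Hypothetical 0/1 column: 2 of 8 transactions contain butter
toy <- data.frame(butter = c(0, 1, 0, 0, 1, 0, 0, 0))

both_ways <- toy %>%
  summarize(support_sum  = sum(butter) / n(),
            support_mean = mean(butter))

both_ways   # both columns equal 0.25
```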


#### Support of whole.milk

In [6]:
butter_milk %>%
  summarize(support_milk = sum(whole.milk)/n())

support_milk
0.255516


#### Support of {Butter and Milk}


Why `butter * whole.milk`? 

In [7]:
butter_milk %>%
  summarize(support_rule = sum(butter*whole.milk)/n())

support_rule
0.02755465
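
One way to see why the product works: for 0/1 indicators, `butter * whole.milk` is 1 only when both are 1, so it behaves like a logical AND, and summing it counts the joint transactions. A small sketch over all four possible combinations (the vectors are hypothetical, not rows from the data):

```r
# Hypothetical 0/1 indicators covering all four combinations
butter     <- c(0, 0, 1, 1)
whole.milk <- c(0, 1, 0, 1)

product  <- butter * whole.milk                         # 0 0 0 1
both_and <- as.numeric(butter == 1 & whole.milk == 1)   # 0 0 0 1

identical(product, both_and)   # TRUE
```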


#### All together now (+ confidence and lift)

In [8]:
(groceries 
 %>% summarize(support_milk = sum(whole.milk)/n(),
               support_butter = sum(butter)/n(),
               support_rule = sum(butter*whole.milk)/n())
 %>% mutate(confidence = support_rule/support_butter,
            lift = confidence/support_milk))

support_milk,support_butter,support_rule,confidence,lift
0.255516,0.05541434,0.02755465,0.4972477,1.946053

Reading the result: about 49.7% of transactions containing butter also contain whole milk (confidence), which is roughly 1.95 times the 25.6% share of all transactions that contain whole milk (lift).


#### Notes

* Each value must be computed before the step that uses it
    * Supports before confidence
    * Confidence before lift
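
The ordering matters because `mutate()` evaluates its arguments from left to right. The sketch below uses hypothetical support values (the `supports` data frame is made up for illustration) and assumes no object named `confidence` already exists in the R session, since otherwise the second call would pick that object up instead of failing.

```r
library(dplyr)

# Hypothetical supports, for illustration only
supports <- data.frame(support_milk = 0.5,
                       support_butter = 0.5,
                       support_rule = 0.375)

# Works: confidence is defined before lift uses it
ok <- supports %>%
  mutate(confidence = support_rule / support_butter,
         lift = confidence / support_milk)

# Fails: lift refers to confidence before it exists
bad <- tryCatch(
  supports %>%
    mutate(lift = confidence / support_milk,
           confidence = support_rule / support_butter),
  error = function(e) "error: confidence not found"
)

ok
bad
```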

## <font color="red"> Exercise 7.3.1 </font>

Compute and interpret all interesting statistics for the rule $\{domestic\,eggs\}\rightarrow\{ham\}$

In [9]:
# Your code here

> Your interpretations here