
**Association Rules**

In this section, you will create association rules that identify relationships between variables in the dataset. You are provided with a separate dataset made up of groups of items (transactions), in which items can be associated with one another. As in the other sections, you are also required to provide insights from your analysis.
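The rules in this notebook are scored with support, confidence and lift. As a minimal sketch of what those three numbers mean, the plain Python cell below (no `%%R` magic) computes them by hand. The baskets are hypothetical toy data and the helper names `support`, `confidence` and `lift` are illustrative; they are not part of arules.

```python
# Toy baskets (hypothetical data, not taken from the supermarket file).
baskets = [
    {"pasta", "mushroom cream sauce", "escalope"},
    {"pasta", "escalope"},
    {"pasta", "milk"},
    {"milk", "eggs"},
    {"escalope", "eggs"},
]

def support(items, baskets):
    """Fraction of baskets that contain every item in `items`."""
    items = set(items)
    return sum(items <= b for b in baskets) / len(baskets)

def confidence(lhs, rhs, baskets):
    """Share of baskets containing lhs that also contain rhs."""
    return support(set(lhs) | set(rhs), baskets) / support(lhs, baskets)

def lift(lhs, rhs, baskets):
    """Confidence divided by the baseline support of rhs."""
    return confidence(lhs, rhs, baskets) / support(rhs, baskets)

rule_lhs, rule_rhs = {"pasta"}, {"escalope"}
print("support   ", support(rule_lhs | rule_rhs, baskets))
print("confidence", confidence(rule_lhs, rule_rhs, baskets))
print("lift      ", lift(rule_lhs, rule_rhs, baskets))
```

For the toy rule {pasta} => {escalope}, support is 0.4, confidence is about 0.67 and lift is about 1.11. A lift above 1 means the two sides appear together more often than they would if they were independent.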


**Data Analysis Objectives**

Perform association analysis on item sales data to identify relationships between items bought together.


**Understanding context**

Carrefour Kenya seeks to undertake a project that will inform the marketing department on the most relevant marketing strategies that will result in the highest number of sales (total price including tax). Association analysis of sales data would help identify customer purchase trends and behaviour.


**Experimental Design**

*   Problem definition
*   Data loading
*   Exploratory data analysis
*   Implementation of the solution
*   Summary of findings

In [1]:
%load_ext rpy2.ipython

In [None]:
%%R
# Install and load the required packages
install.packages('arules')
library(arules)
install.packages('arulesViz')
library(arulesViz)
install.packages("dplyr")
library(dplyr)

In [8]:
%%R
# Loading the data.
sales <- read.transactions('Supermarket_Sales_Dataset II.csv', sep = ',', header = TRUE)
sales
# There are 119 unique items.

transactions in sparse format with
 7500 transactions (rows) and
 119 items (columns)


In [10]:
%%R
# a statistical summary of the data 
summary(sales)
# this shows the frequency of the items bought.

transactions as itemMatrix in sparse format with
 7500 rows (elements/itemsets/transactions) and
 119 columns (items) and a density of 0.03287171 

most frequent items:
mineral water          eggs     spaghetti  french fries     chocolate 
         1787          1348          1306          1282          1229 
      (Other) 
        22386 

element (itemset/transaction) length distribution:
sizes
   1    2    3    4    5    6    7    8    9   10   11   12   13   14   15   16 
1754 1358 1044  816  667  493  391  324  259  139  102   67   40   22   17    4 
  18   19 
   1    2 

   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
  1.000   2.000   3.000   3.912   5.000  19.000 

includes extended item information - examples:
             labels
1           almonds
2 antioxydant juice
3         asparagus


In [19]:
%%R
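# Item support as a percentage of transactions, sorted in ascending order
support <- sort(itemFrequency(sales) * 100)
# Share (%) of the 119 items whose support is below 0.1% of transactions
print(length(support[support < 0.1])/ 119 * 100)
# Share (%) of the 119 items whose support is below 0.2% of transactions
print(length(support[support < 0.2])/ 119 * 100)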

[1] 2.521008
[1] 3.361345
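The two percentages above are easier to read as item counts. The plain Python cell below is a small arithmetic sketch that converts them back; the helper name `items_below` is illustrative, and the inputs are the figures printed by the R cell.

```python
n_items = 119  # items (columns) reported by read.transactions

def items_below(share_percent, n_items):
    """Convert a percentage of items back into a whole number of items."""
    return round(share_percent / 100 * n_items)

below_01 = items_below(2.521008, n_items)  # items with support below 0.1%
below_02 = items_below(3.361345, n_items)  # items with support below 0.2%

print("items below 0.1% support:", below_01)
print("items below 0.2% support:", below_02)
print("items left at 0.1%:", n_items - below_01)
print("items left at 0.2%:", n_items - below_02)
```

This gives 3 and 4 rare items, leaving 116 and 115 items respectively. Those figures match the "sorting and recoding items" lines that `apriori` logs further down for minimum supports of 0.001 and 0.002.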


In [42]:
%%R
# Rule set 1 (r1)
# Minimum support = 0.002 and minimum confidence = 0.6
r1 <- apriori(sales, parameter = list(supp = 0.002, conf = 0.6, maxlen=10))

Apriori

Parameter specification:
 confidence minval smax arem  aval originalSupport maxtime support minlen
        0.6    0.1    1 none FALSE            TRUE       5   0.002      1
 maxlen target  ext
     10  rules TRUE

Algorithmic control:
 filter tree heap memopt load sort verbose
    0.1 TRUE TRUE  FALSE TRUE    2    TRUE

Absolute minimum support count: 15 

set item appearances ...[0 item(s)] done [0.00s].
set transactions ...[119 item(s), 7500 transaction(s)] done [0.01s].
sorting and recoding items ... [115 item(s)] done [0.00s].
creating transaction tree ... done [0.00s].
checking subsets of size 1 2 3 4 5 done [0.01s].
writing ... [57 rule(s)] done [0.00s].
creating S4 object  ... done [0.00s].


In [43]:
%%R
# Summary of rule set r1
summary(r1)

# Minimum support 0.002
# Minimum confidence 0.6
# 57 rules were generated

set of 57 rules

rule length distribution (lhs + rhs):sizes
 3  4  5 
22 32  3 

   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
  3.000   3.000   4.000   3.667   4.000   5.000 

summary of quality measures:
    support           confidence        coverage             lift       
 Min.   :0.002000   Min.   :0.6000   Min.   :0.002533   Min.   : 2.518  
 1st Qu.:0.002133   1st Qu.:0.6122   1st Qu.:0.003200   1st Qu.: 2.623  
 Median :0.002267   Median :0.6333   Median :0.003467   Median : 2.744  
 Mean   :0.002538   Mean   :0.6588   Mean   :0.003897   Mean   : 3.301  
 3rd Qu.:0.002800   3rd Qu.:0.6818   3rd Qu.:0.004133   3rd Qu.: 3.446  
 Max.   :0.005067   Max.   :0.9500   Max.   :0.008000   Max.   :11.975  
     count      
 Min.   :15.00  
 1st Qu.:16.00  
 Median :17.00  
 Mean   :19.04  
 3rd Qu.:21.00  
 Max.   :38.00  

mining info:
  data ntransactions support confidence
 sales          7500   0.002        0.6
                                                                   

In [44]:
%%R
# Top 10 rules in r1, sorted by descending confidence
inspect(sort(r1, by="confidence", decreasing = TRUE)[1:10])

     lhs                        rhs                 support confidence    coverage      lift count
[1]  {mushroom cream sauce,                                                                       
      pasta}                 => {escalope}      0.002533333  0.9500000 0.002666667 11.974790    19
[2]  {frozen vegetables,                                                                          
      olive oil,                                                                                  
      tomatoes}              => {spaghetti}     0.002133333  0.8421053 0.002533333  4.835980    16
[3]  {pancakes,                                                                                   
      soup,                                                                                       
      spaghetti}             => {mineral water} 0.002266667  0.7727273 0.002933333  3.243119    17
[4]  {frozen vegetables,                                                                          
      milk
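The quality columns in this table can be recomputed from raw counts. The plain Python cell below is a hedged sketch for rule [3] ({pancakes, soup, spaghetti} => {mineral water}). It assumes the left-hand side occurs in 22 baskets, a value inferred from the coverage column (0.002933333 × 7500) rather than read from the data, and it takes the mineral water count of 1787 from `summary(sales)`.

```python
n_transactions = 7500
count_rule = 17    # baskets containing the lhs items and mineral water (count column)
count_lhs = 22     # assumption: coverage 0.002933333 * 7500, rounded
count_rhs = 1787   # mineral water frequency from summary(sales)

support = count_rule / n_transactions            # share of all baskets matching the rule
coverage = count_lhs / n_transactions            # share of all baskets matching the lhs
confidence = count_rule / count_lhs              # support / coverage
lift = confidence / (count_rhs / n_transactions) # confidence relative to rhs popularity

print(f"support    {support:.9f}")
print(f"confidence {confidence:.7f}")
print(f"coverage   {coverage:.9f}")
print(f"lift       {lift:.5f}")
```

The results agree with the printed row to the displayed precision, which suggests reading lift as "how many times more likely mineral water is in these baskets than in an average basket".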

In [45]:
%%R
# Support = 0.001 and Confidence = 0.8
r2 <- apriori(sales, parameter = list(supp = 0.001, conf = 0.8, maxlen=10))

Apriori

Parameter specification:
 confidence minval smax arem  aval originalSupport maxtime support minlen
        0.8    0.1    1 none FALSE            TRUE       5   0.001      1
 maxlen target  ext
     10  rules TRUE

Algorithmic control:
 filter tree heap memopt load sort verbose
    0.1 TRUE TRUE  FALSE TRUE    2    TRUE

Absolute minimum support count: 7 

set item appearances ...[0 item(s)] done [0.00s].
set transactions ...[119 item(s), 7500 transaction(s)] done [0.01s].
sorting and recoding items ... [116 item(s)] done [0.00s].
creating transaction tree ... done [0.00s].
checking subsets of size 1 2 3 4 5 6 done [0.02s].
writing ... [73 rule(s)] done [0.00s].
creating S4 object  ... done [0.00s].
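The "Absolute minimum support count" line in the log is the support threshold expressed in transactions. The plain Python cell below is a small arithmetic sketch; `min_count` is an illustrative helper, not an arules function.

```python
import math

n_transactions = 7500

def min_count(min_support, n):
    """Smallest whole number of transactions whose share reaches min_support."""
    # The tiny epsilon guards against floating-point noise such as 15.000000000000002.
    return math.ceil(min_support * n - 1e-9)

for s in (0.002, 0.001):
    print(s, "->", s * n_transactions, "transactions, so at least",
          min_count(s, n_transactions))
```

For 0.002 the threshold is exactly 15 transactions, as logged for `r1`. For 0.001 it is 7.5, so a rule needs at least 8 transactions, which is consistent with the minimum `count` of 8 in `summary(r2)`. The log prints 7, which looks like 7.5 truncated to a whole number; that reading is an assumption about how the log line is formatted.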


In [46]:
%%R
# Summary of rule set r2
summary(r2)

# Minimum support 0.001
# Minimum confidence 0.8
# 73 rules were generated

set of 73 rules

rule length distribution (lhs + rhs):sizes
 3  4  5  6 
14 42 16  1 

   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
  3.000   4.000   4.000   4.055   4.000   6.000 

summary of quality measures:
    support           confidence        coverage             lift       
 Min.   :0.001067   Min.   :0.8000   Min.   :0.001067   Min.   : 3.358  
 1st Qu.:0.001067   1st Qu.:0.8000   1st Qu.:0.001333   1st Qu.: 3.434  
 Median :0.001200   Median :0.8333   Median :0.001333   Median : 3.815  
 Mean   :0.001258   Mean   :0.8498   Mean   :0.001483   Mean   : 4.839  
 3rd Qu.:0.001333   3rd Qu.:0.8889   3rd Qu.:0.001600   3rd Qu.: 4.882  
 Max.   :0.002533   Max.   :1.0000   Max.   :0.002667   Max.   :12.744  
     count       
 Min.   : 8.000  
 1st Qu.: 8.000  
 Median : 9.000  
 Mean   : 9.438  
 3rd Qu.:10.000  
 Max.   :19.000  

mining info:
  data ntransactions support confidence
 sales          7500   0.001        0.8
                                                      

In [47]:
%%R
# Top 10 rules in r2, sorted by descending confidence
inspect(sort(r2, by='confidence', decreasing = TRUE)[1:10])

     lhs                        rhs                 support confidence    coverage      lift count
[1]  {french fries,                                                                               
      mushroom cream sauce,                                                                       
      pasta}                 => {escalope}      0.001066667  1.0000000 0.001066667 12.605042     8
[2]  {ground beef,                                                                                
      light cream,                                                                                
      olive oil}             => {mineral water} 0.001200000  1.0000000 0.001200000  4.196978     9
[3]  {cake,                                                                                       
      meatballs,                                                                                  
      mineral water}         => {milk}          0.001066667  1.0000000 0.001066667  7.716049     8
[4]  {cake
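The three complete rows printed above all have a confidence of 1.0, so confidence alone cannot rank them. As a hedged illustration, the plain Python cell below re-orders those rows by lift and backs out how common each right-hand-side item is, using lift = confidence / support(rhs). The values are copied from the output; nothing is recomputed from the data.

```python
# (lhs, rhs, confidence, lift) copied from the three complete rows printed above.
rules = [
    (("french fries", "mushroom cream sauce", "pasta"), "escalope", 1.0, 12.605042),
    (("ground beef", "light cream", "olive oil"), "mineral water", 1.0, 4.196978),
    (("cake", "meatballs", "mineral water"), "milk", 1.0, 7.716049),
]

n_transactions = 7500
by_lift = sorted(rules, key=lambda r: r[3], reverse=True)

for lhs, rhs, conf, lift in by_lift:
    # Rearranging lift = confidence / support(rhs) gives the consequent's own support.
    implied_rhs_count = round(conf / lift * n_transactions)
    print(f"{rhs:<14} lift={lift:<10} implied baskets with {rhs}: {implied_rhs_count}")
```

The implied count for mineral water is 1787, the same figure reported by `summary(sales)`. Because mineral water is already in almost a quarter of all baskets, a rule that predicts it is less informative than one that predicts escalope, even at equal confidence.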

In [48]:
%%R
# Shorter rules: same thresholds as r2, but at most 4 items per rule (maxlen = 4)
short_rules <- apriori(sales, parameter = list(supp = 0.001, conf = 0.8, maxlen=4))

Apriori

Parameter specification:
 confidence minval smax arem  aval originalSupport maxtime support minlen
        0.8    0.1    1 none FALSE            TRUE       5   0.001      1
 maxlen target  ext
      4  rules TRUE

Algorithmic control:
 filter tree heap memopt load sort verbose
    0.1 TRUE TRUE  FALSE TRUE    2    TRUE

Absolute minimum support count: 7 

set item appearances ...[0 item(s)] done [0.00s].
set transactions ...[119 item(s), 7500 transaction(s)] done [0.01s].
sorting and recoding items ... [116 item(s)] done [0.00s].
creating transaction tree ... done [0.00s].
checking subsets of size 1 2 3 4 done [0.01s].
writing ... [56 rule(s)] done [0.00s].
creating S4 object  ... done [0.00s].


In [49]:
%%R
# Top 20 short rules, sorted by descending confidence
inspect(sort(short_rules, by='confidence', decreasing = TRUE)[1:20])

     lhs                        rhs                 support confidence    coverage      lift count
[1]  {french fries,                                                                               
      mushroom cream sauce,                                                                       
      pasta}                 => {escalope}      0.001066667  1.0000000 0.001066667 12.605042     8
[2]  {ground beef,                                                                                
      light cream,                                                                                
      olive oil}             => {mineral water} 0.001200000  1.0000000 0.001200000  4.196978     9
[3]  {cake,                                                                                       
      meatballs,                                                                                  
      mineral water}         => {milk}          0.001066667  1.0000000 0.001066667  7.716049     8
[4]  {cake

In [50]:
%%R
# Mine only rules with spaghetti on the right-hand side (consequent)
spaghetti_ruleset <- apriori(sales, parameter = list(supp=0.001, conf=0.8), appearance = list(default="lhs", rhs="spaghetti"))

Apriori

Parameter specification:
 confidence minval smax arem  aval originalSupport maxtime support minlen
        0.8    0.1    1 none FALSE            TRUE       5   0.001      1
 maxlen target  ext
     10  rules TRUE

Algorithmic control:
 filter tree heap memopt load sort verbose
    0.1 TRUE TRUE  FALSE TRUE    2    TRUE

Absolute minimum support count: 7 

set item appearances ...[1 item(s)] done [0.00s].
set transactions ...[119 item(s), 7500 transaction(s)] done [0.01s].
sorting and recoding items ... [116 item(s)] done [0.00s].
creating transaction tree ... done [0.00s].
checking subsets of size 1 2 3 4 5 6 done [0.02s].
writing ... [16 rule(s)] done [0.00s].
creating S4 object  ... done [0.00s].
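The `appearance` argument restricts which side of a rule an item may appear on; here every rule must end in spaghetti. To make that idea concrete, the plain Python cell below is a brute-force sketch on hypothetical toy baskets. It is not the Apriori algorithm (there is no candidate pruning), and `rules_for_rhs` is an illustrative name, not an arules function.

```python
from itertools import combinations

# Hypothetical toy baskets, not the supermarket data.
baskets = [
    {"bacon", "pancakes", "spaghetti"},
    {"bacon", "pancakes", "spaghetti"},
    {"bacon", "pancakes", "milk"},
    {"shrimp", "spaghetti"},
    {"milk", "eggs"},
]

def rules_for_rhs(baskets, rhs, min_support, min_confidence, max_lhs=2):
    """Try every antecedent of up to max_lhs items for one fixed consequent."""
    n = len(baskets)
    candidates = sorted(set().union(*baskets) - {rhs})
    found = []
    for size in range(1, max_lhs + 1):
        for lhs in combinations(candidates, size):
            lhs_count = sum(set(lhs) <= b for b in baskets)
            both_count = sum(set(lhs) <= b and rhs in b for b in baskets)
            if lhs_count == 0:
                continue  # antecedent never occurs, so confidence is undefined
            supp, conf = both_count / n, both_count / lhs_count
            if supp >= min_support and conf >= min_confidence:
                found.append((lhs, rhs, supp, conf))
    return found

spaghetti_rules = rules_for_rhs(baskets, "spaghetti", 0.2, 0.6)
for lhs, rhs, supp, conf in spaghetti_rules:
    print(set(lhs), "=>", rhs, round(supp, 2), round(conf, 2))
```

Fixing the consequent keeps the output focused on one question, "what is in the basket when spaghetti is bought?", at the cost of hiding rules that point to other items.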


In [51]:
%%R
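# Inspect the spaghetti rules (shown unsorted)

inspect(spaghetti_ruleset)

# Items most often bought together with spaghetti are shrimp and salmon.
# Note: rules describe items in the same basket, not the order of purchase.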

     lhs                     rhs             support confidence    coverage     lift count
[1]  {bacon,                                                                              
      pancakes}           => {spaghetti} 0.001733333  0.8125000 0.002133333 4.665965    13
[2]  {chicken,                                                                            
      protein bar}        => {spaghetti} 0.001200000  0.8181818 0.001466667 4.698594     9
[3]  {green tea,                                                                          
      ground beef,                                                                        
      tomato sauce}       => {spaghetti} 0.001333333  0.8333333 0.001600000 4.785605    10
[4]  {light cream,                                                                        
      mineral water,                                                                      
      shrimp}             => {spaghetti} 0.001066667  0.8888889 0.001200000 5.104645     8

In [40]:
%%R
# Mine only rules with shrimp on the right-hand side (consequent)
shrimp_ruleset <- apriori(sales, parameter = list(supp=0.001, conf=0.8), appearance = list(default="lhs", rhs="shrimp"))

Apriori

Parameter specification:
 confidence minval smax arem  aval originalSupport maxtime support minlen
        0.8    0.1    1 none FALSE            TRUE       5   0.001      1
 maxlen target  ext
     10  rules TRUE

Algorithmic control:
 filter tree heap memopt load sort verbose
    0.1 TRUE TRUE  FALSE TRUE    2    TRUE

Absolute minimum support count: 7 

set item appearances ...[1 item(s)] done [0.00s].
set transactions ...[119 item(s), 7500 transaction(s)] done [0.01s].
sorting and recoding items ... [116 item(s)] done [0.00s].
creating transaction tree ... done [0.00s].
checking subsets of size 1 2 3 4 5 6 done [0.02s].
writing ... [2 rule(s)] done [0.00s].
creating S4 object  ... done [0.00s].


In [41]:
%%R
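# Inspect the shrimp rules (shown unsorted)

inspect(shrimp_ruleset)
# Both rules have pasta on the left-hand side: baskets containing pasta
# (with milk, or with eggs and mineral water) usually also contain shrimp.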


    lhs                             rhs      support     confidence coverage    lift     count
[1] {milk, pasta}                => {shrimp} 0.001600000 0.8571429  0.001866667 12.01602 12
[2] {eggs, mineral water, pasta} => {shrimp} 0.001333333 0.9090909  0.001466667 12.74427 10


**Summary of findings**

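*   Package mushroom cream sauce with spaghetti.
*   Stocking pasta, mushroom cream sauce and escalope in the same section, or next to each other, could increase sales: 95% of baskets that contain pasta and mushroom cream sauce also contain escalope (lift of about 12).
*   Arrange pasta and meat together; they are frequently bought in the same basket. The rules show co-occurrence, not the order of purchase.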






