# Week 4: Learning Objectives

By the end of this lesson, you should be able to:
- write Python code for association rule mining
- define different Python built-in functions


# Question 1

1.	A supermarket transaction database is given below. Suppose that the minimum support threshold is 43% (i.e. at least 3 of the 7 transactions) and the minimum confidence threshold is 90%:

TID|ITEMS
---|-----
100|1, 2, 3, 4
200|2, 4, 5
300|1, 2, 4, 5
400|1
500|1, 2, 3, 4, 5
600|1, 5
700|2, 3 

(a)	Find all frequent itemsets using the Apriori algorithm.

(b)	Find all Boolean association rules that satisfy the minimum confidence and the minimum support thresholds.

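<center>
$\operatorname{conf}(A \rightarrow B)=\frac{\sup (A \cup B)}{\sup (A)}=\frac{\operatorname{count}(A \cup B)/N}{\operatorname{count}(A)/N}=\frac{\operatorname{count}(A \cup B)}{\operatorname{count}(A)}$
</center>

Here $\operatorname{count}(X)$ is the number of transactions that contain every item in $X$, and $N$ is the total number of transactions.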
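
The hand calculation for Question 1 can be checked with a short script. The following is a minimal sketch of level-wise Apriori on the seven transactions above, followed by rule generation with the confidence formula. The names `support_count`, `apriori_frequent_itemsets`, and `strong_rules` are illustrative helpers, not part of any library.

```python
from itertools import combinations

# The transaction database from Question 1 (TID -> set of items).
transactions = {
    100: {1, 2, 3, 4},
    200: {2, 4, 5},
    300: {1, 2, 4, 5},
    400: {1},
    500: {1, 2, 3, 4, 5},
    600: {1, 5},
    700: {2, 3},
}
MIN_COUNT = 3   # minimum support expressed as a transaction count
MIN_CONF = 0.9  # minimum confidence


def support_count(itemset):
    """Number of transactions that contain every item in itemset."""
    return sum(1 for items in transactions.values() if itemset <= items)


def apriori_frequent_itemsets(min_count):
    """Return {frequent itemset: support count}, built level by level."""
    all_items = sorted(set().union(*transactions.values()))
    level = [frozenset([i]) for i in all_items
             if support_count(frozenset([i])) >= min_count]
    frequent = {}
    k = 1
    while level:
        for itemset in level:
            frequent[itemset] = support_count(itemset)
        level_set = set(level)
        # Join step: merge frequent k-itemsets into (k+1)-item candidates.
        candidates = {a | b for a in level for b in level if len(a | b) == k + 1}
        # Prune step: every k-item subset of a candidate must be frequent.
        candidates = {c for c in candidates
                      if all(frozenset(s) in level_set for s in combinations(c, k))}
        level = [c for c in candidates if support_count(c) >= min_count]
        k += 1
    return frequent


def strong_rules(frequent, min_conf):
    """Return (base, added, confidence) for every rule meeting min_conf."""
    rules = []
    for itemset, count in frequent.items():
        for size in range(1, len(itemset)):
            for base in combinations(sorted(itemset), size):
                base = frozenset(base)
                confidence = count / frequent[base]
                if confidence >= min_conf:
                    rules.append((base, itemset - base, confidence))
    return rules


frequent = apriori_frequent_itemsets(MIN_COUNT)
rules = strong_rules(frequent, MIN_CONF)
for base, added, confidence in rules:
    print(sorted(base), "->", sorted(added), f"confidence={confidence:.2f}")
```

Working in counts rather than fractions avoids rounding issues with the 43% threshold, since 3 of 7 transactions is only approximately 43%.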

# Question 2

The following contingency table summarizes supermarket transaction data, where *hotdogs* refers to the transactions containing hot dogs, $\overline{hotdogs}$ refers to the transactions that do not contain hot dogs, *hamburgers* refers to the transactions containing hamburgers, and $\overline{hamburgers}$ refers to the transactions that do not contain hamburgers.

|           | hotdogs | $\overline{hotdogs}$ | $\sum{row}$ |
|-----------|--------|--------|-------|
| hamburgers | 2000 | 500 | 2500 |
| $\overline{hamburgers}$ | 1000 | 1500 | 2500 |
| $\sum{col}$ | 3000 | 2000 | 5000 |

(a)	Suppose that the association rule "hot dogs $\implies$ hamburgers" is mined. Given a minimum support threshold of 25% and a minimum confidence threshold of 50%, is this association rule strong?

(b)	Based on the given data, is the purchase of hot dogs independent of the purchase of hamburgers? If not, what kind of correlation relationship exists between the two? 
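
The quantities needed for Question 2 can be computed directly from the contingency table. The sketch below assumes the usual definition of lift, $\operatorname{lift}(A \rightarrow B)=\operatorname{conf}(A \rightarrow B)/\sup(B)$, where a lift of 1 indicates independence. The function name `rule_metrics` is a hypothetical helper introduced for illustration.

```python
def rule_metrics(n_both, n_antecedent, n_consequent, n_total):
    """Support, confidence and lift of the rule antecedent -> consequent.

    All arguments are transaction counts read off a contingency table.
    """
    support = n_both / n_total
    confidence = n_both / n_antecedent
    lift = confidence / (n_consequent / n_total)
    return support, confidence, lift


# Rule: hot dogs -> hamburgers, with counts taken from the table above.
support, confidence, lift = rule_metrics(
    n_both=2000,        # transactions with hot dogs and hamburgers
    n_antecedent=3000,  # transactions with hot dogs
    n_consequent=2500,  # transactions with hamburgers
    n_total=5000,       # all transactions
)

# A rule is "strong" when it meets both thresholds from part (a).
is_strong = support >= 0.25 and confidence >= 0.50
print(f"support={support:.2%}, confidence={confidence:.2%}, "
      f"lift={lift:.3f}, strong={is_strong}")
```

A lift above 1 suggests a positive correlation between the two purchases, a lift below 1 a negative one.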

## Import the Libraries

In [None]:
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
from apyori import apriori

## Importing the Dataset

In [None]:
store_data = pd.read_csv('store_data.csv', header=None)
store_data.head()

Unnamed: 0,0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19
0,shrimp,almonds,avocado,vegetables mix,green grapes,whole weat flour,yams,cottage cheese,energy drink,tomato juice,low fat yogurt,green tea,honey,salad,mineral water,salmon,antioxydant juice,frozen smoothie,spinach,olive oil
1,burgers,meatballs,eggs,,,,,,,,,,,,,,,,,
2,chutney,,,,,,,,,,,,,,,,,,,
3,turkey,avocado,,,,,,,,,,,,,,,,,,
4,mineral water,milk,energy bar,whole wheat rice,green tea,,,,,,,,,,,,,,,


## Data Preprocessing

The `apyori` library expects the dataset as a list of lists: the outer list is the whole dataset and each inner list is one transaction. Our data is currently a pandas DataFrame, so we convert it with the following script:

In [None]:
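# Convert the DataFrame into a list of transactions (one inner list per row).
# Note: empty cells are NaN, so str() turns them into the literal item 'nan'.
n_rows, n_cols = store_data.shape  # 7501 rows and 20 columns for this dataset
records = []
for i in range(n_rows):
    records.append([str(store_data.values[i, j]) for j in range(n_cols)])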
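
Because the conversion above applies `str()` to every cell, empty cells end up in the transactions as the item `'nan'`, which is why rules such as `nan -> light cream` appear in the output later on. One possible way to avoid this is to skip missing cells while building the list. The sketch below uses a hypothetical helper, `clean_transactions`, and a small stand-in for the real data so that it runs on its own.

```python
import math


def clean_transactions(rows):
    """Return one list of item names per row, skipping missing (NaN) cells."""
    cleaned = []
    for row in rows:
        items = [str(cell) for cell in row
                 if not (isinstance(cell, float) and math.isnan(cell))]
        cleaned.append(items)
    return cleaned


# Small stand-in for store_data.values.tolist(); NaN marks an empty cell.
nan = float("nan")
sample_rows = [
    ["burgers", "meatballs", "eggs", nan],
    ["chutney", nan, nan, nan],
    ["turkey", "avocado", nan, nan],
]

sample_records = clean_transactions(sample_rows)
print(sample_records)
```

On the real dataset this would be called as `records = clean_transactions(store_data.values.tolist())`. The outputs shown in the rest of this notebook were produced with the `'nan'` items still present, so the mined rules would differ after cleaning.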

## Applying Apriori Algorithm

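Set the thresholds passed to `apriori`: `min_support` is the minimum fraction of transactions that must contain an itemset, `min_confidence` and `min_lift` are the minimum confidence and lift a rule must reach, and `min_length=2` is intended to exclude rules with fewer than two items.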


In [None]:
association_rules = apriori(records, min_support=0.0045, min_confidence=0.2, min_lift=3, min_length=2)
association_results = list(association_rules)

## Check the output

In [None]:
len(association_results)

48

In [None]:
print(association_results[0])

RelationRecord(items=frozenset({'light cream', 'chicken'}), support=0.004532728969470737, ordered_statistics=[OrderedStatistic(items_base=frozenset({'light cream'}), items_add=frozenset({'chicken'}), confidence=0.29059829059829057, lift=4.84395061728395)])
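
The printed record has named fields (`items`, `support`, `ordered_statistics`, and inside each statistic `items_base`, `items_add`, `confidence`, `lift`), so values can be read by name rather than by position. The sketch below rebuilds the record shown above with stand-in namedtuples that mirror those field names, so that it runs without the dataset. `describe` is a hypothetical helper, and the assumption is that the records returned by `apriori` expose the same attribute names.

```python
from collections import namedtuple

# Stand-ins that mirror the field names visible in the printed record above.
OrderedStatistic = namedtuple(
    "OrderedStatistic", ["items_base", "items_add", "confidence", "lift"])
RelationRecord = namedtuple(
    "RelationRecord", ["items", "support", "ordered_statistics"])

record = RelationRecord(
    items=frozenset({"light cream", "chicken"}),
    support=0.004532728969470737,
    ordered_statistics=[
        OrderedStatistic(
            items_base=frozenset({"light cream"}),
            items_add=frozenset({"chicken"}),
            confidence=0.29059829059829057,
            lift=4.84395061728395,
        )
    ],
)


def describe(record):
    """Return one readable line per rule stored in a record."""
    lines = []
    for stat in record.ordered_statistics:
        base = ", ".join(sorted(stat.items_base))
        added = ", ".join(sorted(stat.items_add))
        lines.append(
            f"{base} -> {added} | support={record.support:.4f} "
            f"confidence={stat.confidence:.4f} lift={stat.lift:.4f}")
    return lines


for line in describe(record):
    print(line)
```

Reading `items_base` and `items_add` keeps the direction of each rule explicit, which an unordered `frozenset` of all items cannot guarantee.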


## Output in better presentation

In [None]:
cnt = 0

for item in association_results:
    cnt += 1
    # item[0] is the frozenset of all items in the rule (base and added items).
    # A frozenset has no guaranteed order, so items[0] -> items[1] may not match
    # the rule's true direction, and only two items of a larger itemset are shown.
    pair = item[0]
    items = [x for x in pair]
    print("(Rule " + str(cnt) + ") " + items[0] + " -> " + items[1])

    # item[1] is the support of the itemset
    print("Support: " + str(round(item[1], 3)))

    # item[2] is the list of ordered statistics; its first entry holds
    # the confidence at index 2 and the lift at index 3
    print("Confidence: " + str(round(item[2][0][2], 4)))
    print("Lift: " + str(round(item[2][0][3], 4)))
    print("=====================================")

(Rule 1) light cream -> chicken
Support: 0.005
Confidence: 0.2906
Lift: 4.844
(Rule 2) mushroom cream sauce -> escalope
Support: 0.006
Confidence: 0.3007
Lift: 3.7908
(Rule 3) escalope -> pasta
Support: 0.006
Confidence: 0.3729
Lift: 4.7008
(Rule 4) herb & pepper -> ground beef
Support: 0.016
Confidence: 0.3235
Lift: 3.292
(Rule 5) tomato sauce -> ground beef
Support: 0.005
Confidence: 0.3774
Lift: 3.8407
(Rule 6) olive oil -> whole wheat pasta
Support: 0.008
Confidence: 0.2715
Lift: 4.1224
(Rule 7) shrimp -> pasta
Support: 0.005
Confidence: 0.322
Lift: 4.5067
(Rule 8) nan -> light cream
Support: 0.005
Confidence: 0.2906
Lift: 4.844
(Rule 9) chocolate -> frozen vegetables
Support: 0.005
Confidence: 0.2326
Lift: 3.2545
(Rule 10) cooking oil -> spaghetti
Support: 0.005
Confidence: 0.5714
Lift: 3.282
(Rule 11) mushroom cream sauce -> escalope
Support: 0.006
Confidence: 0.3007
Lift: 3.7908
(Rule 12) escalope -> nan
Support: 0.006
Confidence: 0.3729
Lift: 4.7008
(Rule 13) frozen vegetables 