### Association rules

- People who bought this also bought...
- People who watch this also watch...

### Apriori algorithm

- Support: the proportion of all observations (transactions) that contain a given item or itemset.

> Example: Support(M) = (no. of users who watched movie M) / (total no. of users)

- Confidence: of the observations that contain the left-hand side of a rule, the proportion that also contain the right-hand side.

> Example: Confidence(M1->M2) = (no. of users who watched movies M1 and M2) / (no. of users who watched movie M1)

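- Lift: the confidence of a rule divided by the support of its right-hand side. Lift measures the relevance of an association rule, i.e. how strong it is: a lift above 1 means the two sides occur together more often than they would if they were independent.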

> Example: Lift(M1->M2) = Confidence(M1->M2) / Support(M2)
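
The three metrics can be checked by hand on a tiny example. The sketch below is illustrative only: the `watched` data and the helper names `support`, `confidence` and `lift` are made up here and are not part of any library.

```python
# Toy data (hypothetical): each set holds the movies one user watched.
watched = [
    {"M1", "M2"},
    {"M1", "M2", "M3"},
    {"M1"},
    {"M2", "M3"},
    {"M3"},
]


def support(items, transactions):
    """Fraction of transactions that contain every item in `items`."""
    items = set(items)
    return sum(items <= t for t in transactions) / len(transactions)


def confidence(base, add, transactions):
    """Support of base and add together, divided by the support of base."""
    both = set(base) | set(add)
    return support(both, transactions) / support(base, transactions)


def lift(base, add, transactions):
    """Confidence of base -> add, divided by the support of add."""
    return confidence(base, add, transactions) / support(add, transactions)


s = support({"M1"}, watched)             # 3 of the 5 users watched M1
c = confidence({"M1"}, {"M2"}, watched)  # 2 of those 3 also watched M2
l = lift({"M1"}, {"M2"}, watched)        # (2/3) / (3/5)
print(s, c, l)
```

Here the lift is slightly above 1, so watching M1 makes watching M2 only a little more likely than it is for a random user.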


#### Steps:

 1. Set a minimum support and a minimum confidence. Rules below these thresholds are not considered.
 2. Take all itemsets in the transactions whose support is higher than the minimum support.
 3. Take all the rules from these itemsets whose confidence is higher than the minimum confidence.
 4. Sort the rules by decreasing lift.
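
The steps above can be sketched in plain Python for the simplest case, rules between two single items. This is a minimal illustration and not how apyori is implemented: `pair_rules` and `baskets` are hypothetical names, and the thresholds are treated as inclusive (`>=`), which is an assumption of this sketch.

```python
from itertools import combinations


def pair_rules(transactions, min_support, min_confidence):
    """Return (base, add, confidence, lift) tuples for item pairs, best lift first."""
    sets = [set(t) for t in transactions]
    n = len(sets)

    def support(items):
        return sum(items <= t for t in sets) / n

    # Step 2: keep single items, then pairs of them, that meet the minimum support.
    items = sorted({item for t in sets for item in t})
    frequent_items = [i for i in items if support({i}) >= min_support]
    frequent_pairs = [
        frozenset(pair)
        for pair in combinations(frequent_items, 2)
        if support(set(pair)) >= min_support
    ]

    # Step 3: from each pair, keep the rules that meet the minimum confidence.
    rules = []
    for pair in frequent_pairs:
        for base in pair:
            (add,) = pair - {base}
            conf = support(pair) / support({base})
            if conf >= min_confidence:
                rules.append((base, add, conf, conf / support({add})))

    # Step 4: sort by decreasing lift.
    return sorted(rules, key=lambda rule: rule[3], reverse=True)


# Step 1 is the choice of thresholds passed in here.
baskets = [
    ["milk", "bread"],
    ["milk", "bread", "eggs"],
    ["milk", "eggs"],
    ["bread"],
    ["tea"],
]
rules = pair_rules(baskets, min_support=0.4, min_confidence=0.6)
for base, add, conf, lift_value in rules:
    print(f"{base} -> {add}: confidence={conf:.2f}, lift={lift_value:.2f}")
```

The full algorithm extends this idea to larger itemsets, building candidates of size k only from frequent itemsets of size k-1.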

### Importing libraries and dataset

In [21]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import math

In [8]:
# header=None: the dataset has no header row, so the first line is data
data = pd.read_csv('./resources/Datasets/Market_Basket_Optimisation.csv', header=None)

In [9]:
data.head()

Unnamed: 0,0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19
0,shrimp,almonds,avocado,vegetables mix,green grapes,whole weat flour,yams,cottage cheese,energy drink,tomato juice,low fat yogurt,green tea,honey,salad,mineral water,salmon,antioxydant juice,frozen smoothie,spinach,olive oil
1,burgers,meatballs,eggs,,,,,,,,,,,,,,,,,
2,chutney,,,,,,,,,,,,,,,,,,,
3,turkey,avocado,,,,,,,,,,,,,,,,,,
4,mineral water,milk,energy bar,whole wheat rice,green tea,,,,,,,,,,,,,,,


In [11]:
len(data)

7501

### Preprocessing

In [43]:
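# apyori expects a list of transactions (each a list of item strings) as input
transactions = []

# Every row has 20 columns; missing cells are NaN and become the string 'nan' here
for i in range(0, len(data)):
    transactions.append([str(data.values[i, j]) for j in range(0, 20)])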

In [49]:
# Equivalent, more concise form of the loop above
transactions = [[str(item) for item in row] for row in data.values]
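
Because `str()` is applied to every cell, empty cells end up in the transactions as the literal item `'nan'`. If that is not wanted, one option is to drop empty cells while building the lists. The sketch below shows the idea with the standard `csv` module on a three-row in-memory sample; the sample text and the name `transactions_clean` are hypothetical, and padding short rows with empty cells is an assumption made for illustration.

```python
import csv
import io

# Hypothetical sample in the same layout as the dataset: one basket per line,
# items separated by commas, short rows padded with empty cells.
sample = io.StringIO(
    "burgers,meatballs,eggs,,\n"
    "chutney,,,,\n"
    "turkey,avocado,,,\n"
)

# Keep only the non-empty cells of each row.
transactions_clean = [
    [item for item in row if item.strip()]
    for row in csv.reader(sample)
]
print(transactions_clean)
```

Each basket then contains only the items that were actually bought.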

### Training model

In [56]:
from apyori import apriori
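# Keep only rules that meet the support, confidence and lift thresholds;
# max_length=2 restricts the search to pairs of items
rules = apriori(transactions, min_support=0.003, min_confidence=0.2, min_lift=3, min_length=2, max_length=2)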

In [57]:
result = list(rules)
result

[RelationRecord(items=frozenset({'light cream', 'chicken'}), support=0.004532728969470737, ordered_statistics=[OrderedStatistic(items_base=frozenset({'light cream'}), items_add=frozenset({'chicken'}), confidence=0.29059829059829057, lift=4.84395061728395)]),
 RelationRecord(items=frozenset({'escalope', 'mushroom cream sauce'}), support=0.005732568990801226, ordered_statistics=[OrderedStatistic(items_base=frozenset({'mushroom cream sauce'}), items_add=frozenset({'escalope'}), confidence=0.3006993006993007, lift=3.790832696715049)]),
 RelationRecord(items=frozenset({'escalope', 'pasta'}), support=0.005865884548726837, ordered_statistics=[OrderedStatistic(items_base=frozenset({'pasta'}), items_add=frozenset({'escalope'}), confidence=0.3728813559322034, lift=4.700811850163794)]),
 RelationRecord(items=frozenset({'fromage blanc', 'honey'}), support=0.003332888948140248, ordered_statistics=[OrderedStatistic(items_base=frozenset({'fromage blanc'}), items_add=frozenset({'honey'}), confidence=0

### Visualisation and result interpretation


- ```javascript
ordered_statistics=[OrderedStatistic(items_base=frozenset({'light cream'}), items_add=frozenset({'chicken'}), confidence=0.29059829059829057, lift=4.84395061728395)]
```

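> Of the transactions that contain `light cream`, 29% also contain `chicken` (confidence ≈ 0.29). The lift of 4.84 means this is about 4.8 times the share of all transactions that contain `chicken`.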

- ```javascript
RelationRecord(items=frozenset({'light cream', 'chicken'}), support=0.004532728969470737
```

> `light cream` and `chicken` appear together in 0.45% of transactions (support ≈ 0.0045).
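
Reading the raw records one by one gets tedious, so it can help to flatten them into rows and sort by lift. The sketch below is one way to do that. So that it runs without apyori installed, it defines stand-in namedtuples with the field names visible in the output above and copies the first three records from it; `flatten`, `result_sample` and `table` are hypothetical names.

```python
from collections import namedtuple

# Stand-ins that mirror the field names shown in the printed result.
OrderedStatistic = namedtuple(
    "OrderedStatistic", "items_base items_add confidence lift"
)
RelationRecord = namedtuple("RelationRecord", "items support ordered_statistics")

# First three records, copied from the output above.
result_sample = [
    RelationRecord(
        frozenset({"light cream", "chicken"}), 0.004532728969470737,
        [OrderedStatistic(frozenset({"light cream"}), frozenset({"chicken"}),
                          0.29059829059829057, 4.84395061728395)],
    ),
    RelationRecord(
        frozenset({"escalope", "mushroom cream sauce"}), 0.005732568990801226,
        [OrderedStatistic(frozenset({"mushroom cream sauce"}), frozenset({"escalope"}),
                          0.3006993006993007, 3.790832696715049)],
    ),
    RelationRecord(
        frozenset({"escalope", "pasta"}), 0.005865884548726837,
        [OrderedStatistic(frozenset({"pasta"}), frozenset({"escalope"}),
                          0.3728813559322034, 4.700811850163794)],
    ),
]


def flatten(records):
    """One (base, add, support, confidence, lift) row per rule, best lift first."""
    rows = []
    for record in records:
        for stat in record.ordered_statistics:
            rows.append((
                ", ".join(sorted(stat.items_base)),
                ", ".join(sorted(stat.items_add)),
                record.support,
                stat.confidence,
                stat.lift,
            ))
    return sorted(rows, key=lambda row: row[4], reverse=True)


table = flatten(result_sample)
for base, add, support, confidence, lift in table:
    print(f"{base} -> {add}: support={support:.4f}, "
          f"confidence={confidence:.2f}, lift={lift:.2f}")
```

The same `flatten` function should work on the real `result` list, since it only reads the attributes shown in the printed records.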