In [4]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from apyori import apriori

In [47]:
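def inspect(results):
    """Flatten apyori results into (lhs, rhs, support, confidence, lift) tuples.

    Only the first rule of each result and the first item of each side are
    read, which is enough when every rule has one item on each side.
    """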
    lhs = [tuple(result[2][0][0])[0] for result in results]
    rhs = [tuple(result[2][0][1])[0] for result in results]
    support = [result[1] for result in results]
    confidence = [result[2][0][2] for result in results]
    lift = [result[2][0][3] for result in results]
    return list(zip(lhs, rhs, support, confidence, lift))

<h1><center> ASSOCIATION RULE </center></h1>

<h3><center> APRIORI ALGORITHM </center></h3>

A well-known piece of data science folklore is the association rule between diapers and beer: the analysis of thousands of transactions is said to have revealed that the two are often bought together. Interestingly, some stores that learned of this association started placing diapers next to the beer to make shopping easier. Others instead placed the two products far apart in order to verify whether the association holds. Moreover, on the way from one product to the other, customers are invited to buy other items.

Association rules can be strong or weak; we want the strong ones. Examples include rules mined from liked movies or from purchase transactions.
The strength of an association rule $X \rightarrow Y$ is measured with its **support** and **confidence**:

$$
support(X) = \frac{\# \, units \, with \, X}{\# \, units}
$$

$$
confidence(X \rightarrow Y) = \frac{\# \, units \, with \, X \, \cap \, Y}{\# \, units \, with \,X}
$$
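
As a minimal sketch of the two formulas above, the snippet below counts support and confidence on a handful of made-up baskets. The data and the names `toy_transactions`, `support` and `confidence` are assumptions introduced for illustration; they are not part of the notebook's dataset or of apyori.

```python
# Toy baskets (hypothetical data, for illustration only).
toy_transactions = [
    {"beer", "diapers", "chips"},
    {"beer", "diapers"},
    {"beer", "bread"},
    {"milk", "bread"},
    {"diapers", "milk"},
]

def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(lhs, rhs, transactions):
    """Among the transactions containing `lhs`, the fraction also containing `rhs`."""
    both = support(set(lhs) | set(rhs), transactions)
    return both / support(lhs, transactions)

# beer appears in 3 of the 5 baskets; 2 of those 3 also contain diapers.
print(support({"beer"}, toy_transactions))
print(confidence({"beer"}, {"diapers"}, toy_transactions))
```

Dividing two supports gives the same ratio as dividing the raw counts, since the number of transactions cancels out.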

**Lift** measures how much the probability of an event changes when a priori knowledge is taken into account.
$$
lift(X \rightarrow Y) = \frac{confidence(X \rightarrow Y)}{support(Y)}
$$
For instance, the support of Ex Machina is the probability that a user in the whole population has watched it. The same probability, computed only among the users who have watched Interstellar, is generally different. The lift tells us by how much the probability changes once this a priori knowledge is introduced.
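
The movie example can be worked through with a few numbers. The counts below are invented for illustration, and lift is taken here as the confidence of the rule divided by the support of its right-hand side (Ex Machina).

```python
# Hypothetical viewing counts, for illustration only.
n_users = 100         # size of the population
n_ex_machina = 10     # users who watched Ex Machina
n_interstellar = 40   # users who watched Interstellar
n_both = 8            # users who watched both movies

# Probability of watching Ex Machina with no prior knowledge.
support_ex_machina = n_ex_machina / n_users

# Probability of watching Ex Machina, given that the user watched Interstellar.
confidence_rule = n_both / n_interstellar

# Lift of the rule Interstellar -> Ex Machina.
lift_rule = confidence_rule / support_ex_machina

print(support_ex_machina, confidence_rule, lift_rule)
```

With these counts, knowing that a user watched Interstellar doubles the probability that they watched Ex Machina. A lift close to 1 would mean the prior knowledge changes almost nothing.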

The **Apriori algorithm** is made of the following steps:
- Set a minimum support and a minimum confidence.
- Take all the subsets of items in the transactions whose support is higher than the minimum support. There can be a huge number of them.
- Take all the rules over these subsets whose confidence is higher than the minimum confidence.
- Sort the rules by decreasing lift.
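
The steps above can be sketched in plain Python. This is a simplified illustration restricted to rules with one item on each side, not the implementation used by apyori; the function name `apriori_pairs` and the toy baskets are assumptions made for the example.

```python
from itertools import combinations

def apriori_pairs(transactions, min_support, min_confidence):
    """Apriori-style search limited to rules X -> Y with a single item per side."""
    transactions = [set(t) for t in transactions]
    n = len(transactions)

    def support(items):
        return sum(items <= t for t in transactions) / n

    # Frequent single items: only these can be part of a frequent pair.
    all_items = sorted({item for t in transactions for item in t})
    frequent = {}
    for item in all_items:
        s = support({item})
        if s >= min_support:
            frequent[item] = s

    rules = []
    for a, b in combinations(frequent, 2):
        pair_support = support({a, b})
        if pair_support < min_support:
            continue
        # Keep each direction of the rule whose confidence reaches the threshold.
        for lhs, rhs in ((a, b), (b, a)):
            conf = pair_support / frequent[lhs]
            if conf >= min_confidence:
                lift = conf / frequent[rhs]
                rules.append((lhs, rhs, pair_support, conf, lift))

    # Sort the rules by decreasing lift.
    return sorted(rules, key=lambda rule: rule[4], reverse=True)

# Toy baskets (hypothetical data, for illustration only).
toy_baskets = [
    ["beer", "diapers", "chips"],
    ["beer", "diapers"],
    ["beer", "bread"],
    ["milk", "bread"],
    ["diapers", "milk"],
]
toy_rules = apriori_pairs(toy_baskets, min_support=0.4, min_confidence=0.5)
for rule in toy_rules:
    print(rule)
```

Building candidate pairs only from items that are already frequent is what keeps the search manageable: a pair cannot be more frequent than either of its items.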

Recommender systems, the ones which suggest new purchases on sites like Amazon or Netflix, use sophisticated algorithms of this kind. They rely on what you have bought and, through association rules, suggest further products. Moreover, "buy one, get one free" deals rely on these techniques.

In [48]:
data = pd.read_csv('Dataset/Market_Basket_Optimisation.csv', header = None)

# Reshape the DataFrame into a list of transactions (lists of item names) for the following steps.
# Note: any empty cells (NaN) become the string 'nan' here.
transactions = []
for i in range(0, data.shape[0]):
    transactions.append([str(data.values[i,j]) for j in range(0, data.shape[1])])

# Train the apriori model.
rules = apriori(transactions = transactions, # dataset.
               min_support = 0.003, # a product bought 3 times a day over 7 days: 3 * 7 / data.shape[0] (common sense)
               min_confidence = 0.2, # a common rule of thumb is 0.8; a lower value is used here.
               min_lift = 3, # measures the quality of a rule; chosen from experience.
               min_length = 2, # only one X and
               max_length = 2) # one Y

# Visualize the result.
results = list(rules)

DATA = pd.DataFrame(inspect(results), columns = ['LHS', 'RHS', 'SUPPORT', 'CONFIDENCE', 'LIFT'])
DATA.nlargest(n = 10, # best rules.
             columns = 'LIFT')

# We look for the association rules with the highest lift: for those rules,
# someone who buys X will probably also buy Y.

| | LHS | RHS | SUPPORT | CONFIDENCE | LIFT |
|---|---|---|---|---|---|
| 3 | fromage blanc | honey | 0.003333 | 0.245098 | 5.164271 |
| 0 | light cream | chicken | 0.004533 | 0.290598 | 4.843951 |
| 2 | pasta | escalope | 0.005866 | 0.372881 | 4.700812 |
| 8 | pasta | shrimp | 0.005066 | 0.322034 | 4.506672 |
| 7 | whole wheat pasta | olive oil | 0.007999 | 0.271493 | 4.12241 |
| 5 | tomato sauce | ground beef | 0.005333 | 0.377358 | 3.840659 |
| 1 | mushroom cream sauce | escalope | 0.005733 | 0.300699 | 3.790833 |
| 4 | herb & pepper | ground beef | 0.015998 | 0.32345 | 3.291994 |
| 6 | light cream | olive oil | 0.0032 | 0.205128 | 3.11471 |


<h3><center> ECLAT ALGORITHM </center></h3>

Support is the only parameter that remains: confidence and lift aren't used.
The **Eclat algorithm** is made of the following steps:
- Set a minimum support.
- Take all the subsets of items in the transactions whose support is higher than the minimum support.
- Sort these subsets by decreasing support.
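
A minimal sketch of these steps, limited to pairs of items, is given here. It uses the vertical layout usually associated with Eclat, where each item is mapped to the set of transaction ids that contain it and the support of a pair comes from intersecting two such sets. The function name `eclat_pairs` and the toy baskets are assumptions made for illustration, not part of any library.

```python
from itertools import combinations

def eclat_pairs(transactions, min_support):
    """Eclat-style search for frequent item pairs using transaction-id sets."""
    n = len(transactions)

    # Vertical layout: item -> ids of the transactions that contain it.
    tidsets = {}
    for tid, transaction in enumerate(transactions):
        for item in set(transaction):
            tidsets.setdefault(item, set()).add(tid)

    # Keep only the items that are frequent on their own.
    frequent = {item: tids for item, tids in tidsets.items()
                if len(tids) / n >= min_support}

    # The support of a pair is the size of the intersection of its two id sets.
    pairs = []
    for a, b in combinations(sorted(frequent), 2):
        pair_support = len(frequent[a] & frequent[b]) / n
        if pair_support >= min_support:
            pairs.append((a, b, pair_support))

    # Sort the pairs by decreasing support.
    return sorted(pairs, key=lambda pair: pair[2], reverse=True)

# Toy baskets (hypothetical data, for illustration only).
toy_baskets_eclat = [
    ["beer", "diapers", "chips"],
    ["beer", "diapers"],
    ["beer", "bread"],
    ["milk", "bread"],
    ["diapers", "milk"],
]
toy_pairs = eclat_pairs(toy_baskets_eclat, min_support=0.4)
print(toy_pairs)
```

No confidence or lift is computed: the output is a ranking of item sets, not of directed rules.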

In [53]:
# The true Eclat algorithm drops the confidence and lift parameters, but we keep
# them here in order to have stronger rules.
# In the end, only the rules with the highest support are retained.

DATA.nlargest(n = 10, # best rules.
             columns = 'SUPPORT')[['LHS', 'RHS', 'SUPPORT']]

# We look for the item pairs with the highest support, i.e. those that are
# most often bought together.

| | LHS | RHS | SUPPORT |
|---|---|---|---|
| 4 | herb & pepper | ground beef | 0.015998 |
| 7 | whole wheat pasta | olive oil | 0.007999 |
| 2 | pasta | escalope | 0.005866 |
| 1 | mushroom cream sauce | escalope | 0.005733 |
| 5 | tomato sauce | ground beef | 0.005333 |
| 8 | pasta | shrimp | 0.005066 |
| 0 | light cream | chicken | 0.004533 |
| 3 | fromage blanc | honey | 0.003333 |
| 6 | light cream | olive oil | 0.0032 |
