# 5 - Association Rule Learning


## 5.1 - Apriori Rule Learning

There are situations in which two variables appear to be completely unrelated and yet turn out to be correlated. The Apriori algorithm identifies patterns in the data that provide evidence for statements such as "people who did this also did that".

The Apriori algorithm relies on three measures:

### Support

$$ \text{Support}(i) = \frac{\text{number of transactions containing } i}{ \text{total number of transactions}} $$
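A minimal sketch of the support computation, assuming each transaction is stored as a Python set of item names. The toy data and the helper name `support` are hypothetical and only serve as an illustration.

```python
# Toy transactions (hypothetical data, for illustration only).
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"milk", "butter", "bread"},
    {"milk"},
    {"bread", "milk", "eggs"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)


print(support({"bread"}, transactions))          # 4 of the 5 transactions
print(support({"bread", "milk"}, transactions))  # 3 of the 5 transactions
```

The same function works for a single item or for a set of items, which is what the confidence and lift measures build on.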

### Confidence 

$$ \text{Confidence}(i \rightarrow j) = \frac{\text{number of transactions containing } i \text{ and } j}{ \text{number of transactions containing } i} $$

Confidence measures how often a rule holds: among the transactions that contain $i$, it is the fraction that also contain $j$ (for example, among the people who have seen movie A, the share who have also seen movie B).

### Lift

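$$ \text{Lift}(i \rightarrow j) = \frac{\text{Confidence}(i \rightarrow j)}{\text{Support}(j)}$$

If the lift is $\gg 1$, events $i$ and $j$ occur together far more often than they would if they were independent, which reveals an association between them even though it might not be evident at first sight.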
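As an illustration, confidence and lift can be sketched in a few lines of plain Python. The toy transactions and the helper names `count`, `confidence` and `lift` are assumptions made for this example; lift is computed here as the confidence of the rule divided by the support of its right-hand side.

```python
# Toy transactions (hypothetical data, for illustration only).
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"milk", "butter", "bread"},
    {"milk"},
    {"bread", "milk", "eggs"},
]


def count(itemset, transactions):
    """Number of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(1 for t in transactions if itemset <= t)


def confidence(i, j, transactions):
    """Among the transactions containing i, the fraction that also contain j."""
    return count(set(i) | set(j), transactions) / count(i, transactions)


def lift(i, j, transactions):
    """Confidence of the rule i -> j divided by the support of j."""
    support_j = count(j, transactions) / len(transactions)
    return confidence(i, j, transactions) / support_j


# Every basket with butter also contains bread, so the confidence is 1.0.
print(confidence({"butter"}, {"bread"}, transactions))
# Bread is already in 4 of 5 baskets, so the lift is only slightly above 1.
print(lift({"butter"}, {"bread"}, transactions))
```

A rule can have a high confidence and still a modest lift when its right-hand side is very common, which is why the rules are ranked by lift rather than by confidence.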

Step-by-step procedure:

1- Set a minimum support and a minimum confidence.

2- Take all the subsets of items in the transactions whose support is higher than the minimum support.

3- Take all the rules from these subsets whose confidence is higher than the minimum confidence.

4- Sort the rules by decreasing lift.
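The four steps can be sketched by brute force for rules that link one item to one other item. This is a simplified illustration under stated assumptions, not the full Apriori candidate-pruning algorithm and not how `apyori` is implemented; the function name `pair_rules`, the toy data and the thresholds are hypothetical.

```python
from itertools import combinations


def pair_rules(transactions, min_support, min_confidence):
    """Return (lhs, rhs, support, confidence, lift) tuples sorted by decreasing lift."""
    n = len(transactions)
    items = sorted(set().union(*transactions))

    def count(itemset):
        return sum(1 for t in transactions if itemset <= t)

    rules = []
    # Step 2: keep only the pairs whose support reaches the minimum support.
    for a, b in combinations(items, 2):
        pair_count = count({a, b})
        if pair_count / n < min_support:
            continue
        # Step 3: test both directions of the rule against the minimum confidence.
        for lhs, rhs in ((a, b), (b, a)):
            conf = pair_count / count({lhs})
            if conf >= min_confidence:
                rule_lift = conf / (count({rhs}) / n)
                rules.append((lhs, rhs, pair_count / n, conf, rule_lift))
    # Step 4: sort the surviving rules by decreasing lift.
    return sorted(rules, key=lambda r: r[4], reverse=True)


# Toy transactions (hypothetical data, for illustration only).
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"milk", "butter", "bread"},
    {"milk"},
    {"bread", "milk", "eggs"},
]

# Step 1: choose the thresholds (arbitrary values for this toy example).
rules = pair_rules(transactions, min_support=0.4, min_confidence=0.7)
for rule in rules:
    print(rule)
```

Counting every pair directly is only practical for small item catalogues; the point of Apriori is to avoid this by discarding any item set that contains an infrequent subset.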

In [9]:
#importing libraries
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
import seaborn; seaborn.set()

#!pip install apyori



In [33]:
dataset = pd.read_csv('Part 5 - Association Rule Learning/Section 28 - Apriori/Market_Basket_Optimisation.csv', header=None)

# ====== rebuild the data set as a list of lists of strings, the format expected by apriori ===== #
# Missing cells, if any, become the string 'nan' after the str() conversion.
transactions = []
for i in range(dataset.shape[0]):
    transactions.append([str(dataset.values[i, j]) for j in range(20)])



In [44]:
from apyori import apriori

results = list(apriori(transactions=transactions, min_support=21/7501, min_confidence=0.2, min_lift=3, min_length=2, max_length=2))

print(results[0])
# min_length and max_length bound the number of items in each rule (here the problem asks to link one item to one other item, so min and max are both 2)

RelationRecord(items=frozenset({'extra dark chocolate', 'chicken'}), support=0.0027996267164378083, ordered_statistics=[OrderedStatistic(items_base=frozenset({'extra dark chocolate'}), items_add=frozenset({'chicken'}), confidence=0.23333333333333334, lift=3.8894074074074076)])
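
As a rough sanity check, the three numbers in this record can be tied back to the definitions. Taking $N = 7501$ transactions (the denominator used for `min_support` above), the printed values are consistent with 21 transactions containing both items, 90 containing extra dark chocolate and 450 containing chicken. These counts are inferred from the printed output, not read from the data set.

$$ \text{Support} = \frac{21}{7501} \approx 0.0028, \qquad \text{Confidence} = \frac{21}{90} \approx 0.2333, \qquad \text{Lift} = \frac{21/90}{450/7501} \approx 3.8894 $$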


In [49]:
def inspect(results):
    lhs = [tuple(result[2][0][0])[0] for result in results]
    rhs = [tuple(result[2][0][1])[0] for result in results]
    supports = [result[1] for result in results]
    confidences = [result[2][0][2] for result in results]
    lifts = [result[2][0][3] for result in results]
    return list(zip(lhs, rhs, supports, confidences, lifts))

df_res = pd.DataFrame(inspect(results), columns=['LHS', 'RHS', 'Support', 'Confidence', 'Lift'])

print(df_res.sort_values(by='Lift'))

                    LHS          RHS   Support  Confidence      Lift
7           light cream    olive oil  0.003200    0.205128  3.114710
5         herb & pepper  ground beef  0.015998    0.323450  3.291994
2  mushroom cream sauce     escalope  0.005733    0.300699  3.790833
6          tomato sauce  ground beef  0.005333    0.377358  3.840659
0  extra dark chocolate      chicken  0.002800    0.233333  3.889407
8     whole wheat pasta    olive oil  0.007999    0.271493  4.122410
9                 pasta       shrimp  0.005066    0.322034  4.506672
3                 pasta     escalope  0.005866    0.372881  4.700812
1           light cream      chicken  0.004533    0.290598  4.843951
4         fromage blanc        honey  0.003333    0.245098  5.164271


## 5.2 - ECLAT

ECLAT is a simplified version of the Apriori model, and it is also a "people who did this also did that" kind of model. The simplification comes from the fact that only the support is computed, and only for sets of two or more items. It is like asking "how often do the two items $M = (m, n)$ appear together?". Confidence and lift are not used: the item sets are simply ranked by their support.
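A minimal sketch of this support-only ranking for pairs of items. It assumes the "vertical" layout usually associated with ECLAT, where each item is mapped to the set of transaction ids that contain it and the support of a pair is obtained by intersecting two such sets. The function name `eclat_pairs`, the toy data and the threshold are hypothetical.

```python
from itertools import combinations


def eclat_pairs(transactions, min_support):
    """Return (item_a, item_b, support) tuples sorted by decreasing support."""
    n = len(transactions)
    # Vertical layout: item -> set of ids of the transactions containing it.
    tidsets = {}
    for tid, transaction in enumerate(transactions):
        for item in transaction:
            tidsets.setdefault(item, set()).add(tid)

    pairs = []
    for a, b in combinations(sorted(tidsets), 2):
        # Transactions containing both items are the intersection of the two id sets.
        pair_support = len(tidsets[a] & tidsets[b]) / n
        if pair_support >= min_support:
            pairs.append((a, b, pair_support))
    return sorted(pairs, key=lambda p: p[2], reverse=True)


# Toy transactions (hypothetical data, for illustration only).
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"milk", "butter", "bread"},
    {"milk"},
    {"bread", "milk", "eggs"},
]

pairs = eclat_pairs(transactions, min_support=0.4)
for pair in pairs:
    print(pair)
```

Unlike the Apriori rules, the pairs returned here have no direction: only how often the two items appear together is measured.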

In [50]:

dataset = pd.read_csv('Part 5 - Association Rule Learning/Section 29 - ECLAT/Market_Basket_Optimisation.csv', header=None)

# ====== rebuild the data set as a list of lists of strings, the format expected by apriori ===== #
# Missing cells, if any, become the string 'nan' after the str() conversion.
transactions = []
for i in range(dataset.shape[0]):
    transactions.append([str(dataset.values[i, j]) for j in range(20)])


In [52]:
from apyori import apriori
results = list(apriori(transactions=transactions, min_support=21/7501, min_confidence=0.2, min_lift=3, min_length=2, max_length=2))

def inspect_eclat(results):
    lhs = [tuple(result[2][0][0])[0] for result in results]
    rhs = [tuple(result[2][0][1])[0] for result in results]
    supports = [result[1] for result in results]
    return list(zip(lhs, rhs, supports))

df_res = pd.DataFrame(inspect_eclat(results), columns=['LHS', 'RHS', 'Support'])

print(df_res.sort_values(by='Support'))

                    LHS          RHS   Support
0  extra dark chocolate      chicken  0.002800
7           light cream    olive oil  0.003200
4         fromage blanc        honey  0.003333
1           light cream      chicken  0.004533
9                 pasta       shrimp  0.005066
6          tomato sauce  ground beef  0.005333
2  mushroom cream sauce     escalope  0.005733
3                 pasta     escalope  0.005866
8     whole wheat pasta    olive oil  0.007999
5         herb & pepper  ground beef  0.015998
