<a href="https://colab.research.google.com/github/changsin/MIU_ML/blob/main/notebooks/03.apriori.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Apriori Algorithm

This notebook is adapted from [Apriori Algorithm Explained | Association Rule Mining | Finding Frequent Itemset | Edureka](https://youtu.be/guVvtZ7ZClw)

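The Apriori algorithm is based on the monotonicity (downward-closure) principle: every subset of a frequent itemset must also be frequent. Equivalently, if an itemset is infrequent, none of its supersets can be frequent, so they can be pruned without counting their support.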
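To make the principle concrete, here is a minimal pure-Python sketch of the level-wise search Apriori performs. It is an illustration, not mlxtend's implementation: the function name `apriori_frequent_itemsets` and the five toy baskets are hypothetical. Candidates of size k+1 are built only from frequent k-itemsets, and any candidate with an infrequent k-subset is discarded before its support is counted.

```python
from itertools import combinations

def apriori_frequent_itemsets(transactions, min_support):
    """Return {frozenset(itemset): support} for every itemset with support >= min_support.

    Hypothetical helper: a minimal level-wise Apriori sketch for illustration only.
    """
    n = len(transactions)
    baskets = [frozenset(t) for t in transactions]

    def support(itemset):
        # Fraction of baskets that contain every item of the itemset
        return sum(1 for b in baskets if itemset <= b) / n

    # Level 1: frequent single items
    items = {item for b in baskets for item in b}
    current = {}
    for item in items:
        s = support(frozenset([item]))
        if s >= min_support:
            current[frozenset([item])] = s
    frequent = dict(current)

    k = 1
    while current:
        # Join step: unions of two frequent k-itemsets that have exactly k+1 items
        candidates = set()
        for a, b in combinations(list(current), 2):
            union = a | b
            if len(union) == k + 1:
                candidates.add(union)
        # Prune step (monotonicity): drop candidates with an infrequent k-subset
        candidates = {c for c in candidates
                      if all(frozenset(sub) in current for sub in combinations(c, k))}
        # Count support only for the surviving candidates
        current = {}
        for c in candidates:
            s = support(c)
            if s >= min_support:
                current[c] = s
        frequent.update(current)
        k += 1
    return frequent

# Hypothetical toy baskets
transactions = [
    {"bread", "milk"},
    {"bread", "diapers", "beer", "eggs"},
    {"milk", "diapers", "beer", "cola"},
    {"bread", "milk", "diapers", "beer"},
    {"bread", "milk", "diapers", "cola"},
]
freq = apriori_frequent_itemsets(transactions, min_support=0.6)
for itemset, s in sorted(freq.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), s)
```

With `min_support=0.6`, `{bread, milk, diapers}` is generated as a candidate but has support 0.4, while `{bread, diapers, beer}` is pruned without being counted because its subset `{bread, beer}` is infrequent.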

# Association Rules

- Support(X, Y) = P(X, Y) = $ \Large \frac{freq(X, Y)}{N} $ - statistical significance: the fraction of the N baskets in which the rule holds, i.e., both the antecedent X and the consequent Y are present.
- Confidence(X -> Y) = P(Y|X) = $ \Large \frac{P(X, Y)}{P(X)} $ = $ \Large \frac{Support(X, Y)}{Support(X)} $ - strength of the rule: how often Y occurs in the baskets that contain X (the left-hand item), i.e., the number of baskets containing both X and Y divided by the number containing X.
- Lift(X -> Y) = $ \Large \frac{P(X, Y)}{P(X) P(Y)} = \frac{P(Y|X)}{P(Y)} = \frac{Confidence(X \rightarrow Y)}{Support(Y)} $ - the rule's confidence relative to the baseline frequency of Y. Lift ranges from 0 to infinity: a value greater than 1 means the rule body (X) and the rule head (Y) appear together more often than expected if they were independent, so the presence of the body has a positive effect on the occurrence of the head; a value of 1 indicates independence, and a value below 1 indicates a negative association.
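The three metrics can be checked by hand on a small example. The sketch below transcribes the formulas above into hypothetical helper functions (`support`, `confidence`, `lift`) and applies them to five made-up baskets.

```python
def support(transactions, itemset):
    """Support = fraction of transactions containing every item in itemset."""
    itemset = set(itemset)
    return sum(1 for t in transactions if itemset <= set(t)) / len(transactions)

def confidence(transactions, x, y):
    """Confidence(X -> Y) = Support(X, Y) / Support(X)."""
    return support(transactions, set(x) | set(y)) / support(transactions, x)

def lift(transactions, x, y):
    """Lift(X -> Y) = Confidence(X -> Y) / Support(Y)."""
    return confidence(transactions, x, y) / support(transactions, y)

# Hypothetical toy baskets
transactions = [
    {"bread", "milk"},
    {"bread", "diapers", "beer", "eggs"},
    {"milk", "diapers", "beer", "cola"},
    {"bread", "milk", "diapers", "beer"},
    {"bread", "milk", "diapers", "cola"},
]

supp_beer_diapers = support(transactions, {"beer", "diapers"})      # 3 / 5 = 0.6
conf_beer_diapers = confidence(transactions, {"beer"}, {"diapers"})  # 0.6 / 0.6 = 1.0
lift_beer_diapers = lift(transactions, {"beer"}, {"diapers"})        # 1.0 / 0.8 = 1.25
lift_milk_diapers = lift(transactions, {"milk"}, {"diapers"})        # 0.75 / 0.8 = 0.9375
print(supp_beer_diapers, conf_beer_diapers, lift_beer_diapers, lift_milk_diapers)
```

Every basket with beer also contains diapers, so `beer -> diapers` has confidence 1.0 and lift 1.25 above the 0.8 baseline for diapers, whereas `milk -> diapers` has lift 0.9375, slightly below independence.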



# Setup
Import the libraries: pandas for data handling, and the `apriori` and `association_rules` functions from mlxtend.

In [1]:
import pandas as pd
from mlxtend.frequent_patterns import apriori
from mlxtend.frequent_patterns import association_rules

In [2]:
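# Load the Online Retail transactions from the UCI Machine Learning Repository
# (assumes Online_Retail.xlsx is in the working directory; reading .xlsx requires openpyxl)
df = pd.read_excel('Online_Retail.xlsx')
df.head()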

In [None]:
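# Strip stray whitespace from the item names
df['Description'] = df['Description'].str.strip()
# Drop rows without an invoice number
df.dropna(axis=0, subset=['InvoiceNo'], inplace=True)
df['InvoiceNo'] = df['InvoiceNo'].astype('str')
# Invoice numbers starting with 'C' are cancellations, so exclude them
df = df[~df['InvoiceNo'].str.startswith('C')]
df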

In [None]:
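# Basket matrix for French invoices: one row per invoice, one column per product,
# values = total quantity of that product on the invoice (0 if absent)
basket = (df[df['Country'] == 'France']
          .groupby(['InvoiceNo', 'Description'])['Quantity']
          .sum().unstack().reset_index().fillna(0)
          .set_index('InvoiceNo'))
basket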
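The groupby/unstack chain above turns the long table of invoice lines into a wide basket matrix. The sketch below runs a condensed version of it on a tiny hypothetical DataFrame (invoice numbers and product names are invented) so the reshaping is visible. It omits the `reset_index()`/`set_index('InvoiceNo')` pair, which only moves `InvoiceNo` out of the index and back again.

```python
import pandas as pd

# Hypothetical invoice lines in the same long format as the Online Retail data
lines = pd.DataFrame({
    "InvoiceNo":   ["1001", "1001", "1002", "1002", "1002", "1003"],
    "Description": ["MUG", "PLATE", "MUG", "MUG", "NAPKINS", "PLATE"],
    "Quantity":    [2, 1, 1, 3, 12, 4],
})

basket_demo = (lines
               .groupby(["InvoiceNo", "Description"])["Quantity"]
               .sum()        # total units per (invoice, item); the two MUG lines of 1002 add up to 4
               .unstack()    # items become columns, one row per invoice
               .fillna(0))   # an item missing from an invoice becomes 0 units
print(basket_demo)
```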

In [None]:
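# One-hot encode each invoice: True if at least one unit of the item was bought,
# False for zero or negative quantities. mlxtend's apriori expects a boolean
# (or 0/1) DataFrame, and a vectorised comparison avoids element-wise applymap.
basket_sets = basket >= 1
# POSTAGE is a shipping charge rather than a product, so exclude it from the itemsets
basket_sets.drop('POSTAGE', inplace=True, axis=1)
basket_sets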

In [None]:
frequent_itemsets = apriori(basket_sets, min_support=0.07, use_colnames=True)
rules = association_rules(frequent_itemsets, metric='lift', min_threshold=1)
rules.head()
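Roughly speaking, `association_rules` enumerates every way of splitting each frequent itemset into an antecedent and a consequent, computes the metrics, and keeps the rules whose chosen metric (here lift) reaches `min_threshold`. The following is a simplified pure-Python sketch of that idea, not mlxtend's code; `generate_rules` and the hard-coded supports (from a hypothetical five-basket example mined with minimum support 0.6) are illustrative assumptions.

```python
from itertools import combinations

def generate_rules(frequent, min_lift=1.0):
    """Hypothetical helper: split frequent itemsets into X -> Y rules and keep those with lift >= min_lift.

    frequent maps frozenset(itemset) -> support. Because every subset of a
    frequent itemset is itself frequent, all supports needed are in the dict.
    Returns tuples (antecedent, consequent, support, confidence, lift).
    """
    rules = []
    for itemset, supp_xy in frequent.items():
        if len(itemset) < 2:
            continue
        # Every non-empty proper subset can serve as the antecedent
        for r in range(1, len(itemset)):
            for antecedent in combinations(sorted(itemset), r):
                x = frozenset(antecedent)
                y = itemset - x
                conf = supp_xy / frequent[x]
                lift = conf / frequent[y]
                if lift >= min_lift:
                    rules.append((set(x), set(y), supp_xy, conf, lift))
    return rules

# Hypothetical supports of all itemsets with support >= 0.6 in a toy dataset
frequent = {
    frozenset({"bread"}): 0.8, frozenset({"milk"}): 0.8,
    frozenset({"diapers"}): 0.8, frozenset({"beer"}): 0.6,
    frozenset({"bread", "milk"}): 0.6, frozenset({"bread", "diapers"}): 0.6,
    frozenset({"milk", "diapers"}): 0.6, frozenset({"diapers", "beer"}): 0.6,
}
rules_demo = generate_rules(frequent, min_lift=1.0)
for x, y, s, c, l in rules_demo:
    print(sorted(x), "->", sorted(y), round(s, 3), round(c, 3), round(l, 3))
```

Only `diapers -> beer` and `beer -> diapers` survive (lift 1.25); the other pairs have lift 0.9375 and are dropped, in the same way that `min_threshold=1` discards rules with lift below 1.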

In [None]:
rules[(rules['lift'] >= 6) & (rules['confidence'] >= 0.8)]