## Apriori Algorithm

The Apriori algorithm is a technique for association rule mining.
Association rule mining identifies underlying relationships between items, for example which items are frequently bought together or which items are bought by similar users.

We will look at how the Apriori algorithm works using a Python example.

In [1]:
#import required packages
import numpy as np
import pandas as pd
from mlxtend.frequent_patterns import apriori, association_rules

In [2]:
#import/create data
data = {"transaction_id":["t1", "t1", "t2", "t3", "t3", "t4", "t4", "t5", "t6", "t6"], 
        "item_id":["bread", "milk", "bread", "eggs", "beer", "bread", "milk", "beer", "eggs", "bread"],
        "quantity": [1, 1, 2, 3, 1, 5, 1, 4, 2, 4]}
df = pd.DataFrame(data)
print(df)

  transaction_id item_id  quantity
0             t1   bread         1
1             t1    milk         1
2             t2   bread         2
3             t3    eggs         3
4             t3    beer         1
5             t4   bread         5
6             t4    milk         1
7             t5    beer         4
8             t6    eggs         2
9             t6   bread         4


### Data cleaning

In [3]:
#stripping any spaces in transaction_id
df['transaction_id'] = df['transaction_id'].str.strip() 
  
# Dropping the rows that have no item_id
df.dropna(axis = 0, subset =['item_id'], inplace = True) 
df['item_id'] = df['item_id'].astype('str')

In [4]:
#pivot to one row per transaction and one column per item (cell = total quantity)
basket = (df.groupby(['transaction_id', 'item_id'])['quantity']
            .sum()
            .unstack()
            .fillna(0))
basket

item_id         beer  bread  eggs  milk
transaction_id
t1               0.0    1.0   0.0   1.0
t2               0.0    2.0   0.0   0.0
t3               1.0    0.0   3.0   0.0
t4               0.0    5.0   0.0   1.0
t5               4.0    0.0   0.0   0.0
t6               0.0    4.0   2.0   0.0


In [5]:
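def hot_encode(x):
    #1 if the item appears in the transaction (quantity > 0), else 0
    #(the original two-branch version returned None for values between 0 and 1)
    return 1 if x > 0 else 0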

basket_encoded = basket.applymap(hot_encode)
basket_encoded

item_id         beer  bread  eggs  milk
transaction_id
t1                 0      1     0     1
t2                 0      1     0     0
t3                 1      0     1     0
t4                 0      1     0     1
t5                 1      0     0     0
t6                 0      1     1     0


## Theory of Apriori Algorithm
The Apriori algorithm relies on three main measures.

### 1. Support
Support is the frequency of occurrence of an itemset, i.e. the fraction of all transactions that contain it.

Support(A) = (Transactions containing A) / (Total transactions)

### 2. Confidence
Confidence is the likelihood that item B is also bought when item A is bought.

Confidence(A->B) = (Transactions containing both A and B) / (Transactions containing A)

### 3. Lift
Lift measures how much more likely B is to be bought when A is bought, compared with how often B is bought overall.

Lift(A->B) = Confidence(A->B) / Support(B)

#### Association rule by Lift
lift = 1 -> A and B are bought together exactly as often as expected if they were independent (no association)

lift < 1 -> A and B are bought together less often than expected

lift > 1 -> A and B are bought together more often than expected; the greater the lift, the stronger the association.
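
To make the three measures concrete, here is a minimal pure-Python sketch that computes them by hand for the rule milk -> bread on the six example transactions. The helper names `support`, `confidence` and `lift` are chosen for this illustration only and are not part of mlxtend.

```python
# The six example transactions, one set of items per transaction_id.
transactions = [
    {"bread", "milk"},   # t1
    {"bread"},           # t2
    {"eggs", "beer"},    # t3
    {"bread", "milk"},   # t4
    {"beer"},            # t5
    {"eggs", "bread"},   # t6
]

def support(itemset):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(antecedent, consequent):
    """Support of the combined itemset divided by support of the antecedent."""
    return support(set(antecedent) | set(consequent)) / support(antecedent)

def lift(antecedent, consequent):
    """Confidence of the rule divided by support of the consequent."""
    return confidence(antecedent, consequent) / support(consequent)

print("support(milk)           =", support({"milk"}))
print("confidence(milk->bread) =", confidence({"milk"}, {"bread"}))
print("lift(milk->bread)       =", lift({"milk"}, {"bread"}))
```

Milk appears in 2 of 6 transactions and always together with bread, so the confidence is 1.0; bread appears in 4 of 6 transactions, which gives a lift of about 1.5.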

In [6]:
#Building the model
frq_items = apriori(basket_encoded, min_support = 0.05, use_colnames = True) 
  
# Collecting the inferred rules in a dataframe 
rules = association_rules(frq_items, metric ="lift", min_threshold = 1)
rules = rules.sort_values(['confidence', 'lift']) 
print(rules.head()) 

  antecedents consequents  antecedent support  consequent support   support  \
0      (eggs)      (beer)            0.333333            0.333333  0.166667   
1      (beer)      (eggs)            0.333333            0.333333  0.166667   
3     (bread)      (milk)            0.666667            0.333333  0.333333   
2      (milk)     (bread)            0.333333            0.666667  0.333333   

   confidence  lift  leverage  conviction  
0         0.5   1.5  0.055556    1.333333  
1         0.5   1.5  0.055556    1.333333  
3         0.5   1.5  0.111111    1.333333  
2         1.0   1.5  0.111111         inf  
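
The `apriori` call first finds the frequent itemsets from which the rules are derived. As a rough illustration of how such a search can proceed, the sketch below implements the classic level-wise idea in plain Python: an itemset can only be frequent if all of its subsets are frequent, so candidates of size k are built only from frequent itemsets of size k-1. This is a simplified sketch under that assumption, not mlxtend's actual implementation, and `frequent_itemsets` is a name made up for this example.

```python
from itertools import combinations

# The six example transactions, one set of items per transaction_id.
transactions = [
    {"bread", "milk"},   # t1
    {"bread"},           # t2
    {"eggs", "beer"},    # t3
    {"bread", "milk"},   # t4
    {"beer"},            # t5
    {"eggs", "bread"},   # t6
]

def frequent_itemsets(transactions, min_support):
    """Return {itemset: support} for every itemset with support >= min_support."""
    n = len(transactions)

    def support(itemset):
        return sum(itemset <= t for t in transactions) / n

    # Level 1: frequent single items.
    items = {item for t in transactions for item in t}
    current = {frozenset([i]) for i in items
               if support(frozenset([i])) >= min_support}

    result = {}
    k = 1
    while current:
        for itemset in current:
            result[itemset] = support(itemset)
        k += 1
        # Join step: combine frequent (k-1)-itemsets into k-item candidates.
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # Prune step: drop candidates that have an infrequent (k-1)-subset.
        candidates = {c for c in candidates
                      if all(frozenset(sub) in current
                             for sub in combinations(c, k - 1))}
        # Keep only the candidates that are themselves frequent.
        current = {c for c in candidates if support(c) >= min_support}
    return result

found = frequent_itemsets(transactions, min_support=0.05)
for itemset, sup in sorted(found.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), round(sup, 3))
```

On the example data this yields the four single items and the pairs {beer, eggs}, {bread, eggs} and {bread, milk}. The pair {bread, eggs} does not show up in the rules table because both of its rules have a lift of 0.75, below the `min_threshold` of 1.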


In [7]:
#postprocessing
#convert the frozensets to comma-separated strings (sorted so the order is stable)
rules['antecedents'] = rules['antecedents'].apply(lambda x : ",".join(sorted(x)))
rules['consequents'] = rules['consequents'].apply(lambda x : ",".join(sorted(x)))
rules.drop_duplicates(subset=['antecedents', 'consequents'], inplace = True)
rules.reset_index(drop=True, inplace=True)
rules.head()

  antecedents consequents  antecedent support  consequent support   support  confidence  lift  leverage  conviction
0        eggs        beer            0.333333            0.333333  0.166667         0.5   1.5  0.055556    1.333333
1        beer        eggs            0.333333            0.333333  0.166667         0.5   1.5  0.055556    1.333333
2       bread        milk            0.666667            0.333333  0.333333         0.5   1.5  0.111111    1.333333
3        milk       bread            0.333333            0.666667  0.333333         1.0   1.5  0.111111         inf


In [8]:
#Cross-sell in retail recommendation:
#build one row per transaction with its items joined into a comma-separated string
result = (df.groupby('transaction_id')['item_id']
            .apply(lambda x: ','.join(x))
            .reset_index())
result

  transaction_id     item_id
0             t1  bread,milk
1             t2       bread
2             t3   eggs,beer
3             t4  bread,milk
4             t5        beer
5             t6  eggs,bread


In [17]:
#cross_sell item extraction
cross_sell_items = result.merge(rules, left_on = "item_id",  right_on = "antecedents", how = "inner")
cross_sell_items.rename(columns = {"consequents":"cross_sell"}, inplace = True)
cross_sell_items

  transaction_id item_id antecedents cross_sell  antecedent support  consequent support   support  confidence  lift  leverage  conviction
0             t2   bread       bread       milk            0.666667            0.333333  0.333333         0.5   1.5  0.111111    1.333333
1             t5    beer        beer       eggs            0.333333            0.333333  0.166667         0.5   1.5  0.055556    1.333333
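
The merge above only matches baskets whose joined item string is exactly equal to a rule's antecedent, so only the single-item baskets t2 and t5 receive a suggestion. One possible alternative, sketched below in plain Python, is to apply every rule whose antecedent is a subset of the basket and to suggest only items that are not already in it. The `suggest` helper and the hard-coded `rules` list (copied from the rules table) are assumptions made for this illustration.

```python
# Rules copied from the rules table: (antecedent, consequent, confidence).
rules = [
    ({"eggs"}, {"beer"}, 0.5),
    ({"beer"}, {"eggs"}, 0.5),
    ({"bread"}, {"milk"}, 0.5),
    ({"milk"}, {"bread"}, 1.0),
]

# Items per transaction, taken from the example data.
baskets = {
    "t1": {"bread", "milk"},
    "t2": {"bread"},
    "t3": {"eggs", "beer"},
    "t4": {"bread", "milk"},
    "t5": {"beer"},
    "t6": {"eggs", "bread"},
}

def suggest(basket, rules):
    """Items recommended by rules whose antecedent is contained in the basket."""
    suggestions = set()
    for antecedent, consequent, _confidence in rules:
        if antecedent <= basket:
            # Skip items the customer already has in the basket.
            suggestions |= consequent - basket
    return sorted(suggestions)

for transaction_id, basket in baskets.items():
    print(transaction_id, sorted(basket), "->", suggest(basket, rules))
```

With this approach t6 (eggs, bread) also receives suggestions (beer and milk), whereas the exact-match merge leaves it out.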
