# Types of Associative Learning Algorithms

* Association rule learning is about discovering patterns (or rules) in transaction data. These models are most commonly used for __market basket problems__.
* In other words, it produces rules of the form __"People who buy this also buy..."__

* In this notebook we will cover two algorithms:
    1. Apriori Algorithm
    2. Eclat Algorithm

## 1. Apriori Algorithm

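As the name suggests, Apriori uses prior knowledge to come up with rules: every subset of a frequent itemset must itself be frequent, so any candidate that contains an infrequent subset can be discarded without counting it. The algorithm searches __level by level__, building larger candidate itemsets only from the frequent smaller ones rather than looking through every possible combination. For the resulting rules it calculates `Support`, `Confidence` and `Lift`.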

* `Support(A,B) = Number of transactions containing both A and B / Total number of transactions`

* `Confidence(A->B) = Number of transactions containing both A and B / Number of transactions containing A`

* `Lift(A->B) = Support(A,B) / (Support(A) * Support(B))`

* If lift > 1, B appears together with A more often than would be expected if the two were independent.

* If lift < 1, B appears together with A less often than would be expected if the two were independent.
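To make the three definitions concrete, here is a minimal sketch in plain Python. The toy transactions and the helper names (`support`, `confidence`, `lift`) are made up for illustration; they are not part of the dataset or of any library used in this notebook.

```python
# Toy transactions, invented for illustration only
toy_transactions = [
    {"bread", "butter"},
    {"bread", "butter", "milk"},
    {"bread", "milk"},
    {"butter"},
    {"milk"},
]

def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)

def confidence(a, b, transactions):
    """Support(A,B) divided by Support(A), i.e. how often B appears when A does."""
    return support(set(a) | set(b), transactions) / support(a, transactions)

def lift(a, b, transactions):
    """Support(A,B) / (Support(A) * Support(B)); 1 means A and B look independent."""
    joint = support(set(a) | set(b), transactions)
    return joint / (support(a, transactions) * support(b, transactions))

print("support(bread, butter)  =", support({"bread", "butter"}, toy_transactions))
print("confidence(bread->butter) =", confidence({"bread"}, {"butter"}, toy_transactions))
print("lift(bread->butter)     =", lift({"bread"}, {"butter"}, toy_transactions))
print("lift(butter->milk)      =", lift({"butter"}, {"milk"}, toy_transactions))
```

In this toy data bread and butter occur together in 2 of 5 baskets, which gives a lift above 1, while butter and milk share only 1 basket, which gives a lift below 1.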

The following article is a good reference if anything is unclear:
https://www.kdnuggets.com/2016/04/association-rules-apriori-algorithm-tutorial.html
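The "prior knowledge" Apriori relies on is that an itemset can only be frequent if all of its subsets are frequent. The sketch below shows that idea as a level-wise search in plain Python. It is a simplified illustration under that assumption, not the implementation used by the libraries later in this notebook; the function name `frequent_itemsets` and the toy data are hypothetical.

```python
from itertools import combinations

def frequent_itemsets(transactions, min_support):
    """Level-wise search: size-k candidates are built only from frequent (k-1)-itemsets."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)

    def support(itemset):
        return sum(1 for t in transactions if itemset <= t) / n

    # Level 1: keep only the single items that are frequent on their own
    current = {}
    for item in {item for t in transactions for item in t}:
        candidate = frozenset([item])
        s = support(candidate)
        if s >= min_support:
            current[candidate] = s

    result = dict(current)
    k = 2
    while current:
        previous = set(current)
        # Join step: combine frequent (k-1)-itemsets into size-k candidates
        candidates = {a | b for a in previous for b in previous if len(a | b) == k}
        # Prune step: drop candidates that have an infrequent (k-1)-subset
        candidates = {
            c for c in candidates
            if all(frozenset(sub) in previous for sub in combinations(c, k - 1))
        }
        # Count support only for the candidates that survived pruning
        current = {}
        for c in candidates:
            s = support(c)
            if s >= min_support:
                current[c] = s
        result.update(current)
        k += 1
    return result

# Toy data, invented for illustration only
toy = [
    {"bread", "butter", "milk"},
    {"bread", "butter", "jam"},
    {"bread", "milk"},
    {"butter", "milk"},
    {"bread", "butter", "milk"},
]
result = frequent_itemsets(toy, min_support=0.5)
for itemset, s in sorted(result.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), s)
```

Because "jam" is infrequent on its own, no pair containing it is ever counted; that pruning is what keeps the search manageable on real data.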

<img  src="images/A_1.png"/>

In [1]:
# Okay now let's implement this

# importing the usual modules
%matplotlib inline
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt



In [81]:
# uncomment and run this if needed, or run the same commands (without the leading !) in a Miniconda terminal
#!pip install efficient-apriori
#!pip install mlxtend

Collecting mlxtend
  Downloading mlxtend-0.21.0-py2.py3-none-any.whl (1.3 MB)
     ---------------------------------------- 1.3/1.3 MB 2.3 MB/s eta 0:00:00
Installing collected packages: mlxtend
Successfully installed mlxtend-0.21.0


In [88]:
# The algorithm lives in a separate package, so install it by running the cell above first.
# efficient_apriori is an alternative implementation that also works:
# from efficient_apriori import apriori

from mlxtend.frequent_patterns import apriori, association_rules
from mlxtend.preprocessing import TransactionEncoder

The `apriori` function expects data in a one-hot encoded pandas DataFrame, so let's convert our dataset into that format.

Further reading:
https://github.com/rasbt/mlxtend/blob/master/docs/sources/user_guide/frequent_patterns/apriori.ipynb
https://github.com/rasbt/mlxtend/blob/master/docs/sources/user_guide/frequent_patterns/association_rules.ipynb

In [82]:
# importing the dataset
data = pd.read_csv('data/Apriori/Market_Basket_Optimisation.csv',header=None)
data.head()

Unnamed: 0,0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19
0,shrimp,almonds,avocado,vegetables mix,green grapes,whole weat flour,yams,cottage cheese,energy drink,tomato juice,low fat yogurt,green tea,honey,salad,mineral water,salmon,antioxydant juice,frozen smoothie,spinach,olive oil
1,burgers,meatballs,eggs,,,,,,,,,,,,,,,,,
2,chutney,,,,,,,,,,,,,,,,,,,
3,turkey,avocado,,,,,,,,,,,,,,,,,,
4,mineral water,milk,energy bar,whole wheat rice,green tea,,,,,,,,,,,,,,,


In [83]:
data.shape

(7501, 20)

In [84]:
# We can't feed this DataFrame to the algorithm directly; we need a list of transactions,
# each holding only the item strings (no NaN padding)

# prepare the data

# 2D array of transactions, still padded with NaN values
temp_arrays = data.to_numpy()

# Boolean mask: True where a cell holds an item, False where it is NaN
bool_arrays = data.notna().to_numpy()

def filter_transactions(messy_arrys, bool_arrys):
    """
    Filter each row of `messy_arrys` with the matching row of the boolean mask
    `bool_arrys` (NumPy boolean indexing) and return the filtered rows as a list.
    """
    # the two arrays must have the same number of rows
    if len(messy_arrys) != len(bool_arrys):
        raise ValueError('Arrays should be of same size')

    return [row[mask] for row, mask in zip(messy_arrys, bool_arrys)]

# clean transactions
transactions = filter_transactions(temp_arrays, bool_arrays)
transactions[1]

array(['burgers', 'meatballs', 'eggs'], dtype=object)

In [71]:
# this cell and the next one show how the filtering works on a single row
arr = data.iloc[1:2].to_numpy().reshape(20,)

In [72]:
x = np.invert(data.iloc[1:2].isna().to_numpy().reshape(20,))
arr[x]

array(['burgers', 'meatballs', 'eggs'], dtype=object)

In [94]:
# now we can convert the dataset into the relevant format
from mlxtend.preprocessing import TransactionEncoder

te = TransactionEncoder()
te_ary = te.fit(transactions).transform(transactions)
df = pd.DataFrame(te_ary, columns=te.columns_)
df.head().T

Unnamed: 0,0,1,2,3,4
asparagus,False,False,False,False,False
almonds,True,False,False,False,False
antioxydant juice,True,False,False,False,False
asparagus,False,False,False,False,False
avocado,True,False,False,True,False
...,...,...,...,...,...
whole wheat pasta,False,False,False,False,False
whole wheat rice,False,False,False,False,True
yams,True,False,False,False,False
yogurt cake,False,False,False,False,False


In [98]:
# So how do we decide the minimum support?
# We want to include products that sold at least 3 times a day, 7 days a week.
# Since this dataset contains the items people bought over one week (about 7500 transactions),
# the support of such an item would be
3*7/7500

0.0028

In [97]:
# Now let's find the frequent itemsets

frequent_itemsets = apriori(df, min_support=0.003, max_len=2, use_colnames=True)

frequent_itemsets

Unnamed: 0,support,itemsets
0,0.020397,(almonds)
1,0.008932,(antioxydant juice)
2,0.004666,(asparagus)
3,0.033329,(avocado)
4,0.004533,(babies food)
...,...,...
896,0.003200,"(turkey, tomato juice)"
897,0.006532,"(tomatoes, turkey)"
898,0.003200,"(tomatoes, vegetables mix)"
899,0.005999,"(tomatoes, whole wheat rice)"


In [102]:
from mlxtend.frequent_patterns import association_rules
# we can change the metric and its minimum threshold and compare the results
association_rules(frequent_itemsets, metric="lift", min_threshold=3.0)

Unnamed: 0,antecedents,consequents,antecedent support,consequent support,support,confidence,lift,leverage,conviction
0,(cottage cheese),(brownies),0.031862,0.033729,0.003466,0.108787,3.22533,0.002392,1.08422
1,(brownies),(cottage cheese),0.033729,0.031862,0.003466,0.102767,3.22533,0.002392,1.079026
2,(light cream),(chicken),0.015598,0.059992,0.004533,0.290598,4.843951,0.003597,1.325072
3,(chicken),(light cream),0.059992,0.015598,0.004533,0.075556,4.843951,0.003597,1.064858
4,(mushroom cream sauce),(escalope),0.019064,0.079323,0.005733,0.300699,3.790833,0.00422,1.316568
5,(escalope),(mushroom cream sauce),0.079323,0.019064,0.005733,0.072269,3.790833,0.00422,1.057349
6,(pasta),(escalope),0.015731,0.079323,0.005866,0.372881,4.700812,0.004618,1.468107
7,(escalope),(pasta),0.079323,0.015731,0.005866,0.07395,4.700812,0.004618,1.062867
8,(tomato juice),(fresh bread),0.030396,0.043061,0.004266,0.140351,3.259356,0.002957,1.113174
9,(fresh bread),(tomato juice),0.043061,0.030396,0.004266,0.099071,3.259356,0.002957,1.076227
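As a sanity check, the derived columns of a rule can be recomputed by hand from its three support values. The sketch below copies the rounded numbers from row 2 of the table above, `(light cream) -> (chicken)`. The leverage and conviction formulas used here are the standard definitions (leverage = support minus the product of the two individual supports; conviction = (1 - consequent support) / (1 - confidence)). Because the inputs are rounded to six decimals, the results agree with the table only approximately.

```python
# Values copied from row 2 of the table above: (light cream) -> (chicken)
antecedent_support = 0.015598   # Support(light cream)
consequent_support = 0.059992   # Support(chicken)
support = 0.004533              # Support(light cream, chicken)

# Confidence(A->B) = Support(A,B) / Support(A)
confidence = support / antecedent_support

# Lift(A->B) = Support(A,B) / (Support(A) * Support(B))
lift = support / (antecedent_support * consequent_support)

# Leverage: observed joint support minus the joint support expected under independence
leverage = support - antecedent_support * consequent_support

# Conviction: (1 - Support(B)) / (1 - Confidence(A->B))
conviction = (1 - consequent_support) / (1 - confidence)

print(f"confidence = {confidence:.4f}")
print(f"lift       = {lift:.4f}")
print(f"leverage   = {leverage:.6f}")
print(f"conviction = {conviction:.4f}")
```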


In [105]:
# let's try efficient_apriori too
# note: this import replaces the mlxtend `apriori` imported earlier under the same name
from efficient_apriori import apriori

itemsets, rules = apriori(transactions, min_support=0.003, min_confidence=0.2, max_length=2)
print(rules)

[{almonds} -> {burgers}, {almonds} -> {chocolate}, {almonds} -> {eggs}, {almonds} -> {french fries}, {almonds} -> {green tea}, {almonds} -> {milk}, {almonds} -> {mineral water}, {almonds} -> {spaghetti}, {avocado} -> {chocolate}, {avocado} -> {french fries}, {avocado} -> {milk}, {avocado} -> {mineral water}, {avocado} -> {spaghetti}, {bacon} -> {mineral water}, {bacon} -> {spaghetti}, {barbecue sauce} -> {eggs}, {barbecue sauce} -> {mineral water}, {black tea} -> {eggs}, {black tea} -> {milk}, {black tea} -> {mineral water}, {black tea} -> {spaghetti}, {blueberries} -> {mineral water}, {blueberries} -> {spaghetti}, {body spray} -> {french fries}, {body spray} -> {mineral water}, {brownies} -> {eggs}, {brownies} -> {french fries}, {brownies} -> {mineral water}, {brownies} -> {spaghetti}, {burgers} -> {eggs}, {burgers} -> {french fries}, {burgers} -> {green tea}, {ham} -> {burgers}, {burgers} -> {milk}, {burgers} -> {mineral water}, {burgers} -> {spaghetti}, {butter} -> {chocolate}, {but

You can read more about this implementation here: https://efficient-apriori.readthedocs.io/en/latest/#efficient_apriori.apriori