## Association Rule Mining

### Data set - http://cox.csueastbay.edu/~esuess/classes/Statistics_6620/Presentations/ml13/groceries.csv

### The primary objective of a recommender system is to predict items that a customer may purchase in the future, based on his/her purchases so far.
### Association rule mining finds combinations of items that frequently occur together in orders or baskets (in a retail context).
### Association rule mining considers all possible combinations of items in previous baskets and computes measures such as support, confidence and lift to identify rules with strong associations.
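For a rule A → B, support is the fraction of baskets containing both A and B, confidence is support(A ∪ B) / support(A), and lift is confidence / support(B). A minimal sketch computing all three by hand on a small, made-up list of baskets (the baskets and the helper names `support`, `confidence` and `lift` are hypothetical, for illustration only):

```python
# Hypothetical toy baskets, for illustration only
baskets = [
    {"whole milk", "yogurt"},
    {"whole milk", "bread"},
    {"yogurt", "bread"},
    {"whole milk", "yogurt", "bread"},
]

def support(itemset, baskets):
    """Fraction of baskets that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= b for b in baskets) / len(baskets)

def confidence(antecedent, consequent, baskets):
    """Estimated P(consequent | antecedent) = support(A and B) / support(A)."""
    both = set(antecedent) | set(consequent)
    return support(both, baskets) / support(antecedent, baskets)

def lift(antecedent, consequent, baskets):
    """Confidence divided by the consequent's baseline support."""
    return confidence(antecedent, consequent, baskets) / support(consequent, baskets)

print(support({"whole milk", "yogurt"}, baskets))           # 0.5
print(confidence({"yogurt"}, {"whole milk"}, baskets))      # about 0.667
print(lift({"yogurt"}, {"whole milk"}, baskets))            # about 0.889
```

A lift below 1, as in this toy data, means the two items appear together less often than they would if purchases were independent; rules worth acting on usually have a lift above 1.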
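### Apriori algorithm - it uses a minimum support criterion to reduce the number of itemset combinations that must be evaluated: any superset of an itemset below the minimum support is also below it and can be skipped, which reduces the computational requirements.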
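A minimal pure-Python sketch of this level-wise idea (not mlxtend's implementation; the function name `apriori_sketch` and the toy baskets are hypothetical). Frequent (k-1)-itemsets are joined into k-item candidates, and any candidate with an infrequent subset is discarded before its support is counted:

```python
from itertools import combinations

def apriori_sketch(baskets, min_support):
    """Return {frozenset(itemset): support} for every itemset whose support
    is at least `min_support`, found level by level (1-item, 2-item, ...)."""
    baskets = [frozenset(b) for b in baskets]
    n = len(baskets)

    def keep_frequent(candidates):
        # Count each candidate and keep it only if it meets min_support
        kept = {}
        for c in candidates:
            s = sum(c <= b for b in baskets) / n
            if s >= min_support:
                kept[c] = s
        return kept

    # Level 1: every single item
    current = keep_frequent({frozenset([i]) for b in baskets for i in b})
    frequent = dict(current)
    k = 2
    while current:
        prev = set(current)
        # Join: union pairs of frequent (k-1)-itemsets that form a k-itemset
        candidates = {a | b for a in prev for b in prev if len(a | b) == k}
        # Prune: support can only drop as items are added, so a candidate
        # with any infrequent (k-1)-subset cannot be frequent
        candidates = {
            c for c in candidates
            if all(frozenset(s) in prev for s in combinations(c, k - 1))
        }
        current = keep_frequent(candidates)
        frequent.update(current)
        k += 1
    return frequent

# Hypothetical toy baskets
baskets = [
    {"milk", "bread"},
    {"milk", "yogurt"},
    {"milk", "bread", "yogurt"},
    {"bread"},
]
result = apriori_sketch(baskets, min_support=0.5)
print(len(result))                                # 5
print(frozenset({"bread", "yogurt"}) in result)   # False (support 0.25)
```

Here the three single items plus {milk, bread} and {milk, yogurt} survive. The only 3-item candidate is pruned without being counted, because its subset {bread, yogurt} is infrequent.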

## Load the data set

In [7]:
all_txns = []
# Open the file (a forward slash avoids the invalid '\g' escape in 'data\groceries.csv')
with open('data/groceries.csv') as f:
    # Read each line
    content = f.readlines()
    # Remove white space from the beginning and end of each line
    txns = [x.strip() for x in content]
    # Iterate through each line and create a list of transactions
    for t in txns:
        # Each transaction is the list of items in that basket
        all_txns.append(t.split(','))

In [8]:
all_txns[0:5]

[['citrus fruit', 'semi-finished bread', 'margarine', 'ready soups'],
 ['tropical fruit', 'yogurt', 'coffee'],
 ['whole milk'],
 ['pip fruit', 'yogurt', 'cream cheese', 'meat spreads'],
 ['other vegetables',
  'whole milk',
  'condensed milk',
  'long life bakery product']]
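The same list of transactions can also be built with the standard-library `csv` module, which additionally copes with quoted fields. A sketch on an in-memory sample copied from the first rows above; for the real file you would pass the opened `data/groceries.csv` (path assumed from the cell above, opened with `newline=''`) instead of `sample`:

```python
import csv
import io

# Stand-in for the first lines of groceries.csv
sample = io.StringIO(
    "citrus fruit,semi-finished bread,margarine,ready soups\n"
    "tropical fruit,yogurt,coffee\n"
    "whole milk\n"
)

# csv.reader splits each line on commas; `if row` skips any blank lines
all_txns = [row for row in csv.reader(sample) if row]
print(all_txns[2])  # ['whole milk']
```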

## Encoding the transactions

In [9]:
import pandas as pd
import numpy as np
from mlxtend.preprocessing import TransactionEncoder  # called OnehotTransactions in older mlxtend releases
from mlxtend.frequent_patterns import apriori, association_rules

In [16]:
import warnings
warnings.filterwarnings('ignore')

In [31]:
# Initialise the encoder and transform the data into one-hot (0/1) format
one_hot_encoding = TransactionEncoder()
one_hot_txns = one_hot_encoding.fit(all_txns).transform(all_txns).astype('int')
# Convert the matrix to a data frame
one_hot_txns_df = pd.DataFrame(one_hot_txns, columns=one_hot_encoding.columns_)



In [32]:
one_hot_txns_df.iloc[5:10,10:20]

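| | berries | beverages | bottled beer | bottled water | brandy | brown bread | butter | butter milk | cake bar | candles |
|---|---|---|---|---|---|---|---|---|---|---|
| 5 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 |
| 6 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 7 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 8 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 9 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |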
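Conceptually, the encoder builds one column per distinct item and one row per transaction, with a 1 wherever the basket contains the item. A minimal stdlib sketch of that idea on hypothetical toy transactions (the column order is alphabetical, as in the table above):

```python
# Hypothetical toy transactions, for illustration only
txns = [
    ["whole milk", "butter"],
    ["beverages"],
    ["butter", "brown bread", "whole milk"],
]

# One column per distinct item, in alphabetical order
columns = sorted({item for t in txns for item in t})

# One row per transaction: 1 if the basket contains the item, else 0
matrix = [[int(item in basket) for item in columns] for basket in map(set, txns)]

print(columns)  # ['beverages', 'brown bread', 'butter', 'whole milk']
for row in matrix:
    print(row)
# [0, 0, 1, 1]
# [1, 0, 0, 0]
# [0, 1, 1, 1]
```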


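### Note: `TransactionEncoder` (the newer mlxtend name for `OnehotTransactions`) can also return the matrix as booleans; recent versions of `apriori` work more efficiently with boolean columns than with 0/1 integers.

from mlxtend.preprocessing import TransactionEncoder
one_hot_encoding = TransactionEncoder()
one_hot_txns = one_hot_encoding.fit(all_txns).transform(all_txns)
# Convert the boolean matrix to a data frame
one_hot_txns_df = pd.DataFrame(one_hot_txns, columns=one_hot_encoding.columns_)

one_hot_txns_df.head()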

In [33]:
one_hot_txns_df.shape

(9835, 169)

## Generate association rules

In [34]:
# A minimum support of 0.02 means an itemset must appear in at least 2% of all transactions
frequent_itemset = apriori(one_hot_txns_df, min_support=0.02, use_colnames=True)

In [35]:
frequent_itemset.sample(10, random_state=90)

| | support | itemsets |
|---|---|---|
| 60 | 0.020437 | (whole milk, bottled beer) |
| 52 | 0.033859 | (sugar) |
| 89 | 0.035892 | (tropical fruit, other vegetables) |
| 105 | 0.021047 | (tropical fruit, root vegetables) |
| 88 | 0.03274 | (soda, other vegetables) |
| 16 | 0.058058 | (coffee) |
| 111 | 0.024504 | (shopping bags, whole milk) |
| 36 | 0.079817 | (newspapers) |
| 119 | 0.056024 | (whole milk, yogurt) |
| 55 | 0.071683 | (whipped/sour cream) |


### Note: The apriori algorithm keeps only itemsets with a support of at least 2%; itemsets below this threshold are filtered out.
### From the above table we can infer that whole milk and yogurt appear together in about 5.6% of the baskets.
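Support is often easier to reason about as a basket count. A small worked conversion using the 9,835 transactions reported by `one_hot_txns_df.shape` (the table's support values are rounded to six decimals, so the counts are approximate):

```python
import math

n_txns = 9835        # rows in one_hot_txns_df, from the shape printed above
min_support = 0.02

# Smallest number of baskets an itemset must appear in to be kept
min_count = math.ceil(min_support * n_txns)
print(min_count)  # 197

# Support of (whole milk, yogurt) from the frequent-itemset table
pair_support = 0.056024
print(round(pair_support * n_txns))  # 551
```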

In [36]:
# pass the item set to association rule
rules = association_rules(frequent_itemset, metric='lift', min_threshold=1)
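`association_rules` splits each frequent itemset into an antecedent and a consequent and keeps the rules whose chosen metric reaches `min_threshold`; with `metric='lift'` and `min_threshold=1`, only rules whose items co-occur at least as often as independence would predict survive. A simplified stdlib sketch of that idea (not mlxtend's code; `rules_from_itemsets` is a hypothetical name, and the supports come from the table above plus one made-up pair):

```python
from itertools import combinations

def rules_from_itemsets(supports, min_lift=1.0):
    """Return (antecedent, consequent, confidence, lift) tuples for every split
    of every multi-item itemset in `supports` whose lift is at least `min_lift`.

    `supports` maps frozenset itemsets to their support; every subset of a
    frequent itemset is also frequent, so the lookups below succeed.
    """
    rules = []
    for itemset, s in supports.items():
        if len(itemset) < 2:
            continue
        # Try every non-empty proper subset as the antecedent
        for r in range(1, len(itemset)):
            for combo in combinations(sorted(itemset), r):
                antecedent = frozenset(combo)
                consequent = itemset - antecedent
                confidence = s / supports[antecedent]
                lift = confidence / supports[consequent]
                if lift >= min_lift:
                    rules.append((set(antecedent), set(consequent), confidence, lift))
    return rules

supports = {
    frozenset({"whole milk"}): 0.255516,
    frozenset({"yogurt"}): 0.139502,
    frozenset({"whole milk", "yogurt"}): 0.056024,
    # Hypothetical pair that co-occurs less often than chance (lift 0.8)
    frozenset({"item a"}): 0.5,
    frozenset({"item b"}): 0.5,
    frozenset({"item a", "item b"}): 0.2,
}
rules = rules_from_itemsets(supports, min_lift=1.0)
for ante, cons, conf, lift in rules:
    print(sorted(ante), "->", sorted(cons), round(conf, 3), round(lift, 3))
# ['whole milk'] -> ['yogurt'] 0.219 1.572
# ['yogurt'] -> ['whole milk'] 0.402 1.572
```

Lift is symmetric in A and B, so both directions of a rule share the same lift while their confidences differ.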

In [37]:
rules.sample(5)

| | antecedents | consequents | antecedent support | consequent support | support | confidence | lift | leverage | conviction |
|---|---|---|---|---|---|---|---|---|---|
| 80 | (soda) | (rolls/buns) | 0.174377 | 0.183935 | 0.038332 | 0.219825 | 1.195124 | 0.006258 | 1.046003 |
| 73 | (whole milk) | (pip fruit) | 0.255516 | 0.075648 | 0.030097 | 0.117788 | 1.557043 | 0.010767 | 1.047765 |
| 86 | (yogurt) | (rolls/buns) | 0.139502 | 0.183935 | 0.034367 | 0.246356 | 1.339363 | 0.008708 | 1.082825 |
| 60 | (whole milk) | (other vegetables) | 0.255516 | 0.193493 | 0.074835 | 0.292877 | 1.513634 | 0.025394 | 1.140548 |
| 75 | (pork) | (whole milk) | 0.057651 | 0.255516 | 0.022166 | 0.38448 | 1.504719 | 0.007435 | 1.20952 |
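The metric columns follow directly from the three supports in each row. A worked check on the `(pork) → (whole milk)` row, using the standard definitions of these measures (small differences in the last digits come from the table's six-decimal rounding):

```python
# Values copied from the (pork) -> (whole milk) row above
ante_sup = 0.057651   # support(pork)
cons_sup = 0.255516   # support(whole milk)
rule_sup = 0.022166   # support(pork and whole milk)

confidence = rule_sup / ante_sup
lift = confidence / cons_sup
# Leverage: observed co-occurrence minus what independence would predict
leverage = rule_sup - ante_sup * cons_sup
# Conviction: expected rate of "pork without whole milk" under independence,
# divided by the observed rate at which the rule is wrong
conviction = (1 - cons_sup) / (1 - confidence)

print(round(confidence, 4), round(lift, 4), round(leverage, 4), round(conviction, 4))
# 0.3845 1.5047 0.0074 1.2095
```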


In [38]:
# Top ten rules by confidence
rules.sort_values('confidence', ascending=False)[0:10]

| | antecedents | consequents | antecedent support | consequent support | support | confidence | lift | leverage | conviction |
|---|---|---|---|---|---|---|---|---|---|
| 122 | (yogurt, other vegetables) | (whole milk) | 0.043416 | 0.255516 | 0.022267 | 0.512881 | 2.007235 | 0.011174 | 1.52834 |
| 17 | (butter) | (whole milk) | 0.055414 | 0.255516 | 0.027555 | 0.497248 | 1.946053 | 0.013395 | 1.480817 |
| 25 | (curd) | (whole milk) | 0.053279 | 0.255516 | 0.026131 | 0.490458 | 1.919481 | 0.012517 | 1.461085 |
| 114 | (root vegetables, other vegetables) | (whole milk) | 0.047382 | 0.255516 | 0.023183 | 0.48927 | 1.914833 | 0.011076 | 1.457687 |
| 116 | (root vegetables, whole milk) | (other vegetables) | 0.048907 | 0.193493 | 0.023183 | 0.474012 | 2.44977 | 0.013719 | 1.53332 |
| 29 | (domestic eggs) | (whole milk) | 0.063447 | 0.255516 | 0.029995 | 0.472756 | 1.850203 | 0.013783 | 1.41203 |
| 109 | (whipped/sour cream) | (whole milk) | 0.071683 | 0.255516 | 0.032232 | 0.449645 | 1.759754 | 0.013916 | 1.352735 |
| 90 | (root vegetables) | (whole milk) | 0.108998 | 0.255516 | 0.048907 | 0.448694 | 1.756031 | 0.021056 | 1.350401 |
| 50 | (root vegetables) | (other vegetables) | 0.108998 | 0.193493 | 0.047382 | 0.434701 | 2.246605 | 0.026291 | 1.426693 |
| 32 | (frozen vegetables) | (whole milk) | 0.048094 | 0.255516 | 0.020437 | 0.424947 | 1.663094 | 0.008149 | 1.294636 |


## Findings: The probability that a customer buys (whole milk), given that he/she has bought (yogurt and other vegetables), is about 0.51.
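Confidence here is a conditional probability, and lift puts it in context: roughly one in four baskets contains whole milk, while about half of the baskets with yogurt and other vegetables do. A worked check using the first row of the top-ten table:

```python
# Values copied from the (yogurt, other vegetables) -> (whole milk) row above
ante_sup = 0.043416   # support(yogurt and other vegetables)
cons_sup = 0.255516   # support(whole milk), the baseline P(whole milk)
rule_sup = 0.022267   # support(yogurt, other vegetables and whole milk)

# Confidence = P(whole milk | yogurt, other vegetables)
confidence = rule_sup / ante_sup
# Lift compares it with the unconditional probability of buying whole milk
lift = confidence / cons_sup

print(f"P(whole milk | antecedent) = {confidence:.2f}")  # 0.51
print(f"P(whole milk)              = {cons_sup:.2f}")    # 0.26
print(f"lift                       = {lift:.2f}")        # 2.01
```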
## These rules can be used to design strategies such as placing associated items together on store shelves or cross-selling them.
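One way to turn these rules into recommendations: for a customer's current basket, collect every rule whose antecedent is already in the basket and suggest its consequent, ranked by confidence. A minimal sketch, assuming single-item consequents as in the top-ten table (the `recommend` helper is hypothetical, and ranking by confidence rather than lift is a design choice):

```python
# A few rules copied from the top-ten table: (antecedent, consequent, confidence, lift)
rules = [
    ({"yogurt", "other vegetables"}, "whole milk", 0.512881, 2.007235),
    ({"butter"}, "whole milk", 0.497248, 1.946053),
    ({"curd"}, "whole milk", 0.490458, 1.919481),
    ({"root vegetables", "whole milk"}, "other vegetables", 0.474012, 2.449770),
    ({"root vegetables"}, "other vegetables", 0.434701, 2.246605),
]

def recommend(basket, rules, top_n=3):
    """Rank items not yet in `basket` by the best confidence of any rule
    whose antecedent is fully contained in the basket."""
    basket = set(basket)
    best = {}
    for antecedent, consequent, conf, _lift in rules:
        if antecedent <= basket and consequent not in basket:
            best[consequent] = max(conf, best.get(consequent, 0.0))
    return sorted(best, key=best.get, reverse=True)[:top_n]

print(recommend({"yogurt", "other vegetables", "root vegetables"}, rules))
# ['whole milk']
print(recommend({"root vegetables", "butter"}, rules))
# ['whole milk', 'other vegetables']
```

Ranking by lift instead would favour rules such as (root vegetables, whole milk) → (other vegetables), which are strongly associated but not necessarily the most likely next purchase.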