# Market Basket Analysis using Apriori 

<br>
<br>
<br>
<br>
In this section, we perform Market Basket Analysis by mining association rules with the Apriori algorithm. We first use a manual implementation (credit: zhaopace@foxmail.com) and then compare it with Python's <i>apyori</i> module.

## 1. Using the Manual Implementation (Credit: zhaopace@foxmail.com), modified by Andreas M

<br><br>
First, we define each required function. Note that Apriori needs two thresholds: a minimum support (used to mine frequent itemsets) and a minimum confidence (used to mine association rules).
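
To make these two measures concrete, here is a small sketch on made-up transactions (not the DataPurchase.csv data). The helper names `support` and `confidence` are chosen for illustration only and are not part of the class below.

```python
# Toy transactions (hypothetical, for illustration only)
transactions = [
    {"Camera", "Watch"},
    {"Camera", "Guitar"},
    {"Watch"},
    {"Camera", "Watch", "Guitar"},
]

def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = frozenset(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(lhs, rhs, transactions):
    """support(lhs and rhs together) / support(lhs)."""
    both = set(lhs) | set(rhs)
    return support(both, transactions) / support(lhs, transactions)

print(support({"Camera"}, transactions))                # 0.75 (3 of 4 baskets)
print(confidence({"Watch"}, {"Camera"}, transactions))  # about 0.667 (2 of the 3 Watch baskets)
```

Support answers "how common is this itemset?", while confidence answers "given the left-hand side, how often does the right-hand side also appear?".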

In [176]:
# -*- coding: utf-8 -*-
"""
Date: Created on 2017-10-14 09:44am Thursday  
Author: zhaopace@foxmail.com
Description: 
    An Effectively Python Implementation of Apriori Algorithm for Finding Frequent 
    Sets and Association Rules   

"""

from collections import defaultdict
import csv


class Apriori(object):
    def __init__(self, minSupp, minConf):
        """ Parameters setting
        """
        self.minSupp = minSupp  # min support (used for mining frequent sets)
        self.minConf = minConf  # min confidence (used for mining association rules)

    def fit(self, filePath):
        """ Run the apriori algorithm, return the frequent *-term sets. 
        """
        # Initialize some variables to hold the tmp result
        transListSet  = self.getTransListSet(filePath)   # get transactions (list that contain sets)
        itemSet       = self.getOneItemSet(transListSet) # get 1-item set
        itemCountDict = defaultdict(int)         # key=candidate k-item(k=1/2/...), value=count
        freqSet       = dict()                   # a dict store all frequent *-items set
        
        self.transLength = len(transListSet)     # number of transactions
        self.itemSet     = itemSet
        
        # Get the frequent 1-term set
        freqOneTermSet = self.getItemsWithMinSupp(transListSet, itemSet, 
                                             itemCountDict, self.minSupp)

        # Main loop
        k = 1
        currFreqTermSet = freqOneTermSet
        while currFreqTermSet != set():
            freqSet[k] = currFreqTermSet  # save the result
            k += 1
            currCandiItemSet = self.getJoinedItemSet(currFreqTermSet, k) # get new candidate k-terms set
            currFreqTermSet  = self.getItemsWithMinSupp(transListSet, currCandiItemSet, 
                                                   itemCountDict, self.minSupp) # frequent k-terms set
            
            
        #
        self.itemCountDict = itemCountDict # all candidate itemsets and their counts (not only the frequent ones), used to compute confidence
        self.freqSet       = freqSet       # Only frequent items(a dict: freqSet[1] indicate frequent 1-term set )
        return itemCountDict, freqSet
            
            
    def getSpecRules(self, rhs):
        """ Specify a right item, construct rules for it
        """
        if rhs not in self.itemSet:
            print('Please input a term contain in the term-set !')
            return None
        
        rules = dict()
        for key, value in self.freqSet.items():
            for item in value:
                if rhs.issubset(item) and len(item) > 1:
                    item_supp = self.getSupport(item)
                    item = item.difference(rhs)
                    conf = item_supp / self.getSupport(item)
                    if conf >= self.minConf:
                        rules[item] = conf
        return rules
        
    
    def getSupport(self, item):
        """ Get the support of item """
        return self.itemCountDict[item] / self.transLength
        
        
    def getJoinedItemSet(self, termSet, k):
        """ Generate new k-terms candidate itemset"""
        return set([term1.union(term2) for term1 in termSet for term2 in termSet 
                    if len(term1.union(term2))==k])
    
        
    def getOneItemSet(self, transListSet):
        """ Get unique 1-item set in `set` format 
        """
        itemSet = set()
        for line in transListSet:
            for item in line:
                itemSet.add(frozenset([item]))
        return itemSet
        
    
    def getTransListSet(self, filePath):
        """ Get transactions in list format 
        """
        transListSet = []
        with open(filePath, 'r') as file:
            reader = csv.reader(file,delimiter=',')  # must match the file; our data is comma-delimited
            for line in reader:
                transListSet.append(set(line))  
        print(transListSet)
        return transListSet
                
    
    def getItemsWithMinSupp(self, transListSet, itemSet, freqSet, minSupp):
        """ Get frequent item set using min support
        """
        itemSet_  = set()
        localSet_ = defaultdict(int)
        for item in itemSet:
            # Count each candidate once and record it in both dicts
            cnt = sum(1 for trans in transListSet if item.issubset(trans))
            freqSet[item]   += cnt
            localSet_[item] += cnt
        
        # Keep only the itemsets that reach the minimum support
        n = len(transListSet)
        for item, cnt in localSet_.items():
            if float(cnt) / n >= minSupp:
                itemSet_.add(item)
        
        return itemSet_
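
The main loop in `fit` grows frequent itemsets level by level: frequent (k-1)-itemsets are joined into k-item candidates, and only the candidates that reach the minimum support are kept. A minimal standalone sketch of that idea follows, using made-up transactions instead of a CSV file. The function name `frequent_itemsets` is hypothetical, and the subset-based prune step is an addition that the class above does not perform.

```python
from itertools import combinations

def frequent_itemsets(transactions, min_supp):
    """Level-wise search: build frequent k-itemsets from frequent (k-1)-itemsets."""
    n = len(transactions)

    def is_frequent(candidate):
        return sum(candidate <= t for t in transactions) / n >= min_supp

    singles = {frozenset([item]) for t in transactions for item in t}
    current = {c for c in singles if is_frequent(c)}
    result = {}
    k = 1
    while current:
        result[k] = current
        k += 1
        # Join step: unions of two frequent (k-1)-itemsets with exactly k items
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # Prune step (assumption, not in the class above):
        # every (k-1)-subset of a candidate must itself be frequent
        candidates = {c for c in candidates
                      if all(frozenset(s) in current for s in combinations(c, k - 1))}
        current = {c for c in candidates if is_frequent(c)}
    return result

# Toy transactions (hypothetical)
transactions = [
    {"Camera", "Watch"},
    {"Camera", "Guitar"},
    {"Watch"},
    {"Camera", "Watch", "Guitar"},
]
result = frequent_itemsets(transactions, 0.5)
for k, itemsets in result.items():
    print(k, sorted(sorted(s) for s in itemsets))
```

With a minimum support of 0.5, the pair Watch and Guitar appears in only one of four baskets, so it is dropped, and the three-item candidate is pruned because one of its pairs is not frequent.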



<br><br><br>
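<b>NOTE: </b>For the program to run successfully, your data should be delimited by <b>'  ,  '</b> rather than <b>'  ;  '</b> (in our case, the data could not be read properly with delimiter='  ;  '). The delimiter passed to <code>csv.reader</code> in <code>getTransListSet</code> must match the file.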
<br><br>
After defining the class, we call it with the script below. Note that this script can also be run from the command prompt (just uncomment the *** section). Because we use Jupyter Notebook, we tweaked the program so that it works there. When we run it, we are asked for our preferred minimum support, minimum confidence, and the item to build rules for; the file path is set directly in the code.
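
The delimiter note above can be checked without touching the real file. The sketch below parses one made-up comma-delimited line (standing in for a row of the CSV) with both delimiters, using only the standard `csv` and `io` modules.

```python
import csv
import io

# One hypothetical comma-delimited line, standing in for a row of the CSV file
raw = "Camera,Watch,Guitar\n"

wrong = [set(row) for row in csv.reader(io.StringIO(raw), delimiter=';')]
right = [set(row) for row in csv.reader(io.StringIO(raw), delimiter=',')]

print(wrong)  # [{'Camera,Watch,Guitar'}]: the whole line becomes a single "item"
print(right)  # one set with three separate items
```

With the wrong delimiter no error is raised; each transaction simply collapses into one long item, so no meaningful frequent itemsets can be found.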

In [177]:
# -*- coding: utf-8 -*-
"""
Created on Mon Apr  9 10:05:45 2018

@author: zbj
"""

from optparse import OptionParser    # parse command-line parameters
from apriori import Apriori




#  Below code is used if we want to use prompt

##########################################################################
# ***

# # Parsing command-line parameters
#     optParser = OptionParser()
#     optParser.add_option('-f', '--file', 
#                          dest='filePath',
#                          help='Input a csv file',
#                          type='string',
#                          default=None)  # input a csv file
                         
#     optParser.add_option('-s', '--minSupp', 
#                          dest='minSupp',
#                          help='Mininum support',
#                          type='float',
#                          default=0.10)  # mininum support value
                         
#     optParser.add_option('-c', '--minConf', dest='minConf',
#                          help='Mininum confidence',
#                          type='float',
#                          default=0.40)  # mininum confidence value    
                         
#     optParser.add_option('-r', '--rhs', dest='rhs',
#                          help='Right destination',
#                          type='string',
#                          default=None)  # 

#     (options, args) = optParser.parse_args()       
        
#     # Get two important parameters
#     filePath = options.filePath
#     minSupp  = options.minSupp
#     minConf  = options.minConf
#     rhs      = frozenset([options.rhs])

###############################################################################
# If you use Jupyter Notebook (or spyder, etc) use this instead

minSupp = float(input('Input Minimal Support : '))
minConf = float(input('Input minimal Confidence : '))
rh = input('Input item : ')
rhs = frozenset([str(rh)])
filePath = 'DataPurchase.csv'

print("""Parameters: \n - filePath: {} \n - mininum support: {} \n - mininum confidence: {} \n - rhs: {}\n""".\
          format(filePath,minSupp,minConf, rhs))

objApriori = Apriori(minSupp, minConf)
itemCountDict, freqSet = objApriori.fit(filePath)
for key, value in freqSet.items():
    print('frequent {}-term set:'.format(key))
    print('-'*20)
    for itemset in value:
        print(list(itemset))
    print()

# Return rules with regard of `rhs`
rules = objApriori.getSpecRules(rhs)
print('-'*20)
print('rules refer to {}'.format(list(rhs)))
for key, value in rules.items():
    print('{} -> {}: {}'.format(list(key), list(rhs), value))


Input Minimal Support : 0.2
Input minimal Confidence : 0.2
Input item : Camera
Parameters: 
 - filePath: DataPurchase.csv 
 - mininum support: 0.2 
 - mininum confidence: 0.2 
 - rhs: frozenset({'Camera'})

frequent 1-term set:
--------------------
['Music Pad']
['Watch']
['Soap']
['Camera']
['Racket']
['Guitar']
['']

frequent 2-term set:
--------------------
['Music Pad', 'Watch']
['Camera', 'Guitar']
['Camera', 'Music Pad']
['Camera', 'Watch']

--------------------
rules refer to ['Camera']
['Guitar'] -> ['Camera']: 0.625
['Music Pad'] -> ['Camera']: 0.7142857142857143
['Watch'] -> ['Camera']: 0.6666666666666667


<br><br>
With this section, we can easily list the other items that are correlated with our item (here, correlated means that they tend to be bought together). This is useful, for example, for a retailer with many items who wants to know which items are associated with a given item.
<br>
Now we compare the result with the apyori module.

## 2. Using the apyori Module in Python

In [109]:
# !pip install apyori



In [178]:
import pandas as pd
import numpy as np

from apyori import apriori

data=pd.read_csv('DataPurchase.csv',delimiter=',')

<br><br>
To use apriori, we need to convert our data into a list of lists (one list of items per transaction).
<br>

In [180]:
records=[]
for i in range(data.shape[0]):
    records.append([str(data.values[i,j]) for j in range(data.shape[1])])
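
If the rows of the CSV have different lengths, the padded cells can end up as items of their own: `str()` of a pandas missing value gives `'nan'`, and `csv.reader` returns empty strings for empty fields. The empty item `['']` in the frequent 1-term output above suggests this may be happening with the manual reader. A minimal sketch of a cleaning step, assuming that `''` and `'nan'` are never real item names; the helper name `clean_record` and the sample rows are hypothetical.

```python
def clean_record(row, missing=("", "nan")):
    """Return the items of one transaction, skipping placeholder values."""
    items = (str(cell).strip() for cell in row)
    return [item for item in items if item not in missing]

# Hypothetical rows, not taken from DataPurchase.csv
rows = [
    ["Camera", "Watch", float("nan")],  # short row as pandas would pad it
    ["Guitar", "", ""],                 # short row as csv.reader might return it
]
records_clean = [clean_record(row) for row in rows]
print(records_clean)  # [['Camera', 'Watch'], ['Guitar']]
```

Applying such a step before mining keeps placeholder values from being counted as purchased items.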

<br><br>
Now we specify the minimum confidence and minimum support for apriori.
<br>

In [181]:
association_rules = apriori(records,min_confidence=0.5,min_support=0.2)
association_results = list(association_rules)

In [182]:
rows = []
for item in association_results:
    # item[1] is the support of the itemset;
    # item[2] holds its ordered statistics: (base items, added items, confidence, lift)
    for stat in item[2]:
        # Skip rules whose lift is exactly 1 (in apyori, typically rules with an empty base)
        if stat[3] != 1:
            rows.append({'Rule': str(list(stat[0])) + ' --> ' + str(list(stat[1])),
                         'Support': item[1], 'Confidence': stat[2], 'Lift': stat[3]})
# DataFrame.append was removed in pandas 2.0, so build the frame from a list of rows
Result = pd.DataFrame(rows, columns=['Rule','Support','Confidence','Lift'])
Result.sort_values('Confidence',ascending=False)

|   | Rule | Support | Confidence | Lift |
|---|------|---------|------------|------|
| 1 | ['Music Pad'] --> ['Camera'] | 0.208333 | 0.714286 | 1.071429 |
| 4 | ['Music Pad'] --> ['Watch'] | 0.208333 | 0.714286 | 1.142857 |
| 3 | ['Watch'] --> ['Camera'] | 0.416667 | 0.666667 | 1.0 |
| 0 | ['Guitar'] --> ['Camera'] | 0.208333 | 0.625 | 0.9375 |
| 2 | ['Camera'] --> ['Watch'] | 0.416667 | 0.625 | 1.0 |


<br><br>
Here we see that the result is similar to the manual one, but apyori returns rules for all frequently bought-together items, whereas the manual section above only returns rules for one chosen item. Of course, we can also customize the apyori output to show only the preferred item. In addition, apyori offers more options and automatically reports the lift.
<br><br>
<b>Conclusion</b>

From this analysis, we find the frequently bought-together items in the table above. Music Pad-Camera, Music Pad-Watch, Watch-Camera, and Guitar-Camera have relatively high confidence (0.625 to 0.71), meaning that in our data these rules occur often. However, the lift values are all close to 1, and Guitar --> Camera even has a lift below 1 (0.9375). A lift close to 1 means the rule is not strong: buying the first item barely changes how likely the second item is. A lift below 1 means that people who buy Guitar are slightly less likely than the average customer to also buy Camera, even though the confidence of the rule is still 0.625.
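
To show how lift relates to confidence, here is a small sketch on made-up transactions (not the DataPurchase.csv data). It uses the usual definition, lift = confidence(lhs --> rhs) / support(rhs); the function name `lift` is chosen for illustration.

```python
def lift(lhs, rhs, transactions):
    """confidence(lhs -> rhs) / support(rhs); 1.0 means lhs and rhs look independent."""
    n = len(transactions)
    lhs, rhs = frozenset(lhs), frozenset(rhs)
    supp_lhs = sum(lhs <= t for t in transactions) / n
    supp_rhs = sum(rhs <= t for t in transactions) / n
    supp_both = sum((lhs | rhs) <= t for t in transactions) / n
    return (supp_both / supp_lhs) / supp_rhs

# Toy transactions (hypothetical)
transactions = [
    {"Camera", "Watch"},
    {"Camera", "Guitar"},
    {"Watch"},
    {"Camera", "Watch", "Guitar"},
]
print(lift({"Guitar"}, {"Camera"}, transactions))  # above 1: Guitar buyers buy Camera more often than average
print(lift({"Watch"}, {"Camera"}, transactions))   # below 1: Watch buyers buy Camera less often than average
```

The same relation can be read off the table above: Watch --> Camera has confidence 0.666667 and lift 1.0, which implies a Camera support of about 0.667, and dividing the Guitar --> Camera confidence of 0.625 by that value gives the reported lift of 0.9375.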