# Apriori Algorithm

---



The Apriori algorithm is used to identify association rules, in which an antecedent leads to a consequent: {Antecedent} --> {Consequent}.

A typical use case is an e-commerce or retail website. For example, bread and butter are a good combination because they are often bought together.

Steps of the Apriori algorithm:
1. Identify the high-frequency itemsets
2. Form rules from the high-frequency itemsets
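
To make the two steps concrete, here is a minimal sketch on hypothetical toy transactions (not the market dataset). It assumes the standard definitions: support is the fraction of transactions that contain an itemset, and confidence of {Antecedent} --> {Consequent} is the support of both together divided by the support of the antecedent. The helper names `support` and `confidence` are illustrative only and are not used elsewhere in this notebook.

```python
# Toy transactions (hypothetical data, not from the market dataset).
transactions = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"bread", "jam"},
    {"milk", "tea"},
]

def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = frozenset(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(antecedent, consequent, transactions):
    """Support of antecedent and consequent together, divided by support of the antecedent."""
    both = frozenset(antecedent) | frozenset(consequent)
    return support(both, transactions) / support(antecedent, transactions)

# Step 1: how frequent is the itemset {bread, butter}?
bread_butter_support = support({"bread", "butter"}, transactions)

# Step 2: how reliable is the rule {bread} --> {butter}?
bread_to_butter_conf = confidence({"bread"}, {"butter"}, transactions)

print(bread_butter_support)  # 2 of 4 transactions
print(bread_to_butter_conf)  # 2 of the 3 transactions that contain bread
```

Here {bread, butter} appears in 2 of 4 transactions, and 2 of the 3 transactions containing bread also contain butter.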



### DataSet
The market basket data is provided in the sources.

Read the data from the file using pandas.

In [1]:
import pandas as pd
market_data = pd.read_csv("Market_Basket_Optimisation.csv",index_col=None,header=None)

Explore the data.

In [2]:
market_data.shape

(7501, 20)

In [3]:
market_data.head(10)

Unnamed: 0,0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19
0,shrimp,almonds,avocado,vegetables mix,green grapes,whole weat flour,yams,cottage cheese,energy drink,tomato juice,low fat yogurt,green tea,honey,salad,mineral water,salmon,antioxydant juice,frozen smoothie,spinach,olive oil
1,burgers,meatballs,eggs,,,,,,,,,,,,,,,,,
2,chutney,,,,,,,,,,,,,,,,,,,
3,turkey,avocado,,,,,,,,,,,,,,,,,,
4,mineral water,milk,energy bar,whole wheat rice,green tea,,,,,,,,,,,,,,,
5,low fat yogurt,,,,,,,,,,,,,,,,,,,
6,whole wheat pasta,french fries,,,,,,,,,,,,,,,,,,
7,soup,light cream,shallot,,,,,,,,,,,,,,,,,
8,frozen vegetables,spaghetti,green tea,,,,,,,,,,,,,,,,,
9,french fries,,,,,,,,,,,,,,,,,,,


Every row in the data is a transaction: the list of items bought together. We need to convert the dataset to a list of lists, dropping the empty (NaN) cells from each row.

In [4]:
transactions = []

for index, transaction in market_data.iterrows():
    cleaned_transaction = transaction[~transaction.isnull()].tolist()
    transactions.append(cleaned_transaction)

In [5]:
len(transactions)

7501

Create the set of unique items that appear in the transactions.

In [6]:
itemSet = {x for transaction in transactions for x in transaction}

In [7]:
len(itemSet)

120

In [8]:
sortedItemSet = sorted(itemSet)

Convert each item to a single-element frozenset.

In [9]:
uniqueItems = list(map(lambda x: frozenset([x]),sortedItemSet))
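
A short illustration of why a frozenset is a convenient container here: it is hashable, so it can be used as a dictionary key (as the support dictionary later in this notebook does), while a plain `set` cannot. The variable names below are illustrative only.

```python
plain = {"milk", "bread"}
frozen = frozenset(plain)

counts = {}
counts[frozen] = 1  # works: a frozenset is immutable and hashable

try:
    counts[plain] = 1  # a plain set is mutable, hence unhashable
    set_is_hashable = True
except TypeError:
    set_is_hashable = False

# Element order does not matter when looking the key up again.
same_key = frozenset(["bread", "milk"]) in counts

print(set_is_hashable, same_key)
```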

In [10]:
uniqueItems

[frozenset({' asparagus'}),
 frozenset({'almonds'}),
 frozenset({'antioxydant juice'}),
 frozenset({'asparagus'}),
 frozenset({'avocado'}),
 frozenset({'babies food'}),
 frozenset({'bacon'}),
 frozenset({'barbecue sauce'}),
 frozenset({'black tea'}),
 frozenset({'blueberries'}),
 frozenset({'body spray'}),
 frozenset({'bramble'}),
 frozenset({'brownies'}),
 frozenset({'bug spray'}),
 frozenset({'burger sauce'}),
 frozenset({'burgers'}),
 frozenset({'butter'}),
 frozenset({'cake'}),
 frozenset({'candy bars'}),
 frozenset({'carrots'}),
 frozenset({'cauliflower'}),
 frozenset({'cereals'}),
 frozenset({'champagne'}),
 frozenset({'chicken'}),
 frozenset({'chili'}),
 frozenset({'chocolate'}),
 frozenset({'chocolate bread'}),
 frozenset({'chutney'}),
 frozenset({'cider'}),
 frozenset({'clothes accessories'}),
 frozenset({'cookies'}),
 frozenset({'cooking oil'}),
 frozenset({'corn'}),
 frozenset({'cottage cheese'}),
 frozenset({'cream'}),
 frozenset({'dessert wine'}),
 frozenset({'eggplant'}),
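
Note that the output above lists both `' asparagus'` (with a leading space) and `'asparagus'`, so the same product is counted as two different items. One possible fix, sketched here on hypothetical toy rows, is to strip surrounding whitespace from every item before building the item set. The counts shown in this notebook were produced without this step.

```python
# Toy rows (hypothetical); ' asparagus' mimics the entry seen in the output above.
raw_transactions = [
    ["shrimp", " asparagus"],
    ["asparagus", "eggs"],
    ["eggs"],
]

# Strip leading and trailing whitespace from every item name.
cleaned_transactions = [[item.strip() for item in t] for t in raw_transactions]

raw_items = {item for t in raw_transactions for item in t}
clean_items = {item for t in cleaned_transactions for item in t}

print(sorted(raw_items))    # ' asparagus' and 'asparagus' are distinct
print(sorted(clean_items))  # only one 'asparagus' remains
```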

Define functions that compute the frequent itemsets and a dictionary that maps each frequent itemset to its support. The support of an itemset is calculated as (number of transactions that contain the itemset) / (total number of transactions).

In [11]:
def createCountDict (dataSet, itemList):
  # Count, for each candidate itemset, the transactions that contain it
  countDict = {}
  for item in itemList:
    for data in dataSet:
      if item.issubset(data):
        countDict[item] = countDict.get(item, 0) + 1
  return countDict

def frequentItemSet (dataSet, itemList, supportValue) :
  # Return the itemsets whose support is at least supportValue,
  # together with a dict that maps each of them to its support
  frequentItems = []
  supportDict = {}
  countDict = createCountDict(dataSet, itemList)
  totalTransaction = len(dataSet)
  for item in itemList:
    # .get avoids a KeyError for candidates that occur in no transaction
    supportVal = countDict.get(item, 0)/totalTransaction
    if supportVal >= supportValue:
      supportDict[item] = supportVal
      frequentItems.append(item)
  return frequentItems, supportDict

Call the function on the dataset with a minimum support of 0.1.

In [12]:
frequentItems, supportDict = frequentItemSet(transactions, uniqueItems, 0.1)

In [13]:
frequentItems

[frozenset({'chocolate'}),
 frozenset({'eggs'}),
 frozenset({'french fries'}),
 frozenset({'green tea'}),
 frozenset({'milk'}),
 frozenset({'mineral water'}),
 frozenset({'spaghetti'})]

In [14]:
supportDict

{frozenset({'chocolate'}): 0.1638448206905746,
 frozenset({'eggs'}): 0.17970937208372217,
 frozenset({'french fries'}): 0.1709105452606319,
 frozenset({'green tea'}): 0.13211571790427942,
 frozenset({'milk'}): 0.12958272230369283,
 frozenset({'mineral water'}): 0.23836821757099053,
 frozenset({'spaghetti'}): 0.17411011865084655}
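
For single items, the support values can be cross-checked with `collections.Counter`. This is a minimal sketch on hypothetical toy transactions, not a replacement for the functions above: it only handles single items, whereas the functions above also work for larger itemsets.

```python
from collections import Counter

# Toy transactions (hypothetical data).
toy = [
    ["milk", "bread"],
    ["milk", "tea"],
    ["bread"],
    ["milk", "bread", "butter"],
]
min_support = 0.5

# set(t) ensures an item repeated within one transaction is counted once.
counts = Counter(item for t in toy for item in set(t))
support = {item: n / len(toy) for item, n in counts.items()}
frequent = sorted(item for item, s in support.items() if s >= min_support)

print(support)
print(frequent)
```

With a threshold of 0.5, only milk and bread (each in 3 of 4 transactions) are kept.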

## Generate high order frequent items
Now we need to create higher-order item combinations such as {Bread, Milk} and {Bread, Butter, Milk}. Here, order is the number of items in a combination.

Two itemsets of order k-1 are combined into an itemset of order k only if they have (k-1)-1 = k-2 elements in common. So {Bread, Milk} and {Bread, Tea} can be combined because they have {Bread} in common, giving the third-order itemset {Bread, Milk, Tea}. Similarly, {Bread, Butter, Milk} and {Bread, Butter, Tea} can be combined to form {Bread, Butter, Milk, Tea}. But {Milk, Bread} and {Tea, Toast} cannot be combined.

Keeping the items sorted and comparing only the leading items helps eliminate duplicates. {Bread, Butter, Milk} and {Bread, Milk, Spaghetti} have 2 common elements, but they are not combined into a fourth-order itemset because position also matters: the first two sorted items, [Bread, Butter], do not match [Bread, Milk]. The same candidate is still produced once, from {Bread, Butter, Milk} and {Bread, Butter, Spaghetti}, provided both are in the input.

In the code below, k is the order of the itemsets we need to generate. Each input itemset has length k-1, and two itemsets are joined when the first (k-1)-1 = k-2 items of their sorted lists are equal.
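
The sorted-prefix comparison can be tried in isolation. The helper `can_join` below is a hypothetical name used only for this illustration; it applies the same `[:k-2]` slice as the function that follows.

```python
def can_join(a, b, k):
    """True if two (k-1)-item sets share their first k-2 items in sorted order."""
    return sorted(a)[:k - 2] == sorted(b)[:k - 2]

# Order 3: the sorted lists both start with 'Bread'.
print(can_join({"Bread", "Milk"}, {"Bread", "Tea"}, 3))  # True

# Order 3: nothing in common.
print(can_join({"Milk", "Bread"}, {"Tea", "Toast"}, 3))  # False

# Order 4: two common items, but the prefixes [Bread, Butter] and [Bread, Milk] differ.
print(can_join({"Bread", "Butter", "Milk"}, {"Bread", "Milk", "Spaghetti"}, 4))  # False

# Order 4: both prefixes are [Bread, Butter].
print(can_join({"Bread", "Butter", "Milk"}, {"Bread", "Butter", "Spaghetti"}, 4))  # True
```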

In [30]:
def createHigherOrderItemSet(itemset, k):
  # itemset: list of frozensets of length k-1; returns candidate frozensets of length k
  highOrderDataSet = []
  # compare each itemset with every itemset that comes after it
  for i in range(len(itemset)):
    for j in range(i+1, len(itemset)):
      # first k-2 items (in sorted order) that must be common to both
      l1 = sorted(itemset[i])[:k-2]
      l2 = sorted(itemset[j])[:k-2]
      # if the prefixes match, append the union to highOrderDataSet
      if l1 == l2:
        highOrderDataSet.append(itemset[i] | itemset[j])
  return highOrderDataSet

In [33]:
secondOrder = createHigherOrderItemSet(frequentItems,2)
secondOrder

[frozenset({'chocolate', 'eggs'}),
 frozenset({'chocolate', 'french fries'}),
 frozenset({'chocolate', 'green tea'}),
 frozenset({'chocolate', 'milk'}),
 frozenset({'chocolate', 'mineral water'}),
 frozenset({'chocolate', 'spaghetti'}),
 frozenset({'eggs', 'french fries'}),
 frozenset({'eggs', 'green tea'}),
 frozenset({'eggs', 'milk'}),
 frozenset({'eggs', 'mineral water'}),
 frozenset({'eggs', 'spaghetti'}),
 frozenset({'french fries', 'green tea'}),
 frozenset({'french fries', 'milk'}),
 frozenset({'french fries', 'mineral water'}),
 frozenset({'french fries', 'spaghetti'}),
 frozenset({'green tea', 'milk'}),
 frozenset({'green tea', 'mineral water'}),
 frozenset({'green tea', 'spaghetti'}),
 frozenset({'milk', 'mineral water'}),
 frozenset({'milk', 'spaghetti'}),
 frozenset({'mineral water', 'spaghetti'})]
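
The list above contains every pair of frequent single items. These are candidates: a pair of frequent items is not necessarily frequent itself. In the usual formulation of Apriori, candidates are filtered by minimum support before they are joined into the next order. A minimal sketch of that filtering step on hypothetical toy data follows; `support_of` is an illustrative helper, not a function from this notebook.

```python
def support_of(candidate, transactions):
    """Fraction of transactions that contain every item of `candidate`."""
    return sum(candidate <= t for t in transactions) / len(transactions)

# Toy transactions and candidate pairs (hypothetical data).
toy = [frozenset(t) for t in [
    ["bread", "milk", "tea"],
    ["bread", "milk"],
    ["bread", "tea"],
    ["milk"],
]]
candidates = [
    frozenset(["bread", "milk"]),
    frozenset(["bread", "tea"]),
    frozenset(["milk", "tea"]),
]
min_support = 0.5

# Keep only the candidates that meet the minimum support.
frequent_pairs = [c for c in candidates if support_of(c, toy) >= min_support]

for pair in frequent_pairs:
    print(sorted(pair), support_of(pair, toy))
```

In this toy data {milk, tea} occurs in only 1 of 4 transactions, so it is dropped and would not take part in generating third-order candidates.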

In [32]:
createHigherOrderItemSet(secondOrder,3)

[frozenset({'chocolate', 'eggs', 'french fries'}),
 frozenset({'chocolate', 'eggs', 'green tea'}),
 frozenset({'chocolate', 'eggs', 'milk'}),
 frozenset({'chocolate', 'eggs', 'mineral water'}),
 frozenset({'chocolate', 'eggs', 'spaghetti'}),
 frozenset({'chocolate', 'french fries', 'green tea'}),
 frozenset({'chocolate', 'french fries', 'milk'}),
 frozenset({'chocolate', 'french fries', 'mineral water'}),
 frozenset({'chocolate', 'french fries', 'spaghetti'}),
 frozenset({'chocolate', 'green tea', 'milk'}),
 frozenset({'chocolate', 'green tea', 'mineral water'}),
 frozenset({'chocolate', 'green tea', 'spaghetti'}),
 frozenset({'chocolate', 'milk', 'mineral water'}),
 frozenset({'chocolate', 'milk', 'spaghetti'}),
 frozenset({'chocolate', 'mineral water', 'spaghetti'}),
 frozenset({'eggs', 'french fries', 'green tea'}),
 frozenset({'eggs', 'french fries', 'milk'}),
 frozenset({'eggs', 'french fries', 'mineral water'}),
 frozenset({'eggs', 'french fries', 'spaghetti'}),
 frozenset({'eggs